#dri-devel on 2021-06-08 — irc logs at oftc.irclog.whitequark.org

00:51 ngcortes has quit [Ping timeout: 480 seconds]

01:06 Lvl4Sword has joined #dri-devel

01:06 Lvl4Sword is now known as Guest1239

01:06 Guest1239 has quit [autokilled: Possible spambot. Mail support@oftc.net if you think this is in error. (2021-06-08 01:06:32)]

01:11 CME has quit [Ping timeout: 480 seconds]

01:49 pg_docbot has joined #dri-devel

01:49 pg_docbot has quit [Remote host closed the connection]

01:54 <imirkin> anholt: you reviewed the earlier patches referenced in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11204 -- mind having a look at that one as well?

02:45 macromorgan_ has joined #dri-devel

02:45 macromorgan has quit [Read error: Connection reset by peer]

02:48 CME has joined #dri-devel

02:53 karolherbst has quit [Ping timeout: 480 seconds]

03:11 khfeng has joined #dri-devel

03:26 vivijim has quit [Remote host closed the connection]

03:35 jcline has quit [Quit: Bye.]

03:46 Tito1337 has joined #dri-devel

03:46 Tito1337 has quit [Remote host closed the connection]

03:48 mbrost has joined #dri-devel

03:50 gpoo has joined #dri-devel

03:54 sdutt has quit [Remote host closed the connection]

03:58 blue__penquin has joined #dri-devel

04:20 danvet has joined #dri-devel

04:27 noahhsmith[m|gr] has joined #dri-devel

04:27 noahhsmith[m|gr] has quit [Remote host closed the connection]

04:43 Duke`` has joined #dri-devel

04:43 DavidMartin[m] has joined #dri-devel

04:44 DavidMartin[m] has quit [Remote host closed the connection]

04:45 gpoo has quit [Ping timeout: 480 seconds]

04:53 pixelgeek has joined #dri-devel

04:53 pixelgeek has quit [Remote host closed the connection]

04:57 shankaru1 has joined #dri-devel

05:17 dviola has joined #dri-devel

05:29 thellstrom1 has joined #dri-devel

05:29 thellstrom has quit [Remote host closed the connection]

05:37 thellstrom1 has quit [Ping timeout: 480 seconds]

05:41 Duke`` has quit [Ping timeout: 480 seconds]

05:43 itoral has joined #dri-devel

06:11 curro has quit [Ping timeout: 480 seconds]

06:13 frieder has joined #dri-devel

06:19 dviola has quit [Quit: WeeChat 3.1]

06:22 bluestang has joined #dri-devel

06:23 bluestang has quit [Remote host closed the connection]

06:32 jfb4 has joined #dri-devel

06:32 jfb4 has quit [Remote host closed the connection]

06:34 blue__penquin has quit [Remote host closed the connection]

06:35 blue__penquin has joined #dri-devel

06:49 aissen_ has quit []

06:50 aissen has joined #dri-devel

06:53 pekkari has joined #dri-devel

06:55 pnowack has joined #dri-devel

06:59 mlankhorst has joined #dri-devel

07:02 idr_ has joined #dri-devel

07:03 idr has quit [Remote host closed the connection]

07:04 profit_ has joined #dri-devel

07:04 profit_ has quit [Remote host closed the connection]

07:39 rasterman has joined #dri-devel

07:50 yk has quit [Remote host closed the connection]

07:55 lemonzest has joined #dri-devel

08:09 lplc has joined #dri-devel

08:14 xp4ns3 has joined #dri-devel

08:18 pcercuei has joined #dri-devel

08:19 mbrost has quit [Remote host closed the connection]

08:22 yk has joined #dri-devel

08:33 rgallaispou has quit [Remote host closed the connection]

08:43 karolherbst has joined #dri-devel

08:53 rgallaispou has joined #dri-devel

08:54 blue__penquin has quit [Remote host closed the connection]

08:55 blue__penquin has joined #dri-devel

08:58 tzimmermann has joined #dri-devel

10:15 matt_c has joined #dri-devel

10:15 matt_c has quit [autokilled: Suspected spammer. Mail support@oftc.net with questions (2021-06-08 10:15:39)]

10:31 hch12907_ has joined #dri-devel

10:37 hch12907 has quit [Ping timeout: 480 seconds]

10:45 thellstrom has joined #dri-devel

11:04 jshmlr has joined #dri-devel

11:04 jshmlr has quit [Remote host closed the connection]

11:12 blue__penquin has quit [Remote host closed the connection]

11:13 blue__penquin has joined #dri-devel

11:22 gpoo has joined #dri-devel

11:26 macromorgan_ has quit [Remote host closed the connection]

11:27 macromorgan has joined #dri-devel

11:28 hch12907_ is now known as hch12907

11:31 kiero has joined #dri-devel

11:31 kiero has quit [Remote host closed the connection]

11:32 <mareko> do modifiers have a way to select optimal tiling for rotated display?

11:34 <emersion> mareko, no

11:35 <emersion> only way to know right now is allocate a buffer with all of the plane's modifiers, do an atomic test-only commit to see if the modifier is supported for rotated planes

11:35 <emersion> and if not, prune the modifier and repeat
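
The prune-and-retry loop emersion describes can be sketched roughly as below. This is only an illustration under stated assumptions: `allocate` and `test_only_commit` are hypothetical callbacks standing in for a buffer allocation with a modifier list and an atomic test-only commit, and the modifier values are made up. None of these names are real libdrm or GBM calls.

```python
from typing import Callable, Iterable, Optional, Set


def find_supported_modifier(
    plane_modifiers: Iterable[int],
    allocate: Callable[[Set[int]], int],
    test_only_commit: Callable[[int], bool],
) -> Optional[int]:
    """Prune modifiers until a test-only commit succeeds, or none are left."""
    candidates = set(plane_modifiers)
    while candidates:
        # Hypothetical allocator: picks one modifier out of the candidate set.
        chosen = allocate(candidates)
        if chosen not in candidates:
            raise ValueError("allocator returned a modifier that was not offered")
        # Hypothetical stand-in for an atomic commit with the test-only flag.
        if test_only_commit(chosen):
            return chosen
        # The rotated plane rejected this modifier: prune it and repeat.
        candidates.discard(chosen)
    return None


# Made-up modifier values, for illustration only.
LINEAR, TILED_A, TILED_B = 0, 1, 2
picked = find_supported_modifier(
    [LINEAR, TILED_A, TILED_B],
    allocate=max,                             # pretend the allocator prefers the highest value
    test_only_commit=lambda m: m == TILED_A,  # pretend only TILED_A works when rotated
)
print(picked)  # 1
```

Each failed test costs one allocation, which is the overhead complained about later in the discussion.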

11:35 heluecht[m] has joined #dri-devel

11:36 heluecht[m] has quit [Remote host closed the connection]

11:37 <mareko> emersion: can we not ask drm whether the display is rotated?

11:37 <emersion> mareko: i don't understand that question

11:37 <emersion> user-space decides whether to rotate a plane or not

11:38 <emersion> maybe you can tell more about what you're trying to achieve?

11:38 <mareko> emersion: set optimal tiling for rotated buffers

11:39 <emersion> mareko: okay, so the question is not about finding a *supported* modifier for rotated buffers, but finding the *optimal* one?

11:39 <mareko> yes

11:40 <emersion> because on AMD hw, the optimal modifier is not the same if the buffer is going to be rotated via KMS?

11:40 <mareko> there might also be tiling modes that are supported when non-rotated, but unsupported when rotated

11:41 <emersion> these are two different questions

11:41 <mareko> let's focus on the latter question for now

11:42 <emersion> so optimal modifier

11:43 <mareko> I guess the answer is that it's unsupported

11:43 <emersion> there's no way to do this right now. GBM would need to know that the buffer will be rotated via KMS

11:43 <emersion> however that would leak display engine details into GBM

11:43 <emersion> which isn't what we want in the long run

11:44 <emersion> so GBM would need to know that the buffer'

11:44 <mareko> or Mesa can use rotated tiling if width < height

11:44 <emersion> s consumer prefers a different modifier than what radeonsi prefers

11:45 <emersion> hm. i don't know if heuristics like this are a good idea. daniels?

11:46 <emersion> fwiw, the "find a supported modifier" problem is somewhat easier to solve

11:47 <daniels> emersion: but you already know what I'm going to say :(

11:50 <emersion> daniels, tranches?

11:51 <emersion> mareko, do amd modifiers tell whether the tiling is rotated?

11:51 flibitijibibo has quit [Remote host closed the connection]

11:52 <pq> sounds like we would need KMS to give us a set of supported modifiers based on plane configuration

11:52 flibitijibibo has joined #dri-devel

11:52 <pq> ...in tranches

11:53 <emersion> … and with buffer constraints as well!

11:54 <pq> "optimal" is a hard problem, because there is no component that would be aware of both display and rendering preferences simultaneously and be able to reason about the trade-offs.

11:55 <mareko> emersion: we don't currently expose rotated tiling, but yes, the rotated flag can be extracted from the modifier uint64_t

11:55 <bnieuwenhuizen> mareko: I think the width < height thing isn't going to work for displays that natively are tall

11:55 <emersion> right. the only component in the middle of render and display is the compositor, and it shouldn't have driver-specific logic

11:55 <bnieuwenhuizen> nor for overlays

11:56 <emersion> yeah, for overlays you'll often guess wrong

11:57 mp has joined #dri-devel

11:57 <pq> it seems impossible to keep display separated from rendering while needing an optimal solution, so where would we compromise?

11:58 mp has quit [Remote host closed the connection]

11:59 <pq> Did even the Unix Device Memory Allocator plans cater for optimal?

11:59 <emersion> is it even possible to take a good decision on e.g. split render/display SoCs?

12:00 <pq> if you define your goals carefully, I'm sure there exists an optimal solution, but...

12:01 <pq> it may end up being holistic, e.g. minimize whole-system power consumption

12:01 <emersion> eh

12:01 Lightkey has quit [Ping timeout: 480 seconds]

12:01 <pq> we have to start by defining the problem

12:01 <emersion> i guess we should solve "find a supported buffer" first

12:01 <mareko> there are 2 ways to rotate: 1) while compositing (not viable for fullscreen apps/video) 2) in display hw (any fullscreen app/video or overlay); the hw might require different modifiers for rotated and non-rotated because of the walking pattern thrashing TLB etc.

12:02 <emersion> mareko: i wasn't aware of the "require" part. i thought any non-linear buffer could be rotated by KMS

12:02 <emersion> is it "require" or "prefer"?

12:03 <mareko> require

12:04 <mareko> it depends on the hw, e.g. you might have enough bandwidth/TLB for 4 non-rotated displays, but if you rotate, that number might drop to max 2 displays if you have incorrect tiling

12:04 <emersion> to solve the "require" part, either do the KMS test-only dance described earlier, or add a KMS API to return a list of modifiers for a given KMS configuration

12:04 <emersion> ah, interesting

12:04 <mareko> if you connect more displays, it will flicker

12:04 <emersion> to solve the "prefer" part, organize that modifier list in preference tranches

12:05 <mareko> low power devices might require optimal tiling for 1 display

12:06 <emersion> instead of saying "the rotated plane supports modifiers [A, B, C, D]", the driver would say "the rotated plane supports modifiers [A, B] but if you really can't use those it also supports [C, D]"

12:06 <emersion> on top of all of that, add other buffer constraints like alignment etc

12:07 <emersion> then you end up with quite a few lines of code to type to introduce all of the new uAPI

12:08 <emersion> i hope we can work on this step by step

12:08 <emersion> first add a new uAPI to return a list of modifiers for a given KMS configuration

12:09 <emersion> scratch that

12:09 <emersion> first add a new uAPI to check if a given KMS configuration would work, without having to allocate a buffer

12:10 <emersion> then add on top of this new uAPI to return a list of modifiers if the configuration _can_ work

12:10 <emersion> then also return buffer constraints

12:10 <emersion> then also organize all of the returned data into preference tranches
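
The "[A, B] but if you really can't use those it also supports [C, D]" idea could be consumed along the lines of the sketch below. It assumes the driver hands back an ordered list of tranches (most preferred first) and that the renderer exposes a set of modifiers it can produce. The function name and the string modifier names are hypothetical; no such KMS uAPI exists in the discussion above.

```python
from typing import Optional, Sequence, Set


def pick_from_tranches(
    tranches: Sequence[Sequence[str]],
    renderer_supported: Set[str],
) -> Optional[str]:
    """Return the first modifier the renderer can use, walking tranches in preference order."""
    for tranche in tranches:
        for modifier in tranche:
            if modifier in renderer_supported:
                return modifier
    return None


# Hypothetical answer for a rotated plane: prefer A or B, fall back to C or D.
rotated_plane_tranches = [["A", "B"], ["C", "D"]]
print(pick_from_tranches(rotated_plane_tranches, {"B", "C"}))  # B
print(pick_from_tranches(rotated_plane_tranches, {"D"}))       # D
print(pick_from_tranches(rotated_plane_tranches, {"E"}))       # None
```

Buffer constraints such as alignment would have to be matched on top of this and are left out here.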

12:10 Lightkey has joined #dri-devel

12:18 <pq> mareko, you said "it will flicker". Is the driver not properly rejecting KMS configurations that do not work?

12:20 thelounge6753161 has joined #dri-devel

12:20 thelounge6753161 has quit [Remote host closed the connection]

12:20 <pq> emersion, we are also supposed to have the global modifier pruning algorithm in compositors with KMS atomic test, and we don't. That would probably go a long way at least in making sure outputs can be lit.

12:21 <pq> it's probably not feasible for fishing out workable plane configurations though, but if the use case is fullscreen apps, might be enough

12:21 xp4ns3 has quit []

12:22 <emersion> yeah, per-device modifier pruning would help for the "find something supported but not optimal" part

12:22 <pq> intel has roughly the same problem with its Y_TILING IIRC, right? And that's not even rotated.

12:23 <emersion> yea

12:23 blue__penquin has quit [Remote host closed the connection]

12:23 <pq> since mareko said "required", it sounded to me like the problem simplified to "something supported" rather than "optimal from many supported combinations"

12:25 <pq> I guess the first step is to get that modifier pruning going in userspace, before thinking about adding more UAPI.

12:28 <emersion> well

12:28 <emersion> i don't really like it

12:28 <emersion> let's say, i have no plans to implement it

12:34 neonking has joined #dri-devel

12:45 <pq> emersion, is having to allocate a buffer to test the biggest problem or something else?

12:46 itoral has quit [Remote host closed the connection]

12:46 <emersion> i guess even if we have the enhanced uAPI, we'll still have to have per-device stuff going on

12:47 <emersion> having to allocate a buffer is enough to make me not motivated to fix it :P

12:52 <MrCooper> mareko: why is rotating while compositing not viable for fullscreen apps/video? IME there's only a small performance hit, though I suppose it might be significant for energy consumption

12:57 thellstrom1 has joined #dri-devel

12:57 thellstrom has quit [Remote host closed the connection]

12:58 vivijim has joined #dri-devel

13:00 sdutt has joined #dri-devel

13:00 sdutt has quit []

13:01 sdutt has joined #dri-devel

13:10 vivijim has quit [Remote host closed the connection]

13:12 vivijim has joined #dri-devel

13:17 thellstrom1 has quit [Ping timeout: 480 seconds]

13:19 shankaru1 has quit []

13:22 fool has joined #dri-devel

13:22 fool has quit [Remote host closed the connection]

13:24 jcline has joined #dri-devel

13:29 <mareko> MrCooper: I'm assuming the compositor is idle when displaying fullscreen apps/video

13:31 <pq> it's still doing KMS commits every frame at the very least

13:31 <MrCooper> that doesn't answer my question :) yes it can make the difference between the compositor drawing one quad or nothing, but why would the former make what "not viable"?

13:50 pekkari has quit []

13:54 <alyssa> @Vulkan spec ninjas: Is it acceptable for an implementation to advertise VK_FORMAT_FEATURE_{SRC,DST}_BIT for linearTiling but not optimalTiling?

13:56 <dj-death> alyssa: there are required features & linear/optimal tilings, so need to check exactly what feature

13:56 <dj-death> and what format

13:56 <kusma> alyssa: Depends on the format, yeah...

13:56 <alyssa> ^TRANSFER_{SRC,DST}_BIT

13:57 <alyssa> dj-death: Right, I see the table but can't tell if it's for the union or the intersection of linear/optimal features

13:57 <kusma> Sure you don't mean VK_FORMAT_FEATURE_BLIT_{SRC,DST}_BIT?

13:57 <kusma> alyssa:

13:58 <kusma> For instance, the compressed formats require VK_FORMAT_FEATURE_BLIT_SRC_BIT for optimalTiling if the feature-bit is exposed.

13:59 <kusma> And there's a bunch of other formats that require it. See https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#features-required-format-support

13:59 <alyssa> kusma: No, I mean TRANSFER_{SRC,DST}_BIT

13:59 <alyssa> vkCmdCopyImage and friends

14:00 <kusma> OK, I somehow missed that in the vk header :P

14:00 <alyssa> which has nightmarish interactions with AFBC

14:00 <dj-death> alyssa: how are you going to upload to optimal memory then?

14:00 <kusma> Yeah, that one is probably OK...

14:00 <alyssa> dj-death: er.. glTexSubImage2D....? o:)

14:01 <alyssa> in our GL driver it's handled internally as a blit from a linear staging buffer

14:02 <kusma> alyssa: TRANSFER_{SRC,DST}_BIT is what allows doing that, though...

14:02 <alyssa> hrm.

14:03 <alyssa> One real ugly case is CopyImage between AFBC textures of different formats

14:03 <kusma> "Formats that are required to support VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT must also support VK_FORMAT_FEATURE_TRANSFER_SRC_BIT and VK_FORMAT_FEATURE_TRANSFER_DST_BIT"

14:03 <alyssa> Nod..

14:06 jhuizy has joined #dri-devel

14:06 jhuizy has quit [Remote host closed the connection]

14:07 <alyssa> Hm.

14:09 <alyssa> Note to self: CRC is broken when interacting with imageStore()

14:13 blaudioslave has joined #dri-devel

14:13 blaudioslave has quit [autokilled: Suspected spammer. Mail support@oftc.net with questions (2021-06-08 14:13:41)]

14:13 shankaru1 has joined #dri-devel

14:34 hakzsam has quit [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]

14:34 hakzsam has joined #dri-devel

14:38 <robclark> alyssa: copyimage vs UBWC is same ball of wax

14:39 <alyssa> robclark: Yeah, I just saw turnip's solution

14:39 <alyssa> namely, handle all the easy cases and bail in the hard ones that nothing but the CTS uses.

14:39 <alyssa> encouraging panvk to go the same route, looks to avoid a metric crapton of complexity :)

14:40 <robclark> it is the reason I haven't gotten around to finishing that one last tiny extension for gles32

14:41 <alyssa> it appears the ball of wax isn't /so/ bad if you eat the extra staging blit in the awful case

14:49 <jekstrand> emersion: Got a link to a mesa branch for VK_EXT_physical_device_drm?

14:50 karolherbst has quit [Quit: Konversation terminated!]

14:50 karolherbst has joined #dri-devel

14:50 <dj-death> jekstrand: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/?scope=all&utf8=%E2%9C%93&state=opened&search=physical_device_drm :)

14:51 <jekstrand> dj-death: Thanks!

15:03 enunes- has left #dri-devel [#dri-devel]

15:06 Duke`` has joined #dri-devel

15:06 enunes has joined #dri-devel

15:16 mbrost has joined #dri-devel

15:22 lemonzest has quit [Quit: Quitting]

15:25 <bnieuwenhuizen> MrCooper: power usage between the GFX part of the GPU being on and off can be very significant

15:32 <danvet> MrCooper, kernel orders pageflip vs setcrtc on atomic drivers already

15:32 <danvet> they complete in the order you've done the ioctl calls

15:33 adjtm has joined #dri-devel

15:41 <mareko> MrCooper: battery life

15:43 <mareko> rotating via a blit is viable until it's not

15:58 SpiritOfSummer has joined #dri-devel

15:59 SpiritOfSummer has quit [autokilled: Suspected spammer. Mail support@oftc.net with questions (2021-06-08 15:59:08)]

16:03 frieder has quit [Remote host closed the connection]

16:07 xp4ns3 has joined #dri-devel

16:09 <MrCooper> danvet: I could swear I was able to reproduce the race even on Intel

16:09 <MrCooper> maybe I misremember

16:10 <MrCooper> bnieuwenhuizen mareko: OK, makes sense for fullscreen video where GFX can be off completely; "fullscreen apps/video" just sounded more general :)

16:10 <danvet> MrCooper, you mean 1. page_flip ioctl 2. setcrtc ioctl?

16:10 <MrCooper> yep

16:10 <danvet> and then the page flip completes before the setcrtc and we end up scanning out the wrong plane?

16:11 <danvet> that should be impossible with atomic

16:11 <MrCooper> or maybe it was 1. page flip 2. VT switch

16:11 <danvet> could very well have been busted on all legacy drivers

16:11 <danvet> but I thought we had various "stall for pending page flip" in our crtc disable hooks

16:11 <danvet> but then legacy helpers were funky

16:12 <danvet> MrCooper, should be the same, fbcon/next compositor just does a setcrtc

16:12 <MrCooper> right, that was my thinking

16:21 jernej has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]

16:22 tweaks has joined #dri-devel

16:23 tweaks has quit [autokilled: Suspected spammer. Mail support@oftc.net with questions (2021-06-08 16:23:19)]

16:26 <zmike> anholt: you have any more comments on !11134 or can it marge

16:27 <anholt> added a comment

16:29 jernej has joined #dri-devel

16:30 Ben64 has joined #dri-devel

16:31 Ben64 has quit [Remote host closed the connection]

16:41 Danct12 has quit [Quit: Quitting]

16:42 Danct12 has joined #dri-devel

16:42 idr_ is now known as idr

16:43 gouchi has joined #dri-devel

16:51 alanc has quit [Remote host closed the connection]

16:51 alanc has joined #dri-devel

16:56 <daniels> mareko: so KMS doesn't tell us whether or not a given modifier is usable together with rotation ... but it also doesn't tell us which dimensions/scaling/etc are suitable for rotation, or whether a given modifier might decrease global availability (Intel Y-tiling vs. FIFO capacity, Rockchip being able to decode AFBC on any plane but only one per CRTC, etc)

17:06 mceier has joined #dri-devel

17:08 jiggie has joined #dri-devel

17:08 jiggie has quit [Remote host closed the connection]

17:10 <cmarcelo> cwabbott: given the resolution from Memory Model WG in https://gitlab.freedesktop.org/mesa/mesa/-/issues/4475#note_935978 are you OK if I move forward with https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9938?

17:18 useretail has joined #dri-devel

17:18 useretail has quit [Remote host closed the connection]

17:36 jernej has quit [Ping timeout: 480 seconds]

17:38 Danct12 has quit [Remote host closed the connection]

17:40 Danct12 has joined #dri-devel

17:40 Danct12 has quit [Remote host closed the connection]

17:40 Danct12 has joined #dri-devel

17:43 khfeng has quit [Ping timeout: 480 seconds]

17:45 ngcortes has joined #dri-devel

17:49 tzimmermann has quit [Quit: Leaving]

18:10 jernej has joined #dri-devel

18:20 sigmaris_ has joined #dri-devel

18:25 sigmaris has quit [Ping timeout: 480 seconds]

18:44 shankaru1 has quit []

18:51 Danct12 has quit [Ping timeout: 480 seconds]

18:54 shankaru1 has joined #dri-devel

19:01 mlankhorst has quit [Ping timeout: 480 seconds]

19:06 andrey-konovalov has joined #dri-devel

19:22 jordiv[m] has joined #dri-devel

19:23 jordiv[m] has quit [Remote host closed the connection]

19:35 ngcortes has quit [Remote host closed the connection]

19:50 <bl4ckb0ne> can I have a review for https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10982 please

19:50 <bl4ckb0ne> (bumping vulkan xml to 1.2.179)

19:51 <airlied> isn't 1.2.180 out? :-P

19:52 <bl4ckb0ne> huh it is

19:53 <jekstrand> Yup. New spec releases every other Monday

19:53 <bl4ckb0ne> gonna make it 1.2.180 then

19:54 <jekstrand> While you're at it, please don't pull in vulkan.hpp

19:54 mbrost has quit [Remote host closed the connection]

19:54 <jekstrand> Or any SDK headers

19:55 <bl4ckb0ne> so only icd and core?

19:55 <jekstrand> I pull VulkanDocs, run "make" in the xml/ folder, and pull the headers it generates.

19:55 <jekstrand> Which should also be what's in https://github.com/KhronosGroup/Vulkan-Headers

19:56 <bl4ckb0ne> i think i pulled the headers from Vulkan-Headers for 179

19:57 <jekstrand> I don't know why the SDK header and hpp are in there

19:57 <jekstrand> They're not built by VulkanDocs

19:58 <jekstrand> And we don't need either of them

20:00 <bl4ckb0ne> probably a mistake on my part

20:00 <jekstrand> no worries

20:01 <jekstrand> I should really check my update_vulkan_headers script into the tree

20:05 <zmike> what if you just made a cron job to trigger an MR updating the headers every monday? 🤔

20:06 <alyssa> zmike: CI would fail every other monday, though.

20:16 <bl4ckb0ne> huh weird the khronos copyright went back to 2020

20:25 thellstrom has joined #dri-devel

20:26 <zmike> jekstrand: btw I ran that failure case you got today about a billion (imperial units) times and it doesn't seem to me that there's any possible way it crashes outside of some kind of spectacular system failure

20:27 <jekstrand> bl4ckb0ne: That's because there's something wrong with your header update

20:27 <bl4ckb0ne> yup

20:27 <jekstrand> bl4ckb0ne: Did you pull master or main?

20:27 <bl4ckb0ne> pulled master instead of main

20:27 <bl4ckb0ne> muscle memory

20:27 <jekstrand> That'll do it

20:30 libv_ is now known as libv

20:32 thellstrom1 has joined #dri-devel

20:32 thellstrom has quit [Remote host closed the connection]

20:34 iive has joined #dri-devel

20:36 Danct12 has joined #dri-devel

20:37 marex_ has joined #dri-devel

20:38 marex has quit [Read error: Connection reset by peer]

20:39 <bl4ckb0ne> waiting for CI to finish now

20:40 <jekstrand> bl4ckb0ne: Acked. Feel free to add my tag and marge. Looks good this time.

20:41 * jekstrand rebases VK_EXT_global_priority_query on top

20:41 <bl4ckb0ne> the `Part-of` is added automatically right?

20:41 <jekstrand> yup

20:41 <jekstrand> Marge does that

20:42 <jekstrand> Also, you don't need to wait for CI. Marge will rebase, add the Part-of, run CI and then merge.

20:42 <jekstrand> If CI fails, Marge won't merged.

20:42 <jekstrand> *merge

20:42 <bl4ckb0ne> ill let you assign it to marge, I don't have the rights

20:42 <jekstrand> Ok

20:42 <jekstrand> Let me know when you've re-pushed with my A-B tag

20:42 <bl4ckb0ne> thanks

20:42 <bl4ckb0ne> already did

20:43 <jekstrand> cool

20:44 <jekstrand> bl4ckb0ne: One more problem: XML is still at 179.

20:44 <bl4ckb0ne> oh it is

20:46 <bl4ckb0ne> gitlab was hiding the xml diff

20:46 <bl4ckb0ne> updated it

20:47 <jekstrand> k

20:48 <jekstrand> assigned marge

20:48 <bl4ckb0ne> thanks!

20:49 <jekstrand> bl4ckb0ne: yw.

20:49 <jekstrand> bl4ckb0ne: Out of curiosity, any particular reason why you want 179+?

20:49 <bl4ckb0ne> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11014

20:50 <bl4ckb0ne> i just missed my shot, the ext was approved earlier today

20:54 <jekstrand> ah

20:55 <bl4ckb0ne> ill put a reminder to bump next monday if my ext is there

20:55 <bl4ckb0ne> hopefully with the right xml the first time

20:56 <jekstrand> Well, you reminded me to post https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11250 which I'd forgotten about. :D

20:57 <jekstrand> That one got released in 180

20:57 <jekstrand> I think there's a RADV one somewhere as well

20:57 rasterman has quit []

20:57 shankaru1 has quit [Remote host closed the connection]

20:58 shankaru1 has joined #dri-devel

21:00 ngcortes has joined #dri-devel

21:09 torv27 has joined #dri-devel

21:09 shankaru1 has quit []

21:09 torv27 has quit [Remote host closed the connection]

21:15 thellstrom1 has quit []

21:19 marex_ is now known as marex

21:21 Daanct12 has joined #dri-devel

21:22 ehermes has joined #dri-devel

21:23 ehermes has quit [Remote host closed the connection]

21:27 Danct12 has quit [Ping timeout: 480 seconds]

21:29 jewins has joined #dri-devel

21:32 <bnieuwenhuizen> jekstrand: we have two even :) https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11215 and https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11216

21:33 <bnieuwenhuizen> obviously both based on a different interpretation of the spec :P

21:37 <jekstrand> bnieuwenhuizen: Do they both pass CTS?

21:37 <jekstrand> bnieuwenhuizen: Feel free to tell me my interpretation is wrong. :)

21:40 gouchi has quit [Remote host closed the connection]

21:47 <bnieuwenhuizen> jekstrand: haven't explicitly tested but I believe they both would

21:48 <bnieuwenhuizen> disagreement is whether the driver should also test permissions best effort or if it is really just a "yup these priorities are different"

21:49 <anholt> idr: any chance you would have some bandwidth for figuring out why x86 isn't getting the piglit dmabuf tests built? (x86: https://gitlab.freedesktop.org/anholt/mesa/-/jobs/10605481/raw, arm64 https://gitlab.freedesktop.org/anholt/mesa/-/jobs/10605482/raw)

21:49 <anholt> would love to get yuv import covered on iris with !11193

21:52 mbrost has joined #dri-devel

22:01 <jekstrand> bnieuwenhuizen: The intention of the spec is that the client shouldn't even try any priorities that aren't retrieved from the query.

22:01 mbrost has quit [Remote host closed the connection]

22:03 Plagman has quit []

22:03 Plagman has joined #dri-devel

22:03 <jekstrand> It may still get INITIALIZATION_FAILED but only due to a genuine permissions issue and not a "We don't support that priority ever" issue.

22:05 mattrope has joined #dri-devel

22:11 <jekstrand> bnieuwenhuizen: There's a part of me that's inclined to do the most useless implementation possible. Just return all the priorities.

22:12 <jekstrand> bnieuwenhuizen: But my current implementation does the useful thing since i915 currently doesn't provide any way besides boosting to get a higher priority on a render node after the fact.

22:14 choozy26 has joined #dri-devel

22:14 choozy26 has quit [Remote host closed the connection]

22:16 Duke`` has quit [Ping timeout: 480 seconds]

22:19 <bnieuwenhuizen> jekstrand: yes, so the two competing implementations we have for RADV are the dumb one and the useful one :P

22:20 <jekstrand> bnieuwenhuizen: Which do you like better?

22:20 <bnieuwenhuizen> I'm like meh

22:20 <bnieuwenhuizen> I confirmed the one user is ok with both

22:21 <bnieuwenhuizen> so half inclined to go dumb just because it is less code

22:21 <jekstrand> That and it forces people to think about what they're doing and not make assumptions.

22:21 <jekstrand> Assuming, of course, that those people care about Linux.

22:21 <jekstrand> But since "those people" are the Android core team.... I'd like to think they do. :)

22:23 Daanct12 has quit [Quit: Quitting]

22:25 ngcortes has quit [Remote host closed the connection]

22:26 karolherbst has quit [Quit: Konversation terminated!]

22:26 karolherbst has joined #dri-devel

22:28 * dcbaker cries in we have `Test` in `interpreter.py` and `testSerialisation` in `backend.py`

22:32 aswar002 has joined #dri-devel

22:38 <jekstrand> anholt: Is gitlab CI running any Intel Vulkan?

22:38 <jekstrand> anholt: I pushed a Vulkan-only patch and it failed CI:

22:38 <jekstrand> https://gitlab.freedesktop.org/mesa/mesa/-/jobs/10612897

22:43 pcercuei has quit [Quit: dodo]

22:44 <airlied> jekstrand: smash the retry button

22:45 * jekstrand is getting annoyed with flaky CI

22:47 <airlied> join the queue :-P

22:47 <airlied> I think alyssa and zmike lead the conga line

22:48 <zmike> no, no, I stopped being annoyed with it long ago

22:48 <jekstrand> zmike: Did you just stop merging patches?

22:48 <zmike> yea kinda

22:48 <zmike> I try every day or two

22:48 <zmike> sometimes less

22:48 <alyssa> airlied: Merging hundreds of patches a month does that, yes.

22:51 <daniels> jekstrand: the machine tanked at some point during a dEQP run

22:53 ngcortes has joined #dri-devel

22:53 <alyssa> jekstrand: Is that an option? Should I try that?

22:54 <daniels> jekstrand: it's failed 2 jobs of the last 200, so I would guess that there's some instability in mainline (or hardware e.g. battery dipping below critical point) that makes it disappear

22:54 <daniels> airlied: either be annoyed with it silently or please at least help out with #3437 (if not the actual issues themselves) so we can all get it stable

22:55 <jekstrand> and... now it's failing name resolution

22:55 <daniels> jekstrand: that's been noted for fix

22:55 <daniels> every single place which touches network has retry for this reason, apart from piglit's internal pull and ci-fairy's result upload

22:55 <daniels> both are going to be fixed in short order

22:58 <jekstrand> daniels: Given that every single item in #3437 is currently checked off except the weird glcpp one no one understands, "help with #3437" isn't a very useful response.

22:59 <zmike> I think he meant in reporting issues

22:59 <daniels> jekstrand: I mean at least dump it into the comments; the summary is laughably out of date for sure, but that's because of the tragedy of the commons, which will not be solved by more tragedy of the commons?

23:00 <zmike> though my proposal to make a new 3437 still stands, as gitlab issues don't seem to have been designed towards having that many comments and it takes like a full minute for that page to load

23:01 <daniels> there's no comment for piglit/ci-fairy not doing retry on network fail, but that has been noted and put on the list for the CI fairies to deal with this week

23:01 <airlied> daniels: hey I think my employer is providing enough support for me to complain out lod

23:02 <airlied> loud

23:02 <daniels> zmike: hmm, 1min is really pathological - it's definitely not quick for me by now, but far from 1min

23:02 <daniels> airlied: *to your employer :P

23:02 <zmike> shrug

23:03 <alyssa> 1min sounds right

23:03 <airlied> I expect when CI reaches some form of stability we'll just add a bunch more hw and destabilise it again

23:03 <daniels> I don't mind a new #3437, or trying to see if clicking through resolving all the issues makes it pull quicker on the frontend

23:03 <alyssa> airlied: this

23:03 <airlied> maybe we should find a line where we say enough

23:03 <alyssa> and .. more APIs

23:03 <daniels> airlied: so far that's been the sine wave, yeah

23:03 <daniels> if you want to say enough, subscribe to the CI label for MRs and start drawing the lines

23:03 <alyssa> airlied: ~~and more drivers for the same hw/API~~ oh wait you're crocus COI whoops

23:04 <jekstrand> alyssa: :P

23:04 <alyssa> oh and more test suites for the same system (deqp vs piglit vs khronos cts)

23:04 <alyssa> "hardware x APIs x test suites" has really awful combinatorics.

23:05 <airlied> yeah the 10m holy grail seems to have been dropped also

23:06 <airlied> it's more like an hour if CI doesn't timeout somewhere

23:06 <airlied> ah well back to writing more code to make it take longer :-P

23:06 <daniels> way untrue.

23:07 <robclark> alyssa: tbf the whole "spend a day or two bisecting all the new deqp regressions after you've been away for a week or two" is not the lesser of two evils here ;-)

23:07 <alyssa> robclark: that's /still/ a risk, unfortunately..

23:07 <jekstrand> When CI is working, it takes 10m or so and you get all the tests on all the hardware. It's great.

23:07 <jekstrand> It's not 1hr

23:07 <alyssa> Putting aside any details about this CI,

23:07 <alyssa> "dEQP-GLES2, dEQP-GLES3, dEQP-GLES31, KHR-GLES2, KHR-GLES3, KHR-GLES31, piglit quickgl, piglit cl, dEQP-VK" x "Mali G52, Mali G72, Mali T860, Mali T760"

23:07 <airlied> I suppose it's probably an hour waiting for marge to timeout on someone else's run

23:08 <jekstrand> But when you have machines which are falling off their network or drivers with flaky tests, less great.

23:08 <alyssa> is just the hardware I have to personally care about

23:08 <alyssa> that's really hard!
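
The "hardware x APIs x test suites" product alyssa lists can be counted directly. The sketch below uses only the suite and board names from her message and treats every pairing as a job, which is an assumption: in practice not every suite necessarily runs on every board, so the count is an upper bound.

```python
from itertools import product

# Suite and board names as listed in the message above.
suites = [
    "dEQP-GLES2", "dEQP-GLES3", "dEQP-GLES31",
    "KHR-GLES2", "KHR-GLES3", "KHR-GLES31",
    "piglit quickgl", "piglit cl", "dEQP-VK",
]
boards = ["Mali G52", "Mali G72", "Mali T860", "Mali T760"]

# Every (suite, board) pairing is a potential CI job.
jobs = list(product(suites, boards))
print(len(jobs))  # 9 suites x 4 boards = 36
```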

23:08 <daniels> airlied: an hour being the normal is complete bollocks

23:09 <airlied> daniels: I don't think there is a normal, but 10m runs are rare

23:09 <airlied> esp for anything that hits the hw

23:09 <airlied> but mostly I suppose I've been stuck in marge-bot queues with problems

23:09 <daniels> sure, that's because 10min has crept out to about 15min (which probably needs to be rebalanced), and 10min was never the goal for you assign to marge -> it is merged; it was the goal for the long pole of the last stage of hw testing, so +5min for your actual build

23:11 <daniels> 3 weeks of the last 5 have been catastrophically bad, due to a530 test badness + rpi hw badness + fdo storage issues + NM/PipeWire suddenly getting very enthusiastic about very long runtimes, but even with all those, it's still nowhere _near_ an hour on average, even at the very worst peak around European lunchtime

23:11 <jekstrand> Part of the problem is, like politics, IT, and many other things, we only really know it exists when it's failing. The rest of the time, we assign Marge and forget about it. It's hard to get an actual perspective on it thanks to human psychology.

23:12 iive has quit []

23:12 <airlied> jekstrand: indeed, as in when I check back 30mins later and it's all merged I forget about it

23:12 <airlied> when I check back 30m later and it's behind marge saying CI took too long I register it

23:13 <airlied> maybe we just need hw to spend longer in staging areas before going live or be quicker to yank it out completely if it starts flaking

23:13 <daniels> jekstrand: luckily it is really easy to get an actual perspective on it by using the various scripts and snippets people have posted to query the pipelines and make your own analyses on fail rates (doing manual filters for when it catches legit fails, which is way more than you think) or mean/median/mode end-to-end time, or ...

23:13 <daniels> airlied: we do both

23:14 <zmike> here's a random q: would it be possible to set like a 30min total time threshold on a job and kill it if it exceeds that?

23:14 <daniels> new hw lives in manual runs for a bit, and it also gets smashed to bits by manually-triggered runs, and then the people responsible for it sit on the results and look for patterns

23:14 <zmike> I think that'd at least cut down congestion for flake jobs

23:14 <daniels> it also gets yanked by people being yelled at about failures, mostly automatically but

23:14 <daniels> zmike: right now it's 60m, I think 30m is p reasonable for a per-job timeout

23:15 <daniels> would A-b a MR to tune it down

23:15 <zmike> tbh I feel like 20 min should be reasonable, but idk what actual total times are like

23:15 <daniels> well, it's a balancing act

23:15 <daniels> when you have 10-12min runtimes, 20min runtimes puts you at the risk for spurious failure when you kill it at 99% because it's been griefed by load

23:16 <daniels> then someone has to reassign it to marge and this is further proof CI is awful :P

23:16 <zmike> yea that's why I figured 30 should be safe

23:16 <daniels> seems p reasonable

23:17 <daniels> 60 covered a multitude of sins to begin with, but we're getting aggressive enough with the long tail of fail that 30 should work

23:19 mattrope has quit [Remote host closed the connection]

23:21 <daniels> anyway, you're all Mesa developers, if you want something changed in Mesa then you're just as free to float MRs to change it as anyone else ...

23:21 <zmike> DONE

23:22 <alyssa> zmike: reviewed.

23:23 <zmike> oof and ci exploded already

23:23 <alyssa> How does the kernel handle developer volume vs regressions?

23:24 <daniels> it doesn't

23:24 <robclark> badly?

23:24 <alyssa> right... my audio drivers can attest to that :(

23:24 <daniels> y

23:24 <alyssa> speaking of is rk3399-gru-sound still broken? yes, very much so.

23:25 <bl4ckb0ne> jekstrand: pipeline failed on gles3 for vulkan 1.2.180 https://gitlab.freedesktop.org/mesa/mesa/-/jobs/10613787

23:25 <alyssa> bl4ckb0ne: join the conga line

23:26 <jekstrand> bl4ckb0ne: I know. I re-assigned

23:26 neonking has quit [Remote host closed the connection]

23:27 <airlied> alyssa: the kernel just YOLOs

23:27 <daniels> merged on a whim / no regressions / no central dictation of development priority

23:27 <daniels> pick any two

23:28 <airlied> I pick pikachu and squirtle

23:29 <alyssa> pika pika!

23:29 <bl4ckb0ne> thanks

23:29 <daniels> airlied: glx@glx-wait-msc,Timeout

23:29 mattrope has joined #dri-devel

23:29 <robclark> it could perhaps be useful to have a step between manual pipelines and must-pass-to-merge.. ie. for new or less stable hw, etc.. run the CI job but don't block the CI.. that would at least give some more testing, and a chance for someone to look at the result and decide "yeah, you actually broke that hw, I'm reverting your MR"

23:31 <jekstrand> If machines are falling off the network, I'm not sure that helps. If we have flaky drivers, then, yeah, they shouldn't be in must-pass CI.

23:31 <robclark> well, pass or fail is sum total of the flakes, but yeah if test infra issues then that isn't a justification to revert an MR

23:32 <daniels> jekstrand: in the gap between those two absolutes is the internet

23:32 <daniels> jekstrand: unclear whether it's USB ethernet badness, or cable badness, or switch badness, or just the internet being painful; my guess is on both #1 and #4

23:32 <jekstrand> daniels: Yeah, having the entire internet in the middle doesn't help.

23:33 <airlied> but also usb ethernet

23:33 <daniels> airlied: Chromebooks

23:33 <daniels> would you prefer wifi? :P

23:33 <bnieuwenhuizen> ethernet over serial :P

23:33 <robclark> we can use LTE now :-P

23:33 <daniels> ...

23:33 <jekstrand> But given that I've seen 3 "fall off the network" fails from the same class of machine in an hour likely means it's not just the internet, generally.

23:33 <daniels> https://www.google.com/query?q=fisherman+site:linkedin.com

23:33 <alyssa> If a machine doesn't get to userland (network failure, boot failure, ....), methinks it should be skipped instead of blocking CI

23:33 <jekstrand> robclark: Don't use the 5G. Didn't you hear? It gives you coronavirus.

23:33 <robclark> :-P

23:33 <daniels> alyssa: it _does_ get to userland

23:34 <alyssa> jekstrand: If you're double vaccinated you can use the 5G, the CDC said so.

23:34 <airlied> daniels: it would be lols if wifi was more stable than the usb ethernet

23:34 <daniels> alyssa: it fails long after it's got an IP through DHCP, pulled the rootfs, etc etc; at some much later stage, a random request fails

23:34 <daniels> alyssa: it already does get silently retried behind the scenes if it fails to make it as far as executing tests

23:34 <alyssa> daniels: Sure, let me amend that -- the /only/ reason pre-merge should fail is if a job actually reported that there is a test regression.

23:35 <daniels> airlied: you've used Intel wifi, right?

23:35 * airlied is doing crocus testing over wifi because I don't have enough ethernet cables :-P

23:35 <daniels> alyssa: I'm on the fence about that

23:35 <alyssa> Arguably CI reports should really be a tristate, "definitely passes", "definitely fails", and "inconclusive"

23:35 <daniels> sure

23:35 <daniels> but what do you do with inconclusive?

23:35 <alyssa> Right now we're mapping inconclusive to fail and it's causing burnout.
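
The tristate idea can be made concrete with a small sketch. Everything here is hypothetical: the `Verdict` enum, the `classify` inputs, and the `blocks_merge` policy switch are illustration only and do not describe how Mesa CI or marge-bot actually work. The open question in the discussion, what to do with "inconclusive", is left as a parameter.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "definitely passes"
    FAIL = "definitely fails"
    INCONCLUSIVE = "inconclusive"


def classify(completed, regressions):
    """Map one job's outcome to a verdict (hypothetical inputs).

    completed:   True if the job ran its tests to the end.
    regressions: number of test regressions the job reported.
    """
    if regressions > 0:
        return Verdict.FAIL          # a reported regression is conclusive
    if not completed:
        return Verdict.INCONCLUSIVE  # e.g. the machine fell off the network
    return Verdict.PASS


def blocks_merge(verdicts, inconclusive_blocks=True):
    """Decide whether a pipeline blocks the merge.

    inconclusive_blocks=True models mapping inconclusive to fail;
    False models mapping it to success.
    """
    for v in verdicts:
        if v is Verdict.FAIL:
            return True
        if v is Verdict.INCONCLUSIVE and inconclusive_blocks:
            return True
    return False
```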

23:35 <jekstrand> daniels: Rather wifi? That depends on the USB ethernet. If it's the one in my USB-C dock, it routinely falls off the bus if I try to do something heavy like, say, rsync a kernel build.

23:36 <airlied> you map inconclusive to reassign to marge-bot :-P

23:36 <airlied> and watch it loop until someone notices

23:36 <alyssa> I would say report a CI warning in gitlab, but marge bot means that nobody actually looks at CI results unless the job fails.

23:36 <jekstrand> airlied: amazon sells them in 10-packs: https://www.amazon.com/Cable-Matters-10-Pack-Snagless-Ethernet/dp/B00K2E4QZE/ref=sr_1_5?dchild=1&keywords=3ft+ethernet+cables&qid=1623195389&sr=8-5

23:37 <daniels> airlied: or, instead of wasting rebuild cycles and everyone's time, you keep on working your way through the causes of spurious fails like people have been, and you do things like insert retries into network ops

23:37 <daniels> alyssa: I don't see how mapping inconclusive to success is any less frustrating

23:37 <daniels> alyssa: as you say, no-one will ever look at or care about anything unless it's visibly in their face

23:37 <jekstrand> airlied: I might have bought that once or twice and have a 16-port switch on my desk.....

23:38 <alyssa> daniels: It's visibly in the /wrong/ person's face

23:38 <airlied> jekstrand: my 16 port switch needs more ports, though I suppose I could put crocus machines on a 100mb 8-port

23:38 <alyssa> The intersection of "understands arcane NIR details" and "understands arcane LAVA details" is... er, Emma

23:38 <daniels> alyssa: so Mali boards flake 30% of the time and no-one really cares, then jekstrand marges a core NIR change which genuinely breaks Panfrost, it flakes its way through to happy inconclusive success, and the next time you try to merge a non-functional-change Panfrost change, it gets rejected because surprise, NIR is broken

23:39 <daniels> (smash retry until it passes)

23:39 <jekstrand> airlied: I'm pretty sure Amazon can fix that one for you too. :)

23:39 <alyssa> So why when there's an infrastructure problem triggered in a random nir MR is the person getting hurt by that the NIR author, not the LAVA one?

23:39 * jekstrand would never merge a change which breaks panfrost. All my patches are perfect!

23:39 <alyssa> jekstrand: luv you

23:39 <airlied> jekstrand: I'm trying to get it to fix my no cherryview problem first :-P

23:39 * jekstrand waits for craftyguy to show up and murder him.

23:40 <alyssa> daniels: Are there any Panfrost driver flakes extant?

23:40 <daniels> alyssa: if there are spurious fails, someone whose fault it isn't gets hurt

23:40 anarsoul has quit [Ping timeout: 480 seconds]

23:41 <alyssa> or @ anyone, if you see Panfrost flake (and not the LAVA farm infrastructure issues, genuine spurious fail in the dEQP report), please tell me so I can deal with it, and I'm sorry in advance if this happens, that's on me.

23:41 <daniels> either that's the immediately proximate person (who can retry for a less-magic-8-ball result), or it's some distant person in the future who has less chance of obtaining an actual answer

23:41 mcan06[m|gr] has joined #dri-devel

23:41 <daniels> alyssa: I mean likewise, please do be telling the infra people if there are infra failures so we can deal with it, and we're sorry in advance if that happens, that's on us :)

23:42 mcan06[m|gr] has quit [Remote host closed the connection]

23:42 <jekstrand> Well, most of the fails I'm seeing today are APLs falling off the internet.

23:42 <jekstrand> Which I guess counts as infra

23:42 <jekstrand> Not sure whose infra or what the infra problem is.

23:43 <jekstrand> For all I know, someone's cat has been chewing on the USB adapter

23:43 <alyssa> Meow.

23:43 <daniels> jekstrand: ours, undefined network, if it's showing up frequently enough to be a blocking issue for you then please assign Marge an MR which disables those jobs with my R-b

23:43 <kisak> I've had more internet failures from rabbits eating fiber optic line than anything else

23:44 <daniels> and we'll bring it back when we're confident that we've bottomed out whatever issue it has been

23:44 <alyssa> !11246 flaked for the 4th time today.

23:44 <alyssa> this time it's iris-apl-egl

23:44 <alyssa> disabling that job brb

23:45 * jekstrand is trying to figure out how to disable jobs

23:45 <alyssa> Prepend the job name with a dot

23:46 <daniels> ^

23:47 <alyssa> MR submitted.

23:47 <daniels> the fact that 100% of the APL failures have been network errors at the very end of the job, and that it's isolated to APL rather than any of the machines in the same rack/room/building, makes me think that there's some kind of USB autosuspend badness going on

23:47 <jekstrand> makes sense

23:47 <jekstrand> alyssa: MR?

23:47 <alyssa> !11255

23:48 <alyssa> I personally saw -egl fail but dropped all the APL jobs for good measure if it's infra

23:48 <jekstrand> I've seen -gles3 fail too

23:48 <daniels> if it's infra, it's not going to be API-sensitive

23:48 <jekstrand> yup

23:49 <alyssa> daniels: I guess where we're coming from is just the combinatorics. If a given machine fails once a year, but it takes a day to deal with in total, given we have hundreds of machines that means CI is broken almost always.

23:49 <alyssa> (Hundreds? I've never counted but I imagine across all the different labs and gitlab runners it adds up.)

23:50 <alyssa> (Maybe 100? Numbers are hard. The point still stands.)
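
The arithmetic behind that estimate can be worked through under stated assumptions: each machine is independently out of action for one day in a 365-day year, and CI counts as broken on any day when at least one machine is out. These assumptions and the function name are illustrative only, not measured Mesa CI numbers.

```python
def fraction_of_days_broken(machines, down_days_per_year=1.0):
    """Chance that at least one machine is down on a given day.

    Assumes machines fail independently and each one is down for
    down_days_per_year days out of 365 (illustrative model only).
    """
    p_up = 1.0 - down_days_per_year / 365.0
    return 1.0 - p_up ** machines


for n in (10, 100, 300):
    print(n, round(fraction_of_days_broken(n), 3))
```

Under this model 100 machines leave CI affected on roughly a quarter of days, and the fraction keeps climbing as machines are added, so the result depends heavily on the real machine count and per-machine failure rate.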

23:50 <daniels> again 'almost always' is a million miles away from the actual numbers

23:50 <alyssa> So we're very weary of the strategy of just fixing more problems because the root cause isn't a particular network failure, it's that at the scale we do CI there will /always/ be problems

23:50 <daniels> I know

23:51 <alyssa> Would it help if I keep a log from a dev point of view of MRs I merge and the outcomes?

23:51 <jekstrand> I think marge already keeps that log for us

23:51 <daniels> yep, and you can also pick up the scripts already posted to do some graphing and analysis if you want to get fancy with it

23:52 <daniels> (as long as you have a manual filter for jobs which are legit fails)

23:52 <alyssa> that's the rub..

23:52 anarsoul has joined #dri-devel

23:52 <alyssa> Looking at my recent merged MRs labeled with NIR --

23:52 <alyssa> !11199 went in one try

23:53 anarsoul has quit [Remote host closed the connection]

23:53 <alyssa> !10411 2 tries, dEQP-GLES31 flake on Panfrost which tomeu/bbrezillon found, related to indirect draw stuff

23:53 <alyssa> !10022 one try

23:53 anarsoul has joined #dri-devel

23:53 <alyssa> !10601 2 tries but not obvious if that was flake or fail

23:54 <alyssa> !10578 2 fails

23:54 <alyssa> er

23:54 <alyssa> er 1 try, sorry mixed up

23:54 <alyssa> !10391 1 try.

23:54 <alyssa> ok, so that's nearly as bad as it feels.

23:54 <alyssa> although...

23:55 andrey-konovalov has quit [Ping timeout: 480 seconds]

23:55 aswar002 has quit [Ping timeout: 480 seconds]

23:55 <alyssa> !10022 was over an hour in the queue before marge pushed any commits

23:56 <alyssa> !10601 was about 45 minutes in the queue before getting a commit pushed for one of the two tries

23:58 <robclark> there are defn times when lot of MRs are submitted in short order, and those 10-15min's add up

23:59 <alyssa> yeah.