#dri-devel on 2023-03-27 — irc logs at oftc.irclog.whitequark.org

2022-12-21 00:45 ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar

00:00 cphealy has quit []

00:14 columbarius has joined #dri-devel

00:16 co1umbarius has quit [Ping timeout: 480 seconds]

00:49 dsrt^ has joined #dri-devel

01:30 Danct12 is now known as Guest8981

01:30 Danct12 has joined #dri-devel

01:36 yuq825 has joined #dri-devel

01:46 camus has joined #dri-devel

02:00 thenemesis has joined #dri-devel

02:01 paulk-bis has joined #dri-devel

02:01 thenemesis has quit []

02:02 paulk has quit [Ping timeout: 480 seconds]

02:38 paulk-ter has joined #dri-devel

02:40 paulk-bis has quit [Ping timeout: 480 seconds]

02:56 shoragan has quit [Quit: quit]

02:58 shoragan has joined #dri-devel

03:01 rcf has quit [Quit: WeeChat 3.8]

03:02 rcf has joined #dri-devel

03:02 thenemesis has joined #dri-devel

03:03 thenemesis has quit []

03:05 khfeng has quit [Remote host closed the connection]

03:06 khfeng has joined #dri-devel

03:16 <mareko> fma is also ballooning register pressure over mul+add, many things do

03:21 <alyssa> yes. and?

03:40 camus1 has joined #dri-devel

03:44 camus has quit [Ping timeout: 480 seconds]

03:56 bmodem has joined #dri-devel

04:02 Zopolis4 has joined #dri-devel

04:45 aravind has joined #dri-devel

04:51 robobub has quit []

04:55 heat_ has quit [Ping timeout: 480 seconds]

05:09 jaganteki has quit [Remote host closed the connection]

05:09 bgs has joined #dri-devel

05:13 godvino has joined #dri-devel

05:19 Guest8981 has quit [Remote host closed the connection]

05:20 Daanct12 has joined #dri-devel

05:23 Daanct12 has quit [Remote host closed the connection]

05:23 Daanct12 has joined #dri-devel

05:25 kzd has quit [Ping timeout: 480 seconds]

05:25 Daanct12 has quit [Remote host closed the connection]

05:26 Daanct12 has joined #dri-devel

05:27 Daanct12 has quit [Remote host closed the connection]

05:28 Danct12 has quit [Remote host closed the connection]

05:28 Danct12 has joined #dri-devel

05:28 tzimmermann has joined #dri-devel

05:30 Daanct12 has joined #dri-devel

05:48 apinheiro has quit [Remote host closed the connection]

05:48 fab has joined #dri-devel

05:52 epoll has quit [Ping timeout: 480 seconds]

05:55 Company has quit [Quit: Leaving]

06:00 godvino has quit [Quit: WeeChat 3.6]

06:01 epoll has joined #dri-devel

06:14 danvet has joined #dri-devel

06:42 Zopolis4 has quit []

06:42 fab has quit [Quit: fab]

06:44 bgs has quit [Remote host closed the connection]

06:52 frieder has joined #dri-devel

07:02 mvlad has joined #dri-devel

07:02 sghuge has joined #dri-devel

07:16 jhli has quit [Remote host closed the connection]

07:17 fab has joined #dri-devel

07:22 kts has joined #dri-devel

07:24 tursulin has joined #dri-devel

07:28 fab has quit [Quit: fab]

07:29 fab has joined #dri-devel

07:32 jfalempe has joined #dri-devel

07:33 alanc has quit [Remote host closed the connection]

07:34 alanc has joined #dri-devel

07:37 <danvet> robclark, no ack for the syncobj uapi from gfxstrand or bnieuwenhuizen_ ?

07:43 pcercuei has joined #dri-devel

07:45 <HdkR> danvet: Is this a uapi change? Where can I look at it?

07:47 <danvet> HdkR, https://lore.kernel.org/dri-devel/20230308155322.344664-10-robdclark@gmail.com/

07:47 <danvet> some ack from vk people would be good for this I think

07:52 vliaskov has joined #dri-devel

07:55 <HdkR> Alright cool, looks like the msm change will just work since I was already passing through padding, syncobj_wait is also a passthrough with correct arrangement, and...I'm not even watching the sync_file uapi so if it breaks, wheh

07:56 lynxeye has joined #dri-devel

07:58 apinheiro has joined #dri-devel

07:59 ice99 has joined #dri-devel

07:59 Zopolis4 has joined #dri-devel

08:09 pochu has joined #dri-devel

08:14 tzimmermann has quit [Remote host closed the connection]

08:15 <HdkR> I should probably one day take a scroll through ioctls that are registering compat handlers

08:17 fab_ has joined #dri-devel

08:18 fab_ is now known as Guest9006

08:18 Haaninjo has joined #dri-devel

08:22 <danvet> HdkR, anything new that requires compat handler is kinda a misdesign, we really shouldn't need these anymore

08:23 <danvet> unfortunately android having been 32bit only for a while butchered a few :-/

08:23 <danvet> but in drm I don't think we've screwed up in years

08:23 fab has quit [Ping timeout: 480 seconds]

08:23 <HdkR> There's been a decent number of mess ups with tail padding on the structs

08:23 <HdkR> Recent too

08:23 <danvet> HdkR, care to supply an ack on-list?

08:24 <danvet> HdkR, hm for arrays and stuff?

08:24 <danvet> drm_ioctl() auto-corrects padding both ways, so unless you don't memset the entire struct or something it should work out

08:24 <HdkR> nah, just ending the struct with a uint32_t and the tail alignment ends with 32-bit or 64-bit aligned depending on bitness

08:25 <HdkR> Is the most common thing I've noticed

08:25 <danvet> yeah I think we screw that up sometimes, but it also shouldn't hurt (unless it's an array)

08:25 <HdkR> Changes the ioctl number which flags my CI :P
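[Editor's sketch] A minimal C illustration of the tail-padding issue described above, assuming a hypothetical uapi struct (`example_args`) and GCC/Clang attribute syntax. The `_i386` variants force 4-byte alignment on the 64-bit member to mimic the 32-bit x86 ABI. `encode_ioctl` is a hand-written copy of the generic Linux `_IOC` bit layout, not a kernel header macro, and the type and number values used with it are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical uapi struct: a 64-bit member followed by a trailing 32-bit member.
 * On common 64-bit ABIs the compiler adds 4 bytes of tail padding (size 16). */
struct example_args {
    uint64_t handle;
    uint32_t flags;
};

/* Same members, but the 64-bit field only needs 4-byte alignment, as on i386.
 * No tail padding is added, so the size is 12. */
struct example_args_i386 {
    uint64_t handle __attribute__((packed, aligned(4)));
    uint32_t flags;
};

/* Explicit tail padding makes the size 16 under both layouts. */
struct example_args_fixed_i386 {
    uint64_t handle __attribute__((packed, aligned(4)));
    uint32_t flags;
    uint32_t pad;
};

/* Generic _IOC layout, written out by hand:
 * dir[31:30] size[29:16] type[15:8] nr[7:0]. */
static uint32_t encode_ioctl(uint32_t dir, uint32_t type, uint32_t nr, size_t size)
{
    return (dir << 30) | ((uint32_t)size << 16) | (type << 8) | nr;
}
```

Because the struct size is part of the encoded number, a 12-byte and a 16-byte view of the same struct yield two different ioctl numbers, which is the kind of change a CI check can flag.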

08:26 <danvet> oh for seccomp filtering and stuff like that?

08:27 <HdkR> I didn't even think about seccomp filtering. That...might be interesting

08:27 <danvet> sometimes we need to patch it up like DMA_BUF_SET_NAME_A/B

08:27 <HdkR> But since FEX is translating 32-bit ioctls to 64-bit ioctls, any change in ioctl size is a flag for bad smell

08:27 <danvet> but drm_ioctl handling padding either way absorbs these fully

08:28 Danct12 has quit [Quit: WeeChat 3.8]

08:28 tzimmermann has joined #dri-devel

08:28 <danvet> HdkR, ah yeah if you write your own compat code then this sucks :-(

08:28 <HdkR> Indeed!

08:29 <danvet> just wondering whether a makefile check for uapi header structs would be doable

08:29 <danvet> would need a ton of annotations for all the current ones that are busted though

08:30 <HdkR> I have a ton of annotations specifically because of this. Mostly focused around drm, and I'm /definitely/ missing some

08:30 Ahuj has joined #dri-devel

08:31 <HdkR> Then a clang script that compares member alignment, size, offset, etc between architectures
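[Editor's sketch] A rough idea of what comparing size, alignment, and member offsets between architectures could look like, written in plain C. This is not the FEX tool; the struct names, the `LAYOUT_OF` macro, and `layouts_match` are all hypothetical, and the i386 view is emulated with GCC/Clang attributes rather than produced by a real 32-bit build.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct as a 64-bit build sees it. */
struct sample64 {
    uint32_t id;
    uint64_t addr;
};

/* Emulated i386 view: the 64-bit member is only 4-byte aligned. */
struct sample32 {
    uint32_t id;
    uint64_t addr __attribute__((packed, aligned(4)));
};

/* The properties being compared for each struct. */
struct layout {
    size_t size;
    size_t align;
    size_t off_id;
    size_t off_addr;
};

#define LAYOUT_OF(T) \
    ((struct layout){ sizeof(T), _Alignof(T), offsetof(T, id), offsetof(T, addr) })

/* True only if size, alignment, and every member offset agree. */
static bool layouts_match(struct layout a, struct layout b)
{
    return a.size == b.size && a.align == b.align &&
           a.off_id == b.off_id && a.off_addr == b.off_addr;
}
```

Calling `layouts_match(LAYOUT_OF(struct sample64), LAYOUT_OF(struct sample32))` on a typical 64-bit host would report a mismatch, since `addr` moves from offset 8 to offset 4.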

08:31 rasterman has joined #dri-devel

08:32 <danvet> HdkR, maybe we want gitlab first to make sure it's actually used, but this sounds like something that would be really useful in upstream

08:32 <danvet> maybe opt-in for drm only at first

08:33 <HdkR> I will love if this could get maintained upstream at some point so I don't need to constantly glare at the ML for uapi changes :P

09:13 srslypascal is now known as Guest9009

09:13 srslypascal has joined #dri-devel

09:14 <HdkR> https://github.com/FEX-Emu/FEX/blob/main/Source/Tests/LinuxSyscalls/x32/Ioctl/HelperDefines.h#L13 As a reference, I do some basic annotation of the structs by wrapping the ioctl, which works in tandem with the clang tool to pull out all of the information. It could be cleaner

09:15 <HdkR> Doing it this way since I don't annotate the struct definitions directly so I can use unmodified headers

09:16 <HdkR> Also needs to handle opaque ioctl definitions that don't have the struct information declared in the `_IOWR`, which is really annoying to validate

09:17 Guest9009 has quit [Ping timeout: 480 seconds]

09:33 jdavies has joined #dri-devel

09:34 jdavies is now known as Guest9010

09:42 Guest9010 has quit [Ping timeout: 480 seconds]

09:50 vliaskov has quit [Ping timeout: 480 seconds]

09:54 gio has joined #dri-devel

09:58 vliaskov has joined #dri-devel

09:58 frieder has quit [Ping timeout: 480 seconds]

10:19 nimrod456 has joined #dri-devel

10:20 nimrod456 has quit [Remote host closed the connection]

10:23 fe_sch[m] has joined #dri-devel

10:25 frieder has joined #dri-devel

10:25 agneli_ has joined #dri-devel

10:25 frieder has quit [Remote host closed the connection]

10:26 frieder has joined #dri-devel

10:27 agneli has quit [Ping timeout: 480 seconds]

10:37 srslypascal has quit [Ping timeout: 480 seconds]

10:38 heat has joined #dri-devel

10:39 srslypascal has joined #dri-devel

10:40 apinheiro has quit [Ping timeout: 480 seconds]

10:42 frieder has quit [Ping timeout: 480 seconds]

10:45 srslypascal is now known as Guest9016

10:45 srslypascal has joined #dri-devel

10:45 jaganteki has joined #dri-devel

10:49 apinheiro has joined #dri-devel

10:51 frieder has joined #dri-devel

10:51 Guest9016 has quit [Ping timeout: 480 seconds]

10:56 yuq825 has left #dri-devel [#dri-devel]

10:59 devilhorns has joined #dri-devel

10:59 tango_ has joined #dri-devel

11:03 vals_ has quit [Ping timeout: 480 seconds]

11:13 thenemesis has joined #dri-devel

11:19 srslypascal has quit [Ping timeout: 480 seconds]

11:20 tobiasjakobi has joined #dri-devel

11:20 tobiasjakobi has quit [Remote host closed the connection]

11:22 srslypascal has joined #dri-devel

11:24 kxkamil has quit []

11:25 angerctl is now known as Namarrgon

11:42 jaganteki has quit [Remote host closed the connection]

11:50 thenemesis has quit []

11:52 thenemesis has joined #dri-devel

11:59 frieder has quit [Ping timeout: 480 seconds]

12:00 gouchi has joined #dri-devel

12:00 thenemesis has quit [Ping timeout: 480 seconds]

12:07 agneli has joined #dri-devel

12:08 agneli_ has quit [Ping timeout: 480 seconds]

12:09 frieder has joined #dri-devel

12:10 devilhorns has quit []

12:15 Ahuj has quit [Ping timeout: 480 seconds]

12:15 gio has quit [Ping timeout: 480 seconds]

12:19 gio has joined #dri-devel

12:28 mohamexiety has joined #dri-devel

12:43 dtmrzgl has quit [Ping timeout: 480 seconds]

12:45 rosefromthedead has joined #dri-devel

12:53 fxkamd has joined #dri-devel

12:58 Dr_Who has joined #dri-devel

12:59 <eric_engestrom> DavidHeidelberg[m]: not sure what you're trying to do with the CI compiler wrapper, but yes, right now it's there to pass `-Werror` because meson ignores `-werror=true` (which IMO is a meson bug), and in my WIP MR I'm also adding `--fatal-warnings` in that wrapper (arguable whether meson should do that, so we might have to keep the wrapper indefinitely if meson doesn't accept that)

13:00 <eric_engestrom> DavidHeidelberg[m]: but for `clang` vs `clang-15`, you can trivially add a new 2-lines wrapper that includes the version number in the executable name

13:04 bmodem1 has joined #dri-devel

13:04 <DavidHeidelberg[m]> My approach was to remove dash and number; ofc I can, I'm just thinking about how complex and ugly it looks :(

13:04 kts has quit [Ping timeout: 480 seconds]

13:04 jaganteki has joined #dri-devel

13:05 <DavidHeidelberg[m]> wish meson would fix that behaviour to get along with our needs rather than this hacking around

13:05 * eric_engestrom nods

13:06 <eric_engestrom> the whole "`clang` something has a version number in its executable name" is a mess, and the way meson deals with it is another mess :/

13:06 <eric_engestrom> s/something/sometimes/

13:09 <MrCooper> eric_engestrom: FWIW, -Wl,--fatal-warnings could be added to c{,pp}_link_args, unless that breaks some meson configure checks as well?

13:10 <eric_engestrom> MrCooper: you're right, and that's what I was doing at first actually

13:10 <eric_engestrom> now I'm wondering why I changed this to use the wrapper instead

13:11 bmodem has quit [Ping timeout: 480 seconds]

13:11 <eric_engestrom> but very good point, so the wrapper is only needed until meson's `--werror=true` enables `-Werror`

13:13 <MrCooper> right

13:23 frieder has quit [Ping timeout: 480 seconds]

13:33 frieder has joined #dri-devel

13:34 vandemar has joined #dri-devel

13:38 elongbug has joined #dri-devel

13:39 YuGiOhJCJ has quit [Quit: YuGiOhJCJ]

13:42 <vandemar> I keep getting segfaults in a game, after anywhere from a few minutes to 1-2 hours, running on a radeon 6700, in mesa 23.0.0 amdgpu_cs_add_buffer:674, more details https://pastebin.com/Zkup7jaV

13:44 <vandemar> Is there anything else I should investigate, or should I file a bug, or is it a known issue? I didn't see any recent similar looking reports of segfaults, but I didn't look that far back.

13:45 <pepp> vandemar: please file a bug

13:47 dsrt^ has quit [Remote host closed the connection]

13:48 <eric_engestrom> MrCooper, DavidHeidelberg[m]: MR updated (although it is still blocked by the bug of telling the linker we export symbols that don't exist depending on which driver we build), but it made me wonder: why do we do `-D c_args="$(echo -n $C_ARGS)"` instead of `-D c_args="$C_ARGS"`? I don't think duplicate whitespaces are an issue, if that's what this was trying to remove

13:48 <pepp> vandemar: identifying which call to radeon_add_to_buffer_list() from si_emit_draw_packets led to the crash would help

13:49 <MrCooper> eric_engestrom: it's to get rid of the newlines

13:50 <eric_engestrom> if we're using `>` there shouldn't be any newline?

13:50 srslypascal has quit [Quit: Leaving]

13:50 <eric_engestrom> in yaml I mean

13:50 <eric_engestrom> oh, perhaps a trailing newline, maybe we should have `>-`

13:51 <MrCooper> maybe, can't remember the details

13:51 aravind has quit [Remote host closed the connection]

13:51 aravind has joined #dri-devel

13:52 <zmike> mareko: for KHR-GL46.gpu_shader_fp64.fp64.state_query doesn't this mean the TES uniforms are also not referenced since the GS isn't reading gl_Position?

13:54 srslypascal has joined #dri-devel

13:55 <vandemar> pepp: ok. it's line 1523 (23.0.0) in si_emit_draw_packets, in the if(index_size) branch

13:56 <pepp> vandemar: can you print the indexbuf variable with gdb?

13:57 <vandemar> no locals? should there be?

14:01 Guest9006 has quit []

14:15 bmodem has joined #dri-devel

14:16 kts has joined #dri-devel

14:18 frieder has quit [Remote host closed the connection]

14:18 srslypascal has quit [Ping timeout: 480 seconds]

14:21 srslypascal has joined #dri-devel

14:21 bmodem1 has quit [Ping timeout: 480 seconds]

14:22 Zopolis4 has quit []

14:22 aravind has quit [Ping timeout: 480 seconds]

14:22 aravind has joined #dri-devel

14:27 fab has joined #dri-devel

14:27 srslypascal is now known as Guest9029

14:27 kzd has joined #dri-devel

14:27 srslypascal has joined #dri-devel

14:32 gawin has joined #dri-devel

14:32 Guest9029 has quit [Ping timeout: 480 seconds]

14:38 <gfxstrand> robclark: For your 64+32 loads, is the 32-bit offset signed or unsigned?

14:41 <robclark> gfxstrand: IIRC signed in most (all?) cases

14:41 kts_ has joined #dri-devel

14:41 kts_ has quit []

14:46 kts has quit [Ping timeout: 480 seconds]

14:47 <gfxstrand> robclark: Ok, so at least we have consistency there.

14:47 <gfxstrand> etnaviv is 32+32 so no signedness issues there.

14:50 <robclark> I think the harder thing is that there are several different encodings we can pick (ie. one option has a small const offset, but we can optionally left-shift the variable src by a PoT.. or we can choose a different option w/ larger const offset, etc)

14:51 <gfxstrand> For NV, we can just re-materialize the add if offset > INT32_MAX

14:52 kts has joined #dri-devel

14:52 <gfxstrand> I'm more concerned about hardware that can do non-const offsets.

14:53 <robclark> hmm, I think we'd prefer the offset to _not_ be 64b (but happy for nir not to lower to this form if offset is too big)

14:53 <gfxstrand> Because we can't know whether or not the offset is negative then

14:53 <robclark> in practice I don't think you'll see many >4GB buffers ;-)

14:53 <gfxstrand> Oh, the plan is for the offset to be 32bit

14:53 <gfxstrand> always

14:53 <robclark> +1

14:53 frieder has joined #dri-devel

14:53 <gfxstrand> The question is how sign-extension is handled on the 32-bit part

14:54 <robclark> hmm

14:54 <robclark> I think I'd have to play with that

14:54 <gfxstrand> NVIDIA only has const offsets and they're signed. So if it's unsigned, we just check for <= INT32_MAX and re-mat the add if it would be negative.

14:54 <gfxstrand> For Etnaviv, it's 32+32 so there is no zero- or sign-extension

14:55 <gfxstrand> It's freedreno where we're going to have a problem if we get it wrong.

14:55 bluetail30 has quit [Ping timeout: 480 seconds]

14:55 <gfxstrand> For NIR internals, I think I'd like it to be unsigned because then it matches nir_address_format_64bit_global_32bit_offset.

14:55 <gfxstrand> But NIR can be changed, hardware can't.

14:56 <gfxstrand> It also depends a bit on how we want to deal with offsets. Array indices are typically signed so for an OpenCL-like case where you have a pointer that you're treating as an array, you may have a negative offset.
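[Editor's sketch] The signedness question above can be made concrete in C. The helpers below are hypothetical and only model the arithmetic of a 64-bit base plus a 32-bit offset; they do not describe any particular hardware or NIR intrinsic. The signed variant assumes the usual two's-complement conversion that GCC and Clang implement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Treat the 32-bit offset as unsigned: zero-extend before the 64-bit add. */
static uint64_t addr_zext(uint64_t base, uint32_t offset)
{
    return base + (uint64_t)offset;
}

/* Treat the same bits as signed: sign-extend before the 64-bit add. */
static uint64_t addr_sext(uint64_t base, uint32_t offset)
{
    return base + (uint64_t)(int64_t)(int32_t)offset;
}

/* An unsigned offset can be folded into a signed 32-bit immediate only
 * if it is <= INT32_MAX; otherwise the add has to stay a separate instruction. */
static bool fits_signed_imm(uint32_t offset)
{
    return offset <= (uint32_t)INT32_MAX;
}
```

The two interpretations agree for offsets up to `INT32_MAX` and diverge by exactly 4 GiB above that, which is why the choice matters only once the top bit of the offset can be set.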

14:59 <robclark> I'm going to hope/assume that hw handles negative offset in a sane way, but will need to do some experiments

15:00 bluetail30 has joined #dri-devel

15:04 <gfxstrand> k

15:07 aravind has quit [Ping timeout: 480 seconds]

15:10 rosefromthedead has quit [Ping timeout: 480 seconds]

15:20 Duke`` has joined #dri-devel

15:21 frieder has quit [Remote host closed the connection]

15:21 frieder has joined #dri-devel

15:26 soreau has quit [Quit: Leaving]

15:29 ice99 has quit [Ping timeout: 480 seconds]

15:30 <mareko> zmike: yes

15:32 bgs has joined #dri-devel

15:38 <alyssa> HdkR: wouldn't it be less work to implement 32-bit libGL/libVK thunking and call it a day?

15:42 <eric_engestrom> any objection to exporting drmOpenMinor()? it's the only missing piece to implement https://gitlab.freedesktop.org/mesa/drm/-/issues/85#note_1839524

15:43 <eric_engestrom> (marcan's suggestion of looping over the devices once and picking the driver based on that, instead of going through all the devices for all the drivers until one loads)

15:44 stuarts has joined #dri-devel

15:45 <robclark> gfxstrand: ok, so the variable src offset is unsigned 32b

15:45 <gfxstrand> Venemo: What does AMD have for offsets on load/store_global?

15:45 <gfxstrand> robclark: Okay...

15:45 <gfxstrand> robclark: But some of the constant ones are signed?

15:46 <robclark> I _think_ so.. they are also a lot smaller than 32b

15:47 <robclark> or at least we have signed offsets in some other places

15:50 <alyssa> gfxstrand: for AGX, 64+32 either signed or unsigned, but the offset is implicitly multiplied by the element size (4 unless you're doing 8/16-bit loads)

15:51 <alyssa> agx_nir_lower_address sorts out the mess, not planning to use common code because that last offset multiply thing is off the deep end

15:52 Company has joined #dri-devel

15:52 gouchi has quit [Quit: Quitte]

15:52 <Venemo> gfxstrand: I think we have a vector address, scalar address and a constant address, but pendingchaos can correct me if I'm wrong

15:53 frieder has quit [Remote host closed the connection]

15:55 <robclark> gfxstrand: yeah, immed offsets encoded in instruction are signed

15:56 <robclark> and yeah, offset is in units of element size as well

15:56 <gfxstrand> robclark: Are your non-constant offsets in element size units?

15:57 <robclark> yeah

15:57 <gfxstrand> bugger. Okay.

15:58 <gfxstrand> So now we have Immaginappledreno?!?

15:58 <robclark> yeah, I don't think unaligned access is a thing ;-)

15:59 <gfxstrand> I fear I may be starting to agree with Venemo that we want to just add a signed 32-bit offset const index to load/store_global.

16:00 <gfxstrand> nir_lower_mem_access_bit_sizes is going to hate me for it.

16:00 <gfxstrand> Or maybe we just assert offset == 0 in that pass

16:01 <robclark> hmm, I was kinda expecting that lower-derefs could give us a 32b offset that driver can consume..

16:01 <gfxstrand> Yeah, there's some debate to be had about where the best place to do that offsetting is

16:02 <robclark> if you think about array+struct dereferencing you'd design an instruction exactly like what I have ;-)

16:03 <gfxstrand> Right up until some app does a bit of "creative" casting. :P

16:04 kts has quit [Quit: Konversation terminated!]

16:04 <robclark> sure there isn't some weasel-out clause somewhere in the spec declaring that negative array indexing is UB?

16:10 srslypascal has quit [Quit: Leaving]

16:16 <gfxstrand> Negative array indexing is UB assuming you get the entire array access in a single index.

16:17 <gfxstrand> However there's OpPtrAsArray which lets you effectively do (&arr[7])[-3] which would be arr[4] which is valid.

16:18 <robclark> if the xyz[-3] is UB how does it become defined if xyz=&arr[7]?

16:18 pochu has quit [Quit: leaving]

16:19 srslypascal has joined #dri-devel

16:21 <pendingchaos> AMD global loads on gfx6 and gfx9+ can have a limited constant offset, which can be negative on gfx9+

16:21 <pendingchaos> for gfx6: it can do either a sgpr/vgpr address and an unsigned 32-bit sgpr offset

16:21 <pendingchaos> for gfx9+: it can take a vgpr address, or a sgpr address and unsigned 32-bit vgpr offset

16:21 <pendingchaos> for gfx7/8: it can take a vgpr address and offsets have to be lowered as an addition

16:21 <gfxstrand> robclark: The UB comes in when you actually access outside arr[], not based purely on the index.

16:22 <gfxstrand> robclark: As long as all your OpPtrAsArray end you back inside the array, you're fine.

16:22 <alyssa> gfxstrand: wait do we know what imagination does?

16:22 <gfxstrand> Well, as long as you never end up outside.

16:22 <gfxstrand> alyssa: No, I'm just making silly jokes. :P

16:22 <alyssa> the ISA on AGX is all Apple as far as we know

16:22 tursulin has quit [Ping timeout: 480 seconds]

16:23 <gfxstrand> robclark: So if you have int arr[10] then arr[7] is valid as is (&arr[7])[-3] because that's just arr[4]. I'm not sure about (&arr[12])[-5]. That one might be UB.
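[Editor's sketch] The `(&arr[7])[-3]` case has a direct C analogue, shown here with a hypothetical helper. The negative relative index is fine as long as the final element lies inside the array; the helper asserts that condition instead of relying on it.

```c
#include <assert.h>
#include <stddef.h>

/* Index relative to an interior pointer. The access is valid only while
 * start + rel stays within [0, len). */
static int load_relative(const int *arr, size_t len, size_t start, ptrdiff_t rel)
{
    const int *p = &arr[start];                       /* like &arr[7] */
    ptrdiff_t final_index = (ptrdiff_t)start + rel;   /* 7 + (-3) == 4 */
    assert(final_index >= 0 && (size_t)final_index < len);
    return p[rel];                                    /* like (&arr[7])[-3] */
}
```

With `int arr[10]`, `load_relative(arr, 10, 7, -3)` reads `arr[4]`. A compiler that sees only the `-3` cannot tell on its own whether the offset is a small negative number or a huge unsigned one, which is the design concern in the discussion.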

16:25 <robclark> hmm, this is only a problem if part of the offset is folded into the 64b address..

16:26 <robclark> although maybe we can't 100% avoid that?

16:26 <gfxstrand> Yeah

16:26 <gfxstrand> I mean, it's not strictly speaking a problem. We can handle whatever. It just affects design a bit.

16:26 <alyssa> appledreno

16:28 <cwabbott> robclark: yeah, presumably the two parts could be separated by other stuff in between

16:28 <cwabbott> pass it as a function argument, use in a phi, you name it

16:32 junaid has joined #dri-devel

16:33 <robclark> I guess even if we limited it to cases where we could tell at compile time that the base address is the actual start of the object, that seems fine.. if you're trying to be a jerk to the compiler I don't really care if you get horrible suboptimal code ;-)

16:36 bmodem has quit [Ping timeout: 480 seconds]

16:37 <gfxstrand> At this point, I think we want to just add a signed 32-bit offset to load/store_global like Venemo said. I don't like calling it "base" but we can add a new name. Names are cheap.

16:37 <gfxstrand> italove: ^^

16:38 <gfxstrand> For the non-constant case, IDK what to do. It's also a far less important case, IMO. If there's a few adds in the shader, oh well.

16:39 <robclark> non-constant == any array access

16:39 srslypascal is now known as Guest9038

16:39 srslypascal has joined #dri-devel

16:39 <robclark> and few adds means bunch of instructions after lowering to 32b :-(

16:40 <gfxstrand> robclark: Yeah, but when it comes to evolving core NIR, I think we start with what we know.

16:40 vliaskov has quit []

16:41 nekit has quit [Quit: The Lounge - https://thelounge.chat]

16:41 <gfxstrand> I'm not going to suggest we take away your ir3 intrinsics :)

16:41 <gfxstrand> Just make the common load/store more capable.

16:43 <gfxstrand> The constant offset case is the one lots of hardware can do, where signed vs. unsigned matters less, and where it's easy enough / zero-cost to do `offset / (bit_size/8)` if you need to
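[Editor's sketch] A small C model of the `offset / (bit_size/8)` conversion mentioned above, for hardware whose offsets are in element-size units. The function name and its failure behaviour are assumptions made for illustration, not existing NIR or driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Convert a signed constant byte offset into element units for an access of
 * bit_size bits. Fails if the offset is not a multiple of the element size,
 * in which case a backend would have to keep the add separate. */
static bool byte_offset_to_elements(int32_t byte_offset, unsigned bit_size, int32_t *out)
{
    int32_t elem_bytes = (int32_t)(bit_size / 8);
    if (elem_bytes == 0 || byte_offset % elem_bytes != 0)
        return false;
    *out = byte_offset / elem_bytes;
    return true;
}
```

For a constant offset this division happens at compile time, which is why the constant case is described as zero-cost.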

16:43 pallavim has joined #dri-devel

16:44 <Venemo> gfxstrand: thx :)

16:45 <robclark> hmm, it is easy enough to check if nir src is const.. and if we _eventually_ want to support variable offset, does that mean we end up with both const and non-const offset?

16:45 Guest9038 has quit [Ping timeout: 480 seconds]

16:46 <gfxstrand> robclark: No. The load/store_global_offset32 versions that we'd add wouldn't have a const_index offset.

16:46 <anholt_> gallo: a618 perf traces busted? https://gitlab.freedesktop.org/mesa/mesa/-/jobs/38738095

16:47 <robclark> hmm, ok

16:48 <gfxstrand> I wish we could practically just do load/store_global_offset32 and have the driver do nir_src_is_const() on it but there seem to be too many variables right now to figure out the right thing to do globally there.

16:48 <gfxstrand> signed vs. unsigned, offset units, etc.

16:49 <robclark> offset units I think you can punt on... a later stage pass would turn it into driver specific intrinsic and add the appropriate PoT shift (and then maybe that shift gets optimized back out again)

16:50 <robclark> I'd look at it this way, it is a lot easier for backend to do something like that than it is to fish out 32b'ness from 64b math

16:50 <gfxstrand> Oh, no arguments there.

16:50 pivi has joined #dri-devel

16:51 <gfxstrand> As long as you have something to optimize to which you can do before you lower 64b math to 32b, I think that's the important thing.

16:51 <robclark> there is still the signedness issue.. but at least offset units is easy to push to driver

16:53 pivi has quit []

16:53 <danvet> gfxstrand, different topic, ack on the syncobj deadline from robclark?

16:53 <danvet> it looked just like nits you had

16:53 <danvet> just want to make sure since uapi

16:53 junaid has quit [Remote host closed the connection]

16:56 <gfxstrand> danvet: Yeah, I'm gonna look at that again. Been on Khronos calls and thinking about NIR pointers all morning.

16:57 <danvet> gfxstrand, pls ping me when you're all happy so I can pull robclark's pr

16:57 <danvet> or ask for respin :-)

17:06 nekit has joined #dri-devel

17:16 jhli has joined #dri-devel

17:19 Ahuj has joined #dri-devel

17:20 lynxeye has quit [Quit: Leaving.]

17:21 smilessh has quit [Ping timeout: 480 seconds]

17:22 <hays> what could it mean if during the boot process, I get no hdmi output until wayland/x11 loads?

17:22 <hays> apologize if this is the wrong channel

17:34 <ccr> I have an ASUS motherboard that occasionally starts acting so that there's no HDMI output until Xorg starts, it apparently suddenly decides that it prefers VGA output (board has VGA, DVI and HDMI).

17:35 <ccr> 3

17:40 <hays> hmm that is an interesting thing to check. there are 2 hdmi outputs on this thing

17:41 srslypascal has quit [Ping timeout: 480 seconds]

17:43 <hays> kernel commandline lets you do video=HDMI-1:D video=HDMI-2:D but unclear semantics of that...

17:43 srslypascal has joined #dri-devel

17:43 <hays> but seems worth trying

17:45 <ccr> you could also check if UEFI/BIOS has some setting for preferred connector

17:46 <ccr> (in my case it's a weird bug in UEFI or something)

17:49 srslypascal has quit [Read error: No route to host]

17:50 djbw has joined #dri-devel

17:51 srslypascal has joined #dri-devel

17:55 gawin has quit [Quit: Konversation terminated!]

17:55 djbw has quit [Read error: Connection reset by peer]

17:58 oliviac has joined #dri-devel

18:03 ngcortes has joined #dri-devel

18:03 soreau has joined #dri-devel

18:11 <gfxstrand> robclark: RE deadline defaults: What I wouldn't give for Rust's Option<T> right about now....

18:12 <robclark> Option<T> is nice for other reasons.. but doesn't really solve the issue that driver doesn't have enough information from app ;-)

18:13 <robclark> vk extension would be a good thing.. but I think short term a 90% soln and driconf if needed are workable

18:15 mohamexiety has quit []

18:20 konstantin_ has joined #dri-devel

18:22 srslypascal is now known as Guest9047

18:22 srslypascal has joined #dri-devel

18:22 <gfxstrand> robclark: No, but it solves the "how do I indicate that I don't care?" internal API question :)

18:23 <danvet> just a real type system would be nice

18:24 konstantin has quit [Ping timeout: 480 seconds]

18:25 djbw has joined #dri-devel

18:25 <gfxstrand> Well, yeah.

18:26 Guest9047 has quit [Ping timeout: 480 seconds]

18:33 rasterman has quit [Quit: Gettin' stinky!]

18:41 <jenatali> I'm curious if there's Vulkan drivers out there that are working around app bugs in descriptor indexing. Specifically out-of-bounds indexing

18:41 <jenatali> (Unless that's supposed to be well-defined and I just missed a bit of spec language)

18:43 oliviac has quit []

18:43 flibitijibibo has quit [Read error: Connection reset by peer]

18:43 <gfxstrand> With robustness enabled, it's supposed to be less buggy

18:44 srslypascal is now known as Guest9049

18:44 srslypascal has joined #dri-devel

18:45 tzimmermann has quit [Quit: Leaving]

18:45 flibitijibibo has joined #dri-devel

18:50 Guest9049 has quit [Ping timeout: 480 seconds]

18:50 <jenatali> gfxstrand: Which robustness?

18:51 <gfxstrand> For buffers, it's robustBufferAccess, I think. For others, I don't remember the details.

18:51 <jenatali> I'm not talking out-of-bounds within a buffer, I'm talking out-of-bounds within an array of buffers

18:52 <gfxstrand> I know

18:52 <gfxstrand> You're going to make me look this up, aren't you?

18:52 <jenatali> That's a strong way to phrase it :P

18:59 flibitijibibo has quit [Ping timeout: 480 seconds]

19:00 <jenatali> gfxstrand: https://vulkan.lunarg.com/doc/view/1.3.204.1/windows/gpu_validation.html - the validation described here would seem to declare it's invalid, and I don't see any caveats around robustness

19:03 <gfxstrand> jenatali: I'm not seeing it in the spec anywhere

19:03 <jenatali> Awesome. I'm still debugging DOOM Eternal and it looks like they do it

19:05 <gfxstrand> That's believable

19:06 <DavidHeidelberg[m]> jenatali: on which difficulty do you debug? :D

19:07 <jenatali> David Heidelberg: When debugging an app like that, there's only one difficulty

19:07 <jenatali> And it hurts

19:07 <DavidHeidelberg[m]> nightmare :P ?

19:08 <alyssa> jenatali: the natural descriptor indexing impl on some hw is going to be robust against that automatically so I could easily believe in app bugs

19:08 <alyssa> robclark: optimizing out the shift after it's added in is tricky

19:08 <alyssa> because `(a << b) >> b` is not equal to a in case of overflow

19:09 <gfxstrand> No, but ((a << b) >> b) << b is
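[Editor's sketch] The two shift identities above, checked in C on unsigned 32-bit values with a shift amount below 32. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Shift left then right by b (b < 32): the top b bits of a are lost,
 * so the result equals a only if those bits were zero. */
static uint32_t shl_shr(uint32_t a, unsigned b)
{
    return (a << b) >> b;
}

/* Re-applying the left shift gives back a << b, whether or not bits were lost. */
static uint32_t shl_shr_shl(uint32_t a, unsigned b)
{
    return ((a << b) >> b) << b;
}
```

So a backend that re-applies the shift can drop the inner shift pair, while a plain `(a << b) >> b` cannot be folded to `a` without knowing the top bits are clear.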

19:09 <jenatali> alyssa: Interesting. I guess I'm confused as to what the behavior would be, if e.g. you index out-of-bounds of samplers

19:09 <alyssa> jenatali: on valhall, return all-0 texels, iirc

19:09 anarsoul has quit [Ping timeout: 480 seconds]

19:10 <alyssa> but I agree that's goofy

19:10 <jenatali> Huh, cool

19:10 <alyssa> I don't think Arm specifically designed that, more of an emergent behaviour

19:10 <alyssa> the texture instructions check for valid descriptors and return 0's if not present (ostensibly)

19:11 <alyssa> so I think it Just Works

19:11 <jenatali> Huh

19:11 <alyssa> but also I haven't tried and don't particularly care

19:11 <jenatali> D3D leaves it as undefined behavior if you're using the native indexing functionality, but the stuff I had to do to get VK to work makes it so much worse

19:12 <alyssa> nod

19:12 anarsoul has joined #dri-devel

19:13 <alyssa> we don't need another blog post about descriptor sets

19:13 <alyssa> i learned that at xdc

19:13 <alyssa> :p

19:18 ngcortes has quit [Ping timeout: 480 seconds]

19:33 Namarrgon has quit [Quit: WeeChat 3.7.1]

19:36 mvlad has quit [Remote host closed the connection]

19:39 <italove> gfxstrand: ack on adding a const index offset to load/store_global and handling non-const separately, but then wouldn't it make sense to just use `base` so that we can fit more cleanly into `nir_opt_offsets`? It seems other load/store ops already assume `base` is an offset. Or you think that's not a good enough reason?

19:40 gouchi has joined #dri-devel

19:40 Namarrgon has joined #dri-devel

19:41 ngcortes has joined #dri-devel

19:42 lemonzest has quit [Quit: WeeChat 3.6]

19:43 soreau has quit [Ping timeout: 480 seconds]

19:49 ybogdano is now known as Guest9052

19:49 ybogdano has joined #dri-devel

19:52 soreau has joined #dri-devel

19:58 <HdkR> alyssa: 32-bit thunking is a lot of work. Which is why we will have both. Also we still need to support all the....other 32-bit ioctl mess outside of DRM still

20:01 <alyssa> HdkR: fair enough

20:02 <alyssa> still seems like it ought to be the easier of the two but I don't know what I don't know

20:02 <HdkR> :P

20:02 <alyssa> gfxstrand: italove: Wait, are we touching core load/store_global? nak on that, not thrilled on the idea of auditing every backend

20:02 <alyssa> or just adding new offset versions?

20:02 <alyssa> IDK

20:03 <alyssa> TBH, we need backend specific address arithmetic opt for some backends regardless so I'm pretty meh about the whole thing

20:08 rmckeever has joined #dri-devel

20:11 paulk-ter has quit [Remote host closed the connection]

20:20 agd5f_ has quit []

20:20 agd5f has joined #dri-devel

20:37 Ahuj has quit [Ping timeout: 480 seconds]

20:41 <gfxstrand> alyssa: The plan was to modify load/store_global but make it so non-zero offsets are optional.

20:42 <gfxstrand> alyssa: So the "audit" would just be running around to all the back-ends and adding `assert(nir_intrinsic_offset(intrin) == 0)` to their load/store_global handling.

20:42 <gfxstrand> Should be pretty mechanical and fool-proof.

20:45 konstantin has joined #dri-devel

20:46 ice99 has joined #dri-devel

20:47 ybogdano has quit [Quit: The Lounge - https://thelounge.chat]

20:47 Duke`` has quit [Ping timeout: 480 seconds]

20:48 bgs has quit [Remote host closed the connection]

20:48 ybogdano has joined #dri-devel

20:48 konstantin_ has quit [Ping timeout: 480 seconds]

20:49 ybogdano has quit []

20:51 Zopolis4 has joined #dri-devel

20:51 ybogdano has joined #dri-devel

21:11 flibitijibibo has joined #dri-devel

21:19 bnieuwenhuizen_ has quit []

21:20 bnieuwenhuizen has joined #dri-devel

21:20 fab has quit [Quit: fab]

21:29 <Kayden> has anyone had much luck running specviewperf2020?

21:30 <Kayden> something with the nodejs gui stuff seems to be going wrong here and it just never displays anything on the screen at all, either when trying to use the viewset downloader, or just running the benchmarks

21:34 <Kayden> IIRC strace just showed it sitting around blocked in a read call waiting on some socket forever

21:37 <alyssa> gfxstrand: meh, fair enough

21:37 <alyssa> is constant-only offsets really worth plumbing into NIR?

21:38 <alyssa> i don't care i just don't plan on using it for 3/4 of the ISAs that I work on

21:38 <alyssa> I'll use it for #4 i guess (Valhall)

21:39 <gfxstrand> alyssa: I guess it depends on how much nir_ssa_scalar crawling you want back-ends doing

21:39 <alyssa> shrug

21:40 danvet has quit [Ping timeout: 480 seconds]

21:40 <alyssa> I don't see this materially reducing the backend crawls if backends want competent address arithmetic

21:40 <alyssa> since most (all?) backends can do something more than just constant only which is the absolute lowest common denominator

21:41 ice99 has quit [Ping timeout: 480 seconds]

21:41 <gfxstrand> Intel can't do anything. Nvidia is const-only

21:41 <alyssa> OK

21:41 <alyssa> so.. it sounds like NV is the only backend that actually is simplified by this.

21:41 <alyssa> since every other backend either can't do it at all or can do something more complicated

21:41 <alyssa> and in either case won't reduce backend crawling with a "common" crawl

21:42 <gfxstrand> The perfect is killing the good right now...

21:42 <alyssa> is it?

21:42 <alyssa> I'm just not sold on what the good is

21:43 <alyssa> We *already* have backend crawling for most of the backends that care for it, for that matter.

21:43 <gfxstrand> You really don't want back-end crawling if your backend wants lower_int64

21:43 <gfxstrand> crawling post-int64-lowering is a giant pain

21:43 <alyssa> yes, this is true.

21:43 <gfxstrand> So we need something that can get lowered at least a little earlier in the pipeline.

21:43 <alyssa> and the reason I haven't bothered on mali, but that's more because I keep procrastinating on wiring up the real int64 that does exist lol

21:44 <gfxstrand> If you haven't bothered on mali, then we don't "already" have it everywhere.

21:44 <alyssa> I mean

21:44 <alyssa> I don't plan to wire up the new thing on mali either ;-)

21:44 Dr_Who has quit [Ping timeout: 480 seconds]

21:44 <alyssa> mostly because even doing constant-only is a huge pain on bifrost

21:44 <gfxstrand> That's fine. That's your choice.

21:45 <gfxstrand> It's optional. You can just not use it on bifrost.

21:45 <alyssa> For context, AGX uses lower_int64 but I call agx_nir_lower_address immediately before nir_lower_int64 iirc

21:45 gouchi has quit [Remote host closed the connection]

21:45 <gfxstrand> I'm just saying there's a giant difference between "Thing X already exists everywhere, you don't need to bother" and "Thing X *could* exist everywhere but I haven't bothered because it's a pain."

21:45 <alyssa> and agx_nir_lower_address lowers to load/store_agx intrinsics which map directly to the hw

21:45 <gfxstrand> The latter case is exactly when we want common things.

21:45 <alyssa> Fair enough

21:46 <gfxstrand> Maybe the answer is everyone has their own load/store_hw intrinsics and a lowering pass.

21:46 <alyssa> gfxstrand: My opinion -- take it or leave it -- is add load/store_global_offset intrinsics (I think we already have the load) and add a NIR pass to do the crawl, it can live in src/compiler/nir/ if there's a second user of it
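
As a rough illustration of the "crawl" being discussed, here is a minimal C sketch that peels constant addends off an address expression and returns them as a single offset. The expression tree, its field names, and `collect_const_offset` are all hypothetical stand-ins for illustration; this is not NIR's data model or any existing pass.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified address expression (not NIR). */
enum expr_op { EXPR_VALUE, EXPR_CONST, EXPR_ADD };

struct expr {
    enum expr_op op;
    uint64_t constant;          /* valid when op == EXPR_CONST */
    const struct expr *lhs;     /* valid when op == EXPR_ADD */
    const struct expr *rhs;     /* valid when op == EXPR_ADD */
};

/* Walk down a chain of adds, summing every constant operand found.
 * On return, *rest points at the remaining non-constant expression.
 * The sum wraps modulo 2^64, like the address arithmetic itself. */
static uint64_t collect_const_offset(const struct expr *e,
                                     const struct expr **rest)
{
    uint64_t total = 0;

    while (e->op == EXPR_ADD) {
        if (e->rhs->op == EXPR_CONST) {
            total += e->rhs->constant;
            e = e->lhs;
        } else if (e->lhs->op == EXPR_CONST) {
            total += e->lhs->constant;
            e = e->rhs;
        } else {
            break;  /* neither operand is constant: stop crawling */
        }
    }

    *rest = e;
    return total;
}
```

For an expression like `4 + (x + 16)` the sketch returns 20 and leaves `x` as the remaining address.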

21:46 <gfxstrand> It seems like we can do better than that.

21:47 <gfxstrand> alyssa: Yeah, well that's the approach Venemo is NAKing.

21:47 <alyssa> NAK and maybe Mali can use that (and call the common crawl right before lower_int64)

21:47 <alyssa> ugh

21:47 <gfxstrand> And where I started

21:47 <alyssa> Venemo: what's your nak?

21:47 <gfxstrand> That's why the perfect hates the good here. :P

21:47 <alyssa> load_global_nak

21:47 <alyssa> then you can't nak it

21:47 <alyssa> or maybe you have to nak it

21:47 <alyssa> shit

21:47 <Venemo> I think load_global should be consistent with load_everything_else and have a const offset

21:48 <gfxstrand> Venemo: Except that's not everything else

21:48 <gfxstrand> That's everything else that doesn't take explicit addresses.

21:48 <Venemo> eg. load_shared

21:48 <gfxstrand> Wait, when did load_shared grow an extra offset?!?

21:49 <alyssa> uhh

21:49 <alyssa> did I miss that too?

21:49 <gfxstrand> Maybe it always had one? Could be an artifact of history, that one.

21:49 <alyssa> I did not know it had one?!

21:49 <alyssa> is this broken in my backends?

21:49 <Venemo> load_shared has a base offset

21:50 <Venemo> so it makes sense to add it to load_global as well

21:50 apinheiro has quit [Quit: Leaving]

21:50 <Venemo> that said, there is no hard NAK from me, it's just that I don't see why we should keep the intrinsic without the offset when we add the new one with the offset

21:50 <gfxstrand> This goes back a long ways...

21:51 <gfxstrand> Like pre-2018

21:51 <Venemo> if you add a new one with a new offset, who would actually want to use the old one?

21:51 <alyssa> me

21:51 <Venemo> y tho?

21:51 <alyssa> because I'm doing all my own crawling anyway and an extra offset means one more thing to worry about

21:52 <zmike> Kayden: link? I can try it tomorrow and let you know

21:52 <Venemo> you can simply emit an add, it you can elect not to call the pass that collects the const offset, alyssa

21:52 <alyssa> and the "well it'll just always be zero just trust us" is a recipe for brokenness in the future when some clever nir pass comes along in the future

21:52 <Venemo> or* you can elect

21:53 <alyssa> and also I need to get dinner and go to class and don't care about this bikeshed do whatever you want i'm out, peace!

21:53 alyssa has quit [Quit: leaving]

21:53 <gfxstrand> So, RE load/store_shared. The offset appears to go all the way back but it hasn't been used by most backends since we switched to using nir_lower_explicit_io for shared.

21:54 <gfxstrand> The base+offset stuff is typically associated with internal load/store ops, not explicit.

21:55 <gfxstrand> It looks like Intel does support it though I doubt it ever sees a non-zero base these days.

21:55 <gfxstrand> Intel HW doesn't have it, to be clear. The back-end emits an add.

21:58 <Venemo> intel implemented it once I reminded them it's missing

21:58 <gfxstrand> Oh

21:58 <Venemo> it was last year when we started using it in the task shader lowering

21:59 <gfxstrand> ah

21:59 <Venemo> there was a bug due to them not supporting it

22:00 <Venemo> anyway, I am sorry that this discussion offended alyssa, but I think it is good to have consistency between these intrinsics

22:00 <Venemo> it is somewhat a mess though

22:00 <gfxstrand> I'm all for consistency, I'm just not sure which way consistency favors

22:00 <gfxstrand> From the Intel PoV, consistency favors ripping out BASE everywhere.

22:01 <Venemo> we added load_global_amd so we can have the offsets we need, so dunno

22:01 <gfxstrand> Well, that's the problem. :) The hardware is inconsistent.

22:02 <Venemo> if you say we should remove the offsets from the ones that have it and we all should use backend specific ones instead, that is also fine by me

22:02 <gfxstrand> I'm not necessarily saying we should do that.

22:03 <gfxstrand> I'm not sure what the best thing to do is.

22:03 <gfxstrand> I went into today with a pretty clear plan and it's been shot to hell. IDK where that leaves us.

22:03 <Venemo> I'm sorry

22:03 <gfxstrand> It's okay, it's how this process works.

22:04 <gfxstrand> It's good to uncover all the inconsistencies

22:04 <jenatali> Personally, I like having 2 intrinsics, one with a base that must be respected, and one that can never have a source. Having a nir_option for which one you prefer and having some common lowering pass that removes offsets when unwanted across all ops seems right to me

22:04 <jenatali> That way we don't have "assert(offset == 0)" in backends that suddenly blows up because some pass started using offsets

22:04 <gfxstrand> Well, we actually have three cases:

22:05 <gfxstrand> 1) Just an address

22:05 <gfxstrand> 2) Address + imm (where imm has some bit size and signedness)

22:05 <gfxstrand> 3) Address + offset
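
The three cases listed above could be modelled roughly as follows. This is a sketch under assumptions: the enum, the struct, the 32-bit signed immediate, and the 64-bit offset are all hypothetical choices for illustration, not NIR types or any particular hardware encoding.

```c
#include <stdint.h>

/* Hypothetical addressing forms matching cases 1-3 above. */
enum addr_form {
    ADDR_ONLY,          /* 1) just an address */
    ADDR_PLUS_IMM,      /* 2) address + immediate */
    ADDR_PLUS_OFFSET    /* 3) address + register offset */
};

struct mem_addr {
    enum addr_form form;
    uint64_t address;
    int32_t imm;        /* used by ADDR_PLUS_IMM; signed 32-bit is an assumption */
    uint64_t offset;    /* used by ADDR_PLUS_OFFSET */
};

/* Compute the address a load/store would access. A backend that only
 * supports ADDR_ONLY would emit this addition itself. */
static uint64_t effective_address(const struct mem_addr *m)
{
    switch (m->form) {
    case ADDR_ONLY:
        return m->address;
    case ADDR_PLUS_IMM:
        /* Sign-extend the immediate before the wrapping add. */
        return m->address + (uint64_t)(int64_t)m->imm;
    case ADDR_PLUS_OFFSET:
        return m->address + m->offset;
    }
    return m->address;
}
```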

22:05 <jenatali> True

22:06 <Venemo> maybe we should have some kind of abstract NIR address type, which could be configured by the backend what it means exactly. and then we could all use the same intrinsic

22:06 <jenatali> That already exists, and is consumed by (e.g.) lower_explicit_io to select the right intrinsics for the backend

22:07 <gfxstrand> Intel wants 1, Nvidia wants 2, Etnaviv, Mali, and Freedreno, all want 2+2 with some hand-waved details about how big the immediates can be and the stride of the offset.

22:07 <jenatali> Derefs are already the abstract address type

22:07 <Venemo> I'm not talking about derefs

22:07 <gfxstrand> *want 2+3

22:07 <gfxstrand> Actually, I think Etnaviv just wants 3

22:08 <gfxstrand> Venemo: What you're suggesting is pretty nir_address_mode, just way more of them than we already have.

22:08 <gfxstrand> *pretty much

22:08 <Venemo> maybe the load could take an arbitrary number of sources, but the meaning of those could be configured in nir_compiler_options

22:09 <gfxstrand> That's pretty much nir_address_mode

22:09 <Venemo> I don't see how that helps here

22:10 <gfxstrand> It doesn't, not without way more address modes

22:10 <Venemo> it could even be part of the intrinsic itself

22:11 <gfxstrand> But how would you describe what they mean?

22:11 <gfxstrand> If it's all arbitrary and there isn't a machine-understandable meaning, we're back to everyone having custom back-end intrinsics.

22:12 <Venemo> struct { arbitrary_offset_sources : 8; scalar_offset_sources : 8; const_offset_sources: 8 } each field a bitmask that describes how to interpret each source

22:13 <Venemo> and you can have up to 8 sources in total
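
One way to read the bitmask idea above, written out as compilable C. The field names follow the message; everything else is an assumption made for illustration, including the `src_kind` enum, the `classify_source` helper, the precedence between masks, and the choice to treat a source with no bit set as the base address.

```c
#include <stdint.h>

/* One bit per source, up to 8 sources in total. */
struct addr_src_layout {
    uint8_t arbitrary_offset_sources;
    uint8_t scalar_offset_sources;
    uint8_t const_offset_sources;
};

/* Hypothetical classification of a single source. */
enum src_kind { SRC_BASE, SRC_ARBITRARY, SRC_SCALAR, SRC_CONST };

/* Look up how source i should be interpreted. Sources with no bit set
 * in any mask are treated as the base address (an assumption). */
static enum src_kind classify_source(const struct addr_src_layout *l,
                                     unsigned i)
{
    uint8_t bit;

    if (i >= 8)
        return SRC_BASE;

    bit = (uint8_t)(1u << i);
    if (l->const_offset_sources & bit)
        return SRC_CONST;
    if (l->scalar_offset_sources & bit)
        return SRC_SCALAR;
    if (l->arbitrary_offset_sources & bit)
        return SRC_ARBITRARY;
    return SRC_BASE;
}
```

Even in this small form, generic code has to decide what overlapping bits mean and how strides fit in, which is the difficulty raised in the replies that follow.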

22:13 <gfxstrand> Writing correct generic code is going to be near impossible.

22:13 <gfxstrand> We can barely do it now and that's with a way simpler system.

22:14 <gfxstrand> That also doesn't take into account things like strides etc.

22:14 <Venemo> well, then I guess we'll just have to keep using the backend specific intrinsics we already have

22:14 <Venemo> I'm just throwing around random ideas, of course

22:15 <Venemo> not necessarily saying this is any good

22:15 <gfxstrand> How different are the rules on AMD for scalar vs. not?

22:16 <Venemo> what kind of rules do you mean?

22:16 ybogdano has quit [Ping timeout: 480 seconds]

22:16 <Venemo> scalar means it must be an SGPR

22:17 <Venemo> on the NIR level it means divergence analysis must be able to prove that it's uniform

22:17 <gfxstrand> Yeah, I know that

22:18 <gfxstrand> But do you have different forms for SGPR vs. VGPR. Like VGPR can only do addr+imm but SGPR can do addr+offset or something like that.

22:18 <gfxstrand> Or maybe just VGPR+SGPR

22:18 <gfxstrand> I'm wondering how complicated the rules are and if there's any hope of getting AMD to use common code.

22:18 <Venemo> most (not all) loads have all three kinds of offsets

22:19 <gfxstrand> all supporting 64-bit?

22:21 pcercuei has quit [Quit: dodo]

22:21 <pendingchaos> https://pastebin.com/raw/w220DB73

22:22 <Venemo> shared only has vgpr and imm; buffer loads have base sgpr pair, vgpr index, vgpr offset, sgpr offset and imm; scalar loads have a base sgpr pair, sgpr offset and imm; global is a bit weird, I think it only has an sgpr pair base and a vgpr offset and imm

22:22 <Venemo> thanks pendingchaos that is much more accurate for global loads than what I said

22:22 Haaninjo has quit [Quit: Ex-Chat]

22:23 <gfxstrand> Ok, that's enough for me to know that a common NIR thing isn't going to model everything you want for AMD.

22:24 <gfxstrand> So the question is if we can do something more useful than what we have today but that's also useful for other people and not any worse for anybody.

22:33 <Venemo> yea

22:33 <gfxstrand> But, IDK, everyone doing their own thing may not be totally horrible. :shrug:

22:35 pallavim_ has joined #dri-devel

22:35 pallavim_ has quit [Read error: Connection reset by peer]

22:36 pallavim has quit [Read error: Connection reset by peer]

22:40 <Venemo> gfxstrand: in that case though, there is no need for load_global_offset either. nobody is going to want to downgrade to this when we already have a backend specific one that more accurately represents the HW

22:43 <gfxstrand> Yeah, if everyone's doing their own thing, no bases, just addresses.

22:54 <Venemo> so, shall we close the MR?

22:54 <gfxstrand> Let me think some more before we do that.

22:54 <Venemo> sure thing

23:00 Zopolis4 has quit []

23:08 mbrost has joined #dri-devel

23:31 <Venemo> cmarcelo, mslusarz: when is it going to be time to remove NV_mesh_shader from ANV?

23:32 <Venemo> gfxstrand: are you going to want to have NV_mesh_shader in NVK? :O

23:32 mbrost has quit [Remote host closed the connection]

23:32 mbrost has joined #dri-devel

23:35 <gfxstrand> Venemo: No

23:35 <Venemo> hooray :)

23:35 <Venemo> that means we can remove that monstrosity from mesa without looking back

23:36 <cmarcelo> Venemo: I _think_ it should be fine by us, but mslusarz is more on top of mesh things, so let's wait for him to chime in :)

23:37 <Venemo> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22139 <- radv

23:43 oneforall2 has quit [Read error: Connection reset by peer]

23:44 oneforall2 has joined #dri-devel

23:56 <Venemo> gfxstrand: I am looking to change the divergence analysis a bit, so it can distinguish between workgroup-uniform and wave-uniform. so I'd like to change nir_src_is_divergent and nir_dest_is_divergent to also take a scope, what do you think about that?

23:59 <cmarcelo> is freedreno CI having trouble with S3 storage? e.g. https://gitlab.freedesktop.org/mesa/mesa/-/jobs/38857903