ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
<gfxstrand> I really hate that this stupid query crash cost me like 4 hours of CTS runtime today. *grumble*
psykose has quit [Ping timeout: 480 seconds]
<karolherbst> *pat *pat*
<gfxstrand> *prrrr*
<epony> popcorn teatime
psykose has joined #dri-devel
oneforall2 has joined #dri-devel
guru__ has joined #dri-devel
oneforall2 has quit [Read error: Connection reset by peer]
jewins has quit [Ping timeout: 480 seconds]
oneforall2 has joined #dri-devel
columbarius has joined #dri-devel
co1umbarius has quit [Ping timeout: 480 seconds]
guru__ has quit [Ping timeout: 480 seconds]
youmukon1 has joined #dri-devel
youmukon1 has quit []
youmukon1 has joined #dri-devel
youmukon1 has quit []
youmukon1 has joined #dri-devel
youmukonpaku1337 is now known as Guest1397
youmukon1 is now known as youmukonpaku1337
Guest1397 has quit [Ping timeout: 480 seconds]
Kayden has joined #dri-devel
yyds has joined #dri-devel
oneforall2 has quit [Remote host closed the connection]
oneforall2 has joined #dri-devel
psykose has quit [Ping timeout: 480 seconds]
sukrutb_ has quit [Ping timeout: 480 seconds]
epony has quit [Ping timeout: 480 seconds]
psykose has joined #dri-devel
psykose has quit [Remote host closed the connection]
psykose has joined #dri-devel
antoniospg has quit []
idr has quit [Quit: Leaving]
lemonzest has quit [Quit: WeeChat 4.0.4]
lemonzest has joined #dri-devel
guru__ has joined #dri-devel
<i509vcb> In the Vulkan spec there is a mention of VK_ERROR_UNKNOWN possibly being returned by any command. But outside of that does the spec saying vkWhatever returns some specified error codes mean other error codes are technically valid to return?
<i509vcb> Question is related to me noticing that vkCreateWaylandSurfaceKHR states VK_ERROR_OUT_OF_HOST_MEMORY and VK_ERROR_OUT_OF_DEVICE_MEMORY are failure return codes, but Mesa can return VK_ERROR_SURFACE_LOST_KHR if you happen to hit some code paths
tristan has joined #dri-devel
tristan is now known as Guest1402
<i509vcb> s/valid/invalid
psykose has quit [Ping timeout: 480 seconds]
oneforall2 has quit [Ping timeout: 480 seconds]
psykose has joined #dri-devel
guru__ has quit [Read error: Connection reset by peer]
<zmike> sounds like bug
<i509vcb> It kind of makes sense that you could instantly lose a surface because the wl_display you gave to vulkan had a protocol error, but the spec seems to ignore the existence of SURFACE_LOST in that case
ayaka_ has joined #dri-devel
oneforall2 has joined #dri-devel
guru__ has joined #dri-devel
crabbedhaloablut has joined #dri-devel
oneforall2 has quit [Ping timeout: 480 seconds]
kzd_ has joined #dri-devel
kzd_ has quit []
orbea has quit [Remote host closed the connection]
orbea has joined #dri-devel
kzd has quit [Ping timeout: 480 seconds]
<DemiMarie> Someone ought to fuzz Mesa and make sure it returns VK_ERROR_OUT_OF_HOST_MEMORY where needed. /hj
jewins has joined #dri-devel
aravind has joined #dri-devel
Guest1402 has quit [Ping timeout: 480 seconds]
sukrutb_ has joined #dri-devel
JohnnyonFlame has joined #dri-devel
<Lynne> it's not like linux will ever return null for malloc, sadly, but on windows it's possible
<ishitatsuyuki> there are some CTS tests that use a custom allocator to simulate this. it's the C programmer's greatest foe ;P
tristan has joined #dri-devel
tristan is now known as Guest1408
ayaka_ has quit [Remote host closed the connection]
ayaka_ has joined #dri-devel
Leopold_ has quit []
bmodem has joined #dri-devel
<HdkR> Linux not returning null for malloc? That's easy, just run out of virtual address space
<airlied> just be a 32-bit game :-P
Leopold_ has joined #dri-devel
<HdkR> Yea, a 32-bit game is easy mode :D
<kode54> that reminds me
<kode54> I can't get Yuzu to run on Vulkan on ANV right now
<kode54> it's dying and throwing a terminating exception because some Vulkan call returns VK_ERROR_UNKNOWN
<kode54> naturally, the console output doesn't say where this is being thrown from
<airlied> alyssa: do you do function calls? :-) 24687 has some spirv/nir bits
jewins has quit [Ping timeout: 480 seconds]
ayaka_ has quit [Ping timeout: 480 seconds]
bmodem has quit [Ping timeout: 480 seconds]
bmodem has joined #dri-devel
bmodem has quit [Excess Flood]
bmodem has joined #dri-devel
Guest1408 has quit [Ping timeout: 480 seconds]
rauji___ has joined #dri-devel
<Lynne> ishitatsuyuki: it's my greatest regret, writing all this nice and neat resilient code to cascade all errors, but never actually using it or seeing it run
<Lynne> linux should give oom'd programs at least half a chance of closing carefully by letting malloc return null, but far too much code has been written under the assumption it won't
tristan has joined #dri-devel
tristan is now known as Guest1411
tzimmermann has joined #dri-devel
<HdkR> Lynne: I would love some inotify system to register low memory situations. Kind of like cgroup notifications but app level
aravind has quit [Ping timeout: 480 seconds]
<kode54> dj-death: do I need to poke that issue where I posted that trace?
aravind has joined #dri-devel
<dj-death> kode54: that would be helpful
<kode54> will do
<dj-death> kode54: I thought you said it was a crash
<kode54> I meant the one where i915 was running slowly for one game
<kode54> I added the traces and generated a new log
<kode54> but then I didn't realize you went on holiday
<dj-death> normally all VK_ERROR_* should go through vk_errorf
<kode54> this is a different thing
<dj-death> there should be a trace somewhere
<kode54> VK_ERROR_UNKNOWN was Yuzu
<kode54> the log I added traces for was Borderlands: GOTY Enhanced
Guest1411 has quit [Ping timeout: 480 seconds]
<kode54> I'm apparently tracking multiple issues
<kode54> in different hings
<kode54> *things
mszyprow has joined #dri-devel
<kode54> not sure what to do about yuzu
<kode54> I'll have to look at the source code to see why it's just throwing an exception
<kode54> yuzu doesn't have a single reference to vk_errorf
camus has quit [Read error: Connection reset by peer]
camus has joined #dri-devel
<dj-death> kode54: I meant the vulkan driver
<kode54> oh
<kode54> how do I get those messages if the app isn't able to show them?
<kode54> which environment variable do I need to set to make them all just dump to the console?
<kode54> yeah, that's the one
<kode54> I somehow got it for free for owning the original GOTY edition
<kode54> that game runs like crap on i915.ko, and causes a GPU crash on xe.ko
sukrutb_ has quit [Ping timeout: 480 seconds]
<dj-death> kode54: unfortunately you need to recompile mesa with this bit of code enabled I think : https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/vulkan/runtime/vk_log.c#L130
<dj-death> the message should end up on the console
<kode54> oh no, I have to build a debug build
<kode54> I suppose I should be running debug builds by default when testing out xe.ko
<dj-death> it's more than debug build I think
<dj-death> you also need to turn that line into : #if 1
<kode54> oh
<kode54> or -DDEBUG ?
<dj-death> or that yes
<kode54> yeah, my default build setup uses the mesa-tkg-git PKGBUILD and scripts
<kode54> and that does NDEBUG by default
<dj-death> maybe we should have a MESA_VK_LOG_FILENAME variable and write all the traces if enabled
<kode54> that may be a good idea
<dj-death> if it's just for yuzu you can also build your own repo
junaid has joined #dri-devel
<dj-death> and set VK_ICD_FILENAMES= to the json file of the anv driver
<dj-death> like configure the repo with meson
<dj-death> then build : ninja -C build src/intel/vulkan/libvulkan_intel.so src/intel/vulkan/intel_devenv_icd.x86_64.json
ayaka_ has joined #dri-devel
<dj-death> and export VK_ICD_FILENAMES=$PWD/build/src/intel/vulkan/intel_devenv_icd.x86_64.json
<dj-death> the vulkan loader will pick up that driver when the app creates the VkInstance
<kode54> gotcha
<dj-death> kode54: just before I go and buy Borderland GOTY, you don't reproduce the problem with Borderland 3?
<kode54> I can try
<kode54> let me install that
<dj-death> thanks
<kode54> should probably take me about 20 minutes to install that
<dj-death> because I already have that one
<kode54> I mean, I was playing Borderlands 3 at one point, but there was an annoying issue that I didn't like with it
<kode54> where the masked water textures and such would randomly flicker through the rest of the world
<kode54> this happened on both i915 and xe
<dj-death> :(
<kode54> it went away if I set the environment variable for full sync, but that destroyed my frame rate
<kode54> I need to test it again
<kode54> let me install it first
<kode54> this is getting tight too
<kode54> this will leave me with about 45GB of free space
<kode54> I need to rebalance my installed junk
<kode54> maybe I just have a junk video card
glennk has quit [Remote host closed the connection]
glennk has joined #dri-devel
epony has joined #dri-devel
sima has joined #dri-devel
epony has quit []
<kode54> okay
<kode54> I enabled that block of debug code
<kode54> what do I need to pass in env to get all messages now?
epony has joined #dri-devel
fab has quit [Quit: fab]
<dj-death> kode54: you don't; with that activated it should print out on the console
<kode54> it didn't print anything that wasn't printed before
<dj-death> hmm okay strange
<dj-death> and now you have a debug build?
sukrutb_ has joined #dri-devel
<kode54> I'll try that next
<kode54> looks like yuzu did a booboo
<dj-death> ah yeah
<dj-death> validation layers might have caught that
fab has joined #dri-devel
<tzimmermann> javierm, hi. may i ask you for some reviews?
<epony> yes
<epony> I review you now from this.
<epony> ah, it's for smb else
frankbinns has quit [Remote host closed the connection]
<epony> ok
mvlad has joined #dri-devel
yyds has quit [Remote host closed the connection]
yyds has joined #dri-devel
yyds has quit [Remote host closed the connection]
<kode54> I started BL3 about 20 minutes ago
<kode54> it's still preparing vulkan shaders
sghuge has quit [Remote host closed the connection]
<epony> ok
sghuge has joined #dri-devel
<epony> how is vulcan now?
yyds has joined #dri-devel
<epony> amb assadore serveall
junaid has quit [Remote host closed the connection]
bmodem has quit [Ping timeout: 480 seconds]
<kode54> 62% done processing
<kode54> now it's 64% done
<kode54> what the hell kind of shaders does this game use
<epony> shady
<HdkR> kode54: If you still have the DEBUG build, all the validation really slows down shader compilation
Company has joined #dri-devel
<kode54> it's a release build with that debug message output enabled
<kode54> it normally takes this long every time I first run BL3
<kode54> the game, after all, does download 2GB of shaders and transcoded videos
<HdkR> UE4 title, probably has a million shader variants as well :D
<epony> can it not YT?
<epony> why trans coded
tristan has joined #dri-devel
tristan is now known as Guest1417
<kode54> probably doesn't help that I've got a R7 2700
<kode54> I've already been told that bottlenecks my GPU
<HdkR> Fossilize shader compilation also will thread out to the number of CPU cores you have. So pretty good scaling
<kode54> commandline says --num-threads 15
<HdkR> Indeed
<HdkR> Maximum threads subtract one is easy math :D
<kode54> it just bumped up to 65%, then rolled back to 64%
yuq825 has joined #dri-devel
<javierm> tzimmermann: sure
<tzimmermann> javierm, thanks, i have more fbdev cleanups in https://patchwork.freedesktop.org/series/122976/ and https://patchwork.freedesktop.org/series/123017/
pcercuei has joined #dri-devel
<javierm> tzimmermann: you are welcome, I'll try to do it later today
neniagh has quit []
neniagh has joined #dri-devel
frankbinns has joined #dri-devel
<kode54> okay
<kode54> I hit the skip button
<kode54> now it's gone green "running"
<kode54> and no window has appeared yet
<kode54> fine, I'll reboot to stable kernel and switch to stable mesa and see how long this takes to boot up
<epony> transcode with GPU
<epony> it has a lot of stream processors
<epony> in CPU is a drama
<kode54> did somebody say something
<epony> are you transcoding in CPU?
<epony> how much memory you have?
<kode54> starting from scratch on 23.1.6
djbw has quit [Read error: Connection reset by peer]
<epony> how much of it moves in 1 second through CPU and how much does it bulge over the data set (memory expand vs data input)
YuGiOhJCJ has joined #dri-devel
<epony> is it really running the threads on all cores.. with good saturation?
lynxeye has joined #dri-devel
<epony> can you offload it to the GPU
<epony> check your cache eviction rates
Ahuj has joined #dri-devel
bmodem has joined #dri-devel
<kode54> fossilize just finished
<kode54> now it's doing the claptrap walk animation that used to have a progress display, but no longer does
<kode54> done
aravind has quit [Ping timeout: 480 seconds]
apinheiro has joined #dri-devel
<kode54> yeah, none of the pipeline recreation lag that BL:GOTY Enhanced has
<kode54> but it still has the flickering water
<kode54> I'll record a video
kts has joined #dri-devel
<kode54> I manage better frame rates under Windows, by quite a bit, on the same settings (Ultra)
kts has quit []
<kode54> oh, from the other game
<kode54> was that useful?
sgruszka has joined #dri-devel
JohnnyonFlame has quit [Read error: Connection reset by peer]
<kj> gfxstrand: to double check, it's not acceptable to compile down to SPIR-V or NIR (and serialize) at build time for internal shaders?
<kj> So the options would be to check in glsl and glsl_to_nir(), or build the shader in nir at runtime
rasterman has joined #dri-devel
vliaskov has joined #dri-devel
<kj> Asking because for pvr we still need to unhardcode some internal shaders so would be nice to write them up with something more higher level than rogue ir (which we've done atm)
<kj> I recall a conversation here about setting up a common way of doing things for internal shaders but not sure what happened with that
epony has quit [autokilled: This host violated network policy. Mail support@oftc.net if you think this is in error. (2023-09-01 08:47:19)]
<dj-death> kode54: thanks yeah
<dj-death> kode54: really looks like a window system issue
<kode54> which window system should I use?
<dj-death> kode54: looks like the game is like waiting to get a new buffer for an image for like 160ms
<kode54> is there one especially suited to this task?
<dj-death> kode54: I think most people use gnome-shell
<kode54> Xorg or Wayland?
<kode54> I have nasty window scaling glitches on both gnome and plasma
<kode54> resizing the scaling of one output to 200% causes the window shadows to glitch out
<dj-death> both should work
<kode54> I'll try gnome again
<dj-death> kode54: I can see on the graph that the GPU is completely idle at times : https://i.imgur.com/TxaAtBy.png
<dj-death> kode54: and every time that seems to be because it ran out of swapchain buffers
<dj-death> and it's waiting for one to come back from the compositor
<kode54> weird
<dj-death> but in the middle of that you have 10+ frames that went fine
<dj-death> each around 16ms
<dj-death> not ruling out some driver issue but it's really strange...
<dj-death> that doesn't look like a GPU programming issue
<dj-death> more like a WSI problem
Guest1417 has quit [Ping timeout: 480 seconds]
aravind has joined #dri-devel
cmichael has joined #dri-devel
swalker__ has joined #dri-devel
swalker_ has joined #dri-devel
swalker_ is now known as Guest1428
alex3305 has joined #dri-devel
aravind has quit []
swalker__ has quit [Ping timeout: 480 seconds]
<kode54> okay
<kode54> it happens under Gnome too
<kode54> could this also be that annoying as hell TPM bug
dtmrzgl has quit []
donaldrobson has joined #dri-devel
randy_ has joined #dri-devel
ayaka_ has quit [Ping timeout: 480 seconds]
randy_ has quit [Ping timeout: 480 seconds]
<dj-death> TPM?
dtmrzgl has joined #dri-devel
<kode54> nope, it wasn't fTPM
<kode54> how did you find that it was waiting on frames from the compositor?
ayaka_ has joined #dri-devel
alex3305 has quit [Remote host closed the connection]
kts has joined #dri-devel
apinheiro has quit [Quit: Leaving]
mceier has quit [Quit: leaving]
mceier has joined #dri-devel
<dj-death> kode54: if you look at the timeline
<kode54> I don't know what I'm looking for
<kode54> and I have no idea how to zoom in or find more detail from what was logged
<dj-death> kode54: in the row that has the name of the app
<kode54> ok
<dj-death> you click on the row, it'll expand
<dj-death> then you see sub-rows that are the threads in the app
<kode54> what app are you using to view this trace?
<kode54> I'm using a web site
<dj-death> you can zoom in/out with Ctrl+scroll-up/down
<dj-death> yeah me too
<dj-death> and load the trace file
<kode54> yes, and I see rows named after the app
<dj-death> you should see a bunch of rows in the app that are threads of the WSI
<kode54> with frames, and a bunch of different items that are gapped where the delays were
<dj-death> "WSI swapchain q ..."
<dj-death> the "pull present queue" are when the thread is waiting for a new swapchain buffer to be available
<kode54> oh
<kode54> I was looking at the wrong thing
<dj-death> that matches one image of the app
<kode54> I was looking at Borderlands-#-something rows
<kode54> no idea what those are logging
<dj-death> mine is called "Z:\home\chris\..."
<dj-death> so those "pull present queue" items is a thread being blocked on waiting for a free buffer
<dj-death> but you can see that they wait for 95ms, 80ms, etc...
<dj-death> one is really bad at 160ms
<dj-death> that's way too much
<kode54> I see that
<kode54> and it's happening under Gnome too
<dj-death> it should be almost immediate
<kode54> could it be my kernel? I'm using a non-distro kernel
<dj-death> I don't know to be honest
<dj-death> never seen something like that
<dj-death> what's really strange is the recreation of swapchains constantly
<dj-death> that's not driven by the driver but by the app/dxgi
ahajda has joined #dri-devel
<kode54> dxvk bug tracker told me that they recreate the swap chain if it times out
ahajda has quit [Read error: Connection reset by peer]
ahajda has joined #dri-devel
<dj-death> kode54: yeah but what's odd is that for each swapchain, they appear to only do a single AcquireFrame
<dj-death> I see the app is polling a query as well
<dj-death> maybe there is some issue there
<dj-death> well yeah there might be a kernel issue after all
<dj-death> vkQueueSubmit is blocked for 90+ms
<dj-death> that's blocking the WSI as well I bet
mceier has quit [Quit: leaving]
<kode54> the thing is
<kode54> I can't even test if this is an xe/i915 thing
mceier has joined #dri-devel
<kode54> it won't even run on xe.ko
penguin42 has joined #dri-devel
<dj-death> kode54: trace.perfetto-trace.4.zst was recorded on Xe ?
<kode54> no, i915
<kode54> I can't even get it to run without crashing the GuC on Xe
<kode54> it just starts up to a black screen, then gets a crash notice about lost DX11 device
<kode54> and the kernel dumps a useless GPU core text file to a /sys file
<kode54> since there's no usable GuC info in it
<kode54> it was even suggested that the GuC info that is there is from after it's already been restarted, so doubly useless
<dj-death> alright
<kode54> ah, it wasn't a DX11 device lost
<kode54> it was General Protection Fault
<kode54> [ 1908.101030] xe 0000:28:00.0: [drm] Engine reset: guc_id=133
<kode54> [ 1908.108626] xe 0000:28:00.0: [drm] Timedout job: seqno=4294967188, guc_id=133, flags=0x8
<kode54> yup, it crashed
<kode54> would love to have usable dumping so we can get to the bottom of these GuC crashes
<dj-death> if only we could have the type of dma-fence the kernel is waiting on
cmichael has quit [Quit: Leaving]
<kode54> oh, the other one? that would be great
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
ayaka_ has quit [Ping timeout: 480 seconds]
yyds has quit [Remote host closed the connection]
kts has quit [Quit: Konversation terminated!]
swizzlefowl has joined #dri-devel
<kode54> maybe DSB?
<swizzlefowl> Hello
youmukon1 has joined #dri-devel
<zamundaaa[m]> MrCooper: I'm doing a bunch of performance work for KWin and found something pretty unexpected
youmukonpaku1337 has quit [Read error: Connection reset by peer]
<zamundaaa[m]> When I'm rendering to a gbm buffer (imported as an EGLImage, which is used as the color attachment for an fbo) and call glFinish(), the fds of the buffer aren't always immediately readable afterwards
<zamundaaa[m]> This seems to happen quite seldom on AMD, and more often on Intel. Are my expectations for how this works just wrong, or could there be some driver bugs involved?
donaldrobson has quit [Ping timeout: 480 seconds]
camus has quit [Ping timeout: 480 seconds]
yuq825 has left #dri-devel [#dri-devel]
youmukonpaku1337 has joined #dri-devel
youmukon1 has quit []
<DemiMarie> kode54: Fuzz the GuC and report the bugs to Intel?
<DemiMarie> Sorry
bmodem has quit [Ping timeout: 480 seconds]
<DemiMarie> Asahi Lina has some experience debugging firmware crashes.
bmodem has joined #dri-devel
jewins has joined #dri-devel
<alyssa> kj: compiling glsl/cl to spir-v at build-time is fine.
<alyssa> compiling to nir & serializing is somewhat more sketchy. but intel goes all the way to hw binaries at build time so YMMV i guess
bmodem has quit [Ping timeout: 480 seconds]
<alyssa> i'll probably send out common code for doing CL kernels at build time in a reasonably generic way. I have a vague plan to let CL C be usable for certain vertex/fragment shaders too, if that's something that's needed.
<alyssa> Not sure what kinds of shaders you're talking about. For small stuff usually nir_builder is the right call, it's just the chunky monkey kernels that really benefit.
<DavidHeidelberg[m]> eric_engestrom: the build job limit sounds like good idea, MR Ack :)
fab has quit [Quit: fab]
<kj> alyssa: thanks, we have compute shaders for queries which are fairly simple, but there's also a whole bunch of shaders used for transfer stuff which might be a bit more involved (haven't looked in depth there)
kzd has joined #dri-devel
<penguin42> If I've got a 'Compute Shader LLVM IR' dumped from rusticl debug, is there anyway I can push it through llvm to see what it does?
<alyssa> kj: yeah.. I've done some pretty intense nir_builder but yeah working with C is a lot nicer :p
<alyssa> see agx_nir_lower_texture.c if you want to be scared :p
<alyssa> (not really a candidate for CL at this point)
OftenTimeConsuming has quit [Remote host closed the connection]
OftenTimeConsuming has joined #dri-devel
<karolherbst> penguin42: llvm has tooling for their stuf,f like llvm-dis and whatnot
<penguin42> karolherbst: Yeh so I've seen those, but I don't know how to go from the debug output of mesa to feeding that into llvm so I can play around with llvm to see why it's doing what it's doing
<karolherbst> yeah... sadly I don't really know much about how to dig into those things deeper from an AMD perspective, might want to ask around in #radeon as there are some LLVM developers who might be able to help out with it
<penguin42> karolherbst: Ack, do you know how to throw that debug into llvm ?
<karolherbst> no idea how to trigger the normal pipeline, however there is AMD_DEBUG=asm or something to print the actual hardware level IR
<karolherbst> or something
<penguin42> karolherbst: Yeh so I have the IR and I have the asm, I wanted to play around with what was inbetween; mostly this is trying to understand where that weird load ordering thing came from
<karolherbst> ahh
<karolherbst> I guess for that you'll have to compile LLVM and see what passes it runs
<karolherbst> maybe there is an LLVM option to print what it runs
<karolherbst> dunno
<penguin42> yeh I guess there is once I can figure out how to run it :-)
<karolherbst> just use your local LLVM build instead of your system one
<penguin42> karolherbst: But what options do I pass to llvm to take that IR and spit out that asm?
<penguin42> karolherbst: I don't see any of that at the moment because I only have the rusticl debug
fab has joined #dri-devel
<karolherbst> there is no simple solution here
<penguin42> ok
<karolherbst> your best bet is to just use the current mesa code
<karolherbst> and whatever radeonsi is doing
<karolherbst> I don't know if you can do that stuff on the cli even
<penguin42> ok - that's what I was really after, because I was assuming doing it from the CLI would be the best way to add llvm debug/tracing/etc
<karolherbst> there might be a way, I just don't know it
<karolherbst> but you can also just write a small tool doing the same thing
<karolherbst> e.g. copying the pipeline radeonsi is using
<penguin42> karolherbst: I can see optimising shader code can drive people nuts; I was finding one change to my shader made rusticl faster and ROCm slower or the other way around
<karolherbst> yeah...
<karolherbst> optimizing compilers be like that
<karolherbst> at some point it's hard to find those changes which are always a benefit
<karolherbst> so, some stuff gets slower, some stuff gets faster
<penguin42> karolherbst: I'm suspecting some of this might be 'bank clashes' - but wth knows; AMDs pretty profiling tools look like they need their kernel drivers
<karolherbst> yeah.. could be
<karolherbst> though, did you try umr?
<penguin42> what's umr?
<karolherbst> though not sure it would help here
<penguin42> karolherbst: Ooh, one to try later
guru__ has quit [Quit: Leaving]
oneforall2 has joined #dri-devel
yyds has joined #dri-devel
Ahuj has quit [Ping timeout: 480 seconds]
<eric_engestrom> DavidHeidelberg[m]: thanks! mind saying that on the MR? :P
<eric_engestrom> pendingchaos: (sorry for the delay, been doing too many things lately and I forgot to read my mentions here) I'll continue making new 23.1.x releases until 23.2.0 is out, no matter how long it takes
tzimmermann has quit [Quit: Leaving]
Duke`` has joined #dri-devel
mripard has quit [Quit: mripard]
Jeremy_Rand_Talos_ has quit [Read error: Connection reset by peer]
Jeremy_Rand_Talos_ has joined #dri-devel
Mis012[m]1 is now known as Mis012[m]
Jeremy_Rand_Talos_ has quit [Remote host closed the connection]
Jeremy_Rand_Talos_ has joined #dri-devel
Jeremy_Rand_Talos_ has quit [Remote host closed the connection]
Jeremy_Rand_Talos_ has joined #dri-devel
sgruszka has quit [Ping timeout: 480 seconds]
yyds has quit [Remote host closed the connection]
dviola has quit [Quit: WeeChat 4.0.4]
ahajda has quit [Quit: Going offline, see ya! (www.adiirc.com)]
Guest1428 has quit [Remote host closed the connection]
lemonzest has quit [Quit: WeeChat 4.0.4]
dviola has joined #dri-devel
<Lynne> cargo is nice and all until a fresh sync takes 500 megabytes and you're on a bad connection, and a crate decides it absolutely must use nightly, as all crates do
<Venemo> is gitlab down again?
<Lynne> just a throwaway comment
ahajda has joined #dri-devel
lemonzest has joined #dri-devel
rasterman has quit [Quit: Gettin' stinky!]
jimc has quit [Read error: Connection reset by peer]
mszyprow has quit [Ping timeout: 480 seconds]
<alyssa> something something cargo cult
<gfxstrand> :P
<karolherbst> 🦀
lynxeye has quit [Quit: Leaving.]
<anarsoul> Lynne: just don't do sync when you're on a bad connection
<Lynne> having a bad connection is hardly a choice
<tnt> I've got an application causing : "[drm] GPU HANG: ecode 12:1:85dcfdfb, in ngscopeclient" (intel 12th gen, vulkan app).
<tnt> How would one go about tracing what's going on ?
frankbinns has quit [Remote host closed the connection]
<cmarcelo> does anyone foresee glsl_function_type() being useful again (only user was spirv, but stopped in favor of own implementation, right now is dead code)? deciding here if I can just remove or make changes to improve with others as part of an ongoing MR.
tobiasjakobi has joined #dri-devel
tobiasjakobi has quit []
mszyprow has joined #dri-devel
<dcbaker> mattst88: did you meant to request a review from marge on the intel_clc series, or did you mean to assign it to marge?
<mattst88> dcbaker: lol, derp
<mattst88> thanks for noticing that
kts has joined #dri-devel
<dcbaker> that looks good to me, btw. I'd still like to get it to where we don't need to do that, but there are so many assumptions in meson's implementation that things are for the host and only the host it's turning into a slug fest with some seriously annoying problems
<alyssa> dcbaker: I'm experimenting with a generic mesa_clc
<dcbaker> so, same problem then?
<alyssa> current tentative plan is that it goes CLC->SPIR-V but does not touch any NIR
<alyssa> which should be a lot fewer deps but yes, same problem until clang can do that itself
<dcbaker> the good news is Mesa isn't the only project with this problem, it turns out that other big complex projects run into the same issue when cross compiling
<dcbaker> at least, good in that everyone agrees it needs to happen, lol
<DemiMarie> Sorry for all the unanswerable questions I asked earlier!
djbw has joined #dri-devel
<alyssa> nod
<alyssa> i expect by end-of-year asahi will have a hard build-dep on clc
<alyssa> We have a significant need from it and we have buy in from Fedora and Intel's already doing it for raytracing so sure yeah why not
<alyssa> and asahi only needs to build on arm and x86, so no LLVM problem
<dcbaker> sigh. LLVM.
junaid has joined #dri-devel
<dcbaker> karolherbst: I'm going to start reviewing the next round of the crates in meson work next week, I know gfxstrand has been trying it out. Are there any crates you need/want?
Nyamiou has joined #dri-devel
gouchi has joined #dri-devel
Nyamiou has quit []
<alyssa> dcbaker: Debian has some build rules to disable LLVM on exotic architectures (':
<alyssa> So no CLC deps in common code without angering somebody.
<alyssa> Although... if they're cross building it shouldn't matter?
<karolherbst> dcbaker: uhm.. mostly just syn and serde
<alyssa> Like you should be able to run intel_clc/mesa_clc on the host with the host LLVM and then do an LLVM-free target mesa build using the precompiled kernels
<alyssa> you don't get Rusticl support but the BVH kernels etc should work fine in that set up
<alyssa> we dont support that on the mesa side but we.. probably could?
<dcbaker> karolherbst: cool, I'm pretty sure syn already works
<dcbaker> I'll make sure we test out serde
<karolherbst> cool
<dcbaker> alyssa: yeah.... I'm just not sure how you'd go about supporting OpenCL without llvm/clang at this point
<karolherbst> there are probably random others I might want to use in the future, but those would be a good start to drop some code
<alyssa> dcbaker: ~~gcc-spirv when~~ delet
<dcbaker> alyssa: I mean, if someone else wants to write the code and it works...
<alyssa> dcbaker: :p
<alyssa> I don't love runtime LLVM deps, honestly I build -Dllvm=disabled myself up until now
<alyssa> but I feel entitled to a buildtime LLVM dep o:)
<alyssa> (I already build mesa with clang, this is just more of that :D)
<dcbaker> I don't care that much about buildtime deps that much, but I usually build with gcc and getting the right version of LLVM can be a real pain sometimes
<alyssa> I do know an llvm spirv target was talked about, I wonder if mesa_clc will be obsoleted in due time..
<alyssa> except for libclc, src/compiler/clc/ isn't doing much that clang couldn't do itself..
<alyssa> :q
<karolherbst> yeah.. maybe.. if it's not causing any regressions in the CTS that is
<alyssa> yeah
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
glennk has quit [Remote host closed the connection]
glennk has joined #dri-devel
glennk has quit [Remote host closed the connection]
a-865 has quit [Ping timeout: 480 seconds]
glennk has joined #dri-devel
<airlied> zmike: you should have made that blog a monetized twitter post, controversy gets clicks!
a-865 has joined #dri-devel
<zmike> airlied: there's no controversy there, just people who agree with me and people who are wrong
mszyprow has quit [Ping timeout: 480 seconds]
<airlied> that's the attitude to get more elon bucks :-P
sravn has quit [Quit: WeeChat 3.5]
sravn has joined #dri-devel
<alyssa> I wonder what it'll take to get this loop unrolled: https://rosenzweig.io/hmm
<alyssa> It's from a... familiar piece of source code.
<alyssa> hmm I wonder how clang unrolls this
<alyssa> ...does clang not unroll this? T.T
<pendingchaos> that looks like an overly complicated IF statement
<alyssa> pendingchaos: correct
<alyssa> it's from doing something i'm really not supposed to =D
<alyssa> clang seems to eliminate the backwards branch when compiled for my cpu
<alyssa> an extra opt_algebraic rule removes a big chunk of loop header, but still not enough for unrolling..
<alyssa> oh... opencl is nopping out __builtin_expect I guess..
<alyssa> is it?
<karolherbst> yeah.. we ignore it for now
<alyssa> karolherbst: at what point is it being ignored?
<karolherbst> inside vtn
<alyssa> alright
<alyssa> I wonder if I should plumb that through. Or perhaps more likely, see if I can get clc to do a bit more LLVM opts
<alyssa> oh right we -O0 right.
<alyssa> erg
<karolherbst> yeah.. builtin_expect isn't _that_ terrible to pipe through, it's just that it's one of those 80/20 things
<alyssa> right.. ugh..
<alyssa> For this particular kernel, if I build with -O3, it gets unrolled in llvm just fine
<karolherbst> ahh...
<karolherbst> I think there is strictly nothing against using more opts, just some opts break the tooling
<karolherbst> and the translator gives up
<alyssa> yeah..
<alyssa> it's just annoying because with -O3 the final NIR is excellent
<karolherbst> mhh
<karolherbst> anyway, in this commit specifically I played around with enabling some llvm opts: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23852/diffs?commit_id=4b18b0770154aec4ad905bba9856db1cd47b5d60
<karolherbst> and those are safe
<alyssa> ack
<karolherbst> however we can always allow callers to enable more or something
<karolherbst> but I wouldn't enable that for everything
<karolherbst> maybe that stuff gets better once the spirv backend lands
<alyssa> Alternatively, NIR's optimizer should be good enough to handle all this? O:)
<karolherbst> yeah... well.. hopefully :)
<alyssa> needs expect plumbed thru for this guy
<karolherbst> my motivation with that MR was to cut down the size of cached blobs
<alyssa> hmm wait is expect even doing what i expect
<karolherbst> what do you expect expect is doing?
<karolherbst> but yes, it's very cursed
<karolherbst> and not trivial
<alyssa> I mixed up expect and assume
<karolherbst> right...
<karolherbst> expect is the more trivial thing, assume is cursed to implement
immibis has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
<alyssa> OOOOOKAY
immibis has joined #dri-devel
<alyssa> so it turns out I've been doing something absolutely stupid in mesa for years
<alyssa> kewl.
<karolherbst> heh
<alyssa> well, loop's gone now, with -O0
<karolherbst> :D
<karolherbst> do we want to know?
<alyssa> No
<alyssa> Even so, with -O3 there's a big pile of load/store_scratch that goes away with -O0
<karolherbst> yeah....
<alyssa> i'm guessing I'm missing some NIR copyprop pass somewhere
<alyssa> seems like we should be able to see thru that
<karolherbst> so you have scratch stuff with O3 but not with O0?
<karolherbst> anyway.. you can always copy whatever rusticl is doing to get rid of those things
<karolherbst> the entire pipeline is cursed
<karolherbst> kinda
<karolherbst> but it kinda also works
<airlied> the translator docs I think state O0 only is supported, anything else is a crap shoot
<karolherbst> yeah.. I'm also a bit hesitant to even land some of the llvm opt work because of this
<airlied> but yeah it might be worth running some llvm passes, but also I think we can do a lot on the NIR side to step up
<karolherbst> it's literally not tested anywhere
<airlied> and close the gaps
<airlied> if we want NIR to be a real life compute compiler
<karolherbst> right.. I didn't want to get more optimized binaries, just smaller ones
<karolherbst> and I was able to reduce binary sizes by like 60%
<karolherbst> it's just not very stable
<airlied> I suppose in theory spirv-opt could be used as well, if it ever did anything useful :-P
<karolherbst> it speeds up caching/ reduces peak memory usage and other benefits, sadly we probably can't rely on it
<karolherbst> yeah... that gives me another 25% reduction
<karolherbst> it's all in the MR
<karolherbst> sadly.. I can't use the impressive `MergeFunctions` LLVM pass
<karolherbst> so the benefits are all kinda smallish
<karolherbst> `MergeFunctions` generates function pointers in a few places
<karolherbst> alyssa: anyway, I think the asahi CL stuff is ready to land, I've listed all the remaining problems, but nothing stands out really and I mitigated the linear image issue as much as possible. Now you simply can't map 3D images, but whatever. Maybe I should just assign to marge and... figure out timestamps after that
mvlad has quit [Remote host closed the connection]
Mangix has quit [Ping timeout: 480 seconds]
<alyssa> karolherbst: I had scratch with O0 but it disappeared with O3. I think I screwed up my pass order or whatever, will look at it harder, NIR should be able to breeze thru this
<karolherbst> yeah.. it should
<karolherbst> just run all the passes 5 times or something
<alyssa> Lol
<alyssa> 20:01 airlied | if we want NIR to be a real life compute compiler
<alyssa> Yeah... A big chunk of stuff that CL wants, VK also wants and we don't have any LLVM to cheat off there. So trying to get NIR into shape seems like the better long-term approach, idk
<karolherbst> yeah
<alyssa> I'm good with asahicl being merged
fab has quit [Quit: fab]
alyssa has quit [Quit: alyssa]
sima has quit [Ping timeout: 480 seconds]
alyssa has joined #dri-devel
<alyssa> k, this is interesting:
<alyssa> 64 %5 = deref_var &__const.hello.cfg (constant struct.AGX_USC_TEXTURE)
<alyssa> 64 %7 = deref_cast (uvec4 *)%5 (constant uvec4) (ptr_stride=16, align_mul=0, align_offset=0)
<alyssa> 32x4 %8 = @load_deref (%7) (access=none)
<alyssa> nir_opt_deref doesn't remove the cast and nir_opt_constant_folding doesn't see through the cast, so that turns into a load_global (!) instead of a load_const
Duke`` has quit [Ping timeout: 480 seconds]
<karolherbst> constant isn't in shader constant memory though
<karolherbst> it's just a ubo (more or less)
<karolherbst> just with global addressing
<alyssa> still supposed to be constant folded.
<karolherbst> does the constant variable have a constant initializer?
<alyssa> yes
<alyssa> it's just the cast in the way
junaid has quit [Remote host closed the connection]
<karolherbst> so without the cast it would constant fold? Mhh.. normally we kinda drop pointless casts but there are a few restrictions in place
<karolherbst> but yeah, if the source is constant known at compile time we should constant fold it
JohnnyonFlame has joined #dri-devel
<alyssa> also, deref_ptr_as_array
<alyssa> 64 %5 = deref_var &__const.hello.cfg (constant struct.AGX_USC_TEXTURE)
<alyssa> 64 %9 = deref_cast (uvec2 *)%5 (constant uvec2) (ptr_stride=8, align_mul=0, align_offset=0)
<alyssa> 64 %11 = deref_ptr_as_array &(*%9)[2] (constant uvec2) // &(*(uvec2 *)%5)[2]
<alyssa> 32x2 %12 = @load_deref (%11) (access=none)
Mangix has joined #dri-devel
<alyssa> admittedly I don't fully understand what's happening here but it should also constant fold..
<karolherbst> it might be that most of the folding only reliably works once IO is lowered
<alyssa> But it's too late by then, since lowering I/O turns this into load_global(load_constant_base_ptr)
<karolherbst> mhhh... yeah, it shouldn't...
<alyssa> this needs to be optimized away in derefs
mszyprow has joined #dri-devel
<karolherbst> I think the problem here is, that casting a constant memory pointer to a different address space is kinda UB
<alyssa> there's no constant memory pointer in the C code?
<alyssa> *CL kernel
<alyssa> just literals that clang decided to turn into constant memory
<karolherbst> right... it's kinda weird tbh
<karolherbst> what's the CLC source?
<alyssa> agx_pack(..) { ..}
<karolherbst> not really sure what that gets generated into, but in theory it should just be a stack variable getting fields assigned, so I'm kinda confused why it's doing this kinda nonsense in nir
<karolherbst> or rather, I don't see where this cast would come from
<karolherbst> does it look better in the spir-v? though I suspect not
<karolherbst> or maybe?
<karolherbst> what does the nir straight out of spirv_to_nir look like?
<karolherbst> but anyway... casting from constant to generic is just not legal
<karolherbst> the thing is... because it's all coming from C you also can't just drop random cast, because $reasons
<karolherbst> like e.g. if you'd do (global* int)some_local_memory_ptr, you also can't just load from the local address, because it's technically a bug in the source code and UB
<karolherbst> but only in the sense of your pointer is probably pointing to invalid memory
<karolherbst> but what if you do (local* int)(global* int)...
<alyssa> there's no cast to global?
<alyssa> what's happening is that there's a constant struct
<alyssa> that would be fine, if we split the struct with split_struct_vars
<alyssa> but that bails on complex uses, because of the deref_ptr_as_array
<alyssa> which in turn nir_deref.c claims should be eliminated by nir_opt_deref but that's not happening
<karolherbst> ehhh wait.. I misinterpreted the " 64 %9 = deref_cast (uvec2 *)%5 (constant uvec2) (ptr_stride=8, align_mul=0, align_offset=0)" thing...
<alyssa> presumably my pass order is busted.
<karolherbst> if it's a pointless cast, opt_deref should kinda be able to get rid of it
<karolherbst> alyssa: btw, did you call explicit_type?
<karolherbst> some of the passes rely on explicit type information
<alyssa> yes
<alyssa> Here's a simple reproducer
<alyssa> right after vtn, this looks like https://rosenzweig.io/spirv-to-nir.txt
<karolherbst> huh...
<karolherbst> that's a lot of stuff...
<alyssa> after all the lowering/opt passes, this ends up as a mess with scratch access https://rosenzweig.io/final.txt
mszyprow has quit [Ping timeout: 480 seconds]
<alyssa> not sure if rusticl fares any better
<karolherbst> let me write to a struct instead
<karolherbst> indeed...
<alyssa> karolherbst: that code doesn't write to a struct?
<karolherbst> yo, that's kinda rude of nir :')
<karolherbst> yeah, I have it local now and it also uses scratch
<alyssa> Joy
<karolherbst> funky...
<alyssa> very basic issue: why is nir_split_struct_vars failing on https://rosenzweig.io/why.txt?
Mangix has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
<alyssa> presumably the cast from struct to uvec
<karolherbst> probably
<alyssa> which is being inserted by nir_lower_memcpy, I think
<karolherbst> it's kinda funky that with the constant struct initializer nir at some point does load_const, but for whatever reason it thinks it should pipe that through scratch memory...
<karolherbst> ohhhhh
<karolherbst> uhhhhh
<karolherbst> it's this bug
<karolherbst> I hate it
<karolherbst> I remember now...
<karolherbst> this is kinda the reason
<karolherbst> the deref chains for store and load are different
<karolherbst> so we fail to see they are equal
<karolherbst> or rather point to equal things
<karolherbst> there are dumb LLVM reasons for it, and the translator also isn't super nice to us
<karolherbst> so when storing it, you have explicit struct member accesses
<karolherbst> but on load you don't have the struct information and it just does raw vec/scalar loads
<karolherbst> it's really annoying
<karolherbst> however, we should still be able to optimize it away :D
<karolherbst> it's just that our opt_deref isn't smart enough for that yet
<alyssa> alright..
<karolherbst> I think there is an MR for that...
<karolherbst> maybe not
<karolherbst> gfxstrand might remember
<alyssa> in this case at least, the obvious problem is that we're lowering memcpy to a raw copy of bytes, which fundamentally impedes other opts
<karolherbst> but yeah.. I think ultimately this is something we can only clean up after io lowering
<karolherbst> mhhh.. yeah, just..
<karolherbst> In my example there is no memcpy
<alyssa> two solutions are to either lower memcpys of structs to memcpys of each element separately, if we know it's tightly packed & so on
<alyssa> or to teach struct lowering to delete memcpies
<alyssa> uh
<alyssa> or to teach struct splitting to split memcpys
<karolherbst> ehh wait.. I failed to use my search function.. there is a memcpy
<karolherbst> let's see...
<karolherbst> yeah soo.. we can't do much useful with that memcpy
<karolherbst> it's just taking the raw pointer and copies the function_temp stuff into it
<karolherbst> so out of LLVM/SPIR-V it's already a plain byte copy
<karolherbst> and not much we can really do about it
<karolherbst> and I don't think that before IO lowering is the place where we could actually resolve that, because we'd have to know the actual offsets the load/stores go to
<karolherbst> in order to propagate it
<karolherbst> the downside of doing this after io lowering is that we already allocated scratch space
<karolherbst> I honestly don't know what would be the best path forward here
<karolherbst> maybe we can convince LLVM to not do this nonsense? but then we can also get spir-v doing it anyway
alyssa has quit [Quit: alyssa]
alyssa has joined #dri-devel
<alyssa> ok.. I think we can split the memcpy, at least in the simple case I'm looking at
<alyssa> but the original case didn't have a memcpy there, just stores with deref_ptr_as_array..
<alyssa> oh, but there's legitimately a cast happening in that one
Mangix has joined #dri-devel
<alyssa> even though it's a cast between.. morally equivalent things
<karolherbst> I wonder if the better strategy is to simply convert everything to byte arrays as an intermediate step before io lowering.. :D
<karolherbst> but that's going to be messy
gouchi has quit [Quit: Quitte]
<alyssa> i mean.. trying to unlower scratch back to SSA sounds like you're in for a bad time
<karolherbst> yup
<karolherbst> I think all solutions here are messy in one or the other way
<alyssa> i think the memcpy is the root brokenness
<karolherbst> sure
<alyssa> This is the nonsense that we get with everything up until lowering memcpys, triggered in my kernel by passing a struct around:
<karolherbst> yeah...
<karolherbst> there isn't really anything you can do on a deref level
<alyssa> Why not?
<alyssa> It feels like we "should" be able to split that memcpy_deref into a memcpy_deref for each struct element
<karolherbst> sure, but we don't copy the struct, we copy pointers to bytes
<alyssa> so..?
<alyssa> we have derefs, we can see thru the casts
<karolherbst> fair enough
<alyssa> really annoying that LLVM makes us jump through these hoops for such a trivial example though
<karolherbst> yep
<karolherbst> I do wonder though: if we get rid of the casts, does that get optimized away?
<karolherbst> after memcpy lowering I mean
<alyssa> it's not valid to get rid of them straight up
<alyssa> that's a memcpy between a struct and a u8
<karolherbst> mhh.. fair enough
<karolherbst> is it like that straight away or is that constant array after some opts?
<karolherbst> like in the initial nir it should still be all structs or not?
<alyssa> it's like that in the initial nir because llvm suuuucks
<karolherbst> pain
<alyssa> look on the bright side, I can pass pointers to structs as function arguments, that's cool
<karolherbst> :D
<karolherbst> yeah...
<alyssa> wait..
<alyssa> oh come on!
<alyssa> i switched to passing the struct by value instead, you're still not happy nir?
<alyssa> oh.. yeah, it's choking on the struct initializer which llvm is helpfully turning into a memcpy from raw bytes
<karolherbst> yes, llvm best compiler, very helpful
<alyssa> the good news is that I can deal with the struct initializer nonsense..
<alyssa> oh. strictly no i can't on this struct because there's padding :~)
<alyssa> Actually I have no idea if that's legal C or not
<alyssa> casting between a struct ptr and a u8* and putting stuff into the padding bytes and expecting that to work
RAOF has quit [Remote host closed the connection]
<karolherbst> well.. why not?
<karolherbst> you might read/write random garbage, but besides that?
<alyssa> padding rules are implementation defined so..
RAOF has joined #dri-devel
<karolherbst> they shouldn't be
<karolherbst> the C struct layout is kinda very strictly defined
<karolherbst> but maybe it's technically implementation defined, but I think everybody kinda follows the same rules on each platform? dunno
<karolherbst> the fun part is that the CL CTS tests this stuff
<alyssa> it's ludicrous to me that llvm is going so far as casting to u8*.
<karolherbst> well...
<karolherbst> I'm sure they have their good reasons
<karolherbst> but yeah...
pcercuei has quit [Quit: dodo]
kts has quit [Quit: Konversation terminated!]
<alyssa> intel raytracing on arm64 when
alyssa has quit [Quit: alyssa]
vliaskov has quit []
jewins has quit [Ping timeout: 480 seconds]
Company has quit [Quit: Leaving]
crabbedhaloablut has quit []