ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
<gfxstrand> I really hate that this stupid query crash cost me like 4 hours of CTS runtime today. *grumble*
psykose has quit [Ping timeout: 480 seconds]
<karolherbst> *pat *pat*
<gfxstrand> *prrrr*
<epony> popcorn teatime
psykose has joined #dri-devel
oneforall2 has joined #dri-devel
guru__ has joined #dri-devel
oneforall2 has quit [Read error: Connection reset by peer]
jewins has quit [Ping timeout: 480 seconds]
oneforall2 has joined #dri-devel
columbarius has joined #dri-devel
co1umbarius has quit [Ping timeout: 480 seconds]
guru__ has quit [Ping timeout: 480 seconds]
youmukon1 has joined #dri-devel
youmukon1 has quit []
youmukon1 has joined #dri-devel
youmukon1 has quit []
youmukon1 has joined #dri-devel
youmukonpaku1337 is now known as Guest1397
youmukon1 is now known as youmukonpaku1337
Guest1397 has quit [Ping timeout: 480 seconds]
Kayden has joined #dri-devel
yyds has joined #dri-devel
oneforall2 has quit [Remote host closed the connection]
oneforall2 has joined #dri-devel
psykose has quit [Ping timeout: 480 seconds]
sukrutb_ has quit [Ping timeout: 480 seconds]
epony has quit [Ping timeout: 480 seconds]
psykose has joined #dri-devel
psykose has quit [Remote host closed the connection]
psykose has joined #dri-devel
antoniospg has quit []
idr has quit [Quit: Leaving]
lemonzest has quit [Quit: WeeChat 4.0.4]
lemonzest has joined #dri-devel
guru__ has joined #dri-devel
<i509vcb> In the Vulkan spec there is a mention of VK_ERROR_UNKNOWN possibly being returned by any command. But outside of that does the spec saying vkWhatever returns some specified error codes mean other error codes are technically valid to return?
<i509vcb> Question is related to me noticing that vkCreateWaylandSurfaceKHR states VK_ERROR_OUT_OF_HOST_MEMORY and VK_ERROR_OUT_OF_DEVICE_MEMORY are failure return codes, but Mesa can return VK_ERROR_SURFACE_LOST_KHR if you happen to hit some code paths
tristan has joined #dri-devel
tristan is now known as Guest1402
<i509vcb> s/valid/invalid
psykose has quit [Ping timeout: 480 seconds]
oneforall2 has quit [Ping timeout: 480 seconds]
psykose has joined #dri-devel
guru__ has quit [Read error: Connection reset by peer]
<zmike> sounds like bug
<i509vcb> It kind of makes sense that you could instantly lose a surface because the wl_display you gave to vulkan had a protocol error, but the spec seems to ignore the existence of SURFACE_LOST in that case
ayaka_ has joined #dri-devel
oneforall2 has joined #dri-devel
guru__ has joined #dri-devel
crabbedhaloablut has joined #dri-devel
oneforall2 has quit [Ping timeout: 480 seconds]
kzd_ has joined #dri-devel
kzd_ has quit []
orbea has quit [Remote host closed the connection]
orbea has joined #dri-devel
kzd has quit [Ping timeout: 480 seconds]
<DemiMarie> Someone ought to fuzz Mesa and make sure it returns VK_ERROR_OUT_OF_HOST_MEMORY where needed. /hj
jewins has joined #dri-devel
aravind has joined #dri-devel
Guest1402 has quit [Ping timeout: 480 seconds]
sukrutb_ has joined #dri-devel
JohnnyonFlame has joined #dri-devel
<Lynne> it's not like linux will ever return null for malloc, sadly, but on windows it's possible
<ishitatsuyuki> there are some CTS tests that use a custom allocator to simulate this. it's the C programmer's greatest foe ;P
tristan has joined #dri-devel
tristan is now known as Guest1408
ayaka_ has quit [Remote host closed the connection]
ayaka_ has joined #dri-devel
Leopold_ has quit []
bmodem has joined #dri-devel
<HdkR> Linux not returning null for malloc? That's easy, just run out of virtual address space
<airlied> just be a 32-bit game :-P
Leopold_ has joined #dri-devel
<HdkR> Yea, a 32-bit game is easy mode :D
<kode54> that reminds me
<kode54> I can't get Yuzu to run on Vulkan on ANV right now
<kode54> it's dying and throwing a terminating exception because some Vulkan call returns VK_ERROR_UNKNOWN
<kode54> naturally, the console output doesn't say where this is being thrown from
<airlied> alyssa: do you do function calls? :-) 24687 has some spirv/nir bits
jewins has quit [Ping timeout: 480 seconds]
ayaka_ has quit [Ping timeout: 480 seconds]
bmodem has quit [Ping timeout: 480 seconds]
bmodem has joined #dri-devel
bmodem has quit [Excess Flood]
bmodem has joined #dri-devel
Guest1408 has quit [Ping timeout: 480 seconds]
rauji___ has joined #dri-devel
<Lynne> ishitatsuyuki: it's my greatest regret, writing all this nice and neat resilient code to cascade all errors, but never actually using it or seeing it run
<Lynne> linux should give oom'd programs at least half a chance of closing carefully by letting malloc return null, but far too much code has been written under the assumption it won't
tristan has joined #dri-devel
tristan is now known as Guest1411
tzimmermann has joined #dri-devel
<HdkR> Lynne: I would love some inotify system to register low memory situations. Kind of like cgroup notifications but app level
aravind has quit [Ping timeout: 480 seconds]
<kode54> dj-death: do I need to poke that issue where I posted that trace?
aravind has joined #dri-devel
<dj-death> kode54: that would be helpful
<kode54> will do
<dj-death> kode54: I thought you said it was a crash
<kode54> I meant the one where i915 was running slowly for one game
<kode54> I added the traces and generated a new log
<kode54> but then I didn't realize you went on holiday
<dj-death> normally all VK_ERROR_* should go through vk_errorf
<kode54> this is a different thing
<dj-death> there should be a trace somewhere
<kode54> VK_ERROR_UNKNOWN was Yuzu
<kode54> the log I added traces for was Borderlands: GOTY Enhanced
Guest1411 has quit [Ping timeout: 480 seconds]
<kode54> I'm apparently tracking multiple issues
<kode54> in different hings
<kode54> *things
mszyprow has joined #dri-devel
<kode54> not sure what to do about yuzu
<kode54> I'll have to look at the source code to see why it's just throwing an exception
<kode54> yuzu doesn't have a single reference to vk_errorf
camus has quit [Read error: Connection reset by peer]
camus has joined #dri-devel
<dj-death> kode54: I meant the vulkan driver
<kode54> oh
<kode54> how do I get those messages if the app isn't able to show them?
<kode54> which environment variable do I need to set to make them all just dump to the console?
<kode54> yeah, that's the one
<kode54> I somehow got it for free for owning the original GOTY edition
<kode54> that game runs like crap on i915.ko, and causes a GPU crash on xe.ko
sukrutb_ has quit [Ping timeout: 480 seconds]
<dj-death> kode54: unfortunately you need to recompile mesa with this bit of code enabled I think : https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/vulkan/runtime/vk_log.c#L130
<dj-death> the message should end up on the console
<kode54> oh no, I have to build a debug build
<kode54> I suppose I should be running debug builds by default when testing out xe.ko
<dj-death> it's more than debug build I think
<dj-death> you also need to turn that line into : #if 1
<kode54> oh
<kode54> or -DDEBUG ?
<dj-death> or that yes
<kode54> yeah, my default build setup uses the mesa-tkg-git PKGBUILD and scripts
<kode54> and that does NDEBUG by default
<dj-death> maybe we should have a MESA_VK_LOG_FILENAME variable and write all the traces if enabled
<kode54> that may be a good idea
<dj-death> if it's just for yuzu you can also build your own repo
junaid has joined #dri-devel
<dj-death> and set VK_ICD_FILENAMES= to the json file of the anv driver
<dj-death> like configure the repo with meson
<dj-death> then build : ninja -C build src/intel/vulkan/libvulkan_intel.so src/intel/vulkan/intel_devenv_icd.x86_64.json
ayaka_ has joined #dri-devel
<dj-death> and export VK_ICD_FILENAMES=$PWD/build/src/intel/vulkan/intel_devenv_icd.x86_64.json
<dj-death> the vulkan loader will pick up that driver when the app creates the VkInstance
<kode54> gotcha
<dj-death> kode54: just before I go and buy Borderland GOTY, you don't reproduce the problem with Borderland 3?
<kode54> I can try
<kode54> let me install that
<dj-death> thanks
<kode54> should probably take me about 20 minutes to install that
<dj-death> because I already have that one
<kode54> I mean, I was playing Borderlands 3 at one point, but there was an annoying issue that I didn't like with it
<kode54> where the masked water textures and such would randomly flicker through the rest of the world
<kode54> this happened on both i915 and xe
<dj-death> :(
<kode54> it went away if I set the environment variable for full sync, but that destroyed my frame rate
<kode54> I need to test it again
<kode54> let me install it first
<kode54> this is getting tight too
<kode54> this will leave me with about 45GB of free space
<kode54> I need to rebalance my installed junk
<kode54> maybe I just have a junk video card
glennk has quit [Remote host closed the connection]
glennk has joined #dri-devel
epony has joined #dri-devel
sima has joined #dri-devel
epony has quit []
<kode54> okay
<kode54> I enabled that block of debug code
<kode54> what do I need to pass in env to get all messages now?
epony has joined #dri-devel
fab has quit [Quit: fab]
<dj-death> kode54: you don't; with that activated it should print out on the console
<kode54> it didn't print anything that wasn't printed before
<dj-death> hmm okay strange
<dj-death> and now you have a debug build?
sukrutb_ has joined #dri-devel
<kode54> I'll try that next
<kode54> looks like yuzu did a booboo
<dj-death> ah yeah
<dj-death> validation layers might have caught that
fab has joined #dri-devel
<tzimmermann> javierm, hi. may i ask you for some reviews?
<epony> yes
<epony> I review you now from this.
<epony> ah, it's for smb else
frankbinns has quit [Remote host closed the connection]
<epony> ok
mvlad has joined #dri-devel
yyds has quit [Remote host closed the connection]
yyds has joined #dri-devel
yyds has quit [Remote host closed the connection]
<kode54> I started BL3 about 20 minutes ago
<kode54> it's still preparing vulkan shaders
sghuge has quit [Remote host closed the connection]
<epony> ok
sghuge has joined #dri-devel
<epony> how is vulcan now?
yyds has joined #dri-devel
<epony> amb assadore serveall
junaid has quit [Remote host closed the connection]
bmodem has quit [Ping timeout: 480 seconds]
<kode54> 62% done processing
<kode54> now it's 64% done
<kode54> what the hell kind of shaders does this game use
<epony> shady
<HdkR> kode54: If you still have the DEBUG build, all the validation really slows down shader compilation
Company has joined #dri-devel
<kode54> it's a release build with that debug message output enabled
<kode54> it normally takes this long every time I first run BL3
<kode54> the game, after all, does download 2GB of shaders and transcoded videos
<HdkR> UE4 title, probably has a million shader variants as well :D
<epony> can it not YT?
<epony> why trans coded
tristan has joined #dri-devel
tristan is now known as Guest1417
<kode54> probably doesn't help that I've got a R7 2700
<kode54> I've already been told that bottlenecks my GPU
<HdkR> Fossilize shader compilation also will thread out to the number of CPU cores you have. So pretty good scaling
<kode54> commandline says --num-threads 15
<HdkR> Indeed
<HdkR> Maximum threads subtract one is easy math :D
<kode54> it just bumped up to 65%, then rolled back to 64%
yuq825 has joined #dri-devel
<javierm> tzimmermann: sure
<tzimmermann> javierm, thanks, i have more fbdev cleanups in https://patchwork.freedesktop.org/series/122976/ and https://patchwork.freedesktop.org/series/123017/
pcercuei has joined #dri-devel
<javierm> tzimmermann: you are welcome, I'll try to do it later today
neniagh has quit []
neniagh has joined #dri-devel
frankbinns has joined #dri-devel
<kode54> okay
<kode54> I hit the skip button
<kode54> now it's gone green "running"
<kode54> and no window has appeared yet
<kode54> fine, I'll reboot to stable kernel and switch to stable mesa and see how long this takes to boot up
<epony> transcode with GPU
<epony> it has a lot of stream processors
<epony> in CPU is a drama
<kode54> did somebody say something
<epony> are you transcoding in CPU?
<epony> how much memory you have?
<kode54> starting from scratch on 23.1.6
djbw has quit [Read error: Connection reset by peer]
<epony> how much of it moves in 1 second through CPU and how much does it bulge over the data set (memory expand vs data input)
YuGiOhJCJ has joined #dri-devel
<epony> is it really running the threads on all cores.. with good saturation?
lynxeye has joined #dri-devel
<epony> can you offload it to the GPU
<epony> check your cache eviction rates
Ahuj has joined #dri-devel
bmodem has joined #dri-devel
<kode54> fossilize just finished
<kode54> now it's doing the claptrap walk animation that used to have a progress display, but no longer does
<kode54> done
aravind has quit [Ping timeout: 480 seconds]
apinheiro has joined #dri-devel
<kode54> yeah, none of the pipeline recreation lag that BL:GOTY Enhanced has
<kode54> but it still has the flickering water
<kode54> I'll record a video
kts has joined #dri-devel
<kode54> I manage better frame rates under Windows, by quite a bit, on the same settings (Ultra)
kts has quit []
<kode54> oh, from the other game
<kode54> was that useful?
sgruszka has joined #dri-devel
JohnnyonFlame has quit [Read error: Connection reset by peer]
<kj> gfxstrand: to double check, it's not acceptable to compile down to SPIR-V or NIR (and serialize) at build time for internal shaders?
<kj> So the options would be to check in glsl and glsl_to_nir(), or build the shader in nir at runtime
rasterman has joined #dri-devel
vliaskov has joined #dri-devel
<kj> Asking because for pvr we still need to unhardcode some internal shaders so would be nice to write them up with something more higher level than rogue ir (which we've done atm)
<kj> I recall a conversation here about setting up a common way of doing things for internal shaders but not sure what happened with that
epony has quit [autokilled: This host violated network policy. Mail support@oftc.net if you think this is in error. (2023-09-01 08:47:19)]
<dj-death> kode54: thanks yeah
<dj-death> kode54: really looks like a window system issue
<kode54> which window system should I use?
<dj-death> kode54: looks like the game is like waiting to get a new buffer for an image for like 160ms
<kode54> is there one especially suited to this task?
<dj-death> kode54: I think most people use gnome-shell
<kode54> Xorg or Wayland?
<kode54> I have nasty window scaling glitches on both gnome and plasma
<kode54> resizing the scaling of one output to 200% causes the window shadows to glitch out
<dj-death> both should work
<kode54> I'll try gnome again
<dj-death> kode54: I can see on the graph that the GPU is completely idle at times : https://i.imgur.com/TxaAtBy.png
<dj-death> kode54: and every time that seems to be because it ran out of swapchain buffers
<dj-death> and it's waiting for one to come back from the compositor
<kode54> weird
<dj-death> but in the middle of that you have 10+ frames that went fine
<dj-death> each around 16ms
<dj-death> not ruling out some driver issue but it's really strange...
<dj-death> that doesn't look like a GPU programming issue
<dj-death> more like a WSI problem
Guest1417 has quit [Ping timeout: 480 seconds]
aravind has joined #dri-devel
cmichael has joined #dri-devel
swalker__ has joined #dri-devel
swalker_ has joined #dri-devel
swalker_ is now known as Guest1428
alex3305 has joined #dri-devel
aravind has quit []
swalker__ has quit [Ping timeout: 480 seconds]
<kode54> okay
<kode54> it happens under Gnome too
<kode54> could this also be that annoying as hell TPM bug
dtmrzgl has quit []
donaldrobson has joined #dri-devel
randy_ has joined #dri-devel
ayaka_ has quit [Ping timeout: 480 seconds]
randy_ has quit [Ping timeout: 480 seconds]
<dj-death> TPM?
dtmrzgl has joined #dri-devel
<kode54> nope, it wasn't fTPM
<kode54> how did you find that it was waiting on frames from the compositor?
ayaka_ has joined #dri-devel
alex3305 has quit [Remote host closed the connection]
kts has joined #dri-devel
apinheiro has quit [Quit: Leaving]
mceier has quit [Quit: leaving]
mceier has joined #dri-devel
<dj-death> kode54: if you look at the timeline
<kode54> I don't know what I'm looking for
<kode54> and I have no idea how to zoom in or find more detail from what was logged
<dj-death> kode54: in the row that has the name of the app
<kode54> ok
<dj-death> you click on the row, it'll expand
<dj-death> then you see sub-rows that are the threads in the app
<kode54> what app are you using to view this trace?
<kode54> I'm using a web site
<dj-death> you can zoom in/out with Ctrl+scroll-up/down
<dj-death> yeah me too
<dj-death> and load the trace file
<kode54> yes, and I see rows named after the app
<dj-death> you should see a bunch of rows in the app that are threads of the WSI
<kode54> with frames, and a bunch of different items that are gapped where the delays were
<dj-death> "WSI swapchain q ..."
<dj-death> the "pull present queue" are when the thread is waiting for a new swapchain buffer to be available
<kode54> oh
<kode54> I was looking at the wrong thing
<dj-death> that matches one image of the app
<kode54> I was looking at Borderlands-#-something rows
<kode54> no idea what those are logging
<dj-death> mine is called "Z:\home\chris\..."
<dj-death> so those "pull present queue" items is a thread being blocked on waiting for a free buffer
<dj-death> but you can see that they wait for 95ms, 80ms, etc...
<dj-death> one is really bad at 160ms
<dj-death> that's way too much
<kode54> I see that
<kode54> and it's happening under Gnome too
<dj-death> it should be almost immediate
<kode54> could it be my kernel? I'm using a non-distro kernel
<dj-death> I don't know to be honest
<dj-death> never seen something like that
<dj-death> what's really strange is the recreation of swapchains constantly
<dj-death> that's not driven by the driver but by the app/dxgi
ahajda has joined #dri-devel
<kode54> dxvk bug tracker told me that they recreate the swap chain if it times out
ahajda has quit [Read error: Connection reset by peer]
ahajda has joined #dri-devel
<dj-death> kode54: yeah but what's odd is that for each swapchain, they appear to only do a single AcquireFrame
<dj-death> I see the app is polling a query as well
<dj-death> maybe there is some issue there
<dj-death> well yeah there might be a kernel issue after all
<dj-death> vkQueueSubmit is blocked for 90+ms
<dj-death> that's blocking the WSI as well I bet
mceier has quit [Quit: leaving]
<kode54> the thing is
<kode54> I can't even test if this is an xe/i915 thing
mceier has joined #dri-devel
<kode54> it won't even run on xe.ko
penguin42 has joined #dri-devel
<dj-death> kode54: trace.perfetto-trace.4.zst was recorded on Xe ?
<kode54> no, i915
<kode54> I can't even get it to run without crashing the GuC on Xe
<kode54> it just starts up to a black screen, then gets a crash notice about lost DX11 device
<kode54> and the kernel dumps a useless GPU core text file to a /sys file
<kode54> since there's no usable GuC info in it
<kode54> it was even suggested that the GuC info that is there is from after it's already been restarted, so doubly useless
<dj-death> alright
<kode54> ah, it wasn't a DX11 device lost
<kode54> it was General Protection Fault
<kode54> [ 1908.101030] xe 0000:28:00.0: [drm] Engine reset: guc_id=133
<kode54> [ 1908.108626] xe 0000:28:00.0: [drm] Timedout job: seqno=4294967188, guc_id=133, flags=0x8
<kode54> yup, it crashed
<kode54> would love to have usable dumping so we can get to the bottom of these GuC crashes
<dj-death> if only we could have the type of dma-fence the kernel is waiting on
cmichael has quit [Quit: Leaving]
<kode54> oh, the other one? that would be great
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
ayaka_ has quit [Ping timeout: 480 seconds]
yyds has quit [Remote host closed the connection]
kts has quit [Quit: Konversation terminated!]
swizzlefowl has joined #dri-devel
<kode54> maybe DSB?
<swizzlefowl> Hello
youmukon1 has joined #dri-devel
<zamundaaa[m]> MrCooper: I'm doing a bunch of performance work for KWin and found something pretty unexpected
youmukonpaku1337 has quit [Read error: Connection reset by peer]
<zamundaaa[m]> When I'm rendering to a gbm buffer (imported as an EGLImage, which is used as the color attachment for an fbo) and call glFinish(), the fds of the buffer aren't always immediately readable afterwards
<zamundaaa[m]> This seems to happen quite seldom on AMD, and more often on Intel. Are my expectations for how this works just wrong, or could there be some driver bugs involved?
donaldrobson has quit [Ping timeout: 480 seconds]
camus has quit [Ping timeout: 480 seconds]
yuq825 has left #dri-devel [#dri-devel]
youmukonpaku1337 has joined #dri-devel
youmukon1 has quit []
<DemiMarie> kode54: Fuzz the GuC and report the bugs to Intel?
<DemiMarie> Sorry
bmodem has quit [Ping timeout: 480 seconds]
<DemiMarie> Asahi Lina has some experience debugging firmware crashes.
bmodem has joined #dri-devel
jewins has joined #dri-devel
<alyssa> kj: compiling glsl/cl to spir-v at build-time is fine.
<alyssa> compiling to nir & serializing is somewhat more sketchy. but intel goes all the way to hw binaries at build time so YMMV i guess
bmodem has quit [Ping timeout: 480 seconds]
<alyssa> i'll probably send out common code for doing CL kernels at build time in a reasonably generic way. I have a vague plan to let CL C be usable for certain vertex/fragment shaders too, if that's something that's needed.
<alyssa> Not sure what kinds of shaders you're talking about. For small stuff usually nir_builder is the right call, it's just the chunky monkey kernels that really benefit.
<DavidHeidelberg[m]> eric_engestrom: the build job limit sounds like good idea, MR Ack :)
fab has quit [Quit: fab]
<kj> alyssa: thanks, we have compute shaders for queries which are fairly simple, but there's also a whole bunch of shaders used for transfer stuff which might be a bit more involved (haven't looked in depth there)
kzd has joined #dri-devel
<penguin42> If I've got a 'Compute Shader LLVM IR' dumped from rusticl debug, is there anyway I can push it through llvm to see what it does?
<alyssa> kj: yeah.. I've done some pretty intense nir_builder but yeah working with C is a lot nicer :p
<alyssa> see agx_nir_lower_texture.c if you want to be scared :p
<alyssa> (not really a candidate for CL at this point)
OftenTimeConsuming has quit [Remote host closed the connection]
OftenTimeConsuming has joined #dri-devel
<karolherbst> penguin42: llvm has tooling for their stuf,f like llvm-dis and whatnot
<penguin42> karolherbst: Yeh so I've seen those, but I don't know how to go from the debug output of mesa to feeding that into llvm so I can play around with llvm to see why it's doing what it's doing
<karolherbst> yeah... sadly I don't really know much about how to dig into those things deeper from an AMD perspective, might want to ask around in #radeon as there are some LLVM developers who might be able to help out with it
<penguin42> karolherbst: Ack, do you know how to throw that debug into llvm ?
<karolherbst> no idea how to trigger the normal pipeline, however there is AMD_DEBUG=asm or something to print the actual hardware level IR
<karolherbst> or something
<penguin42> karolherbst: Yeh so I have the IR and I have the asm, I wanted to play around with what was inbetween; mostly this is trying to understand where that weird load ordering thing came from
<karolherbst> ahh
<karolherbst> I guess for that you'll have to compile LLVM and see what passes it runs
<karolherbst> maybe there is an LLVM option to print what it runs
<karolherbst> dunno
<penguin42> yeh I guess there is once I can figure out how to run it :-)
<karolherbst> just use your local LLVM build instead of your system one
<penguin42> karolherbst: But what options do I pass to llvm to take that IR and spit out that asm?
<penguin42> karolherbst: I don't see any of that at the moment because I only have the rusticl debug
fab has joined #dri-devel
<karolherbst> there is no simple solution here
<penguin42> ok
<karolherbst> your best bet is to just use the current mesa code
<karolherbst> and whatever radeonsi is doing
<karolherbst> I don't know if you can do that stuff on the cli even
<penguin42> ok - that's what I was really after, because I was assuming doing it from the CLI would be the best way to add llvm debug/tracing/etc
<karolherbst> there might be a way, I just don't know it
<karolherbst> but you can also just write a small tool doing the same thing
<karolherbst> e.g. copying the pipeline radeonsi is using
<penguin42> karolherbst: I can see optimising shader code can drive people nuts; I was finding one change to my shader made rusticl faster and ROCm slower or the other way around
<karolherbst> yeah...
<karolherbst> optimizing compilers be like that
<karolherbst> at some point it's hard to find those changes which are always a benefit
<karolherbst> so, some stuff gets slower, some stuff gets faster
<penguin42> karolherbst: I'm suspecting some of this might be 'bank clashes' - but wth knows; AMDs pretty profiling tools look like they need their kernel drivers
<karolherbst> yeah.. could be
<karolherbst> though, did you try umr?
<penguin42> what's umr?
<karolherbst> though not sure it would help here
<penguin42> karolherbst: Ooh, one to try later
guru__ has quit [Quit: Leaving]
oneforall2 has joined #dri-devel
yyds has joined #dri-devel
Ahuj has quit [Ping timeout: 480 seconds]
<eric_engestrom> DavidHeidelberg[m]: thanks! mind saying that on the MR? :P
<eric_engestrom> pendingchaos: (sorry for the delay, been doing too many things lately and I forgot to read my mentions here) I'll continue making new 23.1.x releases until 23.2.0 is out, no matter how long it takes
tzimmermann has quit [Quit: Leaving]
Duke`` has joined #dri-devel
mripard has quit [Quit: mripard]
Jeremy_Rand_Talos_ has quit [Read error: Connection reset by peer]
Jeremy_Rand_Talos_ has joined #dri-devel
Mis012[m]1 is now known as Mis012[m]
Jeremy_Rand_Talos_ has quit [Remote host closed the connection]
Jeremy_Rand_Talos_ has joined #dri-devel
Jeremy_Rand_Talos_ has quit [Remote host closed the connection]
Jeremy_Rand_Talos_ has joined #dri-devel
sgruszka has quit [Ping timeout: 480 seconds]
yyds has quit [Remote host closed the connection]
dviola has quit [Quit: WeeChat 4.0.4]
ahajda has quit [Quit: Going offline, see ya! (www.adiirc.com)]
Guest1428 has quit [Remote host closed the connection]
lemonzest has quit [Quit: WeeChat 4.0.4]
dviola has joined #dri-devel
<Lynne> cargo is nice and all until a fresh sync takes 500 megabytes and you're on a bad connection, and a crate decides it absolutely must use nightly, as all crates do
<Venemo> is gitlab down again?
<Lynne> just a throwaway comment
ahajda has joined #dri-devel
lemonzest has joined #dri-devel
rasterman has quit [Quit: Gettin' stinky!]
jimc has quit [Read error: Connection reset by peer]
mszyprow has quit [Ping timeout: 480 seconds]
<alyssa> something something cargo cult
<gfxstrand> :P
<karolherbst> 🦀
lynxeye has quit [Quit: Leaving.]
<anarsoul> Lynne: just don't do sync when you're on a bad connection
<Lynne> having a bad connection is hardly a choice
<tnt> I've got an application causing : "[drm] GPU HANG: ecode 12:1:85dcfdfb, in ngscopeclient" (intel 12th gen, vulkan app).
<tnt> How would one go about tracing what's going on ?
frankbinns has quit [Remote host closed the connection]
<cmarcelo> does anyone foresee glsl_function_type() being useful again (only user was spirv, but stopped in favor of own implementation, right now is dead code)? deciding here if I can just remove or make changes to improve with others as part of an ongoing MR.
tobiasjakobi has joined #dri-devel
tobiasjakobi has quit []
mszyprow has joined #dri-devel
<dcbaker> mattst88: did you meant to request a review from marge on the intel_clc series, or did you mean to assign it to marge?
<mattst88> dcbaker: lol, derp
<mattst88> thanks for noticing that
kts has joined #dri-devel
<dcbaker> that looks good to me, btw. I'd still like to get it to where we don't need to do that, but there are so many assumptions in meson's implementation that things are for the host and only the host it's turning into a slug fest with some seriously annoying problems
<alyssa> dcbaker: I'm experimenting with a generic mesa_clc
<dcbaker> so, same problem then?
<alyssa> current tentative plan is that it goes CLC->SPIR-V but does not touch any NIR
<alyssa> which should be a lot fewer deps but yes, same problem until clang can do that itself
<dcbaker> the good news is Mesa isn't the only project with this problem, it turns out that other big complex projects run into the same issue when cross compiling
<dcbaker> at least, good in that everyone agrees it needs to happen, lol
<DemiMarie> Sorry for all the unanswerable questions I asked earlier!
djbw has joined #dri-devel
<alyssa> nod
<alyssa> i expect by end-of-year asahi will have a hard build-dep on clc
<alyssa> We have a significant need from it and we have buy in from Fedora and Intel's already doing it for raytracing so sure yeah why not
<alyssa> and asahi only needs to build on arm and x86, so no LLVM problem
<dcbaker> sigh. LLVM.
junaid has joined #dri-devel
<dcbaker> karolherbst: I'm going to start reviewing the next round of the crates in meson work next week, I know gfxstrand has been trying it out. Are there any crates you need/want?
Nyamiou has joined #dri-devel
gouchi has joined #dri-devel
Nyamiou has quit []
<alyssa> dcbaker: Debian has some build rules to disable LLVM on exotic architectures (':
<alyssa> So no CLC deps in common code without angering somebody.
<alyssa> Although... if they're cross building it shouldn't matter?
<karolherbst> dcbaker: uhm.. mostly just syn and serde
<alyssa> Like you should be able to run intel_clc/mesa_clc on the host with the host LLVM and then do an LLVM-free target mesa build using the precompiled kernels
<alyssa> you don't get Rusticl support but the BVH kernels etc should work fine in that set up
<alyssa> we dont support that on the mesa side but we.. probably could?
<dcbaker> karolherbst: cool, I'm pretty sure syn already works
<dcbaker> I'll make sure we test out serde
<karolherbst> cool
<dcbaker> alyssa: yeah.... I'm just not sure how you'd go about supporting OpenCL without llvm/clang at this point
<karolherbst> there are probably random others I might want to use in the future, but those would be a good start to drop some code
<alyssa> dcbaker: ~~gcc-spirv when~~ delet
<dcbaker> alyssa: I mean, if someone else wants to write the code and it works...
<alyssa> dcbaker: :p
<alyssa> I don't love runtime LLVM deps, honestly I build -Dllvm=disabled myself up until now
<alyssa> but I feel entitled to a buildtime LLVM dep o:)
<alyssa> (I already build mesa with clang, this is just more of that :D)
<dcbaker> I don't care that much about buildtime deps that much, but I usually build with gcc and getting the right version of LLVM can be a real pain sometimes
<alyssa> I do know an llvm spirv target was talked about, I wonder if mesa_clc will be obsoleted in due time..
<alyssa> except for libclc, src/compiler/clc/ isn't doing much that clang couldn't do itself..
<alyssa> :q
<karolherbst> yeah.. maybe.. if it's not causing any regressions in the CTS that is
<alyssa> yeah
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
glennk has quit [Remote host closed the connection]
glennk has joined #dri-devel
glennk has quit [Remote host closed the connection]
a-865 has quit [Ping timeout: 480 seconds]
glennk has joined #dri-devel
<airlied> zmike: you should have made that blog a monetized twitter post, controversy gets clicks!
a-865 has joined #dri-devel
<zmike> airlied: there's no controversy there, just people who agree with me and people who are wrong
mszyprow has quit [Ping timeout: 480 seconds]
<airlied> that's the attitude to get more elon bucks :-P
sravn has quit [Quit: WeeChat 3.5]
sravn has joined #dri-devel
<alyssa> I wonder what it'll take to get this loop unrolled: https://rosenzweig.io/hmm
<alyssa> It's from a... familiar piece of source code.
<alyssa> hmm I wonder how clang unrolls this
<alyssa> ...does clang not unroll this? T.T
<pendingchaos> that looks like an overly complicated IF statement
<alyssa> pendingchaos: correct
<alyssa> it's from doing something i'm really not supposed to =D
<alyssa> clang seems to eliminate the backwards branch when compiled for my cpu
<alyssa> an extra opt_algebraic rule removes a big chunk of loop header, but still not enough for unrolling..
<alyssa> oh... opencl is nopping out __builtin_expect I guess..
<alyssa> is it?
<karolherbst> yeah.. we ignore it for now
<alyssa> karolherbst: at what point is it being ignored?
<karolherbst> inside vtn
<alyssa> alright
<alyssa> I wonder if I should plumb that through. Or perhaps more likely, see if I can get clc to do a bit more LLVM opts
<alyssa> oh right we -O0 right.
<alyssa> erg
<karolherbst> yeah.. builtin_expect isn't _that_ terrible to pipe through, it's just that it's one of those 80/20 things
<alyssa> right.. ugh..
<alyssa> For this particular kernel, if I build with -O3, it gets unrolled in llvm just fine
<karolherbst> ahh...
<karolherbst> I think there is strictly nothing against using more opts, just some opts break the tooling
<karolherbst> and the translator gives up
<alyssa> yeah..
<alyssa> it's just annoying because with -O3 the final NIR is excellent
<karolherbst> mhh
<karolherbst> anyway, in this commit specifically I played around with enabling some llvm opts: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23852/diffs?commit_id=4b18b0770154aec4ad905bba9856db1cd47b5d60
<karolherbst> and those are safe
<alyssa> ack
<karolherbst> however we can always allow callers to enable more or something
<karolherbst> but I wouldn't enable that for everything
<karolherbst> maybe that stuff gets better once the spirv backend lands
<alyssa> Alternatively, NIR's optimizer should be good enough to handle all this? O:)
<karolherbst> yeah... well.. hopefully :)
<alyssa> needs expect plumbed thru for this guy
<karolherbst> my motivation with that MR was to cut down the size of cached blobs
<alyssa> hmm wait is expect even doing what i expect
<karolherbst> what do you expect expect is doing?
<karolherbst> but yes, it's very cursed
<karolherbst> and not trivial
<alyssa> I mixed up expect and assume
<karolherbst> right...
<karolherbst> expect is the more trivial thing, assume is cursed to implement
immibis has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
<alyssa> OOOOOKAY
immibis has joined #dri-devel
<alyssa> so it turns out I've been doing something absolutely stupid in mesa for years
<alyssa> kewl.
<karolherbst> heh
<alyssa> well, loop's gone now, with -O0
<karolherbst> :D
<karolherbst> do we want to know?
<alyssa> No
<alyssa> Even so, with -O3 there's a big pile of load/store_scratch that goes away with -O0
<karolherbst> yeah....
<alyssa> i'm guessing I'm missing some NIR copyprop pass somewhere
<alyssa> seems like we should be able to see thru that
<karolherbst> so you have scratch stuff with O3 but not with O0?
<karolherbst> anyway.. you can always copy whatever rusticl is doing to get rid of those things
<karolherbst> the entire pipeline is cursed
<karolherbst> kinda
<karolherbst> but it kinda also works
<airlied> the translator docs I think state O0 only is supported, anything else is a crap shoot
<karolherbst> yeah.. I'm also a bit hesitant to even land some of the llvm opt work because of this
<airlied> but yeah it might be worth running some llvm passes, but also I think we can do a lot on the NIR side to step up
<karolherbst> it's literally not tested anywhere
<airlied> and close the gaps
<airlied> if we want NIR to be a real life compute compiler
<karolherbst> right.. I didn't want to get more optimized binaries, just smaller ones
<karolherbst> and I was able to reduce binary sizes by like 60%
<karolherbst> it's just not very stable
<airlied> I suppose in theory spirv-opt could be used as well, if it ever did anything useful :-P
<karolherbst> it speeds up caching/ reduces peak memory usage and other benefits, sadly we probably can't rely on it
<karolherbst> yeah... that gives me another 25% reduction
<karolherbst> it's all in the MR
<karolherbst> sadly.. I can't use the impressive `MergeFunctions` LLVM pass
<karolherbst> so the benefits are all kinda smallish
<karolherbst> `MergeFunctions` generates function pointers in a few places
<karolherbst> alyssa: anyway, I think the asahi CL stuff is ready to land, I've listed all the remaining problems, but nothing stands out really and I mitigated the linear image issue as much as possible. Now you simply can't map 3D images, but whatever. Maybe I should just assign to marge and... figure out timestamps after that
mvlad has quit [Remote host closed the connection]
Mangix has quit [Ping timeout: 480 seconds]
<alyssa> karolherbst: I had scratch with O0 but it disappeared with O3. I think I screwed up my pass order or whatever, will look at it harder, NIR should be able to breeze thru this
<karolherbst> yeah.. it should
<karolherbst> just run all the passes 5 times or something
<alyssa> Lol
<alyssa> 20:01 airlied | if we want NIR to be a real life compute compiler
<alyssa> Yeah... A big chunk of stuff that CL wants, VK also wants and we don't have any LLVM to cheat off there. So trying to get NIR into shape seems like the better long-term approach, idk
<karolherbst> yeah
<alyssa> I'm good with asahicl being merged
fab has quit [Quit: fab]
alyssa has quit [Quit: alyssa]
sima has quit [Ping timeout: 480 seconds]
alyssa has joined #dri-devel
<alyssa> k, this is interesting:
<alyssa> 64 %5 = deref_var &__const.hello.cfg (constant struct.AGX_USC_TEXTURE)
<alyssa> 64 %7 = deref_cast (uvec4 *)%5 (constant uvec4) (ptr_stride=16, align_mul=0, align_offset=0)
<alyssa> 32x4 %8 = @load_deref (%7) (access=none)
<alyssa> nir_opt_deref doesn't remove the cast and nir_opt_constant_folding doesn't see through the cast, so that turns into a load_global (!) instead of a load_const
Duke`` has quit [Ping timeout: 480 seconds]
<karolherbst> constant isn't in shader constant memory though
<karolherbst> it's just a ubo (more or less)
<karolherbst> just with global addressing
<alyssa> still supposed to be constant folded.
<karolherbst> does the constant variable have a constant initializer?
<alyssa> yes
<alyssa> it's just the cast in the way
junaid has quit [Remote host closed the connection]
<karolherbst> so without the cast it would constant fold? Mhh.. normally we kinda drop pointless casts but there are a few restrictions in place
<karolherbst> but yeah, if the source is constant known at compile time we should constant fold it
JohnnyonFlame has joined #dri-devel
<alyssa> also, deref_ptr_as_array
<alyssa> 64 %5 = deref_var &__const.hello.cfg (constant struct.AGX_USC_TEXTURE)
<alyssa> 64 %9 = deref_cast (uvec2 *)%5 (constant uvec2) (ptr_stride=8, align_mul=0, align_offset=0)
<alyssa> 64 %11 = deref_ptr_as_array &(*%9)[2] (constant uvec2) // &(*(uvec2 *)%5)[2]
<alyssa> 32x2 %12 = @load_deref (%11) (access=none)
Mangix has joined #dri-devel
<alyssa> admittedly I don't fully understand what's happening here but it should also constant fold..
<karolherbst> it might be that most of the folding only reliably works once IO is lowered
<alyssa> But it's too late by then, since lowering I/O turns this into load_global(load_constant_base_ptr)
<karolherbst> mhhh... yeah, it shouldn't...
<alyssa> this needs to be optimized away in derefs
mszyprow has joined #dri-devel
<karolherbst> I think the problem here is, that casting a constant memory pointer to a different address space is kinda UB
<alyssa> there's no constant memory pointer in the C code?
<alyssa> *CL kernel
<alyssa> just literals that clang decided to turn into constant memory
<karolherbst> right... it's kinda weird tbh
<karolherbst> what's the CLC source?
<alyssa> agx_pack(..) { ..}
<karolherbst> not really sure what that gets generated into, but in theory it should just be a stack variable getting fields assigned, so I'm kinda confused why it's doing this kinda nonsense in nir
<karolherbst> or rather, I don't see where this cast would come from
<karolherbst> does it look better in the spir-v? though I suspect not
<karolherbst> or maybe?
<karolherbst> what does the nir straight out of spirv_to_nir look like?
<karolherbst> but anyway... casting from constant to generic is just not legal
<karolherbst> the thing is... because it's all coming from C you also can't just drop random cast, because $reasons
<karolherbst> like e.g. if you'd do (global* int)some_local_memory_ptr, you also can't just load from the local address, because it's technically a bug in the source code and UB
<karolherbst> but only in the sense of your pointer is probably pointing to invalid memory
<karolherbst> but what if you do (local* int)(global* int)...
<alyssa> there's no cast to global?
<alyssa> what's happening is that there's a constant struct
<alyssa> that would be fine, if we split the struct with split_struct_vars
<alyssa> but that bails on complex uses, because of the deref_ptr_as_array
<alyssa> which in turn nir_deref.c claims should be eliminated by nir_opt_deref but that's not happening
<karolherbst> ehhh wait.. I misinterpreted the " 64 %9 = deref_cast (uvec2 *)%5 (constant uvec2) (ptr_stride=8, align_mul=0, align_offset=0)" thing...
<alyssa> presumably my pass order is busted.
<karolherbst> if it's a pointless cast, opt_deref should kinda be able to get rid of it
<karolherbst> alyssa: btw, did you call explicit_type?
<karolherbst> some of the passes rely on explicit type information
<alyssa> yes
<alyssa> Here's a simple reproducer
<alyssa> right after vtn, this looks like https://rosenzweig.io/spirv-to-nir.txt
<karolherbst> huh...
<karolherbst> that's a lot of stuff...
<alyssa> after all the lowering/opt passes, this ends up as a mess with scratch access https://rosenzweig.io/final.txt
mszyprow has quit [Ping timeout: 480 seconds]
<alyssa> not sure if rusticl fares any better
<karolherbst> let me write to a struct instead
<karolherbst> indeed...
<alyssa> karolherbst: that code doesn't write to a struct?
<karolherbst> yo, that's kinda rude of nir :')
<karolherbst> yeah, I have it local now and it also uses scratch
<alyssa> Joy
<karolherbst> funky...
<alyssa> very basic issue: why is nir_split_struct_vars failing on https://rosenzweig.io/why.txt?
Mangix has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
<alyssa> presumably the cast from struct to uvec
<karolherbst> probably
<alyssa> which is being inserted by nir_lower_memcpy, I think
<karolherbst> it's kinda funky that with the constant struct initializer nir at some point does load_const, but for whatever reason it thinks it should pipe that through scratch memory...
<karolherbst> ohhhhh
<karolherbst> uhhhhh
<karolherbst> it's this bug
<karolherbst> I hate it
<karolherbst> I remember now...
<karolherbst> this is kinda the reason
<karolherbst> the deref chains for store and load are different
<karolherbst> so we fail to see they are equal
<karolherbst> or rather point to equal things
<karolherbst> there are dumb LLVM reasons for it, and the translator also isn't super nice to us
<karolherbst> so when storing it, you have explicit struct member accesses
<karolherbst> but on load you don't have the struct information and it just does raw vec/scalar loads
<karolherbst> it's really annoying
<karolherbst> however, we should still be able to optimize it away :D
<karolherbst> it's just that our opt_deref isn't smart enough for that yet
<alyssa> alright..
<karolherbst> I think there is an MR for that...
<karolherbst> maybe not
<karolherbst> gfxstrand might remember
<alyssa> in this case at least, the obvious problem is that we're lowering memcpy to a raw copy of bytes, which fundamentally impedes other opts
<karolherbst> but yeah.. I think ultimately this is something we can only clean up after io lowering
<karolherbst> mhhh.. yeah, just..
<karolherbst> In my example there is no memcpy
<alyssa> two solutions are to either lower memcpys of structs to memcpys of each element separately, if we know it's tightly packed & so on
<alyssa> or to teach struct lowering to delete memcpies
<alyssa> uh
<alyssa> or to teach struct splitting to split memcpys
<karolherbst> ehh wait.. I failed to use my search function.. there is a memcpy
<karolherbst> let's see...
<karolherbst> yeah soo.. we can't do much useful with that memcpy
<karolherbst> it's just taking the raw pointer and copies the function_temp stuff into it
<karolherbst> so out of LLVM/SPIR-V it's already a plain byte copy
<karolherbst> and not much we can really do about it
<karolherbst> and I don't think that before IO lowering is the place where we could actually resolve that, because we'd have to know the actual offsets the load/stores go to
<karolherbst> in order to propagate it
<karolherbst> the downside of doing this after io lowering is that we already allocated scratch space
<karolherbst> I honestly don't know what would be the best path forward here
<karolherbst> maybe we can convince LLVM to not do this nonsense? but then we can also get spir-v doing it anyway
alyssa has quit [Quit: alyssa]
alyssa has joined #dri-devel
<alyssa> ok.. I think we can split the memcpy, at least in the simple case I'm looking at
<alyssa> but the original case didn't have a memcpy there, just stores with deref_ptr_as_array..
<alyssa> oh, but there's legitimately a cast happening in that one
Mangix has joined #dri-devel
<alyssa> even though it's a cast between.. morally equivalent things
<karolherbst> I wonder if the better strategy is to simply convert everything to byte arrays as an intermediate step before io lowering.. :D
<karolherbst> but that's going to be messy
gouchi has quit [Quit: Quitte]
<alyssa> i mean.. trying to unlower scratch back to SSA sounds like you're in for a bad time
<karolherbst> yup
<karolherbst> I think all solutions here are messy in one or the other way
<alyssa> i think the memcpy is the root brokenness
<karolherbst> sure
<alyssa> This is the nonsense that we get with everything up until lowering memcpys, triggered in my kernel by passing a struct around:
<karolherbst> yeah...
<karolherbst> there isn't really anything you can do on a deref level
<alyssa> Why not?
<alyssa> It feels like we "should" be able to split that memcpy_deref into a memcpy_deref for each struct element
<karolherbst> sure, but we don't copy the struct, we copy pointers to bytes
<alyssa> so..?
<alyssa> we have derefs, we can see thru the casts
<karolherbst> fair enough
<alyssa> really annoying that LLVM makes us jump through these hoops for such a trivial example though
<karolherbst> yep
<karolherbst> I do wonder though: if we get rid of the casts, does that get optimized away?
<karolherbst> after memcpy lowering I mean
<alyssa> it's not valid to get rid of them straight up
<alyssa> that's a memcpy between a struct and a u8
<karolherbst> mhh.. fair enough
<karolherbst> is it like that straight away or is that constant array after some opts?
<karolherbst> like in the initial nir it should still be all structs or not?
<alyssa> it's like that in the initial nir because llvm suuuucks
<karolherbst> pain
<alyssa> look on the bright side, I can pass pointers to structs as function arguments, that's cool
<karolherbst> :D
<karolherbst> yeah...
<alyssa> wait..
<alyssa> oh come on!
<alyssa> i switched to passing the struct by value instead, you're still not happy nir?
<alyssa> oh.. yeah, it's choking on the struct initializer which llvm is helpfully turning into a memcpy from raw bytes
<karolherbst> yes, llvm best compiler, very helpful
<alyssa> the good news is that I can deal with the struct initializer nonsense..
<alyssa> oh. strictly no i can't on this struct because there's padding :~)
<alyssa> Actually I have no idea if that's legal C or not
<alyssa> casting between a struct ptr and a u8* and putting stuff into the padding bytes and expecting that to work
RAOF has quit [Remote host closed the connection]
<karolherbst> well.. why not?
<karolherbst> you might read/write random garbage, but besides that?
<alyssa> padding rules are implementation defined so..
RAOF has joined #dri-devel
<karolherbst> they shouldn't be
<karolherbst> the C struct layout is kinda very strictly defined
<karolherbst> but maybe it's technically implementation defined, but I think everybody kinda follows the same rules on each platform? dunno
<karolherbst> the fun part is that the CL CTS tests this stuff
<alyssa> it's ludicrous to me that llvm is going so far as casting to u8*.
<karolherbst> well...
<karolherbst> I'm sure they have their good reasons
<karolherbst> but yeah...
pcercuei has quit [Quit: dodo]
kts has quit [Quit: Konversation terminated!]
<alyssa> intel raytracing on arm64 when
alyssa has quit [Quit: alyssa]
vliaskov has quit []
jewins has quit [Ping timeout: 480 seconds]
Company has quit [Quit: Leaving]
crabbedhaloablut has quit []