ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
pcercuei has quit [Quit: dodo]
iive has quit [Quit: They came for me...]
Haaninjo has quit [Quit: Ex-Chat]
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
co1umbarius has joined #dri-devel
columbarius has quit [Read error: Connection reset by peer]
Waterr has joined #dri-devel
Waterr has quit [Killed (MoranServ (Possible spambot -- mail with questions.))]
YuGiOhJCJ has joined #dri-devel
apinheiro has quit [Quit: Leaving]
pixelcluster_ has joined #dri-devel
pixelcluster has quit [Ping timeout: 480 seconds]
tertl8 has joined #dri-devel
Company has quit [Quit: Leaving]
Jeremy_Rand_Talos has quit [Remote host closed the connection]
Jeremy_Rand_Talos has joined #dri-devel
edt_ has joined #dri-devel
kts has joined #dri-devel
kts has quit [Ping timeout: 480 seconds]
aravind has joined #dri-devel
davispuh has quit [Ping timeout: 480 seconds]
Mangix has quit [Quit: - Chat comfortably. Anywhere.]
kts has joined #dri-devel
Mangix has joined #dri-devel
heat is now known as Guest2714
Guest2714 has quit [Read error: Connection reset by peer]
heat has joined #dri-devel
bmodem has joined #dri-devel
heat has quit [Ping timeout: 480 seconds]
kts has quit [Ping timeout: 480 seconds]
aravind has quit [Ping timeout: 480 seconds]
TMM has quit [Quit: - Chat comfortably. Anywhere.]
TMM has joined #dri-devel
anujp has quit [Ping timeout: 480 seconds]
aravind has joined #dri-devel
<DemiMarie> Is it possible to identify who is to blame for a GPU hang? In native contexts it would be useful to be able to determine which VM is at fault and ban it from using the GPU until the user says otherwise.
junaid has joined #dri-devel
ity has quit [Remote host closed the connection]
ity has joined #dri-devel
fab has joined #dri-devel
itoral has joined #dri-devel
junaid has quit [Remote host closed the connection]
macromorgan has quit [Read error: Connection reset by peer]
macromorgan has joined #dri-devel
lemonzest has quit [Quit: WeeChat 4.2.1]
lemonzest has joined #dri-devel
YuGiOhJCJ has quit [Remote host closed the connection]
YuGiOhJCJ has joined #dri-devel
dorcaslitunyaVM has joined #dri-devel
kts has joined #dri-devel
fab has quit [Quit: fab]
itoral_ has joined #dri-devel
itoral has quit [Ping timeout: 480 seconds]
illwieckz has quit [Quit: I'll be back!]
kts has quit [Ping timeout: 480 seconds]
loki_val has joined #dri-devel
kzd has quit [Ping timeout: 480 seconds]
crabbedhaloablut has quit [Ping timeout: 480 seconds]
kts has joined #dri-devel
kts has quit [Read error: Connection reset by peer]
sima has joined #dri-devel
illwieckz has joined #dri-devel
kts has joined #dri-devel
rgallaispou has joined #dri-devel
tomba_ has joined #dri-devel
fab has joined #dri-devel
tzimmermann has joined #dri-devel
sghuge has quit [Remote host closed the connection]
sghuge has joined #dri-devel
glennk has joined #dri-devel
mvlad has joined #dri-devel
tursulin has joined #dri-devel
crabbedhaloablut has joined #dri-devel
frieder has joined #dri-devel
loki_val has quit [Ping timeout: 480 seconds]
kts has quit [Ping timeout: 480 seconds]
pjakobsson has joined #dri-devel
sukrutb has joined #dri-devel
sukrutb has quit [Remote host closed the connection]
sukrutb has joined #dri-devel
sukrutb has quit [Ping timeout: 480 seconds]
hansg has joined #dri-devel
lynxeye has joined #dri-devel
Haaninjo has joined #dri-devel
simondnnsn has quit [Read error: Connection reset by peer]
simondnnsn has joined #dri-devel
vliaskov has joined #dri-devel
bolson_ has quit [Ping timeout: 480 seconds]
<tzimmermann> jani, thanks for reviewing my fbdev header cleanup. may i ask you to give an ack to the additional patch in v2:
<jani> tzimmermann: ack; already looked at it but got distracted and forgot to reply
<tzimmermann> thanks
itoral_ has quit [Ping timeout: 480 seconds]
itoral_ has joined #dri-devel
simon-perretta-img has quit [Ping timeout: 480 seconds]
simon-perretta-img has joined #dri-devel
rasterman has joined #dri-devel
pcercuei has joined #dri-devel
CME has quit [Ping timeout: 480 seconds]
Haaninjo has quit [Quit: Ex-Chat]
cmichael has joined #dri-devel
glennk has quit [Ping timeout: 480 seconds]
glennk has joined #dri-devel
aravind has quit [Ping timeout: 480 seconds]
simondnnsn has quit [Read error: Connection reset by peer]
simondnnsn has joined #dri-devel
xq has joined #dri-devel
<samuelig> cmarcelo, go ahead
shoragan has quit [Read error: Network is unreachable]
CME has joined #dri-devel
shoragan has joined #dri-devel
<samuelig> cmarcelo, would you like me or somebody else to review them?
shoragan has quit [Read error: Network is unreachable]
shoragan has joined #dri-devel
CME_ has joined #dri-devel
CME has quit [Ping timeout: 480 seconds]
kts has joined #dri-devel
DodoGTA has quit [Remote host closed the connection]
Duke`` has joined #dri-devel
bmodem has quit [Ping timeout: 480 seconds]
pH5 has quit [Read error: Network is unreachable]
pH5 has joined #dri-devel
pH5 has quit [Read error: Connection reset by peer]
pH5 has joined #dri-devel
DodoGTA has joined #dri-devel
<karolherbst> DemiMarie: it might potentially be possible for some drivers and some faults
<karolherbst> mhh maybe "some drivers" is too optimistic, let's say "some GPUs"
kts_ has joined #dri-devel
dviola has joined #dri-devel
kts has quit [Ping timeout: 480 seconds]
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
kts_ has quit [Ping timeout: 480 seconds]
fireburn has joined #dri-devel
psykose has joined #dri-devel
bmodem has joined #dri-devel
itoral_ has quit [Remote host closed the connection]
pixelcluster_ has quit []
pixelcluster has joined #dri-devel
Calandracas has quit [Remote host closed the connection]
simon-perretta-img has quit [Ping timeout: 480 seconds]
simon-perretta-img has joined #dri-devel
Leopold___ has joined #dri-devel
dorcaslitunyaVM has quit [Remote host closed the connection]
simon-perretta-img has quit [Ping timeout: 480 seconds]
simon-perretta-img has joined #dri-devel
Leopold____ has joined #dri-devel
Leopold_ has quit [Ping timeout: 480 seconds]
Leopold___ has quit [Ping timeout: 480 seconds]
thaytan has quit [Ping timeout: 480 seconds]
sknebel has quit [Read error: Connection reset by peer]
simondnnsn has quit [Read error: Connection reset by peer]
Company has joined #dri-devel
sknebel has joined #dri-devel
simondnnsn has joined #dri-devel
<zmike> alyssa kusma: I assume at least one of you will be on call today for ?
<alyssa> zmike: wasn't planning on it but I can be if you want
<zmike> I figured since it was so contentious that at least someone would show up, but maybe I didn't skim hard enough to see through the argumentation
krumelmonster has quit [Ping timeout: 480 seconds]
<alyssa> I just don't think we should be breaking piles of apps
krumelmonster has joined #dri-devel
thaytan has joined #dri-devel
yrlf has quit [Quit: Ping timeout (120 seconds)]
yrlf has joined #dri-devel
heat has joined #dri-devel
tertl8 has quit [Quit: Connection closed for inactivity]
edt_ has quit []
Calandracas has joined #dri-devel
fab has quit [Quit: fab]
Calandracas has quit [Remote host closed the connection]
<mareko> is the mesa-commit list discontinued?
<mareko> zmike: I think I can file cts tickets
<mareko> just haven't done it
<daniels> mareko: yep
OftenTimeConsuming has quit [Remote host closed the connection]
OftenTimeConsuming has joined #dri-devel
bolson has joined #dri-devel
<mareko> daniels: somebody asked me about it, I didn't even know that it existed :)
kzd has joined #dri-devel
<DemiMarie> karolherbst: which GPUs?
<karolherbst> unknown
<DemiMarie> Any guesses?
<karolherbst> nope
* DemiMarie wishes GPUs had full preemption
<karolherbst> that's not the problem :)
<karolherbst> even if they had, how would the kernel know what crashed the GPU if the GPU suddenly gets into a weird state and returns nonsense?
<karolherbst> or the firmware returning nonsense
<karolherbst> or not responding
tobiasjakobi has joined #dri-devel
<DemiMarie> The GPU should never be able to be put in such a state unless the kernel driver is buggy.
<karolherbst> well..
<karolherbst> it's all software in the end
<karolherbst> and software has bugs
<karolherbst> GPU not getting into a weird state is like saying "computers shouldn't be able to get into a weird state"
tobiasjakobi has quit []
<DemiMarie> karolherbst: suppose we ignore KMD and FW bugs for now
<karolherbst> still
<karolherbst> you ask for bugfree computers
<DemiMarie> No
<karolherbst> a GPU in itself can be considered a full computer.. maybe not a personal one, but definitely on the embedded level
<DemiMarie> Or my question is bad
<DemiMarie> So on the CPU, if there is a fault (such as accessing junk memory), the hardware gets control and tells the kernel enough information for the kernel to know what the fault was and what process did it.
<karolherbst> we all have to accept that an embedded system can and will crash in a way that you can only power cycle it to recover
<karolherbst> you can't compare GPUs to CPUs
Calandracas has joined #dri-devel
<DemiMarie> Why?
<karolherbst> because GPUs are way more complex
<DemiMarie> Why?
<karolherbst> because GPUs are more like embedded devices
<DemiMarie> Wh
<DemiMarie> y?
<karolherbst> I got tired :) have fun
<sima> gpu is essentially a distributed network, there's a pile of things that send messages around, and eventually they reach a network node that talks to the memory subsystem
<sima> that's the point you get the hw fault
<DemiMarie> Am I asking questions that only the HW vendor can answer?
<sima> and the kernel pretty much has to be able to preempt, or things will go sideways due to priority/locking inversion issues
<sima> so unlike a cpu, where you can preempt a single node, for a gpu you have to preempt that entire distributed network
<sima> including all the in-flight messages
<sima> DemiMarie, ^^ and save/restore of what essentially is a cluster is just too hard
<mattst88> I think since GPUs are expected to be programmed by drivers, their designers don't have to ensure that they're impossible to wedge (unlike a CPU, where that would be basically unforgivable)
<DemiMarie> sima: I thought CPUs are also distributed internally. Do they just put more work into hiding it?
<sima> DemiMarie, they're a lot less distributed, and they have enormous amounts of magic to hide their distributed nature from the application
<sima> because the ISA isn't distributed
<sima> so you can reuse that for preempting the entire thing
<sima> with gpu you actually send around these messages explicitly in shaders
<sima> (well most of them at least)
<DemiMarie> mattst88: hopefully it will be less forgivable over time, given the rising use of GPUs in secure situations.
rz_ has quit [Ping timeout: 480 seconds]
rz has joined #dri-devel
<DemiMarie> sima: Ah, so that is why one needs memory barriers in so many situations where a CPU would never need them.
<sima> yeah that's one aspect
<DemiMarie> How does preemption work for compute then?
<sima> but the overall one that makes preempt/page fault so hard really is that it's a network of nodes having a big chat, and memory i/o is just one of them - kinda like you have storage servers in a cluster
<cmarcelo> samuelig: thanks. just this ACK here is good for me
<sima> DemiMarie, badly :-)
<sima> no idea how it works on others, but on intel, where it does work, preempt essentially sends an interrupt to all the compute cores
<sima> and they run a special shader which essentially stores the entire thread state into memory somewhere
<sima> but that means this special preempt shader and your compute kernel need to cooperate
<sima> and preemption is kinda voluntary
<sima> and it only works for nodes which support it, and because things get messy for 3d fixed function they just dont
<DemiMarie> sima: why do they need to cooperate, beyond the need for a memory buffer to save the state?
<sima> the hw cannot actually save/restore itself, it's kinda like the kernel asking userspace to please store all its state, so that it can nuke the process
<sima> and then on restart it asks userspace again to recreate everything
<sima> it's a gigantic mess afaiui
<DemiMarie> Could the KMD provide the preempt shader?
<sima> afaiui it's tied to how you compile the compute shader
<DemiMarie> Or just say, “you have X amount of time before I blow you away?”
<sima> like if you yolo too much in-flight stuff you can't restore that on the other side
<sima> yeah it's a timeout
<sima> but the timeout is huge because register state is like a few mb
<DemiMarie> What is the timeout?
<sima> more than a frame iirc
<DemiMarie> Ugh
<sima> so forget anything remotely realtime
<DemiMarie> Hopefully future GPUs will support full preemption of everything.
<sima> it's getting less
<DemiMarie> That’s good
<sima> like both intel and amd are switching to the model where any pending page fault prevents preemption
<DemiMarie> What do you mean?
<sima> because the thing that hits the page fault is a few network hops away from the compute cores that can save/restore
<sima> so you cannot preempt while any fault is pending
<DemiMarie> So that just means no page faults allowed.
<DemiMarie> Pin all memory.
<sima> nah it just means kmd engineers are crying a lot
<DemiMarie> Why?
<DemiMarie> Because pre-pinned memory management is terrible?
<sima> yeah
<sima> so you get to pick essentially between "you get page faults, no pinning yay" and "you get preempt, no gpu hogging, yay"
<sima> but not at the same time
kts has joined #dri-devel
<sima> which is a full on inversion of the linux kernel virtual memory handling
<DemiMarie> For Qubes OS we will pick the latter every time
<DemiMarie> So make guests pin all their buffers.
<sima> so you don't necessarily need to pin everything, you /just/ need to guarantee there's enough memory around to fix up any faults while you try to preempt
<DemiMarie> Lots of room for bugs 🙂.
<DemiMarie> In the Qubes case, the memory will be granted by a Xen guest, so it is already pinned.
<sima> yeah it's one of these "great for tech demo, so much pain to ship" things
<DemiMarie> sima: a sysctl or module opt to just force pinning would be nice
<sima> plan for linux is that you get cgroups to make sure there's not memory thrashing that hurts
<DemiMarie> That way those who do not need page faults can avoid the attack surface.
<sima> but we had that for like decade plus as a plan by now :-/
bmodem has quit [Ping timeout: 480 seconds]
<DemiMarie> sima: why is pre-pinned memory management so bad?
<sima> people love memory overcommit
<sima> with pinning everything you need enough memory for everything for its worst case
<DemiMarie> Make that userspace’s job
<DemiMarie> It can pin and unpin buffers at will
<sima> which is a lot more than just enough for the average case and let the kernel balance usage
<sima> but in the end, if you want real-time then "pin it all" really is the only option
<sima> irrespective of gpu or not
<DemiMarie> Hence why PipeWire has an option for mlockall()
fab has joined #dri-devel
<DemiMarie> sima: is the kernel command-line option to disable overcommit a reasonable idea?
<DemiMarie> Or could it be emulated in the native context code?
<DemiMarie> This also explains Apple’s decision to make AR fully declarative: it means they can guarantee real-time behavior, because there are no app-provided shaders in the code that updates what the user sees in response to the real-world changing.
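[editor's note: for reference on the overcommit question above — strict accounting is a sysctl rather than a kernel command-line option. A hedged sketch of the knob being discussed, with values per the kernel's overcommit-accounting documentation; the file name is illustrative:]

```
# /etc/sysctl.d/99-no-overcommit.conf
# vm.overcommit_memory: 0 = heuristic (default), 1 = always allow,
#                       2 = strict accounting (no overcommit)
vm.overcommit_memory = 2
# In mode 2 the commit limit is swap + overcommit_ratio% of RAM.
vm.overcommit_ratio = 80
```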
hansg has quit [Quit: Leaving]
<DemiMarie> sima: thanks for taking some time to explain all of this!!! It means a lot to me.
Leopold____ has quit [Remote host closed the connection]
Leopold_ has joined #dri-devel
davispuh has joined #dri-devel
macromorgan has quit [Read error: Connection reset by peer]
macromorgan has joined #dri-devel
macromorgan has quit []
macromorgan has joined #dri-devel
anujp has joined #dri-devel
Leopold_ has quit [Remote host closed the connection]
davispuh has quit [Quit: - Chat comfortably. Anywhere.]
Leopold has joined #dri-devel
larunbe has joined #dri-devel
davispuh has joined #dri-devel
alarumbe has quit [Ping timeout: 480 seconds]
sukrutb has joined #dri-devel
Aura has joined #dri-devel
Leopold has quit [Remote host closed the connection]
Dark-Show has quit [Quit: Leaving]
Leopold has joined #dri-devel
cmichael has quit [Quit: Leaving]
kts has quit [Ping timeout: 480 seconds]
<DemiMarie> So for my use-case, I care less about ensuring that the GPU doesn’t crash (so long as the crash is non-exploitable) than I do about ensuring that there is someone to blame.
<DemiMarie> I want the userspace VMM to be able to distinguish between `GUILTY_CONTEXT` and `INNOCENT_CONTEXT`.
simondnnsn has quit [Read error: Connection reset by peer]
<alyssa> karolherbst: dj-death: forgot about the "designated initializers cause spilling because nir_opt_memcpy chokes" issue
<alyssa> gfxstrand: karolherbst and I talked about it back in October or so, and then I think we forgot, or at least I did
<karolherbst> same
sukrutb has quit [Ping timeout: 480 seconds]
<karolherbst> but did faith's MR help with that? I think there was more to do, like... something with opt_memcpy or something
<alyssa> I don't think it did but I might not have tested
<karolherbst> it also kinda depends when you run it
<karolherbst> ohh right.. there was `copy_deref` doing something similar
<karolherbst> and memcpy lowering could translate memcpy_deref to copy_deref or something
<karolherbst> in any case...
<jenatali> Right, you have to lower vars to explicit types, and then opt_memcpy should be able to turn it into a copy
<karolherbst> if the copy between derefs is huge, you can end up with tons of live values
<karolherbst> mhh yeah.. I might have to check again as I did reorder things again
<jenatali> (But then you want to erase scratch size and lower vars to explicit types again to recompute how much scratch is actually needed after that optimization)
<karolherbst> but yeah.. we need to lower those copies to loops, not unroll them directly
<alyssa> iirc there were a bunch of related memcpy issues I hit
<DemiMarie> karolherbst: sorry for tiring you with the repeated “why”.
<karolherbst> don't worry, I was also literally tired anyway and had to finish other things
jkrzyszt has joined #dri-devel
dv_ has quit [Read error: Connection reset by peer]
tzimmermann has quit [Quit: Leaving]
dv_ has joined #dri-devel
rasterman has quit [Quit: Gettin' stinky!]
Dark-Show has joined #dri-devel
<DemiMarie> It now makes more sense why graphics is not preemptable: It must be so fast (to avoid user complaints) that it is faster to simply wipe out the state and force applications to recompute it.
<DemiMarie> Reset can be done by a broadcast signal in hardware that forces everything to a known state, irrespective of what state it had been in. This is much simpler and cheaper (both in time and in transistors) than trying to save that state for later restoration.
<zmike> mareko pepp: it looks like radeonsi doesn't do any kind of checking with sample counts for e.g., rgb9e5? so si_is_format_supported will return true for sample_count==8
<zmike> is this somehow intended? I'm skeptical that you really support this
<mareko> zmike: it's always supported by shader images, it's supported by render targets only if the format is supported
<zmike> you support multisampled rgb9e5 in shader images?
<mareko> yes
<zmike> huh
<zmike> and textures?
<mareko> texelFetch only
<zmike> hm
<mareko> it's the same for all MSAA
<mareko> it's just 32bpp memory layout with a 32bpp format, then the texture and render hw just needs to handle the format conversion to/from FP32 and FP16
<sima> DemiMarie, afaik that's also what the big compute cluster people do, with checkpoints to permanent storage thrown in often enough that you don't have to throw away too much computation time when something, anyhting really, goes wrong
<sima> so also reset and recover from known-good application state, including anything gpus do
<DemiMarie> sima: that indeed makes sense. IIUC it is expected to have to change at some point, because failures will be too frequent, but I’m not sure if that point has been reached yet.
<DemiMarie> sima: interestingly AGX can do resets so quickly that one can reset every frame and still have a usable desktop.
<DemiMarie> That’s what Asahi did before Lina figured out TLB flushing.
<sima> yeah reset tends to be fairly quick, the slow part is waiting long enough to not piss off users too much about the fallout
jeeeun841351908 has quit []
<DemiMarie> Waiting for what?
<DemiMarie> Also, is there any information available as to what caused the reset?
<DemiMarie> I want to throw up a dialog to the user saying, “VM X crashed the GPU.”
<sima> arb_robustness tells you why you died (i.e. guilty or innocent collateral damage)
<sima> but it's very driver specific, and you need to be the userspace that created the gpu ctx
<sima> plus I think aside from amdgpu and intel no one even implements that, you just get a "you died"
<DemiMarie> Does Vulkan have something similar, and is this information something that the native context implementation could collect?
<sima> if that
<DemiMarie> sima: thankfully AMDGPU and Intel are the ones I care about the most by far
jeeeun841351908 has joined #dri-devel
<DemiMarie> sima: is this a hardware or driver limitation?
<sima> only thing I've found with a bit of googling, but that doesn't give you info (at least at a glance) about issues caused by someone else
<sima> DemiMarie, driver limitation generally, it's a lot of tricky corner cases
<sima> also if you're asking about why there's innocent ctx getting nuked, that's usually a mix of hw and driver limitations
<dj-death> jenatali: have you been doing much with LLVM 17?
<jenatali> dj-death: Nothing at all
<dj-death> jenatali: just hitting some interesting casts from the translation
<dj-death> jenatali: whereas the LLVM 16 was doing more deref_struct
<jenatali> At this point my goal is to stay on LLVM 15 as long as possible and let you all shake out all the problems with newer LLVM versions :)
<dj-death> jenatali: that's preventing a bunch of optimizations
<dj-death> ahaha :)
<dj-death> nice
<dj-death> I guess we could put an upper bound on the LLVM version
<jenatali> One of the nice benefits of being on Windows and being forced to statically link LLVM is I get to control when we update
<jenatali> And it takes hours to build so I'm not inclined to do it often, especially if it risks issues like this
<dj-death> yeah
<dj-death> I love linux distros
<Calandracas> packaging llvm is pain
<alyssa> jenatali: :clown:
tobiasjakobi has joined #dri-devel
tobiasjakobi has quit []
<Calandracas> especially when some things like zig==15, while chromium>=17 and rust>=17
<Calandracas> I wish there was a canonical way to support parallel installations
sukrutb has joined #dri-devel
ptrc has quit [Remote host closed the connection]
ptrc has joined #dri-devel
<karolherbst> I mean.. it kinda works, you are just (sometimes) screwed if multiple versions end up in the same program
konstantin_ has joined #dri-devel
konstantin is now known as Guest2795
konstantin_ is now known as konstantin
<zmike> jenatali: good news, I think I'm accidentally fixing all those xfails I added a couple weeks ago
<jenatali> \o/
<jenatali> I'll be happy to review once you've got something for me to look at
<zmike> I tried adding better samplecount checking to zink and ended up failing the same tests as you
Guest2795 has quit [Ping timeout: 480 seconds]
<jenatali> Hah, awesome
<karolherbst> opaque pointers were a mistake, don't @ me
fab has quit [Quit: fab]
<HdkR> void* is as opaque as we need to be
frieder has quit [Remote host closed the connection]
tomba_ has quit [Ping timeout: 480 seconds]
<airlied> dj-death, karolherbst : I assume the changes in derefs is just opaque ptrs getting us different patterns?
<karolherbst> yes
<karolherbst> more or less
<karolherbst> and LLVM doing unholy optimizations due to that
<karolherbst> I really hope the SPIR-V backend doesn't give us the same issues...
<karolherbst> well.. with llvm-19 that is
<airlied> well it's just different patterns, apps could give us them we just have to keep up
<karolherbst> nah, in this case it's something LLVM does
<karolherbst> for apps that would be super unholy to do as well
<karolherbst> like.. if your first member of a struct is an int, nobody does this: "(int *)some_struct" instead of just "&some_struct->first_field"
<karolherbst> but that's what LLVM-17 is now giving us
<karolherbst> so yeah. in _theory_ apps could, but none actually would
<airlied> I admire your confidence
<karolherbst> I know I'm wrong, but I still want to believe
<karolherbst> anyway.. I think I fixed it with rusticl and maybe the same fix helps intel
<alyssa> i feel like i've done that in user code
<karolherbst> but not 100%
<karolherbst> it was to fix some llvm-17 issue though
tertl8 has joined #dri-devel
heat is now known as Guest2800
Guest2800 has quit [Read error: Connection reset by peer]
heat has joined #dri-devel
<DemiMarie> Calandracas: statically link LLVM in all of its callers and ensure its symbols are never exported?
<Calandracas> that isn't really ideal. What we ended up doing is splitting all of the shared libs into their own packages for each version (libclang15 package, libclang17 package, etc.)
<karolherbst> Calandracas: have you tried to see if it also works if all of them get loaded into the same process?
<karolherbst> because apparently that doesn't work in all distributions
<karolherbst> and it's actually a cursed use case
<karolherbst> but a real one
<Calandracas> The ugly part is needing to have a full toolchain for each llvm version. What alpine does is install each version in /usr/lib/llvm$VERSION/
<karolherbst> oh yeah, that as well
<Calandracas> Fedora installs the latest release to /usr but "compat" packages go to /usr/lib/llvm$VERSION
<karolherbst> and making sure that things like "CLANG_RESOURCE_DIR" stay consistent with the installation directory relative to the so file
<Calandracas> it then turns into a mess of symlinking things from /usr/lib/llvm$VERSION to /usr
<karolherbst> though I only got reports of fedora messing that up and only for some packages? kinda works if you compile mesa from git
<karolherbst> yeah....
<tnt> karolherbst: and then there are some applications doing stuff with RTLD_DEEPBIND that don't help either.
vliaskov has quit []
<karolherbst> Calandracas: in mesa we need to know where the resource path of clang is.. I think we finally have something that works with that MR:
<karolherbst> the "realpath" part was needed for debian having weird symlinks :D
<Calandracas> for context, I'm talking about void linux, which just merged llvm17 last week
<airlied> in theory if you build all the llvm version with proper sym versioning it should work, in practice it's all screwed
<karolherbst> yeah...
<karolherbst> though at least on fedora I had three CL impls using a different llvm version (15/16/17) and it was fine....
<karolherbst> which surprised me tbh
<Calandracas> well its fine when llvm is only used a library, and applications can link whatever soname they want
<Calandracas> but having conflicting versions of llvm-config, cmake files, etc. is what causes issues
<karolherbst> yeah.... that's also painful
<Calandracas> + musl, armv6l, and armv7l patches
<Calandracas> but thats a different issue altogether
lynxeye has quit [Quit: Leaving.]
<dj-death> airlied: yes
<dj-death> karolherbst: I guess I could try to reproduce the order in which you run NIR passes
<dj-death> karolherbst: hopefully that works
<dj-death> I have doubts
<karolherbst> the idea behind my changes was that I need to run nir_lower_memcpy after explicit_types and some deref opt loop
tursulin has quit [Ping timeout: 480 seconds]
<karolherbst> and the deref opt loop does more opts if explicit type information exists
<karolherbst> s/loop//
Haaninjo has joined #dri-devel
krushia has quit [Quit: Konversation terminated!]
heat has quit [Remote host closed the connection]
heat has joined #dri-devel
<dj-death> karolherbst: I kind of end up with the same code if I pass the arguments by value instead of by pointer :)
<dj-death> stuff spills everywhere
Duke`` has quit [Ping timeout: 480 seconds]
simon-perretta-img has quit [Read error: Connection reset by peer]
simon-perretta-img has joined #dri-devel
<DemiMarie> airlied: considering that proprietary games might not use symbol versioning I am not surprised that there are problems there.
<jenatali> It's not really about symbol versioning, and if it was, it'd be about LLVM using symbol versioning. The problem is the lack of symbol namespacing
<karolherbst> dj-death: yeah.. that's LLVM being LLVM
<jenatali> You'd want component A in a process to use LLVM A, and component B to use LLVM B, but since the symbols are global, you can get the components using the wrong version
<karolherbst> it's kinda sad that LLVM IR stopped being useful for a middle-end IR :'(
<DemiMarie> jenatali: I thought symbol versioning solved that by including the versions (which are different) in the symbol that the dynamic linker looked up.
<karolherbst> now it's just a backend one
<jenatali> Demi: "Solved"
<jenatali> Still requires LLVM to put versions on their symbols
<DemiMarie> and they don’t?
<jenatali> Not AFAIK
<DemiMarie> could this be forced during compilation?
<DemiMarie> but yeah, until that is dealt with static linking seems like the safest option
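[editor's note: "putting versions on the symbols" concretely means a linker version script. A hedged sketch of the mechanism being discussed — the file name and the `LLVM_17` node name are illustrative, not what LLVM or any distro actually ships:]

```
/* llvm.ver -- passed as -Wl,--version-script=llvm.ver when linking libLLVM.so */
LLVM_17 {
  global:
    *;   /* tag every exported symbol with the LLVM_17 version node */
};
```

With a distinct version node per major release, the dynamic linker can bind one component in a process to one libLLVM and another component to a different one, which is the disambiguation jenatali says is missing.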
<jenatali> 🤷‍♂️ My knowledge in this space is really just based on how things are different from Windows
<dj-death> karolherbst: so I suppose if I try to compile the same code with rusticl, I'll end up with spills every where
<DemiMarie> and any distro that doesn’t like it can deal with the bug reports
<vsyrjala> llvm even leaks its compile time options into the abi. i once tried to turn off the nvptx (or whatever) thing in my system llvm. everything linked against llvm stopped working because of some missing global symbol
<karolherbst> dj-death: maybe? Wouldn't surprise me, but it also kinda depends how the codegen goes
<karolherbst> anyway.. do you have a dump of the code being compiled?
<karolherbst> like.. the thing passed into llvm
<karolherbst> In hindsight I should have added `dump_clc` to `CLC_DEBUG`, but uhh.. that's kinda not useful in the CL context
<dj-death> karolherbst: yeah :
<karolherbst> yeah..
<karolherbst> "__attribute__((unused)) const struct GFX125_3DSTATE_VERTEX_BUFFERS * restrict values" that's your generic pointer
<karolherbst> if you pass in private memory, specify it as "private const struct ...*"
<dj-death> I know
<karolherbst> though that just gets rid of the generic modes in the cast
<karolherbst> not the cast itself
<karolherbst> but should make it easier to fix the compiler pass order
<dj-death> yeah
<dj-death> will see tomorrow, thanks
<karolherbst> mhh.. we have this CL runner in piglit.. let's see if I can make it do something
<dj-death> yeah private didn't help indeed :)
<karolherbst> yo.. how do I make this code run.. this will be header file hell :D
<karolherbst> mhh it didn't crash at least
<karolherbst> ehh wait.. that's still on my box with llvm-16 I think? mhh
<karolherbst> finishing compiling llvm-17 and testing it there shortly
paulk has quit [Ping timeout: 480 seconds]
<karolherbst> but turning it into a kernel can also lead to a lot of things happening or not happening...
paulk has joined #dri-devel
<airlied> jenatali: pretty sure there is an llvm option to turn on symbol versions across the api
<jenatali> Oh cool
jkrzyszt has quit [Ping timeout: 480 seconds]
jsa has joined #dri-devel
sima has quit [Ping timeout: 480 seconds]
simon-perretta-img has quit [Ping timeout: 480 seconds]
simon-perretta-img has joined #dri-devel
<dj-death> karolherbst: any luck with llvm-17?
<karolherbst> recompiling mesa atm
<dj-death> I can install both versions on debian
<karolherbst> dj-death: mhh.. with llvm-17 I indeed run into scratch being used...
<dj-death> it's not too bad
<karolherbst> but at least I have a fairly trivial reproducer now
<dj-death> karolherbst: ah so it's more complicated than just ordering :(
<karolherbst> I'll try to take a look and see if I can fix it for rusticl, because that's a real issue regardless...
<karolherbst> I wonder....
<dj-death> yeah, well let me know :)
<karolherbst> let me diff it...
<karolherbst> yeah....
<karolherbst> it's `nir_opt_deref` not kicking in
<karolherbst> well..
<karolherbst> somewhat
<karolherbst> mhhh.. interesting
<dj-death> there is an additional
<dj-death> 64 %160 = deref_cast (struct.GFX125_3DSTATE_VERTEX_BUFFERS *)%6 (constant struct.GFX125_3DSTATE_VERTEX_BUFFERS) (ptr_stride=20, align_mul=0, align_offset=0)
<karolherbst> yeah...
<karolherbst> so the thing is, that vars_to_ssa is smart and matches the struct field access back to the original source
<karolherbst> but that doesn't happen with the deref_struct thing missing
<karolherbst> this "32 %26 = mov %21" makes it all optimized away in the llvm-16 case
<karolherbst> but I don't know yet what I think would be a good solution to this issue
<karolherbst> this casting to the struct base instead of creating a deref of the first field is really a nonsense thing llvm is doing 🙃 I don't understand why
glennk has quit [Ping timeout: 480 seconds]
<karolherbst> I think we can detect this pattern with explicit type information
<jenatali> Seems like we could detect that in nir_opt_deref?
<karolherbst> and just... workaround it in a dirty way
<karolherbst> yeah
<karolherbst> if you cast a struct to the type of its first member... just do a deref_struct on the first field or something
<jenatali> Right
<jenatali> As long as the cast doesn't add alignment/stride info
<karolherbst> yeah...
<karolherbst> but that sounds like the most pragmatic solution here
<jenatali> It should probably be recursive too...
<karolherbst> I wonder what would be the _in a perfect world with perfect code_ solution tho
<karolherbst> mhhhh
<karolherbst> let's see...
<jenatali> struct outer { struct middle { struct inner { int a; }; }; };
<karolherbst> should be easy to try
<jenatali> Should be able to cast outer* to int*
<karolherbst> but yeah...
<karolherbst> we can follow inner structs until we hit a non struct/array thing
<karolherbst> I suspect LLVM might do the same on arrays 🙃 why stop at structs?
<jenatali> Right, array element 0
<jenatali> Seems like a worthy optimization even if it was an app that was doing it explicitly TBH
<karolherbst> that's the beauty of LLVM: everything is possible
<dj-death> I'm checking why this doesn't get removed by opt_replace_struct_wrapper_cast :
<dj-death> 64 %20 = deref_cast (struct.GFX125_3DSTATE_VERTEX_BUFFERS *)%12 (function_temp struct.GFX125_3DSTATE_VERTEX_BUFFERS) (ptr_stride=0, align_mul=0, align_offset=0)
<karolherbst> the stride probably?
<dj-death> nope :
<dj-death> glsl_get_struct_field_offset(parent->type, 0) != 0
<dj-death> WTF
<karolherbst> ahh yeah
<karolherbst> do you have explicit types at that point?
<karolherbst> ohh also...
<karolherbst> OHHHH
<karolherbst> I remember
<karolherbst> I was debugging that function
<karolherbst> but I forgot on why it didn't work
<karolherbst> lemme debug this :D
heat has quit [Remote host closed the connection]
<karolherbst> dj-death: okay.. so
<karolherbst> in rusticl I pass this check
<karolherbst> but it fails later
<karolherbst> `if (cast->cast.ptr_stride != glsl_get_explicit_stride(field_type))`
<karolherbst> ptr_stride of the cast is 4
<karolherbst> field_type has a stride of 0 :')
jsa has quit []
<jenatali> karolherbst: No, that's not the right check
<karolherbst> my change? I know.. I just forced it to work
<jenatali> If explicit stride is 0 you'd want to compute an implicit stride
<karolherbst> I see...
<karolherbst> in any case, the struct fields have no explicit_stride and therefore we don't do this opt
<dj-death> karolherbst: yeah that works here too
<karolherbst> requires explicit_types before opt_deref tho :)
<jenatali> Right, a uint in a struct has an implicit stride of 4, but it gets cast to a pointer with an explicit stride of... 4
<karolherbst> but yeah
<dj-death> yeah I can work with that
<dj-death> especially for temp variables
<karolherbst> so yeah.. I think my changes were motivated by this.. but it didn't fix the issue because I forgot to deal with that opt not kicking in
<karolherbst> yeah... I just took the opportunity and reworked my pipeline so that I only have to call explicit types exactly once for each var type, so I won't have to reset the scratch_size at all :)
<jenatali> karolherbst: I don't see how you can do that
<karolherbst> well.. I did
<karolherbst> why shouldn't it be possible?
<jenatali> If you need explicit types before opt_deref/opt_memcpy, but then after those optimizations temp variables disappear
<jenatali> If you don't reset scratch size, you'll overallocate, since you still reserve space for variables that got deleted
<karolherbst> so here is the thing
<karolherbst> I run those opts once with and once without explicit types
<karolherbst> or multiple times even
<jenatali> Sure
<karolherbst> so in the end it all works out
TMM has quit [Quit: - Chat comfortably. Anywhere.]
<jenatali> But when you lower to explicit types it sets the scratch size
TMM has joined #dri-devel
<karolherbst> yeah
<karolherbst> but I do that kinda late
<jenatali> And those opts won't actually result in variables getting deleted until after lowering to explicit types
<jenatali> So you still overallocate. If you reset the scratch size and rerun the explicit types pass, you'll get a (much) smaller value
Haaninjo has quit [Quit: Ex-Chat]
<karolherbst> mhh? I don't think I actually overallocate scratch size, because it is able to figure out to dce some/most of the vars? but maybe I actually did miss something here
<jenatali> Which is a design flaw in using that pass to set scratch size, FWIW
<karolherbst> but I didn't encounter anything strange
<jenatali> What do you do based on the scratch size set in the shader info?
<karolherbst> nothing?
<jenatali> I ended up failing to compile shaders because the validator that we run downstream on the DXIL sees that we request an alloca, but then never use it, because all of the temp variables go away between setting the scratch size and emitting code
<karolherbst> mhhh
<karolherbst> maybe we should add a nir validation for this?
<karolherbst> scratch size set, but not scratch ops found?
<karolherbst> *no
<dj-death> yeah
<jenatali> It'd need to be explicitly run. Scratch size is set after lower_to_explicit_types, but you need lower_explicit_io afterwards to actually create the scratch ops
<karolherbst> I mean.. or a deref on temp memory
<dj-death> would need to be moved to io?
<jenatali> Then vars_to_ssa or copy_prop would start failing validation
<karolherbst> we could check for scratch ops or derefs on temporaries
<karolherbst> shouldn't be too hard
<jenatali> Nothing decrements the scratch size once it's set
<jenatali> Because they just leave holes instead. The scratch offsets are baked into the variables after lowering to explicit types
<karolherbst> then those passes nuking all those ops, should set scratch size to 0
<karolherbst> that's kinda the point here in validating it, no?
<jenatali> But they don't necessarily nuke everything
<jenatali> And scratch that looks like { empty space, var } is still bad
<karolherbst> fair
<karolherbst> or we just add a "nir_shrink_memory" pass
<karolherbst> and use it on shared as well
<jenatali> Aka "set scratch size to 0 and re-run lower_to_explicit_types" :)
<karolherbst> mhhh
<karolherbst> sounds like pain :D
<jenatali> And yeah, same problem with shared, but I don't know if as much stuff can remove shared variables
<karolherbst> _but_...
<karolherbst> maybe I should validate on it in debug builds
<karolherbst> do explicit types again and see if it changes the sizes
<jenatali> Hint: It will ;)
<karolherbst> we'll see about that 🙃 though I think what I am doing now gets me pretty close at least
<jenatali> Look at the link you pasted above
<jenatali> scratch: 20
<jenatali> I don't see any load_scratch or store_scratch
<karolherbst> yo.. pain
<jenatali> Yeah
<karolherbst> *sigh*... guess I'll just add it back then
qyliss has quit [Quit: bye]
<karolherbst> so anyway...
<dj-death> karolherbst: still a bit confused by your current fix
<karolherbst> fixing opt_replace_struct_wrapper_cast comes first
<karolherbst> don't think too much about it
<dj-death> karolherbst: why uint has no explicit_stride ?
<karolherbst> I just hacked it
<jenatali> dj-death: Explicit stride is only a thing on arrays and matrices in nir
<jenatali> And deref_cast I guess
<jenatali> Since those can be treated as arrays using deref_ptr_as_array
<dj-death> oh
<dj-death> so you have to trust the deref here
<karolherbst> maybe we should just set explicit_stride on struct members? dunno
<dj-death> yeah
qyliss has joined #dri-devel
<karolherbst> though that gets funky with packed structs
<dj-death> I mean they have a location
<jenatali> I'd want to hear from gfxstrand for this particular issue
<karolherbst> can't we just sneak a fix past her this time? 🙃
mvlad has quit [Remote host closed the connection]
<dj-death> karolherbst: not fixing all the issues though :)
<dj-death> struct delta64 { uint64_t v0; uint64_t v1;
<dj-death> } data = *((global struct delta64 *)&query[qw_offset]);
<dj-death> this local variable ends up in scratch
<jenatali> dj-death: How are you expecting that to not end up in scratch?
<jenatali> Range analysis on qw_offset to realize it can only be 0 or 1 and turn it into a bcsel or something?
<dj-death> jenatali:
<dj-death> jenatali: a simplified example that also ends up in scratch
<jenatali> dj-death: Ohh I see
<jenatali> I misread it at first :)
<karolherbst> mhhh
<karolherbst> my hope is that with the spirv backend we can just turn on -O2 and llvm gives us less silly code :D
<airlied> I admire your confidence (v2)
<karolherbst> I'm sure it will work out in the end
<jenatali> As long as the end isn't like 10 years from now
<dj-death> the problem here is that it's casting the first field which is uint64_t
<dj-death> into a uvec4
<dj-death> interesting
<dj-death> I guess we could add more special casing
<dj-death> if you're casting a struct's first field to a type that covers the entire struct
<dj-death> you actually want the initial deref to the struct
<jenatali> Yeah
<dj-death> 64 %9 = deref_var &data (function_temp struct.delta64)
<dj-death> 64 %38 = deref_struct &%9->field0 (function_temp uint64_t) // &data.field0
<dj-death> 64 %39 = deref_cast (uvec4 *)%38 (function_temp uvec4) (ptr_stride=16, align_mul=0, align_offset=0)
<dj-death> can just replace %39 with %9
<jenatali> Weird...
<jenatali> Yeah maybe not even having to qualify "that covers the entire struct," the only thing you need to make sure is that you're not trying to go the other way and cast to the first member of an inner struct
heat has joined #dri-devel