<DemiMarie>
Is it possible to identify who is to blame for a GPU hang? In native contexts it would be useful to be able to determine which VM is at fault and ban it from using the GPU until the user says otherwise.
<alyssa>
zmike: wasn't planning on it but I can be if you want
<zmike>
I figured since it was so contentious that at least someone would show up, but maybe I didn't skim hard enough to see through the argumentation
<alyssa>
I just don't think we should be breaking piles of apps
<mareko>
is the mesa-commit list discontinued?
<mareko>
zmike: I think I can file cts tickets
<mareko>
just haven't done it
<daniels>
mareko: yep
<mareko>
daniels: somebody asked me about it, I didn't even know that it existed :)
<DemiMarie>
karolherbst: which GPUs?
<karolherbst>
unknown
<DemiMarie>
Any guesses?
<karolherbst>
nope
* DemiMarie
wishes GPUs had full preemption
<karolherbst>
that's not the problem :)
<karolherbst>
even if they had, how would the kernel know what crashed the GPU if the GPU gets suddenly into a weird state and returns nonsense?
<karolherbst>
or the firmware returning nonsense
<karolherbst>
or not responding
<DemiMarie>
The GPU should never be able to be put in such a state unless the kernel driver is buggy.
<karolherbst>
well..
<karolherbst>
it's all software in the end
<karolherbst>
and software has bugs
<karolherbst>
GPU not getting into a weird state is like saying "computers shouldn't be able to get into a weird state"
<DemiMarie>
karolherbst: suppose we ignore KMD and FW bugs for now
<karolherbst>
still
<karolherbst>
you ask for bugfree computers
<DemiMarie>
No
<karolherbst>
a GPU in itself can be considered a full computer.. maybe not a personal one, but definitely on the embedded level
<DemiMarie>
Or my question is bad
<DemiMarie>
So on the CPU, if there is a fault (such as accessing junk memory), the hardware gets control and tells the kernel enough information for the kernel to know what the fault was and what process did it.
<karolherbst>
we all have to accept that an embedded system can and will crash in a way that you can only power cycle it to recover
<karolherbst>
you can't compare GPUs to CPUs
<DemiMarie>
Why?
<karolherbst>
because GPUs are way more complex
<DemiMarie>
Why?
<karolherbst>
because GPUs are more like embedded devices
<DemiMarie>
Why?
<karolherbst>
I got tired :) have fun
<sima>
gpu is essentially a distributed network, there's a pile of things that send messages around, and eventually they reach a network node that talks to the memory subsystem
<sima>
that's the point you get the hw fault
<DemiMarie>
Am I asking questions that only the HW vendor can answer?
<sima>
and the kernel pretty much has to be able to preempt, or things will go sideways due to priority/locking inversion issues
<sima>
so unlike a cpu, where you can preempt a single node, for a gpu you have to preempt that entire distributed network
<sima>
including all the in-flight messages
<sima>
DemiMarie, ^^ and save/restore of what essentially is a cluster is just too hard
<mattst88>
I think since GPUs are expected to be programmed by drivers, their designers don't have to ensure that they're impossible to wedge (unlike a CPU, where that would be basically unforgivable)
<DemiMarie>
sima: I thought CPUs are also distributed internally. Do they just put more work into hiding it?
<sima>
DemiMarie, they're a lot less distributed, and they have enormous amounts of magic to hide their distributed nature from the application
<sima>
because the ISA isn't distributed
<sima>
so you can reuse that for preempting the entire thing
<sima>
with gpu you actually send around these messages explicitly in shaders
<sima>
(well most of them at least)
<DemiMarie>
mattst88: hopefully it will be less forgivable over time, given the rising use of GPUs in secure situations.
<DemiMarie>
sima: Ah, so that is why one needs memory barriers in so many situations where a CPU would never need them.
<sima>
yeah that's one aspect
<DemiMarie>
How does preemption work for compute then?
<sima>
but the overall one that makes preempt/page fault so hard really is that it's a network of nodes having a big chat, and memory i/o is just one of them - kinda like you have storage servers in a cluster
<cmarcelo>
samuelig: thanks. just this ACK here is good for me
<sima>
DemiMarie, badly :-)
<sima>
no idea how it works on others, but on intel, where it does work, preempt essentially sends an interrupt to all the compute cores
<sima>
and they run a special shader which essentially stores the entire thread state into memory somewhere
<sima>
but that means this special preempt shader and your compute kernel need to cooperate
<sima>
and preemption is kinda voluntary
<sima>
and it only works for nodes which support it, and because things get messy for 3d fixed function they just don't
<DemiMarie>
sima: why do they need to cooperate, beyond the need for a memory buffer to save the state?
<sima>
the hw cannot actually save/restore itself, it's kinda like the kernel asking userspace to please store all its state, so that it can nuke the process
<sima>
and then on restart it asks userspace again to recreate everything
<sima>
it's a gigantic mess afaiui
<DemiMarie>
Could the KMD provide the preempt shader?
<sima>
afaiui it's tied to how you compile the compute shader
<DemiMarie>
Or just say, "you have X amount of time before I blow you away?"
<sima>
like if you yolo too much in-flight stuff you can't restore that on the other side
<sima>
yeah it's a timeout
<sima>
but the timeout is huge because register state is like a few mb
<DemiMarie>
What is the timeout?
<sima>
more than a frame iirc
<DemiMarie>
Ugh
<sima>
so forget anything remotely realtime
<DemiMarie>
Hopefully future GPUs will support full preemption of everything.
<sima>
it's getting less
<DemiMarie>
That's good
<sima>
like both intel and amd are switching to the model where any pending page fault prevents preemption
<DemiMarie>
What do you mean?
<sima>
because the thing that hits the page fault is a few network hops away from the compute cores that can save/restore
<sima>
so you cannot preempt while any fault is pending
<DemiMarie>
So that just means no page faults allowed.
<DemiMarie>
Pin all memory.
<sima>
nah it just means kmd engineers are crying a lot
<DemiMarie>
Why?
<DemiMarie>
Because pre-pinned memory management is terrible?
<sima>
yeah
<sima>
so you get to pick essentially between "you get page faults, no pinning yay" and "you get preempt, no gpu hogging, yay"
<sima>
but not at the same time
<sima>
which is a full on inversion of the linux kernel virtual memory handling
<DemiMarie>
For Qubes OS we will pick the latter every time
<DemiMarie>
So make guests pin all their buffers.
<sima>
so you don't necessarily need to pin everything, you /just/ need to guarantee there's enough memory around to fix up any faults while you try to preempt
<DemiMarie>
Lots of room for bugs.
<DemiMarie>
In the Qubes case, the memory will be granted by a Xen guest, so it is already pinned.
<sima>
yeah it's one of these "great for tech demo, so much pain to ship" things
<DemiMarie>
sima: a sysctl or module opt to just force pinning would be nice
<sima>
plan for linux is that you get cgroups to make sure there's not memory thrashing that hurts
<DemiMarie>
That way those who do not need page faults can avoid the attack surface.
<sima>
but we had that for like decade plus as a plan by now :-/
<DemiMarie>
sima: why is pre-pinned memory management so bad?
<sima>
people love memory overcommit
<sima>
with pinning everything you need enough memory for everything for its worst case
<DemiMarie>
Make that userspace's job
<DemiMarie>
It can pin and unpin buffers at will
<sima>
which is a lot more than just enough for the average case and let the kernel balance usage
<sima>
but in the end, if you want real-time then "pin it all" really is the only option
<sima>
irrespective of gpu or not
<DemiMarie>
Hence why PipeWire has an option for mlockall()
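(For reference, the mlockall() pattern referred to here is tiny; a minimal sketch, assuming a Linux process with CAP_IPC_LOCK or a large enough RLIMIT_MEMLOCK:)

    #include <sys/mman.h>
    #include <stdio.h>

    int main(void)
    {
        /* Lock every current and future page of this process into RAM so
         * real-time threads never take a page fault. */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
            perror("mlockall");
        /* ... real-time work ... */
        return 0;
    }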
<DemiMarie>
sima: is the kernel command-line option to disable overcommit a reasonable idea?
<DemiMarie>
Or could it be emulated in the native context code?
<DemiMarie>
This also explains Apple's decision to make AR fully declarative: it means they can guarantee real-time behavior, because there are no app-provided shaders in the code that updates what the user sees in response to the real world changing.
<DemiMarie>
sima: thanks for taking some time to explain all of this!!! It means a lot to me.
<DemiMarie>
So for my use-case, I care less about ensuring that the GPU doesn't crash (so long as the crash is non-exploitable) than about ensuring that there is someone to blame.
<DemiMarie>
I want the userspace VMM to be able to distinguish between `GUILTY_CONTEXT` and `INNOCENT_CONTEXT`.
<alyssa>
karolherbst: dj-death: forgot about the "designated initializers cause spilling because nir_opt_memcpy chokes" issue
<alyssa>
gfxstrand: karolherbst and I talked about it back in October or so, and then I think we forgot, or at least I did
<karolherbst>
same
<karolherbst>
but did faith's MR help with that? I think there was more to do, like... something with opt_memcpy or something
<alyssa>
I don't think it did but I might not have tested
<karolherbst>
it also kinda depends when you run it
<karolherbst>
ohh right.. there was `copy_deref` doing something similar
<karolherbst>
and memcpy lowering could translate memcpy_deref to copy_deref or something
<karolherbst>
in any case...
<jenatali>
Right, you have to lower vars to explicit types, and then opt_memcpy should be able to turn it into a copy
<karolherbst>
if the copy between derefs is huge, you can end up with tons of live values
<karolherbst>
mhh yeah.. I might have to check again as I did reorder things again
<jenatali>
(But then you want to erase scratch size and lower vars to explicit types again to recompute how much scratch is actually needed after that optimization...)
<karolherbst>
but yeah.. we need to lower those copies to loops, not unroll them directly
<alyssa>
iirc there were a bunch of related memcpy issues I hit
<DemiMarie>
karolherbst: sorry for tiring you with the repeated "why".
<karolherbst>
don't worry, I was also literally tired anyway and had to finish other things
<DemiMarie>
It now makes more sense why graphics is not preemptable: It must be so fast (to avoid user complaints) that it is faster to simply wipe out the state and force applications to recompute it.
<DemiMarie>
Reset can be done by a broadcast signal in hardware that forces everything to a known state, irrespective of what state it had been in. This is much simpler and cheaper (both in time and in transistors) than trying to save that state for later restoration.
<zmike>
mareko pepp: it looks like radeonsi doesn't do any kind of checking with sample counts for e.g., rgb9e5? so si_is_format_supported will return true for sample_count==8
<zmike>
is this somehow intended? I'm skeptical that you really support this
<mareko>
zmike: it's always supported by shader images, it's supported by render targets only if the format is supported
<zmike>
you support multisampled rgb9e5 in shader images?
<mareko>
yes
<zmike>
huh
<zmike>
and textures?
<mareko>
texelFetch only
<zmike>
hm
<mareko>
it's the same for all MSAA
<mareko>
it's just 32bpp memory layout with a 32bpp format, then the texture and render hw just needs to handle the format conversion to/from FP32 and FP16
<sima>
DemiMarie, afaik that's also what the big compute cluster people do, with checkpoints to permanent storage thrown in often enough that you don't have to throw away too much computation time when something, anything really, goes wrong
<sima>
so also reset and recover from known-good application state, including anything gpus do
<DemiMarie>
sima: that indeed makes sense. IIUC it is expected to have to change at some point, because failures will be too frequent, but I'm not sure if that point has been reached yet.
<DemiMarie>
sima: interestingly AGX can do resets so quickly that one can reset every frame and still have a usable desktop.
<DemiMarie>
That's what Asahi did before Lina figured out TLB flushing.
<sima>
yeah reset tends to be fairly quick, the slow part is waiting long enough to not piss off users too much about the fallout
<DemiMarie>
Waiting for what?
<DemiMarie>
Also, is there any information available as to what caused the reset?
<DemiMarie>
I want to throw up a dialog to the user saying, "VM X crashed the GPU."
<sima>
arb_robustness tells you why you died (i.e. guilty or innocent collateral damage)
<sima>
but it's very driver specific, and you need to be the userspace that created the gpu ctx
<sima>
plus I think aside from amdgpu and intel no one even implements that, you just get a "you died"
<DemiMarie>
Does Vulkan have something similar, and is this information something that the native context implementation could collect?
<sima>
if that
<DemiMarie>
sima: thankfully AMDGPU and Intel are the ones I care about the most by far
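(A minimal sketch of the arb_robustness query sima describes, assuming a GL context created with reset notification enabled; the entry point and enums are from the GL_ARB_robustness spec, and in practice you would load the function via glXGetProcAddress/eglGetProcAddress. On the Vulkan side, VK_EXT_device_fault / vkGetDeviceFaultInfoEXT covers similar ground.)

    #define GL_GLEXT_PROTOTYPES 1
    #include <GL/gl.h>
    #include <GL/glext.h>

    /* Returns a human-readable verdict after a reported GPU reset. */
    static const char *reset_verdict(void)
    {
        switch (glGetGraphicsResetStatusARB()) {
        case GL_NO_ERROR:                   return "no reset observed";
        case GL_GUILTY_CONTEXT_RESET_ARB:   return "this context caused the reset";
        case GL_INNOCENT_CONTEXT_RESET_ARB: return "collateral damage from someone else's work";
        case GL_UNKNOWN_CONTEXT_RESET_ARB:  return "reset, cause unknown";
        default:                            return "unexpected status";
        }
    }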
<DemiMarie>
sima: is this a hardware or driver limitation?
<karolherbst>
it was to fix some llvm-17 issue though
<DemiMarie>
Calandracas: statically link LLVM in all of its callers and ensure its symbols are never exported?
<Calandracas>
that isn't really ideal. What we ended up doing is splitting all of the shared libs into their own packages for each version (libclang15 package provides libclang.so.15, libclang17 package provides libclang.so.17) etc.
<karolherbst>
Calandracas: have you tried to see if it also works if all of them get loaded into the same process?
<karolherbst>
because apparently that doesn't work in all distributions
<karolherbst>
and it's actually a cursed use case
<karolherbst>
but a real one
<Calandracas>
The ugly part is needing to have a full toolchain for each llvm version. What alpine does is install each version in /usr/lib/llvm$VERSION/
<karolherbst>
oh yeah, that as well
<Calandracas>
Fedora installs the latest release to /usr but "compat" packages go to /usr/lib/llvm$VERSION
<karolherbst>
and making sure that things like "CLANG_RESOURCE_DIR" stay consistent with the installation directory relative to the so file
<Calandracas>
it then turns into a mess of symlinking things from /usr/lib/llvm$VERSION to /usr
<karolherbst>
though I only got reports of fedora messing that up and only for some packages? kinda works if you compile mesa from git
<karolherbst>
yeah....
<tnt>
karolherbst: and then there are some applications doing stuff with RTLD_DEEPBIND, which doesn't help either.
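(A rough sketch of the "several LLVMs in one process" experiment being discussed: load two libLLVM sonames with RTLD_LOCAL so their unversioned symbols don't interpose on each other. The library file names are illustrative, and RTLD_DEEPBIND in a caller changes lookup order again, which is part of why results differ between distributions.)

    #include <dlfcn.h>
    #include <stdio.h>

    int main(void)
    {
        /* Keep each library's symbols out of the global namespace. */
        void *llvm15 = dlopen("libLLVM-15.so.1", RTLD_NOW | RTLD_LOCAL);
        void *llvm17 = dlopen("libLLVM-17.so.1", RTLD_NOW | RTLD_LOCAL);
        if (!llvm15 || !llvm17)
            fprintf(stderr, "dlopen failed: %s\n", dlerror());
        /* With RTLD_GLOBAL (or an executable linked directly against one
         * libLLVM), whichever version got its symbols in first wins. */
        return 0;
    }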
<karolherbst>
the "realpath" part was needed for debian having weird symlinks :D
<Calandracas>
for context, I'm talking about void linux, which just merged llvm17 last week
<airlied>
in theory if you build all the llvm versions with proper sym versioning it should work, in practice it's all screwed
<karolherbst>
yeah...
<karolherbst>
though at least on fedora I had three CL impls using a different llvm version (15/16/17) and it was fine....
<karolherbst>
which surprised me tbh
<Calandracas>
well it's fine when llvm is only used as a library, and applications can link whatever soname they want
<Calandracas>
but having conflicting versions of llvm-config, cmake files, etc. is what causes issues
<karolherbst>
yeah.... that's also painful
<Calandracas>
+ musl, armv6l, and armv7l patches
<Calandracas>
but thats a different issue altogether
<dj-death>
airlied: yes
<dj-death>
karolherbst: I guess I could try to reproduce the order in which you run NIR passes
<dj-death>
karolherbst: hopefully that works
<dj-death>
I have doubts
<karolherbst>
the idea behind my changes was that I need to run nir_lower_memcpy after explicit_types and some deref opt loop
<karolherbst>
and the deref opt loop does more opts if explicit type information exists
<karolherbst>
s/loop//
<dj-death>
karolherbst: I kind of end up with the same code if I pass the arguments by value instead of by pointer :)
<dj-death>
stuff spills everywhere
<DemiMarie>
airlied: considering that proprietary games might not use symbol versioning I am not surprised that there are problems there.
<jenatali>
It's not really about symbol versioning, and if it was, it'd be about LLVM using symbol versioning. The problem is the lack of symbol namespacing
<karolherbst>
dj-death: yeah.. that's LLVM being LLVM
<jenatali>
You'd want component A in a process to use LLVM A, and component B to use LLVM B, but since the symbols are global, you can get the components using the wrong version
<karolherbst>
it's kinda sad that LLVM IR stopped being useful for a middle-end IR :'(
<DemiMarie>
jenatali: I thought symbol versioning solved that by including the versions (which are different) in the symbol that the dynamic linker looked up.
<karolherbst>
now it's just a backend one
<jenatali>
Demi: "Solved"
<jenatali>
Still requires LLVM to put versions on their symbols
<DemiMarie>
and they don't?
<jenatali>
Not AFAIK
<DemiMarie>
could this be forced during compilation?
<DemiMarie>
but yeah, until that is dealt with static linking seems like the safest option
<jenatali>
My knowledge in this space is really just based on how things are different from Windows
<dj-death>
karolherbst: so I suppose if I try to compile the same code with rusticl, I'll end up with spills every where
<DemiMarie>
and any distro that doesn't like it can deal with the bug reports
<vsyrjala>
llvm even leaks its compile time options into the abi. i once tried to turn off the nvptx (or whatever) thing in my system llvm. everything linked against llvm stopped working because of some missing global symbol
<karolherbst>
dj-death: maybe? Wouldn't surprise me, but it also kinda depends how the codegen goes
<karolherbst>
anyway.. do you have a dump of the code being compiled?
<karolherbst>
like.. the thing passed into llvm
<karolherbst>
In hindsight I should have added `dump_clc` to `CLC_DEBUG`, but uhh.. that's kinda not useful in the CL context
<karolherbst>
this "32 %26 = mov %21" makes it all optimized away in the llvm-16 case
<karolherbst>
but I don't know yet what I think would be a good solution to this issue
<karolherbst>
this casting to the struct base instead of creating a deref of the first field is really a nonsense thing llvm is doing... I don't understand why
<karolherbst>
I think we can detect this pattern with explicit type information
<jenatali>
Seems like we could detect that in nir_opt_deref?
<karolherbst>
and just... workaround it in a dirty way
<karolherbst>
yeah
<karolherbst>
if you cast a struct to the type of its first member... just do a deref_struct on the first field or something
<jenatali>
Right
<jenatali>
As long as the cast doesn't add alignment/stride info
<karolherbst>
yeah...
<karolherbst>
but that sounds like the most pragmatic solution here
<jenatali>
It should probably be recursive too...
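(Roughly the rewrite being discussed, in the style of nir_opt_deref's existing opt_replace_struct_wrapper_cast: if a deref_cast re-types a struct pointer to the type of its first field at offset 0, replace it with a deref_struct of field 0. This is only a sketch assuming the Mesa source tree; helper spellings vary between branches and the stride/alignment checks jenatali mentions are elided.)

    #include "nir.h"
    #include "nir_builder.h"

    static bool
    replace_struct_wrapper_cast(nir_builder *b, nir_deref_instr *cast)
    {
        nir_deref_instr *parent = nir_src_as_deref(cast->parent);
        if (!parent || !glsl_type_is_struct(parent->type))
            return false;

        /* Only handle a cast to the first field, which must sit at offset 0
         * and have the same type as the cast result. */
        if (glsl_get_length(parent->type) == 0 ||
            glsl_get_struct_field_offset(parent->type, 0) != 0 ||
            glsl_get_struct_field(parent->type, 0) != cast->type)
            return false;

        nir_deref_instr *field = nir_build_deref_struct(b, parent, 0);
        nir_def_rewrite_uses(&cast->def, &field->def);
        return true;
    }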
<karolherbst>
I wonder what would be the _in a perfect world with perfect code_ solution tho
<jenatali>
karolherbst: No, that's not the right check
<karolherbst>
my change? I know.. I just forced it to make it work
<jenatali>
If explicit stride is 0 you'd want to compute an implicit stride
<karolherbst>
I see...
<karolherbst>
in any case, the struct fields have no explicit_stride and therefore we don't do this opt
<dj-death>
karolherbst: yeah that works here too
<karolherbst>
requires explicit_types before opt_deref tho :)
<jenatali>
Right, a uint in a struct has an implicit stride of 4, but it gets cast to a pointer with an explicit stride of... 4
<karolherbst>
but yeah
<dj-death>
yeah I can work with that
<dj-death>
especially for temp variables
<karolherbst>
so yeah.. I think my changes were motivated by this.. but it didn't fix the issue because I forgot to deal with that opt not kicking in
<karolherbst>
yeah... I just took the opportunity and reworked my pipeline so that I only have to call explicit types exactly once for each var type, so I won't have to reset the scratch_size at all :)
<jenatali>
karolherbst: I don't see how you can do that
<karolherbst>
well.. I did
<karolherbst>
why shouldn't it be possible?
<jenatali>
If you need explicit types before opt_deref/opt_memcpy, but then after those optimizations temp variables disappear
<jenatali>
If you don't reset scratch size, you'll overallocate, since you still reserve space for variables that got deleted
<karolherbst>
so here is the thing
<karolherbst>
I run those opts once with and once without explicit types
<jenatali>
But when you lower to explicit types it sets the scratch size
<karolherbst>
yeah
<karolherbst>
but I do that kinda late
<jenatali>
And those opts won't actually result in variables getting deleted until after lowering to explicit types
<jenatali>
So you still overallocate. If you reset the scratch size and rerun the explicit types pass, you'll get a (much) smaller value
<karolherbst>
mhh? I don't think I actually overallocate scratch size, because it's able to DCE some/most of the vars? but maybe I actually did miss something here
<jenatali>
Which is a design flaw in using that pass to set scratch size, FWIW
<karolherbst>
but I didn't encounter anything strange
<jenatali>
What do you do based on the scratch size set in the shader info?
<karolherbst>
nothing?
<jenatali>
I ended up failing to compile shaders because the validator that we run downstream on the DXIL sees that we request an alloca, but then never use it, because all of the temp variables go away between setting the scratch size and emitting code
<karolherbst>
mhhh
<karolherbst>
maybe we should add a nir validation for this?
<karolherbst>
scratch size set, but not scratch ops found?
<karolherbst>
*no
<dj-death>
yeah
<jenatali>
It'd need to be explicitly run. Scratch size is set after lower_to_explicit_types, but you need lower_explicit_io afterwards to actually create the scratch ops
<karolherbst>
I mean.. or a deref on temp memory
<dj-death>
would need to be moved to io?
<jenatali>
Then vars_to_ssa or copy_prop would start failing validation
<karolherbst>
we could check for scratch ops or derefs on temporaries
<karolherbst>
shouldn't be too hard
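(A sketch of that check: report whether a shader still contains any scratch access or any deref of a function_temp variable, so a non-zero scratch_size with no such instruction can be flagged in debug builds. The helper itself is hypothetical; the iterators and intrinsic names are NIR's, assuming the Mesa source tree.)

    #include "nir.h"

    static bool
    shader_uses_scratch_or_temps(nir_shader *s)
    {
        nir_foreach_function_impl(impl, s) {
            nir_foreach_block(block, impl) {
                nir_foreach_instr(instr, block) {
                    if (instr->type == nir_instr_type_intrinsic) {
                        nir_intrinsic_instr *intr = nir_instr_as_intrinsic(instr);
                        if (intr->intrinsic == nir_intrinsic_load_scratch ||
                            intr->intrinsic == nir_intrinsic_store_scratch)
                            return true;
                    } else if (instr->type == nir_instr_type_deref &&
                               nir_deref_mode_is(nir_instr_as_deref(instr),
                                                 nir_var_function_temp)) {
                        return true;
                    }
                }
            }
        }
        return false;
    }

    /* e.g. assert(s->scratch_size == 0 || shader_uses_scratch_or_temps(s)); */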
<jenatali>
Nothing decrements the scratch size once it's set
<jenatali>
Because they just leave holes instead. The scratch offsets are baked into the variables after lowering to explicit types
<karolherbst>
then those passes nuking all those ops, should set scratch size to 0
<karolherbst>
that's kinda the point here in validating it, no?
<jenatali>
But they don't necessarily nuke everything
<jenatali>
And scratch that looks like { empty space, var } is still bad
<karolherbst>
fair
<karolherbst>
or we just add a "nir_shrink_memory" pass
<karolherbst>
and use it on shared as well
<jenatali>
Aka "set scratch size to 0 and re-run lower_to_explicit_types" :)
<karolherbst>
mhhh
<karolherbst>
sounds like pain :D
<jenatali>
And yeah, same problem with shared, but I don't know if as much stuff can remove shared variables
<karolherbst>
_but_...
<karolherbst>
maybe I should validate on it in debug builds
<karolherbst>
do explicit types again and see if it changes the sizes
<jenatali>
Hint: It will ;)
<karolherbst>
we'll see about that... though I think what I am doing now gets me pretty close at least
<jenatali>
Look at the link you pasted above
<jenatali>
scratch: 20
<jenatali>
I don't see any load_scratch or store_scratch
<karolherbst>
yo.. pain
<jenatali>
Yeah
<karolherbst>
*sigh*... guess I'll just add it back then
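(The ordering jenatali is arguing for, as a sketch of a driver-side compile pipeline fragment; the pass names are NIR's, while the glue and the choice of address format are illustrative: lower temps to explicit types, which sets scratch_size, let the deref/memcpy opts delete variables, then zero scratch_size and lower again so the final size only covers what survived.)

    NIR_PASS(_, nir, nir_lower_vars_to_explicit_types, nir_var_function_temp,
             glsl_get_natural_size_align_bytes);
    NIR_PASS(_, nir, nir_opt_memcpy);
    NIR_PASS(_, nir, nir_opt_deref);
    NIR_PASS(_, nir, nir_opt_dce);

    /* Forget the pre-optimization estimate and recompute it. */
    nir->scratch_size = 0;
    NIR_PASS(_, nir, nir_lower_vars_to_explicit_types, nir_var_function_temp,
             glsl_get_natural_size_align_bytes);
    NIR_PASS(_, nir, nir_lower_explicit_io, nir_var_function_temp,
             nir_address_format_32bit_offset);   /* emits load/store_scratch */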
<karolherbst>
so anyway...
<dj-death>
karolherbst: still a bit confused by your current fix
<karolherbst>
fixing opt_replace_struct_wrapper_cast comes first
<karolherbst>
don't think too much about it
<dj-death>
karolherbst: why uint has no explicit_stride ?
<karolherbst>
I just hacked it
<jenatali>
dj-death: Explicit stride is only a thing on arrays and matrices in nir
<jenatali>
And deref_cast I guess
<jenatali>
Since those can be treated as arrays using deref_ptr_as_array
<dj-death>
oh
<dj-death>
so you have to trust the deref here
<karolherbst>
maybe we should just set explicit_stride on struct members? dunno
<dj-death>
yeah
<karolherbst>
though that gets funky with packed structs
<dj-death>
I mean they have a location
<jenatali>
I'd want to hear from gfxstrand for this particular issue
<karolherbst>
can't we just sneak a fix past her this time?
<dj-death>
karolherbst: not fixing all the issues though :)
<jenatali>
Yeah, maybe without even having to qualify "that covers the entire struct"; the only thing you need to make sure is that you're not trying to go the other way and cast to the first member of an inner struct