<karolherbst>
daniels: that's exactly the point :)
<karolherbst>
I have _no_ idea what happened there, but it's clearly something funky
<karolherbst>
just wanted to share it before I ignore it and move on :)
<karolherbst>
(in case we have more of such false negatives people just ignore and move on)
junaid has joined #dri-devel
junaid has quit [Remote host closed the connection]
<karolherbst>
airlied: mhh.. seems like that WGP mode only causes problems on some tests when running vec16... maybe it's just some compiler bug somewhere in the end
rcf has quit [Quit: WeeChat 3.8]
Duke`` has quit [Ping timeout: 480 seconds]
gouchi has joined #dri-devel
gouchi has quit [Remote host closed the connection]
kts_ has joined #dri-devel
Duke`` has joined #dri-devel
kts_ has quit []
kts has joined #dri-devel
gouchi has joined #dri-devel
kts_ has joined #dri-devel
kts has quit [Ping timeout: 480 seconds]
nehsou^ has joined #dri-devel
kts_ has quit [Remote host closed the connection]
smiles_ has joined #dri-devel
smilessh has quit [Ping timeout: 480 seconds]
Danct12 has quit [Quit: How would you be so reckless with someone's heart!?]
JohnnyonFlame has quit [Read error: Connection reset by peer]
AndrewR has joined #dri-devel
<AndrewR>
karolherbst, Finally updated llvm so mesa git compiles fully again (but I think it demands a newer bindgen than 0.60 .. the 0.65 from Slackware current works ..)
<karolherbst>
mhhh, yeah, might be plausible. If your rustc toolchain uses your system LLVM then I can see why it needs some updates
<AndrewR>
karolherbst, yeah, in Slackware rust is linked dynamically to llvm-libs .. so updating it basically means installing a second version of llvm alongside it so rustc will not die yet...
<AndrewR>
karolherbst, I plan to re-compile rust one of those days ...
<AndrewR>
..also, a russian-speaking user surfaced on our big black website, so I discovered my patches for x265 on aarch64 were incorrect, and I updated them (as part of the cinelerra-gg bundled libs) ...
gouchi has quit [Remote host closed the connection]
<AndrewR>
... I was also reading the psychtoolbox-3 code just for the comments. One of the few applications making use of 30bpc mode on amd gpus ...
junaid has joined #dri-devel
kts has quit [Quit: Konversation terminated!]
kts has joined #dri-devel
<karolherbst>
airlied, mareko: mhh, seems like forcing wave32 for compute fixes it... I honestly have no idea about all the implications here but arsenm suggested to always use 32 for compute
<bnieuwenhuizen>
on RDNA3 using wave32 has some important perf disadvantages though
<karolherbst>
for compute?
<bnieuwenhuizen>
should be yeah
<bnieuwenhuizen>
the dual issue fp32 gets way more difficult
<karolherbst>
I don't know if it's important, but luxmark-luxball gives me ~0.5% higher scores with wave64
<bnieuwenhuizen>
that isn't RDNA3 though is it?
<karolherbst>
uhhh.... right, that's RDNA2
<karolherbst>
I think
<karolherbst>
navi22
<bnieuwenhuizen>
yeah
<karolherbst>
anyway... my kernel gets nuked in CU mode :'(
<karolherbst>
no idea why
<karolherbst>
at least some
<karolherbst>
I have no idea what's up
<karolherbst>
but it seems to be working with wave32
<karolherbst>
we can potentially also just do things differently for SHADER_KERNEL, but I also kinda want to know why that happens
kzd has joined #dri-devel
jrayhawk has quit [Quit: lv management]
junaid has quit [Remote host closed the connection]
Company has joined #dri-devel
jrayhawk has joined #dri-devel
kts has quit [Remote host closed the connection]
krushia has quit [Quit: Konversation terminated!]
heat has joined #dri-devel
heat has quit [Remote host closed the connection]
heat has joined #dri-devel
YuGiOhJCJ has quit [Ping timeout: 480 seconds]
YuGiOhJCJ has joined #dri-devel
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
kts has joined #dri-devel
fab has quit [Quit: fab]
fab has joined #dri-devel
fab has quit []
smiles_ has quit [Ping timeout: 480 seconds]
fab has joined #dri-devel
<mareko>
karolherbst: what does it mean "nuked"?
<karolherbst>
ehhh... GPU reset
<mareko>
what is the resource usage of the kernel?
<karolherbst>
I didn't check.. let me boot the machine up again
<mareko>
instead of having to use MESA_SHADER_KERNEL and getting broken because of that, it would be better to use MESA_SHADER_COMPUTE and have "is_kernel" in the shader info
<karolherbst>
mhhh.. maybe that would have been the better approach here
<karolherbst>
anyway, the block is 512x1x1
fab has joined #dri-devel
<mareko>
I think the problem is there is not enough VGPRs
<karolherbst>
and that matters with CU vs WGP mode?
<karolherbst>
the exact same thing works just fine in WGP mode
<karolherbst>
but it's plausible it's caused by launching too many threads.. I could check that
<mareko>
144 VGPRs means 3 waves per SIMD
<mareko>
the CU has 2 SIMDs, so 6 waves at most, but 512x1x1 is 8 waves
<karolherbst>
yep
<karolherbst>
works fine with 256 threads
<mareko>
the WGP has 4 SIMDs, so 12 waves at most, so it fits
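[editor's note: a back-of-envelope check of the occupancy arithmetic above. The per-SIMD wave64 VGPR budget of 512 is an assumption consistent with mareko's figures, not something stated in the chat.]

```python
# Sanity-check the CU-vs-WGP occupancy math from the discussion above.
VGPR_BUDGET = 512   # assumed allocatable VGPRs per SIMD for a wave64 wave
WAVE_SIZE = 64

def max_waves(vgprs_per_wave, num_simds):
    """Waves that fit on num_simds SIMDs, limited only by VGPR usage."""
    return (VGPR_BUDGET // vgprs_per_wave) * num_simds

waves_needed = 512 // WAVE_SIZE                 # a 512x1x1 block = 8 wave64 waves
print(max_waves(144, 2), max_waves(144, 4), waves_needed)   # 6 12 8
# CU mode (2 SIMDs) fits only 6 < 8 waves; WGP mode (4 SIMDs) fits 12 >= 8.
# Capping the shader at 128 VGPRs (spilling the rest) would give
# 512 // 128 * 2 = 8 waves, which is exactly enough for the CU.
```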
fab_ has joined #dri-devel
<karolherbst>
I guess we'll have to fix si_get_compute_state_info then
<mareko>
I think LLVM miscompiled that shader
fab_ is now known as Guest2661
<karolherbst>
ahh
<karolherbst>
ohhh...
<karolherbst>
it should use more SGPRs instead?
<mareko>
it should have capped VGPR usage to 128 and spill the rest
<mareko>
so that it would fit on the CU
<karolherbst>
the header is passed to LLVM, right?
<karolherbst>
or is there maybe a flag radeonsi would have to set to tell LLVM to assume CU mode or something?
<karolherbst>
maybe we should move this to #radeon as arsenm is there
<mareko>
no
fab has quit [Ping timeout: 480 seconds]
Guest2661 is now known as fab
<mareko>
this is another case of MESA_SHADER_KERNEL not being handled
<karolherbst>
ahh
fab is now known as Guest2662
<karolherbst>
mhhh... annoying
kts has quit [Remote host closed the connection]
<karolherbst>
maybe we really should get rid of KERNEL and move it into shader_info
<karolherbst>
shader_info.cs.is_kernel
<mareko>
yes that could fix a lot of things
<karolherbst>
yeah.. let's do that
kts has joined #dri-devel
DottorLeo has joined #dri-devel
junaid has joined #dri-devel
<mareko>
if it spills and you don't want spilling, we can implement and enable the WGP mode properly
<karolherbst>
would it spill into SGPRs first?
<karolherbst>
not sure how much of a perf difference it would make
<mareko>
no
<mareko>
SGPRs are the scarcest resource
<mareko>
SGPRs are spilled to VGPRs, and VGPRs are spilled to memory
<karolherbst>
ahhh
<karolherbst>
guess I got it backwards then
<mareko>
VGPRs are 2048-bit vector registers in wave64, while SGPRs are always 32-bit
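[editor's note: the width figures above work out as follows; lane width of 32 bits is the standard assumption.]

```python
# One VGPR holds a 32-bit value per lane of the wave, so in wave64 it is
# a 2048-bit vector register; an SGPR is always a single 32-bit value.
wave64_lanes = 64
lane_bits = 32
vgpr_bits = wave64_lanes * lane_bits
sgpr_bits = lane_bits
print(vgpr_bits, sgpr_bits)   # 2048 32
```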
<HdkR>
Map SGPRs to NVIDIA's Uniform Registers, and VGPRS to NVIDIA's "registers", close enough approximation that it works out :P
<karolherbst>
yeah I know, I just thought scalar registers were per-thread :D
<karolherbst>
uhhh.. anv already makes use of the KERNEL stage in very annoying ways
junaid has quit [Remote host closed the connection]
oneforall2 has quit [Remote host closed the connection]
oneforall2 has joined #dri-devel
DottorLeo has quit [Quit: Konversation terminated!]
<DemiMarie>
Why do GPUs use ioctl() and not io_uring?
<karolherbst>
because it won't change anything really
<karolherbst>
though maybe it has some advantages inside pointless microbenchmarks...
<karolherbst>
but anyway, the real answer is: when mesa started there was just ioctl, and moving to io_uring doesn't really pay off.. probably
<psykose>
it's also usually painful to have to actually maintain 2 ways of doing something at once just for one to be 2% faster
<DemiMarie>
karolherbst: how can a GPU reset be anything but a kernel or firmware bug?
<psykose>
unless all old kernel support is dropped you can't remove the non-io-uring one
<karolherbst>
DemiMarie: ask AMD
<karolherbst>
on AMD it's either full reset or broken GPU
<karolherbst>
and if your shader hangs, for whatever reason, it's a full reset
<karolherbst>
but anyway
<karolherbst>
you can't validate GPU commands
<karolherbst>
sooo.. userspace has the power of causing GPU resets
krushia has joined #dri-devel
<DemiMarie>
1. Why can’t you validate GPU commands?
<DemiMarie>
2. Shouldn’t the GPU be robust against invalid commands?
<karolherbst>
halting problem
<mareko>
the GPU only has single-tasking for gfx and any gfx app can use an infinite loop, so the kernel has to stop it with a reset, it's a feature
<karolherbst>
I don't know if I'd go so far as to call it a feature, but there is nothing else you can do, soo....
<DemiMarie>
Are Intel and Nvidia better when it comes to GPU error recovery?
<karolherbst>
yes
<mareko>
freeze or reset, your choice
<DemiMarie>
mareko: to me this is a hardware bug
<karolherbst>
it is
<karolherbst>
the hardware is incapable of recovery
<mareko>
DemiMarie: halting problem is a hw bug?
<karolherbst>
but new gens are supposed to implement some sort of recovery
<karolherbst>
mareko: no, having to reset the entire GPU is
<karolherbst>
at least the AMD way
<mareko>
no we don't
<karolherbst>
VRAM content is lost, no?
<karolherbst>
or rather.. why does my display state get fully reset on GPU resets 🙃
<karolherbst>
it doesn't have to be this way
<mareko>
there are levels of resets
<karolherbst>
other GPUs can just kill a GPU context and move on
<DemiMarie>
A properly-designed GPU can support two or more mutually distrusting users, such that one user cannot interfere with the other user’s use of the GPU.
<karolherbst>
yeah.. I agree
<mareko>
if a shader doesn't finish, all shaders of that process are killed
<krushia>
ever run into webgl malware (or poorly coded shaders on a web site)? it isn't fun
<karolherbst>
I generally see more stuff getting killed
<DemiMarie>
My understanding is that Intel, Nvidia, and Apple GPUs all meet this criterion (modulo implementation bugs).
<mareko>
karolherbst: that's the next level
<karolherbst>
yeah.. which I run into today with the wgp issue :)
<DemiMarie>
mareko: that is arguably a blocker for virtGPU native contexts.
<DemiMarie>
A VM must not be able to crash the host.
<karolherbst>
AMD is improving the situation
<mareko>
DemiMarie: it's also a blocker for virgl
<mareko>
DemiMarie: if you want to argue that way
<karolherbst>
but yeah.. it's kinda messy
<karolherbst>
well.. other vendors are better at it
<DemiMarie>
mareko: could virgl add explicit preemption checks during compilation?
<karolherbst>
mhh.. should be possible
<karolherbst>
but it's not only shader code
<karolherbst>
like .. my bug today was the shader just using too many registers causing GPU resets...
<mareko>
I'll just say that these assumptions and questions make no sense with how things work, virgl can't do anything and virgl is also a security hole for the whole VM
<karolherbst>
there is a lot of things the kernel/host side would have to deal with
<mareko>
security and isolation is not something virgl does well
<karolherbst>
the point is rather, that a guest could constantly DOS the entire host
<karolherbst>
but yeah...
<karolherbst>
GPU isolation in general is a practically impossible problem
<karolherbst>
though nvidia does support partitioning the GPU on the hardware level
<mareko>
the native context does isolation properly, venus also does it properly, virgl doesn't
<karolherbst>
and you can assign isolated parts of the GPU to guests which do not interfere with each other
<DemiMarie>
mareko: are virtGPU native contexts better at this?
<mareko>
yes
<DemiMarie>
Intel also supports SR-IOV on recent iGPUs (Alder Lake, IIRC)
<karolherbst>
yeah.. SR-IOV is probably the only isolation mode which actually works
<DemiMarie>
karolherbst: why do virtGPU native contexts not work?
<mareko>
but the gfx queue can only execute 1 job at a time, so if a job takes 2 seconds, it will block everybody for 2 seconds even if it doesn't trigger the reset
<Lynne>
I wish amd's gpus didn't fall over for something as simple as doing unchecked out of bounds access
<karolherbst>
DemiMarie: because you don't properly isolate, but it could be good enough depending on the hardware/driver
<DemiMarie>
In a way that does not lead to circular reasoning
sima has quit [Ping timeout: 480 seconds]
<karolherbst>
hw level isolation of tasks and resources
<karolherbst>
but stalls is something which is probably impossible to fix, but also not really problematic
<karolherbst>
well.. depends
<karolherbst>
don't want to do RT sensitive things on shared GPUs
heat_ has quit [Read error: Connection reset by peer]
heat_ has joined #dri-devel
<karolherbst>
but I've also seen nvidia not being able to recover properly 🙃 sometimes the GPU just disconnects itself from the PCIe bus
<mareko>
I love air conditioning
<karolherbst>
I wished I had some
<zmike>
air conditioning++
<mareko>
karolherbst: actually compute and RT has shader-instruction-level preemption on AMD hw, not sure if enabled though
<mareko>
outside of ROCm
<karolherbst>
probably not for graphics
<karolherbst>
but I was considering enabling it for some drivers
<karolherbst>
it's not nice if compute can stall the gfx pipelines
<mareko>
surprisingly compute shader-level preemption shouldn't need any userspace changes AFAIK, but you need to use a compute queue and whatever else ROCm is doing
<karolherbst>
yeah...
<karolherbst>
I have to figure this stuff out also for intel
<karolherbst>
i915 is reaping jobs taking more than 0.5 seconds or something
<karolherbst>
and some workloads run into this already
<mareko>
the kernel implements suspending CU waves to memory
<karolherbst>
yeah.. makes sense
<karolherbst>
we really should add a flag on screen creation for this stuff
<mareko>
CUs have multitasking for running waves, but they can't release resources without suspending waves to memory
<mareko>
you can run 32 different shaders on a single CU at the same time if they all fit wrt VGPRs and shared mem
<karolherbst>
yeah.. I think it's similar to nvidia. And by default you can only context switch between shader invocations
<karolherbst>
and move them into VRAM
<karolherbst>
(the contexts that is)
benjaminl has quit [Ping timeout: 480 seconds]
benjaminl has joined #dri-devel
kts has quit [Remote host closed the connection]
kts has joined #dri-devel
<DemiMarie>
karolherbst: what do you mean by “hw level isolation of tasks and resources”?
<karolherbst>
well.. work on the GPU takes resources and you can split them up, be it either memory or execution units
<DemiMarie>
mareko: why not enable preemption for graphics on AMD GPUs?
<DemiMarie>
yes, one can
<karolherbst>
performance
<DemiMarie>
that’s what page tables and scheduling are for IIUC
<karolherbst>
nah
<karolherbst>
I meant proper isolation
<DemiMarie>
how big is the hit?
<DemiMarie>
define “proper”
<karolherbst>
splitting physical memory
<DemiMarie>
hmm?
<karolherbst>
and assigning execution units to partitions
<DemiMarie>
why is that stronger than what page tables can provide?
<karolherbst>
so you eliminate even the chance of reading back leftover state from other guests
<DemiMarie>
I thought that is just a matter of not having vulnerable drivers
<karolherbst>
because you don't even have to bother about tracking old use of physical memory
<karolherbst>
you get a private block of X physical memory assigned
<karolherbst>
that's the level of partitioning you can do with SR-IOV level virtualization
<DemiMarie>
Is that how it is implemented on e.g. Intel iGPUs?
<karolherbst>
I don't know
<karolherbst>
but nvidia hardware can do this
<karolherbst>
well.. recent one at least
<karolherbst>
or rather those which support SR-IOV
<DemiMarie>
which are unobtanium IIUC
<karolherbst>
I think ampere can do it already as well
<DemiMarie>
In my world, if it isn’t something the average user can afford, it doesn’t exist
<karolherbst>
but anyway, you just have to set up the device partitions somewhere and then you have cleanly isolated partitions of the GPU
<karolherbst>
each with their own page tables and everything
<DemiMarie>
One question I always have about hardware partitioning is, “How much is actually partitioned, and how much of this is just firmware trickery?”.
<DemiMarie>
Because the on-device firmware is a significant attack surface.
<karolherbst>
on nvidia it's almost all in hardware
<karolherbst>
you still have the firmware for context switching which is shared, but the actual resources used by actual workloads is all split
<karolherbst>
(in hardware)
<karolherbst>
there are fields to assign SMs to partitions, can do it even finer grained afaik
<karolherbst>
same for VRAM
<DemiMarie>
Even on consumer GPUs?
<karolherbst>
yeah
<DemiMarie>
What about the RM API?
<karolherbst>
dunno if nvidia enables it though in their driver
<karolherbst>
well.. I think GSP still would run shared as you only have one GSP processor on the GPU still
<DemiMarie>
Yeah
<karolherbst>
but that's mostly doing hardware level things
<karolherbst>
really doens't matter much
<DemiMarie>
I thought much of the RM API was implemented in firmware.
<DemiMarie>
Stuff like memory allocation.
<karolherbst>
nah, that stuff exists outside afaik
<karolherbst>
that's performance sensitive stuff,
<karolherbst>
but allocating memory is also very boring
benjaminl has quit [Ping timeout: 480 seconds]
benjaminl has joined #dri-devel
<DavidHeidelberg[m]>
mupuf amdgpu:codename:NAVI21 also down? So I can disable also VALVE farm?