ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
heat has joined #dri-devel
heat_ has quit [Read error: No route to host]
Haaninjo has quit [Quit: Ex-Chat]
co1umbarius has joined #dri-devel
rmckeever has quit [Quit: Leaving]
columbarius has quit [Ping timeout: 480 seconds]
BenjaminBreeg has quit [reticulum.oftc.net dacia.oftc.net]
Piraty has quit [reticulum.oftc.net dacia.oftc.net]
BenjaminBreeg has joined #dri-devel
Piraty has joined #dri-devel
mbrost has quit [Ping timeout: 480 seconds]
Emantor has quit [Quit: ZNC - http://znc.in]
Emantor has joined #dri-devel
larunbe has joined #dri-devel
alarumbe has quit [Read error: Connection reset by peer]
smilessh has joined #dri-devel
flto has quit [Ping timeout: 480 seconds]
flto has joined #dri-devel
larunbe has quit []
alarumbe has joined #dri-devel
benjamin1 has quit [Ping timeout: 480 seconds]
BenjaminBreeg has quit [Remote host closed the connection]
BenjaminBreeg has joined #dri-devel
<mareko> any idea why RADV would run out of memory with Mesa 23.x but not 22.2?
<mareko> I'm hearing UE5 Lyra runs out of memory with RADV from Mesa 23.x
heat_ has joined #dri-devel
heat has quit [Read error: Connection reset by peer]
YuGiOhJCJ has joined #dri-devel
heat_ has quit [Ping timeout: 480 seconds]
aravind has joined #dri-devel
fab has joined #dri-devel
fab has quit []
diego has left #dri-devel [WeeChat 3.8]
dviola has joined #dri-devel
Company has quit [Quit: Leaving]
Duke`` has joined #dri-devel
junaid has joined #dri-devel
fab has joined #dri-devel
benjamin1 has joined #dri-devel
benjamin1 has quit [Ping timeout: 480 seconds]
kzd has quit [Quit: kzd]
aravind has quit []
<HdkR> 4
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
sghuge has quit [Remote host closed the connection]
sghuge has joined #dri-devel
<daniels> karolherbst: maybe I’m just not really awake enough yet, but … there are no crashes in c15.r1.log … ?
rasterman has joined #dri-devel
sima has joined #dri-devel
ADS_Sr has quit [Ping timeout: 480 seconds]
<mupuf> DavidHeidelberg[m]: yeepee! You should be able to use a b2c release kernel, it has all you need (including built-in amdgpu firmware)
djbw_ has quit [Read error: Connection reset by peer]
junaid has quit [Ping timeout: 480 seconds]
TMM has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
TMM has joined #dri-devel
Cyrinux9 has quit []
Cyrinux9 has joined #dri-devel
<karolherbst> daniels: that's exactly the point :)
<karolherbst> I have _no_ idea what happened there, but it's clearly something funky
<karolherbst> just wanted to share it before I ignore it and move on :)
<karolherbst> (in case we have more of such false negatives that people just ignore and move on)
junaid has joined #dri-devel
junaid has quit [Remote host closed the connection]
<karolherbst> airlied: mhh.. seems like that WGP mode only causes problems on some tests when running vec16... maybe it's just some compiler bug somewhere in the end
rcf has quit [Quit: WeeChat 3.8]
Duke`` has quit [Ping timeout: 480 seconds]
gouchi has joined #dri-devel
gouchi has quit [Remote host closed the connection]
kts_ has joined #dri-devel
Duke`` has joined #dri-devel
kts_ has quit []
kts has joined #dri-devel
gouchi has joined #dri-devel
kts_ has joined #dri-devel
kts has quit [Ping timeout: 480 seconds]
nehsou^ has joined #dri-devel
kts_ has quit [Remote host closed the connection]
smiles_ has joined #dri-devel
smilessh has quit [Ping timeout: 480 seconds]
Danct12 has quit [Quit: How would you be so reckless with someone's heart!?]
JohnnyonFlame has quit [Read error: Connection reset by peer]
AndrewR has joined #dri-devel
<AndrewR> karolherbst, Finally updated llvm so mesa git compiles fully again (but I think it demands a newer bindgen than 0.60 .. the 0.65 from Slackware current works ..)
<karolherbst> mhhh, yeah, might be plausible. If your rustc toolchain uses your system LLVM then I can see why it needs some updates
<AndrewR> karolherbst, yeah, in Slackware rust is linked dynamically to llvm-libs .. so updating it basically means installing a second version of llvm alongside, so rustc will not die yet...
<AndrewR> karolherbst, I plan to re-compile rust one of these days ...
<AndrewR> ..also, a Russian-speaking user surfaced on our big black website, so I discovered my patches for x265 on aarch64 were incorrect, and I updated them (as part of the cinelerra-gg bundled libs) ...
<AndrewR> https://www.linux.org.ru/gallery/workplaces/17252389 (not much to see but ... they exist! (asahi linux users) )
abhinav__ has quit [Quit: The Lounge - https://thelounge.chat]
jessica_24 has quit [Quit: The Lounge - https://thelounge.chat]
rohiiyer02 has quit []
naseer has quit []
lumag has quit [Quit: ZNC 1.8.1 - https://znc.in]
lumag has joined #dri-devel
abhinav__ has joined #dri-devel
rohiiyer02 has joined #dri-devel
naseer has joined #dri-devel
kts has joined #dri-devel
xroumegue has quit [Ping timeout: 480 seconds]
gouchi has quit [Remote host closed the connection]
<AndrewR> ... I was also reading the psychtoolbox-3 code just for the comments. One of the few applications making use of 30bpc mode on amd gpus ...
junaid has joined #dri-devel
kts has quit [Quit: Konversation terminated!]
kts has joined #dri-devel
<karolherbst> airlied, mareko: mhh, seems like forcing wave32 for compute fixes it... I honestly have no idea about all the implications here but arsenm suggested to always use 32 for compute
<bnieuwenhuizen> on RDNA3 using wave32 has some important perf disadvantages though
<karolherbst> for compute?
<bnieuwenhuizen> should be yeah
<bnieuwenhuizen> the dual issue fp32 gets way more difficult
<karolherbst> I don't know if it's important, but luxmark-luxball gives me ~0.5% higher scores with wave64
<bnieuwenhuizen> that isn't RDNA3 though is it?
<karolherbst> uhhh.... right, that's RDNA2
<karolherbst> I think
<karolherbst> navi22
<bnieuwenhuizen> yeah
<karolherbst> anyway... my kernel gets nuked in CU mode :'(
<karolherbst> no idea why
<karolherbst> at least some
<karolherbst> I have no idea what's up
<karolherbst> but it seems to be working with wave32
<karolherbst> we can potentially also just do things differently for SHADER_KERNEL, but I also kinda want to know why that happens
kzd has joined #dri-devel
jrayhawk has quit [Quit: lv management]
junaid has quit [Remote host closed the connection]
Company has joined #dri-devel
jrayhawk has joined #dri-devel
kts has quit [Remote host closed the connection]
krushia has quit [Quit: Konversation terminated!]
heat has joined #dri-devel
heat has quit [Remote host closed the connection]
heat has joined #dri-devel
YuGiOhJCJ has quit [Ping timeout: 480 seconds]
YuGiOhJCJ has joined #dri-devel
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
kts has joined #dri-devel
fab has quit [Quit: fab]
fab has joined #dri-devel
fab has quit []
smiles_ has quit [Ping timeout: 480 seconds]
fab has joined #dri-devel
<mareko> karolherbst: what does "nuked" mean?
<karolherbst> ehhh... GPU reset
<mareko> what is the resource usage of the kernel?
<karolherbst> good question
<mareko> i.e. thread count, shared memory, VGPRs
<karolherbst> at the bottom
<mareko> what's the block size?
fab has quit [Quit: fab]
<karolherbst> I didn't check.. let me boot the machine up again
<mareko> instead of having to use MESA_SHADER_KERNEL and getting broken because of that, it would be better to use MESA_SHADER_COMPUTE and have "is_kernel" in the shader info
<karolherbst> mhhh.. maybe that would have been the better approach here
<karolherbst> anyway, the block is 512x1x1
fab has joined #dri-devel
<mareko> I think the problem is there is not enough VGPRs
<karolherbst> and that matters with CU vs WGP mode?
<karolherbst> the exact same thing works just fine in WGP mode
<karolherbst> but it's plausible it's caused by launching too many threads.. I could check that
<mareko> 144 VGPRs means 3 waves per SIMD
<mareko> the CU has 2 SIMDs, so 6 waves at most, but 512x1x1 is 8 waves
<karolherbst> yep
<karolherbst> works fine with 256 threads
<mareko> the WGP has 4 SIMDs, so 12 waves at most, so it fits
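A worked sketch of the occupancy arithmetic above. The 512-wave64-VGPRs-per-SIMD register file size is an assumption; the wave counts follow from the numbers in the chat:

```python
# Occupancy sketch for the case discussed above: a 512x1x1 workgroup in
# wave64 using 144 VGPRs per wave. The per-SIMD register file size is an
# assumption (512 wave64 VGPRs per SIMD); the rest follows from the chat.
VGPRS_PER_SIMD_WAVE64 = 512   # assumed register file size per SIMD in wave64
WAVE_SIZE = 64
WORKGROUP_THREADS = 512
VGPRS_PER_WAVE = 144

waves_needed = WORKGROUP_THREADS // WAVE_SIZE                # 8 waves
waves_per_simd = VGPRS_PER_SIMD_WAVE64 // VGPRS_PER_WAVE     # 3 waves

for name, simds in (("CU", 2), ("WGP", 4)):
    capacity = waves_per_simd * simds                        # CU: 6, WGP: 12
    verdict = "fits" if capacity >= waves_needed else "does NOT fit"
    print(f"{name}: capacity {capacity} waves, workgroup needs {waves_needed} -> {verdict}")
```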
fab_ has joined #dri-devel
<karolherbst> I guess we'll have to fix si_get_compute_state_info then
<mareko> I think LLVM miscompiled that shader
fab_ is now known as Guest2661
<karolherbst> ahh
<karolherbst> ohhh...
<karolherbst> it should use more SGPRs instead?
<mareko> it should have capped VGPR usage to 128 and spill the rest
<mareko> so that it would fit on the CU
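Continuing the sketch above, the 128-VGPR cap follows from the same assumed per-SIMD register file size:

```python
# For the 8 waves of a 512-thread wave64 workgroup to fit on a CU (2 SIMDs),
# each SIMD must hold 4 waves, so the compiler would have to cap VGPR usage
# at 512 // 4 = 128 and spill the remaining 144 - 128 = 16 registers.
VGPRS_PER_SIMD_WAVE64 = 512          # same assumed per-SIMD size as above
waves_per_simd_needed = 8 // 2       # 4 waves per SIMD
print("VGPR cap to fit on a CU:", VGPRS_PER_SIMD_WAVE64 // waves_per_simd_needed)  # 128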
<karolherbst> the header is passed to LLVM, right?
<karolherbst> or is there maybe a flag radeonsi would have to set to tell LLVM to assume CU mode or something?
<karolherbst> maybe we should move this to #radeon as arsenm is there
<mareko> no
fab has quit [Ping timeout: 480 seconds]
Guest2661 is now known as fab
<mareko> this is another case of MESA_SHADER_KERNEL not being handled
<karolherbst> ahh
fab is now known as Guest2662
<karolherbst> mhhh... annoying
kts has quit [Remote host closed the connection]
<karolherbst> maybe we really should get rid of KERNEL and move it into shader_info
<karolherbst> shader_info.cs.is_kernel
<mareko> yes that could fix a lot of things
<karolherbst> yeah.. let's do that
kts has joined #dri-devel
DottorLeo has joined #dri-devel
junaid has joined #dri-devel
<mareko> if it spills and you don't want spilling, we can implement and enable the WGP mode properly
<karolherbst> would it spill into SGPRs first?
<karolherbst> not sure how much of a perf difference it would make
<mareko> no
<mareko> SGPRs are the scarcest resource
<mareko> SGPRs are spilled to VGPRs, and VGPRs are spilled to memory
<karolherbst> ahhh
<karolherbst> guess I got it backwards then
<mareko> VGPRs are 2048-bit vector registers in wave64, while SGPRs are always 32-bit
<HdkR> Map SGPRs to NVIDIA's Uniform Registers, and VGPRS to NVIDIA's "registers", close enough approximation that it works out :P
<karolherbst> yeah I know, I just thought scalar registers are per thread :D
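A quick illustration of the size arithmetic mareko mentions (numbers taken straight from the chat):

```python
# A VGPR holds one 32-bit value per lane, so a wave64 VGPR carries
# 64 * 32 = 2048 bits of state; an SGPR holds a single 32-bit value
# shared by the whole wave, regardless of wave size.
lanes = 64                            # wave64
print("wave64 VGPR:", lanes * 32, "bits")   # 2048
print("SGPR:", 32, "bits")
```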
<karolherbst> uhhh.. anv already makes use of the KERNEL stage in very annoying ways
junaid has quit [Remote host closed the connection]
oneforall2 has quit [Remote host closed the connection]
oneforall2 has joined #dri-devel
DottorLeo has quit [Quit: Konversation terminated!]
heat_ has joined #dri-devel
heat has quit [Read error: No route to host]
iive has joined #dri-devel
benjamin1 has joined #dri-devel
<karolherbst> :') 52 files changed, 145 insertions(+), 157 deletions(-)
benjamin1 has quit [Quit: WeeChat 3.8]
benjaminl has joined #dri-devel
<karolherbst> mareko: still getting 140
Duke`` has quit [Ping timeout: 480 seconds]
rcf has joined #dri-devel
<karolherbst> what's the thing in radeonsi which should make sure LLVM spills VGPR? Maybe it's indeed some bug somewhere besides the shader stage
Guest2662 has quit [Quit: Guest2662]
fab has joined #dri-devel
<pendingchaos> I thought it was LLVM which determines the VGPR limit
<pendingchaos> maybe https://pastebin.com/raw/mT2gj4wX ?
<karolherbst> pendingchaos: yep, that fixes it
kts has quit [Remote host closed the connection]
<karolherbst> I really should start doing hardware based CI with rusticl as I keep hitting bugs which aren't CL specific 🙃
kts has joined #dri-devel
<karolherbst> pendingchaos: will you write a proper patch and create the MR or should I do it?
<pendingchaos> you can do it
<karolherbst> okay
TMM has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
TMM has joined #dri-devel
glennk has quit [Remote host closed the connection]
glennk has joined #dri-devel
kts has quit [Remote host closed the connection]
kts has joined #dri-devel
<DemiMarie> Why do GPUs use ioctl() and not io_uring?
<karolherbst> because it won't change anything really
<karolherbst> though maybe it has some advantages inside pointless microbenchmarks...
<karolherbst> but anyway, the real answer is: when mesa started there was just ioctl, and moving to io_uring doesn't really pay off.. probably
<psykose> it's also usually painful to have to actually maintain 2 ways of doing something at once just for one to be 2% faster
<DemiMarie> karolherbst: how can a GPU reset be anything but a kernel or firmware bug?
<psykose> unless all old kernel support is dropped you can't remove the non-io-uring one
<karolherbst> DemiMarie: ask AMD
<karolherbst> on AMD it's either full reset or broken GPU
<karolherbst> and if your shader hangs, for whatever reason, it's a full reset
<karolherbst> but anyway
<karolherbst> you can't validate GPU commands
<karolherbst> sooo.. userspace has the power of causing GPU resets
krushia has joined #dri-devel
<DemiMarie> 1. Why can’t you validate GPU commands?
<DemiMarie> 2. Shouldn’t the GPU be robust against invalid commands?
<karolherbst> halting problem
<mareko> the GPU only has single-tasking for gfx and any gfx app can use an infinite loop, so the kernel has to stop it with a reset, it's a feature
<karolherbst> I don't know if I'd go so far as calling it a feature, but there is nothing else you can do, soo....
<DemiMarie> Are Intel and Nvidia better when it comes to GPU error recovery?
<karolherbst> yes
<mareko> freeze or reset, your choice
<DemiMarie> mareko: to me this is a hardware bug
<karolherbst> it is
<karolherbst> the hardware is incapable of recovery
<mareko> DemiMarie: halting problem is a hw bug?
<karolherbst> but new gens are supposed to implement some sort of recovery
<karolherbst> mareko: no, having to reset the entire GPU is
<karolherbst> at least the AMD way
<mareko> no we don't
<karolherbst> VRAM content is lost, no?
<karolherbst> or rather.. why does my display state get fully reset on GPU resets 🙃
<karolherbst> it doesn't have to be this way
<mareko> there are levels of resets
<karolherbst> other GPUs can just kill a GPU context and move on
<DemiMarie> A properly-designed GPU can support two or more mutually distrusting users, such that one user cannot interfere with the other user’s use of the GPU.
<karolherbst> yeah.. I agree
<mareko> if a shader doesn't finish, all shaders of that process are killed
<krushia> ever run into webgl malware (or poorly coded shaders on a web site)? it isn't fun
<karolherbst> I generally see more stuff getting killed
<DemiMarie> My understanding is that Intel, Nvidia, and Apple GPUs all meet this criterion (modulo implementation bugs).
<mareko> karolherbst: that's the next level
<karolherbst> yeah.. which I ran into today with the wgp issue :)
<DemiMarie> mareko: that is arguably a blocker for virtGPU native contexts.
<DemiMarie> A VM must not be able to crash the host.
<karolherbst> AMD is improving the situation
<mareko> DemiMarie: it's also a blocker for virgl
<mareko> DemiMarie: if you want to argue that way
<karolherbst> but yeah.. it's kinda messy
<karolherbst> well.. other vendors are better at it
<DemiMarie> mareko: could virgl add explicit preemption checks during compilation?
<karolherbst> mhh.. should be possible
<karolherbst> but it's not only shader code
<karolherbst> like .. my bug today was the shader just using too many registers causing GPU resets...
<mareko> I'll just say that these assumptions and questions make no sense with how things work, virgl can't do anything and virgl is also a security hole for the whole VM
<karolherbst> there is a lot of things the kernel/host side would have to deal with
<mareko> security and isolation is not something virgl does well
<karolherbst> the point is rather that a guest could constantly DoS the entire host
<karolherbst> but yeah...
<karolherbst> GPU isolation in general is a practically impossible problem
<karolherbst> though nvidia does support partitioning the GPU on the hardware level
<mareko> the native context does isolation properly, venus also does it properly, virgl doesn't
<karolherbst> and you can assign isolated parts of the GPU to guests which do not interfere with each other
<DemiMarie> mareko: are virtGPU native contexts better at this?
<mareko> yes
<DemiMarie> Intel also supports SR-IOV on recent iGPUs (Alder Lake, IIRC)
<karolherbst> yeah.. SR-IOV is probably the only isolation mode which actually works
<DemiMarie> karolherbst: why do virtGPU native contexts not work?
<mareko> but the gfx queue can only execute 1 job, so if a job takes 2 seconds, it will block everybody for 2 seconds even if it doesn't trigger the reset
<Lynne> I wish amd's gpus didn't fall over for something as simple as doing unchecked out of bounds access
<karolherbst> DemiMarie: because you don't properly isolate, but it could be good enough depending on the hardware/driver
<DemiMarie> karolherbst: define “properly isolate”
<DemiMarie> In a way that does not lead to circular reasoning
sima has quit [Ping timeout: 480 seconds]
<karolherbst> hw level isolation of tasks and resources
<karolherbst> but stalls are something which is probably impossible to fix, but also not really problematic
<karolherbst> well.. depends
<karolherbst> don't want to do RT sensitive things on shared GPUs
heat_ has quit [Read error: Connection reset by peer]
heat_ has joined #dri-devel
<karolherbst> but I've also seen nvidia not being able to recover properly 🙃 sometimes the GPU just disconnects itself from the PCIe bus
<mareko> I love air conditioning
<karolherbst> I wished I had some
<zmike> air conditioning++
<mareko> karolherbst: actually compute and RT have shader-instruction-level preemption on AMD hw, not sure if enabled though
<mareko> outside of ROCm
<karolherbst> probably not for graphics
<karolherbst> but I was considering enabling it for some drivers
<karolherbst> it's not nice if compute can stall the gfx pipelines
<mareko> surprisingly compute shader-level preemption shouldn't need any userspace changes AFAIK, but you need to use a compute queue and whatever else ROCm is doing
<karolherbst> yeah...
<karolherbst> I have to figure this stuff out also for intel
<karolherbst> i915 is reaping jobs taking more than 0.5 seconds or something
<karolherbst> and some workloads run into this already
<mareko> the kernel implements suspending CU waves to memory
<karolherbst> yeah.. makes sense
<karolherbst> we really should add a flag on screen creation for this stuff
<mareko> CUs have multitasking for running waves, but they can't release resources without suspending waves to memory
<mareko> you can run 32 different shaders on a single CU at the same time if they all fit wrt VGPRs and shared mem
<karolherbst> yeah.. I think it's similar to nvidia. And by default you can only context switch between shader invocations
<karolherbst> and move them into VRAM
<karolherbst> (the contexts that is)
benjaminl has quit [Ping timeout: 480 seconds]
benjaminl has joined #dri-devel
kts has quit [Remote host closed the connection]
kts has joined #dri-devel
<DemiMarie> karolherbst: what do you mean by “hw level isolation of tasks and resources”?
<karolherbst> well.. work on the GPU takes resources and you can split them up, be it either memory or execution units
<DemiMarie> mareko: why not enable preemption for graphics on AMD GPUs?
<DemiMarie> yes, one can
<karolherbst> performance
<DemiMarie> that’s what page tables and scheduling are for IIUC
<karolherbst> nah
<karolherbst> I meant proper isolation
<DemiMarie> how big is the hit?
<DemiMarie> define “proper”
<karolherbst> splitting physical memory
<DemiMarie> hmm?
<karolherbst> and assigning execution units to partitions
<DemiMarie> why is that stronger than what page tables can provide?
<karolherbst> so you eliminate even the chance of reading back leftover state from other guests
<DemiMarie> I thought that is just a matter of not having vulnerable drivers
<karolherbst> because you don't even have to bother about tracking old use of physical memory
<karolherbst> you get a private block of X physical memory assigned
<karolherbst> that's the level of partitioning you can do with SR-IOV level virtualization
<DemiMarie> Is that how it is implemented on e.g. Intel iGPUs?
<karolherbst> I don't know
<karolherbst> but nvidia hardware can do this
<karolherbst> well.. recent one at least
<karolherbst> or rather those which support SR-IOV
<DemiMarie> which are unobtanium IIUC
<karolherbst> I think ampere can do it already as well
<DemiMarie> In my world, if it isn’t something the average user can afford, it doesn’t exist
<karolherbst> and generally consumer GPUs
<karolherbst> I think
<DemiMarie> Are you referring to https://libvf.io?
<karolherbst> mhh?
<DemiMarie> okay
<karolherbst> but anyway, you just have to set up the device partitions somewhere and then you have cleanly isolated partitions of the GPU
<karolherbst> each with their own page tables and everything
<DemiMarie> One question I always have about hardware partitioning is, “How much is actually partitioned, and how much of this is just firmware trickery?”.
<DemiMarie> Because the on-device firmware is a significant attack surface.
<karolherbst> on nvidia it's almost all in hardware
<karolherbst> you still have the firmware for context switching which is shared, but the actual resources used by actual workloads are all split
<karolherbst> (in hardware)
<karolherbst> there are fields to assign SMs to partitions, can do it even finer grained afaik
<karolherbst> same for VRAM
<DemiMarie> Even on consumer GPUs?
<karolherbst> yeah
<DemiMarie> What about the RM API?
<karolherbst> dunno if nvidia enables it though in their driver
<karolherbst> well.. I think GSP would still run shared as you only have one GSP processor on the GPU
<DemiMarie> Yeah
<karolherbst> but that's mostly doing hardware level things
<karolherbst> really doesn't matter much
<DemiMarie> I thought much of the RM API was implemented in firmware.
<DemiMarie> Stuff like memory allocation.
<karolherbst> nah, that stuff exists outside afaik
<karolherbst> that's performance sensitive stuff,
<karolherbst> but allocating memory is also very boring
benjaminl has quit [Ping timeout: 480 seconds]
benjaminl has joined #dri-devel
<DavidHeidelberg[m]> mupuf: amdgpu:codename:NAVI21 also down? So I can also disable the VALVE farm?
iive has quit [Quit: They came for me...]