<lina>
I'm looking at the magic shader code and it seems to be some kind of dispatcher based on a special register value...
<lina>
It definitely interacts with the compute buffer struct I found
<lina>
Some codepaths do save/restore stuff (and I think we might be missing some looping instruction decoding? I see unknown instructions right after the load/store groups...)
<lina>
But at least one op, 0xf, just runs 5 unknown instructions. I wonder if that is cache maintenance...
<lina>
And then there's a magic store to address 0x2d822acc in here... is this some kind of MMIO or doorbell?
Stary has joined #asahi-gpu
<lina>
Aaaa, I just tried doing a faulting write in that shader and it locks up the whole GPU... the compute part runs fine, but a subsequent 3D job hangs. I can see the fault status but the firmware is just sitting there...
<lina>
I think what happens is that since we're not flushing caches, the fault happens after the compute job is complete, in some state where the firmware doesn't implement timeouts properly...
possiblemeatball has quit [Quit: Leaving]
SSJ_GZ has joined #asahi-gpu
hertz has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
tim has joined #asahi-gpu
tim has quit [Remote host closed the connection]
le0n has quit [Quit: see you later, alligator]
le0n has joined #asahi-gpu
LinuxM1 has joined #asahi-gpu
SSJ_GZ has quit [Ping timeout: 480 seconds]
kit_ty_kate has quit [Quit: WeeChat 3.6]
LinuxM1 has quit [Ping timeout: 480 seconds]
bcrumb has joined #asahi-gpu
bcrumb has quit []
hertz has joined #asahi-gpu
mkurz has quit [Quit: Leaving]
bcrumb has joined #asahi-gpu
possiblemeatball has joined #asahi-gpu
bcrumb has quit [Quit: WeeChat 3.7.1]
SSJ_GZ has joined #asahi-gpu
cylm has joined #asahi-gpu
cr1901 has quit [Ping timeout: 480 seconds]
Tramtrist has quit [Remote host closed the connection]
<alyssa>
this should be predicated on stage == COMPUTE, you're writing into a union
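A minimal sketch of the hazard alyssa points out, using Python's ctypes to model a C union. The struct, field, and function names here are hypothetical and are not taken from the driver. Because the union members share storage, an unguarded write to the compute member overwrites whatever a fragment shader stored there.

```python
import ctypes

STAGE_COMPUTE = "compute"    # hypothetical stage tags
STAGE_FRAGMENT = "fragment"

class ComputeInfo(ctypes.Structure):
    _fields_ = [("workgroup_size", ctypes.c_uint32)]

class FragmentInfo(ctypes.Structure):
    _fields_ = [("sample_mask", ctypes.c_uint32)]

class StageInfo(ctypes.Union):
    # Both members occupy the same 4 bytes.
    _fields_ = [("cs", ComputeInfo), ("fs", FragmentInfo)]

class ShaderInfo(ctypes.Structure):
    _fields_ = [("stage_info", StageInfo)]

def set_workgroup_size(info, stage, size):
    # Only touch the compute member when the shader is a compute shader.
    if stage == STAGE_COMPUTE:
        info.stage_info.cs.workgroup_size = size
```

With the guard in place, calling `set_workgroup_size` on a fragment shader leaves `fs.sample_mask` intact. Without it, the mask would silently become the workgroup size.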
balor has quit [Quit: balor]
balor has joined #asahi-gpu
<ella-0>
alyssa: With respect to the system values MR I am currently porting AGXV to it. Will let you know if anything doesn't work well with vulkan but so far so good :3
bcrumb has joined #asahi-gpu
<alyssa>
ella-0: awesome!
<alyssa>
Is it, like... better than what we had before?
<alyssa>
I've never written a VK driver lol
<ella-0>
yes
<ella-0>
It simplifies the command buffer code somewhat and makes it possible to allocate a chunk of the register file for push constants.
<alyssa>
Hmmm ok
<alyssa>
I still don't really get how
<alyssa>
but if you say it's right and Jason says it's right that is good for me
<alyssa>
and obviously it solves a problem I had in GL lol
bcrumb has quit [Quit: WeeChat 3.7.1]
<alyssa>
by the way.. at what point will we be bottlenecked on agxv specific stuff
<alyssa>
i.e. when should I pull your tree and start hacking on deqp-vk, versus continuing to support more and better gl in the interest of agxv
mini0n has joined #asahi-gpu
<ella-0>
uhh not sure
<alyssa>
kie
<ella-0>
I was hoping to get some basic compute running on agxv before that I think
<alyssa>
makes sense
<alyssa>
I was hoping the combination of you, Lina, Karol, and Dougall would be able to finish off compute support without me ^.^
<alyssa>
make sure my baby can grow without me, yknow
faruk has joined #asahi-gpu
<alyssa>
o
cr1901 has joined #asahi-gpu
<ella-0>
That makes sense. I'm happy to work on compute stuff :3 currently the main thing holding agxv back is not having vkCmdCopy* and other transfer operations implemented. I tried and failed to implement them using the tilebuffer load/store shaders
<alyssa>
Right..
<alyssa>
vkCmdCopy* sucks.
<alyssa>
vkCmdCopyImageToBuffer should probably be a compute kernel
<alyssa>
vkCmdCopyBufferToImage should be a fragment shader
<alyssa>
probably
<alyssa>
except maybe some little details about ASTC/BCn formats? but I think you can just munge the dimensions?
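One way to read "munge the dimensions", as a hedged sketch: treat each compressed block as a single texel of an uncompressed format with the same byte size, so the copy runs over a grid of blocks rather than pixels. The block sizes below are standard (BC1 is 8 bytes per 4x4 block, BC3 is 16 bytes per 4x4 block, every ASTC block is 16 bytes), but the helper names are made up for illustration and nothing here is specific to AGX.

```python
# (block_width, block_height, bytes_per_block) for a few block formats.
BLOCK_FORMATS = {
    "BC1": (4, 4, 8),
    "BC3": (4, 4, 16),
    "ASTC_6x6": (6, 6, 16),
}

def div_round_up(value, divisor):
    return (value + divisor - 1) // divisor

def munged_extent(width_px, height_px, fmt):
    """Extent in blocks: the size of the image if each block were one texel."""
    block_w, block_h, _ = BLOCK_FORMATS[fmt]
    return div_round_up(width_px, block_w), div_round_up(height_px, block_h)

def packed_size(width_px, height_px, fmt):
    """Bytes needed for a tightly packed buffer holding the image."""
    blocks_x, blocks_y = munged_extent(width_px, height_px, fmt)
    return blocks_x * blocks_y * BLOCK_FORMATS[fmt][2]
```

For example, a 10x6 BC3 image becomes a 3x2 grid of 16-byte texels, so a buffer copy moves 96 bytes.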
<alyssa>
also I would expect vk_meta can do those because they're pretty general
<alyssa>
vkCmdCopyBufferToBuffer is also a totally generic compute kernel
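To illustrate why a buffer-to-buffer copy is "totally generic", here is a CPU emulation of such a kernel. It is a sketch under simplifying assumptions: one byte per invocation and a serial loop standing in for the GPU dispatch. A real kernel would likely copy wider words, and the function names are hypothetical.

```python
def copy_buffer_kernel(global_id, src, dst, src_offset, dst_offset, size):
    # Invocations past the end of the region do nothing (bounds check).
    if global_id < size:
        dst[dst_offset + global_id] = src[src_offset + global_id]

def dispatch_copy(src, dst, src_offset, dst_offset, size, workgroup_size=64):
    # Round the dispatch up to whole workgroups, as a GPU dispatch would.
    num_groups = (size + workgroup_size - 1) // workgroup_size
    for global_id in range(num_groups * workgroup_size):
        copy_buffer_kernel(global_id, src, dst, src_offset, dst_offset, size)
```

The kernel never looks at formats or layouts, only at offsets and a size, which is what makes it reusable for any buffer pair.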
<alyssa>
the only spicy thing is vkCmdCopyImageToImage
<alyssa>
implementing that efficiently requires being able to "cast" framebuffer compressed images to different formats
<ella-0>
Yup it's pain :<
<alyssa>
right, so, there are basically 2 classes of hardware
<alyssa>
1. hw that can do compressed texture views. in this case, img2img copies are just blits! you might need to munge the dest format into something renderable with the same size, maybe munge the dimensions in some evil cases, but by and large this is generic blitting on the fragment pipe, with a texture view (view format != original format) on the sampled image doing the cast
<alyssa>
2. hw that can't. obviously this class of hw can still do uncompressed -- at worst, linear -- texture views. so the sane option here is to eat the bandwidth hit -- allocate an uncompressed staging resource and lower compressed->compressed copies to a pair of copies compressed->uncompressed + uncompressed->compressed because you can do views for each of those. and those simpler copies are easy and generic on the fragment pipe.
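The two classes above can be sketched as a small lowering function. This is an illustration of the idea only, not vk_meta or any driver's code. The `Image` type, the `hw_has_compressed_views` flag, and the "staging" resource name are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Image:
    name: str
    fmt: str
    compressed: bool  # True if the image uses a framebuffer-compressed layout

def lower_image_copy(src, dst, hw_has_compressed_views):
    """Return the list of (source, destination) blits for an img2img copy."""
    needs_cast = src.fmt != dst.fmt
    both_compressed = src.compressed and dst.compressed
    # Class 1 hardware, or any case where a view on an uncompressed image
    # can do the cast: a single blit is enough.
    if hw_has_compressed_views or not needs_cast or not both_compressed:
        return [(src.name, dst.name)]
    # Class 2 hardware: go through an uncompressed staging image so that
    # each half of the copy can use an uncompressed texture view.
    staging = Image("staging", src.fmt, compressed=False)
    return [(src.name, staging.name), (staging.name, dst.name)]
```

On class 2 hardware a compressed RGBA8 to compressed R32UI copy lowers to two blits through the staging image. Every other combination stays a single blit.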
<alyssa>
I am unsure which class of hardware AGX is.
<alyssa>
Mali is #1 on Valhall, but #2 on anything older
<alyssa>
panvk 1.0 has some extremely delicate logic to try to do compressed->compressed image copies with format conversion in one shot with format packing/unpacking in the copy shaders
<alyssa>
while that's almost certainly faster, it isn't worth the complexity ... vkCmdCopyImageToImage with compressed<--->compressed and incompatible formats is an abomination that should never have been added to the spec and bad perf is to be expected
<alyssa>
I don't know if any real apps would even hit that. I'm doubtful.
<alyssa>
GL drivers won't even try and will just decompress your image in this case.
hightower2 has joined #asahi-gpu
<alyssa>
oh, uh
<alyssa>
3 classes i guess
<alyssa>
3. hardware blitters that trivialize the problem
<alyssa>
NVIDIA is #3 so NVK doesn't need meta shaders for this
<ella-0>
interesting
<alyssa>
but architecturally vk_meta should be able to support both #1 and #2 with some work
<alyssa>
if you don't get to it I probably will in, like, June