ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://oftc.irclog.whitequark.org/panfrost - <macc24> i have been here before it was popular
sigmaris has quit [Ping timeout: 480 seconds]
camus has quit []
megi has quit [Quit: WeeChat 3.2]
megi has joined #panfrost
<icecream95>
I've just sent my first non-faulting Panfrost GPU jobs over the network
<icecream95>
I just need to make it send the updated BO data back to the driver, then we can have accelerated Vulkan in web browsers...
* icecream95
remembers what happened last time he tried doing stupid things with Vulkan
sigmaris has joined #panfrost
<tomeu>
alyssa: using the m1 for panfrost development sounds great to me, this is how I test my wip code:
<tomeu>
I'm building with gcc under qemu as this is an x86 machine, so on your m1 it should be really fast
<icecream95>
tomeu: With my patches for sending ioctls and BOs over a network you could run everything (including dEQP on panvk) on an x86 machine and only use the ARM device for sending the jobs to the GPU
<icecream95>
You'd still need to test it natively to make sure synchronisation works correctly etc.
<macc24>
tomeu: FYI gcc in qemu seems to have broken recently
<macc24>
at least for me
<tomeu>
icecream95: interesting!
<tomeu>
macc24: I think it has been years since the last time I upgraded the debian on the nfsroot/container :)
<macc24>
tomeu: jesus christ
<tomeu>
my only interaction with the machine is that cmdline above, so I'm more than happy to not have to worry about it
rasterman has joined #panfrost
camus has joined #panfrost
<bbrezillon>
alyssa: adding panfrost_dump.c to my TODO list ;). Right now I'm finishing the panvk+panfrost-common-lib per-gen split (I'm almost done BTW)
camus has quit [Read error: Connection reset by peer]
<alyssa>
if you want to get back to literally anything else
camus has joined #panfrost
<bbrezillon>
alyssa: I'm taking care of v7 right now, depending on how it goes I might let you do the others
<alyssa>
bbrezillon: Fair enough =)
<Lyude>
alyssa: asking since I vaguely remember this being a thing that had to be solved for panfrost at one point: what was the solution for fixing the whole "failed to open foo: /usr/lib64/dri/foo.so" problem? Did we just write up small dri shims for each different display chip using panfrost?
<Lyude>
oh we might have it handled actually
<alyssa>
Hmm?
<alyssa>
`kmsro`
<alyssa>
I think is the term to grep for
<Lyude>
alyssa: gotcha
<alyssa>
but yes basically that
<alyssa>
robmur01: Noooo! my bifrost fp16 instr_invalid_enc faults are back, and now I have a reproducing shader that doesn't even use v2f32_to_v2f16 *or* f16_to_f32
<alyssa>
i'm increasingly convinced the scheduler is fine and there's some rare bug in the packing code
<alyssa>
Lyude: i am decidedly tempted to break out your assembler to debug this
<alyssa>
of course i have no idea if I can still remember cwabbott's bifrost syntax :p
<Lyude>
neat o:
<alyssa>
i just
<alyssa>
do i have a subtle bug? does the disasm have the same bug? is there a hw bug?
<alyssa>
why is the DDK unaffected
<alyssa>
and oh how I wish I could poke the DDK at a finer level
<alyssa>
it's hard to figure out how it packs certain things when I have so little influence over clause scheduling
<alyssa>
For this buggy shader, my schedule is 1 cycle faster than the DDK
<alyssa>
Is the DDK missing a schedule opportunity? Or is there something subtly wrong about the schedule explaining the fault?
<robmur01>
well, FWIW my instinct says that if two things are slightly different and one of them is wrong, there's a high chance that the difference is significant :)
<robmur01>
+1 for disassemble that thing
wwilly has joined #panfrost
<alyssa>
robmur01: hm
<alyssa>
new/old bifrost syntax is too different uhm
<alyssa>
bifrost is canceled
<HdkR>
Perfect, long live valhall
<alyssa>
help
<robmur01>
can't you get both shader binaries and run them through the same tool?
<macc24>
\o/ no more g72 quirks
<alyssa>
robmur01: which tool?
<alyssa>
I can't get the shader binaries close enough since Bifrost scheduling heuristics are basically just calling `rand()`
<robmur01>
oh, I thought disassemblers existed already
<robmur01>
this whole graphics malarkey still baffles me sometimes :)
<alyssa>
disassembler yes
<alyssa>
but clause scheduling makes that less than useful
<alyssa>
Apparently DDK's clause scheduling does have logic preventing FMA.v2f16 / MKVEC.v2i16 from being put in a tuple together? but ... why?!
<alyssa>
robmur01: daniels suggests I vary the modifier
<alyssa>
apparently MKVEC.v2i16 with t0.h1 won't fuse (see above)
<alyssa>
but the DDK -will- fuse MKVEC.v2i16 with t0.h0
<alyssa>
there is absolutely no valid architectural reason for that distinction to matter
<alyssa>
which just underscores that there is internal pipeline state leaking through and I mean
<alyssa>
I'll just end up playing whack-a-mole if I keep adding rules for these symptoms
<macc24>
alyssa: how high a tolerance do you have for dumb questions?
<alyssa>
macc24: fairly low but i can make an exception for you ;)
<macc24>
alyssa: is bifrost compiler padding with nops?
<alyssa>
macc24: yes, and that's not a dumb question
<macc24>
why?
<alyssa>
because bifrost is dumb
<alyssa>
bifrost's arithmetic logic unit has two units, FMA and ADD
<HdkR>
If you can't fill the pipeline with work, then you need to pad :>
<alyssa>
they can execute different kinds of instructions
<alyssa>
it always executes FMA/ADD/FMA/ADD/... in that order
<alyssa>
so if you don't have an instruction ready for a given unit (FMA or ADD)... you have to put in a NOP instead
<alyssa>
more modern designs like Valhall do the same thing, but they do it internal to the hardware. i.e. if they don't have work ready, they'll just stall; the compiler doesn't have to literally tell the hardware to stall.
<macc24>
alyssa: what would happen if compiler didn't do any nops
<robclark>
that is kinda the opposite direction from every other gpu vendor.. which seems to go in the direction of "make the compiler figure it out"
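[Editor's note: the FMA/ADD padding alyssa describes above can be illustrated with a toy model. This is only a sketch under loud assumptions: each instruction is tagged with the single unit it can run on, slots strictly alternate FMA, ADD, FMA, ADD, and instructions are placed greedily in program order. The function name `pad_slots` and the instruction labels are hypothetical. Real Bifrost packs tuples into clauses and has to respect data dependencies, none of which is modelled here, and this is not how the Panfrost compiler is implemented.]

```python
NOP = "NOP"

def pad_slots(instrs):
    """Place instructions into alternating FMA/ADD slots, padding with NOPs.

    instrs: list of (name, unit) pairs in program order, unit is "FMA" or "ADD".
    Returns the list of names occupying slots 0, 1, 2, ... where even slots
    are FMA and odd slots are ADD. Toy model: dependencies are ignored.
    """
    slots = []
    for name, unit in instrs:
        # Even slot index belongs to FMA, odd slot index belongs to ADD.
        next_unit = "FMA" if len(slots) % 2 == 0 else "ADD"
        if unit != next_unit:
            # Nothing is ready for the unit whose turn it is: emit a NOP.
            slots.append(NOP)
        slots.append(name)
    # Complete the final FMA/ADD pair if it ended on an FMA slot.
    if len(slots) % 2 == 1:
        slots.append(NOP)
    return slots

if __name__ == "__main__":
    # Two FMA-only instructions in a row force a NOP into the ADD slot.
    program = [("mul0", "FMA"), ("mul1", "FMA"), ("add0", "ADD")]
    print(pad_slots(program))
```

[In this model a compiler-side scheduler would try to reorder independent instructions so that fewer NOP slots are needed, which is the work that, per the discussion above, Valhall handles by stalling in hardware instead.]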