#panfrost on 2021-07-28 — irc logs at oftc.irclog.whitequark.org

2021-07-26 22:56 ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://oftc.irclog.whitequark.org/panfrost - <macc24> i have been here before it was popular

00:06 sigmaris has quit [Ping timeout: 480 seconds]

00:15 camus has quit []

00:24 megi has quit [Quit: WeeChat 3.2]

00:25 megi has joined #panfrost

03:04 <icecream95> I've just sent my first non-faulting Panfrost GPU jobs over the network

03:04 <icecream95> I just need to make it send the updated BO data back to the driver, then we can have accelerated Vulkan in web browsers...

03:05 * icecream95 remembers what happened last time he tried doing stupid things with Vulkan

03:48 sigmaris has joined #panfrost

06:36 <tomeu> alyssa: using the m1 for panfrost development sounds great to me, this is how I test my wip code:

06:36 <tomeu> sudo systemd-nspawn -D ~/nfsroot-panfrost sh -c "su tomeu -c 'ninja -C ~/deqp-build deqp-vk'" && sudo systemd-nspawn -D ~/nfsroot-panfrost sh -c "su tomeu -c 'ninja -j16 -C /home/tomeu/mesa-build' && ninja -j16 -C /home/tomeu/mesa-build install" && ssh 10.42.0.62 PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1

06:36 <tomeu> ~/deqp-build/external/vulkancts/modules/vulkan/deqp-vk -n dEQP-VK.renderpass.suballocation.unused_clear_attachments.*

06:37 <tomeu> I'm building with gcc under qemu as this is a x86 machine, so on your m1 it should be really fast

06:41 <icecream95> tomeu: With my patches for sending ioctls and BOs over a network you could run everything (including dEQP on panvk) on an x86 machine and only use the ARM device for sending the jobs to the GPU

06:41 <icecream95> You'd still need to test it natively to make sure synchronisation works correctly etc.

06:41 <macc24> tomeu: FYI gcc in qemu seems to have broken recently

06:41 <macc24> at least for me

06:45 <tomeu> icecream95: interesting!

06:46 <tomeu> macc24: I think it has been years since the last time I upgraded the debian on the nfsroot/container :)

06:49 <macc24> tomeu: jesus christ

06:50 <tomeu> my only interaction with the machine is that cmdline above, so I'm more than happy to not have to worry about it

06:59 rasterman has joined #panfrost

07:13 camus has joined #panfrost

07:17 <bbrezillon> alyssa: adding panfrost_dump.c to my TODO list ;). Right now I'm finishing the panvk+panfrost-common-lib per-gen split (I'm almost done BTW)

07:52 camus has quit [Read error: Connection reset by peer]

08:38 warpme_ has joined #panfrost

08:44 <tomeu> bbrezillon: have pushed a bunch of stuff from your blend+blit branch to https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12095/commits

08:44 <tomeu> plus the start of the split

08:45 <tomeu> do you think we could merge that next? and if so, what are the biggest issues that need to be addressed before we can do so?

09:02 wwilly has quit [Ping timeout: 480 seconds]

09:23 wwilly has joined #panfrost

09:26 <bbrezillon> tomeu: I'd like to merge the per-gen stuff first

10:28 wwilly has quit []

10:53 <tomeu> bbrezillon: ok, and then that MR?

12:13 <bbrezillon> tomeu: didn't look at it yet, but yes, that's the idea

12:27 camus has joined #panfrost

13:03 <alyssa> bbrezillon: awesome sauce

14:06 camus has quit [Read error: Connection reset by peer]

15:01 <bbrezillon> alyssa: https://gitlab.freedesktop.org/bbrezillon/mesa/-/commits/per-gen-xml-prep/

15:01 <bbrezillon> now I need to split the XMLs...

15:07 <alyssa> bbrezillon: I can split the XMLs

15:07 <alyssa> if you want to get back to literally anything else

15:48 camus has joined #panfrost

16:36 <bbrezillon> alyssa: I'm taking care of v7 right now, depending on how it goes I might let you do the others

16:37 <alyssa> bbrezillon: Fair enough =)

16:42 <Lyude> alyssa: asking since I vaguely remember this being a problem that had to be solved for panfrost at one point, what was the solution for us fixing the whole "failed to open foo: /usr/lib64/dri/foo.so"? Did we just write up small dri shims for each different display chip using panfrost?

16:43 <Lyude> oh we might have it handled actually

16:43 <alyssa> Hmm?

16:43 <alyssa> `kmsro`

16:43 <alyssa> I think is the term to grep for

16:43 <Lyude> alyssa: gotcha

16:44 <alyssa> but yes basically that

16:56 <alyssa> robmur01: Noooo! my bifrost fp16 instr_invalid_enc faults are back, and now I have a reproducing shader that doesn't even use v2f32_to_v2f16 *or* f16_to_f32

16:57 <alyssa> i'm increasingly convinced the scheduler is fine and there's some rare bug in the packing code

17:02 <alyssa> Lyude: i am decidedly tempted to break out your assembler to debug this

17:02 <alyssa> of course i have no idea if I can still remember cwabbott's bifrost syntax :p

17:15 <Lyude> neat o:

17:21 <alyssa> i just

17:22 <alyssa> do i have a subtle bug? does the disasm have the same bug? is there a hw bug?

17:22 <alyssa> why is the DDK unaffected

17:23 <alyssa> and oh how I wish I could poke the DDK at a finer level

17:23 <alyssa> it's hard to figure out how it packs certain things when I have so little influence over clause scheduling

17:24 <alyssa> For this buggy shader, my schedule is 1 cycle faster than the DDK

17:25 <alyssa> Is the DDK missing a schedule opportunity? Or is there something subtly wrong about the schedule explaining the fault?

17:43 <robmur01> well, FWIW my instinct says that if two things are slightly different and one of them is wrong, there's a high chance that the difference is significant :)

17:43 <robmur01> +1 for disassemble that thing

17:44 wwilly has joined #panfrost

17:44 <alyssa> robmur01: hm

17:45 <alyssa> new/old bifrost syntax is too different uhm

17:46 <alyssa> bifrost is canceled

17:47 <HdkR> Perfect, long live valhall

17:47 <alyssa> help

17:47 <robmur01> can't you get both shader binaries and run them through the same tool?

17:47 <macc24> \o/ no more g72 quirks

17:47 <alyssa> robmur01: which tool?

17:48 <alyssa> I can't get the shader binaries close enough since Bifrost scheduling heuristics are basically just calling `rand()`

17:48 <robmur01> oh, I thought disassemblers existed already

17:50 <robmur01> this whole graphics malarkey still baffles me sometimes :)

17:51 <alyssa> disassembler yes

17:51 <alyssa> but clause scheduling makes that less than useful

17:53 <alyssa> Ok, this is significant --

17:53 <alyssa> *FMA.v2f16 r0:t0, t0.h00, 0x409a3f9a /* 4.820264 */, #0.neg

17:53 <alyssa> +NOP.i32 t1

17:53 <alyssa> *MKVEC.v2i16 r1:t0, t0.h1, 0x00003c00 /* 0.000000 */

17:53 <alyssa> +NOP.i32 t1

17:53 <alyssa> versus

17:53 <alyssa> C *FMA.v2f16 r0:t0, t1.h00, 0x409a3f9a /* 4.820264 */, #0.neg

17:53 <alyssa> +MKVEC.v2i16 r1:t1, t.h1, 0x00003c00 /* 0.000000 */

17:53 <alyssa> (DDK on top, Panfrost on the bottom)

17:54 <alyssa> Apparently DDK's clause scheduling does have logic preventing FMA.v2f16 / MKVEC.v2i16 from being put in a tuple together? but ... why?!

18:05 <alyssa> robmur01: daniels suggested I vary the modifier

18:05 <alyssa> apparently MKVEC.v2i16 with t0.h1 won't fuse (see above)

18:05 <alyssa> but the DDK -will- fuse MKVEC.v2i16 with t0.h0

18:06 <alyssa> there is absolutely no valid architectural reason for that distinction to matter

18:06 <alyssa> which just underscores that there is internal pipeline state leaking through and I mean

18:06 <alyssa> 14:05

18:06 <alyssa> I'll just end up playing whack-a-mole if I keep adding rules on for these symptoms
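To make the observation above concrete, here is a minimal Python sketch of the kind of pairing rule the DDK appears to apply, assuming only the two data points in this log: FMA.v2f16 is not paired with an MKVEC.v2i16 that reads a `.h1` half, but is paired when it reads `.h0`. The `Instr` type and `ddk_would_pair` function are hypothetical illustrations, not Panfrost or DDK code, and this is exactly the kind of symptom-matching rule that risks the whack-a-mole problem mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    op: str      # opcode as printed by the disassembler, e.g. "FMA.v2f16"
    srcs: tuple  # source operands as strings, e.g. ("t0.h1", "0x00003c00")


def ddk_would_pair(fma: Instr, add: Instr) -> bool:
    """Hypothetical rule fitted to the two observations in this log only.

    FMA.v2f16 on the FMA unit plus MKVEC.v2i16 on the ADD unit is refused
    when the MKVEC reads the high half (.h1) of a source; every other
    combination is assumed to be pairable.
    """
    if fma.op == "FMA.v2f16" and add.op == "MKVEC.v2i16":
        return not any(src.endswith(".h1") for src in add.srcs)
    return True
```

Because the rule only encodes symptoms, it says nothing about why the hardware faults, which is the objection raised above.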

19:29 <macc24> alyssa: how high tolerance do you have to dumb questions?

19:33 <alyssa> macc24: fairly low but i can make an exception for you ;)

19:46 <macc24> alyssa: is bifrost compiler padding with nops?

19:50 <alyssa> macc24: yes, and that's not a dumb question

19:50 <macc24> why?

19:51 <alyssa> because bifrost is dumb

19:51 <alyssa> bifrost's arithmetic logic unit has two units, FMA and ADD

19:51 <HdkR> If you can't fill the pipeline with work, then you need to pad :>

19:51 <alyssa> they can execute different kinds of instructions

19:52 <alyssa> it always executes FMA/ADD/FMA/ADD/... in that order

19:52 <alyssa> so if you don't have an instruction ready for a given unit (FMA or ADD)... you have to put in a NOP instead

19:53 <alyssa> more modern designs like Valhall do the same thing, but they do it internal to the hardware. i.e. if they don't have work ready, they'll just stall, the compiler doesn't have to literally tell the hardware to stall.
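A minimal sketch of the NOP padding alyssa describes, assuming a simplified model where every instruction is already assigned to either the FMA or the ADD unit and instructions are emitted strictly in program order. `pack_tuples` is a hypothetical helper; a real Bifrost scheduler does considerably more (choosing which unit an instruction runs on, reordering, tracking dependencies and clause constraints).

```python
NOP_FMA = "*NOP"
NOP_ADD = "+NOP"


def pack_tuples(instrs):
    """Pack (unit, text) pairs into (FMA slot, ADD slot) tuples.

    instrs: list of (unit, text) in program order, unit is "FMA" or "ADD".
    The hardware always issues FMA then ADD, so whenever the next
    instruction does not belong to the slot being filled, that slot
    gets a NOP instead.
    """
    tuples = []
    i = 0
    while i < len(instrs):
        fma = NOP_FMA
        if instrs[i][0] == "FMA":
            fma = instrs[i][1]
            i += 1
        add = NOP_ADD
        if i < len(instrs) and instrs[i][0] == "ADD":
            add = instrs[i][1]
            i += 1
        tuples.append((fma, add))
    return tuples
```

For example, two FMA-only instructions followed by one ADD-only instruction pack into two tuples, the first with a `+NOP` in its ADD slot.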

19:57 <macc24> alyssa: what would happen if compiler didn't do any nops

19:57 <robclark> that is kinda the opposite direction from every other gpu vendor.. which seems to go in the direction of "make the compiler figure it out"

19:58 <HdkR> https://en.wikipedia.org/wiki/Hazard_(computer_architecture) You hit these kinds of problems without nops :)
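HdkR's point about hazards can be illustrated generically. The sketch below assumes an in-order pipeline with a fixed result latency and no hardware interlocks (an assumption for illustration, not a description of Bifrost or Valhall), and inserts NOPs so that no instruction reads a register before its producer's result is ready. `insert_hazard_nops` and the latency value are hypothetical.

```python
def insert_hazard_nops(program, latency):
    """Pad a straight-line program with NOPs to avoid read-after-write hazards.

    program: list of (dest, srcs) in program order, one instruction per cycle.
    latency: cycles after issue before a result may be read.
    Without interlocks the compiler must emit these NOPs itself; hardware
    with interlocks would simply stall instead.
    """
    out = []
    ready_at = {}  # register -> first cycle its value may be read
    for dest, srcs in program:
        cycle = len(out)  # cycle this instruction would issue on
        need = max((ready_at.get(s, 0) for s in srcs), default=0)
        for _ in range(need - cycle):  # empty range if already ready
            out.append(("NOP", ()))
        out.append((dest, srcs))
        ready_at[dest] = len(out) - 1 + latency
    return out
```

With a latency of 3, an instruction reading `r0` right after the one writing it gets two NOPs in between, while independent instructions are left untouched.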

19:58 <alyssa> robclark: yeah, they tried that and called it bifrost and we're still cleaning up the ashes

20:01 <macc24> we need to make driver developers make gpus and gpu engineers make drivers

20:01 <macc24> that'll solve all problems

20:06 <anholt_> broadcom had basically the same people doing the HW and the simulator and the docs and the driver. it's a good way to go.

20:06 <anholt_> though the docs can suffer

20:06 <macc24> i... wouldn't take broadcom as an example

20:08 <anholt_> it's the simplest, best-documented, most stable GPU I've worked on. shame about their display engines.

20:30 atler is now known as Guest2626

20:31 atler has joined #panfrost

20:32 Guest2626 has quit [Ping timeout: 480 seconds]

20:39 atler has quit [Read error: Connection reset by peer]

20:43 atler has joined #panfrost

21:12 Putti has quit [Ping timeout: 480 seconds]

21:33 enunes has quit [Ping timeout: 480 seconds]

21:48 atler is now known as Guest2631

21:48 atler has joined #panfrost

21:50 Guest2631 has quit [Ping timeout: 480 seconds]

22:07 atler has quit [Ping timeout: 480 seconds]

22:12 atler has joined #panfrost

22:18 warpme_ has quit [Quit: Connection closed for inactivity]

23:01 atler is now known as Guest2637

23:01 atler has joined #panfrost

23:03 Guest2637 has quit [Ping timeout: 480 seconds]

23:40 rasterman has quit [Quit: Gettin' stinky!]

23:53 camus has quit []