ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard + Bifrost + Valhall - Logs - I don't know anything about WSI. That's my story and I'm sticking to it.
jolan has joined #panfrost
alyssa has joined #panfrost
<alyssa> yeet
<alyssa> merry christmas I hope you enjoy your mesa coding style compliant panfrost
<alyssa> and yeeted
<CounterPillow> you too!
<alyssa> italove: rebased your open MRs
<alyssa> and rebased mine
<alyssa> and also rebased my old opencl branch but apparently all of the good stuff I already landed and all that's left are some hacks we never figured out
q4a1 has quit []
q4a has joined #panfrost
<q4a> alyssa: What panfrost task will you do next?
<alyssa> after "chill the heck out", you mean? :-)
* alyssa has a todo list somewhere here
<q4a> yes
<alyssa> v10 gles3.1 support
<alyssa> immediate next task is finishing up the pandecode side and getting those patches out
<alyssa> that's pretty close I just keep getting distracted with shiny things like running Dolphin on my other driver, hehe ^.^
<alyssa> The challenge with CSF is 100% on the kernel side
<alyssa> Given that we have conformant gles3.1 on v9, the Mesa side for v10 is... well, it's not trivial but there's nothing really novel happening
<alyssa> Way more cache coherency snafu than any Mali I've seen so far, though, so that's really fun (-:
<alyssa> I am suspicious that the rk3588 board I have might have been from a bad early batch of the SoC ... fresh one is on the way from a different manufacturer, hopefully that works out better
<q4a> Thanks for your work. I have v10 and I'm ready to help with simple things or test when needed
<alyssa> thanks :)
<alyssa> I have a few months of uni to finish up
<alyssa> after that, the sky's the limit :-)
<alyssa> q4a: If you're interested in learning about compilers, there's a lot of "low hanging" tasks in the Valhall compiler you could work on
<alyssa> instruction selection optimizations and such
<q4a> Yes. I'm interested
<alyssa> stuff that probably doesn't actually help fps in real workloads, so I can't justify spending time on it anymore, but it's lots of fun and a good learning experience
<alyssa> Okie
<alyssa> I can write up some issues on gitlab about ideas to work on
<q4a> it will be great!
<alyssa> :D
<q4a> Do I need a specific kernel for those tasks?
<alyssa> Mmh, that's tricky
<alyssa> What Mali hardware do you have other than v10?
<q4a> rk3288
<alyssa> right. different compiler then.
<alyssa> Mmh, most of what I have in mind you would be writing unit tests for
<alyssa> so it actually shouldn't matter what hw you have
<alyssa> once your unit tests pass, obviously i would run it through deqp on v9
<alyssa> which reminds me we really need to get v9 in CI
* alyssa mumbles
<alyssa> q4a: So, for environment, I recommend setting up drm-shim
<alyssa> with the commands there you can then run panfrost's compilers for any target GPU you like on a shader you craft, or on a big pile of shaders as you choose
<alyssa> readme for shader-db helps
<alyssa> `python3 before.txt after.txt` will generate some nice stats
<alyssa> or, would. I think you need a patch I forgot to upstream
<q4a> All this should work on rk3288?
<alyssa> sure
<alyssa> or an x86 machine or whatever
<alyssa> only requirement is that you're running Linux (or maybe BSD)
<alyssa> <-- this patch to shader-db will make work
<alyssa> with valhall
<alyssa> as the docs explain, PAN_GPU_ID=9093 will target a Valhall processor as you want
<alyssa> I have an alias "run-g57" that expands to "LD_PRELOAD=~/mesa/build/src/panfrost/drm-shim/ PAN_GPU_ID=9093 ./run"
<alyssa> I guess the proper way to do your own mesa builds now is to run `meson devenv` inside of your `mesa/build/` folder
<alyssa> within the proper mesa devenv, with drm-shim as above I can run e.g
<alyssa> $ run-g57 shaders/glmark/1-18.shader_test
<alyssa> and it'll print out some stats about the shader it compiled:
<alyssa> shaders/glmark/1-18.shader_test - MESA_SHADER_POSITION shader: 30 inst, 3.000000 cycles, 0.343750 fma, 0.062500 cvt, 0.062500 sfu, 0.000000 v, 0.000000 t, 3.000000 ls, 16 quadwords, 2 threads, 0 loops, 0:0 spills:fills
<alyssa> shaders/glmark/1-18.shader_test - MESA_SHADER_VARYING shader: 123 inst, 6.000000 cycles, 1.515625 fma, 0.109375 cvt, 0.812500 sfu, 0.000000 v, 0.000000 t, 6.000000 ls, 64 quadwords, 2 threads, 0 loops, 0:0 spills:fills
<alyssa> shaders/glmark/1-18.shader_test - MESA_SHADER_FRAGMENT shader: 22 inst, 0.250000 cycles, 0.093750 fma, 0.187500 cvt, 0.062500 sfu, 0.250000 v, 0.000000 t, 0.000000 ls, 16 quadwords, 2 threads, 0 loops, 0:0 spills:fills
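The stats lines above are regular enough to pull apart with a small script, which can be handy when comparing runs by hand. The following is a minimal sketch, not part of shader-db: the line layout is inferred only from the three sample lines in this log, and `parse_stats_line` is a hypothetical helper name.

```python
import re

# Assumed layout, inferred only from the sample lines in this log:
#   <path> - <STAGE> shader: <value> <name>, <value> <name>, ...
LINE_RE = re.compile(r"^(?P<path>\S+) - (?P<stage>\w+) shader: (?P<stats>.*)$")


def parse_stats_line(line):
    """Return (path, stage, stats) for one stats line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    stats = {}
    for field in m.group("stats").split(","):
        value, name = field.strip().split(" ", 1)
        if ":" in value:
            # "0:0 spills:fills" packs two counters into one field.
            for v, n in zip(value.split(":"), name.split(":")):
                stats[n] = int(v)
        elif "." in value:
            stats[name] = float(value)
        else:
            stats[name] = int(value)
    return m.group("path"), m.group("stage"), stats


sample = ("shaders/glmark/1-18.shader_test - MESA_SHADER_FRAGMENT shader: "
          "22 inst, 0.250000 cycles, 0.093750 fma, 0.187500 cvt, 0.062500 sfu, "
          "0.250000 v, 0.000000 t, 0.000000 ls, 16 quadwords, 2 threads, "
          "0 loops, 0:0 spills:fills")
path, stage, stats = parse_stats_line(sample)
print(stage, stats["inst"], stats["cycles"], stats["spills"], stats["fills"])
```

Keying the result by counter name makes it straightforward to diff two runs of the same shader, for example by subtracting the `inst` values before and after a compiler change.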
<alyssa> and if I want to see the assembly
<alyssa> `BIFROST_MESA_DEBUG=shaders run-g57 shaders/glmark/1-18.shader_test` will show me the asm
<alyssa> and intermediate IR and so on
<alyssa> at the top is the final optimized NIR shader
<alyssa> next up is the final optimized Valhall instructions, but before register allocation
<alyssa> next up is the shader after register allocation
<alyssa> then ^ plus the late Valhall specific passes that aren't used on Bifrost
<alyssa> finally, a disassembly of the compiled shader itself, i.e. what the hardware actually executes
<alyssa> Looking at that shader, already lots of little inefficiencies jump out
<alyssa> so here's a low hanging fruit of the above variety: folding the F16_TO_F32 instruction into the FMA_RSCALE.f32 instruction
<alyssa> F16_TO_F32 r0, ^r0.h0
<alyssa> FMA_RSCALE.f32 r0, ^r0, 0x3F800000, 0x0.neg, ^r1
<alyssa> this can be written more efficiently as
<alyssa> FMA_RSCALE.f32 r0, ^r0.h0, 0x3F800000, 0x0.neg, ^r1
<alyssa> i think
<alyssa> uh, no, apparently it can't. oof.
<alyssa> ok, here's a different issue
<alyssa> IADD_IMM.i32 r1, 0x0, #0x18
<alyssa> FMA_RSCALE.f32 r0, ^r0, 0x3F800000, 0x0.neg, ^r1
<alyssa> that "IADD_IMM.i32" instruction just loads a constant
<alyssa> It would be more efficient to reserve a "fast access uniform" (FAU entry) for the constant and write in one instruction instead
<alyssa> FMA_RSCALE.f32 r0, ^r0, 0x3F800000, 0x0.neg, u0
<alyssa> `src/panfrost/bifrost/valhall/va_lower_constants.c` is responsible for lowering constants
<alyssa> read that pass, you'll see there's a todo for using uniforms
<alyssa> `src/panfrost/bifrost/valhall/test/test-lower-constants.cpp` tests that pass. you'll want to write unit tests for the optimization you're trying to write first, and then you can run them from your mesa/build with `meson test --suite=panfrost`
<alyssa> also read `bi_opt_push_ubo.c` and the push data structure
<alyssa> and the sysvals infrastructure
<alyssa> you'll need to extend them somehow to push constants
<alyssa> and then upload those constants in the driver
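The constant-to-uniform idea described above can be illustrated with a toy model. This is only a sketch in Python over a made-up instruction representation, not Mesa code: `Instr`, `UniformTable`, and `push_constants` are hypothetical names, the real pass works on the Bifrost/Valhall IR in C, and the sketch ignores any hardware limits on which operands may read a uniform or how many uniforms one instruction may use.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    op: str      # mnemonic, e.g. "FMA_RSCALE.f32"
    dest: str    # destination register name, e.g. "r0"
    srcs: list   # source operands as plain strings; "^rN" marks a last use


@dataclass
class UniformTable:
    """Toy stand-in for the constants the driver would have to upload."""
    values: list = field(default_factory=list)

    def slot_for(self, value):
        # Reuse a slot when the same constant was already pushed.
        if value not in self.values:
            self.values.append(value)
        return "u%d" % self.values.index(value)


def reg_of(src):
    # Strip the last-use marker and any swizzle/modifier: "^r0.h0" -> "r0".
    return src.lstrip("^").split(".")[0]


def is_const_load(ins):
    # Toy encoding of the pattern from the log: IADD_IMM.i32 rN, 0x0, #imm
    return (ins.op == "IADD_IMM.i32" and len(ins.srcs) == 2
            and ins.srcs[0] == "0x0" and ins.srcs[1].startswith("#"))


def push_constants(block, table):
    """Turn constant loads whose only reader is a last use into uniform reads.

    Handles straight-line code only and rewrites the reader's sources in place.
    """
    dead = set()
    for i, load in enumerate(block):
        if not is_const_load(load):
            continue
        value = int(load.srcs[1][1:], 0)  # "#0x18" -> 24
        last_use = "^" + load.dest
        for user in block[i + 1:]:
            reads = [s for s in user.srcs if reg_of(s) == load.dest]
            if reads:
                # Only safe in this toy if every read is a plain last use.
                if all(s == last_use for s in reads):
                    slot = table.slot_for(value)
                    user.srcs = [slot if s == last_use else s for s in user.srcs]
                    dead.add(i)
                break
            if user.dest == load.dest:
                break  # overwritten before being read; leave it alone
    return [ins for i, ins in enumerate(block) if i not in dead]


block = [
    Instr("IADD_IMM.i32", "r1", ["0x0", "#0x18"]),
    Instr("FMA_RSCALE.f32", "r0", ["^r0", "0x3F800000", "0x0.neg", "^r1"]),
]
table = UniformTable()
out = push_constants(block, table)
for ins in out:
    print(ins.op, ins.dest + ",", ", ".join(ins.srcs))
```

On the two-instruction example from the log, the load disappears and the remaining instruction reads `u0`, while `table.values` records the constant that would still need to be uploaded by the driver.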
<q4a> ok. I need some time to read, build and test this stuff
floof58 is now known as Guest283
floof58 has joined #panfrost
Guest283 has quit [Ping timeout: 480 seconds]
<alyssa> good luck!
<q4a> I built mesa, shader-db and got my asm output:
<q4a> for `BIFROST_MESA_DEBUG=shaders run-g57 shaders/glmark/1-18.shader_test`
<q4a> Need to read more about asm instructions..
<alyssa> q4a:
<alyssa> that's not updated anymore but it's mostly accurate
<alyssa> src/panfrost/bifrost/valhall/ISA.xml is updated, however.
<q4a> thanks!