ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://oftc.irclog.whitequark.org/panfrost - <macc24> i have been here before it was popular
<jekstrand>
Or we can just do fine for everything and YOLO it
vstehle has quit [Ping timeout: 480 seconds]
alyssa has joined #panfrost
<alyssa>
icecream95: afbc24a234e, hah! nice
<alyssa>
jekstrand: icecream95: No Mali hardware has support for real geometry or tessellation shaders. It's emulated all the way down.
<alyssa>
Nevertheless, the hardware has some minor features to make the emulation code cheaper
<alyssa>
For geometry shaders, there are some special modes of the attribute descriptor to make it convenient to load inputs in the presence of adjacency primitives etc
<alyssa>
For tessellation shaders, there's a special pixel format (tess_vertex_pack) that (ab)uses bit trickery to squeeze out extra precision needed for conformance reasons. (Honestly not sure of the details. Might be new since Bifrost, don't remember.)
<alyssa>
Valhall adds layered rendering support (icecream95, it should be obvious from v9.xml in Mesa how the hardware wants it, I imagine v10 is similar)
<alyssa>
This adds an output for gl_LayerID in special IDVS jobs, and tiler support for picking out the right layer for a given fragment job.
<alyssa>
Given panfrost doesn't emulate GS, it's not too interesting. But it probably would make GS+layered cheaper on Valhall than on Bifrost.
<alyssa>
jekstrand: As for derivatives, icecream95 is correct
<alyssa>
Only thing I'd add is the .subgroup modifier does not change the hardware execution, only the *perceived* subgroup size.
<alyssa>
A fragment shader on Valhall, for example, still has 16 wide warps. But CLPER.i32.subgroup4 returns results consistent with a quad.
<jekstrand>
alyssa: Yeah. I think I've got it. The next question is if we care about fine vs. coarse
<alyssa>
GL doesn't, so we (bbrezillon and I) went for the cheapest thing
<alyssa>
If VK does, supporting both would be easy -- CLPER.i32 gives you the tools you need :-)
<jekstrand>
I don't think it needs to be complicated
<alyssa>
sure, but if fine is even an extra instruction over coarse, and I can get away with coarse on GL, i'd like to use coarse on GL ;-p
<jekstrand>
It looks like the current ones are fine. We just need to tweak the id computation to make it coarse
<alyssa>
er, do we do fine now? I forget
<jekstrand>
I may not be reading it right
<alyssa>
"the current ones are fine." oh you mean they're fine, got it, ambiguous sentence is ambiguous T_T
<alyssa>
jekstrand: Why would we want coarse derivatives, if fine derivatives are equal cost?
<jekstrand>
hehe
<jekstrand>
alyssa: coarse derivatives are quad-uniform which can be useful.
<jekstrand>
Also, in d3d land, coarse is supposed to match texturing. However, that assumes your texture unit does coarse derivatives...
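The fine vs. coarse distinction discussed above can be sketched on a single 2x2 quad. This is only an illustrative model, not the actual Mesa lowering and not the real CLPER semantics. The lane layout (0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right) and all function names are assumptions made for the example. The only difference between the two variants is which lane ids are read, which is the "tweak the id computation" idea; the coarse result is identical in every lane, i.e. quad-uniform.

```python
# Hypothetical model of quad derivatives. Lane layout is an assumption:
# 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right.

def ddx_fine(quad, lane):
    # Fine: difference within the row that contains this lane.
    row = lane & 2
    return quad[row | 1] - quad[row]

def ddx_coarse(quad, lane):
    # Coarse: always use the top row, so the result is quad-uniform.
    return quad[1] - quad[0]

def ddy_fine(quad, lane):
    # Fine: difference within the column that contains this lane.
    col = lane & 1
    return quad[col | 2] - quad[col]

def ddy_coarse(quad, lane):
    # Coarse: always use the left column, so the result is quad-uniform.
    return quad[2] - quad[0]

quad = [1.0, 3.0, 10.0, 14.0]
print([ddx_fine(quad, lane) for lane in range(4)])    # [2.0, 2.0, 4.0, 4.0]
print([ddx_coarse(quad, lane) for lane in range(4)])  # [2.0, 2.0, 2.0, 2.0]
```

In this model both variants read two lanes and do one subtraction, so they differ only in how the source lane ids are derived.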
<alyssa>
Hmm, ok
<alyssa>
I'm not sure what the texture unit does
<jekstrand>
There's a great article I found about this a while ago...
<alyssa>
also, note to self: clper_v6 should probably be renamed clper_old or something
<alyssa>
since i later discovered g31 uses it despite being v7
<alyssa>
I'd be unsurprised if G51 used it too
camus has joined #panfrost
<icecream95>
alyssa: That reminds me that LD_VAR_F32_V10 isn't v10-only either..
<alyssa>
icecream95: Indeed... I think I have XML for this, not sure if i pushed it
<icecream95>
..And I hope you didn't waste time independently reversing the texture instructions?
JulianGro has quit [Remote host closed the connection]
atler is now known as Guest1745
atler has joined #panfrost
Guest1745 has quit [Ping timeout: 480 seconds]
davidlt has joined #panfrost
camus1 has joined #panfrost
camus has quit [Remote host closed the connection]
rcf has quit [Quit: WeeChat 3.2.1]
rcf has joined #panfrost
davidlt has quit [Ping timeout: 480 seconds]
vstehle has joined #panfrost
davidlt has joined #panfrost
Daanct12 has joined #panfrost
Daaanct12 has joined #panfrost
Daanct12 has quit [Ping timeout: 480 seconds]
robertfoss has joined #panfrost
rasterman has joined #panfrost
Daaanct12 is now known as Daanct12
camus1 has quit [Remote host closed the connection]
camus has joined #panfrost
erlehmann has quit [Ping timeout: 480 seconds]
erlehmann has joined #panfrost
erlehmann has quit [Ping timeout: 480 seconds]
JulianGro has joined #panfrost
Daanct12 has quit [Quit: Quit]
erlehmann has joined #panfrost
JulianGro[m] has joined #panfrost
JulianGro has quit [Quit: Leaving]
MajorBiscuit has joined #panfrost
nlhowell has joined #panfrost
JulianGro has joined #panfrost
JulianGro has quit [Remote host closed the connection]
karolherbst_ has joined #panfrost
karolherbst has quit [Remote host closed the connection]
MajorBiscuit has quit [Ping timeout: 480 seconds]
JulianGro has joined #panfrost
jelly has quit [Remote host closed the connection]
<jekstrand>
Hrm... I think I'm setting up the descriptor properly. Maybe it's not actually getting bound?
nlhowell has quit [Remote host closed the connection]
nlhowell has joined #panfrost
jelly has joined #panfrost
nlhowell has quit [Ping timeout: 480 seconds]
<alyssa>
jekstrand: I need more context to help
rasterman has quit [Quit: Gettin' stinky!]
<jekstrand>
I figured it out. Was missing bbrezillon's patch to add a default sampler.
<jekstrand>
alyssa: ^^
<alyssa>
Ah.
<alyssa>
The hardware's insistence on a sampler for texture buffers is a bit annoying
<alyssa>
This might be lifted in Valhall, not sure
<alyssa>
jekstrand: Unrelated but I'm super excited to see Connor's preamble stuff land
<alyssa>
There's no good way to implement preamble shaders on Mali (although I heard rumours of this changing, maybe in v10?)
<alyssa>
but bifrost+ strongly assumes you don't do uniform-on-uniform/immediate arithmetic so I'm hoping for nontrivial gains
<alyssa>
jekstrand: Re that dummy sampler patch, why are we disabling seamless cube map?
<alyssa>
(Why do we need any non-default parameters?)
<jekstrand>
alyssa: It's in bbrezillon's panvk tmp branch
<jekstrand>
alyssa: seamless cube maps? Because GL is stupid.
<icecream95>
Preamble shaders seem to just be extra rows in the shader descriptor in v10
<alyssa>
icecream95: I.e. there are real preamble shaders?
<alyssa>
with a store_fau or move_to_fau instruction or something like that?
<icecream95>
I think so
<alyssa>
(As opposed to STORE + pass the FAU buffer as a sysval, old-style)
<icecream95>
It still uses normal memory stores, but it seems that a couple of registers with addresses are preloaded
<alyssa>
Interesting, okay
<alyssa>
I thought we might get a move to fau instruction, like AGX has
<alyssa>
I will have to shake my magic 8-ball harder
jelly has quit []
<icecream95>
Also, v10 has a weird way of storing pointers to FAUs.. they seem to be embedded inside a resource table IIRC
davidlt has quit [Ping timeout: 480 seconds]
<icecream95>
(The preamble has to chase some pointers to find where to store the FAUs)
<alyssa>
fun
<alyssa>
Valhall (v9) is quite flexible in how resource tables are used
<alyssa>
Mesa (well, my branch thereof) is only using a small subset to emulate Midgard-style resources
<alyssa>
It's supposed to map to Vulkan descriptor sets
<icecream95>
You should probably set up the v10 blob so that you can compare against v9 yourself..
<alyssa>
Possibly, but I'm supposed to be writing a v9 driver and not reversing v10
jelly has joined #panfrost
<jekstrand>
Now trying to figure out why the vertex_attribute_divisor tests don't pass on panvk with bbrezillon's patches...
<icecream95>
alyssa: The contents of preamble shaders in v10 seem very similar to v9
<alyssa>
fair enough
* alyssa
needs to finish this essay due tonight
<alyssa>
it is 16:00
<alyssa>
good work alyssa, great planning as always
<alyssa>
🙃
bluebugs has quit [Quit: Leaving]
bluebugs has joined #panfrost
rasterman has joined #panfrost
<icecream95>
alyssa: It seems that pandecode (in panloader) does not even disassemble (fragment, at least) preambles for v9?
jelly has quit [Ping timeout: 480 seconds]
jelly has joined #panfrost
<alyssa>
icecream95: v9 doesn't support preambles natively...
<alyssa>
unless I am greatly mistaken
<icecream95>
I.. think you are mistaken?
* icecream95
double checks to make sure it wasn't an IDVS varying shader
<icecream95>
Yup, preambles are very much supported natively
<icecream95>
alyssa: I'm using the asurada-25.0_p32 blob, maybe you were using a different one which doesn't use preambles?
<icecream95>
The compute shader is stored after the last IDVS shader (either position or varying) in the shader descriptor table, and is not referenced directly
<icecream95>
I wonder.. maybe the shader is actually not being used, and the calculations were being done on the CPU?
<alyssa>
Entirely possible
<alyssa>
Especially if it's all ALU
<alyssa>
something like `texture(tex, uniform_coords)` ought to trigger a real preamble shader
<alyssa>
though I don't know what heuristics the DDK uses
<icecream95>
Why isn't disabling ASLR working for the v9 blob? Maybe the threading messes it up..
<icecream95>
git diff -U10000000 --no-index --word-diff=porcelain dump1 dump2 | sed '/^-/d;/^+/s/[^+]/x/g' | cut -c2- | sed -z 's/\n\([^\n]\)/\1/g'
<icecream95>
^^ == workaround for ASLR messing up dumps. Use on two dumps that are supposed to have the same output
<icecream95>
(Well, in this case it is threading causing non-deterministic execution)
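The idea behind the shell pipeline above (keep what two dumps have in common, blank out what differs) can be sketched in Python. This is an approximate re-expression, not a line-for-line port of the git/sed pipeline: it uses `difflib` from the standard library rather than `git diff --word-diff`, and the function name `mask_differences` is hypothetical.

```python
import difflib
import re

def mask_differences(dump_a: str, dump_b: str) -> str:
    """Return dump_b with every word that differs from dump_a replaced by x's."""
    # Split into alternating runs of non-whitespace and whitespace so the
    # original layout (spaces, newlines) can be rebuilt exactly.
    a = re.findall(r"\S+|\s+", dump_a)
    b = re.findall(r"\S+|\s+", dump_b)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
        elif tag in ("replace", "insert"):
            # Mask the differing words but keep their length and whitespace.
            out.extend(re.sub(r"\S", "x", tok) for tok in b[j1:j2])
        # "delete": words present only in dump_a are dropped.
    return "".join(out)

print(mask_differences("addr 0x1000 value 5\n", "addr 0x2000 value 5\n"))
# addr xxxxxx value 5
```

Values that vary between runs, such as pointers, end up masked, while everything the two dumps agree on is kept as-is.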
<icecream95>
alyssa: Nope, I think there is actual preframe support
<alyssa>
preframe != preamble
erlehmann has quit [Ping timeout: 480 seconds]
<icecream95>
Oops, I meant preamble
erlehmann has joined #panfrost
karolherbst_ is now known as karolherbst
<jekstrand>
Ugh... indexed drawing on panfrost...
<jekstrand>
What does the hardware use the vertex_range for? What happens if we set it to UINT<N>_MAX where N is the number of bits in the index buffer?
<jekstrand>
Looks like maybe we allocate that much space for the VS output always?
* jekstrand
wonders how the blob VK driver handles this
* jekstrand
wonders if we should just implement Vulkan by compiling the GL driver as CL code.
<alyssa>
jekstrand: ...Do you really want the answer to that?
<alyssa>
because it sucks hard.
<cwabbott>
jekstrand: at least before IDVS, the way that vertex shaders work is that the vertex shader gets run over the entire vertex buffer completely ignoring the index buffer, like a compute shader (and actually they're pretty much the same)
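A rough sketch of why the vertex range matters under the model described above, where the vertex shader runs over a contiguous range of vertices and its outputs need storage for the whole range. This is an illustration only, not panfrost or panvk code: the function names, the per-vertex output size, and the idea that output storage scales linearly with the range are assumptions taken from the guesses in the discussion.

```python
def vertex_range(indices):
    """Return (first_vertex, vertex_count) covering every index used."""
    if not indices:
        return (0, 0)
    lo, hi = min(indices), max(indices)
    return (lo, hi - lo + 1)

def vs_output_bytes(indices, bytes_per_vertex, index_bits=None):
    """Estimate vertex shader output storage for one indexed draw.

    If index_bits is given, assume the worst case 0..UINT<N>_MAX instead of
    scanning the index buffer (hypothetical fallback).
    """
    if index_bits is not None:
        count = 1 << index_bits
    else:
        _, count = vertex_range(indices)
    return count * bytes_per_vertex

indices = [5, 7, 6, 5, 9]
print(vertex_range(indices))                        # (5, 5)
print(vs_output_bytes(indices, 16))                 # 80
print(vs_output_bytes(indices, 16, index_bits=16))  # 1048576
```

With 32-bit indices the worst-case figure in this model is 2^32 vertices, which is why simply setting the range to the maximum is unattractive if storage really is allocated per vertex in the range.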
<jekstrand>
alyssa: What bbrezillon has coded up will work for 98% of the CTS tests that use index buffers, but it is invalid.