ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs - <macc24> i have been here before it was popular
<jekstrand> Or we can just do fine for everything and YOLO it
vstehle has quit [Ping timeout: 480 seconds]
alyssa has joined #panfrost
<alyssa> icecream95: afbc24a234e, hah! nice
<alyssa> jekstrand: icecream95: No Mali hardware has support for real geometry or tessellation shaders. It's emulated all the way down.
<alyssa> Nevertheless, the hardware has some minor features to make the emulation code cheaper
<alyssa> For geometry shaders, there are some special modes of the attribute descriptor to make it convenient to load inputs in the presence of adjacency primitives etc
<alyssa> For tessellation shaders, there's a special pixel format (tess_vertex_pack) that (ab)uses bit trickery to squeeze out extra precision needed for conformance reasons. (Honestly not sure of the details. Might be new since Bifrost, don't remember.)
<alyssa> Valhall adds layered rendering support (icecream95, it should be obvious from v9.xml in Mesa how the hardware wants it, I imagine v10 is similar)
<alyssa> This adds an output for gl_LayerID in special IDVS jobs, and tiler support for picking out the right layer for a given fragment job.
<alyssa> Given panfrost doesn't emulate GS, it's not too interesting. But it probably would make GS+layered cheaper on Valhall than on Bifrost.
<alyssa> jekstrand: As for derivatives, icecream95 is correct
<alyssa> Only thing I'd add is the .subgroup modifier does not change the hardware execution, only the *perceived* subgroup size.
<alyssa> A fragment shader on Valhall, for example, still has 16 wide warps. But CLPER.i32.subgroup4 returns results consistent with a quad.
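A rough model of the "perceived subgroup size" point above. It assumes that a `.subgroup4`-style modifier simply reinterprets the requested lane index relative to the caller's own group of 4 lanes, while all 16 lanes of the warp still execute. The function names and the exact clamping rule are hypothetical illustrations, not the documented CLPER semantics.

```c
#include <assert.h>

#define WARP_SIZE 16 /* assumption: Valhall fragment warp width, per the log */

/* Hypothetical: map a requested lane index to a physical lane when the
 * perceived subgroup size is `sub` (4 for a quad). The warp still runs
 * WARP_SIZE lanes; only the addressing is scoped to the caller's group. */
static int shuffle_lane(int lane, int requested, int sub)
{
    assert(sub > 0 && (sub & (sub - 1)) == 0); /* power-of-two group size */
    int base = lane & ~(sub - 1);              /* first lane of this group */
    return base + (requested & (sub - 1));
}

/* Simulated cross-lane read: every lane fetches `in` from the lane it asked
 * for, with addressing scoped to a group of `sub` lanes. */
static void clper_sim(const int in[WARP_SIZE], int out[WARP_SIZE],
                      const int requested[WARP_SIZE], int sub)
{
    for (int lane = 0; lane < WARP_SIZE; lane++)
        out[lane] = in[shuffle_lane(lane, requested[lane], sub)];
}
```

With `sub = 4`, lane 5 asking for "lane 0" reads physical lane 4, which is what a quad-sized subgroup would expect. With `sub = 16`, the same request reads physical lane 0.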
<jekstrand> alyssa: Yeah. I think I've got it. The next question is if we care about fine vs. coarse
<alyssa> GL doesn't so we ( bbrezillon and I) went for the cheapest thing
<alyssa> If VK does, supporting both would be easy -- CLPER.i32 gives you the tools you need :-)
<jekstrand> I don't think it needs to be complicated
<alyssa> sure, but if fine is even an extra instruction over coarse, and I can get away with coarse on GL, i'd like to use coarse on GL ;-p
<jekstrand> It looks like the current ones are fine. we just need to tweak the id computation to make it coarse
<alyssa> er, do we do fine now? I forget
<jekstrand> I may not be reading it right
<alyssa> "the current ones are fine." oh you mean they're fine, got it, ambiguous sentence is ambiguous T_T
<alyssa> jekstrand: Why would we want coarse derivatives, if fine derivatives are equal cost?
<jekstrand> hehe
<jekstrand> alyssa: coarse derivatives are quad-uniform which can be useful.
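A small sketch of the fine vs. coarse distinction in terms of "which lane do I read". It assumes the common quad layout (lane 0 top-left, 1 top-right, 2 bottom-left, 3 bottom-right) and that coarse derivatives are taken from the top-left pixel's row and column. Neither assumption is confirmed for Mali here. In this model the two variants cost the same and differ only in the lane-index computation, which matches the "tweak the id computation" remark above. The coarse result is identical for all four lanes, which is what makes it quad-uniform.

```c
#include <assert.h>

/* Assumed quad layout: 0 = top-left, 1 = top-right,
 *                      2 = bottom-left, 3 = bottom-right. */

/* Fine: use the pixel's own row (ddx) or column (ddy). */
static float fine_ddx(const float v[4], int lane)
{
    assert(lane >= 0 && lane < 4);
    return v[lane | 1] - v[lane & ~1]; /* right minus left, same row */
}

static float fine_ddy(const float v[4], int lane)
{
    assert(lane >= 0 && lane < 4);
    return v[lane | 2] - v[lane & ~2]; /* bottom minus top, same column */
}

/* Coarse: every lane uses the top row / left column, so the result is the
 * same across the quad (quad-uniform). */
static float coarse_ddx(const float v[4], int lane)
{
    assert(lane >= 0 && lane < 4);
    return v[1] - v[0];
}

static float coarse_ddy(const float v[4], int lane)
{
    assert(lane >= 0 && lane < 4);
    return v[2] - v[0];
}
```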
<jekstrand> Also, in d3d land, coarse is supposed to match texturing. However, that assumes your texture unit does coarse derivatives...
<alyssa> Hmm, ok
<alyssa> I'm not sure what the texture unit does
<jekstrand> There's a great article I found about this a while ago...
<alyssa> also, note to self: clper_v6 should probably be renamed clper_old or something
<alyssa> since i later discovered g31 uses it despite being v7
<alyssa> I'd be unsurprised if G51 used it too
camus has joined #panfrost
<icecream95> alyssa: That reminds me that LD_VAR_F32_V10 isn't v10-only either..
<alyssa> icecream95: Indeed... I think I have XML for this, not sure if i pushed it
<icecream95> ..And I hope you didn't waste time independently reversing the texture instructions?
JulianGro has quit [Remote host closed the connection]
atler is now known as Guest1745
atler has joined #panfrost
Guest1745 has quit [Ping timeout: 480 seconds]
davidlt has joined #panfrost
camus1 has joined #panfrost
camus has quit [Remote host closed the connection]
rcf has quit [Quit: WeeChat 3.2.1]
rcf has joined #panfrost
davidlt has quit [Ping timeout: 480 seconds]
vstehle has joined #panfrost
davidlt has joined #panfrost
Daanct12 has joined #panfrost
Daaanct12 has joined #panfrost
Daanct12 has quit [Ping timeout: 480 seconds]
robertfoss has joined #panfrost
rasterman has joined #panfrost
Daaanct12 is now known as Daanct12
camus1 has quit [Remote host closed the connection]
camus has joined #panfrost
erlehmann has quit [Ping timeout: 480 seconds]
erlehmann has joined #panfrost
erlehmann has quit [Ping timeout: 480 seconds]
JulianGro has joined #panfrost
Daanct12 has quit [Quit: Quit]
erlehmann has joined #panfrost
JulianGro[m] has joined #panfrost
JulianGro has quit [Quit: Leaving]
MajorBiscuit has joined #panfrost
nlhowell has joined #panfrost
JulianGro has joined #panfrost
JulianGro has quit [Remote host closed the connection]
karolherbst_ has joined #panfrost
karolherbst has quit [Remote host closed the connection]
MajorBiscuit has quit [Ping timeout: 480 seconds]
JulianGro has joined #panfrost
jelly has quit [Remote host closed the connection]
<jekstrand> Hrm... I think I'm setting up the descriptor properly. Maybe it's not actually getting bound?
nlhowell has quit [Remote host closed the connection]
nlhowell has joined #panfrost
jelly has joined #panfrost
nlhowell has quit [Ping timeout: 480 seconds]
<alyssa> jekstrand: I need more context to help
rasterman has quit [Quit: Gettin' stinky!]
<jekstrand> I figured it out. Was missing bbrezillon's patch to add a default sampler.
<jekstrand> alyssa: ^^
<alyssa> Ah.
<alyssa> The hardware's insistence on a sampler for texture buffers is a bit annoying
<alyssa> This might be lifted in Valhall, not sure
<alyssa> jekstrand: Unrelated but I'm super excited to see Connor's preamble stuff land
<alyssa> There's no good way to implement preamble shaders on Mali (although I heard rumours of this changing, maybe in v10?)
<alyssa> but bifrost+ strongly assumes you don't do uniform-on-uniform/immediate arithmetic so I'm hoping for nontrivial gains
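A toy illustration of why hoisting uniform-only arithmetic into a preamble can pay off. An expression that depends only on uniforms is evaluated once per draw instead of once per invocation, and the ALU operations are counted explicitly. The struct and function names are made up for illustration. On real hardware, the precomputed value would land in a uniform/FAU slot rather than a C local.

```c
#include <assert.h>
#include <stddef.h>

struct uniforms { float scale, bias; };

/* Without a preamble: each invocation recomputes the uniform product. */
static size_t run_naive(const struct uniforms *u, const float *in,
                        float *out, size_t n)
{
    assert(u != NULL);
    size_t alu_ops = 0;
    for (size_t i = 0; i < n; i++) {
        float k = u->scale * u->bias; /* uniform-on-uniform: 1 op */
        out[i] = in[i] * k;           /* per-invocation work: 1 op */
        alu_ops += 2;
    }
    return alu_ops;
}

/* With a preamble: the product is computed once, and the main body only
 * reads the precomputed value. */
static size_t run_with_preamble(const struct uniforms *u, const float *in,
                                float *out, size_t n)
{
    assert(u != NULL);
    float k = u->scale * u->bias;     /* "preamble": runs once per draw */
    size_t alu_ops = 1;
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] * k;
        alu_ops += 1;
    }
    return alu_ops;
}
```

For `n` invocations, this drops from `2n` to `n + 1` operations in the toy model. Actual gains depend on how much uniform-only work a shader contains.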
<alyssa> jekstrand: Re that dummy sampler patch, why are we disabling seamless cube map?
<alyssa> (Why do we need any non-default parameters?)
<jekstrand> alyssa: It's in bbrezillon's panvk tmp branch
<jekstrand> alyssa: seamless cube maps? Because GL is stupid.
<icecream95> Preamble shaders seem to just be extra rows in the shader descriptor in v10
<alyssa> icecream95: I.e. there are real preamble shaders?
<alyssa> with a store_fau or move_to_fau instruction or something like that?
<icecream95> I think so
<alyssa> (As opposed to STORE + pass the FAU buffer as a sysval, old-style)
<icecream95> It still uses normal memory stores, but it seems that a couple of registers with addresses are preloaded
<alyssa> Interesting, okay
<alyssa> I thought we might get a move to fau instruction, like AGX has
<alyssa> I will have to shake my magic 8-ball harder
jelly has quit []
<icecream95> Also, v10 has a weird way of storing pointers to FAUs.. they seem to be embedded inside a resource table IIRC
davidlt has quit [Ping timeout: 480 seconds]
<icecream95> (The preamble has to chase some pointers to find where to store the FAUs)
<alyssa> fun
<alyssa> Valhall (v9) is quite flexible in how resource tables are used
<alyssa> Mesa (well, my branch thereof) is only using a small subset to emulate Midgard-style resources
<alyssa> It's supposed to map to Vulkan descriptor sets
<icecream95> You should probably set up the v10 blob so that you can compare against v9 yourself..
<alyssa> Possibly, but I'm supposed to be writing a v9 driver and not reversing v10
jelly has joined #panfrost
<jekstrand> Now trying to figure out why the vertex_attribute_divisor tests don't pass on panvk with bbrezillon's patches...
<icecream95> alyssa: The contents of preamble shaders in v10 seems very similar to v9
<alyssa> fair enough
* alyssa needs to finish this essay due tonight
<alyssa> it is 16:00
<alyssa> good work alyssa, great planning as always
<alyssa> 🙃
bluebugs has quit [Quit: Leaving]
bluebugs has joined #panfrost
rasterman has joined #panfrost
<icecream95> alyssa: It seems that pandecode (in panloader) does not even disassemble (fragment, at least) preambles for v9?
jelly has quit [Ping timeout: 480 seconds]
jelly has joined #panfrost
<alyssa> icecream95: v9 doesn't support preambles natively...
<alyssa> unless I am greatly mistaken
<icecream95> I.. think you are mistaken?
* icecream95 double checks to make sure it wasn't an IDVS varying shader
<icecream95> Yup, preambles are very much supported natively
<icecream95> alyssa: I'm using the asurada-25.0_p32 blob, maybe you were using a different one which doesn't use preambles?
<icecream95> The compute shader is stored after the last IDVS shader (either position or varying) in the shader descriptor table, and is not referenced directly
<icecream95> I wonder.. maybe the shader is actually not being used, and the calculations were being done on the CPU?
<alyssa> Entirely possible
<alyssa> Especially if it's all ALU
<alyssa> something like `texture(tex, uniform_coords)` ought to trigger a real preamble shader
<alyssa> though I don't know what heuristics the DDK uses
<icecream95> Why isn't disabling ASLR working for the v9 blob? Maybe the threading messes it up..
<icecream95> git diff -U10000000 --no-index --word-diff=porcelain dump1 dump2 | sed '/^-/d;/^+/s/[^+]/x/g' | cut -c2- | sed -z 's/\n\([^\n]\)/\1/g'
<icecream95> ^^ == workaround for ASLR messing up dumps. Use on two dumps that are supposed to have the same output
<icecream95> (Well, in this case it is threading causing non-deterministic execution)
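The one-liner above keeps the second dump's text but replaces every word that `git diff --word-diff` flags as changed with one `x` per character, so addresses that vary between runs stop showing up as noise. Below is a much-simplified C sketch of the same masking idea. It assumes both dumps have an identical word/whitespace layout and does no real diffing, unlike git, which also handles inserted or removed words.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Copy `b` into `out`, replacing each whitespace-separated word that
 * differs from the word at the same position in `a` with 'x's of the same
 * length. `out` must hold at least strlen(b) + 1 bytes. Assumes `a` and
 * `b` share the same layout (same length and word boundaries). */
static void mask_differing_words(const char *a, const char *b, char *out)
{
    size_t len = strlen(b);
    assert(strlen(a) == len); /* layout assumption */
    size_t i = 0;
    while (i < len) {
        if (isspace((unsigned char)b[i])) {
            out[i] = b[i];
            i++;
            continue;
        }
        size_t end = i;
        while (end < len && !isspace((unsigned char)b[end]))
            end++;
        int same = memcmp(a + i, b + i, end - i) == 0;
        for (size_t j = i; j < end; j++)
            out[j] = same ? b[j] : 'x';
        i = end;
    }
    out[len] = '\0';
}
```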
<icecream95> alyssa: Nope, I think there is actual preframe support
<alyssa> preframe != preamble
erlehmann has quit [Ping timeout: 480 seconds]
<icecream95> Oops, I meant preamble
erlehmann has joined #panfrost
karolherbst_ is now known as karolherbst
<jekstrand> Ugh... indexed drawing on panfrost...
<jekstrand> What does the hardware use the vertex_range for? What happens if we set it to UINT<N>_MAX where N is the number of bits in the index buffer?
<jekstrand> Looks like maybe we allocate that much space for the VS output always?
* jekstrand wonders how the blob VK driver handles this
* jekstrand wonders if we should just implement Vulkan by compiling the GL driver as CL code.
<alyssa> jekstrand: ...Do you really want the answer to that?
<alyssa> because it sucks hard.
<cwabbott> jekstrand: at least before IDVS, the way that vertex shaders work is that the vertex shader gets run over the entire vertex buffer completely ignoring the index buffer, like a compute shader (and actually they're pretty much the same)
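A sketch of why the vertex range matters if the vertex shader runs like a compute job over a contiguous range of vertices, as described above. A driver can scan the index buffer for its min/max and shade only `[min, max]`, which is a common approach in Gallium drivers. Setting the range to the full index type instead means shading, and allocating output for, up to 65536 vertices with 16-bit indices. `bytes_per_vertex` and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct vertex_range { uint32_t min, count; };

/* Scan a 16-bit index buffer for the smallest range covering every index.
 * Primitive restart is ignored here; a real scan would skip the restart
 * index (0xFFFF for 16-bit indices) when it is enabled. */
static struct vertex_range index_bounds_u16(const uint16_t *idx, size_t n)
{
    assert(n > 0);
    uint16_t lo = idx[0], hi = idx[0];
    for (size_t i = 1; i < n; i++) {
        if (idx[i] < lo) lo = idx[i];
        if (idx[i] > hi) hi = idx[i];
    }
    struct vertex_range r = { lo, (uint32_t)hi - lo + 1 };
    return r;
}

/* Output storage needed if every vertex in the range is shaded. */
static size_t varying_bytes(uint32_t vertex_count, size_t bytes_per_vertex)
{
    return (size_t)vertex_count * bytes_per_vertex;
}
```

With indices `{7, 3, 9, 3}` only 7 vertices need shading. A range of `UINT16_MAX + 1` would cost 65536 vertices' worth of output regardless of what the draw actually references.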
<jekstrand> alyssa: What bbrezillon has coded up will work for 98% of the CTS tests that use index buffers, but it's invalid.
rasterman has quit [Quit: Gettin' stinky!]