#panfrost on 2022-03-10 — irc logs at oftc.irclog.whitequark.org

2021-07-26 22:56 ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://oftc.irclog.whitequark.org/panfrost - <macc24> i have been here before it was popular

00:00 <jekstrand> Or we can just do fine for everything and YOLO it

01:08 vstehle has quit [Ping timeout: 480 seconds]

01:24 alyssa has joined #panfrost

01:25 <alyssa> icecream95: afbc24a234e, hah! nice

01:26 <alyssa> jekstrand: icecream95: No Mali hardware has support for real geometry or tessellation shaders. It's emulated all the way down.

01:27 <alyssa> Nevertheless, the hardware has some minor features to make the emulation code cheaper

01:27 <alyssa> For geometry shaders, there are some special modes of the attribute descriptor to make it convenient to load inputs in the presence of adjacency primitives etc

01:29 <alyssa> For tessellation shaders, there's a special pixel format (tess_vertex_pack) that (ab)uses bit trickery to squeeze out extra precision needed for conformance reasons. (Honestly not sure of the details. Might be new since Bifrost, don't remember.)

01:29 <alyssa> Valhall adds layered rendering support (icecream95, it should be obvious from v9.xml in Mesa how the hardware wants it, I imagine v10 is similar)

01:30 <alyssa> This adds an output for gl_LayerID in special IDVS jobs, and tiler support for picking out the right layer for a given fragment job.

01:32 <alyssa> Given panfrost doesn't emulate GS, it's not too interesting. But it probably would make GS+layered cheaper on Valhall than on Bifrost.

01:34 <alyssa> jekstrand: As for derivatives, icecream95 is correct

01:35 <alyssa> Only thing I'd add is the .subgroup modifier does not change the hardware execution, only the *perceived* subgroup size.

01:36 <alyssa> A fragment shader on Valhall, for example, still has 16 wide warps. But CLPER.i32.subgroup4 returns results consistent with a quad.

01:36 <jekstrand> alyssa: Yeah. I think I've got it. The next question is if we care about fine vs. coarse

01:36 <alyssa> GL doesn't so we (bbrezillon and I) went for the cheapest thing

01:37 <alyssa> If VK does, supporting both would be easy -- CLPER.i32 gives you the tools you need :-)

01:37 <jekstrand> I don't think it needs to be complicated

01:38 <alyssa> sure, but if fine is even an extra instruction over coarse, and I can get away with coarse on GL, i'd like to use coarse on GL ;-p

01:38 <jekstrand> It looks like the current ones are fine. we just need to tweak the id computation to make it coarse

01:38 <alyssa> er, do we do fine now? I forget

01:38 <jekstrand> I may not be reading it right

01:38 <alyssa> "the current ones are fine." oh you mean they're fine, got it, ambiguous sentence is ambiguous T_T

01:39 <alyssa> jekstrand: Why would we want coarse derivatives, if fine derivatives are equal cost?

01:39 <jekstrand> hehe

01:39 <jekstrand> alyssa: coarse derivatives are quad-uniform which can be useful.

01:40 <jekstrand> Also, in d3d land, coarse is supposed to match texturing. However, that assumes your texture unit does coarse derivatives...

01:41 <alyssa> Hmm, ok

01:41 <alyssa> I'm not sure what the texture unit does

01:42 <jekstrand> There's a great article I found about this a while ago...

01:44 <jekstrand> https://bgolus.medium.com/distinctive-derivative-differences-cce38d36797b
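
[editor's note] The fine versus coarse distinction discussed above can be sketched on a single 2x2 quad. This is only an illustrative model, not Mesa or hardware code: `quad_shuffle` is a hypothetical stand-in for a cross-lane read, and the lane layout (lane 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right) is an assumption. The only thing that changes between the two variants is which lane ids are read, which matches the remark that the id computation is what needs tweaking.

```python
def quad_shuffle(values, lane):
    """Hypothetical model of a cross-lane read: return the value held by `lane`."""
    return values[lane]


def ddx_fine(values, lane):
    # Fine: each row of the quad differences its own left and right lanes.
    left = quad_shuffle(values, lane & ~1)
    right = quad_shuffle(values, lane | 1)
    return right - left


def ddx_coarse(values, lane):
    # Coarse: every lane reads the same pair (the top row), so the result
    # is identical across the quad (quad-uniform). `lane` is unused on purpose.
    return quad_shuffle(values, 1) - quad_shuffle(values, 0)


# One value per lane, using the assumed layout described above.
quad = [0.0, 1.0, 4.0, 7.0]

fine = [ddx_fine(quad, lane) for lane in range(4)]
coarse = [ddx_coarse(quad, lane) for lane in range(4)]

print("fine  :", fine)
print("coarse:", coarse)
```

With these inputs the fine results differ between the top and bottom rows, while the coarse results are the same in all four lanes.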

01:49 <alyssa> also, note to self: clper_v6 should probably be renamed clper_old or something

01:49 <alyssa> since i later discovered g31 uses it despite being v7

01:49 <alyssa> I'd be unsurprised if G51 used it too

01:49 camus has joined #panfrost

01:51 <icecream95> alyssa: That reminds me that LD_VAR_F32_V10 isn't v10-only either..

01:52 <alyssa> icecream95: Indeed... I think I have XML for this, not sure if i pushed it

01:52 <icecream95> ..And I hope you didn't waste time independently reversing the texture instructions?

02:56 JulianGro has quit [Remote host closed the connection]

03:05 atler is now known as Guest1745

03:05 atler has joined #panfrost

03:07 Guest1745 has quit [Ping timeout: 480 seconds]

04:10 davidlt has joined #panfrost

04:40 camus1 has joined #panfrost

04:40 camus has quit [Remote host closed the connection]

04:48 rcf has quit [Quit: WeeChat 3.2.1]

04:51 rcf has joined #panfrost

05:21 davidlt has quit [Ping timeout: 480 seconds]

06:00 vstehle has joined #panfrost

07:54 davidlt has joined #panfrost

07:58 Daanct12 has joined #panfrost

08:52 Daaanct12 has joined #panfrost

08:59 Daanct12 has quit [Ping timeout: 480 seconds]

09:48 robertfoss has joined #panfrost

10:12 rasterman has joined #panfrost

10:27 Daaanct12 is now known as Daanct12

11:06 camus1 has quit [Remote host closed the connection]

11:06 camus has joined #panfrost

11:11 erlehmann has quit [Ping timeout: 480 seconds]

11:14 erlehmann has joined #panfrost

12:11 erlehmann has quit [Ping timeout: 480 seconds]

12:15 JulianGro has joined #panfrost

12:37 Daanct12 has quit [Quit: Quit]

12:37 erlehmann has joined #panfrost

12:46 JulianGro[m] has joined #panfrost

13:05 JulianGro has quit [Quit: Leaving]

13:36 MajorBiscuit has joined #panfrost

13:59 nlhowell has joined #panfrost

16:08 JulianGro has joined #panfrost

16:28 JulianGro has quit [Remote host closed the connection]

17:25 karolherbst_ has joined #panfrost

17:26 karolherbst has quit [Remote host closed the connection]

17:45 MajorBiscuit has quit [Ping timeout: 480 seconds]

18:29 JulianGro has joined #panfrost

20:04 jelly has quit [Remote host closed the connection]

20:08 <jekstrand> Hrm... I think I'm setting up the descriptor properly. Maybe it's not actually getting bound?

20:13 nlhowell has quit [Remote host closed the connection]

20:13 nlhowell has joined #panfrost

20:13 jelly has joined #panfrost

20:23 nlhowell has quit [Ping timeout: 480 seconds]

20:31 <alyssa> jekstrand: I need more context to help

20:32 rasterman has quit [Quit: Gettin' stinky!]

20:41 <jekstrand> I figured it out. Was missing bbrezillon's patch to add a default sampler.

20:41 <jekstrand> alyssa: ^^

20:47 <alyssa> Ah.

20:48 <alyssa> The hardware's insistence on a sampler for texture buffers is a bit annoying

20:48 <alyssa> This might be lifted in Valhall, not sure

20:49 <alyssa> jekstrand: Unrelated but I'm super excited to see Connor's preamble stuff land

20:50 <alyssa> There's no good way to implement preamble shaders on Mali (although I heard rumours of this changing, maybe in v10?)

20:51 <alyssa> but bifrost+ strongly assumes you don't do uniform-on-uniform/immediate arithmetic so I'm hoping for nontrivial gains

20:53 <alyssa> jekstrand: Re that dummy sampler patch, why are we disabling seamless cube map?

20:53 <alyssa> (Why do we need any non-default parameters?)

20:55 <jekstrand> alyssa: It's in bbrezillon's panvk tmp branch

20:55 <jekstrand> alyssa: seamless cube maps? Because GL is stupid.

20:55 <icecream95> Preamble shaders seem to just be extra rows in the shader descriptor in v10

20:56 <alyssa> icecream95: I.e. there are real preamble shaders?

20:56 <alyssa> with a store_fau or move_to_fau instruction or something like that?

20:56 <icecream95> I think so

20:56 <alyssa> (As opposed to STORE + pass the FAU buffer as a sysval, old-style)

20:57 <icecream95> It still uses normal memory stores, but it seems that a couple of registers with addresses are preloaded

20:57 <alyssa> Interesting, okay

20:57 <alyssa> I thought we might get a move to fau instruction, like AGX has

20:57 <alyssa> I will have to shake my magic 8-ball harder

21:01 jelly has quit []

21:01 <icecream95> Also, v10 has a weird way of storing pointers to FAUs.. they seem to be embedded inside a resource table IIRC

21:03 davidlt has quit [Ping timeout: 480 seconds]

21:03 <icecream95> (The preamble has to chase some pointers to find where to store the FAUs)

21:05 <alyssa> fun

21:06 <alyssa> Valhall (v9) is quite flexible in how resource tables are used

21:07 <alyssa> Mesa (well, my branch thereof) is only using a small subset to emulate Midgard-style resources

21:07 <alyssa> It's supposed to map to Vulkan descriptor sets

21:08 <icecream95> You should probably set up the v10 blob so that you can compare against v9 yourself..

21:10 <alyssa> Possibly, but I'm supposed to be writing a v9 driver and not reversing v10

21:12 jelly has joined #panfrost

21:17 <jekstrand> Now trying to figure out why the vertex_attribute_divisor tests don't pass on panvk with bbrezillon's patches...

21:19 <icecream95> alyssa: The contents of preamble shaders in v10 seem very similar to v9

21:21 <alyssa> fair enough

21:21 * alyssa needs to finish this essay due tonight

21:21 <alyssa> it is 16:00

21:21 <alyssa> good work alyssa, great planning as always

21:21 <alyssa> 🙃

21:25 bluebugs has quit [Quit: Leaving]

21:30 bluebugs has joined #panfrost

21:34 rasterman has joined #panfrost

21:40 <icecream95> alyssa: It seems that pandecode (in panloader) does not even disassemble (fragment, at least) preambles for v9?

21:41 jelly has quit [Ping timeout: 480 seconds]

21:46 jelly has joined #panfrost

21:53 <alyssa> icecream95: v9 doesn't support preambles natively...

21:53 <alyssa> unless I am greatly mistaken

22:00 <icecream95> I.. think you are mistaken?

22:02 * icecream95 double checks to make sure it wasn't an IDVS varying shader

22:04 <icecream95> Yup, preambles are very much supported natively

22:06 <icecream95> alyssa: I'm using the asurada-25.0_p32 blob, maybe you were using a different one which doesn't use preambles?

22:09 <icecream95> The compute shader is stored after the last IDVS shader (either position or varying) in the shader descriptor table, and is not referenced directly

22:10 <icecream95> I wonder.. maybe the shader is actually not being used, and the calculations were being done on the CPU?

22:10 <alyssa> Entirely possible

22:10 <alyssa> Especially if it's all ALU

22:10 <alyssa> something like `texture(tex, uniform_coords)` ought to trigger a real preamble shader

22:11 <alyssa> though I don't know what heuristics the DDK uses

22:15 <icecream95> Why isn't disabling ASLR working for the v9 blob? Maybe the threading messes it up..

22:28 <icecream95> git diff -U10000000 --no-index --word-diff=porcelain dump1 dump2 | sed '/^-/d;/^+/s/[^+]/x/g' | cut -c2- | sed -z 's/\n\([^\n]\)/\1/g'

22:29 <icecream95> ^^ == workaround for ASLR messing up dumps. Use on two dumps that are supposed to have the same output

22:30 <icecream95> (Well, in this case it is threading causing non-deterministic execution)
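
[editor's note] The same idea as the pipeline above, masking whatever differs between two dumps that should otherwise match, can be sketched in Python. This is not a port of the one-liner: it assumes both dumps have the same number of lines and compares whitespace-separated tokens position by position. The function name `mask_differences` is hypothetical.

```python
import re


def mask_differences(dump_a: str, dump_b: str) -> str:
    """Return dump_b with every token that differs from dump_a replaced by x's.

    Assumes the two dumps line up line by line; extra trailing lines in
    either dump are dropped by zip().
    """
    out_lines = []
    for line_a, line_b in zip(dump_a.splitlines(), dump_b.splitlines()):
        # The capturing group keeps the whitespace runs so spacing is preserved.
        tokens_a = re.split(r"(\s+)", line_a)
        tokens_b = re.split(r"(\s+)", line_b)
        if len(tokens_a) != len(tokens_b):
            # Token structure differs: mask the whole line.
            out_lines.append("x" * len(line_b))
            continue
        out_lines.append(
            "".join(
                tb if ta == tb else "x" * len(tb)
                for ta, tb in zip(tokens_a, tokens_b)
            )
        )
    return "\n".join(out_lines)


run1 = "addr 0x7f001000 size 64\nop NOP"
run2 = "addr 0x7f552000 size 64\nop NOP"
masked = mask_differences(run1, run2)
print(masked)
```

Masking a pair of dumps this way leaves only the stable parts visible, so masked output from different runs can be compared without address noise.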

22:32 <icecream95> alyssa: Nope, I think there is actual preframe support

22:41 <alyssa> preframe != preamble

22:41 erlehmann has quit [Ping timeout: 480 seconds]

22:42 <icecream95> Oops, I meant preamble

22:43 erlehmann has joined #panfrost

23:27 karolherbst_ is now known as karolherbst

23:47 <jekstrand> Ugh... indexed drawing on panfrost...

23:49 <jekstrand> What does the hardware use the vertex_range for? What happens if we set it to UINT<N>_MAX where N is the number of bits in the index buffer?

23:50 <jekstrand> Looks like maybe we allocate that much space for the VS output always?

23:51 * jekstrand wonders how the blob VK driver handles this

23:52 * jekstrand wonders if we should just implement Vulkan by compiling the GL driver as CL code.

23:59 <alyssa> jekstrand: ...Do you really want the answer to that?

23:59 <alyssa> because it sucks hard.

23:59 <cwabbott> jekstrand: at least before IDVS, the way that vertex shaders work is that the vertex shader gets run over the entire vertex buffer completely ignoring the index buffer, like a compute shader (and actually they're pretty much the same)

23:59 <jekstrand> alyssa: What bbrezillon has coded up will work for 98% of CTS tests that use index buffers but is invalid.
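
[editor's note] A rough sketch of why the vertex range matters if the vertex shader runs over a contiguous range of vertices regardless of the index buffer, as described above. The helper names, the 32 bytes of output per vertex, and the idea of scanning the indices on the CPU are all assumptions for illustration, not panfrost or panvk code.

```python
def vertex_range(indices):
    """Return (first_vertex, vertex_count) covering every index that is used."""
    lo, hi = min(indices), max(indices)
    return lo, hi - lo + 1


def vs_output_bytes(vertex_count, bytes_per_vertex):
    # One output slot per vertex in the range, whether or not it is referenced.
    return vertex_count * bytes_per_vertex


indices = [10, 11, 12, 12, 11, 13]   # two triangles sharing an edge
BYTES_PER_VERTEX = 32                # assumed size of the VS output per vertex

base, count = vertex_range(indices)
tight = vs_output_bytes(count, BYTES_PER_VERTEX)

# Worst case for a 16-bit index buffer: indices 0..65535 are all possible.
worst = vs_output_bytes(2**16, BYTES_PER_VERTEX)

print(base, count, tight, worst)
```

In this model a tight range needs 128 bytes of output, while assuming the full 16-bit range needs 2 MiB for the same draw, which is the cost being worried about when the range is set to the maximum.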

23:59 rasterman has quit [Quit: Gettin' stinky!]