#panfrost on 2022-03-15 — irc logs at oftc.irclog.whitequark.org

2021-07-26 22:56 ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://oftc.irclog.whitequark.org/panfrost - <macc24> i have been here before it was popular

00:04 `join_subline has quit [Remote host closed the connection]

00:04 `join_subline has joined #panfrost

00:09 JulianGro has quit [Remote host closed the connection]

00:10 rasterman has joined #panfrost

00:20 erlehmann has joined #panfrost

00:21 rasterman has quit [Quit: Gettin' stinky!]

01:14 <alyssa> yes, and the LD_TILE would be in the same position, blend shaders on Bifrost+ aren't special they're just function calls

01:14 <alyssa> Pass: 39896, Fail: 2529, Crash: 91, Warn: 8, Skip: 187, Flake: 2, Duration: 25:41, Remaining: 0

01:15 <alyssa> 93% pass rate on deqp-gles3 on v9, i'll take it i guess

01:15 <alyssa> cd ~

01:16 <alyssa> most of that is no xfb

01:16 <alyssa> though MRT+blend shader is a sizeable chunk too

01:17 <icecream95> cd -

01:22 <alyssa> admittedly if we ever want real function calls, some of the RA fun comes back

01:23 <alyssa> I dunno. Given that there's no change to the fast path, and it improves some parts of the slow path, I'm not sure perf is a good reason to keep blend shaders around on bifrost

01:24 <alyssa> I know the hw designers go to great pains to avoid shader variants in GLES, but.. meh?

01:24 <alyssa> what pathological cases are we worried about, a game using an uber shader with different non-fixed function blend modes?

01:33 <icecream95> alyssa: For real function calls.. I would suggest to:

01:33 <icecream95> Do RA from inner functions first, then when hitting a function call create a bunch of nodes with fixed solutions and add constraints with all of the live nodes

01:33 <icecream95> The problem is how to handle a function calling itself; we can't do RA for that function until we know the post-RA liveness, but to know that we have to do RA...

01:33 <icecream95> This means that outer functions do all of the spilling. I don't know if that's better or worse than doing it another way
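icecream95's scheme can be sketched roughly as follows. Assume, hypothetically, that once a callee has been allocated, its result is summarized as a bitmask of the registers it writes, including those written by its own callees. Any value live across a call in the caller must then avoid those registers, and when nothing is left the caller spills, which is why outer functions end up doing all of the spilling. The names `func_summary`, `allowed_across_call` and `pick_register` are invented for this illustration and are not Mesa's LCRA API.

```c
#include <stdint.h>

/* Hypothetical per-function RA summary: bit r set means the function
 * (or something it calls) writes register r. */
struct func_summary {
   uint64_t clobbered;
};

/* A value live across a call may only use registers the callee leaves
 * untouched; this plays the role of the constraints against all live nodes. */
static uint64_t allowed_across_call(uint64_t allowed,
                                    const struct func_summary *callee)
{
   return allowed & ~callee->clobbered;
}

/* Pick the lowest allowed register, or -1 if the caller has to spill. */
static int pick_register(uint64_t allowed)
{
   for (int r = 0; r < 64; r++) {
      if (allowed & ((uint64_t)1 << r))
         return r;
   }
   return -1;
}
```

Because each mask comes from that function's own allocation, every function effectively gets its own calling convention, at the cost of having to allocate callees before callers.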

01:39 <alyssa> I mean. You know how it's done on CPUs :-p

01:39 <icecream95> alyssa: But doing it like that is no fun

01:39 <icecream95> This way you get the "optimal" calling convention for each function

01:41 JulianGro has joined #panfrost

01:59 <alyssa> yes, well

01:59 <alyssa> i don't disagree :-p

02:00 <alyssa> RA -- the hardest compiler problem that matters less than you think

02:00 <icecream95> (And takes less time than you think, once I get around to creating an MR with my optimisations)

02:06 <HdkR> RA, the only thing between me and having all redundant moves removed from my code.

02:07 atler has quit [Ping timeout: 480 seconds]

02:08 atler has joined #panfrost

02:24 JulianGro has quit [Remote host closed the connection]

04:22 <jekstrand> icecream95, alyssa: No GPU APIs currently allow recursion. The function call graph is always a DAG.

04:22 <jekstrand> Not even OpenCL allows it
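Since the call graph is a DAG, the "inner functions first" order icecream95 wants is a post-order walk of the call graph, and the same walk can reject recursion if a back edge ever appears. A minimal sketch over a hypothetical adjacency-matrix call graph (`struct call_graph` and `MAX_FUNCS` are illustrative, not from any driver):

```c
#include <stdbool.h>

#define MAX_FUNCS 16

/* Hypothetical call graph: calls[a][b] is true if function a calls b. */
struct call_graph {
   int n;
   bool calls[MAX_FUNCS][MAX_FUNCS];
};

enum visit_state { UNVISITED, IN_PROGRESS, DONE };

/* Post-order DFS: callees are emitted before their callers.
 * Returns false if a cycle (i.e. recursion) is found. */
static bool visit(const struct call_graph *g, int f, enum visit_state *state,
                  int *order, int *count)
{
   if (state[f] == DONE)
      return true;
   if (state[f] == IN_PROGRESS)
      return false; /* back edge: recursion */
   state[f] = IN_PROGRESS;
   for (int callee = 0; callee < g->n; callee++) {
      if (g->calls[f][callee] && !visit(g, callee, state, order, count))
         return false;
   }
   state[f] = DONE;
   order[(*count)++] = f;
   return true;
}

/* Fill order[] with a callee-first allocation order for all functions. */
static bool callee_first_order(const struct call_graph *g, int *order)
{
   enum visit_state state[MAX_FUNCS] = { UNVISITED };
   int count = 0;
   for (int f = 0; f < g->n; f++) {
      if (!visit(g, f, state, order, &count))
         return false;
   }
   return true;
}
```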

06:16 <tomeu> anybody here would be able to test a couple of kernel patches? https://lore.kernel.org/lkml/20220314224253.236359-1-dmitry.osipenko@collabora.com/

06:24 <HdkR> jekstrand: Technically the Nvidia blob can cause recursion with some pointer hacking :P

07:04 Daanct12 has joined #panfrost

07:27 jambalaya has quit [Ping timeout: 480 seconds]

07:44 jambalaya has joined #panfrost

08:01 MajorBiscuit has joined #panfrost

08:54 <icecream95> alyssa: Hmm.. always adding constraints to the ends of nodearrays (rather than doing a binary search for an existing element) makes lcra_add_node_interference 60% faster, but lcra_solve is 25% slower and lcra_count_constraints returns slightly wrong values. ACK or NAK?

08:55 <icecream95> (Overall, running shader-db on shaders/skia/781.shader_test is about 5% faster)
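The trade-off can be illustrated with a hypothetical sorted array of interfering nodes (this is not the actual LCRA data structure): appending is O(1) but keeps duplicates, so anything derived from the array size overcounts, while a binary-search insert keeps the array sorted and unique at the cost of a search plus a `memmove`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NODE_ARRAY_CAP 64

/* Hypothetical fixed-capacity array of interfering node indices. */
struct node_array {
   uint32_t items[NODE_ARRAY_CAP];
   size_t size;
};

/* Fast path: append blindly. Duplicates are kept, so size can overcount. */
static bool node_array_append(struct node_array *a, uint32_t node)
{
   if (a->size == NODE_ARRAY_CAP)
      return false;
   a->items[a->size++] = node;
   return true;
}

/* Slow path: binary search for the insertion point, skip duplicates,
 * and shift the tail up to keep the array sorted. */
static bool node_array_insert_sorted(struct node_array *a, uint32_t node)
{
   size_t lo = 0, hi = a->size;
   while (lo < hi) {
      size_t mid = lo + (hi - lo) / 2;
      if (a->items[mid] < node)
         lo = mid + 1;
      else
         hi = mid;
   }
   if (lo < a->size && a->items[lo] == node)
      return false; /* already present */
   if (a->size == NODE_ARRAY_CAP)
      return false; /* full */
   memmove(&a->items[lo + 1], &a->items[lo],
           (a->size - lo) * sizeof(a->items[0]));
   a->items[lo] = node;
   a->size++;
   return true;
}
```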

09:08 indy has quit [Ping timeout: 480 seconds]

09:15 <cwabbott> alyssa: I haven't been following it too closely, but I'm sure one of these days there's going to be an extension to make the blend state dynamic on VK

09:15 <cwabbott> there's a huge push right now to make all the things dynamic

09:15 <cwabbott> I don't think realistically you'll be able to get away from blend shaders and having an ABI

09:29 Major_Biscuit has joined #panfrost

09:35 MajorBiscuit has quit [Ping timeout: 480 seconds]

09:53 rasterman has joined #panfrost

10:07 rasterman has quit [Quit: Gettin' stinky!]

10:40 camus1 has joined #panfrost

10:41 camus has quit [Ping timeout: 480 seconds]

10:46 rkanwal has joined #panfrost

11:11 erlehmann has quit [Ping timeout: 480 seconds]

11:18 indy has joined #panfrost

11:19 erlehmann has joined #panfrost

11:36 rasterman has joined #panfrost

12:00 Daanct12 has quit [Ping timeout: 480 seconds]

12:24 nlhowell has joined #panfrost

12:33 nlhowell has quit [Ping timeout: 480 seconds]

12:42 <alyssa> icecream95: Ummm

12:42 <alyssa> cwabbott: grumble grumble

12:46 <alyssa> how is that supposed to work with AMD..?

12:47 <alyssa> i guess amd has actual blending nvm

12:47 <cwabbott> the same thing as radeonsi, prologs and epilogs

12:47 <cwabbott> and yes AMD has blending

12:48 <alyssa> grumble grumble grumble

12:48 <cwabbott> it turns out game devs really really want us to suck it up and do that kind of thing

12:48 <alyssa> game devs suuuck

12:51 <alyssa> I'm also interested in how often blend shaders are hit in real workloads (not just the CTS)

12:51 <alyssa> and how often "different blend shader requiring states are used with the same FS" come up

12:51 <alyssa> because if the latter is rare, eating the key isn't so bad ....

13:01 enunes has quit [Quit: ZNC - https://znc.in]

13:10 <cwabbott> well, if it's dynamic in vulkan then you're probably not even going to be able to just eat the key

13:10 <cwabbott> vulkan is very much against that sort of thing

13:11 JulianGro has joined #panfrost

13:24 enunes has joined #panfrost

13:28 <alyssa> grumble

13:28 <alyssa> I hate blend shaders because supporting them hurts even the apps that don't use them.

13:33 <alyssa> (because right now, we have to assume every fragment shader might use blend shaders)

13:33 nlhowell has joined #panfrost

13:49 nlhowell is now known as Guest2215

13:49 nlhowell has joined #panfrost

13:55 Guest2215 has quit [Ping timeout: 480 seconds]

14:33 <alyssa> jekstrand: re dynamic logic ops, conceptually that's a 4-to-1 mux, where the 4 data inputs are the truth table of the desired logic op and the 2 select bits are the source and destination

14:34 <alyssa> which expands to 3 2-to-1 muxes, and we have a bitwise MUX op on bifrost/valhall, so it's 3 instructions and 4 sysvals

14:35 rkanwal has quit [Ping timeout: 480 seconds]

14:37 <alyssa> in practice there may be better implementations, but point is -- if we know there is a logic op (but we don't know which logic op), we can do it dynamically "no worse" than 3x more ALU

14:38 <alyssa> so it's not a compelling reason
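The mux construction can be checked in plain C. This sketch assumes a hypothetical encoding where bit `(s << 1) | d` of a 4-bit truth table `tt` is the logic op's output for source bit `s` and destination bit `d`. Each table bit is broadcast to an all-zeros or all-ones word (the 4 sysvals), and three bitwise 2-to-1 muxes, standing in for the MUX instruction, produce the result. The real hardware encoding and sysval layout may differ.

```c
#include <stdint.h>

/* Bitwise 2-to-1 mux: for each bit, take b where sel is 1, else a. */
static uint32_t mux32(uint32_t a, uint32_t b, uint32_t sel)
{
   return (a & ~sel) | (b & sel);
}

/* Broadcast one truth-table bit to an all-0 or all-1 word (a "sysval"). */
static uint32_t tt_mask(unsigned tt, unsigned index)
{
   return ((tt >> index) & 1) ? 0xFFFFFFFFu : 0u;
}

/* Hypothetical encoding: bit ((s << 1) | d) of tt is the output for
 * source bit s and destination bit d. */
static uint32_t dynamic_logic_op(unsigned tt, uint32_t src, uint32_t dst)
{
   uint32_t t00 = tt_mask(tt, 0), t01 = tt_mask(tt, 1);
   uint32_t t10 = tt_mask(tt, 2), t11 = tt_mask(tt, 3);
   uint32_t lo = mux32(t00, t01, dst); /* result where the src bit is 0 */
   uint32_t hi = mux32(t10, t11, dst); /* result where the src bit is 1 */
   return mux32(lo, hi, src);
}
```

Under this encoding AND is `tt = 0x8`, XOR is `0x6` and OR is `0xE`, and every op costs the same three muxes regardless of which one is selected.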

14:38 enunes has quit [Quit: ZNC - https://znc.in]

14:39 <alyssa> for the color write dynamic state, that's color _write_ and not color _mask_

14:39 <alyssa> i.e. only all-0 or all-1

14:39 <alyssa> which can be done with the fixed function hw even when the blend itself is in the shader

14:40 <alyssa> (or throw in a branch before the st_tile, it's a single instruction only and can't introduce divergence, pretty cheap)

14:44 enunes has joined #panfrost

14:44 enunes has quit []

14:45 nlhowell is now known as Guest2221

14:45 nlhowell has joined #panfrost

14:45 enunes has joined #panfrost

14:49 <alyssa> mm, no, if the blend descriptors are uniform (not keyed), the driver can toggle the "no output" bit dynamically

14:49 <alyssa> so the shader code is unchanged for that case, I think

14:50 <alyssa> (Realistically we should do that for that extension regardless of blend shaders)

14:50 <alyssa> (If we go the blend shader route, that bit gates the blend shader itself and would skip the whole thing)
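To picture this: if the per-render-target blend descriptors are driver-owned data rather than part of the shader key, dynamic color write enable (as exposed by VK_EXT_color_write_enable) only needs the driver to flip a "no output" flag before each draw. The struct and field names below are hypothetical and model only that flag, not the real Bifrost/Valhall descriptor layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-render-target blend descriptor; only the patched flag
 * is modeled, plus an opaque word for everything else. */
struct blend_desc {
   bool no_output;    /* set: skip this RT's write (and its blend shader) */
   uint32_t equation; /* fixed-function equation or blend shader pointer */
};

/* Apply dynamic color write enable at draw time; shader code is untouched.
 * Bit rt of enable_mask enables writes to render target rt. */
static void apply_color_write_enable(struct blend_desc *descs, unsigned count,
                                     uint32_t enable_mask)
{
   for (unsigned rt = 0; rt < count; rt++)
      descs[rt].no_output = !(enable_mask & (1u << rt));
}
```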

14:52 Guest2221 has quit [Ping timeout: 480 seconds]

15:00 enunes has quit [Quit: ZNC - https://znc.in]

15:02 enunes has joined #panfrost

15:08 erlehmann has quit [Ping timeout: 480 seconds]

15:11 enunes has quit [Remote host closed the connection]

15:12 erlehmann has joined #panfrost

15:16 JulianGro has quit [Remote host closed the connection]

15:33 nlhowell is now known as Guest2226

15:33 nlhowell has joined #panfrost

15:40 Guest2226 has quit [Ping timeout: 480 seconds]

15:43 enunes has joined #panfrost

16:04 JulianGro has joined #panfrost

16:16 enunes has quit [Quit: ZNC - https://znc.in]

16:19 enunes has joined #panfrost

16:29 nlhowell is now known as Guest2233

16:29 nlhowell has joined #panfrost

16:35 JulianGro has quit [Remote host closed the connection]

16:36 Guest2233 has quit [Ping timeout: 480 seconds]

16:43 enunes has quit [Remote host closed the connection]

16:47 enunes has joined #panfrost

16:57 cphealy has quit [Remote host closed the connection]

17:08 enunes has quit [Quit: ZNC - https://znc.in]

17:09 enunes has joined #panfrost

17:13 <alarumbe> hi alyssa, would you mind having a second look at https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15185 ?

17:14 <alarumbe> maybe I can just add Wno-pointer-arith to the meson build file for when building panfrost tools

17:14 rkanwal has joined #panfrost

17:16 cphealy has joined #panfrost

17:18 nlhowell is now known as Guest2239

17:18 nlhowell has joined #panfrost

17:22 rkanwal has quit [Ping timeout: 480 seconds]

17:25 Guest2239 has quit [Ping timeout: 480 seconds]

17:26 Major_Biscuit has quit []

17:31 <alyssa> please do that

17:55 <jekstrand> HdkR: Yeah, and I think Intel has an openCL extension for function pointers. I was trying to forget about those things. :P

17:55 <HdkR> :D

17:55 <daniels> hm, https://gitlab.freedesktop.org/mesa/mesa/-/jobs/19776551 is a new random fail

17:57 <jekstrand> Yeah, that one bit me yesterday

18:06 <alyssa> me too :| but I thought I broke something in the MR

18:06 <alyssa> skips list it but uhh what got broken

18:11 <macc24> uh is pointer arithmetic bad?

18:11 <robmur01> macc24: it is if you do it on void* and want to be portable

18:12 <macc24> robmur01: heh

18:13 <daniels> alyssa: I don't see it in skips ... ?

18:13 <HdkR> Just make sure to always do ptr arithmetic on char* or uintptr_t ;)
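To make that concrete: ISO C does not define arithmetic on `void *` (GCC and Clang accept it as an extension and `-Wpointer-arith` warns about it), so byte offsets are usually written through `char *`, with `uintptr_t` reserved for things like alignment checks. The helper names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable byte offset: do the arithmetic on char *, not void *. */
static void *byte_offset(void *base, size_t bytes)
{
   return (char *)base + bytes;
}

/* uintptr_t is the usual tool when the address itself is inspected. */
static int is_aligned(const void *p, size_t align)
{
   return ((uintptr_t)p % align) == 0;
}
```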

18:13 * robmur01 remembers he never followed through on that threat to try building panfrost with MSVC...

18:13 <macc24> i sometimes like to do stuff in funniest way possible

18:14 <macc24> one time i cast an 8-element uint8_t array into uint64_t

18:14 <macc24> to randomize it
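Dereferencing a `uint8_t[8]` through a `uint64_t *` can violate alignment and strict-aliasing rules. A portable way to get the same effect is to `memcpy` the bytes into a `uint64_t`, transform it, and copy it back. The "randomize" step here is one xorshift64 round with Marsaglia's 13/7/17 shifts, chosen purely for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Load 8 bytes as a uint64_t without aliasing or alignment problems. */
static uint64_t load_u64(const uint8_t bytes[8])
{
   uint64_t v;
   memcpy(&v, bytes, sizeof(v));
   return v;
}

static void store_u64(uint8_t bytes[8], uint64_t v)
{
   memcpy(bytes, &v, sizeof(v));
}

/* One xorshift64 step: invertible, so nonzero input stays nonzero. */
static uint64_t xorshift64(uint64_t x)
{
   x ^= x << 13;
   x ^= x >> 7;
   x ^= x << 17;
   return x;
}

/* Scramble the 8-byte array in place via a 64-bit word. */
static void randomize_bytes(uint8_t bytes[8])
{
   store_u64(bytes, xorshift64(load_u64(bytes)));
}
```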

18:14 nlhowell is now known as Guest2245

18:14 nlhowell has joined #panfrost

18:21 Guest2245 has quit [Ping timeout: 480 seconds]

18:35 lcn has joined #panfrost

19:03 nlhowell is now known as Guest2250

19:03 nlhowell has joined #panfrost

19:09 Guest2250 has quit [Ping timeout: 480 seconds]

19:09 <alyssa> daniels: sorry, meant that as a verb -- feel free to add it to skips itmt

19:20 <daniels> alyssa: ahh, thanks

19:20 <daniels> alyssa: should I skip .* or .36?

19:48 JulianGro has joined #panfrost

19:50 <alyssa> just .36, I think

19:50 <alyssa> but either way will take a look eventually and hopefully revert

19:58 nlhowell is now known as Guest2255

19:58 nlhowell has joined #panfrost

20:05 Guest2255 has quit [Ping timeout: 480 seconds]

20:06 rkanwal has joined #panfrost

20:41 rasterman has quit [Quit: Gettin' stinky!]

20:47 nlhowell is now known as Guest2260

20:47 nlhowell has joined #panfrost

20:49 rasterman has joined #panfrost

20:54 Guest2260 has quit [Ping timeout: 480 seconds]

21:05 nlhowell has quit [Ping timeout: 480 seconds]

22:00 <alyssa> jekstrand: "panvk: Always set texture/sampler_index to 0 when doing indirects"

22:01 <alyssa> that probably ought to be part of the compiler? I just never saw the case in GL so never handled it, I assume

22:02 <jekstrand> alyssa: I don't care who emits the add

22:02 <alyssa> fair.

22:03 <jekstrand> I thought GL generated that sometimes

22:03 * jekstrand looks

22:03 <alyssa> Possible

22:04 <alyssa> if it's not in the gles31 cts, it's probably broken~

22:04 <jekstrand> alyssa: Yeah, not seeing why you wouldn't get that in your back-end.

22:04 <jekstrand> nir_lower_samplers clearly sets texture/sampler_index

22:05 <jekstrand> Maybe gallium is being weird?

22:05 <alyssa> shrug

22:06 <alyssa> ...Now I'm wondering if "immediate + indirect --> indirect of iadd" is a useful lower_tex option for other backends

22:06 <jekstrand> Maybe? It certainly wouldn't be hard to add.

22:06 <alyssa> (doing it in the backend, after NIR, is easy but requires care to ensure we don't emit "add 0")
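The "immediate + indirect → indirect of iadd" lowering, including the care needed to avoid "add 0", might look like this on a hypothetical, heavily simplified texture op. This is not NIR's `nir_tex_instr`; the struct, fields and function are invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical simplified texture op: an immediate index plus an
 * optional dynamic offset held in a register. */
struct tex_op {
   unsigned texture_index;  /* immediate part */
   bool has_texture_offset; /* is there a dynamic part? */
   int texture_offset_reg;  /* register holding the dynamic part */
};

/* Move the immediate into the dynamic offset. Returns true if the caller
 * must emit "texture_offset_reg += *add_imm"; returns false when no add
 * is needed, which is how emitting "add 0" is avoided. */
static bool lower_index_into_offset(struct tex_op *tex, unsigned *add_imm)
{
   if (!tex->has_texture_offset || tex->texture_index == 0)
      return false;
   *add_imm = tex->texture_index;
   tex->texture_index = 0;
   return true;
}
```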

22:07 <jekstrand> Yeah

22:07 <jekstrand> We should also add constant folding for texture/sampler_offset, probably.

22:08 <alyssa> hm?

22:08 <alyssa> the opposite of that lowering, you mean?

22:08 <jekstrand> Yup

22:08 <jekstrand> If the texture/sampler_offset is an immediate, fold it into tex->texture/sampler_index.
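This constant folding is the inverse transformation. On a hypothetical simplified texture op (invented for illustration, not NIR's actual structures) whose dynamic offset may turn out to be a known constant, it reduces to:

```c
#include <stdbool.h>

/* Hypothetical texture op whose dynamic offset may be a compile-time constant. */
struct tex_op {
   unsigned texture_index;  /* immediate part */
   bool has_texture_offset; /* is there a dynamic part? */
   bool offset_is_const;    /* is the dynamic part a known constant? */
   unsigned offset_const;   /* its value, if so */
};

/* Fold a constant texture offset into the immediate index. */
static bool fold_const_texture_offset(struct tex_op *tex)
{
   if (!tex->has_texture_offset || !tex->offset_is_const)
      return false;
   tex->texture_index += tex->offset_const;
   tex->has_texture_offset = false;
   return true;
}
```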

22:10 <jekstrand> Annoyingly, the tests I was trying to fix then fail on something having to do with blending once I hack enough other compiler stuff to work. :(

22:13 <alyssa> D:

22:38 <jekstrand> Wow.... the Invocation packet.

22:38 <jekstrand> I'm... I don't even know.

22:39 simon-perretta-img_ has joined #panfrost

22:41 <jekstrand> How are you supposed to expose limits with this?

22:41 <jekstrand> It's like it can dispatch 4G invocations split up however you want?

22:42 <jekstrand> That's kind-of cool, I guess, but it's strange.

22:42 <jekstrand> very strange.

22:42 <jekstrand> It'll be all sorts of entertaining for OpenCL.

22:42 rkanwal has quit [Ping timeout: 480 seconds]

22:44 <jekstrand> I guess maybe you expose a max workgroup size of 256 and max global size of 256^3?

22:44 <jekstrand> Or 1024 and 128^3 and have a bit left over
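These numbers fall out of a bit budget. Assume that the packet stores each of six sizes (three workgroup dimensions and three workgroup counts) as `size - 1` in a field just wide enough for it, all sharing one 32-bit word. A dispatch then fits when the field widths sum to at most 32: 16x16x1 threads plus 256^3 groups uses exactly 8 + 24 bits, while 32x32x1 and 128^3 uses 10 + 21, leaving one bit spare. The layout is an assumption and the helpers are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits needed to store n - 1 for n >= 1, i.e. ceil(log2(n)); n == 1 costs 0. */
static unsigned bits_for_size(uint32_t n)
{
   unsigned bits = 0;
   while (((uint64_t)1 << bits) < n)
      bits++;
   return bits;
}

/* Hypothetical check: do 3 workgroup dimensions and 3 workgroup counts
 * fit in one 32-bit word of (size - 1) fields? */
static bool invocation_fits(const uint32_t local[3], const uint32_t count[3])
{
   unsigned total = 0;
   for (int i = 0; i < 3; i++)
      total += bits_for_size(local[i]) + bits_for_size(count[i]);
   return total <= 32;
}
```

Non-power-of-two shapes can need more bits than their product suggests (3x5x17 = 255 threads needs 2 + 3 + 5 = 10 bits), which is one reason a single max-workgroup-size limit is awkward to derive from such a layout.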

22:44 * jekstrand is so confused

22:44 <cwabbott> I actually remember wondering the same thing, lol

22:45 simon-perretta-img has quit [Ping timeout: 480 seconds]

22:45 * jekstrand looks up a Mali in gpuinfo

22:45 <cwabbott> and it turns out that with the limits they exposed in GLES you could launch something larger than 2^32 total instances

22:45 <cwabbott> I tried it, and... out-of-memory error

22:46 <jekstrand> The blob appears to advertise 512 workgroup size and (2^32)^3

22:46 <jekstrand> number of workgroups

22:46 <jekstrand> I guess they just blast out a pile of dispatches if it goes over the limit?

22:47 <cwabbott> nope... it just errors out

22:47 <cwabbott> with some weird error like out of GPU memory

22:47 <jekstrand> Well, yeah. I don't assume you can actually dispatch that many

22:48 <jekstrand> Even if you didn't OOM, it'd take that tiny GPU millennia to complete

22:48 <cwabbott> I guess they're saved by the fact that no CTS test would be so crazy to dispatch that many

22:48 <cwabbott> yeah, that would be a long test...

22:51 <alyssa> cwabbott: still thinking about "what if there was a dynamic blend state in vk 1.4"

22:52 <alyssa> undecided how I feel about it

22:52 <alyssa> jekstrand: also seemed undecided (?)

22:53 <jekstrand> I feel like valhall bringup is more important than features that don't yet exist going into a vk version in 2 years.

22:53 <alyssa> I mean. Fiar.

22:53 <alyssa> Fair

22:53 <alyssa> Valhall + Blend shaders isn't *that* much work it just feels so pointless.

22:53 <alyssa> (and counter productive)

22:54 <alyssa> I think I convinced myself the way to go for it is a pseudo instruction that gets lowered "just before" packing (after RA and scheduling, in particular)

22:55 <alyssa> and then the branch offsets can be hardcoded but nobody cares

22:55 <alyssa> (In particular, lowered after all opt passes)

22:56 rasterman has quit [Quit: Gettin' stinky!]

22:58 <jekstrand> I feel like the INVOCATION packet almost calls for whatever the GenXML version of a lowering pass is. Something where you have a struct that's defined by us which is actually what you fill out and then the pack process does non-trivial work to turn it into the stupid thing.
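A rough sketch of that idea: the driver fills in a plain struct of sizes, and the pack step computes the field shifts and packs everything into the hardware word. The layout is assumed, not taken from hardware documentation: each of six sizes is stored as `size - 1` in a field just wide enough for it, packed from bit 0 upward into one 32-bit word, with the per-field shifts emitted alongside. All names are hypothetical, and this is not how GenXML or panfrost's pack code is actually structured.

```c
#include <stdbool.h>
#include <stdint.h>

/* What the driver fills out: plain sizes, no hardware encoding. */
struct dispatch_desc {
   uint32_t size[6]; /* local x, y, z, then workgroup count x, y, z; each >= 1 */
};

/* Hypothetical packed form: one word plus the start bit of each field. */
struct packed_invocation {
   uint32_t word;
   uint8_t shift[6];
};

/* Bits needed to store n - 1 for n >= 1. */
static unsigned bits_for_size(uint32_t n)
{
   unsigned bits = 0;
   while (((uint64_t)1 << bits) < n)
      bits++;
   return bits;
}

/* The non-trivial encoding work lives here, not in the caller. */
static bool pack_invocation(const struct dispatch_desc *d,
                            struct packed_invocation *out)
{
   uint64_t word = 0;
   unsigned shift = 0;
   for (int i = 0; i < 6; i++) {
      out->shift[i] = (uint8_t)shift;
      word |= (uint64_t)(d->size[i] - 1) << shift;
      shift += bits_for_size(d->size[i]);
      if (shift > 32)
         return false; /* does not fit in one 32-bit word */
   }
   out->word = (uint32_t)word;
   return true;
}

/* Inverse, for checking: field i spans [shift[i], shift[i + 1]), or up to
 * bit 32 for the last field (bits above it are zero). */
static uint32_t unpack_size(const struct packed_invocation *p, int i)
{
   unsigned end = (i == 5) ? 32 : p->shift[i + 1];
   uint64_t mask = ((uint64_t)1 << (end - p->shift[i])) - 1;
   return (uint32_t)((((uint64_t)p->word >> p->shift[i]) & mask) + 1);
}
```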

22:59 <alyssa> jekstrand: good news, INVOCATION goes away on valhall

22:59 <jekstrand> :)

22:59 <alyssa> like most bad things in bifrost

23:00 <alyssa> (that dated back to midgard)

23:00 <alyssa> that one goes back to t604

23:00 <alyssa> like blend shaders

23:00 <jekstrand> valhall-only, when?

23:01 <alyssa> oy

23:01 <anarsoul> jekstrand: they still ship SoCs with utgard and midgard :)

23:01 <jekstrand> anarsoul: Yes. I know. And utgard will never ever go away no matter how hard we all collectively try to kill it.

23:01 <jekstrand> But that doesn't mean it needs Vulkan support. :P

23:02 * anarsoul isn't trying to kill utgard :P

23:02 <jekstrand> See, there's the problem. :P

23:02 <alyssa> anarsoul: well get with the program

23:03 <icecream95> On the topic of Utgard, I note that a chapter in the book of Norse legends I have is called "At the Castle of Utgard"

23:03 <icecream95> Neither Midgard, Bifrost nor Valhall get their own chapters.

23:03 <alyssa> I will point out Bifrost is the rainbow /bridge/

23:04 <alyssa> you were supposed to cross it, but never stay on it

23:04 <alyssa> it was always just to get you to Valhall

23:04 <jekstrand> :D

23:04 <anarsoul> jekstrand: well, panfrost has too many devs working on it, so I'm afraid there's nothing fun left for me :)

23:04 <anarsoul> thus utgard and lima

23:04 simon-perretta-img_ has quit []

23:05 simon-perretta-img_ has joined #panfrost

23:05 <icecream95> "Still greater was the astonishment of all when Utgarda continued to tell them that he and the giant were the same person"

23:05 <icecream95> Sorry for the spoiler

23:05 simon-perretta-img_ has quit []

23:05 simon-perretta-img has joined #panfrost

23:14 <alyssa> anarsoul: too many devs? :C

23:14 * jekstrand doesn't think panfrost has that problem

23:16 <icecream95> How many clones would I want?.. One for reverse engineering Valhall, one for doing weird compute shader stuff, one for making the compiler faster, one for debugging, one for..

23:16 <alyssa> same hat

23:16 <alyssa> $$\hat{\text{same}}$$

23:19 <anarsoul> alyssa: you've got at least 3 now? yourself, icecream95 and jekstrand?

23:19 * jekstrand isn't really a panfrost dedv

23:19 <jekstrand> *dev

23:23 <alyssa> nobody full time year round

23:39 <icecream95> RK3588 is coming soon.. https://www.pine64.org/2022/03/15/march-update-introducing-the-quartzpro64/

23:43 <alyssa> Uh oh :-p

23:47 <daniels> yeah, Rock 5B was also already announced