ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://oftc.irclog.whitequark.org/panfrost - <macc24> i have been here before it was popular
join_subline has quit [Remote host closed the connection]
join_subline has joined #panfrost
JulianGro has quit [Remote host closed the connection]
rasterman has joined #panfrost
erlehmann has joined #panfrost
rasterman has quit [Quit: Gettin' stinky!]
<alyssa> yes, and the LD_TILE would be in the same position, blend shaders on Bifrost+ aren't special they're just function calls
<alyssa> Pass: 39896, Fail: 2529, Crash: 91, Warn: 8, Skip: 187, Flake: 2, Duration: 25:41, Remaining: 0
<alyssa> 93% pass rate on deqp-gles3 on v9, i'll take it i guess
<alyssa> cd ~
<alyssa> most of that is no xfb
<alyssa> though MRT+blend shader is a sizeable chunk too
<icecream95> cd -
<alyssa> admittedly if we ever want real function calls, some of the RA fun comes back
<alyssa> I dunno. Given that there's no change to the fast path, and it improves some parts of the slow path, I'm not sure perf is a good reason to keep blend shaders around on bifrost
<alyssa> I know the hw designers go to great pains to avoid shader variants in GLES, but.. meh?
<alyssa> what pathological cases are we worried about, a game using an uber shader with different non-fixed function blend modes?
<icecream95> alyssa: For real function calls.. I would suggest to:
<icecream95> Do RA from inner functions first, then when hitting a function call create a bunch of nodes with fixed solutions and add constraints with all of the live nodes
<icecream95> The problem is how to handle a function calling itself; we can't do RA for that function until we know the post-RA liveness, but to know that we have to do RA...
<icecream95> This means that outer functions do all of the spilling. I don't know if that's better or worse than doing it another way
<alyssa> I mean. You know how it's done on CPUs :-p
<icecream95> alyssa: But doing it like that is no fun
<icecream95> This way you get the "optimal" calling convention for each function
JulianGro has joined #panfrost
<alyssa> yes, well
<alyssa> i don't disagree :-p
<alyssa> RA -- the hardest compiler problem that matters less than you think
<icecream95> (And takes less time than you think, once I get around to creating an MR with my optimisations)
<HdkR> RA, the only thing between me and having all redundant moves removed from my code.
atler has quit [Ping timeout: 480 seconds]
atler has joined #panfrost
JulianGro has quit [Remote host closed the connection]
<jekstrand> icecream95, alyssa: No GPU APIs currently allow recursion. The function call graph is always a DAG.
<jekstrand> Not even OpenCL allows it
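[editor's note] Since the call graph is a DAG, the "inner functions first" order icecream95 describes can be had from a plain post-order walk. A minimal sketch, assuming a small adjacency-matrix call graph; struct call_graph, MAX_FUNCS and callee_first_order are hypothetical names for illustration, not Mesa code:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_FUNCS 8 /* hypothetical cap, for illustration only */

/* Hypothetical call graph: calls[a][b] is true when function a calls b. */
struct call_graph {
    unsigned count;
    bool calls[MAX_FUNCS][MAX_FUNCS];
};

/* Post-order DFS: a function is emitted only after all of its callees.
 * Marking on entry is enough because a DAG has no cycles (no recursion). */
static void visit(const struct call_graph *g, unsigned f, bool *done,
                  unsigned *order, unsigned *n)
{
    if (done[f])
        return;
    done[f] = true;
    for (unsigned callee = 0; callee < g->count; ++callee) {
        if (g->calls[f][callee])
            visit(g, callee, done, order, n);
    }
    order[(*n)++] = f;
}

/* Fills order[] with function indices, callees before callers.
 * Returns the number of entries written. */
static unsigned callee_first_order(const struct call_graph *g, unsigned *order)
{
    assert(g->count <= MAX_FUNCS);
    bool done[MAX_FUNCS] = { false };
    unsigned n = 0;
    for (unsigned f = 0; f < g->count; ++f)
        visit(g, f, done, order, &n);
    return n;
}
```

Register allocation would then run over order[] front to back, so every callee's register usage is known before its callers are allocated.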
<tomeu> would anybody here be able to test a couple of kernel patches? https://lore.kernel.org/lkml/20220314224253.236359-1-dmitry.osipenko@collabora.com/
<HdkR> jekstrand: Technically the Nvidia blob can cause recursion with some pointer hacking :P
Daanct12 has joined #panfrost
jambalaya has quit [Ping timeout: 480 seconds]
jambalaya has joined #panfrost
MajorBiscuit has joined #panfrost
<icecream95> alyssa: Hmm.. always adding constraints to the ends of nodearrays (rather than doing a binary search for an existing element) makes lcra_add_node_interference 60% faster, but lcra_solve is 25% slower and lcra_count_constraints returns slightly wrong values. ACK or NAK?
indy has quit [Ping timeout: 480 seconds]
<cwabbott> alyssa: I haven't been following it too closely, but I'm sure one of these days there's going to be an extension to make the blend state dynamic on VK
<cwabbott> there's a huge push right now to make all the things dynamic
<cwabbott> I don't think realistically you'll be able to get away from blend shaders and having an ABI
Major_Biscuit has joined #panfrost
MajorBiscuit has quit [Ping timeout: 480 seconds]
rasterman has joined #panfrost
rasterman has quit [Quit: Gettin' stinky!]
camus1 has joined #panfrost
camus has quit [Ping timeout: 480 seconds]
rkanwal has joined #panfrost
erlehmann has quit [Ping timeout: 480 seconds]
indy has joined #panfrost
erlehmann has joined #panfrost
rasterman has joined #panfrost
Daanct12 has quit [Ping timeout: 480 seconds]
nlhowell has joined #panfrost
nlhowell has quit [Ping timeout: 480 seconds]
<alyssa> icecream95: Ummm
<alyssa> cwabbott: grumble grumble
<alyssa> how is that supposed to work with AMD..?
<alyssa> i guess amd has actual blending nvm
<cwabbott> the same thing as radeonsi, prologs and epilogs
<cwabbott> and yes AMD has blending
<alyssa> grumble grumble grumble
<cwabbott> it turns out game devs really really want us to suck it up and do that kind of thing
<alyssa> game devs suuuck
<alyssa> I'm also interested in how often blend shaders are hit in real workloads (not just the CTS)
<alyssa> and how often "different blend shader requiring states are used with the same FS" come up
<alyssa> because if the latter is rare, eating the key isn't so bad ....
enunes has quit [Quit: ZNC - https://znc.in]
<cwabbott> well, if it's dynamic in vulkan then you're probably not even going to be able to just eat the key
<cwabbott> vulkan is very much against that sort of thing
JulianGro has joined #panfrost
enunes has joined #panfrost
<alyssa> grumble
<alyssa> I hate blend shaders because supporting them hurts even the apps that don't use them.
<alyssa> (because right now, we have to assume every fragment shader might use blend shaders)
nlhowell has joined #panfrost
nlhowell is now known as Guest2215
nlhowell has joined #panfrost
Guest2215 has quit [Ping timeout: 480 seconds]
<alyssa> jekstrand: re dynamic logic ops, conceptually that's a 4-to-1 mux, with the 4 bits of the desired logic op's truth table as the inputs and the source and destination as the 2 select bits
<alyssa> which expands to 3 2-to-1 muxes, and we have a bitwise MUX op on bifrost/valhall, so it's 3 instructions and 4 sysvals
rkanwal has quit [Ping timeout: 480 seconds]
<alyssa> in practice there may be better implementations, but point is -- if we know there is a logic op (but we don't know which logic op), we can do it dynamically "no worse" than 3x more ALU
<alyssa> so it's not a compelling reason
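[editor's note] The mux formulation above can be modelled on the CPU. A minimal sketch, assuming a truth-table index of (dst << 1) | src and a mux() helper with a "pick b where sel is set" convention; both are assumptions made for illustration and may not match the hardware MUX operand order or any API's logic op encoding:

```c
#include <assert.h>
#include <stdint.h>

/* Bitwise 2-to-1 mux: for each bit, take b where sel is 1, else a. */
static inline uint32_t mux(uint32_t a, uint32_t b, uint32_t sel)
{
    return (a & ~sel) | (b & sel);
}

/* The four "sysvals": each truth-table bit broadcast to all 32 lanes. */
struct logicop_sysvals {
    uint32_t t[4];
};

/* Assumed convention: bit i of table is the result for i = (dst << 1) | src. */
static struct logicop_sysvals logicop_sysvals_from_table(unsigned table)
{
    assert(table < 16);
    struct logicop_sysvals v;
    for (unsigned i = 0; i < 4; ++i)
        v.t[i] = (table & (1u << i)) ? ~0u : 0u;
    return v;
}

/* Three muxes: two select on src, one selects on dst. */
static uint32_t dynamic_logicop(struct logicop_sysvals v, uint32_t src,
                                uint32_t dst)
{
    uint32_t when_dst0 = mux(v.t[0], v.t[1], src);
    uint32_t when_dst1 = mux(v.t[2], v.t[3], src);
    return mux(when_dst0, when_dst1, dst);
}
```

Under that convention, table 0x8 behaves as AND and table 0x6 as XOR, and the shader body is the same three operations either way.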
enunes has quit [Quit: ZNC - https://znc.in]
<alyssa> for the color write dynamic state, that's color _write_ and not color _mask_
<alyssa> i.e. only all-0 or all-1
<alyssa> which can be done with the fixed function hw even when the blend itself is in the shader
<alyssa> (or throw in a branch before the st_tile, it's a single instruction only and can't introduce divergence, pretty cheap)
enunes has joined #panfrost
enunes has quit []
nlhowell is now known as Guest2221
nlhowell has joined #panfrost
enunes has joined #panfrost
<alyssa> mm, no, if the blend descriptors are uniform (not keyed), the driver can toggle the "no output" bit dynamically
<alyssa> so the shader code is unchanged for that case, I think
<alyssa> (Realistically we should do that for that extension regardless of blend shaders)
<alyssa> (If we go the blend shader route, that bit gates the blend shader itself and would skip the whole thing)
Guest2221 has quit [Ping timeout: 480 seconds]
enunes has quit [Quit: ZNC - https://znc.in]
enunes has joined #panfrost
erlehmann has quit [Ping timeout: 480 seconds]
enunes has quit [Remote host closed the connection]
erlehmann has joined #panfrost
JulianGro has quit [Remote host closed the connection]
nlhowell is now known as Guest2226
nlhowell has joined #panfrost
Guest2226 has quit [Ping timeout: 480 seconds]
enunes has joined #panfrost
JulianGro has joined #panfrost
enunes has quit [Quit: ZNC - https://znc.in]
enunes has joined #panfrost
nlhowell is now known as Guest2233
nlhowell has joined #panfrost
JulianGro has quit [Remote host closed the connection]
Guest2233 has quit [Ping timeout: 480 seconds]
enunes has quit [Remote host closed the connection]
enunes has joined #panfrost
cphealy has quit [Remote host closed the connection]
enunes has quit [Quit: ZNC - https://znc.in]
enunes has joined #panfrost
<alarumbe> hi alyssa, would you mind having a second look at https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15185 ?
<alarumbe> maybe I can just add -Wno-pointer-arith to the meson build file for when building panfrost tools
rkanwal has joined #panfrost
cphealy has joined #panfrost
nlhowell is now known as Guest2239
nlhowell has joined #panfrost
rkanwal has quit [Ping timeout: 480 seconds]
Guest2239 has quit [Ping timeout: 480 seconds]
Major_Biscuit has quit []
<alyssa> please do that
<jekstrand> HdkR: Yeah, and I think Intel has an OpenCL extension for function pointers. I was trying to forget about those things. :P
<HdkR> :D
<jekstrand> Yeah, that one bit me yesterday
<alyssa> me too :| but I thought I broke something in the MR
<alyssa> skips list it but uhh what got broken
<macc24> uh is pointer arithmetic bad?
<robmur01> macc24: it is if you do it on void* and want to be portable
<macc24> robmur01: heh
<daniels> alyssa: I don't see it in skips ... ?
<HdkR> Just make sure to always do ptr arithmetic on char* or uintptr_t ;)
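[editor's note] Arithmetic on void* is a GNU extension rather than standard C, which is why it trips up other compilers. A minimal sketch of the portable form, going through char*; ptr_add_bytes is a hypothetical helper name, not something from the Mesa tree:

```c
#include <assert.h>
#include <stddef.h>

/* Advance a generic pointer by a byte offset without relying on the
 * GNU extension that lets void* be used in arithmetic. */
static inline void *ptr_add_bytes(void *base, size_t offset)
{
    assert(base != NULL);
    return (char *)base + offset;
}
```

Callers keep passing and receiving void*, so only the helper needs to know about the cast.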
* robmur01 remembers he never saw through with that threat to try building panfrost with MSVC...
<macc24> i sometimes like to do stuff in funniest way possible
<macc24> one time i cast an 8 element uint8_t array into uint64_t
<macc24> to randomize it
nlhowell is now known as Guest2245
nlhowell has joined #panfrost
Guest2245 has quit [Ping timeout: 480 seconds]
lcn has joined #panfrost
nlhowell is now known as Guest2250
nlhowell has joined #panfrost
Guest2250 has quit [Ping timeout: 480 seconds]
<alyssa> daniels: sorry, meant that as a verb -- feel free to add it to skips itmt
<daniels> alyssa: ahh, thanks
<daniels> alyssa: should I skip .* or .36?
JulianGro has joined #panfrost
<alyssa> just .36, I think
<alyssa> but either way will take a look eventually and hopefully revert
nlhowell is now known as Guest2255
nlhowell has joined #panfrost
Guest2255 has quit [Ping timeout: 480 seconds]
rkanwal has joined #panfrost
rasterman has quit [Quit: Gettin' stinky!]
nlhowell is now known as Guest2260
nlhowell has joined #panfrost
rasterman has joined #panfrost
Guest2260 has quit [Ping timeout: 480 seconds]
nlhowell has quit [Ping timeout: 480 seconds]
<alyssa> jekstrand: "panvk: Always set texture/sampler_index to 0 when doing indirects"
<alyssa> that probably ought to be part of the compiler? I just never saw the case in GL so never handled it, I assume
<jekstrand> alyssa: I don't care who emits the add
<alyssa> fair.
<jekstrand> I thought GL generated that sometimes
* jekstrand looks
<alyssa> Possible
<alyssa> if it's not in the gles31 cts, it's probably broken~
<jekstrand> alyssa: Yeah, not seeing why you wouldn't get that in your back-end.
<jekstrand> nir_lower_samplers clearly sets texture/sampler_index
<jekstrand> Maybe gallium is being weird?
<alyssa> shrug
<alyssa> ...Now I'm wondering if "immediate + indirect --> indirect of iadd" is a useful lower_tex option for other backends
<jekstrand> Maybe? It certainly wouldn't be hard to add.
<alyssa> (doing it in the backend, after NIR, is easy but requires care to ensure we don't emit "add 0")
<jekstrand> Yeah
<jekstrand> We should also add constant folding for texture/sampler_offset, probably.
<alyssa> hm?
<alyssa> the opposite of that lowering, you mean?
<jekstrand> Yup
<jekstrand> If the texture/sampler_offset is an immediate, fold it into tex->texture/sampler_index.
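[editor's note] The two directions discussed here (folding a constant offset into the immediate index, and moving a nonzero immediate into the dynamic offset) can be sketched on a toy structure. Everything below is hypothetical and for illustration only: struct tex_ref and both functions are made-up names, not the NIR data structures or an existing lowering option.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a texture reference: an immediate index plus an
 * optional dynamic offset that may turn out to be a known constant. */
struct tex_ref {
    unsigned index;        /* immediate part */
    bool has_offset;       /* a dynamic offset source is present */
    bool offset_is_const;  /* that offset is a compile-time constant */
    unsigned offset_const; /* its value, when offset_is_const is set */
};

/* Constant folding: absorb an immediate offset into the index. */
static bool fold_const_offset(struct tex_ref *t)
{
    assert(t != NULL);
    if (!t->has_offset || !t->offset_is_const)
        return false;
    t->index += t->offset_const;
    t->has_offset = false;
    return true;
}

/* Opposite lowering: report the immediate that must be added to the
 * dynamic offset and zero the index. Returns false when no add is
 * needed, so the caller never emits an "add 0". */
static bool lower_index_to_offset(struct tex_ref *t, unsigned *imm_to_add)
{
    assert(t != NULL && imm_to_add != NULL);
    if (!t->has_offset || t->index == 0)
        return false;
    *imm_to_add = t->index;
    t->index = 0;
    return true;
}
```

A real pass would emit the iadd itself where this sketch only hands back the immediate.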
<jekstrand> Annoyingly, the tests I was trying to fix then fail on something having to do with blending once I hack enough other compiler stuff to work. :(
<alyssa> D:
<jekstrand> Wow.... the Invocation packet.
<jekstrand> I'm... I don't even know.
simon-perretta-img_ has joined #panfrost
<jekstrand> How are you supposed to expose limits with this?
<jekstrand> It's like it can dispatch 4G invocations split up however you want?
<jekstrand> That's kind-of cool, I guess, but it's strange.
<jekstrand> very strange.
<jekstrand> It'll be all sorts of entertaining for OpenCL.
rkanwal has quit [Ping timeout: 480 seconds]
<jekstrand> I guess maybe you expose a max workgroup size of 256 and max global size of 256^3?
<jekstrand> Or 1024 and 128^3 and have a bit left over
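[editor's note] The arithmetic behind these two splits, reading "4G invocations" as a 32-bit budget and the cubed figure as workgroups per axis (both readings are assumptions about what was meant). A small sketch; log2_pow2 and invocation_bits are hypothetical helpers:

```c
#include <assert.h>
#include <stdint.h>

/* log2 of a power of two. */
static unsigned log2_pow2(uint64_t x)
{
    assert(x != 0 && (x & (x - 1)) == 0);
    unsigned n = 0;
    while (x >>= 1)
        n++;
    return n;
}

/* Bits needed to count every invocation when each workgroup holds
 * wg_size invocations and each of the 3 axes has groups_per_axis
 * workgroups. */
static unsigned invocation_bits(uint64_t wg_size, uint64_t groups_per_axis)
{
    return log2_pow2(wg_size) + 3 * log2_pow2(groups_per_axis);
}
```

invocation_bits(256, 256) is 8 + 3*8 = 32, which uses the whole budget, while invocation_bits(1024, 128) is 10 + 3*7 = 31, hence the one bit left over.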
* jekstrand is so confused
<cwabbott> I actually remember wondering the same thing, lol
simon-perretta-img has quit [Ping timeout: 480 seconds]
* jekstrand looks up a Mali in gpuinfo
<cwabbott> and it turns out that with the limits they exposed in GLES you could launch something larger than 2^32 total instances
<cwabbott> I tried it, and... out-of-memory error
<jekstrand> The blob appears to advertise 512 workgroup size and (2^32)^3
<jekstrand> number of workgroups
<jekstrand> I guess they just blast out a pile of dispatches if it goes over the limit?
<cwabbott> nope... it just errors out
<cwabbott> with some weird error like out of GPU memory
<jekstrand> Well, yeah. I don't assume you can actually dispatch that many
<jekstrand> Even if you didn't OOM, it'd take that tiny GPU millennia to complete
<cwabbott> I guess they're saved by the fact that no CTS test would be so crazy as to dispatch that many
<cwabbott> yeah, that would be a long test...
<alyssa> cwabbott: still thinking about "what if there was a dynamic blend state in vk 1.4"
<alyssa> undecided how I feel about it
<alyssa> jekstrand: also seemed undecided (?)
<jekstrand> I feel like Valhall bringup is more important than features that don't yet exist going into a vk version in 2 years.
<alyssa> I mean. Fair.
<alyssa> Valhall + Blend shaders isn't *that* much work it just feels so pointless.
<alyssa> (and counter productive)
<alyssa> I think I convinced myself the way to go for it is a pseudo instruction that gets lowered "just before" packing (after RA and scheduling, in particular)
<alyssa> and then the branch offsets can be hardcoded but nobody cares
<alyssa> (In particular, lowered after all opt passes)
rasterman has quit [Quit: Gettin' stinky!]
<jekstrand> I feel like the INVOCATION packet almost calls for whatever the GenXML version of a lowering pass is. Something where you have a struct that's defined by us which is actually what you fill out and then the pack process does non-trivial work to turn it into the stupid thing.
<alyssa> jekstrand: good news, INVOCATION goes away on valhall
<jekstrand> :)
<alyssa> like most bad things in bifrost
<alyssa> (that dated back to midgard)
<alyssa> that one goes back to t604
<alyssa> like blend shaders
<jekstrand> valhall-only, when?
<alyssa> oy
<anarsoul> jekstrand: they still ship SoCs with utgard and midgard :)
<jekstrand> anarsoul: Yes. I know. And utgard will never ever go away no matter how hard we all collectively try to kill it.
<jekstrand> But that doesn't mean it needs Vulkan support. :P
* anarsoul isn't trying to kill utgard :P
<jekstrand> See, there's the problem. :P
<alyssa> anarsoul: well get with the program
<icecream95> On the topic of Utgard, I note that a chapter in the book of Norse legends I have is called "At the Castle of Utgard"
<icecream95> Neither Midgard, Bifrost nor Valhall get their own chapters.
<alyssa> I will point out Bifrost is the rainbow /bridge/
<alyssa> you were supposed to cross it, but never stay on it
<alyssa> it was always just to get you to Valhall
<jekstrand> :D
<anarsoul> jekstrand: well, panfrost has too many devs working on it, so I'm afraid there's nothing fun left for me :)
<anarsoul> thus utgard and lima
simon-perretta-img_ has quit []
simon-perretta-img_ has joined #panfrost
<icecream95> "Still greater was the astonishment of all when Utgarda continued to tell them that he and the giant were the same person"
<icecream95> Sorry for the spoiler
simon-perretta-img_ has quit []
simon-perretta-img has joined #panfrost
<alyssa> anarsoul: too many devs? :C
* jekstrand doesn't think panfrost has that problem
<icecream95> How many clones would I want?.. One for reverse engineering Valhall, one for doing weird compute shader stuff, one for making the compiler faster, one for debugging, one for..
<alyssa> same hat
<alyssa> $$\hat{\text{same}}$$
<anarsoul> alyssa: you've got at least 3 now? yourself, icecream95 and jekstrand?
* jekstrand isn't really a panfrost dev
<alyssa> nobody full time year round
<alyssa> Uh oh :-p
<daniels> yeah, Rock 5B was also already announced