ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://oftc.irclog.whitequark.org/panfrost - <macc24> i have been here before it was popular
`join_subline has quit [Remote host closed the connection]
`join_subline has joined #panfrost
JulianGro has quit [Remote host closed the connection]
rasterman has joined #panfrost
erlehmann has joined #panfrost
rasterman has quit [Quit: Gettin' stinky!]
<alyssa>
yes, and the LD_TILE would be in the same position, blend shaders on Bifrost+ aren't special they're just function calls
<alyssa>
93% pass rate on deqp-gles3 on v9, i'll take it i guess
<alyssa>
cd ~
<alyssa>
most of that is no xfb
<alyssa>
though MRT+blend shader is a sizeable chunk too
<icecream95>
cd -
<alyssa>
admittedly if we ever want real function calls, some of the RA fun comes back
<alyssa>
I dunno. Given that there's no change to the fast path, and it improves some parts of the slow path, I'm not sure perf is a good reason to keep blend shaders around on bifrost
<alyssa>
I know the hw designers go to great pains to avoid shader variants in GLES, but.. meh?
<alyssa>
what pathological cases are we worried about, a game using an uber shader with different non-fixed function blend modes?
<icecream95>
alyssa: For real function calls.. I would suggest to:
<icecream95>
Do RA from inner functions first, then when hitting a function call create a bunch of nodes with fixed solutions and add constraints with all of the live nodes
<icecream95>
The problem is how to handle a function calling itself; we can't do RA for that function until we know the post-RA liveness, but to know that we have to do RA...
<icecream95>
This means that outer functions do all of the spilling. I don't know if that's better or worse than doing it another way
<alyssa>
I mean. You know how it's done on CPUs :-p
<icecream95>
alyssa: But doing it like that is no fun
<icecream95>
This way you get the "optimal" calling convention for each function
JulianGro has joined #panfrost
<alyssa>
yes, well
<alyssa>
i don't disagree :-p
<alyssa>
RA -- the hardest compiler problem that matters less than you think
<icecream95>
(And takes less time than you think, once I get around to creating an MR with my optimisations)
<HdkR>
RA, the only thing between me and having all redundant moves removed from my code.
atler has quit [Ping timeout: 480 seconds]
atler has joined #panfrost
JulianGro has quit [Remote host closed the connection]
<jekstrand>
icecream95, alyssa: No GPU APIs currently allow recursion. The function call graph is always a DAG.
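A minimal sketch of the "RA from inner functions first" ordering discussed above, assuming the call graph is a DAG as jekstrand notes. The function names and the `ra_order` helper are hypothetical and only illustrate the traversal order; this is not how any Mesa backend actually walks functions.

```python
from graphlib import TopologicalSorter

# Hypothetical call graph: each function maps to the set of functions it calls.
call_graph = {
    "main": {"shade", "blend"},
    "shade": {"sample"},
    "blend": {"sample"},
    "sample": set(),
}

def ra_order(calls):
    """Return functions ordered so every callee comes before its callers.

    TopologicalSorter treats the mapped values as predecessors, so passing
    callees as predecessors yields leaf functions first. It raises
    graphlib.CycleError on recursion, the case a DAG call graph rules out.
    """
    return list(TopologicalSorter(calls).static_order())

order = ra_order(call_graph)
print(order)
```

With this ordering, a caller's allocation runs only after the register usage of everything it calls is already known.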
<HdkR>
jekstrand: Technically the Nvidia blob can cause recursion with some pointer hacking :P
Daanct12 has joined #panfrost
jambalaya has quit [Ping timeout: 480 seconds]
jambalaya has joined #panfrost
MajorBiscuit has joined #panfrost
<icecream95>
alyssa: Hmm.. always adding constraints to the ends of nodearrays (rather than doing a binary search for an existing element) makes lcra_add_node_interference 60% faster, but lcra_solve is 25% slower and lcra_count_constraints returns slightly wrong values. ACK or NAK?
<icecream95>
(Overall, running shader-db on shaders/skia/781.shader_test is about 5% faster)
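A toy model of the trade-off icecream95 describes, assuming a constraint list that is either kept sorted and unique via binary search or simply appended to. This is not the real nodearray or lcra code; the helper names are hypothetical, and the idea that duplicate entries explain the slightly wrong counts is an assumption of this sketch.

```python
import bisect

def add_sorted(constraints, node):
    """Binary-search for the node and insert it only if it is missing."""
    i = bisect.bisect_left(constraints, node)
    if i == len(constraints) or constraints[i] != node:
        constraints.insert(i, node)

def add_append(constraints, node):
    """Always append: cheaper to add, but duplicates can accumulate."""
    constraints.append(node)

sorted_list, appended = [], []
for n in [7, 3, 7, 5, 3]:
    add_sorted(sorted_list, n)
    add_append(appended, n)

# A naive count over the appended list overstates the unique constraints,
# and any later pass has more (unsorted) entries to walk.
print(len(sorted_list), len(appended), len(set(appended)))
```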
indy has quit [Ping timeout: 480 seconds]
<cwabbott>
alyssa: I haven't been following it too closely, but I'm sure one of these days there's going to be an extension to make the blend state dynamic on VK
<cwabbott>
there's a huge push right now to make all the things dynamic
<cwabbott>
I don't think realistically you'll be able to get away from blend shaders and having an ABI
Major_Biscuit has joined #panfrost
MajorBiscuit has quit [Ping timeout: 480 seconds]
rasterman has joined #panfrost
rasterman has quit [Quit: Gettin' stinky!]
camus1 has joined #panfrost
camus has quit [Ping timeout: 480 seconds]
rkanwal has joined #panfrost
erlehmann has quit [Ping timeout: 480 seconds]
indy has joined #panfrost
erlehmann has joined #panfrost
rasterman has joined #panfrost
Daanct12 has quit [Ping timeout: 480 seconds]
nlhowell has joined #panfrost
nlhowell has quit [Ping timeout: 480 seconds]
<alyssa>
icecream95: Ummm
<alyssa>
cwabbott: grumble grumble
<alyssa>
how is that supposed to work with AMD..?
<alyssa>
i guess amd has actual blending nvm
<cwabbott>
the same thing as radeonsi, prologs and epilogs
<cwabbott>
and yes AMD has blending
<alyssa>
grumble grumble grumble
<cwabbott>
it turns out game devs really really want us to suck it up and do that kind of thing
<alyssa>
game devs suuuck
<alyssa>
I'm also interested in how often blend shaders are hit in real workloads (not just the CTS)
<alyssa>
and how often "different blend shader requiring states are used with the same FS" come up
<alyssa>
because if the latter is rare, eating the key isn't so bad ....
<cwabbott>
well, if it's dynamic in vulkan then you're probably not even going to be able to just eat the key
<cwabbott>
vulkan is very much against that sort of thing
JulianGro has joined #panfrost
enunes has joined #panfrost
<alyssa>
grumble
<alyssa>
I hate blend shaders because supporting them hurts even the apps that don't use them.
<alyssa>
(because right now, we have to assume every fragment shader might use blend shaders)
nlhowell has joined #panfrost
nlhowell is now known as Guest2215
nlhowell has joined #panfrost
Guest2215 has quit [Ping timeout: 480 seconds]
<alyssa>
jekstrand: re dynamic logic ops, conceptually that's a 4-to-1 mux, with the 4 data bits being the truth table of the desired logic op and the 2 select bits being the source and destination
<alyssa>
which expands to 3 2-to-1 muxes, and we have a bitwise MUX op on bifrost/valhall, so it's 3 instructions and 4 sysvals
rkanwal has quit [Ping timeout: 480 seconds]
<alyssa>
in practice there may be better implementations, but point is -- if we know there is a logic op (but we don't know which logic op), we can do it dynamically "no worse" than 3x more ALU
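A sketch of the mux construction alyssa outlines, assuming a bitwise 2-to-1 mux of the form (a AND sel) OR (b AND NOT sel) on 32-bit words. The operand order of the real Bifrost/Valhall MUX instruction and the truth-table bit indexing used here, (src_bit << 1) | dst_bit, are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF

def mux(a, b, sel):
    """Bitwise 2-to-1 mux: take bits of a where sel is 1, else bits of b."""
    return ((a & sel) | (b & ~sel)) & MASK32

def truth_table_sysvals(table):
    """Broadcast each of the 4 truth-table bits to a full 32-bit word."""
    return [MASK32 if (table >> i) & 1 else 0 for i in range(4)]

def dynamic_logic_op(table, src, dst):
    """Apply an arbitrary 2-input logic op using 3 muxes and 4 sysvals."""
    t0, t1, t2, t3 = truth_table_sysvals(table)
    lo = mux(t1, t0, dst)     # src bit is 0: select by dst bit
    hi = mux(t3, t2, dst)     # src bit is 1: select by dst bit
    return mux(hi, lo, src)   # final select by src bit

src, dst = 0xF0F0A5A5, 0x0FF0FF00
print(hex(dynamic_logic_op(0b0110, src, dst)))  # 0b0110 encodes XOR here
```

Under this indexing, 0b1000 is AND, 0b1110 is OR, 0b0110 is XOR, and 0b1100 copies the source.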
<alyssa>
...Now I'm wondering if "immediate + indirect --> indirect of iadd" is a useful lower_tex option for other backends
<jekstrand>
Maybe? It certainly wouldn't be hard to add.
<alyssa>
(doing it in the backend, after NIR, is easy but requires care to ensure we don't emit "add 0")
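A rough sketch of the "immediate + indirect --> indirect of iadd" lowering, including the "add 0" case alyssa mentions. Expressions are modelled as plain tuples; the `lower_tex_index` helper and this representation are hypothetical and do not correspond to any real NIR or backend API.

```python
def lower_tex_index(immediate, indirect):
    """Combine an immediate index and an optional indirect offset.

    `indirect` is None when the access is direct, otherwise an expression
    tuple. Returns a single expression for the final index.
    """
    if indirect is None:
        return ("imm", immediate)       # purely direct, nothing to lower
    if immediate == 0:
        return indirect                 # avoid emitting a useless "add 0"
    return ("iadd", indirect, ("imm", immediate))

print(lower_tex_index(3, ("ssa", "r5")))
print(lower_tex_index(0, ("ssa", "r5")))
```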
<jekstrand>
Yeah
<jekstrand>
We should also add constant folding for texture/sampler_offset, probably.
<alyssa>
hm?
<alyssa>
the opposite of that lowering, you mean?
<jekstrand>
Yup
<jekstrand>
If the texture/sampler_offset is an immediate, fold it into tex->texture/sampler_index.
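A small sketch of the constant folding jekstrand describes, assuming a simplified texture instruction with a constant `texture_index` and an optional `texture_offset` source. The `Tex` class is a hypothetical stand-in, not the real NIR texture instruction, and the same idea would apply to the sampler index and offset.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class Tex:
    texture_index: int
    # int models a constant offset, str models a non-constant SSA value
    texture_offset: Optional[Union[int, str]] = None

def fold_texture_offset(tex):
    """If the offset is a known constant, fold it into the index and drop it."""
    if isinstance(tex.texture_offset, int):
        tex.texture_index += tex.texture_offset
        tex.texture_offset = None
    return tex

print(fold_texture_offset(Tex(texture_index=2, texture_offset=3)))
print(fold_texture_offset(Tex(texture_index=2, texture_offset="ssa_7")))
```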
<jekstrand>
Annoyingly, the tests I was trying to fix then fail on something having to do with blending once I hack enough other compiler stuff to work. :(
<alyssa>
D:
<jekstrand>
Wow.... the Invocation packet.
<jekstrand>
I'm... I don't even know.
simon-perretta-img_ has joined #panfrost
<jekstrand>
How are you supposed to expose limits with this?
<jekstrand>
It's like it can dispatch 4G invocations split up however you want?
<jekstrand>
That's kind-of cool, I guess, but it's strange.
<jekstrand>
very strange.
<jekstrand>
It'll be all sorts of entertaining for OpenCL.
rkanwal has quit [Ping timeout: 480 seconds]
<jekstrand>
I guess maybe you expose a max workgroup size of 256 and max global size of 256^3?
<jekstrand>
Or 1024 and 128^3 and have a bit left over
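A quick arithmetic check of the two splits jekstrand floats, assuming the "4G invocations" budget means 2^32 total and that the cubed figure is the number of workgroups per axis. Both readings are interpretations of the chat, not documented hardware limits.

```python
def total_invocations(max_workgroup_size, max_groups_per_axis):
    """Total invocations if every axis dispatches the maximum group count."""
    return max_workgroup_size * max_groups_per_axis ** 3

budget = 2 ** 32                       # assumed "4G invocations"
a = total_invocations(256, 256)        # 2^8 * 2^24
b = total_invocations(1024, 128)       # 2^10 * 2^21

print(a == budget, a.bit_length() - 1)   # exactly fills the budget
print(b * 2 == budget, b.bit_length() - 1)  # one bit "left over"
```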
* jekstrand
is so confused
<cwabbott>
I actually remember wondering the same thing, lol
simon-perretta-img has quit [Ping timeout: 480 seconds]
* jekstrand
looks up a Mali in gpuinfo
<cwabbott>
and it turns out that with the limits they exposed in GLES you could launch something larger than 2^32 total instances
<cwabbott>
I tried it, and... out-of-memory error
<jekstrand>
The blob appears to advertise 512 workgroup size and (2^32)^3
<jekstrand>
number of workgroups
<jekstrand>
I guess they just blast out a pile of dispatches if it goes over the limit?
<cwabbott>
nope... it just errors out
<cwabbott>
with some weird error like out of GPU memory
<jekstrand>
Well, yeah. I don't assume you can actually dispatch that many
<jekstrand>
Even if you didn't OOM, it'd take that tiny GPU millennia to complete
<cwabbott>
I guess they're saved by the fact that no CTS test would be so crazy to dispatch that many
<cwabbott>
yeah, that would be a long test...
<alyssa>
cwabbott: still thinking about "what if there was a dynamic blend state in vk 1.4"
<alyssa>
undecided how I feel about it
<alyssa>
jekstrand: also seemed undecided (?)
<jekstrand>
I feel like Valhall bringup is more important than features that don't yet exist going into a vk version in 2 years.
<alyssa>
I mean. Fiar.
<alyssa>
Fair
<alyssa>
Valhall + Blend shaders isn't *that* much work it just feels so pointless.
<alyssa>
(and counter productive)
<alyssa>
I think I convinced myself the way to go for it is a pseudo instruction that gets lowered "just before" packing (after RA and scheduling, in particular)
<alyssa>
and then the branch offsets can be hardcoded but nobody cares
<alyssa>
(In particular, lowered after all opt passes)
rasterman has quit [Quit: Gettin' stinky!]
<jekstrand>
I feel like the INVOCATION packet almost calls for whatever the GenXML version of a lowering pass is. Something where you have a struct that's defined by us which is actually what you fill out and then the pack process does non-trivial work to turn it into the stupid thing.
<alyssa>
jekstrand: good news, INVOCATION goes away on valhall
<jekstrand>
:)
<alyssa>
like most bad things in bifrost
<alyssa>
(that dated back to midgard)
<alyssa>
that one goes back to t604
<alyssa>
like blend shaders
<jekstrand>
Valhall-only, when?
<alyssa>
oy
<anarsoul>
jekstrand: they still ship SoCs with utgard and midgard :)
<jekstrand>
anarsoul: Yes. I know. And utgard will never ever go away no matter how hard we all collectively try to kill it.
<jekstrand>
But that doesn't mean it needs Vulkan support. :P
* anarsoul
isn't trying to kill utgard :P
<jekstrand>
See, there's the problem. :P
<alyssa>
anarsoul: well get with the program
<icecream95>
On the topic of Utgard, I note that a chapter in the book of Norse legends I have is called "At the Castle of Utgard"
<icecream95>
Neither Midgard, Bifrost nor Valhall get their own chapters.
<alyssa>
I will point out Bifrost is the rainbow /bridge/
<alyssa>
you were supposed to cross it, but never stay on it
<alyssa>
it was always just to get you to Valhall
<jekstrand>
:D
<anarsoul>
jekstrand: well, panfrost has too many devs working on it, so I'm afraid there's nothing fun left for me :)
<anarsoul>
thus utgard and lima
simon-perretta-img_ has quit []
simon-perretta-img_ has joined #panfrost
<icecream95>
"Still greater was the astonishment of all when Utgarda continued to tell them that he and the giant were the same person"
<icecream95>
Sorry for the spoiler
simon-perretta-img_ has quit []
simon-perretta-img has joined #panfrost
<alyssa>
anarsoul: too many devs? :C
* jekstrand
doesn't think panfrost has that problem
<icecream95>
How many clones would I want?.. One for reverse engineering Valhall, one for doing weird compute shader stuff, one for making the compiler faster, one for debugging, one for..
<alyssa>
same hat
<alyssa>
$$\hat{\text{same}}$$
<anarsoul>
alyssa: you've got at least 3 now? yourself, icecream95 and jekstrand?