ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://oftc.irclog.whitequark.org/panfrost - <macc24> i have been here before it was popular
megi has quit [Quit: WeeChat 3.5]
megi has joined #panfrost
icecream95 has joined #panfrost
simon-perretta-img has quit [Ping timeout: 480 seconds]
Daanct12 has joined #panfrost
Daanct12 has quit [Remote host closed the connection]
Daanct12 has joined #panfrost
hexdump01 has joined #panfrost
hexdump0815 has quit [Ping timeout: 480 seconds]
Daanct12 has quit [Ping timeout: 480 seconds]
nlhowell is now known as Guest884
nlhowell has joined #panfrost
Guest884 has quit [Read error: Connection reset by peer]
Daanct12 has joined #panfrost
nlhowell has quit [Read error: Connection reset by peer]
nlhowell has joined #panfrost
karolherbst has quit [Ping timeout: 480 seconds]
karolherbst has joined #panfrost
nlhowell is now known as Guest887
nlhowell has joined #panfrost
Guest887 has quit [Ping timeout: 480 seconds]
nlhowell is now known as Guest893
nlhowell has joined #panfrost
Guest893 has quit [Ping timeout: 480 seconds]
nlhowell has quit [Read error: Connection reset by peer]
nlhowell has joined #panfrost
MajorBiscuit has joined #panfrost
<icecream95>
alyssa: I'm having trouble finding the calls to set_sampler_views using the features implemented in "Fix set_sampler_views for big GL". Was there any point in that commit?
nlhowell has quit [Read error: Connection reset by peer]
nlhowell has joined #panfrost
rasterman has joined #panfrost
MajorBiscuit has quit [Quit: WeeChat 3.4]
MajorBiscuit has joined #panfrost
Daanct12 has quit [Ping timeout: 480 seconds]
camus has quit [Remote host closed the connection]
camus has joined #panfrost
simon-perretta-img has joined #panfrost
pi__ has joined #panfrost
pi_ has quit [Ping timeout: 480 seconds]
nlhowell has quit [Ping timeout: 480 seconds]
rkanwal has joined #panfrost
icecream95 has quit [Ping timeout: 480 seconds]
jcwasmx86[m] has joined #panfrost
MajorBiscuit has quit [Ping timeout: 480 seconds]
MajorBiscuit has joined #panfrost
MajorBiscuit has quit []
<alyssa>
was just trying to implement the gallium api...
<alyssa>
apparently mesa/st doesn't use it. cursed.
<alyssa>
nor does any other st? why is this in the gallium api?
<alyssa>
at any rate, that sounds like a #dri-devel question
nlhowell has joined #panfrost
simon-perretta-img_ has joined #panfrost
simon-perretta-img has quit [Ping timeout: 480 seconds]
nlhowell has quit [Ping timeout: 480 seconds]
simon-perretta-img_ has quit []
simon-perretta-img has joined #panfrost
<jekstrand>
I thought set_sampler_views was for texture views in GL
guillaume_g has quit []
rasterman has quit [Quit: Gettin' stinky!]
pendingchaos_ has joined #panfrost
pendingchaos has quit [Ping timeout: 480 seconds]
* alyssa
cleans up Valhall hack
jcwasmx86[m] has left #panfrost [#panfrost]
Daanct12 has joined #panfrost
Daanct12 has quit [Remote host closed the connection]
pendingchaos_ is now known as pendingchaos
icecream95 has joined #panfrost
<icecream95>
panfrost_instance_id seems to be broken for a vertex count of 1...
<alyssa>
icecream95: midgard only, wontfix? ;-p
<icecream95>
alyssa: "will look again when I'm slightly more awake". Have you been sleeping for the past two weeks?
<icecream95>
(!16343)
<alyssa>
probably
<alyssa>
first patches there are r-b
<icecream95>
I'm a bit concerned, if you can't tell whether you are awake or not
<icecream95>
first how many? all except for the stencil tracking?
<alyssa>
first 3, i mean
<alyssa>
in patch 4:
<alyssa>
+ } else if (batch->clear & PIPE_CLEAR_STENCIL) {
<alyssa>
+ z_rsrc->constant_stencil = true;
<alyssa>
I think you need to also set z_rsrc->stencil_value = batch->clear_stencil?
<alyssa>
I can't tell if that has a behaviour effect, but if you don't, the documented invariant of stencil_value is broken
<alyssa>
("The stencil value if constant_stencil is set")
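[editor's sketch] The invariant alyssa quotes can be illustrated with a small stand-alone C sketch. This is not the code from the MR: the struct layouts, the `sketch_` names, and the `SKETCH_CLEAR_STENCIL` bit are assumptions made up for illustration. Only the field names `constant_stencil`, `stencil_value`, `clear` and `clear_stencil` come from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PIPE_CLEAR_STENCIL; the real value lives in Gallium. */
#define SKETCH_CLEAR_STENCIL (1u << 1)

/* Hypothetical, reduced stand-in for the resource being tracked. */
struct sketch_resource {
    bool constant_stencil;  /* every stencil texel holds the same value */
    uint8_t stencil_value;  /* "The stencil value if constant_stencil is set" */
};

/* Hypothetical, reduced stand-in for the batch. */
struct sketch_batch {
    unsigned clear;         /* bitmask of SKETCH_CLEAR_* */
    uint8_t clear_stencil;  /* value the stencil buffer is cleared to */
};

/* Record a stencil clear, updating both fields so the documented
 * invariant on stencil_value holds whenever constant_stencil is true. */
static void
sketch_track_stencil_clear(struct sketch_resource *z_rsrc,
                           const struct sketch_batch *batch)
{
    assert(z_rsrc != NULL && batch != NULL);

    if (batch->clear & SKETCH_CLEAR_STENCIL) {
        z_rsrc->constant_stencil = true;
        z_rsrc->stencil_value = batch->clear_stencil;
    }
}
```

Setting the flag without the value may happen to work today, but any later reader of `stencil_value` would see stale data, which is the concern raised above.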
<icecream95>
Nothing changes stencil_value, I see what you mean
<icecream95>
You must be awake now then?
<alyssa>
Apparently so :-)
<alyssa>
after spending a week feeling stupid trying to understand aco's RA, everything else feels so much easier ;-p
<icecream95>
But I suspect that no-one would ever complain even if that bug was not fixed
<alyssa>
which bug?
<alyssa>
the one I just complained about? :D
<icecream95>
stencil_value
<icecream95>
The assumption is that if you are not using the stencil buffer, it's unlikely that you care if it is 0 or 255
<alyssa>
ah well
<alyssa>
Even if the code works as-is, that the documented invariant is broken means it's liable to break later when something starts to rely on that invariant
<icecream95>
I should probably write some Piglit tests for this. Annoyingly shader_runner does not appear to support stencil buffers
<icecream95>
alyssa: For the second bug you pointed out, I'm sure that I carefully ordered things to make it work correctly... I guess I wasn't careful enough then
<alyssa>
Happens to the best \s/
<icecream95>
Both bugs are caught by existing Piglit tests, there is no need to write more
<alyssa>
Woo
<icecream95>
Oh wait the first one isn't
* alyssa
wonders if !16378 actually moves the fps needle
<alyssa>
I should try that MR with Manhattan on Bifrost ... shader-db claims it gets rid of all the spilling in manhattan
<alyssa>
It would help to have arm64/linux builds of gfxbench....
<icecream95>
Use the Windows ARM build? It couldn't be *that* hard to get Panfrost working on Windows, could it?
<alyssa>
it could :p
<alyssa>
In order of most to least reasonable: use an apitrace/renderdoc, use Android, use FEX with a Linux x86 build
<alyssa>
HdkR: Does gfxbench work on FEX yet? ;p
simon-perretta-img_ has joined #panfrost
<icecream95>
Even less reasonable: Run Windows in a VM, and pass GPU jobs over TCP to the host to avoid having to write a kernel driver
<icecream95>
Windows supports the BSD sockets API doesn't it? I already have the code written for that
simon-perretta-img has quit [Ping timeout: 480 seconds]
<icecream95>
But adding a glFlush is enough to break one of the Piglit tests
<alyssa>
I should stop coding for the day and eat dinner...
<alyssa>
icecream95: FWIW, I intend to merge a subset of !15874 -- the nodearray patches (which I understand), but not the SIMD patches (which I don't), or the split/adjacent stuff (it seems a lot cleaner to just support vec8 natively..? I realize that goes against your goal to make LCRA wow efficient, but I'd rather use a bit more memory to keep things simpler (and leave open the possibility of hybrid or SSA RA
<alyssa>
someday)
<alyssa>
With the "scalar IR" patches I merged today, vec8 support shouldn't require any changes to e.g. optimization passes.
<alyssa>
(It *will* need dynamically allocated sources for vec8 collects, to avoid raising MAX_SRCS to 8. I was intending to make that change anyway. Hard req for phis, I would add.)
<alyssa>
Took a while for it to click that...
<alyssa>
1. The interference matrix is necessarily sparse. We'd get O(N) space/time if it was *all* sparse.
rkanwal has quit [Read error: Connection reset by peer]
rkanwal has joined #panfrost
<icecream95>
Supporting vec8 in RA would double memory bandwidth used (which seems to be the bottleneck, at least with NEON) unless we accept a much lower cap on node_count or use __attribute__((packed))
<alyssa>
2. For a sufficiently high threshold, there are O(1) rows of the matrix with more nonzeroes than the threshold, so correctly implemented nodearrays are *also* O(n) space/time.
<alyssa>
3. hence sparse/dense switching is purely constant factor to all-sparse. I assume you've profiled this, else you wouldn't have bothered implementing it
<icecream95>
"correctly implemented". You mean correct as in theoretically optimal, or as in something which actually performs well enough?
<alyssa>
I mean that it does what it says on the tin and doesn't have some silly bug where everything is dense, or something like that :-p
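[editor's sketch] The sparse/dense switching discussed here can be sketched roughly as follows. This is a toy under stated assumptions, not the nodearray code from !15874: the `sketch_` names, the threshold of 4, and the byte-per-node dense form are all made up for illustration. A row stays a short list of interfering node indices until it would exceed the threshold, then converts once to a dense array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical threshold: maximum entries kept in sparse form. */
#define SKETCH_THRESHOLD 4

struct sketch_row {
    bool dense;
    unsigned count;                     /* sparse: number of valid keys */
    uint16_t sparse[SKETCH_THRESHOLD];  /* sparse: interfering node indices */
    uint8_t *bits;                      /* dense: one byte per node, 0 or 1 */
};

/* Does this row record interference with `node`? */
static bool
sketch_row_test(const struct sketch_row *row, uint16_t node)
{
    if (row->dense)
        return row->bits[node] != 0;

    for (unsigned i = 0; i < row->count; i++) {
        if (row->sparse[i] == node)
            return true;
    }
    return false;
}

/* Mark interference with `node`. Returns false only on allocation failure. */
static bool
sketch_row_set(struct sketch_row *row, uint16_t node, unsigned node_count)
{
    assert(node < node_count);

    if (sketch_row_test(row, node))
        return true;

    /* Sparse row is full: convert to the dense form exactly once. */
    if (!row->dense && row->count == SKETCH_THRESHOLD) {
        uint8_t *bits = calloc(node_count, 1);
        if (bits == NULL)
            return false;
        for (unsigned i = 0; i < row->count; i++)
            bits[row->sparse[i]] = 1;
        row->bits = bits;
        row->dense = true;
    }

    if (row->dense)
        row->bits[node] = 1;
    else
        row->sparse[row->count++] = node;
    return true;
}
```

With a fixed threshold, each sparse row costs O(1) space, so the total is linear in the node count as long as only a bounded number of rows ever go dense, which is the shape of the argument in points 1 and 2.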
<alyssa>
#2 is worth commenting on
<alyssa>
erm
<alyssa>
it would be if my brain weren't shutting down right now from lack of food
<alyssa>
I should probably go fix that
<icecream95>
Filling a nodearray randomly is O(n^2) time because insertion has to move a bunch of elements out of the way
<alyssa>
woof, I was just thinking about iteration time for the solve
<icecream95>
Doing O(1) insertion is possible for interference (but not liveness), but it doesn't make it faster by enough to be worth the extra complexity
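[editor's sketch] icecream95's point about insertion cost can be seen in a toy sorted array. This is a stand-in, not the actual nodearray implementation: the `sketch_` names, the fixed capacity, and the `moved` counter are assumptions added for illustration. Each insert finds its slot by binary search, then shifts the tail with `memmove`, so the shifting dominates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_CAP 64  /* hypothetical fixed capacity for the sketch */

struct sketch_sparse_row {
    uint16_t keys[SKETCH_CAP];  /* node indices, kept in ascending order */
    size_t count;
    size_t moved;               /* total elements shifted by all inserts */
};

/* Insert `key`, keeping keys sorted.
 * Returns false if the key is already present or the row is full. */
static bool
sketch_insert(struct sketch_sparse_row *row, uint16_t key)
{
    /* Binary search for the first element >= key. */
    size_t lo = 0, hi = row->count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (row->keys[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }

    if (lo < row->count && row->keys[lo] == key)
        return false;
    if (row->count == SKETCH_CAP)
        return false;

    /* Everything from the insertion point onwards moves up one slot. */
    size_t tail = row->count - lo;
    memmove(&row->keys[lo + 1], &row->keys[lo], tail * sizeof(row->keys[0]));
    row->keys[lo] = key;
    row->count++;
    row->moved += tail;
    return true;
}
```

Inserting n keys in ascending order shifts nothing, while descending order shifts 0 + 1 + ... + (n-1) = n(n-1)/2 elements; random order lands in between but is still quadratic on average.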