ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://oftc.irclog.whitequark.org/panfrost - <macc24> i have been here before it was popular
megi has quit [Quit: WeeChat 3.5]
megi has joined #panfrost
icecream95 has joined #panfrost
simon-perretta-img has quit [Ping timeout: 480 seconds]
Daanct12 has joined #panfrost
Daanct12 has quit [Remote host closed the connection]
Daanct12 has joined #panfrost
hexdump01 has joined #panfrost
hexdump0815 has quit [Ping timeout: 480 seconds]
Daanct12 has quit [Ping timeout: 480 seconds]
nlhowell is now known as Guest884
nlhowell has joined #panfrost
Guest884 has quit [Read error: Connection reset by peer]
Daanct12 has joined #panfrost
nlhowell has quit [Read error: Connection reset by peer]
nlhowell has joined #panfrost
karolherbst has quit [Ping timeout: 480 seconds]
karolherbst has joined #panfrost
nlhowell is now known as Guest887
nlhowell has joined #panfrost
Guest887 has quit [Ping timeout: 480 seconds]
nlhowell is now known as Guest893
nlhowell has joined #panfrost
Guest893 has quit [Ping timeout: 480 seconds]
nlhowell has quit [Read error: Connection reset by peer]
nlhowell has joined #panfrost
MajorBiscuit has joined #panfrost
<icecream95> alyssa: I'm having trouble finding the calls to set_sampler_views using the features implemented in "Fix set_sampler_views for big GL". Was there any point in that commit?
nlhowell has quit [Read error: Connection reset by peer]
nlhowell has joined #panfrost
rasterman has joined #panfrost
MajorBiscuit has quit [Quit: WeeChat 3.4]
MajorBiscuit has joined #panfrost
Daanct12 has quit [Ping timeout: 480 seconds]
camus has quit [Remote host closed the connection]
camus has joined #panfrost
simon-perretta-img has joined #panfrost
pi__ has joined #panfrost
pi_ has quit [Ping timeout: 480 seconds]
nlhowell has quit [Ping timeout: 480 seconds]
rkanwal has joined #panfrost
icecream95 has quit [Ping timeout: 480 seconds]
jcwasmx86[m] has joined #panfrost
MajorBiscuit has quit [Ping timeout: 480 seconds]
MajorBiscuit has joined #panfrost
MajorBiscuit has quit []
<alyssa> was just trying to implement the gallium api...
<alyssa> apparently mesa/st doesn't use it. cursed.
<alyssa> nor does any other st? why is this in the gallium api?
<alyssa> at any rate, that sounds like a #dri-devel question
nlhowell has joined #panfrost
simon-perretta-img_ has joined #panfrost
simon-perretta-img has quit [Ping timeout: 480 seconds]
nlhowell has quit [Ping timeout: 480 seconds]
simon-perretta-img_ has quit []
simon-perretta-img has joined #panfrost
<jekstrand> I thought set_sampler_views was for texture views in GL
guillaume_g has quit []
rasterman has quit [Quit: Gettin' stinky!]
pendingchaos_ has joined #panfrost
pendingchaos has quit [Ping timeout: 480 seconds]
* alyssa cleans up Valhall hack
jcwasmx86[m] has left #panfrost [#panfrost]
Daanct12 has joined #panfrost
Daanct12 has quit [Remote host closed the connection]
pendingchaos_ is now known as pendingchaos
icecream95 has joined #panfrost
<icecream95> panfrost_instance_id seems to be broken for a vertex count of 1...
<alyssa> icecream95: midgard only, wontfix? ;-p
<icecream95> alyssa: "will look again when I'm slightly more awake". Have you been sleeping for the past two weeks?
<icecream95> (!16343)
<alyssa> probably
<alyssa> first patches there are r-b
<icecream95> I'm a bit concerned, if you can't tell whether you are awake or not
<icecream95> first how many? all except for the stencil tracking?
<alyssa> first 3, i mean
<alyssa> in patch 4:
<alyssa> + } else if (batch->clear & PIPE_CLEAR_STENCIL) {
<alyssa> + z_rsrc->constant_stencil = true;
<alyssa> I think you need to also set z_rsrc->stencil_value = batch->clear_stencil?
<alyssa> I can't tell if that has a behaviour effect, but if you don't, the documented invariant of stencil_value is broken
<alyssa> ("The stencil value if constant_stencil is set")
<icecream95> Nothing changes stencil_value, I see what you mean
<icecream95> You must be awake now then?
<alyssa> Apparently so :-)
<alyssa> after spending a week feeling stupid trying to understand aco's RA, everything else feels so much easier ;-p
<icecream95> But I suspect that no-one would ever complain even if that bug was not fixed
<alyssa> which bug?
<alyssa> the one I just complained about? :D
<icecream95> stencil_value
<icecream95> The assumption is that if you are not using the stencil buffer, it's unlikely that you care if it is 0 or 255
<alyssa> ah well
<alyssa> Even if the code works as-is, that the documented invariant is broken means it's liable to break later when something starts to rely on that invariant
<icecream95> I should probably write some Piglit tests for this. Annoyingly shader_runner does not appear to support stencil buffers
<alyssa> Actually, the bug seems more severe
<alyssa> doesn't this break all stencil clears?
<alyssa> glClearStencil(0xfa); glClear(STENCIL); glStencilMask(enable); glDraw(); glFlush();
<alyssa> glClearStencil(0xfa); glClear(STENCIL); glStencilMask(enable); glDraw(); glFlush();
<icecream95> Only stencil clears where the stencil buffer is not used in the same batch
<alyssa> I'm not so sure?
<alyssa> oh, right, I see
<alyssa> glClearStencil(0xfa); glClear(STENCIL); glFlush();
<alyssa> glClearStencil(0xca); glClear(STENCIL); glStencilMask(enable); glDraw(); glFlush();
<alyssa> maybe that is the broken sequence?
<alyssa> first frame has a stencil clear but no stencil draw, so constant_stencil = true
<alyssa> next frame, constant_stencil is true, so clear_stencil is overridden with z_rsrc->stencil_value, which is zeroed (or maybe uninit)
<alyssa> so the second frame will clear with 0x00 instead of 0xca
<alyssa> icecream95: https://rosenzweig.io/hmm
<alyssa> I think this is what you meant?
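A minimal sketch of the two-frame sequence described above, as a toy model rather than the actual driver code. The struct `rsrc_model` and the helpers `end_clear_only_batch`, `resolve_clear` and `replay_two_frames` are hypothetical names; only `constant_stencil`, `stencil_value` and `clear_stencil` come from the log, and the override rule is taken from the description in the log, not from the Mesa source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not the Panfrost structs: only the two fields
 * named in the log are kept. */
struct rsrc_model {
   bool constant_stencil;
   uint8_t stencil_value; /* documented: the stencil value if constant_stencil is set */
};

/* End of a batch that cleared stencil but never drew with it.
 * store_value = false mirrors the quoted patch hunk; true adds the
 * assignment suggested in the review. */
static void
end_clear_only_batch(struct rsrc_model *r, uint8_t clear_stencil, bool store_value)
{
   assert(r);
   r->constant_stencil = true;
   if (store_value)
      r->stencil_value = clear_stencil;
}

/* Start of the next batch, as described in the log: a resource flagged
 * constant_stencil overrides the requested clear value. */
static uint8_t
resolve_clear(const struct rsrc_model *r, uint8_t requested)
{
   assert(r);
   return r->constant_stencil ? r->stencil_value : requested;
}

/* Replays the two frames and returns the value frame 2 clears with. */
static uint8_t
replay_two_frames(bool store_value)
{
   struct rsrc_model z = { .constant_stencil = false, .stencil_value = 0 };

   end_clear_only_batch(&z, 0xfa, store_value); /* frame 1: clear, no draw */
   return resolve_clear(&z, 0xca);              /* frame 2: clear, then draw */
}
```

In this model, `replay_two_frames(false)` yields 0x00 (the stale zero) and `replay_two_frames(true)` yields 0xfa. Neither matches the requested 0xca, so storing the value restores the documented invariant but the model says nothing about how the real driver should order a new clear against the tracked value.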
<icecream95> alyssa: For the second bug you pointed out, I'm sure that I carefully ordered things to make it work correctly... I guess I wasn't careful enough then
<alyssa> Happens to the best \s/
<icecream95> Both bugs are caught by existing Piglit tests, there is no need to write more
<alyssa> Woo
<icecream95> Oh wait the first one isn't
* alyssa wonders if !16378 actually moves the fps needle
<alyssa> I should try that MR with Manhattan on Bifrost ... shader-db claims it gets rid of all the spilling in manhattan
<alyssa> It would help to have arm64/linux builds of gfxbench....
<icecream95> Use the Windows ARM build? It couldn't be *that* hard to get Panfrost working on Windows, could it?
<alyssa> it could :p
<alyssa> In order of most to least reasonable: use an apitrace/renderdoc, use Android, use FEX with a Linux x86 build
<alyssa> HdkR: Does gfxbench work on FEX yet? ;p
simon-perretta-img_ has joined #panfrost
<icecream95> Even less reasonable: Run Windows in a VM, and pass GPU jobs over TCP to the host to avoid having to write a kernel driver
<icecream95> Windows supports the BSD sockets API doesn't it? I already have the code written for that
simon-perretta-img has quit [Ping timeout: 480 seconds]
<icecream95> But adding a glFlush is enough to break one of the Piglit tests
<alyssa> I should stop coding for the day and eat dinner...
<alyssa> icecream95: FWIW, I intend to merge a subset of !15874 -- the nodearray patches (which I understand), but not the SIMD patches (which I don't), or the split/adjacent stuff (it seems a lot cleaner to just support vec8 natively..? I realize that goes against your goal to make LCRA wow efficient, but I'd rather use a bit more memory to keep things simpler (and leave open the possibility of hybrid or SSA RA someday))
<alyssa> With the "scalar IR" patches I merged today, vec8 support shouldn't require any changes to e.g. optimization passes.
<alyssa> (It *will* need dynamically allocated sources for vec8 collects, to avoid raising MAX_SRCS to 8. I was intending to make that change anyway. Hard req for phis, I would add.)
<alyssa> Took a while for it to click that...
<alyssa> 1. The interference matrix is necessarily sparse. We'd get O(N) space/time if it was *all* sparse.
rkanwal has quit [Read error: Connection reset by peer]
rkanwal has joined #panfrost
<icecream95> Supporting vec8 in RA would double memory bandwidth used (which seems to be the bottleneck, at least with NEON) unless we accept a much lower cap on node_count or use __attribute__((packed))
<alyssa> 2. For a sufficiently high threshold, there are O(1) rows of the matrix with more nonzeroes than the threshold, so correctly implemented nodearrays are *also* O(n) space/time.
<alyssa> 3. hence sparse/dense switching is purely constant factor to all-sparse. I assume you've profiled this, else you wouldn't have bothered implementing it
<icecream95> "correctly implemented". You mean correct as in theoretically optimal, or as in something which actually performs well enough?
<alyssa> I mean that it does what it says on the tin and doesn't have some silly bug where everything is dense, or something like that :-p
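The sparse/dense switching in points 1 to 3 can be sketched roughly as below. This is an illustration under assumptions, not the nodearray code from !15874: `row_sketch`, `row_set`, `row_test` and `SKETCH_THRESHOLD` are hypothetical names, the threshold is arbitrary, and the dense form here is one byte per node for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_THRESHOLD 4 /* arbitrary; a real threshold would be tuned */

/* One row of an interference matrix (hypothetical layout). */
struct row_sketch {
   bool dense;
   unsigned count;                    /* sparse entries in use */
   uint16_t sparse[SKETCH_THRESHOLD]; /* unsorted node indices while sparse */
   uint8_t *bits;                     /* one byte per node once dense */
   unsigned node_count;
};

static bool
row_test(const struct row_sketch *r, uint16_t node)
{
   if (r->dense)
      return r->bits[node] != 0;

   for (unsigned i = 0; i < r->count; i++) {
      if (r->sparse[i] == node)
         return true;
   }
   return false;
}

/* Marks node as interfering. Returns false only on allocation failure. */
static bool
row_set(struct row_sketch *r, uint16_t node)
{
   assert(node < r->node_count);

   if (row_test(r, node))
      return true;

   /* The sparse form is full: switch this row to the dense form once. */
   if (!r->dense && r->count == SKETCH_THRESHOLD) {
      uint8_t *bits = calloc(r->node_count, 1);
      if (!bits)
         return false;
      for (unsigned i = 0; i < r->count; i++)
         bits[r->sparse[i]] = 1;
      r->bits = bits;
      r->dense = true;
   }

   if (r->dense)
      r->bits[node] = 1;
   else
      r->sparse[r->count++] = node;
   return true;
}
```

Rows that stay under the threshold cost a constant amount of memory each, and only the rows that cross it pay for a full `node_count`-sized array, which is the space argument made in point 2.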
<alyssa> #2 is worth commenting on
<alyssa> erm
<alyssa> it would be if my brain weren't shutting down right now from lack of food
<alyssa> I should probably go fix that
<icecream95> Filling a nodearray randomly is O(n^2) time because insertion has to move a bunch of elements out of the way
<alyssa> woof, I was just thinking about iteration time for the solve
<icecream95> Doing O(1) insertion is possible for interference (but not liveness), but it doesn't make it faster by enough to be worth the extra complexity
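To make the insertion cost concrete, here is a small sketch of a sorted sparse row where each insert shifts the tail with `memmove`. It assumes the sparse form is kept sorted by node index, which the log implies but does not spell out; `sparse_row`, `sparse_insert`, `SPARSE_CAP` and the `moved` counter are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SPARSE_CAP 64 /* arbitrary capacity for the sketch */

struct sparse_row {
   uint16_t keys[SPARSE_CAP]; /* node indices, kept sorted ascending */
   size_t count;
   size_t moved;              /* total elements shifted, for illustration */
};

/* Inserts key, keeping keys sorted.
 * Returns 0 on success or if the key is already present, -1 if full. */
static int
sparse_insert(struct sparse_row *row, uint16_t key)
{
   /* Binary search for the first element >= key. */
   size_t lo = 0, hi = row->count;
   while (lo < hi) {
      size_t mid = lo + (hi - lo) / 2;
      if (row->keys[mid] < key)
         lo = mid + 1;
      else
         hi = mid;
   }

   if (lo < row->count && row->keys[lo] == key)
      return 0;
   if (row->count == SPARSE_CAP)
      return -1;

   /* Everything after the insertion point moves up by one slot. */
   size_t tail = row->count - lo;
   memmove(&row->keys[lo + 1], &row->keys[lo], tail * sizeof(row->keys[0]));
   row->keys[lo] = key;
   row->count++;
   row->moved += tail;
   return 0;
}
```

Inserting n keys in ascending order shifts nothing, while descending order shifts 0 + 1 + ... + (n - 1) elements, so 8 descending inserts move 28 elements in total. A random order lands in between but still grows quadratically with n.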
<alyssa> Sure
* icecream95 wishes C supported Option types