<jannau>
looks familiar, Alyssa looked at some/all of them and said they use atomics
<alyssa>
correct
<lina>
TL;DR: today I fixed a bunch of kernel stuff: more sizing/param issues (I have testable test data for it now!), and I also implemented the sync TVB growth event, though I don't think we actually use/need it now; it was getting triggered by some bad settings, I think. But you can test it with ASAHI_MESA_DEBUG=synctvb if you want (see open MR).
<lina>
I got through GLES3 in that mode but GLES31 deadlocked or something, I'll look into it later with lockdep (might be a regression with synctvb, if so we don't care). Without any overrides things work ^^
<alyssa>
woot
<alyssa>
any progress on our bugs? or is this the strategy of "well, it definitely fixed SOMETHING!"
<alyssa>
:-D
<lina>
alyssa: Well the one jannau ran into with the TVB size at least... and I'm pretty sure I fixed other stuff too, yes ^^
<alyssa>
I heh
<alyssa>
it's my understanding the atomics with clustering issue and the heisenbug are the hot items
<alyssa>
(blocking CTS)
<_jannau__>
I think only atomics are blocking CTS on t600x, the heisenbug didn't reproduce there
<alyssa>
_jannau__: let's do process of elimination
<alyssa>
_jannau__: can you git clone github.com/dougallj/applegpu on macOS on t600x, run https://paste.debian.net/1285009 and send the output?
<alyssa>
(Or anyone else with an M1 Pro/Max/Ultra)
<alyssa>
if that changes anything, let me know drm_asahi_params_global::num_clusters_total so I can do the proper runtime detection and not wedge t8103
<jannau>
alyssa: all atomic tests in es31 pass now on t6002
<alyssa>
Oof.
<alyssa>
Alright.
<alyssa>
I mean, that's good but
<alyssa>
spicy.
<alyssa>
I had a hunch already that bit 45 controls cache of some kind
<alyssa>
this.. adds to that suspicion >:)
* alyssa
digs up her notes on cache bits
<jannau>
num_clusters_total should be 2 / 4 / 8 for m1 pro / max / ultra
<jannau>
the lower core count versions always have cores disabled in all clusters
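The runtime detection alyssa asks about could be sketched roughly as below. This is only an illustration in Python (the driver itself is C): the helper name `needs_multicluster_path` and the "more than one cluster" threshold are assumptions, as is the idea that t8103 reports a single cluster. Only the 2 / 4 / 8 counts come from jannau's message.

```python
# Cluster counts for the t600x family, as reported in the chat above.
CLUSTERS_T600X = {"M1 Pro": 2, "M1 Max": 4, "M1 Ultra": 8}


def needs_multicluster_path(num_clusters_total: int) -> bool:
    """Hypothetical helper: decide whether to enable a multi-cluster-only
    setting based on drm_asahi_params_global::num_clusters_total.

    Assumption: single-cluster parts (e.g. t8103) must keep the old
    behaviour, so the setting is enabled only for more than one cluster.
    """
    if num_clusters_total < 1:
        raise ValueError("num_clusters_total must be at least 1")
    return num_clusters_total > 1


if __name__ == "__main__":
    for chip, clusters in CLUSTERS_T600X.items():
        print(chip, clusters, needs_multicluster_path(clusters))
```

Keying the decision on the reported cluster count rather than on a chip ID would let lower-core-count variants take the same path, since per jannau they keep all clusters and only lose cores within them.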
<jannau>
shall I start a full CTS run?
<alyssa>
jannau: Let me get you a non-stupid branch to use
<jannau>
ok
<alyssa>
kicking off CTS on M1 to make sure I didn't break anything in the meantime
<alyssa>
jannau: alyssa/mesa:agx/es31-v2
<alyssa>
That doesn't have your wide_color fix on the mesa side, and the 10-bit thing still hasn't landed upstream anyway, so unfortunately still an x11_egl run
<alyssa>
but... should pass, probably, maybe? O:)
<jannau>
aye
<alyssa>
(Strictly you could do a Wayland run with your patch and point to it and say "look it's your bug" but it would complicate things.)
<alyssa>
anyway, M1 CTS is running
<jannau>
we could do wayland and disable 10-bit formats
<alyssa>
meh
<alyssa>
easier just to do the x11_egl run for now
<jannau>
m1 ultra x11_egl CTS is running as well
<alyssa>
who will win? =D
<alyssa>
nominally the ultra but the CTS is completely single threaded CPU bound
<alyssa>
so..
<jannau>
the failing cts run was slower on the ultra than a successful one on m1
<jannau>
iirc 41 minutes vs. 28 min
<alyssa>
sure
<alyssa>
deqp-runner does a bunch of debug stuff for failing tests, idk what the real CTS runner is doing
<alyssa>
M1 seemed to finish
<jannau>
finished here as well after 52 minutes, still failed
<jannau>
error is in all cases: "(x,y)= (0,0). Color RGBA(0,0,0,1) is different than expected RGBA(0.1,0.2,0.3,1)"
<alyssa>
jannau: The potentially "spicy" part of those tests is that they feed transform feedback results into the indirect draw
<alyssa>
since we dispatch xfb with the VDM (i.e. as vertex shaders with no fragment output), that's a forward VDM->VDM dependency, which does not require a full flush of the batch but does require a memory barrier
<alyssa>
I'm wondering if this is morally the same issue as the atomics
<alyssa>
the barrier we're using is strong enough to flush a cluster but not the whole system
<alyssa>
See line 2892 of agx_state.c
<alyssa>
try setting more bits (this will require adding extra fields in asahi/lib/cmdbuf.xml)
<alyssa>
If someone has the ability to run wrap.dylib + agxdecode against t600x, this corresponds to a Metal memory barrier; that's where I got those magic bits in the first place
<alyssa>
but in this case since we know what we're looking for bruteforcing might be faster anyway lol
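One way to organise the brute force suggested above is to enumerate candidate barrier words, starting from the bits already set and adding a few extra bits at a time. The sketch below is purely illustrative: `candidate_masks`, the base value and the candidate bit positions are placeholders, not real hardware values or fields from cmdbuf.xml.

```python
from itertools import combinations


def candidate_masks(base, candidate_bits, max_extra=2):
    """Yield barrier words to try: `base` plus 1..max_extra extra bits.

    Bits already set in `base` are skipped, so every yielded mask differs
    from `base` and no mask is yielded twice. Masks with fewer extra bits
    come first, which keeps the smallest working change easy to spot.
    """
    free_bits = [b for b in candidate_bits if not (base >> b) & 1]
    for count in range(1, max_extra + 1):
        for bits in combinations(free_bits, count):
            mask = base
            for bit in bits:
                mask |= 1 << bit
            yield mask


if __name__ == "__main__":
    # Placeholder inputs: base word 0b1, try bits 0..2, up to 2 extra bits.
    for mask in candidate_masks(0b1, [0, 1, 2]):
        print(bin(mask))
```

Each yielded value would then be plugged into the barrier and the failing indirect-draw tests rerun until one passes.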
<alyssa>
jannau: any luck?
<jannau>
alyssa: no. I think we're looking at different agx_state.c files. is line 2892 'cfg.unknown_30 = frag_tex_count >= 4;'?