ChanServ changed the topic of #asahi-gpu to: Asahi Linux: porting Linux to Apple Silicon macs | GPU / 3D graphics stack black-box RE and development (NO binary reversing) | Keep things on topic | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Logs: https://alx.sh/l/asahi-gpu
chengsun has quit [Quit: Quit]
chengsun has joined #asahi-gpu
Etrien___ has quit [Ping timeout: 480 seconds]
ella-0 has quit [Read error: Connection reset by peer]
sadams0978 has joined #asahi-gpu
sadams0978 has quit [Quit: Konversation terminated!]
Etrien has quit [Read error: Connection reset by peer]
Etrien has joined #asahi-gpu
pthariensflame has joined #asahi-gpu
pthariensflame has quit [Quit: Textual IRC Client: www.textualapp.com]
<alyssa> It's really interesting comparing compiler statistics for a given shader *across* instruction sets
<alyssa> here's a chunky shader from glmark2 -bterrain
<alyssa> shaders/glmark/1-22.shader_test - MESA_SHADER_FRAGMENT shader: 1294 inst, 8600 bytes, 50 halfregs, 1 threads, 0 loops, 0:0 spills:fills
<alyssa> shaders/glmark/1-22.shader_test - MESA_SHADER_FRAGMENT shader: 1249 inst, 16.015625 cycles, 16.015625 fma, 3.453125 cvt, 0.000000 sfu, 0.125000 v, 0.000000 t, 0.000000 ls, 632 quadwords, 1 threads, 0 loops, 0:0 spills:fills
<alyssa> top is Apple M1, bottom is Mali-G57
<alyssa> off the bat you'll notice that the statistics for Mali are a lot more advanced, because we actually understand the uarch there
<alyssa> (AGX will get similar stuff in time~)
<alyssa> but here's an obvious one:
<alyssa> AGX is 8600 bytes
<alyssa> Mali is ~10,000 bytes
<alyssa> (quadwords = 16 bytes)
<alyssa> This is interesting, a Mali-G57 instruction can actually do *more* than an AGX instruction
<alyssa> there are fewer of them (1249 vs 1294), but they're bigger
<alyssa> Why? Mali-G57 uses fixed-length instructions: all are 8 bytes, and many are sparse
<alyssa> AGX uses variable length instructions, in this shader instructions are an average of 6.65 bytes
<alyssa> Similarly, register pressure
<alyssa> it's not explicitly printed for Mali-G57, but "1 thread" means that it's using at least 33 dword registers
<alyssa> whereas AGX is using 50/2 = 25 dword registers
<alyssa> why would the same program use so many fewer registers on AGX?
<alyssa> one answer is "I did a better job at register allocation for AGX than Mali"
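The arithmetic behind the comparison above can be checked with a minimal sketch. The numbers are copied from the two shader-db lines pasted earlier; the variable names are only illustrative, and the per-instruction average for Mali is simply total bytes divided by instruction count, not a statement about the encoding.

```python
# Numbers taken from the two stats lines in the log (AGX first, Mali-G57 second).
QUADWORD_BYTES = 16  # "quadwords = 16 bytes"

agx_insts, agx_bytes, agx_halfregs = 1294, 8600, 50
mali_insts, mali_quadwords = 1249, 632

# Mali code size in bytes, derived from the quadword count.
mali_bytes = mali_quadwords * QUADWORD_BYTES

# Average bytes per instruction on each ISA.
agx_avg = agx_bytes / agx_insts
mali_avg = mali_bytes / mali_insts

# AGX reports 16-bit half registers; two of them make one 32-bit (dword) register.
agx_dwords = agx_halfregs // 2

print(mali_bytes)           # 10112
print(round(agx_avg, 2))    # 6.65
print(round(mali_avg, 2))   # 8.1
print(agx_dwords)           # 25
```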
chengsun_ has joined #asahi-gpu
chengsun has quit [Ping timeout: 480 seconds]
chengsun_ has quit [Read error: Connection reset by peer]
chengsun has joined #asahi-gpu
chengsun_ has joined #asahi-gpu
chengsun has quit [Ping timeout: 480 seconds]
chengsun_ has quit [Quit: Quit]
chengsun has joined #asahi-gpu
chengsun_ has joined #asahi-gpu
chengsun has quit [Ping timeout: 480 seconds]
chengsun_ has quit [Ping timeout: 480 seconds]
ella-0 has joined #asahi-gpu
Graypup_ has quit [Quit: meow]
Graypup_ has joined #asahi-gpu
bluetail has joined #asahi-gpu
Etrien has quit [Read error: Connection reset by peer]
Etrien has joined #asahi-gpu
SSJ_GZ has joined #asahi-gpu
Etrien has quit [Read error: Connection reset by peer]
Etrien has joined #asahi-gpu
<lina> Nice!! <3
Etrien__ has joined #asahi-gpu
Etrien has quit [Ping timeout: 480 seconds]
Etrien__ has quit [Read error: Connection reset by peer]
Etrien has joined #asahi-gpu
sharonmary6[m] has quit []
MatrixTravelerbot[m]1 has quit []
psydroid[m] has quit []
arisu has quit []
Ella[m] has quit []
Lucy[m] has quit [Quit: Bridge terminating on SIGTERM]
Soroush has quit []
Dcow has joined #asahi-gpu
capta1nt0ad has joined #asahi-gpu
subatomic has joined #asahi-gpu
Dcow_ has joined #asahi-gpu
Dcow has quit [Ping timeout: 480 seconds]
subatomic has quit [Quit: Textual IRC Client: www.textualapp.com]
capta1nt0ad has quit [Quit: Konversation terminated!]
chengsun has joined #asahi-gpu
geochip has joined #asahi-gpu
geochip has quit [Quit: leaving]
chadmed has joined #asahi-gpu
chadmed has quit [Quit: Konversation terminated!]
chadmed has joined #asahi-gpu
chadmed has quit []
kov has quit [Quit: Coyote finally caught me]
kov has joined #asahi-gpu
chadmed has joined #asahi-gpu
chadmed has quit [Read error: No route to host]
chadmed has joined #asahi-gpu
Gaspare has joined #asahi-gpu
chadmed has quit [Read error: No route to host]
n1c has quit [Quit: ZNC 1.8.2+deb1+focal2 - https://znc.in]
n1c has joined #asahi-gpu
Gaspare has quit [Read error: Connection reset by peer]
<lina> So the M1 Ultra just needed a bunch of new buffers and some larger ones, initdata changes, and a couple of constants changed, and then it worked.
<lina> There's a Z acceleration/hierarchical Z thing that showed up, and then 5 new buffers adjacent to the tiler?
<lina> I get the feeling they're actually trying to balance work within single jobs between dies, which would explain why they need some new buffers to transfer things around.
<lina> Parallelizing fragment processing is trivial, but vertex is not, since it interacts with the tiler/sorting stuff
<alyssa> lina: as I texted you, the "Z acceleration" buffer is probably just from depth compression, which Apple's driver is aggressive about enabling and took me a lot of time to figure out how to disable
<alyssa> it does visually look like hier-z but I'm not convinced it actually is
<lina> alyssa: I've never seen it enabled on the M1 Mini, ever. I have zero hits for those pointers in all my historical hypervisor logs. But on this one, it showed up, and mesa was faulting without it... so I'm not sure you disabled it ^^
<alyssa> Very curious
<alyssa> I'd love to see the Mesa patch if you've pushed
<alyssa> (the diff from t8103 mesa I mean)
<lina> Let me do that!
<lina> The size is just random though, haven't worked out how to calculate any of it yet.
<alyssa> o
<alyssa> OK
<lina> The stencil one is just a guess though, haven't actually seen it yet
<lina> Looks like it's 1/32 compression and it uses POT addressing, so align zbuffer size to POT and divide by 32 for the accel buffer size
<lina> It seems every 8x4 block of Z pixels maps to one accel buffer byte, 0x03 means clear.
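The sizing rule lina describes (align the Z buffer size to a power of two, then divide by 32) can be sketched as follows. This is a guess at the arithmetic only: the function names are hypothetical, the unit of "zbuffer size" is whatever the message meant, and the 0x03 fill is based solely on the observation that 0x03 means clear.

```python
ACCEL_CLEAR_BYTE = 0x03  # per the log: "0x03 means clear"


def next_pot(n: int) -> int:
    """Round n up to the next power of two (n itself if already a power of two)."""
    if n <= 1:
        return 1
    return 1 << (n - 1).bit_length()


def accel_buffer_size(zbuffer_size: int) -> int:
    """Hypothetical helper: POT-align the Z buffer size, then apply the 1/32 ratio."""
    return next_pot(zbuffer_size) // 32


def cleared_accel_buffer(zbuffer_size: int) -> bytes:
    """Hypothetical helper: an accel buffer with every entry marked as clear."""
    return bytes([ACCEL_CLEAR_BYTE]) * accel_buffer_size(zbuffer_size)


print(next_pot(1025))              # 2048
print(accel_buffer_size(8294400))  # 262144
```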
<lina> Still need to look at the deflake buffer sizes... 2 of them are obvious due to adjacency, but do you remember how you figured out the third bound?
<alyssa> lina: guess
<alyssa> or maybe not
<alyssa> no, adjacency as well
<alyssa> finding the next allocation in the same BO and using that as an upper bound
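The adjacency trick alyssa mentions can be sketched as below: within one BO, the distance to the next allocation gives an upper bound on a buffer's size. The function name and the offsets in the example are made up for illustration and do not come from any real trace.

```python
def size_upper_bounds(offsets, bo_size):
    """Map each allocation offset in a BO to an upper bound on its size.

    The bound is the gap to the next allocation in the same BO, or to the
    end of the BO for the last allocation.
    """
    ordered = sorted(set(offsets))
    bounds = {}
    for cur, nxt in zip(ordered, ordered[1:] + [bo_size]):
        bounds[cur] = nxt - cur
    return bounds


# Illustrative offsets only.
bounds = size_upper_bounds([0x4000, 0x0, 0x1000], bo_size=0x8000)
print({hex(k): hex(v) for k, v in bounds.items()})
# {'0x0': '0x1000', '0x1000': '0x3000', '0x4000': '0x4000'}
```

A bound found this way is only a ceiling: the real buffer may be smaller, with padding before the next allocation.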
Gaspare has joined #asahi-gpu
Gaspare has quit [Ping timeout: 480 seconds]
Dcow_ has quit [Remote host closed the connection]
Dcow has joined #asahi-gpu
Dcow has quit [Ping timeout: 480 seconds]
Dcow has joined #asahi-gpu
yuyichao_ has quit [Remote host closed the connection]
yuyichao_ has joined #asahi-gpu
rwhitby has joined #asahi-gpu
Dcow has quit [Remote host closed the connection]
Dcow has joined #asahi-gpu
Dcow has quit [Remote host closed the connection]
Dcow has joined #asahi-gpu
rwhitby has quit [Quit: rwhitby]
Etrien__ has joined #asahi-gpu
<phire> I suspect the acceleration buffer stuff is just because whatever alyssa did to force a linear buffer has regressed for some reason
Etrien has quit [Ping timeout: 480 seconds]
SSJ_GZ has quit [Read error: No route to host]
Etrien__ has quit [Read error: Connection reset by peer]
Etrien has joined #asahi-gpu
<alyssa> s/linear/uncompressed/, IIRC it's still twiddled
<alyssa> Metal really doesn't like rendering to linear
<alyssa> It's probably not a *big* deal to support properly in mesa but that patch is not the way to do it
<phire> so it's just compressed vs uncompressed?
<alyssa> Yes, I think so
<jannau> hah, dcp supports XRGB after all, just not as a separate pixel format but via a flag in dcp_surface (either unk1 or unk2)
<jannau> but it took me far too long to realize that fbcon displayed just a transparent terminal
<phire> classic bug
<alyssa> jannau: Woof
<alyssa> How'd you discover the flag?