possiblemeatball has quit [Remote host closed the connection]
possiblemeatball has joined #asahi-gpu
possiblemeatball has quit [Quit: Leaving]
possiblemeatball has joined #asahi-gpu
amarioguy has quit [Remote host closed the connection]
possiblemeatball has quit [Quit: Leaving]
alyssa has joined #asahi-gpu
<alyssa>
lina: took a cue from the "wtf pstates?" discussion and hacked my DTS to use 0x6 as the new base
<alyssa>
t-rex is >15% faster
<alyssa>
not 5x faster but that's still substantial
<alyssa>
today on broken nir_lower_blend: per-RT logic ops?
<lina>
alyssa: 3 is 720MHz and 6 is 1287 MHz, so I would expect 78% faster if it scales with clock
<lina>
however, 1 (which is the base on laptops) is 396MHz, so if the laptops are stuck there that's quite a bit worse...
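For reference, the ideal clock-scaling numbers behind this exchange can be worked out directly. This is only a sketch: the three clocks are the ones quoted above, while `PSTATE_MHZ` and `ideal_gain_percent` are hypothetical names made up for illustration, and the "performance tracks clock exactly" assumption is the best case, not a measurement.

```python
# Clocks quoted in the chat, in MHz, keyed by pstate index.
PSTATE_MHZ = {1: 396, 3: 720, 6: 1287}


def ideal_gain_percent(old, new):
    """Percent speedup from pstate `old` to `new` if performance tracked clock exactly."""
    return (PSTATE_MHZ[new] / PSTATE_MHZ[old] - 1.0) * 100.0


for old in (1, 3):
    print(f"pstate {old} -> 6: {ideal_gain_percent(old, 6):.1f}% ideal gain")
```

Under that best-case assumption, a machine stuck at pstate 1 would be leaving far more on the table than one starting from pstate 3.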
SSJ_GZ has joined #asahi-gpu
thevar1able has quit [Remote host closed the connection]
thevar1able has joined #asahi-gpu
thevar1able has quit [Remote host closed the connection]
thevar1able has joined #asahi-gpu
<bluetail8>
alyssa will it affect longevity negatively?
<bluetail8>
is it effectively an overclock or is it within Apple's legit pstate design?
<bluetail8>
15% is a ton
<chadmed>
aiui we cannot overclock this hardware as the pstates are controlled by the platform and we just ask the firmware nicely to switch states
<bluetail8>
oh. That's good
yuyichao_ has joined #asahi-gpu
iaguis has joined #asahi-gpu
yuyichao has quit [Ping timeout: 480 seconds]
karolherbst has quit [Remote host closed the connection]
karolherbst has joined #asahi-gpu
yuyichao_ has quit [Quit: Konversation terminated!]
yuyichao has joined #asahi-gpu
possiblemeatball has joined #asahi-gpu
<alyssa>
lina: I would not expect 78% faster for 720 -> 1287, because that's the clock only for actual GPU operation
<alyssa>
but does not affect the latency nor throughput of system memory (which we often stall on)
<alyssa>
and does not affect overhead on the CPU to actually submit work (right now the GPU is idle a ton of the time -- GPU speed ups are then subject to Amdahl's law)
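The Amdahl's law point can be made concrete with the numbers from this conversation. The sketch below assumes the old base was pstate 3 (720 MHz), the new one is pstate 6 (1287 MHz), and the overall gain is exactly 15%; the function names are hypothetical. It inverts the law to estimate what fraction of frame time actually scales with the GPU clock.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the runtime is sped up by factor s."""
    return 1.0 / ((1.0 - p) + p / s)


def scaled_fraction(overall, s):
    """Invert Amdahl's law: the fraction p that must scale by s to give `overall`."""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / s)


clock_ratio = 1287 / 720            # assumed pstate 3 -> pstate 6
p = scaled_fraction(1.15, clock_ratio)

print(f"clock ratio: {clock_ratio:.4f}")
print(f"implied clock-bound fraction of frame time: {p:.2f}")
print(f"check, overall speedup: {amdahl_speedup(p, clock_ratio):.2f}")
```

Under those assumptions only about 30% of the time is bound by the GPU clock; the rest would be memory stalls, CPU submission overhead, and idle gaps that a faster clock does not touch.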
<lina>
alyssa: What I have identified as "utilization" in the metrics does show 100%, but that can't be right with the CPU stalls... so now I wonder what that really measures...
<alyssa>
yeah that's definitely wrong
<lina>
alyssa: Ah, I bet I know what this is. GPU active %, meaning powered on.
<lina>
So for anything >500 FPS or so, that's 100%
<lina>
And indeed glxgears goes to 17-23%, which is what I'd expect for something vsynced that actually lets the GPU power down between frames
<lina>
So really, what I'm calling the stats channel seems to be quite specifically power stats. It's all pstates, power on/off events, power consumption, temperatures, and the like.
<alyssa>
right, ok. that I would believe, yes
<bluetail8>
alyssa Do you know these laws by heart? Yes, the internet is full of knowledge. But I think it's something you learn in education, which I was denied for the most part.
<alyssa>
power stats vs performance stats I suppose
<lina>
alyssa: I should just expose these properly as a sysfs file... it'd be nice to at least have an agxtop showing power consumption and idle times and such...
<lina>
For performance, I really should just hook up the ktrace channel to perf and get you pretty logs
<lina>
That has events like "TA started", "TA complete", "3D started", "3D complete" but also partial renders, event delivery, and things like that, all timestamped
<lina>
And at least some of those actually include the UUID that is plumbed all the way to the UAPI if I remember correctly, so you can even correlate it with what mesa is doing ^^
<dottedmag>
bluetail8: amdahl's law comes up very often during optimization work.
<bluetail8>
dottedmag Perhaps continue in #asahi-offtopic. Please make the connection. I do not get how A is related to B. Where specifically do I have to look it up? What do I need to do in order to naturally need it? Or do I need to know these things before I can do a specific thing? My math knowledge is very basic for instance, but I need some of it regularly. What do you mean with optimization work? Why would somebody need to remember those laws...
bcrumb has joined #asahi-gpu
bcrumb has quit []
<alyssa>
lina: sounds nice :)
<alyssa>
lina: BTW, are you aware of any mechanism that Mesa could use to get the # of primitives generated?
<alyssa>
Some hw can plumb in a performance counter for this
<alyssa>
the fallback would be generating little 1x1 vertex shaders to increase a counter each draw which is probably fine for a niche feature
possiblemeatball has quit [Quit: Leaving]
<karolherbst>
does DP-MST work already? And if not, are there patches to try?
<karolherbst>
or well.. at least it doesn't work on my M2
<sven>
mst will never work on these machines
<sven>
it’s just not supported by the hw
<karolherbst>
uhh....
<karolherbst>
so how do external displays work? USB adapters not doing MST?
<sven>
ah, MST is specifically two (or more I guess) video streams over display port and the hw can’t do that
<corion>
dp-alt, no?
<sven>
DP over the type c ports is just an alternate mode
<karolherbst>
corion: I guess...
<karolherbst>
I just have some USB-C hubs here which are doing this over MST
<karolherbst>
because... why not
<sven>
i have some WIP patches somewhere that enable alternate mode and jannau probably has a branch somewhere with them integrated into his latest DCP work
<sven>
but this is all still rather unstable
<karolherbst>
so I assume things like docking stations are just not properly supported by the hw then?
<sven>
M2 only supports a single external display anyway
<karolherbst>
well.. that's a different question tho
<sven>
you can send two streams tunneled over usb4/thunderbolt
<karolherbst>
not my use case
<sven>
and there are some socks that support that
<sven>
*docks
<karolherbst>
yeah.. fair, but that's not what I asked about at all 🙃
<corion>
karolherbst: What do you mean by docking stations? With one usb-c (dp-alt) connected to my monitor, I carry mouse+kbd+monitor back to the computer.
<sven>
meh, then let someone else help you if you don’t want context
<corion>
... and sound out (headphones plugged into monitor).
<corion>
Fairly sure I could easily also carry ethernet back to the computer.
<alyssa>
there will be no cursed USB spec references in my house
<alyssa>
:p
<karolherbst>
:D
<karolherbst>
fair
<corion>
yes, ma'am.
<alyssa>
thank you
cylm_ has joined #asahi-gpu
cylm has quit [Ping timeout: 480 seconds]
iaguis has quit [Quit: Lost terminal]
Dementor has quit [Remote host closed the connection]
Dementor has joined #asahi-gpu
<alyssa>
so, I learned last night that sample_mask and zs_emit aren't allowed in the same program
<alyssa>
Apple's compiler lowers sample mask writes (including discard_fragment) to emitting a NaN depth, if zs_emit is used anywhere in the program
<alyssa>
I did not realize that was necessary and not just an optimization
<alyssa>
but it raises its own questions, like how to mask out stores from discarded threads, and how to terminate a quad after all threads in the quad are no longer needed for helper threads
<alyssa>
there is an infamous fragment shader, I think it owes to GraphicsFuzz, that does something like:
<alyssa>
for (;;) { if (condition that is eventually true) { discard } }
<alyssa>
If you don't actually terminate the thread when discarding (merely demoting the thread to a helper invocation), the shader will hang the GPU
<alyssa>
but if you actually terminate the thread when discarding -- or you discard a quad when all threads in it are demoted to helper threads -- then the shader will complete successfully
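The three behaviours described here can be compared with a toy model. This is a rough sketch, not how any real GPU or driver schedules threads: `DiscardMode`, `run_quad`, and the step budget standing in for a hang are all hypothetical, and the quad is reduced to four lanes spinning in `for (;;) { if (cond) discard; }`.

```python
from enum import Enum


class DiscardMode(Enum):
    DEMOTE = "demote"        # discarded lane keeps running as a helper
    TERMINATE = "terminate"  # discarded lane exits immediately
    QUAD_KILL = "quad_kill"  # demote, but retire the quad once all lanes are helpers


def run_quad(discard_at, mode, max_steps=1000):
    """Simulate `for (;;) { if (cond) discard; }` on one 2x2 quad.

    discard_at[i] is the iteration at which lane i's condition becomes true.
    Returns the number of iterations executed, or None if the step budget
    runs out (standing in for a GPU hang).
    """
    running = [True] * 4   # lane is still executing instructions
    helper = [False] * 4   # lane has been demoted to a helper invocation
    for step in range(max_steps):
        for lane in range(4):
            if running[lane] and not helper[lane] and step >= discard_at[lane]:
                if mode is DiscardMode.TERMINATE:
                    running[lane] = False
                else:
                    helper[lane] = True
        if mode is DiscardMode.QUAD_KILL and all(helper):
            running = [False] * 4  # nothing left that needs helper lanes
        if not any(running):
            return step + 1
    return None


for mode in DiscardMode:
    print(mode.value, run_quad([3, 5, 7, 9], mode))
```

In this model pure demotion never leaves the loop, while both real termination and killing the fully demoted quad finish as soon as the last lane discards.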
<alyssa>
It's not clear a priori how one would implement that on AGX given the zs_emit transform they do
<alyssa>
starting with a reference pipeline
<alyssa>
adding a store into an FS sets "Set when frag shader spills: true" hm
<alyssa>
unk 2 0 ->3
<alyssa>
disables early-z
<alyssa>
inserts a zs_emit instruction for some reason
<alyssa>
Looks like it inserts the equivalent of `gl_FragDepth = gl_FragCoord.z` for some reason
<alyssa>
disables triangle merging
<alyssa>
sets pass type to punch through
<alyssa>
depth to any (!)
<alyssa>
(surely that last bit is a mistake?!)
<alyssa>
smells like a workaround
<alyssa>
okay let's see
<alyssa>
what the heck
<alyssa>
do I really want to disassemble this in my head
<alyssa>
for "if (..) { discard } write depth; side effect;" we get
<alyssa>
wait_pix 1; if (..) { zs_emit NaN }; store; zs_emit; signal_pix 1;
<alyssa>
so we're allowed multiple zs_emit in a shader and they can terminate threads. maybe.
<alyssa>
dummy stencil if needed
<alyssa>
ok. I guess this is ok.
<alyssa>
0x7fc00000 is the preferred NaN but IDK if it's actually special
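On the bit pattern itself: 0x7fc00000 is the usual default quiet NaN in IEEE 754 binary32 (sign 0, exponent all ones, only the top mantissa bit set). The sketch below just decodes it with Python's `struct` module; the helper names are hypothetical, and it says nothing about whether the hardware treats this particular NaN differently from any other.

```python
import math
import struct


def bits_to_float(bits):
    """Reinterpret a 32-bit integer as an IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def decode_fields(bits):
    """Split a binary32 bit pattern into (sign, exponent, mantissa)."""
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


PREFERRED_NAN = 0x7FC00000
sign, exponent, mantissa = decode_fields(PREFERRED_NAN)

print(f"sign={sign} exponent={exponent:#x} mantissa={mantissa:#x}")
print("is NaN:", math.isnan(bits_to_float(PREFERRED_NAN)))
print("quiet bit set:", bool(mantissa >> 22))

# Any pattern with exponent 0xff and a nonzero mantissa is also a NaN.
for other in (0x7FC00001, 0xFFC00000, 0x7F800001):
    print(f"{other:#010x} is NaN:", math.isnan(bits_to_float(other)))
```

So if the hardware only checks "is the depth a NaN", other encodings should behave the same way; whether it does is exactly the open question.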
possiblemeatball has joined #asahi-gpu
possiblemeatball has quit [Quit: Leaving]
hertz has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<alyssa>
lina: on the verge of saying "screw it" and landing clang-format