ChanServ changed the topic of #asahi-gpu to: Asahi Linux GPU development (no user support, NO binary reversing) | Keep things on topic | GitHub: | Wiki: | Logs:
possiblemeatball has joined #asahi-gpu
possiblemeatball has quit [Remote host closed the connection]
possiblemeatball has joined #asahi-gpu
possiblemeatball has quit [Quit: Leaving]
possiblemeatball has joined #asahi-gpu
amarioguy has quit [Remote host closed the connection]
possiblemeatball has quit [Quit: Leaving]
alyssa has joined #asahi-gpu
<alyssa> lina: took a cue from the "wtf pstates?" discussion and hacked my DTS to use 0x6 as the new base
<alyssa> t-rex is >15% faster
<alyssa> not 5x faster but that's still substantial
<alyssa> today on broken nir_lower_blend: per-RT logic ops?
<lina> alyssa: 3 is 720MHz and 6 is 1287 MHz, so I would expect 78% faster if it scales with clock
<lina> however, 1 (which is the base on laptops) is 396MHz, so if the laptops are stuck there that's quite a bit worse...
SSJ_GZ has joined #asahi-gpu
thevar1able has quit [Remote host closed the connection]
thevar1able has joined #asahi-gpu
thevar1able has quit [Remote host closed the connection]
thevar1able has joined #asahi-gpu
<bluetail8> alyssa will it affect longevity negatively?
<bluetail8> is it effectively an overclock or is it within Apple's legit pstate design?
<bluetail8> 15% is a ton
<chadmed> aiui we cannot overclock this hardware as the pstates are controlled by the platform and we just ask the firmware nicely to switch states
<bluetail8> oh. That's good
yuyichao_ has joined #asahi-gpu
iaguis has joined #asahi-gpu
yuyichao has quit [Ping timeout: 480 seconds]
karolherbst has quit [Remote host closed the connection]
karolherbst has joined #asahi-gpu
yuyichao_ has quit [Quit: Konversation terminated!]
yuyichao has joined #asahi-gpu
possiblemeatball has joined #asahi-gpu
<alyssa> lina: I would not expect 78% faster for 720 -> 1287, because that's the clock only for actual GPU operation
<alyssa> but does not affect the latency nor throughput of system memory (which we often stall on)
<alyssa> and does not affect overhead on the CPU to actually submit work (right now the GPU is idle a ton of the time -- GPU speed ups are then subject to Amdahl's law)
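The Amdahl's law point can be made concrete with the numbers quoted in the channel. This is a minimal sketch, not a measurement: it assumes the old base pstate was 3 (720 MHz), the new one is 6 (1287 MHz), and that the t-rex gain is exactly 15% (the log only says ">15%"). The function names are made up for illustration.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the runtime is sped up by factor s."""
    return 1.0 / ((1.0 - p) + p / s)


def clock_bound_fraction(overall, s):
    """Invert Amdahl's law: the fraction p that would yield the observed overall speedup."""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / s)


# Assumed inputs taken from the discussion: pstate 6 vs pstate 3, 15% observed gain.
clock_ratio = 1287 / 720
p = clock_bound_fraction(1.15, clock_ratio)

print(f"clock ratio: {clock_ratio:.4f}")
print(f"implied clock-bound fraction of frame time: {p:.3f}")
```

Under those assumptions only roughly 30% of the frame time scales with the GPU clock; the rest would be memory stalls, CPU submission overhead, or idle time, which is consistent with a 78% clock increase giving only about 15%.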
<lina> alyssa: What I have identified as "utilization" in the metrics does show 100%, but that can't be right with the CPU stalls... so now I wonder what that really measures...
<alyssa> yeah that's definitely wrong
<lina> alyssa: Ah, I bet I know what this is. GPU active %, meaning powered on.
<lina> So for anything >500 FPS or so, that's 100%
<lina> And indeed glxgears goes to 17-23%, which is what I'd expect for something vsynced that actually lets the GPU power down between frames
<lina> So really, what I'm calling the stats channel seems to be quite specifically power stats. It's all pstates, power on/off events, power consumption, temperatures, and the like.
<alyssa> right, ok. that I would believe, yes
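If the metric really is "powered-on %", the glxgears figures translate into powered-on time per frame. A rough sketch follows; the 60 Hz refresh rate is an assumption (the log only says "vsynced"), and the helper name is hypothetical.

```python
def powered_on_ms_per_frame(active_fraction, refresh_hz):
    """Milliseconds per frame the GPU stays powered, given an 'active %' reading."""
    frame_ms = 1000.0 / refresh_hz
    return active_fraction * frame_ms


REFRESH_HZ = 60  # assumption: a 60 Hz vsynced display

low = powered_on_ms_per_frame(0.17, REFRESH_HZ)
high = powered_on_ms_per_frame(0.23, REFRESH_HZ)
print(f"powered on for about {low:.1f} to {high:.1f} ms of each {1000 / REFRESH_HZ:.1f} ms frame")

# Frame interval at 500 FPS, the rough point where the reading saturates at 100%.
print(f"frame interval at 500 FPS: {1000 / 500:.1f} ms")
```

Both numbers land in the same few-millisecond range, which would fit a GPU that only powers down after a short idle period, though the log does not confirm that mechanism.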
<bluetail8> alyssa Do you know these laws by heart? Yes, the internet is full of knowledge. But I think it's something you learn in education which I was denied for the most part.
<alyssa> power stats vs performance stats I suppose
<lina> alyssa: I should just expose these properly as a sysfs file... it'd be nice to at least have an agxtop showing power consumption and idle times and such...
<lina> For performance, I really should just hook up the ktrace channel to perf and get you pretty logs
<lina> That has events like "TA started", "TA complete", "3D started", "3D complete" but also partial renders, event delivery, and things like that, all timestamped
<lina> And at least some of those actually include the UUID that is plumbed all the way to the UAPI if I remember correctly, so you can even correlate it with what mesa is doing ^^
<dottedmag> bluetail8: amdahl's law comes up very often during optimization work.
<bluetail8> dottedmag Perhaps continue in #asahi-offtopic. Please make the connection. I do not get how A is related to B. Where specifically do I have to look it up? What do I need to do in order to naturally need it? Or do I need to know these things before I can do a specific thing? My math knowledge is very basic for instance, but I need some of it
<bluetail8> regularly. What do you mean with optimization work? Why would somebody need to remember those laws...
bcrumb has joined #asahi-gpu
bcrumb has quit []
<alyssa> lina: sounds nice :)
<alyssa> lina: BTW, are you aware of any mechanism that Mesa could use to get the # of primitives generated?
<alyssa> Some hw can plumb in a performance counter for this
<alyssa> the fallback would be generating little 1x1 vertex shaders to increase a counter each draw which is probably fine for a niche feature
possiblemeatball has quit [Quit: Leaving]
<karolherbst> does DP-MST work already? And if not, are there patches to try?
<karolherbst> or well.. at least it doesn't work on my M2
<sven> mst will never work on these machines
<sven> it’s just not supported by the hw
<karolherbst> uhh....
<karolherbst> so how do external displays work? USB adapters not doing MST?
<sven> ah, MST is specifically two (or more I guess) video streams over display port and the hw can’t do that
<corion> dp-alt, no?
<sven> DP over the type c ports is just an alternate mode
<karolherbst> corion: I guess...
<karolherbst> I just have some USB-C hubs here which are doing this over MST
<karolherbst> because... why not
<sven> i have some WIP patches somewhere that enable alternate mode and jannau probably has a branch somewhere with them integrated into his latest DCP work
<sven> but this is all still rather unstable
<karolherbst> so I assume things like docking stations are just not properly supported by the hw then?
<sven> M2 only supports a single external display anyway
<karolherbst> well.. that's a different question tho
<sven> you can send two streams tunneled over usb4/thunderbolt
<karolherbst> not my use case
<sven> and there are some socks that support that
<sven> *docks
<karolherbst> yeah.. fair, but that's not what I asked about at all 🙃
<corion> karolherbst: What do you mean by docking stations? With one usb-c (dp-alt) connected to my monitor, I carry mouse+kbd+monitor back to the computer.
<sven> meh, then let someone else help you if you don’t want context
<corion> ... and sound out (headphones plugged into monitor).
<corion> Fairly sure I could easily also carry ethernet back to the computer.
<karolherbst> this is the dock I tried to use:
<karolherbst> with a single display
<karolherbst> but that requires DP-MST
<alyssa> -> #asahi please
<alyssa> or #asahi-dev maybe
<karolherbst> ahh.. sorry
<alyssa> there will be no cursed USB spec references in my house
<alyssa> :p
<karolherbst> :D
<karolherbst> fair
<corion> yes, ma'am.
<alyssa> thank you
cylm_ has joined #asahi-gpu
cylm has quit [Ping timeout: 480 seconds]
iaguis has quit [Quit: Lost terminal]
Dementor has quit [Remote host closed the connection]
Dementor has joined #asahi-gpu
<alyssa> so, I learned last night that sample_mask and zs_emit aren't allowed in the same program
<alyssa> Apple's compiler lowers sample mask writes (including discard_fragment) to emitting a NaN depth, if zs_emit is used anywhere in the program
<alyssa> I did not realize that was necessary and not just an optimization
<alyssa> but it raises its own questions, like how to mask out stores from discarded threads, and how to terminate a quad after all threads in the quad are no longer needed for helper threads
<alyssa> there is an infamous fragment shader, I think it owes to GraphicsFuzz, that does something like:
<alyssa> for (;;) { if (condition that is eventually true) { discard } }
<alyssa> If you don't actually terminate the thread when discarding (merely demoting the thread to a helper invocation), the shader will hang the GPU
<alyssa> but if you actually terminate the thread when discarding -- or you discard a quad when all threads in it are demoted to helper threads -- then the shader will complete successfully
<alyssa> It's not clear a priori how one would implement that on AGX given the zs_emit transform they do
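The difference between the three discard behaviours described above can be shown with a toy simulation of one 2x2 quad running `for (;;) { if (cond) discard; }`. This is a sketch in plain Python, not a model of AGX hardware: the policy names, the step cap standing in for a GPU hang, and the per-thread discard times are all made up, and it ignores that a terminated thread may still be needed for derivatives.

```python
MAX_STEPS = 1000  # stand-in for "the GPU hangs": give up after this many iterations


def run_quad(discard_at, policy):
    """Simulate four threads looping until they discard.

    discard_at[i] is the iteration at which thread i's condition becomes true.
    policy is one of:
      "demote"    - discard only marks the thread as a helper; it keeps looping
      "terminate" - discard ends the thread immediately
      "kill_quad" - threads are demoted, but the quad ends once all are helpers
    Returns the iteration at which the quad finishes, or None if it never does.
    """
    active = [True] * 4   # thread is still executing the loop
    helper = [False] * 4  # thread has been demoted to a helper invocation
    for step in range(MAX_STEPS):
        for i in range(4):
            if active[i] and step >= discard_at[i]:
                if policy == "terminate":
                    active[i] = False
                else:
                    helper[i] = True
        if policy == "kill_quad" and all(helper):
            return step
        if not any(active):
            return step
    return None


times = [3, 5, 2, 7]
for policy in ("demote", "terminate", "kill_quad"):
    print(policy, run_quad(times, policy))
```

With demote-only semantics the quad never leaves the loop, while either real termination or killing an all-helper quad lets it finish once the last thread's condition fires.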
<alyssa> starting with a reference pipeline
<alyssa> adding a store into an FS sets "Set when frag shader spills: true" hm
<alyssa> unk 2 0 ->3
<alyssa> disables early-z
<alyssa> inserts a zs_emit instruction for some reason
<alyssa> Looks like it inserts the equivalent of `gl_FragDepth = gl_FragCoord.z` for some reason
<alyssa> disables triangle merging
<alyssa> sets pass type to punch through
<alyssa> depth to any (!)
<alyssa> (surely that last bit is a mistake?!)
<alyssa> smells like a workaround
<alyssa> okay let's see
<alyssa> what the heck
<alyssa> do I really want to disassemble this in my head
<alyssa> for "if (..) { discard } write depth; side effect;" we get
<alyssa> wait_pix 1; if (..) { zs_emit NaN }; store; zs_emit; signal_pix 1;
<alyssa> so we're allowed multiple zs_emit in a shader and they can terminate threads. maybe.
<alyssa> dummy stencil if needed
<alyssa> ok. I guess this is ok.
<alyssa> 0x7fc00000 is the preferred NaN but IDK if it's actually special
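For reference, the bit pattern 0x7fc00000 decodes as a positive quiet NaN in IEEE 754 binary32: sign 0, exponent all ones, and only the top mantissa bit set. The sketch below checks just that encoding; whether the hardware treats this particular NaN differently from other NaN payloads is not something it can show. The helper names are hypothetical.

```python
import math
import struct

PREFERRED_NAN_BITS = 0x7FC00000


def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as an IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def is_nan_bits(bits):
    """A binary32 pattern is NaN when the exponent is all ones and the mantissa is nonzero."""
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return exponent == 0xFF and mantissa != 0


value = bits_to_float(PREFERRED_NAN_BITS)
sign = PREFERRED_NAN_BITS >> 31
exponent = (PREFERRED_NAN_BITS >> 23) & 0xFF
mantissa = PREFERRED_NAN_BITS & 0x7FFFFF

print(math.isnan(value), sign, hex(exponent), hex(mantissa))
# Ordered comparisons against a NaN are always false.
print(value < 1.0, value >= 0.0)
```

Many other patterns are also NaN (for example 0x7fc00001 or 0xffc00000), so a test for "discarded" based on the encoding alone would need to accept all of them unless the hardware singles out one.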
possiblemeatball has joined #asahi-gpu
possiblemeatball has quit [Quit: Leaving]
hertz has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<alyssa> lina: on the verge of saying "screw it" and landing clang-format
pthariensflame has joined #asahi-gpu
pthariensflame has quit [Quit: Textual IRC Client:]
rowin has joined #asahi-gpu
SSJ_GZ has quit [Ping timeout: 480 seconds]
rowin has quit [Remote host closed the connection]
cyrozap has quit [Quit: Client quit]
<alyssa> the deed is done
<alyssa> Guess I should rebase the world
<alyssa> 62 files changed, 3490 insertions(+), 816 deletions(-)
<alyssa> ugh too much agx/next