#asahi-gpu on 2022-12-27 — irc logs at oftc.irclog.whitequark.org

2022-12-21 00:46 ChanServ changed the topic of #asahi-gpu to: Asahi Linux GPU development (no user support, NO binary reversing) | Keep things on topic | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Logs: https://alx.sh/l/asahi-gpu

00:05 possiblemeatball has joined #asahi-gpu

00:09 possiblemeatball has quit [Remote host closed the connection]

00:11 possiblemeatball has joined #asahi-gpu

00:24 possiblemeatball has quit [Quit: Leaving]

01:00 possiblemeatball has joined #asahi-gpu

01:03 amarioguy has quit [Remote host closed the connection]

02:34 possiblemeatball has quit [Quit: Leaving]

02:59 alyssa has joined #asahi-gpu

02:59 <alyssa> lina: took a cue from the "wtf pstates?" discussion and hacked my DTS to use 0x6 as the new base

02:59 <alyssa> t-rex is >15% faster

03:00 <alyssa> not 5x faster but that's still substantial

04:32 <alyssa> today on broken nir_lower_blend: per-RT logic ops?

05:03 <lina> alyssa: 3 is 720MHz and 6 is 1287 MHz, so I would expect 78% faster if it scales with clock

05:04 <lina> however, 1 (which is the base on laptops) is 396MHz, so if the laptops are stuck there that's quite a bit worse...

08:02 SSJ_GZ has joined #asahi-gpu

10:22 thevar1able has quit [Remote host closed the connection]

10:23 thevar1able has joined #asahi-gpu

10:25 thevar1able has quit [Remote host closed the connection]

10:27 thevar1able has joined #asahi-gpu

10:47 <bluetail8> alyssa will it affect longevity negatively?

10:48 <bluetail8> is it effectively an overclock or is it within Apple's legit pstate design?

10:48 <bluetail8> 15% is a ton

10:56 <chadmed> aiui we cannot overclock this hardware as the pstates are controlled by the platform and we just ask the firmware nicely to switch states

11:47 <bluetail8> oh. That's good

12:01 yuyichao_ has joined #asahi-gpu

12:03 iaguis has joined #asahi-gpu

12:04 yuyichao has quit [Ping timeout: 480 seconds]

12:32 karolherbst has quit [Remote host closed the connection]

12:35 karolherbst has joined #asahi-gpu

13:35 yuyichao_ has quit [Quit: Konversation terminated!]

13:36 yuyichao has joined #asahi-gpu

14:04 possiblemeatball has joined #asahi-gpu

15:41 <alyssa> lina: I would not expect 78% faster for 720 -> 1287, because that's the clock only for actual GPU operation

15:42 <alyssa> but does not affect the latency nor throughput of system memory (which we often stall on)

15:42 <alyssa> and does not affect overhead on the CPU to actually submit work (right now the GPU is idle a ton of the time -- GPU speed ups are then subject to Amdahl's law)

15:56 <lina> alyssa: What I have identified as "utilization" in the metrics does show 100%, but that can't be right with the CPU stalls... so now I wonder what that really measures...

15:56 <alyssa> yeah that's definitely wrong

16:01 <lina> alyssa: Ah, I bet I know what this is. GPU active %, meaning powered on.

16:01 <lina> So for anything >500 FPS or so, that's 100%

16:02 <lina> And indeed glxgears goes to 17-23%, which is what I'd expect for something vsynced that actually lets the GPU power down between frames

16:03 <lina> So really, what I'm calling the stats channel seems to be quite specifically power stats. It's all pstates, power on/off events, power consumption, temperatures, and the like.

16:05 <alyssa> right, ok. that I would believe, yes

16:05 <bluetail8> alyssa Do you know these laws by heart? Yes, the internet is full of knowledge. But I think it's something you learn in education which I was denied for the most part.

16:05 <alyssa> power stats vs performance stats I suppose

16:07 <lina> alyssa: I should just expose these properly as a sysfs file... it'd be nice to at least have an agxtop showing power consumption and idle times and such...

16:08 <lina> For performance, I really should just hook up the ktrace channel to perf and get you pretty logs

16:09 <lina> That has events like "TA started", "TA complete", "3D started", "3D complete" but also partial renders, event delivery, and things like that, all timestamped

16:09 <lina> And at least some of those actually include the UUID that is plumbed all the way to the UAPI if I remember correctly, so you can even correlate it with what mesa is doing ^^

16:34 <dottedmag> bluetail8: amdahl's law comes up very often during optimization work.

16:37 <bluetail8> dottedmag Perhaps continue in #asahi-offtopic. Please make the connection. I do not get how A is related to B. Where specifically do I have to look it up? What do I need to do in order to naturally need it? Or do I need to know these things before I can do a specific thing? My math knowledge is very basic for instance, but I need some of it

16:37 <bluetail8> regularly. What do you mean with optimization work? Why would somebody need to remember those laws...

16:42 bcrumb has joined #asahi-gpu

16:43 bcrumb has quit []

16:55 <alyssa> lina: sounds nice :)

16:55 <alyssa> lina: BTW, are you aware of any mechanism that Mesa could use to get the # of primitives generated?

16:55 <alyssa> Some hw can plumb in a performance counter for this

16:56 <alyssa> the fallback would be generating little 1x1 vertex shaders to increase a counter each draw which is probably fine for a niche feature

17:04 possiblemeatball has quit [Quit: Leaving]

17:10 <karolherbst> does DP-MST work already? And if not, are there patches to try?

17:10 <karolherbst> or well.. at least it doesn't work on my M2

17:14 <sven> mst will never work on these machines

17:14 <sven> it’s just not supported by the hw

17:16 <karolherbst> uhh....

17:17 <karolherbst> so how do external displays work? USB adapters not doing MST?

17:18 <sven> ah, MST is specifically two (or more I guess) video streams over display port and the hw can’t do that

17:18 <corion> dp-alt, no?

17:18 <sven> DP over the type c ports is just an alternate mode

17:18 <karolherbst> corion: I guess...

17:18 <karolherbst> I just have some USB-C hubs here which are doing this over MST

17:18 <karolherbst> because... why not

17:19 <sven> i have some WIP patches somewhere that enable alternate mode and jannau probably has a branch somewhere with them integrated into his latest DCP work

17:19 <sven> but this is all still rather unstable

17:19 <karolherbst> so I assume things like docking stations are just not properly supported by the hw then?

17:19 <sven> M2 only supports a single external display anyway

17:19 <karolherbst> well.. that's a different question tho

17:20 <sven> you can send two streams tunneled over usb4/thunderbolt

17:20 <karolherbst> not my use case

17:20 <sven> and there are some socks that support that

17:20 <sven> *docks

17:20 <karolherbst> yeah.. fair, but that's not what I asked about at all 🙃

17:20 <corion> karolherbst: What do you mean by docking stations? With one usb-c (dp-alt) connected to my monitor, I carry mouse+kbd+monitor back to the computer.

17:21 <sven> meh, then let someone else help you if you don’t want context

17:21 <corion> ... and sound out (headphones plugged into monitor).

17:21 <corion> Fairly sure I could easily also carry ethernet back to the computer.

17:21 <karolherbst> this is the dock I tried to use: https://i-tec.pro/wp-content/uploads/vizu-27-1024x1024.png

17:21 <karolherbst> with a single display

17:21 <karolherbst> but that requires DP-MST

17:21 <alyssa> -> #asahi please

17:21 <alyssa> or #asahi-dev maybe

17:22 <karolherbst> ahh.. sorry

17:22 <alyssa> there will be no cursed USB spec references in my house

17:22 <alyssa> :p

17:22 <karolherbst> :D

17:22 <karolherbst> fair

17:22 <corion> yes, ma'am.

17:40 <alyssa> thank you

17:41 cylm_ has joined #asahi-gpu

17:43 cylm has quit [Ping timeout: 480 seconds]

18:17 iaguis has quit [Quit: Lost terminal]

18:20 Dementor has quit [Remote host closed the connection]

18:23 Dementor has joined #asahi-gpu

18:46 <alyssa> so, I learned last night that sample_mask and zs_emit aren't allowed in the same program

18:46 <alyssa> Apple's compiler lowers sample mask writes (including discard_fragment) to emitting a NaN depth, if zs_emit is used anywhere in the program

18:46 <alyssa> I did not realize that was necessary and not just an optimization

18:47 <alyssa> but it raises its own questions, like how to mask out stores from discarded threads, and how to terminate a quad after all threads in the quad are no longer needed for helper threads

18:48 <alyssa> there is an infamous fragment shader, I think it owes to GraphicsFuzz, that does something like:

18:48 <alyssa> for (;;) { if (condition that is eventually true) { discard } }

18:49 <alyssa> If you don't actually terminate the thread when discarding (merely demoting the thread to a helper invocation), the shader will hang the GPU

18:49 <alyssa> but if you actually terminate the thread when discarding -- or you discard a quad when all threads in it are demoted to helper threads -- then the shader will complete successfully
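
The difference between demoting and terminating can be illustrated with a toy CPU-side model of that loop. This is only a sketch in Python, not shader or driver code: the step budget stands in for a GPU hang, and `run_fragment` and `run_quad` are hypothetical names invented for the illustration.

```python
def run_fragment(discard_at, terminate_on_discard, max_steps=1000):
    """Toy model of: for (;;) { if (cond) { discard } }.

    Returns the iteration at which the thread ended, or None if the step
    budget ran out (standing in for a hang).
    """
    for step in range(max_steps):
        if step >= discard_at:  # the "condition that is eventually true"
            if terminate_on_discard:
                return step  # discard really ends the thread
            # Demote only: the thread lives on as a helper and keeps looping.
    return None


def run_quad(discard_ats, max_steps=1000):
    """Demote-only threads, but retire the quad once all of them are helpers."""
    for step in range(max_steps):
        if all(step >= d for d in discard_ats):
            return step
    return None


print(run_fragment(5, terminate_on_discard=True))   # thread ends at step 5
print(run_fragment(5, terminate_on_discard=False))  # None: never leaves the loop
print(run_quad([3, 5, 2, 7]))                       # quad retires at step 7
```

Either real termination or quad-level retirement lets the loop finish; pure demotion does not.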

18:50 <alyssa> It's not clear a priori how one would implement that on AGX given the zs_emit transform they do

19:22 <alyssa> starting with a reference pipeline

19:23 <alyssa> adding a store into an FS sets "Set when frag shader spills: true" hm

19:23 <alyssa> unk 2 0 ->3

19:23 <alyssa> disables early-z

19:23 <alyssa> inserts a zs_emit instruction for some reason

19:24 <alyssa> Looks like it inserts the equivalent of `gl_FragDepth = gl_FragCoord.z` for some reason

19:24 <alyssa> disables triangle merging

19:24 <alyssa> sets pass type to punch through

19:24 <alyssa> depth to any (!)

19:24 <alyssa> (surely that last bit is a mistake?!)

19:25 <alyssa> smells like a workaround

19:31 <alyssa> okay let's see

19:31 <alyssa> what the heck

19:31 <alyssa> do I really want to disassemble this in my head

19:32 <alyssa> for "if (..) { discard } write depth; side effect;" we get

19:32 <alyssa> wait_pix 1; if (..) { zs_emit NaN }; store; zs_emit; signal_pix 1;

19:32 <alyssa> so we're allowed multiple zs_emit in a shader and they can terminate threads. maybe.

19:34 <alyssa> dummy stencil if needed

19:34 <alyssa> ok. I guess this is ok.

19:37 <alyssa> 0x7fc00000 is the preferred NaN but IDK if it's actually special
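
For reference, 0x7fc00000 is the IEEE 754 single-precision quiet NaN with only the top mantissa bit set. The sketch below decodes the bit pattern on the CPU to confirm that it is a NaN and shows the general test (exponent all ones, mantissa nonzero). Whether the hardware treats this particular payload differently from other NaNs is left open in the log, and nothing here answers that; the helper names are made up for illustration.

```python
import math
import struct


def f32_from_bits(bits):
    """Reinterpret a 32-bit integer as an IEEE 754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def is_f32_nan_bits(bits):
    """NaN test on raw bits: exponent all ones and a nonzero mantissa."""
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return exponent == 0xFF and mantissa != 0


preferred_nan = 0x7FC00000

print(math.isnan(f32_from_bits(preferred_nan)))  # decoded value is a NaN
print(is_f32_nan_bits(preferred_nan))            # same answer from the raw bits
print(is_f32_nan_bits(0x7F800000))               # +infinity is not a NaN
```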

19:58 possiblemeatball has joined #asahi-gpu

20:59 possiblemeatball has quit [Quit: Leaving]

21:01 hertz has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

21:05 <alyssa> lina: on the verge of saying "screw it" and landing clang-format

21:49 pthariensflame has joined #asahi-gpu

22:00 pthariensflame has quit [Quit: Textual IRC Client: www.textualapp.com]

22:46 rowin has joined #asahi-gpu

22:51 SSJ_GZ has quit [Ping timeout: 480 seconds]

23:18 rowin has quit [Remote host closed the connection]

23:49 cyrozap has quit [Quit: Client quit]

23:51 <alyssa> the deed is done

23:51 <alyssa> Guess I should rebase the world

23:59 <alyssa> 62 files changed, 3490 insertions(+), 816 deletions(-)

23:59 <alyssa> ugh too much agx/next