#asahi-gpu on 2023-04-15 — irc logs at oftc.irclog.whitequark.org

2022-12-21 00:46 ChanServ changed the topic of #asahi-gpu to: Asahi Linux GPU development (no user support, NO binary reversing) | Keep things on topic | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Logs: https://alx.sh/l/asahi-gpu

00:23 A_L_I_C_E has quit [Ping timeout: 480 seconds]

00:24 A_L_I_C_E has joined #asahi-gpu

00:57 A_L_I_C_E has quit [Read error: Connection reset by peer]

00:58 A_L_I_C_E has joined #asahi-gpu

01:13 possiblemeatball has joined #asahi-gpu

01:17 renatorabelo has joined #asahi-gpu

01:20 Emantor has quit [Quit: ZNC - http://znc.in]

01:20 Emantor has joined #asahi-gpu

02:11 renatorabelo has quit [Read error: Connection reset by peer]

02:41 pthariensflame has joined #asahi-gpu

02:44 alyssa has joined #asahi-gpu

02:44 <alyssa> multisampling_fails--;

02:54 pthariensflame has quit [Quit: Textual IRC Client: www.textualapp.com]

03:11 <alyssa> multisampling_fails--;!nin

03:37 possiblemeatball has quit [Quit: Quit]

04:43 A_L_I_C_E has quit [Ping timeout: 480 seconds]

05:14 A_L_I_C_E has joined #asahi-gpu

05:47 A_L_I_C_E has quit [Ping timeout: 480 seconds]

06:05 A_L_I_C_E has joined #asahi-gpu

06:44 chipxxx has quit [Ping timeout: 480 seconds]

07:11 nsklaus has joined #asahi-gpu

09:28 i509vcb has quit [Quit: Connection closed for inactivity]

09:35 cylm has joined #asahi-gpu

12:31 yuka has quit [Remote host closed the connection]

12:31 possiblemeatball has joined #asahi-gpu

12:32 possiblemeatball has quit [Remote host closed the connection]

12:32 possiblemeatball has joined #asahi-gpu

12:35 kujeger has quit [Quit: ZNC 1.8.2 - https://znc.in]

12:38 kujeger has joined #asahi-gpu

13:24 yuka has joined #asahi-gpu

16:01 nsklaus has quit [Quit: ZZZzzz…]

16:05 cylm_ has joined #asahi-gpu

16:47 pthariensflame has joined #asahi-gpu

16:49 cylm_ has quit [Quit: WeeChat 3.8]

16:50 pthariensflame has quit []

16:54 i509vcb has joined #asahi-gpu

16:59 tertu has quit [Quit: so long...]

17:09 pthariensflame has joined #asahi-gpu

17:11 pthariensflame has quit []

17:31 A_L_I_C_E has quit [Remote host closed the connection]

18:00 nsklaus has joined #asahi-gpu

19:15 alyssa has quit [Quit: leaving]

19:32 drubrkletern has joined #asahi-gpu

20:02 ChaosPrincess has quit [Quit: WeeChat 3.8]

20:02 ChaosPrincess has joined #asahi-gpu

20:06 zocker has joined #asahi-gpu

20:20 Hibyehello has quit [Ping timeout: 480 seconds]

20:29 darkapex1 has joined #asahi-gpu

20:31 darkapex has quit [Ping timeout: 480 seconds]

20:31 nela has quit [Ping timeout: 480 seconds]

20:43 drubrkletern has quit [Ping timeout: 480 seconds]

20:50 nela has joined #asahi-gpu

20:53 drubrkletern has joined #asahi-gpu

21:43 Z750 has quit [Quit: bye]

21:44 Z750 has joined #asahi-gpu

22:32 yuka has quit [Remote host closed the connection]

22:33 yuka has joined #asahi-gpu

22:54 alyssa has joined #asahi-gpu

22:55 <alyssa> ok, dolphin apples to apples

22:55 <alyssa> on macos

22:55 <alyssa> melty molten galaxy trace

22:55 <alyssa> 4K specialized shader: 63fps

22:55 <alyssa> 4K ubershaders: 11.5fps

22:56 <alyssa> (moltenvk)

22:57 <alyssa> 4K special shader: ~100fps

22:57 <alyssa> 4K ubershaders: 11.5fps

22:58 <alyssa> so those are our targets ^^

22:58 <alyssa> oh, also:

22:59 <alyssa> native ubershaders (metal): 120fps

22:59 <alyssa> native ubershaders (moltenvk): 120fps

23:00 <alyssa> and just for completeness, on OpenGL:

23:00 <alyssa> native ubershaders: 120fps

23:01 <alyssa> 4K specialized: 54fps

23:01 <alyssa> 4K exclusive: 10fps

23:02 nsklaus has quit [Quit: ZZZzzz…]

23:09 <alyssa> --------------------------------

23:10 <alyssa> Over on agx/next, GLES backend

23:10 <alyssa> 4K specialized: 47fps

23:10 <alyssa> 4K uber: 4.3fps

23:11 <alyssa> native uber: 78fps

23:11 <alyssa> ---------------------

23:11 <alyssa> so... we have work to do.

23:13 <alyssa> 1. 4K specialized perf is slightly behind Apple's GL driver, and about half what it should be (Metal backend)

23:13 <alyssa> 2. 4K uber is a bit less than half of Apple's GL driver and a bit more than a third of what it should be (Metal)

23:14 <alyssa> 3. native uber is not worth quantifying since macos is hitting the 120fps cap here on all backends, we need to be 50% faster to do the same

23:14 <alyssa> so... we have a lot of work to go still :|

23:15 <alyssa> at least it's not 10x slower...

23:15 <alyssa> somewhere between 2x and 3x slower on this workload, though
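
The comparisons in points 1 to 3 can be checked with a few lines of arithmetic. The sketch below only restates the frame rates quoted in the log. The dictionary and function names are made up for illustration, the Metal specialized figure was given as "~100fps" so it is approximate, and 120fps is a display cap rather than a measured ceiling.

```python
# Frame rates (fps) quoted in the log for the Melty Molten Galaxy trace.
# "~100fps" on Metal is approximate; 120fps is the macOS frame cap.
metal = {"4K specialized": 100.0, "4K uber": 11.5}
apple_gl = {"4K specialized": 54.0, "4K uber": 10.0}
agx_gles = {"4K specialized": 47.0, "4K uber": 4.3}


def fraction_of(ours, reference):
    """Return our frame rate as a fraction of the reference, per workload."""
    return {name: ours[name] / reference[name] for name in ours}


vs_metal = fraction_of(agx_gles, metal)
vs_gl = fraction_of(agx_gles, apple_gl)

# Slowdown factor relative to Metal (how many times slower we are).
slowdown_vs_metal = {name: 1.0 / frac for name, frac in vs_metal.items()}

# Native uber: 78fps measured against the 120fps cap on macOS.
native_speedup_needed = 120.0 / 78.0 - 1.0

for name in agx_gles:
    print(f"{name}: {vs_gl[name]:.0%} of Apple GL, "
          f"{vs_metal[name]:.0%} of Metal, "
          f"{slowdown_vs_metal[name]:.2f}x slower than Metal")
print(f"native uber: needs to be {native_speedup_needed:.0%} faster to reach the cap")
```

With these inputs the 4K uber figure comes out at roughly 37% of Metal, and the slowdown against Metal lands between 2x and 3x for both 4K workloads, which matches the summary in the log.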

23:15 <alyssa> it's not 100% clear to me if this is apples to apples (no pun intended)

23:15 <alyssa> given the different backends involved

23:17 <alyssa> some of this (especially ubershader perf) is going to be compiler related, I imagine

23:19 <alyssa> TellowKrinkle: Seeing as you wrote the winning backend, any ideas what to look into for dolphin go fast? O:)

23:19 <TellowKrinkle> BTW afaik most of the difference between Metal and MVK on macOS is whether we have texture compression enabled or not (at the moment, MVK specifies MTLTextureUsagePixelFormatView on pretty much all textures, which disables that).

23:19 <alyssa> ack

23:20 <alyssa> I got big wins already from compression, yeah

23:20 <TellowKrinkle> You can test that by going to the Metal renderer's texture allocation code and adding the PFV flag there

23:20 <alyssa> (we weren't compressing array textures before, that won back a ton of perf)

23:20 <TellowKrinkle> Ahh yeah, Dolphin is 100% array textures

23:21 <alyssa> Yep yep

23:21 <alyssa> Ok, I had locally disabled z/s compression since it was causing issues

23:22 <alyssa> but forcing it on anyway (Dolphin rendering still seems fine)

23:22 <alyssa> 4K specialized: 60fps

23:22 <alyssa> 4K uber: 4.3fps

23:22 <alyssa> native uber: 78fps

23:23 <alyssa> ---

23:23 tertu has joined #asahi-gpu

23:23 <alyssa> so after fixing z/s compression, our gles beats apple's gl

23:23 <alyssa> for specialized shaders

23:24 <alyssa> needle doesn't move for ubershaders since I guess those aren't bandwidth bound

23:25 <alyssa> (our gles specialized is still not up to metal -- 60fps vs 100fps -- though maybe some of those differences are within dolphin? maybe?)

23:26 <alyssa> I wonder how much ubershaders are getting hurt by the serialization of the fragment shaders

23:26 <alyssa> because they're reading gl_FragColor right at the top and then don't use it until the bottom

23:26 <alyssa> which forces overdraw to get serialized... shouldn't be too hard to sink the load down, I think

23:27 <TellowKrinkle> If you look at the decompilation of ubershaders, Metal does the same

23:27 <alyssa> sink the load or not?

23:27 <TellowKrinkle> Load happens before the main ubershader loop

23:27 <alyssa> Oh

23:27 <alyssa> Well. That's not great but not our problem then, heh
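
The "sink the load down" idea from 23:26 can be illustrated on a toy instruction list. This is a minimal sketch and not the agx compiler or any real Mesa pass: the `Instr` type, the `load_framebuffer` opcode name and the `sink_loads` function are all hypothetical. It assumes a single straight-line block in SSA form (each destination written once), so it says nothing about the case TellowKrinkle describes, where the load sits before the main ubershader loop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """One toy IR instruction: dest = op(srcs). All names are hypothetical."""
    dest: str
    op: str
    srcs: tuple = ()


def sink_loads(block, load_op="load_framebuffer"):
    """Move each `load_op` instruction to just before its first use.

    Assumes straight-line SSA code, so moving the load later cannot change
    the value it produces. Loads with no use are left where they are.
    """
    result = list(block)
    for instr in block:
        if instr.op != load_op:
            continue
        pos = result.index(instr)
        # Find the first later instruction that reads the loaded value.
        first_use = next(
            (i for i in range(pos + 1, len(result)) if instr.dest in result[i].srcs),
            None,
        )
        if first_use is None:
            continue
        # After removing the load, the use shifts down by one slot, so
        # inserting at first_use - 1 places the load directly before it.
        result.pop(pos)
        result.insert(first_use - 1, instr)
    return result


block = [
    Instr("dst_color", "load_framebuffer"),      # read at the very top
    Instr("t0", "texture", ("uv",)),
    Instr("t1", "fmul", ("t0", "k")),
    Instr("out", "blend", ("t1", "dst_color")),  # only consumer, at the bottom
]

sunk = sink_loads(block)
for instr in sunk:
    print(instr.dest, "=", instr.op, instr.srcs)
```

In the reordered block the framebuffer read sits directly before the blend, so the texture and multiply work no longer waits behind it. Whether that helps on real hardware depends on how the driver handles the dependency, which the log leaves open.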

23:28 * alyssa should maybe look at one of those disassembled Metal ubershaders

23:29 * alyssa doesn't want to be 37% the performance of Metal :-V

23:29 <alyssa> (Granted, 37% the performance of Metal would still be totally playable. And there's piles of content we'll have available that's just not possible on macOS but still.)

23:30 <alyssa> hopefully some of that is something dumb

23:30 <alyssa> there's a long tail of ALU cycle shaving we could do for the ubershaders, but I'm not convinced any of it would move the needle

23:30 drubrkletern has quit [Remote host closed the connection]

23:31 <alyssa> seeing as I've literally never seen an ALU saving optimization matter for fps

23:32 <alyssa> scheduling might be significant for the ubershaders though

23:32 <alyssa> in terms of both pressure and latency, given how big these are

23:33 <TellowKrinkle> Uhh do you know if things like HSR and stuff are getting enabled for non-ubershaders? I've noticed Apple GPUs having a much larger gap between uber and non-uber shaders than e.g. AMD, and have been wondering if that's due to Apple GPUs getting a good bit of their performance from optimizations that get blocked by ubershaders.

23:34 <TellowKrinkle> Hybrid in particular sees much worse slowdowns when small amounts of ubershader are inserted into an otherwise non-ubershader render than I see on AMD

23:34 <alyssa> HSR is going to be hurt by the gl_FragDepth write, I think

23:35 <TellowKrinkle> We shouldn't do that in specialized shaders though

23:36 <TellowKrinkle> (Unless you have accurate depth or whatever that flag is called enabled)

23:36 <alyssa> right. if specialized shaders don't usually gl_FragDepth write, but ubershaders do, that'll be a gap

23:36 <alyssa> as I mentioned above loading gl_FragColor at the top of the shader is going to force the shaders to be totally serialized, which is also going to magnify the issue (maybe)

23:39 <alyssa> here I'm going to try copypasting some code from panfrost and see if that makes things go brr

23:40 <TellowKrinkle> BTW does gpuvis work on Asahi yet? That might be a good way to see if we're running into scheduling issues.

23:40 <alyssa> probably not

23:40 * alyssa looks up what gpuvis is

23:40 <alyssa> oh that

23:41 <alyssa> 's a question for lina I think

23:42 <alyssa> TellowKrinkle: also is there any cool stuff Dolphin can do on Linux but not macOS that I can use to sell this if perf doesn't pan out? :~P

23:43 <TellowKrinkle> I've used Apple's equivalent of that for debugging almost all the GPU-related optimizations I've done for either PCSX2 or Dolphin

23:45 <TellowKrinkle> I don't think Dolphin has too much Linux specific stuff. I'm not too familiar with the more uncommon stuff though, I only joined the project a year ago.

23:47 <alyssa> Aww :p

23:47 <alyssa> Was worth a try ;P

23:51 <TellowKrinkle> Let me quick run a no-PFV test of melty molten galaxy and make sure that accounts for all the performance difference between MVK and Metal