ChanServ changed the topic of #asahi-gpu to: Asahi Linux GPU development (no user support, NO binary reversing) | Keep things on topic | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Logs: https://alx.sh/l/asahi-gpu
A_L_I_C_E has quit [Ping timeout: 480 seconds]
A_L_I_C_E has joined #asahi-gpu
A_L_I_C_E has quit [Read error: Connection reset by peer]
A_L_I_C_E has joined #asahi-gpu
possiblemeatball has joined #asahi-gpu
renatorabelo has joined #asahi-gpu
Emantor has quit [Quit: ZNC - http://znc.in]
Emantor has joined #asahi-gpu
renatorabelo has quit [Read error: Connection reset by peer]
pthariensflame has joined #asahi-gpu
alyssa has joined #asahi-gpu
<alyssa> multisampling_fails--;
pthariensflame has quit [Quit: Textual IRC Client: www.textualapp.com]
<alyssa> multisampling_fails--;!nin
possiblemeatball has quit [Quit: Quit]
A_L_I_C_E has quit [Ping timeout: 480 seconds]
A_L_I_C_E has joined #asahi-gpu
A_L_I_C_E has quit [Ping timeout: 480 seconds]
A_L_I_C_E has joined #asahi-gpu
chipxxx has quit [Ping timeout: 480 seconds]
nsklaus has joined #asahi-gpu
i509vcb has quit [Quit: Connection closed for inactivity]
cylm has joined #asahi-gpu
yuka has quit [Remote host closed the connection]
possiblemeatball has joined #asahi-gpu
possiblemeatball has quit [Remote host closed the connection]
possiblemeatball has joined #asahi-gpu
kujeger has quit [Quit: ZNC 1.8.2 - https://znc.in]
kujeger has joined #asahi-gpu
yuka has joined #asahi-gpu
nsklaus has quit [Quit: ZZZzzz…]
cylm_ has joined #asahi-gpu
pthariensflame has joined #asahi-gpu
cylm_ has quit [Quit: WeeChat 3.8]
pthariensflame has quit []
i509vcb has joined #asahi-gpu
tertu has quit [Quit: so long...]
pthariensflame has joined #asahi-gpu
pthariensflame has quit []
A_L_I_C_E has quit [Remote host closed the connection]
nsklaus has joined #asahi-gpu
alyssa has quit [Quit: leaving]
drubrkletern has joined #asahi-gpu
ChaosPrincess has quit [Quit: WeeChat 3.8]
ChaosPrincess has joined #asahi-gpu
zocker has joined #asahi-gpu
Hibyehello has quit [Ping timeout: 480 seconds]
darkapex1 has joined #asahi-gpu
darkapex has quit [Ping timeout: 480 seconds]
nela has quit [Ping timeout: 480 seconds]
drubrkletern has quit [Ping timeout: 480 seconds]
nela has joined #asahi-gpu
drubrkletern has joined #asahi-gpu
Z750 has quit [Quit: bye]
Z750 has joined #asahi-gpu
yuka has quit [Remote host closed the connection]
yuka has joined #asahi-gpu
alyssa has joined #asahi-gpu
<alyssa> ok, dolphin apples to apples
<alyssa> on macos
<alyssa> melty molten galaxy trace
<alyssa> 4K specialized shader: 63fps
<alyssa> 4K ubershaders: 11.5fps
<alyssa> (^ that's moltenvk; metal:)
<alyssa> 4K specialized shader: ~100fps
<alyssa> 4K ubershaders: 11.5fps
<alyssa> so those are our targets ^^
<alyssa> oh, also:
<alyssa> native ubershaders (metal): 120fps
<alyssa> native ubershaders (moltenvk): 120fps
<alyssa> and just for completeness, on OpenGL:
<alyssa> native ubershaders: 120fps
<alyssa> 4K specialized: 54fps
<alyssa> 4K ubershaders (exclusive): 10fps
nsklaus has quit [Quit: ZZZzzz…]
<alyssa> --------------------------------
<alyssa> Over on agx/next, GLES backend
<alyssa> 4K specialized: 47fps
<alyssa> 4K uber: 4.3fps
<alyssa> native uber: 78fps
<alyssa> ---------------------
<alyssa> so... we have work to do.
<alyssa> 1. 4K specialized perf is slightly behind Apple's GL driver, and about half what it should be (Metal backend)
<alyssa> 2. 4K uber is a bit less than half of Apple's GL driver and a bit more than a third of what it should be (Metal)
<alyssa> 3. native uber is not worth quantifying since macos is hitting the 120fps cap here on all backends, we need to be 50% faster to do the same
<alyssa> so... we have a lot of work to go still :|
<alyssa> at least it's not 10x slower...
<alyssa> somewhere between 2x and 3x slower on this workload, though
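The ratios quoted above can be rechecked from the frame rates pasted earlier in the log. A minimal Python sketch, assuming the ~100fps / 11.5fps pair is Dolphin's Metal backend and treating the 120fps cap as the native-resolution target; the dictionary and function names are illustrative only:

```python
# Frame rates copied from the log (fps). Assumption: the ~100 / 11.5 pair is
# Dolphin's Metal backend on macOS. 120 is the display cap, not a measurement
# of what the hardware could do uncapped.
metal = {"4K specialized": 100.0, "4K uber": 11.5, "native uber": 120.0}
apple_gl = {"4K specialized": 54.0, "4K uber": 10.0, "native uber": 120.0}
agx_gles = {"4K specialized": 47.0, "4K uber": 4.3, "native uber": 78.0}


def ratio_table(ours, reference):
    """Return our fps as a fraction of the reference fps, per workload."""
    return {name: ours[name] / reference[name] for name in ours}


vs_metal = ratio_table(agx_gles, metal)
vs_apple_gl = ratio_table(agx_gles, apple_gl)

for name in agx_gles:
    print(f"{name}: {vs_metal[name]:.0%} of Metal, "
          f"{vs_apple_gl[name]:.0%} of Apple GL, "
          f"{1 / vs_metal[name]:.1f}x slower than Metal")
```

With these inputs the two 4K cases come out at roughly 2.1x and 2.7x slower than Metal, and the 4K uber case at about 37% of Metal's frame rate.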
<alyssa> it's not 100% clear to me if this is apples to apples (no pun intended)
<alyssa> given the different backends involved
<alyssa> some of this (especially ubershader perf) is going to be compiler related, I imagine
<alyssa> TellowKrinkle: Seeing as you wrote the winning backend, any ideas what to look into for dolphin go fast? O:)
<TellowKrinkle> BTW afaik most of the difference between Metal and MVK on macOS is whether we have texture compression enabled or not (at the moment, MVK specifies MTLTextureUsagePixelFormatView on pretty much all textures, which disables that).
<alyssa> ack
<alyssa> I got big wins already from compression, yeah
<TellowKrinkle> You can test that by going to the Metal renderer's texture allocation code and adding the PFV flag there
<alyssa> (we weren't compressing array textures before, that won back a ton of perf)
<TellowKrinkle> Ahh yeah, Dolphin is 100% array textures
<alyssa> Yep yep
<alyssa> Ok, I had locally disabled z/s compression since it was causing issues
<alyssa> but forcing it on anyway (Dolphin rendering still seems fine)
<alyssa> 4K specialized: 60fps
<alyssa> 4K uber: 4.3fps
<alyssa> native uber: 78fps
<alyssa> ---
tertu has joined #asahi-gpu
<alyssa> so after fixing z/s compression, our gles beats apple's gl
<alyssa> for specialized shaders
<alyssa> needle doesn't move for ubershaders since I guess those aren't bandwidth bound
<alyssa> (our gles specialized is still not up to metal -- 60fps vs 100fps -- though maybe some of those differences are within dolphin? maybe?)
<alyssa> I wonder how much ubershaders are getting hurt by the serialization of the fragment shaders
<alyssa> because they're reading gl_FragColor right at the top and then don't use it until the bottom
<alyssa> which forces overdraw to get serialized... shouldn't be too hard to sink the load down, I think
<TellowKrinkle> If you look at the decompilation of ubershaders, Metal does the same
<alyssa> sink the load or not?
<TellowKrinkle> Load happens before the main ubershader loop
<alyssa> Oh
<alyssa> Well. That's not great but not our problem then, heh
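For reference, "sinking the load" as described above means moving the framebuffer read from the top of the shader to just before its first use, so that the work that does not depend on it comes first. A toy Python sketch on a made-up straight-line instruction list; this is not Mesa's NIR or the AGX compiler, and the `Instr` type, the opcode names, and `sink_loads` are all hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Instr:
    op: str            # hypothetical opcode name
    dest: str          # value defined by this instruction ("" if none)
    srcs: tuple = ()   # values read by this instruction


def sink_loads(block, load_op="load_framebuffer"):
    """Move each `load_op` down to just before the first instruction that
    reads its result. Only valid for straight-line code where the load has
    no side effects and nothing in between writes what it reads."""
    out = []
    pending = []  # loads waiting for their first use
    for instr in block:
        if instr.op == load_op:
            pending.append(instr)
            continue
        # Emit any pending load whose result this instruction needs.
        needed = [ld for ld in pending if ld.dest in instr.srcs]
        for ld in needed:
            out.append(ld)
            pending.remove(ld)
        out.append(instr)
    # Loads that are never read end up last (a real compiler would drop them).
    out.extend(pending)
    return out


shader = [
    Instr("load_framebuffer", "prev"),
    Instr("texture", "t", ("uv",)),
    Instr("mul", "c", ("t", "k")),
    Instr("blend", "result", ("c", "prev")),
    Instr("store_output", "", ("result",)),
]

sunk = sink_loads(shader)
print([i.op for i in sunk])
```

A real ubershader has a large loop rather than straight-line code, so an actual pass would also need to reason about control flow; the sketch only shows the reordering idea.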
* alyssa should maybe look at one of those disassembled Metal ubershaders
* alyssa doesn't want to be 37% the performance of Metal :-V
<alyssa> (Granted, 37% the performance of Metal would still be totally playable. And there's piles of content we'll have available that's just not possible on macOS but still.)
<alyssa> hopefully some of that is something dumb
<alyssa> there's a long tail of ALU cycle shaving we could do for the ubershaders, but I'm not convinced any of it would move the needle
drubrkletern has quit [Remote host closed the connection]
<alyssa> seeing as I've literally never seen an ALU saving optimization matter for fps
<alyssa> scheduling might be significant for the ubershaders though
<alyssa> in terms of both pressure and latency, given how big these are
<TellowKrinkle> Uhh do you know if things like HSR and stuff are getting enabled for non-ubershaders? I've noticed Apple GPUs having a much larger gap between uber and non-uber shaders than e.g. AMD, and have been wondering if that's due to Apple GPUs getting a good bit of their performance from optimizations that get blocked by ubershaders.
<TellowKrinkle> Hybrid in particular sees much worse slowdowns when small amounts of ubershader are inserted into an otherwise non-ubershader render than I see on AMD
<alyssa> HSR is going to be hurt by the gl_FragDepth write, I think
<TellowKrinkle> We shouldn't do that in specialized shaders though
<TellowKrinkle> (Unless you have accurate depth or whatever that flag is called enabled)
<alyssa> right. if specialized shaders don't usually gl_FragDepth write, but ubershaders do, that'll be a gap
<alyssa> as I mentioned above loading gl_FragColor at the top of the shader is going to force the shaders to be totally serialized, which is also going to magnify the issue (maybe)
<alyssa> here I'm going to try copypasting some code from panfrost and see if that makes things go brr
<TellowKrinkle> BTW does gpuvis work on Asahi yet? That might be a good way to see if we're running into scheduling issues.
<alyssa> probably not
* alyssa looks up what gpuvis is
<alyssa> oh that
<alyssa> 's a question for lina I think
<alyssa> TellowKrinkle: also is there any cool stuff Dolphin can do on Linux but not macOS that I can use to sell this if perf doesn't pan out? :~P
<TellowKrinkle> I've used Apple's equivalent of that for debugging almost all the GPU-related optimizations I've done for either PCSX2 or Dolphin
<TellowKrinkle> I don't think Dolphin has too much Linux specific stuff. I'm not too familiar with the more uncommon stuff though, I only joined the project a year ago.
<alyssa> Aww :p
<alyssa> Was worth a try ;P
<TellowKrinkle> Let me quickly run a no-PFV test of melty molten galaxy and make sure that accounts for all the performance difference between MVK and Metal