ChanServ changed the topic of #asahi-gpu to: Asahi Linux GPU development (no user support, NO binary reversing) | Keep things on topic | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Logs: https://alx.sh/l/asahi-gpu
possiblemeatball has quit [Quit: Quit]
<alyssa> scheduler doesn't break anything but doesn't seem to be much help
<alyssa> this is despite shader-db showing thread count improvements in a pile of ubershaders
<alyssa> eg
<alyssa> threads helped: dolphin-ubers/221.shader_test MESA_SHADER_FRAGMENT: 512 -> 576 (12.50%)
<alyssa> threads helped: dolphin-ubers/231.shader_test MESA_SHADER_FRAGMENT: 448 -> 512 (14.29%)
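(Note: the percentages in the two shader-db lines above are consistent with a plain relative change in thread count, (after - before) / before. The sketch below reproduces them under that assumption. The helper name `percent_change` is hypothetical and is not part of shader-db.)

```python
def percent_change(before: int, after: int) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# Thread counts copied from the two "threads helped" lines above.
results = {
    "dolphin-ubers/221.shader_test": (512, 576),
    "dolphin-ubers/231.shader_test": (448, 512),
}

for name, (before, after) in results.items():
    change = percent_change(before, after)
    print(f"{name}: {before} -> {after} ({change:.2f}%)")
```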
<TellowKrinkle> Okay can confirm, if you add MTLTextureUsagePixelFormatView to all textures, Metal performs within 1fps of MoltenVK at 4K
<TellowKrinkle> BTW if you're curious, here's a Dolphin ubershader and its disassembly as of about 4 months ago: https://gist.github.com/TellowKrinkle/ced8ea4781f4a2bcfa292ac50d93dc25
possiblemeatball has joined #asahi-gpu
<alyssa> eyes
<alyssa> no mystery ops, that's a good sign
<alyssa> TellowKrinkle: re ld_tile placement, Metal isn't sinking it past the first block but it is sinking it to the bottom of the first block
<alyssa> first ~40 instructions benefit from some parallelism
<alyssa> register cache hinting might help if that's our bottleneck
<alyssa> idk. if it's "death by a thousand cuts" i guess we can deal with that, but usually there's something more glaring
<alyssa> (given that we're off at 37% of where we should be... if it were 90% there I'd say yeah all the little stuff matters)
<alyssa> opt sink+move hack:
<alyssa> spec 4K: 61fps
<alyssa> native uber: 105fps
<alyssa> uber 4K: 5fps
<alyssa> ---
<alyssa> so a big chunk of the gap seems to be register pressure
<alyssa> (43% of the way there :^)
<TellowKrinkle> BTW here's the gpu activity chart for one frame of specialized 4K in Metal: https://tellowkrinkle.com/img/DolphinMeltyMoltenGalaxyMetal6xInstruments.png
<TellowKrinkle> Looks like it runs 10 render passes per frame, and gets nearly full occupancy throughout most of it
user982492 has joined #asahi-gpu
<TellowKrinkle> Looking at Melty Molten Galaxy in a Metal GPU trace, most of the time is spent on the dumbest things, and it looks like they may actually be ALU bound.
<TellowKrinkle> 2ms spent in a fullscreen color clear, which samples a 4x4 all-white texture for who knows what reason. 83 instructions, 12 registers for the fragment shader.
<TellowKrinkle> 1.5ms spent in a fullscreen depth clear (color writes are masked), which also samples that 4x4 all-white texture. 76 instructions, 12 registers.
<TellowKrinkle> 1.4ms spent drawing lava, our first actually useful draw. 219 instructions, 16 registers.
cylm has quit [Ping timeout: 480 seconds]
<TellowKrinkle> 1.2ms spent drawing the ground, 163 instructions, 12 registers.
<TellowKrinkle> 1ms spent making an EFB copy to VRAM, this one's most likely memory bound
hightower4 has joined #asahi-gpu
<TellowKrinkle> Another 1ms making another EFB copy, same as above
user982492_ has joined #asahi-gpu
hightower3 has quit [Ping timeout: 480 seconds]
<TellowKrinkle> 700µs in some shader that does an alpha-blended distortion effect. Samples an EFB copy, so might be memory bound. 231 instructions, 16 registers.
user982492 has quit [Ping timeout: 480 seconds]
<TellowKrinkle> 500µs in an alpha-only EFB copy shader
<TellowKrinkle> 400µs in a fullscreen pass that just seems to add a shadow for Mario. 55 instructions, 8 registers.
<TellowKrinkle> There are a few more around 200µs, and the rest take up pretty much nothing
cylm has joined #asahi-gpu
<TellowKrinkle> Here's the color + depth clear shader that manages to take up 1/5 of all time spent rendering that dff https://gist.github.com/TellowKrinkle/9e9f1f56a2756864c3814471f45ff064
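(Note: a rough tally of the per-pass timings quoted above, as a sketch only. The pass labels are shorthand invented here. The "few more around 200µs" passes and the remaining small ones are not listed individually in the log, so they are left out, and the "1/5" figure is treated as approximate.)

```python
# Per-pass GPU times in microseconds, copied from the messages above.
passes_us = [
    ("color clear", 2000),
    ("depth clear", 1500),
    ("lava", 1400),
    ("ground", 1200),
    ("EFB copy to VRAM", 1000),
    ("second EFB copy", 1000),
    ("distortion effect", 700),
    ("alpha-only EFB copy", 500),
    ("Mario shadow pass", 400),
]

# Total over the passes that were explicitly listed.
listed_total_us = sum(t for _, t in passes_us)

# The two fullscreen clears that share the linked shader.
clears_us = sum(t for name, t in passes_us if name.endswith("clear"))

# Share of the clears among the listed passes only.
share_of_listed = clears_us / listed_total_us

# If the clears really are about 1/5 of all render time, the total would be
# roughly five times their cost (assumption: "1/5" taken literally).
implied_total_us = clears_us * 5

print(f"listed passes: {listed_total_us} us")
print(f"clears: {clears_us} us ({share_of_listed:.1%} of listed passes)")
print(f"total implied by the 1/5 figure: {implied_total_us} us")
```

The gap between the listed total and the implied total would have to come from the passes that were not itemized, or from the 1/5 being a loose estimate.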
tertu has quit [Quit: so long...]
deflax has quit []
possiblemeatball has quit [Quit: Quit]
kl has quit [Quit: ZNC - https://znc.in]
kilolima has joined #asahi-gpu
stipa is now known as Guest11221
stipa has joined #asahi-gpu
Guest11221 has quit [Ping timeout: 480 seconds]
user982492_ has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
pthariensflame has joined #asahi-gpu
pthariensflame has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
pthariensflame has joined #asahi-gpu
pthariensflame has quit [Quit: Textual IRC Client: www.textualapp.com]
darkapex1 has quit []
darkapex has joined #asahi-gpu
nsklaus has joined #asahi-gpu
kit_ty_kate has quit [Quit: WeeChat 3.6]
i509vcb has quit [Quit: Connection closed for inactivity]
hightower4 has quit []
cylm has quit [Quit: WeeChat 3.6]
cylm has joined #asahi-gpu
possiblemeatball has joined #asahi-gpu
chipxxx has joined #asahi-gpu
sosys has joined #asahi-gpu
sosys has quit [Ping timeout: 480 seconds]
possiblemeatball has quit [Quit: Quit]
sosys has joined #asahi-gpu
alyssa has quit [Quit: leaving]
sosys has quit [Quit: leaving]
user982492 has joined #asahi-gpu
i509vcb has joined #asahi-gpu
zalyx0 has quit [Quit: later alligator]
drubrkletern has joined #asahi-gpu
pthariensflame has joined #asahi-gpu
pthariensflame has quit []
landscape15 has joined #asahi-gpu
landscape15 has quit []
c10l has quit [Ping timeout: 480 seconds]
c10l has joined #asahi-gpu
user982492 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
user982492 has joined #asahi-gpu
user982492 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
drubrkletern has quit [Remote host closed the connection]
possiblemeatball has joined #asahi-gpu
user982492 has joined #asahi-gpu
yrlf has quit [Quit: The Lounge - https://thelounge.chat]
yrlf has joined #asahi-gpu
pbsds has quit [Ping timeout: 480 seconds]
zalyx0 has joined #asahi-gpu
tertu has joined #asahi-gpu
zalyx0 has quit [Quit: later alligator]
zalyx0 has joined #asahi-gpu
zalyx0 has quit []
chadmed_ has joined #asahi-gpu
zalyx0 has joined #asahi-gpu
hightower2 has joined #asahi-gpu
nsklaus has quit [Quit: ZZZzzz…]
sosys has joined #asahi-gpu