marcan changed the topic of #asahi-gpu to: Asahi Linux GPU development (no user support, NO binary reversing) | Keep things on topic | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Logs: https://alx.sh/l/asahi-gpu
Gaspare has quit [Quit: Gaspare]
cylm_ has quit [Ping timeout: 480 seconds]
dorkbutt has quit [Read error: Connection reset by peer]
smoke has joined #asahi-gpu
swaggie has quit [Remote host closed the connection]
c10l has quit [Read error: Connection reset by peer]
c10l has joined #asahi-gpu
pjakobsson has joined #asahi-gpu
pjakobsson_ has quit [Ping timeout: 480 seconds]
amarioguy has joined #asahi-gpu
amarioguy has quit [Remote host closed the connection]
amarioguy has joined #asahi-gpu
user982492 has joined #asahi-gpu
swaggie has joined #asahi-gpu
smoke has quit []
pjakobsson_ has joined #asahi-gpu
WindowPain_ is now known as WindowPain
pjakobsson has quit [Ping timeout: 480 seconds]
user982492 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
swaggie has quit [Remote host closed the connection]
c10l has quit [Quit: Ping timeout (120 seconds)]
swaggie has joined #asahi-gpu
c10l has joined #asahi-gpu
swaggie has quit [Ping timeout: 480 seconds]
swaggie has joined #asahi-gpu
<lina> Yay! ^^
c10l has quit [Quit: Ping timeout (120 seconds)]
SSJ_GZ has joined #asahi-gpu
c10l has joined #asahi-gpu
galileo has joined #asahi-gpu
c10l has quit [Ping timeout: 480 seconds]
galileo has quit [Remote host closed the connection]
galileo has joined #asahi-gpu
c10l has joined #asahi-gpu
galileo has quit [Remote host closed the connection]
galileo has joined #asahi-gpu
galileo has quit [Remote host closed the connection]
galileo has joined #asahi-gpu
galileo has quit [Remote host closed the connection]
swaggie has quit [Remote host closed the connection]
swaggie has joined #asahi-gpu
galileo has joined #asahi-gpu
swaggie has quit [Ping timeout: 480 seconds]
MajorBiscuit has joined #asahi-gpu
c10l has quit [Ping timeout: 480 seconds]
c10l has joined #asahi-gpu
galileo has quit [Remote host closed the connection]
swaggie has joined #asahi-gpu
galileo has joined #asahi-gpu
swaggie has quit [Ping timeout: 480 seconds]
pjakobsson has joined #asahi-gpu
galileo has quit [Remote host closed the connection]
pjakobsson_ has quit [Ping timeout: 480 seconds]
<handlerug> has glReadPixels been implemented or optimized? just wondering because I believe wlr-screencopy-unstable-v1 uses glReadPixels, and for some reason screenshots take around 4 seconds
MajorBiscuit has quit [Quit: WeeChat 3.6]
MajorBiscuit has joined #asahi-gpu
r0ni has quit [Quit: Textual IRC Client: www.textualapp.com]
swaggie has joined #asahi-gpu
galileo has joined #asahi-gpu
c10l has quit [Ping timeout: 480 seconds]
galileo has quit [Remote host closed the connection]
c10l has joined #asahi-gpu
bcrumb has joined #asahi-gpu
galileo has joined #asahi-gpu
swaggie has quit [Ping timeout: 480 seconds]
galileo has quit [Remote host closed the connection]
bcrumb has quit [Quit: WeeChat 3.7.1]
galileo has joined #asahi-gpu
bcrumb has joined #asahi-gpu
cylm_ has joined #asahi-gpu
bcrumb has quit [Read error: Connection reset by peer]
galileo has quit [Remote host closed the connection]
c10l has quit [Quit: Ping timeout (120 seconds)]
swaggie has joined #asahi-gpu
swaggie has quit [Ping timeout: 480 seconds]
galileo has joined #asahi-gpu
c10l has joined #asahi-gpu
galileo has quit [Remote host closed the connection]
galileo has joined #asahi-gpu
galileo has quit [Remote host closed the connection]
galileo has joined #asahi-gpu
galileo has quit [Remote host closed the connection]
bcrumb has joined #asahi-gpu
galileo has joined #asahi-gpu
galileo has quit [Remote host closed the connection]
Gaspare has joined #asahi-gpu
c10l has quit [Quit: Ping timeout (120 seconds)]
galileo has joined #asahi-gpu
galileo has quit [Remote host closed the connection]
galileo has joined #asahi-gpu
galileo has quit [Remote host closed the connection]
bcrumb has quit [Ping timeout: 480 seconds]
galileo has joined #asahi-gpu
galileo has quit [Remote host closed the connection]
Gaspare has quit [Quit: Gaspare]
galileo has joined #asahi-gpu
galileo has quit [Remote host closed the connection]
galileo has joined #asahi-gpu
galileo has quit [Remote host closed the connection]
galileo has joined #asahi-gpu
galileo has quit [Remote host closed the connection]
galileo has joined #asahi-gpu
galileo has quit []
iaguis has joined #asahi-gpu
MajorBiscuit has quit [Read error: Connection reset by peer]
swaggie has joined #asahi-gpu
swaggie has quit [Ping timeout: 480 seconds]
LinuxM1 has joined #asahi-gpu
swaggie has joined #asahi-gpu
swaggie has quit [Ping timeout: 480 seconds]
swaggie has joined #asahi-gpu
swaggie has quit [Ping timeout: 480 seconds]
iaguis has quit [Quit: leaving]
swaggie has joined #asahi-gpu
swaggie has quit [Ping timeout: 480 seconds]
LinuxM1 has quit [Quit: Leaving]
bcrumb has joined #asahi-gpu
bcrumb has quit []
alyssa has joined #asahi-gpu
<alyssa> handlerug: Implemented, yes. Optimized, no. But 4 seconds seems... excessive.
<alyssa> on an M1 Mini at 4K, `grim` in sway takes 0.34s
<alyssa> (granted, this is on a release build of agx/next, but it should be similar on what lina ships)
<handlerug> huh. my machine is MacBook M1 Max 16", same grim + sway setup
<alyssa> What does top show during that 4s?
<alyssa> 100% cpu usage? 3% cpu? very different implications :p
<handlerug> the compositor freezes so I'm not sure I will be able to see what top shows :P
<alyssa> ahah right
<handlerug> lemme ssh in
<alyssa> I don't think I have anything interesting in agx/next right now
<alyssa> lina: shit!
<alyssa> lina: asahi/mesa needs f5dd101ab69 ("asahi: Set flatshading controls appropriately")
<alyssa> that'll probably fix Darwinia
<alyssa> I thought that was already merged upstream >.>
<alyssa> and e643d9b6937 ("asahi: Identify XML for more flatshading controls")
<alyssa> and 2a094ba5b8d ("asahi,agx,nir: Implement depth and stencil export") for good measure
<alyssa> apparently I do have interesting things in agx/next that I forgot about
<handlerug> alyssa: "grim 1.00s user 0.01s system 36% cpu 2.787 total", and sway at 100% CPU
bcrumb has joined #asahi-gpu
<handlerug> that's when screenshotting the game Celeste. when screenshotting the foot terminal emulator (which uses CPU for rendering), grim takes 1.7s, but it's still quite a lot imo
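The timing line quoted above can be unpacked with a little arithmetic. This is a minimal sketch, assuming the output follows the usual shell `time` convention where the CPU percentage is (user + system) / total; the function name `cpu_share` is made up for illustration.

```python
def cpu_share(user_s: float, system_s: float, wall_s: float) -> float:
    """Fraction of wall-clock time the process itself spent on the CPU."""
    return (user_s + system_s) / wall_s


# Numbers copied from the quoted line: 1.00s user, 0.01s system, 2.787s total.
share = cpu_share(1.00, 0.01, 2.787)
off_cpu_s = 2.787 - (1.00 + 0.01)

print(f"grim on-CPU share: {share:.0%}")
print(f"time grim was not on the CPU: {off_cpu_s:.3f}s")
```

Under that assumption, grim itself accounts for only about a third of the wall time. The remainder is time grim spent off the CPU, which fits sway being pegged at 100% while the screenshot is taken.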
renatorabelo has joined #asahi-gpu
bluetail has quit [Read error: Connection reset by peer]
bluetail has joined #asahi-gpu
bluetail has quit [Read error: Connection reset by peer]
bluetail has joined #asahi-gpu
Gaspare has joined #asahi-gpu
bcrumb has quit [Quit: WeeChat 3.7.1]
bcrumb has joined #asahi-gpu
<alyssa> handlerug: yes that's indeed excessive
<alyssa> Can you install `sysprof`, run `sysprof-cli screenshot.trace`, run grim (with Celeste or foot), ctrl-c out of `sysprof-cli` and send me the resulting screenshot.trace file?
<alyssa> (You can preview it with `sysprof screenshot.trace`)
bcrumb has quit [Quit: WeeChat 3.7.1]
Gaspare has quit [Quit: Gaspare]
<handlerug> sent in private messages
bcrumb has joined #asahi-gpu
bluetail has quit [Ping timeout: 480 seconds]
bcrumb has quit [Quit: WeeChat 3.7.1]
<alyssa> handlerug: Okay, I've looked at the trace and reproduced similar issues locally
<alyssa> There are three slowdowns here:
<alyssa> 1. Format conversion. Because we don't prefer blit-based texture transfer*, mesa/st falls back to software conversion. This would be more efficiently done on the GPU, especially once we have framebuffer compression with the DCP (we don't yet).
<alyssa> Solution: prefer blit-based texture transfer. Most texture transfers will use a staging blit internally anyway for compression, or, worse, tile on the CPU, so this is probably good in practice.
<alyssa> 2. Read from write-combine memory (i.e. uncached) from the CPU. The framebuffer, as well as the staging resource used with the blit, are both write-combine which makes readback from the CPU very slow.
<alyssa> Solution: allocate the staging resource as writeback (instead of write-combine), since we know that we will be reading from the CPU. The copy from framebuffer->staging happens on the GPU so the cache is ok there.
<alyssa> 3. deflate is slow. With #1 and #2 solved, depending on the content grim still spends inordinate amounts of time in deflate() in zlib internal to libpng. That's not our fault, and with #1 and #2 fixed, using `grim -t ppm` instead is generally quite fast.
<alyssa> Solution: use JPEG with the JPEG hardware encoder.
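To make slowdown 1 concrete, here is a rough sketch of what a software format conversion amounts to: touching every pixel on the CPU. The log does not say which formats were involved, so the BGRA to RGBA swizzle below is purely an assumption for illustration, and `bgra_to_rgba` is a hypothetical helper, not a Mesa function.

```python
def bgra_to_rgba(pixels: bytes) -> bytes:
    """Swap the blue and red channels of tightly packed 4-byte pixels."""
    if len(pixels) % 4 != 0:
        raise ValueError("expected a whole number of 4-byte pixels")
    out = bytearray(pixels)
    out[0::4] = pixels[2::4]  # red slot takes the source's third byte
    out[2::4] = pixels[0::4]  # blue slot takes the source's first byte
    return bytes(out)


# Two pixels in, two pixels out, with channels 0 and 2 exchanged.
converted = bgra_to_rgba(bytes([1, 2, 3, 4, 5, 6, 7, 8]))
print(list(converted))
```

Every byte of the frame is read and written by the CPU here, which is why doing the same work as part of a GPU blit is attractive, and why slowdown 2 (slow CPU reads from write-combine memory) compounds it.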
<alyssa> https://rosenzweig.io/wip.diff -- this enables blit-based texture transfer (solution for #1) and disables write-combine (very blunt hammer to work around #2, the proper fix requires plumbing some flags around the driver which isn't hard but needs some care)
<alyssa> https://github.com/zlib-ng/zlib-ng is probably worth a try for #3
cr1901_ has joined #asahi-gpu
amarioguy has quit [Remote host closed the connection]
cr1901 has quit [Read error: Connection reset by peer]
renato has joined #asahi-gpu
amarioguy has joined #asahi-gpu
<alyssa> more to the point -- upstream zlib doesn't have NEON optimizations which will penalize png compression on arm systems
<alyssa> I don't know enough about the various zlib forks to recommend one, it looks like Chromium is carrying a pile of patches of their own
<alyssa> (including NEON support)
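One way to get a feel for slowdown 3 is to time deflate at different levels against the same buffer. This is only a sketch: the frame is synthetic (a made-up gradient, not a real screenshot), the helper names are hypothetical, and Python's `zlib` module wraps whatever zlib-compatible library the interpreter was built against, so absolute numbers will vary by system.

```python
import time
import zlib


def make_frame(width: int, height: int) -> bytes:
    """Build a synthetic RGB frame; stands in for screenshot pixel data."""
    row = bytes((x * 7 + c * 31) % 256 for x in range(width) for c in range(3))
    return b"".join(bytes((b + y) % 256 for b in row) for y in range(height))


def time_deflate(data: bytes, level: int) -> tuple[float, int]:
    """Return (seconds spent compressing, compressed size in bytes)."""
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    return time.perf_counter() - start, len(packed)


frame = make_frame(640, 360)
for level in (0, 1, 6, 9):  # 0 stores the data without compressing it
    seconds, size = time_deflate(frame, level)
    print(f"level {level}: {seconds * 1000:.1f} ms, {size} bytes")
```

Level 0 approximates the cost of an uncompressed output such as `grim -t ppm`, while the higher levels show how much time goes into deflate itself.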
* handlerug is building mesa to test the patch
<alyssa> not sure I recommend using that patch in general since it's had zero testing other than "grim is faster now" :p
<alyssa> but YMMV :p
<handlerug> eh I thought I'd see if it also worked on this machine
<alyssa> sure
<handlerug> weird how it's much faster on mac mini
<handlerug> thanks for the writeup and research!
<alyssa> what's weird about that?
renatorabelo has quit [Ping timeout: 480 seconds]
<alyssa> It's unclear whether blit-based texture transfer should be preferred on integrated GPUs... the gallium docs say blit-based for discrete GPUs and transfer-based for software rasterizers...
<alyssa> iris uses blit-based everywhere
<handlerug> I'm not sure what the key difference between M1 and M1 Max is that makes it faster
<alyssa> v3d uses blit-based but says they'd rather not, for glReadPixels perf, because of the uncached read afterwards... hm
<alyssa> zink uses blit, except for anv/radv where it uses compute
<alyssa> freedreno uses transfer
swaggie has joined #asahi-gpu
<alyssa> I think I want to map PIPE_USAGE_STAGING to writeback and everything else to write-combine and that should do what we want (including for readpix)
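The mapping proposed above can be sketched as a small lookup. This is an illustration only, not driver code: the `PipeUsage` members mirror gallium's `PIPE_USAGE_*` names, while `CacheMode` and `cache_mode_for` are hypothetical names invented here.

```python
from enum import Enum, auto


class PipeUsage(Enum):
    """Mirrors the names of gallium's PIPE_USAGE_* values."""
    DEFAULT = auto()
    IMMUTABLE = auto()
    DYNAMIC = auto()
    STREAM = auto()
    STAGING = auto()


class CacheMode(Enum):
    """Hypothetical CPU caching modes for a buffer allocation."""
    WRITE_COMBINE = auto()
    WRITEBACK = auto()


def cache_mode_for(usage: PipeUsage) -> CacheMode:
    """Staging resources get CPU-cached memory; everything else write-combine."""
    if usage is PipeUsage.STAGING:
        return CacheMode.WRITEBACK
    return CacheMode.WRITE_COMBINE


for usage in PipeUsage:
    print(f"{usage.name:>9} -> {cache_mode_for(usage).name}")
```

The reasoning from the discussion is that staging resources are the ones the CPU reads back from, so they benefit from cached memory, while resources the CPU mostly writes to can stay write-combine.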
bcrumb has joined #asahi-gpu
bcrumb has quit [Quit: WeeChat 3.7.1]
bcrumb has joined #asahi-gpu
bcrumb has quit []
dorkbutt has joined #asahi-gpu
dorkbutt has quit [Max SendQ exceeded]
swaggie has quit [Remote host closed the connection]
swaggie has joined #asahi-gpu
cr1901_ is now known as cr1901
alyssa has left #asahi-gpu [#asahi-gpu]
renatorabelo has joined #asahi-gpu
javier_varez__ has joined #asahi-gpu
zzywysm_ has joined #asahi-gpu
jbowen_ has joined #asahi-gpu
Method_ has joined #asahi-gpu
jbowen has quit [Ping timeout: 480 seconds]
zzywysm has quit [Read error: Connection reset by peer]
digicyc has quit [Remote host closed the connection]
javier_varez_ has quit [Ping timeout: 480 seconds]
javier_varez__ is now known as javier_varez_
jbowen_ has quit []
digicyc has joined #asahi-gpu
jbowen_ has joined #asahi-gpu
renato has quit [resistance.oftc.net larich.oftc.net]
Method has quit [resistance.oftc.net larich.oftc.net]
ids1024 has quit [resistance.oftc.net larich.oftc.net]
jbowen_ has quit [resistance.oftc.net larich.oftc.net]
ids1024 has joined #asahi-gpu
renato has joined #asahi-gpu
jbowen_ has joined #asahi-gpu
Method has joined #asahi-gpu
Method has quit [Ping timeout: 482 seconds]
renato has quit [Ping timeout: 482 seconds]
SSJ_GZ has quit [Ping timeout: 480 seconds]
ajxu2 has joined #asahi-gpu
ajxu2 has quit []
yuka has quit [Remote host closed the connection]
yuka has joined #asahi-gpu
manawyrm has quit [Quit: Read error: 2.99792458 x 10^8 meters/second (Excessive speed of light)]
manawyrm has joined #asahi-gpu