ChanServ changed the topic of #asahi-gpu to: Asahi Linux GPU development (no user support, NO binary reversing) | Keep things on topic | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Logs: https://alx.sh/l/asahi-gpu
possiblemeatball has joined #asahi-gpu
alyssa has joined #asahi-gpu
<alyssa> lina: I've squashed+rebased your explicit sync patches
<alyssa> (the mesa ones, obviously didn't touch the kernel)
<alyssa> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21538 has the subset that I think can be upstreamed
<alyssa> and agx/next has everything integrated
<alyssa> However, there are some regressions
<alyssa> 1. Even with ASAHI_MESA_DEBUG=sync, there are serious sync glitches with Firefox, at least when WebRender is used (as is the case on agx/next, didn't test without). This is a WSI issue: it happens in GNOME but not in Sway. It might also have to do with Xwayland vs native wayland.
<alyssa> At least with (explicit sync + Firefox + WebRender + GNOME), the rendered content is sometimes from a previous frame instead of a current frame, or at least partially so
<alyssa> The most obvious symptom is typing into some input fields and not seeing the most recently typed character
<alyssa> This is especially noticeable when typing into chat in discord.com
<alyssa> There are other visual artefacts with Firefox in GNOME, but I assume they have the same root cause
<alyssa> (and the glitching when typing is especially noticeable so probably easier to work with)
<alyssa> I'm unsure if {Plasma, Firefox without WebRender} are affected but IMO this regression is blocking even if it's not ... and unless this is a Firefox bug it seems likely it can be hit with things other than Firefox, this is just what I noticed on day #1 of testing
<alyssa> 2. For some reason one of the apitraces I have runs fine (and much faster I think!) in debugoptimized, but faults in release. I have not yet root caused this one. This is not blocking.
<alyssa> Once the Firefox regression is sorted, assuming no other regressions come up this week, I'll cut a new asahi/main release on top of the explicit sync UAPI and that should be good to go then
hightower2 has quit [Ping timeout: 480 seconds]
<alyssa> hm wait the apitrace might be from my sysval rewrite being broken
<alyssa> scratch #2
nela has quit [Ping timeout: 480 seconds]
<alyssa> can't reproduce now
<alyssa> maybe I screwed something up
<alyssa> oh well
<alyssa> Also, I am not convinced that it's necessary to sync in flush_resource
<alyssa> freedreno + explicit sync doesn't seem to do anything for flush resource
<alyssa> this is worth another 6% on manhattan
<alyssa> i think
<alyssa> oh but holy broken
<alyssa> so what other fence is freedreno using that we're not
* alyssa doesn't trust any of the fencing code
nela has joined #asahi-gpu
<alyssa> I'm especially suspicious of agx_fence_create
<alyssa> and all this ctx->syncobj stuff doesn't make much sense to me in an explicit sync world
<alyssa> maybe if I fix that, firefox will fix itself
<alyssa> this stuff is way too complicated
<alyssa> lina: Potential race, what happens if we flush (but do not sync) a batch writing resource A?
<alyssa> before that batch is completed, there is a second batch that reads resource A that gets flushed
<alyssa> Is there a possibility that the second batch reads the wrong contents of A?
<alyssa> It's submitted in the proper order but I don't see any fence/barrier between the two batches (i.e. in agx_flush_writer for batches that are submitted and not active)
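The hazard alyssa describes above (a reader batch submitted while the writer of the same resource is still in flight) can be sketched as a tiny dependency tracker. This is only an illustration under stated assumptions: `Batch`, `Tracker`, and `wait_on` are hypothetical names, not the asahi driver's data structures, and the sketch covers read-after-write and write-after-write only.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Batch:
    """Hypothetical batch record: the resources it reads and writes."""
    name: str
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)
    wait_on: list = field(default_factory=list)  # batches that must finish first


class Tracker:
    """Remembers the last submitted (possibly still running) writer per resource."""

    def __init__(self):
        self.last_writer = {}

    def submit(self, batch):
        # Anything that reads or writes a resource must wait for the batch
        # that last wrote it, even if that batch was flushed but not synced.
        for res in sorted(batch.reads | batch.writes):
            writer = self.last_writer.get(res)
            if writer is not None and writer not in batch.wait_on:
                batch.wait_on.append(writer)
        # Record this batch as the newest writer of everything it writes.
        for res in batch.writes:
            self.last_writer[res] = batch
        return batch


tracker = Tracker()
b1 = tracker.submit(Batch("batch1", writes={"A"}))
b2 = tracker.submit(Batch("batch2", reads={"A"}))
b3 = tracker.submit(Batch("batch3", reads={"B"}))
print([b.name for b in b2.wait_on])  # ['batch1']
print([b.name for b in b3.wait_on])  # []
```

Without the `wait_on` edge (or an equivalent barrier between submissions), nothing in this model would stop batch2 from sampling A before batch1 has finished writing it.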
<alyssa> honestly I don't understand how any of agx_flush/agx_fence_create work
<alyssa> and every driver I look at does something different and the docs are pretty vague
<alyssa> freedreno is probably closest to what we want, but
akspecs_ has joined #asahi-gpu
tertu2 has joined #asahi-gpu
nepeat_ has joined #asahi-gpu
codingkoopa3 has joined #asahi-gpu
Lightsword_ has joined #asahi-gpu
Z750 has quit [resistance.oftc.net larich.oftc.net]
djorz has quit [resistance.oftc.net larich.oftc.net]
tertu has quit [resistance.oftc.net larich.oftc.net]
akspecs has quit [resistance.oftc.net larich.oftc.net]
nepeat has quit [resistance.oftc.net larich.oftc.net]
wicastC has quit [resistance.oftc.net larich.oftc.net]
alyssa has quit [resistance.oftc.net larich.oftc.net]
hxliew has quit [resistance.oftc.net larich.oftc.net]
Lightsword has quit [resistance.oftc.net larich.oftc.net]
codingkoopa has quit [resistance.oftc.net larich.oftc.net]
hxliew has joined #asahi-gpu
wicastC has joined #asahi-gpu
Z750 has joined #asahi-gpu
alyssa has joined #asahi-gpu
<lina> alyssa: All batches are submitted with BARRIER_RENDER | BARRIER_COMPUTE, which tells the backend code to insert a fence on the last render and compute commands submitted (if any - the kernel will elide that if it already got a completion notification and it knows the GPU queue is idle, since at that point it doesn't even have event IDs attached so there is nothing to fence on)
<lina> On the other hand, we could elide that when we know we do *not* have such a dependency, which I was thinking is the case whenever we submit batches back to back (e.g. whenever we flush_all and there is more than one batch to flush, I think we can avoid that barrier for all but the first, since they're guaranteed not to have any dependencies, otherwise they would have been flushed already), but right now we don't
<lina> This is the "currently vertex and fragment are serialized" issue (if you remove those barriers, vertex for batch+1 can run concurrent with fragment for batch)
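The barrier elision lina is considering can be sketched as follows. The flag names BARRIER_RENDER and BARRIER_COMPUTE come from the discussion; their numeric values and the helper `barriers_for_flush_all` are assumptions made up for this illustration, and the rule itself is the idea floated above, not something the driver does today.

```python
# Hypothetical bit values; only the flag names appear in the discussion.
BARRIER_RENDER = 1 << 0
BARRIER_COMPUTE = 1 << 1


def barriers_for_flush_all(num_batches, elide=True):
    """Return the barrier flags for each batch flushed back to back."""
    full = BARRIER_RENDER | BARRIER_COMPUTE
    flags = []
    for i in range(num_batches):
        # Only the first batch can depend on earlier work. Later batches in
        # the same flush_all would already have been flushed if they had a
        # dependency, so (under this assumption) they need no barrier.
        flags.append(full if (i == 0 or not elide) else 0)
    return flags


current = barriers_for_flush_all(3, elide=False)  # every batch fenced
proposed = barriers_for_flush_all(3)              # only the first batch fenced
print(current, proposed)
```

With `elide=False` every submission is serialized against the previous one, which matches the "vertex and fragment are serialized" behaviour described above; with elision, vertex work for a later batch could overlap fragment work for an earlier one.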
<lina> Of course this is all cache issues aside. If we run into cache issues between back to back submissions themselves (like the CPU->GPU one I ran into) we probably need to set more bits in that VDM Barrier command...
<lina> I still don't know exactly what caches the firmware flushes, when, and how to control any of that. We still have at least 3 mystery bits of freedom in the submission commands.
<lina> As for the WSI, my understanding is that that ctx->syncobj is supposed to just track the last submitted batch, so that when WSI wants a barrier on "all past work" we just clone that.
<lina> But there is the whole attaching fences to dma-bufs story that I haven't looked at at all... I don't know how the mesa hooks for that work, if it isn't all taken care of by gallium already based on the existing fence support...
<alyssa> right.. ok
<alyssa> I'm not worried about optimizing those barriers(yet)
<alyssa> I am worried about the WSI story
<alyssa> and the firefox wonkiness suggests that not all is right in oz
<lina> Let me look for the DMA-BUF stuff...
<lina> Fingers crossed it's a magic cap we set and mesa does it for us ^^
<alyssa> hahahaha :(
<lina> I get the feeling what we're doing now is *sufficient* for correct WSI but may not be what is actually expected of us...
<lina> alyssa: Good news: it's in common WSI code! Bad news: it's in common *Vulkan* WSI code. ^^;;
<alyssa> Yeah, that's what I expected
<lina> Okay, now where do I plug this in for us...
kesslerd_ has joined #asahi-gpu
djorz has joined #asahi-gpu
<alyssa> We might be the first 100% explicit sync gallium driver
<alyssa> that's kinda neat :p
<lina> So do, like, other pure explicit sync drivers not exist? Because that is the only usage of this API anywhere in mesa...
<alyssa> No, I don't think they do
<lina> Oof...
<alyssa> I mean
<alyssa> It wasn't possible to have a pure explicit sync Linux (not Android) driver before the ioctls
<alyssa> and those landed this year
<alyssa> so any driver older than that necessarily has an implicit sync path for WSI
<lina> Right...
<alyssa> so it's just us and powervr
<alyssa> and they're not bothering with gl
<alyssa> and also are not a driver yet
<alyssa> I see DMA_BUF_IOCTL_EXPORT_SYNC_FILE and DMA_BUF_IOCTL_IMPORT_SYNC_FILE in there for their gallium driver
<lina> Yay! ^^
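For reference, the two ioctls alyssa names take a small `{ __u32 flags; __s32 fd; }` argument defined in the kernel's linux/dma-buf.h. The sketch below shows how they could be driven from userspace; it assumes the generic Linux ioctl number encoding (as on x86 and arm64), and the wrapper names `export_sync_file` and `import_sync_file` are hypothetical, not Mesa functions.

```python
import fcntl
import struct

# Assumption: generic ioctl encoding; a few architectures use other bit layouts.
_IOC_WRITE, _IOC_READ = 1, 2


def _ioc(direction, type_char, nr, size):
    """Build an ioctl request number the way the kernel's _IOC macro does."""
    return (direction << 30) | (size << 16) | (ord(type_char) << 8) | nr


# Layout shared by struct dma_buf_export_sync_file and dma_buf_import_sync_file.
SYNC_FILE_ARG = struct.Struct("=Ii")

DMA_BUF_SYNC_READ = 1 << 0
DMA_BUF_SYNC_WRITE = 1 << 1
DMA_BUF_IOCTL_EXPORT_SYNC_FILE = _ioc(_IOC_READ | _IOC_WRITE, "b", 2, SYNC_FILE_ARG.size)
DMA_BUF_IOCTL_IMPORT_SYNC_FILE = _ioc(_IOC_WRITE, "b", 3, SYNC_FILE_ARG.size)


def export_sync_file(dmabuf_fd, flags=DMA_BUF_SYNC_READ):
    """Return a sync_file fd for the fences currently attached to the dma-buf."""
    out = fcntl.ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE,
                      SYNC_FILE_ARG.pack(flags, -1))
    return SYNC_FILE_ARG.unpack(out)[1]


def import_sync_file(dmabuf_fd, sync_file_fd, flags=DMA_BUF_SYNC_WRITE):
    """Attach the fence behind sync_file_fd to the dma-buf as a read or write fence."""
    fcntl.ioctl(dmabuf_fd, DMA_BUF_IOCTL_IMPORT_SYNC_FILE,
                SYNC_FILE_ARG.pack(flags, sync_file_fd))


print(hex(DMA_BUF_IOCTL_EXPORT_SYNC_FILE), hex(DMA_BUF_IOCTL_IMPORT_SYNC_FILE))
```

An explicit sync driver would roughly export before using a shared buffer (to learn what to wait on) and import after submitting work that touches it (so implicit sync consumers see the fence).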
cylm has joined #asahi-gpu
possiblemeatball has quit [Quit: Quit]
kesslerd has quit [Remote host closed the connection]
kesslerd_ has quit [Remote host closed the connection]
kesslerd has joined #asahi-gpu
kesslerd has quit [Remote host closed the connection]
<lina> alyssa: Fixed the Firefox issue, it turns out we need to make resources shareable in flush_resource... so now it has to blit if not already shareable.
<lina> Also the ail stuff for multisampling compression is wrong, I threw in a random guess but that needs tests. That fixed WebGL Aquarium.
<lina> I also added some nice debugging stuff along the way and more asserts ^^
<lina> Wooooah I don't know if this is me removing the sync in flush_resource or something you did, but glmark2 just jumped up a ton. Like, from 4500 to 6600 (!!)
<lina> Wait and that's a debug build
<lina> Now hitting over 9000 FPS on some tests...
<lina> That's on GBM, a bit less on Wayland but the output looks legit... I don't think it's fast because it's broken...
<lina> ^^
<lina> Release build GBM score: 7252 ^^
bisko has joined #asahi-gpu
wixde has joined #asahi-gpu
bisko has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
nyilas has joined #asahi-gpu
chipxxx has joined #asahi-gpu
chip_x has quit [Ping timeout: 480 seconds]
chip_x has joined #asahi-gpu
chip__ has joined #asahi-gpu
chip_x has quit [Remote host closed the connection]
chipxxx has quit [Ping timeout: 480 seconds]
hightower2 has joined #asahi-gpu
wixde has quit [Ping timeout: 480 seconds]
ChaosPrincess has quit [Quit: WeeChat 3.8]
ChaosPrincess has joined #asahi-gpu
chip_x has joined #asahi-gpu
chip__ has quit [Ping timeout: 480 seconds]
possiblemeatball has joined #asahi-gpu
kesslerd has joined #asahi-gpu
kesslerd has quit []
chipxxx has joined #asahi-gpu
maria6 has joined #asahi-gpu
maria has quit [Ping timeout: 480 seconds]
maria6 is now known as maria
chip_x has quit [Ping timeout: 480 seconds]
chipxxx has quit [Remote host closed the connection]
chipxxx has joined #asahi-gpu
wixde has joined #asahi-gpu
Cromulent has joined #asahi-gpu
<lina> I still get some weird hangs though... I think something is still broken, probably in WSI ^^;;;
<lina> But at least xonotic now regularly reaches the default 250fps cap!
le0n has quit [Remote host closed the connection]
le0n has joined #asahi-gpu
bisko has joined #asahi-gpu
bisko has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
kesslerd has joined #asahi-gpu
kesslerd has quit []
bisko has joined #asahi-gpu
wixde has quit [Ping timeout: 480 seconds]
bisko has quit [Ping timeout: 480 seconds]
kesslerd has joined #asahi-gpu
wixde has joined #asahi-gpu
chip_x has joined #asahi-gpu
chipxxx has quit [Ping timeout: 480 seconds]
bluetail has quit [Quit: The Lounge - https://thelounge.chat]
bisko has joined #asahi-gpu
cy8aer has quit [Remote host closed the connection]
bisko has quit [Ping timeout: 480 seconds]
bluetail has joined #asahi-gpu
cylm has quit [Ping timeout: 480 seconds]
<alyssa> lina: Sounds like good progress :)
<alyssa> afaict there are a lot of multisampling issues
<alyssa> and I'm still not convinced there's not a register or two we're not yet exposing in the uapi so I was going to leave it to you to work through the deqps when you had some time
<alyssa> and glmark hangs my system now
<alyssa> so gonna say something regressed
<alyssa> but yknow
<alyssa> 2 steps fwd 1 step back
<alyssa> what the heck
<alyssa> no faults so what's up. an OOM maybe? idk
<alyssa> nothing in dmesg at the time of the hang, bizarre
<alyssa> only happening in gnome maybe
<alyssa> trex 225fps now
<alyssa> mh 85fps
<alyssa> so that's definitely better
<alyssa> worth another few fps on stk
<alyssa> btw the flush in agx_fence_create is redundant with the one in its only caller
<alyssa> that only interaction still makes me confused tbh
<alyssa> fixing MAX_BATCHES brings mh up to 125fps
<alyssa> ooh and now t-rex is faulting that's neat
<alyssa> or hanging the system
<alyssa> yeah i'm going to say this isn't quite ready yet :)
<alyssa> hopefully the same root cause as your weird hangs
<alyssa> few more fps on stk
<alyssa> getting a lot of random pink rectangles.
Cromulent has quit [Quit: Connection closed for inactivity]
possiblemeatball has quit [Quit: Quit]
bisko has joined #asahi-gpu
possiblemeatball has joined #asahi-gpu
bisko has quit [Ping timeout: 480 seconds]
Dementor has quit [Read error: Connection reset by peer]
Dementor has joined #asahi-gpu
bisko has joined #asahi-gpu
bisko has quit [Ping timeout: 480 seconds]
bisko has joined #asahi-gpu
bisko has quit [Ping timeout: 480 seconds]
nyilas has quit [Remote host closed the connection]
bisko has joined #asahi-gpu
wixde has quit [Read error: Connection reset by peer]
bisko has quit [Ping timeout: 480 seconds]
chipxxx has joined #asahi-gpu
bluetail has quit [Quit: The Lounge - https://thelounge.chat]
chip_x has quit [Ping timeout: 480 seconds]
chipxxx has quit [Remote host closed the connection]
chipxxx has joined #asahi-gpu
pbsds has quit [Quit: The Lounge - https://thelounge.chat]
pbsds has joined #asahi-gpu
bluetail has joined #asahi-gpu
FLHerne_ has joined #asahi-gpu
grange_c68 has joined #asahi-gpu
JoshuaAshton has quit [Ping timeout: 480 seconds]
qdot has quit [Read error: Connection reset by peer]
qdot has joined #asahi-gpu
akemin_dayo has quit [Remote host closed the connection]
JoshuaAshton has joined #asahi-gpu
FLHerne has quit [Read error: Connection reset by peer]
FLHerne_ is now known as FLHerne
grange_c6 has quit [Write error: connection closed]
grange_c68 is now known as grange_c6
akemin_dayo has joined #asahi-gpu
<alyssa> lina: [ 157.505449] apple-mailbox 206408000.mbox: Try increasing MBOX_TX_QUEUE_LEN
<alyssa> [ 157.512331] asahi: WorkQueue: Job::submit() out of order (submit_seq 22491 != 22493)
<alyssa> that doesn't sound great
<alyssa> MESA: warning: [Batch 8] Render (pending): TVB 0/ 0 bytes (0 ovf) | vtx 0.000000 frag 0.000000
<alyssa> Thrive: ../src/gallium/drivers/asahi/agx_batch.c:228: agx_batch_print_stats: Assertion `info->status == DRM_ASAHI_STATUS_COMPLETE' failed.
<alyssa> this is from running
<alyssa> $ MESA_GL_VERSION_OVERRIDE=3.3 ASAHI_MESA_DEBUG=deqp DISPLAY=:0 apitrace replay --loop=1000 ~/Downloads/godot-thrive.trace
<alyssa> and branch agx/explicit-sync-wsi-re
<alyssa> with =sync, it works but with lower perf
<sven> congrats, sounds like you also broke mailbox :D
<alyssa> sven: Woo!
<sven> the only other time I managed to run into that “increase MBOX_TX_QUEUE_LEN” was with DCP
<alyssa> yeet
yrlf has quit [Quit: The Lounge - https://thelounge.chat]
yrlf has joined #asahi-gpu
rowanG337 has joined #asahi-gpu
possiblemeatball has quit [Quit: Quit]
stipa is now known as Guest6002
stipa has joined #asahi-gpu
Guest6002 has quit [Ping timeout: 480 seconds]