#asahi-gpu on 2023-02-26 — irc logs at oftc.irclog.whitequark.org

2022-12-21 00:46 ChanServ changed the topic of #asahi-gpu to: Asahi Linux GPU development (no user support, NO binary reversing) | Keep things on topic | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Logs: https://alx.sh/l/asahi-gpu

00:04 possiblemeatball has joined #asahi-gpu

00:32 alyssa has joined #asahi-gpu

00:32 <alyssa> lina: I've squashed+rebased your explicit sync patches

00:32 <alyssa> (the mesa ones, obviously didn't touch the kernel)

00:39 <alyssa> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21538 has the subset that I think can be upstreamed

00:40 <alyssa> and agx/next has everything integrated

00:40 <alyssa> However, there are some regressions

00:42 <alyssa> 1. Even with ASAHI_MESA_DEBUG=sync, there are serious sync glitches with Firefox, at least when WebRender is used (as is the case on agx/next, didn't test without). This is a WSI issue: it happens in GNOME but not in Sway. It might also have to do with Xwayland vs native wayland.

00:43 <alyssa> At least with (explicit sync + Firefox + WebRender + GNOME), the rendered content is sometimes from a previous frame instead of a current frame, or at least partially so

00:43 <alyssa> The most obvious symptom is typing into some input fields and not seeing the most recently typed character

00:44 <alyssa> This is especially noticeable when typing into chat in discord.com

00:44 <alyssa> There are other visual artefacts with Firefox in GNOME, but I assume they have the same root cause

00:45 <alyssa> (and the glitching when typing is especially noticeable so probably easier to work with)

00:46 <alyssa> I'm unsure if {Plasma, Firefox without WebRender} are affected but IMO this regression is blocking even if it's not ... and unless this is a Firefox bug it seems likely it can be hit with things other than Firefox, this is just what I noticed on day #1 of testing

00:47 <alyssa> 2. For some reason one of the apitraces I have runs fine (and much faster I think!) in debugoptimized, but faults in release. I have not yet root caused this one. This is not blocking.

00:49 <alyssa> Once the Firefox regression is sorted, assuming no other regressions come up this week, I'll cut a new asahi/main release on top of the explicit sync UAPI and that should be good to go then

00:58 hightower2 has quit [Ping timeout: 480 seconds]

00:59 <alyssa> hm wait the apitrace might be from my sysval rewrite being broken

00:59 <alyssa> scratch #2

01:16 nela has quit [Ping timeout: 480 seconds]

01:19 <alyssa> can't reproduce now

01:19 <alyssa> maybe I screwed something up

01:19 <alyssa> oh well

01:26 <alyssa> Also, I am not convinced that it's necessary to sync in flush_resource

01:26 <alyssa> freedreno + explicit sync doesn't seem to do anything for flush_resource

01:26 <alyssa> this is worth another 6% on manhattan

01:26 <alyssa> i think

01:26 <alyssa> oh but holy broken

01:27 <alyssa> so what other fence is freedreno using that we're not

01:36 * alyssa doesn't trust any of the fencing code

01:36 nela has joined #asahi-gpu

01:38 <alyssa> I'm especially suspicious of agx_fence_create

01:39 <alyssa> and all this ctx->syncobj stuff doesn't make much sense to me in an explicit sync world

01:39 <alyssa> maybe if I fix that, firefox will fix itself

01:41 <alyssa> this stuff is way too complicated

01:52 <alyssa> lina: Potential race, what happens if we flush (but do not sync) a batch writing resource A?

01:53 <alyssa> before that batch is completed, there is a second batch that reads resource A that gets flushed

01:53 <alyssa> Is there a possibility that the second batch reads the wrong contents of A?

01:54 <alyssa> It's submitted in the proper order but I don't see any fence/barrier between the two batches (i.e. in agx_flush_writer for batches that are submitted and not active)

01:59 <alyssa> honestly I don't understand how any of agx_flush/agx_fence_create work

01:59 <alyssa> and every driver I look at does something different and the docs are pretty vague

01:59 <alyssa> freedreno is probably closest to what we want, but

02:03 akspecs_ has joined #asahi-gpu

02:04 tertu2 has joined #asahi-gpu

02:04 nepeat_ has joined #asahi-gpu

02:05 codingkoopa3 has joined #asahi-gpu

02:06 Lightsword_ has joined #asahi-gpu

02:07 Z750 has quit [resistance.oftc.net larich.oftc.net]

02:07 djorz has quit [resistance.oftc.net larich.oftc.net]

02:07 tertu has quit [resistance.oftc.net larich.oftc.net]

02:07 akspecs has quit [resistance.oftc.net larich.oftc.net]

02:07 nepeat has quit [resistance.oftc.net larich.oftc.net]

02:07 wicastC has quit [resistance.oftc.net larich.oftc.net]

02:07 alyssa has quit [resistance.oftc.net larich.oftc.net]

02:07 hxliew has quit [resistance.oftc.net larich.oftc.net]

02:07 Lightsword has quit [resistance.oftc.net larich.oftc.net]

02:07 codingkoopa has quit [resistance.oftc.net larich.oftc.net]

02:07 hxliew has joined #asahi-gpu

02:08 wicastC has joined #asahi-gpu

02:10 Z750 has joined #asahi-gpu

02:11 alyssa has joined #asahi-gpu

02:14 <lina> alyssa: All batches are submitted with BARRIER_RENDER | BARRIER_COMPUTE, which tells the backend code to insert a fence on the last render and compute commands submitted (if any - the kernel will elide that if it already got a completion notification and it knows the GPU queue is idle, since at that point it doesn't even have event IDs attached so there is nothing to fence on)

02:15 <lina> On the other hand, we could elide that when we know we do *not* have such a dependency, which I was thinking is the case whenever we submit batches back to back (e.g. whenever we flush_all and there is more than one batch to flush, I think we can avoid that barrier for all but the first, since they're guaranteed not to have any dependencies, otherwise they would have been flushed already), but right now we don't

02:16 <lina> This is the "currently vertex and fragment are serialized" issue (if you remove those barriers, vertex for batch+1 can run concurrent with fragment for batch)
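
The serialization lina describes can be pictured with a toy timing model. This is only a sketch under loud assumptions: the flag names echo the log but their values are invented, the `Batch` and `simulate` names are hypothetical, compute work is not modeled, and the real firmware scheduling is not known to behave exactly like this.

```python
from dataclasses import dataclass

# Flag names echo the log; the numeric values are invented for this sketch.
BARRIER_RENDER = 1 << 0
BARRIER_COMPUTE = 1 << 1  # kept for completeness; compute is not modeled here


@dataclass
class Batch:
    vertex_time: int
    fragment_time: int
    barriers: int = BARRIER_RENDER | BARRIER_COMPUTE


def simulate(batches):
    """Return (vertex_start, fragment_end) per batch on a toy two-stage GPU."""
    vertex_free = 0    # time at which the vertex stage becomes idle
    fragment_free = 0  # time at which the fragment stage becomes idle
    timeline = []
    for batch in batches:
        vertex_start = vertex_free
        if batch.barriers & BARRIER_RENDER:
            # Fence on the last render command: wait for the previous
            # batch's fragment work before starting this batch's vertex work.
            vertex_start = max(vertex_start, fragment_free)
        vertex_end = vertex_start + batch.vertex_time
        # Fragment work needs this batch's vertex output and a free stage.
        fragment_start = max(vertex_end, fragment_free)
        fragment_end = fragment_start + batch.fragment_time
        vertex_free, fragment_free = vertex_end, fragment_end
        timeline.append((vertex_start, fragment_end))
    return timeline


# Two identical batches, first with barriers on both, then with the
# barrier removed from the second one.
serialized = simulate([Batch(1, 3), Batch(1, 3)])
overlapped = simulate([Batch(1, 3), Batch(1, 3, barriers=0)])
```

In this model the barrier keeps the second batch's vertex work from starting until the first batch's fragment work is done, which is safe if the second batch reads what the first wrote but costs overlap when it does not.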

02:17 <lina> Of course this is all cache issues aside. If we run into cache issues between back to back submissions themselves (like the CPU->GPU one I ran into) we probably need to set more bits in that VDM Barrier command...

02:17 <lina> I still don't know exactly what caches the firmware flushes, when, and how to control any of that. We still have at least 3 mystery bits of freedom in the submission commands.

02:19 <lina> As for the WSI, my understanding is that that ctx->syncobj is supposed to just track the last submitted batch, so that when WSI wants a barrier on "all past work" we just clone that.
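
A minimal sketch of the "track the last submitted batch and clone it" idea, assuming a queue that completes work strictly in submission order. The `Queue`, `Fence`, and `Context` classes are hypothetical stand-ins, not the driver's types; `last_fence` plays the role that ctx->syncobj has in the log.

```python
import copy


class Queue:
    """Toy in-order queue: batches complete in submission order (assumption)."""

    def __init__(self):
        self.submitted = 0
        self.completed = 0

    def submit(self):
        self.submitted += 1
        return self.submitted  # sequence number of the new batch

    def complete_one(self):
        if self.completed < self.submitted:
            self.completed += 1


class Fence:
    def __init__(self, queue, seqno):
        self.queue = queue
        self.seqno = seqno

    def signaled(self):
        # In-order completion means this also covers every earlier batch.
        return self.queue.completed >= self.seqno


class Context:
    def __init__(self, queue):
        self.queue = queue
        self.last_fence = None  # stand-in for ctx->syncobj

    def flush_batch(self):
        self.last_fence = Fence(self.queue, self.queue.submit())

    def fence_for_past_work(self):
        # Clone so the caller owns a handle that later flushes cannot change.
        return copy.copy(self.last_fence)


queue = Queue()
ctx = Context(queue)
for _ in range(3):
    ctx.flush_batch()
snapshot = ctx.fence_for_past_work()  # covers batches 1..3
ctx.flush_batch()                     # batch 4, not covered by snapshot
for _ in range(3):
    queue.complete_one()
```

The snapshot signals once the first three batches finish, while a fence taken after the fourth flush does not. If the hardware could complete batches out of order, a single "last" fence would no longer be enough.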

02:19 <lina> But there is the whole attaching fences to dma-bufs story that I haven't looked at at all... I don't know how the mesa hooks for that work, if it isn't all taken care of by gallium already based on the existing fence support...

02:20 <alyssa> right.. ok

02:20 <alyssa> I'm not worried about optimizing those barriers (yet)

02:20 <alyssa> I am worried about the WSI story

02:21 <alyssa> and the firefox wonkiness suggests that not all is right in oz

02:21 <lina> Let me look for the DMA-BUF stuff...

02:21 <lina> Fingers crossed it's a magic cap we set and mesa does it for us ^^

02:22 <alyssa> hahahaha :(

02:22 <lina> I get the feeling what we're doing now is *sufficient* for correct WSI but may not be what is actually expected of us...

02:24 <lina> alyssa: Good news: it's in common WSI code! Bad news: it's in common *Vulkan* WSI code. ^^;;

02:24 <alyssa> Yeah, that's what I expected

02:24 <lina> Okay, now where do I plug this in for us...

02:24 kesslerd_ has joined #asahi-gpu

02:24 djorz has joined #asahi-gpu

02:25 <alyssa> We might be the first 100% explicit sync gallium driver

02:25 <alyssa> that's kinda neat :p

02:25 <lina> So do, like, other pure explicit sync drivers not exist? Because that is the only usage of this API anywhere in mesa...

02:25 <alyssa> No, I don't think they do

02:25 <lina> Oof...

02:25 <alyssa> I mean

02:26 <alyssa> It wasn't possible to have a pure explicit sync Linux (not Android) driver before the ioctls

02:26 <alyssa> and those landed this year

02:26 <alyssa> so any driver older than that necessarily has an implicit sync path for WSI

02:26 <lina> Right...

02:27 <alyssa> so it's just us and powervr

02:27 <alyssa> and they're not bothering with gl

02:27 <alyssa> and also are not a driver yet

02:27 <alyssa> lina: Good news for you: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20418.patch

02:28 <alyssa> I see DMA_BUF_IOCTL_EXPORT_SYNC_FILE and DMA_BUF_IOCTL_IMPORT_SYNC_FILE in there for their gallium driver
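
For reference, a hedged sketch of what calling those two ioctls from userspace could look like. The request numbers are rebuilt from the definitions in the kernel's linux/dma-buf.h using the generic `_IOC` encoding (as on x86 and arm64; a few other architectures encode differently), and the struct layout (`__u32 flags; __s32 fd;`) is taken from the same header. The helper names are hypothetical and nothing here is taken from the linked merge request.

```python
import fcntl
import struct

# Generic Linux ioctl encoding: dir << 30 | size << 16 | type << 8 | nr.
_IOC_WRITE = 1
_IOC_READ = 2


def _ioc(direction, ioc_type, nr, size):
    return (direction << 30) | (size << 16) | (ord(ioc_type) << 8) | nr


# Both structs are { __u32 flags; __s32 fd; }.
SYNC_FILE_ARG = "Ii"
SYNC_FILE_ARG_SIZE = struct.calcsize(SYNC_FILE_ARG)

DMA_BUF_SYNC_READ = 1 << 0
DMA_BUF_SYNC_WRITE = 2 << 0

DMA_BUF_IOCTL_EXPORT_SYNC_FILE = _ioc(
    _IOC_READ | _IOC_WRITE, "b", 2, SYNC_FILE_ARG_SIZE)
DMA_BUF_IOCTL_IMPORT_SYNC_FILE = _ioc(
    _IOC_WRITE, "b", 3, SYNC_FILE_ARG_SIZE)


def export_sync_file(dmabuf_fd, flags=DMA_BUF_SYNC_READ):
    """Snapshot the dma-buf's implicit fences as a sync_file fd."""
    arg = bytearray(struct.pack(SYNC_FILE_ARG, flags, -1))
    fcntl.ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, arg)
    _, sync_fd = struct.unpack(SYNC_FILE_ARG, arg)
    return sync_fd


def import_sync_file(dmabuf_fd, sync_fd, flags=DMA_BUF_SYNC_WRITE):
    """Attach an explicit fence (a sync_file fd) to the dma-buf."""
    arg = struct.pack(SYNC_FILE_ARG, flags, sync_fd)
    fcntl.ioctl(dmabuf_fd, DMA_BUF_IOCTL_IMPORT_SYNC_FILE, arg)
```

Roughly, a driver with no implicit sync of its own would export before using a shared buffer, to learn what it must wait for, and import after submitting work, so that implicit-sync consumers wait for that work.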

02:28 <lina> Yay! ^^

03:40 cylm has joined #asahi-gpu

05:05 possiblemeatball has quit [Quit: Quit]

05:17 kesslerd has quit [Remote host closed the connection]

05:18 kesslerd_ has quit [Remote host closed the connection]

05:19 kesslerd has joined #asahi-gpu

05:34 kesslerd has quit [Remote host closed the connection]

07:09 <lina> alyssa: Fixed the Firefox issue, it turns out we need to make resources shareable in flush_resource... so now it has to blit if not already shareable.

07:09 <lina> Also the ail stuff for multisampling compression is wrong, I threw in a random guess but that needs tests. That fixed WebGL Aquarium.

07:10 <lina> I also added some nice debugging stuff along the way and more asserts ^^

07:16 <lina> Wooooah I don't know if this is me removing the sync in flush_resource or something you did, but glmark2 just jumped up a ton. Like, from 4500 to 6600 (!!)

07:17 <lina> Wait and that's a debug build

07:18 <lina> Now hitting over 9000 FPS on some tests...

07:21 <lina> That's on GBM, a bit less on Wayland but the output looks legit... I don't think it's fast because it's broken...

07:21 <lina> ^^

07:29 <lina> Release build GBM score: 7252 ^^

09:48 bisko has joined #asahi-gpu

10:08 wixde has joined #asahi-gpu

10:25 bisko has quit [Quit: My Mac has gone to sleep. ZZZzzz…]

10:42 nyilas has joined #asahi-gpu

10:45 chipxxx has joined #asahi-gpu

10:50 chip_x has quit [Ping timeout: 480 seconds]

10:54 chip_x has joined #asahi-gpu

10:57 chip__ has joined #asahi-gpu

10:58 chip_x has quit [Remote host closed the connection]

11:00 chipxxx has quit [Ping timeout: 480 seconds]

11:38 hightower2 has joined #asahi-gpu

11:42 wixde has quit [Ping timeout: 480 seconds]

11:45 ChaosPrincess has quit [Quit: WeeChat 3.8]

11:46 ChaosPrincess has joined #asahi-gpu

11:46 chip_x has joined #asahi-gpu

11:52 chip__ has quit [Ping timeout: 480 seconds]

13:11 possiblemeatball has joined #asahi-gpu

13:26 kesslerd has joined #asahi-gpu

13:28 kesslerd has quit []

13:35 chipxxx has joined #asahi-gpu

13:39 maria6 has joined #asahi-gpu

13:39 maria has quit [Ping timeout: 480 seconds]

13:39 maria6 is now known as maria

13:40 chip_x has quit [Ping timeout: 480 seconds]

14:37 chipxxx has quit [Remote host closed the connection]

14:37 chipxxx has joined #asahi-gpu

14:40 wixde has joined #asahi-gpu

14:44 Cromulent has joined #asahi-gpu

15:15 <lina> I still get some weird hangs though... I think something is still broken, probably in WSI ^^;;;

15:16 <lina> But at least xonotic now regularly reaches the default 250fps cap!

15:20 le0n has quit [Remote host closed the connection]

15:21 le0n has joined #asahi-gpu

15:35 bisko has joined #asahi-gpu

15:42 bisko has quit [Quit: My Mac has gone to sleep. ZZZzzz…]

15:43 kesslerd has joined #asahi-gpu

15:44 kesslerd has quit []

15:47 bisko has joined #asahi-gpu

15:53 wixde has quit [Ping timeout: 480 seconds]

15:55 bisko has quit [Ping timeout: 480 seconds]

15:55 kesslerd has joined #asahi-gpu

15:56 wixde has joined #asahi-gpu

16:08 chip_x has joined #asahi-gpu

16:14 chipxxx has quit [Ping timeout: 480 seconds]

16:53 bluetail has quit [Quit: The Lounge - https://thelounge.chat]

16:56 bisko has joined #asahi-gpu

16:59 cy8aer has quit [Remote host closed the connection]

17:04 bisko has quit [Ping timeout: 480 seconds]

17:11 bluetail has joined #asahi-gpu

17:13 cylm has quit [Ping timeout: 480 seconds]

17:17 <alyssa> lina: Sounds like good progress :)

17:17 <alyssa> afaict there are a lot of multisampling issues

17:17 <alyssa> and I'm still not convinced there's not a register or two we're not yet exposing in the uapi so I was going to leave it to you to work through the deqps when you had some time

17:39 <alyssa> and glmark hangs my system now

17:39 <alyssa> so gonna say something regressed

17:40 <alyssa> but yknow

17:40 <alyssa> 2 steps fwd 1 step back

17:41 <alyssa> what the heck

17:41 <alyssa> no faults so what's up. an OOM maybe? idk

17:42 <alyssa> nothing in dmesg at the time of the hang, bizarre

17:43 <alyssa> only happening in gnome maybe

17:44 <alyssa> trex 225fps now

17:45 <alyssa> mh 85fps

17:45 <alyssa> so that's definitely better

17:47 <alyssa> worth another few fps on stk

17:51 <alyssa> btw the flush in agx_fence_create is redundant with the one in its only caller

17:51 <alyssa> that only interaction still makes me confused tbh

17:54 <alyssa> fixing MAX_BATCHES brings mh up to 125fps

17:56 <alyssa> ooh and now t-rex is faulting that's neat

17:56 <alyssa> or hanging the system

17:56 <alyssa> yeah i'm going to say this isn't quite ready yet :)

17:57 <alyssa> hopefully the same root cause as your weird hangs

18:03 <alyssa> few more fps on stk

18:04 <alyssa> getting a lot of random pink rectangles.

18:31 Cromulent has quit [Quit: Connection closed for inactivity]

18:34 possiblemeatball has quit [Quit: Quit]

19:04 bisko has joined #asahi-gpu

19:08 possiblemeatball has joined #asahi-gpu

19:13 bisko has quit [Ping timeout: 480 seconds]

19:35 Dementor has quit [Read error: Connection reset by peer]

19:35 Dementor has joined #asahi-gpu

20:01 bisko has joined #asahi-gpu

20:09 bisko has quit [Ping timeout: 480 seconds]

20:18 bisko has joined #asahi-gpu

20:26 bisko has quit [Ping timeout: 480 seconds]

20:49 nyilas has quit [Remote host closed the connection]

20:52 bisko has joined #asahi-gpu

20:58 wixde has quit [Read error: Connection reset by peer]

21:00 bisko has quit [Ping timeout: 480 seconds]

21:18 chipxxx has joined #asahi-gpu

21:23 bluetail has quit [Quit: The Lounge - https://thelounge.chat]

21:24 chip_x has quit [Ping timeout: 480 seconds]

21:32 chipxxx has quit [Remote host closed the connection]

21:32 chipxxx has joined #asahi-gpu

21:42 pbsds has quit [Quit: The Lounge - https://thelounge.chat]

21:43 pbsds has joined #asahi-gpu

21:58 bluetail has joined #asahi-gpu

22:04 FLHerne_ has joined #asahi-gpu

22:05 grange_c68 has joined #asahi-gpu

22:05 JoshuaAshton has quit [Ping timeout: 480 seconds]

22:05 qdot has quit [Read error: Connection reset by peer]

22:05 qdot has joined #asahi-gpu

22:05 akemin_dayo has quit [Remote host closed the connection]

22:05 JoshuaAshton has joined #asahi-gpu

22:05 FLHerne has quit [Read error: Connection reset by peer]

22:06 FLHerne_ is now known as FLHerne

22:06 grange_c6 has quit [Write error: connection closed]

22:06 grange_c68 is now known as grange_c6

22:06 akemin_dayo has joined #asahi-gpu

22:17 <alyssa> lina: [ 157.505449] apple-mailbox 206408000.mbox: Try increasing MBOX_TX_QUEUE_LEN

22:17 <alyssa> [ 157.512331] asahi: WorkQueue: Job::submit() out of order (submit_seq 22491 != 22493)

22:17 <alyssa> that doesn't sound great

22:17 <alyssa> MESA: warning: [Batch 8] Render (pending): TVB 0/ 0 bytes (0 ovf) | vtx 0.000000 frag 0.000000

22:17 <alyssa> Thrive: ../src/gallium/drivers/asahi/agx_batch.c:228: agx_batch_print_stats: Assertion `info->status == DRM_ASAHI_STATUS_COMPLETE' failed.

22:17 <alyssa> this is from running

22:17 <alyssa> $ MESA_GL_VERSION_OVERRIDE=3.3 ASAHI_MESA_DEBUG=deqp DISPLAY=:0 apitrace replay --loop=1000 ~/Downloads/godot-thrive.trace

22:17 <alyssa> with the trace here https://gitlab.freedesktop.org/gfx-ci/tracie/traces-db/-/blob/master/godot/godot-thrive.trace

22:18 <alyssa> and branch agx/explicit-sync-wsi-re

22:19 <alyssa> with =sync, it works but with lower perf

22:21 <sven> congrats, sounds like you also broke mailbox :D

22:21 <alyssa> sven: Woo!

22:22 <sven> the only other time I managed to run into that “increase MBOX_TX_QUEUE_LEN” was with DCP

22:47 <alyssa> yeet

23:00 yrlf has quit [Quit: The Lounge - https://thelounge.chat]

23:01 yrlf has joined #asahi-gpu

23:31 rowanG337 has joined #asahi-gpu

23:40 possiblemeatball has quit [Quit: Quit]

23:53 stipa is now known as Guest6002

23:53 stipa has joined #asahi-gpu

23:55 Guest6002 has quit [Ping timeout: 480 seconds]