#asahi-gpu on 2023-02-15 — irc logs at oftc.irclog.whitequark.org

2022-12-21 00:46 ChanServ changed the topic of #asahi-gpu to: Asahi Linux GPU development (no user support, NO binary reversing) | Keep things on topic | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Logs: https://alx.sh/l/asahi-gpu

00:10 seeeath has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

00:13 seeeath has joined #asahi-gpu

00:15 seeeath has quit []

00:16 kesslerd has joined #asahi-gpu

00:16 seeeath has joined #asahi-gpu

01:08 hightower4 has quit [Remote host closed the connection]

01:08 hightower4 has joined #asahi-gpu

01:29 hightower4 has quit [Ping timeout: 480 seconds]

01:31 seeeath has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

01:34 seeeath has joined #asahi-gpu

02:03 kesslerd has quit [Quit: Leaving]

02:17 seeeath has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

02:47 seeeath has joined #asahi-gpu

03:01 seeeath has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

03:26 <alyssa> lina: Any chance I could get some pre-stream code review? *sweats*

03:41 seeeath has joined #asahi-gpu

03:41 seeeath has quit []

03:57 seeeath has joined #asahi-gpu

04:28 stipa has quit [Read error: Connection reset by peer]

04:29 stipa has joined #asahi-gpu

05:20 seeeath has quit [Read error: Connection reset by peer]

05:21 seeeath has joined #asahi-gpu

05:23 seeeath_ has joined #asahi-gpu

05:26 seeeath__ has joined #asahi-gpu

05:29 seeeath has quit [Ping timeout: 480 seconds]

05:30 seeeath has joined #asahi-gpu

05:31 seeeath_ has quit [Ping timeout: 480 seconds]

05:35 seeeath__ has quit [Read error: Connection reset by peer]

05:35 seeeath_ has joined #asahi-gpu

05:40 seeeath has quit [Ping timeout: 480 seconds]

05:52 seeeath has joined #asahi-gpu

05:53 seeeath__ has joined #asahi-gpu

05:55 seeeath_ has quit [Read error: Connection reset by peer]

05:55 seeeath__ has quit [Read error: Connection reset by peer]

05:58 seeeath_ has joined #asahi-gpu

06:00 seeeath has quit [Ping timeout: 480 seconds]

07:11 WindowPa- has joined #asahi-gpu

07:11 WindowPain has quit [Read error: Connection reset by peer]

07:51 MajorBiscuit has joined #asahi-gpu

08:44 gorbypark has joined #asahi-gpu

08:45 gorbypark has quit [Quit: Konversation terminated!]

08:45 gorbypark has joined #asahi-gpu

08:51 gorbypark has quit [Quit: Konversation terminated!]

10:07 <lina> alyssa: This is triggering on the EGL testsuite: Assertion `maxx > minx && maxy > miny' failed

10:07 <lina> dEQP-EGL.functional.swap_buffers_with_damage.resize_after_swap.buffer_age_render

11:07 cylm has quit [Read error: Connection reset by peer]

11:58 <daniels> lina: which winsys?

12:00 MajorBiscuit has quit [Ping timeout: 480 seconds]

12:11 <lina> daniels: x11/egl (I think that's what the dEQP build picked by default?)

12:18 <daniels> ah yes

12:19 <daniels> try either skipping that subset of tests, or EGL_PLATFORM=surfaceless

12:20 <daniels> the latter is good anyway if you don't want your running session to get trashed

12:30 MajorBiscuit has joined #asahi-gpu

12:46 possiblemeatball has joined #asahi-gpu

12:52 <alyssa> lina: K, will look when I get a chance

12:52 <alyssa> (unlikely to be today)

13:06 seeeath_ has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

13:19 alyssa has quit [Quit: leaving]

13:33 <lina> alyssa: With that commented out, we pass all the EGL tests except robust_gl_3* context creation stuff which sounds not applicable... ^^

13:33 <lina> I did fix some more threading bugs ^^

13:53 kesslerd has joined #asahi-gpu

14:02 cylm has joined #asahi-gpu

14:09 Cyrinux9 has quit []

14:09 Cyrinux9 has joined #asahi-gpu

14:20 kesslerd has quit [Ping timeout: 480 seconds]

15:11 seeeath has joined #asahi-gpu

15:18 <lina> 800x600 surfaceless glmark score: 4496

15:26 <lina> Pushed all the explicit sync stuff to gpu/explicit-sync (kernel) and agx/explicit-sync (mesa), seems to work well ^^

15:32 <lina> One little thing: sometimes you get GPU faults when killing things. Since everything is async now and BOs are tracked by userspace, if userspace dies in-progress GPU operations lose their BOs. The driver itself keeps all the critical structures alive until the jobs complete/fail, so nothing breaks, you just get GPU fault logs.

15:38 alyssa has joined #asahi-gpu

15:38 <alyssa> lina: niiiiice :)

15:39 <alyssa> thanks for the heads up about exit faults

15:39 <alyssa> as long as the fault handling is nice and robust that sounds fine :)

15:40 <lina> ^^

15:40 <lina> I'm still not sure if the firmware stuff is really 100% robust yet, we'll find out when I turn on timestamps again since that made it really obvious in the past.

15:40 <alyssa> heh

15:41 <lina> Also perf works again. It was an upstream regression (on all ARM!), jannau found it while I was busy with mesa ^^

15:42 <alyssa> ahahahah got it nice

15:42 <alyssa> sounds like I got a kernel upgrade to do

15:42 <alyssa> will this make my supertuxkart go brr

15:42 <lina> I'm regularly hitting 200+ FPS on xonotic now at least ^^

15:43 <alyssa> yeah that's a nice bump

15:43 <alyssa> a few more fps is good ;0

15:44 <jannau> no, just on all arm without the standard arm PMUv3

15:44 <lina> Any ideas on how to land this? Most of the changes are in general code, but then end up talking to syncobjs of course. At this point we probably want to drop macOS first?

15:45 <lina> jannau: Oh, I thought it was more general. So basically just Apple then?

15:46 <jannau> yes. apple, x-gene and thunderx2

15:47 <lina> alyssa: I also did some perf opt on the kernel side, there was some easy low-hanging CPU fruit with array initialization that gave me an extra 10% or so on glmark (for 4000+ FPS workloads, won't do as much for things that are more GPU-limited)

15:48 <alyssa> nice

15:48 <lina> At this point I think the next biggest submission overhead thing is just memory allocs, so I probably want to add some general mechanism to pool together random unrelated allocs for a single submission. But that can probably wait a bit.
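
One way to read "pool together random unrelated allocs for a single submission" is a bump-pointer arena: a single backing allocation per submission, carved into aligned slices. The sketch below only illustrates that idea in Python. It is not the kernel driver's code, `SubmissionArena` is a hypothetical name, and the 8-byte default alignment is an assumption.

```python
class SubmissionArena:
    """Toy arena: one backing allocation shared by all of a submission's small allocs."""

    def __init__(self, size):
        self.backing = bytearray(size)  # the single real allocation
        self.offset = 0                 # next free byte

    def alloc(self, nbytes, align=8):
        """Return a writable slice of the backing store; `align` must be a power of two."""
        start = (self.offset + align - 1) & ~(align - 1)  # round up to the alignment
        end = start + nbytes
        if end > len(self.backing):
            raise MemoryError("arena exhausted")
        self.offset = end
        return memoryview(self.backing)[start:end]


arena = SubmissionArena(64)
header = arena.alloc(3)
payload = arena.alloc(4)  # starts at offset 8 because of the alignment
header[:] = b"abc"
```

Everything is released together by dropping the arena once the submission retires, which is what removes the per-object allocation cost.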

15:48 <alyssa> lina: Please send the "drop macOS" MR, I'll review and merge

15:48 <alyssa> and rebase your branch on top of that

15:48 <lina> alyssa: Okay ^^

15:49 <alyssa> Thanks

15:49 <lina> I couldn't figure out how to get real-time traces, so I don't know how much of the overhead is context switching, mailbox stuff, etc.

15:51 <alyssa> lina: "asahi: Fix shader key cloning overreads"

15:51 <alyssa> I'm regretting the unsafe union

15:51 <lina> ^^;;

15:51 <lina> I think a compiler upgrade started complaining about that one

15:51 <lina> It wasn't actually breaking anything, just warning

15:52 <alyssa> yeah I get the warning too

15:52 * alyssa sweats

15:53 <lina> I'll be back in a bit, I'm going to get some conbini dinner or something

15:54 * alyssa builds new kernel

15:55 bluetail9 has joined #asahi-gpu

15:55 mkurz has joined #asahi-gpu

15:55 <alyssa> lina: As for how to land this... what's easiest for me is going to be

15:55 <alyssa> 1. Remove macOS upstream

15:55 <alyssa> 2. Land my outstanding asahi MRs upstream

15:56 <alyssa> 3. Rebase my agx/next upstream. It now consists of Linux UAPI patches + some feature work that's not yet ready. I drop the feature work and push an agx/linux branch.

15:56 <alyssa> 4. You rebase explicit sync on top of agx/linux and squash everything down to a new agx/linux-explicit-sync branch, that is just the new UAPI and not the old.

15:57 <alyssa> 5. I use agx/linux-explicit-sync as the new base for my feature work and pick on top.

15:57 bluetail91 has joined #asahi-gpu

15:57 <alyssa> Only question then is whether any of the code from agx/linux-explicit-sync is *not* specific to the UAPI (and should be upstreamed ahead of the new UAPI)

15:57 bluetail98 has joined #asahi-gpu

15:58 <alyssa> Definitely the memctx bug fix is, and should be split off into its own MR (this can happen ahead of the macOS stuff)

15:58 <alyssa> ditto for shader key clone overread

15:58 <alyssa> those 2/3 patches can probably go in now

16:00 <alyssa> agx_fence.c can probably go upstream

16:00 <alyssa> and agx_fence.h

16:00 <alyssa> since there's nothing downstream specific about drmSyncobjWait

16:01 <alyssa> possibly some of the agx_batch stuff could too? IDK

16:03 thevar1able has joined #asahi-gpu

16:04 bluetail9 has quit [Ping timeout: 480 seconds]

16:05 bluetail91 has quit [Ping timeout: 480 seconds]

16:09 <lina> Most of what I did today is not UAPI-specific, it's all the batch tracking stuff to make explicit sync work

16:09 <lina> I didn't even though the UAPI or its wrapper

16:09 <alyssa> Yeah, I see that

16:09 <lina> *touch

16:09 <alyssa> It would be nice to get that upstream

16:09 <alyssa> I think getting everything else merged first is probably easiest to avoid conflict hell

16:10 <lina> Yeah ^^

16:12 <lina> Any comments on the general approach? I just hacked on it until it worked, I want to know what you think about the submitted bitfield and all that

16:12 <alyssa> haven't read it yet

16:12 <alyssa> supposed to be doing at least 3 other things right now

16:13 <lina> Also right now it doesn't try to track batch-to-batch dependencies so it always does a full barrier between batches (so no vert/frag concurrency). That can clearly be fixed but then we need to be more careful about when we need to insert the barrier and not.
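
The alternative to a full barrier between batches is to derive dependencies from what each batch reads and writes. A minimal sketch of that idea follows, assuming each batch can be summarised as a pair of resource sets. The function name and the resource labels are hypothetical, and this is not how the driver currently works (per the message above, it always inserts a full barrier).

```python
def dependencies(batches):
    """batches: list of (reads, writes) sets, in submission order.

    Returns, for each batch, the set of earlier batch indices it must wait for.
    """
    deps = []
    for i, (reads, writes) in enumerate(batches):
        need = set()
        for j in range(i):
            prev_reads, prev_writes = batches[j]
            read_after_write = reads & prev_writes
            write_after_write = writes & prev_writes
            write_after_read = writes & prev_reads
            if read_after_write or write_after_write or write_after_read:
                need.add(j)
        deps.append(need)
    return deps


batches = [
    ({"tex"}, {"fb0"}),            # batch 0 renders to fb0
    ({"tex"}, {"fb1"}),            # batch 1 renders to fb1, independent of batch 0
    ({"fb0", "fb1"}, {"screen"}),  # batch 2 composites both
]
print(dependencies(batches))
```

With a full barrier, batch 1 would wait for batch 0 even though they touch different targets; with tracking, only batch 2 waits.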

16:19 <alyssa> We never did fix that on Panfrost...

16:19 <lina> ^^;;

16:22 seeeath_ has joined #asahi-gpu

16:24 <alyssa> nice, melty molten galaxy at native resolution is going at 55fps now

16:25 <alyssa> .....with exclusive ubershaders (~:

16:25 <lina> Nice!

16:25 <lina> What was the FPS before?

16:25 <alyssa> IDK. not that. :p

16:26 <lina> wwwwww

16:26 <alyssa> supertuxkart (ES3 renderer) isn't helped as much as I hoped... oh well

16:26 <alyssa> glmark2 scores are that of an adult gpu though

16:27 <lina> terrain is like 160 now I think?

16:27 <alyssa> 128 here

16:27 <alyssa> where do I get the bigger fps

16:27 <lina> What resolution?

16:27 <alyssa> 800x600

16:28 <lina> Oh, did you set your CPUs to performance and pin to the p-cores?

16:28 <lina> The CPU scheduler is our enemy...

16:28 <alyssa> ugh

16:28 <alyssa> right

16:28 <alyssa> marcan: can you make the CPU scheduler not suck? thx

16:29 <lina> Wasn't chadmed working on that?

16:29 seeeath has quit [Ping timeout: 480 seconds]

16:29 <lina> But also I have my suspicions about mailbox hurting with that too...

16:30 <marcan> yeah, that was chadmed and the EAS stuff

16:31 <alyssa> ok, setting CPUs to performance brings terrain up to 150

16:31 <lina> taskset to 4-7?

16:31 <lina> I get 169 ^^

16:32 <alyssa> 157 on x11

16:32 <lina> Ah, I'm using gbm which probably gains a bit

16:33 <alyssa> ah, yeah that'll do it

16:33 <lina> What numbers are you expecting for terrain vs. other GPUs?

16:33 <alyssa> IDK I don't have other adult GPUs

16:33 <lina> wwwww

16:33 <lina> Okay what does panfrost get?

16:34 <alyssa> depends on the mali

16:34 <alyssa> on mali-t860, around 30fps (compared to the DDK getting 38fps iirc)

16:34 <alyssa> on mali-g52, around 50 iirc

16:34 <alyssa> on mali-g57, I want to say in the 70s?

16:35 <alyssa> actually that's not even fair

16:35 <alyssa> RK3399, Amlogic something or other, and MT8192 respectively

16:35 <alyssa> I think SuperTuxKart was losing the cpu scheduling lottery really hard

16:35 <lina> Okay, so we're actually doing reasonably well for a tiler ^^

16:35 <alyssa> with explicit sync but default settings, 28fps

16:36 <alyssa> with explicit sync but minfreq=maxfreq and taskset 0xf0, 37fps

16:36 <alyssa> that's a huge difference

16:36 <lina> Nice!

16:36 <alyssa> (this is on Black Forest with my opinionated graphics settings)

16:36 <alyssa> (Black Forest being the most demanding standard track)
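
The pinning used for these numbers (`taskset 0xf0`, i.e. CPUs 4-7) can also be done from inside a process. This is a minimal sketch, assuming the performance cores are CPUs 4-7 as in the log. `cpus_from_mask` and `pin_to_mask` are hypothetical helper names, and `os.sched_getaffinity` / `os.sched_setaffinity` are only available on Linux.

```python
import os


def cpus_from_mask(mask):
    """Expand a taskset-style bitmask into the set of CPU indices it selects."""
    return {bit for bit in range(mask.bit_length()) if (mask >> bit) & 1}


def pin_to_mask(mask, pid=0):
    """Pin `pid` (0 = the calling process) to the CPUs in `mask`.

    CPUs that the process is not allowed to run on are skipped; if none of the
    requested CPUs are usable, the affinity is left unchanged.
    """
    usable = cpus_from_mask(mask) & os.sched_getaffinity(pid)
    if usable:
        os.sched_setaffinity(pid, usable)
    return usable


print(sorted(cpus_from_mask(0xF0)))
```

The governor is a separate, system-wide setting and is not touched by this sketch.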

16:36 <lina> BTW, I just set the governor to "performance" and don't mess with frequencies

16:37 <alyssa> oh, that's neat

16:37 <alyssa> I'm used to governors being broken

16:37 <lina> "performance" is just "max" ^^

16:37 <alyssa> after all those years living in the States...

16:37 <lina> wwwwwwwwwwwwwwwwww

16:37 <lina> Sis...

16:38 <alyssa> what am i not allowed to make fun of governments in north america now

16:40 <lina> You are, it was just unexpected wwwww

16:43 kesslerd has joined #asahi-gpu

16:50 <alyssa> quake3 is better but still dropping frames, idk what's up there yet

16:52 <alyssa> shrug

16:52 <alyssa> not a priority

16:54 <alyssa> HOLY SHIT

16:56 <alyssa> with the latest kernel + mesa and pinning cpu to maxfreq and pinning big cores

16:56 <alyssa> t-rex is up to 196fps

16:56 <alyssa> (was 55fps)

16:56 <alyssa> also, manhattan is now broken and closing it mid frame just hung my system

16:56 <alyssa> but hey you win some you lose some

16:59 <alyssa> lina: https://rosenzweig.io/splat.txt

16:59 <alyssa> ^^ splat

16:59 <alyssa> it's clearly faulting every frame (some Mesa bug, presumably), that's not the issue

16:59 <lina> Oh mailbox. Oof. Yeah.

16:59 <alyssa> the issue is then ctrl-c'ing out of the process mid-fault hangs the system

16:59 seeeath_ has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

17:00 <alyssa> which is a kernel bug obviously

17:00 <lina> Another thing I added/fixed in this round is that GPU firmware crashes should actually cleanly abort all jobs (and mesa has an abort() in that path)

17:00 <lina> But this isn't that

17:00 <lina> This is mailbox.

17:00 <alyssa> (~:

17:00 <alyssa> I will hold off on debugging manhattan until you (or marcan) fix this since it'll be too painful otherwise

17:00 <lina> marcan: Can we please get rid of mailbox pretty please? ;;

17:00 <alyssa> also, I just saw some random graphical corruption flicker on gnome

17:00 <lina> alyssa: You can probably just increase that constant to a silly value to work around it

17:01 <alyssa> (but no fault)

17:01 <alyssa> so I'm guessing we're still missing some syncs somewhere

17:01 <alyssa> there it is again uh

17:01 <lina> KDE was glitchy as heck until I remembered that a gallium flush should be a full sync...

17:01 <lina> But I'm sure I missed something

17:01 <alyssa> yeah ok, gnome with two terminal windows open, tabbing between them quickly (meta+backtick) and you see sheared artefacts of the wrong terminal

17:02 <alyssa> q17:01 < lina> KDE was glitchy as heck until I remembered that a gallium flush should be a full sync...

17:03 <alyssa> No, it shouldn't.

17:03 <alyssa> Gallium flush is just a flush, no sync

17:03 <lina> Okay then there's another problem somewhere else ^^;;

17:03 <alyssa> If the user wants to wait for the result, they need to wait for the fence

17:03 <alyssa> it's a flush_all, not a sync_all
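
The flush-versus-sync distinction can be shown with a toy model: flushing hands recorded work over and returns a fence immediately, and only a caller that needs the result waits on that fence. This sketch assumes nothing about Gallium's real interfaces; `ToyContext`, `Fence`, and `gpu_run` are hypothetical names, and `gpu_run` stands in for the hardware finishing its queue.

```python
import threading


class Fence:
    """Signalled once the work it covers has completed."""

    def __init__(self):
        self._done = threading.Event()

    def signal(self):
        self._done.set()

    def wait(self, timeout=None):
        """Return True if the fence is signalled within `timeout` seconds."""
        return self._done.wait(timeout)


class ToyContext:
    def __init__(self):
        self.pending = []    # recorded but not yet submitted
        self.in_flight = []  # submitted: list of (work, fence)

    def draw(self, work):
        self.pending.append(work)

    def flush(self):
        """Submit everything recorded so far. Returns a fence and does NOT wait."""
        fence = Fence()
        self.in_flight.append((self.pending, fence))
        self.pending = []
        return fence

    def gpu_run(self):
        """Stand-in for the GPU: run all submitted work and signal its fences."""
        results = []
        for work, fence in self.in_flight:
            results.extend(job() for job in work)
            fence.signal()
        self.in_flight = []
        return results


ctx = ToyContext()
ctx.draw(lambda: "triangle")
fence = ctx.flush()
still_running = not fence.wait(0)  # flush returned before the work finished
ctx.gpu_run()
finished = fence.wait(0)
```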

17:04 <alyssa> I can't think of a case where you actually need a sync_all

17:04 <alyssa> probably not even context destruction

17:04 <lina> Well KDE becomes a mess without it... so either kwin is broken or we're missing syncs in some other place...

17:04 <alyssa> definitely missing syncs somewhere else

17:04 <alyssa> seeing as gnome is broken with it

17:05 <lina> We do need the sync_all in context destruction if we don't want faults on clean exits

17:05 <lina> Plus without that we leak memory

17:05 <alyssa> sure, ok

17:05 <daniels> lina: is this KWin as a Wayland compositor (backing on to GBM) or as an Xorg WM?

17:05 <lina> Wayland

17:06 <daniels> good answer :)

17:06 <alyssa> :D

17:06 <daniels> in that case, if you're seeing horribly incomplete bits which suggest KWin's own rendering is broken (i.e. it's not due to stale client data but e.g. clients being intermingled), and you're confident that your internal batch-to-batch tracking is fine (or even oversyncing), it's probably missing synchronisation indeed

17:07 <daniels> Gallium flush only requires a hardware flush indeed, but there's a separate flush_frontbuffer which is your hook for eglSwapBuffers

17:07 <alyssa> Both undersyncing and oversyncing in the same driver? Wow! :-D

17:07 <daniels> that would be the place at which you need to place a dma_fence for all batch completions on the dma_resv, then make sure the DCP driver respects that fence

17:08 <lina> I don't sync in flush_frontbuffer yet, I wasn't sure about that one, so maybe that's it...

17:08 <daniels> (the other method is that KWin creates an EGLSyncKHR representing everything in that ctx being completed, exports that to a dma_fence, then places that as the IN_FENCE_FD on the plane during the KMS commit, but I don't think KWin does that, i.e. you need to rely on implicit sync for inter-device completion)

17:08 <lina> I also have no idea what DCP does ^^;;

17:08 <daniels> heh

17:09 <daniels> the easiest way to find out is to grep it for prepare_fb

17:09 <daniels> if it uses the drm_gem_shmem (IIRC) helper, then it'll wait on all fences before continuing

17:09 * alyssa isn't totally sure how the batch tracking here works yet

17:09 <jannau> dcp does nothing explicitly, if the helpers do it correctly it should work otherwise it's broken

17:09 <daniels> but yeah, flush_frontbuffer is your place to update the dma_resv for implicit sync - it's the handover from your rendering to an external user, be it GBM or a Wayland server

17:10 <alyssa> oh, ok. "active" means "unsubmitted and active", "submitted" means "submitted but not yet done". got it
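
The two states described here, together with the "submitted bitfield" mentioned earlier, can be pictured as a pair of bitmasks over a fixed set of batch slots. The following is a sketch of that bookkeeping only, under the assumption of a small fixed slot count. `BatchTracker` and its methods are hypothetical names, not the Mesa code.

```python
class BatchTracker:
    def __init__(self, slots=8):
        self.slots = slots
        self.active = 0     # bit i set: batch i is being recorded, not yet submitted
        self.submitted = 0  # bit i set: batch i is submitted, not yet done

    def begin(self):
        """Claim the lowest slot that is neither active nor submitted."""
        busy = self.active | self.submitted
        for i in range(self.slots):
            if not (busy >> i) & 1:
                self.active |= 1 << i
                return i
        raise RuntimeError("no free batch slot")

    def submit(self, i):
        """Move batch i from active to submitted."""
        assert (self.active >> i) & 1, "only an active batch can be submitted"
        self.active &= ~(1 << i)
        self.submitted |= 1 << i

    def complete(self, i):
        """Mark batch i as done, freeing its slot."""
        assert (self.submitted >> i) & 1, "only a submitted batch can complete"
        self.submitted &= ~(1 << i)

    def flush_all(self):
        """Submit every active batch without waiting; returns the slot indices."""
        flushed = [i for i in range(self.slots) if (self.active >> i) & 1]
        for i in flushed:
            self.submit(i)
        return flushed


tracker = BatchTracker(slots=4)
first = tracker.begin()
second = tracker.begin()
tracker.submit(first)
```

Keeping the two masks disjoint is what makes "active" unambiguous: a slot is free only when its bit is clear in both.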

17:11 <lina> I just realized we're doing a CPU copy in there right now with no sync... I clearly glossed over it way too quickly today...

17:11 <alyssa> that's only on macOS

17:11 <alyssa> I think

17:11 <daniels> jannau: oh nice, looks like drm_atomic_helper_prepare_planes() even calls the default prepare_fb (i.e. wait on fences) for you, so if you're just using the stock atomic helpers then you shouldn't even need to set the prepare_fb func yourself

17:11 <lina> alyssa: Yes, active used to mean both but it was more of a mess that way

17:11 <lina> alyssa: The code is unconditional there...

17:11 <alyssa> daniels: I was under the impression flush_frontbuffer is only used for software winsys

17:12 <alyssa> I don't think we implement it on Panfrost, it's only in Asahi for the macOS "just pretend we're llvmpipe" code

17:12 <alyssa> then again panfrost doesn't do explicit sync yet

17:12 <alyssa> (I shudder in fear of the pancsf undersyncing bug reports)

17:13 seeeath has joined #asahi-gpu

17:14 <alyssa> lina: Panfrost is all implicit sync so in general assume it's all broken ;)

17:14 <alyssa> I think iris is probably the best reference here, IDK

17:15 <lina> I wasn't sure of what driver to look at, couldn't find a good reference for what has proper explicit sync...

17:15 <alyssa> Yeah, I'm not sure either

17:15 <lina> So I just winged it myself in the end ^^;;

17:15 <alyssa> Hey it's all good

17:15 <alyssa> iris with the Xe patchset is probably going to be best but idk

17:15 <jannau> on dcp side everything should be ok then through drm_atomic_helper_commit

17:16 <jannau> Xe MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20418

17:17 <daniels> eglSwapBuffers -> dri2_wl_swap_buffers_with_damage -> dri2_flush_drawable_for_swapbuffers -> DRI2flushExt::flush_with_flags(__DRI2_FLUSH_DRAWABLE | __DRI2_THROTTLE_SWAPBUFFER) -> dri_flush -> st_context_flush(ST_FLUSH_END_OF_FRAME) -> st_context_flush(ST_FLUSH_END_OF_FRAME) -> st_flush(PIPE_FLUSH_END_OF_FRAME) -> pipe->flush(PIPE_FLUSH_END_OF_FRAME) -> your driver

17:18 <daniels> (you can replace dri2_wl_swap_buffers_with_damage with dri2_drm_swap_buffers depending on context)

17:19 <lina> agx_flush does do the fence right, but it also didn't work when I didn't also force a full sync so clearly something is broken...

17:20 bisko has quit [Quit: Textual IRC Client: www.textualapp.com]

17:20 <lina> Then again I stole the fence code from panfrost so who knows whether it works? ^^

17:20 <lina> *hides*

17:24 <daniels> lina: by 'do the fence right', do you mean that it'll update the excl slot in the dma_resv?

17:26 <lina> The function gets passed a pipe_fence_handle** and it replaces it with a new handle (that's from panfrost) which is a clone of the syncobj from the last batch flushed... is it supposed to do something different?

17:26 * alyssa needs to be eating lunch / doing homework / going to class / doing her day job right now so is going to pop off

17:26 bisko has joined #asahi-gpu

17:26 <alyssa> not being able to review WSI code right now, I don't know how I'll cope :~

17:28 <lina> I should get some sleep... and other than the Rust meeting tomorrow night, I think I'm going to take a break until next week ^^

17:31 <daniels> lina: I mean down in the kernel driver - the agx DRM driver needs to know that it needs to update the dma_resv struct on the dma_buf struct with the fence, so then when the DCP goes to source from that dma_buf, it knows to wait first

17:31 <daniels> the old-school way to do that is to just put a flag on your CS ioctl that you should update the dma_resv (specifying either shared for read-only ops or excl for write/RW) with the fence generated by that CS

17:32 <daniels> the new-school way to do that is that gfxstrand has some helpers for Vulkan WSI which do that automatically (by using a separate dmabuf ioctl which shoves the fence into the resv without the need for the agx CS to do it), but how they're hooked up in Vulkan I don't know, and WSI is also not Gallium :P
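
The implicit-sync handover described here can be modelled as a fence stored on the shared buffer itself: the renderer records its completion fence in the write (exclusive) slot, and the display side checks that slot before reading. The code below is a toy model of the concept under that assumption. `Buffer`, `submit_render`, and `fences_to_wait_before_scanout` are hypothetical names and do not mirror the kernel's dma_resv API.

```python
class Fence:
    def __init__(self, label):
        self.label = label
        self.signaled = False


class Buffer:
    """Toy stand-in for a shared buffer carrying its own fence slots."""

    def __init__(self):
        self.write_fence = None  # last writer ("excl" in the discussion above)
        self.read_fences = []    # outstanding readers ("shared")


def submit_render(buf, label):
    """Renderer side: attach the job's fence as the buffer's new write fence."""
    fence = Fence(label)
    buf.write_fence = fence
    buf.read_fences = []  # the new write supersedes earlier readers
    return fence


def fences_to_wait_before_scanout(buf):
    """Display side: a reader only has to wait for an unfinished write."""
    fence = buf.write_fence
    return [fence] if fence is not None and not fence.signaled else []


buf = Buffer()
frame = submit_render(buf, "frame 1")
pending = fences_to_wait_before_scanout(buf)  # the display must wait on `frame`
frame.signaled = True                         # rendering finished
ready = fences_to_wait_before_scanout(buf)    # nothing left to wait for
```

If the renderer never records its fence on the buffer, the display side sees an empty list and reads too early, which matches the kind of corruption discussed above.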

17:34 <lina> I think the plan was the new-school way, the kernel driver only knows about out syncobjs (which it replaces the fence on)

17:36 <lina> Anyway, this is clearly quite broken but I think I should get some sleep ^^;;

17:36 <lina> At least the batch tracking works well now, I'm sure alyssa can fix WSI for us!

17:37 <alyssa> I definitely can't

17:37 <alyssa> I don't know anything about WSI

17:37 <alyssa> Lina is going to fix WSI, right?

17:37 <alyssa> (-:

17:37 <lina> I thought you had stuff to do sis, why are you still here? ^^;;

17:37 <alyssa> ~~because I have a compulsive Internet use problem~~

17:38 <alyssa> ~~too many emails~~

17:38 <alyssa> umm

17:38 * alyssa poofs

17:38 <lina> mood........

17:38 * lina flops

17:44 kesslerd has quit [Quit: Leaving]

18:15 <cy8aer> ........

18:15 <cy8aer> (sorry, wrong chat)

18:21 lawrence has quit [Quit: The Lounge - https://thelounge.chat]

18:23 lawrence has joined #asahi-gpu

18:41 MajorBiscuit has quit [Quit: WeeChat 3.6]

18:59 cylm_ has joined #asahi-gpu

19:01 cylm has quit [Ping timeout: 480 seconds]

19:16 possiblemeatball has quit [Quit: Quit]

19:16 possiblemeatball has joined #asahi-gpu

19:17 c10l9 has quit []

19:17 c10l has joined #asahi-gpu

20:08 possiblemeatball has quit [Quit: Quit]

22:58 seeeath has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]