cylm has quit [Read error: Connection reset by peer]
<daniels>
lina: which winsys?
MajorBiscuit has quit [Ping timeout: 480 seconds]
<lina>
daniels: x11/egl (I think that's what the dEQP build picked by default?)
<daniels>
ah yes
<daniels>
try either skipping that subset of tests, or EGL_PLATFORM=surfaceless
<daniels>
the latter is good anyway if you don't want your running session to get trashed
MajorBiscuit has joined #asahi-gpu
possiblemeatball has joined #asahi-gpu
<alyssa>
lina: K, will look when I get a chance
<alyssa>
(unlikely to be today)
seeeath_ has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
alyssa has quit [Quit: leaving]
<lina>
alyssa: With that commented out, we pass all the EGL tests except robust_gl_3* context creation stuff which sounds not applicable... ^^
<lina>
I did fix some more threading bugs ^^
kesslerd has joined #asahi-gpu
cylm has joined #asahi-gpu
Cyrinux9 has quit []
Cyrinux9 has joined #asahi-gpu
kesslerd has quit [Ping timeout: 480 seconds]
seeeath has joined #asahi-gpu
<lina>
800x600 surfaceless glmark score: 4496
<lina>
Pushed all the explicit sync stuff to gpu/explicit-sync (kernel) and agx/explicit-sync (mesa), seems to work well ^^
<lina>
One little thing: sometimes you get GPU faults when killing things. Since everything is async now and BOs are tracked by userspace, if userspace dies in-progress GPU operations lose their BOs. The driver itself keeps all the critical structures alive until the jobs complete/fail, so nothing breaks, you just get GPU fault logs.
alyssa has joined #asahi-gpu
<alyssa>
lina: niiiiice :)
<alyssa>
thanks for the heads up about exit faults
<alyssa>
as long as the fault handling is nice and robust that sounds fine :)
<lina>
^^
<lina>
I'm still not sure if the firmware stuff is really 100% robust yet, we'll find out when I turn on timestamps again since that made it really obvious in the past.
<alyssa>
heh
<lina>
Also perf works again. It was an upstream regression (on all ARM!), jannau found it while I was busy with mesa ^^
<alyssa>
ahahahah got it nice
<alyssa>
sounds like I got a kernel upgrade to do
<alyssa>
will this make my supertuxkart go brr
<lina>
I'm regularly hitting 200+ FPS on xonotic now at least ^^
<alyssa>
yeah that's a nice bump
<alyssa>
a few more fps is good ;0
<jannau>
no, just on all arm without the standard arm PMUv3
<lina>
Any ideas on how to land this? Most of the changes are in general code, but then end up talking to syncobjs of course. At this point we probably want to drop macOS first?
<lina>
jannau: Oh, I thought it was more general. So basically just Apple then?
<jannau>
yes. apple, x-gene and thunderx2
<lina>
alyssa: I also did some perf opt on the kernel side, there was some easy low-hanging CPU fruit with array initialization that gave me an extra 10% or so on glmark (for 4000+ FPS workloads, won't do as much for things that are more GPU-limited)
<alyssa>
nice
<lina>
At this point I think the next biggest submission overhead thing is just memory allocs, so I probably want to add some general mechanism to pool together random unrelated allocs for a single submission. But that can probably wait a bit.
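[Editor's sketch] The pooling idea above could look roughly like a bump ("arena") allocator: one backing allocation per submission, with small unrelated allocations carved out of it and released together. This is only an illustration in Python, not the kernel driver's code; `SubmissionArena`, its methods and the 16-byte alignment are all hypothetical names and values.

```python
class SubmissionArena:
    """Toy bump allocator: one backing buffer shared by many small allocations.

    Hypothetical illustration only; not taken from the Asahi kernel driver.
    """

    def __init__(self, capacity: int, align: int = 16):
        self.buf = bytearray(capacity)  # the single backing allocation
        self.align = align              # assumed alignment, chosen arbitrarily
        self.offset = 0                 # next free byte in the buffer

    def alloc(self, size: int) -> memoryview:
        # Round the current offset up to the next alignment boundary.
        start = (self.offset + self.align - 1) // self.align * self.align
        if start + size > len(self.buf):
            raise MemoryError("arena exhausted")
        self.offset = start + size
        # Hand out a writable view into the shared buffer.
        return memoryview(self.buf)[start:start + size]

    def reset(self) -> None:
        # All allocations are released at once when the submission retires.
        self.offset = 0
```

The point of the design is that a submission pays for one allocation and one free instead of one pair per object.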
<alyssa>
lina: Please send the "drop macOS" MR, I'll review and merge
<alyssa>
and rebase your branch on top of that
<lina>
alyssa: Okay ^^
<alyssa>
Thanks
<lina>
I couldn't figure out how to get real-time traces, so I don't know how much of the overhead is context switching, mailbox stuff, etc.
<lina>
I think a compiler upgrade started complaining about that one
<lina>
It wasn't actually breaking anything, just warning
<alyssa>
yeah I get the warning too
* alyssa
sweats
<lina>
I'll be back in a bit, I'm going to get some conbini dinner or something
* alyssa
builds new kernel
bluetail9 has joined #asahi-gpu
mkurz has joined #asahi-gpu
<alyssa>
lina: As for how to land this... what's easiest for me is going to be
<alyssa>
1. Remove macOS upstream
<alyssa>
2. Land my outstanding asahi MRs upstream
<alyssa>
3. Rebase my agx/next upstream. It now consists of Linux UAPI patches + some feature work that's not yet ready. I drop the feature work and push an agx/linux branch.
<alyssa>
4. You rebase explicit sync on top of agx/linux and squash everything down to a new agx/linux-explicit-sync branch, that is just the new UAPI and not the old.
<alyssa>
5. I use agx/linux-explicit-sync as the new base for my feature work and pick on top.
bluetail91 has joined #asahi-gpu
<alyssa>
Only question then is whether any of the code from agx/linux-explicit-sync is *not* specific to the UAPI (and should be upstreamed ahead of the new UAPI)
bluetail98 has joined #asahi-gpu
<alyssa>
Definitely the memctx bug fix is, and should be split off into its own MR (this can happen ahead of the macOS stuff)
<alyssa>
ditto for shader key clone overread
<alyssa>
those 2/3 patches can probably go in now
<alyssa>
agx_fence.c can probably go upstream
<alyssa>
and agx_fence.h
<alyssa>
since there's nothing downstream specific about drmSyncobjWait
<alyssa>
possibly some of the agx_batch stuff could too? IDK
thevar1able has joined #asahi-gpu
bluetail9 has quit [Ping timeout: 480 seconds]
bluetail91 has quit [Ping timeout: 480 seconds]
<lina>
Most of what I did today is not UAPI-specific, it's all the batch tracking stuff to make explicit sync work
<lina>
I didn't even though the UAPI or its wrapper
<alyssa>
Yeah, I see that
<lina>
*touch
<alyssa>
It would be nice to get that upstream
<alyssa>
I think getting everything else merged first is probably easiest to avoid conflict hell
<lina>
Yeah ^^
<lina>
Any comments on the general approach? I just hacked on it until it worked, I want to know what you think about the submitted bitfield and all that
<alyssa>
haven't read it yet
<alyssa>
supposed to be doing at least 3 other things right now
<lina>
Also right now it doesn't try to track batch-to-batch dependencies so it always does a full barrier between batches (so no vert/frag concurrency). That can clearly be fixed but then we need to be more careful about when we need to insert the barrier and not.
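[Editor's sketch] One way to decide "when we need to insert the barrier and not" is to compare the resources each batch reads and writes, and only order two batches when they conflict. The Python below is a minimal sketch of that hazard check, assuming each batch records its read and write sets; `Batch` and `needs_barrier` are hypothetical names, not the Mesa driver's data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Batch:
    # Hypothetical per-batch resource tracking: names of resources touched.
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)


def needs_barrier(earlier: Batch, later: Batch) -> bool:
    """Return True if `later` must wait for `earlier` to finish."""
    raw = earlier.writes & later.reads    # read-after-write
    waw = earlier.writes & later.writes   # write-after-write
    war = earlier.reads & later.writes    # write-after-read
    return bool(raw or waw or war)
```

With a check like this, two batches that touch disjoint resources could overlap, while a batch that samples a render target written by the previous one would still be ordered after it.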
<alyssa>
We never did fix that on Panfrost...
<lina>
^^;;
seeeath_ has joined #asahi-gpu
<alyssa>
nice, melty molten galaxy at native resolution is going at 55fps now
<alyssa>
.....with exclusive ubershaders (~:
<lina>
Nice!
<lina>
What was the FPS before?
<alyssa>
IDK. not that. :p
<lina>
wwwwww
<alyssa>
supertuxkart (ES3 renderer) isn't helped as much as I hoped... oh well
<alyssa>
glmark2 scores are that of an adult gpu though
<lina>
terrain is like 160 now I think?
<alyssa>
128 here
<alyssa>
where do I get the bigger fps
<lina>
What resolution?
<alyssa>
800x600
<lina>
Oh, did you set your CPUs to performance and pin to the p-cores?
<lina>
The CPU scheduler is our enemy...
<alyssa>
ugh
<alyssa>
right
<alyssa>
marcan: can you make the CPU scheduler not suck? thx
<lina>
Wasn't chadmed working on that?
seeeath has quit [Ping timeout: 480 seconds]
<lina>
But also I have my suspicions about mailbox hurting with that too...
<marcan>
yeah, that was chadmed and the EAS stuff
<alyssa>
ok, setting CPUs to performance brings terrain up to 150
<lina>
taskset to 4-7?
<lina>
I get 169 ^^
<alyssa>
157 on x11
<lina>
Ah, I'm using gbm which probably gains a bit
<alyssa>
ah, yeah that'll do it
<lina>
What numbers are you expecting for terrain vs. other GPUs?
<alyssa>
IDK I don't have other adult GPUs
<lina>
wwwww
<lina>
Okay what does panfrost get?
<alyssa>
depends on the mali
<alyssa>
on mali-t860, around 30fps (compared to the DDK getting 38fps iirc)
<alyssa>
on mali-g52, around 50 iirc
<alyssa>
on mali-g57, I want to say in the 70s?
<alyssa>
actually that's not even fair
<alyssa>
RK3399, Amlogic something or other, and MT8192 respectively
<alyssa>
I think SuperTuxKart was losing the cpu scheduling lottery really hard
<lina>
Okay, so we're actually doing reasonably well for a tiler ^^
<alyssa>
with explicit sync but default settings, 28fps
<alyssa>
with explicit sync but minfreq=maxfreq and taskset 0xf0, 37fps
<alyssa>
that's a huge difference
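[Editor's sketch] The `taskset 0xf0` mask and the "4-7" CPU list mentioned earlier describe the same set of cores: bit N of the mask selects CPU N. A small Python helper makes the conversion explicit; the function names are hypothetical, and which CPUs are the p-cores depends on the machine.

```python
def cpus_from_mask(mask: int) -> list:
    """Expand a taskset-style bitmask into a sorted list of CPU numbers."""
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


def mask_from_cpus(cpus) -> int:
    """Build a taskset-style bitmask from an iterable of CPU numbers."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask
```

On Linux, the resulting list can be passed to `os.sched_setaffinity(0, cpus)` to pin the calling process, provided those CPUs exist on the system.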
<lina>
Nice!
<alyssa>
(this is on Black Forest with my opinionated graphics settings)
<alyssa>
(Black Forest being the most demanding standard track)
<lina>
BTW, I just set the governor to "performance" and don't mess with frequencies
<alyssa>
oh, that's neat
<alyssa>
I'm used to governors being broken
<lina>
"performance" is just "max" ^^
<alyssa>
after all those years living in the States...
<lina>
wwwwwwwwwwwwwwwwww
<lina>
Sis...
<alyssa>
what am i not allowed to make fun of governments in north america now
<lina>
You are, it was just unexpected wwwww
kesslerd has joined #asahi-gpu
<alyssa>
quake3 is better but still dropping frames, idk what's up there yet
<alyssa>
shrug
<alyssa>
not a priority
<alyssa>
HOLY SHIT
<alyssa>
with the latest kernel + mesa and pinning cpu to maxfreq and pinning big cores
<alyssa>
trex is up to 196fps
<alyssa>
(was 55fps)
<alyssa>
also, manhattan is now broken and closing it mid frame just hung my system
<alyssa>
it's clearly faulting every frame (some Mesa bug, presumably), that's not the issue
<lina>
Oh mailbox. Oof. Yeah.
<alyssa>
the issue is then ctrl-c'ing out of the process mid-fault hangs the system
seeeath_ has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<alyssa>
which is a kernel bug obviously
<lina>
Another thing I added/fixed in this round is that GPU firmware crashes should actually cleanly abort all jobs (and mesa has an abort() in that path)
<lina>
But this isn't that
<lina>
This is mailbox.
<alyssa>
(~:
<alyssa>
I will hold off on debugging manhattan until you (or marcan) fixes this since it'll be too painful otherwise
<lina>
marcan: Can we please get rid of mailbox pretty please? ;;
<alyssa>
also, I just saw some random graphical corruption flicker on gnome
<lina>
alyssa: You can probably just increase that constant to a silly value to work around it
<alyssa>
(but no fault)
<alyssa>
so I'm guessing we're still missing some syncs somewhere
<alyssa>
there it is again uh
<lina>
KDE was glitchy as heck until I remembered that a gallium flush should be a full sync...
<lina>
But I'm sure I missed something
<alyssa>
yeah ok, gnome with two terminal windows open, tabbing between them quickly (meta+backtick) and you see sheared artefacts of the wrong terminal
<alyssa>
17:01 < lina> KDE was glitchy as heck until I remembered that a gallium flush should be a full sync...
<alyssa>
No, it shouldn't.
<alyssa>
Gallium flush is just a flush, no sync
<lina>
Okay then there's another problem somewhere else ^^;;
<alyssa>
If the user wants to wait for the result, they need to wait for the fence
<alyssa>
it's a flush_all, not a sync_all
<alyssa>
I can't think of a case where you actually need a sync_all
<alyssa>
probably not even context destruction
<lina>
Well KDE becomes a mess without it... so either kwin is broken or we're missing syncs in some other place...
<alyssa>
definitely missing syncs somewhere else
<alyssa>
seeing as gnome is broken with it
<lina>
We do need the sync_all in context destruction if we don't want faults on clean exits
<lina>
Plus without that we leak memory
<alyssa>
sure, ok
<daniels>
lina: is this KWin as a Wayland compositor (backing on to GBM) or as an Xorg WM?
<lina>
Wayland
<daniels>
good answer :)
<alyssa>
:D
<daniels>
in that case, if you're seeing horribly incomplete bits which suggest KWin's own rendering is broken (i.e. it's not due to stale client data but e.g. clients being intermingled), and you're confident that your internal batch-to-batch tracking is fine (or even oversyncing), it's probably missing synchronisation indeed
<daniels>
Gallium flush only requires a hardware flush indeed, but there's a separate flush_frontbuffer which is your hook for eglSwapBuffers
<alyssa>
Both undersyncing and oversyncing in the same driver? Wow! :-D
<daniels>
that would be the place at which you need to place a dma_fence for all batch completions on the dma_resv, then make sure the DCP driver respects that fence
<lina>
I don't sync in flush_frontbuffer yet, I wasn't sure about that one, so maybe that's it...
<daniels>
(the other method is that KWin creates an EGLSyncKHR representing everything in that ctx being completed, exports that to a dma_fence, then places that as the IN_FENCE_FD on the plane during the KMS commit, but I don't think KWin does that, i.e. you need to rely on implicit sync for inter-device completion)
<lina>
I also have no idea what DCP does ^^;;
<daniels>
heh
<daniels>
the easiest way to find out is to grep it for prepare_fb
<daniels>
if it uses the drm_gem_shmem (IIRC) helper, then it'll wait on all fences before continuing
* alyssa
isn't totally sure how the batch tracking here works yet
<jannau>
dcp does nothing explicitly, if the helpers do it correctly it should work otherwise it's broken
<daniels>
but yeah, flush_frontbuffer is your place to update the dma_resv for implicit sync - it's the handover from your rendering to an external user, be it GBM or a Wayland server
<alyssa>
oh, ok. "active" means "unsubmitted and active", "submitted" means "submitted but not yet done". got it
<lina>
I just realized we're doing a CPU copy in there right now with no sync... I clearly glossed over it way too quickly today...
<alyssa>
that's only on macOS
<alyssa>
I think
<daniels>
jannau: oh nice, looks like drm_atomic_helper_prepare_planes() even calls the default prepare_fb (i.e. wait on fences) for you, so if you're just using the stock atomic helpers then you shouldn't even need to set the prepare_fb func yourself
<lina>
alyssa: Yes, active used to mean both but it was more of a mess that way
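[Editor's sketch] The two states described here ("active" = being recorded and not yet submitted, "submitted" = handed to the GPU but not yet done) can be modelled as two disjoint bitfields indexed by batch slot. This Python sketch only illustrates that life cycle; `BatchTracker` and its methods are hypothetical and not copied from the agx/explicit-sync branch.

```python
class BatchTracker:
    """Toy model of per-slot batch state kept in two bitfields."""

    def __init__(self):
        self.active = 0     # bit set: batch is recording, not yet submitted
        self.submitted = 0  # bit set: batch submitted, not yet complete

    def begin(self, idx: int) -> None:
        self.active |= 1 << idx

    def submit(self, idx: int) -> None:
        bit = 1 << idx
        if not self.active & bit:
            raise ValueError("batch is not active")
        # A slot is in exactly one state: move the bit across.
        self.active &= ~bit
        self.submitted |= bit

    def complete(self, idx: int) -> None:
        self.submitted &= ~(1 << idx)

    def is_free(self, idx: int) -> bool:
        # A slot can be reused only when it is in neither state.
        return ((self.active | self.submitted) >> idx) & 1 == 0
```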
<lina>
alyssa: The code is unconditional there...
<alyssa>
daniels: I was under the impression flush_frontbuffer is only used for software winsys
<alyssa>
I don't think we implement it on Panfrost, it's only in Asahi for the macOS "just pretend we're llvmpipe" code
<alyssa>
then again panfrost doesn't do explicit sync yet
<alyssa>
(I shudder in fear of the pancsf undersyncing bug reports)
seeeath has joined #asahi-gpu
<alyssa>
lina: Panfrost is all implicit sync so in general assume it's all broken ;)
<alyssa>
I think iris is probably the best reference here, IDK
<lina>
I wasn't sure of what driver to look at, couldn't find a good reference for what has proper explicit sync...
<alyssa>
Yeah, I'm not sure either
<lina>
So I just winged it myself in the end ^^;;
<alyssa>
Hey it's all good
<alyssa>
iris with the Xe patchset is probably going to be best but idk
<jannau>
on dcp side everything should be ok then through drm_atomic_helper_commit
<lina>
Then again I stole the fence code from panfrost so who knows whether it works? ^^
<lina>
*hides*
<daniels>
lina: by 'do the fence right', do you mean that it'll update the excl slot in the dma_resv?
<lina>
The function gets passed a pipe_fence_handle** and it replaces it with a new handle (that's from panfrost) which is a clone of the syncobj from the last batch flushed... is it supposed to do something different?
* alyssa
needs to be eating lunch / doing homework / going to class / doing her day job right now so is going to pop off
bisko has joined #asahi-gpu
<alyssa>
not being able to review WSI code right now, I don't know how I'll cope :~
<lina>
I should get some sleep... and other than the Rust meeting tomorrow night, I think I'm going to take a break until next week ^^
<daniels>
lina: I mean down in the kernel driver - the agx DRM driver needs to know that it needs to update the dma_resv struct on the dma_buf struct with the fence, so then when the DCP goes to source from that dma_buf, it knows to wait first
<daniels>
the old-school way to do that is to just put a flag on your CS ioctl that you should update the dma_resv (specifying either shared for read-only ops or excl for write/RW) with the fence generated by that CS
<daniels>
the new-school way to do that is that gfxstrand has some helpers for Vulkan WSI which do that automatically (by using a separate dmabuf ioctl which shoves the fence into the resv without the need for the agx CS to do it), but how they're hooked up in Vulkan I don't know, and WSI is also not Gallium :P
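[Editor's sketch] The shared/excl distinction above can be illustrated with a toy reservation object: writers publish an exclusive fence, readers publish shared fences, and the next user waits according to what it intends to do. This is a deliberately simplified Python model, not the kernel's `dma_resv` API; `ToyResv` and its methods are hypothetical, and the rule that a new write fence supersedes earlier fences assumes the writer itself waited on them first.

```python
class ToyResv:
    """Toy model of implicit-sync bookkeeping on a shared buffer."""

    def __init__(self):
        self.excl = None   # fence of the last writer, if any
        self.shared = []   # fences of readers since the last write

    def add_fence(self, fence, write: bool) -> None:
        if write:
            # Assumption: the writer already waited on earlier fences,
            # so its fence can replace them.
            self.excl = fence
            self.shared = []
        else:
            self.shared.append(fence)

    def fences_to_wait(self, write: bool) -> list:
        # Readers only need the last write to have finished.
        fences = [self.excl] if self.excl is not None else []
        if write:
            # Writers must also wait for every outstanding reader.
            fences += self.shared
        return fences
```

In this model, a display controller that only reads the buffer waits for the render fence, while a renderer that wants to overwrite the buffer also waits for the scanout fence.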
<lina>
I think the plan was the new-school way, the kernel driver only knows about out syncobjs (which it replaces the fence on)
<lina>
Anyway, this is clearly quite broken but I think I should get some sleep ^^;;
<lina>
At least the batch tracking works well now, I'm sure alyssa can fix WSI for us!
<alyssa>
I definitely can't
<alyssa>
I don't know anything about WSI
<alyssa>
Lina is going to fix WSI, right?
<alyssa>
(-:
<lina>
I thought you had stuff to do sis, why are you still here? ^^;;
<alyssa>
~~because I have a compulsive Internet use problem~~