ChanServ changed the topic of #zink to: official development channel for the mesa3d zink driver || https://docs.mesa3d.org/drivers/zink.html
<zmike> ajax: bad news
<zmike> glxgears only allocates BACK_LEFT, so your patch causes it to disconnect on startup
<zmike> checking for && !BACK_LEFT just doesn't do anything
<ajax> oh blah
<zmike> here's this in case you want to do some mode probing at some point https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16193
<ajax> but. makes sense. seems like we should check for dead dt in zink_flush_frontbuffer and bubble that out through kopper_copy_to_front instead
<ajax> and then also not return -1 from the !front_left case
* ajax tries that
<zmike> I think probably that would need a zink_kopper checking function like
<zmike> since flush_frontbuffer has no return
<ajax> oh i was going to just make it return bool
<ajax> it's not loader api, who cares
<zmike> hm
<zmike> I'm not sure how useful a return from that function would be?
<zmike> in theory such a return would be returning whether the flush was successful
<zmike> but the flush is async
<zmike> so ?
<zmike> having a separate check function would enable state to be checked whenever
<ajax> return whether the flush was actually enqueued? and don't do it if dead dt?
<ajax> i mean that's all we're trying to catch, here, right. the window got zapped but nothing is making glx throw an error about it, so let's make SwapBuffers notice when that happens
<zmike> yeah I suppose
<ajax> maybe i don't understand the async dispatch
<ajax> but yeah, check function works too
* ajax types more
<zmike> I'm looking at it as trying to catch it at the first available spot
<zmike> which isn't swapbuffers
<zmike> but if I look at the swapbuffers approach
<zmike> ideally we probably don't want to be backporting a change in core gallium api to 22.1?
<zmike> not to mention going through every driver and updating it for something nobody else needs
* ajax nods
* zmike dreads trying to push that job through CI
<zmike> ok
<zmike> I got it working
<ajax> oh yeah?
<ajax> you didn't build that? is_kill doesn't exist in main yet
<zmike> I did build it
<zmike> just...not against main :D
<ajax> looks almost exactly like what i had locally though
<ajax> great minds
<zmike> 🤝
<ajax> i don't get the extra check after you call zink_kopper_check though?
<zmike> dunno, you had it so I figured you had a reason for it
<ajax> hah! right, that was me
<ajax> i think that can go, whatever case it's catching is equivalent to dead swapchain
<zmike> do I not have a MR up yet for is_kill?
<zmike> I can't keep track
<ajax> you do, i acked it, you said not until something else landed
<zmike> ohhhhhhhh right
<zmike> haha
<zmike> okay so that MR and this MR need to be combined
<zmike> ajax: rb from you on the zink_kopper_check patch I assume?
<ajax> zmike: yeah rb
<zmike> ajax: seems like swapinterval should be good to go today?
<ajax> yeah i think so. going to rebase atop !16038 and do another smoke test
<zmike> cool
<ajax> do i have it right that if we don't ask for VK_KHR_shared_presentable_image we shouldn't see demand/continuous refresh modes when we ask what's supported?
<zmike> that should be correct, yes
<zmike> my assert was more a futile bit of future-proofing, not related to the comment
<ajax> nod. i'm just used to gl giving you whatever enums it feels like, and if vk was like that then the first driver to add that extn would be punished by breaking zink
<zmike> haha yeah
<zmike> it should be fine tho
<zmike> were you planning on adding the new x11 version as a mesa dep?
<ajax> probably should, once that releases
<ajax> mattst88 did the last release so i was trying to get his input before pushing another one out
<ajax> zmike: would it be worth it instead to have the kopper loaders warn/refuse to run on a non-threadsafe Display* ? it's easy to detect, it's ABI we're already using when we say LockDisplay(dpy)
<zmike> ajax: hm
<zmike> probably less hostile from a build perspective, but then probably have a bunch of people with zink installed who can't use it
<zmike> then again, if the error message is clear enough it'd be a good impetus for distros to adopt the new version
<ajax> KOPPER_I_WANT_THREAD_SAFETY_BUGS=1 MESA_LOADER_DRIVER_OVERRIDE=zink glxinfo
<zmike> haha
<zmike> KOPPER_I_AM_CHOOSING_TO_HAVE_THREAD_SAFETY_BUGS_OF_MY_OWN_FREE_WILL=1
<ajax> trying to think if there's a plausible way for libGL to notice if it's being loaded after someone has called XOpenDisplay, and if not, XInitThreads()
<zmike> just call it anyway? 🤔
<ajax> well. there's a race if we call XInitThreads while another thread is inside an xlib routine, i think
<zmike> is that still the case if it's already been called?
<zmike> if not, then imo yolo since they're already going to have thread errors anyway
<ajax> XInitThreads is safe to call a second time if it completed safely the first time
<zmike> seems like it can just be called unconditionally then if zink is being used
<ajax> very very very obscure chance of a crash if you do that
<zmike> hm
<ajax> and, no guarantee that it'll help
<zmike> seems like calling it at all is a bit of a hail mary
<ajax> because if the Display was created before XInitThreads then it's never getting its own lock vtable filled in
<ajax> oh it's a shit api design, no doubt
<zmike> on our part I meant
<zmike> in mesa
<ajax> oh wait, hell yes this rules
<ajax> _XErrorFunction is globally visible, and it's basically the first thing XOpenDisplay fills in, and it's never non-null thereafter.
<zmike> smrt
<ajax> which means: i can put a ctor in libGL, if (!_XErrorFunction) XInitThreads(); else fprintf(stderr, "hold on to your butts");
<zmike> ship it
<ajax> but i do have to do it as a constructor, because by the time you call your first GLX function it's too late, the Display done been created
<ajax> and do it from libEGL too
<ajax> at least if the xlib platform support is built
<zmike> grimace
<ajax> not a problem, just a thing
<ajax> the annoying bit is the component that tickles the bug is wsi not kopper or zink
<ajax> ooh. good point. this belongs in wsi, and you dlsym your way to _XErrorFunction instead
<ajax> because frankly this bug exists in anything else that uses both xlib and our vulkan drivers
<zmike> ajax: on another topic, what effect, if any, does your wait -> poll special event patch have on cpu usage?
<zmike> the wait variant uses select, but poll just checks over and over
<ajax> zmike: you only hit that loop when every image in the swapchain belongs to the server, such that you're waiting for one to be released so ANI doesn't block
<ajax> (which is stupid, imo, because ANI is precisely the place that is specified as blocking, but whatever)
<ajax> zmike: should that happen, you'd start playing ping-pong with xserver and GetGeometry requests, until the server deigns to release one of your images back to you
<ajax> if you are in this situation: your swapchain is too short.
<zmike> ajax: I'm wondering because we have the same issue in dri3 frontend
<ajax> yeah, i know
<zmike> I've been experimenting with fixing it in a similar way, but the cpu usage is pretty intense
<ajax> i could be convinced to do like usleep(100) between sending the getgeo and reading its reply
<ajax> probably that rounds up to 1ms
<zmike> probably fine in vk wsi since that path isn't hit much?
<zmike> I was seeing CPU usage in dri3 with unmapped window
<zmike> as we'd expect for that case
<zmike> but not sure what else to do to handle it since it's still supposed to block
<ajax> zmike: https://paste.centos.org/view/raw/e2c920d8 fixes the glx-swap-pixmap crash at least
<ajax> still an xfail, i think, but no longer a crash
<zmike> ah yeah good call
<ajax> Xwayland doesn't seem to have OML_swap_control so i can't repro the other one immediately
<ajax> might be the same issue though
<zmike> I'm running xorg so should be simple
<zmike> just been afk a couple hours
<ajax> yeah, been in and out for me too today
<zmike> that kind of day
<zmike> ajax: you have a better idea for the dri3 scenario? seems like mimicking the vk behavior is too rough on the cpu
<zmike> esp for hidden windows
<ajax> the problem here is you're calling dri3_wait_for_event_locked from dri3_find_back?
<ajax> and it stalls forever?
<zmike> yup
<zmike> from glxmakecurrent
<zmike> but this can also happen for hidden windows
<zmike> have to dig into it more when I get back
<ajax> this with... radeonsi? iris?
<zmike> radeonsi
<ajax> i might be able to do something clever there
<zmike> 🤔
<zmike> seems like we should just evict the wait calls from mesa entirely
<zmike> given that we know they can deadlock
<ajax> agreed
<zmike> hm the other test fail was similar but not quite
<zmike> just null ctx from swapbuffers at startup
<ajax> i don't follow?
<zmike> first check in kopperSwapBuffers
<zmike> should be return 0
<ajax> ugh. yeah i guess.
<ajax> so here's something you don't want to learn
<zmike> I'm ready
<ajax> glx lets you call glXSwapBuffers on arbitrary windows, without a GL context.
<ajax> in principle this means: FROM ANOTHER CLIENT
<zmike> incredible
<ajax> and, potentially on a glxdrawable that is not current to any context
<zmike> what's the expected behavior for that?
<ajax> he fixes the cab^W^W^W^Wthe buffers get swapped
<zmike> haha
<ajax> if any context was current on that drawable, the front and back buffers would be dirtied.
<ajax> it almost makes sense if you remember that the X server was the thing that owned all the objects in Xsgi
<ajax> if there were buffers to be swapped, they belonged to the X server, so that's who would do it
<ajax> which means you literally send a GLXSwapBuffers request, even for direct contexts.
<zmike> 🤔
<zmike> yeah, that almost does make sense
<ajax> as a model, it ain't so bad, tbh
<zmike> could maybe be worse?
<zmike> going back to this dri3 thing a moment, I think I may have overstated the cpu usageness of my attempted fix here
<zmike> because transposing the vk wsi thingy onto this seems to work
<zmike> and I'm not noticing a ton of spinning
<ajax> i'd considered adding a perf counter to the swapchain and throwing a debug message when the swapchain gets destroyed
<ajax> (for every time you need the xcb_get_geometry check that is)
<zmike> baby steps
<ajax> okay, more lateral thinking here.
<zmike> oh no
<ajax> why run the present event loop in the queue thread in the first place
<zmike> I thought we had this idea of multiple threads already
<ajax> ugh i hate wsi so much
<zmike> amen
<ajax> chain->last_present_msc is only updated when the event comes back
<ajax> which means if you QueuePresent twice fast enough, even though you're in fifo mode, you'll post two presents with the same target msc
<ajax> which i am pretty sure xserver treats as a "replace what was queued for that msc" command
<ajax> why was i programmed to feel pain
<ajax> oh, no, it can't, because present_queued guards that...
<ajax> so fragile, all of it
<zmike> tfp MR updated, I think it should be better now
<ajax> lgtm
<ajax> i'm tagging out for the day
<zmike> cool
<zmike> solid progress all around I think