ChanServ changed the topic of #zink to: official development channel for the mesa3d zink driver || https://docs.mesa3d.org/drivers/zink.html
<zmike> ajax: bad news
<zmike> glxgears only allocates BACK_LEFT, so your patch causes it to disconnect on startup
<zmike> checking for && !BACK_LEFT just doesn't do anything
<ajax> oh blah
<zmike> here's this in case you want to do some mode probing at some point https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16193
<ajax> but. makes sense. seems like we should check for dead dt in zink_flush_frontbuffer and bubble that out through kopper_copy_to_front instead
<ajax> and then also not return -1 from the !front_left case
* ajax tries that
<zmike> I think probably that would need a zink_kopper checking function like
<zmike> since flush_frontbuffer has no return
<ajax> oh i was going to just make it return bool
<ajax> it's not loader api, who cares
<zmike> hm
<zmike> I'm not sure how useful a return from that function would be?
<zmike> in theory such a return would be returning whether the flush was successful
<zmike> but the flush is async
<zmike> so ?
<zmike> having a separate check function would enable state to be checked whenever
<ajax> return whether the flush was actually enqueued? and don't do it if dead dt?
<ajax> i mean that's all we're trying to catch, here, right. the window got zapped but nothing is making glx throw an error about it, so let's make SwapBuffers notice when that happens
<zmike> yeah I suppose
<ajax> maybe i don't understand the async dispatch
<ajax> but yeah, check function works too
* ajax types more
<zmike> I'm looking at it as trying to catch it at the first available spot
<zmike> which isn't swapbuffers
<zmike> but if I look at the swapbuffers approach
<zmike> ideally we probably don't want to be backporting a change in core gallium api to 22.1?
<zmike> not to mention going through every driver and updating it for something nobody else needs
* ajax nods
* zmike dreads trying to push that job through CI
<zmike> ok
<zmike> I got it working
<ajax> oh yeah?
<ajax> you didn't build that? is_kill doesn't exist in main yet
<zmike> I did build it
<zmike> just...not against main :D
<ajax> looks almost exactly like what i had locally though
<ajax> great minds
<zmike> 🤝
<ajax> i don't get the extra check after you call zink_kopper_check though?
<zmike> dunno, you had it so I figured you had a reason for it
<ajax> hah! right, that was me
<ajax> i think that can go, whatever case it's catching is equivalent to dead swapchain
<zmike> do I not have a MR up yet for is_kill?
<zmike> I can't keep track
<ajax> you do, i acked it, you said not until something else landed
<zmike> ohhhhhhhh right
<zmike> haha
<zmike> okay so that MR and this MR need to be combined
<zmike> ajax: rb from you on the zink_kopper_check patch I assume?
<ajax> zmike: yeah rb
<zmike> ajax: seems like swapinterval should be good to go today?
<ajax> yeah i think so. going to rebase atop !16038 and do another smoke test
<zmike> cool
<ajax> do i have it right that if we don't ask for VK_KHR_shared_presentable_image we shouldn't see demand/continuous refresh modes when we ask what's supported?
<zmike> that should be correct, yes
<zmike> my assert was more a futile bit of future-proofing, not related to the comment
<ajax> nod. i'm just used to gl giving you whatever enums it feels like, and if vk was like that then the first driver to add that extn would be punished by breaking zink
<zmike> haha yeah
<zmike> it should be fine tho
<zmike> were you planning on adding the new x11 version as a mesa dep?
<ajax> probably should, once that releases
<ajax> mattst88 did the last release so i was trying to get his input before pushing another one out
<ajax> zmike: would it be worth it instead to have the kopper loaders warn/refuse to run on a non-threadsafe Display* ? it's easy to detect, it's ABI we're already using when we say LockDisplay(dpy)
<zmike> ajax: hm
<zmike> probably less hostile from a build perspective, but then probably have a bunch of people with zink installed who can't use it
<zmike> then again, if the error message is clear enough it'd be a good impetus for distros to adopt the new version
<ajax> KOPPER_I_WANT_THREAD_SAFETY_BUGS=1 MESA_LOADER_DRIVER_OVERRIDE=zink glxinfo
<zmike> haha
<zmike> KOPPER_I_AM_CHOOSING_TO_HAVE_THREAD_SAFETY_BUGS_OF_MY_OWN_FREE_WILL=1
<ajax> trying to think if there's a plausible way for libGL to notice if it's being loaded after someone has called XOpenDisplay, and if not, XInitThreads()
<zmike> just call it anyway? 🤔
<ajax> well. there's a race if we call XInitThreads while another thread is inside an xlib routine, i think
<zmike> is that still the case if it's already been called?
<zmike> if not, then imo yolo since they're already going to have thread errors anyway
<ajax> XInitThreads is safe to call a second time if it completed safely the first time
<zmike> seems like it can just be called unconditionally then if zink is being used
<ajax> very very very obscure chance of a crash if you do that
<zmike> hm
<ajax> and, no guarantee that it'll help
<zmike> seems like calling it at all is a bit of a hail mary
<ajax> because if the Display was created before XInitThreads then it's never getting its own lock vtable filled in
<ajax> oh it's a shit api design, no doubt
<zmike> on our part I meant
<zmike> in mesa
<ajax> oh wait, hell yes this rules
<ajax> _XErrorFunction is globally visible, and it's basically the first thing XOpenDisplay fills in, and it's never non-null thereafter.
<zmike> smrt
<ajax> which means: i can put a ctor in libGL, if (!_XErrorFunction) XInitThreads(); else fprintf(stderr, "hold on to your butts");
<zmike> ship it
<ajax> but i do have to do it as a constructor, because by the time you call your first GLX function it's too late, the Display done been created
<ajax> and do it from libEGL too
<ajax> at least if the xlib platform support is built
<zmike> grimace
<ajax> not a problem, just a thing
<ajax> the annoying bit is the component that tickles the bug is wsi not kopper or zink
<ajax> ooh. good point. this belongs in wsi, and you dlsym your way to _XErrorFunction instead
<ajax> because frankly this bug exists in anything else that uses both xlib and our vulkan drivers
<zmike> ajax: on another topic, what effect, if any, does your wait -> poll special event patch have on cpu usage?
<zmike> the wait variant uses select, but poll just checks over and over
<ajax> zmike: you only hit that loop when every image in the swapchain belongs to the server, such that you're waiting for one to be released so ANI doesn't block
<ajax> (which is stupid, imo, because ANI is precisely the place that is specified as blocking, but whatever)
<ajax> zmike: should that happen, you'd start playing ping-pong with xserver and GetGeometry requests, until the server deigns to release one of your images back to you
<ajax> if you are in this situation: your swapchain is too short.
<zmike> ajax: I'm wondering because we have the same issue in dri3 frontend
<ajax> yeah, i know
<zmike> I've been experimenting with fixing it in a similar way, but the cpu usage is pretty intense
<ajax> i could be convinced to do like usleep(100) between sending the getgeo and reading its reply
<ajax> probably that rounds up to 1ms
<zmike> probably fine in vk wsi since that path isn't hit much?
<zmike> I was seeing CPU usage in dri3 with unmapped window
<zmike> as we'd expect for that case
<zmike> but not sure what else to do to handle it since it's still supposed to block
<ajax> zmike: https://paste.centos.org/view/raw/e2c920d8 fixes the glx-swap-pixmap crash at least
<ajax> still an xfail, i think, but no longer a crash
<zmike> ah yeah good call
<ajax> Xwayland doesn't seem to have OML_swap_control so i can't repro the other one immediately
<ajax> might be the same issue though
<zmike> I'm running xorg so should be simple
<zmike> just been afk a couple hours
<ajax> yeah, been in and out for me too today
<zmike> that kind of day
<zmike> ajax: you have a better idea for the dri3 scenario? seems like mimicking the vk behavior is too rough on the cpu
<zmike> esp for hidden windows
<ajax> the problem here is you're calling dri3_wait_for_event_locked from dri3_find_back?
<ajax> and it stalls forever?
<zmike> yup
<zmike> from glxmakecurrent
<zmike> but this can also happen for hidden windows
<zmike> have to dig into it more when I get back
<ajax> this with... radeonsi? iris?
<zmike> radeonsi
<ajax> i might be able to do something clever there
<zmike> 🤔
<zmike> seems like we should just evict the wait calls from mesa entirely
<zmike> given that we know they can deadlock
<ajax> agreed
<zmike> hm the other test fail was similar but not quite
<zmike> just null ctx from swapbuffers at startup
<ajax> i don't follow?
<zmike> first check in kopperSwapBuffers
<zmike> should be return 0
<ajax> ugh. yeah i guess.
<ajax> so here's something you don't want to learn
<zmike> I'm ready
<ajax> glx lets you call glXSwapBuffers on arbitrary windows, without a GL context.
<ajax> in principle this means: FROM ANOTHER CLIENT
<zmike> incredible
<ajax> and, potentially on a glxdrawable that is not current to any context
<zmike> what's the expected behavior for that?
<ajax> he fixes the cab^W^W^W^Wthe buffers get swapped
<zmike> haha
<ajax> if any context was current on that drawable, the front and back buffers would be dirtied.
<ajax> it almost makes sense if you remember that the X server was the thing that owned all the objects in Xsgi
<ajax> if there were buffers to be swapped, they belonged to the X server, so that's who would do it
<ajax> which means you literally send a GLXSwapBuffers request, even for direct contexts.
<zmike> 🤔
<zmike> yeah, that almost does make sense
<ajax> as a model, it ain't so bad, tbh
<zmike> could maybe be worse?
<zmike> going back to this dri3 thing a moment, I think I may have overstated the cpu usageness of my attempted fix here
<zmike> because transposing the vk wsi thingy onto this seems to work
<zmike> and I'm not noticing a ton of spinning
<ajax> i'd considered adding a perf counter to the swapchain and throwing a debug message when the swapchain gets destroyed
<ajax> (for every time you need the xcb_get_geometry check that is)
<zmike> baby steps
<ajax> okay, more lateral thinking here.
<zmike> oh no
<ajax> why run the present event loop in the queue thread in the first place
<zmike> I thought we had this idea of multiple threads already
<ajax> ugh i hate wsi so much
<zmike> amen
<ajax> chain->last_present_msc is only updated when the event comes back
<ajax> which means if you QueuePresent twice fast enough, even though you're in fifo mode, you'll post two presents with the same target msc
<ajax> which i am pretty sure xserver treats as a "replace what was queued for that msc" command
<ajax> why was i programmed to feel pain
<ajax> oh, no, it can't, because present_queued guards that...
<ajax> so fragile, all of it
<zmike> tfp MR updated, I think it should be better now
<ajax> lgtm
<ajax> i'm tagging out for the day
<zmike> cool
<zmike> solid progress all around I think