ChanServ changed the topic of #zink to: official development channel for the mesa3d zink driver || https://docs.mesa3d.org/drivers/zink.html
LexSfX has quit []
LexSfX has joined #zink
LexSfX has quit []
<zmike> ajax: I ran the thing several times yesterday with kopper and didn't hit that issue, so I'm of the opinion that it's not worth looking into further unless you're able to reproduce it with the full series
* ajax reupps
<ajax> [xcb] Unknown sequence number while processing queue
<ajax> [xcb] Most likely this is a multi-threaded client and XInitThreads has not been called
<ajax> [xcb] Aborting, sorry about that.
<ajax> glxgears: xcb_io.c:269: poll_for_event: Assertion `!xcb_xlib_threads_sequence_lost' failed.
<ajax> >_<
<zmike> yeah that's still the drisw xlib thing
<zmike> or at least it looks similar
<ajax> lovely
<zmike> this is glxgears?
<ajax> yeah, just hold down an arrow key to rotate them and it'll die eventually
<ajax> clearly something in event handling is Wrong
<zmike> TIL you can rotate them
<zmike> I just hit the same error in glmark2 though
<ajax> and, glmark2 still dies with a disconnect after the first [build] scene
<zmike> hm I got the exact same error you posted in mine
<zmike> it's weird that it only happens in these types of apps
<ajax> they're both xcb/xlib internal asserts, they're probably the same root cause just happening in different libraries
<zmike> yea
<ajax> so this machine is an intel coffeelake on the motherboard that is polite enough to stay active even if you plug in an rx480
<zmike> wow
<ajax> i'm using the radeon for display but i can DRI_PRIME= and test through anv too
<zmike> handy
<ajax> and that kinda runs glmark2 okay, like it's not crashing
<ajax> but it is doing this, which makes me very suspicious:
<ajax> [texture] texture-filter=nearest: FPS: 1006 FrameTime: 0.994 ms
<ajax> Error: glXCreateNewContext failed
<ajax> Error: CanvasGeneric: Invalid EGL state
<zmike> wat
<ajax> srs
<ajax> like, nothing we did touches that path
* ajax wields valgrind
<zmike> I'm assuming it's just the mismatch of using xlib+drisw in frontend and xcb in wsi
<zmike> somehow
<kusma> zmike: Seems !15429 fixed the crashes in dEQP-EGL.functional.color_clears.single_context.gles2.rgb888_window, but Valgrind is still not quite happy: https://gitlab.freedesktop.org/-/snippets/5071
<kusma> Seems we're still reading freed memory...
<zmike> I still don't know where you're getting that test from
<zmike> it doesn't exist for me
<zmike> wondering if I should just put in a bo walk on context free to avoid having more rube goldberg code that impacts perf
<anholt> you have deqp-egl, right?
<kusma> I get it from 501679ad2d24cbfbd70c35ec459034a7cde41a82 (HEAD, tag: opengl-cts-4.6.2.0)
<kusma> It's listed in the egl-master.txt files
<kusma> zmike: ^
<zmike> do I need to build with some special flag or something?
<anholt> check build-deqp.sh for how it's built.
<kusma> zmike: just fell out of my build without anything special...
<ajax> zmike: context release is not a performance path, i don't think
<zmike> checking build scripts...
<zmike> ajax: right, that's why I was considering skipping any further and likely more complex "fixes" for scenarios that can only occur in the presence of context destruction
<zmike> I have deqp-egl
<zmike> hm I see now
<zmike> I've somehow broken my cts build
<zmike> there we go
<ajax> so i think i'm having a stroke
<zmike> what's your address so I can call emergency services
<Sachiel> sure thing, next you'll be asking for his SSN and mother's maiden name
<zmike> whoa whoa I'm not a US marshal or anything
<zmike> kusma: I've now run that test on main and kopper, in valgrind and asan, and I have no issues
<ajax> so i've got Xwayland and glmark both under gdb
<ajax> and i've hacked libGL to actually throw an error when CreateNewContext fails instead of silently absorbing it
<ajax> and on the Xwayland side i have a breakpoint set on the GLXIsDirect handler that ought to be generating the GLXBadContext on GLXIsDirect that i'm seeing, in the xcb error
<ajax> but: i hit the error handler in xlib before/without hitting __glXDisp_IsDirect
<ajax> in fact, without even hitting the __glXDisp_CreateNewContext corresponding to the request immediately above
<zmike> this feels like deja vu
<zmike> I've had similar things happen before
<ajax> i think there's something subtly wrong about how libglx is switching back and forth between xlib and xcb
<zmike> are you trying to tell me that not even glx is safe from zink?
<ajax> glx isn't really safe full stop
<zmike> hahahah
<anholt> I think they hadn't invented the idea of safe back then.
<ajax> XIO: fatal IO error 62 (Timer expired) on X server ":0"
<ajax> now that's a new one
<kusma> zmike: Hmm, odd... I'm seeing this on AMD, BTW.
<ajax> fixed it
<ajax> absolutely despise the fix, but fixed it
<ajax> like: if this is the kind of thing we need to do, hoo boy do we need to do a lot more of it
<zmike> uh oh
<zmike> seems bad
<zmike> you sure this can't be fixed by just not using xlib in glx?
<zmike> I thought we'd agreed that was likely to be the root of all evil previously
<ajax> there's a load-bearing "just" in that sentence
<ajax> i mean yeah i'd love to, it's just,
<zmike> well sure
<zmike> anyway, that seems awful and I'm glad you're the one who found it and not me
<zmike> not sure I can take another one of those this week
<ajax> i need to figure out why exactly that helps because it's clearly something to do with the xlib and xcb halves of the brain not talking to each other
<ajax> and, there's a lot of that
<zmike> :/
<ajax> otoh. there's only like 60 calls to GetReq in libglx.
<ajax> maybe this is less odious than i think
<ajax> but. libGL bakes in libX11 to its abi, because everything takes a Display * not an xcb_connection_t *
<ajax> so i'm still at the risk of needing to do whatever this handshake dance is on the way in to every GLX function, since something else in the process is probably using xlib
<ajax> which, fine, that's __glXSetupForCommand anyway, but ugh this is all horrible.
<zmike> not ideal at all
<ajax> can i have a beer yet
<zmike> rb
<ajax> so one insight here is, the reason i don't hit __glXDisp_IsDirect is because that's not where the GLXBadContext is generated
<ajax> because i have glvnd enabled in my Xwayland build for whatever reason, which means vnd is trying to look up the glx provider for the screen based on the incoming context's xid
<ajax> and not finding it, because i have indeed also not seen a GLXCreateNewContext request, it's still languishing in xlib's send buffer
<ajax> hence the flush fixing things: after XFlush xlib's write queue is empty, so when xcb sends the GLXIsDirect it's after the GLXCreateNewContext from the xlib side
<ajax> but you would have hoped that merely calling into xcb at all would have triggered the whole release callback bit of xcb_take_socket
<ajax> so... i hate it
<ajax> i have congealed the hate into a merge request
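The ordering problem ajax describes above can be illustrated with a toy model. This is only a sketch, not code from libglx, Xlib, or xcb: `struct wire`, `struct xlib_queue`, and every function below are hypothetical stand-ins, and the model assumes the xcb-side request reaches the server without first draining the Xlib-side buffer, which is the behavior observed in the log.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_REQS 8

/* Hypothetical model of the server's view: requests in arrival order. */
struct wire {
    const char *reqs[MAX_REQS];
    size_t n;
};

/* Hypothetical stand-in for the Xlib-side buffered output queue. */
struct xlib_queue {
    const char *pending[MAX_REQS];
    size_t n;
};

static void wire_send(struct wire *w, const char *req)
{
    assert(w->n < MAX_REQS);
    w->reqs[w->n++] = req;
}

/* Xlib-style request: queued locally, not yet visible to the server. */
static void xlib_request(struct xlib_queue *q, const char *req)
{
    assert(q->n < MAX_REQS);
    q->pending[q->n++] = req;
}

/* Models the effect of XFlush: drain the local queue onto the wire. */
static void xlib_flush(struct xlib_queue *q, struct wire *w)
{
    for (size_t i = 0; i < q->n; i++)
        wire_send(w, q->pending[i]);
    q->n = 0;
}

/* xcb-style request: in this toy model it goes straight to the wire
 * and does NOT force the Xlib queue to drain first (an assumption that
 * mirrors the failure seen in the log). */
static void xcb_request(struct wire *w, const char *req)
{
    wire_send(w, req);
}

/* Returns 1 if GLXCreateNewContext reaches the server before GLXIsDirect. */
static int create_arrives_first(int flush_before_xcb)
{
    struct wire w = {0};
    struct xlib_queue q = {0};

    xlib_request(&q, "GLXCreateNewContext");
    if (flush_before_xcb)
        xlib_flush(&q, &w);
    xcb_request(&w, "GLXIsDirect");
    xlib_flush(&q, &w); /* the Xlib side flushes eventually either way */

    return w.n == 2 && strcmp(w.reqs[0], "GLXCreateNewContext") == 0;
}
```

In this model `create_arrives_first(0)` returns 0, because the server sees GLXIsDirect for a context it has not been told about yet, while `create_arrives_first(1)` returns 1, which corresponds to the explicit flush described above.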
<zmike> my brain
<kusma> Uuuh, is 3472fed4da18d99622517af5aa5c32b1f797c299 correct? What happens if the vertex buffer is reused across multiple batches without re-binding?
xroumegue has quit [Ping timeout: 480 seconds]