ChanServ changed the topic of #zink to: official development channel for the mesa3d zink driver || https://docs.mesa3d.org/drivers/zink.html
cheako has joined #zink
<zmike> ajax: any luck?
<ajax> not yet
<ajax> but was in a call all morning so *cracks knuckles*
<ajax> zmike: GALLIUM_THREAD=0 fixes the crash
<ajax> wonder if i can get valgrind to help me here
<ajax> ==807942== Thread 2 glxgears:zfq0:
<ajax> ==807942== Invalid read of size 4
<ajax> ==807942== at 0x617008B: simple_mtx_lock (simple_mtx.h:98)
<ajax> ==807942== by 0x5524E46: util_queue_thread_func (u_queue.c:313)
<ajax> ==807942== by 0x617203B: kopper_present (zink_kopper.c:497)
<ajax> ==807942== by 0x5523999: impl_thrd_routine (threads_posix.h:87)
<ajax> ==807942== by 0x4BA4A86: start_thread (pthread_create.c:435)
<ajax> ==807942== by 0x4C288D3: clone (clone.S:100)
<ajax> ==807942== Address 0x1d18 is not stack'd, malloc'd or (recently) free'd
<ajax> i think i smell smoke
<zmike> ajax: 🤔
<zmike> I should be seeing this myself then
<ajax> Thread 2 "glxgears:zfq0" hit Breakpoint 1, kopper_present (data=0x555555dff290, gdata=0x0, thread_idx=0) at ../src/gallium/drivers/zink/zink_kopper.c:491
<zmike> zmike/copper works fine
<ajax> null gdata can't be right
<zmike> I'm building your branch now
<zmike> yep it's specific to your branch
<zmike> let's see...
<ajax> well i did rebase to main
<zmike> I've been running zink-wip with kopper off main for months, so I think I'd have seen it if it was just from a rebase
<zmike> unless...
<ajax> src/gallium/drivers/zink/zink_screen.c: util_queue_init(&screen->flush_queue, "zfq", 8, 1, UTIL_QUEUE_INIT_RESIZE_IF_FULL, NULL);
<ajax> no wonder the screen argument is null, we're not even trying
<zmike> hm yea that looks like the only difference
<zmike> I guess a victim of taking out renderdoc
* zmike was planning to do this later on once the branch was ready to be landed
<zmike> also looks like you added back some zink_screen struct members I deleted
<zmike> bool needs_mesa_wsi;
<zmike> bool needs_mesa_flush_wsi;
<ajax> guh, thought i got those right
<zmike> otherwise everything looks like I'd expect
<ajax> yeah must have been fallout from renderdoc, the initial commit just moved the uqi so the later commit to fix that hunk would have maybe gotten confused
<ajax> ah well
<ajax> have a meeting now, will fix up shortly
<ajax> maybe gotten confused meaning i f'd up the resolution obviously
<zmike> meh it happens
<ajax> :|
<ajax> on main (or, where director's cut diverges from main anyway) i reliably crash in glmark2
<zmike> hm that looks like what kusma was describing the other day
<kusma> Yeah, that's exactly what I'm observing.
<zmike> hm I fixed a case related to that last year
<kusma> I'm triggering it with the dEQP-EGL.functional.color_clears.single_context.gles2.rgb888_window test-case, BTW
<ajax> n=1 here but kopper scored 6132 on that test (and actually managed to complete all of them, unlike main)
<kusma> Probably a test that only runs when not using the surfaceless stuff that IIRC we use on CI
<zmike> is there a separate list which runs that test?
<zmike> I haven't seen it fail here
<zmike> and I don't use surfaceless on one of my machines
<kusma> I triggered it from a full test-run here
<ajax> valgrind maybe sheds some light here: https://paste.centos.org/view/raw/417bd347
<kusma> But you can run it separately by doing "deqp/external/openglcts/modules/glcts -n dEQP-EGL.functional.color_clears.single_context.gles2.rgb888_window", for instance
<kusma> ajax: Yeah, looks like what I found also...
<kusma> batch usage is uninitialized, seems we're trying to reclaim garbage
<kusma> That is, free'd memory
<ajax> zink_batch_state_destroy not waiting for all outstanding batches?
<zmike> looks more like a context was destroyed and somehow its usage remained on the resource
<zmike> which shouldn't be possible :thinking:
<kusma> Does the zink_bo maybe need to get the zink_batch_usage-pointers nulled out somehow?
<kusma> I mean, the zink_bo objects live longer in this case, and reference the zink_batch_usage-structs
<kusma> But I can't say I understand all this cache-magic here
<kusma> Anyway, I need to hit the sack
<zmike> that happens during batch reset
<zmike> I guess I'll look tomorrow now that I have repro cases if it's not solved by the time I get up
* ajax walking back in history to find what broke glmark2
<ajax> the answer is it's been broken since before meson devenv support was added, and i can't be bothered to rewrite the wrapper script to still do the old thing
<ajax> ngh
<zmike> yeah I'd guess it's been broken in one form or another for a very long time
<zmike> but the how is still a mystery here