<anholt>
dj-death: the obvious line to me in the failing job log is 2023-04-06 14:44:50.293979: [ 650.772497] Fence expiration time out i915-0000:00:02.0:glcts[2102]:18d32!
<anholt>
dj-death: and right before that the gpu was throttled to 150 mhz
<anholt>
retry still hangs, but doesn't have the throttling, so that's probably a red herring
<anholt>
your "passing" run is still clearly bad news, you've got new flakes.
<anholt>
highly recommend being in #intel-ci and #zink-ci to see the stream of those
<anholt>
I would grab a tgl, and c27.r1.caselist.txt from that first job, and see if you can run that caselist stably.
<anholt>
(also, validation layer. note that I've got a WIP to do validation on zink on tgl so you don't have to worry about that as a dev)
<anholt>
interesting that KHR-GL46.compute_shader.conditional-dispatching didn't fail in c27's first run, but did in the second. given its presence in the "passing" run, that feels very likely to be an important test to be looking at.
<anholt>
glx-make-current just got kinda rewritten in piglit and we just merged that piglit into mesa. so perhaps it's now flaky on zink? haven't seen it in #zink-ci or issue #8759
<anholt>
anyway, I'd start with sorting out the deqp fail before looking at glx-make-current
<dj-death>
yep
<dj-death>
so the thing is I grabbed 2 Gfx12 machines here and ran the entire GL46 CTS
<dj-death>
no hang
<dj-death>
one was TGL, the other ADL
<dj-death>
tried the failing tests on simulation too, no issue
<anholt>
did you download the caselist from mesa ci?
<dj-death>
the main difference for me is that my TGL machine only has 8 threads rather than 9 on the CI
<anholt>
and run specifically that caselist?
<dj-death>
I'm using the one in tree
<dj-death>
ah no, just the entire thing with deqp-runner
<dj-death>
same command line
<anholt>
it's really important to use the specific caselist when debugging.