<lynxeye>
tomeu: Need offsets in the dump to make sense of the jumps, but this looks like the FE is simply in the WAIT/LINK loop after it completed all previous work.
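For reference, the WAIT/LINK loop mentioned here is the FE idle loop the kernel builds in its ring buffer. A minimal sketch based on the CMD_WAIT/CMD_LINK helpers in etnaviv_buffer.c (helper signatures vary between kernel versions, and wait_gpu_addr stands for the GPU address of the WAIT command just emitted):

    /* FE idle loop: the FE busy-waits for a short number of cycles, then a
     * LINK jumps back to the WAIT, so the FE keeps spinning here until the
     * kernel patches the LINK to point at the next command buffer. */
    CMD_WAIT(buffer);                    /* WAIT for a fixed cycle count   */
    CMD_LINK(buffer, 2, wait_gpu_addr);  /* jump back to the WAIT command  */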
<lynxeye>
the GL.EVENT state load should have triggered an IRQ
<lynxeye>
tomeu: okay, so it seems the SH aka shader array is stuck and you probably don't get a PE event until SH has finished its work.
<tomeu>
that is the only difference I currently see with galcore when dumping the registers (I have hacked etnaviv so the register dumps match at power up and submission)
<tomeu>
I'm not submitting work to the SH though
<tomeu>
in the dumps, PM_MODULE_STATUS reports the SH being gated for galcore, but not for etnaviv
<tomeu>
even if I program the PM_MODULE_CONTROLS register to the same value as galcore (DISABLE_MODULE_CLOCK_GATING_RA_EZ: 1, DISABLE_MODULE_CLOCK_GATING_NN: 1, DISABLE_MODULE_CLOCK_GATING_RA_HZ: 1)
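A sketch of what such a hack could look like in etnaviv_gpu.c, assuming the usual gpu_read()/gpu_write() accessors; the RA_EZ/RA_HZ bit names exist in the upstream state_hi.xml.h headers, while the NN bit name is assumed here:

    u32 pmc = gpu_read(gpu, VIVS_PM_MODULE_CONTROLS);

    /* Mirror galcore: keep module clock gating disabled for these modules. */
    pmc |= VIVS_PM_MODULE_CONTROLS_DISABLE_MODULE_CLOCK_GATING_RA_EZ |
           VIVS_PM_MODULE_CONTROLS_DISABLE_MODULE_CLOCK_GATING_RA_HZ |
           VIVS_PM_MODULE_CONTROLS_DISABLE_MODULE_CLOCK_GATING_NN; /* assumed name */

    gpu_write(gpu, VIVS_PM_MODULE_CONTROLS, pmc);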
<lynxeye>
Maybe the NN or TP operations use some SH resources internally? I don't really know enough about the NPU inner workings to be sure about this.
<tomeu>
sometimes, the MC is also shown as IDLE with galcore, but not with etnaviv
<tomeu>
but then, how can it be gated in the galcore case?
<lynxeye>
tomeu: gating and idle are just instantaneous snapshots of the current state. While an engine might be active during command execution, it might already be idle and gated again by the time you take your register dump.
<tomeu>
hmm, could there be something in the cmdstream additions from the etnaviv kernel driver that wakes the SH up?
<lynxeye>
The MC will be idle when the FE is in the WAIT state of the WAIT/LINK loop and no other work is being done by the GPU
<tomeu>
because the command stream that I submit from userspace is the same as galcore's
<tomeu>
(which is why mesa doesn't show this issue when hacked up to use galcore)
<lynxeye>
Maybe try hacking out the SHADER_L2 cache flush? That's the only thing I could think of that would involve SH from the kernel.
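For context, the flush in question is part of what the kernel appends after the user command buffer in etnaviv_buffer_queue(); roughly (simplified from etnaviv_buffer.c, the exact set of flags and conditions differs by kernel version):

    /* Post-submit cache flush appended by the kernel ring; dropping
     * VIVS_GL_FLUSH_CACHE_SHADER_L2 here is the experiment suggested above. */
    CMD_LOAD_STATE(buffer, VIVS_GL_FLUSH_CACHE,
                   VIVS_GL_FLUSH_CACHE_DEPTH |
                   VIVS_GL_FLUSH_CACHE_COLOR |
                   VIVS_GL_FLUSH_CACHE_SHADER_L2);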
<tomeu>
hmm, no luck, though I noticed that the DMA addr at the hang is now 10b0
<lynxeye>
is the pulse eater set to the same value as galcore?
<tomeu>
yeah, all registers are :/
<tomeu>
(except addresses, of course)
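One quick way to keep checking registers like the pulse eater against galcore is to dump them right before each submit; a sketch using the driver's usual gpu_read()/dev_info() accessors:

    /* Debug-only: print the pulse eater configuration before each submit
     * so it can be diffed against a galcore run. */
    dev_info(gpu->dev, "PULSE_EATER: 0x%08x\n",
             gpu_read(gpu, VIVS_PM_PULSE_EATER));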
<tomeu>
ok, now that I understand a bit better how the cmdstream is put together, I will compare with the blob's after lunch
<tomeu>
thanks!
<tomeu>
it's the last piece that I haven't checked yet :)
<tomeu>
galcore has 0x3 in UNK28 in both the STALL and SEMAPHORE commands, but setting them in etnaviv made no difference :/
<lynxeye>
as far as I know, those bits only control some FE internal stall mechanism, so it's very unlikely that those have something to do with SH
<tomeu>
yeah, the command stream now looks the same as galcore's, and the GPU still hangs
<tomeu>
maybe you would see something unexpected there
<tomeu>
hmm, what about TS.FLUSH_CACHE, doesn't that need the SH?
<lynxeye>
TS is a separate engine and only has interactions with TX and PE, so I don't see how it would trigger SH
<lynxeye>
Hm, from your trace it seems to flush the shader L1 cache and also trigger the cache flush state twice(?)
<tomeu>
yeah, that is the case for all NN, TP and CL jobs
<tomeu>
if you mean the two GL.FLUSH_CACHE at the end
<tomeu>
that's the userspace command buffer, fwiw
<tomeu>
one thing I don't understand is what links to 0x031407658
<tomeu>
and why there is a link to 0x31402c40 when there is no command buf there
<tomeu>
lynxeye: btw, I did a small experiment and short-circuited the timeout func to just signal the out fence and continue as normal, without resetting the GPU. the next job (NN) also timed out, so the GPU is probably really hung, instead of it being just a sync matter
<tomeu>
otherwise, if the GPU is reset after the first (TP) job, the second one progresses without timing out
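A sketch of that experiment, as a debugging hack inside the scheduler timeout handler in etnaviv_sched.c (details such as the return type depend on the kernel version):

    /* Hack: instead of resetting the GPU on timeout, pretend the job
     * completed and let the next job run on top of the current hardware
     * state.  If that next job also times out, the GPU is genuinely hung
     * rather than just failing to deliver a completion event. */
    static enum drm_gpu_sched_stat
    etnaviv_sched_timedout_job(struct drm_sched_job *sched_job)
    {
        struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job);

        dma_fence_signal(submit->out_fence); /* fake completion           */
        return DRM_GPU_SCHED_STAT_NOMINAL;   /* skip the reset/recovery   */
    }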
<tomeu>
note that the hang happens after the TP job runs successfully, at least I have all the expected contents in the output buffer
<tomeu>
btw, the 0x0 loaded into 0x010A4 corresponds to event 0
<tomeu>
in jobs where a single batch contains several TP or NN jobs that can run in parallel, it contains a seqno for each event
<tomeu>
(which must match the lower 8 bits of INST_ADDR)
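Illustratively, that constraint amounts to something like the following when emitting the batch (etna_set_state() is the Mesa-side helper; inst_addr here stands in for the job's INST_ADDR value, which isn't quoted in this log):

    /* For each TP/NN job in the batch, the value loaded into state 0x010A4
     * has to equal the low 8 bits of that job's INST_ADDR; for a single-job
     * batch it is simply 0. */
    uint32_t seqno = inst_addr & 0xff;
    etna_set_state(stream, 0x010A4, seqno);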
<lynxeye>
I don't really follow how this 10a4 state works. What do you need to write there?
<tomeu>
normally nothing, only if you want to have the NN job start processing the output of a previous TP job before it has processed the whole input buffer
<tomeu>
in all other cases, it has zero
<tomeu>
I have just run the test suite, and it looks like the 0x010A4 and the two cache flushes aren't needed at all
<tomeu>
so it is the TP.INST_ADDR write that kicks the job, and something at the very end of it is what causes the GPU to actually hang
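So the minimal kick sequence implied by that finding would be just the INST_ADDR load (a sketch; TP_INST_ADDR stands in for whatever state offset TP.INST_ADDR resolves to, which isn't quoted here):

    /* Loading TP.INST_ADDR is what actually starts the TP job; the 0x010A4
     * write and the two trailing cache flushes appear to be optional. */
    etna_set_state_reloc(stream, TP_INST_ADDR, &tp_instructions_reloc);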
<lynxeye>
tomeu: did you already try to disable mlcg?
<tomeu>
hmm, I have tried setting MODULE_CONTROLS to 0xffffffff, but I will try now with ENABLE_MODULE_CLOCK_GATING
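The MLCG toggle lynxeye refers to is the global enable bit, distinct from the per-module DISABLE bits above; a sketch of turning it off entirely in etnaviv_gpu.c:

    /* Disable module-level clock gating globally, instead of (or on top of)
     * the per-module DISABLE bits in PM_MODULE_CONTROLS. */
    u32 pgc = gpu_read(gpu, VIVS_PM_POWER_CONTROLS);
    pgc &= ~VIVS_PM_POWER_CONTROLS_ENABLE_MODULE_CLOCK_GATING;
    gpu_write(gpu, VIVS_PM_POWER_CONTROLS, pgc);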
<tomeu>
no luck :/
<lynxeye>
tomeu: can you try putting a PE->FE semaphore stall behind the shader L1 cache flush? Does the GPU then die on the semaphore wait?
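A sketch of that suggestion using the kernel's CMD_SEM/CMD_STALL helpers from etnaviv_buffer.c (the same sequence could equally be emitted from the userspace buffer):

    /* Flush the shader L1 cache, then make the FE stall on a semaphore that
     * the PE only signals once it has processed the flush.  If the GPU now
     * hangs on the stall, the flush itself never completes. */
    CMD_LOAD_STATE(buffer, VIVS_GL_FLUSH_CACHE, VIVS_GL_FLUSH_CACHE_SHADER_L1);
    CMD_SEM(buffer, SYNC_RECIPIENT_FE, SYNC_RECIPIENT_PE);
    CMD_STALL(buffer, SYNC_RECIPIENT_FE, SYNC_RECIPIENT_PE);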
<tomeu>
you mean after the two GL.FLUSH_CACHE at the end of the userspace cmdstream?