ChanServ changed the topic of #etnaviv to: #etnaviv - the home of the reverse-engineered Vivante GPU driver - Logs https://oftc.irclog.whitequark.org/etnaviv
JohnnyonFlame has joined #etnaviv
dos1 has quit [Quit: Kabum!]
dos1 has joined #etnaviv
T_UNIX has joined #etnaviv
cmeissl[m] has joined #etnaviv
mvlad has joined #etnaviv
JohnnyonFlame has quit [Read error: Connection reset by peer]
pcercuei has joined #etnaviv
lynxeye has joined #etnaviv
pcercuei has quit [Quit: brb]
pcercuei has joined #etnaviv
tomeu has joined #etnaviv
<tomeu> so, the DMA addr at timeout is 10d0, and the last instruction in ring.bin is:
<tomeu> 0x40000002, /* LINK (8) PREFETCH=0x2,OP=LINK */
<tomeu> 0x000010d0, /* ADDRESS *0x10d0 */
<tomeu> how can I figure out what instructions are at 0x10d0?
<lynxeye> tomeu: is there a cmd buffer mapped at 0x1000?
<tomeu> lynxeye: hmm, how can I check?
<lynxeye> tomeu: The dump tool prints all the buffers it found to stdout.
<lynxeye> tomeu: Or just look at the files from the unpacker, should be called cmd-00001000.bin
<tomeu> lynxeye: ah, the ring is mapped at 0x1000
<lynxeye> Oh... right. So the jump is simply to offset 0xd0 in the kernel ring.
<tomeu> ok, then if I counted right, that is the WAIT command at the end of the ring:
<tomeu> ESC[0m 0x08010e01, /* LOAD_STATE (1) Base: 0x03804 Size: 1 Fixp: 0 */... (full message at <https://matrix.org/_matrix/media/v3/download/matrix.org/hMDoxYkpDnxXkoypdFhuxQFz>)
<lynxeye> tomeu: I need offsets in the dump to make sense of the jumps, but this looks like the FE is simply in the WAIT/LINK loop after it completed all previous work.
<lynxeye> the GL.EVENT state load should have triggered an IRQ
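(For reference: Base 0x03804 in the dump above is GL.EVENT. A paraphrased sketch of how the kernel's etnaviv_buffer.c emits it, where waitlink_addr is a placeholder for the address of the new WAIT/LINK pair, not the exact kernel code:)

/* load GL.EVENT (state 0x03804) so the PE raises an IRQ for this event
 * once all preceding work has drained, then drop back into the
 * WAIT/LINK self-loop */
CMD_LOAD_STATE(buffer, VIVS_GL_EVENT,
               VIVS_GL_EVENT_EVENT_ID(event) | VIVS_GL_EVENT_FROM_PE);
CMD_WAIT(buffer);                   /* spins for 200 cycles */
CMD_LINK(buffer, 2, waitlink_addr); /* jump back to the WAIT above */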
<tomeu> seems to be the case, yeah:
<tomeu> tomeu@arm-64:~/mesa/etna-gpu-tools/etnaviv-20231102160858$ xxd ring.bin | grep 00d0:
<tomeu> 000000d0: c800 0038 0000 0000 0200 0040 d010 0000 ...8.......@....
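(Decoding those little-endian words with the FE opcode in the top bits, and assuming the WAIT delay lives in the low 16 bits, offset 0xd0 is indeed the WAIT/LINK pair looping back to 0x10d0:)

0x380000c8, /* WAIT (7) DELAY=200 */
0x00000000,
0x40000002, /* LINK (8) PREFETCH=0x2,OP=LINK */
0x000010d0, /* ADDRESS *0x10d0 */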
<lynxeye> tomeu: what's the status of the engine idle bits?
<tomeu> 00000004 = 7fffbff6 Idle: FE- DE+ PE+ SH- PA+ SE+ RA+ TX+ VG+ IM+ FP+ TS+
<tomeu> no idea why the SH isn't reported as idle
<lynxeye> tomeu: okay, so it seems the SH aka shader array is stuck and you probably don't get a PE event until SH finishes its work.
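(The idle decode above is just bit 0 upwards of VIVS_HI_IDLE_STATE; a minimal runnable sketch that reproduces the line from the dump, with the bit order taken from the dump itself rather than from rnndb:)

#include <stdio.h>
#include <stdint.h>

static const char *const modules[] = {
    "FE", "DE", "PE", "SH", "PA", "SE", "RA", "TX", "VG", "IM", "FP", "TS",
};

int main(void)
{
    uint32_t idle = 0x7fffbff6; /* VIVS_HI_IDLE_STATE from the dump */

    printf("Idle:");
    for (unsigned i = 0; i < sizeof(modules) / sizeof(modules[0]); i++)
        printf(" %s%c", modules[i], (idle & (1u << i)) ? '+' : '-');
    printf("\n"); /* prints: FE- DE+ PE+ SH- PA+ SE+ RA+ TX+ VG+ IM+ FP+ TS+ */
    return 0;
}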
<tomeu> that is the only difference I currently see with galcore when dumping the registers (I have hacked etnaviv so the register dumps match at power up and submission)
<tomeu> I'm not submitting work to the SH though
<tomeu> in the dumps, PM_MODULE_STATUS reports the SH being gated for galcore, but not for etnaviv
<tomeu> even if I program the PM_MODULE_CONTROLS register to the same value as galcore (DISABLE_MODULE_CLOCK_GATING_RA_EZ: 1, DISABLE_MODULE_CLOCK_GATING_NN: 1, DISABLE_MODULE_CLOCK_GATING_RA_HZ: 1)
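(A sketch of what that override looks like from the kernel side, assuming the rnndb field names for PM_MODULE_CONTROLS; the NN gating bit in particular may be missing from older generated headers:)

/* mirror galcore's clock gating setup: keep the RA_EZ, NN and RA_HZ
 * module clock gating disabled */
gpu_write(gpu, VIVS_PM_MODULE_CONTROLS,
          VIVS_PM_MODULE_CONTROLS_DISABLE_MODULE_CLOCK_GATING_RA_EZ |
          VIVS_PM_MODULE_CONTROLS_DISABLE_MODULE_CLOCK_GATING_NN |
          VIVS_PM_MODULE_CONTROLS_DISABLE_MODULE_CLOCK_GATING_RA_HZ);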
<lynxeye> Maybe the NN or TP operations use some SH resources internally? I don't really know enough about the NPU inner workings to be sure about this.
<tomeu> sometimes, the MC is also shown as IDLE with galcore, but not with etnaviv
<tomeu> but then, how can it be gated in the galcore case?
<lynxeye> tomeu: gating and idle are just instantaneous snapshots of the current state. While an engine might be active during command execution, it might already be idle and gated again by the time you take your register dump.
<tomeu> hmm, could there be something in the cmdstream additions from the etnaviv kernel driver that wakes the SH up?
<lynxeye> The MC will be idle when the FE is in the WAIT state of the WAIT/LINK loop and no other work is being done by the GPU
<tomeu> because the command stream that I submit from userspace is the same as galcore's
<tomeu> (which is why mesa doesn't show this issue when hacked up to use galcore)
<lynxeye> Maybe try hacking out the SHADER_L2 cache flush? That's the only thing I could think of that would involve SH from the kernel.
<tomeu> hmm, no luck, though I noticed that the DMA addr at the hang is now 10b0
<lynxeye> is the pulse eater set to the same value as galcore?
<tomeu> yeah, all registers are :/
<tomeu> (except addresses, of course)
<tomeu> ok, now that I understand a bit better how the cmdstream is put together, I will compare with the blob's after lunch
<tomeu> thanks!
<tomeu> it's the last piece that I haven't checked yet :)
tomeu1 has joined #etnaviv
<tomeu> galcore has 0x3 in UNK28 in both the STALL and SEMAPHORE commands, but setting them in etnaviv made no difference :/
<lynxeye> as far as I know, those bits only control some FE internal stall mechanism, so it's very unlikely that those have something to do with SH
<tomeu> yeah, the command stream now looks the same as galcore's, and the GPU still hangs
<tomeu> maybe you would see something unexpected there
<tomeu> hmm, what about TS.FLUSH_CACHE, doesn't that need the SH?
<lynxeye> TS is a separate engine and only has interactions with TX and PE, so I don't see how it would trigger SH
<lynxeye> Hm, from your trace it seems to flush the shader L1 cache and also trigger the cache flush state twice(?)
<tomeu> yeah, that is the case for all NN, TP and CL jobs
<tomeu> if you mean the two GL.FLUSH_CACHE at the end
<tomeu> that's the userspace command buffer, fwiw
<tomeu> one thing I don't understand is what links to 0x031407658
<tomeu> and why there is a link to 0x31402c40 when there is no command buf there
<tomeu> lynxeye: btw, I did a small experiment and short-circuited the timeout func to just signal the out fence and continue as normal, without resetting the GPU. the next job (NN) also timed out, so the GPU is probably really hung, rather than it just being a sync issue
<tomeu> otherwise, if the GPU is reset after the first (TP) job, the second one progresses without timing out
<tomeu> note that the hang happens after the TP job runs successfully, at least I have all the expected contents in the output buffer
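(Roughly what the short-circuit described above looks like in etnaviv_sched.c's timedout_job callback; the names here are from memory and differ across kernel versions, so treat it as a placeholder sketch rather than the actual patch:)

static enum drm_gpu_sched_stat
etnaviv_sched_timedout_job(struct drm_sched_job *sched_job)
{
    struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job);

    /* experiment: pretend the job finished instead of resetting the
     * GPU, so the next submit shows whether the hang persists */
    dma_fence_signal(submit->out_fence);
    return DRM_GPU_SCHED_STAT_NOMINAL;
}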
<tomeu> btw, the 0x0 loaded into 0x010A4 corresponds to event 0
<tomeu> in batches that contain several TP or NN jobs that can run in parallel, it contains a seqno for each event
<tomeu> (which must match the lower 8 bits of INST_ADDR)
<lynxeye> I don't really follow how this 10a4 state works. What do you need to write there?
<tomeu> normally nothing, only if you want to have the NN job start processing the output of a previous TP job before it has processed the whole input buffer
<tomeu> in all other cases, it has zero
<tomeu> I have just run the test suite, and it looks like the 0x010A4 load and the two cache flushes aren't needed at all
<tomeu> so it is TP.INST_ADDR that kicks the job, and something at the very end of it is what causes the GPU to actually hang
<lynxeye> tomeu: did you already try to disable mlcg?
<tomeu> hmm, I have tried setting MODULE_CONTROLS to 0xffffffff, but I will try now with ENABLE_MODULE_CLOCK_GATING
<tomeu> no luck :/
tlwoerner has quit [Quit: Leaving]
tlwoerner has joined #etnaviv
cphealy has joined #etnaviv
<lynxeye> tomeu: can you try putting a PE->FE semaphore stall behind the shader L1 cache flush? Does the GPU then die on the semaphore wait?
<tomeu> you mean after the two GL.FLUSH_CACHE at the end of the userspace cmdstream?
<lynxeye> yep
<tomeu> lynxeye: at the timeout, the DMA address points to line 49 at https://paste.debian.net/1297073/
<tomeu> lynxeye: is adding a STALL enough, or should I also add a GL.SEMAPHORE_TOKEN?
<lynxeye> tomeu: see etna_stall() in mesa. You need to arm the semaphore by emitting SEMAPHORE_TOKEN before the FE STALL command.
<tomeu> ok, cool, I'm adding this:
<tomeu> etna_emit_load_state(stream, VIVS_GL_SEMAPHORE_TOKEN >> 2, 1, 0);... (full message at <https://matrix.org/_matrix/media/v3/download/matrix.org/AQuYDERAaSeUgJJuPpZYQNIn>)
<tomeu> ok, it stopped at the same place
<lynxeye> Seems you mixed up the direction. It needs to be from=FE to=PE
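(With the direction fixed, the sequence modeled on mesa's etna_stall() looks like this; identifiers are the mesa/rnndb ones, offered as a sketch:)

/* arm the semaphore: FROM=FE, TO=PE */
etna_emit_load_state(stream, VIVS_GL_SEMAPHORE_TOKEN >> 2, 1, 0);
etna_cmd_stream_emit(stream, VIVS_GL_SEMAPHORE_TOKEN_FROM(SYNC_RECIPIENT_FE) |
                             VIVS_GL_SEMAPHORE_TOKEN_TO(SYNC_RECIPIENT_PE));
/* the FE is the waiting side, so use the dedicated FE STALL command
 * instead of a GL.STALL_TOKEN state load */
etna_cmd_stream_emit(stream, VIV_FE_STALL_HEADER_OP_STALL);
etna_cmd_stream_emit(stream, VIV_FE_STALL_TOKEN_FROM(SYNC_RECIPIENT_FE) |
                             VIV_FE_STALL_TOKEN_TO(SYNC_RECIPIENT_PE));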
CoLa[m] has joined #etnaviv
<tomeu> lynxeye: same address :/
<lynxeye> that's strange. So the GPU thinks that everything in the user cmd was okay and processed, but then dies after the kernel ring cache flushes?
cphealy has quit [Quit: Leaving]
<tomeu> well, I do get the correct contents in the output buffer
<tomeu> so the TP job itself seems to be fine
<tomeu> but something afterwards causes the GPU to hang, or at least prevents the IRQ from firing
<lynxeye> have you tried removing the kernel ring cache flushes?
<tomeu> not those ones, no
<lynxeye> I mean a CL/NN only GPU shouldn't even have color, depth or ts caches, so the flushes are pretty pointless anyway.
<tomeu> lynxeye: no diff, I'm afraid
<tomeu> guess I will have to improve my tooling to make sure I compare everything that is going around in the NPU
lynxeye has quit [Quit: Leaving.]
<dv_> on imx6 mainline BSPs, is there a way to access the GC320 in userspace?
<dv_> can I somehow access etnaviv's 2D API through mesa/gallium?
<cmeissl[m]> The only way I am aware of accessing the 2d api in mainline is by using libdrm_etnaviv directly.
<dv_> that would be ok for me
<cmeissl[m]> But that only gets access to the command buffer. You have to set up all commands yourself afaik
<cmeissl[m]> And there are some more, slightly outdated examples in the old etnaviv repo here https://github.com/etnaviv/etna_viv/tree/master/attic/test2d
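(A minimal sketch of that route through libdrm_etnaviv's public API; the render node path and core index are assumptions, and error handling is omitted:)

#include <fcntl.h>
#include <etnaviv_drmif.h>

int main(void)
{
    int fd = open("/dev/dri/renderD128", O_RDWR); /* assumed render node */
    struct etna_device *dev = etna_device_new(fd);
    struct etna_gpu *gpu = etna_gpu_new(dev, 0);  /* GC320 assumed to be core 0 */
    struct etna_pipe *pipe = etna_pipe_new(gpu, ETNA_PIPE_2D);
    struct etna_cmd_stream *stream = etna_cmd_stream_new(pipe, 0x1000, NULL, NULL);

    /* raw 2D state and commands go here via etna_cmd_stream_emit()
     * and etna_cmd_stream_reloc(); see the test2d examples above */

    etna_cmd_stream_flush(stream); /* submit to the kernel */
    return 0;
}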
sergi1 has joined #etnaviv
cphealy has joined #etnaviv
<dv_> maybe it would be worthwhile to create a stable userspace API frontend for this
<dv_> the GC320 core is really useful
mvlad has quit [Remote host closed the connection]
<cphealy> dv_: can you elaborate on the "really useful" you have experienced?
<cphealy> blending? CSC? Rotation? Scaling?
<dv_> cphealy: frame compositing for example
<dv_> take input frames and place them somewhere on the output frames
<dv_> but yes, CSC is another big one