<lynxeye>
tomeu: Need offsets in the dump to make sense of the jumps, but this looks like the FE is simply in the WAIT/LINK loop after it completed all previous work.
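For reference, the WAIT/LINK loop mentioned here is the FE idle loop the kernel builds in its ring buffer. A minimal sketch based on the CMD_WAIT/CMD_LINK helpers in etnaviv_buffer.c (helper signatures vary between kernel versions, and wait_gpu_addr stands for the GPU address of the WAIT command just emitted):

    /* FE idle loop: the FE busy-waits for a short number of cycles, then a
     * LINK jumps back to the WAIT, so the FE keeps spinning here until the
     * kernel patches the LINK to point at the next command buffer. */
    CMD_WAIT(buffer);                    /* WAIT for a fixed cycle count   */
    CMD_LINK(buffer, 2, wait_gpu_addr);  /* jump back to the WAIT command  */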
<lynxeye>
the GL.EVENT state load should have triggered an IRQ
<lynxeye>
tomeu: okay, so it seems the SH aka shader array is stuck and you probably don't get a PE event until SH has finished its work.
<tomeu>
that is the only difference I currently see with galcore when dumping the registers (I have hacked etnaviv so the register dumps match at power up and submission)
<tomeu>
I'm not submitting work to the SH though
<tomeu>
in the dumps, PM_MODULE_STATUS reports the SH being gated for galcore, but not for etnaviv
<tomeu>
even if I program the PM_MODULE_CONTROLS register to the same value as galcore (DISABLE_MODULE_CLOCK_GATING_RA_EZ: 1, DISABLE_MODULE_CLOCK_GATING_NN: 1, DISABLE_MODULE_CLOCK_GATING_RA_HZ: 1)
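A sketch of what such a hack could look like in etnaviv_gpu.c, assuming the usual gpu_read()/gpu_write() accessors; the RA_EZ/RA_HZ bit names exist in the upstream state_hi.xml.h headers, while the NN bit name is assumed here:

    u32 pmc = gpu_read(gpu, VIVS_PM_MODULE_CONTROLS);

    /* Mirror galcore: keep module clock gating disabled for these modules. */
    pmc |= VIVS_PM_MODULE_CONTROLS_DISABLE_MODULE_CLOCK_GATING_RA_EZ |
           VIVS_PM_MODULE_CONTROLS_DISABLE_MODULE_CLOCK_GATING_RA_HZ |
           VIVS_PM_MODULE_CONTROLS_DISABLE_MODULE_CLOCK_GATING_NN; /* assumed name */

    gpu_write(gpu, VIVS_PM_MODULE_CONTROLS, pmc);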
<lynxeye>
Maybe the NN or TP operations use some SH resources internally? I don't really know enough about the NPU inner workings to be sure about this.
<tomeu>
sometimes, the MC is also shown as IDLE with galcore, but not with etnaviv
<tomeu>
but then, how can it be gated in the galcore case?
<lynxeye>
tomeu: gating and idle are just instantaneous snapshots of the current state. While an engine might be active during command execution, it might already be idle and gated again by the time you take your register dump.
<tomeu>
hmm, could there be something in the cmdstream additions from the etnaviv kernel driver that wakes the SH up?
<lynxeye>
The MC will be idle when the FE is in the WAIT state of the WAIT/LINK loop and no other work is being done by the GPU
<tomeu>
because the command stream that I submit from userspace is the same as galcore's
<tomeu>
(which is why mesa doesn't show this issue when hacked up to use galcore)
<lynxeye>
Maybe try hacking out the SHADER_L2 cache flush? That's the only thing I could think of that would involve SH from the kernel.
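For context, the flush in question is part of what the kernel appends after the user command buffer in etnaviv_buffer_queue(); roughly (simplified from etnaviv_buffer.c, the exact set of flags and conditions differs by kernel version):

    /* Post-submit cache flush appended by the kernel ring; dropping
     * VIVS_GL_FLUSH_CACHE_SHADER_L2 here is the experiment suggested above. */
    CMD_LOAD_STATE(buffer, VIVS_GL_FLUSH_CACHE,
                   VIVS_GL_FLUSH_CACHE_DEPTH |
                   VIVS_GL_FLUSH_CACHE_COLOR |
                   VIVS_GL_FLUSH_CACHE_SHADER_L2);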
<tomeu>
hmm, no luck, though I noticed that the DMA addr at the hang is now 10b0
<lynxeye>
is the pulse eater set to the same value as galcore?
<tomeu>
yeah, all registers are :/
<tomeu>
(except addresses, of course)
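One quick way to keep checking registers like the pulse eater against galcore is to dump them right before each submit; a sketch using the driver's usual gpu_read()/dev_info() accessors:

    /* Debug-only: print the pulse eater configuration before each submit
     * so it can be diffed against a galcore run. */
    dev_info(gpu->dev, "PULSE_EATER: 0x%08x\n",
             gpu_read(gpu, VIVS_PM_PULSE_EATER));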
<tomeu>
ok, now that I understand a bit better how the cmdstream is put together, I will compare with the blob's after lunch
<tomeu>
thanks!
<tomeu>
it's the last piece that I haven't checked yet :)
<tomeu>
galcore has 0x3 in UNK28 in both the STALL and SEMAPHORE commands, but setting them in etnaviv made no difference :/
<lynxeye>
as far as I know, those bits only control some FE internal stall mechanism, so it's very unlikely that those have something to do with SH
<tomeu>
yeah, the command stream now looks the same as galcore's, and the GPU still hangs
<tomeu>
maybe you would see something unexpected there
<tomeu>
hmm, what about TS.FLUSH_CACHE, doesn't that need the SH?
<lynxeye>
TS is a separate engine and only has interactions with TX and PE, so I don't see how it would trigger SH
<lynxeye>
Hm, from your trace it seems to flush the shader L1 cache and also trigger the cache flush state twice(?)
<tomeu>
yeah, that is the case for all NN, TP and CL jobs
<tomeu>
if you mean the two GL.FLUSH_CACHE at the end
<tomeu>
that's the userspace command buffer, fwiw
<tomeu>
one thing I don't understand is what links to 0x031407658
<tomeu>
and why there is a link to 0x31402c40 when there is no command buf there
<tomeu>
lynxeye: btw, I did a small experiment and short-circuited the timeout func to just signal the out fence and continue as normal, without resetting the GPU. the next job (NN) also timed out, so the GPU is probably really hung, instead of it being just a sync matter
<tomeu>
otherwise, if the GPU is reset after the first (TP) job, the second one progresses without timing out
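A sketch of that experiment, as a debugging hack inside the scheduler timeout handler in etnaviv_sched.c (details such as the return type depend on the kernel version):

    /* Hack: instead of resetting the GPU on timeout, pretend the job
     * completed and let the next job run on top of the current hardware
     * state.  If that next job also times out, the GPU is genuinely hung
     * rather than just failing to deliver a completion event. */
    static enum drm_gpu_sched_stat
    etnaviv_sched_timedout_job(struct drm_sched_job *sched_job)
    {
        struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job);

        dma_fence_signal(submit->out_fence); /* fake completion           */
        return DRM_GPU_SCHED_STAT_NOMINAL;   /* skip the reset/recovery   */
    }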
<tomeu>
note that the hang happens after the TP job runs successfully, at least I have all the expected contents in the output buffer
<tomeu>
btw, the 0x0 loaded into 0x010A4 corresponds to event 0
<tomeu>
in jobs where a single batch contains several TP or NN jobs that can run in parallel, it contains a seqno for each event
<tomeu>
(which must match the lower 8 bits of INST_ADDR)
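Illustratively, that constraint amounts to something like the following when emitting the batch (etna_set_state() is the Mesa-side helper; inst_addr here stands in for the job's INST_ADDR value, which isn't quoted in this log):

    /* For each TP/NN job in the batch, the value loaded into state 0x010A4
     * has to equal the low 8 bits of that job's INST_ADDR; for a single-job
     * batch it is simply 0. */
    uint32_t seqno = inst_addr & 0xff;
    etna_set_state(stream, 0x010A4, seqno);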
<lynxeye>
I don't really follow how this 10a4 state works. What do you need to write there?
<tomeu>
normally nothing, only if you want to have the NN job start processing the output of a previous TP job before it has processed the whole input buffer
<tomeu>
in all other cases, it has zero
<tomeu>
I have just run the test suite, and it looks like the 0x010A4 and the two cache flushes aren't needed at all
<tomeu>
so it is the TP.INST_ADDR write that kicks the job, and something at the very end of it is what causes the GPU to actually hang
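So the minimal kick sequence implied by that finding would be just the INST_ADDR load (a sketch; TP_INST_ADDR stands in for whatever state offset TP.INST_ADDR resolves to, which isn't quoted here):

    /* Loading TP.INST_ADDR is what actually starts the TP job; the 0x010A4
     * write and the two trailing cache flushes appear to be optional. */
    etna_set_state_reloc(stream, TP_INST_ADDR, &tp_instructions_reloc);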
<lynxeye>
tomeu: did you already try to disable mlcg?
<tomeu>
hmm, I have tried setting MODULE_CONTROLS to 0xffffffff, but I will try now with ENABLE_MODULE_CLOCK_GATING
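The MLCG toggle lynxeye refers to is the global enable bit, distinct from the per-module DISABLE bits above; a sketch of turning it off entirely in etnaviv_gpu.c:

    /* Disable module-level clock gating globally, instead of (or on top of)
     * the per-module DISABLE bits in PM_MODULE_CONTROLS. */
    u32 pgc = gpu_read(gpu, VIVS_PM_POWER_CONTROLS);
    pgc &= ~VIVS_PM_POWER_CONTROLS_ENABLE_MODULE_CLOCK_GATING;
    gpu_write(gpu, VIVS_PM_POWER_CONTROLS, pgc);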
<tomeu>
no luck :/
<lynxeye>
tomeu: can you try putting a PE->FE semaphore stall behind the shader L1 cache flush? Does the GPU then die on the semaphore wait?
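A sketch of that suggestion using the kernel's CMD_SEM/CMD_STALL helpers from etnaviv_buffer.c (the same sequence could equally be emitted from the userspace buffer):

    /* Flush the shader L1 cache, then make the FE stall on a semaphore that
     * the PE only signals once it has processed the flush.  If the GPU now
     * hangs on the stall, the flush itself never completes. */
    CMD_LOAD_STATE(buffer, VIVS_GL_FLUSH_CACHE, VIVS_GL_FLUSH_CACHE_SHADER_L1);
    CMD_SEM(buffer, SYNC_RECIPIENT_FE, SYNC_RECIPIENT_PE);
    CMD_STALL(buffer, SYNC_RECIPIENT_FE, SYNC_RECIPIENT_PE);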
<tomeu>
you mean after the two GL.FLUSH_CACHE at the end of the userspace cmdstream?