#etnaviv on 2024-05-17 — irc logs at oftc.irclog.whitequark.org

2022-12-21 00:47 ChanServ changed the topic of #etnaviv to: #etnaviv - the home of the reverse-engineered Vivante GPU driver - Logs https://oftc.irclog.whitequark.org/etnaviv

06:07 tlwoerner_ has quit [Remote host closed the connection]

06:07 tlwoerner_ has joined #etnaviv

07:24 chewitt has joined #etnaviv

07:52 samuelig has joined #etnaviv

07:53 chewitt has quit [Quit: Zzz..]

07:54 chewitt has joined #etnaviv

07:57 samuelig has quit [Quit: Bye!]

07:59 lynxeye has joined #etnaviv

08:33 chewitt has quit [Ping timeout: 480 seconds]

09:47 pcercuei has joined #etnaviv

11:32 cmeissl[m] has joined #etnaviv

11:32 CoLa[m] has joined #etnaviv

11:32 disastroustwitch[m] has joined #etnaviv

11:32 jwillikers[m] has joined #etnaviv

11:32 matrix638[m] has joined #etnaviv

11:32 sergi has joined #etnaviv

11:33 sythemeta847[m] has joined #etnaviv

11:33 T_UNIX has joined #etnaviv

11:33 underpantsgnome[m] has joined #etnaviv

11:33 tomeu has joined #etnaviv

11:33 wv[m] has joined #etnaviv

12:39 f_ has joined #etnaviv

13:46 karolherbst_ has joined #etnaviv

13:53 karolherbst has quit [Ping timeout: 480 seconds]

14:50 sravn has quit [Ping timeout: 480 seconds]

14:54 sravn has joined #etnaviv

14:55 <ManMower> https://www.nxp.com/docs/en/errata/IMX8_1N94W.pdf does anyone know off-hand if mesa handles ERR010916? I'll check later when I have a chance, just thought someone here might have a quick answer.

15:02 <bl4ckb0ne> lynxeye: i dumped the `etnaviv_gpu` buffer on the mmu handler and got something interesting

15:02 <bl4ckb0ne> dma address is `0x000012c`

15:02 <bl4ckb0ne> erm `0x000012c0`

15:03 <bl4ckb0ne> > [ 88.572474] cmd 000002c0: 380000c8 00000001 40000002 000012c0

15:03 <lynxeye> ManMower: Nope, Mesa doesn't handle this one. But it also doesn't handle CLAMP_TO_BORDER, so not sure if the conditions for this erratum are hit.

15:03 <bl4ckb0ne> there's a waitlink

15:03 <lynxeye> bl4ckb0ne: That's just the FE idle loop.

15:04 <bl4ckb0ne> its not supposed to be replaced?

15:04 <lynxeye> If there is no further work queued the FE will just execute this short wait/link sequence. When you queue more work, the link is replaced with a link into the user command buffer.

15:05 <bl4ckb0ne> there is more

15:05 <bl4ckb0ne> https://paste.sr.ht/~bl4ckb0ne/78980a453a4e633d978a4970c9dd50ef0ff6c561

15:10 <bl4ckb0ne> lynxeye: that line is looping around right? wait + fetch 2 at 2c0 ?

15:11 <lynxeye> you sure that this isn't just noise from previous executions? The next link command being followed by GL_EVENT with the same event ID doesn't look like it's work directly queued one after another

15:12 <lynxeye> right, it's stalling the FE for a few cycles in the wait command then executes the link back to the wait cmd

15:13 <bl4ckb0ne> https://paste.sr.ht/~bl4ckb0ne/5a708bf2741f4039a0a682ee14c81c968d626ad8 thats what's producing the log

15:15 <lynxeye> yea, but note that the kernel ring buffer is never cleared, only overwritten. So once you have submitted enough work to cycle through the kernel ring buffer, it's completely filled with valid-looking stuff, which has already been executed.
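
The "never cleared, only overwritten" point can be illustrated with a toy model. This is a sketch under simplifying assumptions: `KernelRing` is a hypothetical class, the wrap-around is a plain modulo (the real driver's wrap logic may differ), and the command words are made-up markers rather than real FE commands:

```python
class KernelRing:
    """Toy command ring: new work overwrites old words, nothing is zeroed."""

    def __init__(self, size_words):
        self.words = [0] * size_words
        self.write_pos = 0

    def queue(self, cmds):
        # Write each word at the current position, wrapping at the end.
        for word in cmds:
            self.words[self.write_pos] = word
            self.write_pos = (self.write_pos + 1) % len(self.words)


ring = KernelRing(8)
ring.queue([0xA0, 0xA1, 0xA2, 0xA3])  # job A
ring.queue([0xB0, 0xB1, 0xB2, 0xB3])  # job B fills the rest of the ring
ring.queue([0xC0, 0xC1])              # job C wraps and overwrites part of A

# Words from A and B are still present after C, even though both have
# already executed; a raw dump cannot tell live work from stale work.
print([hex(w) for w in ring.words])
```

In this model a dump taken after job C still shows the tail of job A and all of job B, so words found past the current position are not evidence of pending work.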

15:17 <bl4ckb0ne> so anything beneath might be already executed

15:17 <bl4ckb0ne> that might explain the gpu hang?

15:20 <lynxeye> not really. If you end up in the idle loop all commands should have been executed, caches flushed and events triggered.

15:22 <bl4ckb0ne> hm

15:22 <lynxeye> If the FE has already reached the idle loop after coming back from user command execution and you get an MMU fault then my best guess would still be some kind of cache writeback.

15:24 <bl4ckb0ne> im running with `ETNA_MESA_DEBUG=draw_stall,zero,cflush_all,flush_all`, shouldnt that be taken care of?

15:28 <lynxeye> Maybe look in the downstream driver if the GPU cache flush routines there handle some caches that we don't yet handle in the etnaviv driver. AFAIR there are some cache types like L2 that I haven't yet seen on any GPU, but the gc7000 r6009 being one of the bigger ones it might have some of those.

15:29 <lynxeye> The userspace debug options won't help if the MMU fault happens after return to the kernel ring. The etnaviv kernel driver is responsible for flushing the write caches before sending the job done event.

15:31 <bl4ckb0ne> there's an icache specific to halti5 in the mxc driver

15:32 <lynxeye> cflush_all will flush the known caches _before the next_ draw call, but if there isn't another draw call as you already returned to the kernel ring it won't help

15:33 <lynxeye> you could try to set all the cache bits you can find in rnndb and the downstream driver in the GL_FLUSH_CACHE command in etnaviv_buffer.c and see if it makes the problem go away.

15:34 <bl4ckb0ne> in etnaviv_buffer_end ?

15:34 <bl4ckb0ne> or etnaviv_buffer_queue

15:36 <lynxeye> buffer_queue is the more relevant one. buffer_end is only used when the gpu goes into runtime suspend

15:43 <bl4ckb0ne> oh right about that one, there's a sem/stall on SYNC_RECIPIENT in the `need_flush` condition that is not followed by the `has_blt` block (the one that does the blt enable/sem/stall/disable)

15:45 <bl4ckb0ne> https://elixir.bootlin.com/linux/v6.1.70/source/drivers/gpu/drm/etnaviv/etnaviv_buffer.c#L431 here

15:48 <lynxeye> There shouldn't be a need for a blt sync there, as we only need a way to stall the FE after the MMU cache invalidation there. The stall a few lines down needs to make sure that all write caches are flushed before the event is sent, which requires a blt sync as blt has its own caches.

15:49 <bl4ckb0ne> do we know the blt flush bits?

15:50 <lynxeye> afaik blt auto flushes through the enable/disable sequence

15:58 <bl4ckb0ne> would it be possible that a `fe_running` check is missing and one idleloop is considered a stall?

15:58 <bl4ckb0ne> hang*

15:59 <lynxeye> nope, the hangcheck timer is canceled when the driver has seen all events for the active jobs. Nothing to do with the fe_running state

16:00 funderscore has joined #etnaviv

16:00 <bl4ckb0ne> ack

16:02 <lynxeye> but then I thought you see an MMU fault? In this case the GPU actively tells you that something went wrong, we just handle it the same as a timer-detected stall afterwards, as the GPU is definitely hung after an MMU fault.

16:09 <bl4ckb0ne> yeah i see a fault

16:09 <bl4ckb0ne> page not present on addr 0x0

16:16 f_ has quit [Quit: To contact me, PM f_[xmpp] or send an email. See https://vitali64.duckdns.org/.]

16:22 matrix638[m] has quit []

16:24 funderscore is now known as f_

16:33 sythemeta847[m] has quit []

16:38 tlwoerner_ has quit [Ping timeout: 480 seconds]

16:38 agx has quit []

16:52 <bl4ckb0ne> lynxeye: that was the origin of my question in #dri-devel earlier, i haven't been able to pinpoint in the cmdbuf why the address is 0

16:53 underpantsgnome[m] has quit []

16:54 lynxeye has quit [Ping timeout: 480 seconds]

17:08 lynxeye has joined #etnaviv

17:10 sravn has quit []

17:11 tlwoerner has joined #etnaviv

17:19 tlwoerner_ has joined #etnaviv

17:19 tlwoerner_ has quit [Read error: Connection reset by peer]

17:20 tlwoerner_ has joined #etnaviv

17:20 tlwoerner has quit [Ping timeout: 480 seconds]

17:44 sravn has joined #etnaviv

18:13 lynxeye has quit [Quit: Leaving.]

18:45 sergi has quit []

22:33 karolherbst_ is now known as karolherbst