ChanServ changed the topic of #etnaviv to: #etnaviv - the home of the reverse-engineered Vivante GPU driver - Logs https://oftc.irclog.whitequark.org/etnaviv
tlwoerner_ has quit [Remote host closed the connection]
tlwoerner_ has joined #etnaviv
chewitt has joined #etnaviv
samuelig has joined #etnaviv
chewitt has quit [Quit: Zzz..]
chewitt has joined #etnaviv
samuelig has quit [Quit: Bye!]
lynxeye has joined #etnaviv
chewitt has quit [Ping timeout: 480 seconds]
pcercuei has joined #etnaviv
cmeissl[m] has joined #etnaviv
CoLa[m] has joined #etnaviv
disastroustwitch[m] has joined #etnaviv
jwillikers[m] has joined #etnaviv
matrix638[m] has joined #etnaviv
sergi has joined #etnaviv
sythemeta847[m] has joined #etnaviv
T_UNIX has joined #etnaviv
underpantsgnome[m] has joined #etnaviv
tomeu has joined #etnaviv
wv[m] has joined #etnaviv
f_ has joined #etnaviv
karolherbst_ has joined #etnaviv
karolherbst has quit [Ping timeout: 480 seconds]
sravn has quit [Ping timeout: 480 seconds]
sravn has joined #etnaviv
<ManMower> https://www.nxp.com/docs/en/errata/IMX8_1N94W.pdf does anyone know off-hand if mesa handles ERR010916? I'll check later when I have a chance, just thought someone here might have a quick answer.
<bl4ckb0ne> lynxeye: i dumped the `etnaviv_gpu` buffer on the mmu handler and got something interesting
<bl4ckb0ne> dma address is `0x000012c0`
<bl4ckb0ne> > [ 88.572474] cmd 000002c0: 380000c8 00000001 40000002 000012c0
<lynxeye> ManMower: Nope, Mesa doesn't handle this one. But it also doesn't handle CLAMP_TO_BORDER, so not sure if the conditions for this erratum are hit.
<bl4ckb0ne> there's a waitlink
<lynxeye> bl4ckb0ne: That's just the FE idle loop.
<bl4ckb0ne> its not supposed to be replaced?
<lynxeye> If there is no further work queued the FE will just execute this short wait/link sequence. When you queue more work, the link is replaced with a link into the user command buffer.
<bl4ckb0ne> there is more
<bl4ckb0ne> lynxeye: that line is looping around right? wait + fetch 2 at 2c0 ?
<lynxeye> you sure that this isn't just noise from previous executions? The next link command being followed by GL_EVENT with the same event ID doesn't look like it's work directly queued one after another
<lynxeye> right, it's stalling the FE for a few cycles in the wait command then executes the link back to the wait cmd
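The wait/link idle loop described above can be read straight out of the dumped line. The following is a minimal sketch, not the driver's own decoder: it assumes the opcode sits in the top five bits of the header word and that the low 16 bits carry the WAIT delay or the LINK prefetch count, as in etnaviv's command stream headers. The helper name `decode_pair` is made up for illustration, and the second word of the WAIT is not interpreted.

```python
# Opcode values assumed from etnaviv's command stream headers
# (opcode in the top 5 bits of the first word of each command).
FE_OPCODE_MASK = 0xF8000000
FE_OPCODES = {
    0x38000000: "WAIT",
    0x40000000: "LINK",
}


def decode_pair(header, operand):
    """Decode one FE command made of a header word and an operand word."""
    op = FE_OPCODES.get(header & FE_OPCODE_MASK, "UNKNOWN")
    arg = header & 0xFFFF  # WAIT: delay, LINK: prefetch count
    return op, arg, operand


# Words copied from the dump line at buffer offset 0x2c0.
words = [0x380000C8, 0x00000001, 0x40000002, 0x000012C0]
wait = decode_pair(words[0], words[1])
link = decode_pair(words[2], words[3])

# DMA address of the WAIT as reported in the channel.
wait_dma_address = 0x000012C0
loops_back = link[0] == "LINK" and link[2] == wait_dma_address

print(wait, link, loops_back)
```

Under these assumptions the line decodes to a WAIT with a delay of 200 followed by a LINK with prefetch 2 whose target is the address of the WAIT itself, which matches the idle loop described above.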
<lynxeye> yea, but note that the kernel ring buffer is never cleared, only overwritten. So once you have submitted enough work to cycle through the kernel ring buffer, it's completely filled with valid-looking stuff that has already been executed.
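A toy model may help show why a dump of the kernel ring looks full of live commands. This is only a sketch under simplifying assumptions: the `Ring` class, the 8-word size and the marker values are hypothetical and do not mirror the etnaviv data structures.

```python
class Ring:
    """Toy ring buffer that overwrites old words and never clears them."""

    def __init__(self, size):
        self.words = [0] * size
        self.pos = 0

    def submit(self, cmds):
        for word in cmds:
            self.words[self.pos] = word
            self.pos = (self.pos + 1) % len(self.words)


ring = Ring(8)
ring.submit([0xA1, 0xA2, 0xA3, 0xA4, 0xA5, 0xA6])  # job A, already executed
ring.submit([0xB1, 0xB2, 0xB3, 0xB4])              # job B wraps around

# Words from job A that job B did not overwrite are still in the buffer
# and look just as valid as the freshly written ones.
stale = [w for w in ring.words if 0xA0 <= w < 0xB0]
print([hex(w) for w in ring.words], [hex(w) for w in stale])
```

In this model the leftover words of job A sit right next to job B's commands, so a dump alone does not tell which commands are still pending.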
<bl4ckb0ne> so anything beneath might be already executed
<bl4ckb0ne> that might explain the gpu hang?
<lynxeye> not really. If you end up in the idle loop all commands should have been executed, caches flushed and events triggered.
<bl4ckb0ne> hm
<lynxeye> If the FE has already reached the idle loop after coming back from user command execution and you get an MMU fault then my best guess would still be some kind of cache writeback.
<bl4ckb0ne> im running with `ETNA_MESA_DEBUG=draw_stall,zero,cflush_all,flush_all`, shouldnt that be taken care of?
<lynxeye> Maybe look in the downstream driver if the GPU cache flush routines there handle some caches that we don't yet handle in the etnaviv driver. AFAIR there are some cache types like L2 that I haven't yet seen on any GPU, but the gc7000 r6009 being one of the bigger ones it might have some of those.
<lynxeye> The userspace debug options won't help if the MMU fault happens after return to the kernel ring. The etnaviv kernel driver is responsible for flushing the write caches before sending the job done event.
<bl4ckb0ne> there's an icache specific to halti5 in the mxc driver
<lynxeye> cflush_all will flush the known caches _before the next_ draw call, but if there isn't another draw call as you already returned to the kernel ring it won't help
<lynxeye> you could try to set all the cache bits you can find in rnndb and the downstream driver in the GL_FLUSH_CACHE command in etnaviv_buffer.c and see if it makes the problem go away.
<bl4ckb0ne> in etnaviv_buffer_end ?
<bl4ckb0ne> or etnaviv_buffer_queue
<lynxeye> buffer_queue is the more relevant one. buffer_end is only used when the gpu goes into runtime suspend
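The "set all the cache bits" experiment amounts to OR-ing every known flush bit into the value written to GL_FLUSH_CACHE. The sketch below only computes the two command words. The register offset and bit values are recalled from rnndb and are assumptions to verify against your own tree; the list is deliberately incomplete (any bits found in the downstream driver would be added to it), and `load_state` is a hypothetical helper, not a kernel function.

```python
# Assumed values, recalled from rnndb; check them before relying on them.
FE_OP_LOAD_STATE = 0x08000000
VIVS_GL_FLUSH_CACHE = 0x380C

FLUSH_BITS = {
    "DEPTH": 0x01,
    "COLOR": 0x02,
    "TEXTURE": 0x04,
    "PE2D": 0x08,
    "TEXTUREVS": 0x10,
    "SHADER_L1": 0x20,
    "SHADER_L2": 0x40,
}


def load_state(reg, value):
    """Build a single-register LOAD_STATE command (header word, value word)."""
    header = FE_OP_LOAD_STATE | (1 << 16) | (reg >> 2)
    return header, value


all_bits = 0
for bit in FLUSH_BITS.values():
    all_bits |= bit

header, value = load_state(VIVS_GL_FLUSH_CACHE, all_bits)
print(hex(header), hex(value))
```

If the fault disappears with every bit set, clearing bits one at a time would narrow down which cache is being written back late.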
<bl4ckb0ne> oh right about that one, there's a sem/stall on SYNC_RECIPIENT in the `need_flush` condition that is not followed by the `has_blt` block (the one that does the blt enable/sem/stall/disable)
<lynxeye> There shouldn't be a need for a blt sync there, as we only need a way to stall the FE after the MMU cache invalidation there. The stall a few lines down needs to make sure that all write caches are flushed before the event is sent, which requires a blt sync as blt has its own caches.
<bl4ckb0ne> do we know the blt flush bits?
<lynxeye> afaik blt auto flushes through the enable/disable sequence
<bl4ckb0ne> would it be possible that a `fe_running` check is missing and one idleloop is considered a hang?
<lynxeye> nope, the hangcheck timer is canceled when the driver has seen all events for the active jobs. Nothing to do with the fe_running state
funderscore has joined #etnaviv
<bl4ckb0ne> ack
<lynxeye> but then I thought you see an MMU fault? In this case the GPU actively tells you that something went wrong, we just handle it the same as a timer-detected stall afterwards, as the GPU is definitely hung after an MMU fault.
<bl4ckb0ne> yeah i see a fault
<bl4ckb0ne> page not present on addr 0x0
f_ has quit [Quit: To contact me, PM f_[xmpp] or send an email. See https://vitali64.duckdns.org/.]
matrix638[m] has quit []
funderscore is now known as f_
sythemeta847[m] has quit []
tlwoerner_ has quit [Ping timeout: 480 seconds]
agx has quit []
<bl4ckb0ne> lynxeye: that was the origin of my question in #dri-devel earlier, i haven't been able to pinpoint in the cmdbuf why the address is 0
underpantsgnome[m] has quit []
lynxeye has quit [Ping timeout: 480 seconds]
lynxeye has joined #etnaviv
sravn has quit []
tlwoerner has joined #etnaviv
tlwoerner_ has joined #etnaviv
tlwoerner_ has quit [Read error: Connection reset by peer]
tlwoerner_ has joined #etnaviv
tlwoerner has quit [Ping timeout: 480 seconds]
sravn has joined #etnaviv
lynxeye has quit [Quit: Leaving.]
sergi has quit []
karolherbst_ is now known as karolherbst