#etnaviv on 2023-11-29 — irc logs at oftc.irclog.whitequark.org

2022-12-21 00:47 ChanServ changed the topic of #etnaviv to: #etnaviv - the home of the reverse-engineered Vivante GPU driver - Logs https://oftc.irclog.whitequark.org/etnaviv

00:52 DPA has quit [Ping timeout: 480 seconds]

00:54 DPA has joined #etnaviv

07:22 mvlad has joined #etnaviv

07:26 frieder has joined #etnaviv

07:31 JohnnyonFlame has quit [Remote host closed the connection]

08:17 pjakobsson has joined #etnaviv

08:51 lynxeye has joined #etnaviv

09:14 pcercuei has joined #etnaviv

09:17 <MoeIcenowy> lynxeye: I am now thinking the MMU of the GC620 may have changed

09:17 <MoeIcenowy> but I have no evidence at all...

09:19 <lynxeye> MoeIcenowy: I have not seen any hints about a new MMU version in any of the downstream kernel drivers. Usually hints for new stuff like that appear in the downstream drivers long before there are any actual SoC implementations.

09:19 <lynxeye> So that's quite unlikely.

09:20 <lynxeye> What happens if you don't do any drawing, but just submit a NOP commandstream?

09:23 <MoeIcenowy> etnaviv_cmd_stream_test ?

09:25 <MoeIcenowy> oops this seems to be only some unit test

09:25 <MoeIcenowy> not smoke test

09:26 <lynxeye> MoeIcenowy: Yep, we don't have a nop test, but you can simply avoid calling gen_cmd_stream in the 2d test, so you submit an empty stream.

09:27 <lynxeye> An empty stream will still cause the kernel driver to wake/initialize the GPU and trigger an IRQ after the GPU jumped back from the empty stream.

09:27 <MoeIcenowy> it times out too

09:29 <MoeIcenowy> maybe I should disable runtime PM and check /sys/kernel/debug/dri/2/gpu ?

09:30 <lynxeye> yea, worth a shot.

09:31 <MoeIcenowy> well, things have just been recovered

09:32 <MoeIcenowy> the recovery code at least made idle 0x7fffffff again

09:32 <lynxeye> also worth checking your DT again. irq wired up correctly? all relevant clocks enabled? does the system have dma address restrictions and are those correctly described in the dt?

09:32 <MoeIcenowy> well it says 'DMA seems to be stuck'

09:33 <lynxeye> which just means the FE is stuck or not yet started.

09:34 <lynxeye> when the GPU comes out of reset or a GPU recovery we don't start the FE right away. It's only started as soon as the first user commandstream arrives.

09:36 <MoeIcenowy> maybe I should try devcoredump?

09:36 <MoeIcenowy> (never used this thing before)

09:38 <lynxeye> it won't help you much when even the nop stream isn't executing

09:42 <lynxeye> but maybe you could use it to see the FE dma address when it hangs, to see if it even started running

10:10 <MoeIcenowy> lynxeye: is there any way to raise a test IRQ?

10:13 <lynxeye> MoeIcenowy: There's no register based way as far as I know. You need to start the FE and let it execute a state load to VIVS_GL_EVENT.
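A minimal standalone C sketch of the state load lynxeye describes, modelled on the kernel's CMD_LOAD_STATE helper. The LOAD_STATE opcode value, the VIVS_GL_EVENT offset (0x3804) and the FROM_PE event-source bit are recalled from the etnaviv cmdstream/state headers and are assumptions to check against your tree; cmd_load_state() and cmd_send_event() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Values as recalled from the etnaviv headers; verify before use. */
#define VIV_FE_LOAD_STATE_HEADER_OP_LOAD_STATE 0x08000000u
#define VIVS_GL_EVENT                          0x00003804u
#define VIVS_GL_EVENT_FROM_PE                  0x00000040u

/* Fill out[0..1] with a LOAD_STATE writing one value to register `reg`.
 * Header: opcode, count = 1 in bits 25:16, register index (byte
 * offset >> 2) in bits 15:0. The value word follows the header. */
static void cmd_load_state(uint32_t out[2], uint32_t reg, uint32_t value)
{
    assert((reg & 3u) == 0);        /* registers are 32-bit aligned */
    assert((reg >> 2) <= 0xffffu);  /* index must fit the offset field */
    out[0] = VIV_FE_LOAD_STATE_HEADER_OP_LOAD_STATE | (1u << 16) | (reg >> 2);
    out[1] = value;
}

/* State load to VIVS_GL_EVENT for event `event_id` (assumed 0..31),
 * with the PE selected as the event source. */
static void cmd_send_event(uint32_t out[2], uint32_t event_id)
{
    assert(event_id < 32);
    cmd_load_state(out, VIVS_GL_EVENT, event_id | VIVS_GL_EVENT_FROM_PE);
}
```

With these constants, event 3 encodes as the two words 0x08010E01 0x00000043; the FE still has to be running and fetching for such a word pair to do anything.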

10:56 <MoeIcenowy> okay, so this sounds like it's just submitting something?

10:57 <MoeIcenowy> well, the 0x10 register stays 0; does this mean the HW isn't trying to send an interrupt yet?

11:12 <MoeIcenowy> strange, the kernel driver refers to a register at 0x800, which is a mystery

11:12 <MoeIcenowy> (this register seems to be written with 0x10 when the GPU is GC620 rev 0x5552)

11:19 <lynxeye> might be worth checking the downstream driver for things that target the specific GPU or the system integration (e.g. some SoCs need specific AXI attribute settings)

11:52 <MoeIcenowy> btw, what's the definition of sec_mode?

11:52 <MoeIcenowy> the downstream driver here seems to say its secureMode = gcvSECURE_IN_NORMAL

11:52 <MoeIcenowy> what's the corresponding etnaviv sec_mode?

11:55 <MoeIcenowy> the downstream feature db unsets FEATURE_SECURITY but sets FEATURE_SECURITY_AHB

12:04 <lynxeye> uh, that might explain the issue. Currently etnaviv only uses the secure FE start registers when both SECURITY and SECURITY_AHB are set.

12:04 <lynxeye> which is the same as the downstream driver did a while back

12:05 <lynxeye> MoeIcenowy: gcvSECURE_IN_NORMAL is equivalent to ETNA_SEC_KERNEL

12:06 <lynxeye> so you might want to edit the check in etnaviv_gpu_init to only look at the SECURITY_AHB bit.
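To make the suggested change concrete, here is a hedged standalone C sketch contrasting the current rule (both bits required) with the experiment of looking at SECURITY_AHB alone. The feature-bit masks below are hypothetical placeholders, not the real chipMinorFeatures7/10 bit positions, and SEC_NONE/SEC_KERNEL stand in for the driver's ETNA_SEC_NONE/ETNA_SEC_KERNEL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL placeholder masks; take the real ones from the
 * chipMinorFeatures7/10 definitions in the etnaviv headers. */
#define HYP_FEATURE7_BIT_SECURITY  (1u << 0)
#define HYP_FEATURE10_SECURITY_AHB (1u << 1)

enum sec_mode { SEC_NONE, SEC_KERNEL };

static bool has_feature(uint32_t features, uint32_t mask)
{
    assert(mask != 0 && (mask & (mask - 1)) == 0); /* exactly one bit */
    return (features & mask) != 0;
}

/* Current rule: use the secure FE start path only if both bits are set. */
static enum sec_mode sec_mode_current(uint32_t f7, uint32_t f10)
{
    return (has_feature(f7, HYP_FEATURE7_BIT_SECURITY) &&
            has_feature(f10, HYP_FEATURE10_SECURITY_AHB)) ? SEC_KERNEL : SEC_NONE;
}

/* Experiment suggested above: look at SECURITY_AHB only. */
static enum sec_mode sec_mode_ahb_only(uint32_t f10)
{
    return has_feature(f10, HYP_FEATURE10_SECURITY_AHB) ? SEC_KERNEL : SEC_NONE;
}
```

For the feature DB described above (SECURITY unset, SECURITY_AHB set), the current rule yields SEC_NONE while the AHB-only rule yields SEC_KERNEL, which is exactly the difference being tested.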

12:06 <MoeIcenowy> lynxeye: well, I tried to do this, but 0x3a8 doesn't seem to work here...

12:06 <MoeIcenowy> it cannot be read

12:07 <MoeIcenowy> and nothing happens when it's written (the reset doesn't seem to work)

12:07 shoragan has quit [Quit: quit]

12:07 <lynxeye> I really have no idea what the specific differences are between SECURITY and SECURITY_AHB.

13:02 <MoeIcenowy> strange, after I set the DMA mask to 32-bit instead of 40-bit

13:03 <MoeIcenowy> the `timed out waiting for idle: idle=0x7ffffffe` message disappeared

13:03 <MoeIcenowy> although `recover hung GPU!` still present

13:04 <MoeIcenowy> The downstream driver prints an error when the MMU_PAGE_DESCRIPTOR feature is not available and the MMU command buffer is beyond 4G

13:06 <lynxeye> is the cma area on your system beyond the 32bit mark?

13:06 <MoeIcenowy> yes, it's beyond

13:06 <MoeIcenowy> the system has 16G RAM

13:06 <MoeIcenowy> and the DT explicitly places the CMA region at the end of RAM

13:07 <lynxeye> not sure how this interacts with the MMU pagetable allocations. They are allocated as coherent, so normally the coherent DMA mask should take care of keeping them below 4G.

13:09 <MoeIcenowy> well okay, the kernel version is old enough that the coherent DMA mask is still 40-bit
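The 4G question above comes down to whether the last byte of a buffer stays under the device's DMA mask. A small standalone C sketch: DMA_BIT_MASK() mirrors the kernel macro of that name, while fits_dma_mask() is a hypothetical helper and the CMA-style address 0x3f0000000 (just under 16 GiB) is an illustrative assumption, not the board's real layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK() macro. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical helper: true if every byte of [addr, addr + size) is
 * addressable by a device limited to `mask`. */
static bool fits_dma_mask(uint64_t addr, uint64_t size, uint64_t mask)
{
    assert(size > 0);
    uint64_t last = addr + size - 1;
    assert(last >= addr);           /* reject wrap-around */
    return last <= mask;
}
```

A page at 0x3f0000000 passes a 40-bit mask but fails a 32-bit one, which is why an allocation honouring only a 40-bit coherent mask can land above 4G on a 16G system with CMA at the top of RAM.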

13:15 shoragan has joined #etnaviv

13:32 <MoeIcenowy> oh interesting, I tried to cat /sys/kernel/debug/dri/2/gpu before the hang recovery, it captures a non-idle FE, and still "DMA seems to be stuck", but address 0/1 is now 0x00001010 (state 0/1 still 0x0) and last fetch 64 bit word is 0x55758082 0x1141b7fd

13:35 <MoeIcenowy> looks like FE is now trying to work?

13:37 <lynxeye> MoeIcenowy: yep, a low address means the MMU is initialized and the FE tries to execute things. The content of the last fetched word doesn't look anything like actual commands, so it seems the GPU is fetching garbage.

13:37 <lynxeye> Maybe another MMU setup issue?

14:28 otavio_ has quit [Ping timeout: 480 seconds]

14:29 <MoeIcenowy> lynxeye: btw, when I send a "dummy command stream" from userspace

14:29 <MoeIcenowy> what does the device get?

14:29 <MoeIcenowy> also a null sequence?

14:29 <MoeIcenowy> or are there some basic things?

14:32 <MoeIcenowy> btw how to analyze devcoredump by etnaviv?

14:33 <MoeIcenowy> okay viv-unpack

14:33 <lynxeye> MoeIcenowy: if you don't submit any commands, the FE will be started, then jumps to the empty user commandstream and jumps right back to the kernel command ring, where it should execute the send event aka. fire IRQ
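The jumps between the kernel ring and the user command stream are done with FE LINK commands. A hedged standalone C sketch of how such a LINK could be encoded: the opcode value and the 16-bit PREFETCH field (counted in 64-bit words) are recalled from the etnaviv cmdstream header and should be verified, and cmd_link() is a hypothetical helper loosely modelled on the kernel's CMD_LINK.

```c
#include <assert.h>
#include <stdint.h>

/* As recalled from cmdstream.xml.h; verify against your tree. */
#define VIV_FE_LINK_HEADER_OP_LINK 0x40000000u

/* Fill out[0..1] with a LINK that makes the FE continue fetching at
 * GPU address `gpu_va`, prefetching `prefetch` 64-bit words there. */
static void cmd_link(uint32_t out[2], uint32_t prefetch, uint32_t gpu_va)
{
    assert(prefetch <= 0xffffu);    /* PREFETCH occupies bits 15:0 */
    assert((gpu_va & 7u) == 0);     /* FE fetches 64-bit aligned words */
    out[0] = VIV_FE_LINK_HEADER_OP_LINK | prefetch;
    out[1] = gpu_va;
}
```

For an empty submission, the only commands the FE sees are such links into and out of the user buffer plus whatever the kernel ring itself contains (for example the event state load).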

14:35 <MoeIcenowy> okay there's a "kernel command ring"

14:38 <MoeIcenowy> when I viv-unpack the devcoredump, I see the buffer list "2 ring 00001000 00001000 4096" "3 cmd 00002000 00000948 2376"

14:38 <MoeIcenowy> does this mean the 0x1010 address belongs to the ring buffer?

14:54 <MoeIcenowy> lynxeye: oops I got where 0x55758082 0x1141b7fd is from

14:54 <MoeIcenowy> it's the word at physical memory 0x1008 0x100c

14:54 <MoeIcenowy> (part of the system firmware (opensbi))

14:55 <MoeIcenowy> so it seems to be executing from physical memory 0x1000 ....

14:57 <MoeIcenowy> the mmu gets just totally bypassed

15:04 <MoeIcenowy> well writing to 0x180 does not react, but writing to 0x380 does

15:04 <MoeIcenowy> I should surely configure the secure MMU registers instead of the original ones

15:10 <MoeIcenowy> strange, 0x18c reacts

15:19 <MoeIcenowy> okay new SAFE_ADDR is at 0x398 0x39c

15:38 otavio_ has joined #etnaviv

16:44 <MoeIcenowy> is there a parser for command stream?

17:10 lynxeye has quit [Quit: Leaving.]

17:22 frieder has quit [Remote host closed the connection]

17:31 otavio__ has joined #etnaviv

17:38 otavio_ has quit [Ping timeout: 480 seconds]

19:45 cphealy has quit [Ping timeout: 480 seconds]

21:07 dv_ has quit [Quit: WeeChat 3.8]

21:19 dv_ has joined #etnaviv

21:42 JohnnyonFlame has joined #etnaviv

23:57 pcercuei has quit [Quit: dodo]