JohnnyonFlame has quit [Remote host closed the connection]
pjakobsson has joined #etnaviv
lynxeye has joined #etnaviv
pcercuei has joined #etnaviv
<MoeIcenowy>
lynxeye: I am now wondering whether the MMU of the GC620 has changed
<MoeIcenowy>
but I have no evidence at all...
<lynxeye>
MoeIcenowy: I have not seen any hints about a new MMU version in any of the downstream kernel drivers. Usually hints for new stuff like that appear in the downstream drivers long before there are any actual SoC implementations.
<lynxeye>
So that's quite unlikely.
<lynxeye>
What happens if you don't do any drawing, but just submit a NOP commandstream?
<MoeIcenowy>
etnaviv_cmd_stream_test ?
<MoeIcenowy>
oops this seems to be only some unit test
<MoeIcenowy>
not smoke test
<lynxeye>
MoeIcenowy: Yep, we don't have a nop test, but you can simply avoid calling gen_cmd_stream in the 2d test, so you submit an empty stream.
<lynxeye>
An empty stream will still cause the kernel driver to wake/initialize the GPU and trigger an IRQ after the GPU has jumped back from the empty stream.
<MoeIcenowy>
it times out too
<MoeIcenowy>
maybe I should prevent runtime pm and check /sys/kernel/debug/dri/2/gpu ?
<lynxeye>
yea, worth a shot.
<MoeIcenowy>
well things are just recovered
<MoeIcenowy>
the recovery code at least made idle 0x7fffffff again
<lynxeye>
also worth checking your DT again. irq wired up correctly? all relevant clocks enabled? does the system have dma address restrictions and are those correctly described in the dt?
<MoeIcenowy>
well it says 'DMA seems to be stuck'
<lynxeye>
which just means the FE is stuck or not yet started.
<lynxeye>
when the GPU comes out of reset or a GPU recovery we don't start the FE right away. It's only started as soon as the first user commandstream arrives.
<MoeIcenowy>
maybe I should try devcoredump?
<MoeIcenowy>
(never used this thing before)
<lynxeye>
it won't help you much when even the nop stream isn't executing
<lynxeye>
but maybe you could use it to see the FE dma address when it hangs, to see if it even started running
<MoeIcenowy>
lynxeye: is there any way to raise a test IRQ?
<lynxeye>
MoeIcenowy: There's no register based way as far as I know. You need to start the FE and let it execute a state load to VIVS_GL_EVENT.
<MoeIcenowy>
okay so this sounds like it's just submitting something?
<MoeIcenowy>
well the 0x10 register stays 0, does this mean the HW isn't trying to send an interrupt yet?
<MoeIcenowy>
strange, the kernel driver refers to a register at 0x800, which is a mystery
<MoeIcenowy>
(this register seems to be written with 0x10 when the GPU is GC620 rev 0x5552)
<lynxeye>
might be worth checking the downstream driver for things that target the specific GPU or the system integration (e.g. some SoCs need specific AXI attribute settings)
<MoeIcenowy>
btw what's the definition of sec_mode ?
<MoeIcenowy>
the downstream driver here seems to say its secureMode = gcvSECURE_IN_NORMAL
<MoeIcenowy>
what's the corresponding etnaviv sec_mode?
<MoeIcenowy>
the downstream feature db unsets FEATURE_SECURITY but sets FEATURE_SECURITY_AHB
<lynxeye>
uh, that might explain the issue. Currently etnaviv only uses the secure FE start registers when both security and security_ahb are set.
<lynxeye>
which is the same as the downstream driver did a while back
<lynxeye>
MoeIcenowy: gcvSECURE_IN_NORMAL is equivalent to ETNA_SEC_KERNEL
<lynxeye>
so you might want to edit the check in etnaviv_gpu_init to only look at the SECURITY_AHB bit.
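The check lynxeye describes, and the suggested change, can be sketched roughly as below. This is a minimal Python model, not the driver code: `pick_sec_mode` and its `require_security_bit` switch are hypothetical names for illustration, and `ETNA_SEC_NONE` as the fallback value is an assumption. Only SECURITY, SECURITY_AHB and ETNA_SEC_KERNEL are named in the discussion.

```python
def pick_sec_mode(has_security, has_security_ahb, require_security_bit=True):
    """Model of the feature check discussed above (hypothetical helper).

    require_security_bit=True  : both feature bits must be set (current check
                                 as described in the chat).
    require_security_bit=False : only SECURITY_AHB is consulted (suggested edit).
    """
    if require_security_bit:
        use_secure = has_security and has_security_ahb
    else:
        use_secure = has_security_ahb
    # "ETNA_SEC_NONE" as the fallback is an assumption of this sketch.
    return "ETNA_SEC_KERNEL" if use_secure else "ETNA_SEC_NONE"


# Feature bits as reported for this GC620: SECURITY unset, SECURITY_AHB set.
print(pick_sec_mode(False, True))
print(pick_sec_mode(False, True, require_security_bit=False))
```

With the feature bits reported here, the two variants disagree, which would be consistent with the secure FE start registers never being used on this core.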
<MoeIcenowy>
lynxeye: well I tried to do this, but 0x3a8 doesn't seem to be working here...
<MoeIcenowy>
it cannot be read
<MoeIcenowy>
and nothing happens when it's written (the reset seems to be not working)
shoragan has quit [Quit: quit]
<lynxeye>
I really have no idea what the specific differences are between SECURITY and SECURITY_AHB.
<MoeIcenowy>
strange, after I set dma mask to 32-bit instead of 40-bit
<MoeIcenowy>
the timed out waiting for idle: idle=0x7ffffffe message disappeared
<MoeIcenowy>
although `recover hung GPU!` is still present
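The two idle values seen so far (0x7fffffff after recovery, 0x7ffffffe in the timeout message) differ only in bit 0. A minimal sketch of reading that bit, assuming bit 0 of the idle register is the FE idle flag; `IDLE_FE` and `fe_is_idle` are hypothetical names for illustration:

```python
IDLE_FE = 0x1  # assumption: bit 0 of the idle register means "FE is idle"


def fe_is_idle(idle):
    """Return True if the FE idle bit is set in an idle register value."""
    return bool(idle & IDLE_FE)


for value in (0x7FFFFFFF, 0x7FFFFFFE):
    print(hex(value), "FE idle" if fe_is_idle(value) else "FE busy")
```

Under that assumption, idle=0x7ffffffe would mean every reported unit except the FE is idle.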
<MoeIcenowy>
The downstream driver reports an error when the MMU_PAGE_DESCRIPTOR feature is not available and the MMU command buffer is beyond 4G
<lynxeye>
is the cma area on your system beyond the 32bit mark?
<MoeIcenowy>
yes, it's beyond
<MoeIcenowy>
the system has 16G RAM
<MoeIcenowy>
and the DT explicitly places the CMA at the end of RAM
<lynxeye>
not sure how this interacts with the MMU pagetable allocations. They are alloced as coherent, so normally the coherent dma mask should take care of keeping them below 4G.
<MoeIcenowy>
well okay the kernel version is so old that the coherent dma mask is still 40 bits
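Why the mask width matters here can be shown with a small range check. This is a minimal sketch: `dma_bit_mask` and `fits_mask` are hypothetical helpers, and the 15 GiB address is an invented example of an allocation near the end of 16G of RAM, not a value from the log.

```python
def dma_bit_mask(bits):
    """All-ones mask with the given number of low bits set."""
    return (1 << bits) - 1


def fits_mask(addr, size, bits):
    """True if the whole buffer [addr, addr + size) is addressable under the mask."""
    return addr + size - 1 <= dma_bit_mask(bits)


example_addr = 15 << 30  # hypothetical allocation at 15 GiB
print(fits_mask(example_addr, 4096, 40))  # allowed by a 40-bit mask
print(fits_mask(example_addr, 4096, 32))  # rejected by a 32-bit mask
```

With a 40-bit coherent mask, an allocation from a CMA area above 4G passes this check, so nothing forces the pagetables below the 32-bit mark.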
shoragan has joined #etnaviv
<MoeIcenowy>
oh interesting, I tried to cat /sys/kernel/debug/dri/2/gpu before the hang recovery, it captures a non-idle FE, and still "DMA seems to be stuck", but address 0/1 is now 0x00001010 (state 0/1 still 0x0) and last fetch 64 bit word is 0x55758082 0x1141b7fd
<MoeIcenowy>
looks like FE is now trying to work?
<lynxeye>
MoeIcenowy: yep, a low address means the MMU is initialized and the FE tries to execute things. The last fetched word content doesn't look in any way like actual commands, so it seems the GPU is fetching garbage.
<lynxeye>
Maybe another MMU setup issue?
otavio_ has quit [Ping timeout: 480 seconds]
<MoeIcenowy>
lynxeye: btw when I am trying to send a "dummy command stream" in userspace
<MoeIcenowy>
what does the device get?
<MoeIcenowy>
also a null sequence?
<MoeIcenowy>
or are there some basic things in it?
<MoeIcenowy>
btw how to analyze devcoredump by etnaviv?
<MoeIcenowy>
okay viv-unpack
<lynxeye>
MoeIcenowy: if you don't submit any commands, the FE will be started, then jumps to the empty user commandstream and jumps right back to the kernel command ring, where it should execute the send event aka. fire IRQ
<MoeIcenowy>
okay there's a "kernel command ring"
<MoeIcenowy>
when I viv-unpack the devcoredump, I see buffers list "2 ring 00001000 00001000 4096" "3 cmd 00002000 00000948 2376"
<MoeIcenowy>
does this mean the 0x1010 address belongs to the ring buffer?
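The question can be answered with a range check over the buffer list. A minimal sketch, assuming the columns of each quoted line are index, type, start address (hex), size (hex) and size (decimal); that reading is inferred from 0x1000 = 4096 and 0x948 = 2376 and is not documented here. `parse_buffer` and `find_buffer` are hypothetical helpers.

```python
def parse_buffer(line):
    """Split one buffer line into (type, start, size), assuming the column layout above."""
    _idx, kind, addr_hex, size_hex, size_dec = line.split()
    size = int(size_hex, 16)
    if size != int(size_dec):
        raise ValueError("hex and decimal size columns disagree: " + line)
    return kind, int(addr_hex, 16), size


def find_buffer(lines, fe_addr):
    """Return the type of the buffer containing fe_addr, or None."""
    for line in lines:
        kind, start, size = parse_buffer(line)
        if start <= fe_addr < start + size:
            return kind
    return None


buffers = ["2 ring 00001000 00001000 4096", "3 cmd 00002000 00000948 2376"]
print(find_buffer(buffers, 0x1010))
```

Under that column reading, 0x1010 lies 0x10 bytes into the ring buffer, which spans 0x1000 up to 0x2000.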
<MoeIcenowy>
lynxeye: oops I got where 0x55758082 0x1141b7fd is from
<MoeIcenowy>
it's the word at physical memory 0x1008 0x100c
<MoeIcenowy>
(part of the system firmware (opensbi))
<MoeIcenowy>
so it seems to be executing from physical memory 0x1000 ....
<MoeIcenowy>
the mmu gets just totally bypassed
<MoeIcenowy>
well 0x180 does not react to writes, but 0x380 does
<MoeIcenowy>
I surely should configure the secure MMU registers instead of the original ones
<MoeIcenowy>
strange, 0x18c reacts
<MoeIcenowy>
okay new SAFE_ADDR is at 0x398 0x39c
otavio_ has joined #etnaviv
<MoeIcenowy>
is there a parser for command stream?
lynxeye has quit [Quit: Leaving.]
frieder has quit [Remote host closed the connection]