austriancoder changed the topic of #etnaviv to: #etnaviv - the home of the reverse-engineered Vivante GPU driver - Logs https://oftc.irclog.whitequark.org/etnaviv
adjtm has quit [Quit: Leaving]
adjtm has joined #etnaviv
dos has joined #etnaviv
dos1 has quit [Ping timeout: 480 seconds]
dos has quit [Ping timeout: 480 seconds]
adjtm is now known as Guest257
adjtm has joined #etnaviv
Guest257 has quit [Ping timeout: 480 seconds]
JohnnyonFlame has quit [Ping timeout: 480 seconds]
lynxeye has joined #etnaviv
mwalle has joined #etnaviv
<mwalle> Hi, I'm trying to enable Etnaviv support on the NXP LS1028A SoC (which contains a GC7000 core similar to the i.MX8's, as far as I know). It contains a mali-dp display controller. Thanks to the nice guys at #linux-dri I was already able to add kmsro support for mali-dp
pcercuei has joined #etnaviv
<mwalle> If I start glmark, I'm getting a black screen and "GPU hang, trying to recover" messages
<austriancoder> mntmn: ^
* austriancoder does not have access to such hw
<mwalle> https://pastebin.com/raw/jsY0N18K << basically this (without the "failed to set crtc: -22", which I already fixed)
<mwalle> I don't see any interrupts either, I presume there should be some, no?
<lynxeye> mwalle: First thing: get the exact GPU model/rev from etnaviv, then dig into the Vivante kernel driver hwdb to get the feature bits of that core and add it to the etnaviv hwdb.
<mwalle> oh, I'm running the latest linux-next and mesa-21.0.3 (the latter just because that was on my current buildroot setup)
<mwalle> lynxeye: ok let me have a look
* austriancoder hopes that the kernel-based hwdb does not get extended and that we switch over to a userspace-based one in mesa
<austriancoder> lynxeye: btw. what happened to the kernel patches - nothing has landed in 5.13 yet
<mwalle> [ 7.608663] etnaviv-gpu f0c0000.gpu: model: GC7000, revision: 6202
<mwalle> this one, I presume
<mwalle> which raises the question: where do I get all the hwdb values from?
<austriancoder> mwalle: they should be under debugfs
<lynxeye> mwalle: There's a huge hwdb in the Vivante kernel driver, just need to look up this specific model/rev.
<lynxeye> austriancoder: We've had like 3 patches staged the last time around, so it dropped off my prio list and I didn't send a pull-req in time for 5.13. :/ Should all land in 5.14.
<lynxeye> mwalle: Not the etnaviv driver, the Vivante kernel driver.
<lynxeye> Basically you need to build an etnaviv hwdb entry from the values in the Vivante driver.
<austriancoder> lynxeye: really? I am not happy with this - might it be possible to move etnaviv to drm-misc and/or add me as maintainer?
<lynxeye> mwalle: If we already had the entry in etnaviv you wouldn't run into this again.
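(Editor's note: for reference, an etnaviv hwdb entry is a struct etnaviv_chip_identity initializer in drivers/gpu/drm/etnaviv/etnaviv_hwdb.c. A heavily abridged sketch of its shape follows - the field names follow the kernel struct, but every value below is a placeholder, not real GC7000 r6202 data:)

```c
/* Abridged, hypothetical sketch of an etnaviv_hwdb.c entry; all values
 * are placeholders -- the real numbers must come from the Vivante hwdb. */
{
	.model = 0x7000,
	.revision = 0x6202,
	.stream_count = 16,        /* placeholder */
	.register_max = 64,        /* placeholder */
	.shader_core_count = 4,    /* placeholder */
	.pixel_pipes = 1,          /* placeholder */
	.features = 0xe0287c8d,        /* placeholder feature word */
	.minor_features0 = 0xc1489eff, /* placeholder feature word */
	/* ... minor_features1..N, instruction_count, varyings_count, ... */
},
```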
<mwalle> lynxeye: ah :)
<mwalle> but there are also "some" values in debugfs
<lynxeye> austriancoder: While it's unfortunate, the changes are all in linux-next and I don't see any userspace changes depending on this, yet.
<lynxeye> mwalle: The values in debugfs are what is read from the hardware, which is basically all lies since Vivante started to solely rely on the hwdb.
<austriancoder> lynxeye: I tried to send the patches in time to get into the next release (5.13 at the time). Also, even though there are no public userspace bits yet, the changes did not introduce any new ioctls
<austriancoder> lynxeye: okay, I'll have to live with it ..
<lynxeye> austriancoder: I can only apologize for missing the merge and will try to be more consistent in the future there. Still everything you do in userspace needs to cope with those new params not being there, as we need to work with older kernels.
<austriancoder> lynxeye: I am aware of that fact
<mwalle> is there any script or documentation on how to convert that long per-bit feature list from the Vivante driver to the etnaviv 32-bit feature values?
<lynxeye> mwalle: One of my coworkers started something, currently very specific to one GPU. I would be happy to see this cleaned up in the github etna_viv repo.
<mwalle> lynxeye: I presume it's for the imx8 variant? ;)
<mwalle> and thanks
<lynxeye> mwalle: imx8mp to be specific.
<mwalle> mp or mq?
<lynxeye> mwalle: mp
<lynxeye> for the mq I did it by hand
<mwalle> lynxeye: regarding that script
<mwalle> {"chipMinorFeatures0_HAS_SQRT_TRIG",0x00100000, 1 , 0x1, "gcFEATURE_BIT_REG_DefaultReg0" },
<mwalle> is that correct?
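(Editor's note: a minimal sketch of what such a converter's core lookup could do, assuming table entries of the pasted form map a Vivante hwdb feature-bit name to a mask within one of etnaviv's minorFeaturesN words. The struct layout, word index, and helper name are illustrative, not taken from the actual script:)

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mapping-table entry: which bit in which etnaviv
 * minorFeaturesN word corresponds to which Vivante feature flag. */
struct feature_map {
	const char *etnaviv_name;   /* bit name on the etnaviv side */
	uint32_t mask;              /* bit within the minorFeaturesN word */
	unsigned int word;          /* which minorFeaturesN word (assumed 0 here) */
	const char *vivante_name;   /* feature flag in the Vivante kernel hwdb */
};

static const struct feature_map map[] = {
	{ "chipMinorFeatures0_HAS_SQRT_TRIG", 0x00100000, 0,
	  "gcFEATURE_BIT_REG_DefaultReg0" },
};

/* OR the mask of every enabled Vivante feature into its 32-bit word. */
static void pack_features(const char **enabled, size_t n_enabled,
			  uint32_t *words, size_t n_words)
{
	memset(words, 0, n_words * sizeof(*words));
	for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++)
		for (size_t j = 0; j < n_enabled; j++)
			if (strcmp(map[i].vivante_name, enabled[j]) == 0 &&
			    map[i].word < n_words)
				words[map[i].word] |= map[i].mask;
}
```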
<mwalle> lynxeye: nice, now I'm at least getting IOMMU errors
<lynxeye> mwalle: You might also want to take a look at https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9255
<lynxeye> I suspect the GPU in the ls1028 is closer to the 8mp GPU than to the 8mq one.
<lynxeye> mwalle: I haven't checked all the values in the converter program, so take it with a grain of salt.
<mwalle> lynxeye: I haven't found anything else there, except some "fixesN" where the kernel seems to have a better name
<mwalle> I've disabled the IOMMU for now, and I'm getting MMU errors from the GPU https://pastebin.com/raw/6FZUchiR
<mwalle> the fault address looks suspicious
JohnnyonFlame has joined #etnaviv
<mwalle> do I need the phys_baseaddr and the contiguous_mem properties from the vivante kernel driver?
<lynxeye> mwalle: Nope, etnaviv uses the system CMA region
<lynxeye> mwalle: Please take a look at the Mesa MR. If one of the PE pipe addresses isn't programmed correctly, you'll get GPU MMU faults with address 0
JohnnyonFlame has quit [Ping timeout: 480 seconds]
<mwalle> lynxeye: I've updated to the latest mesa main branch tip with your 5 patches on top. There are some changes: I still get the MMU fault at addr 0, but I also see something happening on the screen (well, mostly black, but there is definitely some pattern recognizable, which also seems to change with the different glmark tests)
<mwalle> I've also set ETNA_MESA_DEBUG=no_supertile
<lynxeye> mwalle: Does it change more if you also set no_ts?
<marex> lynxeye: so why don't you and austriancoder co-maintain the kernel driver anyway?
<mwalle> lynxeye: ahhh :)
<mwalle> and in fact i also see something on the screen :)
<lynxeye> mwalle: \o/
<mwalle> lynxeye: is there a way to debug why there is that first fault? that looks like a valid address though
<lynxeye> marex: Because one person doing it was working okay, at least as long as there was a big enough number of patches that I didn't forget about the staged stuff...
<lynxeye> marex: Shared maintenance of a tree requires more coordination and in fact I almost never commit to drm-misc even though I have commit rights, due to the tooling overhead.
<marex> lynxeye: seems austriancoder would like to help, so maybe give the co-maintainership model a try?
<marex> lynxeye: I saw it working pretty well in various projects
<lynxeye> Wouldn't a "hey I haven't seen a kernel PR yet" ping at -rc6 time be easier for starters? The etnaviv list is always copied on all that kernel stuff after all...
<marex> lynxeye: if you were down due to some infectious disease, maybe not ?
dos has joined #etnaviv
<lynxeye> *shrug* TBH I think there are a lot of bigger fires right now.
<lynxeye> mwalle: Does this hang always happen after starting glmark? May be due to some missing TLB flush or something like that if it just happens a single time at app start.
<mwalle> lynxeye: yes, on every start, and just once
<marex> mwalle: I think I observe the MMU faults on STM32MP1 as well during glmark
<marex> mwalle: do you observe / trigger those consistently ?
<marex> mwalle: for me it takes days to trigger one
<marex> lynxeye: like being a single point of failure during pandemic ? ;-)
<mwalle> marex: on every start right before (?) the first test
<marex> lynxeye: I think I'll just drop this now
<marex> mwalle: oh
<marex> mwalle: and if you use this --run-forever arg, does it always happen on the first test ?
<mwalle> marex: let me try
<marex> mwalle: or in fact, if you just select the one test and use --run-forever
<marex> mwalle: for me it happened on one of the later tests
<marex> mwalle: in fact, did you scrape devcoredump out of the MMU fault yet ? :)
<mwalle> marex: sorry I'm really a noob regarding this topic ;) I'm happy that I got that mali-dp and this running
<mwalle> what is devcoredump?
<marex> mwalle: there is some functionality where, if an MMU fault happens, the kernel triggers some userspace udev helper and it writes a file with information
<marex> there's a udev rule and a script to store that info
<marex> mwalle: see https://github.com/etnaviv/etna-gpu-tools , udev/ directory
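(Editor's note: the mechanism marex refers to is the kernel's devcoredump framework - on a GPU fault, etnaviv registers a core dump that appears under /sys/class/devcoredump/, and a udev rule can save it before it expires. A hypothetical sketch of such a rule; the helper path and name are made up, the real rule lives in the udev/ directory of etna-gpu-tools:)

```
# Hypothetical example; /usr/local/bin/save-devcoredump is a made-up helper
# that would copy the "data" attribute of the new devcoredump device node.
ACTION=="add", SUBSYSTEM=="devcoredump", RUN+="/usr/local/bin/save-devcoredump %p"
```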
<lynxeye> marex: I think those are very different issues. Faults at startup are usually a sign of either a missing TLB flush somewhere or content from the last execution still being stuck in some caches that is attempted to drain out when starting the GPU again.
<mwalle> marex: no, it doesn't happen with --run-forever (well one time after start)
<mwalle> btw NXP documented an erratum regarding the clock gating, which isn't working. I'm unsure if this depends on the core or on the fact that this is a Layerscape, where most clocks are static and can't be gated/changed anyway. But it seems that the Vivante GPU gates the input clock itself, so it might be a core erratum
<lynxeye> mwalle: Huh, am I blind or is there no public errata sheet available for this SoC?
<lynxeye> Vivante GPUs normally have different ways to gate internal clocks depending on the GPU core generation.
<mwalle> "GPU hangs if clock gating for rasterizer, setup engine and texture engine are enabled" - the workaround is to enable (?) module-level clock gating and disable clock gating for these three blocks
<mwalle> lynxeye: yeah no public errata sheets..
<lynxeye> mwalle: That's interesting. This is the mlcg in the etnaviv kernel driver and we don't have any code in place to avoid SE and TX clock gating.
<mwalle> lynxeye: so basically something like that: https://pastebin.com/raw/2tvf6etk
<lynxeye> mwalle: yes
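(Editor's note: a sketch of what such a workaround could look like in the etnaviv kernel driver's MLCG setup. The RA bit name matches defines used in etnaviv_gpu.c; the SE and TX bit names are assumptions modeled on it and not verified - the pastebin above has the actual patch:)

```c
/* Hypothetical sketch, modeled on etnaviv_gpu_enable_mlcg(): keep
 * module-level clock gating enabled overall, but mask it off for the
 * rasterizer (RA), setup engine (SE) and texture engine (TX).
 * The _SE and _TX bit names below are assumed, not verified. */
u32 pmc = gpu_read(gpu, VIVS_PM_MODULE_CONTROLS);

pmc |= VIVS_PM_MODULE_CONTROLS_DISABLE_MODULE_CLOCK_GATING_RA;
pmc |= VIVS_PM_MODULE_CONTROLS_DISABLE_MODULE_CLOCK_GATING_SE;
pmc |= VIVS_PM_MODULE_CONTROLS_DISABLE_MODULE_CLOCK_GATING_TX;

gpu_write(gpu, VIVS_PM_MODULE_CONTROLS, pmc);
```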
JohnnyonFlame has joined #etnaviv
lynxeye has quit [Quit: Leaving.]
JohnnyonFlame has quit [Read error: Connection reset by peer]
JohnnyonFlame has joined #etnaviv
JohnnyonF has joined #etnaviv
JohnnyonFlame has quit [Read error: Connection reset by peer]
JohnnyonF has quit [Read error: Connection reset by peer]
pcercuei has quit [Quit: dodo]