#linux-sunxi on 2023-04-15 — irc logs at oftc.irclog.whitequark.org

2022-12-21 00:44 ChanServ changed the topic of #linux-sunxi to: Allwinner/sunxi development - Did you try looking at our wiki? https://linux-sunxi.org - Don't ask to ask. Just ask and wait for an answer! - This channel is logged at https://oftc.irclog.whitequark.org/linux-sunxi

00:25 apritzel has quit [Ping timeout: 480 seconds]

00:30 jernej has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]

00:30 jernej has joined #linux-sunxi

01:34 cnxsoft has joined #linux-sunxi

01:46 <megi> https://megous.com/dl/tmp/c9a286427ce66d80.png :)

01:47 <megi> this is photodiode output from a LCD that's supposed to be showing 30 Hz all black / all white framebuffer swaps driven by DE2

01:48 <megi> not quite the expected output

01:49 <megi> basically DE2 is scanning out a completely wrong buffer many times each second

01:49 <megi> (using mainline sun4i-drm driver)

02:57 <megi> more interesting experiment might be having say 8 framebuffers with increasingly brighter shades of gray and swap between them in sequence. that might show what exactly the hardware is doing. whether it just sometimes randomly reuses the previous shadow registers in DE2 incorrectly, or whether it's more complicated

03:05 ftg has quit [Read error: Connection reset by peer]

03:10 grming has quit [Quit: Konversation terminated!]

03:11 Danct12 has joined #linux-sunxi

03:49 <megi> https://megous.com/dl/tmp/2e67219a1af5109f.png 4 framebuffers with increasing intensity... app sends them in sequence 0 1 2 3... the display shows 0 1 2 3 1 2 2 3 0 1 2 3 1 1 2 3 0 1 2 3 0 1 3 3 0 1 2 3 (selection of various mistakes the driver does)

03:49 <megi> looks like it sometimes skips "ahead"
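
The "skips ahead" reading can be checked mechanically. This is a minimal sketch, assuming the displayed sequence quoted above is aligned to the app's 0 1 2 3 cycle (it is described as a selection of mistakes, so that alignment is an assumption). `classify` is a hypothetical helper name, not something from the driver or the test app.

```python
# Frame indices read off the photodiode trace, as quoted in the log.
SHOWN = [0, 1, 2, 3, 1, 2, 2, 3, 0, 1, 2, 3, 1, 1,
         2, 3, 0, 1, 2, 3, 0, 1, 3, 3, 0, 1, 2, 3]

def classify(shown, nbuf=4):
    """Return (position, expected, got, offset) for every mismatch.

    Assumes slot i should show buffer i % nbuf; offset is how many
    buffers ahead (mod nbuf) the display was.
    """
    errors = []
    for i, got in enumerate(shown):
        expected = i % nbuf
        if got != expected:
            errors.append((i, expected, got, (got - expected) % nbuf))
    return errors

errors = classify(SHOWN)
for pos, expected, got, offset in errors:
    print(f"slot {pos}: expected {expected}, got {got} (+{offset})")
```

Under that alignment there are four mismatches, and each one shows the buffer exactly one step ahead of the expected one, never behind.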

03:54 <megi> kinda wonder how to interpret this

04:00 JohnDoe_71Rus has joined #linux-sunxi

04:02 Danct12 has quit [Read error: No route to host]

04:03 Danct12 has joined #linux-sunxi

04:31 Danct12 has quit [Remote host closed the connection]

04:32 Danct12 has joined #linux-sunxi

05:07 vagrantc has quit [Quit: leaving]

06:07 <jernej> megi: I noticed that vendor DE2 driver sets registers only on vblank event (IIRC, certainly on interrupt)

06:08 <jernej> I'm unable to make DE3.3 (H616) working correctly, at least VI scaler. My suspicion is that this has something to do with it

06:09 <jernej> however, this would be quite some work

06:11 <jernej> Additionally, register reads in mixer code should be avoided. As you already discovered.

06:13 <jernej> Although, I wonder, if we can actually use a trick to only cache values most of the time and commit them on vblank. Regmap should already offer everything we need.

06:44 Danct12 has quit [Read error: No route to host]

06:45 Danct12 has joined #linux-sunxi

07:15 warpme has joined #linux-sunxi

07:42 Danct12 has quit [Read error: Connection reset by peer]

07:45 warpme has quit []

08:29 aggi has quit [Ping timeout: 480 seconds]

08:44 aggi has joined #linux-sunxi

08:46 warpme has joined #linux-sunxi

09:28 apritzel has joined #linux-sunxi

09:41 apritzel has quit [Ping timeout: 480 seconds]

09:47 warpme has quit []

10:51 bauen1 has quit [Ping timeout: 480 seconds]

11:17 warpme has joined #linux-sunxi

11:24 bauen1 has joined #linux-sunxi

12:18 <jernej> mripard: would it be an acceptable solution to put the sun4i-drm regmap into cache-only mode and sync it only on vblank and modeset events?

13:18 <megi> I also noticed that order of things in BSP code is that on vblank interrupt, BSP code just writes the DOUBLE_BUFFER_RDY=1 bit and then updates registers https://megous.com/dl/tmp/92824604e4a97b6d.png

13:23 <megi> so timing of the buffer flip will probably be very precisely after vsync interrupt in BSP code (not sure if it's a necessity or not, because naively changing the order of things in mainline driver to do sunxi_engine_commit() in tcon interrupt instead of in crtc's flush() function didn't help)

13:26 <megi> I'll have to read Roman's emails on mailing list. Seems like he also tried to delve deep into these issues over the years.

13:27 <megi> I also had an idea of hooking some gpio to a scope and pulsing it number of times on various parts of code to see precise timing of various driver actions in relation to actual screen output

13:27 <megi> maybe that will also reveal something interesting

14:15 cnxsoft has quit []

14:30 apritzel has joined #linux-sunxi

14:50 <megi> yay, now I can see vblank interrupts, DRM_IOCTL_MODE_ATOMIC calls from userspace and which buffer was committed by counting the number of blue pulses minus 2 https://megous.com/dl/tmp/e894b15489fb62d6.png https://megous.com/dl/tmp/c38a72ae44dc59f5.png :)

14:50 <megi> now I just need to analyse this

14:50 warpme has quit []

14:52 exkc has quit []

15:00 DevrimGecegezer has quit []

15:00 indy has quit [Ping timeout: 480 seconds]

15:19 <megi> tcon vsync, flip event, and subsequent DRM_IOCTL_MODE_ATOMIC call varies quite a bit from 20us to ~1ms in my app (since this goes through userspace)

15:19 <megi> so it's not exactly precise

15:20 <megi> buuut, very interesting thing happens when I add to my userspace app usleep(3000) between flip event from the kernel and my atomic commit

15:20 <megi> https://megous.com/dl/tmp/ce26d85401866013.png

15:20 <megi> this makes output look as it should

15:20 <megi> so this issue is definitely caused by something that's timing sensitive

15:22 <megi> perfect output, no weird frame skips

15:23 <jernej> megi: what's your refresh rate?

15:24 <megi> I'm using mainline pinephone code for this test, which still has broken DSI clock ratio, so about 40 Hz

15:24 <megi> I can retest with 60Hz with my patches on top

15:25 <jernej> did you see recent attempts to fix DSI refresh rate issue on ML?

15:26 <megi> that usleep value is just a random pick, I haven't tried looking for lowest value that still works

15:26 <megi> yes

15:26 <jernej> any comment? :)

15:26 <megi> I've used the same fix for years

15:26 <jernej> anyway, could it be that there are two commits in same period?

15:28 <jernej> I got complaints about video playback stuttering on LibreELEC on H6 boards, but I don't see it. I wonder if your issue is connected with that.

15:29 <jernej> I always test on 60 Hz rate, so maybe it's connected

15:30 <megi> no, I only do a commit per flip event, and I don't see spurious commits on the scope

15:30 <megi> 60Hz is the same

15:31 <megi> that's where I discovered it

15:31 <megi> I test at 40Hz just to test mainline code, and exclude my out of tree patches

15:32 <megi> the reason you may not see it may simply be that whatever userspace you use takes more time between flip event and atomic commit

15:33 <megi> I'll test different delay values, to see where it starts happening, but for now just 3ms delay between flip event from kernel and atomic commit ioctl call makes it disappear

15:34 <jernej> flip event is triggered from vblank interrupt, right?

15:34 <megi> yes

15:34 <megi> pretty much equivalent to vblank

15:35 <megi> waiting 1ms in userspace also makes it work reliably

15:37 <jernej> hm, I wouldn't trust delays to be completely accurate

15:37 <megi> I see/verify them on scope, too

15:37 <jernej> at least when they're so short

15:37 <jernej> ah, ok

15:37 <megi> they match most of the time

15:37 <megi> 500 us works too

15:40 <megi> uh, the panel crashed from all the blinking, lol

15:40 <megi> but 200 us worked and 50 us no longer worked

15:41 <megi> so the cutoff for reliability of atomic updates will be somewhere between

15:45 <megi> I guess this all depends on how the userspace app is structured... mine prepares the buffers after flip, then commits them when the next flip comes... but there are other options where you can prepare the buffers after flip and commit them after random time it takes the CPU/GPU to prepare them and in this scenario you'll naturally have a delay between flip event and atomic commit

15:46 <megi> but I guess the driver should work no matter what, even if you commit right after a flip event

15:47 <megi> not sure what usual linux compositors do

15:49 <megi> anyway, what could be happening in DE2/TCON that makes DOUBLE_BUFFER_RDY=1 write always work some small time after vsync but not too soon after vsync interrupt?

15:53 <jernej> megi: I took a look at vendor display driver and it seems to me that it writes double buffer very soon after vblank interrupt, so this is strange.

15:53 <megi> in working scenario, what I see on the scope when I place the photodiode at the beginning of the display is: vsync, commit of buffer N, scanout of buffer N-1, vsync, commit of buffer N+1, scanout of buffer N, etc.

15:54 <jernej> unless if there is some delay I missed or it could be that double buffer ready is signaled before other registers are written, so there is a delay for whole vblank period

15:56 <megi> in non-working scenario, it looks like the currently committed buffer right after vsync is scanned out

15:56 <megi> instead of previous one

15:57 <megi> so that's the skipping 'ahead' I described here at 5:49

15:59 <megi> so maybe if we always managed to do the commit right after vsync, we'd actually be scanning out the currently committed buffers and not the N-1 ones

16:00 <megi> and the problem is that we leave the commit delay to userspace, so it's unpredictable...

16:00 <megi> and the commit to DE2 HW registers can happen anywhere during the scanout cycle

16:01 <megi> not sure what DE2 expects

16:03 <megi> DOUBLE_BUFFER_RDY naming doesn't suggest it's timing sensitive... sounds just like a HW flag to tell DE2 that current content of registers should be swapped to shadow ones on the next scanout

16:03 <megi> ie. it doesn't cause the swap directly

16:04 <jernej> if that would be the case, I doubt AW would set it in vblank interrupt

16:05 <megi> but the DE2 mixer HW must be doing the actual swap at some time, so maybe this is done some time after vsync interrupt, so we can confuse the HW if we write DOUBLE_BUFFER_RDY=1 too soon after vsync

16:05 <jernej> I think it would be interesting to test if calling sunxi_engine_commit() in vblank instead of atomic_flush changes anything

16:07 <megi> just shifting it there from crtc flush didn't fix the weird output

16:08 <megi> but I did not have gpios set up when I was testing it, so maybe it shifted the scanned out buffers by one

16:08 <megi> I'll try again

16:10 <megi> I kinda have a clue why it didn't fix anything... the driver would still be updating the DE2 registers in the middle of the time the DE2 is doing the actual registers flip

16:12 <megi> we'd just be telling the hw that we're ready for the registers flip sooner, before the regs are updated by atomic commit

16:13 <megi> so maybe your regmap cache idea may work, along with shifting the engine_commit to vblank interrupt handler

16:13 <megi> that way we'd be able to write registers quickly after vsync and notify the DE2 that they're ready before it does the registers flip

16:14 <megi> and it would not depend on how the userspace times the atomic commit at all

16:14 <jernej> exactly

16:14 <megi> perfectly predictable

16:16 <jernej> I use this patch to avoid issue with register reads: https://github.com/LibreELEC/LibreELEC.tv/blob/master/projects/Allwinner/patches/linux/0027-drm-sun4i-mixer-Add-caching-support.patch

16:16 <jernej> it can be extended to cache all other registers and write them only on vblank

16:17 <jernej> but main pain point is that access to some non-existing locations actually causes bus lockup

16:18 <megi> regcache doesn't track what changed, it writes the entire range on cache flush?

16:18 <jernej> so callback to specify readable and writable registers needs to be added

16:18 <megi> allright

16:18 <megi> sounds tedious :)

16:18 <megi> given the number of registers there

16:21 <jernej> megi: default flat cache is pretty simple: https://elixir.bootlin.com/linux/latest/source/drivers/base/regmap/regcache.c#L289

16:21 <megi> uh

16:22 <jernej> well, it's callback, not a table, so it can be coded in a way to consider mixer configuration and ranges

16:22 <jernej> no need to specify all registers one by one

16:22 <megi> yeah, that's the tedious part

16:23 <megi> and it's still writing everything, so it will take some time

16:23 <jernej> oh, it can be either table or a callback

16:23 <megi> not just the values updated by the driver in the last commit

16:24 <jernej> I didn't check rbtree cache

16:24 <jernej> maybe it's smarter?

16:24 <megi> so it may violate the timing and not work either if it takes more than 50us or so

16:26 <jernej> is it really that slow? in 50 us you can write a lot of registers

16:26 <megi> I don't know :) probably it's not

16:27 <megi> mixer has a lot of registers and it depends on the underlying bus speed, etc. so I don't know

16:28 <jernej> I don't think we use more than a hundred registers

16:29 <jernej> actually, DE2 memory space is sparsely populated

16:29 <megi> that will not be a problem, if we simply skip all the other ranges for the DE2 block that we don't use in the regmap

16:29 <megi> ok, let's try :)

16:48 <megi> hm, if I don't specify defaults somewhere, or just make only the exact registers used by the driver writable, regcache will zero out random locations in between registers actually used by the driver, with unknown consequences

16:50 <megi> or maybe it can initialize itself from the actual register content?

16:53 <jernej> regmap shouldn't write locations which are specified as not writeable

16:53 <megi> looks like rbtree actually tracks the presence of individual registers in the cache https://elixir.bootlin.com/linux/latest/source/drivers/base/regmap/regcache-rbtree.c#L496

16:53 <megi> yes, but if I specify whole blender range as writable, it will write 0 to registers we don't modify from the default

16:54 <jernej> yeah, that's why detailed writeable range would be better

16:54 <jernej> but if rbtree updates only changed values, you can always try that

16:55 <megi> yup https://elixir.bootlin.com/linux/latest/source/drivers/base/regmap/regcache-rbtree.c#L54

16:56 <jernej> well, I see at least one more good thing if readable/writeable callbacks are implemented - sysfs interface would be actually usable

16:57 <megi> detailed register ranges are not desirable for that

16:58 <jernej> currently, if you read register state there, you get overloaded with zero values

16:58 <jernej> and second, more important thing, it always locks up CPU, at least for me

17:24 <megi> https://megous.com/dl/tmp/da4e9c981a13730b.png :)

17:25 <megi> because the changes are minimal, using the cache there's also much reduced number of mmio writes to DE2 between commits

17:29 <megi> it's just like 7 register writes between commits most of the time
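
The idea discussed here (keep writes in a cache, push only what changed at a chosen moment) can be illustrated with a toy model. This is a sketch only: `DeferredRegs` is a hypothetical class, it is not the Linux regmap API, and the register offsets in the example are arbitrary.

```python
class DeferredRegs:
    """Toy write-back register cache; NOT the Linux regmap API."""

    def __init__(self, bus_write):
        self._bus_write = bus_write  # callable(reg, value): one MMIO write
        self._dirty = {}             # reg -> last value written since sync

    def write(self, reg, value):
        # Cache-only: remember the value, do not touch the hardware.
        self._dirty[reg] = value

    def sync(self):
        # Flush only registers touched since the last sync, in address order.
        count = 0
        for reg in sorted(self._dirty):
            self._bus_write(reg, self._dirty[reg])
            count += 1
        self._dirty.clear()
        return count

hw = {}                              # stands in for the hardware registers
regs = DeferredRegs(hw.__setitem__)
regs.write(0x10, 1)
regs.write(0x10, 2)   # overwrites the pending value, still one bus write
regs.write(0x04, 7)
written = regs.sync()
print(written, hw)
```

Repeated writes to the same register collapse into one bus write, and registers that were never touched are never written, which is the property that matters for the "zeroing random locations" concern raised earlier.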

17:30 <megi> though it looks like this is what the mixer driver would do, too for simple case

17:33 <jernej> does it help with your issue?

17:33 <megi> there's no display output at all

17:34 <jernej> do you mind sharing your changes?

17:34 <megi> I only see vblank interrupts and atomic commits on my scope but the screen is black

17:34 <megi> sure

17:37 hentai has quit [Remote host closed the connection]

17:39 <megi> https://megous.com/dl/tmp/0001-cache.patch

17:40 <megi> not sure what drm_crtc_handle_vblank does

17:41 <jernej> is regcache_drop_region() needed?

17:41 <megi> maybe not

17:45 <megi> yeah, without it rbtree cache syncs repeatedly the entire reg range

17:45 <megi> probably due to mixer probe clearing it at the beginning

17:45 <megi> each commit then looks like https://megous.com/dl/tmp/5c5d5b9b73820ac8.png

17:45 <megi> and muuuch more

17:46 <megi> so I think it's needed

17:47 <megi> but I think I need to figure out the initial enablement of cache-only mode better

17:55 apritzel has quit [Ping timeout: 480 seconds]

18:17 <megi> eh uh I didn't move engine commit from crtc to vblank interrupt

18:18 <megi> no wonder it doesn't work

18:32 ftg has joined #linux-sunxi

18:52 <jernej> megi: so you managed to fix it?

19:14 JohnDoe_71Rus has quit []

19:19 warpme has joined #linux-sunxi

19:26 hentai has joined #linux-sunxi

20:01 grming has joined #linux-sunxi

20:51 <megi> no, mixer doesn't seem to work with cache

20:52 <megi> and wifi does not initialize for whatever reason, so I can't see logs

20:54 Daaanct12 is now known as Danct12

21:06 <megi> looks like the issue is that no first vblank interrupt gets generated when sunxi_engine_commit is called only from vblank interrupt

21:06 <megi> so all I get is [CRTC:49:crtc-0] vblank wait timed out

21:13 apritzel has joined #linux-sunxi

21:23 <megi> cache sync is quite slow, majority of time it takes ~110us, sometimes shorter, sometimes longer

21:24 <megi> so even with register update moved to vsync interrupt, before the engine commit, the issue persists, because the delay from vblank to DOUBLE_BUFFER_RDY=1 is too long

21:57 <megi> pretty much the only thing that works for me is adding a delay after flip event to userspace app or to the driver

22:01 grming has quit [Quit: Konversation terminated!]

22:03 grming has joined #linux-sunxi

23:04 HackerKkillinghisLGA775cpuPent has quit []

23:51 <megi> this time may even be display specific... looking at the pinephone panel mode, hline duration is 11.3us and non-active part of the vertical cycle is just 400us with vsync pulse duration being 400us and duration from vsync pulse start to next scanout being just 180us

23:52 <megi> so all this fits pretty nicely with the delay of about 200us that I need to make DE2 scan out the previously committed state and not the new one that I'm committing right after flip event
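
The timing hypothesis from this discussion can be written down as a toy model. It is a sketch under a stated assumption, not documented DE2 behaviour: the hardware is assumed to copy registers into its shadow set at a fixed point after each vsync, and the 180 us default is only borrowed from the panel timing quoted above. `scanned_out` is a hypothetical name.

```python
def scanned_out(commit_delays_us, latch_us=180.0):
    """Return, per frame, the index of the commit the display shows.

    Toy model (an assumption, not documented DE2 behaviour): the
    hardware copies registers to its shadow set latch_us after each
    vsync. Commit k is written commit_delays_us[k] after vsync k.
    -1 means "whatever was programmed before the first commit".
    """
    shown = []
    for k, delay in enumerate(commit_delays_us):
        if delay < latch_us:
            shown.append(k)      # landed before the latch: shown this frame
        else:
            shown.append(k - 1)  # too late: previous commit is shown
    return shown

print(scanned_out([1000, 1000, 1000, 1000]))  # always late
print(scanned_out([20, 20, 20, 20]))          # always early
print(scanned_out([1000, 20, 1000, 1000]))    # jitter across the latch
```

In this model a commit delay that stays on one side of the latch point gives a clean sequence. Only delays that jitter across it produce a skip ahead followed by a repeated frame, which resembles the pattern seen on the photodiode when the userspace delay varies between roughly 20 us and 1 ms.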