<megi>
this is photodiode output from an LCD that's supposed to be showing 30 Hz all black / all white framebuffer swaps driven by DE2
<megi>
not quite the expected output
<megi>
basically DE2 is scanning out a completely wrong buffer many times each second
<megi>
(using mainline sun4i-drm driver)
<megi>
more interesting experiment might be having say 8 framebuffers with increasingly brighter shades of gray and swap between them in sequence. that might show what exactly the hardware is doing. whether it just sometimes randomly reuses the previous shadow registers in DE2 incorrectly, or whether it's more complicated
ftg has quit [Read error: Connection reset by peer]
grming has quit [Quit: Konversation terminated!]
Danct12 has joined #linux-sunxi
<megi>
https://megous.com/dl/tmp/2e67219a1af5109f.png 4 framebuffers with increasing intensity... app sends them in sequence 0 1 2 3... the display shows 0 1 2 3 1 2 2 3 0 1 2 3 1 1 2 3 0 1 2 3 0 1 3 3 0 1 2 3 (selection of various mistakes the driver does)
<megi>
looks like it sometimes skips "ahead"
<megi>
kinda wonder how to interpret this
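One way to read the sequence quoted above is to compare each shown frame with the buffer the app sent for that slot. The sketch below does that, assuming the pasted sequence is aligned to the 0 1 2 3 cycle (it is a selection of mistakes, so the alignment is an assumption, and `find_mistakes` is a name made up for illustration).

```python
# Sequence reported in the chat: what the panel showed while the app
# kept sending buffers 0 1 2 3 in order.
shown = [int(x) for x in
         "0 1 2 3 1 2 2 3 0 1 2 3 1 1 2 3 0 1 2 3 0 1 3 3 0 1 2 3".split()]
N_BUFFERS = 4


def find_mistakes(shown, n_buffers):
    """Return (frame index, expected buffer, shown buffer) for every mismatch.

    Assumes frame i was supposed to show buffer i % n_buffers.
    """
    mistakes = []
    for i, got in enumerate(shown):
        expected = i % n_buffers
        if got != expected:
            mistakes.append((i, expected, got))
    return mistakes


mistakes = find_mistakes(shown, N_BUFFERS)
for i, expected, got in mistakes:
    ahead = (got - expected) % N_BUFFERS
    print(f"frame {i}: expected {expected}, shown {got} (ahead by {ahead})")
```

Under that alignment assumption, every mismatch in this sample is exactly one buffer ahead of the expected one, which fits the "skips ahead" reading.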
JohnDoe_71Rus has joined #linux-sunxi
Danct12 has quit [Read error: No route to host]
Danct12 has joined #linux-sunxi
Danct12 has quit [Remote host closed the connection]
Danct12 has joined #linux-sunxi
vagrantc has quit [Quit: leaving]
<jernej>
megi: I noticed that vendor DE2 driver sets registers only on vblank event (IIRC, certainly on interrupt)
<jernej>
I'm unable to make DE3.3 (H616) work correctly, at least the VI scaler. My suspicion is that this has something to do with it
<jernej>
however, this would be quite some work
<jernej>
Additionally, register reads in mixer code should be avoided. As you already discovered.
<jernej>
Although, I wonder, if we can actually use a trick to only cache values most of the time and commit them on vblank. Regmap should already offer everything we need.
Danct12 has quit [Read error: No route to host]
Danct12 has joined #linux-sunxi
warpme has joined #linux-sunxi
Danct12 has quit [Read error: Connection reset by peer]
warpme has quit []
aggi has quit [Ping timeout: 480 seconds]
aggi has joined #linux-sunxi
warpme has joined #linux-sunxi
apritzel has joined #linux-sunxi
apritzel has quit [Ping timeout: 480 seconds]
warpme has quit []
bauen1 has quit [Ping timeout: 480 seconds]
warpme has joined #linux-sunxi
bauen1 has joined #linux-sunxi
<jernej>
mripard: would it be an acceptable solution to put the sun4i-drm regmap into cache-only mode and sync it only on vblank and modeset events?
<megi>
I also noticed that order of things in BSP code is that on vblank interrupt, BSP code just writes the DOUBLE_BUFFER_RDY=1 bit and then updates registers https://megous.com/dl/tmp/92824604e4a97b6d.png
<megi>
so timing of the buffer flip will probably be very precisely after vsync interrupt in BSP code (not sure if it's a necessity or not, because naively changing the order of things in mainline driver to do sunxi_engine_commit() in tcon interrupt instead of in crtc's flush() function didn't help)
<megi>
I'll have to read Roman's emails on mailing list. Seems like he also tried to delve deep into these issues over the years.
<megi>
I also had an idea of hooking some gpio to a scope and pulsing it number of times on various parts of code to see precise timing of various driver actions in relation to actual screen output
<megi>
maybe that will also reveal something interesting
<megi>
the delay between tcon vsync, flip event, and the subsequent DRM_IOCTL_MODE_ATOMIC call varies quite a bit, from 20us to ~1ms in my app (since this goes through userspace)
<megi>
so it's not exactly precise
<megi>
buuut, very interesting thing happens when I add to my userspace app usleep(3000) between flip event from the kernel and my atomic commit
<megi>
so this issue is definitely caused by something that's timing sensitive
<megi>
perfect output, no weird frame skips
<jernej>
megi: what's your refresh rate?
<megi>
I'm using mainline pinephone code for this test, which still has broken DSI clock ratio, so about 40 Hz
<megi>
I can retest with 60Hz with my patches on top
<jernej>
did you see recent attempts to fix DSI refresh rate issue on ML?
<megi>
that usleep value is just a random pick, I haven't tried looking for lowest value that still works
<megi>
yes
<jernej>
any comment? :)
<megi>
I've been using the same fix for years
<jernej>
anyway, could it be that there are two commits in same period?
<jernej>
I got complaints about video playback stuttering on LibreELEC on H6 boards, but I don't see it. I wonder if your issue is connected with that.
<jernej>
I always test on 60 Hz rate, so maybe it's connected
<megi>
no, I only do a commit per flip event, and I don't see spurious commits on the scope
<megi>
60Hz is the same
<megi>
that's where I discovered it
<megi>
I test at 40Hz just to test mainline code, and exclude my out of tree patches
<megi>
the reason you may not see it may simply be that whatever userspace you use takes more time between flip event and atomic commit
<megi>
I'll test different delay values, to see where it starts happening, but for now just 3ms delay between flip event from kernel and atomic commit ioctl call makes it disappear
<jernej>
flip event is triggered from vblank interrupt, right?
<megi>
yes
<megi>
pretty much equivalent to vblank
<megi>
waiting 1ms in userspace also makes it work reliably
<jernej>
hm, I wouldn't trust delays to be completely accurate
<megi>
I see/verify them on scope, too
<jernej>
at least when they're so short
<jernej>
ah, ok
<megi>
they match most of the time
<megi>
500 us works too
<megi>
uh, the panel crashed from all the blinking, lol
<megi>
but 200 us worked and 50 us no longer worked
<megi>
so the cutoff for reliability of atomic updates will be somewhere in between
<megi>
I guess this all depends on how the userspace app is structured... mine prepares the buffers after flip, then commits them when the next flip comes... but there are other options where you can prepare the buffers after flip and commit them after whatever random time it takes the CPU/GPU to prepare them, and in this scenario you'll naturally have a delay between flip event and atomic commit
<megi>
but I guess the driver should work no matter what, even if you commit right after a flip event
<megi>
not sure what usual linux compositors do
<megi>
anyway, what could be happening in DE2/TCON that makes DOUBLE_BUFFER_RDY=1 write always work some small time after vsync but not too soon after vsync interrupt?
<jernej>
megi: I took a look at vendor display driver and it seems to me that it writes double buffer very soon after vblank interrupt, so this is strange.
<megi>
in working scenario, what I see on the scope when I place the photodiode at the beginning of the display is: vsync, commit of buffer N, scanout of buffer N-1, vsync, commit of buffer N+1, scanout of buffer N, etc.
<jernej>
unless there is some delay I missed, or it could be that double buffer ready is signaled before other registers are written, so there is a delay for the whole vblank period
<megi>
in non-working scenario, it looks like the currently committed buffer right after vsync is scanned out
<megi>
instead of previous one
<megi>
so that's the skipping 'ahead' I described here at 5:49
<megi>
so maybe if we always managed to do the commit right after vsync, we'd actually be scanning out the currently committed buffers and not the N-1 ones
<megi>
and the problem is that we leave the commit delay to userspace, so it's unpredictable...
<megi>
and the commit to DE2 HW registers can happen anywhere during the scanout cycle
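The two scenarios described above can be put into a toy model: once per frame, some fixed time after vsync, the hardware latches whatever was committed last. This is only a hypothesis sketched in code, not confirmed DE2 behaviour; `simulate`, `latch_us` and the 100 us value are made up for illustration (100 us is just a point between the 50 us that failed and the 200 us that worked).

```python
def simulate(commit_delays_us, latch_us, n_buffers=4):
    """Toy model of which buffer each frame scans out.

    Frame f commits buffer f % n_buffers, commit_delays_us[f] after vsync.
    The (hypothetical) hardware latches the most recent commit latch_us
    after vsync: an earlier commit is shown in the same frame, a later
    one has to wait for the next frame.
    """
    shown = []
    pending = n_buffers - 1          # assume the previous frame committed late
    for frame, delay in enumerate(commit_delays_us):
        buf = frame % n_buffers
        if delay < latch_us:
            pending = buf            # early commit replaces the pending buffer
        shown.append(pending)        # latch point: this is what gets scanned out
        pending = buf                # this frame's commit is pending at the next latch
    return shown


LATCH_US = 100  # assumption, somewhere between 50 us and 200 us

# Always late: steady one-frame latency, every buffer shown once.
print(simulate([300] * 8, LATCH_US))
# Jittery userspace: two early commits in a row skip one buffer and repeat another.
print(simulate([1000, 20, 20, 1000, 1000, 1000], LATCH_US))
```

With the jittery delays the model gives 3 1 2 2 3 0, the same shape as one of the mistakes in the 4-framebuffer capture, so mixing early and late commits would be enough to explain it under this assumption.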
<megi>
not sure what DE2 expects
<megi>
DOUBLE_BUFFER_RDY naming doesn't suggest it's timing sensitive... sounds just like a HW flag to tell DE2 that current content of registers should be swapped to shadow ones on the next scanout
<megi>
ie. it doesn't cause the swap directly
<jernej>
if that would be the case, I doubt AW would set it in vblank interrupt
<megi>
but the DE2 mixer HW must be doing the actual swap at some time, so maybe this is done some time after vsync interrupt, so we can confuse the HW if we write DOUBLE_BUFFER_RDY=1 too soon after vsync
<jernej>
I think it would be interesting to test if calling sunxi_engine_commit() in vblank instead of atomic_flush changes anything
<megi>
just shifting it there from crtc flush didn't fix the weird output
<megi>
but I did not have gpios set up when I was testing it, so maybe it shifted the scanned out buffers by one
<megi>
I'll try again
<megi>
I kinda have a clue why it didn't fix anything... the driver would still be updating the DE2 registers in the middle of the time the DE2 is doing the actual register flip
<megi>
we'd just be telling the hw that we're ready for the registers flip sooner, before the regs are updated by atomic commit
<megi>
so maybe your regmap cache idea may work, along with shifting the engine_commit to vblank interrupt handler
<megi>
that way we'd be able to write registers quickly after vsync and notify the DE2 that they're ready before it does the registers flip
<megi>
and it would not depend on how the userspace times the atomic commit at all
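The plan above (cache register writes during the atomic commit, flush them and set the ready bit from the vblank handler) can be illustrated with a toy write-back cache. This is not the kernel regmap API, which provides `regcache_cache_only()` and `regcache_sync()` for this purpose; the `CachedRegs` class, the register offsets and the `on_vblank` helper are all hypothetical.

```python
class CachedRegs:
    """Toy write-back register cache standing in for a cache-only regmap."""

    def __init__(self, hw):
        self.hw = hw            # dict standing in for the hardware registers
        self.cache = {}
        self.dirty = set()

    def write(self, reg, val):
        # Cache-only mode: remember the value, do not touch the hardware.
        self.cache[reg] = val
        self.dirty.add(reg)

    def sync(self):
        # Flush only registers written since the last sync; return how many.
        count = len(self.dirty)
        for reg in sorted(self.dirty):
            self.hw[reg] = self.cache[reg]
        self.dirty.clear()
        return count


def on_vblank(regs):
    """Hypothetical vblank handler: flush the cache, then signal readiness."""
    written = regs.sync()
    regs.hw["DOUBLE_BUFFER_RDY"] = 1
    return written


hw = {}
regs = CachedRegs(hw)
regs.write(0x1000, 0xFF)        # atomic commit at an arbitrary time: cached only
regs.write(0x1004, 0x01)
assert hw == {}                 # hardware untouched until vblank
print(on_vblank(regs), hw)
```

The point of the sketch is that the hardware only sees writes at a time the driver chooses, regardless of when userspace issued the commit; how long the flush takes on real hardware is a separate question.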
<jernej>
well, it's callback, not a table, so it can be coded in a way to consider mixer configuration and ranges
<jernej>
no need to specify all registers one by one
<megi>
yeah, that's the tedious part
<megi>
and it's still writing everything, so it will take some time
<jernej>
oh, it can be either table or a callback
<megi>
not just the values updated by the driver in the last commit
<jernej>
I didn't check rbtree cache
<jernej>
maybe it's smarter?
<megi>
so it may violate the timing and not work either if it takes more than 50us or so
<jernej>
is it really that slow? in 50 us you can write a lot of registers
<megi>
I don't know :) probably it's not
<megi>
mixer has a lot of registers and it depends on the underlying bus speed, etc. so I don't know
<jernej>
I don't think we use more than a hundred registers
<jernej>
actually, DE2 memory space is sparsely populated
<megi>
that will not be a problem, if we simply skip all the other ranges for the DE2 block that we don't use in the regmap
<megi>
ok, let's try :)
<megi>
hm, if I don't specify defaults somewhere, or just make only the exact registers used by the driver writable, regcache will zero out random locations in between registers actually used by the driver, with unknown consequences
<megi>
or maybe it can initialize itself from the actual register content?
<jernej>
regmap shouldn't write locations which are specified as not writeable
<megi>
but I think I might need to figure out the initial enablement of cache-only mode better
apritzel has quit [Ping timeout: 480 seconds]
<megi>
eh uh I didn't move engine commit from crtc to vblank interrupt
<megi>
no wonder it doesn't work
ftg has joined #linux-sunxi
<jernej>
megi: so you managed to fix it?
JohnDoe_71Rus has quit []
warpme has joined #linux-sunxi
hentai has joined #linux-sunxi
grming has joined #linux-sunxi
<megi>
no, mixer doesn't seem to work with cache
<megi>
and wifi does not initialize for whatever reason, so I can't see logs
Daaanct12 is now known as Danct12
<megi>
looks like the issue is that no first vblank interrupt gets generated when sunxi_engine_commit is called only from vblank interrupt
<megi>
so all I get is [CRTC:49:crtc-0] vblank wait timed out
apritzel has joined #linux-sunxi
<megi>
cache sync is quite slow, majority of time it takes ~110us, sometimes shorter, sometimes longer
<megi>
so even with register update moved to vsync interrupt, before the engine commit, the issue persists, because the delay from vblank to DOUBLE_BUFFER_RDY=1 is too long
<megi>
pretty much the only thing that works for me is adding a delay after flip event to userspace app or to the driver
grming has quit [Quit: Konversation terminated!]
grming has joined #linux-sunxi
HackerKkillinghisLGA775cpuPent has quit []
<megi>
this time may even be display specific... looking at the pinephone panel mode, hline duration is 11.3us and non-active part of the vertical cycle is just 400us with vsync pulse duration being 400us and duration from vsync pulse start to next scanout being just 180us
<megi>
so all this fits pretty nicely with the delay of about 200us that I need to make DE2 scan out the previously committed state and not the new one that I'm committing right after flip event
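The numbers quoted above can be checked against the delays tried earlier in the log. The sketch below only restates figures from the chat (hline 11.3 us, 180 us from vsync pulse start to scanout, 400 us non-active time, and which userspace delays worked); the variable names are made up for illustration.

```python
HLINE_US = 11.3               # duration of one scanline on the PinePhone panel
VSYNC_TO_SCANOUT_US = 180.0   # vsync pulse start to start of next scanout
NON_ACTIVE_US = 400.0         # non-active part of the vertical cycle

lines_before_scanout = VSYNC_TO_SCANOUT_US / HLINE_US
non_active_lines = NON_ACTIVE_US / HLINE_US
print(f"vsync to scanout: about {lines_before_scanout:.1f} lines")
print(f"non-active period: about {non_active_lines:.1f} lines")

# Userspace delays between flip event and atomic commit, as reported in the chat.
observed = {50: False, 200: True, 500: True, 1000: True, 3000: True}

# A delay is consistent with the 180 us figure if it worked exactly when
# the commit landed after the start of scanout.
consistent = {
    delay: worked == (delay > VSYNC_TO_SCANOUT_US)
    for delay, worked in observed.items()
}
for delay, ok in consistent.items():
    print(f"{delay:>5} us: worked={observed[delay]}, consistent={ok}")
```

All five reported delays fall on the expected side of the 180 us mark, though with only one failing data point (50 us) the exact cutoff remains open.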