ChanServ changed the topic of #linux-msm to:
cxl000 has joined #linux-msm
<abhinav___> robclark: agreed. slight delta in the sspp status registers (fetch related) but apart from that no significant difference. We can continue debugging on the bug.
<abhinav___> regarding DSPP, those are actually histogram status bits
<abhinav___> it should not make a difference for this issue but weird its 0 for one case and not for the other
<robclark> abhinav___: yeah, they aren't regs driver writes, afaict.. but maybe a hint about what is wrong?
<robclark> it's weird.. but I compared a bunch of different good vs good and bad vs bad (and good vs bad) to rule out things that are normally different, and that was the only thing that stood out
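The good-vs-good, bad-vs-bad, good-vs-bad comparison described above can be sketched roughly as follows. This is only an illustration, assuming each register dump has already been parsed into a dict of register name to value; the function name and the example register names are hypothetical and not taken from the driver.

```python
def stable_differences(good_dumps, bad_dumps):
    """Return registers that are constant within each group but differ between groups.

    Each dump is a dict mapping a register name to its value.
    Both lists must contain at least one dump.
    """
    def constant_values(dumps):
        # Keep only registers present in every dump that hold a single value,
        # which filters out registers that normally vary from run to run.
        common = set.intersection(*(set(d) for d in dumps))
        return {reg: dumps[0][reg] for reg in common
                if all(d[reg] == dumps[0][reg] for d in dumps)}

    good = constant_values(good_dumps)
    bad = constant_values(bad_dumps)
    return {reg: (good[reg], bad[reg])
            for reg in sorted(good.keys() & bad.keys())
            if good[reg] != bad[reg]}


# Hypothetical register names and values, for illustration only.
good_runs = [{"REG_A": 1, "REG_B": 5, "REG_C": 0},
             {"REG_A": 1, "REG_B": 6, "REG_C": 0}]
bad_runs = [{"REG_A": 1, "REG_B": 5, "REG_C": 7},
            {"REG_A": 1, "REG_B": 9, "REG_C": 7}]
print(stable_differences(good_runs, bad_runs))
```

Registers such as the hypothetical REG_B, which change between runs of the same kind, are dropped, so only the differences that track the good/bad state remain.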
<abhinav___> robclark: got it, yes those are read only regs so driver cannot write to them. one immediate suggestion i can think of is to try the DSI test pattern to see if that comes up. if that works, then we can go further up
<robclark> ok.. not sure if I remember how to get test pattern.. but if you reply on b/ I can try it in the morning
<robclark> the bridge doesn't seem to be raising any error status bits about dsi signal from SoC, fwiw
<abhinav___> sure i will update the bug
<abhinav___> will continue there
<robclark> sg
elroo_ has quit [Ping timeout: 480 seconds]
<sboyd> abhinav___: where should I call msm_dsi_manager_tpg_enable() from?
<abhinav___> sboyd: this worked for me
marvin24_ has joined #linux-msm
marvin24 has quit [Ping timeout: 480 seconds]
marvin24 has joined #linux-msm
marvin24_ has quit [Ping timeout: 480 seconds]
marvin24_ has joined #linux-msm
marvin24 has quit [Ping timeout: 480 seconds]
pevik_ has joined #linux-msm
<sboyd> abhinav___: yes that works for me. I get the test pattern now.
pg12 has joined #linux-msm
pg12_ has quit [Ping timeout: 480 seconds]
svarbanov has joined #linux-msm
agross_ has joined #linux-msm
CosmicPenguin_ has joined #linux-msm
pundir_ has joined #linux-msm
thara_ has joined #linux-msm
thara has quit [Ping timeout: 480 seconds]
thara_ is now known as thara
CosmicPenguin has quit [Ping timeout: 480 seconds]
agross has quit [Ping timeout: 480 seconds]
agross_ is now known as agross
pundir has quit [Ping timeout: 480 seconds]
pundir_ is now known as pundir
rawouF is now known as rawoul
bamse_ has joined #linux-msm
bamse has quit [Read error: Connection reset by peer]
svarbanov has quit [Ping timeout: 480 seconds]
<robclark> abhinav___, sboyd: test pattern works for me too.. but without the test pattern, if I plug in a bogus address for scanout buffer I get the expected iova faults.. so I guess whatever is going wrong is between sspp and dsi?
<lumag_> robclark, could you maybe compare clocks values (debugcc)?
<robclark> clk_summary dumps more or less matched.. there were some diffs in some qup clocks, which I assume is related to having serial console enabled in the working case
<lumag_> robclark, no, I was talking about https://github.com/andersson/debugcc
<lumag_> It was helpful to me in several obscure cases already
<robclark> haven't tried that.. but maybe that is the testclock thing sboyd tried?
<lumag_> robclark, dunno
<robclark> hmm, does that thing even have sc7180 support?
<lumag_> robclark, no, but you can easily write it if you have downstream debug_cc driver
<lumag_> Just a huuuge table of all the muxes.
<sboyd> Looks like testclock
<lumag_> sboyd, what is testclock? :-)
<sboyd> it's the script that debugcc is based on
<sboyd> lumag_: what clk needs to be measured?
<sboyd> robclark: yeah that's it
<lumag_> sboyd, I was asking robclark if there is a clock measurements diff between good and bad tries
<robclark> I can stash good/bad $debugfs/clk/clk_summary somewhere in a min if that is useful.. and/or try this testclock.py thing
<lumag_> robclark, the testclock is probably more useful
pevik_ has quit [Quit: Lost terminal]
<robclark> hmm, that script looks to have some other py dependencies that I'm missing
<sboyd> It depends on the mem script
<robclark> ok, any particular clk you want?
<lumag_> robclark, no, nothing particular
CosmicPenguin_ is now known as CosmicPenguin
<robclark> hmm, doesn't seem to be a convenient way to dump them all
<lumag_> I usually did `sdm845-debugcc -a | grep disp`
<robclark> there are some *slight* differences which I guess are rounding errors.. other than that pwrcl_clk is somewhat different (no idea what that is)
<sboyd> Can you print(clocks)
<robclark> good:
<robclark> bad:
* robclark just made a wrapper script to dump each clk individually
<robclark> just dumping pwrcl_clk a few times, it seems to vary wildly so I guess that is unrelated.. I assume the small differences in disp related clks is simply due to how they are sampled?
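One way to separate sampling noise from real differences when comparing measured clock rates is a relative tolerance, with known-noisy clocks skipped. The following is a minimal sketch, assuming the measurements have been collected into dicts of clock name to rate in Hz; the function name, the tolerance, and the example rates are assumptions for illustration, not output from testclock or debugcc.

```python
def clock_diffs(good, bad, rel_tol=0.001, ignore=()):
    """Return (name, good_hz, bad_hz) for clocks whose rates differ by more than rel_tol."""
    diffs = []
    for name in sorted(good.keys() & bad.keys()):
        if name in ignore:
            continue  # e.g. CPU clocks that vary with load
        g, b = good[name], bad[name]
        ref = max(abs(g), abs(b))
        # Compare relative to the larger rate so small sampling jitter is ignored.
        if ref and abs(g - b) / ref > rel_tol:
            diffs.append((name, g, b))
    return diffs


# Made-up rates, for illustration only.
good_rates = {"disp_cc_mdss_mdp_clk": 300000000,
              "disp_cc_mdss_pclk0_clk": 140000000,
              "pwrcl_clk": 1804800000}
bad_rates = {"disp_cc_mdss_mdp_clk": 300001200,
             "disp_cc_mdss_pclk0_clk": 70000000,
             "pwrcl_clk": 300000000}
print(clock_diffs(good_rates, bad_rates, ignore=("pwrcl_clk",)))
```

With these made-up numbers only the pixel clock is reported; the small delta on the MDP clock falls inside the tolerance.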
<sboyd> pwrcl is the CPU
<sboyd> I ran testclock -h | grep disp_cc | xargs testclock
<sboyd> pwrcl stands for power cluster
<robclark> ahh
<sboyd> also this works
<robclark> depending on what $thing you are running on, those might not be comparable to mine.. my dumps were from lazor
<sboyd> this is on coachz
<robclark> but doesn't look to me like any interesting difference between the good and bad dumps
<sboyd> I would suspect that the test pattern wouldn't work if the clks were off
<abhinav___> yes for test pattern to work the DSI clocks need to be right. we have to check dpu side
<robclark> IME the bridge status regs would also have some error bits set if it was unhappy with the dsi signal
<sboyd> so how do I look at DPU?
<robclark> sideways?
<robclark> what are you looking for?
<sboyd> robclark: heh abhinav___ said we have to check dpu side
<robclark> you can grab that series I posted that adds debugfs to dump all the register state.. I already attached good/bad dumps to the b/
<abhinav___> sboyd: we can continue on the bug, please attach the clock states as well to it.
<robclark> I can do that
<sboyd> ok if you want to make it slower we can debug in the bug :)
<sboyd> at least the paper trail is there
<abhinav___> sboyd: most likely kalyans team will look into this as its a chrome dpu issue. they dont have IRC access, hence its better to continue there
<sboyd> abhinav___: so the assignee is wrong?
<abhinav___> Yes, but no worries i can reassign
<lumag_> sboyd, robclark: regarding atomic_print_state. Would you like for me to send the v2 with const, or just fixing it at the apply time is fine?
<robclark> lumag_: squashing that fix in when applying is fine
<robclark> abhinav___: in the mean time, if there are other things you can think of trying, we can iterate on IRC and copy/paste updates into b/
<lumag_> robclark, ack
<abhinav___> robclark: sure , will do that too
<Marijn[m]> lumag_: Not sure if you intended to cc me on the clock-controller patches; send-email only seems to pick my address up as cc from the reviewed-by tags ;)
<lumag_> Marijn[m], yep, please excuse me. Just getting torn between several patchseries
<Marijn[m]> No worries, I can download it off of lore and stitch things back together (not subscribed to any list because it is way too noisy, and I haven't found a proper way to deduplicate that with cc'd/to'd mails)
<lumag_> Marijn[m], ugh. Then I'd double my excuses.
<Marijn[m]> lumag_: Not your problem, it's something to deal with in general. Suggestions welcome :)
<Marijn[m]> lumag_: Thanks for commenting on the DSI "ref" patches too, I hope bamse_ and sboyd think it's ready too :)
<Marijn[m]> lumag_: One little issue remaining for the clock cleanup, almost on the finish line :)
<Marijn[m]> (Unfortunately 8996 has a bunch other drivers remaining that could use the same conversion :( )
<lumag_> Marijn[m], yep.
<lumag_> Marijn[m], I'll probably finish the 8996 conversion at some point. Then I'll probably look onto 8916, 8960/8064 and 8939 (minor fixes).
<lumag_> If nobody picks them before I have time for them.
<Marijn[m]> lumag_: I don't want to pick up much of these without having a board/device under my nose to test it on, glad if you can tackle 8996 and all the other SoCs mentioned, none of those are in my possession.
<Marijn[m]> And of course time 😅
<lumag_> Marijn[m], time is the worst thing
<lumag_> always in a shortage
<Marijn[m]> lumag_: Yeah, especially when doing this as a hobby. Have an almost infinite backlog of patches that still need cleaning and sending :(
<Marijn[m]> So much progress on device bringup but none makes it to the lists
<lumag_> Marijn[m], yep. Some of the clocks patches from the set you've been reviewing date back to April
<Marijn[m]> lumag_: One-upping that one: the 8976 patches that are starting to reach the list date back to summer 2019 :)
<lumag_> oh, my.
<lumag_> 2019 was all about crypto, iMX and Zynq for me.
* vknecht[m] could test 8939 stuff, provided with instructions how to properly check the change is ok
<vknecht[m]> 8916 too, but minecrell is the expert for that one ;-)
Daanct12 has joined #linux-msm
Danct12 has quit [Ping timeout: 480 seconds]
Danct12 has joined #linux-msm
Daanct12 has quit [Ping timeout: 480 seconds]
svarbanov has joined #linux-msm
<steev> is patchwork broken? i'm not seeing any patches since the 13th - https://patchwork.kernel.org/project/linux-arm-msm/list/
<Marijn[m]> steev: Yup, I wondered the same, seems to be stuck. I got pointed to https://patches.linaro.org/project/linux-arm-msm/list/, but that doesn't appear to list every patch?
<steev> they may not mirror stuff that bamse marks as queued/not applicable?
<steev> but it's almost missing everything from today
<robclark> abhinav___: one thing that occurred to me.. I guess the CRC/MISR stuff is happening post-mixer.. I guess I could run some igt test which uses CRCs and see if they match in good/bad state?
<abhinav___> robclark: yes MISR is after mixer. we can try that. however i dont know how to interpret it. if it matches, then great but if it doesnt match it does not necessarily mean its bad. so for example, lets say the image being sent out itself is black ( hypothetical but possible ), then MISR will be different but thats a genuine mismatch and expected
<robclark> yeah, just need to pick a test that comes up with the same crc each time.. and maybe hack it up to print the crc
<abhinav___> that actually brings me to the other thing i was planning to do, the black screen you are seeing could be the interface border color ( used when pipes are not fetching anything or pipes are not connected ) OR what if the buffer itself is black
<abhinav___> we can change the interface border color as one try, I would like to try it first before sharing
<abhinav___> for the second one, when you do see the blank screen, is there a way to do a buffer dump to see if its actually black? its hypothetical but we have seen things like this before
<robclark> I hacked in a memset(ptr, 0xff, 4096) to test one of those theories.. I guess I can try hacking the solid-fill color for the other..
<robclark> also, w/ fbcon you should be able to `cat /dev/urandom > /dev/fb0` and see "snow" (which I don't)
<robclark> for fill color, I guess intf_timing_params::border_clr ?
<abhinav___> correct
<abhinav___> so technically
<abhinav___> changing the border_clr here
<abhinav___> which is black by default
<abhinav___> to some other color
<abhinav___> should work
<abhinav___> I just havent tried it
svarbanov has quit [Ping timeout: 480 seconds]
<robclark> yup, that is what I'm trying..
<robclark> (also changed the underflow color to something different)
<abhinav___> underflow is blue
<abhinav___> so if it were underflow you should have seen blue
<robclark> ok, I wasn't sure if 0xff was alpha channel or not..
<abhinav___> no its RGB so 0xff is blue for sure
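To make the "0xff is blue" reading concrete, here is a small sketch that splits a 24-bit value into channels. It assumes a plain 0xRRGGBB packing with 8 bits per channel and no alpha, which is what the exchange above implies; the helper name is hypothetical and the actual register layout is not confirmed here.

```python
def unpack_rgb(value):
    """Split a 0xRRGGBB value into (red, green, blue), assuming 8 bits per channel."""
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


print(unpack_rgb(0xFF))      # only the low byte is set
print(unpack_rgb(0xFFFFFF))  # all three channels set
```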
<robclark> k
svarbanov has joined #linux-msm
<robclark> nope, still black screen
<robclark> abhinav___: huh.. so it seems like setting a non-zero fill color => black screen!
<abhinav___> ok, so i was reading up on the interface border color which we just changed. That one will not affect the result. it has a different meaning. we can try changing two other places
<abhinav___> static void dpu_hw_lm_setup_border_color(struct dpu_hw_mixer *ctx,
<abhinav___>         struct dpu_mdss_color *color,
<abhinav___>         u8 border_en)
<abhinav___> {
<abhinav___>     struct dpu_hw_blk_reg_map *c = &ctx->hw;
<abhinav___>
<abhinav___>     if (border_en) {
<abhinav___>         DPU_REG_WRITE(c, LM_BORDER_COLOR_0,
<abhinav___>             (color->color_0 & 0xFFF) |
<abhinav___>             ((color->color_1 & 0xFFF) << 0x10));
<abhinav___>         DPU_REG_WRITE(c, LM_BORDER_COLOR_1,
<abhinav___>             (color->color_2 & 0xFFF) |
<abhinav___>             ((color->color_3 & 0xFFF) << 0x10));
<abhinav___>     }
<abhinav___> }
<abhinav___> this one is the mixer color
<abhinav___> if we call this function and if this color shows up
<abhinav___> then pipe itself is not staged
<robclark> ok.. I was wondering what we were changing in intf.. because on mdp5 it was mixer where I needed to set it :-P
<abhinav___> which is unlikely
<abhinav___> because from the reg dump pipe is staged
<abhinav___> but still lets call this
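The register packing in the pasted function can be mirrored in a few lines to see what values would be written for a given color. This is only a sketch of the bit arithmetic shown in the paste (12 bits per component, two components per register, the second shifted left by 16); the helper name is hypothetical, and which color channel each component carries is not stated in the conversation.

```python
def pack_border_color(color_0, color_1, color_2, color_3):
    """Return the two register values built from four 12-bit components."""
    # Each component is masked to 12 bits; the second of each pair goes in bits 16..27.
    reg0 = (color_0 & 0xFFF) | ((color_1 & 0xFFF) << 16)
    reg1 = (color_2 & 0xFFF) | ((color_3 & 0xFFF) << 16)
    return reg0, reg1


reg0, reg1 = pack_border_color(0xFFF, 0x000, 0xABC, 0x1234)
print(hex(reg0), hex(reg1))
```

Note that a component wider than 12 bits, such as 0x1234 here, is truncated by the mask before it is shifted.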
<abhinav___> then there is a second color
<abhinav___> /*
<abhinav___>  * These updates have to be done immediately before the plane flush
<abhinav___>  * timing, and may not be moved to the atomic_update/mode_set functions.
<abhinav___>  */
<abhinav___> if (pdpu->is_error)
<abhinav___>     /* force white frame with 100% alpha pipe output on error */
<abhinav___>     _dpu_plane_color_fill(pdpu, 0xFFFFFF, 0xFF);
<abhinav___> this is for the solid color of the plane
<abhinav___> so if there is an issue with fetch
<abhinav___> of the plane
<abhinav___> it should show this
<abhinav___> I would suggest changing this to two different colors
<abhinav___> and see which one comes
<robclark> hmm, is setup_border_color() not actually called anywhere? I guess maybe we haven't tested having non fullscreen planes only?
<abhinav___> and it will tell us where the issue is
<abhinav___> correct its not called today
<abhinav___> regarding interface color which i asked to change earlier, it only takes effect when there is less active pixels on the screen than what interface is programmed for
<abhinav___> so that has a different use-case
<abhinav___> i am hoping one of these tell us the clue
<robclark> abhinav___: even this => black screen
<robclark> on mdp5 it was a bit more involved.. we had to set an enable bit, and skip the base stage in the mixer
<abhinav___> alright, no surprise on this one as the pipe was staged
<abhinav___> so plane solid color has to be the next
<robclark> see mdp5_ctl_blend(), fwiw..
<robclark> abhinav___: I suppose the pdpu->is_error path might not be tested?
svarbanov has quit [Ping timeout: 480 seconds]
<lumag_> robclark, mea culpa. I don't think I tested it
<robclark> I guess it is *normally* an error path..
<robclark> ie. so we don't necessarily have a good way to test it
<robclark> ok, I'm going back to the igt/crc idea
<abhinav___> robclark: yes probably not tested. we can work on that in the bug and then try it once its fixed. its an important test
<robclark> I'd be open to addition of some way to test it.. maybe some debugfs switch and corresponding igt test?
<abhinav___> yes will figure it out. the crash seems to be from scaler path [ 4.078512] dpu_hw_setup_scaler3+0x524/0x7e4
<abhinav___> [ 4.078515] _dpu_hw_sspp_setup_scaler3+0x78/0xa8 so lumag_ another fix there?
<lumag_> I'll try taking a look in one of the next days
<robclark> for reference, I did add an igt test that uses debugfs to disable hw GPU hang detection so that we had a way to test the sw timer based fallback hang detection.. https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/tests/msm_recovery.c#L133 .. we could use igt_debugfs_write() helpers in a similar way for tests to simulate/force error cases to get better test coverage
lumag_ has quit [Ping timeout: 480 seconds]
lumag_ has joined #linux-msm
svarbanov has joined #linux-msm
<robclark> abhinav___: ok.. confirmed that crc's change between good and bad state.. with blank screen the CRCs are all zero
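The all-zero CRC observation suggests a simple automated check for the blank-screen state. The sketch below assumes the per-frame CRC values have already been captured (for example by a hacked-up igt test that prints them) and collected as tuples of integers; the function name and sample values are hypothetical.

```python
def crcs_all_zero(samples):
    """Return True if at least one frame was sampled and every CRC value is zero."""
    return bool(samples) and all(crc == 0 for frame in samples for crc in frame)


# Made-up CRC samples, one tuple per frame.
blank_state = [(0, 0, 0), (0, 0, 0)]
good_state = [(0x1A2B3C4D, 0, 0), (0x1A2B3C4D, 0, 0)]
print(crcs_all_zero(blank_state), crcs_all_zero(good_state))
```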
<lumag_> robclark, just out of curiosity. Could you please trace CTL_START/CTL_FLUSH writes?
<lumag_> and compare timings and values being flushed/written
<robclark> I was kinda wishing for a debugfs way to trigger CTL_FLUSH write.. I suppose I could do it with devmem (ie. if the theory is we haven't flushed some updates)
<abhinav___> robclark: this certainly means frame wasnt pushed out. since issue is log sensitive, drm_trace with ctl_flush might be a better option
<robclark> ok, not really seeing any difference in CTL_FLUSH values.. seems like CTL_START is not written in either case?
<robclark> abhinav___: fwiw, regular printk is fine.. and even drm.debug enabled, you can still see the issue.. as long as loglevel is low enough, it doesn't slow things down enough to matter much, IME.. it is when you need to pump the debug msgs out the uart that the timing really changes