camus has quit [Remote host closed the connection]
<javierm>
robclark: I see that was added in commit 62b2f026cd8e ("drm/bridge: adv7533: Change number of DSI lanes dynamically") but I wonder if the mipi dsi API should be extended to allow changing the lane # instead
<javierm>
robclark: or is the detach/attach cycle really needed ?
<robclark>
yeah, that commit was a long time ago.. not entirely sure why we suddenly start hitting that.. but yeah, something a bit less brute force than detach -> attach might be called for
camus has joined #dri-devel
Stary has joined #dri-devel
Stary has quit []
camus has quit [Remote host closed the connection]
<javierm>
so it seems the msm drm driver removal path is not that robust
camus has joined #dri-devel
Stary has joined #dri-devel
* robclark
looks
<javierm>
the driver also is failing to bind so I guess something is not cleaned up correctly in the probe error path
<javierm>
this is on an HP X2 btw
toolchains has joined #dri-devel
JohnnyonFlame has joined #dri-devel
camus has quit [Remote host closed the connection]
<robclark>
hmm, I'm not seeing that w/ `poweroff` on lazor.. but I'm on -rc5.. unless it happens before something clears the terminal (is there a way to disable that?)
<javierm>
robclark: I think it's something in my .config that prevents the drm driver from probing properly and that's why I'm hitting that
<robclark>
lazor should be *pretty much* the same (although I think there are variants of both with a different eDP bridge)
<robclark>
hmm, ok.. that could be.. there are a lot of ways for that to go wrong
<robclark>
the whole multiple sub devices and component framework... has a lot of different ways to fail
<javierm>
robclark: yeah...
camus has joined #dri-devel
Stary_ has joined #dri-devel
Stary is now known as Guest5846
Stary_ is now known as Stary
<robclark>
can you scripts/decode_stacktrace.sh the first splat.. see what line it is on in drm_modeset_lock_all_ctx().. I think that should give you multiple line #s for drm_modeset_lock_all_ctx+0x3c4/0x3d0
Guest5846 has quit [Read error: Connection reset by peer]
<javierm>
robclark: sure, I was trying linux-next but let me rebuild v5.19-rc7 to have a proper vmlinux
camus has quit [Remote host closed the connection]
camus has joined #dri-devel
<robclark>
you got more complete dmesg? Maybe there is something to spot when it is trying to probe.. if it fails to probe maybe drm.debug=0x1f will give a few more msgs.. or CONFIG_DEBUG_DRIVER=y is useful for seeing what things probe-defer (but also got kinda spammy in recent times)
<robclark>
iirc there is some debugfs file that lists deferred devices as well, which is sometimes helpful for figuring out what is missing
camus has quit [Remote host closed the connection]
camus has joined #dri-devel
<javierm>
robclark: yes, I added that debugfs file btw :)
<robclark>
ahh.. thx :-)
<javierm>
robclark: I'm still investigating. Let me finish that v5.19-rc7 build and I'll give you the full dmesg
<robclark>
sg
<javierm>
robclark: surprisingly there was nothing in /sys/kernel/debug/devices_deferred, I think that the probe succeeded but not the msm_drm_bind()
camus has quit [Remote host closed the connection]
<robclark>
hmm, yeah, if some sub-device doesn't show up, the toplevel thing can probe but be sitting in limbo..
<javierm>
yeah and then msm_drv_shutdown() makes some false assumptions
<robclark>
or if missing a bridge or panel, etc.. but that looks ok in your config
toolchains has quit [Read error: Connection timed out]
<javierm>
robclark: yeah, although the bridge is found, its probe fails. I tried to figure out why yesterday but couldn't
<robclark>
hmm, ok, I can probably simulate this then by faking bridge probe fail
<javierm>
dunno why ccache isn't picking the objects since I built v5.19-rc7 just yesterday :(
<javierm>
robclark: so in summary I think the problem is twofold 1) something missing in my .config that makes ti-sn65dsi86 probe to be deferred forever
<javierm>
robclark: 2) that component not being present makes msm probe to be in a limbo as you said and assuming something wrong on removal
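[editor's sketch] The "limbo" state described above can be illustrated with a small self-contained C model. This is not the kernel's component framework: every name here (`mock_aggregate`, `mock_toplevel_probe`, `mock_component_add`, `mock_in_limbo`) is hypothetical, and the only assumption is the behaviour stated in the chat, namely that the toplevel device can probe successfully while bind only runs once every expected sub-device has shown up.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a toplevel device built from sub-devices. */
struct mock_aggregate {
	size_t expected; /* sub-devices the toplevel device waits for */
	size_t present;  /* sub-devices that probed and registered */
	bool probed;     /* toplevel probe returned success */
	bool bound;      /* bind ran, so display state is initialized */
};

/* Toplevel probe succeeds on its own; it only records what it waits for. */
static void mock_toplevel_probe(struct mock_aggregate *agg, size_t expected)
{
	agg->expected = expected;
	agg->present = 0;
	agg->probed = true;
	agg->bound = false;
}

/* A sub-device finished probing; bind fires only once all are present. */
static void mock_component_add(struct mock_aggregate *agg)
{
	if (agg->present < agg->expected)
		agg->present++;
	if (agg->probed && agg->present == agg->expected)
		agg->bound = true;
}

/* Probed but never bound: nothing is deferred, yet nothing is initialized. */
static bool mock_in_limbo(const struct mock_aggregate *agg)
{
	return agg->probed && !agg->bound;
}
```

In this model, a sub-device whose probe never completes (such as the bridge in the summary above) simply never calls `mock_component_add()`, so the toplevel device stays in limbo indefinitely, and any teardown code that assumes bind happened will see uninitialized state.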
<robclark>
hmm, still not hitting that splat if I fake bridge probe defer
<javierm>
weird
<javierm>
robclark: no worries, I'll keep poking at this and let you know if I figure it out
camus has quit [Remote host closed the connection]
<robclark>
thx.. it could be something in linux-next or something btwn rc5 and rc7, I guess?
<javierm>
robclark: I think it's something with my minimal config, since the same version with the fedora kernel config does work
camus has joined #dri-devel
<javierm>
tried to check the differences but the fedora kernel config is massive
<robclark>
yeah, even comparing your config to mine which is fairly minimal is a lot of delta
<robclark>
np.. if you get stuck I can try your .config .. but rebuilding would take a while
<javierm>
robclark: don't worry. I just asked in case that log would ring a bell to you
<robclark>
623f279c77811475ac8fd5635cc4e4451aa71291 was in theory supposed to fix these issues ;-)
camus has quit [Remote host closed the connection]
<javierm>
I see. Wonder if that's just fixing the symptoms rather than the cause
<robclark>
well, I mean the cause is something missing causing some part or another to not probe.. this sort of issue is all about just fixing the various symptoms that result ;-)
<javierm>
robclark: it may be. But I also think that calling drm_atomic_helper_shutdown(drm) in shutdown may not be the correct thing to do...
abws has joined #dri-devel
<javierm>
because msm_drm_unbind() calls msm_drm_uninit(), which already calls drm_atomic_helper_shutdown(ddev) if ddev->registered
<javierm>
if that's the case and is called twice, then commit 623f279c7781 ("drm/msm: fix shutdown hook in case GPU components failed to bind") was indeed papering over the issue
<robclark>
hmm..
<javierm>
robclark: let me write a patch and I'll test it
abws_ has quit [Ping timeout: 480 seconds]
<robclark>
so, looking at the original thread for `drm/msm: add shutdown support for display platform_driver`.. we need to be sure to pwr things off before arm_smmu shuts down
<robclark>
and I guess that wasn't happening in some path?
<robclark>
in theory I guess it should be ok to call twice.. but maybe we are leaving some dangling state/pointers..
<javierm>
robclark: yes, but the problem is that will only be properly initialized if a bind happens
<javierm>
then you could bind -> unbind -> remove
<javierm>
but if bind fails, then you can't call it on remove
<javierm>
either the check should take into account drm->registered or just not call it twice
<javierm>
I would prefer the latter and just drop the .shutdown callback
<robclark>
well, the patch that added the call was fixing *some* case.. I'm just not exactly sure what..
JohnnyonFlame has quit [Ping timeout: 480 seconds]
<robclark>
or rather, why the shutdown in msm_drm_uninit() wasn't enough
<javierm>
robclark: hmm. I'll have dinner and look at that commit and the discussion to get more context
<robclark>
I guess if device->remove() doesn't happen on poweroff/reboot then the correct thing is to check drm->registered in both paths
camus has quit [Remote host closed the connection]
<javierm>
robclark: yeah. At least that would be consistent with msm_drm_uninit()
<javierm>
robclark: I'll post that patch after dinner. We can later drop the .shutdown if found to not be needed
<javierm>
robclark: and then figure out why it is not binding with my config to start with :)
camus has joined #dri-devel
<robclark>
if we don't get dev->remove() on shutdown then we defn need to keep dev->shutdown().. and I suspect that is the reason it was added in the first place
<javierm>
robclark: yeah, need to look at the driver model to remember if that was the case
<javierm>
robclark: now the question is, if .shutdown is always called then maybe we could drop the first call in .unbind ?
<robclark>
it looked like shutdown was only called in the poweroff/reboot path.. remove was called in a lot more paths, I didn't go thru 'em all
<javierm>
robclark: I see
<robclark>
it is, ofc, an area that I haven't looked at too carefully
<javierm>
and you were correct about .remove and .shutdown being executed in different code paths
sdutt has quit [Ping timeout: 480 seconds]
<robclark>
javierm: looks reasonable.. I had in mind the smaller one-liner patch just to check drm->registered in the 2nd spot, but I suppose it is clearer to have that handled in just one place
gouchi has quit [Remote host closed the connection]
<javierm>
robclark: yeah, I also was going to just add the one liner but then thought that we could unify it
<javierm>
in case there are issues found later so we have a single place to handle that
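[editor's sketch] The "single place" idea discussed above, where both the unbind/remove path and the poweroff/reboot path go through one helper that checks the registered flag, might look roughly like the following. This is a self-contained model and not the posted patch: `mock_drm_device`, `mock_priv`, `mock_hw_shutdown_if_registered`, `mock_unbind_path` and `mock_shutdown_path` are hypothetical stand-ins, and the counter only marks where a call such as drm_atomic_helper_shutdown() would go.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the DRM device state relevant to teardown. */
struct mock_drm_device {
	bool registered;       /* set late in init, only after a successful bind */
	int hw_shutdown_calls; /* counts how often the hardware was shut down */
};

/* Hypothetical stand-in for driver private data. */
struct mock_priv {
	struct mock_drm_device *dev; /* stays NULL if bind never ran */
};

/*
 * One guarded helper shared by both teardown paths. It touches the
 * hardware only if the device got far enough to be registered, and
 * returns true when it actually did so.
 */
static bool mock_hw_shutdown_if_registered(struct mock_priv *priv)
{
	struct mock_drm_device *drm = priv ? priv->dev : NULL;

	if (!drm || !drm->registered)
		return false;

	drm->hw_shutdown_calls++; /* a real driver would shut the display down here */
	return true;
}

/* Unbind/remove path. */
static bool mock_unbind_path(struct mock_priv *priv)
{
	return mock_hw_shutdown_if_registered(priv);
}

/* Poweroff/reboot path. */
static bool mock_shutdown_path(struct mock_priv *priv)
{
	return mock_hw_shutdown_if_registered(priv);
}
```

With the check in one helper, a failed bind (no device, or a device that was never registered) makes both paths a no-op instead of operating on uninitialized state, which is the failure mode described earlier in the conversation.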
<robclark>
javierm: (probably) unrelated, but "Couldn't register GPU cooling device" in your dmesg is a bit odd, but you did have tsens and thermal enabled in your .config so not sure what is missing for that
<javierm>
robclark: yeah, me neither. But thought that was a red herring since I noticed that the driver should be able to probe even without one
<robclark>
yeah
camus has quit [Remote host closed the connection]
<javierm>
robclark: posted the patch. Let's see what others think about it
camus has joined #dri-devel
<robclark>
cool, thx
<javierm>
at least now I can investigate the other issues without the kernel crashing on reboots to boot my new kernel builds :)
<robclark>
heheh
Duke`` has quit [Ping timeout: 480 seconds]
camus has quit [Remote host closed the connection]
camus has joined #dri-devel
YuGiOhJCJ has quit [Remote host closed the connection]
YuGiOhJCJ has joined #dri-devel
camus has quit [Remote host closed the connection]
camus has joined #dri-devel
camus has quit [Remote host closed the connection]
camus has joined #dri-devel
camus has quit [Remote host closed the connection]
camus has joined #dri-devel
off^ has quit [Ping timeout: 480 seconds]
camus has quit [Remote host closed the connection]
camus has joined #dri-devel
off^ has joined #dri-devel
toolchains has joined #dri-devel
toolchains has quit [Ping timeout: 480 seconds]
camus has quit [Remote host closed the connection]
toolchains has joined #dri-devel
camus has joined #dri-devel
Haaninjo has quit [Quit: Ex-Chat]
camus has quit [Remote host closed the connection]
camus has joined #dri-devel
camus has quit [Remote host closed the connection]
camus has joined #dri-devel
cheako has quit [Quit: Connection closed for inactivity]
mangix has joined #dri-devel
<mangix>
is this the right channel to ask amdgpu questions?
<mattst88>
#radeon might be better
<mangix>
ty
toolchains has quit [Ping timeout: 480 seconds]
camus has quit [Remote host closed the connection]
camus has joined #dri-devel
CME has joined #dri-devel
JohnnyonFlame has quit [Read error: Connection reset by peer]
pcercuei has quit [Quit: dodo]
camus has quit [Remote host closed the connection]
JohnnyonFlame has joined #dri-devel
camus has joined #dri-devel
camus has quit [Remote host closed the connection]
toolchains has joined #dri-devel
camus has joined #dri-devel
camus has quit [Remote host closed the connection]
camus has joined #dri-devel
camus has quit [Remote host closed the connection]