ChanServ changed the topic of #aarch64-laptops to: Linux support for AArch64 Laptops (Asus NovaGo TP370QL - HP Envy x2 - Lenovo Miix 630 - Lenovo Yoga C630)
systwi_ has joined #aarch64-laptops
systwi has quit [Ping timeout: 480 seconds]
<steev> no112: out of curiosity, does your gb2 run W11?
<qzed> What's a `Internal error: synchronous external abort: 96000010 [#1] SMP`?
<steev> a bad thing (tm)
<qzed> heh, that much I gathered by it (somewhat) crashing the system xD
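For readers wondering what the number in that message encodes: a minimal sketch, assuming the hex value printed after "synchronous external abort:" is the exception syndrome register (ESR_EL1) value, as arm64 kernels usually print it. The field layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, DFSC in ISS bits 5:0) follows the Arm architecture; the function name `decode_esr` and the two lookup tables are illustrative only and cover just this example.

```python
# Sketch: split an arm64 ESR value into its main fields.
# Assumption: the number in the oops line is the raw ESR_EL1 value.
# The tables below are intentionally tiny and not a complete decoder.

EC_NAMES = {
    0x24: "data abort from a lower exception level",
    0x25: "data abort without a change in exception level",
}

DFSC_NAMES = {
    0x10: "synchronous external abort, not on translation table walk",
}


def decode_esr(esr):
    """Return the EC, IL, ISS and DFSC fields of an ESR value."""
    ec = (esr >> 26) & 0x3F      # exception class, bits [31:26]
    il = (esr >> 25) & 0x1       # instruction length bit, bit 25
    iss = esr & 0x1FFFFFF        # instruction specific syndrome, bits [24:0]
    dfsc = iss & 0x3F            # data fault status code, ISS bits [5:0]
    return {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "not in this sketch"),
        "il": il,
        "iss": iss,
        "dfsc": dfsc,
        "dfsc_name": DFSC_NAMES.get(dfsc, "not in this sketch"),
    }


if __name__ == "__main__":
    for key, value in decode_esr(0x96000010).items():
        print(key, value)
```

Read this way, 96000010 is a data abort taken in the kernel itself, with the bus reporting an error on the access, which fits an access to a register block that is not clocked or powered.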
<qzed> bamse: So I've rebased onto next-20220520 and tried to get the GPU working, following the sc7280 dts
<qzed> and naturally I'm running into issues again (including the above)
<qzed> so essentially some dpu_kms stuff is failing with EPROBE_DEFER and also the panel can't read EDID data for some reason
<qzed> also sometimes the system locks up during boot
<qzed> there are also some weird issues when I build GCC_8180 into the kernel vs. when I build it as a module...
<qzed> when I build it as module (and include it in the initramfs) things (more or less) work (i.e. boot but problems above still exist)
<qzed> but when I build it into the kernel things lock up early on
<qzed> no112: there's some weirdness with grub on the Surface Pro X as well (it refuses to boot with some fedora grub patches applied). Seems a bit different though, but you could try https://github.com/linux-surface/grub-image-aarch64/releases/tag/grub-2.06
<bamse> qzed: i see the same problem on primus when i rebased today, i think there's something in the new edp probe path that is racy
<qzed> ah good to know I'm not the only one with issues
<bamse> qzed: and i think i had an error wrt issues powering up the gpu as well :/
<qzed> IIRC I read some comment about some panel probe stuff that needs to run synchronously... I'll try to find it, although I'm not sure if it's related
<bamse> i see
<bamse> btw, in dp_display_unbind() you want to comment out the call to dp_catalog_hpd_config_intr()...
<bamse> because if the dp_parser probe defers on finding the panel, it will propagate that probe defer and in the cleanup path of the whole mdss it will attempt to unbind, but at that time the dp controller isn't yet clocked...so the register access in hpd_config_intr will fault and crash the system
<qzed> bamse: ahh, thanks for that tip!
<bamse> don't know if there are any more patches trickling in related to this...so i guess we'll have to test this properly once -rc1 shows up, and fix it if anything remains
<qzed> yeah
<qzed> ah, somehow repeated the previous message instead of what I wanted to write...
<qzed> I'm not sure if that link is the probe/defer issue though as the panel-edp driver isn't marked with the async flag
<qzed> so if I'm not missing anything it should probe synchronously at that point
<qzed> and with the dp_catalog_hpd_config_intr() commented out it does get further, nice
<qzed> it can now even read edid successfully
<qzed> I guess I can ignore the `Trying to free already-free IRQ 191` for now but I'll try to have a look at that (comes up when msm_dpu is deferred, I guess that's a simple free-but-already-freed/not-yet-set-up bug)
<bamse> not sure if it's a double free or a free before allocation
<qzed> yeah haven't looked at it yet, assuming since it's failing some time during probe it's probably a free before allocation
<qzed> so at the moment it seems to (at minimum) fail eDP link training
<qzed> but there's also a warning about `rcg didn't update its configuration`
<qzed> (`disp_cc_mdss_edp_pixel_clk_src`)
<bamse> so you get through the aux dance of reading out edid etc?
<qzed> yeah, that seems to work
<qzed> at least it's spitting out the correct model number and complains about it not having an entry
<bamse> yeah, i get that warning as well...i think the rate change propagates unexpectedly when we power up the phy
<bamse> not sure what changed there...
<bamse> might have something to do with the fact that we do this dance while the display is active
<qzed> I have no idea on that
<bamse> so do you get: [drm:_dpu_kms_initialize_displayport:635] [dpu error]modeset_init failed for DP, rc = -517 ?
<qzed> yeah
<bamse> qzed: well i wrote the phy driver, so i don't have anyone to blame ;)
<bamse> qzed: same here...on sc8280xp
<qzed> I assume it's trying again at some point (the error being EPROBE_DEFER and all)
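The numeric return codes quoted in this log (-517 here, -13 and -110 elsewhere) map to Linux kernel errno names. A minimal lookup sketch follows; the table is hard-coded with the Linux values rather than taken from Python's `errno` module, because EPROBE_DEFER is kernel-internal and the other numbers differ between operating systems. The names `LINUX_ERRNO` and `errname` are hypothetical helpers, not part of any kernel tooling.

```python
# Sketch: translate the negative return codes seen in the dmesg lines
# of this log into Linux errno names. Only the three codes that appear
# in the conversation are listed.

LINUX_ERRNO = {
    13: "EACCES",         # permission denied
    110: "ETIMEDOUT",     # timed out
    517: "EPROBE_DEFER",  # kernel-internal: driver asks to be probed again later
}


def errname(rc):
    """Return the errno name for a kernel-style negative return code."""
    return LINUX_ERRNO.get(abs(rc), "unknown ({})".format(rc))


if __name__ == "__main__":
    for rc in (-517, -13, -110):
        print(rc, errname(rc))
```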
<qzed> at least it complains about `[drm:dp_ctrl_link_train [msm]] *ERROR* max v_level reached` later
<bamse> okay, nice...i don't get that far...
<bamse> i'm stuck in the infinite loop of that error
<qzed> oh, hmm...
<qzed> I assume you're also trying the DP via aux-bus?
<bamse> yes
<bamse> hmm, but i might have failed to get is_edp set
<qzed> the log is a bit weird on my end... so it first fails with -517, at the same time it fails to read the EDID
<qzed> then later it's able to read the EDID
<bamse> and you only have a single dp instance enabled?
<qzed> yeah, so far I've only added the edp, I haven't tried anything with the USB DP things yet
<qzed> I've updated the log at https://gist.github.com/qzed/2288c651a017fcc016b524d3bac5a43e in case it helps
<bamse> sweet, flagging the dp controller as EDP made is_edp go true and now i only get a single probe defer, presumably because the module isn't loaded yet...and on the next attempt the panel showed up and the edid read passed
<qzed> nice!
<qzed> sounds like the same thing I can see
<bamse> then i ran into clock issues, but that might be expected ;)
<qzed> the `rcg didn't update its configuration`?
<qzed> also does Arch not have the qcom/a640_gmu.bin firmware?
<qzed> oh huh, that's not in linux-firmware either
<qzed> okay, so with that firmware from some oneplus repo I now get `msm_dpu ae01000.mdp: [drm:adreno_load_gpu [msm]] *ERROR* Couldn't power up the GPU: -13`
<bamse> that's what i saw on the primus as well, so something must have changed in the last couple of weeks
<qzed> ah okay
<qzed> btw. is there some good way to get the qcom firmware (other than googling for some github firmware dump of some semi-related phone)?
<bamse> no, so i need to talk to qualcomm about that
<bamse> i thought we had support for some other a640 device out there already
<qzed> IIRC it looked like it based on your commit for the 680 gpu
<bamse> yeah, but i think the a640 firmware is shared with sm8150 or something
<bamse> i mean, the a680 uses a640, and that's also used in sm8150
<qzed> ah
<qzed> I'm kinda wondering how different the supposed a690 is compared to the a680...
<qzed> at least if you look in the windows device manager some strings read a680 so there really can't be that much of a difference
<qzed> (and so far using the a680 definitions for that seems to work)
<qzed> meaning `dev->power.disable_depth > 0` and it's not active
<bamse> you got a a690?
<qzed> yeah, the microsoft "SQ2" chip has one
<bamse> according to the downstream sources, the a690 uses the a660 firmware...
<bamse> so it seems to be different from a680
<qzed> ah
<qzed> hmm interesting
<qzed> so it should use `a660_sqe.fw` and `a660_gmu.bin`?
<bamse> and i presume we're in the same boat then, because the sc8280xp has a690 as well
<qzed> oh neat
<bamse> yes
hexdump0815 has joined #aarch64-laptops
hexdump01 has quit [Ping timeout: 480 seconds]
jhovold has joined #aarch64-laptops
<steev> shouldn't the surface use the signed firmware from windows?
exit70 has quit [Quit: ZNC 1.8.2 - https://znc.in]
exit70 has joined #aarch64-laptops
samueldr_ has quit [Remote host closed the connection]
samueldr has joined #aarch64-laptops
<Dylanger> <qzed> "okay, so with that firmware from..." <- I was getting an error like this on the Duet 5 too, it didn't actually do anything tho
iivanov has joined #aarch64-laptops
matthias_bgg has joined #aarch64-laptops
<no112> steev: It is eligible but isn't currently upgradeable to W11 (Regional timing thing, as I am in Australia)
<no112> qzed: Huh interesting. Will have a look at that Grub. Thanks!
<qzed> steev: I can get most of it from windows (mostly the man files) but haven't found any gpu firmware :/
<qzed> *mbn, not man
<qzed> or is that packaged up into some other file somehow?
<steev> qzed: on older machines it was called something along the lines of qcdxkmsucXXXX.mbn
<steev> e.g. on my c630 it's qcdxkmsuc850.mbn
<qzed> yeah, I have that under the "zap-shader" node
<qzed> but I couldn't find the others (aXXX_sqe.fw and aXXX_gmu.bin)
<bamse> steev: the zap is signed, the sqe and gmu are not signed
<steev> ah
<bamse> steev: but unfortunately the sqe and gmu are not available in windows...so i guess they are baked into the driver or something
<steev> that's quite possible
<no112> For what it's worth regarding GPU FW, I've been using the copies from Linaro (Linked on the aa64 laptops wiki)
<robclark> qzed: my expectation is that a690 is pretty close to a660 and "7c3"... ie. a6xx subgen 4.. haven't looked too much on the kernel side, which is where most of the work should be, although probably shouldn't be so bad.. userspace would need something like: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/freedreno/common/freedreno_devices.py#L344 .. I assume num_sp_cores/num_ccu will be 4, but a cmdstream trace
<robclark> would confirm that and the couple other params.. maybe if we are lucky we can find an android blob driver that supports a690 and spoof the gpu-id, IIRC that is how we did it for a680
matthias_bgg has quit [Ping timeout: 480 seconds]
<bamse> robclark: for a680 we just sprinkled the fairy dust in the kernel and added a section in mesa and it happened to work
<no112> the classic story of ARM development :-P
matthias_bgg has joined #aarch64-laptops
<robclark> bamse: on the userspace side there are a couple magic params we needed to figure out.. pretty sure I managed to get an a680 cmdstream trace from somewhere for those. (But once you have a trace it isn't too hard to figure out and only a few line patch in mesa to add the device ;-))
<qzed> robclark: thanks! I'll see what I can do once we have the kernel errors fixed
<qzed> basic tty should work without mesa, right?
<robclark> yup
matthias_bgg has quit []
no112 has quit [Quit: Leaving]
jhovold has quit [Ping timeout: 480 seconds]
<qzed> bamse: I think I know what causes the `adreno 2c00000.gpu: [drm:adreno_load_gpu [msm]] *ERROR* Couldn't power up the GPU: -13`
<qzed> I think it's related to the EPROBE_DEFER
<qzed> at least it goes through a driver remove cycle
<qzed> at some point it reaches `adreno_unbind()`
<qzed> that then calls first `pm_runtime_force_suspend()` and then `gpu->funcs->destroy()`
<qzed> however both call `pm_runtime_disable()` on the GPU device
<qzed> the latter via `a6xx_destroy()` and `adreno_gpu_cleanup()`
<qzed> so we start out with `dev->power.disable_depth = 1` before the whole probe/remove cycle but end it with `dev->power.disable_depth = 2`
<qzed> so the next call that should enable the GPU just decreases `dev->power.disable_depth` by one, which means it's still disabled
<bamse> sounds like either destroy or cleanup should be the "inverse" of "init"...but not both
<qzed> yeah
<bamse> and it seems that with the recent refactoring of mdss/dpu and edp changes we're probe deferring more
<qzed> or well, it's rather unbind and cleanup that are the two conflicts
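The imbalance described above can be reproduced with a toy counter. This is only a sketch of the bookkeeping, not kernel code: `ToyDevice`, `probe` and `unbind` are hypothetical stand-ins, and the three `pm_runtime_*` functions model just the `disable_depth` behaviour discussed here (enable decrements, disable increments, and a resume request fails with -13 while the depth is above zero).

```python
# Toy model of the runtime-PM disable_depth bookkeeping from the chat.
# Assumption: probe enables runtime PM once; unbind disables it either
# once (balanced) or twice (the path described in the log).

EACCES = 13


class ToyDevice:
    def __init__(self):
        # Runtime PM starts out disabled, as stated in the log.
        self.disable_depth = 1


def pm_runtime_enable(dev):
    if dev.disable_depth > 0:
        dev.disable_depth -= 1


def pm_runtime_disable(dev):
    dev.disable_depth += 1


def pm_runtime_get_sync(dev):
    # Resume requests are refused while runtime PM is disabled.
    return -EACCES if dev.disable_depth > 0 else 0


def probe(dev):
    pm_runtime_enable(dev)


def unbind(dev, double_disable):
    pm_runtime_disable(dev)        # models pm_runtime_force_suspend()
    if double_disable:
        pm_runtime_disable(dev)    # models the cleanup path disabling again


def probe_cycle(double_disable):
    """Probe, unbind (as after a deferral), probe again, then power up."""
    dev = ToyDevice()
    probe(dev)
    unbind(dev, double_disable)
    probe(dev)
    return dev.disable_depth, pm_runtime_get_sync(dev)


if __name__ == "__main__":
    print("double disable:", probe_cycle(True))
    print("balanced:      ", probe_cycle(False))
```

With the double disable, the second probe leaves the depth at 1 and the power-up request returns -13, matching the `Couldn't power up the GPU: -13` message; with a single disable the depth returns to 0 and the request succeeds.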
<qzed> okay, now I get further, to `platform 2c6a000.gmu: [drm:a6xx_gmu_resume [msm]] *ERROR* GMU firmware initialization timed out`
<qzed> followed by
<qzed> `platform 2c6a000.gmu: [drm:a6xx_rpmh_stop [msm]] *ERROR* Unable to power off the GPU RSC`
<bamse> on your a690?
<qzed> `adreno 2c00000.gpu: [drm:adreno_load_gpu [msm]] *ERROR* Couldn't power up the GPU: -110`
<qzed> yeah
<bamse> okay, that's how far i got as well
<qzed> neat okay
<bamse> then i took a break to poke at dp instead
<bamse> suspect that i was missing something in my kernel patch
<qzed> I assume that's not an issue with the firmware? right now I'm loading qcom/a660_sqe.fw and qcom/a660_gmu.bin
<qzed> both seem to load fine based on the two lines before
<bamse> and the zap.mbn from windows?
<bamse> qcdxkmsuc8280.mbn or so
<qzed> you mean `qcdxkmsuc8180.mbn` under zap-shader? yeah
<qzed> although I do not get a log message for that one... so I guess it's not loaded yet?
<bamse> isn't it silent if you don't specify it?
<qzed> what do you mean? if it's not specifically in the list in adreno_device.c?
<bamse> on some devices the zap shader isn't used, so i think it won't tell you if you omitted it in dt
<bamse> been a while since i looked at that code though
<qzed> I'm looking through it right now, I think it doesn't use the `adreno_request_fw()` function which prints the fwname if it's loading the signed firmware
<qzed> I'll add a debug output and try
<qzed> here it seems to use `request_firmware_direct()`:
<robclark> for windows (and android) devices, you defn need zap shader.. it will be insta-reboot if you don't have it, but possibly you aren't getting far enough for those fireworks
<robclark> I'm not sure if there is anything that isn't chromebook which doesn't need zap.. possibly android-auto things?
<robclark> hmm, but those are probably still using qcom's tz, so I think they would also need zap to take gpu out of secure mode at boot
<bamse> robclark: no, that's correct...everything !chrome uses that today
<CosmicPenguin> Not super following the whole thread, but the GMU implementation will be very tightly linked to the hardware
<bamse> CosmicPenguin: you don't happen to know where the gmu firmware lives in windows?
<qzed> doesn't seem like the zap shader is being loaded, so I guess it doesn't even get that far yet
<CosmicPenguin> oh, that's a great question. The other firmware images are directly built into the driver
<bamse> CosmicPenguin: okay, i figured...and they probably put the zap as a separate file to be able to rely on the pil loader with its certificate checks
<CosmicPenguin> yep - absolutely
<CosmicPenguin> The sqe actually ships internally as a header, and the linux/android build has to convert it to a binary
<CosmicPenguin> but the gmu ships internally as a binary, and I never asked what Windows did, but I have to assume they're converting it into a header - it would be super easy to do
<bamse> ahh, yeah both makes sense
<bamse> when you don't have gpl to deal with
<robclark> It is not impossible that a690 uses a660_gmu.bin .. it's possible it doesn't but worth a try
<CosmicPenguin> There is a handshake between the kernel and the GMU during init - usually it's just agreeing on a version number (which is why the gpu list has a GMU major minor in it)
<CosmicPenguin> But as the GMU transitions further toward doing scheduling on its own the initialization process will become increasingly more complex
<robclark> but that's tomorrow's problem ;-)
<qzed> so re zap: I don't think it even gets there yet, I've tried to add some debug prints but they don't show up
<qzed> re embedded firmware: are there any headers or magic constants to look out for?
iivanov has quit [Remote host closed the connection]
iivanov has joined #aarch64-laptops
fleebs has joined #aarch64-laptops
iivanov has quit [Ping timeout: 480 seconds]
<qzed> bamse: my attempt at quick-fixing the power-up issue: https://github.com/linux-surface/kernel/commit/0d2387fe750e0a442f639163429443d784e99dc0
<robclark> try a660_gmu.bin .. nearly all the a6xx use same gmu fw as other a6xx within same subgen, which in case of a690 would be a660
<qzed> bamse: I'll need to revisit this at some point as it seems that if we drop that call there, we could run into situations where it's unbalanced the other way around if the init function fails...
<qzed> robclark: already tried that
<qzed> qcom/a660_gmu.bin and qcom/a660_sqe.fw
<qzed> here's a full dmesg log from the current setup: https://gist.github.com/qzed/2288c651a017fcc016b524d3bac5a43e#file-dmesg-log
<robclark> hmm, ok.. I assume you added a690 to adreno_is_a660_family()?
<qzed> yeah
<robclark> k
<robclark> possibly there is a downstream kernel kgsl somewhere w/ a690 support? That would certainly be helpful..
<qzed> kgsl?
<robclark> downstream android kernel driver
<qzed> I'll try to look around
<qzed> is there a good resource for which devices use the a690? so far I only know of the Surface Pro X
<HdkR> Looks like Snapdragon 8cx G2 SoC has it as well
<HdkR> Which the HP Elite Folio uses
<robclark> in the past, it seemed like they did early bringup of the windows SoC's on android.. I don't think there is any actual android device w/ a690, but there also wasn't any a680 android device
<qzed> as far as I can tell the Surface Pro X SQ2 also has the 8cx Gen 2 chip (which would make sense then with the GPU)
fleebs has quit [Remote host closed the connection]
fleebs has joined #aarch64-laptops
<fleebs> Has anyone got the ASUS NovaGo builtin keyboard to work?
alpernebbi has quit [Ping timeout: 480 seconds]
alpernebbi has joined #aarch64-laptops
iivanov has joined #aarch64-laptops
iivanov has quit [Ping timeout: 480 seconds]