robclark changed the topic of #aarch64-laptops to: Linux support for AArch64 Laptops (Chrome OS Trogdor Devices - Asus NovaGo TP370QL - HP Envy x2 - Lenovo Miix 630 - Lenovo Yoga C630 - Lenovo ThinkPad X13s - and various other snapdragon laptops) - https://oftc.irclog.whitequark.org/aarch64-laptops
<Dantheman825[m]> did anyone else have audio issues on their X13s on 6.7.2?
KhazAkar has quit [Quit: Connection closed for inactivity]
<Dantheman825[m]> I compiled steev's branch like I usually do using laptop_defconfig (with minor adjustments to the config to allow compressed firmware and HID_NINTENDO support), and the speakers don't show up anymore (they worked fine on 6.7.1 with pretty much the same config)
<steev> Audio is working here with 6.7.2
<Dantheman825[m]> How odd, I wonder what it could be on my end
<steev> Dantheman825[m]: fwiw, you could try backing up your asound.state file, disabling alsa-restore service, remove it, and then reboot
<steev> where remove it = the asound.state file (though i get rid of everything inside /var/lib/alsa/ )
<steev> i'm still working on the 6.8 stuff, since i want to bring in the bits we're still missing for webcam, as well as fix up the laptop_config
hexdump0815 has joined #aarch64-laptops
hexdump01 has quit [Ping timeout: 480 seconds]
<Dantheman825[m]> I removed the content from /var/lib/alsa, though the alsa-restore service was already disabled
<Dantheman825[m]> Audio is still missing on my 6.7.2 kernel (It’s also missing in Ubuntu’s 6.5.0-laptop-1008 kernel for some reason)
<steev> but if you boot back into 6.7.1 it comes back?
<albsen[m]> <jglathe_> "Now X13s works with touchscreen..." <- I'm on this version that you have on https://github.com/jglathe/linux_ms_dev_kit/tree/jg/blackrock-v6.7.y and touchscreen is working. only audio isn't working so clearly something else in my userspace is missing. or maybe firmware. still seeing the error in dmesg: `snd-sc8280xp sound: ASoC: invalid header size for type 171863149 at offset 0x0 size 0x2835c.`
<steev> i'd definitely check the audioreach-tplg.bin file - https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/qcom/sc8280xp/LENOVO/21BX and iirc, there's a symlink of it into /lib/firmware/qcom ?
<steev> it's in sc8280xp
<albsen[m]> steev: yes, I took those from there and symlinked it
<steev> make sure you made the correct symlink?
<steev> also make sure you took the actual file and not just downloaded the html of the website
<albsen[m]> ```SC8280XP-LENOVO-X13S-tplg.bin -> LENOVO/21BX/audioreach-tplg.bin``` in ```ll /usr/lib/firmware/qcom/sc8280xp```
<albsen[m]> steev: you know what ... you're correct: ```/usr/lib/firmware/qcom/sc8280xp/LENOVO/21BX/audioreach-tplg.bin: HTML document, Unicode text, UTF-8 text, with very long lines (1036)```
<albsen[m]> lol
<albsen[m]> haha, omg
<steev> i, and many others, have been bitten by that many times
<HdkR> The world over has downloaded too many html files from github :P
<albsen[m]> audioreach-tplg.bin: data
<albsen[m]> will reboot and check
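The mix-up above (an HTML page saved under the firmware file name) can be caught before rebooting. The following is a minimal sketch, not what anyone in the channel ran: the function names and the marker list are illustrative assumptions, and it only approximates what `file` reports.

```python
from pathlib import Path

# Byte prefixes that suggest a saved web page rather than a binary blob.
# This short list is an illustrative assumption, not an exhaustive check.
HTML_MARKERS = (b"<!doctype html", b"<html")
UTF8_BOM = b"\xef\xbb\xbf"


def looks_like_html(data: bytes) -> bool:
    """Return True if the start of the data looks like an HTML document."""
    head = data[:1024]
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    # Ignore leading whitespace and letter case before comparing.
    head = head.lstrip().lower()
    return head.startswith(HTML_MARKERS)


def check_firmware(path: str) -> str:
    """Classify a file roughly the way `file` did in the chat."""
    data = Path(path).read_bytes()[:1024]
    return "HTML document" if looks_like_html(data) else "data"
```

Calling `check_firmware()` on the downloaded `audioreach-tplg.bin` should return `"data"` for a real topology blob; an `"HTML document"` result suggests the page was saved instead of the raw file.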
<steev> at this point, it would be missing the newest alsa-ucm-conf stuff
<albsen[m]> steev: done, and I have audio! thank you. it's faint but it's there!
<albsen[m]> I've recently pulled in the new alsa ucm2 from gh
<albsen[m]> let me check if there are any updates in the last 5 days
<albsen[m]> I'm on fe19ccd which is the most recent change related to headphone levels. I guess the speaker volume is what it is for now. don't have headphones with me at the moment, will recheck later
<albsen[m]> this is soo cool, finally (almost) everything is working
KhazAkar has joined #aarch64-laptops
<steev> if volume is low you can try what i suggested to dantheman earlier
<steev> just have to make sure to stop alsa-restore otherwise it will just re-save what it currently has
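The reset steev describes (back up the saved ALSA state, remove it, reboot, with alsa-restore stopped so it does not re-save the current state) could be sketched as below. This is only an illustration: the helper name and the backup location are hypothetical, and it moves aside every regular file in the directory, following steev's remark that he clears everything inside /var/lib/alsa/.

```python
import shutil
from pathlib import Path
from typing import List


def backup_and_clear_alsa_state(state_dir: str, backup_dir: str) -> List[str]:
    """Copy every regular file in state_dir to backup_dir, then delete the originals.

    Returns the sorted names of the files that were moved aside.
    """
    src = Path(state_dir)
    dst = Path(backup_dir)
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    for entry in sorted(src.iterdir()):
        if entry.is_file():
            # Copy first, delete second, so a failed copy never loses the state.
            shutil.copy2(entry, dst / entry.name)
            entry.unlink()
            moved.append(entry.name)
    return moved
```

On the laptop this would be run as root, for example `backup_and_clear_alsa_state("/var/lib/alsa", "/root/alsa-backup")` (the backup path is an assumption), after the alsa-restore service has been stopped and before rebooting.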
<steev> anyway, work meeting in a few hours, i should get some sleep, glad you have at least some audio now!
<albsen[m]> steev: thanks a lot
jhovold has joined #aarch64-laptops
martiert has quit [Quit: WeeChat 4.2.0]
<jhovold> craftyguy, jglathe: I just tried connecting an external display and triggered a lockup / reboot almost immediately
<jhovold> could you try disconnecting your external display and see if the issues you reported appear to go away?
<jhovold> hmmm, it also seems the VT console is no longer mirrored on the external display with 6.8
martiert has joined #aarch64-laptops
iivanov has joined #aarch64-laptops
matthias_bgg has joined #aarch64-laptops
xroumegue has quit [Ping timeout: 480 seconds]
xroumegue has joined #aarch64-laptops
<craftyguy> jhovold: I had to roll back to 6.7 for now
KhazAkar has quit [Quit: Connection closed for inactivity]
<jhovold> craftyguy: understandable, it's probably two separate drm regressions as bamse saw a fence related splat
<jhovold> the crash I hit happened on hotplug so likely separate
<Dantheman825[m]> <steev> "but if you boot back into 6.7...." <- yes
KhazAkar has joined #aarch64-laptops
<jhovold> it appears the suspend issue with external display connected is no longer there with 6.8-rc
<jhovold> since that is likely a result of all the rework to enable runtime pm, I guess we won't see a fix for this in pre-6.8 kernels...
<jhovold> another hard crash when stopping Xorg with 6.8-rc2
<jhovold> (after disconnecting an external display)
valida-69[m] has joined #aarch64-laptops
iivanov has quit [Read error: Connection reset by peer]
iivanov has joined #aarch64-laptops
<travmurav[m]> @_oftc_jglathe:matrix.org: I've updated sltest to be a bit more reliable, made a new release on gh, sorry for this, could you test it on volterra again with the tcblaunch you have extracted?
jglathe__ has joined #aarch64-laptops
_whitelogger has joined #aarch64-laptops
<jhovold> Here's an updated wip branch for the X13s:
<jhovold> Changes include
<jhovold> - add gpu thermal zone
<jhovold> - avoid asserting touchscreen reset twice at boot
<jhovold> New issues in 6.8-rc
<jhovold> - possible drm regression causing wayland lockup
<jhovold> - external display hotplug regressions
<jhovold> - vt console not mirrored on hotplug
<jhovold> - hard crashes (on hotplug and after disconnect)
<adamcstephens> i think i actually just experienced that wayland lockup on rc1
<robclark> anyone got logs for wayland lockup?
<bamse> jhovold: no splat, ui just froze...'echo w > /proc/sysrq-trigger' told me it was waiting for a fence
<robclark> bamse: so I suspect that is related to the freezes I see with chromium/electron (which seems to do vblank based frame pacing)... we sometimes miss vblank irq. But if I saw the backtrace about where it was waiting, I could probably confirm that. It almost certainly isn't blocking on a gpu signaled fence (because hangcheck would step in and rectify the situation pretty quickly)
<_[m]123> that's why git clone and you won't deal with wrong html files 😉
<robclark> we were hoping https://patchwork.freedesktop.org/patch/571854/ would improve things, but maybe not
<bamse> hmm, my goto pastebin is dead..anyone has a recommendation for one that works? (and isn't just ads)
<bamse> robclark: here the dmesg was completely silent, so not the same issue
<robclark> fpaste?
<Jasper[m]> 0x0.st can also be curl'd to
<robclark> bamse: I suspect it is just another symptom of the same issue
<bamse> Jasper[m]: "can also be curled" <- is it possible to not curl to it?
<Jasper[m]> Probably, it's just an easy temporary file storage spot
<_[m]123> bamse: debian one
<bamse> robclark: http://0x0.st/HDZ4.txt
<bamse> Jasper[m]: that is rather convenient...
<bamse> Jasper[m]: thank you
<adamcstephens> yeah, that's my same exact hang
<robclark> hmm, got stacks for any other tasks? Although maybe that would be harder to see with drm/sched migrated away from kthread?
<robclark> I'd _guess_ that pageflip is waiting on gpu work that is waiting on prior pageflip
<bamse> robclark: did it a few times, once or twice i caught a lock in xhci...but it made progress
<bamse> robclark: so this was it
<robclark> so, maybe we can suss it out with dma_fence ftrace
<robclark> try enabling /sys/kernel/debug/tracing/events/dma_fence/ and then dumping traces on hang (if you can ssh to the device, perhaps)
<_[m]123> mm you could do a oneliner with some tail and grep and pipe it to curl immediately or to a file and then cat to curl 🤔
<bamse> robclark: ssh works fine, i've enabled dma_fence events...let's see if it shows up again
<robclark> thx
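Enabling the dma_fence events and dumping the trace buffer after a hang, as robclark suggests, might look like the sketch below when run as root over ssh. The helper names are hypothetical; the `enable` and `trace` files are the standard tracefs controls under the directory named in the chat.

```python
from pathlib import Path

# tracefs location as used in the chat; many systems also expose it at
# /sys/kernel/tracing.
TRACING_ROOT = "/sys/kernel/debug/tracing"


def set_dma_fence_events(enabled: bool, root: str = TRACING_ROOT) -> None:
    """Write 1 or 0 to events/dma_fence/enable to toggle the whole event group."""
    enable_file = Path(root) / "events" / "dma_fence" / "enable"
    enable_file.write_text("1" if enabled else "0")


def dump_trace(out_path: str, root: str = TRACING_ROOT) -> int:
    """Copy the current trace buffer to out_path and return the number of lines."""
    text = (Path(root) / "trace").read_text(errors="replace")
    Path(out_path).write_text(text)
    return len(text.splitlines())
```

The `root` parameter exists only so the helpers can be pointed at a scratch directory for testing; on the device the default should be what you want.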
* robclark still stuck on Xorg waiting for mutter-wayland fix to propagate into f39
<bamse> robclark: i thought you had left x behind you long time ago?
<bamse> robclark: when i complained that xterm didn't render properly a couple of years ago, you tricked me to switch to wayland...
<robclark> seems like gdm crashes because of the mutter bug and then things fallback to Xorg?
<robclark> yeah, I only noticed the other day that it was falling back to Xorg
<bamse> robclark: speaking of which, jhovold tells me that people here are reporting "rendering artifacts in xterm"
<robclark> what is xterm? :-P
<bamse> robclark: yeah, it's going to be nice once the mutter fix propagates
<bamse> robclark: :)
<robclark> hmm, I had to install xterm just now.. but no obvious corruption (other than the unicode chars in my prompt that it doesn't know how to render)
<bamse> robclark: when i used it it worked fine most of the time, but every now and then i got "flickering"
<Jasper[m]> <bamse> "Jasper: thank you" <- No problem, be sure to give them a donation if you feel like it. Details are on the index
<robclark> hmm.. maybe you need to be using non compositing wm?
<jhovold> robclark: it's the corruption I mentioned last week or so
<jhovold> scrolling dmesg with less in an xterm, up and down a bit
<jhovold> all of a sudden, there are later timestamps in the upper half of the xterm, and random pixels in the middle where seemingly old buffer content was left in place
<jhovold> i have a photo
<bamse> robclark: i think it was mostly when you're scrolling...not sure how they implement that, but say that you put a few lines in a texture and you move that up and down, and then you lose the bookkeeping and start showing the wrong buffer for a few lines, every other frame
<jhovold> when i hit that corruption, scrolling line by line, makes the corrupt/stale content move up line by line as well
<_[m]123> my terminal has been lagging, don't know why, probably graphical rendering issue or?
<jhovold> bamse: ok, so it was just wayland that locked up then, i think craftyguy said he could not even ssh in
<_[m]123> * issue or? on kde and guake btw
<bamse> jhovold: yes, it was just the drm driver...and it happened when i was compiling the kernel, in an alacritty window, in sway...
<bamse> jhovold: in other news, i formalized my kernel management...and pushed it to https://github.com/andersson/arch-packages/tree/master/linux-upstream
<bamse> jhovold: will bump to v6.8-rc2 once i've seen the build complete here in a bit
<jhovold> nice
<bamse> now i just need to automate the whole thing, so i don't have to makepkg -i manually ;)
<jhovold> bamse: apparently both the panel probe deferral issue and broken suspend with external display connected are fixed with 6.8-rc
<jhovold> now we just have to fix the fallout of that rework...
<bamse> i saw that, really nice
<jhovold> hit a couple of bad crashes when hotplugging an external display, no issues with 6.8-rc1 for a week before then
<bamse> robclark: ^^ think you "complained" about the s/r with external display just a few days ago...v6.8 is your friend
<bamse> jhovold: also, three times now in the last 3-4 days i've had my x13s reboot instead of giving me a login prompt at boot...jfyi
<jhovold> could be the bug I've triggered, I get a hypervisor dump, you'd see a reboot on a production machine
<jhovold> and you use an external display daily unlike me
<bamse> no external monitor connected for this
<bamse> and i've not tried to reproduce it on crd yet
<jhovold> hmm, ok. but you boot into wayland?
<bamse> no, boot to console
matthias_bgg has quit [Ping timeout: 480 seconds]
<jhovold> odd, 6.8-rc1 has been quite stable here until I tried connecting an external display (and I use xorg)
<jhovold> and I've only triggered the crashes twice so far, despite an excessive amount of hotplugging
<bamse> i've only done limited testing here...docked the laptop and did my maintainer duties over the weekend...
matthias_bgg has joined #aarch64-laptops
<robclark> jhovold: what do you see with `cat /sys/kernel/debug/dri/0/state | grep modifier` ?
<jhovold> modifier=0x0
<robclark> hmm, ok
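The `grep modifier` check above can also be done on a saved copy of the state dump, for example one attached to a bug report. This is a small sketch with a hypothetical helper name; the exact layout of the debugfs state file is not shown in the chat, so the function only looks for `modifier=<value>` fields.

```python
import re
from typing import List


def find_modifiers(state_text: str) -> List[str]:
    """Return the value of every `modifier=` field found in a DRM state dump."""
    return re.findall(r"modifier=(\S+)", state_text)
```

For the single line jhovold pasted, `find_modifiers("modifier=0x0")` gives `["0x0"]`.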
<jhovold> bamse: regarding the reboot you've seen, there are two more regressions in 6.8-rc1 that I've fixed in my wip branches by reverting the offending commits
<jhovold> one can lead to a deadlock when probing the pcie controllers
<jhovold> the other issue would lead to a NULL pointer dereference in the battery driver
<jhovold> if you run mainline, you want to make sure you have those reverts (the second is queued for rc3)
<bamse> jhovold: but i want to run mainline...not patched up mainline ;)
<jhovold> well then you'll have to live with the regressions until the fixes are in Linus tree, tough luck ;P
<bamse> jhovold: hopefully you like my new commit message, and we could perhaps get touchscreen in -rc3 or -rc4 as well
<bamse> jhovold: well, we have a tendency to always run almost-upstream...
<jhovold> you can't expect rc1 to never have regressions, but sure, i too wish we could reduce the number of regressions which aren't caught until my rc1 rebases...
<bamse> everyone: still only have 1 tested-by on the touchscreen...https://lore.kernel.org/all/20240129-x13s-touchscreen-v3-0-c4a933034145@quicinc.com/T/#t
<bamse> jhovold: right, the whole purpose of -rc is to catch those...
<jhovold> in this case the pci maintainer reverted a commit for 6.7-final, and ignored my report that that led to a regression
<bamse> yeah, that was pretty silly
<jhovold> the battery driver regression, also should have been caught long before it hit me in rc1
<bamse> and i think the way around the instability for -rc in my case is to test it on one machine and if problematic avoid upgrading the other one until rc-3/rc-4...
<jhovold> at least the revert there is straight-forward
<bamse> jhovold: indeed, need to get some CI going again...
<bamse> well, not "again" for the laptops...as last time i didn't know how to automate that
<bamse> steev: ^^ link to v3 of touchscreen patch...a tested-by would be welcome
<steev> oh nice
<steev> will get to it after work meetings :)
<jhovold> bamse: looks like you'll need to do a v4... :(
<exeat> Re: xterm, I too have noticed the text corruption after scrolling. I'm pretty sure that my fvwm2 and picom setup on the x13s is identical to what I have on an old i915 machine, where I never see that.
<steev> jhovold: because i'm an idiot, why does the comment need to be removed from the driver code? it reads to me that the bindings need to be updated if using that, and bamse is updating the binding to follow the comment?
<jhovold> exeat: same here, and I believe bamse has seen this on the c630 in the past as well
<jhovold> steev: when updating the binding, that comment becomes obsolete and incorrect and needs to be removed
<steev> jhovold: wouldn't it be for any future binding as well not just ours?
<jhovold> there is only one generic binding for "hid-over-i2c"
<steev> oh, derp :) that's what i was missing, thanks for explaining
<robclark> jhovold: what window mgr were you using?
<jhovold> dwm
<robclark> hmm ok
<bamse> jhovold: haha, looks like i don't want to fix that comment...sorry about that
<bamse> jhovold: b4 trailers -u told me you were happy with patch 1, so i fixed the commit message on patch 2
<jhovold> yeah, there are some downsides to automation
<jhovold> "with that fixed, reviewed-by: " etc
<robclark> dwm seems to not be a thing packaged for fedora?
<bamse> jhovold: the problem isn't automation so much as me context switching...
<bamse> jhovold: and my context switch didn't properly store the state
<jhovold> robclark: you need to recompile dwm when configuring it, so it's hard to package, unless on gentoo
<jhovold> but it's quick to build ;): https://dwm.suckless.org/
<jhovold> bamse: remember which window manager you were using when you saw the xterm corruption?
<bamse> jhovold: must have been i3
<travmurav[m]> I have good news and maybe a bit annoying news
<travmurav[m]> Good news: we have el2 on x13s (sample size = 2)
<travmurav[m]> Annoying news: my code for restoring CPU state afterwards, which works fine on 7c, doesn't seem to work on sc8280xp and it's pretty hard for me to debug this
<travmurav[m]> (Mostly since I have to do "remote hands" with someone who has the hw to do so)
<craftyguy> \o/
<travmurav[m]> And I guess for me the annoying part is that I have no idea if it's my hw that is just so lucky I don't hit bugs in my code, or whatever sc8280xp hyp does, zaps the ram I use or something :S
<travmurav[m]> But I guess I hope it's "my" mistake once again lol
<robclark> travmurav[m]: not sure if this helps, but https://photos.app.goo.gl/BJeJ3moqqm2yngGt9 ... I think that is roughly the same as you were seeing on other x13s?
<robclark> no green line
<travmurav[m]> Oh, is the efi file from "v2" release on gh?
<craftyguy> I saw a green line on mine
<robclark> oh, hmm.. maybe not.. it was from not_a_virus.tar.gz :-P
<travmurav[m]> Yeah that one tried to use smccc32 psci off which /apparently/ doesn't work on x13s
<travmurav[m]> It was absolutely reliable for me up to this point lol
<robclark> ok, I can try the new one
<travmurav[m]> Thankfully smccc64 version works so I won't blame qcom on breaking it since I don't know if the spec mandates it, but thus there is a green bar instead haha
<travmurav[m]> FWIW sltest.efi is the "most important part" for me since it exercises actually switching to el2, knowing that works reliably would be great
<travmurav[m]> And as long as it works, we can figure out everything else
<emily[m]1> <travmurav[m]> "Good news: we have el2 on x13s..." <- woah great work!
<robclark> ok, now I get a green line... but sltest.efi never exits, not sure if that is expected?
<robclark> last thing I see is still: `== Launch:`
<travmurav[m]> robclark: yes, it goes into an infinite loop since we have to exit uefi right before switching to el2
<travmurav[m]> So it switches to el2, draws the line and hangs
<robclark> ahh, ok
<travmurav[m]> (if we keep uefi, the device will crash even with no code running, I suspect because something tries to dma with now-invalid iommu settings)
<travmurav[m]> That's why I exit uefi boot services in sltest and hook into ebs function in slbounce (which thus has to be a runtime driver to persist in memory)
<robclark> ahh
<robclark> so, any thoughts about making a replacement shim that does the dtb stuff and slbounce and then loads grub/sd-boot/whatever?
<travmurav[m]> robclark: current setup is kinda this
<jglathe__> wouldn't it be enough to load slbounce.efi before trying to start vmlinuz in grub?
<travmurav[m]> There is slbounce that replaces ExitBootServices() with ebs+el2 after
<travmurav[m]> Then there is dtbhack that applies needed workarounds (i.e. need to nuke zap shader from dt)
<travmurav[m]> And since the real switch would happen in ebs, we can safely run grub et al
<travmurav[m]> It will run Linux efi shim and the shim does ebs
<travmurav[m]> So we swap el1->el2 under linux's efi shim
<robclark> so, what happens if you don't remove the zap node? I guess `qcom_scm_is_available()` returns false? Maybe we can make things just ignore the zap node?
<travmurav[m]> I guess arguably this is not "automatic" but I think it can be cleaned up when we are sure it works properly
<travmurav[m]> robclark: it will try to load the mbn and fail since there is no hyp to handle that smc
<travmurav[m]> Applies to all remoteprocs too
<jglathe__> oof
<travmurav[m]> So on 7c all remoteprocs fail to load "qcom" way but I can add i.e. iommu definition from 7c cros and load Venus that way
<travmurav[m]> Or nuke zap shader and let it poke that funny register to get gpu
<robclark> I guess if we have some way for the driver to know that we are in EL2, we could use that to decide to skip the smc stuff?
<travmurav[m]> robclark: we can probably check if we are in el2 and load firmware thing returns -22 to then poke the register
<travmurav[m]> Another weird thing I see is that on 7c trying to read aop cmd-db memory results in a sync fault. I can read it fine from baremetal el2 code and in dtbhack I just make a copy in other memory and add it to the dt as a workaround...
<travmurav[m]> Not really sure what's up with that but I haven't debugged it after finding the quick workaround
<travmurav[m]> (I was mostly trying to get to "minimal demo" with Linux so far so haven't investigated much on it after just getting to the DE)
<robclark> ok, so looks like there are some other drivers calling is_hyp_mode_available()..
<travmurav[m]> So I think booting Linux in el2 might likely require having uart at first for sc8280xp too
<robclark> hmm, that is a thing I do not have (but maybe the crd8280 would?)
<Jasper[m]> Or pre-prod x13s?
<Jasper[m]> Volterra doesn't apparently
krei-se has quit [Quit: ZNC 1.8.2 - https://znc.in]
krei-se has joined #aarch64-laptops
<jglathe> @jhovold I'm on 6.7.1 now. Will start a new test with 6.8rc2?
<robclark> hmm, sad.. __boot_cpu_mode() is not exported to modules
<travmurav[m]> robclark: i wonder if we can run the "secure-launch available" smc as a heuristic for "hyp present" If we don't have anything else better
<travmurav[m]> It would return 0x0 with hyp but I guess only on woa devices
<ema> the x13s gets thermal throttled if it gets too hot, doesn't it? Any ways to figure out when that happens?
<travmurav[m]> Or maybe there is some better smc for this
<robclark> travmurav[m]: maybe.. possibly bamse has ideas on how to do this (or whether it is better to just patch the dtb) (ie. to detect when we need to take the gpu out of secure mode directly, like we do on chromebooks, vs using zap shader)
<steev> ema: yes, it's set in the dts
<robclark> ema: there are some thermal trace events (`/sys/kernel/debug/tracing/events/thermal/`) which might be a good way to know when thermal throttling happens
<steev> 55, 58, and 70? i think it was
<jglathe> from watching temps I can see it
<steev> that's why i wrote that patch, to let it get a lil hotter before throttling
<jglathe> cores 4..7 go up to 90°C before throttling, when throttled the cores 0..3 get hotter than 4..7
<ema> thanks steev, robclark... constantly north of 85 with bpftrace -e 'tracepoint:thermal:thermal_temperature { printf("%d\n", args.temp) }'
<ema> (I'm building gcc so no big surprise there)
<ema> though probably it's tracepoint:thermal:thermal_zone_trip the event to look for
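As a complement to the trace events, the current zone temperatures can be polled from the generic thermal sysfs interface. The sketch below is an assumption-laden illustration rather than anything posted in the channel: the helper names are hypothetical, and the 85 degree limit in the usage note is simply the figure ema mentioned, not a trip point read from the dts.

```python
from pathlib import Path
from typing import Dict, List


def read_zone_temps(root: str = "/sys/class/thermal") -> Dict[str, float]:
    """Map each thermal zone (name:type) to its temperature in degrees Celsius."""
    temps = {}
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            zone_type = (zone / "type").read_text().strip()
            # sysfs reports the temperature in millidegrees Celsius.
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # some zones cannot be read; skip them
        temps[f"{zone.name}:{zone_type}"] = millideg / 1000.0
    return temps


def zones_above(temps: Dict[str, float], limit_c: float) -> List[str]:
    """Return the zones whose temperature is at or above limit_c."""
    return sorted(name for name, value in temps.items() if value >= limit_c)
```

For example, `zones_above(read_zone_temps(), 85.0)` lists the zones currently at or above 85 degrees; whether throttling has actually kicked in still depends on the trip points configured for each zone.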
<_[m]123> lol I only just noticed alsa ucm repo is of steev
<_[m]123> so the input lag is electron?
<steev> _[m]123: i have mentioned it, but, it's out of date
<_[m]123> I'm having keyboard input lag on element and guake
enyalios has quit [Remote host closed the connection]
jglathe has quit [Remote host closed the connection]
jglathe__ has quit [Remote host closed the connection]
enyalios has joined #aarch64-laptops
<robclark> travmurav[m]: do you know if dtbhack.efi/etc should work with dtb on a different ESP? Ie. if I'm booting from external usb-c to not break my normal boot, I tried: `dtbhack.efi fs14:\sc8280xp-lenovo-thinkpad-x13s.dtb` but it didn't seem to want to load it from a different "drive"
<konradybcio> robclark: efi shell should have a cp command :p
KREYREN_oftc has joined #aarch64-laptops
<robclark> true, I was just hoping to avoid having to sync everything to the "boot EL2" ESP..
KREYREN_oftc has quit [Ping timeout: 480 seconds]