ChanServ changed the topic of #asahi-dev to: Asahi Linux: porting Linux to Apple Silicon macs | Non-development talk: #asahi | General development | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Logs: https://alx.sh/l/asahi-dev
nsklaus has quit [Ping timeout: 480 seconds]
nsklaus has joined #asahi-dev
korreckj328 has quit []
nsklaus has quit [Ping timeout: 480 seconds]
nsklaus has joined #asahi-dev
nsklaus has quit [Ping timeout: 480 seconds]
nsklaus has joined #asahi-dev
nsklaus has quit [Ping timeout: 480 seconds]
nsklaus has joined #asahi-dev
kesslerd1pont has quit [Ping timeout: 480 seconds]
rhysmdnz has joined #asahi-dev
gabuscus has quit []
nsklaus has quit [Ping timeout: 480 seconds]
rhysmdnz has quit [Quit: Bridge terminating on SIGTERM]
rhysmdnz has joined #asahi-dev
nsklaus has joined #asahi-dev
gabuscus has joined #asahi-dev
Jamie has joined #asahi-dev
kesslerdupont has joined #asahi-dev
Jamie is now known as Guest12081
SalimTer- has joined #asahi-dev
salimterryli has quit [Ping timeout: 480 seconds]
nsklaus has quit [Ping timeout: 480 seconds]
kesslerdupont has quit [Ping timeout: 480 seconds]
nsklaus has joined #asahi-dev
kesslerdupont has joined #asahi-dev
nsklaus has quit [Ping timeout: 480 seconds]
kesslerdupont has quit [Ping timeout: 480 seconds]
nsklaus has joined #asahi-dev
nsklaus has quit [Ping timeout: 480 seconds]
nsklaus has joined #asahi-dev
nsklaus has quit [Ping timeout: 480 seconds]
nsklaus has joined #asahi-dev
nsklaus has quit [Ping timeout: 480 seconds]
nsklaus has joined #asahi-dev
salimterryli has joined #asahi-dev
SalimTer- has quit [Ping timeout: 480 seconds]
nsklaus has quit [Ping timeout: 480 seconds]
nsklaus has joined #asahi-dev
nsklaus has quit [Ping timeout: 480 seconds]
nsklaus has joined #asahi-dev
nsklaus has quit [Ping timeout: 480 seconds]
nsklaus has joined #asahi-dev
nsklaus has quit [Ping timeout: 480 seconds]
nsklaus has joined #asahi-dev
nsklaus has quit [Ping timeout: 480 seconds]
<marcan> 320x240 maybe, definitely nothing lower
<marcan> axboe: none of those look like a problem, need a full log
nsklaus has joined #asahi-dev
nsklaus has quit [Ping timeout: 480 seconds]
salimterryli has quit [Remote host closed the connection]
salimterryli has joined #asahi-dev
nsklaus has joined #asahi-dev
nsklaus has quit [Ping timeout: 480 seconds]
nsklaus has joined #asahi-dev
nsklaus has quit [Ping timeout: 480 seconds]
kesslerd1pont has joined #asahi-dev
nsklaus has joined #asahi-dev
kesslerd1pont has quit [Ping timeout: 480 seconds]
nsklaus has quit [Ping timeout: 480 seconds]
nsklaus has joined #asahi-dev
nsklaus has quit [Ping timeout: 480 seconds]
nsklaus has joined #asahi-dev
nsklaus has quit [Ping timeout: 480 seconds]
bps has quit [Ping timeout: 480 seconds]
MajorBiscuit has joined #asahi-dev
nsklaus has joined #asahi-dev
nsklaus has quit [Ping timeout: 480 seconds]
MajorBiscuit has quit [Quit: WeeChat 3.6]
nsklaus has joined #asahi-dev
<jannau> does pmgr-misc exist on die 1? I'm getting translation faults on 0x228e20c000 and 0x228e20c800 on m1 ultra
bps has joined #asahi-dev
MajorBiscuit has joined #asahi-dev
nsklaus has quit [Ping timeout: 480 seconds]
<jannau> I see dart translation faults for 583008000.iommu on s2idle and the nvme didn't boot on the second resume
<jannau> no problems with dcp though
<marcan> no idea, haven't tested on ultra. I *assumed* it exists but who knows :p
<marcan> you're getting... translation faults?
<marcan> not SErrors?
<jannau> SErrors but the HV complains first https://gist.github.com/jannau/526d76696ab10f0ac2308151fc68d975
nsklaus has joined #asahi-dev
chadmed has quit [Remote host closed the connection]
<marcan> uhh that looks like a hypervisor bug
<marcan> that FAR doesn't make any sense
<marcan> could even be a CPU bug? like somehow ending up with the IPA instead of the VA in FAR?
nsklaus has quit [Ping timeout: 480 seconds]
kesslerdupont has joined #asahi-dev
<marcan> jannau: anyway yeah, doesn't seem to exist on die1 so let's just drop it
<jannau> if you move it, it has a t6020-pmgr-misc in .compatible
<marcan> thanks, fixing
<marcan> I'm guessing that is a SoC-global block and those clocks/pstates also forward to die1 automatically then
<jannau> did the sleep/shutdown tracing make the system slower under the hypervisor? boot timing seems to be much slower than I remember and macos felt slow as well
<marcan> it shouldn't
<marcan> but I could have screwed something up :)
<marcan> I did try to speed things up on M2+ though
kesslerdupont has quit [Ping timeout: 480 seconds]
nsklaus has joined #asahi-dev
<marcan> jannau: should be fixed
<marcan> (the die1 thing, not the HV thing)
<marcan> IIRC the Ultra was always kind of slow under the HV though, but there may be some pathological issue there
nsklaus has quit [Ping timeout: 480 seconds]
<jannau> might be a little bit faster with cbd9b7b9e1724244 but not much so I might misremember
Z750 has quit [Quit: Ping timeout (120 seconds)]
Z750 has joined #asahi-dev
nsklaus has joined #asahi-dev
nsklaus has quit [Ping timeout: 480 seconds]
<marcan> hm, it could be cpufreq is broken in the guest with clpc=0 and we don't bump the pcores, yeah
<marcan> feel free to call cpufreq_init() in the hv init path if it helps
<marcan> also I haven't tested on ultra but at least on M2 Max macos seemed to just... not use the pcores?
<marcan> not sure why or if that's a regression
nsklaus has joined #asahi-dev
nsklaus has quit [Ping timeout: 480 seconds]
nsklaus has joined #asahi-dev
cylm has joined #asahi-dev
kesslerdupont has joined #asahi-dev
kesslerdupont has quit [Ping timeout: 480 seconds]
chadmed has joined #asahi-dev
chadmed has quit [Remote host closed the connection]
chadmed has joined #asahi-dev
c10l has quit [Quit: Bye o/]
c10l has joined #asahi-dev
<jannau> cpuidle makes linux boot under HV slow. with cpuidle.off=1 in the commandline it's 2-3 times faster
nsklaus has quit [Quit: WeeChat 3.8]
<marcan> ok, that makes sense because it'll cause a ton of HV traps
<marcan> and on M1, the HV ends up interrupting the CPUs at 1000Hz anyway for dumb reasons
<marcan> those normally don't serialize, but they do via the cpuidle stuff
<marcan> I can probably fastpath the cpuidle stuff a bit so at least it doesn't serialize
kesslerdupont has joined #asahi-dev
kesslerdupont has quit [Ping timeout: 480 seconds]
<marcan> jannau: pushed a thing, let me know if it helps
kesslerdupont has joined #asahi-dev
<jannau> marcan: it does, might be even a little faster than cbd9b7b9e1724244 with cpuidle disabled. thanks
<marcan> good :)
kesslerdupont has quit [Read error: Connection timed out]
nsklaus has joined #asahi-dev
<jannau> I see the same on ultra, macos (12.3) doesn't use the p-cores with clpc=0. without it does
ChaosPrincess has quit [Quit: WeeChat 3.8]
ChaosPrincess has joined #asahi-dev
kesslerdupont has joined #asahi-dev
kesslerdupont has quit [Quit: leaving]
i509vcb has quit [Quit: Connection closed for inactivity]
c10l has quit [Quit: Ping timeout (120 seconds)]
c10l has joined #asahi-dev
<axboe> marcan: X restarting on resume, is that expected in current asahi-wip?
<_jannau_> axboe: I couldn't reproduce it (using HV with no_console_suspend and sddm as X client) so quite a few things which could explain different behavior
<axboe> hmm ok
<axboe> this is gdm + X
<axboe> sorry lightdm
<axboe> looks like X is still running, but my session is just gone and I get a lightdm login screen
<axboe> funky
<axboe> I did update m1n1 and u-boot at the same time, I can try my previous kernel and see if it happens there too. was just assuming it's a kernel side thing with dcp
<_jannau_> could be still the X server of your session which was killed. DCP shouldn't have changed much for existing installs
<axboe> yeah, what I mean is that X itself didn't restart, it's still there and same pid, but something made it exit the session
<_jannau_> main change was supporting the 13.3 firmware for m2 devices but is mostly separate from the existing support
<axboe> might be xfce4 triggering it. I don't recall seeing those "display went away" on suspend before, but it could just be that I never looked
<axboe> let me boot the old kernel and give it a spin
<_jannau_> there should be 2 X sessions after you logged in. the lightdm one and your user session
<_jannau_> how long was the system suspended? did you enable the new apple cpuidle driver?
<axboe> old kernel doesn't boot anymore, so can't test that
<axboe> I did enable the new cpuidle
<axboe> seems to happen regardless of how long it was suspended
<axboe> just eg 10 min will do it
<axboe> so just did a suspend and resume after 10 seconds
<axboe> and that works fine
<_jannau_> I didn't keep the system suspended that long. I'd guess 30 seconds maximal
<axboe> that may be why
<axboe> usually if I suspend it's because I'm heading to the office, so probably 20-30 minutes min suspend
<axboe> or overnight
<tobhe> axboe: I think I have seen something similar before, even on wayland. out of curiosity: does your wifi still work after wakeup?
<axboe> wifi has always been a bit hit or miss for me. I'd say that 9/10 wakeups wifi is fine, 1/10 it doesn't come back and it needs a reboot
<axboe> but this one is different, I've never had my X session go away
<tobhe> the few times this has happened to me it was usually both at the same time so I assumed it might be gnome related
Knedlik has joined #asahi-dev
cylm has quit [Ping timeout: 480 seconds]
cylm has joined #asahi-dev
<Knedlik> Hey guys, I would like to use C++ for the touchbar Fxx buttons I was tasked with earlier. What should I use? I assume the DRM has some headers, but what should I use for touch input?
enron has quit [Quit: Ping timeout (120 seconds)]
nopeslide13 has quit []
<j`ey> libevdev I think
enron has joined #asahi-dev
nafod0 has joined #asahi-dev
nafod has quit [Read error: Connection reset by peer]
nafod0 is now known as nafod
nopeslide13 has joined #asahi-dev
<Knedlik> Okay I'm getting a bit confused reading through all the DRM-related stuff on the internet... how do I use it to draw on the touchbar?
Knedlik has quit [Quit: Konversation terminated!]
kesslerdupont has joined #asahi-dev
<marcan> axboe: we've long had that thing with systemd-journald crashing on long suspend due to some watchdog thing, but that's exactly what cpuidle should have fixed...
<marcan> Knedlik: that is a question with many answers...
<_jannau_> it didn't, systemd-journal still crashed for me after waking from 15 minutes of suspend
<marcan> ugh.
<marcan> what.
<marcan> Knedlik: https://github.com/girish2k44/drmmodeset might help
<_jannau_> I'll look at it tonight. X session survived though
<marcan> _jannau_: does the smc power log stuff log during suspend?
<marcan> because if it doesn't timekeeping is *definitely* stopping now
<_jannau_> wasn't enabled
<marcan> also does the kernel log timestamp jump during suspend, or does it skip the time?
<_jannau_> timestamps jump but I'd expect those to use boot time
<marcan> no, they don't jump for me
<marcan> so something is wrong
bps has quit [Ping timeout: 480 seconds]
<kesslerdupont> chadmed: I'll take a look today at speakersafetyd and see what I can do
<_jannau_> hypervisor + no_console_suspend?
<_jannau_> log_power does nothing on the studio (desktop systems)
<marcan> oh wait, this probably isn't going to ever work in the hypervisor
<marcan> because it basically triggers forever spurious wakeups, which will cause the kernel to repeatedly re-enable and re-disable timekeeping
<marcan> I might be able to cheat it
<_jannau_> no timestamp jump on bare metal
<marcan> _jannau_: pushed a thing to fix this on the *kernel* side under the HV
<marcan> you also probably want hv.p.hv_set_time_stealing(0, 1) to prove that it isn't just time stealing warping time
<marcan> with both of those no timestamp jump under the HV either, as expected
<marcan> should also give a nice performance improvement on top even
<marcan> (plus it actually tests deep sleep under the HV which is always nice)
<marcan> so... if timekeeping is actually stopping, why the hell is journald crashing?
<marcan> or did you only see that under the HV?
<_jannau_> that was only under HV with time clearly progressing
<marcan> ah ok
<marcan> yeah, just tested the userspace view, seems right now under the HV
<marcan> so. not an issue but not what axboe is seeing either (assuming he wasn't running under the HV)
MajorBiscuit has quit [Ping timeout: 480 seconds]
<_jannau_> I doubt that very much for the running to the office use case
<marcan> yeah :/
rosefromthedead has joined #asahi-dev
<axboe> just a regular joe user, no HV
<marcan> axboe: I think we're going to need more info/logs then... I don't have any good ideas of what to look for right now :/
<marcan> I wonder if I can cheat it with the HV by adding time instead of subtracting time...
<marcan> that might be a good test
<axboe> let me boot the latest, and will be happy to grab whatever you'd want in terms of logs or messages
<_jannau_> I've seen a dcp call ordering issue on wakeup but everything still worked. so probably some kind of race where at least one call is not needed
<axboe> back in a few min
bps has joined #asahi-dev
<axboe> ok back up - I'll try and suspend for 5 min and see what happens
<marcan> I'm adding an explicit timewarp op to the HV so I can test this, but if it really has to do with time elapsed I can only guess it's a userland bug with the calendar time jumping somehow
<axboe> ok
<axboe> 5 min is up, maybe I should do 5 more to be sure
<axboe> what do you want to see if it fails?
<marcan> X log, general syslog etc?
<axboe> ok
<marcan> we don't know what we're looking for, anything that might be relevant :)
<axboe> I hear ya, I'll try and poke a bit and see if I can figure out what's crashing
<axboe> I already checked the X log previously and found nothing interesting
<axboe> ok 9 min, let's try and bring it back
<axboe> yep, it happened
<marcan> just tested under the HV on the M1 Studio, warped time 10 hours forwards and nothing bad happened (kde on x11)
<axboe> interestingly, seemed like it was already resumed when I opened the lid
<axboe> huh maybe it rebooted
<marcan> is this with the GPU stuff?
<axboe> 0 min uptime
<axboe> this is with current asahi-wip, running mesa from a month ago or so (updated whenever the api changed)
<marcan> could be the GPU stuff is also breaking due to its power domain shutting down. can you add apple,always-on; to the ps_gfx power domain and see if that helps?
<axboe> sure
<axboe> always forget what model this is...
<marcan> cat /proc/device-tree/compatible
<axboe> t6001, thanks
<axboe> looks like it's in t6002.dtsi
<j`ey> axboe: also says at the top of dmesg: 'Machine model: '
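[note] The `compatible` property read above is a list of NUL-separated strings, which is why plain `cat` output runs together. A minimal sketch of splitting it in Python follows; the sample bytes are illustrative only (not captured from axboe's machine), and `parse_compatible` is a hypothetical helper name.

```python
def parse_compatible(raw: bytes) -> list[str]:
    """Split a device-tree 'compatible' property into its strings.

    The property is a sequence of NUL-terminated strings, ordered
    from most specific (board) to least specific (platform).
    """
    return [part.decode("ascii") for part in raw.split(b"\0") if part]


# Illustrative sample only; on a real system the bytes would come from
# open("/proc/device-tree/compatible", "rb").read()
sample = b"apple,j316c\0apple,t6001\0apple,arm-platform\0"
entries = parse_compatible(sample)
print(entries)
```

On a running system the SoC entry (for example `apple,t6001`) is the one that tells you which `.dtsi` files apply.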
<marcan> ps_gfx should be in t600x-pmgr.dtsi
<axboe> so add to the DIE_NODE() there?
<marcan> yeah, next to the other properties
<marcan> though that might actually kind of break for die1, but that should only cause some warnings at most
<marcan> since I don't think gfx for die1 is used
<axboe> done, will reboot in a sec
<marcan> I wonder if it's something like gfx-asc goes down unexpectedly, it has a backdoor pipe to SMC, SMC is sending it notifications for some reason, the FIFO fills up and SMC panics and reboots the machine
<axboe> suspended, let's try 30 sec first
<axboe> briefly showed the screen on resume, then rebooted
<marcan> okay, we need any logs you managed to get from that then
<marcan> that looks like something goes very wrong with SMC or similar, on resume
<marcan> (or kernel panic if you have it set to reboot on panic)
<axboe> don't enable that
<axboe> last thing syslog has before a bit of garbage is "entering sleep state suspend"
<axboe> so not very useful
<marcan> ah wait, this could be the pmgr-misc thing
<marcan> axboe: in t600x-die0.dtsi, comment out the whole pmgr_misc block
nyx_o has quit [Ping timeout: 480 seconds]
<axboe> ok
<axboe> remove the ps_gfx always on or leave it?
<marcan> leave it just in case
<axboe> will do, updating it now
<axboe> I'm still doing the whole osx recovery thing for dtb updates like probably a moron (assuming there's a better way now)
<marcan> umm... you know we've been shipping u-boot for that for a year now right? :p
<marcan> (and m1n1 chainloading)
<axboe> I haven't changed my ways :)
<axboe> never updated that side, will do once the m2 can get used
<marcan> I hope at least you're on the right stub version :p
<axboe> well...
<axboe> it's up, suspending
<_jannau_> dcp would complain if not
<axboe> it came back, I entered password to unlock, and then 1 sec later it rebooted
<marcan> I hope journald saved a log from that
<marcan> can you see if anything is in there?
<axboe> I got something
<marcan> so IIRC the SMC reboot watchdog is like 30 seconds or something, and if this is along those lines it might be crashing directly on suspend
<marcan> if so, try 15 seconds and you should have at least as much time to sync to disk and such
bps has quit [Ping timeout: 480 seconds]
<axboe> 2023-04-25T10:14:57.380543-06:00 m1max kernel: [ 33.664422] nvme-apple 393cc0000.nvme: RTKit: syslog message: cmd.c:7861: NVMe shutdown start seg->lba: 0, seg->size: 0
<axboe> 2023-04-25T10:14:57.380548-06:00 m1max kernel: [ 33.801967] nvme-apple 393cc0000.nvme: RTKit: syslog message: cmd.c:7874: seg->lba 0 saveCtx 1 took 137 ms
<marcan> that's normal
<axboe> 2023-04-25T10:14:57.380550-06:00 m1max kernel: [ 33.834543] macsmc-rtkit 290400000.smc: RTKit: syslog message: apComms.cpp:373: SMC HID Event: 03 00 01
<axboe> 2023-04-25T10:14:57.380552-06:00 m1max kernel: [ 33.834560] macsmc-hid macsmc-hid: Lid wakeup
<axboe> 2023-04-25T10:14:57.380554-06:00 m1max kernel: [ 33.835102] apple-pmgr-pwrstate 28e580000.power-management:power-controller@288: PS mca0: Failed to reach power state 0xf (now: 0x24f)
<axboe> 2023-04-25T10:14:57.380557-06:00 m1max kernel: [ 33.835223] apple-pmgr-pwrstate 28e580000.power-management:power-controller@290: PS mca1: Failed to reach power state 0xf (now: 0x24f)
<axboe> 2023-04-25T10:14:57.380559-06:00 m1max kernel: [ 33.835339] apple-pmgr-pwrstate
<marcan> yeah that's all normal, also pastebin?
<axboe> well shoot, that's all I get before the next line is from the next boot
<axboe> yeah sorry, will pastebin
<marcan> did it stop right there mid line?
<axboe> let me put it in a pastebin
<marcan> so it can't be SMC crashing before the wakeup because (duh) then we wouldn't wake up
<axboe> planned maintenance
<axboe> that's from starting suspend, and then last line is new boot
<marcan> ok yeah that's... not useful :(
<marcan> can you try again and see if you can sneak in a sync or something?
<marcan> obviously that didn't flush to disk everything that happened while it was alive
<axboe> sure, let me find the resume script and add a sync
<marcan> you could just do `echo mem > /sys/power/state ; sync` probably
<axboe> let's try
<marcan> maybe sync in a loop :)
<axboe> yep
<axboe> ssh in with dmesg in a loop too :)
<axboe> ok here goes
<axboe> that resumed fine
<marcan> `dmesg -w` is your friend :)
<axboe> lol
<axboe> wow, after all these years
<axboe> trying again
<axboe> worked again, damn
<axboe> let me try the lid
<axboe> worked
<marcan> heisenbug \o/
<axboe> gah
<axboe> tried lid again and it just insta booted when it came out
<marcan> oof...
<axboe> nothing in syslog, last one is the entering sleep state
<axboe> once more, got X back for half a second, then reboot
<axboe> let's see if syslog got anything
<axboe> I miss my servers with serial console :/
<marcan> you can get a serial console on these but..
<axboe> there's something, let me pastebin it again
<axboe> that's from start resume, to again last line being from the next boot
<marcan> ok, so the borked i2c isn't great but that doesn't excuse a reboot
<marcan> I don't see anything fatal in there :(
<axboe> ah crap
<axboe> I do have panic on oops
<axboe> oh no scratch that, it's just the timeouts
<axboe> actual PANIC_ON_OOPS isn't set
<marcan> just to confirm, CONFIG_BACKLIGHT_GPIO isn't set?
<axboe> it's not
<axboe> BACKLIGHT_CLASS_DEVICE=y, BACKLIGHT_PWM=y
<axboe> that's it for backlight
<axboe> oops
<axboe> oh it's too big
<marcan> ok, here's a random idea: make ps_uart0 apple,always-on;
<axboe> ok
* axboe does the osx dance
<axboe> brb
<marcan> why do you like hurting yourself so much :p
<axboe> haha
Jamie has joined #asahi-dev
<axboe> yeah...
rhysmdnz1 has joined #asahi-dev
Jamie is now known as Guest12142
<marcan> is this the 14" or the 16"?
<axboe> 16
<axboe> let's try again
<axboe> came back fine
<axboe> let me give it 5 cycles
rhysmdnz has quit [Ping timeout: 480 seconds]
<marcan> next move that always-on from ps_uart0 to ps_sio
<axboe> 4 so far, it's fine
<axboe> got some hci0 complaints on the last one, timeouts
<axboe> but it's up
<marcan> yeah bluetooth breaks, we know that
Guest12081 has quit [Ping timeout: 480 seconds]
<marcan> I think some coprocessor needs ps_sio and is exploding when we shut that down
<axboe> will do a few min for the last one, but seems promising so far
<axboe> I'll move it to ps_sio after this one and re-run
<axboe> reboot on the slightly longer one...
<axboe> and again looks like suspend was the culprit, because it was sitting instantly in lightdm login when I opened the lid
<marcan> oh, it crashed again even with that change?
<marcan> :(
<axboe> it did...
<axboe> last line in syslog is the entering suspend again
<marcan> I'm running out of ideas... AFAIK the coprocessors that can nuke the system like that are SMC and the CIO stuff (cc sven), but the latter shouldn't be enabled yet...
<marcan> and SMC is clearly alive since wakeup works
<marcan> DCP also clearly works
<axboe> was running asahi-wip based on 6.3-rc4 before, with mainline merged
<axboe> so that one was fine
<marcan> yeah but I don't think I made any changes that could cause this other than the cpuidle stuff :(
<marcan> you could try disabling cpuidle
<marcan> cpuidle.off=1
<axboe> let's try
<marcan> can you put a #define DEBUG at the top of drivers/soc/apple/apple-pmgr-pwrstate.c too?
<axboe> want that first, or disable?
<marcan> you can do both
<axboe> will do
<marcan> if cpuidle fixes it, it's probably unrelated. if it doesn't, try to get me another log where you get some stuff from resume (so at least we have a complete log of the suspend process)
<axboe> will try, I'll try the manual suspend rather than lid and see if I can get something
<marcan> yeah the result should be the same either way, I don't care if it doesn't crash at that point
<marcan> I just want a trace of what your system is powering down
<axboe> up with it disabled and with DEBUG, will let it sit suspended for a minute
<axboe> 3 so far, fine
<axboe> 5 now, anywhere from 30 seconds to 2 min
<axboe> all good
<marcan> just to be sure, which version of m1n1 are you using?
<axboe> current -git
<marcan> as of at least a couple days ago I take it
<marcan> what's your boot chain right now?
<axboe> let me resume and give you the sha
<axboe> 9e...ff
<axboe> that one
<marcan> if it's recent you have it in /proc/device-tree/chosen/asahi,m1n1-stage2-version :p
<marcan> ok
<marcan> got a log of the pmgr stuff?
<axboe> I still have the same bootup as from summer last year
<axboe> generate the blob with m1n1, u-boot, dtb
<axboe> osx recovery, etc
<marcan> ok
<axboe> dmesg | grep pmgr?
<marcan> from a suspend cycle, yeah
<marcan> or just paste the whole dmesg so I can see what when
<axboe> that's it from boot and the suspends I did so far
<marcan> I don't see the debugs?
<marcan> maybe you need to boot with `debug`?
<axboe> was assuming it is because it is off, will try with 'debug'
<marcan> another random guess: make ps_msg apple,always-on
<axboe> nope, will just change them to dev_info
<axboe> so far it doesn't reproduce with cpuidle.off=1, I can try that and kill the cpuidle off?
kesslerdupont has quit [Ping timeout: 480 seconds]
<axboe> ok got a bunch of pmgr stuff now
<marcan> I want to see the pmgr stuff first
<marcan> it should be the same regardless of what cpuidle is doing
<axboe> suspended it, resume and will paste it
<axboe> frozen on resume, X still on
kesslerdupont has joined #asahi-dev
<marcan> that sounds like a separate issue..
<axboe> I'll dump what I got
<axboe> it's ssh'able
<axboe> ah log too small, damn
<axboe> likely not interesting, sorry
<marcan> yeah I need the suspend cycle
<marcan> spi is going to spam (that's the keyboard/trackpad)
<axboe> want me to add always_on for ps_msg first or retry with larger dmesg?
<marcan> add that and remove the cpuidle.off and see if it repros, if it does either catch enough of a log or disable cpuidle again and catch it that way
<axboe> ok
<marcan> there's also one interesting thing we can try. set macOS as the default boot OS, do a one-time boot into Linux, repro, and see if macOS gives you any panic logs. I think it can catch certain system engine/management errors.
<marcan> (if it finds something it will pop it up on startup)
<axboe> building with larger dmesg and not disabling cpuidle and doing those first, then we can try that if it still craps out without being able to capture anything
<axboe> have ps_msg always on too
<axboe> 2nd resume came back with it rebooted, by the time the lid was opened
<axboe> nothing in syslog
<axboe> I see it entering suspend at 11:28:31.30 and booting a new kernel at 11:19:26.14
<axboe> and already up when I opened the lid
<marcan> you mean 29?
<sven> yeah, cio won’t be enabled yet so it shouldn’t be able to reset the machine
<axboe> sorry yes 29
<sven> I’ve never seen a reset with just atcphy without cio
<sven> or, well, I have seen that but only when I messed up the init sequence which shouldn’t happen with the Linux driver
kesslerdupont has quit [Ping timeout: 480 seconds]
<axboe> it certainly looks like it's suspend somehow
<axboe> 55 seconds is probably roughly the time it takes to power cycle, and go through all of boot
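[note] The "55 seconds" figure follows from the two syslog timestamps quoted above, using the corrected minute (11:29). A small sketch of the arithmetic, assuming both timestamps fall on the same day; `elapsed_seconds` is a hypothetical helper name.

```python
from datetime import datetime


def elapsed_seconds(start: str, end: str, fmt: str = "%H:%M:%S.%f") -> float:
    """Return the number of seconds between two same-day timestamps."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


# Last "entering suspend" line vs. first line of the next boot
gap = elapsed_seconds("11:28:31.30", "11:29:26.14")
print(f"{gap:.2f} s")  # prints "54.84 s"
```

A gap this short leaves no room for the suspend that was actually requested, which is consistent with the machine resetting almost immediately after entering suspend.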
<axboe> triple fault during suspend?
<marcan> that's not a thing on arm :p
<axboe> hah ok
<marcan> so, pmgr log?
<axboe> the osx default thing
<axboe> ?
<axboe> there's a ton of stuff before the entering suspend, let me pastebin it in case there's anything interesting
<marcan> no, I mean some dmesg of pmgr changes through a suspend cycle
<marcan> as long as I can see the whole suspend cycle (i.e. there should be one line of wakeup, otherwise you probably didn't get anything) we should be good
<marcan> *didn't get everything
<axboe> but there is no full cycle, it looks like it's power cycling right after having entered suspend
<marcan> didn't you get one good cycle at least?
<axboe> I can try one now, see if it's different
<marcan> if you got one good cycle it should be in the journal too
<axboe> 2 min
<axboe> rebooted again
<axboe> let's try again...
<axboe> shorter suspend worked, paste coming up
lebakassemmerl has joined #asahi-dev
lebakassemmerl has quit [Remote host closed the connection]
<axboe> that's around the suspend + resume
<marcan> ok, so you have a fairly short list of power domains that shut down between `Suspending console(s)` and `Lid wakeup`
<axboe> this is with pmgr_misc removed, and always-on for ps_uart0, ps_gfx, and ps_msg
<marcan> I really need to get some sleep, but if you want to continue debugging on your own: make all of those always-on, and see if that fixes things. if it does, manually bisect (though really gpio is the only one there that I'd even slightly suspect)
<axboe> ok
<axboe> sleep well!
<marcan> I also pushed a thing to the linux.git:pmgr-stuff branch that you might want to cherry pick and see if it helps
<axboe> I'll poke at it tonight, I should also try and get some work done...
<axboe> ok
<marcan> if making everything on that list always-on doesn't fix it either, then we have some weirder problem where cpuidle is broken, but it somehow only triggers during sleep for unknown reasons (e.g. because it sleeps more then, or because other stuff in the system is going lower power anyway)
<marcan> also do try the macos default thing, that might point at something useful
<jannau> I'll try to reproduce with serial on j314c (14" m1 max)
<marcan> basically my theory right now is that the CPUs going into deep idle is triggering some machinery that we otherwise haven't triggered yet, and it hates something we're doing, and something critical crashes and takes down the machine
<marcan> this *may* be related to how I couldn't get the CPUs properly fully shut down (with loss of MMIO) but could when I let macOS do it
<marcan> it's entirely possible there is some init we're missing
<marcan> axboe: so basically if making everything always-on doesn't help, do the macos thing and whatever the result of that is, I think that's as much as you can do at this point
<marcan> in that case I will continue poking at the sleep magic thing to see if it's related
<marcan> good night :)
<axboe> sounds good!
Cyrinux9 has quit []
Cyrinux9 has joined #asahi-dev
WindowPain has joined #asahi-dev
WindowPa- has quit [Ping timeout: 480 seconds]
MajorBiscuit has joined #asahi-dev
cylm has quit [Ping timeout: 480 seconds]
<jannau> axboe: I haven't been able to reproduce yet :(
<jannau> can you think of anything out of the ordinary? you're using xfce4 if I remember correctly and I gather you suspend/resume by closing/opening the lid
deteg1337 has joined #asahi-dev
deteg1337 has quit []
pharonix71 has quit [Read error: No route to host]
flying_sausages has quit []
flying_sausages has joined #asahi-dev
kesslerdupont has joined #asahi-dev
bluetail has quit [Quit: The Lounge - https://thelounge.chat]
Z750 has quit [Quit: Ping timeout (120 seconds)]
Z750 has joined #asahi-dev
zzywysm has joined #asahi-dev
kesslerdupont has quit [Ping timeout: 480 seconds]
WindowPain has quit [Quit: ZNC 1.8.2 - https://znc.in]
WindowPain has joined #asahi-dev
kesslerdupont has joined #asahi-dev
psykose_ has joined #asahi-dev
psykose has quit [Ping timeout: 480 seconds]
kesslerdupont has quit [Ping timeout: 480 seconds]
bluetail422 has joined #asahi-dev
bluetail422 has quit []
bluetail422 has joined #asahi-dev
bluetail422 has quit []
bluetail422 has joined #asahi-dev
bluetail422 has quit []
bluetail422 has joined #asahi-dev
bluetail422 has quit []
bluetail422 has joined #asahi-dev
bluetail422 has quit []
bluetail422 has joined #asahi-dev
bluetail422 has quit []
bluetail422 has joined #asahi-dev
bluetail422 has quit []
bluetail422 has joined #asahi-dev
kesslerdupont has joined #asahi-dev
abd has joined #asahi-dev
bluetail422 has quit []
bluetail422 has joined #asahi-dev
bluetail422 has quit []
bluetail422 has joined #asahi-dev
bluetail422 has quit []
bluetail422 has joined #asahi-dev
bluetail422 has quit []
bluetail has joined #asahi-dev
bluetail has quit []
bluetail has joined #asahi-dev
bluetail has quit []
bluetail has joined #asahi-dev
bluetail has quit []
bluetail has joined #asahi-dev
bluetail has quit []
abd has quit [Ping timeout: 480 seconds]
bluetail has joined #asahi-dev
bluetail has quit [Remote host closed the connection]
bluetail has joined #asahi-dev