ChanServ changed the topic of #asahi-dev to: Asahi Linux: porting Linux to Apple Silicon macs | Non-development talk: #asahi | General development | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Logs: https://alx.sh/l/asahi-dev
darkapex has joined #asahi-dev
nsklaus has quit [Remote host closed the connection]
bisko has joined #asahi-dev
bisko has quit []
brolin has quit [Ping timeout: 480 seconds]
darkapex has quit [Remote host closed the connection]
hightower3 has joined #asahi-dev
hightower2 has quit [Ping timeout: 480 seconds]
tobhe_ has joined #asahi-dev
brolin has joined #asahi-dev
tobhe has quit [Ping timeout: 480 seconds]
gabuscus has quit []
gabuscus has joined #asahi-dev
brolin has quit [Ping timeout: 480 seconds]
stipa is now known as Guest462
stipa has joined #asahi-dev
Guest462 has quit [Read error: Connection reset by peer]
bisko has joined #asahi-dev
seb91nihel has joined #asahi-dev
bisko has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
seb4nihel has quit [Ping timeout: 480 seconds]
martinr1 has quit [Ping timeout: 480 seconds]
martinr1 has joined #asahi-dev
dhat has quit [Quit: KVIrc 5.0.0 Aria http://www.kvirc.net/]
martinr1 has quit [Ping timeout: 480 seconds]
greguu has quit [Ping timeout: 480 seconds]
martinr1 has joined #asahi-dev
<jannau> ChaosPrincess: have you tried reverting the clock change?
martinr1 has quit [Ping timeout: 480 seconds]
martinr1 has joined #asahi-dev
martinr1 has quit [Ping timeout: 480 seconds]
chadmed_ has quit [Quit: Page closed]
martinr1 has joined #asahi-dev
StupidYui has joined #asahi-dev
tanty has quit [Remote host closed the connection]
tanty has joined #asahi-dev
greguu has joined #asahi-dev
StupidYui has quit [Remote host closed the connection]
StupidYui has joined #asahi-dev
martinr1 has quit [Ping timeout: 480 seconds]
StupidYui has quit [Remote host closed the connection]
StupidYui has joined #asahi-dev
StupidYui has quit [Remote host closed the connection]
StupidYui has joined #asahi-dev
StupidYui has quit [Remote host closed the connection]
StupidYui has joined #asahi-dev
sarucchi has quit [Quit: Textual IRC Client: www.textualapp.com]
tanty has quit [Quit: Ciao!]
sarucchi has joined #asahi-dev
hightower3 has quit [Ping timeout: 480 seconds]
dh-- has joined #asahi-dev
tanty has joined #asahi-dev
martinr1 has joined #asahi-dev
dh-- has quit [Remote host closed the connection]
dh-- has joined #asahi-dev
bps has joined #asahi-dev
cylm has quit [Ping timeout: 480 seconds]
bisko has joined #asahi-dev
darkapex has joined #asahi-dev
sarucchi has quit [Quit: Textual IRC Client: www.textualapp.com]
sarucchi has joined #asahi-dev
bps has quit [Ping timeout: 480 seconds]
sarucchi has quit [Ping timeout: 480 seconds]
nsklaus has joined #asahi-dev
timokrgr has quit [Quit: User left the chat]
bps has joined #asahi-dev
timokrgr has joined #asahi-dev
hightower2 has joined #asahi-dev
<kettenis> Creating a UEFI boot option seems to do the trick
Tomdownsouth has joined #asahi-dev
bps has quit [Ping timeout: 480 seconds]
bps has joined #asahi-dev
cylm has joined #asahi-dev
<marcan> maz: repro'd the KVM pathological behavior. it's really obvious if you have a GUI.
<marcan> the symptoms are consistent with host AIC IRQ delivery being broken or severely delayed while running the guest
<marcan> in particular, if the guest is running on p-cluster 0 and I put a host-side CPU burner on p-cluster 1, it gets better randomly (p-cluster IRQ delivery is random). if I put the burner on the e-cluster, everything goes back to normal since (in the default AIC config) IRQ delivery prioritizes the e-cluster (but prioritizes not waking up asleep CPUs first)
<maz> marcan: interesting. this would mean that a host interrupt doesn't kick us out of a guest? that'd be terminally broken...
<marcan> yup
<marcan> I wonder if there's a magic bit we need to set somewhere to fix that...
<marcan> I'll have to write a m1n1 test case first to prove this is the hardware and not KVM being terminally broken
<marcan> but if I can't repro in m1n1 we'll have to look at the latter option :p
<maz> well, we set IMO+FMO+AMO, so *everything* has to take us to EL2. If that's not obeyed, that's pretty bad.
<marcan> yup, but we also know Apple have a magic IMPDEF we need to poke to make AIC IRQ delivery work *at all* so...
<marcan> we know there's weird magic involved
<maz> anyway, if you find a KVM bug, you now where to shout!
<maz> know*
<kettenis> jannau: pushed the code to generate an EFI boot option to the asahi branch
Tomdownsouth has quit [Ping timeout: 480 seconds]
<kettenis> probably best to remove the ubootefi.var from the ESP before trying this diff
<marcan> kettenis: should I be removing that from linux in our update script?
<kettenis> the u-boot that you've been shipping had the option that generates this file disabled
<kettenis> so it shouldn't be needed
<kettenis> but I guess it doesn't hurt
<kettenis> until users start messing around with EFI variables themselves
<kettenis> I think Fedora does set EFI variables
<kettenis> (that file is where u-boot stores the EFI variables)
<jannau> kettenis: I'll try later
<kettenis> thanks
<kettenis> I really should install Asahi Linux on at least one of the machines I have
<jannau> I think some Fedora variant at some point shipped a u-boot built from the wrong branch, with EFI variables enabled
<kettenis> that in itself shouldn't be a problem if they set the variables to point at the right partition and bootloader
<kettenis> so what my new code does is create a Boot0000 boot option that points at the ESP with the UUID given by asahi,efi-system-partition and creates a BootOrder variable that points at that option
<kettenis> if the BootOrder variable already exists, it doesn't do anything
<kettenis> this way, a newly installed system should boot from the right partition
<kettenis> but if a user (or OS installer) overrides BootOrder it will do what the user requested
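[note] The boot option logic kettenis describes above can be sketched as follows. This is only an illustration in Python, not the u-boot code (which is C and stores binary EFI load options). The dict-based variable store, the function name and the "esp_uuid" field are all hypothetical.

```python
def ensure_default_boot_option(efi_vars, esp_uuid):
    """Create Boot0000 and BootOrder only when BootOrder is absent.

    efi_vars is a hypothetical dict standing in for the EFI variable store;
    esp_uuid stands in for the value of asahi,efi-system-partition.
    Returns True if the defaults were created.
    """
    if "BootOrder" in efi_vars:
        # A user or OS installer already chose a boot order: leave it alone.
        return False
    # Boot option pointing at the ESP identified by its partition UUID.
    efi_vars["Boot0000"] = {"esp_uuid": esp_uuid}
    # Boot order containing only that option.
    efi_vars["BootOrder"] = ["Boot0000"]
    return True
```

On a fresh install the store is empty, so the defaults get created. Once anything has written BootOrder, the function becomes a no-op, which is what lets a user override win.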
Tomdownsouth has joined #asahi-dev
<marcan> maz: IRQs do kick us out of the guest but I think the latency is total crap. I suspect an IMPDEF to control that...
<marcan> actually no, the real latency looks small?
<kettenis> marcan: regarding the oslog buffers, wouldn't it be better for u-boot to allocate those in RAM and make sure the memory gets reserved in the FDT and the EFI memory map?
<marcan> no, because they go through a DART
<marcan> and passing through DART mappings is a world of pain (see DCP)
<marcan> the SRAM bypasses that
korreckj328 has joined #asahi-dev
<kettenis> ok
<maz> marcan: what latency do you measure? from what point to what point? m1n1 or KVM?
<kettenis> did you push the updated masks for iova and size for the oslog messages somewhere?
<marcan> kettenis: not yet
<marcan> maz: m1n1, but this is weird.
Z750 has quit [Quit: Ping timeout (120 seconds)]
Z750 has joined #asahi-dev
<marcan> maz: aaand it's the impdef register
<marcan> I don't know *why* it's broken, but the symptoms seem to be that the IRQ gets delivered but AIC somehow thinks it isn't, even after it gets acked, and ends up timing out and also delivering to some other CPU
<marcan> it's as if the "delivered to CPU core" signal never gets back to AIC even though it has
<marcan> that's with magic_reg = 0
<marcan> magic_reg = 3 is the boot value we know doesn't work
<marcan> turns out magic_reg = 2 makes everything work
<marcan> sooo... m1n1 fix.
<marcan> macOS sets magic_reg = 0 on normal boots, but who knows what it does with virtualization
<marcan> bet it changes
<maz> hmmm. I positively hate magic.
<maz> I wonder if that has something to do with how long it takes to ack the interrupt.
<marcan> the ack has the same latency in both cases (I'm grabbing the counter right before triggering the IRQ and right after acking it in the EL2 IRQ handler that fired from EL1 context)
<maz> because if you're interrupting while running a guest, it takes significantly longer to handle the interrupt than if you were running userspace.
<marcan> but it looks like what fails to ack here isn't the IRQ itself, it's the lower level mechanism that picks a CPU to interrupt
<marcan> I'm not sure how this causes breakage, but for example I wouldn't be surprised if this machinery running is blocking
<marcan> which means when the *next* IRQ comes in, until this thing times out, it doesn't get delivered
<marcan> so the first IRQ arrives fine, the next one is bork
<marcan> is my hypothesis
<marcan> another possibility is that this is an intentional mechanism so that the extra HV IRQ latency is considered in the "pick a CPU" heuristic, and the HV is supposed to manually ack the low-level IRQ in some way with that bit clear
brolin has joined #asahi-dev
Tomdownsouth has quit [Ping timeout: 480 seconds]
sarucchi has joined #asahi-dev
<marcan> maz: I'm staring at the kernel and this looks like an errata on t600x...
<marcan> there's a chicken kernel arg and some special handling here, and it only applies to pcores
<marcan> lovely
<maz> huh.
<marcan> maz: ok no, it only enables the workaround on dev machines by default as far as I can tell, and I see the same behavior on ecores anyway.
<marcan> so there is *some* bug but it's not our problem
<maz> OK, so it all hinges on this magic sysreg?
<marcan> yeah, what I'm trying to figure out is whether the hypervisor pokes it (which would answer the question) or whether we screwed something else up.
nela has quit [Ping timeout: 480 seconds]
gladiac has joined #asahi-dev
cylm has quit [Quit: WeeChat 3.6]
<marcan> maz: I give up, but setting the magic reg to 2 fixes it so shrug
<marcan> (tested on linux)
<marcan> pushing a new m1n1 now with that
<marcan> pushed to asahi-dev
<maz> marcan: fair enough, thanks for having investigated it. eventually, I need to rebuild my t6002 anyway, as it still runs on the stuff I built when I got it last year (it hasn't rebooted yet).
<maz> I'll get to try it then.
<maz> uptime
<maz> 16:43:21 up 260 days, 23:40, 5 users, load average: 0.02, 0.08, 0.12
<kettenis> and you call yourself a kernel developer? ;)
<maz> hey, I have 3 other M1/M2 to play with! :D
<maz> (and no, I don't call myself a kernel developer anymore -- more like a part time tinkerer...)
martinr1 has quit [Ping timeout: 480 seconds]
brolin has quit [Ping timeout: 480 seconds]
brolin has joined #asahi-dev
korreckj328 has quit [Quit: Leaving]
brolin has quit [Ping timeout: 480 seconds]
brolin has joined #asahi-dev
brolin has quit [Ping timeout: 480 seconds]
brolin has joined #asahi-dev
stipa has quit [Ping timeout: 480 seconds]
brolin has quit [Ping timeout: 480 seconds]
brolin has joined #asahi-dev
drubrkletern has joined #asahi-dev
bps has quit [Ping timeout: 480 seconds]
brolin has quit [Ping timeout: 480 seconds]
bisko has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
bisko has joined #asahi-dev
hightower2 has quit [Ping timeout: 480 seconds]
bisko has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
bisko has joined #asahi-dev
bisko has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
roxfan2 has joined #asahi-dev
bisko has joined #asahi-dev
bisko has quit []
roxfan has quit [Ping timeout: 480 seconds]
sarucchi has quit [Ping timeout: 480 seconds]
nela has joined #asahi-dev
salimterryli has joined #asahi-dev
SalimTer- has quit [Ping timeout: 480 seconds]
gladiac has quit [Quit: k thx bye]
<marcan> at least today's distraction had me find more stuff :p
<marcan> some of those chicken bit names are fun
<j`ey> SIQ?
<j`ey> (SYS_IMP_APL_SIQ_CFG_EL1)
<marcan> beats me
<marcan> that's what it's called though
<j`ey> Smart Interrupt? :p
<sven> :D
<jannau> dcpext seems to be stable when this display is powered over usb-c
<sven> I’ve started to work on a uh.. hack that bypasses most of typec/altmode for now
<jannau> sounds sensible, I think I hacked typec/displayport to the point of breaking usb for better dp reliability
nela has quit [Read error: Connection reset by peer]
nela has joined #asahi-dev
compassion4 has joined #asahi-dev
compassion47 has joined #asahi-dev
compassion4 has quit []
compassion has quit [Ping timeout: 480 seconds]
compassion47 is now known as compassion
eiln has joined #asahi-dev
nela has quit [Remote host closed the connection]
nela5 has joined #asahi-dev
nela5 is now known as nela
nela has quit []
nela has joined #asahi-dev
<eiln> the problem w/ t6000 seems to be:
<eiln> TTY> pmgr: timeout while trying to set mode f for device at 0x28e0802c8: 10f
<eiln> which seems to be the PMGR_WAS_PWRGATED flag
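[note] A small decoder makes the 0x10f value easier to read. Only the meaning of bit 8 (PMGR_WAS_PWRGATED) comes from the message above. The target/actual nibble layout is an assumption about the PMGR power state register, and the function and constant names are hypothetical.

```python
PS_TARGET_MASK = 0x00F  # requested power state (assumed: bits 3:0)
PS_ACTUAL_MASK = 0x0F0  # current power state (assumed: bits 7:4)
WAS_PWRGATED = 1 << 8   # per eiln's reading of the 0x100 bit

def decode_ps(value):
    """Split a raw power state register value into the assumed fields."""
    known = PS_TARGET_MASK | PS_ACTUAL_MASK | WAS_PWRGATED
    return {
        "target": value & PS_TARGET_MASK,
        "actual": (value & PS_ACTUAL_MASK) >> 4,
        "was_pwrgated": bool(value & WAS_PWRGATED),
        "other_bits": value & ~known,  # anything not interpreted here
    }

print(decode_ps(0x10F))
```

Under that assumed layout, 0x10f reads as target mode 0xf requested, actual mode still 0, with the was-powergated bit set. That would match the "timeout while trying to set mode f" message.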
<eiln> the updated experiments/ane_t600x_power.py in the ane-power branch (https://github.com/eiln/m1n1.git) tries to manually powergate ane_sys_cpu before turning it on
<eiln> please lmk if anyone w/ a t600x ane0 can test
<jannau> eiln: seems to work
<eiln> jannau: do you mind attaching the mon.poll()? the other one wasn't fully on then
<eiln> jannau: ah finally looks right!! thank you sm :)
brolin has joined #asahi-dev
<eiln> take 2, lmk if experiments/ane.py under ane-t6000 branch works (either ane0/ane2)
<eiln> if it does i should fix src/pmgr.c to handle pwrgating
brolin has quit [Ping timeout: 480 seconds]
<jannau> eiln: math checks out, there was "pmgr: timeout while trying to set mode f for device at 0x28e0802c8: b0f" on the first run on both ane0 and ane2
<jannau> 0x228e0802c8 for ane2
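[note] The two register addresses quoted above differ by a single high bit, which a quick check confirms. Reading that difference as a fixed per-die offset, with ane2 being the second die's copy of ane0, is an assumption and is not stated in the log. The constant names are hypothetical.

```python
ANE0_PS_REG = 0x28E0802C8    # address from the ane0 timeout message
ANE2_PS_REG = 0x228E0802C8   # address jannau reports for ane2

delta = ANE2_PS_REG - ANE0_PS_REG
print(hex(delta))  # 0x2000000000

# The low bits are identical, so the register sits at the same
# offset within each block; only the high part of the address moves.
same_low_bits = (ANE0_PS_REG ^ ANE2_PS_REG) == delta
print(same_low_bits)  # True
```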
brolin has joined #asahi-dev
<eiln> jannau: woohoo! the first run is expected since it's not pwrgated yet
cylm has joined #asahi-dev
hightower2 has joined #asahi-dev
drubrkletern has quit [Remote host closed the connection]
nsklaus has quit [Ping timeout: 480 seconds]