<sven>
weird. that wfe definitely makes everything slow here
<j`ey>
sven: in your spin_unlock disassembly, is it using `stlr` to store -1 to the lock?
<sven>
25538: 08 00 80 92 mov x8, #-1
<sven>
2553c: 08 fc 9f c8 stlr x8, [x0]
<sven>
looks like it actually starts getting slower after "SMP mode to WFE" in the guest
<sven>
*"Setting SMP mode to WFE"
<j`ey>
dont see how that interacts with the spinlocks.. hm
<sven>
could also be coincidence ofc
<jannau>
judging from my uart output I'd say it starts getting slow when the chainloaded m1n1 disables the MMU
<alethkit>
Is there in-depth documentation on the memory model/architecture behind Apple Silicon? At the moment, I have the impression that it is heterogeneous, with asymmetric multiprocessing, but with UMA (and I presume, cache coherence) instead of x86-like NUMA?
<sven>
hm... so for me it already prints the "Setting SMP mode to WFE" line pretty slowly, and that happens before the MMU is disabled, I think
<jannau>
kernel gets back to normal speed shortly after initializing AIC, but I believe before it initializes the secondary CPUs
<sven>
i've not been patient enough to wait for that :D
<sven>
alright, it slowed down just after "Setting SMP mode to WFE..." again
<jannau>
is it consistently slow for you? it's random for me. I'd say with 10% probability of glacially slow
<sven>
yeah, pretty much
<sven>
if I leave the wfe in it's always glacially slow
<sven>
looks like it's even stuck at printing the next line after that smp mode now
<sven>
I don't think it'll ever get to the kernel here fwiw
<sven>
still stuck at that next line
<jannau>
strange, when it's not glacially slow for me it's 50% okay-ish and 50% clearly slow but not that slow that I reboot the HV
<sven>
this is on a m1 mini fwiw
<sven>
I don't think it'll get any further
<sven>
I can't even ctrl+c to the shell, I have to do a hard reboot
<jannau>
I see it on m1 and m1 max
<jannau>
if it's glacially slow it can take a few seconds until ctrl+c gets me to the shell
<sven>
been waiting for two minutes now and nothing happens
<jannau>
it might have been that slow for me once or twice and I rebooted. ~3 minutes for booting the kernel is what I call glacially slow
<jannau>
and I can't reproduce it at the moment but it's much faster without the wfe
<sven>
without that wfe everything works fine for me
<jannau>
with an explicit 'sev' in spin_unlock everything is fast as well but the store release is supposed to send an event
<j`ey>
thats why I was wondering if it had somehow compiled without 'stlr'..
<sven>
yup, with an sev in spin_unlock it's also fast here
<j`ey>
is the mmu off at this point for you?
<sven>
hmm?
<sven>
it becomes slow for me before the mmu is off afaict
<j`ey>
there's some stuff about atomics and needing the SCTLR_ELx C (cacheable) bit set, that I don't quite understand. but if the mmu is still on then it's not that
<sven>
it does disable the MMU a bit later and there could be a delay in the messages arriving over USB
<jannau>
j`ey: doesn't the store need to be exclusive for the event trigger? i.e. either stlxr or stxr?
<jannau>
so the explicit sev in spin_unlock should be correct when armv8.1-a atomics are used
<j`ey>
tf-a just uses stlr..
<jannau>
B2.9.2 has a note which suggests the global monitor needs setup to work with address translation disabled
<j`ey>
yeah that's the bit ive been looking at
<sven>
adding that sev to the unlock has completely fixed it here fwiw
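
For readers following along, here is a minimal sketch of the workaround discussed above: a spinlock whose waiters sleep in `wfe` and whose unlock path issues an explicit `sev` after the store-release. This is not the m1n1 implementation. All names (`sketch_spinlock_t`, `sketch_spin_lock`, and so on) are hypothetical, the "-1 means free" encoding is only inferred from the `mov x8, #-1` / `stlr` disassembly earlier in the log, and the `dsb ish` ahead of `sev` is an assumption based on common Arm guidance rather than anything stated in the conversation. On non-AArch64 hosts the event helpers compile to nothing, so waiters simply spin.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Assumption: -1 marks a free lock, as the disassembly above suggests. */
#define LOCK_FREE ((int64_t)-1)

/* Hypothetical lock type: holds the owning CPU index, or LOCK_FREE. */
typedef struct {
    _Atomic int64_t owner;
} sketch_spinlock_t;

/* Sleep until an event arrives (no-op when not built for AArch64). */
static inline void cpu_wait_for_event(void)
{
#if defined(__aarch64__)
    __asm__ volatile("wfe" ::: "memory");
#endif
}

/* Wake CPUs waiting in wfe. The dsb is an assumption: it makes the
 * preceding store visible before the event is signalled. */
static inline void cpu_send_event(void)
{
#if defined(__aarch64__)
    __asm__ volatile("dsb ish\n\tsev" ::: "memory");
#endif
}

static inline void sketch_spin_init(sketch_spinlock_t *l)
{
    atomic_init(&l->owner, LOCK_FREE);
}

/* Returns nonzero if the lock was free and is now owned by `cpu`. */
static inline int sketch_spin_trylock(sketch_spinlock_t *l, int64_t cpu)
{
    int64_t expected = LOCK_FREE;
    return atomic_compare_exchange_strong_explicit(
        &l->owner, &expected, cpu,
        memory_order_acquire, memory_order_relaxed);
}

/* Retry until the lock is taken, waiting for an event between attempts. */
static inline void sketch_spin_lock(sketch_spinlock_t *l, int64_t cpu)
{
    while (!sketch_spin_trylock(l, cpu))
        cpu_wait_for_event();
}

/* Store-release of LOCK_FREE, then an explicit event so that waiters
 * do not depend on the store itself generating one. */
static inline void sketch_spin_unlock(sketch_spinlock_t *l)
{
    atomic_store_explicit(&l->owner, LOCK_FREE, memory_order_release);
    cpu_send_event();
}
```

The point of the sketch is only the shape of the unlock path: the plain store-release is followed by an unconditional `sev`, so a waiter parked in `wfe` is woken regardless of whether the store alone would have produced an event.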
<jannau>
yeah, here too. had I bothered to check a non-regressed build I would have looked for the cause earlier. even my ok-ish speed is very slow compared to sev in spin_unlock
<sven>
<3 git bisect :-)
<jannau>
only question is if we can still hit the ordering issue from 9c795fbdbf445d with the mmu disabled