marcan changed the topic of #asahi to: Asahi Linux: porting Linux to Apple Silicon macs | General project discussion | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Topics: #asahi-dev #asahi-re #asahi-gpu #asahi-stream #asahi-offtopic | Keep things on topic | Logs: https://alx.sh/l/asahi
<marcan>
they should be at least clock gated while waiting
<alyssa>
is this going to rebuild the whole kernel?
<alyssa>
oh looks like it is. moan.
<marcan>
alyssa: userspace should boot equally fast, and the kernel *shouldn't* spend too much time before it does the pstates thing? it's like <1 second for me
<marcan>
if you have a big difference in userspace perf then something is weird
<alyssa>
marcan: i do (~20%), maybe schedutil is stupid
<sorear>
i'd be concerned about machine safety if anything - are there any circumstances where the laptop is likely to overheat if m1n1 sets it to full power and the kernel hangs or panics before it has a chance to take over pstates?
<marcan>
not really, pretty sure there's failsafes for that
<marcan>
but the point is this shouldn't be necessary
<alyssa>
yeah isn't like the #1 job of the SMC making sure that doesn't happen
<marcan>
alyssa: can you benchmark an e-core and a p-core (maybe all of them) with/without cpu_pstates?
<marcan>
I wonder if I botched the DVMR thing
<marcan>
actually, can you try with cpu_pstates but commenting out set_pstate(1, 15)?
<marcan>
if that still works then it's definitely the DVMR thing
<marcan>
do you get "Initializing cluster (DVMR: %d)" lines on boot?
<marcan>
(I don't get any of this because it's unnecessary on 12.0...)
riker77_ has joined #asahi
<marcan>
that one might be something to move to m1n1 tbh
riker77 has quit [Ping timeout: 480 seconds]
riker77_ is now known as riker77
<alyssa>
uhhh
<alyssa>
my machine definitely needs the DVMR thing, cpu_pstates does it right
<alyssa>
chromebook is busy rebuilding the entire kernel for the 2nd time today because of course
<alyssa>
I do not see cluster lines at boot
<alyssa>
I do see apple-mcc performance driver init
<alyssa>
wtf I saw this earlier
<alyssa>
ok, if I don't run cpu_pstates.py at boot, I get the Initializing cluster DVMR 0/1 lines
<alyssa>
if I do run cpu_pstates.py at boot, I don't
<alyssa>
1.660 + 1.540 --- this is without running it and with the cluster lines
<alyssa>
0.992 + 1.567 -- this is with it
<alyssa>
oh in this case userspace is actually slower with it. so maybe the diff in userspace perf is not statistically significant
<alyssa>
kernel is still noticeably faster there
<alyssa>
(40% in that case)
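
The 40% figure can be checked from the two timing pairs above. This is a minimal sketch, assuming each pair is kernel time + userspace time in seconds (the log does not label them, but that reading matches the comments about kernel vs userspace); the helper name is made up for illustration.

```python
# Boot timings quoted above, in seconds, read as (kernel, userspace).
# That interpretation is an assumption; the log only gives "a + b".
without_pstates = (1.660, 1.540)
with_pstates = (0.992, 1.567)


def percent_faster(before, after):
    """Percent reduction in time going from `before` to `after`.

    A negative result means `after` is slower than `before`.
    """
    return (before - after) / before * 100.0


kernel_gain = percent_faster(without_pstates[0], with_pstates[0])
userspace_gain = percent_faster(without_pstates[1], with_pstates[1])

print(f"kernel time reduced by {kernel_gain:.1f}%")
print(f"userspace time reduced by {userspace_gain:.1f}%")
```

The kernel number lands at roughly 40%, while the userspace number comes out slightly negative, i.e. a small slowdown that could easily be noise.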
<alyssa>
with the pstates, max freqs 2064/3204
<alyssa>
without the pstates, max_freqs 2064/3204
<alyssa>
so that would seem like it's working? unless that's just from the DT and not the regs (unlike the python..)
<alyssa>
oh that's the DT
<alyssa>
but the fact the initializing cluster messages go away with pstates.py means everything works as expected so it's. not that
<alyssa>
marcan: oh, but wait! the initializing cluster messages are 0.9s and 1.0s into boot.
<alyssa>
whereas with pstates.py first, the whole kernel finishes booting in 1.0s
<alyssa>
so the problem isn't the cluster driver itself, it's just the cluster driver starts way too late in boot
<alyssa>
I do hope this works after all this T_T
robinp has quit [Ping timeout: 480 seconds]
PhilippvK has joined #asahi
phiologe has quit [Ping timeout: 480 seconds]
<alyssa>
(It does not. Though possibly I botched the firmware.)
<alyssa>
Oh geez. Corellium is doing the firmware dance in userspace. That code isn't public is it?
<alyssa>
uh
* alyssa
litters with printk
<alyssa>
..that's racy
<alyssa>
(IRQ coming in during setup)
<alyssa>
wonder if I have something funny with my setup
<alyssa>
since I'd sort of expect that to be broken for everyone
<alyssa>
unless the driver bug only happens with wrong firmware
<kdrag0n[m]>
marcan: how'd you get capacity-dmips-mhz 326 for the little cluster?
<kdrag0n[m]>
not sure how to pin freqs on macOS for testing, but those freqs will skew scheduler capacities if left that way
<alyssa>
kdrag0n[m]: this is good stuff btw
marvin24_ has joined #asahi
marvin24 has quit [Ping timeout: 480 seconds]
robinp has joined #asahi
<marcan>
kdrag0n[m]: that is not a problem with the cpufreq driver, it's a problem with the lack of a cpuidle driver
<marcan>
it can only boost to 3.2GHz on a single core at once
<marcan>
that requires the other cores to be in deep ("down") sleep, which requires a cpuidle driver since wfi in that mode is non-state-preserving
<marcan>
m1n1 currently does that properly in its internal SMP code, which is why you can benchmark 3.2GHz there
<marcan>
but not in Linux, so for the time being, even though you can request 3.2GHz, the hardware will cap you at 3GHz
<marcan>
(feel free to comment out those pstates in the devicetree if you don't want it to "lie")
<marcan>
kdrag0n[m]: the capacity-dmips-mhz numbers are based on dhrystone benchmarks
<marcan>
I calculated those on my last stream
<marcan>
713 for the e-cores looks like a way too small difference
<marcan>
I did consider the frequency to be 2988 when doing the math
<marcan>
this might be because CoreMark does a worse job exercising the pcores' wide instruction dispatch than Dhrystone?
<marcan>
alyssa: okay, so everything is working as intended then
<marcan>
in that case I will probably have m1n1 put the p-cores into 2GHz state or so
<marcan>
not full throttle, something reasonable that should speed up boot
<marcan>
and also I might just move the DVMR thing in there because it's one fewer thing for linux to worry about, and 12.0 does it anyway so even *Apple* thinks that belongs in the bootloader
<marcan>
I was running on the hypervisor, so that might skew things a bit (timer IRQs galore), but it shouldn't be enough to skew the numbers majorly I'd hope
<marcan>
I can try again on bare metal
<kdrag0n[m]>
marcan: ah that explains the freq behavior
<kdrag0n[m]>
those freqs should probably be marked with the boost flag then
<marcan>
yeah, though I'm not sure if that makes a difference?
<marcan>
if the scheduler does something useful with that, then yes
<kdrag0n[m]>
I don't think the scheduler checks it, but it's a semantic difference at least
<marcan>
I'm not sure there's a way to describe the "1 core 3.2, 2 cores 3.1, 3+ cores 3.0" relationship that the chips have
<kdrag0n[m]>
not sure how to deal with that from a scheduler standpoint
<marcan>
yeah
<marcan>
though it's a <10% difference anyway so it shouldn't be a massive problem
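
For reference, the "1 core 3.2, 2 cores 3.1, 3+ cores 3.0" relationship and the "<10%" remark can be written down as a tiny lookup. This is only a sketch using the rounded GHz figures from the chat; the function and parameter names are hypothetical, and nothing here reflects how the hardware or any driver actually implements the cap.

```python
def max_pcore_ghz(awake_p_cores):
    """Rounded P-core frequency cap (GHz) for a given number of awake P-cores.

    Values are the rounded figures quoted in the chat, not register values.
    """
    if awake_p_cores < 1:
        raise ValueError("need at least one awake P-core")
    if awake_p_cores == 1:
        return 3.2
    if awake_p_cores == 2:
        return 3.1
    return 3.0  # three or more awake P-cores


# Worst-case spread between the single-core boost and the all-core cap.
spread_percent = (max_pcore_ghz(1) - max_pcore_ghz(4)) / max_pcore_ghz(1) * 100.0

for n in range(1, 5):
    print(f"{n} awake P-core(s): {max_pcore_ghz(n)} GHz")
print(f"spread: {spread_percent:.2f}%")
```

With these rounded numbers the spread is a bit over 6%, which is consistent with it being under 10%.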
<marcan>
the whole dmips/mhz thing is massively application-dependent anyway
<marcan>
e.g. if you're running a spinloop both CPU cores have exactly the same performance (one iteration per cycle)
psykose has quit [Remote host closed the connection]
everslick has quit [Remote host closed the connection]
everslick has joined #asahi
<kdrag0n[m]>
marcan: I looked into the capacity numbers a bit more, and it seems more like Dhrystone is the one that's not very realistic
<alyssa>
🍿
<alyssa>
sven: atcphy scares me
<kdrag0n[m]>
I ran the benchmarks again with only one active P-core (thanks for the tip) and according to CoreMark, 3.2 GHz is slightly faster than my Zen 2 desktop at max freq
<kdrag0n[m]>
sounds about right given what people have said about the M1
<kdrag0n[m]>
and if that reference is right, then the e-cluster should be too
<kdrag0n[m]>
IPC values look reasonable compared to Snapdragon 888: 10.9 C/MHz (Firestorm) vs 9.3 (Cortex-X1), 7.6 (Icestorm) vs 3.7 (A55) which seems reasonable considering how old the A55 is
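
A rough sketch of how per-MHz scores like these could be turned into relative capacity numbers, assuming the fastest core type is pinned to 1024 (the kernel's SCHED_CAPACITY_SCALE) and the rest are scaled proportionally. The inputs are the rounded CoreMark/MHz values quoted above; whether the 713 figure mentioned earlier was derived exactly this way is an assumption.

```python
SCHED_CAPACITY_SCALE = 1024  # capacity assigned to the fastest core type

# Rounded CoreMark-per-MHz figures quoted in the chat.
coremark_per_mhz = {"firestorm": 10.9, "icestorm": 7.6}


def normalized_capacity(scores):
    """Scale per-MHz scores so the best one equals SCHED_CAPACITY_SCALE."""
    best = max(scores.values())
    return {name: score / best * SCHED_CAPACITY_SCALE
            for name, score in scores.items()}


capacities = normalized_capacity(coremark_per_mhz)
for name, value in capacities.items():
    print(f"{name}: {value:.0f}")
```

With the rounded inputs the Icestorm value comes out at about 714, close to the 713 discussed earlier; the small gap is what one would expect from rounding the per-MHz scores.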