marcan changed the topic of #asahi to: Asahi Linux: porting Linux to Apple Silicon macs | General project discussion | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Topics: #asahi-dev #asahi-re #asahi-gpu #asahi-stream #asahi-offtopic | Keep things on topic | Logs: https://alx.sh/l/asahi
<alyssa> /usr/share/firmware/wifi/C-4378__s-B1/ is.... erm...... large......
<alyssa> It's not at all clear to me how preloader-m1 and linux-m1 work together here.
<alyssa> oh joy, t2linux has the same problems
tylo has joined #asahi
povik has quit [Quit: Page closed]
<alyssa> what even
tylo1 has joined #asahi
tylo has quit [Read error: No route to host]
ihaveamac has quit [Quit: fail]
<alyssa> ugh this really needs to be a kernel module doesn't it...
<alyssa> (Or I really need an initramfs, or to bundle the blob with the kernel image, or..)
<alyssa> dont use blobs, kids..
tylo1 has quit [Ping timeout: 480 seconds]
<psykose> i wonder what drug would have blob form
<jevinskie[m]> Just add LSD. Just in time for trick or treating! https://en.m.wikipedia.org/wiki/Candy_Buttons
<marcan> kettenis: the cores don't spin
<marcan> there is a wfe in the "spin"loop
<marcan> they should be at least clock gated while waiting
<alyssa> is this going to rebuild the whole kernel?
<alyssa> oh looks like it is. moan.
<marcan> alyssa: userspace should boot equally fast, and the kernel *shouldn't* spend too much time before it does the pstates thing? it's like <1 second for me
<marcan> if you have a big difference in userspace perf then something is weird
<alyssa> marcan: i do (~20%), maybe schedutil is stupid
<sorear> i'd be concerned about machine safety if anything - are there any circumstances where the laptop is likely to overheat if m1n1 sets it to full power and the kernel hangs or panics before it has a chance to take over pstates?
<marcan> not really, pretty sure there's failsafes for that
<marcan> but the point is this shouldn't be necessary
<alyssa> yeah isn't like the #1 job of the SMC making sure that doesn't happen
<marcan> alyssa: can you benchmark an e-core and a p-core (maybe all of them) with/without cpu_pstates?
<marcan> I wonder if I botched the DVMR thing
<marcan> actually, can you try with cpu_pstates but commenting out set_pstate(1, 15)?
<marcan> if that still works then it's definitely the DVMR thing
<marcan> do you get "Initializing cluster (DVMR: %d)" lines on boot?
<marcan> (I don't get any of this because it's unnecessary on 12.0...)
riker77_ has joined #asahi
<marcan> that one might be something to move to m1n1 tbh
riker77 has quit [Ping timeout: 480 seconds]
riker77_ is now known as riker77
<alyssa> uhhh
<alyssa> my machine definitely needs the DVMR thing, cpu_pstates does it right
<alyssa> chromebook is busy rebuilding the entire kernel for the 2nd time today because of course
<alyssa> I do not see cluster lines at boot
<alyssa> I do see apple-mcc performance driver init
<alyssa> wtf I saw this earlier
<alyssa> ok, if I don't run cpu_pstates.py at boot, I get the Initializing cluster DVMR 0/1 lines
<alyssa> if I do run cpu_pstates.py at boot, I don't
<alyssa> 1.660 + 1.540 --- this is without running it and with the cluster lines
<alyssa> 0.992 + 1.567 -- this is with it
<alyssa> oh in this case userspace is actually slower with it. so maybe the diff in userspace perf is not statistically significant
<alyssa> kernel is still noticeably faster there
<alyssa> (40% in that case)
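The "40%" figure can be checked from the two timing pairs quoted above. This is a minimal sketch that assumes the first number in each pair is kernel boot time and the second is userspace boot time, both in seconds; the log implies that reading but does not state it. The function name `speedup_percent` is made up for illustration.

```python
def speedup_percent(before_s: float, after_s: float) -> float:
    """Percentage reduction in elapsed time going from before_s to after_s."""
    return (before_s - after_s) / before_s * 100.0


# Timings quoted in the log (assumed order: kernel, then userspace; seconds).
kernel_without, user_without = 1.660, 1.540  # cpu_pstates.py not run
kernel_with, user_with = 0.992, 1.567        # cpu_pstates.py run first

print(f"kernel:    {speedup_percent(kernel_without, kernel_with):.1f}% less time")
print(f"userspace: {speedup_percent(user_without, user_with):.1f}% less time")
```

A negative result for userspace means it took slightly longer with cpu_pstates.py, which matches the remark that the userspace difference is probably not significant.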
<alyssa> with the pstates, max freqs 2064/3204
<alyssa> without the pstates, max_freqs 2064/3204
<alyssa> so that would seem like it's working? unless that's just from the DT and not the regs (unlike the python..)
<alyssa> oh that's the DT
<alyssa> but the fact the initializing cluster messages go away with pstates.py means everything works as expected so it's. not that
<alyssa> marcan: oh, but wait! the initializing cluster messages are 0.9s and 1.0s into boot.
<alyssa> whereas with pstates.py first, the whole kernel finishes booting in 1.0s
<alyssa> so the problem isn't the cluster driver itself, it's just the cluster driver starts way too late in boot
<alyssa> I do hope this works after all this T_T
robinp has quit [Ping timeout: 480 seconds]
PhilippvK has joined #asahi
phiologe has quit [Ping timeout: 480 seconds]
<alyssa> (It does not. Though possibly I botched the firmware.)
<alyssa> Oh geez. Corellium is doing the firmware dance in userspace. That code isn't public is it?
<alyssa> uh
* alyssa litters with printk
<alyssa> ..that's racy
<alyssa> (IRQ coming in during setup)
<alyssa> wonder if I have something funny with my setup
<alyssa> since I'd sort of expect that to be broken for everyone
<alyssa> unless the driver bug only happens with wrong firmware
<kdrag0n[m]> marcan: how'd you get capacity-dmips-mhz 326 for the little cluster?
<kdrag0n[m]> I'm getting ~713 with both your new cpufreq driver and Corellium's: https://github.com/kdrag0n/linux-m1/commit/83b4bfbb156929c89fe1212b98bff27b74233667
<kdrag0n[m]> explanation is in the commit message
<kdrag0n[m]> it also looks like your cpufreq driver has the same issue as Corellium's, where freqs > 2988 MHz are identical to 2988 in terms of performance: https://github.com/kdrag0n/freqbench/blob/master/results/t8103/asahi-cpufreq/run.log#L46-L49
<kdrag0n[m]> not sure how to pin freqs on macOS for testing, but those freqs will skew scheduler capacities if left that way
<alyssa> kdrag0n[m]: this is good stuff btw
marvin24_ has joined #asahi
marvin24 has quit [Ping timeout: 480 seconds]
robinp has joined #asahi
<marcan> kdrag0n[m]: that is not a problem with the cpufreq driver, it's a problem with the lack of a cpuidle driver
<marcan> it can only boost to 3.2GHz on a single core at once
<marcan> that requires the other cores to be in deep ("down") sleep, which requires a cpuidle driver since wfi in that mode is non-state-preserving
<marcan> m1n1 currently does that properly in its internal SMP code, which is why you can benchmark 3.2GHz there
<marcan> but not in Linux, so for the time being, even though you can request 3.2GHz, the hardware will cap you at 3GHz
<marcan> (feel free to comment out those pstates in the devicetree if you don't want it to "lie")
<marcan> kdrag0n[m]: the capacity-dmips-mhz numbers are based on dhrystone benchmarks
<marcan> I calculated those on my last stream
<marcan> 713 for the e-cores looks like a way too small difference
<marcan> I did consider the frequency to be 2988 when doing the math
<marcan> this might be because CoreMark does a worse job exercising the pcores' wide instruction dispatch than Dhrystone?
<marcan> alyssa: okay, so everything is working as intended then
<marcan> in that case I will probably have m1n1 put the p-cores into 2GHz state or so
<marcan> not full throttle, something reasonable that should speed up boot
<marcan> and also I might just move the DVMR thing in there because it's one fewer thing for linux to worry about, and 12.0 does it anyway so even *Apple* thinks that belongs in the bootloader
<marcan> kdrag0n[m]: https://youtu.be/Yqc-sDg5YZE?t=11619 this is where I do the benchmarks
<marcan> I was running on the hypervisor, so that might skew things a bit (timer IRQs galore), but it shouldn't be enough to skew the numbers majorly I'd hope
<marcan> I can try again on bare metal
<kdrag0n[m]> marcan: ah that explains the freq behavior
<kdrag0n[m]> those freqs should probably be marked with the boost flag then
<marcan> yeah, though I'm not sure if that makes a difference?
<marcan> if the scheduler does something useful with that, then yes
<kdrag0n[m]> I don't think the scheduler checks it, but it's a semantic difference at least
<marcan> I'm not sure there's a way to describe the "1 core 3.2, 2 cores 3.1, 3+ cores 3.0" relationship that the chips have
<kdrag0n[m]> not sure how to deal with that from a scheduler standpoint
<marcan> yeah
<marcan> though it's a <10% difference anyway so it shouldn't be a massive problem
<marcan> the whole dmips/mhz thing is massively application-dependent anyway
<marcan> e.g. if you're running a spinloop both CPU cores have exactly the same performance (one iteration per cycle)
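The "1 core 3.2, 2 cores 3.1, 3+ cores 3.0" relationship mentioned above can be written down as a small lookup. This is only a sketch of the rule as stated in the chat, using the rounded GHz figures from the conversation. The function name `p_cluster_cap_ghz` and the idea of counting "active" cores as those not in deep sleep are illustrative assumptions, not part of any driver.

```python
def p_cluster_cap_ghz(active_p_cores: int) -> float:
    """Approximate P-core frequency cap (GHz) for a given number of active P-cores.

    Values are the rounded figures quoted in the chat, not exact pstate frequencies.
    """
    if active_p_cores < 1:
        raise ValueError("need at least one active P-core")
    if active_p_cores == 1:
        return 3.2
    if active_p_cores == 2:
        return 3.1
    return 3.0


for n in range(1, 5):
    print(n, "active P-core(s): cap", p_cluster_cap_ghz(n), "GHz")
```

By this rule, a system without a cpuidle driver keeps every core counted as active, so the cap stays at the 3+ core value, consistent with the 3GHz ceiling described earlier.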
psykose has quit [Remote host closed the connection]
psykose has joined #asahi
ar has quit [Ping timeout: 480 seconds]
nepeat has quit [Ping timeout: 480 seconds]
nepeat has joined #asahi
tylo1 has joined #asahi
tylo1 has quit [Ping timeout: 480 seconds]
<j_ey> alyssa: you found https://github.com/AsahiLinux/linux/commits/sven/20210829 that has the smc driver right? I have some extra patches that start to add a debugfs on top
ar has joined #asahi
<j_ey> (doesn't have smc gpio though, which I see is what you needed)
povik has joined #asahi
tylo1 has joined #asahi
nepeat has quit [Remote host closed the connection]
aleasto has joined #asahi
tylo1 has quit [Ping timeout: 480 seconds]
nepeat has joined #asahi
povik has quit [Quit: Page closed]
tylo1 has joined #asahi
tylo1 has quit [Ping timeout: 480 seconds]
gabuscus has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
gabuscus has joined #asahi
nico_32 has quit [Ping timeout: 480 seconds]
nico_32 has joined #asahi
<alyssa> j_ey: nod
nico_32 has quit [Ping timeout: 480 seconds]
nico_32 has joined #asahi
tomtastic has quit [Ping timeout: 480 seconds]
tomtastic has joined #asahi
aleasto has quit [Quit: Konversation terminated!]
aleasto has joined #asahi
aleasto has quit [Remote host closed the connection]
aleasto has joined #asahi
aleasto has quit [Remote host closed the connection]
aleasto has joined #asahi
aleasto has quit [Remote host closed the connection]
aleasto has joined #asahi
aleasto has quit [Remote host closed the connection]
aleasto has joined #asahi
aleasto has quit [Remote host closed the connection]
aleasto has joined #asahi
aleasto has quit []
aleasto has joined #asahi
<alyssa> maz: Not sure if it makes sense to squash https://rosenzweig.io/0001-arm64-dts-apple-j274-Expose-PCI-node-for-the-Wi-Fi-M.patch and the obvious Bluetooth change into "arm64: dts: apple: j274: Expose PCI node for the Ethernet MAC address"
vup2 has quit [Remote host closed the connection]
vup has joined #asahi
confusomu has joined #asahi
confusomu has quit [Read error: Connection reset by peer]
confusom1 has joined #asahi
kov has joined #asahi
tylo1 has joined #asahi
frode_0xa has joined #asahi
frode_0xa has quit []
frode_0xa has joined #asahi
ihaveamac has joined #asahi
vup has quit [Ping timeout: 480 seconds]
marvin24 has joined #asahi
marvin24_ has quit [Ping timeout: 480 seconds]
frode_0xa has quit [Quit: leaving]
frode_0xa has joined #asahi
<kettenis> alyssa: not sure providing a default mac address that isn't 00:00:00:00:00:00 is helpful
<kettenis> maz did provide one for ethernet, because that's the address it ends up with if the device tree doesn't provide an address
<kettenis> but even there it probably isn't helpful
vup has joined #asahi
<alyssa> kettenis: ack. that was copypaste fail.
<alyssa> fixed locally, thanks
yrlf has quit [Quit: The Lounge - https://thelounge.chat]
yrlf has joined #asahi
everslick has quit [Remote host closed the connection]
everslick has joined #asahi
<kdrag0n[m]> marcan: I looked into the capacity numbers a bit more, and it seems more like Dhrystone is the one that's not very realistic
<alyssa> 🍿
<alyssa> sven: atcphy scares me
<kdrag0n[m]> I ran the benchmarks again with only one active P-core (thanks for the tip) and according to CoreMark, 3.2 GHz is slightly faster than my Zen 2 desktop at max freq
<kdrag0n[m]> sounds about right given what people have said about the M1
<kdrag0n[m]> and if that reference is right, then the e-cluster should be too
<kdrag0n[m]> IPC values look reasonable compared to Snapdragon 888: 10.9 C/MHz (Firestorm) vs 9.3 (Cortex-X1), 7.6 (Icestorm) vs 3.7 (A55) which seems reasonable considering how old the A55 is
aleasto has quit [Quit: Konversation terminated!]
<kdrag0n[m]> new commit based on the new data that includes 3.2 GHz (very little difference): https://github.com/kdrag0n/linux-m1/commit/05c296604a42
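One way to turn per-MHz benchmark scores like the ones above into relative capacity values is to scale the fastest core type to a fixed maximum. This is a minimal sketch, assuming a scale of 1024 for the fastest core and truncation to an integer; both are assumptions, and the linked commit may compute its numbers differently. The function name `relative_capacity` is hypothetical.

```python
def relative_capacity(score_per_mhz: dict[str, float], scale: int = 1024) -> dict[str, int]:
    """Scale per-MHz scores so the best core type maps to `scale` (truncating)."""
    best = max(score_per_mhz.values())
    return {name: int(score / best * scale) for name, score in score_per_mhz.items()}


# CoreMark-per-MHz figures quoted in the chat for the M1 core types.
m1_scores = {"Firestorm": 10.9, "Icestorm": 7.6}
print(relative_capacity(m1_scores))
```

With the rounded figures from the chat this lands close to the ~713 value discussed for the little cluster, though the inputs here are rounded, so exact agreement should not be read into it.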
<j_ey> heh cool I didn't know the 888 has cortex-X1 cortex-a78 and cortex-a55
<alyssa> there are so many registers
<alyssa> right so errr
<alyssa> Good news, dcpext is 100% compatible with dcp so that's fine
<alyssa> Bad news is macOS does a ton of ATCPHY reg pokes and I don't know what's strictly needed here.
<alyssa> and there are a lot of tunables and I don't get the corellium code at all.
<kdrag0n[m]> btw is tracing pmgr supposed to work? macOS panics within a few seconds every time I add the tracer
<alyssa> sven: oh FFS
<alyssa> these two configs are distinct:
<alyssa> displayport over usb type-c
<alyssa> displayport over usb4/thunderbolt over usb type-c
<alyssa> all the ACIOPHY XBAR complexity in the corellium atcphy code ... given ACIO is TBT, maybe that only applies to the latter..
<alyssa> I mean we need both but.
<alyssa> I guess there are ATCPHY things regardless. but this is good to keep in mind when corellium references DP
<alyssa> what's the deal with the tunables for atcphy though
<alyssa> oh, but, err, how different are those configs exactly i mean
<alyssa> I have an adaptor that does HDMI(<->displayport) and USB on the same type-C port, that's not thunderbolt
<alyssa> I guess that corresponds to REG_ACIOPHY_XBAR_PROTO_USB0_DP1 or REG_ACIOPHY_XBAR_PROTO_DP0_USB1 depending on the orientation
<alyssa> okay that's enough trying to understand the monstrosity that is the modern USB spec
<alyssa> at any rate, I'm satisfied that my DCP driver structure will be able to cope. PHY driver issues aside.