ChanServ changed the topic of #asahi-dev to: Asahi Linux: porting Linux to Apple Silicon macs | General development | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Logs: https://alx.sh/l/asahi-dev
<marcan> too many cores probably
<marcan> should slow down the HV tick on at least the secondaries
<marcan> that plus the BHL means it doesn't scale
<marcan> I forget if I take the BHL to do local FIQ handling... I should probably move at least that to finer-grained locking, so the HV ticks don't fight each other. that'll at least make it scale.
<kevans91> big hairy lock?
PhilippvK has joined #asahi-dev
phiologe has quit [Ping timeout: 480 seconds]
<marcan> big hypervisor lock
nicolas17 has quit [Quit: Konversation terminated!]
kode54 has quit [Quit: The Lounge - https://thelounge.chat]
KDDLB has quit [Quit: The Lounge - https://thelounge.chat]
KDDLB has joined #asahi-dev
KDDLB has quit []
KDDLB has joined #asahi-dev
KDDLB has quit [Quit: The Lounge - https://thelounge.chat]
KDDLB has joined #asahi-dev
kode54 has joined #asahi-dev
gladiac is now known as Guest1648
gladiac has joined #asahi-dev
Guest1648 has quit [Ping timeout: 480 seconds]
user982492 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
user982492 has joined #asahi-dev
<kevans91> hmm, am I reading these pci nodes correctly? https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/boot/dts/apple/t8103.dtsi#n446 -> my really limited understanding is that reg should encode the b:d:f (which I think we (FreeBSD) are doing wrong), but these reg values also don't seem to line up with the iommu-map if I'm reading that right
<kevans91> i.e. there's no way a masked rid from <0x0 0x0 0x0 0x0 0x0> could possibly fall into [0x100, 0x101), [0x200, 0x201), or [0x300, 0x301)
<kevans91> so it feels like either i'm reading something wrong, or this is inconsistent and one of those (presumably the iommu-map) is wrong
<kevans91> otoh, linux presumably gets this right
viridian_ has joined #asahi-dev
JayasPJacob[m] has joined #asahi-dev
<JayasPJacob[m]> hi i would like to contribute for this awesome project
<kettenis> kevans91: there are nu PCI bus master capable devices on bus 0
<kettenis> s/nu/no/
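A minimal sketch of the lookup kevans91 describes, assuming the usual PCI requester-ID packing (bus in bits 15:8, device in bits 7:3, function in bits 2:0) and assuming iommu-map-mask leaves the RID unchanged. The three windows are the ranges quoted above; the dart0/dart1/dart2 labels are hypothetical placeholders, not names from the device tree. It shows why nothing on bus 0 matches, which is harmless if no bus-0 device can master the bus.

```python
from typing import Optional

def pci_rid(bus: int, dev: int, fn: int) -> int:
    """Pack bus:device.function into a 16-bit requester ID."""
    return (bus << 8) | (dev << 3) | fn

# (rid_base, length, label) windows: [0x100, 0x101), [0x200, 0x201), [0x300, 0x301).
# The labels are hypothetical; only the ranges come from the discussion.
IOMMU_MAP = [(0x100, 1, "dart0"), (0x200, 1, "dart1"), (0x300, 1, "dart2")]

def lookup(rid: int) -> Optional[str]:
    """Return the label of the window containing rid, or None if unmapped."""
    for base, length, label in IOMMU_MAP:
        if base <= rid < base + length:
            return label
    return None

print(lookup(pci_rid(0, 0, 0)))  # bus 0: no window matches
print(lookup(pci_rid(1, 0, 0)))  # 01:00.0 has RID 0x100 and falls in the first window
```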
MajorBiscuit has joined #asahi-dev
the_lanetly_052 has joined #asahi-dev
user982492 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
MajorBiscuit has quit [Ping timeout: 480 seconds]
jluthra_ has joined #asahi-dev
viridian_ has quit [Quit: -yes]
jluthra has quit [Ping timeout: 480 seconds]
MajorBiscuit has joined #asahi-dev
zoeyrae has quit [Quit: The Lounge - https://thelounge.chat]
the_lanetly_052 has quit [Remote host closed the connection]
the_lanetly_052 has joined #asahi-dev
Major_Biscuit has joined #asahi-dev
MajorBiscuit has quit [Ping timeout: 480 seconds]
riker77 has quit [Quit: Quitting IRC - gone for good...]
riker77 has joined #asahi-dev
M76cw2gnqjj[m] has joined #asahi-dev
Major_Biscuit has quit []
MajorBiscuit has joined #asahi-dev
kameks has joined #asahi-dev
AnalogDigital[m] has left #asahi-dev [#asahi-dev]
<arnd> I ran into some funny performance issue on the second half of the T6002. I noticed that a BLAS/GEMM benchmark is a lot slower on 10 cores of the M1 Ultra than an M1 Max MBP. With some more investigation I found that the first 10 cores are about the same, but the second set of firestorm cores appears slower. Doing the test individually on each core shows 30 GFLOPS on any icestorm, 90 GFLOPS on the first eight firestorm cores and 60
<arnd> GFLOPS on the other eight
<j`ey> did jannau fix cpufreq in linux?
<j`ey> ok maybe it doesn't actually need any fixes, after a quick look
<arnd> j`ey: what was the issue? I see the firestorm cores running at 3228000 kHz in /sys/devices/system/cpu/cpu12/cpufreq/cpuinfo_cur_freq during the test
<_jannau_> looks like cpufreq doesn't work on the second die. m1n1 initializes all cores to 2 GHz, (max for icestorm, 2/3 of max for firestorm)
<arnd> ok, that explains it then
<arnd> I guess cpuinfo_cur_freq shows what the kernel has requested, not what the hardware actually does
<_jannau_> yes, 3.228 GHz is even wrong for the cores of the 1st die. that's only reached when the other cores of the cluster are in deep sleep
<arnd> $ cat /sys/devices/system/cpu/cpu12/cpufreq/affected_cpus
<arnd> 2 3 4 5 12 13 14 15
<arnd> so it thinks that the cores in the second cluster always run at the same clk as the corresponding ones in the first one. looking at cur_freq confirms that
<_jannau_> that's wrong, looks like I botched the cpufreq dt
<jannau> forgot to update the core's "apple,freq-domain = <&cpufreq_hw x>;" for the second die
<jannau> should be 3, 4, 5 instead of 0, 1, 2
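A rough illustration of how reusing domain indices 0, 1, 2 on the second die produces the affected_cpus output quoted above. The core layout (per die: 2 efficiency cores, then two clusters of 4 performance cores, numbered consecutively) is an assumption inferred from that output, and freq_domains/affected_cpus are hypothetical helpers, not kernel code.

```python
# Assumed T6002 layout per die: 2 efficiency cores, then two 4-core performance clusters.
CLUSTER_SIZES = [2, 4, 4]

def freq_domains(fixed: bool) -> dict:
    """Map each CPU number to a frequency-domain index.

    fixed=False reuses indices 0, 1, 2 on the second die (the mistake);
    fixed=True gives the second die indices 3, 4, 5.
    """
    domains = {}
    cpu = 0
    for die in range(2):
        for cluster, size in enumerate(CLUSTER_SIZES):
            index = cluster + (len(CLUSTER_SIZES) * die if fixed else 0)
            for _ in range(size):
                domains[cpu] = index
                cpu += 1
    return domains

def affected_cpus(domains: dict, cpu: int) -> list:
    """CPUs that share a frequency domain with the given CPU."""
    return sorted(c for c, d in domains.items() if d == domains[cpu])

print(affected_cpus(freq_domains(fixed=False), 12))  # [2, 3, 4, 5, 12, 13, 14, 15]
print(affected_cpus(freq_domains(fixed=True), 12))   # [12, 13, 14, 15]
```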
<arnd> jannau, right, I just tried that locally
<arnd> I had to change the reg-names as well, but now it seems to work
<arnd> jannau: should I send you a proper patch, or do you just want to fold it in?
<arnd> not sure if marcan ends up redoing it all anyway
<_jannau_> I'll fold it in, we will probably end up redoing it anyway
<arnd> ok
<arnd> I got a typo in cpu_p00_d1, that needs to be 4, not 3
<arnd> with the wrong number, CPU 12 dropped down to 18 GFLOPS...
<_jannau_> thanks. strange that I still saw a 2x performance improvement in kcbench compared to m1 max
<j`ey> redo the measurements later :D
<arnd> jannau: what is the maximum frequency under constant load? If the m1 max gets throttled to the same 2GHz after a while, that still works out
<_jannau_> 3 GHz is the expected all-core full-load frequency for the performance cores. even in the macbook pro it should be sustainable for a longer period of time. we need to look for the actual core frequency
riker77 has quit [Ping timeout: 480 seconds]
robinp_ has quit [Ping timeout: 480 seconds]
<arnd> it scales almost perfectly now: 591 GFLOPS for either set of 8 firestorm cores, 1171 GFLOPS for all 16
robinp has joined #asahi-dev
<arnd> interestingly it goes down to 886 GFLOPS if I use all 20 cores including the small ones, but I think that's just blis/gemm not being aware of big/little cores, and making the big ones wait
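The figures quoted above can be checked with a little arithmetic. This sketch only uses numbers from the discussion; scaling_efficiency is a hypothetical helper name, and the 2/3 clock ratio is the one jannau stated for the m1n1 default.

```python
def scaling_efficiency(combined: float, per_group: float, groups: int) -> float:
    """Fraction of ideal linear scaling achieved when all groups run together."""
    return combined / (per_group * groups)

# 591 GFLOPS per set of 8 firestorm cores, 1171 GFLOPS for all 16.
print(round(scaling_efficiency(1171, 591, 2), 3))  # 0.991

# Before the fix: 60 vs 90 GFLOPS per core, compared with the stated 2/3 clock ratio.
print(round(60 / 90, 3), round(2 / 3, 3))  # 0.667 0.667
```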
kov has joined #asahi-dev
riker77 has joined #asahi-dev
<jannau> 10% faster, 70s instead of 78s for make vmlinux with linux-5.15 arm64 defconfig. I guess I have to redo it on the m1 max as well
c10l has quit [Quit: Bye o/]
<j`ey> it was 5mins on my m1 air D: (for 5.18ish defconfig)
c10l has joined #asahi-dev
riker77 has quit [Quit: Quitting IRC - gone for good...]
<sven> yay, nvmem (not to be confused with NVMe) which was even more boring than watchdog got merged :D
<j`ey> sven: woot
riker77 has joined #asahi-dev
MajorBiscuit has quit [Ping timeout: 480 seconds]
<arnd> jannau: on this many cores, the total runtime is not all that meaningful because half the build time is spent in single-threaded work like linking or parsing the Makefiles, it's often better to compare CPU time (user+system from /bin/time, or output from perf stat) to see how much work the CPUs got done
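A minimal sketch of the comparison arnd suggests, reporting wall-clock time alongside user+system CPU time for a command and its children. It assumes a POSIX system (os.times() reports child CPU times there); measure is a hypothetical helper name.

```python
import os
import subprocess
import sys
import time

def measure(cmd):
    """Run cmd and return (wall_seconds, cpu_seconds), CPU time being user + system."""
    before = os.times()
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    wall = time.monotonic() - start
    after = os.times()
    cpu = (after.children_user - before.children_user) + (
        after.children_system - before.children_system
    )
    return wall, cpu

# Harmless demo command; substitute the build you want to compare.
wall, cpu = measure([sys.executable, "-c", "sum(range(10**6))"])
print(f"wall {wall:.2f}s, cpu {cpu:.2f}s")
```

For a kernel build, something like measure(["make", "-j9", "vmlinux"]) would report both numbers; on a many-core machine the CPU figure is the one to compare across systems.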
MajorBiscuit has joined #asahi-dev
<maz> j`ey: 5 minutes for 'make vmlinux'? that's odd. it takes half that time on my mini (make -j9 vmlinux).
<j`ey> I can't remember what -j I used now, less than 9 for sure. I also only have the 8GB ram model
<maz> 8GB is more than enough. the compiler used may have a significant impact though (I use GCC 10.2.1)
<j`ey> I was having OOM issues with LLVM, so felt like being conservative :P will run more tests at some point. I was using llvm/clang
<j`ey> (OOM issues *building* LLVM)
<j`ey> maz: any news on your studio?
<maz> j`ey: landing expected first half of May...
<j`ey> oof
___nick___ has joined #asahi-dev
___nick___ has quit []
atsalyuk has joined #asahi-dev
___nick___ has joined #asahi-dev
<mps> j`ey: `real 2m 26.38s` on mbp with `busybox time make -j9 vmlinux`, though not defconfig but current linux-asahi for alpine config
<j`ey> hmmm real 4m 4.61s
<j`ey> I can't imagine clang could be that much slower.. retesting with gcc
<mps> what is PAGE_SIZE of running kernel
<j`ey> 16
<mps> to complete pkg build (apk for alpine) take about 4m 40s
<mps> s/to/to me/
<j`ey> real 5m 51.31s, with gcc...
<arnd> the time it takes for building a kernel can differ hugely based on a single Kconfig option, such as CONFIG_DEBUG_INFO, the exact toolchain version, or how the compiler was built
<j`ey> I'll try mps's config later, since we have the same gcc (from alpine)
<arnd> j`ey: clang is usually some 10% to 20% slower than gcc for a defconfig build, but it's more sensitive to options that lead to larger indirect header inclusions
<j`ey> clang was 2mins faster for me
<j`ey> I need to check how alpine's gcc is built
<arnd> https://mirrors.edge.kernel.org/pub/tools/crosstool/files/bin/arm64/ has toolchains that should work on any distro, as long as you have glibc, using those should make it easier to compare across systems
<j`ey> yeah, alpine has musl :)
<j`ey> I was going to ask for your commands to build those, so I could get a build as similar as possible..
<arnd> it might be enough to copy libc.so.6 from any other distro, but I haven't tried
<arnd> the toolchains are based on segher's 'buildall' scripts, I need to see if I can find a copy
<arnd> I don't know who jmesmon is, but it looks like nathanchance has a newer version at https://github.com/nathanchance/buildall
<j`ey> ah cool
<arnd> in the gcc source tree, run ./contrib/download_prerequisites to make it build a local copy of the mpc/mpfr/isl/gmp libraries instead of the distro version
alexsv has joined #asahi-dev
kameks has quit [Ping timeout: 480 seconds]
yuyichao has quit [Ping timeout: 480 seconds]
<j`ey> (so that points to alpine being slower, for some reason)
<mps> j`ey: hm, on which distro you run this
<j`ey> alpine
<mps> I don't understand, you run on alpine and say it is slower than your previous test?
Axenntio has joined #asahi-dev
Axenntio has quit []
<maz> j`ey: FWIW, I get 3:20 with LLVM-11 on the same defconfig, v5.18-rc2.
<j`ey> maz: thanks, maybe a musl vs glibc thing then..
timokrgr has quit [Quit: User left the chat]
timokrgr has joined #asahi-dev
<psykose> add gawk and measure again
<j`ey> psykose: I think that helps more for x86
<psykose> i know
yuyichao has joined #asahi-dev
user982492 has joined #asahi-dev
winter has quit [Quit: Ping timeout (120 seconds)]
linuxgemini95 has quit []
nico_32 has quit [Read error: Connection reset by peer]
user982492 has quit [Read error: Connection reset by peer]
nullroute has quit [Quit: bai]
user982492 has joined #asahi-dev
Method has quit [Remote host closed the connection]
nullroute has joined #asahi-dev
Method has joined #asahi-dev
winter has joined #asahi-dev
linuxgemini95 has joined #asahi-dev
nico_32 has joined #asahi-dev
<kevans91> kettenis: ahh, that makes sense, thanks!
user982492 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
user982492 has joined #asahi-dev
user982492 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
c10l has quit [Quit: Bye o/]
c10l has joined #asahi-dev
rikkaa has joined #asahi-dev
user982492 has joined #asahi-dev
MajorBiscuit has quit [Ping timeout: 480 seconds]
<kov> j`ey, mps are you both running on the Air? it is much slower than the Mini due to thermal throttling in my testing
<kov> (though not sure it would have such a big impact on a quickish build)
<j`ey> kov: mps is mbp, im on the air
<kov> ah, then it makes sense yeah... fewer perf cores + thermal throttling will do it for sure hehe
coversine has joined #asahi-dev
boardwalk has quit [Quit: Ping timeout (120 seconds)]
boardwalk has joined #asahi-dev
MajorBiscuit has joined #asahi-dev
MajorBiscuit has quit []
MajorBiscuit has joined #asahi-dev
c10l has quit [Quit: Bye o/]
c10l has joined #asahi-dev
rikkaa has quit [Quit: Connection closed for inactivity]
possiblemeatball has joined #asahi-dev
possiblemeatball has quit [Quit: Leaving.]
possiblemeatball has joined #asahi-dev
MajorBiscuit has quit [Ping timeout: 480 seconds]
atsalyuk has quit [Ping timeout: 480 seconds]
user982492 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
user982492 has joined #asahi-dev
possiblemeatball has quit [Quit: Leaving.]
possiblemeatball has joined #asahi-dev
bisko has joined #asahi-dev
___nick___ has quit [Ping timeout: 480 seconds]
bisko has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
bisko has joined #asahi-dev
MajorBiscuit has joined #asahi-dev
user982492 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
possiblemeatball has quit [Quit: left]
possiblemeatball has joined #asahi-dev
MajorBiscuit has quit [Ping timeout: 480 seconds]
user982492 has joined #asahi-dev
<kode54> clang is 100% slower for me building a mostly generic Arch x86_64 kernel with full LTO
possiblemeatball has quit [Quit: left]
possiblemeatball has joined #asahi-dev
<nathanchance> well, full LTO is the real problem there :)
<nathanchance> although clang is absolutely slower than gcc
<nathanchance> hopefully soon there will be some work in that area...
possiblemeatball has quit []
possiblemeatball has joined #asahi-dev
<j`ey> what happened to the days when clang was fast!
<psykose> don't think it ever was faster
<j`ey> there were a lot of claims of that.. many years ago
<mps> marketing
<psykose> maybe it was, then, but i certainly can't say that's the case anymore
nicolas17 has joined #asahi-dev