<marcan>
should slow down the HV tick on at least the secondaries
<marcan>
that plus the BHL means it doesn't scale
<marcan>
I forget if I take the BHL to do local FIQ handling... I should probably move at least that to finer-grained locking, so the HV ticks don't fight each other. that'll at least make it scale.
<kevans91>
big hairy lock?
<marcan>
big hypervisor lock
<arnd>
I ran into a funny performance issue on the second half of the T6002. I noticed that a BLAS/GEMM benchmark is a lot slower on 10 cores of the M1 Ultra than on an M1 Max MBP. With some more investigation I found that the first 10 cores are about the same, but the second set of firestorm cores appears slower. Doing the test individually on each core shows 30 GFLOPS on any icestorm, 90 GFLOPS on the first eight firestorm cores, and 60 GFLOPS on the second eight.
<j`ey>
ok maybe it doesn't actually need any fixes, after a quick look
<arnd>
j`ey: what was the issue? I see the firestorm cores running at 3228000 kHz in /sys/devices/system/cpu/cpu12/cpufreq/cpuinfo_cur_freq during the test
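A quick way to cross-check every core at once, assuming the standard cpufreq sysfs layout (this loop is a sketch, not from the discussion):

    # print the reported frequency (in kHz) for each CPU
    for f in /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq; do
        echo "$f: $(cat "$f")"
    done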
<_jannau_>
looks like cpufreq doesn't work on the second die. m1n1 initializes all cores to 2 GHz (max for icestorm, 2/3 of max for firestorm)
<arnd>
ok, that explains it then
<arnd>
I guess cpuinfo_cur_freq shows what the kernel has requested, not what the hardware actually does
<_jannau_>
yes, 3.228 GHz is even wrong for the cores of the 1st die. that's only reached when the other cores of the cluster are in deep sleep
<arnd>
so it thinks that the cores on the second die always run at the same clock as the corresponding ones on the first. looking at cur_freq confirms that
<_jannau_>
that's wrong, looks like I botched the cpufreq dt
<jannau>
forgot to update the core's "apple,freq-domain = <&cpufreq_hw x>;" for the second die
<jannau>
should be 3, 4, 5 instead of 0, 1, 2
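A sketch of the intended mapping, assuming three frequency domains per die; the label names are illustrative (only cpu_p00_d1 comes up later in the discussion):

    /* die 0 clusters use domains 0, 1, 2; die 1 must use its own: */
    cpu_e00_d1 { apple,freq-domain = <&cpufreq_hw 3>; };  /* icestorm cluster      */
    cpu_p00_d1 { apple,freq-domain = <&cpufreq_hw 4>; };  /* 1st firestorm cluster */
    cpu_p10_d1 { apple,freq-domain = <&cpufreq_hw 5>; };  /* 2nd firestorm cluster */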
<arnd>
jannau, right, I just tried that locally
<arnd>
I had to change the reg-names as well, but now it seems to work
<arnd>
jannau: should I send you a proper patch, or do you just want to fold it in?
<arnd>
not sure if marcan ends up redoing it all anyway
<_jannau_>
I'll fold it in, we will probably end up redoing it anyway
<arnd>
ok
<arnd>
I had a typo in cpu_p00_d1: it needs to be 4, not 3
<arnd>
with the wrong number, CPU 12 dropped down to 18 GFLOPS...
<_jannau_>
thanks. strange that I still saw a 2x performance improvement in kcbench compared to m1 max
<j`ey>
redo the measurements later :D
<arnd>
jannau: what is the maximum frequency under constant load? If the m1 max gets throttled to the same 2GHz after a while, that still works out
<_jannau_>
3 GHz is the expected all-core full-load frequency for the performance cores. even in the macbook pro it should be sustainable for a longer period of time. we need to look at the actual core frequency
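One way to check the clock a core actually sustains, assuming perf can read the cycle counter (the busy loop is just a sketch):

    # pin a ~1 s busy loop to CPU 12 and count cycles;
    # cycles / elapsed time approximates the effective clock
    taskset -c 12 perf stat -e cycles -- timeout 1 sh -c 'while :; do :; done'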
<arnd>
it scales almost perfectly now: 591 GFLOPS for either set of 8 firestorm cores, 1171 GFLOPS for all 16
<arnd>
interestingly it goes down to 886 GFLOPS if I use all 20 cores including the small ones, but I think that's just blis/gemm not being aware of big/little cores, and making the big ones wait
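If the library can't be taught about big.LITTLE, pinning the run to the firestorm cores should avoid making the big cores wait; the core numbering and the benchmark name below are assumptions:

    # assumed numbering: icestorm 0-1 / 10-11, firestorm 2-9 / 12-19
    taskset -c 2-9,12-19 ./gemm_benchmark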
<jannau>
10% faster, 70s instead of 78s for make vmlinux with linux-5.15 arm64 defconfig. I guess I have to redo it on the m1 max as well
<j`ey>
it was 5mins on my m1 air D: (for 5.18ish defconfig)
<sven>
yay, nvmem (not to be confused with NVMe) which was even more boring than watchdog got merged :D
<j`ey>
sven: woot
<arnd>
jannau: on this many cores the total runtime is not all that meaningful, because half the build time is spent in single-threaded work like linking or parsing the Makefiles. it's often better to compare CPU time (user+system from /bin/time, or the output of perf stat) to see how much work the CPUs actually got done
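For instance, assuming GNU time is installed at /usr/bin/time and a -j matching the 20 cores:

    /usr/bin/time make -j20 vmlinux   # compare user+sys, not "real"
    perf stat -- make -j20 vmlinux    # or read the task-clock / CPUs-utilized lines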
<maz>
j`ey: 5 minutes for 'make vmlinux'? that's odd. it takes half that time on my mini (make -j9 vmlinux).
<j`ey>
I can't remember what -j I used now, less than 9 for sure. I also only have the 8GB ram model
<maz>
8GB is more than enough. the compiler used may have a significant impact though (I use GCC 10.2.1)
<j`ey>
I was having OOM issues with LLVM, so felt like being conservative :P will run more tests at some point. I was using llvm/clang
<j`ey>
(OOM issues *building* LLVM)
<j`ey>
maz: any news on your studio?
<maz>
j`ey: landing expected first half of May...
<j`ey>
oof
<mps>
j`ey: `real 2m 26.38s` on the mbp with `busybox time make -j9 vmlinux`, though with the current linux-asahi alpine config rather than defconfig
<j`ey>
hmmm real 4m 4.61s
<j`ey>
I can't imagine clang could be that much slower.. retesting with gcc
<mps>
what is the PAGE_SIZE of the running kernel?
<j`ey>
16
<mps>
for me, the complete pkg build (apk for alpine) takes about 4m 40s
<j`ey>
real 5m 51.31s, with gcc...
<arnd>
the time it takes to build a kernel can differ hugely based on a single Kconfig option such as CONFIG_DEBUG_INFO, the exact toolchain version, or how the compiler was built
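For an A/B comparison of a single option, the kernel's own scripts/config helper can toggle it in place (the -j value here is arbitrary):

    # same tree, same toolchain; only DEBUG_INFO differs between runs
    ./scripts/config --file .config -d DEBUG_INFO
    make olddefconfig
    /usr/bin/time make -j9 vmlinux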
<j`ey>
I'll try mps's config later, since we have the same gcc (from alpine)
<arnd>
j`ey: clang is usually some 10% to 20% slower than gcc for a defconfig build, but it's more sensitive to options that lead to larger indirect header inclusions
<arnd>
in the gcc source tree, run ./contrib/download_prerequisites to make it build local copies of the mpc/mpfr/isl/gmp libraries instead of the distro versions
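Roughly, for a from-source gcc build (the paths and configure flags here are placeholders, not from the discussion):

    cd gcc                              # a gcc source checkout
    ./contrib/download_prerequisites    # fetches mpc/mpfr/isl/gmp locally
    mkdir build && cd build
    ../configure --disable-multilib
    make -j"$(nproc)"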