#asahi-dev on 2022-04-07 — irc logs at oftc.irclog.whitequark.org

2022-03-22 11:58 ChanServ changed the topic of #asahi-dev to: Asahi Linux: porting Linux to Apple Silicon macs | General development | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Logs: https://alx.sh/l/asahi-dev

00:14 al3xtjames has quit [Read error: Connection reset by peer]

00:14 al3xtjames has joined #asahi-dev

00:28 bps3 has quit [Remote host closed the connection]

00:29 bps3 has joined #asahi-dev

00:38 bps3 has quit [Ping timeout: 480 seconds]

00:43 c10l7 has joined #asahi-dev

00:48 c10l has quit [Ping timeout: 480 seconds]

00:49 amw has joined #asahi-dev

01:20 chadmed has quit [Remote host closed the connection]

01:26 yuyichao has quit [Ping timeout: 480 seconds]

01:28 chadmed has joined #asahi-dev

02:34 yuyichao has joined #asahi-dev

02:56 PhilippvK has joined #asahi-dev

03:00 phiologe has quit [Ping timeout: 480 seconds]

03:29 nicolas17 has quit [Quit: Konversation terminated!]

03:38 <nametable[m]> Can someone point me in the right direction as far as how to use the existing documentation? For example, let's say I'm interested in turning the backlight on/off (and then researching to figure out how dimming works). Where in the documentation that is available so far can I find the register address which corresponds to the m1 macbook air backlight?

03:53 kov has quit [Quit: Coyote finally caught me]

05:28 <jannau> rqou_: problem seems to be "nvme-apple 393cc0000.nvme: ANS did not boot". I've seen nvme (and usb) failures with broken DCP but never looked closely at them since the problem was clearly DCP. Although I'm still wondering how DCP can break nvme

05:30 <jannau> nametable[m]: there is no documentation. I'm not sure if anyone has checked but the expectation is that backlight is controlled through DCP

05:31 <jannau> to investigate that further you have to look at dcp traces (hv/trace_dcp.py) from running macos under m1n1's hypervisor

05:35 <rqou_> huh i'm definitely surprised that DCP can break nvme, but i guess it's "fine" for now

05:35 <rqou_> my workaround seems to consistently win the race condition

05:36 <rqou_> just something to note for further investigation

05:40 <rqou_> is there somewhere where i can keep track of ongoing work that's too experimental to even make it into the https://github.com/AsahiLinux/linux/tree/asahi/ branch?

05:40 <chadmed> m1n1 experiments?

05:40 <rqou_> no, i mean things like "sven's 4k patch" or "sven's atcphy"

05:41 <rqou_> or the DCP driver

05:41 <rqou_> rn i have my own branch that's an ad-hoc merging of features i've tried out

05:42 <jannau> not in a central place

05:45 <jannau> rqou_: it's entirely possible that it is the same race and broken dcp results in strange looking rtkit behavior for dcp but still unsure how that can break nvme

05:46 <rqou_> yeah i dunno

05:48 <rqou_> i definitely observed e.g. m1n1 hypervisor debug printing affecting the race

06:13 <sven> DCP needs to move to the raw iommu api

06:13 <sven> that fixes the usage of that weird function

06:14 <sven> other than that, the nvme bug is that ANS fails to boot and then my error handling apparently sucks

06:14 <sven> I don’t understand why it would fail to boot though just because DCP does something

06:51 MajorBiscuit has joined #asahi-dev

06:53 Major_Biscuit has joined #asahi-dev

07:00 MajorBiscuit has quit [Ping timeout: 480 seconds]

07:12 jato has quit [Quit: ZNC - https://znc.in]

07:12 jato has joined #asahi-dev

07:20 jaalsa has joined #asahi-dev

07:20 jaalsa has quit []

07:21 jaalsa has joined #asahi-dev

07:37 <marcan> is SMC a common point there?

07:37 <marcan> maybe something is wrong, DCP does something that crashes SMC, then NVMe fails to boot because of that?

07:37 <sven> oh, true. didn't even think of that

07:38 <sven> I don't think the normal SMC message handler does anything fancy but I never looked into if ANS also talks to SMC during boot

07:39 jluthra has quit [Remote host closed the connection]

07:39 jluthra has joined #asahi-dev

07:40 <kettenis_> u-boot doesn't touch SMC, so it is possible to bring up NVMe without "starting" SMC

07:41 kettenis_ is now known as kettenis

07:41 <sven> SMC is still alive though at that point I think, it just doesn't talk to the AP

07:41 <marcan> we know that, the backdoor SMC channels are not dependent on the main one

07:41 <marcan> yeah

07:41 <marcan> SMC is always alive

07:42 <marcan> it does a ton of critical stuff like battery charge management

07:42 <marcan> (and if it dies the machine watchdog reboots a few seconds later)

07:49 <_jannau_> I don't think I saw smc watchdog resets in the error cases. linux was idle waiting for the root fs (using rootwait)

07:51 <_jannau_> is "nvme nvme0: Reset failure status: -62" just after the "did not boot" coming from pmgr?

08:03 ___nick___ has joined #asahi-dev

08:08 <sven> no

08:09 <sven> that's inside nvme_reset_work or however it's called

08:09 <sven> and the nvmmu inval stuff is because I forgot to initialize cq_phase when originally allocating the queues

08:22 the_lanetly_052___ has quit [Remote host closed the connection]

08:23 the_lanetly_052___ has joined #asahi-dev

08:37 ___nick___ has quit []

08:39 ___nick___ has joined #asahi-dev

08:41 ___nick___ has quit []

08:43 ___nick___ has joined #asahi-dev

09:06 <rqou_> i've been exercising my t8110 dart code enough that i'm confident enough to send a PR: https://github.com/AsahiLinux/m1n1/pull/190

09:06 <rqou_> also one for a ProRes experiment, but that's much more incomplete: https://github.com/AsahiLinux/m1n1/pull/191

09:10 asocialblade has quit [Ping timeout: 480 seconds]

09:30 veloek has quit [Quit: leaving]

09:32 veloek has joined #asahi-dev

09:42 bps3 has joined #asahi-dev

10:28 gladiac is now known as Guest1391

10:28 gladiac has joined #asahi-dev

10:31 <sven> nice!

10:34 Guest1391 has quit [Ping timeout: 480 seconds]

10:34 chadmed has quit [Read error: Connection reset by peer]

10:38 <Dcow[m]1> where is t8110 dart used ?

10:39 kov has joined #asahi-dev

10:40 <sven> thunderbolt and some other things on the t600x

10:41 <Dcow[m]1> but not in regular M1 ?

10:42 <sven> no

10:42 <sven> otherwise I would’ve already written support for it ;)

10:42 <Dcow[m]1> do you have only regular M1 devices?

10:42 <sven> yes

10:43 <Dcow[m]1> do you have github sponsors?)

10:43 <sven> no, and that’s quite intentional :)

10:43 <sven> once people start paying money to me it feels like a day job and I already have one of those :)

10:45 <Dcow[m]1> nah, that's the night one xD

10:45 <sven> :D

10:47 <jannau> sven: any progress on getting the t6000-dart merged in linux? would it help to ping the patch? we should try to get it into 5.19 together with basic t600x device trees

10:48 <sven> haven’t heard anything from the last time I pinged robin

10:49 <sven> we should also figure out if the bit that used to be “disable subpage protection” actually is “mark this page as uncachable”

10:49 <jannau> ok, I'll ping it tonight

10:55 <jannau> it appears t6000 and t8110 share the same PTE layout assuming the offset mask 37,10 in the kernel patch is correct and the 39,10 offset in m1n1 is an oversight
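[editor's note] To make the difference between the two offset masks concrete, here is a minimal sketch that expands both bit ranges. It assumes "37,10" and "39,10" are GENMASK-style (high, low) inclusive bit ranges; the `genmask` helper and the constant names are hypothetical and are not taken from the kernel patch or from m1n1.

```python
def genmask(high: int, low: int) -> int:
    """Return an integer with bits low..high (inclusive) set."""
    return ((1 << (high - low + 1)) - 1) << low


# Bit ranges as quoted in the discussion; the names are illustrative only.
KERNEL_PATCH_MASK = genmask(37, 10)
M1N1_MASK = genmask(39, 10)

# Bits covered by the wider mask but not by the narrower one.
EXTRA_BITS = M1N1_MASK & ~KERNEL_PATCH_MASK

if __name__ == "__main__":
    print(hex(KERNEL_PATCH_MASK))  # 0x3ffffffc00
    print(hex(M1N1_MASK))          # 0xfffffffc00
    print(hex(EXTRA_BITS))         # 0xc000000000 (bits 38 and 39)
```

Under that assumption, the two masks differ only in bits 38 and 39.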

10:57 <sven> given that "disable subpage protection" no longer works I'm willing to bet that bit means uncachable as well

10:57 <sven> would still be good to confirm that somehow

10:57 <sven> and then fix the kernel patch to not set that for everything.. but ugh...

10:57 the_lanetly_052___ has quit [Ping timeout: 480 seconds]

10:58 <sven> might be too many differences to the arm pagetable format at that point for robin

11:00 <jannau> the ARM_MALI_LPAE page table format also seems to have a different use for BIT(1)

11:01 <sven> ah, good

11:02 <jannau> read write protection bits also differ between t8110 and the defines for t8103 in the kernel

11:02 <sven> hrm... if that bit is indeed uncachable but the same page is mapped as normal memory for the CPU shouldn't that break horribly?

11:02 <sven> unless the cache somehow snoops those DART transactions I guess

11:06 <Jamie[m]1> fyi rqou_ I'm getting sane results with that t8110 implementation from the AVD

11:07 <Jamie[m]1> and i now understand 1% more of the avd interface :P

11:24 chadmed has joined #asahi-dev

11:42 jonaburg[m] has joined #asahi-dev

12:13 tardyp has quit [Read error: Connection reset by peer]

12:13 tardyp has joined #asahi-dev

12:29 caef^ has quit [Remote host closed the connection]

13:06 <kettenis> sven: you mean the cache would still snoop DART transactions, but DART transaction would no longer consult the cache?

13:07 <sven> yes, but dunno if a setup like that would make sense

13:07 <kettenis> doesn't make sense to me, but that doesn't carry much weight ;)

13:08 <kettenis> note that our device trees don't have dma-coherent attributes, which I think means that Linux does all the appropriate cache flushing and invalidation

13:09 <sven> at least for dma_alloc_coherent buffers it shouldn't do any cache maintenance

13:09 <sven> but let me check again

13:10 <kettenis> by allocating those as non-cached in the first place?

13:11 <sven> last time I looked into this I convinced myself that for devices with an iommu those would just be normal memory

13:11 <sven> but let me check that again because it's been a while

13:14 <kettenis> that's effectively what I do for OpenBSD (force the flag that tells the core code DMA is cache coherent in the DART driver)

13:15 <kettenis> but nvme doesn't use the DART for example

13:19 bisko has joined #asahi-dev

13:20 <sven> true

13:42 yuyichao has quit [Ping timeout: 480 seconds]

13:48 <emilytrau[m]> Hey devs! I'm working on adding Asahi linux support to the NixOS installer. What would be a reliable way to detect if we're on an Asahi system? So we can include support in the generated config

13:52 <j`ey> emilytrau[m]: you should talk to tpw_rules

13:53 atsalyuk has joined #asahi-dev

13:53 <emilytrau[m]> We're trying to get upstreamed. Using his amazing work as the base!

13:53 <emilytrau[m]> *upstreamed into nixpkgs

13:54 <kettenis> sven: On OpenBSD I couldn't really detect any performance benefits of skipping the cache flushes, but that may just be because the CPUs are too fast

13:55 <kettenis> still, I think we need to sprinkle some dma-coherent properties into the device trees

13:55 <sven> yeah, agreed

13:56 <kettenis> for OpenBSD a single one on the /soc node would be enough, but I'm not sure that works for Linux

13:57 <sven> I think it should, I remember seeing some of_ code that just tried walking up to the root to look for dma-coherent

13:57 <maz> kettenis: if the system is always coherent (which is likely since the CPUs implement FWB), then the CMOs may well be implemented as NOPs.
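[editor's note] The "walk up to the root looking for dma-coherent" behaviour sven recalls can be illustrated with a toy model. This is a sketch of the idea only, not the kernel's actual code: the `Node` class, the `is_dma_coherent` function and the example tree are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Toy device tree node: a name, a set of property names, a parent."""
    name: str
    props: set = field(default_factory=set)
    parent: Optional["Node"] = None


def is_dma_coherent(node: Optional[Node]) -> bool:
    """Return True if the node or any ancestor carries 'dma-coherent'."""
    while node is not None:
        if "dma-coherent" in node.props:
            return True
        node = node.parent
    return False


# Hypothetical tree: one property on /soc covers everything below it.
root = Node("/")
soc = Node("soc", {"dma-coherent"}, root)
nvme = Node("nvme", parent=soc)
outside = Node("memory", parent=root)

if __name__ == "__main__":
    print(is_dma_coherent(nvme))     # True, inherited from /soc
    print(is_dma_coherent(outside))  # False, not under /soc
```

If Linux does behave this way, a single property on the /soc node would be enough there too, as kettenis describes for OpenBSD.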

14:00 yuyichao has joined #asahi-dev

14:07 <j`ey> emilytrau[m]: you could use `strings /proc/device-tree/{model,compatible}`, not sure if there is a standard interface for that

14:08 <kettenis> the "compatible" property should be the definitive answer
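[editor's note] A minimal sketch of the check suggested above, for use in an installer script. It assumes the "compatible" property is exposed at /proc/device-tree/compatible as a list of NUL-terminated strings, and that Apple Silicon machines use an "apple," vendor prefix there. The function names are hypothetical, and the sample strings in the comments are illustrative rather than an exhaustive list.

```python
from pathlib import Path


def parse_compatible(raw: bytes) -> list[str]:
    """Split a device tree string list (NUL-separated) into Python strings."""
    return [s.decode("ascii", "replace") for s in raw.split(b"\0") if s]


def looks_like_apple_silicon(compatible: list[str]) -> bool:
    """Assumption: any entry with the 'apple,' vendor prefix counts."""
    return any(entry.startswith("apple,") for entry in compatible)


def detect(path: str = "/proc/device-tree/compatible") -> bool:
    """Return False when the file is missing (e.g. on non-devicetree systems)."""
    try:
        raw = Path(path).read_bytes()
    except OSError:
        return False
    return looks_like_apple_silicon(parse_compatible(raw))


if __name__ == "__main__":
    # Illustrative input, e.g. b"apple,j274\0apple,t8103\0apple,arm-platform\0"
    print(detect())
```

Matching on the vendor prefix rather than on a specific model string keeps the check from needing an update for each new machine, at the cost of being less precise.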

14:30 atsalyuk has quit [Ping timeout: 480 seconds]

14:42 linuxgemini9 has quit [Remote host closed the connection]

14:42 linuxgemini9 has joined #asahi-dev

14:48 atsalyuk has joined #asahi-dev

14:52 doggkruse has joined #asahi-dev

14:53 linuxgemini95 has joined #asahi-dev

14:57 linuxgemini9 has quit [Ping timeout: 480 seconds]

15:01 nicolas17 has joined #asahi-dev

15:32 <sven> looks like pcie hotplug is actually pretty simple: when a LINK_UP interrupt is received tell the pci subsystem to re-scan, when LINK_DOWN is received remove the devices and finally poke that LSSMCTL bit again to be able to catch the next LINK_UP

15:33 <sven> the "tell the pci subsystem to re-scan" is a big hack right now but seems to work. let's hope thunderbolt is just the same: https://f.svpe.de/7c144fb92e09845e75b4cd9d8e1e63a336217fe28e2a2e86e061c3fc0eea6583_pcie-hotplug.txt
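[editor's note] The hotplug flow sven describes can be sketched as a small event handler. This is a toy model of the sequence of steps only, not driver code: the `HotplugPort` class, its method names and the logged action strings are hypothetical, and the real register access behind "poke that LSSMCTL bit" is not modelled.

```python
class HotplugPort:
    """Toy model of one PCIe port reacting to link interrupts."""

    def __init__(self) -> None:
        self.devices_present = False
        self.actions: list[str] = []  # record of what a driver would do

    def handle_irq(self, event: str) -> None:
        if event == "LINK_UP":
            # Link came up: ask the PCI core to re-scan the bus.
            self.actions.append("rescan bus")
            self.devices_present = True
        elif event == "LINK_DOWN":
            # Link went away: remove the devices, then re-arm detection
            # so the next LINK_UP can be caught.
            self.actions.append("remove devices")
            self.devices_present = False
            self.actions.append("re-arm link-up detection")
        else:
            raise ValueError(f"unknown event: {event}")


if __name__ == "__main__":
    port = HotplugPort()
    for ev in ("LINK_UP", "LINK_DOWN", "LINK_UP"):
        port.handle_irq(ev)
    print(port.actions)
```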

15:41 doggkruse has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

15:58 doggkruse has joined #asahi-dev

16:12 atsalyuk has quit [Remote host closed the connection]

16:12 atsalyuk has joined #asahi-dev

16:36 <maz> sven: triggering a PME event on the PCI side should result in a scan.

16:37 <maz> it could be that we need to tell the root port to generate that event on LINK_*.

16:47 sikkileo[m] is now known as sikkiladho[m]

17:35 Major_Biscuit has quit [Ping timeout: 480 seconds]

17:57 <sven> right now i just blindly copy/pasted whatever pciehp_configure_device does

19:31 doggkruse has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

19:55 atsalyuk has quit [Ping timeout: 480 seconds]

19:56 atsalyuk has joined #asahi-dev

20:08 ___nick___ has quit [Ping timeout: 480 seconds]

20:08 atsalyuk has quit [Ping timeout: 480 seconds]

21:06 atsalyuk has joined #asahi-dev

21:09 f14h has joined #asahi-dev

21:16 atsalyuk has quit [Ping timeout: 480 seconds]

21:17 f14h has quit [Quit: f14h]

21:20 f14h has joined #asahi-dev

21:27 f14h has quit [Remote host closed the connection]

21:34 wanderfull has joined #asahi-dev

22:05 alexsv has quit [Ping timeout: 480 seconds]

22:06 f14h has joined #asahi-dev

22:06 f14h has quit [Remote host closed the connection]

22:06 f14h has joined #asahi-dev

22:12 f14h has quit [Remote host closed the connection]

22:48 jaalsa has quit [Remote host closed the connection]

23:50 bps3 has quit [Ping timeout: 480 seconds]