<sven>
hrm, and i guess that command issue lock could use a description
<j`ey>
we need some A/B testing on that :P
<sven>
i still don't understand why that lock is required. and i can't even reproduce the errors without that lock on my mini :/
rikkaa has joined #asahi-dev
PaterTemporalis has joined #asahi-dev
m6wiq has joined #asahi-dev
ExeciN[m] has joined #asahi-dev
<maz>
sven: I wonder if dma_wmb() properly orders things on this system. it expands to dmb(oshst), while the spin_lock() would be an acquire in the IS domain.
<maz>
on its own, this lock doesn't provide much guarantee, but that's all I can think of to explain its (possible) effect.
<sven>
so the weird thing is that all I should need at that point is the guarantee that the memcpy before is visible to the nvme controller. that writel there writes to a hardware FIFO anyway that should be large enough
<sven>
and without the lock on some machines the command still executes just fine (so the nvme controller saw it) but never generates a completion interrupt
<maz>
can the nvme controller execute things without the writel() being issued if it was already processing commands?
<sven>
it can execute other commands
<sven>
that writel tells it "please now start executing the command at index i in this array"
<maz>
so it could be that you are actually using the release effect of the unlock, forcing the write to be made visible to the controller.
<maz>
(well, not really forcing, but actually providing ordering)
<maz>
if you have an affected system at hand, I'd replace the lock/unlock with a dmb right after the writel(), just to see...
<sven>
i can't reproduce it on my systems unfortunately which makes this so much harder :(
<maz>
:-/
<sven>
i'm also not sure I completely understand. what do I want to order against with an additional dmb after the writel?
<sven>
axboe had a system that was affected though, maybe he can give it a try
Axe has joined #asahi-dev
<maz>
you want to order the writel and whatever comes after it. my hunch is that although you have ordered the write to memory with the subsequent writel(), the writel() itself can be moved far down the line as there is nothing to 'make it happen now'.
<maz>
the unlock has *some* effects on that by only allowing subsequent writes to be hoisted up into the lock, but the write itself is guaranteed to be visible after the unlock.
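The experiment maz suggests above (a barrier right after the doorbell write, in place of the lock/unlock pair) can be sketched in user-space C. This is only an illustration under stated assumptions: `mock_queue`, `submit_with_trailing_barrier`, and the queue sizes are hypothetical names, the doorbell is an ordinary atomic variable rather than an MMIO register, and C11 fences stand in for the kernel's `dma_wmb()` and a raw `dmb`. It does not reproduce device-visible ordering; it only shows where the extra barrier would sit relative to the doorbell write.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define QUEUE_DEPTH 4
#define CMD_SIZE 64

/* Hypothetical stand-in for a submission queue; not the real driver struct. */
struct mock_queue {
    uint8_t cmds[QUEUE_DEPTH][CMD_SIZE]; /* command slots in "DMA" memory */
    _Atomic uint32_t doorbell;           /* stand-in for the MMIO doorbell */
};

/* Copy a command into its slot, ring the doorbell, then issue a barrier. */
void submit_with_trailing_barrier(struct mock_queue *q, uint32_t tag,
                                  const uint8_t *cmd)
{
    assert(tag < QUEUE_DEPTH);
    memcpy(q->cmds[tag], cmd, CMD_SIZE);

    /* Plays the role of dma_wmb(): command bytes are ordered before the
     * doorbell store. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&q->doorbell, tag, memory_order_relaxed);

    /* The experiment: a full barrier right after the doorbell write,
     * replacing the lock/unlock pair. In the kernel this would be a dmb;
     * here a seq_cst fence is used as an assumption-laden stand-in. */
    atomic_thread_fence(memory_order_seq_cst);
}
```

Calling `submit_with_trailing_barrier(&q, 2, cmd)` on a zero-initialized queue leaves the command bytes in slot 2 and the value 2 in the mock doorbell. Whether the trailing barrier changes anything on real hardware is exactly the open question in the discussion.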
<sven>
ohh.. okay. let me think what the consequences would be if the writel happens far down the line
PaterTemporalis has quit [Ping timeout: 480 seconds]
kloenk has quit [Remote host closed the connection]
Axe has quit [Remote host closed the connection]
<sven>
hrm, not sure there are any. I just submit a command there and then just wait until the controller puts it to the completion queue and fires an interrupt. i don't even care if the controller only sees the write seconds later.
<sven>
iirc the bug was then that the command appeared on the completion queue but no interrupt was triggered. it was only picked up after the 30 second timeout by polling that queue.
<maz>
that's odd. if the lock solves anything, it has to be ordering, because this doesn't protect anything.
<sven>
my other guess was that it was some kind of race condition and the lock just made it unlikely to lose that race
<sven>
but I wasn't able to find any race
<sven>
(race condition as in "you can't submit a command when you do X with the nvme controller at the same time")
<sven>
but then again, I can't reproduce this on my M1 at all. maybe it's just something strange that only happens on the Max/Pro
<maz>
has anyone tried reading back from the FIFO instead of the lock?
<sven>
don't think so
<maz>
guess we need someone with an expensive laptop willing to experiment... :-/
<sven>
marcan: ^--
<maz>
I'd offer my Studio, but I have to wait for another couple of months...
<sven>
so far I've managed to resist buying that studio ;)
<sven>
i'm glad they don't get shipped overnight. otherwise i'd have my excuse now :D
<chadmed>
im soooo tempted to flog off this j314s and buy a mac studio instead but i really need a machine i can haul to the lab :(
<Jamie[m]1>
i’m letting the sub-linear NUMA performance scaling dissuade me :)
<chadmed>
ah so the M1U is NUMA already
<povik>
chadmed: you can't do that! asahi-audio depends on it
<chadmed>
ah but if i got a mac studio i could make that sound nice instead!
<povik>
pfff
<povik>
there's nothing interesting unless you buy that expensive monitor
<povik>
and even then there may not be work for you
<chadmed>
thats true actually
* maz
is busy building a *second* serial adapter...
<chadmed>
i just want to replace my pathetic A1708-into-thunderbolt-dock "desktop" setup :P
<Jamie[m]1>
oh, do you need something tested on pro/max for that name thing sven?
<Jamie[m]1>
*nvme
<Jamie[m]1>
can try things or provide ssh if you want
<j`ey>
I think axboe was running stuff like: find /, to stress it a bit?
<sven>
yeah, something like full desktop environment + compiling the kernel + find /
<Jamie[m]1>
👍
<j`ey>
this looks weird: devtree: 0xfffffe0012240000
<sven>
yes
<j`ey>
I cant (yet) tell what happens to the adt if you chainload
<j`ey>
well run_guest.py rather
<sven>
should be just passed along
<j`ey>
ah: virt_base: 0xfffffe0011240000
kameks has quit [Ping timeout: 480 seconds]
<Glanzmann>
sven: Regarding the lock, I can reproduce it every time with that kernel config https://tg.st/u/config-2022-03-09-16k and run 'find / &> /dev/null'
<sven>
is that on a m1 or a m1 max/pro?
kylealanhale has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<Jamie[m]1>
booting the new kernel on pro now
<Glanzmann>
sven: I could reproduce it on the macbook air.
<Glanzmann>
sven: If you want, I can build the kernel with the spinlock removed and try to reproduce it.
<sven>
weird. I can’t reproduce it on my mini at all
<sven>
would be interesting to see if it still happens when running under the m1n1 hv and trace_nvme.py
<j`ey>
Glanzmann: wait.. you can reproduce it with the spin lock enabled?
<Jamie[m]1>
update: not booting the new kernel, I think mkinitcpio disagrees with linux on what the modules directory should be called -_-
<sven>
though I fear that might slow everything down too much
<Glanzmann>
j`ey: No with it **disabled**
<Glanzmann>
j`ey: I'm an early adopter so I ran into the issue very early.
<Glanzmann>
Then I raised it with axboe and he gave me the lock patch.
<j`ey>
Glanzmann: Im trying to understand if you reproduced it just now?
<sven>
waaait…. maybe I could never reproduce it because I’m always running under the hv these daya
<Glanzmann>
j`ey: No. But if sven wants to be able to reproduce the issue **without** the lock, then I can give him a setup where he can do so.
<sven>
*days
<Glanzmann>
sven: I was running a full blown desktop. X started up but shortly after I lost all disk I/O.
<sven>
ah. no. I couldn’t even reproduce it manually inside m1n1
<sven>
yeah, I can’t reproduce that on the mini
<Glanzmann>
sven: Should I try to reproduce on the mini and give you remote access or detailed instructions?
<Jamie[m]1>
sven: giving up for today, I don't understand arch's kernel/initramfs infrastructure and gotta sleep
<j`ey>
sven: <3 m1n1 hv
<sven>
Glanzmann: can you try what happens when you have no spinlock but add a readl(q->sq_db); just after the writel(tag, q->sq_db);?
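The readback variant sven asks for here can be sketched as follows. This is a user-space mock under stated assumptions: `mock_sq`, `mock_writel`, and `mock_readl` are hypothetical stand-ins for the driver's queue struct and the kernel's `writel()`/`readl()`, and the "register" is a plain volatile variable. A real hardware FIFO register may not return the value that was written; the mock does, purely so the flow can be exercised.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mock of a queue with a doorbell register. Real code would
 * hold an __iomem pointer and use the kernel's writel()/readl(). */
struct mock_sq {
    volatile uint32_t sq_db;
};

static void mock_writel(uint32_t val, volatile uint32_t *reg)
{
    *reg = val;
}

static uint32_t mock_readl(const volatile uint32_t *reg)
{
    return *reg;
}

/* Ring the doorbell with no spinlock, then read the same register back.
 * On real hardware the read cannot complete until the device has seen the
 * preceding write to that register, which is the point of the experiment. */
uint32_t submit_with_readback(struct mock_sq *q, uint32_t tag)
{
    assert(q);
    mock_writel(tag, &q->sq_db);
    return mock_readl(&q->sq_db);
}
```

In the mock, `submit_with_readback(&q, 5)` returns 5 and leaves 5 in `q.sq_db`. The value read back is not interesting in itself; the test on an affected machine is whether completions still go missing with the readback in place of the lock.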
<Glanzmann>
sven: Okay, I'll try to reproduce the issue. If I can, I'll try that.
darkapex1 has joined #asahi-dev
darkapex has quit [Ping timeout: 480 seconds]
chadmed has quit [Ping timeout: 480 seconds]
darkapex2 has joined #asahi-dev
BitcoinCandyWarrior has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
darkapex1 has quit [Ping timeout: 480 seconds]
BitcoinCandyWarrior has joined #asahi-dev
darkapex2 is now known as darkapex
BitcoinCandyWarrior has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<Glanzmann>
sven: And that was after I ran find multiple times, did two fio runs, installed a new kernel ...
JasonAntwi-Appah[m] has joined #asahi-dev
<Glanzmann>
sven: Should I work a while with the readl patch and see if I can trigger it_
<Glanzmann>
?
m6wiq1 has joined #asahi-dev
m6wiq has quit [Read error: Connection reset by peer]
m6wiq1 has quit [Remote host closed the connection]
<j`ey>
marcan: for RVBAR (cpu-impl-reg + 0x00) is that normal memory or MMIO?
justeinkemp[m] has joined #asahi-dev
<j`ey>
yes seems so: Mapping MMIO range: 0x200000000 .. 0x300000000
skipwich has quit [Quit: DISCONNECT]
skipwich has joined #asahi-dev
MajorBiscuit has quit [Ping timeout: 480 seconds]
rikkaa has quit [Quit: Connection closed for inactivity]
<j`ey>
I have this weird u-boot thing when using the hv, if I hit esc to stop auto boot, and then just hit enter, I always get 'Unknown command 'ry 'help''
<j`ey>
it's like 'ry' is stuck in the hv uart
kloenk has joined #asahi-dev
brstream[m] has joined #asahi-dev
<marcan>
I think there's some issues with the vUART in general, Linux loses characters if I paste too fast
<marcan>
and I've also seen u-boot lose local keyboard characters?
* povik
tried sending files over vUART
<povik>
didn't work
cadawerum1[m] has joined #asahi-dev
<sven>
jannau: let’s maybe move this here
<sven>
so all you need to do is set atcphy->target_mode to usb3_dp
<sven>
and then there are some missing pokes that macOS does to bring up what it calls “dpphy” but they are obvious from a hv trace
<sven>
what I started with was literally the output of macOS debug serial + the MMIO trace copy and pasted into the driver :D
<sven>
so if you do that for the dpphy part as well it’ll probably work
<jannau>
marcan: judging from hv trace dcpep looks very similar or identical
<marcan>
jannau: it should be 95% identical, I'm just wondering if there's a callback or two that could be added/removed
<marcan>
I mean not the ABI
<marcan>
that is identical for sure
<marcan>
I mean what actually gets called
AdwyzzOLEDEdition[m] is now known as AdryzzOLEDEdition[m]
kylealanhale has joined #asahi-dev
doggkruse has joined #asahi-dev
nicolas17 has joined #asahi-dev
<jannau>
interesting, dart-dispext0 has nothing mapped
<sven>
huh
<sven>
is it still locked?
<sven>
i guess xnu can use that property in the dcpext node to map the firmware data section
<sven>
but that’s still weird that they don’t map it
<jannau>
dart-dcpext is locked and has 2 premapped regions, starting at 0 and the same size as dart-dcp
<sven>
ohh… right, there’s a disp and a DCP dart
alyssa has joined #asahi-dev
<alyssa>
jannau: marcan has been asking me about DCP licensing
<alyssa>
Do you agree to relicense your portions of DCP kernel to MIT?
<alyssa>
(And then when I agree, we'll change the headers and marcan will stop boycotting the patches :-p)
<marcan>
I'm not boycotting them! I just want to keep kettenis on our side :p
<alyssa>
Uh huh
<jannau>
marcan, alyssa: consider my changes to DCP as MIT licensed
* alyssa
sighs
<alyssa>
I, Alyssa Anne Rosenzweig, domiciled at [redacted], Ontario, hereby license my DCP driver as the "MIT" license used by other Asahi Linux kernel drivers.
* alyssa
cuts ribbon
<marcan>
:>
<marcan>
thank you!
<sven>
lol
<alyssa>
what's keeping DCP out of asahi-dev at this point?
<alyssa>
I saw jannau did t600x bring up, and licensing is "fixed" now
<alyssa>
I guess need to agree on the m1n1 changes and fix u-boot so it works for the end users?
<marcan>
testing on everything, and I'm still not sure about that reserved regions stuff
<jannau>
there's a modeset bug resulting in 1 swap every 2 seconds or so and I'm not sure what we want to do about "HACK: increase vblank wait timeout"
<marcan>
and also just in general I might want to look at locked dart and have opinions on that :p
<jannau>
it's tested on mac mini, imac and macbook pro 14"
<alyssa>
jannau: aaaand DCP is not working, yay
<alyssa>
sounds like this will be "fun" to debug
<alyssa>
wait PEBKAC
<alyssa>
(still not working but that one was PEBKAC)
<jannau>
alyssa: I push m1n1 (branch dcp) and linux (branch asahi-dcp) which work for me right now
<alyssa>
what tense is 'push
<jannau>
+ed
<alyssa>
as in, just now and I should rebuild each?
<jannau>
as in right now, I think it's just rebases onto newer asahi upstream commits but it is the state currently working for me
<alyssa>
On the Mini?
<jannau>
how is it broken? I should be proficient in spotting the cause of the error by now
<alyssa>
Screen goes to black immediately, does not come back (but still backlit maybe)
<jannau>
m1 imac but except for the higher resolution I haven't seen a difference between mini and imac
<alyssa>
Correctly boots otherwise (I can type commands "in the dark" and they work)
<jannau>
I would have expected 'apple-dart 231304000.iommu: reserved region: IOVA [mem 0x013fc000-0x0bd07fff] PA [mem 0x9da4f0000-0x9e4dfbfff]', not sure which log level though
<jannau>
dev_info()
<alyssa>
_info, yeah
<alyssa>
will grab that..
<alyssa>
wait but if the init messages are there, those should be too
<alyssa>
will grab more info
<jannau>
alyssa: you can check if the reserved mem regions are in the dtb with 'dtc -I fs -O dts -o - /proc/device-tree | grep -A 50 reserved-memory'
<alyssa>
definitely something I have a reasonable way to do with no monitors ;-D
m6wiq has joined #asahi-dev
<jannau>
I assumed you can login via ssh
<alyssa>
Nah
<alyssa>
Ok, printk debugging and guru meditation--
<alyssa>
the reserved mappings are there, this check is failing:
<alyssa>
if (!of_device_is_compatible(iova_node, "iommu-mapping") ||
<alyssa>
!of_device_is_available(iova_node))
<nicolas17>
need to get audio fixed so you can beep debug info...
yuyichao has quit [Read error: Connection reset by peer]
meenmachine has joined #asahi-dev
<jannau>
sven: atc-WIP fails for me with 'dwc3 382280000.usb: failed to reinitialize core' when booted with usb device connected. both bare metal and HV
<sven>
m1 or m1 pro/max?
<jannau>
replugging fixes it but the device is just detected as high speed device
<jannau>
mac mini
<sven>
hm :/
<sven>
that's my machine as well where it just works :D
<sven>
i assume you merged the mini patch?
<sven>
and can you run it with tracing enabled and show me the dmesg
<sven>
tp_printk trace_event=appletypecphy:*
<sven>
as bootargs
<sven>
and did you try both ports?
<sven>
so far i've only been using the one that doesn't have the USB PD debug crap
<jannau>
no, I forgot to merge the m1n1 change
<sven>
ok. it still shouldn't fail with "failed to reinitialize core" though
<sven>
it should just fall back to usb2-only
<jannau>
"new SuperSpeed Plus Gen 2x1" after replugging, boot still fails with "failed to reinitialize core"
<sven>
weird
<sven>
can you show the full dmesg, preferably with that trace_event stuff enabled?
<sven>
and I guess I should also clean up and push my m1n1 atc tracer
<sven>
so that's why it doesn't want to do usb2 for atcphy0
<j`ey>
what lne did you mean, it didnt link to one
<j`ey>
*line
<j`ey>
oh, it did, you just have to open the diff manually (line 1072)
<sven>
oh
<sven>
yeah, that one
doggkruse has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<j`ey>
if (ret == ENOENT), yeah, thats a hard one to spot
<sven>
that's what I get for assuming my code just works there as well I guess
<sven>
;)
<sven>
i've been doing this long enough, i know at this point that my code cannot be trusted :-P
<j`ey>
:-)
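The `if (ret == ENOENT)` slip j`ey points out is easy to miss because kernel-style functions conventionally report failure as a negative errno value, so a comparison against the positive constant never matches. A minimal sketch, assuming that convention; the helper names `lookup_optional_resource`, `is_missing_buggy`, and `is_missing_fixed` are hypothetical and not taken from the driver:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical lookup that fails the way kernel helpers usually do:
 * by returning a negative errno value. */
int lookup_optional_resource(bool present)
{
    return present ? 0 : -ENOENT;
}

/* Buggy check: compares against positive ENOENT, so a return value of
 * -ENOENT is never recognized as "not found". */
bool is_missing_buggy(int ret)
{
    return ret == ENOENT;
}

/* Fixed check: compares against the negative value actually returned. */
bool is_missing_fixed(int ret)
{
    return ret == -ENOENT;
}

/* Small self-check showing the difference between the two comparisons. */
void demo_enoent_check(void)
{
    int ret = lookup_optional_resource(false);

    assert(!is_missing_buggy(ret)); /* the bug: "missing" goes unnoticed */
    assert(is_missing_fixed(ret));  /* the fix: "missing" is detected */
}
```

With the buggy comparison, a missing optional resource is treated as a hard error (or silently ignored, depending on the surrounding code) instead of taking the intended fallback path.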
<sven>
ohh.... i also just realized that jannau's device is gen 2!
<sven>
so that works as well, nice.
<sven>
i was a bit concerned that gen 2 needs some of the magic pokes I've just skipped
doggkruse has joined #asahi-dev
<sven>
so why does dwc3 fail to reinit the core :/
<sven>
or even more importantly, why does it even try to do that in early boot?
<sven>
i don't see any unplug event there so it shouldn't even go down that path
<jannau>
"tunables and/or fuses not available" is gone, otherwise unchanged behavior. Is it expected that it takes 5 seconds before the device is attached?
<sven>
hrm, i never checked how long it took for me
<sven>
i noticed it took a while but i assumed that was mostly because dwc3 has to recreate everything
m6wiq has quit []
<sven>
~5-6 seconds for me as well, but this is a slow HDD that actually has to spin up
<jannau>
I'm asking because it's almost exactly 5 seconds
<sven>
huh, it's actually ~5 seconds before it even says "new superspeed device". that seems wrong.
<jannau>
4.8 seconds for a usb stick
<sven>
yeah, that sounds wrong
<sven>
maybe some of those missing pokes are required to make that part faster. i don't think there's any sleep(4.8s) anywhere in the code
<jannau>
no noticeable delay for a fullspeed device and a usb-3 hub with integrated nic
<jannau>
"failed to reinitialize core" is gone when booting without usb device connected
<sven>
oh, maybe that's why I don't see it
<sven>
it's still weird that it even tries to do that though
doggkruse has quit [Ping timeout: 480 seconds]
<sven>
i'll take a closer look tomorrow but it sounds like something with the usb role switch hack goes wrong