ChanServ changed the topic of #asahi-dev to: Asahi Linux: porting Linux to Apple Silicon macs | Non-development talk: #asahi | General development | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Logs: https://alx.sh/l/asahi-dev
johey has quit [Ping timeout: 480 seconds]
johey has joined #asahi-dev
jeisom has quit [Ping timeout: 480 seconds]
tristan2 has joined #asahi-dev
tristan2_ has quit [Ping timeout: 480 seconds]
crabbedhaloablut has joined #asahi-dev
dylanchapell has quit [Read error: Connection reset by peer]
linuxgemini has quit [Quit: Ping timeout (120 seconds)]
<j`ey>
eiln: this is a fun read even though I understand 0% :-)
linuxgemini has joined #asahi-dev
<eiln>
edit: if macOS says "AVD version Viola" at boot it should work. but m1n1-side register offsets might differ per soc
pjakobsson has quit [Ping timeout: 480 seconds]
chadmed has joined #asahi-dev
<maz>
extremely impressive indeed. though I'm in the same situation as j`ey on this front!
PaulFertser has quit [Quit: see you on the other side]
ydalton has joined #asahi-dev
<ydalton>
omg bad apple decoded by avd :3
<ydalton>
eiln: also, i tried your cm3 emulator script and it crashed complaining about unmapped reads/writes. would the emulator script have to be rewritten to suit the m2? that's where i got my firmware from
ydalton has left #asahi-dev [ERC 5.5.0.29.1 (IRC client for GNU Emacs 29.0.92)]
<eiln>
yes, the memory layout is different. not rewritten but it needs to be adjusted. soc and avd codename (from macos)?
ydalton has joined #asahi-dev
<ydalton>
t8112 but i don't know the codename of avd
<eiln>
log show --last 5m --info --debug
<eiln>
also emu is useless without the trace dumps
<ydalton>
i had frame params and the commands from a while ago
<ydalton>
maybe i will use your tracer script from your avd branch of m1n1
<ydalton>
omg, your tracer script has a function save_firmware, i was doing it manually with the mmio dump 💀
<leio>
what's CSC about? Is it something we could use e.g. for a gstreamer element that would replace videoconvertscale?
<eiln>
again you need to adjust the offsets
<ydalton>
looks like it, the regs look different than what i had on my m2
<eiln>
leio: colorspace converter and scaler. does a lot of stuff ranging from colorspace conversion, scaling (I'm not being sarcastic), pixel fills, tiling, up to AV1 film grain generation. up next on my list :D we'd need to completely rework V4L2 though
<leio>
usable standalone in userspace somehow too?
<leio>
(well, if the support for such is written)
<eiln>
we'll need to make an api for it, yes.
<leio>
to me that just screams for a csc_videoconvertscale gstreamer element to speed up transcodes and such
<ydalton>
eiln: what's with AVD_BASE being used? can't you get that from adt?
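For illustration, a minimal sketch of reading the AVD MMIO base out of the ADT with m1n1's proxyclient; the "/arm-io/avd" node path is an assumption here, not confirmed in this discussion:

    # minimal sketch, run in m1n1's proxyclient environment against a
    # connected machine; the "/arm-io/avd" node path is an assumption
    from m1n1.setup import *                  # provides u (utils) and p (proxy)

    node = u.adt["/arm-io/avd"]               # hypothetical AVD node path
    avd_base, avd_size = node.get_reg(0)      # first "reg" entry: base, size
    print(f"AVD MMIO base {avd_base:#x}, size {avd_size:#x}")

Something along those lines would avoid hardcoding AVD_BASE per SoC, which seems to be the point of the question.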
<leio>
but maybe in practice just letting GPU do it is good enough or better
<ydalton>
ah man, i remember the opcodes thing, memories
<leio>
what I wonder is how exposing this via v4l2 will then work with all the userspace out there
<eiln>
that's what I meant by "reworking v4l2"
<eiln>
I sent marcan an email about it
<leio>
(I jumped to decoding now, not csc)
<leio>
I could poke at anything needed from gstreamer, in theory
<leio>
but there's more to that world than gstreamer, unfortunately :D
<eiln>
I'd like to get ffmpeg hwaccel for scaling too (if that ever comes). meaning v4l2 should obviously have a generic api for it, one gst can make use of too.
<ydalton>
isn't there vaapi? i don't know enough about that though
<chadmed>
there was talk a while ago of writing a backend for vaapi but i cant remember where that ended up
<ydalton>
well, wouldn't that give hardware accelerated video for free? once a kernel driver is written
<ydalton>
since gstreamer, chromium, firefox are using it
<chadmed>
well yeah thats the idea
<chadmed>
bootlin did some work on a v4l2 backend for libva but idk if it will suit our needs specifically
<eiln>
none of them are sufficient. prores is/was on my list too but they don't even list all the exotic pixel formats prores uses, let alone an api. but I found out not long ago that I don't have prores, wtf
<eiln>
actually none of them are even good enough for avd
<ydalton>
i didn't know linux had any prores support
<chadmed>
prores is a very popular intermediate format in digital video pipelines
<chadmed>
and davinci resolve has a (mostly) well maintained linux version
<chadmed>
we'd probably be the first (and likely only) hardware prores codec supported upstream though
<ydalton>
are there any hardware prores codecs outside of apple silicon?
<chadmed>
yeah but usually on board ISPs for digital video cameras
<chadmed>
arri and blackmagic being well known vendors of such cameras
<ydalton>
i mean on graphics cards for example
<leio>
isn't vaapi so that there's a hardware-specific kernel side and then a vaapi driver that makes it usable through a common API for userspace apps, while you could also use the specific stuff exposed by the kernel driver for other things?
<leio>
vaapi driver being userspace
<chadmed>
leio: yes and bootlin have a basic v4l2 backend for vaapi but if eiln says it's not good enough for avd or apr then i trust her
<chadmed>
and no i dont think anyones done a prores hw block on a commodity gpu
<eiln>
it's not really a parallelizable task
<chadmed>
people working in prores probably just bought mac pros with the afterburner card which aiui is just avd, ave and apr on a pcie card
<ydalton>
wonder if there was any linux support for those, probably not
ydalton has left #asahi-dev [ERC 5.5.0.29.1 (IRC client for GNU Emacs 29.0.92)]
<chadmed>
nope
<chadmed>
and they didnt have avd or ave, only an older version of apr
<eiln>
is that the same avd/ave/apr? intel macs shipped with imagination's decode IP, ave sucks enough that I trust it's the same one, but apr wasn't released until m1+s
<chadmed>
its a decode only version and doesnt do demosaicing so it might be something different
<chadmed>
prob that imagination chip on a card then
<chadmed>
wait no it cant be i dont think they ever did a prores codec
<eiln>
awh they couldn't have outsourced their own codec
<leio>
where do the decoded frames end up in terms of memory location, in the context of achieving zero-copy to display?
<eiln>
leio: all dma-capable peripherals operate in iommu/dart space (all except gpu). we could double iommu-map the underlying page to do m2m or zero copy
<leio>
all except gpu, meaning we can't "zero-copy" it for stuff like video/x-raw(memory:GLMemory) or vulkan video?
<leio>
for linux direct scanout, having it in DMABuf would be most interesting, I believe
pjakobsson has joined #asahi-dev
kidplayer666 has quit [Quit: Connection closed for inactivity]
eiln has quit [Quit: WeeChat 4.1.1]
<nicolas17>
eiln: is "Viola" the codename for AVD or for this particular revision? do you know of other names?
jeisom has joined #asahi-dev
eiln has joined #asahi-dev
PaulFertser has joined #asahi-dev
<eiln>
nicolas17: viola is j293 AVD (V3 in device tree). the revision codenames are all some weird plantae
<eiln>
*strings*
<eiln>
there's viola, cactus, ixora, daisy, salvia, lilyd, radish, clover, lotus, tansy, clary, catnip, dahlia. not all are supported
<eiln>
who does that?
kidplayer666 has joined #asahi-dev
<nicolas17>
wow, that's a lot
<nicolas17>
I just went "ooh codenames I love codenames"
<nicolas17>
but it's gonna be Work to figure out which is which :D
<nicolas17>
I assume it's Viola on every M1 SoC?
<eiln>
no. m1+s were "V4" in device tree
<nicolas17>
sorry I mean on every device with M1, not Pro/Max
<nicolas17>
it would be weird if it was j293-specific
<eiln>
yeah I'd guess so. you can run 'log show' and grep for AppleAVD
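For anyone following along, the two commands mentioned above combine into a single macOS-side one-liner:

    log show --last 5m --info --debug | grep AppleAVD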
<ArcaneNibble>
M1 Max is LilyD
<ArcaneNibble>
(sorry, I *am* kinda alive despite having been too burned out to do any work on this)
<ArcaneNibble>
prores on M1 Max is RE-ed enough that someone could theoretically write a driver
<ArcaneNibble>
i haven't tried, since i know the vast majority of the complexity is integrating into the linux frameworks (v4l2?) that i know absolutely nothing about
<ArcaneNibble>
unless someone wants to do a char/misc proof-of-concept
<ArcaneNibble>
i have basic work on scaler/csc
<ArcaneNibble>
but it's *so* complicated and i don't really know how to trigger most operations from userspace
<ArcaneNibble>
if someone wants to tackle prores on the linux side, feel free to message me (and/or publicly here)
<ArcaneNibble>
hardware is conceptually very simple: you just give it a frame (by putting a descriptor in a ring buffer) and it encodes/decodes it and tells you when it's done
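As a purely illustrative toy model of that descriptor-ring flow (every name and field below is invented; nothing here comes from the actual RE'd hardware):

    # toy model of the "descriptor in a ring buffer" flow described above;
    # all names and fields are invented, none come from the real ProRes block
    from collections import deque

    class FakeProResRing:
        """Stand-in for the hardware ring; a real driver would poke MMIO."""
        def __init__(self):
            self.descs = deque()

        def submit(self, src_iova, dst_iova):
            # put a descriptor in the ring and "ring the doorbell"
            self.descs.append({"src": src_iova, "dst": dst_iova})
            # pretend the hardware picked it up and signalled completion
            d = self.descs.popleft()
            return f"frame at {d['src']:#x} decoded into {d['dst']:#x}"

    ring = FakeProResRing()
    # frames are independent (prores is intra-only), so order doesn't matter
    print(ring.submit(0x10000, 0x20000))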
<nicolas17>
prores is entirely intra-only?
<ArcaneNibble>
yes
<ArcaneNibble>
inherently
<nicolas17>
so you don't even have to worry about order of frames, you just throw a pile of images there and they come back out decoded
<ArcaneNibble>
yes
<ArcaneNibble>
oh right, prores is also missing somebody to RE the bitstream for "prores raw"
<ArcaneNibble>
(but it's certainly possible to treat that as a black box)
<ArcaneNibble>
oh yeah, i also still have the hardware JPEG block RE'd without a linux driver, but that's of questionable utility (mjpeg accelerator?)
<nicolas17>
ArcaneNibble: I tried to reverse engineer the weird proprietary PNG extensions Apple has and I found even Apple's own code doesn't decode it consistently
<ArcaneNibble>
the biggest blocker for AVD on my part (other than burnout) is not knowing all that much about video codecs
<ArcaneNibble>
i think we'd want to completely rip out apple's M3 firmware for it though, it is terrible
<eiln>
nicolas17: prores is an intra-only codec so people editing video can seek fast
<eiln>
ArcaneNibble: good to hear from you :) I don't even plan on using the coprocessor, let alone firmware
<ArcaneNibble>
hm, pretty sure the VP/PP interrupts aren't routed to the main CPU?
<ArcaneNibble>
which is one thing that the M3 notably does
<eiln>
I'm not a video codec expert (jannau is) but I haven't run into much math stuff. it's mostly RTFM and figuring out bitfields
<eiln>
I don't think VP/PP interrupt at all considering CM3 polls a status register
___nick___ has joined #asahi-dev
<nicolas17>
I'm not a video codec expert, but I understand like, MPEG2 at a conceptual level
<ArcaneNibble>
the firmware definitely has IRQ handlers in the vector table
<ArcaneNibble>
it *also* polls status registers, yes
<eiln>
executing code in a power-gated range is such a nightmare that I'd like to avoid that if we can
<ArcaneNibble>
is the M3's memory lost on power gate?
<eiln>
line 607 in avd_emu.py (heh) is the non-aic (PP) interrupt I believe. it does two writes to the polled status register; if it doesn't, PP fails after the fifth-ish run
<eiln>
avd has another power domain which is not in the device tree and not PMGR, lovely. it's enabled by writing 0xfff I believe. it's gated on that. and I believe there is no "off" write
<ArcaneNibble>
don't quite follow which bits of code you're looking at
<ArcaneNibble>
but i was thinking that for an RE'd implementation we'd either want "m3 does almost nothing, just proxies interrupts" or "m3 does a lot of offloading in a more competent way than apple"
<eiln>
if you unpack that into data/* you should be able to run my forked avd_emu.py
<eiln>
5, interesting.
<ArcaneNibble>
oh sorry, the 5th one is always an infinite loop
<ArcaneNibble>
so groups of 4, stride of 5
<ArcaneNibble>
first one pokes a status register then calls a shared handler that logs 'CtxD', second one logs 'H?Er' (error?), and the other two do more complicated things
<eiln>
we might be talking about different things, but it's not an infinite loop, it's polling for the status register to change. it's why I added the w_40104060 hook
<ArcaneNibble>
hmm, maybe lilyD is totally different
<ArcaneNibble>
which version are you looking at?
<eiln>
j293ap 13.5 viola, it's in the tarball
<ArcaneNibble>
oh! viola firmware IRQs are totally different!
<ArcaneNibble>
from a quick eyeball of the firmwares we have on hand here, clover/viola are different from dahlia/radish/lilyC/lilyD
<krbtgt>
wrt the Mystery Codec, i wonder if it's VC1
<krbtgt>
it is used on blu-ray, it's from around the late 2000s
<krbtgt>
or one of the other VC "family" codecs
<krbtgt>
it's possible it was also removed a long time ago and you're just seeing the holes where it used to be too
<ellyq>
I wonder if it might be AV1?
<eiln>
yeah it's probably just some prototype design
<ellyq>
(sorry for butting in, just curious)
<eiln>
it's not
<eiln>
if the other ones have multiple fifos (why?) it's possible it's hevc fifo#2 but I recall I couldn't get it to work with hevc. also the version number jump was suspicious
<ArcaneNibble>
V?Dn isn't a version number
<ArcaneNibble>
lilyD fills in [0-8] depending on which fifo is used
<eiln>
my understanding is that each fifo is the entrypoint to a separate processor. actually I don't think there's any other way because you can see it processing each step with each word written
<eiln>
at least on mine it's not schedulable i.e. vp9 instruction stream has to go into the vp9 fifo, etc
<eiln>
oh wait no if there are multiple processors we can definitely make use of them as long as their output doesn't get overwritten. but I haven't seen any evidence of macOS threading these. their firmware IPCs are interrupts and the command for executing VP vs. PP (which there's only one of) is handled inside a single IPC decode command. ig that's one thing we can do better
eiln has quit [Quit: WeeChat 4.1.1]
aradhya7 has joined #asahi-dev
RoguePlanet has joined #asahi-dev
ydalton has joined #asahi-dev
<ydalton>
eiln: t8112's avd seems to be codenamed dahlia
RoguePlanet has quit []
ydalton has quit [Quit: ERC 5.5.0.29.1 (IRC client for GNU Emacs 29.0.92)]
<nicolas17>
AAAAAAAAAA I found a mistake in Apple's APFS documentation >_<
<ChaosPrincess>
which one? :P
<nicolas17>
"Virtual objects are stored on disk at a block address that you look up using an object map"
<nicolas17>
"To access a virtual object using the object map, perform the following operations: [...] Locate the B-tree for the object map by reading the om_tree_oid field of omap_phys_t"
<nicolas17>
"om_tree_oid: The virtual object identifier of the tree being used for object mappings."
<nicolas17>
if om_tree_oid is a virtual OID then I need to look it up in the object map, but to look something up in the object map I need to get the btree that om_tree_oid points at, how can that possibly work?
<nicolas17>
it seems om_tree_oid is a *physical* object ID
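To make the circular dependency concrete, a toy Python model (only the names omap_phys_t / om_tree_oid come from Apple's APFS reference; the block contents and helpers are made up):

    # toy model of the bootstrap order: resolving a virtual OID requires the
    # omap B-tree, so om_tree_oid itself has to be a physical block address
    blocks = {
        100: {"type": "omap_phys_t", "om_tree_oid": 200},            # object map header
        200: {"type": "omap_btree", "mappings": {(0x404, 1): 300}},  # (oid, xid) -> paddr
        300: {"type": "fs_tree_root", "note": "virtual object, reached via the omap"},
    }

    def resolve_virtual_oid(omap_paddr, oid, xid):
        omap = blocks[omap_paddr]
        # if om_tree_oid were virtual, we'd need the omap to find the omap
        tree = blocks[omap["om_tree_oid"]]
        return blocks[tree["mappings"][(oid, xid)]]

    print(resolve_virtual_oid(100, oid=0x404, xid=1))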
<ChaosPrincess>
yep
<ChaosPrincess>
there is also another one in the omap description where they give two contradicting sentences about what exact data is stored in it
<nicolas17>
om_snapshot_tree_oid is also documented as being a virtual OID, I haven't checked but that also smells wrong
<nicolas17>
ChaosPrincess: oh, paddr_t vs omap_val_t?
<nicolas17>
ok so I'm trying to figure out something
<nicolas17>
is everything in a data volume encrypted, including the trees that say *where* the data is? I guess yes because otherwise you could know the directory structure and file size, even if not the file names or content?
<ChaosPrincess>
those are encrypted
<nicolas17>
if I delete an encrypted data volume without knowing the encryption key, how does APFS know which blocks became free?
<ChaosPrincess>
there are two kinds of encryption - full volume and per-file
<nicolas17>
yes, and docs say per-file is only used on iOS, but maybe that's wrong and it's also used on ARM macOS too
<ChaosPrincess>
So, if i understand it correctly, there is an object map tree and a filesystem tree.
<ChaosPrincess>
filesystem tree is encrypted, but i guess object map isnt?
kidplayer666 has quit [Quit: Connection closed for inactivity]