ChanServ changed the topic of #asahi-gpu to: Asahi Linux: porting Linux to Apple Silicon macs | GPU / 3D graphics stack black-box RE and development (NO binary reversing) | Keep things on topic | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Logs: https://alx.sh/l/asahi-gpu
Telvana has joined #asahi-gpu
DragoonAethis has quit [Quit: hej-hej!]
DragoonAethis has joined #asahi-gpu
phiologe has joined #asahi-gpu
PhilippvK has quit [Ping timeout: 480 seconds]
kov has quit [Quit: Coyote finally caught me]
pyropeter2 has joined #asahi-gpu
pyropeter1 has quit [Ping timeout: 480 seconds]
<kode54>
okay
<kode54>
ever since installing 12.4, now I have GPU glitches in Edge
<kode54>
like, moving my mouse over some objects, and it will only be repainting the mouseovers that are under the square encompassing the mouse cursor
<lina>
phire: Remember that mystery blob next to the iomaps?
<lina>
I figured it out... it's... colorspace conversion matrices.
<lina>
The good news is we can regenerate it ourselves easily.
<phire>
oh, interesting
<lina>
The better news is I'm pretty sure Apple's matrices have excessive rounding error at least, and some might be outright wrong.
<lina>
So we can probably do a *better* job than them.
<lina>
it has RGB<->YUV for BT.601, BT.709, BT.2020, each full/limited range, plus one I haven't identified yet, one identity (or negation?) matrix and an empty slot
<lina>
RGB->YUV full has the textbook coefficients; RGB->YUV limited range I suspect they have an off-by-one error in the limited range adjustment; YUV->RGB full has weird rounding error (e.g. zero coefficients that are some small number instead), and YUV->RGB limited I just can't make sense of, it looks like the limited adjustment is either wildly off there or I don't understand the input scaling.
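(Editor's note, not from the log: the "textbook coefficients" lina refers to can be derived from the luma weights alone. A minimal sketch in Python, assuming BT.709 weights Kr = 0.2126, Kb = 0.0722; the function name is illustrative, not from any driver code.)

```python
# Derive the full-range RGB -> YCbCr matrix rows from the luma weights.
# Y is a weighted sum of R, G, B; Cb/Cr are scaled (B - Y) and (R - Y),
# normalized so each chroma component spans [-0.5, 0.5].

def rgb_to_yuv_matrix(kr, kb):
    kg = 1.0 - kr - kb
    y = (kr, kg, kb)
    cb = tuple(((0, 0, 1)[i] - y[i]) / (2 * (1 - kb)) for i in range(3))
    cr = tuple(((1, 0, 0)[i] - y[i]) / (2 * (1 - kr)) for i in range(3))
    return y, cb, cr

# BT.709: Kr = 0.2126, Kb = 0.0722
y, cb, cr = rgb_to_yuv_matrix(0.2126, 0.0722)
print(y)   # ~ (0.2126, 0.7152, 0.0722)
print(cb)  # ~ (-0.1146, -0.3854, 0.5)
print(cr)  # ~ (0.5, -0.4542, -0.0458)
```

Plugging in the BT.601 or BT.2020 luma weights instead yields the other two matrix families mentioned above.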
<phire>
this unkptr_1a0 structure is such a mishmash of (read-only?) data. Color conversion, io mappings, power states
<lina>
Yeah...
<lina>
I just called it HWDataB now
<phire>
gotta continue my naming scheme of appending letters when you have more than one
Lightsword_ has quit []
Lightsword has joined #asahi-gpu
<lina>
Ah, I think I figured out YUV->RGB limited. There's the weird off by one error again but it's not wrong, there's just strange scaling involved.
<lina>
so there is one unknown matrix, the rest "make sense" but the rounding is weird and I'm pretty sure there are off by one errors
<lina>
They seem to use /255 scaling for the RGB side and /256 scaling for the YUV side to get round numbers, for all components, which doesn't make sense (white ends up as 256, 128, 128 in fullrange YUV which is out of range for Y)
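(Editor's note, not from the log: a sketch of the inconsistency lina describes, under the hypothetical interpretation that RGB codes are normalized as code/255 while YUV codes are produced with a 1.0 == 256 scale. All names are illustrative.)

```python
# Full-range RGB -> YCbCr with mismatched scaling on the two sides:
# 1.0 == 255 for RGB input, 1.0 == 256 for YUV output.
KR, KB = 0.2126, 0.0722  # BT.709 luma weights

def rgb_to_yuv_full(r, g, b):
    rn, gn, bn = r / 255, g / 255, b / 255       # RGB side: /255 scaling
    y = KR * rn + (1 - KR - KB) * gn + KB * bn
    cb = (bn - y) / (2 * (1 - KB))
    cr = (rn - y) / (2 * (1 - KR))
    # YUV side: *256 scaling, chroma offset by half scale
    return round(y * 256), round(cb * 256 + 128), round(cr * 256 + 128)

print(rgb_to_yuv_full(255, 255, 255))  # (256, 128, 128): Y overflows 8 bits
```

The round chroma numbers are why the /256 scale is tempting, but as shown, white then lands on Y = 256, outside the 8-bit range.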
<chadmed>
can you have non-square identity matrices?
<lina>
I need to look up the relevant standards to figure out what's correct, but I'm fairly certain that whatever Apple did here isn't
<Sobek[m]>
chadmed: Identity matrices have to be square. They're the neutral element of square matrix multiplication.
<lina>
YUV full range is kind of dodgy by definition anyway because "no chroma" is supposed to be 128 but that's not the middle of the range, so I need to figure out what's correct and whether that implies that Y and UV have to have different scaling
<lina>
But either way I can't see how *all* of these matrices can be consistent; I'm pretty sure no matter how you interpret them, there is weirdness.
trouter has quit [Ping timeout: 480 seconds]
trouter has joined #asahi-gpu
<lina>
Ah, khronos has an actual spec for this and it refers to BT.2100 / T.871, and that confirms Apple's coefficients are definitely wrong here.
<lina>
We'll have to do some actual tests to figure out what the correct matrices would be, though, since I'm not sure if the interpretation is fixed-point 1.0 = 255 (actual max brightness) or fixed-point 1.0 = 256 (out of range but easier on the hardware) for these matrices, on both the RGB and YUV sides.
<lina>
Aaaaaa, and as far as I can tell it's actually *not possible* to program correct coefficients, assuming this needs to work across different bit depths (assuming the hardware does not special case the quantization differently for the full-range and limited-range slots)
<lina>
Because it turns out that the definition of full-range quantization and limited-range quantization interact with the bit depth differently, and therefore with any given single implementation of rounding/scaling you can only generalize the correct matrix for one of those two cases across all bit depths
<lina>
Limited-range white is defined as 235 << (n-8), while full-range white is defined as (1 << n) - 1
<lina>
So for limited-range you extend with zeroes at >8 bits (this would work for 1.0 == 1<<n across depths), while for full-range you extend with ones (needs 1.0 == (1<<n)-1 across depths), and assuming the hardware only does one thing across the board, it can't be right for both.
kov has joined #asahi-gpu
bisko has quit [Read error: Connection reset by peer]
<alyssa>
lina: Can we skip programming these matrices (or do all 0s) for now and gate YUV support behind a kernel UABI bump in the future when we have a use case?
<alyssa>
keeping in mind that I have a difficult time thinking of a use case
<jannau>
alyssa: YUV support for dmabuf import, for example for HW video decoding
<alyssa>
jannau: For the fast path, ideally that skips the GPU completely and goes straight to a YUV overlay plane in the DCP
<alyssa>
For the !fast path where you go through the GPU, the time spent texturing (both reading from memory and in the actual sampler hardware) will eclipse the little bit of arithmetic needed to do a matrix multiply
<alyssa>
So likely the native YUV path will be no faster and only slightly more energy efficient
<alyssa>
(and of course, it doesn't reall matter how fast it is as long as it's above 60fps, it's vsynced)
<alyssa>
but again if you want energy efficient, skip the GPU blit entirely ...
<alyssa>
Admittedly I'm unsure if Linux userspace is set up for either YUV overlay planes or hw YUV textures
<jannau>
I agree that it's not a use case which must be supported from the beginning. the overlay path might not be available due to another video playing
<alyssa>
Ah right I forgot M1 is ridiculously limited in overlay planes ...
Leidenfrost has joined #asahi-gpu
Telvana has joined #asahi-gpu
<dottedmag>
alyssa: Wayland surfaces have the provision to do so, but it's up to compositor.
<alyssa>
dottedmag: right, I know the plumbing is there, just not sure if in practice it gets used (needs support from everyone: kernel, compositor, application, ..)
<jannau>
I'm unsure about the desktop linux userland status but both yuv overlays and yuv textures are used in embedded linux userland
<alyssa>
Got it
<alyssa>
Linux on iPod Nano when
<Leidenfrost>
Hi all
<Leidenfrost>
Question, will the GPU driver cover hardware video decode?
<jannau>
2 orders of magnitude larger "devices" and I don't think apple silicon will play a role there
<jannau>
Leidenfrost: no, that is a separate HW block on the SoC
<Leidenfrost>
I see
<TheLink>
I don't see much point in hardware video decoding if the cpu is so strong and has so many cores
<TheLink>
perhaps 4k in av1 but is that really a thing?
<TheLink>
besides av1 decoding in hardware isn't supported anyway
<Jamie[m]>
the decoder doesn’t do AV1
<alyssa>
Energy efficiency => battery life on the handhelds
<jannau>
hw decode is more energy efficient
<Jamie[m]>
but generally speaking hardware video decoding is much more energy efficient
<Jamie[m]>
uh yeah what they said
<Jamie[m]>
lol
<alyssa>
"How many hours of video can you watch on $DEVICE before the battery dies?" is a metric Apple cares about
<alyssa>
"hw vdec + display YUV overlay plane with no GPU in between and framebuffer compression in flight" is a winning strategy there
<Jamie[m]>
answer: an absurdly high amount
<Jamie[m]>
(they spec 21 hours for the 16” mbp)
<alyssa>
also for thermals, don't want your iThing getting absurdly hot bingewatching iCarly :p
<alyssa>
(although your iThing catching fire *would* be apt...)
<Jamie[m]>
the macos scheduler has input from the inbuilt smoke detector to migrate the process to icestorm if there’s too much fire happening
<Jamie[m]>
I gotta find more time to work on the AVD stuff
<Jamie[m]>
Who knew having a job doing computers all day makes you tired of doing computers
<alyssa>
Jamie[m]: same hat
<sven>
that's why you take vacations to work on hobby projects :-P
<sven>
but yeah, same otherwise
<TheLink>
the greatest advantage of hw video decoding is only with safari afaik
<TheLink>
last info I read said that chrome uses 2x the energy for hw decoding
<TheLink>
but we're talking linux here, so I guess you might have a real point
<dottedmag>
alyssa: huh, can hw video decoder on m1 produce directly yuv compressed framebuffer?
<lina>
alyssa: I don't think computing those matrices is a big deal. Worst case if we don't have the time to figure out the exact rounding, we guess. Nobody's going to complain about the YUV conversions being off by 1 lsb with everything else going on...
<marcan>
dottedmag: I don't know if compressed YUV is a thing Apple does, but if it is the HW vdec would definitely support it, since it's the primary use case for that.
<marcan>
Also there's a big scaler block that can probably convert anything to anything
<jannau>
dottedmag: not sure if we have the decoder already RE-ed enough to say that but I would expect that it supports framebuffer compression, the mjpeg decoder does
<marcan>
Jamie[m], sven: this is why I made my job doing computers all da- wait
<marcan>
< TheLink> the greatest advantage of hw video decoding is only with safari afaik <- only because nobody else supports it properly
<marcan>
chrome does not support the overlay surface offload on macOS so it keeps the GPU turned on
<marcan>
safari literally shuts down the GPU while playing video
<TheLink>
yeah, some said that drm is the reason google does it differently but I have no idea
<TheLink>
doesn't sound too plausible to me
<marcan>
google could still do it properly for non-DRMed content so that doesn't really add up
<Jamie[m]>
vague memory of suggestions that it supported a yuv compressed format last time i looked at it
<jannau>
I would expect that DRM use cases are easier to support without GPU, assuming there is platform support for it
<Jamie[m]>
the framebuffers coming out when ffmpeg drives it are regular yuv though
<Jamie[m]>
(coming out as in, they’re what i dump from the IOMMU mapped area)
<marcan>
it would support both regular and compressed, of course
<marcan>
and the compressed stuff is probably not in public APIs
<Jamie[m]>
yeah, iirc there’s a webkit header file that defines the relevant VT format constant, but it’s not in VT public headers themselves
<marcan>
jannau: I'm not sure if apple actually implements anything like PAVP; my suspicion is they just rely on kernel integrity and the actual DRM key management/decrypting is in SEP and it'll refuse to do it if you've downgraded security
<daniels>
^ most common approach
<Jamie[m]>
iirc there is an AVD command to feed it a decryption key
<Jamie[m]>
so the actual decryption is in the decoder block
<Jamie[m]>
but otherwise i think that’s right
<lina>
daniels: Since you're around... I'd love to sit down and get your opinion on this wonderfully weird AGX GPU one of these days. Stuff related to the kernel UABI once we get there, and right now I'm wondering if you have any takes on how to name things.
<lina>
(Since we get to pick all the names here, and I don't have experience with other GPUs so I don't know what's recognizable to others)