ChanServ changed the topic of #asahi-gpu to: Asahi Linux: porting Linux to Apple Silicon macs | GPU / 3D graphics stack black-box RE and development (NO binary reversing) | Keep things on topic | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Logs: https://alx.sh/l/asahi-gpu
Telvana has joined #asahi-gpu
DragoonAethis has quit [Quit: hej-hej!]
DragoonAethis has joined #asahi-gpu
phiologe has joined #asahi-gpu
PhilippvK has quit [Ping timeout: 480 seconds]
kov has quit [Quit: Coyote finally caught me]
pyropeter2 has joined #asahi-gpu
pyropeter1 has quit [Ping timeout: 480 seconds]
<kode54>
okay
<kode54>
ever since installing 12.4, now I have GPU glitches in Edge
<kode54>
like, moving my mouse over some objects, and it will only be repainting the mouseovers that are under the square encompassing the mouse cursor
<lina>
phire: Remember that mystery blob next to the iomaps?
<lina>
I figured it out... it's... colorspace conversion matrices.
<lina>
The good news is we can regenerate it ourselves easily.
<phire>
oh, interesting
<lina>
The better news is I'm pretty sure Apple's matrices have excessive rounding error at least, and some might be outright wrong.
<lina>
So we can probably do a *better* job than them.
<lina>
it has RGB<->YUV for BT.601, BT.709, BT.2020, each full/limited range, plus one I haven't identified yet, one identity (or negation?) matrix and an empty slot
<lina>
RGB->YUV full has the textbook coefficients; RGB->YUV limited range I suspect they have an off-by-one error in the limited range adjustment; YUV->RGB full has weird rounding error (e.g. zero coefficients that are some small number instead), and YUV->RGB limited I just can't make sense of, it looks like the limited adjustment is either wildly off there or I don't understand the input scaling.
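(Editor's note, not from the log: the "textbook coefficients" lina refers to can be derived from the luma weights alone. A minimal sketch in Python, assuming BT.709 weights Kr = 0.2126, Kb = 0.0722; the function name is illustrative, not from any driver code.)

```python
# Derive the full-range RGB -> YCbCr matrix rows from the luma weights.
# Y is a weighted sum of R, G, B; Cb/Cr are scaled (B - Y) and (R - Y),
# normalized so each chroma component spans [-0.5, 0.5].

def rgb_to_yuv_matrix(kr, kb):
    kg = 1.0 - kr - kb
    y = (kr, kg, kb)
    cb = tuple(((0, 0, 1)[i] - y[i]) / (2 * (1 - kb)) for i in range(3))
    cr = tuple(((1, 0, 0)[i] - y[i]) / (2 * (1 - kr)) for i in range(3))
    return y, cb, cr

# BT.709: Kr = 0.2126, Kb = 0.0722
y, cb, cr = rgb_to_yuv_matrix(0.2126, 0.0722)
print(y)   # ~ (0.2126, 0.7152, 0.0722)
print(cb)  # ~ (-0.1146, -0.3854, 0.5)
print(cr)  # ~ (0.5, -0.4542, -0.0458)
```

Plugging in the BT.601 or BT.2020 luma weights instead yields the other two matrix families mentioned above.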
<phire>
this unkptr_1a0 structure is such a mishmash of (read-only?) data. Color conversion, io mappings, power states
<lina>
Yeah...
<lina>
I just called it HWDataB now
<phire>
gotta continue my naming scheme of appending letters when you have more than one
Lightsword_ has quit []
Lightsword has joined #asahi-gpu
<lina>
Ah, I think I figured out YUV->RGB limited. There's the weird off by one error again but it's not wrong, there's just strange scaling involved.
<lina>
so there is one unknown matrix, the rest "make sense" but the rounding is weird and I'm pretty sure there are off by one errors
<lina>
They seem to use /255 scaling for the RGB side and /256 scaling for the YUV side to get round numbers, for all components, which doesn't make sense (white ends up as 256, 128, 128 in fullrange YUV which is out of range for Y)
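(Editor's note, not from the log: a sketch of the inconsistency lina describes, under the hypothetical interpretation that RGB codes are normalized as code/255 while YUV codes are produced with a 1.0 == 256 scale. All names are illustrative.)

```python
# Full-range RGB -> YCbCr with mismatched scaling on the two sides:
# 1.0 == 255 for RGB input, 1.0 == 256 for YUV output.
KR, KB = 0.2126, 0.0722  # BT.709 luma weights

def rgb_to_yuv_full(r, g, b):
    rn, gn, bn = r / 255, g / 255, b / 255       # RGB side: /255 scaling
    y = KR * rn + (1 - KR - KB) * gn + KB * bn
    cb = (bn - y) / (2 * (1 - KB))
    cr = (rn - y) / (2 * (1 - KR))
    # YUV side: *256 scaling, chroma offset by half scale
    return round(y * 256), round(cb * 256 + 128), round(cr * 256 + 128)

print(rgb_to_yuv_full(255, 255, 255))  # (256, 128, 128): Y overflows 8 bits
```

The round chroma numbers are why the /256 scale is tempting, but as shown, white then lands on Y = 256, outside the 8-bit range.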
<chadmed>
can you have non-square identity matrices?
<lina>
I need to look up the relevant standards to figure out what's correct, but I'm fairly certain that whatever Apple did here isn't
<Sobek[m]>
chadmed: Identity matrices have to be square. They're the neutral element of square matrix multiplication.
<lina>
YUV full range is kind of dodgy by definition anyway because "no chroma" is supposed to be 128 but that's not the middle of the range, so I need to figure out what's correct and whether that implies that Y and UV have to have different scaling
<lina>
But either way I can't see how *all* of these matrices can be consistent; I'm pretty sure no matter how you interpret them, there is weirdness.
trouter has quit [Ping timeout: 480 seconds]
trouter has joined #asahi-gpu
<lina>
Ah, khronos has an actual spec for this and it refers to BT.2100 / T.871, and that confirms Apple's coefficients are definitely wrong here.
<lina>
We'll have to do some actual tests to figure out what the correct matrices would be, though, since I'm not sure if the interpretation is fixed-point 1.0 = 255 (actual max brightness) or fixed-point 1.0 = 256 (out of range but easier on the hardware) for these matrices, on both the RGB and YUV sides.
<lina>
Aaaaaa, and as far as I can tell it's actually *not possible* to program correct coefficients, assuming this needs to work across different bit depths (assuming the hardware does not special case the quantization differently for the full-range and limited-range slots)
<lina>
Because it turns out that the definition of full-range quantization and limited-range quantization interact with the bit depth differently, and therefore with any given single implementation of rounding/scaling you can only generalize the correct matrix for one of those two cases across all bit depths
<lina>
Limited-range white is defined as 235 << (n-8), while full-range white is defined as (1 << n) - 1
<lina>
So for limited-range you extend with zeroes at >8 bits (this would work for 1.0 == 1<<n across depths), while for full-range you extend with ones (needs 1.0 == (1<<n)-1 across depths), and assuming the hardware only does one thing across the board, it can't be right for both.
kov has joined #asahi-gpu
bisko has quit [Read error: Connection reset by peer]
<alyssa>
lina: Can we skip programming these matrices (or do all 0s) for now and gate YUV support behind a kernel UABI bump in the future when we have a use case?
<alyssa>
keeping in mind that I have a difficult time thinking of a use case
<jannau>
alyssa: YUV support for dmabuf import, for example for HW video decoding
<alyssa>
jannau: For the fast path, ideally that skips the GPU completely and goes straight to a YUV overlay plane in the DCP
<alyssa>
For the !fast path where you go through the GPU, the time spent texturing (both reading from memory and in the actual sampler hardware) will eclipse the little bit of arithmetic needed to do a matrix multiply
<alyssa>
So likely the native YUV path will be no faster and only slightly more energy efficient
<alyssa>
(and of course, it doesn't reall matter how fast it is as long as it's above 60fps, it's vsynced)
<alyssa>
but again if you want energy efficient, skip the GPU blit entirely ...
<alyssa>
Admittedly I'm unsure if Linux userspace is set up for either YUV overlay planes or hw YUV textures
<jannau>
I agree that it's not a use case which must be supported from the beginning. the overlay path might not be available due to another video playing
<alyssa>
Ah right I forgot M1 is ridiculously limited in overlay planes ...
Leidenfrost has joined #asahi-gpu
Telvana has joined #asahi-gpu
<dottedmag>
alyssa: Wayland surfaces have the provision to do so, but it's up to compositor.
<alyssa>
dottedmag: right, I know the plumbing is there, just not sure if in practice it gets used (needs support from everyone: kernel, compositor, application, ..)
<jannau>
I'm unsure about the desktop linux userland status but both yuv overlays and yuv textures are used in embedded linux userland
<alyssa>
Got it
<alyssa>
Linux on iPod Nano when
<Leidenfrost>
Hi all
<Leidenfrost>
Question, will the GPU driver cover hardware video decode?
<jannau>
2 orders of magnitude larger "devices" and I don't think apple silicon will play a role there
<jannau>
Leidenfrost: no, that is a separate HW block on the SoC
<Leidenfrost>
I see
<TheLink>
I don't see much point in hardware video decoding if the cpu is so strong and has so many cores
<TheLink>
perhaps 4k in av1 but is that really a thing?
<TheLink>
besides av1 decoding in hardware isn't supported anyway
<Jamie[m]>
the decoder doesn’t do AV1
<alyssa>
Energy efficiency => battery life on the handhelds
<jannau>
hw decode is more energy efficient
<Jamie[m]>
but generally speaking hardware video decoding is much more energy efficient
<Jamie[m]>
uh yeah what they said
<Jamie[m]>
lol
<alyssa>
"How many hours of video can you watch on $DEVICE before the battery dies?" is a metric Apple cares about
<alyssa>
"hw vdec + display YUV overlay plane with no GPU in between and framebuffer compression in flight" is a winning strategy there
<Jamie[m]>
answer: an absurdly high amount
<Jamie[m]>
(they spec 21 hours for the 16” mbp)
<alyssa>
also for thermals, don't want your iThing getting absurdly hot bingewatching iCarly :p
<alyssa>
(although your iThing catching fire *would* be apt...)
<Jamie[m]>
the macos scheduler has input from the inbuilt smoke detector to migrate the process to icestorm if there’s too much fire happening
<Jamie[m]>
I gotta find more time to work on the AVD stuff
<Jamie[m]>
Who knew having a job doing computers all day makes you tired of doing computers
<alyssa>
Jamie[m]: same hat
<sven>
that's why you take vacations to work on hobby projects :-P
<sven>
but yeah, same otherwise
<TheLink>
the greatest advantage of hw video decoding is only with safari afaik
<TheLink>
last info I read said that chrome uses 2x the energy for hw decoding
<TheLink>
but we're talking linux here, so I guess you might have a real point
<dottedmag>
alyssa: huh, can hw video decoder on m1 produce directly yuv compressed framebuffer?
<lina>
alyssa: I don't think computing those matrices is a big deal. Worst case if we don't have the time to figure out the exact rounding, we guess. Nobody's going to complain about the YUV conversions being off by 1 lsb with everything else going on...
<marcan>
dottedmag: I don't know if compressed YUV is a thing Apple does, but if it is the HW vdec would definitely support it, since it's the primary use case for that.
<marcan>
Also there's a big scaler block that can probably convert anything to anything
<jannau>
dottedmag: not sure if we have the decoder already RE-ed enough to say that but I would expect that it supports framebuffer compression, the mjpeg decoder does
<marcan>
Jamie[m], sven: this is why I made my job doing computers all da- wait
<marcan>
< TheLink> the greatest advantage of hw video decoding is only with safari afaik <- only because nobody else supports it properly
<marcan>
chrome does not support the overlay surface offload on macOS so it keeps the GPU turned on
<marcan>
safari literally shuts down the GPU while playing video
<TheLink>
yeah, some said that drm is the reason google does it differently but I have no idea
<TheLink>
doesn't sound too plausible to me
<marcan>
google could still do it properly for non-DRMed content so that doesn't really add up
<Jamie[m]>
vague memory of suggestions that it supported a yuv compressed format last time i looked at it
<jannau>
I would expect that DRM use cases are easier to support without GPU, assuming there is platform support for it
<Jamie[m]>
the framebuffers coming out when ffmpeg drives it are regular yuv though
<Jamie[m]>
(coming out as in, they’re what i dump from the IOMMU mapped area)
<marcan>
it would support both regular and compressed, of course
<marcan>
and the compressed stuff is probably not in public APIs
<Jamie[m]>
yeah, iirc there’s a webkit header file that defines the relevant VT format constant, but it’s not in VT public headers themselves
<marcan>
jannau: I'm not sure if apple actually implements anything like PAVP; my suspicion is they just rely on kernel integrity and the actual DRM key management/decrypting is in SEP and it'll refuse to do it if you've downgraded security
<daniels>
^ most common approach
<Jamie[m]>
iirc there is an AVD command to feed it a decryption key
<Jamie[m]>
so the actual decryption is in the decoder block
<Jamie[m]>
but otherwise i think that’s right
<lina>
daniels: Since you're around... I'd love to sit down and get your opinion on this wonderfully weird AGX GPU one of these days. Stuff related to the kernel UABI once we get there, and right now I'm wondering if you have any takes on how to name things.
<lina>
(Since we get to pick all the names here, and I don't have experience with other GPUs so I don't know what's recognizable to others)