#asahi-dev on 2022-04-03 — irc logs at oftc.irclog.whitequark.org

2022-03-22 11:58 ChanServ changed the topic of #asahi-dev to: Asahi Linux: porting Linux to Apple Silicon macs | General development | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Logs: https://alx.sh/l/asahi-dev

00:01 Ph03n1ks has quit [Remote host closed the connection]

00:02 doggkruse has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

00:03 doggkruse has joined #asahi-dev

00:04 Ph03n1ks has joined #asahi-dev

00:19 doggkruse has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

00:47 Ph03n1ks has quit [Remote host closed the connection]

00:48 chadmed has joined #asahi-dev

00:54 <chadmed> rqou_: could you also try 4:2:2 YUV if ffmpeg supports it? it's the prores native pixel format

00:56 <chadmed> i would be surprised if the hardware block could process raw BGRA pixel data since no version of prores uses this

00:57 <chadmed> even ProRes 4444 is YUV with alpha

01:00 <tpw_rules> apple definitely says prores 4444 supports rgba

01:00 <tpw_rules> i wonder if the order matters

01:06 akemin_dayo has joined #asahi-dev

01:08 <chadmed> oh ffmpeg doesnt support encoding yet it seems

01:08 <chadmed> but it worked with yuv?

01:17 Ph03n1ks has joined #asahi-dev

01:17 Ph03n1ks has quit [Read error: Connection reset by peer]

01:21 Ph03n1ks has joined #asahi-dev

01:30 <chadmed> hm both rgba and bgra work fine for me

01:30 <chadmed> taking a 1080p H.264 video and encoding it into prores using the hardware goes at about 500fps

01:33 <chadmed> that does seem slow considering apple's performance claims though

01:33 <tpw_rules> are you sure you're not bottlenecked on decoding the input

01:34 bisko has quit [Read error: Connection reset by peer]

01:36 <chadmed> huh, ffmpeg doesnt report any videotoolbox decoders

01:36 <kode54> brew ffmpeg doesn't enable those by default

01:37 <kode54> I don't think

01:37 <chadmed> says it was built with --enable-videotoolbox and the encoders show up

01:37 <kode54> oh, hmm

01:37 <kode54> I think you have to specify the decoders manually, otherwise it prefers its built in ones

01:37 <kode54> hmm

01:38 <chadmed> yeah i just grepped the list of decoders and none of them say videotoolbox

01:38 <chadmed> the ffmpeg site says that its only implemented "internally" for decode

01:38 <chadmed> so i guess libavcodec is meant to select it automatically or something?

01:38 <chadmed> oh hang on

01:38 <kode54> aac_at

01:38 <kode54> etc

01:38 bisko has joined #asahi-dev

01:39 <kode54> all the _at codecs are audiotoolbox

01:43 <chadmed> lmao apparently the hardware h264 decoder is slower than the cpu

01:43 <chadmed> -hwaccel auto results in slower performance than using the cpu

01:46 <chadmed> i think this is probably something funky with ffmpeg or my machine, but at least we know the pixel formats actually do work

01:47 <chadmed> rqou_: what options did you pass to ffmpeg to make it not work?

01:47 <tpw_rules> is prores's bitstream documented anywhere?

01:52 <chadmed> link gore incoming

01:52 <chadmed> https://sourceforge.net/p/mediainfo/discussion/297610/thread/4bc21b18/ee82/2d5e/attachment/SMPTE%20ProRes%20rdd36-2015.pdf

01:53 <nicolas17> chadmed: sometimes hwaccel is slow because it needs to copy back to RAM, but that shouldn't be the case here, especially on M1

01:53 <chadmed> yeah its behaving weirdly

01:53 <chadmed> its about a quarter of the speed of the cpu lmao

01:53 <nicolas17> are you encoding to prores?

01:54 <chadmed> im decoding an h264 stream and encoding that into prores yeah

01:54 <chadmed> with the cpu doing the h264 decoding it was about 500fps

01:54 <tpw_rules> ah, it uses golomb coding
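
For readers unfamiliar with the Golomb family of codes mentioned here, a minimal Python sketch of order-0 exp-Golomb coding follows. It only illustrates the general idea (small values get short codewords); it is not the actual ProRes codebook, and the function names are made up for this example.

```python
def exp_golomb_encode(n: int) -> str:
    """Return the order-0 exp-Golomb codeword for a non-negative integer as a bit string."""
    if n < 0:
        raise ValueError("exp-Golomb codes are defined for non-negative integers")
    value = n + 1
    width = value.bit_length()
    # (width - 1) leading zeros announce how many bits follow the first 1.
    return "0" * (width - 1) + format(value, "b")


def exp_golomb_decode(bits: str) -> int:
    """Decode a single order-0 exp-Golomb codeword from the start of a bit string."""
    zeros = 0
    while bits[zeros] == "0":
        zeros += 1
    # The codeword body is (zeros + 1) bits long, starting at the first 1.
    value = int(bits[zeros:zeros + zeros + 1], 2)
    return value - 1


if __name__ == "__main__":
    for n in range(6):
        print(n, exp_golomb_encode(n))
```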

01:54 <chadmed> using the h264 hardware decoder its about 160fps

01:54 * tpw_rules has been doing some work on lossless video compression

01:55 <nicolas17> chadmed: try "-f null -" for the output so you're *only* testing decoding

01:57 <chadmed> yeah same speed

01:58 <chadmed> the prores decoder operates much much faster though

01:58 Ph03n1ks has quit [Remote host closed the connection]

01:58 <chadmed> so maybe the h264 and hevc blocks are just slow?

01:58 Ph03n1ks has joined #asahi-dev

01:59 <tpw_rules> i mean 160fps sounds like it's fast enough

01:59 <chadmed> yeah of course it doesnt need to be any faster

01:59 <chadmed> it was just curious that the cpu was faster

02:00 <chadmed> though that is 160fps for a pretty low bitrate 1080p video stream

02:01 <chadmed> copying over something a bit more demanding now

02:03 <nicolas17> is cpu decoding using multiple threads?

02:04 <nicolas17> maybe the h264 hardware is significantly more power-efficient, but if you're doing a transcode you'd likely want best speed so that's good to know...

02:05 <chadmed> it almost certainly is more power efficient, i just find it odd that its many times slower in absolute terms too

02:06 <chadmed> especially given the use cases for these machines

02:06 <nicolas17> also maybe if you want all your CPU cores to be dedicated to encoding, pushing the decoder to hardware is also a speed win? idk

02:06 <chadmed> it makes sense to throttle these sorts of things in a mobile soc because youre only ever going to be watching a single stream of something on those devices

02:07 <chadmed> but apple heavily markets these things to "creators" who might want fast h264 encode for stuff like youtube

02:07 <nicolas17> well you weren't testing h264 *encode*, right?

02:07 <chadmed> i did test it

02:07 <chadmed> its the same speed

02:07 Ph03n1ks has quit [Ping timeout: 480 seconds]

02:08 <chadmed> im copying over some more stressful files to poke around with

02:10 <chadmed> 50fps on a 4K HDR10 stream

02:10 <chadmed> speed doesnt seem to change when forcing lower or higher bitrates too

02:11 <nicolas17> any signs of a hard cap? (like decode never going over 160fps at all)

02:11 <nicolas17> like it's throttling rather than being inherently slow

02:13 <chadmed> yeah it sort of ramps up to that speed and then stays there

02:13 <chadmed> encoding to null does not make it faster either

02:13 <nicolas17> what's ffmpeg CPU usage like?

02:14 <nicolas17> if it's >= 1 core, it may be bottlenecking doing something dumb on the CPU...

02:15 <chadmed> nah its behaving threaded

02:15 <chadmed> ~60% on two firestorm cores

02:15 <chadmed> it would make sense if the hardware blocks are just throttled imo

02:15 <chadmed> like i said its something youd definitely do on a mobile device and if they just plonked the blocks down into these chips with no changes...

02:16 <chadmed> the prores block, which is brand new, is much much much faster

02:16 <nicolas17> mhm

02:16 <chadmed> decoding the HEVC stream, encoding in prores to null is about 300fps

02:16 Ph03n1ks has joined #asahi-dev

02:16 <chadmed> the cynic in me says apple is pushing people into using prores

02:17 <chadmed> the bloomer in me wants to say they just "forgot" to take off the throttle in the firmware or something :P

02:17 <nicolas17> speaking of mobile devices, I wonder if there's anyone interested / crazy enough to port asahi to A10 or something :p

02:17 <tpw_rules> how fast is h264 encoding

02:18 <chadmed> never goes above 53fps on that 4K HDR10 stream

02:18 <tpw_rules> so it's basically symmetric. interesting

02:18 <tpw_rules> in fairness, for a given level of effort/silicon/whatever, prores will be far easier to encode

02:19 <tpw_rules> it looks to be just mjpeg with extra metadata and a different entropy coding stage

02:21 <chadmed> yeah

02:21 <chadmed> the hevc and h264 blocks are definitely behaving like theyre limited in some way

02:22 <chadmed> they are capable of decoding fast enough, since encoding into prores is fast

02:22 <chadmed> but then why was software decoding of that same stream like 500fps?

02:23 <chadmed> questions to answer after lunch maybe

02:23 <nicolas17> it could also be ffmpeg's fault somehow?

02:23 <chadmed> yeah its possible

02:24 <chadmed> would kind of ruin my plans to buy an m2 mac mini and use it as a server if that's the case :P

02:24 <chadmed> it could just be a macos thing too, once we have access to these blocks from linux we might be able to shed some light on the situation

02:25 <chadmed> i really just wanted to confirm my theory about the pixel format and got super carried away (spectrum moment)

02:25 <nicolas17> >.>

02:29 riceballnice[m] has joined #asahi-dev

02:29 Ph03n1ks has quit [Ping timeout: 480 seconds]

02:47 Ph03n1ks has joined #asahi-dev

03:00 PhilippvK has joined #asahi-dev

03:04 phiologe has quit [Ping timeout: 480 seconds]

03:25 kov has quit [Quit: Coyote finally caught me]

03:25 kov has joined #asahi-dev

03:31 Ph03n1ks has quit [Remote host closed the connection]

03:31 bisko has quit [Read error: Connection reset by peer]

03:33 Ph03n1ks has joined #asahi-dev

03:35 doggkruse has joined #asahi-dev

03:36 bisko has joined #asahi-dev

03:41 Ph03n1ks has quit [Ping timeout: 480 seconds]

03:50 Ph03n1ks has joined #asahi-dev

03:54 doggkruse has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

04:01 Ph03n1ks has quit [Ping timeout: 480 seconds]

04:10 Ph03n1ks has joined #asahi-dev

04:51 Ph03n1ks has quit [Remote host closed the connection]

04:51 Ph03n1ks has joined #asahi-dev

04:55 alcazar has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

04:56 doggkruse has joined #asahi-dev

04:59 <chadmed> https://gist.github.com/chadmed/f2ab8370056e84a6a9a4b9e226a93302

04:59 <chadmed> looks to me like the AVCHD and HEVC decoding blocks are indeed limited

04:59 <chadmed> i find it odd that theyre slower than 60fps, im not entirely convinced

05:03 Ph03n1ks has quit [Remote host closed the connection]

05:04 Ph03n1ks has joined #asahi-dev

05:08 <rqou_> chadmed: working: ffmpeg -loglevel debug -y -r 60 -f image2 -s 1920x1080 -i test.png -vcodec prores_videotoolbox -pix_fmt yuv420p test.mov

05:08 <rqou_> not working: replace yuv420p with bgra

05:08 <rqou_> this is with ffmpeg git as of this morning-ish (US pacific time)

05:09 <chadmed> huh odd, bgra8888, bgra10101010 and bgra12121212 all worked for me on the prores block

05:10 <rqou_> ¯\_(ツ)_/¯ it's probably a ffmpeg problem

05:10 <Jamie[m]1> fyi there are multiple decoders inside the AVD for HEVC and H.264 (but only one for VP9), maybe the way to get 60fps is to somehow feed multiple blocks at a time, and ffmpeg isn't successfully doing that?

05:10 <rqou_> i got it to encode one frame at least one particular way, which is enough to get started on the RE effort

05:11 <rqou_> eventually i might want to figure out how to work the lowest-level macos interface though

05:11 <rqou_> but the lowest-level iokit interface is undocumented afaict?

05:11 <chadmed> Jamie[m]1: ffmpeg just passes the data through VideoToolbox so if its not fully utilising the hardware that would be an apple thing, right?

05:13 <chadmed> i might see if i have any high framerate video lying around i can feed it. since all the framerates are still above realtime theres still one theory id like to eliminate

05:14 <chadmed> it could be possible that it limits itself to some number above realtime so that video plays back and the buffer fills up at a decent speed, but no unnecessary power is used

05:15 <chadmed> but then why would the 1080p video i used decode at like 200fps

05:19 <chadmed> i mean we could also just apply occams razor and conclude that the encoders are not that fast

05:19 <Jamie[m]1> oh sorry I misread that we were talking about decoders

05:19 <Jamie[m]1> yeah that makes way more sense

05:19 <Jamie[m]1> sweet

05:20 <chadmed> yeah the decoders are fast

05:20 <chadmed> the gist i posted above has the figures, but i can transcode an HDR10 4K HEVC stream to ProRes at 300+ fps

05:20 <chadmed> using hardware decode

05:21 <chadmed> but if i then try to encode that to H.264 or HEVC the framerate tanks

05:21 <chadmed> so its just the encoders that seem limited

05:28 bisko has quit [Read error: Connection reset by peer]

05:30 bisko has joined #asahi-dev

05:33 r0ni has quit [Quit: Textual IRC Client: www.textualapp.com]

05:42 doggkruse has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

06:13 Ph03n1ks has quit [Remote host closed the connection]

06:17 doggkruse has joined #asahi-dev

06:18 the_lanetly_052 has joined #asahi-dev

06:20 veloek has joined #asahi-dev

06:30 doggkruse has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

06:45 <abrasive> seems logical, doesn't it? encoding is Hard, lotta decisions to make

06:47 <abrasive> chadmed: sorry if I missed something earlier, but did you notice your hw/sw results for hevc and x264 are bang on identical?

06:49 <chadmed> yeah which is why i think it's the encoders that are slow

06:51 <abrasive> gotcha :) sorry, bit slow today

06:52 <chadmed> so am i, it's sunday :-P

06:52 <chadmed> idk why i said decoders in my initial message

06:52 <chadmed> i did mean the encoders

07:12 chadmed has quit [Remote host closed the connection]

07:21 veloek_ has joined #asahi-dev

07:22 veloek has quit [Ping timeout: 480 seconds]

07:23 veloek_ has quit [Remote host closed the connection]

07:26 bisko has quit [Read error: Connection reset by peer]

07:28 bisko has joined #asahi-dev

07:29 chadmed has joined #asahi-dev

07:42 nicolas17 has quit [Ping timeout: 480 seconds]

07:59 chadmed has quit [Remote host closed the connection]

08:00 chadmed has joined #asahi-dev

08:09 chadmed has quit [Remote host closed the connection]

08:09 chadmed has joined #asahi-dev

08:18 MajorBiscuit has joined #asahi-dev

08:21 Ph03n1ks has joined #asahi-dev

08:22 Ph03n1ks has quit []

08:50 chadmed has quit [Ping timeout: 480 seconds]

08:56 bisko has quit [Read error: Connection reset by peer]

08:57 bisko has joined #asahi-dev

09:10 MajorBiscuit has quit [Ping timeout: 480 seconds]

09:37 herbas has joined #asahi-dev

10:19 MajorBiscuit has joined #asahi-dev

10:33 <marcan> I suspect rcombs might be interested in this conversation

10:34 <marcan> but also encoding HEVC/h264 is, afaik, *much* more compute-intensive than prores (especially HEVC)

10:35 <marcan> also IIRC rcombs said that the h264/hevc blocks work better parallelizing, so you need to have multiple streams going on simultaneously to get the full throughput, but that might be mostly decode?

10:35 Guest1043 has joined #asahi-dev

10:40 <jannau> h264/h265 encoding complexity also depends massively on the encoding options

10:42 MajorBiscuit has quit [Ping timeout: 480 seconds]

10:43 <jannau> I haven't looked at the interface apple offers but increasing the number of reference frames obviously makes the encoding more complex

10:44 <jannau> doubling it means the encoder has to check twice as many frames to find which block is a good predictor for the current block
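
The scaling described here can be sketched with a back-of-the-envelope count, assuming a naive exhaustive block search. The block size, search range, and function name below are illustrative assumptions; real encoders, hardware ones included, use fast search strategies and early termination, so this is only an upper-bound style estimate.

```python
def candidate_comparisons(width: int, height: int, ref_frames: int,
                          block: int = 16, search_range: int = 8) -> int:
    """Count block comparisons for an exhaustive motion search over one frame.

    Assumes square blocks and a square search window of
    +/- search_range pixels in every reference frame (hypothetical parameters).
    """
    blocks = (width // block) * (height // block)
    positions_per_ref = (2 * search_range + 1) ** 2
    return blocks * positions_per_ref * ref_frames


if __name__ == "__main__":
    one = candidate_comparisons(1920, 1080, ref_frames=1)
    two = candidate_comparisons(1920, 1080, ref_frames=2)
    print(one, two, two / one)
```

Under these assumptions the work grows linearly with the number of reference frames, which matches the "twice as many frames" point.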

10:54 bisko has quit [Read error: Connection reset by peer]

10:55 bisko has joined #asahi-dev

11:04 amarioguy has joined #asahi-dev

11:04 <amarioguy> Just a quick question, m1n1 runs with the MMU disabled right?

11:05 <sven> no

11:06 <sven> enabling the MMU is one of the first things it does

11:06 amarioguy has quit [Remote host closed the connection]

11:10 <j`ey> and disabling it is one of the last things it does!

11:51 <marcan> you can disable it and it'll still run though, if need be, but it slows down a lot and some things won't work in that state

11:52 gladiac has joined #asahi-dev

11:52 kameks has joined #asahi-dev

11:59 jluthra has quit [Remote host closed the connection]

12:00 jluthra has joined #asahi-dev

12:08 <jannau> sigh, the new reserved memory/iommu mapping devicetree bindings proposed by Thierry are a pain to handle in m1n1

12:22 bisko has quit [Read error: Connection reset by peer]

12:27 dbldthe3ed[m] has joined #asahi-dev

12:27 bisko has joined #asahi-dev

12:37 herbas has quit [Quit: herbas]

12:41 the_lanetly_052 has quit [Ping timeout: 480 seconds]

12:53 veloek has joined #asahi-dev

12:54 <sven> alright, first draft for nvme v2: https://github.com/AsahiLinux/linux/commits/nvme-v2

12:54 <sven> note that there are some rtkit api changes

12:57 <j`ey> is it 3 power domains for all t6000, or just the 4T models?

12:57 <sven> hrm, good point. i assumed it was for all of them

12:57 <sven> do we even have different device trees for the 4T models?

12:58 <j`ey> probably not actually. was just a passing thought

12:59 <j`ey> oh nice if/then/else in yaml

13:00 <jannau> no, we have no way to decide that on the devicetree level. if anything m1n1 should remove the unneeded power domain

13:01 <sven> requiring three inside the binding sounds correct to me then

13:04 <marcan> it's 3 power domains for all of them, it's just that one power domain can be turned off on <4T but I don't think apple does at the OS level and neither should we

13:05 <jannau> sven: buffer->iova is not set in the dma_alloc_coherent() case in apple_rtkit_common_rx_get_buffer

13:06 <sven> oh, true

13:08 <sven> let me just get rid of those local variables

13:10 <jannau> not sure if it is wise to leave buffer partially initialized on errors. we check bfr->size in apple_rtkit_free_buffer()

13:12 <sven> yeah, probably a good idea to clean it up

13:32 MajorBiscuit has joined #asahi-dev

14:19 bisko has quit [Read error: Connection reset by peer]

14:21 bisko has joined #asahi-dev

14:45 gladiac is now known as Guest1106

14:45 gladiac has joined #asahi-dev

14:51 Guest1106 has quit [Ping timeout: 480 seconds]

14:53 themojoman_ has joined #asahi-dev

14:53 <kevans91> anyone know off-hand if u-boot can be convinced to run a UEFI m1n1-attached payload? I've already repurposed 'initramfs' for passing our kernel

14:54 <j`ey> You want u-boot to run m1n1?

14:55 <kevans91> no, I want m1n1 to execute u-boot and pass it a UEFI payload to execute as the next stage

14:57 <kevans91> i'm hacking our loader to take an "initramfs", chop off the cpio magic then execute that, but there's problems there and I still have to pass a flash drive back and forth to hack on our loader

14:57 <kevans91> so it'd be quite neat if I could further eliminate that step and pass both loader (uefi payload) and kernel (elf) along
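
The "fake cpio" trick described here might look roughly like the Python sketch below. It assumes the wrapper is nothing more than the 6-byte ASCII magic of the cpio "newc" format (`070701`) prepended to the real payload; the actual preprocessing used for the FreeBSD loader is not shown in the log, and the helper names are hypothetical.

```python
# ASCII magic of the cpio "newc" format; assumed here to be the only wrapper.
CPIO_NEWC_MAGIC = b"070701"


def wrap_as_fake_cpio(payload: bytes) -> bytes:
    """Prefix a payload with the cpio magic so it passes as an 'initramfs'."""
    return CPIO_NEWC_MAGIC + payload


def unwrap_fake_cpio(blob: bytes) -> bytes:
    """Strip the cpio magic again and return the original payload."""
    if not blob.startswith(CPIO_NEWC_MAGIC):
        raise ValueError("blob does not start with the cpio newc magic")
    return blob[len(CPIO_NEWC_MAGIC):]


if __name__ == "__main__":
    kernel = b"\x7fELF" + b"\x00" * 12  # stand-in bytes, not a real kernel
    assert unwrap_fake_cpio(wrap_as_fake_cpio(kernel)) == kernel
```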

15:01 <j`ey> you could look at src/payload.c, a while ago I hacked in support locally for u-boot fit images, you might be able to do similar. and maybe pass the address in the device tree somehow

15:02 <j`ey> what is 'our loader' in this case?

15:03 <sven> there’s some hack in linux.py to chainload uboot and a kernel together

15:03 <sven> it’s not nice and maybe doesn’t even work anymore at this point

15:03 themojoman_ has quit [Quit: Page closed]

15:04 <kevans91> freebsd's loader

15:05 <j`ey> ohh yeah, I forgot about that in linux.py, since I just ended up hardcoding BOOTCOMMAND

15:06 <kevans91> ooh, that's interesting

15:07 <kevans91> i might be able to hack this into submission

15:07 kameks has quit [Ping timeout: 480 seconds]

15:32 <kevans91> sweet, with a proper freebsd.py I should be able to stop preprocessing our kernel to fake cpio

15:58 c10l53 has quit []

15:58 <marcan> yeah, for development feel free to put together a fbsd-specific loader script and submit that for upstreaming; I'm happy to carry whatever as long as it works for you

15:58 <marcan> I expect different OSes to have different loader requirements for development

15:59 <marcan> for production of course I'd rather everyone use UEFI but it sounds like you're already doing that

15:59 c10l53 has joined #asahi-dev

16:00 <marcan> we actually need to make sure that u-boot+linux thing still works / is a bit saner in the near future as we start depending on u-boot to provide PSCI over EFI services

16:02 <kevans91> nice, thanks for considering it upstreamable. :-) I'm trying to do minimal hacking to m1n1/u-boot to minimize the patches I need to send the other developer working on freebsd support (or anyone else I can get to engage)

16:02 <kevans91> final setup will use standard u-boot efi, mostly just hacking together something for quicker development iteration

16:04 nicolas17 has joined #asahi-dev

16:14 <marcan> basically all the python stuff is for development and we're pretty loose with what goes in, standalone scripts to achieve something useful for someone are fine, that's kind of the whole point

16:17 bisko has quit [Read error: Connection reset by peer]

16:21 bisko has joined #asahi-dev

16:40 doggkruse has joined #asahi-dev

16:43 RenatoMarinho[m] has joined #asahi-dev

17:26 bisko has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

17:26 <kevans91> ah, that's unfortunate. u-boot really wants you to `load` it before you can `bootefi` it, so I might have to switch loaders just for testing

17:26 <kevans91> or i guess custom u-boot isn't the end of the world

17:27 bisko has joined #asahi-dev

17:27 <tpw_rules> if that's the case can you just put your image in a well-known place in memory

17:29 <kevans91> the alternative seems to be FIT + bootm, which looks like it will recognize the preloaded FIT image and do the right thing

17:30 <kevans91> my image is currently in a well-known place, but there's no way to point u-boot at said place and fake loaded image information, I guess

17:31 <tpw_rules> i didn't think `load` did anything except copy some bits off disk to memory

17:31 <kevans91> it also sets the efi bootdev, which is the side effect that I need to make this work

17:31 <tpw_rules> i think bootefi does that

17:34 <kevans91> nope, apparently not. :-( it checks the address you feed it against a prior efi_set_bootdev()

17:35 <tpw_rules> weird

17:50 MajorBiscuit has quit [Ping timeout: 480 seconds]

18:14 bisko has quit [Read error: Connection reset by peer]

18:17 bisko has joined #asahi-dev

18:21 veloek has quit [Quit: leaving]

18:25 <kevans91> it's interesting because there's hints that it used to work. efi_run_image() will explicitly use a memory device path if one's not set, under the assumption that it, e.g., may be loaded directly into memory via JTAG

18:25 <kevans91> maybe they'll accept a patch to let it work again

18:48 <kevans91> woot, it's alive. the only concerning part of linux.py's uboot stuff might be the use of dtb_addr

18:49 <kevans91> the prepared dt's not at that address, not sure it's necessarily intended to be passing on the unadulterated input fdt

18:50 <kevans91> maybe all of the important stuff is already covered by bootargs

18:55 <sven> hrm, no, it should get the modified fdt

18:56 <sven> i guess back when I wrote that hack we either modified the fdt in place or there just weren’t any important modifications yet

18:57 <kevans91> it kind of looks like you can just omit the fdt address from booti and it'll use the modified one, anyways

18:57 <j`ey> or use $fdtcontroladdr

18:58 <sven> ah, nice

19:01 <kevans91> on the plus side, my freebsd abomination works with a somewhat small u-boot change

19:02 <kevans91> the downside is we don't get any of our loader scripts, but having to manually 'load' then 'boot' is still a huge improvement in workflow

19:29 bisko has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

19:53 bisko has joined #asahi-dev

20:11 bisko has quit [Read error: Connection reset by peer]

20:13 bisko has joined #asahi-dev

21:15 bisko has quit [Quit: Textual IRC Client: www.textualapp.com]

22:05 yrlf has joined #asahi-dev

22:10 aw213mf3f8[m] has joined #asahi-dev

22:52 c10l53 has quit []

22:54 c10l53 has joined #asahi-dev