<nicolas17>
chadmed: sometimes hwaccel is slow because it needs to copy back to RAM, but that shouldn't be the case here, especially on M1
<chadmed>
yeah its behaving weirdly
<chadmed>
its about a quarter of the speed of the cpu lmao
<nicolas17>
are you encoding to prores?
<chadmed>
im decoding an h264 stream and encoding that into prores yeah
<chadmed>
with the cpu doing the h264 decoding it was about 500fps
<tpw_rules>
ah, it uses golomb coding
<chadmed>
using the h264 hardware decoder its about 160fps
* tpw_rules
has been doing some work on lossless video compression
<nicolas17>
chadmed: try "-f null -" for the output so you're *only* testing decoding
<chadmed>
yeah same speed
<chadmed>
the prores decoder operates much much faster though
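A minimal sketch of the decode-only comparison nicolas17 suggests, assuming an ffmpeg build with VideoToolbox support; the input file name is a placeholder, not from the log:

    ffmpeg -benchmark -i clip1080p.mp4 -f null -                        # software decode only
    ffmpeg -benchmark -hwaccel videotoolbox -i clip1080p.mp4 -f null -  # hardware decode only

"-f null -" discards the decoded frames so the reported fps reflects decode speed alone, and "-benchmark" additionally prints CPU and wall-clock time so the two runs can be compared directly.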
Ph03n1ks has quit [Remote host closed the connection]
<chadmed>
so maybe the h264 and hevc blocks are just slow?
Ph03n1ks has joined #asahi-dev
<tpw_rules>
i mean 160fps sounds like it's fast enough
<chadmed>
yeah of course it doesnt need to be any faster
<chadmed>
it just seemed curious that the cpu was faster
<chadmed>
though that is 160fps for a pretty low bitrate 1080p video stream
<chadmed>
copying over something a bit more demanding now
<nicolas17>
is cpu decoding using multiple threads?
<nicolas17>
maybe the h264 hardware is significantly more power-efficient, but if you're doing a transcode you'd likely want best speed so that's good to know...
<chadmed>
it almost certainly is more power efficient, i just find it odd that its many times slower in absolute terms too
<chadmed>
especially given the use cases for these machines
<nicolas17>
also maybe if you want all your CPU cores to be dedicated to encoding, pushing the decoder to hardware is also a speed win? idk
<chadmed>
it makes sense to throttle these sorts of things in a mobile soc because youre only ever going to be watching a single stream of something on those devices
<chadmed>
but apple heavily markets these things to "creators" who might want fast h264 encode for stuff like youtube
<nicolas17>
well you weren't testing h264 *encode*, right?
<chadmed>
i did test it
<chadmed>
its the same speed
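The encode test referred to here would look roughly like this, as a sketch with a placeholder input file and ffmpeg's VideoToolbox H.264 encoder:

    ffmpeg -benchmark -i clip1080p.mp4 -c:v h264_videotoolbox -f null -

Decoding stays in software in this form, so the measured fps should be dominated by the hardware encoder.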
Ph03n1ks has quit [Ping timeout: 480 seconds]
<chadmed>
im copying over some more stressful files to poke around with
<chadmed>
50fps on a 4K HDR10 stream
<chadmed>
speed doesnt seem to change when forcing lower or higher bitrates too
<nicolas17>
any signs of a hard cap? (like decode never going over 160fps at all)
<nicolas17>
like it's throttling rather than being inherently slow
<chadmed>
yeah it sort of ramps up to that speed and then stays there
<chadmed>
encoding to null does not make it faster either
<nicolas17>
what's ffmpeg CPU usage like?
<nicolas17>
if it's >= 1 core, it may be bottlenecking doing something dumb on the CPU...
<chadmed>
nah its behaving threaded
<chadmed>
~60% on two firestorm cores
<chadmed>
it would make sense if the hardware blocks are just throttled imo
<chadmed>
like i said its something youd definitely do on a mobile device and if they just plonked the blocks down into these chips with no changes...
<chadmed>
the prores block, which is brand new, is much much much faster
<nicolas17>
mhm
<chadmed>
decoding the HEVC stream, encoding in prores to null is about 300fps
Ph03n1ks has joined #asahi-dev
<chadmed>
the cynic in me says apple is pushing people into using prores
<chadmed>
the bloomer in me wants to say they just "forgot" to take off the throttle in the firmware or something :P
<nicolas17>
speaking of mobile devices, I wonder if there's anyone interested / crazy enough to port asahi to A10 or something :p
<tpw_rules>
how fast is h264 encoding
<chadmed>
never goes above 53fps on that 4K HDR10 stream
<tpw_rules>
so it's basically symmetric. interesting
<tpw_rules>
in fairness, for a given level of effort/silicon/whatever, prores will be far easier to encode
<tpw_rules>
it looks to be just mjpeg with extra metadata and a different entropy coding stage
<chadmed>
yeah
<chadmed>
the hevc and h264 blocks are definitely behaving like theyre limited in some way
<chadmed>
they are capable of decoding fast enough, since encoding into prores is fast
<chadmed>
but then why was software decoding of that same stream like 500fps?
<chadmed>
questions to answer after lunch maybe
<nicolas17>
it could also be ffmpeg's fault somehow?
<chadmed>
yeah its possible
<chadmed>
would kind of ruin my plans to buy an m2 mac mini and use it as a server if that's the case :P
<chadmed>
it could just be a macos thing too, once we have access to these blocks from linux we might be able to shed some light on the situation
<chadmed>
i really just wanted to confirm my theory about the pixel format and got super carried away (spectrum moment)
<nicolas17>
>.>
riceballnice[m] has joined #asahi-dev
Ph03n1ks has quit [Ping timeout: 480 seconds]
Ph03n1ks has joined #asahi-dev
PhilippvK has joined #asahi-dev
phiologe has quit [Ping timeout: 480 seconds]
kov has quit [Quit: Coyote finally caught me]
kov has joined #asahi-dev
Ph03n1ks has quit [Remote host closed the connection]
bisko has quit [Read error: Connection reset by peer]
Ph03n1ks has joined #asahi-dev
doggkruse has joined #asahi-dev
bisko has joined #asahi-dev
Ph03n1ks has quit [Ping timeout: 480 seconds]
Ph03n1ks has joined #asahi-dev
doggkruse has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
Ph03n1ks has quit [Ping timeout: 480 seconds]
Ph03n1ks has joined #asahi-dev
Ph03n1ks has quit [Remote host closed the connection]
<rqou_>
this is with ffmpeg git as of this morning-ish (US pacific time)
<chadmed>
huh odd, bgra8888, bgra10101010 and bgra12121212 all worked for me on the prores block
<rqou_>
¯\_(ツ)_/¯ it's probably a ffmpeg problem
<Jamie[m]1>
fyi there are multiple decoders inside the AVD for HEVC and H.264 (but only one for VP9), maybe the way to get 60fps is to somehow feed multiple blocks at a time, and ffmpeg isn't successfully doing that?
<rqou_>
i got it to encode one frame at least one particular way, which is enough to get started on the RE effort
<rqou_>
eventually i might want to figure out how to work the lowest-level macos interface though
<rqou_>
but the lowest-level iokit interface is undocumented afaict?
<chadmed>
Jamie[m]1: ffmpeg just passes the data through VideoToolbox so if its not fully utilising the hardware that would be an apple thing, right?
<chadmed>
i might see if i have any high framerate video lying around i can feed it. since all the framerates are still above realtime theres still one theory id like to eliminate
<chadmed>
it could be possible that it limits itself to some number above realtime so that video plays back and the buffer fills up at a decent speed, but no unnecessary power is used
<chadmed>
but then why would the 1080p video i used decode at like 200fps
<chadmed>
i mean we could also just apply occams razor and conclude that the encoders are not that fast
<Jamie[m]1>
oh sorry I misread that we were talking about decoders
<Jamie[m]1>
yeah that makes way more sense
<Jamie[m]1>
sweet
<chadmed>
yeah the decoders are fast
<chadmed>
the gist i posted above has the figures, but i can transcode an HDR10 4K HEVC stream to ProRes at 300+ fps
<chadmed>
using hardware decode
<chadmed>
but if i then try to encode that to H.264 or HEVC the framerate tanks
<chadmed>
so its just the encoders that seem limited
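For reference, the two cases being contrasted look roughly like this, as a sketch with a placeholder file name and assuming ffmpeg's VideoToolbox encoders (prores_videotoolbox / hevc_videotoolbox):

    ffmpeg -hwaccel videotoolbox -i hdr10_4k.mov -c:v prores_videotoolbox -f null -   # fast (~300 fps per the log)
    ffmpeg -hwaccel videotoolbox -i hdr10_4k.mov -c:v hevc_videotoolbox -f null -     # much slower per the log

Both use the same hardware decode path, so the difference points at the encoder side.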
bisko has quit [Read error: Connection reset by peer]
doggkruse has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
Ph03n1ks has quit [Remote host closed the connection]
doggkruse has joined #asahi-dev
the_lanetly_052 has joined #asahi-dev
veloek has joined #asahi-dev
doggkruse has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<abrasive>
seems logical, doesn't it? encoding is Hard, lotta decisions to make
<abrasive>
chadmed: sorry if I missed something earlier, but did you notice your hw/sw results for hevc and x264 are bang on identical?
<chadmed>
yeah which is why i think it's the encoders that are slow
<abrasive>
gotcha :) sorry, bit slow today
<chadmed>
so am i, it's sunday :-P
<chadmed>
idk why i said decoders in my initial message
<chadmed>
i did mean the encoders
chadmed has quit [Remote host closed the connection]
veloek_ has joined #asahi-dev
veloek has quit [Ping timeout: 480 seconds]
veloek_ has quit [Remote host closed the connection]
bisko has quit [Read error: Connection reset by peer]
bisko has joined #asahi-dev
chadmed has joined #asahi-dev
nicolas17 has quit [Ping timeout: 480 seconds]
chadmed has quit [Remote host closed the connection]
chadmed has joined #asahi-dev
chadmed has quit [Remote host closed the connection]
chadmed has joined #asahi-dev
MajorBiscuit has joined #asahi-dev
Ph03n1ks has joined #asahi-dev
Ph03n1ks has quit []
chadmed has quit [Ping timeout: 480 seconds]
bisko has quit [Read error: Connection reset by peer]
bisko has joined #asahi-dev
MajorBiscuit has quit [Ping timeout: 480 seconds]
herbas has joined #asahi-dev
MajorBiscuit has joined #asahi-dev
<marcan>
I suspect rcombs might be interested in this conversation
<marcan>
but also encoding HEVC/h264 is, afaik, *much* more compute-intensive than prores (especially HEVC)
<marcan>
also IIRC rcombs said that the h264/hevc blocks work better when parallelized, so you need to have multiple streams going simultaneously to get the full throughput, but that might be mostly decode?
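One hedged way to test the multiple-streams theory: run two independent hardware decodes at once and see whether the aggregate fps exceeds a single run (file names are placeholders):

    ffmpeg -benchmark -hwaccel videotoolbox -i a.mp4 -f null - &
    ffmpeg -benchmark -hwaccel videotoolbox -i b.mp4 -f null - &
    wait

If each instance still reports ~160 fps, total throughput roughly doubles and the per-stream figure looks more like a scheduling choice than a hard limit of the block.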
Guest1043 has joined #asahi-dev
<jannau>
h264/h265 encoding complexity also depends massively on the encoding options
MajorBiscuit has quit [Ping timeout: 480 seconds]
<jannau>
I haven't looked at the interface apple offers but increasing the number of reference frames obviously makes the encoding more complex
<jannau>
doubling it means the encoder has to check twice as many frames to see which block is a good predictor for the current block
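This isn't Apple's interface, but the cost of extra reference frames is easy to see with the software encoder; a sketch with a placeholder input, where -refs sets libx264's reference-frame count:

    ffmpeg -benchmark -i clip.mp4 -c:v libx264 -refs 1 -f null -
    ffmpeg -benchmark -i clip.mp4 -c:v libx264 -refs 8 -f null -

The second run does motion search against many more candidate frames, so encode fps typically drops even though the output format is identical.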
bisko has quit [Read error: Connection reset by peer]
bisko has joined #asahi-dev
amarioguy has joined #asahi-dev
<amarioguy>
Just a quick question, m1n1 runs with the MMU disabled right?
<sven>
no
<sven>
enabling the MMU is one of the first things it does
amarioguy has quit [Remote host closed the connection]
<j`ey>
and disabling it is one of the last things it does!
<marcan>
you can disable it and it'll still run though, if need be, but it slows down a lot and some things won't work in that state
gladiac has joined #asahi-dev
kameks has joined #asahi-dev
jluthra has quit [Remote host closed the connection]
jluthra has joined #asahi-dev
<jannau>
sigh, the new reserved memory/iommu mapping devicetree bindings proposed by Thierry are a pain to handle in m1n1
bisko has quit [Read error: Connection reset by peer]
dbldthe3ed[m] has joined #asahi-dev
bisko has joined #asahi-dev
herbas has quit [Quit: herbas]
the_lanetly_052 has quit [Ping timeout: 480 seconds]
<j`ey>
is it 3 power domains for all t6000, or just the 4T models?
<sven>
hrm, good point. i assumed it was for all of them
<sven>
do we even have different device trees for the 4T models?
<j`ey>
probably not actually. was just a passing thought
<j`ey>
oh nice if/then/else in yaml
<jannau>
no, we have no way to decide that on the devicetree level. if anything m1n1 should remove the unneeded power domain
<sven>
requiring three inside the binding sounds correct to me then
<marcan>
it's 3 power domains for all of them, it's just that one power domain can be turned off on <4T but I don't think apple does at the OS level and neither should we
<jannau>
sven: buffer->iova is not set in the dma_alloc_coherent() case in apple_rtkit_common_rx_get_buffer
<sven>
oh, true
<sven>
let me just get rid of those local variables
<jannau>
not sure if it is wise to leave buffer partially initialized on errors. we check bfr->size in apple_rtkit_free_buffer()
<sven>
yeah, probably a good idea to clean it up
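A rough C sketch of the fix being discussed, not the actual driver code: only the names mentioned in the conversation (buffer->iova, bfr->size, apple_rtkit_free_buffer, dma_alloc_coherent) come from the log; the struct layout and function shape here are assumptions.

    #include <linux/dma-mapping.h>
    #include <linux/gfp.h>
    #include <linux/string.h>

    /* assumed shape of the shared-memory descriptor */
    struct apple_rtkit_shmem {
            void *buffer;
            dma_addr_t iova;
            size_t size;
    };

    static int rtkit_alloc_buffer_sketch(struct device *dev,
                                         struct apple_rtkit_shmem *bfr,
                                         size_t size)
    {
            dma_addr_t iova;

            bfr->buffer = dma_alloc_coherent(dev, size, &iova, GFP_KERNEL);
            if (!bfr->buffer) {
                    /* don't leave the descriptor half-initialized on error:
                     * apple_rtkit_free_buffer() keys off bfr->size */
                    memset(bfr, 0, sizeof(*bfr));
                    return -ENOMEM;
            }

            bfr->size = size;
            bfr->iova = iova;  /* the assignment that was missing in the
                                  dma_alloc_coherent() path */
            return 0;
    }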
MajorBiscuit has joined #asahi-dev
bisko has quit [Read error: Connection reset by peer]
bisko has joined #asahi-dev
gladiac is now known as Guest1106
gladiac has joined #asahi-dev
Guest1106 has quit [Ping timeout: 480 seconds]
themojoman_ has joined #asahi-dev
<kevans91>
anyone know off-hand if u-boot can be convinced to run a UEFI m1n1-attached payload? I've already repurposed 'initramfs' for passing our kernel
<j`ey>
You want u-boot to run m1n1?
<kevans91>
no, I want m1n1 to execute u-boot and pass it a UEFI payload to execute as the next stage
<kevans91>
i'm hacking our loader to take an "initramfs", chop off the cpio magic then execute that, but there's problems there and I still have to pass a flash drive back and forth to hack on our loader
<kevans91>
so it'd be quite neat if I could further eliminate that step and pass both loader (uefi payload) and kernel (elf) along
<j`ey>
you could look at src/payload.c, a while ago I hacked in support locally for u-boot fit images, you might be able to do similar. and maybe pass the address in the device tree somehow
<j`ey>
what is 'our loader' in this case?
<sven>
there’s some hack in linux.py to chainload uboot and a kernel together
<sven>
it’s not nice and maybe doesn’t even work anymore at this point
themojoman_ has quit [Quit: Page closed]
<kevans91>
freebsd's loader
<j`ey>
ohh yeah, I forgot about that in linux.py, since I just ended up hardcoding BOOTCOMMAND
<kevans91>
ooh, that's interesting
<kevans91>
i might be able to hack this into submission
kameks has quit [Ping timeout: 480 seconds]
<kevans91>
sweet, with a proper freebsd.py I should be able to stop preprocessing our kernel to fake cpio
c10l53 has quit []
<marcan>
yeah, for development feel free to put together a fbsd-specific loader script and submit that for upstreaming; I'm happy to carry whatever as long as it works for you
<marcan>
I expect different OSes to have different loader requirements for development
<marcan>
for production of course I'd rather everyone use UEFI but it sounds like you're already doing that
c10l53 has joined #asahi-dev
<marcan>
we actually need to make sure that u-boot+linux thing still works / is a bit saner in the near future as we start depending on u-boot to provide PSCI over EFI services
<kevans91>
nice, thanks for considering it upstreamable. :-) I'm trying to do minimal hacking to m1n1/u-boot to minimize the patches I need to send the other developer working on freebsd support (or anyone else I can get to engage)
<kevans91>
final setup will use standard u-boot efi, mostly just hacking together something for quicker development iteration
nicolas17 has joined #asahi-dev
<marcan>
basically all the python stuff is for development and we're pretty loose with what goes in, standalone scripts to achieve something useful for someone are fine, that's kind of the whole point
bisko has quit [Read error: Connection reset by peer]
bisko has joined #asahi-dev
doggkruse has joined #asahi-dev
RenatoMarinho[m] has joined #asahi-dev
bisko has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<kevans91>
ah, that's unfortunate. u-boot really wants you to `load` it before you can `bootefi` it, so I might have to switch loaders just for testing
<kevans91>
or i guess custom u-boot isn't the end of the world
bisko has joined #asahi-dev
<tpw_rules>
if that's the case can you just put your image in a well-known place in memory
<kevans91>
the alternative seems to be FIT + bootm, which looks like it will recognize the preloaded FIT image and do the right thing
<kevans91>
my image is currently in a well-known place, but there's no way to point u-boot at said place and fake loaded image information, I guess
<tpw_rules>
i didn't think `load` did anything except copy some bits off disk to memory
<kevans91>
it also sets the efi bootdev, which is the side effect that I need to make this work
<tpw_rules>
i think bootefi does that
<kevans91>
nope, apparently not. :-( it checks the address you feed it against a prior efi_set_bootdev()
<tpw_rules>
weird
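The flow u-boot expects, roughly; device, partition and path here are placeholders. `load` records the source device via efi_set_bootdev(), and `bootefi` then checks the address it is given against that record:

    load usb 0:1 ${loadaddr} efi/boot/bootaa64.efi
    bootefi ${loadaddr} ${fdtcontroladdr}

Starting from an image m1n1 already placed in memory skips the `load` step entirely, which is exactly why the bootdev check gets in the way here.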
MajorBiscuit has quit [Ping timeout: 480 seconds]
bisko has quit [Read error: Connection reset by peer]
bisko has joined #asahi-dev
veloek has quit [Quit: leaving]
<kevans91>
it's interesting because there's hints that it used to work. efi_run_image() will explicitly use a memory device path if one's not set, under the assumption that it, e.g., may be loaded directly into memory via JTAG
<kevans91>
maybe they'll accept a patch to let it work again
<kevans91>
woot, it's alive. the only concerning part of linux.py's uboot stuff might be the use of dtb_addr
<kevans91>
the prepared dt's not at that address, and i'm not sure it's intended to pass on the unadulterated input fdt
<kevans91>
maybe all of the important stuff is already covered by bootargs
<sven>
hrm, no, it should get the modified fdt
<sven>
i guess back when I wrote that hack we either modified the fdt in place or there just weren’t any important modifications yet
<kevans91>
it kind of looks like you can just omit the fdt address from booti and it'll use the modified one, anyways
<j`ey>
or use $fdtcontroladdr
<sven>
ah, nice
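A sketch of what that boot step could look like from the u-boot prompt: ${kernel_addr_r} is assumed to hold the loaded kernel, `-` means no initramfs, and ${fdtcontroladdr} points at the device tree u-boot itself was started with (i.e. the one m1n1 prepared):

    booti ${kernel_addr_r} - ${fdtcontroladdr}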
<kevans91>
on the plus side, my freebsd abomination works with a somewhat small u-boot change
<kevans91>
the downside is we don't get any of our loader scripts, but having to manually 'load' then 'boot' is still a huge improvement in workflow
bisko has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
bisko has joined #asahi-dev
bisko has quit [Read error: Connection reset by peer]