sukuna has quit [Remote host closed the connection]
Duke`` has joined #dri-devel
LeviYun has joined #dri-devel
lefteyebob has joined #dri-devel
<lefteyebob>
<gogolittle> Ermine: you think you some geniuses? hahaa i said earlier the encoder has 64 as bit 1 65 as bit two so looks as that hard or what? so 194-64 is 130 196-65 is 131, so decoder has that shifted towards right side 66+67 power 2 and power 3 which corresponds to 2 +4 133 corresponds to 6 after decoding. Ermine you are such a protector amazing monkey. intel-gfx, nouveau, dri-devel
<lefteyebob>
are under his <gogolittle> protectorate and he can kill me without any money by offering anal to russian hitman, such a minix mini driver microkernel hero :).
<lefteyebob>
<gogolittle> that should work out however i have not yet done any corpuses or dictionaries , which happens later this year, i just presented the theory
<lefteyebob>
<gogolittle> it's that way how they do it in hw too, dft and fft is used for something other, signal domaining in time and shape or something
<lefteyebob>
<gogolittle> i have not programmed for ages as of now, but that is the way it should work, hardware encoders and decoders work also in similar way, it's just that verilog i programmed a bit in a very long time ago , i think last time 3 years ago in cambodia
<lefteyebob>
<gogolittle> i got all the info i needed but sure i feel a bit hungover after the battles of my life which happened there
<lefteyebob>
<gogolittle> there were some psychologists persistantly blabbering how ill scizhophrenic i am to defend my life when other ghosts wanted to take that, in my opinion they can go and fuck themselves, such a dorks as possible and failures, maybe some idiotic woman gave them some , but i defended my life in fact successfully and their psycho shitflow no longer bothers me, sure it hurts for
<lefteyebob>
those to go to jail and psychologist terrorist to lose their jobs, when you are a dickhead it needs to hurt.
LeviYun has quit [Ping timeout: 480 seconds]
Duke`` has quit [Ping timeout: 480 seconds]
coldfeet has joined #dri-devel
lefteyebob has quit [Remote host closed the connection]
Duke`` has joined #dri-devel
LeviYun has joined #dri-devel
LeviYun has quit [Ping timeout: 480 seconds]
Duke`` has quit [Ping timeout: 480 seconds]
sghuge has quit [Remote host closed the connection]
sghuge has joined #dri-devel
sghuge has quit [Remote host closed the connection]
sghuge has joined #dri-devel
sima has joined #dri-devel
warpme has joined #dri-devel
LeviYun has joined #dri-devel
LeviYun has quit [Ping timeout: 480 seconds]
Duke`` has joined #dri-devel
u-amarsh04 has joined #dri-devel
LeviYun has joined #dri-devel
LeviYun has quit [Ping timeout: 480 seconds]
coldfeet has quit [Remote host closed the connection]
u-amarsh04 has quit [Remote host closed the connection]
kts has joined #dri-devel
LeviYun has joined #dri-devel
simon-perretta-img has quit [Ping timeout: 480 seconds]
<Lynne>
what were the latest performance/power figures on discrete/mobile again?
alane_ has joined #dri-devel
alane has quit [Ping timeout: 480 seconds]
alane_ has quit [Ping timeout: 480 seconds]
alane has joined #dri-devel
davispuh has quit [Ping timeout: 480 seconds]
<haasn>
Lynne: on discrete with uber gpu it was quite significant, like 50% iirc
<haasn>
On mobile basically the same
<haasn>
As cpu
<haasn>
The shaders are hard to optimize
<haasn>
I wanted to take another stab at it
<haasn>
But dealing with u8vec4 etc was nontrivial for reasons I don’t remember
<haasn>
And you have extra losses from memcpy due to the gpu’s complete inability to schedule threads
<Lynne>
aren't mobile GPUs scalar, so vectors would make no difference?
<Lynne>
IIRC modern GPUs abandoned dedicated SIMD-style vector ALUs 10 years or so ago
<karolherbst>
yeah, you don't want to vectorize inside your GPU code
<karolherbst>
if you do, reconsidering and not doing that is really your only option. There are some benefits to aligning data so that single threads can do vector loads/stores, but that's pretty much the only benefit you can get from vectorizing (well.. except that there are GPU ISAs with vec2 fp16 ops)
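To make the "vector load/store" point concrete: a common trick is to keep four 8-bit channels (a `u8vec4`-style value) packed in one 32-bit word, so a single thread does one aligned 32-bit load instead of four byte loads and then unpacks with shifts and masks, which is plain scalar ALU work. This is a minimal Python sketch of the bit layout only; the function names are hypothetical, and in a real shader this would be GLSL/SPIR-V operating on a `uint`.

```python
def pack_u8x4(r: int, g: int, b: int, a: int) -> int:
    """Pack four 8-bit channels into one 32-bit word, r in the lowest byte."""
    for c in (r, g, b, a):
        if not 0 <= c <= 0xFF:
            raise ValueError(f"channel out of range: {c}")
    return r | (g << 8) | (b << 16) | (a << 24)


def unpack_u8x4(word: int) -> tuple[int, int, int, int]:
    """Split a 32-bit word back into four channels using shifts and masks."""
    return (
        word & 0xFF,
        (word >> 8) & 0xFF,
        (word >> 16) & 0xFF,
        (word >> 24) & 0xFF,
    )


word = pack_u8x4(0x11, 0x22, 0x33, 0x44)
print(hex(word))          # 0x44332211
print(unpack_u8x4(word))  # (17, 34, 51, 68)
```

The only "vector" part is the single wide memory access; everything after the load is scalar integer math, which fits the scalar-ISA point above.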
alane has quit [Ping timeout: 480 seconds]
<karolherbst>
Lynne: anyway, thanks for sharing your experience. I got a lot of people saying what "in theory" is the situation, so.. kinda glad that some people actually worked on it and know what's up
<karolherbst>
Lynne: but yeah.. I'm more concerned about AV1 atm, because a lot of GPUs don't have it, and "you need to buy a GPU from 3 years ago" isn't really a nice answer either
<Lynne>
dav1d is extremely optimized, if that's any consolation
<airlied>
Company: is your desktop running on the gpu?
<karolherbst>
Lynne: right.. I just wonder if it's optimized enough that your laptop battery gets through meetings :D
<karolherbst>
but yeah...
<karolherbst>
if it's quick enough then that's totally fine
<karolherbst>
well not quick
<karolherbst>
but like optimized
<karolherbst>
or rather efficient
<karolherbst>
I'm not caring about an "fps" metric here, I'm caring about "how long does a battery last while watching AV1 videos/streams" metrics
<airlied>
haasn: what mobile were you targetting?
<airlied>
would be interesting to see how it goes on intel hw pre-av1
<Lynne>
karolherbst: meetings -> webrtc, which means there's a browser involved, which means there are lower branches you can pick off
<Lynne>
firefox for example does like 7 memcpys of video data between decoding and actually presenting the data on screen IIRC
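A rough back-of-the-envelope for why extra copies hurt. Assumptions (not from the discussion): 8-bit NV12 frames (1.5 bytes per pixel), 4K at 60 fps, and each memcpy counted as one read plus one write of the whole frame. The "7 copies" figure is the IIRC number above, not a measurement.

```python
def nv12_frame_bytes(width: int, height: int) -> int:
    """Size of one 8-bit NV12 frame: full-res luma plus interleaved half-res chroma."""
    luma = width * height
    chroma = (width // 2) * (height // 2) * 2  # Cb and Cr, each at quarter resolution
    return luma + chroma


def copy_traffic_bytes_per_s(width: int, height: int, fps: float, copies: int) -> float:
    """Memory traffic from `copies` full-frame memcpys per frame (one read + one write each)."""
    return nv12_frame_bytes(width, height) * 2 * copies * fps


frame = nv12_frame_bytes(3840, 2160)
print(frame)  # 12441600 bytes, ~12.4 MB per frame
traffic = copy_traffic_bytes_per_s(3840, 2160, 60, copies=7)
print(f"{traffic / 1e9:.1f} GB/s")  # 10.5 GB/s
```

Under these assumptions, roughly 10 GB/s of memory traffic goes to shuffling pixels around before anything is displayed, which is a power cost on a laptop regardless of how fast the decoder itself is.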
<karolherbst>
oof
<karolherbst>
but the API situation is also kinda a mess
<Lynne>
as in decoding APIs? doesn't really matter when they download the data to feed it through their generic software codepath for color conversion and scaling
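For context on what that software color-conversion path does per pixel: a minimal sketch of 8-bit, limited-range BT.709 Y'CbCr to R'G'B'. Assumptions: BT.709 coefficients and limited (studio) range input; real players also handle other matrices, full range, chroma upsampling and transfer functions, and do this with SIMD or shaders rather than per-pixel Python.

```python
KR, KB = 0.2126, 0.0722  # BT.709 luma coefficients
KG = 1.0 - KR - KB


def _to_u8(x: float) -> int:
    """Clamp a normalized [0, 1] value and scale it to an 8-bit code."""
    return round(min(max(x, 0.0), 1.0) * 255)


def bt709_limited_to_rgb(y: int, cb: int, cr: int) -> tuple[int, int, int]:
    """Convert one 8-bit limited-range BT.709 Y'CbCr sample to 8-bit full-range R'G'B'."""
    yn = (y - 16) / 219.0    # luma: 16..235 -> 0..1
    pb = (cb - 128) / 224.0  # chroma: 16..240 -> -0.5..0.5
    pr = (cr - 128) / 224.0
    r = yn + 2.0 * (1.0 - KR) * pr
    b = yn + 2.0 * (1.0 - KB) * pb
    g = yn - (2.0 * KB * (1.0 - KB) / KG) * pb - (2.0 * KR * (1.0 - KR) / KG) * pr
    return _to_u8(r), _to_u8(g), _to_u8(b)


print(bt709_limited_to_rgb(235, 128, 128))  # (255, 255, 255)
print(bt709_limited_to_rgb(16, 128, 128))   # (0, 0, 0)
```

This is trivially parallel per pixel, which is why doing it in a shader on the GPU instead of after a download to the CPU is the obvious win.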
<karolherbst>
right
<karolherbst>
the entire situation sounds to me like we need some people really looking at it from an end-to-end perspective to make it all efficient. In a perfect world you feed the video stream into some lib and it displays it onto a surface you choose, bonus points if you can provide a custom shader/whatever for filtering/color conversion/scaling/etc...
<karolherbst>
though I think that's mostly what ffmpeg is
<karolherbst>
but then you need a different code path for hw acceleration, because...
<karolherbst>
maybe we should just consider doing efficient fallbacks with vulkan video..
<karolherbst>
but I already hear people saying "no"
<airlied>
you'd essentially be porting chunks of dav1d to llvmpipe
<airlied>
or an llvmpipe equivalent
<Lynne>
ffmpeg lets you switch between hardware and software decoding on a per-frame basis, and it only takes tens of lines of code to enable whichever decode api you want to use
<Lynne>
but the issue is... patents
<karolherbst>
I'd rather have people try it out and say "yeah well.. it's not that much faster, but you save a bit of power" before just thinking it's a bad idea
LeviYun has joined #dri-devel
<karolherbst>
Lynne: right... that as well, but it never felt like e.g. chromium uses ffmpeg in a way that it could be hardware accelerated..
<karolherbst>
and if they enable vaapi support somehow, it sometimes just doesn't work
<Lynne>
in firefox's case, they have an internal ffmpeg fork with every bit of patented code stripped out, and they only use it for vorbis IIRC
<karolherbst>
heh
<karolherbst>
no wonder some distros just replace that with the one they provide, so at least it picks up more stuff if it's a proper version
<Lynne>
hardware decoding in ffmpeg still requires the actual software decoder code, even when going through a hardware decoder, in order to allow falling back to software decoding
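Why the software decoder has to stay in the binary even when a hardware path exists: any frame the hardware path rejects still has to be decoded somehow. This is a minimal sketch of a per-frame "try hardware, else software" loop; `HardwareDecoder`, `SoftwareDecoder`, `HwDecodeError` and `decode_stream` are hypothetical stand-ins, not ffmpeg API, and this is not how libavcodec structures it internally.

```python
from typing import Optional


class HwDecodeError(Exception):
    """Raised by the (hypothetical) hardware path when it cannot handle a frame."""


class HardwareDecoder:
    """Hypothetical hardware decoder that rejects some frames."""

    def __init__(self, unsupported: set[int]):
        self.unsupported = unsupported

    def decode(self, index: int, packet: bytes) -> str:
        if index in self.unsupported:
            raise HwDecodeError(f"frame {index} not supported by hw")
        return f"hw:{index}"


class SoftwareDecoder:
    """Hypothetical software decoder: slower, but handles every frame."""

    def decode(self, index: int, packet: bytes) -> str:
        return f"sw:{index}"


def decode_stream(packets: list[bytes], hw: Optional[HardwareDecoder],
                  sw: SoftwareDecoder) -> list[str]:
    """Try hardware per frame; fall back to software for any frame it rejects."""
    frames = []
    for i, pkt in enumerate(packets):
        if hw is not None:
            try:
                frames.append(hw.decode(i, pkt))
                continue
            except HwDecodeError:
                pass  # fall through to the software path for this frame only
        frames.append(sw.decode(i, pkt))
    return frames


print(decode_stream([b""] * 4, HardwareDecoder({2}), SoftwareDecoder()))
# ['hw:0', 'hw:1', 'sw:2', 'hw:3']
```

If the software decoder is stripped out (e.g. for patent reasons), the `sw` branch has nothing to call, so the fallback disappears along with it.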
<karolherbst>
right..
<Lynne>
and openh264 requires browsers to ship the actual binary that cisco releases to decode without needing a patent
<karolherbst>
yeah.. but like h.264 is slowly becoming a thing of the past, and CPU decoding ain't that expensive either
<Lynne>
right, I forgot about that, I have to actually remind myself that this era has come
<karolherbst>
heh
<karolherbst>
anyway, I'm more concerned about VP8/VP9/AV1 here and that most GPUs don't really accelerate that and users just being left with a terrible user experience from time to time
<karolherbst>
like.. I know this is entirely my fault, but screencasting on my desktop is just a big no
<karolherbst>
and I _think_ the future is to let pipewire handle all the details and pick up the proper hw acceleration
<karolherbst>
or gstreamer rather
LeviYun has quit [Ping timeout: 480 seconds]
<airlied>
I think one reason doing encode totally on the GPU with shaders might win is not having to readback the desktop content from uncached vram
<airlied>
it might be less of a win on mobile though, since there's no PCI bus in between
<karolherbst>
yeah... I guess there is actually value in that, as long as the application doesn't ever let the CPU see the content and just straight push it to GL/VK/whatever to display it
<Lynne>
I think pipewire handling video is a lost cause
<karolherbst>
right..
<karolherbst>
Lynne: how so?
cascardo has quit [Ping timeout: 480 seconds]
<airlied>
karolherbst: well for encode you want to read back the encoded data and send it somewhere else usually
<karolherbst>
though I guess with pipewire/gstreamer you still get the result into some CPU side buffer, no?
cascardo has joined #dri-devel
<karolherbst>
ohh right, encode
<Lynne>
capture path in pipewire for clients is a stygian nightmare, requires you to write 2000 lines of libdbus code just to know whether you can capture a cursor or not
<karolherbst>
though the encode thing most users are hitting is reading from their webcam
<karolherbst>
(and screen sharing)
Duke`` has quit [Ping timeout: 480 seconds]
<Lynne>
modifiers for pipewire screen capture were added as an afterthought and require negotiation
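Roughly what "negotiation" means here: the producer (compositor) and the consumer (capture client) each advertise which DRM format modifiers they can handle for a given format, and they have to agree on one, with `DRM_FORMAT_MOD_LINEAR` (value 0 in `drm_fourcc.h`) as the usual lowest common denominator. This is a minimal sketch of picking a shared modifier in the producer's preference order; `MOD_TILED_A` and `MOD_COMPRESSED_B` are hypothetical placeholder values, and PipeWire's actual negotiation goes through its own format params and is more involved than this.

```python
from typing import Optional

DRM_FORMAT_MOD_LINEAR = 0  # linear (untiled) layout, as in drm_fourcc.h

# Hypothetical placeholders standing in for vendor tiling/compression modifiers.
MOD_TILED_A = 0xA1
MOD_COMPRESSED_B = 0xB2


def negotiate_modifier(producer_prefs: list[int],
                       consumer_supported: list[int]) -> Optional[int]:
    """Return the producer's most preferred modifier that the consumer also supports."""
    consumer = set(consumer_supported)
    for mod in producer_prefs:
        if mod in consumer:
            return mod
    return None  # no common layout: the caller needs another path, e.g. a CPU copy


producer = [MOD_COMPRESSED_B, MOD_TILED_A, DRM_FORMAT_MOD_LINEAR]
print(hex(negotiate_modifier(producer, [MOD_TILED_A, DRM_FORMAT_MOD_LINEAR])))  # 0xa1
print(negotiate_modifier(producer, [DRM_FORMAT_MOD_LINEAR]))                     # 0
```

The pain point in the discussion is less this intersection itself than having to bolt it onto a protocol that originally assumed no modifiers at all.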
<Lynne>
a dedicated capture protocol for wayland is the way to go
<karolherbst>
probably
<Lynne>
gnome won't implement it, that's a guarantee
<karolherbst>
I guess it depends on how big the benefits are
<airlied>
wouldn't that just recreate the X11 problem of any app can capture your whole display?
<karolherbst>
negotiation is kinda a pain here, but at some point you'll need to import a buffer/stream/whatever to do something with it, and in a perfect world it's a handle to a GPU buffer
<karolherbst>
well.. wayland compositors _could_ implement access control on a wayland protocol thing, but that's just a different discussion altogether, and I'd rather not get involved
<Lynne>
there are implementations of that already ^^
<karolherbst>
I think the bigger question is if that solution is suitable for use cases like flatpak or not
<karolherbst>
but anyway, I'm more concerned about where to go from "I use this webrtc based online meeting thing and it requires me to encode AV1 of my 4K desktop and send it to others"
<karolherbst>
and how we can make it not suck once the world moves to "AV1 or nothing"
<haasn>
airlied: pixel 7
<karolherbst>
but we are already at the same situation with VP8/VP9
<karolherbst>
soo.. yeah
<karolherbst>
I literally can't use the screencast feature of my desktop, because it's just unusable due to the entire situation
LeviYun has joined #dri-devel
<karolherbst>
it's my fault for having a dual 4K setup, but using h.264 doesn't suck as much, and if we do "force" VP8/VP9/AV1 on users, it shouldn't be worse than just using h.264
<Lynne>
vp8 in particular should be practically free to decode anywhere
<Lynne>
it's a really simple codec
<karolherbst>
yeah, but not to encode
<karolherbst>
well...
<karolherbst>
at least not to encode a dual 4K thing :D
<Lynne>
yeah, to encode, libvpx is beyond awful
<Lynne>
I think there was either a profile limit, a codec limit or a libvpx limit that forbade encoding with over 1500kbps bandwidth
<Lynne>
so you couldn't really do 4k with that little, the signalling overhead (no data) from all blocks would likely cost you most of that
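The arithmetic behind "the signalling overhead would eat most of that": VP8 codes 16x16 macroblocks, so a 3840x2160 frame has 240 x 135 = 32,400 of them. Assuming 30 fps (the frame rate is an assumption; the 1500 kbps cap is the uncertain IIRC figure above), that leaves about 1.5 bits per macroblock before any actual residual data.

```python
import math


def bits_per_macroblock(width: int, height: int, fps: float,
                        bitrate_bps: float, mb: int = 16) -> float:
    """Average bit budget per macroblock at a given bitrate (partial edge blocks rounded up)."""
    mbs_per_frame = math.ceil(width / mb) * math.ceil(height / mb)
    return bitrate_bps / (mbs_per_frame * fps)


print(f"{bits_per_macroblock(3840, 2160, 30, 1_500_000):.2f}")  # 1.54
print(f"{bits_per_macroblock(3840, 2160, 60, 1_500_000):.2f}")  # 0.77
```

At 60 fps the budget drops below one bit per macroblock, so even a "skip this block" signal for every block would consume most of it.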
<karolherbst>
though I think they might have improved the situation now...
<karolherbst>
well...
<karolherbst>
still uses 700% CPU on my i7-10850H
<karolherbst>
oh right.. it was unusable on my macbook
<karolherbst>
but the GPU is also at 100%, so it's kinda tough to record anything substantial
<Lynne>
oh, right, libvpx-vp9 had a dedicated realtime mode that might not get used
<Lynne>
IIRC I got hundreds of fps at 4k with it
<karolherbst>
ohh right, somebody mentioned that in the past..
<karolherbst>
I really should dig deeper at some point
<Lynne>
I get 75fps at 4k with -deadline realtime -cpu-used 8
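For reference, a sketch of the kind of ffmpeg invocation those two flags belong to, built as an argument list (e.g. to hand to `subprocess.run`). The input/output file names are placeholders, and selecting `libvpx-vp9` explicitly with `-c:v` is an assumption based on the realtime-mode remark above; actual speed depends on the machine and the libvpx version.

```python
def vp9_realtime_args(src: str, dst: str, cpu_used: int = 8) -> list[str]:
    """ffmpeg argv for a libvpx-vp9 encode using the realtime deadline."""
    return [
        "ffmpeg",
        "-i", src,
        "-c:v", "libvpx-vp9",
        "-deadline", "realtime",     # libvpx realtime mode instead of the slower quality modes
        "-cpu-used", str(cpu_used),  # higher = faster encode, lower quality
        dst,
    ]


# Placeholder paths; run with subprocess.run(args, check=True) if ffmpeg is installed.
args = vp9_realtime_args("input_4k.mkv", "output.webm")
print(" ".join(args))
```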
<karolherbst>
try the same on aarch64
<Company>
airlied: yes
<Company>
airlied: it was just a casual "lemme check how this impacts software rendering" with MESA_VK_DEVICE_SELECT
<Company>
like I used to use LIBGL_ALWAYS_SOFTWARE
<Company>
which still works as I expect
<airlied>
Company: so if you are sw rendering something and your compositor is recompositing it'll use the gpu
<Company>
you mean mutter is using the gpu?
<Company>
that's not where the 100% gpu load is from though - according to intel_gpu_top it's all my software rendering
simon-perretta-img has quit [Read error: Connection reset by peer]
simon-perretta-img has joined #dri-devel
<Company>
i'm an idiot
<Company>
airlied: ignore me - mclasen recently added code that skips the Vulkan renderer if it's software, so if I force software rendering that code kicks in
<Company>
so asahi gets its GL renderer and doesn't use llvmpipe
LeviYun has quit [Ping timeout: 480 seconds]
feaneron has joined #dri-devel
fireburn has quit [Quit: Konversation terminated!]
LeviYun has joined #dri-devel
frankbinns2 has quit [Remote host closed the connection]
<karolherbst>
but of course there is more stuff going on
TMM has joined #dri-devel
<karolherbst>
but on aarch64 the problem is rather that I think the encoders aren't really optimized for aarch64, so it's slow
<HdkR>
Just think of an encoder that is expecting to take advantage of single-cycle vector operations. Then you move to ARM and they take three cycles and you have half the pipelines :P
LeviYun has quit [Ping timeout: 480 seconds]
<karolherbst>
heh
<karolherbst>
I should check if it got any better though
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
smpl has quit [Ping timeout: 480 seconds]
LeviYun has joined #dri-devel
sukuna has joined #dri-devel
LeviYun has quit [Ping timeout: 480 seconds]
LeviYun has joined #dri-devel
LeviYun has quit [Ping timeout: 480 seconds]
tristianc67048 has quit [Ping timeout: 480 seconds]