ChanServ changed the topic of #lima to: Development channel for open source lima driver for ARM Mali4** GPUs - Kernel driver has landed in mainline, userspace driver is part of mesa - Logs at https://oftc.irclog.whitequark.org/lima/
dllud has joined #lima
dllud_ has quit [Read error: Connection reset by peer]
Daanct12 has joined #lima
enunes has joined #lima
Daanct12 has quit [Ping timeout: 480 seconds]
dllud_ has joined #lima
dllud has quit [Read error: Connection reset by peer]
dllud has joined #lima
dllud_ has quit [Remote host closed the connection]
adjtm has quit [Quit: Leaving]
drod has joined #lima
<anarsoul>
marex: technically you still have to read the whole texture, regardless of its format. I'm not sure which one will be more performant, you need to do your own benchmarking
<anarsoul>
marex: and don't forget about cache! :)
<anarsoul>
utgard processes fragments in 2x2 groups, so cache hit rate should be good for planar yuv (and for packed yuyv)
<anarsoul>
marex: I think you can compare fs shader length for both cases, the shorter one will likely be faster
<marex>
anarsoul: well for packed YUV, it would be literally load/mul
<marex>
for planar YUV, it would be three loads, some swizzling, and then mul
<anarsoul>
marex: and probably some coords math
<marex>
so I think planar yuv is not good
<anarsoul>
marex: it's not so obvious to me. Basically i420 has only 1 u/v value per 4 y, so it'll use less memory bandwidth. So it'll depend on the shader
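A rough Python sketch of the bandwidth point above. The function names are made up for illustration, and it only counts the bytes in one tightly packed frame; it ignores caching, tiling and stride padding, so it says nothing about real performance on utgard.

```python
def i420_frame_bytes(width, height):
    # Full-resolution Y plane, plus U and V planes subsampled 2x2,
    # i.e. one U and one V value per 4 Y values.
    chroma_w = (width + 1) // 2
    chroma_h = (height + 1) // 2
    return width * height + 2 * chroma_w * chroma_h


def yuyv_frame_bytes(width, height):
    # Packed 4:2:2: every pair of pixels is stored as Y0 U Y1 V (4 bytes).
    return ((width + 1) // 2) * 4 * height


if __name__ == "__main__":
    w, h = 1920, 1080
    print("I420:", i420_frame_bytes(w, h), "bytes")
    print("YUYV:", yuyv_frame_bytes(w, h), "bytes")
```

For even dimensions this works out to 1.5 bytes per pixel for I420 against 2 bytes per pixel for YUYV.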
<anarsoul>
marex: keep in mind that utgard pp is a VLIW architecture, so it does a lot of operations per instruction
<anarsoul>
with I420 I guess the shortest it could get is 3 instructions, since it needs 3 samplers
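To illustrate why I420 needs three loads per pixel, here is a CPU-side Python sketch that fetches Y, U and V from a single tightly packed I420 buffer. This is an assumption-laden model, not lima or mesa code: `i420_sample` is a hypothetical helper, and on the GPU the three planes would typically be bound as three separate textures rather than indexed from one buffer.

```python
def i420_sample(buf, width, height, x, y):
    """Return (Y, U, V) for pixel (x, y) from a tightly packed I420 buffer."""
    chroma_w = (width + 1) // 2
    chroma_h = (height + 1) // 2
    u_base = width * height                 # U plane follows the Y plane
    v_base = u_base + chroma_w * chroma_h   # V plane follows the U plane
    cx, cy = x // 2, y // 2                 # chroma is subsampled 2x2
    return (
        buf[y * width + x],                 # load 1: Y plane
        buf[u_base + cy * chroma_w + cx],   # load 2: U plane
        buf[v_base + cy * chroma_w + cx],   # load 3: V plane
    )


if __name__ == "__main__":
    # 4x2 frame: 8 Y bytes, then 2 U bytes, then 2 V bytes.
    frame = bytes(range(8)) + bytes([100, 101]) + bytes([200, 201])
    print(i420_sample(frame, 4, 2, 3, 1))
```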
<anarsoul>
with YUYV it's only 1 sampler, but you'll likely need a conditional to get correct Y for your pixel
<anarsoul>
and also some coords math
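The conditional and coordinate math for packed YUYV can be sketched in Python as below. It assumes the usual Y0 U Y1 V byte order with one 4-byte texel per pair of pixels; `yuyv_sample` is a hypothetical name, and a real fragment shader would do the same selection on a sampled vec4 rather than on raw bytes.

```python
def yuyv_sample(buf, width, x, y):
    """Return (Y, U, V) for pixel (x, y) from a tightly packed YUYV buffer."""
    stride = ((width + 1) // 2) * 4        # bytes per row
    texel = y * stride + (x // 2) * 4      # coords math: two pixels share a texel
    # The conditional: even pixels take Y0, odd pixels take Y1.
    y_val = buf[texel] if x % 2 == 0 else buf[texel + 2]
    return (y_val, buf[texel + 1], buf[texel + 3])


if __name__ == "__main__":
    # One row of 4 pixels: (Y0 U Y1 V) (Y2 U Y3 V)
    row = bytes([10, 50, 11, 60, 12, 51, 13, 61])
    for px in range(4):
        print(px, yuyv_sample(row, 4, px, 0))
```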
<marex>
anarsoul: but the hardware has two samplers, doesn't it?
<anarsoul>
enunes: while you're here, can you re-test !16136 with ppir lowering in place, but just drop the special handling of ppir_op_ddy? i.e. handle it as ppir_op_ddx
<anarsoul>
I still think it's incorrect to completely drop it, since op_ddx and op_ddy apparently need both arguments