ChanServ changed the topic of #lima to: Development channel for open source lima driver for ARM Mali4xx GPUs - Kernel driver has landed in mainline, userspace driver is part of mesa - Logs at https://oftc.irclog.whitequark.org/lima/
<anarsoul[m]>
Nope. We do not transform the program in regalloc (besides spilling registers)
Daanct12 has joined #lima
<enunes>
anarsoul: in addition to these, I had a simple copy propagation pass which removed a few more movs; it could also include modifier folding while doing that. I had it in an MR a few months ago but dropped it from there to unblock things and never actually pushed another MR for it, can push it again
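As a rough illustration of the kind of pass enunes describes (not the actual lima code), here is a copy propagation sketch in C over a made-up, SSA-like IR: every plain mov has the uses of its destination rewritten to read its source, after which the mov is dead and can be dropped. Modifier folding would merge neg/abs into the users at the same rewrite point instead of skipping modified movs. All types and names below are hypothetical.

/* Minimal, self-contained copy propagation sketch over a toy IR.
 * The types (toy_instr, toy_src, ...) are hypothetical, not ppir. */
#include <stdbool.h>
#include <stddef.h>

enum toy_op { TOY_OP_MOV, TOY_OP_ADD, TOY_OP_MUL };

struct toy_src {
   int reg;        /* register number read by this source */
   bool negate;    /* source modifiers */
   bool absolute;
};

struct toy_instr {
   enum toy_op op;
   int dest;              /* register number written */
   struct toy_src src[2];
   int num_srcs;
   bool dead;             /* marked for removal */
};

/* Propagate plain movs: every later read of mov->dest becomes a read of
 * mov->src[0], then the mov is marked dead.  Single block only, and it
 * assumes each register is written once (SSA-like) to keep the sketch
 * short.  Folding negate/absolute into the users is where modifier
 * folding would plug in; here modified movs are simply skipped. */
void copy_propagate(struct toy_instr *instrs, size_t count)
{
   for (size_t i = 0; i < count; i++) {
      struct toy_instr *mov = &instrs[i];
      if (mov->op != TOY_OP_MOV)
         continue;
      if (mov->src[0].negate || mov->src[0].absolute)
         continue;

      for (size_t j = i + 1; j < count; j++) {
         struct toy_instr *use = &instrs[j];
         for (int s = 0; s < use->num_srcs; s++) {
            if (use->src[s].reg == mov->dest)
               use->src[s].reg = mov->src[0].reg;
         }
      }
      mov->dead = true;
   }
}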
<anarsoul[m]>
sure, sounds good
<enunes>
anarsoul: for 3) I think the challenge was blocks with discard
<anarsoul[m]>
well, it produces an extra mov whenever the store_output source is not an SSA value
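A sketch of why a non-SSA source costs a mov here (made-up names, not lima code): an SSA def has exactly one writer, so that writer can be retargeted to write the output register directly, whereas a general register may have several writers and therefore needs an explicit copy at the store point.

#include <stdbool.h>
#include <stddef.h>

struct fake_value {
   bool is_ssa;                /* single static assignment? */
   struct fake_instr *parent;  /* sole writer when is_ssa is true */
};

struct fake_instr {
   int dest_reg;
};

/* Emit the store: either retarget the unique producer or insert a mov. */
void lower_store_output(struct fake_value *src, int output_reg,
                        void (*emit_mov)(int dst, struct fake_value *s))
{
   if (src->is_ssa && src->parent) {
      /* Single writer: make it write the output register directly. */
      src->parent->dest_reg = output_reg;
   } else {
      /* Register source: no unique writer to retarget, so an extra mov
       * into the output register is needed at the store point. */
      emit_mov(output_reg, src);
   }
}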
<anarsoul[m]>
btw, I also implemented using the combiner for vector multiplication when one of the sources is a scalar; it improves instruction count for some shaders, but regresses others due to increased register pressure
<anarsoul[m]>
shader-db still says that instructions are helped
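One way the instruction-count vs. register-pressure trade-off could be handled is to gate the combining on an estimate of live registers. The sketch below is purely illustrative: MAX_PP_REGS, fake_mul and the "one extra live register" model are assumptions, not lima/ppir code.

#include <stdbool.h>

#define MAX_PP_REGS 6 /* assumed register budget, for illustration only */

struct fake_mul {
   bool src_is_scalar;   /* one multiplication source is a scalar */
};

/* The caller passes how many registers are live across the multiply;
 * combining extends the scalar's live range, modeled here as needing
 * one extra live register. */
bool should_use_combiner(const struct fake_mul *mul, int live_regs)
{
   if (!mul->src_is_scalar)
      return false;
   return live_regs + 1 <= MAX_PP_REGS;
}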
<anarsoul[m]>
I ran glmark2-es on 24.3.4 and on git main to compare, but it looks like the glmark shaders didn't get much improvement. It's still noticeable though
<anarsoul[m]>
we probably should start using fp16 for varyings. It would halve memory bandwidth requirements for geometry-heavy workloads and would likely give a noticeable performance improvement
<anarsoul[m]>
tried it, it's a really small change but unfortunately no measurable performance difference in glmark2
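Back-of-the-envelope arithmetic behind the bandwidth claim: storing varyings as fp16 instead of fp32 halves the per-vertex varying footprint. The vertex and varying counts below are made up for illustration.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
   const uint64_t vertices      = 1000 * 1000; /* hypothetical vertex count */
   const uint64_t varying_vec4s = 4;           /* hypothetical varying count */
   const uint64_t comps         = varying_vec4s * 4;

   uint64_t fp32_bytes = vertices * comps * sizeof(float); /* 4 bytes/comp */
   uint64_t fp16_bytes = vertices * comps * 2;             /* 2 bytes/comp */

   printf("fp32 varyings: %llu MiB per frame\n",
          (unsigned long long)(fp32_bytes >> 20));
   printf("fp16 varyings: %llu MiB per frame\n",
          (unsigned long long)(fp16_bytes >> 20));
   return 0;
}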
Daanct12 has quit [Quit: WeeChat 4.5.1]
chewitt has quit [Quit: Zzz..]
<anarsoul>
enunes: so for 3) discard is indeed the challenge, and we cannot get rid of the mov here; however we also create a mov for ppir_op_dummy, which is usually a register load
<anarsoul>
that one can be optimized if the "end" block's only instruction is this mov
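The condition anarsoul describes could look roughly like the check below. The block/node types are made up for illustration, not the real ppir structures: when the "end" block contains nothing except the mov created for ppir_op_dummy (typically a register load), that extra copy could be folded away.

#include <stdbool.h>
#include <stddef.h>

enum fake_op { FAKE_OP_MOV, FAKE_OP_OTHER };

struct fake_node {
   enum fake_op op;
   struct fake_node *next; /* next node in the same block */
};

struct fake_block {
   struct fake_node *first; /* head of this block's node list */
};

/* True when the end block's only instruction is a mov, i.e. the case
 * where the copy could be dropped and the register loaded directly at
 * the use site instead. */
bool end_block_is_single_mov(const struct fake_block *end)
{
   const struct fake_node *n = end->first;
   return n != NULL && n->op == FAKE_OP_MOV && n->next == NULL;
}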