ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://oftc.irclog.whitequark.org/panfrost - <macc24> i have been here before it was popular
alyssa has joined #panfrost
<alyssa> AFBC passing CI up to some UnexpectedPass's since maintaining the piglit fails list is really hard at our current flake rate :)
<alyssa> I cannot understand why T860 is so much faster than G52 at -brefract ..
<alyssa> and yes cphealy I still have your -bbuffer bisect to look into, tomorrow though, sleep awaits :0
<alyssa> for -brefract on g52 at least, 97% gpu utilization
<alyssa> slightly more spent doing vertex processing, curiously
<alyssa> I guess it's a big bunny but..
<alyssa> nearly 15% of the gpu time is tiling
<alyssa> 81% of the EE cycles are spent starving.. hm
<cphealy> alyssa: yay, thanks!
<alyssa> cphealy: any idea why I can't reproduce the slow?
<alyssa> alyssa@scootaloo:~$ LIBGL_DRIVERS_PATH=~/lib/dri DISPLAY=:0 glmark2-es2 -bbuffer:columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata
<alyssa> glmark2 Score: 43
<alyssa> is this the right command?
<cphealy> I'm using cortex-a55 while you might be using a faster CPU?
<alyssa> yeah that'd do it
<cphealy> What core are you using again?
<alyssa> A53
<alyssa> wait is that right
<cphealy> That's older/slower than a55
<alyssa> is lscpu right?
<cphealy> A55 is ARMv8.2-A with in-order execution.
<cphealy> cat /proc/cpuinfo maybe?
<cphealy> If there's other things beyond bisecting that I can do to help identify what is going on, let me know.
<alyssa> cphealy: 4x A73 and 2x A53
<alyssa> so older but faster
<cphealy> 4x A73, oh yea, definitely faster!
<cphealy> Maybe try turning the 4x A73 off and you'll have an easier time reproducing the lower framerate?
<cphealy> You might be able to set the A73 cores to "offline"?
<alyssa> Ah yep that did it :-p
<cphealy> You turned the A73 off and can now reproduce the performance hit with bbuffer?
<alyssa> well it's slow now :p
<cphealy> ;-)
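For reference, taking cores offline as suggested above can be done through the Linux CPU hotplug files in sysfs. The following is a minimal sketch, not the commands actually used in this conversation. The `CPU_SYSFS` override and the `set_cpus_online` helper are hypothetical names introduced for illustration, and the core indices in the example are an assumption: which indices map to the A73 cores depends on the board, so check `lscpu` or `/proc/cpuinfo` first.

```shell
#!/bin/sh
# Sketch: take CPU cores offline (or back online) via Linux CPU hotplug.
# CPU_SYSFS is a hypothetical override so the helper can be tried on a scratch directory.
CPU_SYSFS="${CPU_SYSFS:-/sys/devices/system/cpu}"

# set_cpus_online <0|1> <cpu index>...
set_cpus_online() {
    state="$1"
    shift
    for n in "$@"; do
        # Writing 0 takes the core offline, 1 brings it back (needs root on a real system).
        echo "$state" > "$CPU_SYSFS/cpu$n/online"
    done
}

# Example (indices are an assumption, verify them on your board):
# set_cpus_online 0 2 3 4 5
```

Running the benchmark afterwards exercises only the remaining cores; `set_cpus_online 1 ...` with the same indices restores them.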
stepri01 has quit [Read error: Connection reset by peer]
stepri01 has joined #panfrost
<alyssa> cphealy: We're hitting a pathological case of shadow resource creation
<alyssa> If I disable the shadowing optimization, perf on that subtest goes from 21fps -> 46fps
<alyssa> I have not yet analyzed why, and why that commit is the bad case
<alyssa> But wouldn't have checked that without the bisect so thank you :]
<cphealy> With Mesa 21.1.2 I was getting 51fps on that test (with my Cortex-A55s)
<alyssa> this is with my A53
<cphealy> OK, that may make sense then.
<alyssa> cphealy: oh, and I suppose we're spending a lot of time in kernel on this bad path, which is why my profiles didn't make much sense.
<alyssa> though usually sysprof catches that ....
<cphealy> Would the number of cores make any difference between your number and mine or is it all single threaded?
<cphealy> CPU cores that is
<HdkR> Oh fun, optimizing for A53/A55. Painful times ahead :)
<HdkR> I'm sure there is an A35 board that would want those optimizations as well :D
<cphealy> cough, cough, macc24
<alyssa> cphealy: all single threaded ttbomk
vstehle has quit [Ping timeout: 480 seconds]
vstehle has joined #panfrost
f11f13 has quit [Remote host closed the connection]
f11f13 has joined #panfrost
rasterman has joined #panfrost
macc24 has quit []
macc24 has joined #panfrost
<macc24> alyssa: "4x A73 and 2x A53" this core configuration... reminds me of a chromebook
warpme___ has quit []
atler has joined #panfrost
JulianGro has joined #panfrost
f11f13 has quit [Remote host closed the connection]
f11f13 has joined #panfrost
gouchi has joined #panfrost
<alyssa> untested but that takes the evil test from 43fps to 74fps
<alyssa> disabling resource shadowing altogether takes it to 76fps but that's not a good idea in general (it's usually a win, though might need a heuristic to tune)
JulianGro has quit [Remote host closed the connection]
<cphealy> alyssa: I just re-ran glmark2-es2 with !13502. It significantly improves performance with all three buffer benchmarks:
<cphealy> [buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map 56 34 89
<cphealy> [buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map 60 36 105
<cphealy> [buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata 58 3 57
<cphealy> The three values are Mali DDK, Mesa latest git, Mesa latest git with !13502
<macc24> o/ cphealy
<cphealy> So, with your change, we've undone the regression with the buffer tests.
<macc24> can you test this on a A35 device?
<cphealy> Now for the bad news. The overall glmark2-es2 score dropped by ~8% due to other benchmarks getting slower.
<cphealy> So, this change by itself results in lower overall performance but does show the possibility of being able to increase the buffer test performance significantly.
<alyssa> cphealy: that is..... bizarre.
<alyssa> that change should not drop the results of any test
<alyssa> smells like there's another bug that was being masked
<alyssa> could you send the full set of scores?
<cphealy> alyssa: yep, will do in a few.
<alyssa> thanks
<robclark> you might want to pin cpu and gpu freq when you compare.. just to make sure what you are measuring is actually mesa perf and not cpufreq/gpufreq
<cphealy> robclark: Good idea. For all future glmark benchmark runs I'll set CPU and GPU governors to performance.
<robclark> if there is possibility of thermal throttling (either CPU or if there is a cooling device setup for GPU), pinning at some mid-range freq might be a good idea.. you may just need to play with it a bit to make sure you are getting consistent/sensible #s
<cphealy> ack
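The governor pinning discussed above can be sketched with the standard cpufreq and devfreq sysfs files. This is a rough sketch under assumptions, not the setup used here: the `CPUFREQ_SYSFS` and `DEVFREQ_SYSFS` overrides and both helper functions are hypothetical names, the devfreq device name in the example is a placeholder, and the `performance` governor is only accepted if the kernel was built with it.

```shell
#!/bin/sh
# Sketch: select CPU and GPU frequency governors before benchmarking.
# Both *_SYSFS variables are hypothetical overrides for trying this on a scratch directory.
CPUFREQ_SYSFS="${CPUFREQ_SYSFS:-/sys/devices/system/cpu/cpufreq}"
DEVFREQ_SYSFS="${DEVFREQ_SYSFS:-/sys/class/devfreq}"

# set_cpu_governor <governor>: apply to every cpufreq policy that exists.
set_cpu_governor() {
    for f in "$CPUFREQ_SYSFS"/policy*/scaling_governor; do
        [ -e "$f" ] || continue   # skip when the glob matched nothing
        echo "$1" > "$f"
    done
}

# set_devfreq_governor <devfreq device> <governor>
set_devfreq_governor() {
    echo "$2" > "$DEVFREQ_SYSFS/$1/governor"
}

# Example (the device name is a placeholder; list /sys/class/devfreq to find the GPU):
# set_cpu_governor performance
# set_devfreq_governor my-gpu-device performance
```

If thermal throttling is a concern, the same files can take a different governor or be combined with frequency limits to hold a mid-range clock, as robclark suggests; that part is left out of the sketch.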
macc24 has quit [Ping timeout: 480 seconds]
macc24 has joined #panfrost
JulianGro has joined #panfrost
* alyssa looks into IDVS
<alyssa> ...again
macc24_ has joined #panfrost
macc24 has quit [Ping timeout: 480 seconds]
gouchi has quit [Remote host closed the connection]
<alyssa> oddly, I'm seeing IDVS hurt perf on e.g. glmark2 -bshading
<alyssa> from 744fps to 697fps in this experiment
<alyssa> not a completely controlled one but still. I was expecting the other direction of change. :V
<alyssa> glmark2 -bshading splits up nicely, so..
<alyssa> er
<alyssa> botched test, whoops
<alyssa> or not?
<alyssa> 745fps, yeah..
<alyssa> or 708fps, fluctuating wildly.
<alyssa> pinning gpufreq uh
<alyssa> nope still fluctuating like crazy. groan
<alyssa> and consistently slowing down by 10%. what the heck!
<alyssa> the bigger tests are more predictable
<alyssa> glmark2 -brefract down 63fps->62fps predictably. mumble.
<alyssa> Hopefully it's just something silly.
<alyssa> Okay, -bbuild:model=bunny is the ideal case and that one is indeed helped by IDVS, 192->260. I think
<alyssa> (or 198->269, something like that. Either way, a win)
<alyssa> the same test on my rk3399 laptop gets 290fps. ugh
<alyssa> effect of the job task split field is unclear