alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard + Bifrost + Valhall - Logs - I don't know anything about WSI. That's my story and I'm sticking to it.
alyssa has joined #panfrost
<alyssa> cphealy: Did you get a chance to test the shadowing fix? thank you
<alyssa> Oh, you did, didn't see the email, whoops sorry
<alyssa> 1/2 perf drop on that benchmark ... that's very unfortunate :|
<cphealy> alyssa: I'm pretty confident that the tests I ran are valid. I used the latest released glmark2-es2-wayland with TOT Mesa.
<cphealy> Though the fact that I don't see any of the benefit you mentioned gives me a little concern that I did something wrong.
<cphealy> Which specific glmark2 benchmark would you expect to improve?
<alyssa> cphealy: right
<alyssa> The new solution will use somewhat more CPU in exchange for a lot less GPU on certain workloads
<alyssa> I can easily see that being a huge win for me on RK3399 (fast CPU, slow GPU) but not so much on your board (fast GPU, slow CPU)...
<cphealy> When you tested, were you using a SoC with big ARM cores?
<alyssa> Ye
<cphealy> Any chance you can re-run on RK3399 with the big cores turned off?
<alyssa> Any hint how to do that? :)
<cphealy> Not yet, give me a few min though.. ;-)
<alyssa> thanks
<alyssa> (won't be able to for ~45 minutes, no rush)
<alpernebbi> taskset -c 0-3 or echo 0 | sudo tee /sys/devices/system/cpu/cpu{4,5}/online might work
<alyssa> alpernebbi: thanks!
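(For the record, the two approaches above amount to the sketch below; cpu0-3 little / cpu4-5 big is the RK3399 layout, and the `echo` is a stand-in for the actual benchmark.)

```shell
# 1) Affinity only: pin the process to the little cores (cpu0-3 on RK3399).
#    taskset ships with util-linux; the echo stands in for the real workload.
if command -v taskset >/dev/null 2>&1; then
  taskset -c 0 echo "pinned to cpu0"
else
  echo "taskset (util-linux) not installed"
fi

# 2) Hot-unplug the big cores entirely (needs root, so shown but not run;
#    undo by writing 1 back):
#   echo 0 | sudo tee /sys/devices/system/cpu/cpu{4,5}/online
```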
<cphealy> alpernebbi: you beat me to it!
<cphealy> CPU numbering differs between platforms, so you should first determine which cores are the big ones to know which cpu numbers to disable.
<alyssa> yeah, but alpernebbi has the same machine I do :)
<cphealy> Ahh, it's probably the right answer then... ;-)
<alyssa> alpernebbi: Also, boo, I made it years without thinking "hexacore" and almost managed to forget it ;)
<cphealy> For other platforms, one can check the "cpu_capacity" sysfs for each CPU core to see which ones have the higher capacity.
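(On kernels that get capacities from the devicetree, that check is a one-line loop over sysfs; `cpu_capacity` only exists on asymmetric systems, so this prints nothing but the trailing marker on SMP machines.)

```shell
# Print each CPU's relative capacity; big cores report larger numbers.
for c in /sys/devices/system/cpu/cpu[0-9]*; do
  if [ -r "$c/cpu_capacity" ]; then
    echo "$(basename "$c"): $(cat "$c/cpu_capacity")"
  fi
done
echo "capacity scan complete"
```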
<alpernebbi> a bit sleepy, so I thought "what is hexacore, some genre of music" for a moment
<alyssa> hahaha
<alyssa> "hex core" maybe?
<alyssa> I can't remember the silly marketing back when rk3399 was new and shiny
<cphealy> 2 big cores and 4 little cores
<alpernebbi> hex core sounds like some magic artifact
<cphealy> that's hexacore
<alpernebbi> cphealy: yeah latin for 6 or something
<alpernebbi> just having a sleepy moment
<cphealy> ;-)
<alpernebbi> btw I never tried the sysfs one, but I did use taskset -c 4,5 for qemu and it was enough for me
<alyssa> Curious
<alyssa> with performance governors for cpu/gpu on rk3399, everything on the system is super responsive
<alyssa> so I'm wondering maybe the kernel scheduling (for both CPU and GPU) is just crap on this machine and that's why stuff is so janky most of the time
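(Whatever the distro default is, the active governor can be inspected, and with root switched to performance, through the standard cpufreq sysfs; a sketch, with the write left commented out.)

```shell
# Read cpu0's current cpufreq governor; the same path exists per CPU.
g=/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
if [ -r "$g" ]; then
  cat "$g"
else
  echo "cpufreq sysfs not available"
fi
# To pin the highest OPP (needs root):
#   echo performance | sudo tee "$g"
```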
<robmur01> A lot of responsiveness on a not-very-busy system can be down to interrupt handlers running on idle CPUs, which are thus clocked right down, further confounded by CPU0 often being the weediest little CPU yet bearing the brunt of most default affinity
<robmur01> tricky problem to solve well with software-controlled DVFS
<robmur01> try punting IRQ affinity for things that matter to the big cores, which will do a lot better even at their lowest freq
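(Concretely that means finding the interrupt line in /proc/interrupts and writing the big-core range to its smp_affinity_list; "panfrost" as the IRQ name is an assumption here, so check your own board's listing. The root-only write is shown commented out.)

```shell
# Look up the GPU interrupt; adjust the pattern for your platform.
grep -i panfrost /proc/interrupts || echo "no panfrost IRQ on this machine"
# Then steer it to the big cores (cpu4-5 on RK3399), as root:
#   echo 4-5 | sudo tee /proc/irq/<N>/smp_affinity_list
```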
<cphealy> If only I had big cores in my SoC... ;-)
<alyssa> robmur01: nod... I guess the combination of software DVFS and software big.little is a mess
<robmur01> interrupts are basically the most pathological form of a bursty workload
<robclark> what are you comparing performance gov to? IME schedutil needs a lot of hinting from userspace about what tasks are important to move to big cores, vs what are not time-critical.. android has a lot of cgroup+uclamp stuff around that
<alyssa> robclark: whatever debian's default is
<alyssa> I suppose I should be grateful mainline+Debian works on this machine at all ;-D
<robmur01> probably doesn't help if the CPU has time to go idle and clock down while the GPU is busy for a frame, and vice versa. Any kind of scaling algorithm is liable to need different tuning for different workloads
<robclark> heh, you think that is hard to get right.. now move that game into a VM ;-)
<alyssa> oof
<alyssa> cphealy: Reproduced the glmark2 unhappiness with the shadowing stuff on RK3399
<anarsoul> alyssa: fix is in the works? :)
<alyssa> anarsoul: Still trying to understand
<alyssa> Lot of time spent in memcpy now
<alyssa> I guess that makes sense
<alyssa> buffer is 290816 bytes and it's shadowed 4x each frame
<alyssa> so just over 1MB of memcpying every frame
<alyssa> versus just under 2MB of copying incurred from flushing
<alyssa> so less overall system memory bw, but more visible because it's on the CPU now
<alyssa> I guess?
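(The arithmetic checks out: four shadow copies of that buffer per frame is just over a mebibyte.)

```shell
# 290816-byte buffer, shadowed 4x per frame:
echo $((290816 * 4))                                    # -> 1163264 bytes
awk 'BEGIN { printf "%.2f MiB\n", 290816 * 4 / 1048576 }'   # -> 1.11 MiB
```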
<robclark> alyssa: you should try TC plus allow_cpu_storage
<cphealy> is that CPU memcpying done using NEON instructions?
<alyssa> robclark: Plumbing in TC in the next few hours before the bpoint seems hard ;-)
<robclark> heh, well..
<robclark> cphealy: I guess the issue is probably readback from writecombine buffers
<alyssa> OOI, does TC help in real workloads?
<alyssa> as opposed to viewperf
<robclark> yeah.. and in particular the cpu-storage trick for shadowing buffers is nice because you are memcpy from cached/malloced to WC gpu buffers instead of WC->WC
<alyssa> oh that would solve this nicely
<alyssa> cphealy: You interested? ;-p
<robclark> the case I see TC hurt are really more just scheduler issues.. scheduler interprets light load split over two threads as "these completely independent threads aren't heavily loaded" without realizing the association between the two
<cphealy> ha, you wouldn't want me writing NEON code. Just curious if CPU memcpy could be faster with Panfrost.
<alyssa> cphealy: found the actual issue though
<alyssa> thank you for your diligent benchmarking, this would've slipped through otherwise!
<cphealy> No problem
<cphealy> We are a team on Panfrost now
<alyssa> :-D
<alyssa> OK, with these patches, the "interleaved=true, map" case is doubled in perf
<alyssa> but the non-interleaved map is hurt a little bit and the non-interleaved subdata is halved in perf
<alyssa> ~~averages out though~~ investigating the non-interleaved case now that I have a better idea what's going on
<alyssa> right.. The subdata case is going to suck without the TC optimization
<robmur01> or rather; probably not, until now
* robmur01 can't read diffs properly
* alyssa is unsure there's much to be done to help with the subdata case
<greenjustin> Surprised there's no prfm in this memcpy implementation
<greenjustin> Suppose it doesn't matter much if you're copying from coherent memory though
<robmur01> generally prfm does more harm than good for simple access patterns that the stride prefetcher can deal with itself
<alyssa> scratch that... this is supposed to work ok
<greenjustin> That's fair, especially on older chips where prfm doesn't co-issue for free
<alyssa> but apparently having some spooky action at a distance
<robmur01> I think pretty much everything since Cortex-A9, bar original ThunderX, has a competent stride prefetcher
<alyssa> 100% gdb cpu usage delight
<alyssa> something is seriously broken here
<alyssa> oh, nvm, user error
<alyssa> whole bunch of vbo's. right.
<alyssa> yeah, I really don't see what the driver can do here without TC
<anarsoul> what is TC?
<alyssa> threaded context
<alyssa> cphealy: Pushed a new version of the resource shadowing fix
<alyssa> The subdata case may be slower but the other cases should be faster
<alyssa> and even if they're not, I'm inclined to land given the massive perf improvement on real workloads (i.e. not a glmark2 case designed specifically to emulate poorly written old apps)
<alyssa> Let me know how perf is with that for you
<alyssa> (massive win on RK3399 anyway)
<alpernebbi> yay for rk3399 wins!
<alyssa> alpernebbi: :D
<cphealy> alyssa: I'll give it a try in a few, tnx!
<alyssa> +1
<alyssa> stepri01: Unfortunately, our UAPI build problems aren't solved yet :(
<alyssa> This is needed to fix the C++ build
<alyssa> Of course I'm not supposed to land that Mesa change without first landing in the kernel
<alyssa> I don't have a current kernel tree checked out and don't have the disk space to spare on this machine to fix that for a 1 line patch
<alyssa> so I would appreciate it if you could write the obvious 1 line fix (as there), add my reviewed-by, and push to drm-misc-fixes as before
<alyssa> (ideally by Tuesday so the Mesa side fix makes it into 22.3-rc1)
<alyssa> Thank you :)
alyssa has quit [Quit: leaving]