robclark changed the topic of #aarch64-laptops to: Linux support for AArch64 Laptops (Chrome OS Trogdor Devices - Asus NovaGo TP370QL - HP Envy x2 - Lenovo Miix 630 - Lenovo Yoga C630 - Lenovo ThinkPad X13s - and various other snapdragon laptops) - https://oftc.irclog.whitequark.org/aarch64-laptops
<steev> Seems like there’s a Lenovo system hardware update - 9/15/2023 in Windows. Not sure what firmware changes there might be
<HdkR> https://store.avantek.co.uk/nv-grace-hopper-2u2n.html Just a casual $120k for Neoverse-V2
<clover[m]> Costs an ARM and a leg!
<HdkR> Guess I'm going to be waiting for AmpereOne eMag replacements
<robclark> HdkR: hmm, perf-record of the apitrace replay points at _mesa_LinkProgram() ... but not sure yet whether that just means the apitrace didn't cover the right sequence of gl calls or if something is different (less efficient?) about x86 emu vs native replay
<HdkR> Well this was aarch64 native apitrace capture
<HdkR> Since I had thunking enabled
<robclark> (basically I'm trying to figure out the callstack that leads to those hotspots so I can figure out why you are hitting that path)
<HdkR> Did it need to sit in the menu a bit longer?
<robclark> hmm, or just get me a `perf record -g` call stack
<robclark> I guess if you are hitting that path w/ thunking it isn't an emu issue
<HdkR> yea, that's all CPU time outside of the emulator
<robclark> there are several paths that lead to _mesa_apply_rgba_transfer_ops.. having a call stack would help track down why you are hitting that
<HdkR> Let's see if I can get one
<robclark> thx
<robclark> hmm, I probably need a copy/paste of the perf-report, since I don't think I have the same debug syms as you
<robclark> ok.. so some sort of readpix going on.. I guess to get gpu rendered content back into browser(ish) thing? I'll see if I can figure out if there is some slow path we are hitting there although it seems like it is largely fault handling
<HdkR> I can only guess. CEF does as it do
<robclark> fault handling is going to suck no matter how you slice it, ie. the issue isn't what is accessing memory from cpu but that something is.. but that shouldn't be anything that is x86 vs arm related, I wouldn't expect
<robclark> what is CEF?
<HdkR> Chromium Embedded Framework
<robclark> hmm, ok
<robclark> looks like _mesa_GetTexSubImage_sw is a thing to look at closer
<robclark> I guess a dumb-thing-that-apps-can-do-that-we-haven't-optimized-yet
<HdkR> It's always spooky when the CPU time isn't dominated by the emulation :)
<robclark> HdkR: btw, did you look at `apitrace dump csgo.trace`? Not the most modern of gl usage..
<robclark> looks like it is just doing a boatload of texture upload
<robclark> HdkR: try this.. not sure if there is any more to it, but seems to help:
<Jasper[m]> <robclark> "HdkR: btw, did you look at `..." <- Source engine is from 2004, the game runs in dx9 mode on Windows :^)
<Jasper[m]> Not that it'll do for long though, csgo is going to be deprecated this year
<steev> oh no :(
<Jasper[m]> steev: eh, CS2 will do dx11 or Vulkan afaik
<steev> oh
<HdkR> robclark: Nice, I'll give it a try today
<robclark> perf says now it is bottlenecked on apitrace replay (decompression mostly).. but not sure how much faster that will make it in practice
<Jasper[m]> <Jasper[m]> "eh, CS2 will do dx11 or Vulkan..." <- (If I get early access I'll try it out, should have CEF as well since that UI got backported)
<robclark> HdkR: hmm, seems like I should be somehow able to use FEXBash + chroot to get an env where I can build x86 mesa.. but they don't seem to play nicely together
<HdkR> The `unbreak_chroot.sh` script in the rootfs has some of the finer details for how chrooting works
<HdkR> robclark: Patch takes CS:GO menu from sub 1FPS to 112FPS
<robclark> \o/
<HdkR> highest bits are now some atomic fetch_add and fd6_sampler_view_invalidate. atomic at 1.39%, sampler_view at 0.73%
<HdkR> Smattering split between JIT and msm_dri
<robclark> that isn't necessarily completely unreasonable.. what is gpu vs cpu load?
<HdkR> Primary thread at 65% usage, GPU at 60% usage
<HdkR> I guess a bit of ping-ponging, CPU waiting on GPU
<robclark> yeah
<HdkR> Would need gpuvis to see where the stalls are
<HdkR> Not every day you can get single line changes that >100x the perf :)
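For reference, the arithmetic behind the ">100x" remark can be sketched as below. The "before" figure is only known as "sub 1FPS", so 1.0 is used here as an upper bound (an assumption); the real speedup is therefore at least the value computed.

```python
# Frame rates reported in the channel: "sub 1FPS" before the patch, 112FPS after.
# 1.0 is an assumed upper bound for the "before" value, so the computed
# speedup is a lower bound rather than an exact figure.
fps_before_upper_bound = 1.0
fps_after = 112.0


def frame_time_ms(fps: float) -> float:
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


min_speedup = fps_after / fps_before_upper_bound
print(f"speedup: at least {min_speedup:.0f}x")
print(
    f"frame time: more than {frame_time_ms(fps_before_upper_bound):.0f} ms "
    f"-> about {frame_time_ms(fps_after):.1f} ms"
)
```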
<robclark> heheh
<HdkR> Looks like the hottest block is a < 32-byte memset. Rude
<robclark> `perf record -g` and `perf report -G` is useful to find where that is coming from
<HdkR> Well that one is in JIT code. Bit harder to trace back
<robclark> ahh
<HdkR> Sadly we don't have hardware yet that has ARM's new memset/memcpy instructions
<HdkR> Would have changed this three instruction loop into a three instruction not-loop
<robclark> does a78c/x1c support this?
<HdkR> Nope. Nothing supports it yet, it's super fresh
<HdkR> Cortex-X1C/A78C memory subsystem is actually slightly worse off since it's an in-between core that glued on ARMv8.4 atomicity requirements
<robclark> ahh, if hw I have doesn't support it, it doesn't exist :-P
<HdkR> :)
<HdkR> Maybe with whatever comes after Cortex-A720 we'll have it
<HdkR> Oh interesting, the sampler view invalidate increases periodically
<robclark> sampler view invalidates might just be coming from the constant texture uploads?
<HdkR> Could be the case
<HdkR> Every time I see 4% CPU time in memcpy I think there is something wrong with memory clocking. But it could just be that set_constant_buffer is getting hammered by csgo
<robclark> pastebin decoded `perf report -G`?
<HdkR> The good thing is that 75% of the CPU time is in the emulation now
<robclark> yeah, that doesn't look too bad.. I'd have to look at the man page, but you should be able to sort by dso to see the total amount of time spent in msm_dri.so
<robclark> but looks like most of it is in csgo_linux64
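A minimal sketch of the per-DSO total robclark describes, done by post-processing text output instead of relying on perf's own sorting (`perf report --sort dso` is the built-in route). The row layout assumed here (`<overhead>%  <shared object>  [.] <symbol>`) and every row in `SAMPLE` are assumptions made up for illustration; they are not taken from HdkR's actual profile.

```python
import re
from collections import defaultdict

# Matches rows such as "   1.50%  msm_dri.so  [.] some_symbol".
# The column layout is an assumption about perf's text report format.
ROW = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s+(\S+)\s+\[.\]\s+(.+?)\s*$")


def overhead_by_dso(report_text: str) -> dict[str, float]:
    """Sum the overhead percentage of every symbol row per shared object."""
    totals: dict[str, float] = defaultdict(float)
    for line in report_text.splitlines():
        match = ROW.match(line)
        if match:
            percent, dso, _symbol = match.groups()
            totals[dso] += float(percent)
    return dict(totals)


# Made-up sample rows for illustration only (hypothetical symbols and numbers).
SAMPLE = """\
# Overhead  Shared Object  Symbol
    40.00%  [JIT]          [.] 0x0000ffff00001000
    35.00%  csgo_linux64   [.] hypothetical_game_function
     1.50%  msm_dri.so     [.] hypothetical_driver_function_a
     0.75%  msm_dri.so     [.] hypothetical_driver_function_b
"""

# Print shared objects from most to least expensive.
for dso, percent in sorted(overhead_by_dso(SAMPLE).items(), key=lambda kv: -kv[1]):
    print(f"{percent:6.2f}%  {dso}")
```

Comment lines starting with `#` and any row that does not fit the assumed layout are skipped, so the totals only cover rows the pattern recognizes.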
<robclark> sampler_view_invalidate looks like all sampler_view_destroy path so not hitting a lot of demote paths (ie. with some sampler view formats if the UBWC and/or tiled layout is different from the underlying texture format we need to demote to tiled-non-UBWC or linear, but doesn't look like you are hitting that sampler_view_invalidate() path)
<steev> clover[m]: have you seen [ 101.273817] MultiMedia1 Playback: ASoC: no backend DAIs enabled for MultiMedia1 Playback, possibly missing ALSA mixer-based routing or UCM profile ? someone asked me about it and i've never seen it