ChanServ changed the topic of #asahi-dev to: Asahi Linux: porting Linux to Apple Silicon macs | Non-development talk: #asahi | General development | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Logs: https://alx.sh/l/asahi-dev
<chadmed> so again, something that we can probably beat macos at
nopeslide13 has quit []
nopeslide13 has joined #asahi-dev
Emantor has quit [Quit: ZNC - http://znc.in]
Emantor has joined #asahi-dev
<nicolas17> I just spotted "Mac14,8", "Mac14,13" and "Mac14,14" on a server-side config file
<nicolas17> disabling FindMy "separationMonitoring" so I guess they're desktops
tobhe has joined #asahi-dev
tobhe_ has quit [Ping timeout: 480 seconds]
gabuscus has quit []
gabuscus has joined #asahi-dev
<chadmed> WFI halts the clocks on CPUs where it is issued until theyre woken by an interrupt right
<chadmed> oh yeah it does ok thats why the numbers are different
<chadmed> less dynamic power coming from idle cores
<chadmed> if i take the turbo states out of the dataset then the numbers line up perfectly with what youd expect
<chadmed> still, 3 W dynamic for a firestorm core at full flog seems very low
<chadmed> not sure coremark is really hammering the core as much as it could
<chadmed> at least now that i know we're still getting perfectly polynomial power scaling on the cores outside of turbo mode i can do some stuff to compensate for the noise in the readings at low power
chadmed has quit [Quit: Konversation terminated!]
skipwich has quit [Quit: DISCONNECT]
skipwich has joined #asahi-dev
chadmed has joined #asahi-dev
chadmed has quit [Quit: Konversation terminated!]
kevans91 has quit [Ping timeout: 480 seconds]
chadmed has joined #asahi-dev
<tsujp> marcan: I see there are yaks in need of shaving for 'keyboard layout cleanup' and you're the contact for this, I'll volunteer but I'll need a bit of a bootstrap since I haven't done this kind of thing before (first time for everything though)
thelounge60 has quit [Quit: The Lounge - https://thelounge.chat]
thelounge60 has joined #asahi-dev
<chadmed> ok we're down to 2.55 W total AP idle and about 5 W system with the screen at the minimum brightness
<chadmed> not too bad but theres probably more we can do
<chadmed> i think the next "free" bit of desktop use power draw is definitely going to be doing something about the touchpad interrupts on non-mtp machines
<chadmed> the menu governor parks firestorm in the 6th pstate to idle them, which is going to cause idle power to be higher through leakage or whatever
<chadmed> TEO parks them in the lowest pstate but it also isnt great for interactivity so theres our tradeoff
<nicolas17> is userspace behaving well? I have seen powertop report that plasma was causing >60 wakeups a second despite nothing visually changing on screen
<nicolas17> but that may have been an old version or otherwise different environment
<chadmed> idle power doesnt really change if i kill plasma and go to a tty
<chadmed> there are probably superfluous wakeups being caused by other stuff, because i managed to ssh into a partially failed boot before and idle was 2.2 W
<chadmed> i think realistically we're at the limit of what we can achieve by clock gating
<chadmed> the divergence between system draw and ap draw at idle tells me that theres other things we need to do to bring down idle power use
<chadmed> other SIP blocks sucking down more juice than they should
<chadmed> for example sometimes PSTR will just jump to 8-12 W with seemingly nothing else going on
<chadmed> PHPC stays at idle levels
<nicolas17> does Linux have anything for individual userspace threads to change the performance-power tradeoff?
<chadmed> we have UCLAMP_MIN and UCLAMP_MAX for userspace to tell CFS how much performance it _actually_ needs
<nicolas17> XNU has some "CPU QoS" thing beyond scheduling priorities, which likely also affects preference for p-cores vs e-cores
<chadmed> yeah, UCLAMP is a rough facsimile of that
<chadmed> userspace processes can hint to the scheduler how much core utilisation they will need ahead of time by clamping their util values
<chadmed> CFS will then put them in buckets with similar processes and dispatch cpu time to them accordingly
<chadmed> EAS helps with this too as threads which tell CFS that they dont need much utilisation can all be dispatched onto an icestorm core
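For reference, the utilisation clamps discussed above are set per task through the sched_setattr(2) syscall. The following is a minimal sketch in Python using ctypes, not anything from the Asahi tree. The helper names build_uclamp_attr and set_uclamp are made up for illustration, the syscall numbers cover only x86_64 and aarch64, and the call only succeeds on a kernel built with CONFIG_UCLAMP_TASK.

```python
import ctypes
import os
import platform

# Flag values from the kernel's uapi/linux/sched.h.
SCHED_FLAG_KEEP_POLICY = 0x08
SCHED_FLAG_KEEP_PARAMS = 0x10
SCHED_FLAG_UTIL_CLAMP_MIN = 0x20
SCHED_FLAG_UTIL_CLAMP_MAX = 0x40

# sched_setattr syscall numbers; only these two architectures are covered here.
SYS_SCHED_SETATTR = {"x86_64": 314, "aarch64": 274}


class SchedAttr(ctypes.Structure):
    """Mirror of struct sched_attr, including the uclamp fields."""
    _fields_ = [
        ("size", ctypes.c_uint32),
        ("sched_policy", ctypes.c_uint32),
        ("sched_flags", ctypes.c_uint64),
        ("sched_nice", ctypes.c_int32),
        ("sched_priority", ctypes.c_uint32),
        ("sched_runtime", ctypes.c_uint64),
        ("sched_deadline", ctypes.c_uint64),
        ("sched_period", ctypes.c_uint64),
        ("sched_util_min", ctypes.c_uint32),
        ("sched_util_max", ctypes.c_uint32),
    ]


def build_uclamp_attr(util_min, util_max):
    """Hypothetical helper: fill a sched_attr that only changes the clamps."""
    if not 0 <= util_min <= util_max <= 1024:
        raise ValueError("clamps must satisfy 0 <= min <= max <= 1024")
    attr = SchedAttr()
    attr.size = ctypes.sizeof(SchedAttr)
    # Keep the current policy and parameters, touch only the clamp values.
    attr.sched_flags = (SCHED_FLAG_KEEP_POLICY | SCHED_FLAG_KEEP_PARAMS |
                        SCHED_FLAG_UTIL_CLAMP_MIN | SCHED_FLAG_UTIL_CLAMP_MAX)
    attr.sched_util_min = util_min
    attr.sched_util_max = util_max
    return attr


def set_uclamp(pid, util_min, util_max):
    """Hypothetical helper: apply the clamps to a task (pid 0 = caller)."""
    nr = SYS_SCHED_SETATTR[platform.machine()]
    libc = ctypes.CDLL(None, use_errno=True)
    attr = build_uclamp_attr(util_min, util_max)
    ret = libc.syscall(ctypes.c_long(nr), ctypes.c_int(pid),
                       ctypes.byref(attr), ctypes.c_uint(0))
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

A background task could, for example, call set_uclamp(0, 0, 256) on itself to tell the scheduler it never needs more than a quarter of the largest core's capacity. Whether that actually lands it on an icestorm core depends on EAS and the capacity values in the device tree.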
<chadmed> with EAS we already get most background stuff being shunted over to icestorm
<chadmed> our problem is instead that EAS without hints tries to push stuff there too
<chadmed> then the util overflows so it shoves stuff over to a firestorm cluster
<chadmed> push interactive stuff there too*
<chadmed> one way we could work around this, for graphics tasks at least, is have asahi take in perf metrics from the gpu and feed that back into UCLAMP values to force whatever its linked to onto pcores at high pstates
<chadmed> but linux userspace is just gonna have to get with the times
<nicolas17> yeah I was wondering how much such hints would actually help
<chadmed> theyd help with interactivity but not necessarily power use
<chadmed> the reason i bought the t6020 machine is to test a hypothesis that having more than two ecores would give CFS/EAS a better opportunity to keep the pcores asleep
<chadmed> because sitting around at the desktop icestorm is more than fast enough for a snappy experience, its just on t600x theres only two of them and they end up overloaded
<chadmed> hell with the gpu doing compositing and post processing, i can peg firefox to the ecores at 600mhz and it will play 1080p vp9 at 30fps no worries
<chadmed> the soc is just too lopsided for the scheduler to know how to use it properly without userspace helping it
<chadmed> CFS is just not a very good scheduler for HMP systems tbh
<nicolas17> eg. would forcing a background software update or indexing etc. to ecores, or even letting them stay on a slow pstate, reduce overall battery drain, even if it makes the update take longer (and use CPU for longer)?
<chadmed> yeah it would, even at full capacity the icestorm cores barely tick up enough power to get a reading over background noise from PHPC
<chadmed> theyre _very_ efficient
<nicolas17> ok *wow*, I didn't expect that much difference
<chadmed> according to my readings they use barely 500 mW at full tilt at 2.064 GHz
<chadmed> theyre super tiny and faster than they have any right to be
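The trade-off in nicolas17's question comes down to energy = power x time. A rough sketch of that arithmetic, using the two dynamic power readings quoted in this log (about 3 W for a firestorm core, about 0.5 W for an icestorm core at 2.064 GHz). It assumes a single-threaded task and ignores static power and whatever the rest of the system draws while the task runs longer, so treat the result as an upper bound on the tolerable slowdown. The function names and the 10 s / 30 s run times are made up for illustration.

```python
FIRESTORM_WATTS = 3.0   # dynamic power at full load, per the readings above
ICESTORM_WATTS = 0.5    # "barely 500 mW at full tilt"


def energy_joules(power_watts, seconds):
    """Energy used by one core running flat out for the given time."""
    return power_watts * seconds


def break_even_slowdown(fast_watts, slow_watts):
    """How many times longer the slow core may take before it costs more energy."""
    return fast_watts / slow_watts


# Hypothetical task: 10 s on a firestorm core, 30 s on an icestorm core.
pcore_energy = energy_joules(FIRESTORM_WATTS, 10.0)
ecore_energy = energy_joules(ICESTORM_WATTS, 30.0)
print(pcore_energy, ecore_energy)
print(break_even_slowdown(FIRESTORM_WATTS, ICESTORM_WATTS))
```

Under these assumptions the icestorm core wins on energy as long as the task takes less than six times as long there.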
<nicolas17> so is there any "this is low priority background work" hint that would cause such "force to ecores" outcome, without having to pin to specific cores? (which would be too system-specific)
<nicolas17> afaik XNU "background" QoS does that
<chadmed> nothing that simple, youd have to hint to CFS with a mix of UCLAMP, priority tweaking and capacity-dmips-mhz values in the DT
<chadmed> the QoS buckets for CFS are all based on UCLAMP values, and if you dont provide those then it just uses its own feedback mechanism to predict what kind of capacity the process needs
<chadmed> the problem is that its really optimised for throughput and not interactivity, so bursty tasks will cause its perf counters to decay and youre back to it not having any idea what to do with the thread
<nicolas17> bah, Apple's ~5 qos categories seem way easier to understand for devs...
<chadmed> yeah see, the issue is on linux that we cant _rely_ on devs doing that
<chadmed> we dont have an army of devs who will hang on every word the kernel team say
<nicolas17> I mean it seems easier to rely on userspace devs to set a qos category than to set uclamp values :P
<nicolas17> which sound way fiddlier
<chadmed> so even if we had a simplified userspace-friendly qos system separate to nice/uclamp/whatever 50% of people wouldnt use it because its soy bloated cucked etc and the other 50% wouldnt use it because who the hell cares about the 5 people using heterogeneous workstations
<chadmed> intels heterogeneous socs have hardware scheduling so devs dont have to care about it
<chadmed> and no ones really bothered with a heterogeneous arm64 chip for the desktop except apple
<chadmed> so all the aggressive optimisations for software scheduling are in android, and guess what
<chadmed> they rely on android apis and alternative schedulers
<chadmed> its bad enough that we cant convince a nontrivial minority of folks that x11 is a ghoulish awful steaming pile and needs to die
<nicolas17> nice is already a mess on modern systems tbh
<chadmed> something as finicky and not-user-facing as "please tell the scheduler your capacity requirements" is waaaay down the priority list of most developers
<nicolas17> I run 'nice ninja' on a terminal and my UI latency still lags
<nicolas17> because it seems nice levels only set priorities with respect to other processes in the same cgroup, and konsole is being put in a cgroup of its own
<chadmed> yeah its all just a mess of hacks and tricks and kludges that no one really understands and doesnt really have any benefit for anyone in the desktop
<nicolas17> the konsole cgroup and the firefox cgroup get equal treatment regardless of the nice levels of processes inside, and firefox lags while compiling
<chadmed> they all seem designed for highly integrated server environments where the developer and sysadmin are either the same person or at least in the same room
<chadmed> theyre not fit for purpose for modern desktop systems
<chadmed> intel's thread director is a damning indictment of this state of affairs
<nicolas17> I had a similar issue with BOINC on my desktop, it's nice'd and ionice'd and SCHED_IDLE'd and it still lagged the interactive desktop... why? cgroups
<chadmed> yeah again, a "solution" designed with the assumption that people want to spend 6 weeks collecting perf data and then 15 minutes on every cold boot manually optimising all this stuff
<nicolas17> so I changed the cgroup params of its systemd service unit... no difference, because CPU was still being shared equally between system.slice and user.slice
<chadmed> desktop users want to press the power button and not think about firmware or the kernel again beyond that
<chadmed> any time a normal user has to think about any of this shit is an absolute L for UX
<nicolas17> I moved it to its own top-level slice
cylm has joined #asahi-dev
<nicolas17> and now I finally can't tell the difference between boinc running vs not running
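The behaviour nicolas17 describes can be illustrated with a toy model of hierarchical weighted sharing. This is only a sketch, not how CFS is implemented. It assumes one CPU, tasks that are always runnable, sibling cgroups at the default cpu.weight of 100, and the kernel's load weights of 1024 for nice 0 and 15 for nice 19. SCHED_IDLE is not modelled, and the function names are made up.

```python
NICE_0_WEIGHT = 1024   # kernel load weight for nice 0
NICE_19_WEIGHT = 15    # kernel load weight for nice 19
DEFAULT_CPU_WEIGHT = 100  # cgroup v2 cpu.weight default


def flat_share(task_weight, all_task_weights):
    """CPU fraction of a task when every task competes in one group."""
    return task_weight / sum(all_task_weights)


def grouped_share(group_weight, all_group_weights, task_weight, group_task_weights):
    """CPU fraction of a task when its cgroup first competes with sibling cgroups."""
    group_fraction = group_weight / sum(all_group_weights)
    return group_fraction * task_weight / sum(group_task_weights)


# One nice-19 compile job against one nice-0 browser, no cgroup separation:
compile_flat = flat_share(NICE_19_WEIGHT, [NICE_19_WEIGHT, NICE_0_WEIGHT])

# Same two tasks, but each sits alone in its own cgroup of equal weight:
compile_grouped = grouped_share(
    DEFAULT_CPU_WEIGHT, [DEFAULT_CPU_WEIGHT, DEFAULT_CPU_WEIGHT],
    NICE_19_WEIGHT, [NICE_19_WEIGHT],
)
print(compile_flat, compile_grouped)
```

In the flat case the compile job gets about 1.4% of the CPU. Once it sits alone in its own cgroup its nice level has nothing left to compete against, and it gets half. Moving the job to a separate top-level slice only helps if that slice is also given a lower weight than its siblings.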
<chadmed> yeah see thats exactly my point
<chadmed> who the heck has time for that
<nicolas17> I think the package could be configured to set up the service like that
<nicolas17> but what about *gestures* everything else
<chadmed> well yeah thats what i meant by the solutions being designed for server admins
<chadmed> yeah you could package up a bunch of scripts that does all this for YOUR system and YOUR software
<chadmed> but theyre not going to work on another system with a different perf profile
<chadmed> developers need to build this in to code so that it happens automatically, and we cant do that easily or even *well* with the current kernel infrastructure
<chadmed> the classic example is baloo
<chadmed> why the hell does CFS give it free rein over my pcores?
<nicolas17> agreed, otoh baloo seems like exactly the kind of app that should be hinting "low priority"
<chadmed> yeah exactly thats why its the classic example :P
<chadmed> there should just be a bunch of switches that say "i dont care about throughput/latency/either"
<chadmed> so UCLAMP is probably the closest to XNU's QoS that we have, given the whole bucketing concept
<chadmed> but it would be nice if it had a more user-centric mechanism of operation so that devs could reason about it a bit better
<chadmed> UCLAMP_MAX=[512,1024] is utterly meaningless to me as a user and a developer
<chadmed> i want to be able to just say "my task is a background daemon" and let the scheduler worry about how it internally represents that
<nicolas17> ++
<nicolas17> too many kernel devs benchmarking throughput of kernel builds :P
<chadmed> android already does this, but again it relies on android-specific schedulers and android userspace apis
nicolas17 has quit [Ping timeout: 480 seconds]
psykose has quit [Remote host closed the connection]
knedlik has joined #asahi-dev
knedlik has quit [Remote host closed the connection]
stipa has quit [Remote host closed the connection]
seb4nihel has joined #asahi-dev
loki_val has quit [Ping timeout: 480 seconds]
crabbedhaloablut has joined #asahi-dev
nsklaus has joined #asahi-dev
pg12_ has quit []
pg12 has joined #asahi-dev
<jannau> dcp 13.3 FW doesn't like m1n1's dcp_ib_swap_set_layer call anymore. the other problem was that the service init message is now TYPE_REPLY instead of TYPE_NOTIFY
nyilas has quit [Remote host closed the connection]
knedlik has joined #asahi-dev
knedlik has quit [Remote host closed the connection]
chadmed has quit [Remote host closed the connection]
<jannau> looks like there are two new dcp iboot calls in 13.3 but probably appended at the end. 13.2 already has additional calls but swap_set_layer was still working
chadmed has joined #asahi-dev
cylm_ has joined #asahi-dev
cylm has quit [Ping timeout: 480 seconds]
bps has joined #asahi-dev
bps has quit [Remote host closed the connection]
abd has joined #asahi-dev
bps has joined #asahi-dev
bps has quit [Ping timeout: 480 seconds]
bps has joined #asahi-dev
<jannau> looks like the swap_set_layer_cmd had layout/size changes, set_layer succeeds now but swap_end fails
bluetail422 has quit []
cylm has joined #asahi-dev
<jannau> at least there's an IOMobileFramebuffer::swap_submit_dcp() syslog line
<jannau> so the swap_set_layer_cmd might be interpreted as garbage by dcp
cylm_ has quit [Ping timeout: 480 seconds]
bluetail422 has joined #asahi-dev
bluetail422 has quit []
bps has quit [Ping timeout: 480 seconds]
bluetail422 has joined #asahi-dev
bluetail422 has quit []
<jannau> working, it was setting garbage layer parameters
<jannau> yeah! firmware version dependent dcp interfaces in m1n1 as well
bluetail422 has joined #asahi-dev
bluetail has joined #asahi-dev
bluetail has quit []
bluetail has joined #asahi-dev
bluetail422 has quit [Remote host closed the connection]
bluetail has quit [Remote host closed the connection]
bluetail has joined #asahi-dev
jnn is now known as jn
knedlik has joined #asahi-dev
<knedlik> I'll probably give the bass plugin a shot unless someone else plans to, seems not though
<knedlik> I'm unsure what I should be doing it on though, should I compile right on Asahi, or xcompile from macOS? Also how do I test it... and not destroy my speakers?
psykose has joined #asahi-dev
<jannau> Knedlik: development can be in the first step fully independent of any hardware. the plugin gets an audio signal (a stream of samples) and outputs an audio signal. at first you wouldn't necessarily listen to what the plugin does
<knedlik> Makes sense, thanks for the link
<knedlik> Btw, any idea how I can make my IRC client not disconnect me when I turn off my phone?
<ChaosPrincess> the simple answer is 'you dont'
<knedlik> Ah
<ChaosPrincess> the less simple is 'run an irc bouncer on your server'
<knedlik> The algorithm linked in the Yaks page is what's needed, or do I need to modify that algo in some way?
abd has quit [Ping timeout: 480 seconds]
abd has joined #asahi-dev
abd has quit [Ping timeout: 480 seconds]
thevar1able_ has quit [Remote host closed the connection]
thevar1able_ has joined #asahi-dev
cylm_ has joined #asahi-dev
cylm has quit [Ping timeout: 480 seconds]
knedlik has quit [Ping timeout: 480 seconds]
abd has joined #asahi-dev
knedlik has joined #asahi-dev
knedlik has quit [Remote host closed the connection]
skipwich has quit [Remote host closed the connection]
skipwich has joined #asahi-dev
knedlik has joined #asahi-dev
knedlik has quit [Remote host closed the connection]
knedlik has joined #asahi-dev
<knedlik> I'm guessing the bass plugin should keep the differentiation between left and right?
kettenis has quit [Ping timeout: 480 seconds]
yamii has quit [Quit: WeeChat 3.8]
yamii has joined #asahi-dev
mkurz has joined #asahi-dev
knedlik has quit [Remote host closed the connection]
mkurz has quit [Quit: Konversation terminated!]
knedlik has joined #asahi-dev
knedlik has quit [Remote host closed the connection]
nsklaus has quit [Quit: ZZZzzz…]
lynndotpy has quit [Quit: bye bye]
lynndotpy has joined #asahi-dev
abd has quit [Ping timeout: 480 seconds]
Retr0id has quit [Quit: bye]
<chadmed> Knedlik: as i said implementation details are up to you, so long as its fast and light
<chadmed> that link is just there to give you a high level idea of what the plugin needs to do
<chadmed> you may sum L and R for bass since the wavelength is too long for anyone to discern direction from, especially on such a tiny machine
<chadmed> but if you want to go down that path, you have to chain off a stream from the main signal and then mix that back in to both L and R at the end
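The signal routing chadmed describes (sum L and R, process the bass on a side chain, mix the result back into both channels) might look roughly like the sketch below. It is a hardware-independent illustration in plain Python, not the plugin's design. The one-pole low-pass, the 150 Hz cutoff, the fixed gain standing in for the real bass processing, and the name process_block are all assumptions. A real plugin would also carry the filter state across blocks.

```python
import math


def process_block(left, right, sample_rate=48000.0, cutoff_hz=150.0, bass_gain=0.5):
    """Sum L+R to mono, low-pass it, and mix the bass back into both channels."""
    # One-pole low-pass coefficient for the chosen cutoff frequency.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    low = 0.0  # filter state, reset for every block in this sketch
    out_left, out_right = [], []
    for l, r in zip(left, right):
        mono = 0.5 * (l + r)          # side chain taken off the main signal
        low += alpha * (mono - low)   # keep only the low end of the mono sum
        bass = bass_gain * low        # placeholder for the actual bass processing
        out_left.append(l + bass)     # same bass signal goes to both channels,
        out_right.append(r + bass)    # the original stereo image is untouched
    return out_left, out_right
```

Since the main L and R signals pass straight through, anything above the cutoff keeps its left/right separation, and only the summed bass is shared between the channels.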
<Chinese_soup> chadmed: just so you know, you are replying to someone who's not here rn
<Chinese_soup> but I guess you maybe know that and assume they're gonna read the logs
<chadmed> yeah i assume they read the logs since theyre referencing messages i sent some time ago
<chadmed> also as jannau mentioned you dont necessarily need to develop this on an apple silicon machine, its just an audio plugin
<chadmed> any machine where you can run a plugin host and test it will work fine
<chadmed> my machine will lay down its life for science when the time comes to test it on apple silicon
abd has joined #asahi-dev