user982492 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
tdsea^ has quit [Remote host closed the connection]
kylealanhale has joined #asahi-dev
bisko has quit [Read error: Connection reset by peer]
bisko has joined #asahi-dev
user982492 has joined #asahi-dev
yuyichao has quit [Ping timeout: 480 seconds]
kylealanhale has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
phiologe has joined #asahi-dev
PhilippvK has quit [Ping timeout: 480 seconds]
yuyichao has joined #asahi-dev
<YichaoYu[m]>
playing with the perf counter on a 13" air.
<YichaoYu[m]>
is it expected that the counters only work on the E cores?
<YichaoYu[m]>
using numactl to pin the process to a P core causes it to not count anything for both cycles and instructions
<YichaoYu[m]>
(and attempting to use the raw counter also seems to turn up all 0 on the P core)
<YichaoYu[m]>
also somehow I must set `exclude_guest` or perf_event_open will return not supported
bisko has quit [Read error: Connection reset by peer]
bisko has joined #asahi-dev
kov has quit [Quit: Coyote finally caught me]
kov has joined #asahi-dev
nicolas17 has quit [Ping timeout: 480 seconds]
bisko has quit [Read error: Connection reset by peer]
bisko has joined #asahi-dev
user982492 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
user982492 has joined #asahi-dev
bisko has quit [Read error: Connection reset by peer]
bisko has joined #asahi-dev
luigy[m] has quit [Server closed connection]
luigy[m] has joined #asahi-dev
null has quit [Server closed connection]
null has joined #asahi-dev
chadmed has joined #asahi-dev
skrzyp has quit [Server closed connection]
skrzyp has joined #asahi-dev
Ziemas has quit [Server closed connection]
Ziemas has joined #asahi-dev
Major_Biscuit has joined #asahi-dev
kaprests has quit [Server closed connection]
kaprests has joined #asahi-dev
Major_Biscuit has quit []
lewurm has quit [Server closed connection]
lewurm has joined #asahi-dev
_jannau_ has quit [Server closed connection]
_jannau_ has joined #asahi-dev
MajorBiscuit has joined #asahi-dev
jato has quit [Server closed connection]
jato has joined #asahi-dev
henje[m] has quit [Server closed connection]
henje[m] has joined #asahi-dev
<jannau>
YichaoYu[m]: you have to specify the performance core's counter explicitly with 'apple_firestorm_pmu/cycles/'
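A minimal sketch of what jannau describes, assuming the `perf stat -e pmu/event/` syntax and the two PMU names that appear in this log. The `instructions` alias and the helper name `perf_stat_argv` are assumptions for illustration; only `apple_firestorm_pmu/cycles/` is quoted in the channel.

```python
import shlex

# PMU names as they appear in this log (P cores first, then E cores).
PMUS = ("apple_firestorm_pmu", "apple_icestorm_pmu")


def perf_stat_argv(command, events=("cycles", "instructions"), pmus=PMUS):
    """Build a `perf stat` argv that names every PMU explicitly.

    Hypothetical helper: it only assembles the command line and does
    not check that the PMUs or event aliases exist on this machine.
    """
    specs = [f"{pmu}/{event}/" for pmu in pmus for event in events]
    return ["perf", "stat", "-e", ",".join(specs), "--", *command]


if __name__ == "__main__":
    # Print the command instead of running it, so this is safe anywhere.
    print(shlex.join(perf_stat_argv(["sleep", "1"])))
```

Each core type gets its own counter line in the output, so the numbers are never summed across P and E cores.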
jix has quit [Server closed connection]
jix has joined #asahi-dev
<maz>
YichaoYu[m]: 'exclude_guest' is expected, we don't support guest profiling (or any sort of PMU virtualisation). and as jannau pointed out, you must explicitly select the PMUs you want to count on.
bisko has quit [Remote host closed the connection]
user982492 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
jvsg[m] has quit [Server closed connection]
jvsg[m] has joined #asahi-dev
M1bn3mar[m] has quit [Server closed connection]
M1bn3mar[m] has joined #asahi-dev
jevinskie[m] has quit [Server closed connection]
jevinskie[m] has joined #asahi-dev
bisko has quit [Remote host closed the connection]
dhewg has quit [Server closed connection]
dhewg has joined #asahi-dev
bisko has joined #asahi-dev
fridtjof[m] has quit [Server closed connection]
fridtjof[m] has joined #asahi-dev
cde[m] has quit [Server closed connection]
cde[m] has joined #asahi-dev
jluthra has quit [Remote host closed the connection]
jluthra has joined #asahi-dev
the_lanetly_052__ has quit [Ping timeout: 480 seconds]
<YichaoYu[m]>
maz: the manpage for perf_event_open says that exclude_guest is x86 only
<YichaoYu[m]>
and I had assumed that if I'm not running a vm anywhere, not specifying this option shouldn't make a difference?
alcazar has quit [Ping timeout: 480 seconds]
kameks has joined #asahi-dev
<maz>
YichaoYu[m]: the manpage is pretty much unmaintained (as is most of the documentation). Also, it is hard to find out whether you run a VM in a race-free way.
<YichaoYu[m]>
I mean shouldn't the default be something like I don't care and so it should work?
<maz>
it is also an ABI issue. Not specifying 'exclude_guest' implies that you can count guest events should you run one. however, we don't support it.
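For callers that fill in `perf_event_attr` by hand, a rough sketch of where `exclude_guest` lives. It assumes the first published 64-byte layout of the struct and the little-endian bitfield order used by GCC and Clang, where `disabled` is bit 0 and `exclude_guest` is bit 20 of the flags word. `pack_attr` is a hypothetical helper and the `config` value is left to the caller.

```python
import struct

PERF_ATTR_SIZE_VER0 = 64   # size of the first published perf_event_attr
DISABLED_BIT = 0           # assumed bit position of attr.disabled
EXCLUDE_GUEST_BIT = 20     # assumed bit position of attr.exclude_guest

# type, size, config, sample_period, sample_type, read_format, flags,
# wakeup_events, bp_type, config1
ATTR_FORMAT = "<IIQQQQQIIQ"


def pack_attr(pmu_type, config, exclude_guest=True, disabled=True):
    """Pack a minimal counting-mode perf_event_attr as bytes."""
    flags = 0
    if disabled:
        flags |= 1 << DISABLED_BIT
    if exclude_guest:
        flags |= 1 << EXCLUDE_GUEST_BIT
    return struct.pack(ATTR_FORMAT, pmu_type, PERF_ATTR_SIZE_VER0,
                       config, 0, 0, 0, flags, 0, 0, 0)


if __name__ == "__main__":
    attr = pack_attr(pmu_type=10, config=0)  # placeholder values
    print(len(attr), attr.hex())
```

Per maz's explanation, leaving the bit clear asks the kernel to count guest events too, which this driver does not support, so the open is rejected.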
<YichaoYu[m]>
k
<maz>
no, the default should be something that reflects reality.
alcazar has joined #asahi-dev
<YichaoYu[m]>
and for specifying the event, instead of using PERF_TYPE_RAW or PERF_TYPE_HARDWARE, I should use the /sys/bus/event_source/devices/apple_{firestorm,icestorm}_pmu/type instead?
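A small sketch of reading the dynamic PMU type from the sysfs path mentioned above; the value read is what goes into `perf_event_attr.type` when targeting a specific PMU. The helper name `pmu_type` and the `root` parameter (added so the function can be pointed at a test directory) are assumptions.

```python
from pathlib import Path

SYSFS_PMU_ROOT = Path("/sys/bus/event_source/devices")


def pmu_type(name, root=SYSFS_PMU_ROOT):
    """Return the perf type id of the PMU `name`, or None if unavailable."""
    try:
        return int((Path(root) / name / "type").read_text().strip())
    except (OSError, ValueError):
        # PMU not present (for example, not an Apple Silicon machine)
        # or the file did not contain a number.
        return None


if __name__ == "__main__":
    for name in ("apple_firestorm_pmu", "apple_icestorm_pmu"):
        print(name, pmu_type(name))
```

The type ids are assigned at runtime, so they should be read on each boot rather than hard-coded.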
<YichaoYu[m]>
what would it take to teach the kernel how to count instructions and cycles across cores?
<YichaoYu[m]>
(and re exclude_guest, it's just that almost none of the simple example code I've found mentioned that, which is kinda fair since it's new in 3.2. it's not until I straced perf that I gave it a try....)
bisko has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<jannau>
YichaoYu[m]: the problem with counting cycles over different cpu core types is that you don't know what 1000 cycles mean
<jannau>
900 / 100 performance/efficiency core cycles is different than 100 / 900 performance/efficiency core cycles
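jannau's point can be made concrete with a little arithmetic. The clock rates below are hypothetical round numbers picked only for illustration, not measured M1 frequencies, and `busy_seconds` is an illustrative helper.

```python
# Hypothetical fixed clock rates, for illustration only.
P_HZ = 3.0e9   # performance core
E_HZ = 2.0e9   # efficiency core


def busy_seconds(p_cycles, e_cycles, p_hz=P_HZ, e_hz=E_HZ):
    """Time spent executing, given per-core-type cycle counts."""
    return p_cycles / p_hz + e_cycles / e_hz


mostly_p = busy_seconds(p_cycles=900, e_cycles=100)
mostly_e = busy_seconds(p_cycles=100, e_cycles=900)

# Both runs report the same summed "1000 cycles" ...
print(900 + 100, 100 + 900)
# ... yet they correspond to different amounts of time.
print(mostly_p, mostly_e)
```

Under these assumed clocks the two splits differ in elapsed time even though the summed count is identical, and the work done per cycle also differs between core types, so a single total hides both effects.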
<YichaoYu[m]>
true, but it's unfortunate that the default `perf stat` is a bit useless, and also instructions count should at least work across cores...
alcazar has quit [Ping timeout: 480 seconds]
<YichaoYu[m]>
(also I felt like between cpu frequency change and hitting different bottleneck in different part of the code I've already accepted that unless the code is fully warmed up the number might be difficult to interpret = = .....)
<YichaoYu[m]>
I found a few simple ones that are clearly related to memory access and branches but if someone has figured it out already I don't need to redo it...
vup has joined #asahi-dev
trouter has quit [Ping timeout: 480 seconds]
trouter has joined #asahi-dev
joske has joined #asahi-dev
bisko has joined #asahi-dev
chadmed has quit [Remote host closed the connection]
FieryFlames[m] has quit [Server closed connection]
<YichaoYu[m]>
trying to find the a14.plist and then realizing that I'm under linux.....
<YichaoYu[m]>
but before I go taking a deep look at any of these, do any of these count as sources that can be used to write a linux driver?
<YichaoYu[m]>
also how was the current list in the kernel generated? is it known to be complete?
<YichaoYu[m]>
oh I see someone mentioned rr there already. yeah I'm mostly interested in finding reliable counters so that rr can work
alcazar has quit [Ping timeout: 480 seconds]
alcazar has joined #asahi-dev
<maz>
YichaoYu[m]: (catching up) counting cycles (or any other event) across CPU types makes no sense. it's like adding apples and oranges.
<maz>
the driver supports all the possible events. you just need to convert that file into the JSON format that the perf tool uses, and you'll get all the events with a nice description.
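A sketch of producing an event list in the JSON shape used by perf's pmu-events files, assuming the `EventCode`, `EventName` and `BriefDescription` keys. The event names and descriptions below are placeholders, not real Apple event definitions, and `to_perf_json` is a hypothetical helper.

```python
import json


def to_perf_json(events):
    """Render (code, name, description) tuples as a pmu-events style list.

    Codes must fit the 256-event space the driver exposes.
    """
    entries = []
    for code, name, description in events:
        if not 0 <= code <= 0xFF:
            raise ValueError(f"event code out of range: {code:#x}")
        entries.append({
            "EventCode": f"0x{code:02x}",
            "EventName": name,
            "BriefDescription": description,
        })
    return json.dumps(entries, indent=4)


if __name__ == "__main__":
    # Placeholder entries; real ones would come from your own findings.
    print(to_perf_json([
        (0x01, "PLACEHOLDER_EVENT_A", "placeholder description"),
        (0x02, "PLACEHOLDER_EVENT_B", "placeholder description"),
    ]))
```

Where such a file would sit in the perf source tree and how it is matched to the CPU is left out here; that part is build-time plumbing in the userspace tool.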
<YichaoYu[m]>
does it mean that I can pass in something that isn't in m1_pmu_events to perf_event_open?
<maz>
YichaoYu[m]: read again. the driver supports all 256 events. m1_pmu_events only describes events that have some special affinity.
<YichaoYu[m]>
Ah yeah missed that…. I thought the unknown_xx entries were some twisted form of knowing….
<YichaoYu[m]>
So what do we need to hook it up so that perf can understand them without referring to apple data?
<YichaoYu[m]>
And does it mean that perf will never be able to print something that makes sense for cycle count?
<YichaoYu[m]>
Even if the thread/process is pinned to one core?
<maz>
you need to convert the magic (and copyrighted) apple file into something that perf can ingest *at build time*.
<maz>
perf prints things that make sense to me, just as it does on any asymmetric system.
<maz>
if you pin the thread to a single core and use the PMU that matches that core, you'll get sensible results. if you don't, well...
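A sketch of matching a pinned CPU to its PMU, assuming each PMU directory under `/sys/bus/event_source/devices` exposes a `cpus` file holding a kernel cpulist such as `0-3`. The helper names and the `root` parameter are assumptions for illustration.

```python
import os
from pathlib import Path

SYSFS_PMU_ROOT = Path("/sys/bus/event_source/devices")


def parse_cpu_list(text):
    """Parse a kernel cpulist such as "0-3,8" into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def pmu_for_cpu(cpu, root=SYSFS_PMU_ROOT):
    """Return the name of the PMU whose `cpus` file lists `cpu`, else None."""
    for entry in sorted(Path(root).iterdir()):
        cpus_file = entry / "cpus"
        if cpus_file.is_file() and cpu in parse_cpu_list(cpus_file.read_text()):
            return entry.name
    return None


def pin_and_pick_pmu(cpu, root=SYSFS_PMU_ROOT):
    """Pin the calling process to `cpu` (Linux only) and return its PMU name."""
    os.sched_setaffinity(0, {cpu})
    return pmu_for_cpu(cpu, root)
```

Calling `pin_and_pick_pmu(4)` on the machine from this log would be expected to name the P-core PMU, since CPU 4 is the one pinned with numactl above; that mapping is read from sysfs, not hard-coded.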
<YichaoYu[m]>
I mean, if I do not want to use it, and for the purpose of putting that info into the kernel driver, does it need to be RE'd?
<YichaoYu[m]>
But perf reports that it is not counting cycles if the process runs on the P core
<maz>
you don't put *ANYTHING* in the kernel driver at all. you put the information in the userspace tool.
<YichaoYu[m]>
Even for PERF_TYPE_HARDWARE?
<maz>
again: if you describe the event pertaining to the P core, perf will use the P core event.
<YichaoYu[m]>
yeah I know that, but there are also generic hardware event
<YichaoYu[m]>
if I don't want to use raw events
<maz>
you don't need to use raw events if you describe things to the userspace tool. perf will convert things for you, just like with any other CPUs on this side of the universe.
alcazar has joined #asahi-dev
<YichaoYu[m]>
as in `numactl --physcpubind=4 perf stat sleep 1` prints cycles + instructions unsupported
<YichaoYu[m]>
but why shouldn't one put it in the kernel so that PERF_TYPE_HARDWARE can be supported?
<maz>
you don't specify a PMU, you get a random result. end of story.
<maz>
because this is kernel bloat that we don't need.
<maz>
userspace is the right place for this, just like any other PMU.
<maz>
we can have that conversation on the list if you want.
<YichaoYu[m]>
so you are saying PERF_TYPE_HARDWARE is useless?
<maz>
mostly, yes.
<YichaoYu[m]>
I've never heard of that argument before
<YichaoYu[m]>
but that also means
<maz>
mostly because there is no way you can map the events that current CPUs implement onto what the in-kernel perf code considers "canonical" events.
<YichaoYu[m]>
if I want to count instructions
<YichaoYu[m]>
and I don't want to pin
<YichaoYu[m]>
there's no way to do that without wasting counters
<YichaoYu[m]>
I'm not saying all of them should be mapped, only the ones that do have a canonical counterpart
<maz>
good luck mapping those. even better luck finding an accurate architectural description that spans x86 (which the canonical events originate from) and any arm64 CPU.
yuyichao has quit [Ping timeout: 480 seconds]
<YichaoYu[m]>
Why do I need luck to get the instructions count working?
<maz>
because you stubbornly refuse to pass the command line argument that would make it work.
<YichaoYu[m]>
Are you talking about getting perf to work with the current kernel or finding information so it can be put into the kernel?
<YichaoYu[m]>
I'm also talking from the perspective of directly calling `perf_event_open`
brent has joined #asahi-dev
<YichaoYu[m]>
are you saying that even if I only care about clearly core-independent events like branch count or instruction count, I would have to do processor detection and figure out what kind of events I need?
<YichaoYu[m]>
(also I'm at work now so I may not reply promptly....)
brent has quit []
kameks has quit [Ping timeout: 480 seconds]
bisko has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
brent has joined #asahi-dev
brent has quit []
joske has quit [Quit: Leaving]
<maz>
YichaoYu[m]: so am I. this really is the wrong forum for discussion, so please post on the list and Cc me and the other ARM perf maintainers.
<YichaoYu[m]>
I assume at least the last question (which was what I asked initially) belongs here. Is any of this info upstreamable?
<YichaoYu[m]>
(I can try the mailing list but so far I've had very bad experiences reporting any issues there)
alcazar has quit [Ping timeout: 480 seconds]
alcazar has joined #asahi-dev
alcazar has quit [Ping timeout: 480 seconds]
luxio_39[m] has quit [Server closed connection]
luxio_39[m] has joined #asahi-dev
<maz>
YichaoYu[m]: no, this really is a generic question about asymmetric systems, especially about ARM big-little systems (which the M1 is an instance of).
kylealanhale has joined #asahi-dev
alcazar has joined #asahi-dev
<YichaoYu[m]>
no that's a generic question about copyright
<YichaoYu[m]>
for that last question I'm asking whether the json, or whatever other format you use to make perf understand all the counters, can go directly into the perf/kernel source
nicolas17 has joined #asahi-dev
<maz>
YichaoYu[m]: I'm not a lawyer, and I'm not going to give any advice on whether taking this information and dumping it in the perf tool is legal or not. personally, I'm not touching it.
<maz>
you're welcome to do it at your own risk.
<YichaoYu[m]>
Sure that’s fine. I assume doing complete blind testing should be fine and so I was asking if it is worth doing.
<maz>
it would definitely be useful to have that information in the userspace tool. but if you are doing blind testing, please make your tests publicly available.
<YichaoYu[m]>
marcan: I assume you know better (than I do)
<YichaoYu[m]>
Yes that’s for sure
<YichaoYu[m]>
I’m just running some loops and summarizing the results to take educated guesses.
<YichaoYu[m]>
(And I was only doing that on the ones in that list but I guess I should do all 256 of them)
<YichaoYu[m]>
I’m not particularly interested in submitting any code for review (kernel or otherwise) but just playing with it to collect info. I do want to make type_hardware work though so I guess I can send an email to the kernel list….