agx_ has quit [Read error: Connection reset by peer]
pcercuei has joined #etnaviv
karolherbst has quit [Ping timeout: 260 seconds]
karolherbst has joined #etnaviv
berton has joined #etnaviv
<BobB3>
Hi, I'd like to run an idea past you:
<BobB3>
To aid full system profiling, we want to allow global enabling and sampling of perfmon counters in etnaviv.
<BobB3>
Does this seem like a reasonable thing to do?
<BobB3>
The existing code will be tweaked so that when a submission includes pmrs, it enables any counters that are not currently enabled, takes pre and post samples as it does currently, then disables them again if they are not enabled globally.
<BobB3>
Rough idea would be to add 3 new ioctls (enable, disable and sample) to allow profiling SW to sample the counters regularly.
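The behaviour BobB3 describes can be modelled roughly as below. This is only a toy Python sketch of the proposal, not etnaviv driver code: the class, its method names, and the counter names are all hypothetical, and the three `global_*` methods merely stand in for the proposed enable, disable and sample ioctls.

```python
class PerfCounterState:
    """Toy model of the proposed behaviour (hypothetical, not driver code)."""

    def __init__(self, names):
        self.values = {n: 0 for n in names}  # simulated hardware counter values
        self.hw_enabled = set()              # counters currently counting
        self.global_enabled = set()          # counters enabled system-wide

    def global_enable(self, names):
        """Stand-in for the proposed 'enable' ioctl."""
        self.global_enabled |= set(names)
        self.hw_enabled |= set(names)

    def global_disable(self, names):
        """Stand-in for the proposed 'disable' ioctl."""
        self.global_enabled -= set(names)
        self.hw_enabled -= set(names)

    def global_sample(self):
        """Stand-in for the proposed 'sample' ioctl."""
        return {n: self.values[n] for n in sorted(self.global_enabled)}

    def tick(self, name, amount):
        """Simulate GPU work: only enabled counters advance."""
        if name in self.hw_enabled:
            self.values[name] += amount

    def submit(self, pmrs, work):
        """Run one submission with per-submit counters (pmrs) attached."""
        requested = set(pmrs)
        # Enable whatever the submission needs that is not already counting.
        temporarily_enabled = requested - self.hw_enabled
        self.hw_enabled |= temporarily_enabled
        pre = {n: self.values[n] for n in requested}
        for name, amount in work.items():
            self.tick(name, amount)
        post = {n: self.values[n] for n in requested}
        # Revert only the counters that are not enabled globally.
        self.hw_enabled -= temporarily_enabled - self.global_enabled
        return {n: post[n] - pre[n] for n in requested}
```

In this model a globally enabled counter keeps counting across submissions, while a counter requested only by a submission stops again once that submission's post sample has been taken.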
karolherbst has quit [Ping timeout: 260 seconds]
karolherbst has joined #etnaviv
kherbst has joined #etnaviv
karolherbst has quit [Ping timeout: 260 seconds]
kherbst has quit [Ping timeout: 260 seconds]
karolherbst has joined #etnaviv
karolherbst has quit [Quit: duh 🐧]
karolherbst has joined #etnaviv
karolherbst has quit [Remote host closed the connection]
karolherbst has joined #etnaviv
<austriancoder>
BobB3: what is your definition of "full system profiling"? The idea with perfcounters is to get numbers for one draw call.
<BobB3>
We would like those numbers for the HW as a whole. Looking at the driver, it looks like they are general counter registers that keep counting as long as the debug register sets them enabled
<BobB3>
so, for example, in situations like running a gl compositor plus a few gl apps, a user might ask why performance is dropping. Being able to get the numbers for the HW as a whole could let us say "look, you're hitting the limit of X based on this recorded perf counter"
<austriancoder>
hmm.. not sure if the perfcounters are the thing you want, as they make sense on a per-draw basis. what would work is something like utilization per integrated block of the gpu (like FE, PE, ..)
<BobB3>
If the counters are enabled, would they not count across multiple draw calls to give a total across all applications? I assume this is why the driver does a pre and post sample (I don't have a ref manual for the HW to check)
<BobB3>
my basic assumption is that anything that can be done for a single draw can be enabled for every draw and accumulated to get system wide profile numbers
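The accumulation idea can be sketched as follows: if the counters run freely while enabled, periodic samples can be turned into deltas and summed. This is a minimal Python illustration under stated assumptions, not a description of the hardware: the 32-bit counter width and both function names are assumptions, since the log itself notes that no reference manual was at hand.

```python
COUNTER_BITS = 32  # assumed counter width; not confirmed by the discussion


def counter_delta(pre, post, bits=COUNTER_BITS):
    """Difference between two raw samples, tolerating one wraparound."""
    return (post - pre) % (1 << bits)


def accumulate(samples, bits=COUNTER_BITS):
    """Sum the deltas between consecutive raw samples of one counter."""
    total = 0
    for prev, cur in zip(samples, samples[1:]):
        total += counter_delta(prev, cur, bits)
    return total
```

Profiling software that samples regularly would feed its successive readings into something like `accumulate`. A counter that wraps more than once between two samples would still be undercounted, so the sampling interval matters.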
<austriancoder>
the interaction between mesa and perfcounters is per draw. You attach pmrs to a draw call (pre/post) and get the result in a bo
<BobB3>
indeed, that is the current interface, but I see no reason why that has to be the only interface
<austriancoder>
I am not sure how you can tell, e.g. from the number of vs cycles, what the global cause is when everything feels slow?
<cphealy>
BobB3: What SoC are you working on? Are you wanting counters other than just the 3D GPU counters?
<BobB3>
you may not be able to identify the specific cause, but it could help answer simpler questions like "is my gpu saturated" by checking idle cycles etc
<austriancoder>
Yeah.. and for utilization you do NOT need perfcounters
<BobB3>
I was hoping to enable all counters the driver currently supports for per-submission sampling. It would be a general profile extraction feature, and leave interpreting meaning up to the user
<BobB3>
basically, the information is there and could be useful to userland for profiling, why not expose it?
<austriancoder>
utilization is not exposed and I'd love to see this via a drm-wide solution where the kernel samples defined registers and exposes the global utilization via a well-defined interface
pcercuei has quit [Quit: brb]
<austriancoder>
perfcounters are exposed via GL_AMD_performance_monitor
<BobB3>
yes, but how would one use that to get system wide stats?
<BobB3>
I was basing this idea on the ioctls that the panfrost driver has to grab the perf counters for the device as a whole
pcercuei has joined #etnaviv
Chewi_ has joined #etnaviv
hanzelpeter has joined #etnaviv
Chewi has quit [Ping timeout: 272 seconds]
Chewi_ is now known as Chewi
<BobB3>
austriancoder: let me ask a different question then: if I do the work to expose the perfcounters via ioctls, would it be rejected at review? (don't want to start something that would be denied on principle)
<BobB3>
regarding use case for it, I guess it would just be for informational purposes to userland. In the same way that the mesa HUD can give perfcounter graphs, system wide GPU profiling GUIs (e.g. perfetto, gpuvis etc) could display those values for the GPU as a whole
<gbisson>
Marex: better late than never, I finally got a chance to boot 5.4 kernel on Nano with Etnaviv, what data did you need to know more about the GC7000UL?
<austriancoder>
BobB3: give me a day to think about it .. also I want to look at what gpuvis supports and what is doable and makes sense to support.
<BobB3>
austriancoder: sure, thanks for the consideration. FWIW gpuvis doesn't currently understand perfcounters at all. I added a patch to get it to interpret the generic drm scheduler events so we can get job timespans showing nicely, but it will be more GUI work to add perf counters I think
<daniels>
austriancoder: per-draw makes sense if you're trying to profile an app indeed, but if you're trying to modify a proprietary app then you can't ... also if you're trying to do whole-of-system profiling (e.g. someone has a proprietary navi UI + OEM CEF UI + compositor), then that means you have to somehow modify all three of those, and tunnel the data out of all three, and then reconcile it
<daniels>
austriancoder: so for those scenarios you really definitely want a global view, where you can see usage (not just GPU but also CPU/memory/IO/etc/etc), who's doing draw calls when, what the compositor's doing, drm vblanks, etc
<austriancoder>
daniels: for this I can use the tracepoints and e.g. gpuvis
<daniels>
so that's what gpuvis supports, and we're also looking to extend https://gitlab.freedesktop.org/fahien/gfx-pps to support etnaviv - that is a plugin for the Android (but usable outside Android!) tracing system which you can see at https://perfetto.dev
<daniels>
gpuvis isn't a terribly bad tool, but perfetto definitely has much more advanced and easy to use filtering and visualisation
<daniels>
also, to be completely transparent, I'm trying to burn down one of Collabora's revenue streams with this. people come to us saying 'our product UI is slow and we don't know why', then they pay us a bunch of money to investigate it for them. tbh I'd be much happier if they spent their money on more interesting things
<daniels>
plugging into tools like Perfetto is good for that, because they can run it without installing weird toolkits which only gfx hackers have ever heard of :P
<austriancoder>
daniels: I will have a look at perfetto then
<daniels>
austriancoder: cool, thanks! appreciate it, would be really cool to get etnaviv usable there as well :)
hanzelpeter has quit [Quit: leaving]
<Marex>
gbisson: what does dmesg show when the etnaviv driver probes ?
<Marex>
gbisson: thank you
lrusak has joined #etnaviv
mauz555 has quit [Remote host closed the connection]
Surkow|laptop has quit [Quit: Bye bye - Laptop goes to sleep - NOP NOP NOP]
berton has quit [Remote host closed the connection]
<gbisson>
Marex: sorry I don't have the board at this time, afaik it said probe succeeded without much details (I had to remove the thermal part for the bind to succeed)
<gbisson>
Marex: I'll give you the dmesg log first thing tomorrow
<Marex>
gbisson: no worries , take your time
mauz555 has joined #etnaviv
mauz555 has quit [Read error: Connection reset by peer]
embden has quit [Read error: Connection reset by peer]