ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard + Bifrost + Valhall - Logs - I don't know anything about WSI. That's my story and I'm sticking to it.
<jdavidberger> htop shows 3 idle cores and one that maxes out at like 5-10%. It's only doing ~10 dispatches over a few seconds for the test, so I don't think the CPU is slowing it down. It gets very, very close to 25 if nothing else is running and the shaders are cached and everything, which is suspiciously close to 32 * 800MHz
<cphealy> That is definitely suspiciously close to 32*800MHz!
<cphealy> What method are you using to come up with the value of 38 GFLOPS?
<jdavidberger> From the datasheet, gen1 G52 is 32 operations per cycle but gen2 should be 48 operations per cycle, so 48 * 800MHz = 38.4 GFLOPS
<cphealy> "FP32 Operations/Cycle"?
<HdkR> And then you learn that 32 operations/cycle != 32 FP32 operations/cycle
<cphealy> How are you interpreting the 32/48 value for Mali G52 as gen1 and gen2?
<cphealy> I wasn't aware that there are different gens of G52.
<jdavidberger> HdkR: True; but that linked datasheet lists it as FP32 operations/clock
<jdavidberger> cphealy: That is pretty unclear in the datasheet but the hardware revision is r1p0 and the max thread count as reported by the GPU matches the 768 number on the RK3566
<cphealy> I think the 32/48 is independent of r1p0 vs some other revision. My understanding of the revision is it is more along the lines of newer revision with bugs fixed. Basically the same as Cortex-A55 r1p0 vs Cortex-A55 r1p1 as an example.
<jdavidberger> Maybe -- the official documentation from ARM leaves something to be desired here for sure. But the GPU reporting 768 threads available with the drmIoctl call makes me think it's the second gen and I gotta think the 32->48 bump comes from the max threads going from 512->768
<cphealy> I have a different theory on the "FP32 Operations/Cycle" reporting 32/48: If you look at the top of that table, you will see an entry for "Arithmetic Units". With every GPU that has two numbers for "FP32 Operations/Cycle", you will see two numbers for "Arithmetic Units", so the correct FP32 Operations/Cycle number is likely tied directly to how many arithmetic units the GPU has.
<cphealy> It could be that you have a G52 with 2 arithmetic units as opposed to 3.
<cphealy> According to this datasheet: The GPU is ARM Mali G52 1-Core-2EE. This would mean that you have a single shader core with 2 execution units. This would mean 32 is the correct number for FP32 Operations/Cycle.
<jdavidberger> that makes sense. The 768 thread thing really threw me off in that chart. Which is a bummer but glad I only spent one day trying to talk it into going faster
<cphealy> ;-)
<jdavidberger> I have the application code I need to run specced out at ~20 GFLOPS. Hopefully I'm not that far off. Thanks for the help; it's good not to spin my wheels for nothing
<robmur01> cphealy: spot on - G52r0 implicitly has 3EE shader cores, G52r1 is configurable for 2EE or 3EE
<q4a> Looks like I was wrong. Some Mali GPUs (like the G610) support geometry shaders. Check "GL_EXT_geometry_shader" and "Geometry Shader" in the Android AIDA64 report:
<HdkR> q4a: They support the extension yes. The hardware converts them to compute shaders.
<HdkR> Well, driver converts them to compute shaders :P
<robmur01> right, just because the *driver* reports a capability doesn't mean it has to be implemented in hardware. Take llvmpipe, for instance ;)
<q4a> then how do you check whether it's a software or hw implementation?
<HdkR> It's all compute shader baby
<HdkR> No mali supports GS in hardware :)
<alyssa> Has MRT + blend shaders been broken on Midgard this entire time?
<alyssa> Answer is likelier than you'd think
<cphealy> robmur01: Are there readable registers in the GPU that expose how many AUs each shader core has? Also, are there readable registers in the GPU that expose how many shader cores the GPU has?
<stepri01> shader cores is easy - SHADER_PRESENT is a bitmap of which cores are implemented, so that number of bits set is the number of shader cores
<stepri01> number of AUs is stored in CORE_FEATURES (on GPUs where it means something)
<stepri01> (or Execution Engines to use the correct term)
<cphealy> stepri01: Execution Engines is equivalent to Arithmetic Units in public ARM Mali datasheet vernacular, correct?
<alyssa> stepri01: FWIW, the userspace does want to know the clock speed for clinfo...
<alyssa> right now I hardcode 800MHz...
<alyssa> IDK what any app can actually do with the information lol
<stepri01> cphealy: I'm not sure where you are seeing AUs; the public docs refer to them as EEs. But I think they are the same thing
<stepri01> alyssa: Yes I know - I tried to argue against providing the data (because it's almost certainly useless) but I just got pointed to the spec and didn't have much of an argument :(
<stepri01> hardcoding a random number seems like a good idea
<alyssa> mood
<alyssa> I don't care one way or the other tbh
<jdavidberger> Here is the dump of DRM registers for the G52 2EE; the AU count in CORE_FEATURES matches 2, which matches the benchmark --
<robclark> stepri01, alyssa: drm/msm exposes max clk.. I use it for things like calculating % utilization from perfcntrs.. IMO it is a perfectly reasonable thing to expose to userspace
<stepri01> robclark: The real problem is that "no idea" isn't an allowed response, and in some situations the kernel really doesn't know
<stepri01> It's also badly specified as things like DVFS mean that it's not the actual frequency
<robclark> surely the kernel knows the max freq.. it doesn't have to report the current freq, only the max
<stepri01> only if it is actually managing the clocks. On an FPGA platform it might not be known, and with a software model there isn't necessarily such a thing as a clock
<stepri01> so you end up with a "lie with a hardcoded number" path in the driver and wonder what exactly you gain by trying to give a real number on real hardware
<stepri01> beyond the specific case of profiling using hardware specific counters the number is useless and no application should use it
<stepri01> so why it exists in a supposedly hardware-agnostic spec is beyond me
stepri01 has quit [Quit: leaving]
<robclark> yeah, not sure why it is in spec.. but I don't think weird developer-only edge cases would convince me that it isn't something that the kernel should expose
<alyssa> especially given that neither FPGAs nor software models are available to us mere mortals
<jdavidberger> Is there some concept of max invocations on Bifrost that is possibly less than just GL_MAX_COMPUTE_WORK_GROUP_COUNT * GL_MAX_COMPUTE_WORK_GROUP_SIZE?
<jdavidberger> Specifically, when I run glDispatchCompute(65535,1,1) with a local size of {256,1,1} I expect 0xFFFF00 invocations and I'm seeing... I think 0x2AAA80 invocations based on timings; hard to tell but definitely less
<alyssa> jdavidberger: Check dmesg, any chance the job is timing out/
<jdavidberger> [222057.754333] panfrost fde60000.gpu: gpu sched timeout, js=1, config=0x7b00, status=0x8, head=0xa279140, tail=0xa279140, sched_job=0000000065196d57 🤦
<jdavidberger> thanks that explains the behavior completely
<alyssa> Woof...
<alyssa> jdavidberger: I don't recommend it (since long-running compute kernels will lock up your graphical session; there's no preemption) but you can patch the kernel to bump the timeout
<alyssa> (particularly if you're doing headless compute stuff)
<alyssa> it would be nice to get a proper fix but it's nontrivial and if it was going to happen it would have been 3 years ago :|
<alyssa> timeout in milliseconds
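The patch alyssa describes amounts to bumping a single constant. A hedged sketch, assuming the constant is still JOB_TIMEOUT_MS in drivers/gpu/drm/panfrost/panfrost_job.c; the name and default may differ between kernel versions, so check your tree.

```c
/* drivers/gpu/drm/panfrost/panfrost_job.c (sketch, not a verbatim diff) */
#define JOB_TIMEOUT_MS 500   /* stock value: jobs are cancelled after 500 ms */

/* For headless long-running compute, raise it, e.g.: */
/* #define JOB_TIMEOUT_MS 60000 */
```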
jdavidberger has quit [Quit: Leaving.]