alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://oftc.irclog.whitequark.org/panfrost - <macc24> i have been here before it was popular
<alyssa>
cyrozap: Not done and not documented anywhere afaik
<alyssa>
Maybe somewhere at Arm :p
<alyssa>
In theory all Bifrost ALU ops should be exactly 1 cycle (and anything memory related is N cycles for random values of N that differ every frame)
<alyssa>
In practice.. dunno
vstehle has quit [Ping timeout: 480 seconds]
atler is now known as Guest404
atler has joined #panfrost
Xalius has quit [Quit: Leaving]
Guest404 has quit [Ping timeout: 480 seconds]
bbrezillon has quit [Ping timeout: 480 seconds]
cphealy has quit [Remote host closed the connection]
<amonakov>
at least to me, such use of "1 cycle" is terribly ambiguous, so I'd recommend against saying it like that to someone from a different background without further clarification
<amonakov>
to me, if you say that on a 0.5GHz GPU ALU is 1 cycle, it implies that it produces a result in two nanoseconds, which is not the case on Mali
stan_ has joined #panfrost
stan has quit [Ping timeout: 480 seconds]
<amonakov>
it's just that hardware selects sufficiently many (12 in case of G72) ready warps and cycles over them for each clause, so the code may have back-to-back dependent instructions in a clause
<amonakov>
this is different from Nvidia where you may structure the code to have higher ILP and not rely on warp switching (which has a cost, so that's a win)
<amonakov>
cyrozap: we did some basic investigation and found that Bifrost latency over both "ADD" and "MUL" pipes is 12 cycles (so most likely 6 cycles each), but the above makes it somewhat moot for optimization; https://freenode.irclog.whitequark.org/panfrost/2021-03-24
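[ed: to put numbers on that - a sketch using amonakov's figures above: with 12 ready warps interleaved, a warp's next dependent instruction issues roughly 12 cycles after its predecessor, which exactly covers the measured 12-cycle ADD+MUL latency; so back-to-back dependent instructions cost nothing extra and restructuring the code for ILP buys little on Bifrost.]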
vstehle has joined #panfrost
chewitt_ has quit []
macc24 has joined #panfrost
bbrezillon has joined #panfrost
Turtle has joined #panfrost
alpernebbi has joined #panfrost
Daanct12 has joined #panfrost
Turtle has quit []
macc24 has quit [Ping timeout: 480 seconds]
wwilly has joined #panfrost
wwilly_ has joined #panfrost
wwilly has quit [Ping timeout: 480 seconds]
Rathann has quit [Remote host closed the connection]
Rathann has joined #panfrost
stan_ has quit []
macc24 has joined #panfrost
stanb has joined #panfrost
Daanct12 has quit [Quit: Quitting]
Danct12 has joined #panfrost
<warpme_>
guys: i'm playing with g31 on h616 and currently have it partially working. some elements of the ui are nicely drawing+animating (e.g. pop-up borders fade in/out) but e.g. surfaces are all black. dmesg has many: "panfrost 1800000.gpu: gpu sched timeout, js=0, config=0x7300, status=0x58, head=0x3acf600, tail=0x3acf600, sched_job=00000000f8dd2901". These job timeouts occur only for some ui operations, e.g. ui pop-up fade-in/out doesn't
<warpme_>
cause any job timeouts. Is it possible that e.g. one g31 exec.unit works ok while another exec.unit generates job timeouts, and that this is somehow related to the multi-render-target (sfbd vs. mfbd) issue? h616 uses a drm driver (sun4i-drm) which works well with t720 (an sfbd device). but g31 is mfbd - so maybe the sun4i-drm driver needs to be updated/extended to support mfbd?
<macc24>
warpme_: check logs
<macc24>
i remember discussion about h616 few days ago or something
chewitt has joined #panfrost
<warpme_>
macc24: yes. it was me who initiated that discussion. since then i've moved forward a bit. now g31 loads ok and partially renders on screen, but it only ever works partially. interrupt stats from the gpu grow monotonically; some ui parts are rendered correctly (like on a working g31 on an amlogic soc) - e.g. menu highlights or popup fades; but other parts of the ui are plain black and dmesg shows gpu job timeouts immediately
<warpme_>
when there is ui action (or every 60sec when the clock on the ui updates). i constantly have the feeling that "half" of the gpu works ok but the other "half" doesn't
<macc24>
oh
<warpme_>
i'm way too weak in panfrost internals but it remotely "feels" like e.g. one of the execution units in g31 can't render to screen....
<warpme_>
thus my Q about MFBD vs. SFBD in the context of the sun4i drm backend...
* macc24
smells mali t618
<warpme_>
in the SFBD context or in soc quirks?
<cwabbott>
warpme_: sun4i is the display driver, it has basically nothing to do with the gpu, it just displays whatever is handed to it (which usually comes from the gpu)
<robmur01>
is it possible that it's literally just timing out because it's slow as heck and taking forever to wake up and get going at its lowest OPP?
<cwabbott>
and MFBD vs SFBD is a thing that changed within midgard, g31 is bifrost which is an entirely new generation in which a million other things changed
<cwabbott>
don't get fixated on it
chewitt has quit []
<warpme_>
robmur01: initially g31 was at 431MHz. I tested 250MHz. no difference at all. so you suggest testing with e.g. 600MHz?
<robmur01>
more like hacking JOB_TIMEOUT_MS to something significantly larger - chances are it *is* some more fundamental issue, but it's an easy sanity check to make
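[ed: a minimal sketch of that sanity check, assuming the JOB_TIMEOUT_MS constant in drivers/gpu/drm/panfrost/panfrost_job.c (500 in contemporary kernels); a throwaway hack for testing, not a fix:]
    /* drivers/gpu/drm/panfrost/panfrost_job.c */
    -#define JOB_TIMEOUT_MS 500
    +#define JOB_TIMEOUT_MS 10000	/* 20x larger: is it slow, or stuck? */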
<warpme_>
btw: when i force llvmpipe - all works perfectly fine. is the render-to-screen code the same for the g31 hw gpu and llvmpipe?
<macc24>
warpme_: afaik yes
<cwabbott>
warpme_: on SoC's like you're using the display driver and gpu are two entirely different IP's - the display driver just displays the framebuffer handed to it and doesn't care where it came from
<robmur01>
yes, the GPU (or CPU for llvmpipe) renders to a buffer, then the display engine (sun4i_drm) scans out that buffer - they are entirely distinct things.
<warpme_>
re: GPU and DE - yes i'm aware of that. May i conclude that if llvmpipe works 100% then we can be sure the issue is NOT the drm driver and it lies purely within the mesa g31 code?
<robmur01>
If the issue is the GPU failing to execute jobs correctly then it is clearly a GPU issue :/
stanb has quit [Ping timeout: 480 seconds]
<robmur01>
the black areas will merely represent the display happily scanning out buffers of zeros which the GPU was supposed to write meaningful pixel data over
<macc24>
warpme_: if the only thing you swap is panfrost for llvmpipe and it works, the issue is with the thing llvmpipe replaced (panfrost)
<macc24>
if you replace a lightbulb and light starts working, do you wonder if the light switch is broken?
alpernebbi has quit [Quit: alpernebbi]
<warpme_>
macc24: just an off-topic remark on the bulb: let's imagine you have 250VAC in the bulb socket, and you replace a 220VAC bulb from vendorA that doesn't tolerate +15% overvoltage with a bulb from vendorB that does - and the bulb stops breaking. where is the root cause: your socket voltage or vendorA's bulb?
<macc24>
warpme_: there is no scenario in which 30V difference would matter with 230V/250V ac systems
<macc24>
unless they strung together some leds with no thinking in which case the bulb is trash anyway
<robmur01>
but either way it still would have no relation to the *switch*. I fear the point of the analogy may have been missed ;)
<robmur01>
cool - as I say that's what I expected, but it was worth double-checking
<warpme_>
:-)
<warpme_>
is there any magic switch in panfrost.ko (or env) to learn more about job timeout details?
<robmur01>
so that points to either a system-level issue like the GPU clock being inadvertently gated when it's not expected to be, or some issue with the job itself such that the GPU gets stuck executing it
<robmur01>
I think if you enable the pandecode dumping in mesa you should be able to correlate the job descriptor address from the fault with the actual thing it was trying to do.
<warpme_>
robmur01: hmm. for hypothesis 1\ i conclude this would happen when the soc has separate clocks per gpu exec unit and one unit is dead because it isn't clocked. i'm not sure g31 has separate per-exec.unit clocks. hypothesis 2\ sounds plausible - but the same os+app binary works well on sm1 g31....
<robmur01>
again I wouldn't get too hung up on the execution engines either - they're an internal detail of the shader core and AFAIK not really software-visible (other than performance-wise)
<warpme_>
ah ok!
<alyssa>
warpme_: Bifrost is kinda silly in the error reporting
<alyssa>
For all intents and purposes, treat GPU sched timeout on Bifrost the same way you would treat a fault in dmesg on midgard
<alyssa>
(I.e. as a probable userspace bug preparing bad descriptors or buggy shaders)
<alyssa>
if you have a recent Mesa, you can set PAN_MESA_DEBUG=sync and that should catch it as soon as the buggy job is submitted
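[ed: usage sketch, assuming a Mesa built with panfrost's debug options: run the app as PAN_MESA_DEBUG=sync <app>; sync waits for each job at submit time so the timeout lands on the offending submission, and PAN_MESA_DEBUG=trace,sync additionally emits the pandecode dumps robmur01 mentioned earlier.]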
<robmur01>
so it's a bit like trying to debug missing libraries on a busybox system where the only error says "not found"? :D
<warpme_>
how far can we assume g31 on amlogic is the same as g31 on h616? You mention a "probable user-space bug preparing bad descriptors". But the same os+app binary works well on sm1 g31, so may we say user-space can be excluded from the possible root causes?
<warpme_>
alyssa: oh well. let me try with PAN_MESA_DEBUG=sync
<bbrezillon>
what I don't get is why it works on the amlogic g52
<bbrezillon>
or maybe it doesn't
<narmstrong>
you mean amlogic g31
<bbrezillon>
sorry, g31 yes
<warpme_>
bbrezillon: fyi: for me g31 on sm1 works well without disabling afbc
<alyssa>
Ah!!!!
<alyssa>
what are the respective display controllers?
<warpme_>
de3.3 in allwinner and "don't remember" in amlogic
<alyssa>
To be clear, your app works on Allwinner but not on Amlogic?
<bbrezillon>
the other way around IIUC
<alyssa>
(If that's the case, it makes sense if the bug is AFBC in userspace, since Allwinner doesn't do AFBC at least not upstream, but Amlogic does.)
<warpme_>
right. well. now it is clear why i started this whole discussion with the suggestion that the issue might be the display engine - not the gpu.........
<alyssa>
no, it's definitely still GPU
<warpme_>
and was banned :-)
<alyssa>
it /is/ GPU
<macc24>
warpme_: you were banned? lol?
<alyssa>
it's just that depending on what format is negotiated, the GPU uses different code paths
<warpme_>
oh sorry - not in the prime sense :-)
<bbrezillon>
warpme_: just to be sure, it works on amlogic but not on allwinner, right?
<warpme_>
yes. exactly
<bbrezillon>
so AFBC should only be used for non-scanout/private bufs in that case
<bbrezillon>
but it shouldn't fault
<bbrezillon>
(unless there's no AFBC support in the GPU, but I'm not sure that's possible)
<robmur01>
bbrezillon: configurable ;)
<bbrezillon>
oh
<bbrezillon>
guess that's exposed in one of the static regs
<alyssa>
robmur01: wait what >_>
<macc24>
robmur01: why?
<alyssa>
bbrezillon: AFBC_FEATURES (0x004C)
<bbrezillon>
yep, found it too
<robmur01>
it's a tiny area-optimised GPU - makes sense to leave out the AFBC logic if the display etc. won't support it
<alyssa>
Guess I have a kernel patch to write
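[ed: a sketch of what such a patch might look like, assuming panfrost's gpu_read() helper and register-header conventions; the afbc_features field is hypothetical and this is not the patch that was eventually submitted:]
    /* drivers/gpu/drm/panfrost/panfrost_regs.h */
    #define GPU_AFBC_FEATURES	0x4c	/* (RO) AFBC support bits, Bifrost+ */

    /* drivers/gpu/drm/panfrost/panfrost_gpu.c, panfrost_gpu_init_features() */
    pfdev->features.afbc_features = gpu_read(pfdev, GPU_AFBC_FEATURES);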
<warpme_>
alyssa: "format is negotiated, the GPU uses different code paths" - this reminds me months ago discussion about allwinner h6 modifiers issue (still present). current vanilla 5.12.8 gives me distorted screen on h6 and iirc it was modifiers issue....
<macc24>
is there ANY soc vendor that doesn't have weird wtf quirks?
<alyssa>
macc24: no.
* macc24
grabs a 65C02 and runs away crying
<alyssa>
robmur01: What about on midgard v5?
<alyssa>
That register doesn't exist there
<alyssa>
All T760/T820/T860/T880 must support AFBC?
<alyssa>
warpme_: Particularly you - I want to see the difference between your two G31 devices
_whitelogger has joined #panfrost
<alyssa>
gcc features.c -I /usr/include/libdrm -ldrm to build
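[ed: the features.c alyssa mentions wasn't pasted into the log; a minimal sketch in its spirit, assuming the panfrost UAPI header from libdrm and the DRM_IOCTL_PANFROST_GET_PARAM ioctl, built with the gcc line above:]
    #include <fcntl.h>
    #include <stdio.h>
    #include <xf86drm.h>
    #include "panfrost_drm.h"

    static unsigned long long get_param(int fd, __u32 param)
    {
            struct drm_panfrost_get_param get = { .param = param };

            /* returns 0 on ioctl failure; fine for a quick dump */
            if (drmIoctl(fd, DRM_IOCTL_PANFROST_GET_PARAM, &get))
                    return 0;
            return get.value;
    }

    int main(void)
    {
            int fd = open("/dev/dri/renderD128", O_RDWR);

            if (fd < 0)
                    return 1;
            printf("gpu_prod_id:       0x%llx\n",
                   get_param(fd, DRM_PANFROST_PARAM_GPU_PROD_ID));
            printf("texture_features0: 0x%llx\n",
                   get_param(fd, DRM_PANFROST_PARAM_TEXTURE_FEATURES0));
            return 0;
    }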
<jernej>
alyssa: I did a quick source check of the BSP H616 display driver and it seems that the display HW supports AFBC
<jernej>
so it's quite possible the GPU will report AFBC support too
chewitt has joined #panfrost
chewitt_ has quit [Read error: Connection reset by peer]
camus1 has joined #panfrost
<daniels>
warpme_: btw, your patch is completely correct, please submit it upstream!
<daniels>
the only way to have all this working between different devices is explicit modifier negotiation
<macc24>
>sending patches upstream
<daniels>
I don't love git-send-email either, but it beats being frustrated all the time
<daniels>
nothing about that patch is a hack, it's exactly the right solution
<jernej>
daniels: why is that necessary? if the driver doesn't support any modifier, why should it report one?
<jernej>
(sun4i-drm maintainer here)
<alyssa>
jernej: In my interpretation DRM_FORMAT_MOD_INVALID means "anything goes"
<alyssa>
If you want linear specifically, you need DRM_FORMAT_MOD_LINEAR
<alyssa>
If someone creates a texture in panfrost with DRM_FORMAT_MOD_INVALID, panfrost will choose the fastest modifier it has available (usually AFBC or tiled)
<alyssa>
We could make mesa treat DRM_FORMAT_MOD_INVALID as a fallback to linear but that would complicate a lot of other logic and.. there's already a modifier for linear
<daniels>
jernej: ^ linear _is_ a modifier
<jernej>
interestingly, I don't have any issue with T720 on H6 with Kodi
<daniels>
linear is an explicit declaration of the buffer layout; invalid is 'errrrrrr I dunno, let's all guess and hope that magic semantics have given us the answer we're all hoping for'
<daniels>
which, as you've seen, is not correct in every single case
<jernej>
I would argue that historically, no modifier means linear
<alyssa>
Possibly it would make sense for the kernel to fill in a default list of only LINEAR modifiers if the display driver returns NULL like sun4i does
<jernej>
since that property afaik didn't always exist
<alyssa>
not sure if doing so would break userspace
<alyssa>
daniels: ^ thoughts there?
<daniels>
jernej: it doesn't though
<daniels>
Intel, AMD, and Broadcom, all do magic heuristics to get a tiling format
<daniels>
so INVALID can mean ... any number of things, all of which resolve to some kind of magic knowledge
<alyssa>
tbh I wish INVALID didn't exist at all
<daniels>
shrug, backwards compat forever
<alyssa>
yeah, i'm just thinking if we had had a magic 8-ball
<daniels>
srs
<jernej>
daniels: correction, no modifier is reported
<alyssa>
ideally the pre-modifier thing would resolve to LINEAR
<jernej>
not even invalid
<alyssa>
and INVALID would never be a thing
<daniels>
jernej: same thing
<daniels>
alyssa: it just means that INVALID would become a separate bool somewhere
<alyssa>
how so?
<jernej>
but if that were so important, surely the DRM infrastructure would force it?
<alyssa>
why do we need magic modifiers anyway..?
<jernej>
currently __drm_universal_plane_init happily works without a specified modifier
<daniels>
because of pre-modifier userspace which will call gbm_bo_create with no modifiers, or drmModeAddFB2 with no modifiers, or or
<daniels>
jernej: correct, and it shouldn't
<daniels>
all the drivers passing NULL should be fixed to pass { LINEAR, INVALID }
<jernej>
well, then, if that becomes a requirement, that's ok with me, but you still have older kernels without that
<alyssa>
daniels: sure, but can't new kernel with old userspace just treat that case as { LINEAR, INVALID }?
<daniels>
it probably could, but given how few patches are involved, I'd much prefer to have a BUG_ON(!modifiers) and actually surface it to driver authors, rather than magic implicit stuff behind their back
<daniels>
jernej: sure, but the point is that in the absence of a modifier (either explicitly passed as INVALID from userspace or no modifier provided - those are the same thing), you _cannot_ guarantee linear
<daniels>
userspace cannot treat those two cases the same, because suddenly hardware from three vendors who ship a lot of units breaks completely
<jernej>
we used that rule in Kodi and it works for now, but it's not tested on every SoC under the sun
<daniels>
Intel, AMD, RPi
<daniels>
all three of those will give you INVALID != LINEAR
<daniels>
this is literally the entire reason that INVALID exists, because there is extant userspace which will use tiled layouts (determined by magic heuristics) in the absence of an explicit modifier
<jernej>
correction, in Kodi, we use functions without modifiers for those cases when no modifier is reported
<daniels>
sure, and that's correct
<jernej>
and it works :)
<daniels>
but that's not the same thing as LINEAR
<daniels>
if your entire usage chain for the BO supports explicit modifiers, then use explicit modifiers (gbm_{bo,surface}_create_with_modifiers, drmModeAddFB2WithModifiers, etc) and everything will work; if it does not support explicit modifiers, then do not use modifiers (gbm_{bo,surface}_create, drmModeAddFB2) and everything will work
<jernej>
exactly
<daniels>
DRM_FORMAT_MOD_INVALID is a perfect alias for no modifier (apart from the gbm create functions, which will not accept INVALID as the only entry for annoying yet valid API reasons), and GBM_BO_USE_LINEAR is a perfect alias for DRM_FORMAT_MOD_LINEAR
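[ed: a compressed sketch of the two paths daniels describes, assuming standard GBM and drm_fourcc headers; error handling omitted:]
    #include <gbm.h>
    #include <drm_fourcc.h>
    #include <stdint.h>

    /* explicit path: every user of the BO negotiates modifiers */
    struct gbm_bo *create_explicit(struct gbm_device *gbm, uint32_t w, uint32_t h)
    {
            static const uint64_t mods[] = { DRM_FORMAT_MOD_LINEAR };

            return gbm_bo_create_with_modifiers(gbm, w, h,
                                                GBM_FORMAT_XRGB8888, mods, 1);
    }

    /* implicit path: no modifiers anywhere in the chain ("no modifier" == INVALID) */
    struct gbm_bo *create_implicit(struct gbm_device *gbm, uint32_t w, uint32_t h)
    {
            return gbm_bo_create(gbm, w, h, GBM_FORMAT_XRGB8888,
                                 GBM_BO_USE_SCANOUT | GBM_BO_USE_RENDERING);
    }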
<jernej>
and as I said, I have not seen issues reported by warpme_ on H6 & T720 with Kodi
<daniels>
the coda to this is that you cannot use GBM_BO_USE_LINEAR unless you know what you're doing
<daniels>
*cannot use GBM_BO_USE_LINEAR in the absence of explicit modifier support through your entire use chain
<daniels>
anyway the upshot to all this is that implicit modifiers are super painful and fragile as seen, and the fix for that is using explicit modifiers, soooo ...
<jernej>
so someone must fix DRM infra to force explicit modifiers :D
<jernej>
daniels: but anyway, more or less all graphics apps work on H6 as-is (including X desktops), so there must be some common assumption for such cases?
<daniels>
the common assumption is that the GPU driver and the DRM driver (and the codec and and ...) perfectly agree on the layout through a completely undetermined mechanism
<daniels>
one such mechanism is 'everything is linear'
<daniels>
another is 'you only have one set of heuristics across all the drivers and those are immutable'
<daniels>
another is 'your display and GPU drivers are the same thing and they have a secret side-channel and please don't ask about codecs'
<daniels>
for T720+H6, the answer is that _somehow_ everything ends up being linear
<daniels>
the fix for this is not to make everything linear always everywhere, because again that's not actually possible in the presence of userspace which _already_ today makes different assumptions in the absence of modifiers ... the fix is to plumb modifiers through
<daniels>
but yeah, if I pushed a commit to drm-misc-next to BUG_ON(!modifiers) right now, then sunxi_drm would fall over on probe, which seems ... less good than pushing the patch to explicitly expose { LINEAR, INVALID }?
<jernej>
if you push BUG_ON(!modifiers), I would expect all drivers are fixed beforehand
<daniels>
indeed!
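[ed: a sketch of the { LINEAR, INVALID } fix under discussion as it might land in sun4i-drm's plane init; identifier names approximate the upstream driver, but this is not the actual submitted patch. Note that DRM_FORMAT_MOD_INVALID doubles as the list terminator:]
    static const uint64_t sun4i_layer_modifiers[] = {
            DRM_FORMAT_MOD_LINEAR,
            DRM_FORMAT_MOD_INVALID,		/* terminator */
    };

    /* in the layer init path, replace the NULL modifier list: */
    ret = drm_universal_plane_init(drm, &layer->plane, 0,
                                   &sun4i_backend_layer_funcs,
                                   plane_formats, ARRAY_SIZE(plane_formats),
                                   sun4i_layer_modifiers,
                                   DRM_PLANE_TYPE_PRIMARY, NULL);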
jolan has quit [Quit: leaving]
jolan has joined #panfrost
<warpme_>
alyssa: here is output from G31 in h616 and sm1:
<alyssa>
robmur01: That seems to imply the h616 reports all the formats sm1 does, so the AFBC difference isn't visible there
<alyssa>
So maybe we do need that AFBC_FEATURES reg? which I guess is new to Bifrost
<alyssa>
warpme_: How hard is it for you to compile new kernels for each board?
<bbrezillon>
warpme_: you should be able to read it with devmem if the GPU is in use
<alyssa>
wait what? this is new
<alyssa>
and cursed
<anarsoul>
alyssa: what? devmem?
<bbrezillon>
'echo on > /sys/class/drm/renderD128/power/control && memtool md 0xffe4004c' seems to work here
<jernej>
warpme_: another thing to test on H616 would be your modifier patch
<alyssa>
anarsoul: using /dev/mem to poke registers
<alyssa>
so cursed
<bbrezillon>
warpme_: you'll have to replace the base address by what's reported in /proc/iomem of course
<warpme_>
alyssa: "How hard is it for you to compile new kernels for each board" - in sense of testing or in sense of default development model?
<warpme_>
jernej: re: my patch on h616: h616 works only with the patch. without the patch - black screen....
<jernej>
warpme_ you mean modifier patch?
<warpme_>
yes
<anarsoul>
alyssa: you just need to be careful, but it's an important part of the witchcraft!
<anarsoul>
:)
<alyssa>
anarsoul: So uh hypothetically you could bring up a platform driver in userspace by mmaping /dev/mem? cursed
<bbrezillon>
if you don't need interrupts, yes :)
<alyssa>
awful :)
<bbrezillon>
indeed
<warpme_>
bbrezillon: "you should be able to read it with devmem if the GPU is in use" - i have cap. to read mem loc. from userspace. What mem.loc. i need to report?
<jernej>
warpme_: 0x0180004c
chewitt has quit []
<alyssa>
probably different between the two SoC's, no?
<jernej>
yeah, that is for H616
<jernej>
warpme_: in short, address in gpu node + 0x4c
<bbrezillon>
warpme_: 'cat /proc/iomem' should tell you the base address
<bbrezillon>
and you add 0x4c to it as jernej said
<bbrezillon>
and since jernej already looked in the dtsi, I think you can read 0x180004c directly
<bbrezillon>
let's hope it returns something != 0
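[ed: the same read as a self-contained /dev/mem program (the "cursed" method from above), assuming jernej's H616 base 0x01800000; per bbrezillon, make sure the GPU is powered on first:]
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
            int fd = open("/dev/mem", O_RDONLY | O_SYNC);
            volatile uint32_t *regs;

            if (fd < 0)
                    return 1;
            regs = mmap(NULL, 0x1000, PROT_READ, MAP_SHARED, fd, 0x01800000);
            if (regs == MAP_FAILED)
                    return 1;
            printf("AFBC_FEATURES: 0x%08x\n", regs[0x4c / 4]);	/* hope: != 0 */
            munmap((void *)regs, 0x1000);
            close(fd);
            return 0;
    }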
<alyssa>
fingers crossed
<robmur01>
anyone horrified by /dev/mem try not to think about the part of the Raspberry Pi community who like writing userspace GPIO drivers because syscalls are "too slow"...