#panfrost on 2022-05-04 — irc logs at oftc.irclog.whitequark.org

2022-03-22 11:57 ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://oftc.irclog.whitequark.org/panfrost - <macc24> i have been here before it was popular

00:12 <icecream95> alyssa: Why is a preframe shader using CUBEFACE instructions to reload from a 2D texture?

00:14 <icecream95> sigh, now I lost my G52 blob patches due to disk corruption...

00:18 <icecream95> ouch... who set commit=600 on /

00:27 <alyssa> icecream95: no idea, why is it?

00:51 pch has joined #panfrost

01:05 <icecream95> alyssa: bi_opt_message_preload should run after bi_analyze_helper_requirements, shouldn't it?

01:12 * icecream95 has no idea why #5605 doesn't affect the blob

04:13 JulianGro has quit [Remote host closed the connection]

06:17 Daanct12 has joined #panfrost

06:45 Daanct12 has quit [Quit: Leaving]

06:46 Daanct12 has joined #panfrost

06:51 guillaume_g has joined #panfrost

06:53 MajorBiscuit has joined #panfrost

07:38 psydroid[m]1 has quit []

07:38 go4godvin has quit [Quit: Bridge terminating on SIGTERM]

07:38 strongtz[m] has quit []

07:38 Dylanger has quit [Quit: Bridge terminating on SIGTERM]

07:38 jenneron[m] has quit []

07:38 JulianGro[m] has quit []

07:38 toggleton[m] has quit []

07:38 CalebFontenotHaileysCuteNerdyB has quit []

07:38 stebler[m] has quit []

07:38 unevenrhombus[m] has quit []

07:44 CalebFontenotHaileysCuteNerdyB has joined #panfrost

08:29 rasterman has joined #panfrost

09:15 Daanct12 has quit [Quit: Leaving]

09:40 Dylanger has joined #panfrost

09:40 Guest29 has joined #panfrost

09:40 jenneron[m] has joined #panfrost

09:40 JulianGroOld[m] has joined #panfrost

09:40 psydroid[m] has joined #panfrost

09:40 stebler[m] has joined #panfrost

09:40 strongtz[m] has joined #panfrost

09:40 toggleton[m] has joined #panfrost

09:40 unevenrhombus[m] has joined #panfrost

09:43 rkanwal has joined #panfrost

09:46 pch has quit [Remote host closed the connection]

10:01 pch has joined #panfrost

10:25 icecream95 has quit [Ping timeout: 480 seconds]

10:53 MajorBiscuit has quit [Ping timeout: 480 seconds]

10:58 MajorBiscuit has joined #panfrost

11:24 ggardet has joined #panfrost

11:24 guillaume_g has quit [Read error: Connection reset by peer]

11:35 <alyssa> icecream95: uh... yes, I think so (re preload vs helper reqs)

11:35 <alyssa> though I'm not sure it's as simple as flipping the pass order

12:49 rkanwal has quit [Quit: rkanwal]

13:05 MajorBiscuit has quit [Quit: WeeChat 3.4]

13:25 floof58 is now known as Guest82

13:25 floof58 has joined #panfrost

13:25 <alyssa> jekstrand: so, umm, not that it's any of my business or anything, because it's not

13:25 Guest82 has quit []

13:25 <alyssa> but why doesn't panvk ci work?

13:33 <alyssa> https://gitlab.freedesktop.org/mesa/mesa/-/jobs/22160482

13:34 <alyssa> "Some failures found"

13:34 <alyssa> "RESULT=pass"

13:51 <daniels> oh no no no

13:52 <alyssa> seems to be just VK that's affected

13:53 <daniels> ~/wayland/wayland-protocols/chrome push ← → % bash -c 'false; set -e; echo $?'

13:53 <daniels> 0

13:53 <daniels> ~/wayland/wayland-protocols/chrome push ← → % bash -c 'false; echo $?; set -e'

13:53 <daniels> 1

13:54 <daniels> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16324
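daniels' two commands can be reproduced as a small bash sketch. "set -e" is itself a command that succeeds, so it overwrites $? with 0. That is presumably how a job could print "Some failures found" and still end with "RESULT=pass". The variable names below are illustrative. The "save the status first" pattern is a general idiom and is not claimed to be what MR 16324 actually changes.

```bash
#!/usr/bin/env bash
# Reproduce the pitfall from the log: "set -e" is itself a command that
# succeeds, so it resets $? to 0 and the earlier failure disappears.
masked=$(bash -c 'false; set -e; echo $?')   # status of "set -e": 0
kept=$(bash -c 'false; echo $?; set -e')     # status of "false": 1

# General pattern (not necessarily what MR 16324 does): save the status of
# the command you care about before running anything else.
safe=$(bash -c 'false; status=$?; set -e; echo "$status"')   # 1

echo "masked=$masked kept=$kept safe=$safe"
```

Running it prints masked=0 kept=1 safe=1.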

13:55 <alyssa> clearly we need CI for our CI

13:55 <daniels> the Python bits actually have really good inbuilt testing

13:55 <daniels> the shell bits, however, are shell

13:56 <alyssa> ah yes well

13:57 <alyssa> kicked off manual CI for that

13:57 <alyssa> we'll see how many drivers broke ITMT :p

13:58 <alyssa> ...How long was this broken for?

14:02 <daniels> 6 days

14:02 <alyssa> ah, alright

14:05 <alyssa> daniels: https://mesa.pages.freedesktop.org/-/mesa/-/jobs/22164270/artifacts/results/summary/results/trace@gl-panfrost-t860@neverball@neverball.trace.html

14:05 <alyssa> ....Lol

14:05 <alyssa> One of the "regressions" was fixing Neverball apparently :-p

14:07 <alyssa> I guess if !16324 is closed the pipeline should be cancelled..

14:07 <daniels> heh!

14:08 <alyssa> user clip plane lowering

14:17 JulianGro has joined #panfrost

14:52 Guest29 is now known as go4godvin

14:54 <jekstrand> alyssa: UBO pulls aren't working properly in VS but are in FS. Any bright ideas?

14:55 <alyssa> jekstrand: show me the code

14:55 <alyssa> PanVK doesn't have UBO pulls at all wired up yet

14:56 <alyssa> unless I don't understand what you mean by pull

14:57 <jekstrand> I mean regular LOAD.ubo

14:58 <alyssa> Oh.

14:58 <alyssa> PANVK_DEBUG=trace please

15:00 <jekstrand> alyssa: https://people.freedesktop.org/~jekstrand/pandecode.dump.0000

15:00 <alyssa> Exception Status: 0

15:00 <alyssa> is this tracing before or after submission?

15:00 <jekstrand> idk

15:00 <alyssa> if after submission, that implies a gpu hang

15:00 <alyssa> (if before, that's normal)

15:01 <jekstrand> Yup, it's faulting

15:01 <jekstrand> bah

15:01 <jekstrand> should have known to check for that

15:01 <jekstrand> Unhandled Page fault in AS0 at VA 0x0000000000000000

15:01 <jekstrand> Reason: TODO

15:01 <jekstrand> raw fault status: 0x10002C2

15:01 <jekstrand> decoded fault status: SLAVE FAULT

15:01 <jekstrand> exception type 0xC2: TRANSLATION_FAULT_2

15:01 <jekstrand> access type 0x2: READ

15:02 <jekstrand> source id 0x100

15:02 <alyssa> ubuf_1[0] = 0x0;

15:02 <alyssa> // XXX: null pointer deref

15:02 <alyssa> Look it even tells you what you screwed up :-p

15:02 <jekstrand> That should be fine

15:02 <jekstrand> Nothing's reading that one

15:02 <alyssa> +LOAD.i32.ubo t1, 0x00000020 /* 0.000000 */, 0x00000001 /* 0.000000 */, @r0

15:02 <alyssa> I beg to differ?

15:02 <jekstrand> Except maybe the 1st clause is...

15:02 <jekstrand> bah

15:02 <alyssa> Literally the first instruction reads that one?

15:02 <jekstrand> Ok, I've got my bug. Now just need to figure out why

15:02 <jekstrand> /o\

15:03 <alyssa> clause_52 and clause_54 also read it

15:03 <alyssa> glad I could be of service :-p

15:03 <alyssa> this is why i always run with PANVK_DEBUG=sync. always

15:03 <jekstrand> This is gonna be stupid again....

15:03 <alyssa> jekstrand: cts-runner took like an hour to run deqp-gles3

15:04 <alyssa> today is all about stupid! :-D

15:04 <alyssa> Fail (Result image invalid)

15:04 <alyssa> shit.

15:04 <jekstrand> :(

15:04 <jekstrand> The most annoying thing is that, once I fix this, I'm going to need to do another run which means another 8 hours. :(

15:04 <alyssa> Same hat.

15:04 <alyssa> dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_separate_grid_1000x1000_drawcount_1

15:04 <alyssa> ma'am this is a wendy's

15:05 * alyssa wonders why that is failing under cts-runner but never came up in my deqp-runner runs

15:10 anarsoul has quit [Ping timeout: 480 seconds]

15:10 anarsoul has joined #panfrost

15:11 <tomeu> maybe due to deqp-runner's flake retries?

15:11 * jekstrand wonders if there's another instance of uniforms_to_ubo he needs to kill

15:11 <tomeu> but it should print if it fails at any point

15:13 <jekstrand> Or maybe we're passing the wrong sysvals UBO index?

15:14 <jekstrand> Found it, I think

15:14 <jekstrand> dang...

15:14 <alyssa> woof

15:14 <jekstrand> The bifrost compiler thinks nir->info.num_ubos means something

15:15 <jekstrand> Maybe?

15:15 <alyssa> maybe?

15:16 <jekstrand> sysval_ubo = MAX2(b->shader->inputs->sysval_ubo, b->shader->nir->info.num_ubos)

15:16 <jekstrand> WTH?

15:16 <jekstrand> Why is it clamping things up?

15:17 * jekstrand sees no universe in which that's a good idea.

15:18 * jekstrand wonders if this also fixes texturequerysize

15:19 <jekstrand> I mean, it was clearly put there for SOME reason

15:19 <jekstrand> I just question the quality of the reason. :P

15:19 <jekstrand> Woo! Pass

15:24 <alyssa> :-D

15:24 <jekstrand> Oh... It's because of 3559efb9bf5c

15:25 <jekstrand> Looks like we need to teach GL panfrost to pass num_ubos there.

15:25 <alyssa> 15:17 * jekstrand sees no universe in which that's a good idea.

15:25 <alyssa> sorry copypaste fail

16:00 ggardet has quit []

16:05 <alyssa> I really hate panfrost_create_scanout_res

16:05 <alyssa> just. everything about it.

16:38 * jekstrand kicks off another full run. Hopefully this one will go faster because I'll hang a lot less. :D

16:39 <jekstrand> I think I'll reboot first. Just to be sure things are in a nice state

17:49 rasterman has quit [Quit: Gettin' stinky!]

17:49 <greenjustin> This may be a dumb question, but do folks here happen to know if Mali GPUs have the ability to sleep?

17:50 <greenjustin> We've been monitoring the GPU frequency on some of our mediatek boards recently, and noticed they tend to scale the frequency appropriately, but never seem to actually idle. Not sure if it's a driver thing or if the hardware just lacks the capability

17:52 <macc24> greenjustin: they do

17:52 <macc24> i noticed that when running just fbcon on my mt8183 device /sys/class/devfreq/1304000.gpu/trans_stat shows no activity

17:53 <greenjustin> That's interesting, I've been monitoring exactly that file on MT8192s and noticed it always shows activity

17:53 <greenjustin> I'm thinking it's a kbase thing...all the more reason to make the switch :-)

17:54 <macc24> oh

17:54 <macc24> kbase

18:08 <alyssa> Why do we have panfrost_create_scanout_res again?

18:09 <alyssa> I fear if I ask I might end up learning things about WSI i don't want to know

18:19 <anholt> greenjustin: what are you using to monitor gpu frequency? /debug/clk_summary?

18:20 <anholt> I wouldn't expect a reported frequency there to go to 0. generally the clock will stay programmed at a rate, but you'll power off the block and disable the clock.

18:20 <anholt> if you're using perf counters or something to watch the gpu's clock, then I would expect that accessing the perf counters is keeping the gpu on.

18:21 <greenjustin> anholt: I've been using /sys/class/devfreq/1300000.mali/trans_stat

18:21 <macc24> greenjustin: wait, there's mtk chip with mali at 1300000?

18:21 <greenjustin> anholt: On most platforms it stops accounting time when the GPU is idle. On MTK it looks like that's just never...

18:22 <greenjustin> macc24: err, that should read 13000000. I missed a 0
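The trans_stat check that greenjustin and macc24 describe can be repeated by taking two snapshots of the file a few seconds apart and comparing them. This sketch relies on the behaviour greenjustin reports: on most platforms devfreq stops accounting time while the GPU is idle, so identical snapshots suggest the device stayed idle for the whole window. The function name devfreq_activity and the default interval are hypothetical.

```bash
#!/usr/bin/env bash
# Compare two snapshots of a devfreq trans_stat file taken $2 seconds apart.
# Prints "no activity" if nothing changed (devfreq stopped accounting time,
# which usually means the device was idle) and "activity" otherwise.
# Returns 2 if the file cannot be read.
devfreq_activity() {
    local stat_file=$1 interval=${2:-5}
    local before after
    before=$(cat "$stat_file") || return 2
    sleep "$interval"
    after=$(cat "$stat_file") || return 2
    if [ "$before" = "$after" ]; then
        echo "no activity"
    else
        echo "activity"
    fi
}
```

For example, with the corrected path from the log: devfreq_activity /sys/class/devfreq/13000000.mali/trans_stat 5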

18:23 <macc24> greenjustin: what machine are you working ?

18:23 <macc24> working on*

18:24 <greenjustin> macc24: Asurada. I think it's "ASUS Chromebook Flip CM3200FM1A" or something

18:24 <macc24> ah

18:24 <macc24> mt8192

18:26 <robmur01> FWIW there could be any number of reasons why a GPU doesn't go idle, or doesn't suspend while idle. "Mali", or even "Mali on MTK" is way too broad to generalise

18:26 <macc24> greenjustin: what are you running on that machine when you want the gpu to be idling?

18:27 <greenjustin> macc24: Nothing really. I stopped the UI, this is just a TTY console. I expect that all goes through the other DRM device on these mediatek SoCs rather than Mali, right?

18:27 <macc24> greenjustin: "tty console", chromeos's frecon?

18:27 <greenjustin> yes

18:27 <macc24> try stopping it

18:27 <macc24> and then testing

18:29 <robmur01> pay attention to /sys/device/platform/<gpu>/power/ too

18:31 <greenjustin> macc24: stopping frecon doesn't seem to help

18:31 <greenjustin> robmur01: I don't have a /sys/device/platform/<gpu>/power afaict, but I do have /sys/kernel/debug/mali0/ipa_current_power. That seems to be putting out a constant "22" unknown units...

18:32 <robmur01> note that "<gpu>" is a placeholder for whatever the name of the actual GPU device is on that platform, since I don't know ;)

18:33 <greenjustin> robmur01: Right, I figured I just didn't see anything in that directory that looked like the mali

18:34 <robmur01> for DT it's usually based on the node name, so possibly "13000000.<something>"

18:36 <robmur01> ah, in fact it might be symlinked as "device" from the devfreq dir too

18:39 <greenjustin> robmur01: I do have a /sys/class/devfreq/13000000.mali/device/power/! I can't make much sense of it though. runtime_usage is "0", runtime_status is "unsupported", and runtime_enabled is "disabled"

18:44 <macc24> heh

18:45 <robmur01> well there you go then, it's not suspending because runtime PM is disabled. Whether that's down to the kernel config or that particular platform's kbase port is something you could investigate if you were sufficiently bothered
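robmur01's diagnosis can be scripted against the power/ directory that greenjustin found behind the devfreq "device" symlink. This is a sketch that assumes the standard runtime PM sysfs attributes runtime_enabled and runtime_status are present. The function name and output wording are made up for illustration. With the values greenjustin reported ("disabled" and "unsupported"), it reports runtime PM as unavailable, which matches robmur01's reading.

```bash
#!/usr/bin/env bash
# Summarise runtime PM state from a device's sysfs power/ directory.
# runtime_enabled reads "enabled" only when runtime PM is enabled and not
# forbidden from userspace; "disabled", "forbidden" or "disabled & forbidden"
# all keep the device from runtime-suspending when idle.
runtime_pm_summary() {
    local power_dir=$1
    local enabled status
    enabled=$(cat "$power_dir/runtime_enabled" 2>/dev/null) || enabled="missing"
    status=$(cat "$power_dir/runtime_status" 2>/dev/null) || status="missing"
    if [ "$enabled" = "enabled" ]; then
        echo "runtime PM enabled (runtime_status=$status)"
    else
        echo "runtime PM unavailable (runtime_enabled=$enabled, runtime_status=$status)"
    fi
}
```

For example: runtime_pm_summary /sys/class/devfreq/13000000.mali/device/power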

18:46 <macc24> just mtk doing mtk things

18:47 <robmur01> but is that the one where you have to do at least one particularly nasty erratum workaround when turning the thing on, so never turning it off isn't so bad an idea...

18:48 <greenjustin> oooor I just call this "ammunition for justifying the switch to panfrost" :P

18:50 <greenjustin> robmur01: Oh? It's a G57 in an MT8192, if that's the right GPU. But in my estimation it's just as likely that ARM messed up the drivers. I've already found some pretty glaring bugs in them...

18:50 <macc24> greenjustin: almost every downstream kernel is 'messed up' imo

18:51 <greenjustin> macc24: yeah, it's a hot mess.

18:52 <macc24> greenjustin: convince google to use 100% pure upstream kernel

18:52 <macc24> plz

18:52 <robmur01> yes, the kbase driver as supplied in the DDK is hardly exemplary upstream quality, but bear in mind that it's then up to the licensee to completely butcher it with their own platform integration bits

18:55 <greenjustin> macc24: I wish... That would certainly make my job easier

18:56 <alyssa> robmur01: yes, MT8192 is shipping Natt r0p0 (iirc), which is affected by that tricky erratum

18:56 <macc24> greenjustin: tbh google does a pretty good job of upstream stuff, much better than what i saw in few downstream kernels for couple of rockchip devices

18:56 <macc24> alyssa: oh it's gpu-specific?

18:56 <alyssa> robmur01: I typed out a workaround for it, but couldn't find out how to get the issue to trigger so I left it out of my kernel

18:57 <robmur01> furthermore, if you switch to panfrost in the hope of gaining RPM, right now at the very least you'll lose devfreq. Patches welcome ;)

18:57 <alyssa> and mt8192+panfrost is scheduled to pass the CTS in a few hours, so it can't be /too/ critical ......

18:57 <macc24> alyssa: what issue?

18:57 <alyssa> macc24: dont remember the # off hand

18:57 <macc24> robmur01: i still have that slightly wrong but not completely wrong opp patch that might work on mt8192 too :D

18:57 <macc24> that opp patch that makes all mt8183 cadmium devices slightly unstable

18:58 <alyssa> the one I told you to drop months ago, right?

18:58 <macc24> yep

18:58 <greenjustin> robmur01: I noticed that when I was testing on jacuzzi. The thing is, though, we could just fix it ourselves as opposed to waiting for some combination of ARM or MTK to get things working. The reduced maintenance burden is honestly the biggest selling point of panfrost for us, I think

18:58 <robmur01> "slightly wrong", the motto of downstream :D

18:59 <macc24> listen, i lost the mostly-right patch that someone on here made

18:59 <alyssa> robmur01: One of my coworkers reported that the mediatek clock/supply/etc hierarchy on mt819x is overcomplicated downstream (to save a couple mW)... With a modified device tree, most of the platform specific crap goes away and devfreq works.

18:59 <macc24> robmur01: and i can't find the patch now :|

19:00 <alyssa> robmur01: I don't know the details. Once it became clear that upstreaming would be less Mali work and more dealing with MTK's nonsense, I hopped back to Mesa and am waiting for one of my coworkers to take over the patch series.

19:00 <robmur01> oh, so we can just squint and not bother scaling one of the regulators? Yeah, that might fly

19:00 <alyssa> I think so, yeah

19:00 <greenjustin> macc24: Wait is that the patch you sent me that just forces the video ram regulator to be always on?

19:01 <greenjustin> I definitely still have that locally somewhere...

19:01 <macc24> greenjustin: no, the different one

19:01 <alyssa> I gave up at "why does this need 4 clocks"

19:01 <macc24> greenjustin: i was talking about https://github.com/Maccraft123/Cadmium/blob/master/kernel/patches/kukui.opp-multi-regulator.patch

19:01 <macc24> alyssa: O.o

19:09 * robmur01 is gonna guess they do explicit clock reparenting from the GPU driver for glitch-free rate changes or similar, rather than abstracting it in the clock driver

19:09 <robmur01> at least judging from the names in macc24's DT patch

19:11 <macc24> robmur01: i recall the word "reparent" when dealing with clocks and mt8183 so it might be right

19:15 <greenjustin> oh! does this fix the devfreq problem on mt8183s? or at least attempt to...

19:16 <robmur01> it's not an uncommon thing, I think it's prevalent on Amlogic SoCs too

19:16 <macc24> it tries to, alyssa thinks it's a bad fix, i think it's a bad fix too

19:16 <macc24> but it (mostly) works

19:16 <macc24> so

19:16 * macc24 shrugs

19:19 <robmur01> I tried to do a "good" fix, but it only proved that the OPP core API itself needs some work

19:22 <greenjustin> maybe I can TAL once I dig myself out of this MM21 rabbit hole...

19:34 soreau has quit [Read error: No route to host]

19:38 soreau has joined #panfrost

19:42 <greenjustin> robmur01: If I'm reading this right, then I think that's 100% what's going on with the kbase driver: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/third_party/kernel/v5.15/drivers/gpu/arm/bifrost/platform/mediatek/mt8183_mali_kbase_runtime_pm.c;drc=47416e73d481966308cf321eb108bd4a4eb1dd23;l=501

20:15 <greenjustin> alyssa: you mentioned G57s have a weird issue that makes it a bad idea to sleep? do you happen to have a link to more information on that?

20:24 <robmur01> hmm, I wonder if hacking panfrost_devfreq_target() to set the clock rate to 26MHz before setting the real OPP might trigger the reparenting dance automagically (unless mfgpll_ck can actually run that slow)? Might be a cheeky compromise if so...

20:27 <robmur01> (and of course assuming that the CCF would change the PLL rate *before* switching the mux back for the second change, which I have no idea about either)

20:47 * jekstrand should probably review alyssa's blend patch

20:52 Rathann has joined #panfrost

20:53 <alyssa> jekstrand: ....what patch

20:53 <alyssa> is this reverse psychology?

20:53 <alyssa> I'm pretty sure that's supposed to read

20:53 * alyssa should probably review jekstrand's blend patch

20:53 <alyssa> Uh, ok, I guess I can go do that...

21:01 rasterman has joined #panfrost

21:01 <Pu244> *testing*

21:02 <Pu244> Huh, OK, just an issue in the Qubes channel. n/m.

21:02 <alyssa> Test failed, please try again later.

21:02 <Pu244> Test failed, yay! That means I can fix what's wrong and test again!

21:02 <alyssa> :D

21:02 <Pu244> <3 test driven development.

21:07 <jekstrand> alyssa: Or you could review those. :P

21:34 <HdkR> test driven development, great until you find something in an application that doesn't trigger a test :P

22:15 <jekstrand> HdkR: Write more tests!

22:15 <jekstrand> That's called DDT: Debug-driven testing. :P

22:26 <HdkR> You're right

22:27 <HdkR> Need more llvm bugpoint style bug reduction

22:46 icecream95 has joined #panfrost

22:50 <alyssa> jekstrand: I know you're joking but I've been doing that with panfrost recently to excellent effect

22:51 <alyssa> Someone reports an issue, icecream95 debugs it and sends a 1 line fix, I merge and then root cause it and end up rewriting the whole module and unit testing the new one and finding and fixing other bugs in the process....

22:51 <alyssa> We have a process here! :-p

22:53 <jekstrand> hehe

22:56 <icecream95> alyssa: The next step is figuring out how I can write 0 line fixes for bugs

23:02 <anarsoul> icecream95: 0 line net like +1 -1? :)

23:07 robmur01_ has joined #panfrost

23:14 robmur01 has quit [Ping timeout: 480 seconds]

23:17 <alyssa> icecream95: does that require me to write perfect code? I mean I guess I can *try* ...

23:18 <icecream95> alyssa: No, it requires you to write enough code that I can fix bugs by only deleting lines :)

23:18 <alyssa> Ah ha!

23:18 <alyssa> Yes, I can do that! :-D

23:21 * alyssa should stop watching CTS results scroll by and do not-work.

23:25 kenzie has quit [Quit: The Lounge - https://thelounge.chat]

23:27 rasterman has quit [Quit: Gettin' stinky!]

23:30 kenzie has joined #panfrost

23:45 JulianGro has quit [Remote host closed the connection]

23:56 <jekstrand> alyssa: Yes, yes you should.

23:57 <jekstrand> (He says as he flips over to his panvk dEQP run tab just to check on it)