ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://oftc.irclog.whitequark.org/panfrost - <macc24> i have been here before it was popular
<icecream95> alyssa: Why is a preframe shader using CUBEFACE instructions to reload from a 2D texture?
<icecream95> sigh, now I lost my G52 blob patches due to disk corruption...
<icecream95> ouch... who set commit=600 on /
<alyssa> icecream95: no idea, why is it?
pch has joined #panfrost
<icecream95> alyssa: bi_opt_message_preload should run after bi_analyze_helper_requirements, shouldn't it?
* icecream95 has no idea why #5605 doesn't affect the blob
JulianGro has quit [Remote host closed the connection]
Daanct12 has joined #panfrost
Daanct12 has quit [Quit: Leaving]
Daanct12 has joined #panfrost
guillaume_g has joined #panfrost
MajorBiscuit has joined #panfrost
psydroid[m]1 has quit []
go4godvin has quit [Quit: Bridge terminating on SIGTERM]
strongtz[m] has quit []
Dylanger has quit [Quit: Bridge terminating on SIGTERM]
jenneron[m] has quit []
JulianGro[m] has quit []
toggleton[m] has quit []
CalebFontenotHaileysCuteNerdyB has quit []
stebler[m] has quit []
unevenrhombus[m] has quit []
CalebFontenotHaileysCuteNerdyB has joined #panfrost
rasterman has joined #panfrost
Daanct12 has quit [Quit: Leaving]
Dylanger has joined #panfrost
Guest29 has joined #panfrost
jenneron[m] has joined #panfrost
JulianGroOld[m] has joined #panfrost
psydroid[m] has joined #panfrost
stebler[m] has joined #panfrost
strongtz[m] has joined #panfrost
toggleton[m] has joined #panfrost
unevenrhombus[m] has joined #panfrost
rkanwal has joined #panfrost
pch has quit [Remote host closed the connection]
pch has joined #panfrost
icecream95 has quit [Ping timeout: 480 seconds]
MajorBiscuit has quit [Ping timeout: 480 seconds]
MajorBiscuit has joined #panfrost
ggardet has joined #panfrost
guillaume_g has quit [Read error: Connection reset by peer]
<alyssa> icecream95: uh... yes, I think so (re preload vs helper reqs)
<alyssa> though I'm not sure it's as simple as flipping the pass order
rkanwal has quit [Quit: rkanwal]
MajorBiscuit has quit [Quit: WeeChat 3.4]
floof58 is now known as Guest82
floof58 has joined #panfrost
<alyssa> jekstrand: so, umm, not that it's any of my business or anything, because it's not
Guest82 has quit []
<alyssa> but why doesn't panvk ci work?
<alyssa> "Some failures found"
<alyssa> "RESULT=pass"
<daniels> oh no no no
<alyssa> seems to be just VK that's affected
<daniels> ~/wayland/wayland-protocols/chrome push ← → % bash -c 'false; set -e; echo $?'
<daniels> 0
<daniels> ~/wayland/wayland-protocols/chrome push ← → % bash -c 'false; echo $?; set -e'
<daniels> 1
<alyssa> clearly we need CI for our CI
<daniels> the Python bits actually have really good inbuilt testing
<daniels> the shell bits, however, are shell
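The two one-liners daniels pasted show how a shell script can lose a failure status: `set -e` is itself a command that succeeds, so it resets `$?` to 0 before the script reads it. A minimal sketch reproducing that from Python follows. It assumes a POSIX `sh` on the PATH; the `run_sh` helper and the third variant (saving the status in `rc`) are illustrative and not taken from the CI scripts.

```python
import subprocess


def run_sh(script: str) -> str:
    """Run a script with the POSIX shell and return its stdout, stripped."""
    done = subprocess.run(["sh", "-c", script], capture_output=True, text=True)
    return done.stdout.strip()


# `set -e` succeeds, so by the time `echo` runs, $? has been reset to 0.
clobbered = run_sh("false; set -e; echo $?")

# Reading $? immediately after the failing command preserves the status.
preserved = run_sh("false; echo $?; set -e")

# Saving the status in a variable keeps it across later commands.
saved = run_sh("false; rc=$?; set -e; echo $rc")

print(clobbered, preserved, saved)  # prints: 0 1 1
```

Presumably something of this shape is how a job could log "Some failures found" and still report "RESULT=pass"; the log does not show the actual script.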
<alyssa> ah yes well
<alyssa> kicked off manual CI for that
<alyssa> we'll see how many drivers broke ITMT :p
<alyssa> ...How long was this broken for?
<daniels> 6 days
<alyssa> ah, alright
<alyssa> ....Lol
<alyssa> One of the "regressions" was fixing Neverball apparently :-p
<alyssa> I guess if !16324 is closed the pipeline should be cancelled..
<daniels> heh!
<alyssa> user clip plane lowering
JulianGro has joined #panfrost
Guest29 is now known as go4godvin
<jekstrand> alyssa: UBO pulls aren't working properly in VS but are in FS. Any bright ideas?
<alyssa> jekstrand: show me the code
<alyssa> PanVK doesn't have UBO pulls at all wired up yet
<alyssa> unless I don't understand what you mean by pull
<jekstrand> I mean regular LOAD.ubo
<alyssa> Oh.
<alyssa> PANVK_DEBUG=trace please
<alyssa> Exception Status: 0
<alyssa> is this tracing before or after submission?
<jekstrand> idk
<alyssa> if after submission, that implies a gpu hang
<alyssa> (if before, that's normal)
<jekstrand> Yup, it's faulting
<jekstrand> bah
<jekstrand> should have known to check for that
<jekstrand> Unhandled Page fault in AS0 at VA 0x0000000000000000
<jekstrand> Reason: TODO
<jekstrand> raw fault status: 0x10002C2
<jekstrand> decoded fault status: SLAVE FAULT
<jekstrand> exception type 0xC2: TRANSLATION_FAULT_2
<jekstrand> access type 0x2: READ
<jekstrand> source id 0x100
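The decoded fields in the dump above can be recovered from the raw fault status with a few shifts and masks. The sketch below assumes a bit layout inferred from the values in this paste (exception type in bits 0-7, access type in bits 8-9, source id from bit 16 upward); the function name is hypothetical and the layout should be checked against the kernel driver before relying on it.

```python
def decode_fault_status(raw: int) -> dict:
    """Split a raw fault status word into the fields shown in the dump.

    The bit positions are an assumption inferred from the pasted values.
    """
    return {
        "exception_type": raw & 0xFF,    # 0xC2 in the dump (TRANSLATION_FAULT_2)
        "access_type": (raw >> 8) & 0x3,  # 0x2 in the dump (READ)
        "source_id": raw >> 16,           # 0x100 in the dump
    }


fields = decode_fault_status(0x10002C2)
print({name: hex(value) for name, value in fields.items()})
# prints: {'exception_type': '0xc2', 'access_type': '0x2', 'source_id': '0x100'}
```

Combined with the faulting VA of 0, a READ translation fault points at something dereferencing a null GPU pointer.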
<alyssa> ubuf_1[0] = 0x0;
<alyssa> // XXX: null pointer deref
<alyssa> Look it even tells you what you screwed up :-p
<jekstrand> That should be fine
<jekstrand> Nothings reading that one
<alyssa> +LOAD.i32.ubo t1, 0x00000020 /* 0.000000 */, 0x00000001 /* 0.000000 */, @r0
<alyssa> I beg to differ?
<jekstrand> Except maybe the 1st clause is...
<jekstrand> bah
<alyssa> Literally the first instruction reads that one?
<jekstrand> Ok, I've got my bug. Now just need to figure out why
<jekstrand> /o\
<alyssa> clause_52 and clause_54 also read it
<alyssa> glad I could be of service :-p
<alyssa> this is why i always run with PANVK_DEBUG=sync. always
<jekstrand> This is gonna be stupid again....
<alyssa> jekstrand: cts-runner took like an hour to run deqp-gles3
<alyssa> today is all about stupid! :-D
<alyssa> Fail (Result image invalid)
<alyssa> shit.
<jekstrand> :(
<jekstrand> The most annoying thing is that, once I fix this, I'm going to need to do another run which means another 8 hours. :(
<alyssa> Same hat.
<alyssa> dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_separate_grid_1000x1000_drawcount_1
<alyssa> ma'am this is a wendy's
* alyssa wonders why that is failing under cts-runner but never came up in my deqp-runner runs
anarsoul has quit [Ping timeout: 480 seconds]
anarsoul has joined #panfrost
<tomeu> maybe due to deqp-runner's flake retries?
* jekstrand wonders if there's another instance of uniforms_to_ubo he needs to kill
<tomeu> but it should print if it fails at any point
<jekstrand> Or maybe we're passing the wrong sysvals UBO index?
<jekstrand> Found it, I think
<jekstrand> dang...
<alyssa> woof
<jekstrand> The bifrost compiler thinks nir->info.num_ubos means something
<jekstrand> Maybe?
<alyssa> maybe?
<jekstrand> sysval_ubo = MAX2(b->shader->inputs->sysval_ubo, b->shader->nir->info.num_ubos)
<jekstrand> WTH?
<jekstrand> Why is it clamping things up?
* jekstrand sees no universe in which that's a good idea.
* jekstrand wonders if this also fixes texturequerysize
<jekstrand> I mean, it was clearly put there for SOME reason
<jekstrand> I just question the quality of the reason. :P
<jekstrand> Woo! Pass
<alyssa> :-D
<jekstrand> Oh... It's because of 3559efb9bf5c
<jekstrand> Looks like we need to teach GL panfrost to pass num_ubos there.
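To see why the quoted MAX2() is a problem, here is a small Python sketch of the same selection logic. The function name and the numbers are hypothetical and chosen only for illustration; the one thing taken from the log is that the compiler picks the larger of the driver-supplied sysval UBO index and `num_ubos`.

```python
def sysval_ubo_index(requested: int, num_ubos: int) -> int:
    """Mirror the quoted MAX2(): never pick an index below num_ubos."""
    return max(requested, num_ubos)


# Hypothetical case: the driver placed the sysval UBO at index 1,
# but the shader info reports 3 UBOs.
driver_index = 1
nir_num_ubos = 3

compiler_index = sysval_ubo_index(driver_index, nir_num_ubos)
print(driver_index, compiler_index)  # prints: 1 3

# The compiler now emits loads from index 3 while the driver filled index 1,
# so the shader reads whatever happens to be bound (or not bound) at 3.
```

When the driver's index is already at least `num_ubos` the clamp is a no-op, which would explain why a driver that always appends the sysval UBO after the user UBOs never notices it.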
<alyssa> 15:17 * jekstrand sees no universe in which that's a good idea.
<alyssa> sorry copypaste fail
ggardet has quit []
<alyssa> I really hate panfrost_create_scanout_res
<alyssa> just. everything about it.
* jekstrand kicks off another full run. Hopefully this one will go faster because I'll hang a lot less. :D
<jekstrand> I think I'll reboot first. Just to be sure things are in a nice state
rasterman has quit [Quit: Gettin' stinky!]
<greenjustin> This may be a dumb question, but do folks here happen to know if Mali GPUs have the ability to sleep?
<greenjustin> We've been monitoring the GPU frequency on some of our mediatek boards recently, and noticed they tend to scale the frequency appropriately, but never seem to actually idle. Not sure if it's a driver thing or if the hardware just lacks the capability
<macc24> greenjustin: they do
<macc24> i noticed that when running just fbcon on my mt8183 device /sys/class/devfreq/1304000.gpu/trans_stat shows no activity
<greenjustin> That's interesting, I've been monitoring exactly that file on MT8192s and noticed it never shows no activity
<greenjustin> I'm thinking it's a kbase thing...all the more reason to make the switch :-)
<macc24> oh
<macc24> kbase
<alyssa> Why do we have panfrost_create_scanout_res again?
<alyssa> I fear if I ask I might end up learning things about WSI i don't want to know
<anholt> greenjustin: what are you using to monitor gpu frequency? /debug/clk_summary?
<anholt> I wouldn't expect a reported frequency there to go to 0. generally the clock will stay programmed at a rate, but you'll power off the block and disable the clock.
<anholt> if you're using perf counters or something to watch the gpu's clock, then I would expect that accessing the perf counters is keeping the gpu on.
<greenjustin> anholt: I've been using /sys/class/devfreq/1300000.mali/trans_stat
<macc24> greenjustin: wait, there's mtk chip with mali at 1300000?
<greenjustin> anholt: On most platforms it stops accounting time when the GPU is idle. On MTK it looks like that's just never...
<greenjustin> macc24: err, that should read 13000000. I missed a 0
<macc24> greenjustin: what machine are you working ?
<macc24> working on*
<greenjustin> macc24: Asurada. I think it's "ASUS Chromebook Flip CM3200FM1A" or something
<macc24> ah
<macc24> mt8192
<robmur01> FWIW there could be any number of reasons why a GPU doesn't go idle, or doesn't suspend while idle. "Mali", or even "Mali on MTK" is way too broad to generalise
<macc24> greenjustin: what are you running on that machine when you want the gpu to be idling?
<greenjustin> macc24: Nothing really. I stopped the UI, this is just a TTY console. I expect that all goes through the other DRM device on these mediatek SoCs rather than Mali, right?
<macc24> greenjustin: "tty console", chromeos's frecon?
<greenjustin> yes
<macc24> try stopping it
<macc24> and then testing
<robmur01> pay attention to /sys/device/platform/<gpu>/power/ too
<greenjustin> macc24: stopping frecon doesn't seem to help
<greenjustin> robmur01: I don't have a /sys/device/platform/<gpu>/power afaict, but I do have /sys/kernel/debug/mali0/ipa_current_power. That seems to be putting out a constant "22" unknown units...
<robmur01> note that "<gpu>" is a placeholder for whatever the name of the actual GPU device is on that platform, since I don't know ;)
<greenjustin> robmur01: Right, I figured I just didn't see anything in that directory that looked like the mali
<robmur01> for DT it's usually based on the node name, so possibly "13000000.<something>"
<robmur01> ah, in fact it might be symlinked as "device" from the devfreq dir too
<greenjustin> robmur01: I do have a /sys/class/devfreq/13000000.mali/device/power/! I can't make much sense of it though. runtime_usage is "0", runtime_status is "unsupported", and runtime_enabled is "disabled"
<macc24> heh
<robmur01> well there you go then, it's not suspending because runtime PM is disabled. Whether that's down to the kernel config or that particular platform's kbase port is something you could investigate if you were sufficiently bothered
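The check greenjustin and robmur01 just walked through can be scripted. The sketch below reads the three attributes named in the chat from a device's `power/` directory and applies robmur01's conclusion. The helper names are hypothetical, and treating "enabled" as the only value that allows suspend is an assumption about the sysfs strings; `runtime_usage` and `runtime_enabled` may also be absent on kernels built without the relevant debug option, in which case they are reported as None.

```python
from pathlib import Path

ATTRS = ("runtime_status", "runtime_enabled", "runtime_usage")


def runtime_pm_state(power_dir) -> dict:
    """Read the runtime PM attributes from a device's power/ directory."""
    power = Path(power_dir)
    state = {}
    for name in ATTRS:
        attr = power / name
        # Missing attributes are reported as None rather than raising.
        state[name] = attr.read_text().strip() if attr.exists() else None
    return state


def can_runtime_suspend(state: dict) -> bool:
    """Assumption: suspend is only possible when runtime PM is enabled."""
    return (state["runtime_enabled"] == "enabled"
            and state["runtime_status"] != "unsupported")


# Path taken from the chat; on other machines the attributes simply read None.
state = runtime_pm_state("/sys/class/devfreq/13000000.mali/device/power")
print(state, can_runtime_suspend(state))
```

With the values reported in the chat ("unsupported", "disabled", "0"), `can_runtime_suspend` returns False, matching the diagnosis that the GPU never suspends because runtime PM is disabled.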
<macc24> just mtk doing mtk things
<robmur01> but is that the one where you have to do at least one particularly nasty erratum workaround when turning the thing on, so never turning it off isn't so bad an idea...
<greenjustin> oooor I just call this "ammunition for justifying the switch to panfrost" :P
<greenjustin> robmur01: Oh? It's a G57 in an MT8192, if that's the right GPU. But it's about as possible in my estimation that ARM messed up the drivers. I've already found some pretty glaring bugs in them...
<macc24> greenjustin: almost every downstream kernel is 'messed up' imo
<greenjustin> macc24: yeah, it's a hot mess.
<macc24> greenjustin: convince google to use 100% pure upstream kernel
<macc24> plz
<robmur01> yes, the kbase driver as supplied in the DDK is hardly exemplary upstream quality, but bear in mind that it's then up to the licensee to completely butcher it with their own platform integration bits
<greenjustin> macc24: I wish... That would certainly make my job easier
<alyssa> robmur01: yes, MT8192 is shipping Natt r0p0 (iirc), which is affected by that tricky erratum
<macc24> greenjustin: tbh google does a pretty good job of upstream stuff, much better than what i saw in few downstream kernels for couple of rockchip devices
<macc24> alyssa: oh it's gpu-specific?
<alyssa> robmur01: I typed out a workaround for it, but couldn't find out how to get the issue to trigger so I left it out of my kernel
<robmur01> furthermore, if you switch to panfrost in the hope of gaining RPM, right now at the very least you'll lose devfreq. Patches welcome ;)
<alyssa> and mt8192+panfrost is scheduled to pass the CTS in a few hours, so it can't be /too/ critical ......
<macc24> alyssa: what issue?
<alyssa> macc24: dont remember the # off hand
<macc24> robmur01: i still have that slightly wrong but not completely wrong opp patch that might work on mt8192 too :D
<macc24> that opp patch that makes all mt8183 cadmium devices slightly unstable
<alyssa> the one I told you to drop months ago, right?
<macc24> yep
<greenjustin> robmur01: I noticed that when I was testing on jacuzzi. The thing is though, we could just fix it ourselves as opposed to waiting for some combination of ARM or MTK to get things working. That's honestly the biggest selling point of panfrost for us, I think: the reduced maintenance burden
<robmur01> "slightly wrong", the motto of downstream :D
<macc24> listen, i lost the patch made by someone on here that was mostly right patch
<alyssa> robmur01: One of my coworkers reported that the mediatek clock/supply/etc hierarchy on mt819x is overcomplicated downstream (to save a couple mW)... With a modified device tree, most of the platform specific crap goes away and devfreq works.
<macc24> robmur01: and i can't find the patch now :|
<alyssa> robmur01: I don't know the details. Once it became clear that upstreaming would be less Mali work and more dealing with MTK's nonsense, I hopped back to Mesa and am waiting for one of my coworkers to take over the patch series.
<robmur01> oh, so we can just squint and not bother scaling one of the regulators? Yeah, that might fly
<alyssa> I think so, yeah
<greenjustin> macc24: Wait is that the patch you sent me that just forces the video ram regulator to be always on?
<greenjustin> I definitely still have that locally somewhere...
<macc24> greenjustin: no, the different one
<alyssa> I gave up at "why does this need 4 clocks"
<macc24> alyssa: O.o
* robmur01 is gonna guess they do explicit clock reparenting from the GPU driver for glitch-free rate changes or similar, rather than abstracting it in the clock driver
<robmur01> at least judging from the names in macc24's DT patch
<macc24> robmur01: i recall the word "reparent" when dealing with clocks and mt8183 so it might be right
<greenjustin> oh! does this fix the devfreq problem on mt8183s? or at least attempt to...
<robmur01> it's not an uncommon thing, I think it's prevalent on Amlogic SoCs too
<macc24> it tries to, alyssa thinks it's a bad fix, i think it's a bad fix too
<macc24> but it (mostly) works
<macc24> so
* macc24 shrugs
<robmur01> I tried to do a "good" fix, but it only proved that the OPP core API itself needs some work
<greenjustin> maybe I can TAL once I dig myself out of this MM21 rabbit hole...
soreau has quit [Read error: No route to host]
soreau has joined #panfrost
<greenjustin> alyssa: you mentioned G57s have a weird issue that makes it a bad idea to sleep? do you happen to have a link to more information on that?
<robmur01> hmm, I wonder if hacking panfrost_devfreq_target() to set the clock rate to 26MHz before setting the real OPP might trigger the reparenting dance automagically (unless mfgpll_ck can actually run that slow)? Might be a cheeky compromise if so...
<robmur01> (and of course assuming that the CCF would change the PLL rate *before* switching the mux back for the second change, which I have no idea about either)
* jekstrand should probably review alyssa's blend patch
Rathann has joined #panfrost
<alyssa> jekstrand: ....what patch
<alyssa> is this reverse psychology?
<alyssa> I'm pretty sure that's supposed to read
* alyssa should probably review jekstrand's blend patch
<alyssa> Uh, ok, I guess I can go do that...
rasterman has joined #panfrost
<Pu244> *testing*
<Pu244> Huh, OK, just an issue in the Qubes channel. n/m.
<alyssa> Test failed, please try again later.
<Pu244> Test failed, yay! That means I can fix what's wrong and test again!
<alyssa> :D
<Pu244> <3 test driven development.
<jekstrand> alyssa: Or you could review those. :P
<HdkR> test driven development, great until you find something in an application that doesn't trigger a test :P
<jekstrand> HdkR: Write more tests!
<jekstrand> That's called DDT: Debug-driven testing. :P
<HdkR> You're right
<HdkR> Need more llvm bugpoint style bug reduction
icecream95 has joined #panfrost
<alyssa> jekstrand: I know you're joking but I've been doing that with panfrost recently to excellent effect
<alyssa> Someone reports an issue, icecream95 debugs it and sends a 1 line fix, I merge and then root cause it and end up rewriting the whole module and unit testing the new one and finding and fixing other bugs in the process....
<alyssa> We have a process here! :-p
<jekstrand> hehe
<icecream95> alyssa: The next step is figuring out how I can write 0 line fixes for bugs
<anarsoul> icecream95: 0 line net like +1 -1? :)
robmur01_ has joined #panfrost
robmur01 has quit [Ping timeout: 480 seconds]
<alyssa> icecream95: does that require me to write perfect code? I mean I guess I can *try* ...
<icecream95> alyssa: No, it requires you to write enough code that I can fix bugs by only deleting lines :)
<alyssa> Ah ha!
<alyssa> Yes, I can do that! :-D
* alyssa should stop watching CTS results scroll by and do not-work.
kenzie has quit [Quit: The Lounge - https://thelounge.chat]
rasterman has quit [Quit: Gettin' stinky!]
kenzie has joined #panfrost
JulianGro has quit [Remote host closed the connection]
<jekstrand> alyssa: Yes, yes you should.
<jekstrand> (He says as he flips over to his panvk dEQP run tab just to check on it)