#panfrost on 2024-02-14 — irc logs at oftc.irclog.whitequark.org

2023-04-06 13:40 alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard + Bifrost + Valhall - Logs https://oftc.irclog.whitequark.org/panfrost

06:05 Alex^ has quit [Ping timeout: 480 seconds]

07:22 chewitt has quit [Quit: Zzz..]

07:22 chewitt has joined #panfrost

07:34 chewitt has quit [Quit: Zzz..]

08:07 warpme has joined #panfrost

08:14 pjakobsson has joined #panfrost

09:04 warpme has quit []

09:18 simon-perretta-img has quit [Ping timeout: 480 seconds]

09:20 simon-perretta-img has joined #panfrost

09:29 rasterman has joined #panfrost

09:39 chewitt has joined #panfrost

09:50 chewitt has quit [Ping timeout: 480 seconds]

09:52 warpme has joined #panfrost

10:01 CME has quit [Ping timeout: 480 seconds]

10:29 warpme has quit []

10:29 chewitt has joined #panfrost

10:29 warpme has joined #panfrost

10:31 warpme has quit []

10:33 warpme has joined #panfrost

10:37 chewitt has quit [Ping timeout: 480 seconds]

10:39 warpme has quit []

10:40 warpme has joined #panfrost

10:44 chewitt has joined #panfrost

10:53 chewitt has quit [Ping timeout: 480 seconds]

11:16 warpme has quit []

11:17 CME has joined #panfrost

11:21 Googulator has quit [Read error: Connection reset by peer]

11:21 Googulator has joined #panfrost

11:25 warpme has joined #panfrost

11:29 hanetzer1 has joined #panfrost

11:29 CME_ has joined #panfrost

11:31 hanetzer has quit [Ping timeout: 480 seconds]

11:32 chewitt has joined #panfrost

11:34 CME has quit [Ping timeout: 480 seconds]

12:10 pH5 has quit [Read error: Network is unreachable]

12:10 pH5 has joined #panfrost

12:16 pH5 has quit [Read error: Connection reset by peer]

12:16 pH5 has joined #panfrost

12:35 warpme has quit []

12:57 warpme has joined #panfrost

13:21 simon-perretta-img has quit [Ping timeout: 480 seconds]

13:21 simon-perretta-img has joined #panfrost

13:23 Leopold___ has joined #panfrost

13:29 simon-perretta-img has quit [Ping timeout: 480 seconds]

13:29 simon-perretta-img has joined #panfrost

13:30 Leopold____ has joined #panfrost

13:30 Leopold_ has quit [Ping timeout: 480 seconds]

13:32 Leopold___ has quit [Ping timeout: 480 seconds]

13:36 warpme has quit []

13:41 thaytan has quit [Ping timeout: 480 seconds]

14:31 thaytan has joined #panfrost

14:44 <rasterman> bbrezillon: hey hey there you are :)

15:06 <rasterman> bbrezillon: try disabling weston's egl image cache ... it seems to have one. efl doesn't, so it does an eglcreate+destroy image per window per frame rendered, and eventually mapping a buffer fails: va space is exhausted due to repeated map+unmaps (you'd eventually hit this problem with enough surfaces updating a lot), but enlightenment hits it reliably running anything that updates, like glmark etc.

15:11 <bbrezillon> rasterman: hehe, exactly what daniels suggested :-)

15:12 <rasterman> i was just talking to him :)

15:12 <rasterman> so yeah

15:12 <rasterman> anyway - was just something i hit yesterday

15:12 <rasterman> and was "wtf....?"

15:12 <rasterman> first i checked all my own code for leaks everywhere i could find

15:13 <rasterman> didnt find anything... my code is of course perfect without any bugs at all! it couldnt possibly have a leak!!!!

15:13 <rasterman> </sarcasm>

15:13 <rasterman> but ... i didn't find any leak involving gbm/dmabufs/eglimages etc. ... so i dug into mesa and instrumented the panthor buffer map/unmaps

15:14 <rasterman> and it also didnt appear to leak - it was mapping and unmapping the same thing consistently

15:15 <rasterman> i then noticed weston stopped after a few hundred or whatever map/unmaps

15:15 <rasterman> it stopped doing this

15:15 <rasterman> and i was "ooooh so thats why it doesnt see the problem!"

15:16 <rasterman> i suspected a cache in weston somewhere - didnt get to check but i was pretty sure the map failing at this point was something kernel-driver side as mesa was behaving right and everything above it too

15:16 <bbrezillon> if you end up with the same amount of map/unmap for a given BO at the kernel level, that probably means the problem is in mesa (src/panfrost/lib/kmod/panthor_kmod.c)

15:18 <bbrezillon> can you instrument calls to util_vma_heap_{alloc,free}() in there?

15:19 <rasterman> ummm

15:19 <rasterman> well the map/unmap directly relate to create+destroy egl image

15:20 <rasterman> and thats being driven by either weston or evas - evas just always create+destroys per window surface its compositing every frame - religiously.

15:20 <rasterman> and panthor_kmod_vm_bind() is where i see the issue

15:21 <rasterman> PAN_KMOD_VM_OP_TYPE_MAP and PAN_KMOD_VM_OP_TYPE_UNMAP

15:21 <bbrezillon> ah, so map/unmap calls are unbalanced at the kernel level?

15:21 <rasterman> i printf'd both and i see the same thing being mapped and unmapped all the time

15:21 <rasterman> at the mesa level they are balanced

15:22 <bbrezillon> map is always synchronous, and unmap asynchronous

15:22 <bbrezillon> and we collect free-VAs in the VA alloc path
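
The scheme bbrezillon describes (unmaps are queued asynchronously, and their VA ranges are only reclaimed later, from the VA allocation path) can be sketched roughly as below. This is a toy model, not the Mesa code: `toy_vm`, `pending_unmap`, `toy_vm_collect`, `toy_vm_reserve` and `toy_vm_unmap_async` are hypothetical names, the `signaled` flag stands in for a real fence, and the allocator only counts free bytes instead of tracking ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 64

/* Hypothetical record of a VA range whose unmap was queued asynchronously. */
struct pending_unmap {
   uint64_t va;
   uint64_t size;
   bool signaled; /* stands in for "the fence attached to the unmap signaled" */
};

/* Toy VM: a count of free VA bytes plus the not-yet-collected unmaps. */
struct toy_vm {
   uint64_t free_bytes;
   struct pending_unmap pending[MAX_PENDING];
   size_t pending_count;
};

void toy_vm_init(struct toy_vm *vm, uint64_t va_space)
{
   vm->free_bytes = va_space;
   vm->pending_count = 0;
}

/* Return the VA of every signaled unmap to the free pool.
 * Returns how many entries were collected. */
size_t toy_vm_collect(struct toy_vm *vm)
{
   size_t i = 0, collected = 0;

   while (i < vm->pending_count) {
      if (vm->pending[i].signaled) {
         vm->free_bytes += vm->pending[i].size;
         /* Swap-remove: move the last entry into the freed slot. */
         vm->pending[i] = vm->pending[--vm->pending_count];
         collected++;
      } else {
         i++;
      }
   }
   return collected;
}

/* Allocation path: collect finished unmaps first, then try to reserve. */
bool toy_vm_reserve(struct toy_vm *vm, uint64_t size)
{
   toy_vm_collect(vm);
   if (vm->free_bytes < size)
      return false;
   vm->free_bytes -= size;
   return true;
}

/* Queue an unmap; the VA is not reusable until the entry is collected. */
struct pending_unmap *toy_vm_unmap_async(struct toy_vm *vm, uint64_t va,
                                         uint64_t size)
{
   assert(vm->pending_count < MAX_PENDING);
   struct pending_unmap *p = &vm->pending[vm->pending_count++];
   p->va = va;
   p->size = size;
   p->signaled = false;
   return p;
}
```

In this model there are two ways for the VA space to run out even though map and unmap requests are balanced: the pending list stays empty because nothing is ever queued, or entries are queued but never become signaled.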

15:22 <rasterman> well the requests are always there....

15:22 <rasterman> and yeah i saw the collect code

15:22 <rasterman> collect has nothing to collect...

15:23 <bbrezillon> is the list empty, or does it contain elements whose attached fence is never signaled?

15:23 <rasterman> empty

15:23 <rasterman> panthor_kmod_vm_collect_async_unmaps() - i dumped that too

15:23 <rasterman> it is called - but nothing ever iterated in the list

15:24 <bbrezillon> and where does the VA allocation fail?

15:24 <bbrezillon> in mesa, or in the kernel, at VM_BIND(op=map) time?

15:24 <rasterman> eventually a map fails

15:24 <rasterman> in mesa

15:24 <rasterman> panthor_kmod_vm_alloc_va() fails specifically

15:25 <rasterman> in the PAN_KMOD_VM_OP_TYPE_MAP op handler

15:26 <bbrezillon> ok, so that's util_vma_heap_alloc() failing then

15:26 <rasterman> yeah

15:27 <rasterman> what i see is the return addresses keep increasing every call - so its not reusing any address space

15:28 <rasterman> the total mapped mem regions don't go up over time

15:28 <rasterman> (well pmaps doesnt show it)

15:28 <rasterman> i only started digging yesterday so i dont have the full picture yet

15:30 <rasterman> actually i didnt check if util_vma_heap_free() is called as it should be at the end of that ops handling func

15:35 <bbrezillon> yeah, it could also be a mismatch in the panthor_kmod_va_collect::size field

15:36 <rasterman> ooooh how interesting

15:36 <rasterman> sorry problem might be in mesa

15:36 <rasterman> nothing ever actually unmaps

15:36 <rasterman> well util_vma_heap_free() is not called
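
The symptom described here (balanced map/unmap requests, yet returned addresses that keep climbing until allocation fails) is what one would expect when the free call is skipped. A minimal sketch under loud assumptions: `toy_heap`, `toy_heap_alloc`, `toy_heap_free` and `run_cycles` are hypothetical, and this toy heap is far simpler than Mesa's `util_vma_heap` (it can only reclaim the most recent allocation).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy VA heap: hands out addresses upward from `next` until `end`. */
struct toy_heap {
   uint64_t next;
   uint64_t end;
};

/* Returns the allocated address, or 0 when the VA space is exhausted. */
uint64_t toy_heap_alloc(struct toy_heap *h, uint64_t size)
{
   if (h->end - h->next < size)
      return 0;
   uint64_t va = h->next;
   h->next += size;
   return va;
}

/* Gives a range back. This toy only reclaims the most recent allocation. */
void toy_heap_free(struct toy_heap *h, uint64_t va, uint64_t size)
{
   if (va + size == h->next)
      h->next = va;
}

/* Runs up to max_cycles map+unmap pairs; returns how many maps succeeded. */
unsigned run_cycles(uint64_t va_space, uint64_t size, bool call_free,
                    unsigned max_cycles)
{
   assert(size > 0);
   /* Start above 0 so that 0 can serve as the failure value. */
   struct toy_heap h = { .next = 0x1000, .end = 0x1000 + va_space };
   unsigned ok = 0;

   for (unsigned i = 0; i < max_cycles; i++) {
      uint64_t va = toy_heap_alloc(&h, size);
      if (va == 0)
         break; /* VA space exhausted */
      ok++;
      if (call_free)
         toy_heap_free(&h, va, size);
   }
   return ok;
}
```

With `call_free` set to false the loop stops once the space has been handed out once; with it set to true the same address is reused and the loop runs for all `max_cycles`.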

15:37 <rasterman> also the random "window vanishes" problems i mentioned might be related

15:37 <rasterman> it may be weston is switching to a plane to display

15:38 <rasterman> ha

15:38 <rasterman> got weston to completely hang now....

15:42 <rasterman> oh fantastic....

15:43 <rasterman> i've managed to hang the entire kms subsystem...

15:43 <bbrezillon> ok, so I found a leak in the sync unmap path, but we're only using async unmap in the gallium driver, so that's not it

15:43 rz_ has quit [Ping timeout: 480 seconds]

15:43 rz has joined #panfrost

15:43 <rasterman> [15:43:03.951] fatal: failed to create compositor backend

15:43 <rasterman> my display is hung with whatever weston had last

15:44 <rasterman> dont try and run too many weston-simple-egl's and glmark2 :)

15:45 <bbrezillon> (KMS hang => at least I'm not the one to blame for that one :-))

15:47 <rasterman> hehehe

15:48 <rasterman> dang

15:48 <rasterman> it wont even reboot now

15:48 <rasterman> reboot -> hang... wtf? gee. well done me! :)

16:03 <bbrezillon> rasterman: I tried with PAN_MESA_DEBUG=nocache, and I see the VAs being collected

16:03 <bbrezillon> so the logic is not completely broken, at least

16:04 <rasterman> sorry

16:04 <rasterman> was rebooting and resetting up my build setup

16:04 Googulator has quit [Ping timeout: 480 seconds]

16:05 Googulator has joined #panfrost

16:06 <rasterman> so ok the problem might not be in kernel

16:06 <rasterman> i assumed the util_vma_heap_free was getting called as a PAN_KMOD_VM_OP_TYPE_UNMAP did queue the unmap bind op

16:06 <rasterman> it... never happens :(

16:07 <bbrezillon> can you try with the BO cache disabled?

16:07 <rasterman> i did

16:08 <rasterman> problem still there :(

16:08 <bbrezillon> and do you see some VAs collected when you do?

16:08 <rasterman> no actually

16:09 <bbrezillon> uh, that's not expected

16:09 <rasterman> panthor_kmod_vm_collect_async_unmaps() finds an empty list

16:10 <bbrezillon> can you push your mesa branch somewhere, so I can have a look?

16:11 <rasterman> sec....

16:11 <rasterman> i have forced a fix

16:11 <rasterman> but i'm pretty sure it's wrong

16:12 <bbrezillon> can you check that you're entering this if block => https://gitlab.freedesktop.org/bbrezillon/mesa/-/blob/panfrost-v10/src/panfrost/lib/kmod/panthor_kmod.c?ref_type=heads#L970 ?

16:12 <rasterman> ummm i dont have this checked out as my own gitlab fork

16:12 <rasterman> can i just give you a diff for now?

16:12 <rasterman> i'll set up my own fork/branch

16:13 <bbrezillon> the whole point of pushing your branch was to make sure you use the latest panfrost-v10 version :-)

16:13 <rasterman> oh wait

16:13 <rasterman> my mesa doesn't have that block

16:13 <bbrezillon> ahah

16:13 <rasterman> if (!ret && va_collect_cnt) {

16:13 <rasterman> after the drmioctl

16:13 <rasterman> and no track activity if either

16:13 <rasterman> damn

16:13 <rasterman> am i behind? :)

16:14 <bbrezillon> most certainly, yes

16:14 <rasterman> heheh

16:14 <rasterman> well poop

16:15 <rasterman> pulling

16:15 <bbrezillon> panfrost-v10 on my repo should contain the latest version (it's the one feeding the MR)

16:15 <rasterman> oh wait

16:15 <rasterman> i'm still on panvk-v10-wip

16:15 <bbrezillon> nah, that's wrong

16:16 <rasterman> hmm

16:16 <rasterman> i dont see a panfrost-v10

16:17 <rasterman> only the wip one...

16:17 <rasterman> i pulled right now...

16:17 <bbrezillon> on my repo?

16:17 <rasterman> url = https://gitlab.freedesktop.org/bbrezillon/mesa.git

16:18 <rasterman> oh wait

16:18 <bbrezillon> unless gitlab UI is lying to me, it does exist https://gitlab.freedesktop.org/bbrezillon/mesa/-/tree/panfrost-v10?ref_type=heads :P

16:18 <rasterman> is it panfrost-v10?

16:18 <rasterman> as opposed to panvk-v10-wip ?

16:18 <bbrezillon> it is panfrost-v10

16:18 <rasterman> aaah never mind

16:18 <rasterman> was looking at panvk :)

16:19 <rasterman> branch naming... yay

16:19 <bbrezillon> panvk-v10-wip is some experimental branch

16:19 Leopold____ has quit [Remote host closed the connection]

16:19 <rasterman> yeah... tho i like experiments

16:20 * bbrezillon this is not the branch you're looking for

16:20 <rasterman> ummm...

16:21 <rasterman> libEGL fatal: DRI driver not from this Mesa build ('24.0.0-devel (git-4559db3bd0)' vs '24.0.0-devel (git-1ae5e8fcf7)')

16:21 <rasterman> this is new

16:21 <rasterman> let me do a full rebuild

16:21 <bbrezillon> mismatch between the libEGL and dri driver

16:21 <rasterman> meson seems to have not done the right thing™

16:22 <bbrezillon> could be that EGL was not enabled in your new config

16:22 <rasterman> yeah - it did a partial rebuild

16:22 <rasterman> only 200 files got rebuilt on my branch switch

16:22 <rasterman> oh ugh... still same issue.

16:23 <bbrezillon> meson should be smart enough to figure out what to rebuild

16:23 Leopold_ has joined #panfrost

16:23 <bbrezillon> are you sure your meson config is correct?

16:23 <rasterman> there

16:23 <rasterman> just nuked my /opt/panfrost and re-installed

16:23 <bbrezillon> in particular -Degl=enabled

16:24 <rasterman> yeah

16:24 <rasterman> egl is enabled

16:24 <bbrezillon> install path, maybe?

16:24 <rasterman> same install path

16:24 <rasterman> i have a script - havent changed it...

16:24 <rasterman> gah

16:24 <rasterman> ok

16:24 <rasterman> same va alloc problem

16:24 <rasterman> but i have an ugly fix,...

16:25 <bbrezillon> can you add a trace in the if block I pointed out earlier?

16:26 <bbrezillon> I mean, with PAN_MESA_DEBUG=nocache, you should definitely see some VAs collected asynchronously

16:26 <rasterman> yeah that'll be next

16:26 <rasterman> after testing my fix

16:27 <rasterman> weird... ok - yes. my fix does work... but i don't like it

16:27 <rasterman> oh wait

16:27 <rasterman> wtf... i dont have that block

16:28 <bbrezillon> :D

16:28 <bbrezillon> looks like you lost your git foo

16:28 <rasterman> so i'm on panfrost-v10

16:28 <rasterman> https://gitlab.freedesktop.org/bbrezillon/mesa.git

16:29 <rasterman> Already up to date.

16:29 <rasterman> so git says

16:29 <rasterman> the last commit is from nov 11

16:29 <bbrezillon> again, unless the UI lies to me, it's there https://gitlab.freedesktop.org/panfrost/mesa/-/blob/panfrost-v10/src/panfrost/lib/kmod/panthor_kmod.c#L966

16:29 <rasterman> this cant be right

16:30 <rasterman> Date: Tue Nov 14 11:18:23 2023 +0100

16:30 <rasterman> oh wait

16:30 <bbrezillon> git reset --hard <my-remote>/panfrost-v10

16:30 <rasterman> Date: Thu Jan 18 12:18:28 2024 +0100

16:30 <rasterman> from you i guess...

16:32 <rasterman> this is... weird...

16:32 <rasterman> re-cloning...

16:32 <bbrezillon> I force push to this branch

16:33 <rasterman> still the same

16:34 <rasterman> oh wait

16:34 <rasterman> i'm on your tree

16:34 <bbrezillon> still the same as in, the if block doesn't exist?

16:34 <rasterman> your personal one

16:34 <rasterman> bbrezillon not panfrost

16:34 <bbrezillon> nah, sorry for the confusion

16:35 <bbrezillon> it should be my tree

16:35 <bbrezillon> I pushed to both panfrost and bbrezillon

16:35 <rasterman> thats what i expected...

16:35 <bbrezillon> but those should contain the exact same version

16:35 <rasterman> but panfrost != bbrezillon in this case

16:35 <bbrezillon> WTF?

16:35 <rasterman> oh wait

16:35 <rasterman> now it does have the block

16:36 <rasterman> the commit log looked the same

16:36 <bbrezillon> make sure HEAD is at 36ac5af2cc09

16:36 <rasterman> yeah

16:36 <rasterman> 36ac5af2cc09b6cc41fb5f68cb95af6e71a33def

16:36 <rasterman> i'm there

16:36 <bbrezillon> ok

16:37 <rasterman> building... so something went poop with forced pushes or something

16:37 <rasterman> either way... hash is right... build underway

16:38 <rasterman> "too many branches" :)

16:38 <rasterman> i didnt update my tree for a little bit - been busy with various conferences/meetings

16:38 <bbrezillon> "too many WIP" :P

16:39 <rasterman> yeah

16:39 <rasterman> i havent updated kernel either in a few weeks

16:40 <rasterman> ok

16:40 <rasterman> lets see if bug still there first

16:40 <rasterman> ha

16:40 <rasterman> it's gone

16:41 <rasterman> so "fixed already"

16:41 <rasterman> my ugly fix is not needed

16:41 <bbrezillon> just curious, how do you update your test/dev branch with git?

16:41 <rasterman> git pull --rebase

16:41 <bbrezillon> hm, that should work

16:41 <rasterman> yeah

16:41 <bbrezillon> don't know what happened

16:41 <rasterman> if i have changes i normally stash/unstash them

16:42 <rasterman> but va bug gone now! woot.

16:42 <rasterman> thanks so much!

16:42 <rasterman> https://termbin.com/napk

16:43 <rasterman> that was my fix right now... but it wasn't nice...

16:43 <bbrezillon> I still need to fix the sync unmap path though

16:43 <rasterman> yeah

16:43 <bbrezillon> well, I fixed it

16:43 <rasterman> good to note that :)

16:43 <rasterman> or well fix it :)

16:44 <bbrezillon> I also have a bunch of tiny fixes in the pipe

16:44 <rasterman> this is the problem with so much "WIP"

16:44 <rasterman> not to mention spread across different branches and across kernel and mesa...

16:44 <rasterman> why dont we just put all of kernel and mesa in one git tree and we all work in master! :) :P :)

16:45 <bbrezillon> well, too much WIP comes from working on so many different things at the same time

16:45 <bbrezillon> and all that directly comes from the fact we're mostly out-of-tree at the moment

16:46 <rasterman> and yeah... that doesn't help

16:46 <bbrezillon> but you can just pretend all those WIP branches don't exist, and refer to panfrost-v10, as I mentioned in my cover-letter and MR description :P

16:47 <rasterman> :)

16:47 <bbrezillon> jedi hand wave => this is not the branch you're looking for

16:47 <rasterman> i was busy beating my head on why enlightenment didnt display on my rock5...

16:47 <rasterman> so i ignored mesa for a bit

16:47 <rasterman> found out why and fixed it yesterday - thus stumbled into this issue

16:48 <rasterman> but yeah - slightly out of date trees was my issue now

16:55 <rasterman> oh damn... THAT was enlightenment's performance problem...

16:55 Leopold_ has quit [Remote host closed the connection]

16:56 <bbrezillon> ???

16:56 <rasterman> evas had dithering enabled

16:56 <rasterman> there is a "high quality dithering" shader that takes intermediate fragment shader results and uses a dither matrix to get apparent > 24bpp

16:57 <rasterman> (visually it gets a lot better for smooth gradients, fades etc.) because the gpu frag shader will have intermediate results in regs that are more than 8bit per channel

16:57 <rasterman> so dither it back out to backbuffer...

16:57 <rasterman> but that doesn't come for free

16:58 <rasterman> turn it off and it'll be equivalent to weston then
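
The dithering idea rasterman outlines (a higher-precision intermediate result is pushed back to 8 bits per channel through a dither matrix) can be illustrated on the CPU as follows. This is only a sketch and not evas's shader: the 12-bit input depth, the 4x4 Bayer matrix and the names `bayer4`, `dither_12_to_8` and `tile_sum` are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Standard 4x4 Bayer ordered-dither matrix, thresholds 0..15. */
static const uint8_t bayer4[4][4] = {
   {  0,  8,  2, 10 },
   { 12,  4, 14,  6 },
   {  3, 11,  1,  9 },
   { 15,  7, 13,  5 },
};

/* Quantize a 12-bit channel value (0..4095) to 8 bits for pixel (x, y).
 * The 4 bits that plain truncation would drop decide, per pixel position,
 * whether the result is rounded up by one step. */
uint8_t dither_12_to_8(uint16_t v12, unsigned x, unsigned y)
{
   assert(v12 <= 4095);
   unsigned base = v12 >> 4;  /* the 8 bits that fit in the backbuffer */
   unsigned frac = v12 & 0xF; /* the extra precision, 0..15 */
   unsigned threshold = bayer4[y & 3][x & 3];

   if (frac > threshold && base < 255)
      base++;
   return (uint8_t)base;
}

/* Sum of the dithered outputs over one 4x4 tile for a constant input. */
unsigned tile_sum(uint16_t v12)
{
   unsigned sum = 0;

   for (unsigned y = 0; y < 4; y++)
      for (unsigned x = 0; x < 4; x++)
         sum += dither_12_to_8(v12, x, y);
   return sum;
}
```

Averaged over a tile, the output approximates the higher-precision input, which is why gradients look smoother; the per-pixel matrix lookup and compare is the extra work that, as noted above, does not come for free.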

17:00 Leopold has joined #panfrost

17:00 larunbe has joined #panfrost

17:04 alarumbe has quit [Ping timeout: 480 seconds]

17:11 Leopold has quit [Remote host closed the connection]

17:12 Leopold has joined #panfrost

17:20 Googulator has quit [Read error: Connection reset by peer]

17:21 Googulator has joined #panfrost

17:52 chewitt has quit [Quit: Zzz..]

18:15 rasterman has quit [Quit: Gettin' stinky!]

20:50 pbrobinson has quit [Ping timeout: 480 seconds]

20:50 pbrobinson has joined #panfrost

21:08 simon-perretta-img has quit [Read error: Connection reset by peer]

21:09 simon-perretta-img has joined #panfrost

21:34 paulk has quit [Ping timeout: 480 seconds]

21:36 paulk has joined #panfrost

22:07 simon-perretta-img has quit [Ping timeout: 480 seconds]

22:07 simon-perretta-img has joined #panfrost

22:47 Googulator has quit [Ping timeout: 480 seconds]

22:47 Googulator has joined #panfrost

23:07 Googulator has quit [Read error: Connection reset by peer]

23:10 Googulator has joined #panfrost

23:37 Googulator has quit [Read error: Connection reset by peer]

23:37 Googulator has joined #panfrost

23:52 Googulator has quit [Ping timeout: 480 seconds]

23:52 Googulator has joined #panfrost