alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard + Bifrost + Valhall - Logs https://oftc.irclog.whitequark.org/panfrost
Alex^ has quit [Ping timeout: 480 seconds]
chewitt has quit [Quit: Zzz..]
chewitt has joined #panfrost
chewitt has quit [Quit: Zzz..]
warpme has joined #panfrost
pjakobsson has joined #panfrost
warpme has quit []
simon-perretta-img has quit [Ping timeout: 480 seconds]
simon-perretta-img has joined #panfrost
rasterman has joined #panfrost
chewitt has joined #panfrost
chewitt has quit [Ping timeout: 480 seconds]
warpme has joined #panfrost
CME has quit [Ping timeout: 480 seconds]
warpme has quit []
chewitt has joined #panfrost
warpme has joined #panfrost
warpme has quit []
warpme has joined #panfrost
chewitt has quit [Ping timeout: 480 seconds]
warpme has quit []
warpme has joined #panfrost
chewitt has joined #panfrost
chewitt has quit [Ping timeout: 480 seconds]
warpme has quit []
CME has joined #panfrost
Googulator has quit [Read error: Connection reset by peer]
Googulator has joined #panfrost
warpme has joined #panfrost
hanetzer1 has joined #panfrost
CME_ has joined #panfrost
hanetzer has quit [Ping timeout: 480 seconds]
chewitt has joined #panfrost
CME has quit [Ping timeout: 480 seconds]
pH5 has quit [Read error: Network is unreachable]
pH5 has joined #panfrost
pH5 has quit [Read error: Connection reset by peer]
pH5 has joined #panfrost
warpme has quit []
warpme has joined #panfrost
simon-perretta-img has quit [Ping timeout: 480 seconds]
simon-perretta-img has joined #panfrost
Leopold___ has joined #panfrost
simon-perretta-img has quit [Ping timeout: 480 seconds]
simon-perretta-img has joined #panfrost
Leopold____ has joined #panfrost
Leopold_ has quit [Ping timeout: 480 seconds]
Leopold___ has quit [Ping timeout: 480 seconds]
warpme has quit []
thaytan has quit [Ping timeout: 480 seconds]
thaytan has joined #panfrost
<rasterman> bbrezillon: hey hey there you are :)
<rasterman> bbrezillon: try disabling weston's egl image cache ... it seems it has one. efl doesn't, thus it does an egl create+destroy image per window per frame rendered and eventually a buffer map fails - va space exhausted due to repeated map+unmaps (you'd eventually hit this problem with enough surfaces updating a lot) but enlightenment hits it reliably running anything that updates, like glmark etc.
<bbrezillon> rasterman: hehe, exactly what daniels suggested :-)
<rasterman> i was just talking to him :)
<rasterman> so yeah
<rasterman> anyway - was just something i hit yesterday
<rasterman> and was "wtf....?"
<rasterman> first i checked all my own code for leaks everywhere i could find
<rasterman> didnt find anything... my code is of course perfect without any bugs at all! it couldnt possibly have a leak!!!!
<rasterman> </sarcasm>
<rasterman> but ... i didn't find any leak involving gbm/dmabufs/eglimages etc. ... so i dug into mesa and instrumented the panthor buffer map/unmaps
<rasterman> and it also didnt appear to leak - it was mapping and unmapping the same thing consistently
<rasterman> i then noticed weston stopped after a few hundred or whatever map/unmaps
<rasterman> it stopped doing this
<rasterman> and i was "ooooh so thats why it doesnt see the problem!"
<rasterman> i suspected a cache in weston somewhere - didnt get to check but i was pretty sure the map failing at this point was something kernel-driver side as mesa was behaving right and everything above it too
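A toy sketch of why a per-buffer image cache would hide the problem. Everything here is hypothetical: `ToyImporter`, `render_uncached` and `render_cached` are illustrative names, not weston, evas or EGL APIs, and whether weston really caches this way is only rasterman's suspicion at this point in the log. The sketch just counts how many map/unmap requests each strategy generates.

```python
class ToyImporter:
    """Hypothetical stand-in for the image import path; counts map/unmap requests."""

    def __init__(self):
        self.maps = 0
        self.unmaps = 0

    def create_image(self, buffer_id):
        # Assumption: creating an image maps the buffer into the GPU VA space.
        self.maps += 1
        return ("image", buffer_id)

    def destroy_image(self, image):
        # Assumption: destroying the image requests an unmap.
        self.unmaps += 1


def render_uncached(importer, buffers, frames):
    """Create and destroy one image per buffer per frame (the evas-like pattern)."""
    for _ in range(frames):
        for buf in buffers:
            image = importer.create_image(buf)
            importer.destroy_image(image)


def render_cached(importer, buffers, frames):
    """Create each image once, reuse it every frame, destroy it at the end."""
    cache = {}
    for _ in range(frames):
        for buf in buffers:
            if buf not in cache:
                cache[buf] = importer.create_image(buf)
    for image in cache.values():
        importer.destroy_image(image)


uncached = ToyImporter()
render_uncached(uncached, ["win0", "win1", "win2"], frames=100)

cached = ToyImporter()
render_cached(cached, ["win0", "win1", "win2"], frames=100)

print(uncached.maps, uncached.unmaps)
print(cached.maps, cached.unmaps)
```

With the uncached pattern the number of map/unmap pairs grows with every frame, so any address-space leak per pair is hit quickly; with a cache the count stays at one pair per buffer.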
<bbrezillon> if you end up with the same amount of map/unmap for a given BO at the kernel level, that probably means the problem is in mesa (src/panfrost/lib/kmod/panthor_kmod.c)
<bbrezillon> can you instrument calls to util_vma_heap_{alloc,free}() in there?
<rasterman> ummm
<rasterman> well the map/unmap directly relate to create+destroy egl image
<rasterman> and thats being driven by either weston or evas - evas just always create+destroys per window surface its compositing every frame - religiously.
<rasterman> and panthor_kmod_vm_bind() is where i see the issue
<rasterman> PAN_KMOD_VM_OP_TYPE_MAP and PAN_KMOD_VM_OP_TYPE_UNMAP
<bbrezillon> ah, so map/unmap calls are unbalanced at the kernel level?
<rasterman> i printf'd both and i see the same thing being mapped and unmapped all the time
<rasterman> at the mesa level they are balanced
<bbrezillon> map is always synchronous, and unmap asynchronous
<bbrezillon> and we collect free-VAs in the VA alloc path
<rasterman> well the requests are always there....
<rasterman> and yeah i saw the collect code
<rasterman> collect has nothing to collect...
<bbrezillon> is the list empty, or does it contain elements whose attached fence is never signaled?
<rasterman> empty
<rasterman> panthor_kmod_vm_collect_async_unmaps() - i dumped that too
<rasterman> it is called - but nothing ever iterated in the list
<bbrezillon> and where does the VA allocation fail?
<bbrezillon> in mesa, or in the kernel, at VM_BIND(op=map) time?
<rasterman> eventually a map fails
<rasterman> in mesa
<rasterman> panthor_kmod_vm_alloc_va() fails specifically
<rasterman> in the PAN_KMOD_VM_OP_TYPE_MAP op handler
<bbrezillon> ok, so that's util_vma_heap_alloc() failing then
<rasterman> yeah
<rasterman> what i see is the return addresses keep increasing every call - so its not reusing any address space
<rasterman> the total mapped mem regions doesnt go up over time
<rasterman> (well pmaps doesnt show it)
<rasterman> i only started digging yesterday so i dont have the full picture yet
<rasterman> actually i didnt check if util_vma_heap_free() is called as it should be at the end of that ops handling func
<bbrezillon> yeah, it could also be a mismatch in the panthor_kmod_va_collect::size field
<rasterman> ooooh how interesting
<rasterman> sorry problem might be in mesa
<rasterman> nothing ever actually unmaps
<rasterman> well util_vma_heap_free() is not called
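The symptoms described so far (balanced map/unmap requests, an empty collect list, ever-increasing addresses, and finally a failed VA allocation) can be reproduced with a toy model. This is a minimal sketch under stated assumptions, not the panthor_kmod.c logic: `ToyVaHeap`, `ToyVm`, `track_unmaps` and `churn` are hypothetical names, the allocator simply hands out rising addresses, and every queued unmap is treated as already signaled by the time the next allocation runs.

```python
class ToyVaHeap:
    """Simplified address allocator: rising addresses, exact-fit reuse of freed ranges."""

    def __init__(self, size):
        self.size = size
        self.next_addr = 0
        self.free_ranges = []  # list of (addr, size)

    def alloc(self, size):
        for i, (addr, free_size) in enumerate(self.free_ranges):
            if free_size == size:
                del self.free_ranges[i]
                return addr
        if self.next_addr + size > self.size:
            return None  # address space exhausted
        addr = self.next_addr
        self.next_addr += size
        return addr

    def free(self, addr, size):
        self.free_ranges.append((addr, size))


class ToyVm:
    """Map is synchronous; unmap is deferred and reclaimed in the next alloc path."""

    def __init__(self, heap, track_unmaps=True):
        self.heap = heap
        self.track_unmaps = track_unmaps  # False models the "nothing to collect" case
        self.pending_unmaps = []          # VAs waiting to be returned to the heap

    def collect_async_unmaps(self):
        # Assumption: every pending unmap has completed by now.
        for addr, size in self.pending_unmaps:
            self.heap.free(addr, size)
        self.pending_unmaps.clear()

    def map(self, size):
        self.collect_async_unmaps()
        return self.heap.alloc(size)

    def unmap(self, addr, size):
        # The unmap request is always issued; the VA is only reclaimed if tracked.
        if self.track_unmaps:
            self.pending_unmaps.append((addr, size))


def churn(vm, size, iterations):
    """Map and unmap a same-sized buffer repeatedly; return the addresses handed out."""
    addrs = []
    for _ in range(iterations):
        addr = vm.map(size)
        if addr is None:
            break
        addrs.append(addr)
        vm.unmap(addr, size)
    return addrs


PAGE = 4096
healthy = churn(ToyVm(ToyVaHeap(16 * PAGE), track_unmaps=True), PAGE, 100)
leaky = churn(ToyVm(ToyVaHeap(16 * PAGE), track_unmaps=False), PAGE, 100)

print(len(healthy), sorted(set(healthy)))
print(len(leaky), leaky[:4])
```

In the healthy run the same address is reused on every iteration. In the leaky run the requests are just as balanced, yet the addresses only climb until the heap runs out, which matches what the printf instrumentation showed.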
<rasterman> also the random "window vanishes" problems i mentioned might be related
<rasterman> it may be weston is switching to a plane to display
<rasterman> ha
<rasterman> got weston to completely hang now....
<rasterman> oh fantastic....
<rasterman> i've managed to hang the entire kms subsystem...
<bbrezillon> ok, so I found a leak in the sync unmap path, but we're only using async unmap in the gallium driver, so that's not it
rz_ has quit [Ping timeout: 480 seconds]
rz has joined #panfrost
<rasterman> [15:43:03.951] fatal: failed to create compositor backend
<rasterman> my display is hung with whatever weston had last
<rasterman> dont try and run too many weston-simple-egl's and glmark2 :)
<bbrezillon> (KMS hang => at least I'm not the one to blame for that one :-))
<rasterman> hehehe
<rasterman> dang
<rasterman> it wont even reboot now
<rasterman> reboot -> hang... wtf? gee. well done me! :)
<bbrezillon> rasterman: I tried with PAN_MESA_DEBUG=nocache, and I see the VAs being collected
<bbrezillon> so the logic is not completely broken, at least
<rasterman> sorry
<rasterman> was rebooting and resetting up my build setup
Googulator has quit [Ping timeout: 480 seconds]
Googulator has joined #panfrost
<rasterman> so ok the problem might not be in kernel
<rasterman> i assumed the util_vma_heap_free was getting called as a PAN_KMOD_VM_OP_TYPE_UNMAP did queue the unmap bind op
<rasterman> it... never happens :(
<bbrezillon> can you try with the BO cache disabled?
<rasterman> i did
<rasterman> problem still there :(
<bbrezillon> and do you see some VAs collected when you do?
<rasterman> no actually
<bbrezillon> uh, that's not expected
<rasterman> panthor_kmod_vm_collect_async_unmaps() finds an empty list
<bbrezillon> can you push your mesa branch somewhere, so I can have a look?
<rasterman> sec....
<rasterman> i have forced a fix
<rasterman> but i'm pretty sure it's wrong
<rasterman> ummm i dont have this checked out as my own gitlab fork
<rasterman> can i just give u a diff for now?
<rasterman> i'll set up my own fork/branch
<bbrezillon> the whole point of pushing your branch was to make sure you use the latest panfrost-v10 version :-)
<rasterman> oh wait
<rasterman> my mesa doesnt have that block
<bbrezillon> ahah
<rasterman> if (!ret && va_collect_cnt) {
<rasterman> after the drmioctl
<rasterman> and no track activity if either
<rasterman> damn
<rasterman> am i behind? :)
<bbrezillon> most certainly, yes
<rasterman> heheh
<rasterman> well poop
<rasterman> pulling
<bbrezillon> panfrost-v10 on my repo should contain the latest version (it's the one feeding the MR)
<rasterman> oh wait
<rasterman> i'm still on panvk-v10-wip
<bbrezillon> nah, that's wrong
<rasterman> hmm
<rasterman> i dont see a panfrost-v10
<rasterman> only the wip one...
<rasterman> i pulled right now...
<bbrezillon> on my repo?
<rasterman> oh wait
<bbrezillon> unless gitlab UI is lying to me, it does exist https://gitlab.freedesktop.org/bbrezillon/mesa/-/tree/panfrost-v10?ref_type=heads :P
<rasterman> is it panfrost-v10?
<rasterman> as opposed to panvk-v10-wip ?
<bbrezillon> it is panfrost-v10
<rasterman> aaah never mind
<rasterman> was looking at panvk :)
<rasterman> branch naming... yay
<bbrezillon> panvk-v10-wip is some experimental branch
Leopold____ has quit [Remote host closed the connection]
<rasterman> yeah... tho i like experiments
* bbrezillon this is not the branch you're looking for
<rasterman> ummm...
<rasterman> libEGL fatal: DRI driver not from this Mesa build ('24.0.0-devel (git-4559db3bd0)' vs '24.0.0-devel (git-1ae5e8fcf7)')
<rasterman> this is new
<rasterman> let me do a full rebuild
<bbrezillon> mismatch between the libEGL and dri driver
<rasterman> meson seems to have not done the right thing™
<bbrezillon> could be that EGL was not enabled in your new config
<rasterman> yeah - it did a partial rebuild
<rasterman> only 200 files got rebuilt on my branch switch
<rasterman> oh ugh... still same issue.
<bbrezillon> meson should be smart enough to figure out what to rebuild
Leopold_ has joined #panfrost
<bbrezillon> are you sure your meson config is correct?
<rasterman> there
<rasterman> just nuked my /opt/panfrost and re-installed
<bbrezillon> in particular -Degl=enabled
<rasterman> yeah
<rasterman> egl is enabled
<bbrezillon> install path, maybe?
<rasterman> same install path
<rasterman> i have a script - havent changed it...
<rasterman> gah
<rasterman> ok
<rasterman> same va alloc problem
<rasterman> but i have an ugly fix...
<bbrezillon> can you add a trace in the if block I pointed out earlier?
<bbrezillon> I mean, with PAN_MESA_DEBUG=nocache, you should definitely see some VAs collected asynchronously
<rasterman> yeah that'll be next
<rasterman> after testing my fix
<rasterman> weird... ok - yes. my fix does work... but i don't like it
<rasterman> oh wait
<rasterman> wtf... i dont have that block
<bbrezillon> :D
<bbrezillon> looks like you lost your git foo
<rasterman> so i'm on panfrost-v10
<rasterman> Already up to date.
<rasterman> so git says
<rasterman> the last commit is from nov 11
<rasterman> this cant be right
<rasterman> Date: Tue Nov 14 11:18:23 2023 +0100
<rasterman> oh wait
<bbrezillon> git reset --hard <my-remote>/panfrost-v10
<rasterman> Date: Thu Jan 18 12:18:28 2024 +0100
<rasterman> from you i guess...
<rasterman> this is... weird...
<rasterman> re-cloning...
<bbrezillon> I force push to this branch
<rasterman> still the same
<rasterman> oh wait
<rasterman> i'm on your tree
<bbrezillon> still the same as in, the if block doesn't exist?
<rasterman> your personal one
<rasterman> bbrezillon not panfrost
<bbrezillon> nah, sorry for the confusion
<bbrezillon> it should be my tree
<bbrezillon> I pushed to both panfrost and bbrezillon
<rasterman> thats what i expected...
<bbrezillon> but those should contain the exact same version
<rasterman> but panfrost != bbrezillon in this case
<bbrezillon> WTF?
<rasterman> oh wait
<rasterman> now it does have the block
<rasterman> the commit log looked the same
<bbrezillon> make sure HEAD is at 36ac5af2cc09
<rasterman> yeah
<rasterman> 36ac5af2cc09b6cc41fb5f68cb95af6e71a33def
<rasterman> i'm there
<bbrezillon> ok
<rasterman> building... so something went poop with forced pushes or something
<rasterman> either way... hash is right... build underway
<rasterman> "too many branches" :)
<rasterman> i didnt update my tree for a little bit - been busy with various conferences/meetings
<bbrezillon> "too many WIP" :P
<rasterman> yeah
<rasterman> i havent updated kernel either in a few weeks
<rasterman> ok
<rasterman> lets see if bug still there first
<rasterman> ha
<rasterman> it's gone
<rasterman> so "fixed already"
<rasterman> my ugly fix is not needed
<bbrezillon> just curious, how do you update your test/dev branch with git?
<rasterman> git pull --rebase
<bbrezillon> hm, that should work
<rasterman> yeah
<bbrezillon> don't know what happened
<rasterman> if i have changes i normally stash/unstash them
<rasterman> but va bug gone now! woot.
<rasterman> thanks so much!
<rasterman> that was my fix right now... but it wasn't nice...
<bbrezillon> I still need to fix the sync unmap path though
<rasterman> yeah
<bbrezillon> well, I fixed it
<rasterman> good to note that :)
<rasterman> or well fix it :)
<bbrezillon> I also have a bunch of tiny fixes in the pipe
<rasterman> this is the problem with so much "WIP"
<rasterman> not to mention spread across different branches and across kernel and mesa...
<rasterman> why dont we just put all of kernel and mesa in one git tree and we all work in master! :) :P :)
<bbrezillon> well, too much WIP comes from working on so many different things at the same time
<bbrezillon> and all that directly comes from the fact we're mostly out-of-tree at the moment
<rasterman> and yeah... that doesn't help
<bbrezillon> but you can just pretend all those WIP branches don't exist, and refer to panfrost-v10, as I mentioned in my cover-letter and MR description :P
<rasterman> :)
<bbrezillon> jedi hand wave => this is not the branch you're looking for
<rasterman> i was busy beating my head on why enlightenment didnt display on my rock5...
<rasterman> so i ignored mesa for a bit
<rasterman> found out why and fixed it yesterday - thus stumbled into this issue
<rasterman> but yeah - slightly out of date trees was my issue now
<rasterman> oh damn... THAT was enlightenment's performance problem...
Leopold_ has quit [Remote host closed the connection]
<bbrezillon> ???
<rasterman> evas had dithering enabled
<rasterman> there is a "high quality dithering" shader that takes intermediate fragment shader results and uses a dither matrix to get apparent > 24bpp
<rasterman> (visually it gets a lot better for smooth gradients, fades etc.) because the gpu frag shader will have intermediate results in regs that are more than 8bit per channel
<rasterman> so dither it back out to backbuffer...
<rasterman> but that doesn't come for free
<rasterman> turn it off and it'll be equivalent to weston then
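A rough sketch of the dithering idea described above, in plain Python rather than a shader. It assumes a 4x4 Bayer threshold matrix purely for illustration; the log only says evas "uses a dither matrix", so the actual matrix and shader are not known from this conversation. `quantize_plain` and `quantize_dithered` are hypothetical helper names.

```python
# Classic 4x4 Bayer ordered-dither matrix (an illustrative choice, not evas's).
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def quantize_plain(value):
    """Round a 0.0..1.0 channel value straight to 8 bits."""
    return min(255, int(value * 255 + 0.5))


def quantize_dithered(value, x, y):
    """Quantize to 8 bits, rounding up on a per-pixel threshold so that the
    average over a 4x4 tile approximates the higher-precision input."""
    threshold = (BAYER_4X4[y % 4][x % 4] + 0.5) / 16.0
    scaled = value * 255
    base = int(scaled)
    fraction = scaled - base
    return min(255, base + (1 if fraction > threshold else 0))


# A shade that sits a quarter of the way between 8-bit levels 100 and 101.
value = 100.25 / 255

plain_tile = [quantize_plain(value) for y in range(4) for x in range(4)]
dithered_tile = [quantize_dithered(value, x, y) for y in range(4) for x in range(4)]

print(sum(plain_tile) / 16)
print(sum(dithered_tile) / 16)
```

Plain rounding flattens the whole tile to a single level, while the dithered tile mixes two neighbouring levels so its average lands between them. The extra per-fragment work is the cost rasterman is referring to.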
Leopold has joined #panfrost
larunbe has joined #panfrost
alarumbe has quit [Ping timeout: 480 seconds]
Leopold has quit [Remote host closed the connection]
Leopold has joined #panfrost
Googulator has quit [Read error: Connection reset by peer]
Googulator has joined #panfrost
chewitt has quit [Quit: Zzz..]
rasterman has quit [Quit: Gettin' stinky!]
pbrobinson has quit [Ping timeout: 480 seconds]
pbrobinson has joined #panfrost
simon-perretta-img has quit [Read error: Connection reset by peer]
simon-perretta-img has joined #panfrost
paulk has quit [Ping timeout: 480 seconds]
paulk has joined #panfrost
simon-perretta-img has quit [Ping timeout: 480 seconds]
simon-perretta-img has joined #panfrost
Googulator has quit [Ping timeout: 480 seconds]
Googulator has joined #panfrost
Googulator has quit [Read error: Connection reset by peer]
Googulator has joined #panfrost
Googulator has quit [Read error: Connection reset by peer]
Googulator has joined #panfrost
Googulator has quit [Ping timeout: 480 seconds]
Googulator has joined #panfrost