ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://oftc.irclog.whitequark.org/panfrost - <macc24> i have been here before it was popular
rasterman has joined #panfrost
rasterman has quit [Quit: Gettin' stinky!]
<jenneron> alyssa: hm, but i'm getting DATA_INVALID_FAULT on js=1, so reason is different?
Bennett has joined #panfrost
camus1 has quit [Remote host closed the connection]
camus has joined #panfrost
tchebb has joined #panfrost
Bennett has quit [Remote host closed the connection]
nlhowell has quit [Ping timeout: 480 seconds]
tchebb has quit [Quit: ZNC - http://znc.in]
tchebb has joined #panfrost
chewitt has quit [Quit: Zzz..]
chewitt has joined #panfrost
MoeIcenowy has quit [Quit: ZNC 1.7.2+deb3 - https://znc.in]
MoeIcenowy has joined #panfrost
Daanct12 has quit [Quit: Quitting]
Danct12 has joined #panfrost
<tomeu> jenneron: DATA_INVALID_FAULT can be due to lots of things, but note that whenever panfrost soft-stops a job, it immediately after resets the GPU
<tomeu> so repeated soft-stops aren't likely to be causing future DATA_INVALID_FAULTs
<jenneron> then no simple solution for t604
erlehmann has quit [Ping timeout: 480 seconds]
chewitt has quit [Quit: Zzz..]
MajorBiscuit has joined #panfrost
erlehmann has joined #panfrost
floof58 has quit []
floof58 has joined #panfrost
JulianGro has joined #panfrost
camus has quit [Remote host closed the connection]
camus has joined #panfrost
tolszak has quit [Remote host closed the connection]
* robmur01 unpacks and fires up the RK3399 board for the first time since Christmas holidays...
<robmur01> #5758 appears still to be in full force - even glmark2's window comes up invisible most of the time
<alyssa> robumm
<robmur01> I'm trying to bring the RK3288 box back to life for other reasons so I might test if it's a general Rockchip/Midgard problem
<robmur01> (Gnome via GDM with stock distro Mesa 21.3.2, for reference)
<alyssa> tomeu: can you explain the logic of panfrost_resource_get_handle?
<alyssa> It's WSI code dating back to the earliest merge of Panfrost into Mesa so I am assuming you wrote it originally
<tomeu> any particular questions?
* tomeu is getting into a short meeting
<alyssa> how does the FD path work? why do we have separate paths for renderonly ("scanout") and not?
rasterman has joined #panfrost
<daniels> alyssa: you need separate paths for renderonly vs. not because you need to export the handle from the right namespace
<daniels> *right device
<alyssa> right, okay
<daniels> can you elaborate on 'how does the FD path work'?
<daniels> or was it just that
<daniels> (if it's just that, you're in good company: every other driver has to do this too when using kmsro)
<alyssa> daniels: I was asked how this works 3 months ago and apparently ignored the email https://gitlab.freedesktop.org/mesa/mesa/-/issues/5758#note_1194932
<alyssa> comparing to the lima and v3d implementations, they seem to prime handle->fd on the gpu device regardless
<alyssa> so either panfrost is right and everyone else is broken, or panfrost is broken
<alyssa> and given panfrost is the only driver affected by this common change..
<daniels> oh, fun
<alyssa> etnaviv also always uses the gpu device, I think
<alyssa> (for the FD path)
<daniels> so, the common failure mode there is that you're getting bitten by one of GEM's worst attributes: every buffer is _uniquely_ identified by a single GEM handle per device open (i.e. drm_file / file description, which may have multiple file descriptors), and GEM handles are in no way refcounted
<alyssa> (everyone uses renderonly for me wonder.
<alyssa> (everyone uses renderonly for the KMS path)
<daniels> so if you don't have an underlying cache mechanism where you refcount GEM handles and only close them when the last ref has gone, you'll get random ENOENT
<daniels> (this is also why users can ~never close GEM handles themselves; if you get a handle back from gbm_bo_get_handle, then you never destroy it yourself, but just assume that gbm_bo_destroy will wipe it out if appropriate)
<alyssa> oh boy
<alyssa> It's not obvious to me why other drivers avoid this fate
<alyssa> ^how
<alyssa> ooh
<alyssa> v3d only calls v3d_boo_free if !bo->private
<alyssa> bo_free closing the gem handle
<alyssa> ---wait, no, that doesn't help
<alyssa> if bo->private, the bo goes to cache
<alyssa> if !bo->private, it's closing
<alyssa> so bo_last_unreference_locked really does close the exBO
<alyssa> there is a bo_handles hash table on the screen which might be part of the answer
<alyssa> Panfrost *does* guard freeing on the refcnt going to zero, and we do refcnt BOs
nlhowell has joined #panfrost
<alyssa> is it possible we're failing to increment the reference somewhere? or we're decrementing incorrectly?
<alyssa> that cache maters if you try to--
<alyssa> oh ffs
<alyssa> yes. that cache is what makes the scheme works.
<alyssa> otherwise we can get two BOs with the same GEM handle (and same content etc) each with a refcnt of 1
<alyssa> (if you import the same GEM handle twice)
<alyssa> rather than a single BO with a refcnt of 2
<alyssa> and then when we're done with either BO, splat.
<alyssa> concretely, calling panfrost_resource_from_handle twice with the same GEM handle will cause fails later
<alyssa> what's the difference between WINSYS_HANDLE_TYPE_KMS and WINSYS_HANDLE_TYPE_FD?
<daniels> _KMS returns a GEM handle in whandle->handle; _FD returns an FD
nlhowell has quit [Ping timeout: 480 seconds]
<daniels> but yeah, the client can throw an FD which resolves to the same GEM handle at you multiple times (via eglCreateImage or gbm_bo_import), and you have to make sure that you track those properly
<daniels> great stuff, isn't it
nlhowell has joined #panfrost
chewitt has joined #panfrost
<alyssa> quite
<alyssa> Any idea why no other drivers seem to handle the _FD + kmsro case correctly?
<alyssa> (but we're the broken one)
<alyssa> robmur01: should have a patch typed out for you to try in an hour or so
<alyssa> to hopefully resolve #5758
<robmur01> cool, I'll have time to do some testing this evening, so feel free to take multiple hours if it helps
<alyssa> thx
<alyssa> i am learning more about WSI than I wanted
<daniels> heh
<daniels> I'm not entirely sure why the others don't do it properly tbh
<alyssa> WSI is the very definition of that which one must never understand
Rathann has joined #panfrost
<daniels> luckily GPU side is perfect in every way
<alyssa> ofc
<alyssa> ...wait, we already /have/ a BO caching scheme
<alyssa> this should already work
<alyssa> pan_lookup_bo implicitly snarf from the cache
<daniels> does it account for kmsro?
<alyssa> unclear
<alyssa> our resource_from_handle only allows importing FDs
<alyssa> it works by first priming the FD to a GEM handle
<alyssa> (on the GPU device, not the scanout device)
<alyssa> then looking up the BO with that (gpu) GEM handle
<daniels> yeah, that seems entirely sensible
<alyssa> if there is no such BO, we create one with DRM_IOCTL_PANFROST_GET_BO_OFFSET with initial refcnt=1
<alyssa> otherwise, we bump the refcnt of the existing BO, with some wild shenanigans to deal with a race
<alyssa> trying to understand how the race is possibly given both pieces of code are protected by the same mutex
<alyssa> unless the drmPrimeFDTOHandle needs to be protected by the mutex too
<alyssa> suppose B has gem handle G and refcnt 1.
<alyssa> simultaneously, G is imported and B is unreferenced.
<alyssa> erm
<alyssa> suppose B has gem handle G, fd FD, and refcnt 1.
<alyssa> simultaneously, FD is imported and B is unreferenced.
<alyssa> wait no maybe this is fine ARGH
<alyssa> v3d does the locks the same way..
<daniels> also the X server is pretty relentlessly single-threaded
<alyssa> v3d solves the funny race by protecting the unreference by a mutex for shared BOs
Rathann has quit [Quit: Leaving]
<alyssa> daniels: right...
<alyssa> in that case I got nothin'
* alyssa tries to reproduce on her laptop
<alyssa> i can reproduce.
<alyssa> grumble
<alyssa> and have lots of logs.
cphealy has quit [Ping timeout: 480 seconds]
<alyssa> It seems to fix xwayland for me, but maybe that's only one symptom
<robmur01> ack, I'll give it a go after dinner and probably reply on the MR
<alyssa> thanks
<alyssa> I am not sure it's the right fix
<alyssa> though I do think the patch is correct (if insufficient)
MajorBiscuit has quit [Ping timeout: 480 seconds]
cphealy has joined #panfrost
<alyssa> jekstrand: ...Is the compiler depending on GenXML pack routines for the command stream a bad idea?
<alyssa> My gut says yes
<alyssa> open coding the structs might be worse
<alyssa> actually maybe not they're easy structs
<alyssa> KISS or something
camus1 has joined #panfrost
<alyssa> admittedly they're bitfields and I feel real queasy about C bitfields and endian issues and such
camus has quit [Read error: Connection reset by peer]
<alyssa> Oh I can just not pack data structs in the compiler, nice
<jekstrand> :)
<jekstrand> We've tried to avoid having the compiler depend on it for Intel
<alyssa> Makes sense
<jekstrand> But what we have is also pretty terrible.
<jekstrand> I briefly considered writing gen_pack_nir.py for ray-tracing. (-:
<alyssa> libpanfrost depends on the compilers so having the compilers depend on GenXML would require some serious surgery
<alyssa> I much prefer keeping data structure noise out of the compiler anyway
<jekstrand> Yeah, dependencies are fun. This is why the Intel drivers have like 10 .a's that get built and that doesn't count the per-hw-generation stuff.
<alyssa> nod
rasterman has quit [Quit: Gettin' stinky!]
<robmur01> :(
<robmur01> there seems to be something in main at the moment that makes GDM completely fall over when I try to log in, so I rebased the MR back onto 21.3.6 and... not much better
<macc24> robmur01: oh i thought it was mediatek_drm being mediatek_drm
<macc24> if it's same issue i'm having then gdm should work when forced to run on x11
<robmur01> anecdotally it does feel like that starting FreeCAD has gone from "sometimes black window, usually invisible" to "sometimes looks fine, usually black window"
nlhowell has quit [Ping timeout: 480 seconds]
nlhowell has joined #panfrost
<alyssa> robmur01: Uh oh
nlhowell has quit [Ping timeout: 480 seconds]
JulianGro has quit [Remote host closed the connection]