ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://oftc.irclog.whitequark.org/panfrost - <macc24> i have been here before it was popular
rasterman has joined #panfrost
rasterman has quit [Quit: Gettin' stinky!]
<jenneron> alyssa: hm, but i'm getting DATA_INVALID_FAULT on js=1, so reason is different?
Bennett has joined #panfrost
camus1 has quit [Remote host closed the connection]
camus has joined #panfrost
tchebb has joined #panfrost
Bennett has quit [Remote host closed the connection]
nlhowell has quit [Ping timeout: 480 seconds]
tchebb has quit [Quit: ZNC - http://znc.in]
tchebb has joined #panfrost
chewitt has quit [Quit: Zzz..]
chewitt has joined #panfrost
MoeIcenowy has quit [Quit: ZNC 1.7.2+deb3 - https://znc.in]
MoeIcenowy has joined #panfrost
Daanct12 has quit [Quit: Quitting]
Danct12 has joined #panfrost
<tomeu> jenneron: DATA_INVALID_FAULT can be due to lots of things, but note that whenever panfrost soft-stops a job, it immediately after resets the GPU
<tomeu> so repeated soft-stops aren't likely to be causing future DATA_INVALID_FAULTs
<jenneron> then no simple solution for t604
erlehmann has quit [Ping timeout: 480 seconds]
chewitt has quit [Quit: Zzz..]
MajorBiscuit has joined #panfrost
erlehmann has joined #panfrost
floof58 has quit []
floof58 has joined #panfrost
JulianGro has joined #panfrost
camus has quit [Remote host closed the connection]
camus has joined #panfrost
tolszak has quit [Remote host closed the connection]
* robmur01 unpacks and fires up the RK3399 board for the first time since Christmas holidays...
<robmur01> #5758 appears still to be in full force - even glmark2's window comes up invisible most of the time
<alyssa> robumm
<robmur01> I'm trying to bring the RK3288 box back to life for other reasons so I might test if it's a general Rockchip/Midgard problem
<robmur01> (Gnome via GDM with stock distro Mesa 21.3.2, for reference)
<alyssa> tomeu: can you explain the logic of panfrost_resource_get_handle?
<alyssa> It's WSI code dating back to the earliest merge of Panfrost into Mesa so I am assuming you wrote it originally
<tomeu> any particular questions?
* tomeu is getting into a short meeting
<alyssa> how does the FD path work? why do we have separate paths for renderonly ("scanout") and not?
rasterman has joined #panfrost
<daniels> alyssa: you need separate paths for renderonly vs. not because you need to export the handle from the right namespace
<daniels> *right device
<alyssa> right, okay
<daniels> can you elaborate on 'how does the FD path work'?
<daniels> or was it just that
<daniels> (if it's just that, you're in good company: every other driver has to do this too when using kmsro)
<alyssa> daniels: I was asked how this works 3 months ago and apparently ignored the email https://gitlab.freedesktop.org/mesa/mesa/-/issues/5758#note_1194932
<alyssa> comparing to the lima and v3d implementations, they seem to prime handle->fd on the gpu device regardless
<alyssa> so either panfrost is right and everyone else is broken, or panfrost is broken
<alyssa> and given panfrost is the only driver affected by this common change..
<daniels> oh, fun
<alyssa> etnaviv also always uses the gpu device, I think
<alyssa> (for the FD path)
<daniels> so, the common failure mode there is that you're getting bitten by one of GEM's worst attributes: every buffer is _uniquely_ identified by a single GEM handle per device open (i.e. drm_file / file description, which may have multiple file descriptors), and GEM handles are in no way refcounted
<alyssa> (everyone uses renderonly for me wonder.
<alyssa> (everyone uses renderonly for the KMS path)
<daniels> so if you don't have an underlying cache mechanism where you refcount GEM handles and only close them when the last ref has gone, you'll get random ENOENT
<daniels> (this is also why users can ~never close GEM handles themselves; if you get a handle back from gbm_bo_get_handle, then you never destroy it yourself, but just assume that gbm_bo_destroy will wipe it out if appropriate)
<alyssa> oh boy
<alyssa> It's not obvious to me why other drivers avoid this fate
<alyssa> ^how
<alyssa> ooh
<alyssa> v3d only calls v3d_boo_free if !bo->private
<alyssa> bo_free closing the gem handle
<alyssa> ---wait, no, that doesn't help
<alyssa> if bo->private, the bo goes to cache
<alyssa> if !bo->private, it's closing
<alyssa> so bo_last_unreference_locked really does close the exBO
<alyssa> there is a bo_handles hash table on the screen which might be part of the answer
<alyssa> Panfrost *does* guard freeing on the refcnt going to zero, and we do refcnt BOs
nlhowell has joined #panfrost
<alyssa> is it possible we're failing to increment the reference somewhere? or we're decrementing incorrectly?
<alyssa> that cache maters if you try to--
<alyssa> oh ffs
<alyssa> yes. that cache is what makes the scheme works.
<alyssa> otherwise we can get two BOs with the same GEM handle (and same content etc) each with a refcnt of 1
<alyssa> (if you import the same GEM handle twice)
<alyssa> rather than a single BO with a refcnt of 2
<alyssa> and then when we're done with either BO, splat.
<alyssa> concretely, calling panfrost_resource_from_handle twice with the same GEM handle will cause fails later
<alyssa> what's the difference between WINSYS_HANDLE_TYPE_KMS and WINSYS_HANDLE_TYPE_FD?
<daniels> _KMS returns a GEM handle in whandle->handle; _FD returns an FD
nlhowell has quit [Ping timeout: 480 seconds]
<daniels> but yeah, the client can throw an FD which resolves to the same GEM handle at you multiple times (via eglCreateImage or gbm_bo_import), and you have to make sure that you track those properly
<daniels> great stuff, isn't it
nlhowell has joined #panfrost
chewitt has joined #panfrost
<alyssa> quite
<alyssa> Any idea why no other drivers seem to handle the _FD + kmsro case correctly?
<alyssa> (but we're the broken one)
<alyssa> robmur01: should have a patch typed out for you to try in an hour or so
<alyssa> to hopefully resolve #5758
<robmur01> cool, I'll have time to do some testing this evening, so feel free to take multiple hours if it helps
<alyssa> thx
<alyssa> i am learning more about WSI than I wanted
<daniels> heh
<daniels> I'm not entirely sure why the others don't do it properly tbh
<alyssa> WSI is the very definition of that which one must never understand
Rathann has joined #panfrost
<daniels> luckily GPU side is perfect in every way
<alyssa> ofc
<alyssa> ...wait, we already /have/ a BO caching scheme
<alyssa> this should already work
<alyssa> pan_lookup_bo implicitly snarf from the cache
<daniels> does it account for kmsro?
<alyssa> unclear
<alyssa> our resource_from_handle only allows importing FDs
<alyssa> it works by first priming the FD to a GEM handle
<alyssa> (on the GPU device, not the scanout device)
<alyssa> then looking up the BO with that (gpu) GEM handle
<daniels> yeah, that seems entirely sensible
<alyssa> if there is no such BO, we create one with DRM_IOCTL_PANFROST_GET_BO_OFFSET with initial refcnt=1
<alyssa> otherwise, we bump the refcnt of the existing BO, with some wild shenanigans to deal with a race
<alyssa> trying to understand how the race is possibly given both pieces of code are protected by the same mutex
<alyssa> unless the drmPrimeFDTOHandle needs to be protected by the mutex too
<alyssa> suppose B has gem handle G and refcnt 1.
<alyssa> simultaneously, G is imported and B is unreferenced.
<alyssa> erm
<alyssa> suppose B has gem handle G, fd FD, and refcnt 1.
<alyssa> simultaneously, FD is imported and B is unreferenced.
<alyssa> wait no maybe this is fine ARGH
<alyssa> v3d does the locks the same way..
<daniels> also the X server is pretty relentlessly single-threaded
<alyssa> v3d solves the funny race by protecting the unreference by a mutex for shared BOs
Rathann has quit [Quit: Leaving]
<alyssa> daniels: right...
<alyssa> in that case I got nothin'
* alyssa tries to reproduce on her laptop
<alyssa> i can reproduce.
<alyssa> grumble
<alyssa> and have lots of logs.
cphealy has quit [Ping timeout: 480 seconds]
<alyssa> It seems to fix xwayland for me, but maybe that's only one symptom
<robmur01> ack, I'll give it a go after dinner and probably reply on the MR
<alyssa> thanks
<alyssa> I am not sure it's the right fix
<alyssa> though I do think the patch is correct (if insufficient)
MajorBiscuit has quit [Ping timeout: 480 seconds]
cphealy has joined #panfrost
<alyssa> jekstrand: ...Is the compiler depending on GenXML pack routines for the command stream a bad idea?
<alyssa> My gut says yes
<alyssa> open coding the structs might be worse
<alyssa> actually maybe not they're easy structs
<alyssa> KISS or something
camus1 has joined #panfrost
<alyssa> admittedly they're bitfields and I feel real queasy about C bitfields and endian issues and such
camus has quit [Read error: Connection reset by peer]
<alyssa> Oh I can just not pack data structs in the compiler, nice
<jekstrand> :)
<jekstrand> We've tried to avoid having the compiler depend on it for Intel
<alyssa> Makes sense
<jekstrand> But what we have is also pretty terrible.
<jekstrand> I briefly considered writing gen_pack_nir.py for ray-tracing. (-:
<alyssa> libpanfrost depends on the compilers so having the compilers depend on GenXML would require some serious surgery
<alyssa> I much prefer keeping data structure noise out of the compiler anyway
<jekstrand> Yeah, dependencies are fun. This is why the Intel drivers have like 10 .a's that get built and that doesn't count the per-hw-generation stuff.
<alyssa> nod
rasterman has quit [Quit: Gettin' stinky!]
<robmur01> :(
<robmur01> there seems to be something in main at the moment that makes GDM completely fall over when I try to log in, so I rebased the MR back onto 21.3.6 and... not much better
<macc24> robmur01: oh i thought it was mediatek_drm being mediatek_drm
<macc24> if it's same issue i'm having then gdm should work when forced to run on x11
<robmur01> anecdotally it does feel like that starting FreeCAD has gone from "sometimes black window, usually invisible" to "sometimes looks fine, usually black window"
nlhowell has quit [Ping timeout: 480 seconds]
nlhowell has joined #panfrost
<alyssa> robmur01: Uh oh
nlhowell has quit [Ping timeout: 480 seconds]
JulianGro has quit [Remote host closed the connection]