#dri-devel on 2023-06-16 — irc logs at oftc.irclog.whitequark.org

2022-12-21 00:45 ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar

00:05 kts has joined #dri-devel

00:10 <karolherbst> ecm: mhhh.. I wonder if this is actually maybe a dri2 problem as nouveau still defaults to it

00:10 kts has quit [Quit: Konversation terminated!]

00:17 <ecm> so it's coming from egl2_dri2.c instead

00:17 <ecm> s/egl2/egl/

00:17 <karolherbst> you could try to force enable DRI 3 in the xf86-video-nouveau driver and see if that helps with anything

00:17 <karolherbst> but it's still kinda weird

00:17 jewins has quit [Ping timeout: 480 seconds]

00:22 AndroUser2 has quit [Ping timeout: 480 seconds]

00:22 AndroUser2 has joined #dri-devel

00:33 <ecm> ok I forced DRI 3 for nouveau, it doesn't show the fd == -1 error, but eglinfo still fails eglInitialize

00:36 <ecm> eglInitialize fails for platform x11, sorry

00:36 <ecm> doesn't fail for DRI3

00:36 <karolherbst> mhhh.. interesting

00:38 ecm` has joined #dri-devel

00:39 <ecm`> eglinfo after dri3 enabled: https://0x0.st/HTXE.txt

00:40 <karolherbst> something is broken and I have no idea what :)

00:40 co1umbarius has joined #dri-devel

00:40 <ecm`> libEGL debug: EGL user error 0x300c (EGL_BAD_PARAMETER) in eglGetPlatformDisplay is the new error now

00:41 columbarius has quit [Ping timeout: 480 seconds]

00:46 benjaminl has joined #dri-devel

00:52 benjamin1 has quit [Ping timeout: 480 seconds]

00:53 ecm` has quit [Ping timeout: 480 seconds]

00:54 ecm has quit [Ping timeout: 480 seconds]

01:04 <airlied> mlankhorst: care to dequeue drm-misc-fixes?

01:09 heat has quit [Read error: No route to host]

01:09 heat has joined #dri-devel

01:26 avoidr_ has joined #dri-devel

01:29 avoidr has quit [Ping timeout: 480 seconds]

01:30 yuq825 has joined #dri-devel

01:54 benjaminl has quit [Ping timeout: 480 seconds]

02:03 benjaminl has joined #dri-devel

02:11 benjaminl has quit [Ping timeout: 480 seconds]

02:12 Mal_ has joined #dri-devel

02:18 jewins has joined #dri-devel

02:27 Mal__ has joined #dri-devel

02:27 heat has quit [Ping timeout: 480 seconds]

02:30 benjaminl has joined #dri-devel

02:33 Mal_ has quit [Ping timeout: 480 seconds]

02:35 Mal__ has quit [Ping timeout: 480 seconds]

02:46 benjaminl has quit [Ping timeout: 480 seconds]

02:53 the_sea_peoples has quit [Quit: WeeChat 2.8]

02:54 the_sea_peoples has joined #dri-devel

03:08 Leopold_ has quit [Remote host closed the connection]

03:08 Leopold_ has joined #dri-devel

03:09 oneforall2 has quit [Remote host closed the connection]

03:09 oneforall2 has joined #dri-devel

03:12 dviola has quit [Ping timeout: 480 seconds]

03:17 kts has joined #dri-devel

03:23 Kayden has joined #dri-devel

03:24 Leopold_ has quit [Remote host closed the connection]

03:24 Leopold_ has joined #dri-devel

03:53 Mal__ has joined #dri-devel

04:01 benjaminl has joined #dri-devel

04:11 benjaminl has quit [Ping timeout: 480 seconds]

04:21 mbrost has quit [Ping timeout: 480 seconds]

04:26 bmodem has joined #dri-devel

04:27 Mal__ has quit [Ping timeout: 480 seconds]

04:36 Mal__ has joined #dri-devel

04:36 Company has joined #dri-devel

04:39 benjaminl has joined #dri-devel

04:41 dviola has joined #dri-devel

04:49 benjaminl has quit [Ping timeout: 480 seconds]

04:57 kts has quit [Ping timeout: 480 seconds]

05:04 YuGiOhJCJ has joined #dri-devel

05:25 sima has joined #dri-devel

05:30 fab has joined #dri-devel

05:33 Mal__ has quit [Ping timeout: 480 seconds]

05:44 aravind has joined #dri-devel

05:50 Mal__ has joined #dri-devel

05:51 jewins has quit [Ping timeout: 480 seconds]

05:58 kzd has quit [Ping timeout: 480 seconds]

05:58 ced117 has quit [Ping timeout: 480 seconds]

06:00 tzimmermann has joined #dri-devel

06:02 Mal__ has quit [Read error: Connection reset by peer]

06:23 benjaminl has joined #dri-devel

06:31 benjaminl has quit [Ping timeout: 480 seconds]

06:32 alanc has quit [Remote host closed the connection]

06:32 BobBeck is now known as Guest3236

06:32 BobBeck has joined #dri-devel

06:33 gerddie3 has joined #dri-devel

06:33 alanc has joined #dri-devel

06:34 fab has quit [Ping timeout: 480 seconds]

06:36 robobub_ has quit []

06:37 frankbinns has quit [Remote host closed the connection]

06:38 K`den has joined #dri-devel

06:38 Kayden has quit [Read error: Connection reset by peer]

06:49 K`den has quit []

06:50 K`den has joined #dri-devel

06:51 K`den is now known as Kayden

06:58 sghuge has quit [Remote host closed the connection]

06:58 sghuge has joined #dri-devel

06:59 pochu has joined #dri-devel

06:59 benjaminl has joined #dri-devel

07:02 Mal__ has joined #dri-devel

07:07 benjaminl has quit [Ping timeout: 480 seconds]

07:09 Mal__ has quit []

07:14 bgs has joined #dri-devel

07:14 frankbinns has joined #dri-devel

07:24 fab has joined #dri-devel

07:24 jkrzyszt has joined #dri-devel

07:24 rsalvaterra has quit []

07:25 rsalvaterra has joined #dri-devel

07:31 rasterman has joined #dri-devel

07:35 benjaminl has joined #dri-devel

07:36 <MrCooper> AFAICT drm_syncobj fds can't be polled, can they?

07:38 <emersion> if you mean poll(), no. i have a patch for that but it needs an IGT

07:40 <MrCooper> right, thanks

07:43 benjaminl has quit [Ping timeout: 480 seconds]

07:55 lynxeye has joined #dri-devel

07:56 tursulin has joined #dri-devel

08:06 <RAOF> Ok, amdgpu. Why are you allocating a buffer with tiling mode incompatible with scanout when I ask for a GBM_BO_USE_RENDERING | GBM_BO_USE_SCANOUT surface?!

08:06 <RAOF> What am I doing differently to the working case?!

08:09 <MrCooper> sounds like a radeonsi bug, it should pick a scanout capable modifier with GBM_BO_USE_SCANOUT

08:09 benjaminl has joined #dri-devel

08:10 <emersion> how do you figure out that it's not scanout capable?

08:10 <emersion> are you sure it's the right device?

08:10 <MrCooper> RAOF: see https://gitlab.freedesktop.org/mesa/mesa/-/issues/8729

08:11 <RAOF> Because when it's displayed it's garbled in a nice blocky tiling fashion.

08:12 <RAOF> It's definitely the right device; I'm only opening a single drm node.

08:12 <MrCooper> that sounds like an amdgpu kernel bug then; if the modifier isn't scanout capable, it should refuse to create a KMS FB for it

08:12 <emersion> RAOF: are you stripping an explicit modifier by any chance?

08:12 <RAOF> Or maybe it's the other way around? Maybe EGL is confused and it's rendering to it as if it's tiled?

08:13 <emersion> ie, allocating with with_modifiers(), then importing it without passing the modifier?

08:13 <emersion> amdgpu should reject buffers it cannot scanout, in theory

08:13 <RAOF> emersion: Nope! I'm deliberately using the non-modifiers path (because there aren't any supported modifiers on amdgpu, on at least this card).

08:14 <emersion> ok, GFX8-

08:15 <RAOF> It might still be a bug on my end; a different branch (with significantly different code flow) does work, but I can't see any difference in the way I'm allocating the gbm_surface, nor in the way I'm using EGL.

08:16 <RAOF> And none of the debugging I've tried has seen any differences, and it's difficult to introspect this state.

08:18 benjaminl has quit [Ping timeout: 480 seconds]

08:19 <RAOF> If there's any magical MESA_DEBUG environment that will make some of these decisions more legible that'd be awesome 😐

08:23 AndroUser2 has quit [Remote host closed the connection]

08:23 <MrCooper> AMD_DEBUG might be more relevant here, maybe check AMD_DEBUG=help and try some of those which sound related

08:23 AndroUser2 has joined #dri-devel

08:29 <MrCooper> lynxeye: interesting plot twist on mesa#8729 :)

08:30 <lynxeye> MrCooper: He, sorry about that, but I hadn't seen this discussion before you linked to it in here.

08:31 <MrCooper> no worries, happens to me all the time

08:39 AndroUser2 has quit [Remote host closed the connection]

08:39 AndroUser2 has joined #dri-devel

08:42 benjaminl has joined #dri-devel

08:50 benjaminl has quit [Ping timeout: 480 seconds]

08:55 swivel has quit [Remote host closed the connection]

08:55 swivel has joined #dri-devel

08:59 jfalempe_ is now known as jfalempe

09:02 swalker__ has joined #dri-devel

09:08 avoidr_ has quit []

09:08 avoidr has joined #dri-devel

09:16 benjaminl has joined #dri-devel

09:16 AndroUser2 has quit [Remote host closed the connection]

09:19 AndroUser2 has joined #dri-devel

09:24 benjaminl has quit [Ping timeout: 480 seconds]

09:24 djbw_ has quit [Read error: Connection reset by peer]

09:33 <ishitatsuyuki> i'm new to drm/ttm, but I wonder why the wound-wait business is needed instead of e.g. sorting locks by their pointer or other ID?

09:53 benjaminl has joined #dri-devel

10:01 benjaminl has quit [Ping timeout: 480 seconds]

10:04 bmodem1 has joined #dri-devel

10:08 bmodem has quit [Ping timeout: 480 seconds]

10:12 Company has quit [Read error: Connection reset by peer]

10:14 <airlied> ishitatsuyuki: because sorting is expensive usually
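A rough userspace sketch of the alternative ishitatsuyuki describes, ordering locks by address, written with pthreads. It assumes the complete set of locks is known before the first acquisition and pays an O(n log n) sort on every acquisition, which is the cost airlied points at. The kernel's wound/wait mutexes (ww_mutex) instead allow locks to be taken in whatever order they are encountered and resolve conflicts by having one side back off and retry. The helper names `lock_all_sorted` and `unlock_all` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator: order mutex pointers by numeric address. */
static int cmp_by_address(const void *a, const void *b) {
    uintptr_t x = (uintptr_t)*(pthread_mutex_t *const *)a;
    uintptr_t y = (uintptr_t)*(pthread_mutex_t *const *)b;
    return (x > y) - (x < y);
}

/* Acquire every mutex in ascending address order. Because all threads agree
   on one global order, no lock cycle (ABBA deadlock) can form. The caller's
   array is sorted in place so unlock_all() can walk it in reverse. */
void lock_all_sorted(pthread_mutex_t **locks, size_t n) {
    qsort(locks, n, sizeof(*locks), cmp_by_address);
    for (size_t i = 0; i < n; i++) {
        /* a duplicate entry would make the thread deadlock on itself */
        assert(i == 0 || locks[i - 1] != locks[i]);
        pthread_mutex_lock(locks[i]);
    }
}

/* Release in reverse acquisition order. */
void unlock_all(pthread_mutex_t **locks, size_t n) {
    for (size_t i = n; i-- > 0;)
        pthread_mutex_unlock(locks[i]);
}
```

The sort is the whole trick: it needs the full list up front, which is exactly what a TTM-style eviction path that discovers buffers as it goes may not have.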

10:16 <karolherbst> anybody ever thought about enforcing DRM API locking rules via `WARN_ON(spin_is_locked(lock))`? at least the dma-fence API looks very hard to actually use correctly and there seems to be plenty of code around just not caring properly about locks

10:18 <airlied> sima likely had

10:18 <airlied> has

10:18 <karolherbst> also.. I'm convinced that `dma_fence_is_signaled` has to go

10:19 <karolherbst> (or to properly lock access to fence->flags)

10:20 <karolherbst> dma-fence kinda smells too much like "we outsmart data races by using atomics"

10:20 <airlied> sima: ^

10:21 <lynxeye> karolherbst: Not a fan of the WARN_ON way to do this, but lockdep_assert_held is really useful.

10:22 vliaskov has joined #dri-devel

10:22 <karolherbst> yeah.. so that's basically the same just triggers when lockdep is enabled, right?

10:23 <karolherbst> or is it checking for deps?

10:23 <karolherbst> because I don't see why lock dependencies matter here at all, it's just enforcing what the API states

10:23 <karolherbst> *it should

10:23 <lynxeye> karolherbst: Yea, triggers with lockdep and in contrast to spin_is_locked it actually verifies that it's your thread that holds the lock and not someone else.
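A userspace analogue of the difference lynxeye describes, assuming pthreads and C11 atomics; it is not the kernel's lockdep implementation. `tracked_is_locked()` behaves like `spin_is_locked()` (true if any thread holds the lock), while `tracked_held_by_me()` mirrors what `lockdep_assert_held()` checks (true only for the holder). All names here are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Each thread gets its own copy, so its address identifies the thread. */
static _Thread_local char thread_token;

struct tracked_lock {
    pthread_mutex_t mutex;
    _Atomic(void *) owner; /* &thread_token of the holder, or NULL */
};

#define TRACKED_LOCK_INIT { PTHREAD_MUTEX_INITIALIZER, NULL }

/* Like spin_is_locked(): true if *any* thread holds the lock. */
bool tracked_is_locked(struct tracked_lock *l) {
    return atomic_load(&l->owner) != NULL;
}

/* Like what lockdep_assert_held() verifies: true only for the holder. */
bool tracked_held_by_me(struct tracked_lock *l) {
    return atomic_load(&l->owner) == (void *)&thread_token;
}

void tracked_lock(struct tracked_lock *l) {
    pthread_mutex_lock(&l->mutex);
    atomic_store(&l->owner, (void *)&thread_token);
}

void tracked_unlock(struct tracked_lock *l) {
    assert(tracked_held_by_me(l)); /* releasing another thread's lock is a bug */
    atomic_store(&l->owner, NULL);
    pthread_mutex_unlock(&l->mutex);
}

/* Runs both checks from another thread. Bitmask: 1 = is_locked, 2 = held_by_me. */
static void *probe_thread(void *arg) {
    struct tracked_lock *l = arg;
    int mask = (tracked_is_locked(l) ? 1 : 0) | (tracked_held_by_me(l) ? 2 : 0);
    return (void *)(intptr_t)mask;
}

int probe_from_other_thread(struct tracked_lock *l) {
    pthread_t t;
    void *ret = NULL;
    if (pthread_create(&t, NULL, probe_thread, l) != 0)
        return -1;
    pthread_join(t, &ret);
    return (int)(intptr_t)ret;
}
```

While one thread holds the lock, a second thread sees `tracked_is_locked()` return true but `tracked_held_by_me()` return false, which is the case a `WARN_ON(spin_is_locked(lock))`-style check cannot catch.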

10:23 <karolherbst> ahhh

10:23 <karolherbst> okay

10:24 <karolherbst> I expect this spamming warnings all over the place, so having it behind lockdep might be better here anyway

10:27 benjaminl has joined #dri-devel

10:35 benjaminl has quit [Ping timeout: 480 seconds]

10:51 Haaninjo has joined #dri-devel

10:53 <cwabbott> jenatali: wow, that spec seems to require some real driver heroics, especially around suspend/resume

10:54 <cwabbott> implementing suspend/resume was already painful enough in turnip after they added it to Vulkan dynamic rendering to match DX12

10:55 alyssa has joined #dri-devel

10:55 <alyssa> cwabbott: is now a good time to invoke the axiom of QDS?

10:55 <cwabbott> now it sounds like they want the driver to compute tile layouts at submit time, because you can't compute the tile layout until you know all of the render passes you want to merge

10:57 <cwabbott> alyssa: uhh, what is that?

11:04 benjaminl has joined #dri-devel

11:06 Haaninjo has quit [Quit: Ex-Chat]

11:11 <alyssa> "now it sounds like they want the driver to compute tile layouts at submit time"

11:11 <alyssa> IDK about Qualcomm but this would be extra spicy on Apple (and maybe Mali) because the layouts get baked into the fragment shaders

11:12 <alyssa> what's that? you want even more FS prologs + FS epilogs?

11:12 benjaminl has quit [Ping timeout: 480 seconds]

11:12 <alyssa> and somehow want to defer shader linking until submit time instead of just draw time?

11:12 smiles_1111 has quit [Remote host closed the connection]

11:12 <alyssa> well, if you insist! (-:

11:12 smiles_1111 has joined #dri-devel

11:13 <alyssa> (might be possible to push offsets as a uniform, but still.)

11:25 kxkamil has quit []

11:25 <mlankhorst> airlied: sorry about that!

11:33 AndroUser2 has quit [Remote host closed the connection]

11:33 AndroUser2 has joined #dri-devel

11:36 lemonzest has quit [Quit: WeeChat 3.6]

11:38 benjaminl has joined #dri-devel

11:41 lemonzest has joined #dri-devel

11:46 benjaminl has quit [Ping timeout: 480 seconds]

11:51 kxkamil has joined #dri-devel

12:00 fab has quit [Quit: fab]

12:02 <jenatali> cwabbott: Yeah, that's the impression that I get, but QC was okay with it so 🤷

12:05 <alyssa> jenatali: so far i've not been impressed with QC's software (-:

12:06 <jenatali> Yeah

12:11 benjaminl has joined #dri-devel

12:13 yuq825 has left #dri-devel [#dri-devel]

12:17 * alyssa writes optimization passes Just For Fun because it's Friday and she did real work all week

12:17 <alyssa> Do we have an easy way to find the first unconditional block executed after an instruction? Shouldn't be too hard to walk the cf list

12:17 <karolherbst> alyssa: are you bored?

12:17 <alyssa> karolherbst: tired

12:18 <karolherbst> mhhh

12:18 <karolherbst> I still have my subgroup MR, but you already reviewed quite a bit of it

12:18 <alyssa> oh I can look at that today

12:18 <karolherbst> but it also changed quite a bit

12:18 <karolherbst> cool

12:18 <alyssa> I feel like what I want is something dominance related maybe?

12:18 <alyssa> although maybe not even

12:18 <karolherbst> soooo

12:19 <karolherbst> I have a fun optimization we need

12:19 <karolherbst> but it's also quite a bit of work

12:19 <karolherbst> but probably also fun

12:19 <karolherbst> ever looked into loop merging?

12:19 <alyssa> uh oh

12:19 <alyssa> I already know what fun optimization I'm writing :-p

12:19 <karolherbst> like merging the inner loop with the outer one

12:19 <alyssa> i know better than to write loop opts

12:19 <karolherbst> so threads taking longer in the inner loop don't stall threads waiting on the next outer iteration

12:19 benjaminl has quit [Ping timeout: 480 seconds]

12:19 alyssa has left #dri-devel [#dri-devel]

12:19 <karolherbst> :D

12:21 kts has joined #dri-devel

12:21 <HdkR> That sounds like a dependency tracking hellscape

12:21 <karolherbst> it's what nvidia is doing

12:22 <karolherbst> HdkR: but it's actually not that hard, you just turn the inner loop into predicated blocks

12:22 <karolherbst> and build a little state machine deciding what outer+inner loop iteration the thread is at

12:23 <karolherbst> and decouple threads like this

12:23 <karolherbst> it's kinda fun from a concept perspective
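A scalar sketch of the loop-merging shape karolherbst describes, assuming the inner trip count varies per outer iteration: the nested loop becomes a single loop whose explicit `(i, j)` state selects either the inner-body block or the outer-advance block. On a GPU each SIMT lane would carry its own `(i, j)`, so a lane stuck in a long inner loop no longer holds the others at an outer-loop reconvergence point. This illustrates the transformation only; it is not NVIDIA's or Mesa's implementation, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reference: classic nested loop; the inner trip count depends on i. */
long nested_sum(const unsigned *inner_counts, size_t outer_n) {
    long sum = 0;
    for (size_t i = 0; i < outer_n; i++)
        for (unsigned j = 0; j < inner_counts[i]; j++)
            sum += (long)(i * 100 + j);
    return sum;
}

/* Merged: one loop with (i, j) as explicit state. Each trip runs exactly one
   of two predicated blocks, so there is no inner loop to reconverge after. */
long merged_sum(const unsigned *inner_counts, size_t outer_n) {
    assert(outer_n == 0 || inner_counts != NULL);
    long sum = 0;
    size_t i = 0;
    unsigned j = 0;
    while (i < outer_n) {
        bool in_inner = j < inner_counts[i];
        if (in_inner) {          /* "inner body" block */
            sum += (long)(i * 100 + j);
            j++;
        } else {                 /* "advance outer iteration" block */
            i++;
            j = 0;
        }
    }
    return sum;
}
```

Both functions compute the same result; only the control-flow shape differs.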

12:23 camus has quit [Ping timeout: 480 seconds]

12:24 pochu has quit [Quit: leaving]

12:24 <karolherbst> HdkR: it also helps with minimizing c/r stack usage in shaders

12:26 <HdkR> I see the improvements. Sounds painful :D

12:26 <karolherbst> :D

12:26 <karolherbst> we have to do it though

12:32 AndroUser2 has quit [Remote host closed the connection]

12:32 AndroUser2 has joined #dri-devel

12:34 YuGiOhJCJ has quit [Quit: YuGiOhJCJ]

12:44 jkrzyszt has quit [Ping timeout: 480 seconds]

12:45 benjaminl has joined #dri-devel

12:46 <jenatali> Have to?

12:53 benjaminl has quit [Ping timeout: 480 seconds]

12:59 alyssa has joined #dri-devel

12:59 <alyssa> karolherbst: ...Lol

12:59 <alyssa> I googled loop merging and the result is a presentation from a prof at my school :-p

13:00 * alyssa recognized the name

13:03 <karolherbst> :D

13:03 <karolherbst> it's a sign

13:03 <karolherbst> now you have to do it, it's the law

13:03 <karolherbst> jenatali: well.. for getting more perf I mean

13:04 <jenatali> Got it

13:05 <karolherbst> it basically leads to threads getting diverged less often and you even need to converge them less

13:06 <alyssa> karolherbst: The school I graduated from and am slowly recovering mentally from? Indeed a sign that I should not write the nir pass :-D

13:06 <karolherbst> :D

13:06 <karolherbst> oh well.. guess I'll have to do it sooner or later as it's more critical for compute anyway

13:08 elongbug has joined #dri-devel

13:09 jkrzyszt has joined #dri-devel

13:14 jkrzyszt has quit [Remote host closed the connection]

13:18 JohnnyonFlame has joined #dri-devel

13:21 benjaminl has joined #dri-devel

13:21 DottorLeo has joined #dri-devel

13:26 <alyssa> karolherbst: there's still https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18399#note_1779418 for loop fun

13:29 benjaminl has quit [Ping timeout: 480 seconds]

13:29 <karolherbst> GPUs have branch predictors?

13:30 <karolherbst> at least I'm sure nvidia GPUs don't have any

13:31 heat has joined #dri-devel

13:37 ced117 has joined #dri-devel

13:39 DottorLeo has quit [Quit: Konversation terminated!]

13:44 <alyssa> not any I know of

13:45 <glehmann> does xeon phi count?

13:45 <karolherbst> I misread the comment there anyway.. but yeah... I don't really see any benefit of doing this on nvidia hardware

13:45 <karolherbst> you'll have two predicated branches in either case

13:45 <karolherbst> ehh wait.. one non predicated in the original code version

13:46 jkrzyszt has joined #dri-devel

13:47 <karolherbst> _but_ it could lead to more optimized code as the if condition and the loop body's instructions live in the same block

13:47 <karolherbst> so dunno

13:48 <karolherbst> but we probably already optimize that?

13:49 ced117 has quit [Ping timeout: 480 seconds]

13:49 <karolherbst> I'd say we should go with whatever benchmarks say

13:51 <alyssa> karolherbst: The point is cutting the # of branches executed in half

13:51 <karolherbst> but looping is also a branch

13:51 <alyssa> loop { if foo { break } ... }

13:51 <alyssa> that's 2 branches per iteration

13:52 <karolherbst> not necessarily

13:52 <alyssa> if bar { do { .. } while (bar) }

13:52 <alyssa> that's one branch per iteration (plus one at the start)

13:52 <karolherbst> on nvidia hardware you can do all of this without actual branching

13:52 <karolherbst> only thread divergence is a problem you need to take into account

13:53 <karolherbst> but you can do this without branching

13:54 <karolherbst> sure, you jump, but you don't really need to deal with blocks in the native ISA inside the loop

13:55 <karolherbst> because there is just one

13:55 <karolherbst> (in either case)

13:57 benjaminl has joined #dri-devel

13:57 Dr_Who has joined #dri-devel

13:58 <sima> karolherbst, yeah lockdep_assert_held is the best one

13:58 <karolherbst> alyssa: something like this: https://gist.githubusercontent.com/karolherbst/69a58884b1b20fe06ee8297386c9cf8e/raw/5a8eab6e0415b41a0af538a815cc8b9bd522dfce/gistfile1.txt

13:58 <sima> karolherbst, and yeah part of the dma_fence compromise bikeshed was to make the fastpath fast

13:58 <sima> since in some drivers that was just an ordered load for a quick check

13:59 <karolherbst> uhhh

13:59 <karolherbst> pain

13:59 <alyssa> karolherbst: Right. The second case is executing half the jumps as the first.

13:59 <sima> karolherbst, we have endless amounts of code that practically assumes that dma_fence_is_signaled is dirt cheap once it's signalled

13:59 <alyssa> (1 predicated + 1 unconditional --vs-- 1 predicated)

14:00 <alyssa> For n iterations, the top one is (2 * n) jumps but the bottom is n + 1

14:00 Leopold_ has quit [Ping timeout: 480 seconds]

14:00 <alyssa> unless there are 0 iterations in which case they are both 1 jump

14:00 <karolherbst> ahh right...

14:00 <karolherbst> yeah, no you are right, I was looking at it from a block perspective too much

14:00 <alyssa> so if it's 0 or 1 iteration it's the same, and if there are multiple iterations the second is strictly better

14:01 <alyssa> if you imagine running the loop 100 times... that adds up
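A small model of alyssa's branch counting, assuming every executed branch instruction counts once, taken or not. Under this model the original `loop { if (!cond) break; ... }` form executes 2n + 1 branches for n iterations and the rotated `if (cond) do { ... } while (cond)` form executes n + 1, so both cost one branch at n = 0 and the ratio tends toward 2:1 as n grows. The exact constants depend on how the final exit is counted; the function names are hypothetical.

```c
#include <assert.h>

/* Original shape: loop { if (!(i < n)) break; body; }
   The back-edge is an unconditional jump. */
unsigned branches_top(unsigned n) {
    unsigned branches = 0, i = 0;
    for (;;) {
        branches++;            /* predicated exit test */
        if (!(i < n))
            break;
        i++;                   /* loop body */
        branches++;            /* unconditional jump back to the header */
    }
    assert(i == n);            /* body ran exactly n times */
    return branches;
}

/* Rotated shape: if (i < n) { do { body; } while (i < n); } */
unsigned branches_rotated(unsigned n) {
    unsigned branches = 0, i = 0;
    branches++;                /* guard test before the loop */
    if (i < n) {
        do {
            i++;               /* loop body */
            branches++;        /* predicated back-edge test */
        } while (i < n);
    }
    assert(i == n);
    return branches;
}
```

For n = 100 the model gives 201 versus 101 executed branches.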

14:01 <karolherbst> yeah...

14:01 <sima> karolherbst, imo the fastpath with dma_fence is fairly ok, the real absolute pain is the rcu protected lookups

14:01 <karolherbst> sima: it's still quite racy

14:01 <sima> because the rules are that the driver may recycle

14:01 <sima> so you don't just have data races, you have "this might have become an entirely different dma_fence" races

14:02 <karolherbst> yeah...

14:02 <karolherbst> it's bad tho :P

14:02 <sima> karolherbst, yeah but as far as lockless tricks go, it's kind of standard

14:02 <sima> karolherbst, so did you find a bug in there?

14:02 <karolherbst> no, but nouveau used dma_fence_is_signaled incorrectly and that led me to look deeper into it

14:02 <karolherbst> or rather..

14:03 <karolherbst> sima: we had to do this: https://cgit.freedesktop.org/drm/drm/commit/?h=drm-fixes&id=c8a5d5ea3ba6a18958f8d76430e4cd68eea33943

14:03 <karolherbst> and it does fix or improve the situation a lot

14:03 <karolherbst> anyway.. assuming dma_fence_is_signaled locks data is a wrong assumption

14:04 <karolherbst> because currently it doesn't

14:04 <karolherbst> or not all data at least

14:04 <karolherbst> it has this fast path which can race with other threads

14:04 Leopold_ has joined #dri-devel

14:05 kem has quit [Quit: Leaving]

14:05 benjaminl has quit [Ping timeout: 480 seconds]

14:05 kem has joined #dri-devel

14:05 <sima> karolherbst, use-after-free is a refcount issue

14:05 <sima> if you don't hold a full refcount, you can _only_ do a _very_ limited set of refcount checks

14:05 <sima> this aint a locking issue

14:08 <sima> s/refcount checks/rcu protect dma_fence checks/

14:08 <sima> (not yet enough oxygen in the brain after work out I guess)

14:10 <karolherbst> ehhh.. yes, but I also found another nouveau bug I think...

14:10 <sima> unless you do a weak reference protected by a lock or so, if you need locking to fix a use-after-free your design is very cursed

14:10 <karolherbst> the fence lock needs to be taken before calling into dma_fence_signal_locked, right?

14:11 <karolherbst> nouveau design here is a little cursed anyway

14:12 <karolherbst> anyway, my point was rather, that those interfaces are hard to use correctly and we should at least try to warn on certain patterns violating API contracts

14:12 aravind has quit [Ping timeout: 480 seconds]

14:12 <karolherbst> I look at this dma-fence code and it immediately rings "people try to outsmart locking" bells

14:13 <karolherbst> and what nouveau seems to be doing is to have a global fence list lock and take this instead of taking the fences' own locks

14:14 <gfxstrand> uh oh...

14:14 <sima> karolherbst, it's maybe just badly documented, but we assume your driver can cope with concurrent calls to this

14:14 <gfxstrand> Don't try to outsmart the locking. That will not go well for you.

14:15 <karolherbst> well.. dma_fence_signal_locked states "Unlike dma_fence_signal(), this function must be called with &dma_fence.lock held" which nouveau absolutely doesn't :)

14:16 <karolherbst> and I'd rather have the kernel warn on violating this rule

14:16 <sima> karolherbst, oh that's clearly a bug

14:16 <karolherbst> or drop the rule and put whatever is the actual rule

14:16 <karolherbst> okay :)

14:16 <karolherbst> so we _should_ warn on using it incorrectly

14:16 <sima> karolherbst, enable lockdep and it will

14:16 <karolherbst> it didn't

14:16 <sima> dma_fence_signal_timestamp_locked() has lockdep_assert_held(fence->lock);

14:17 <karolherbst> huh...

14:17 <karolherbst> maybe I should check that out again, but I was sure that lockdep didn't tell me anything

14:17 <karolherbst> but yeah.. dma_fence_signal_timestamp_locked does indeed have an assert here...

14:17 <sima> karolherbst, you checked it's still running? lockdep gets disabled after the first splat

14:18 <karolherbst> uhhh... dunno actually

14:18 <sima> tbf it'd have surprised me if we'd indeed sucked that much

14:18 <sima> I've been sprinkling lockdep_assert_held and encouraged others to do the same for years now

14:18 <sima> they're both good documentation and good runtime checks

14:18 <karolherbst> yeah.. let me check that out again just to be sure

14:19 <karolherbst> maybe the problem was me running the full debug kernel and uhm.. doing other weird things

14:23 idr has joined #dri-devel

14:28 kzd has joined #dri-devel

14:31 benjaminl has joined #dri-devel

14:32 <sima> karolherbst, so looking at this again I think we're missing a load_acquire barrier before the various test_bit()

14:32 <karolherbst> potentially

14:33 <karolherbst> but `test_bit` isn't strictly an atomic operation, is it?

14:33 <sima> the test_and_set_bit is an rmw which on linux has full barriers

14:33 <sima> it is

14:33 <karolherbst> ahh, so it is

14:33 <sima> linux atomic bitops do not have atomic_ anywhere in their name

14:33 <karolherbst> silly

14:33 <sima> for entertainment value

14:33 <sima> the non-atomic versions have a __ prefix

14:33 <karolherbst> ....

14:34 <karolherbst> maybe we need a "keep naming closer to C11" patch

14:34 <karolherbst> the closest you can get to a CC all on the lkml

14:35 <sima> correction, only the rmw with return value have full barrier semantics
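A rough C11 mapping of the kernel bitop conventions sima describes, for readers who think in C11 terms: the plain names are atomic, value-returning RMWs such as `test_and_set_bit()` are fully ordered, non-returning ones such as `set_bit()` imply no ordering, and the `__`-prefixed forms are non-atomic. The memory orders chosen below approximate the kernel memory model rather than match it exactly, and the `c11_*` names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* kernel set_bit(): atomic RMW, no return value, no ordering -> relaxed */
static inline void c11_set_bit(unsigned nr, _Atomic unsigned long *word) {
    assert(nr < BITS_PER_WORD);
    atomic_fetch_or_explicit(word, 1UL << nr, memory_order_relaxed);
}

/* kernel test_and_set_bit(): value-returning RMW, fully ordered -> seq_cst */
static inline bool c11_test_and_set_bit(unsigned nr, _Atomic unsigned long *word) {
    assert(nr < BITS_PER_WORD);
    return atomic_fetch_or_explicit(word, 1UL << nr, memory_order_seq_cst) & (1UL << nr);
}

/* kernel test_bit(): a single load with no ordering -> relaxed. Later reads
   of data published before the bit was set need an acquire on top. */
static inline bool c11_test_bit(unsigned nr, _Atomic unsigned long *word) {
    assert(nr < BITS_PER_WORD);
    return atomic_load_explicit(word, memory_order_relaxed) & (1UL << nr);
}

/* kernel __set_bit(): the double-underscore form is not atomic at all */
static inline void c11_nonatomic_set_bit(unsigned nr, unsigned long *word) {
    assert(nr < BITS_PER_WORD);
    *word |= 1UL << nr;
}
```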

14:35 <karolherbst> pain

14:35 <karolherbst> it's still feels wrong

14:35 <karolherbst> s/it's/it/

14:36 <sima> so yeah we need a pile of smp_mb__after_atomic I think

14:36 <sima> for the "it's signalled already" case

14:36 <sima> plus a pile of comments
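A C11 sketch of the pairing sima has in mind for the "it's signalled already" fast path: the signaller publishes its data and then sets the flag with release semantics, and the lockless reader tests the flag with an acquire load, so observing the bit guarantees observing the data. `struct toy_fence` and its functions are hypothetical stand-ins, not the dma_fence API; in the kernel the reader-side ordering would come from something like `smp_load_acquire()` on the flags word.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_FENCE_SIGNALED_BIT 0

/* Hypothetical stand-in for a fence: `timestamp` is ordinary data that must
   be visible to any thread that observes the signaled bit. */
struct toy_fence {
    int64_t timestamp;
    _Atomic unsigned long flags;
};

/* Signaller: write the payload first, then set the bit with release ordering.
   Must be called at most once per fence. */
void toy_fence_signal(struct toy_fence *f, int64_t now) {
    assert(!(atomic_load_explicit(&f->flags, memory_order_relaxed) &
             (1UL << TOY_FENCE_SIGNALED_BIT)));
    f->timestamp = now;
    atomic_fetch_or_explicit(&f->flags, 1UL << TOY_FENCE_SIGNALED_BIT,
                             memory_order_release);
}

/* Lockless fast path: the acquire load pairs with the release above, so if
   the bit is observed, the timestamp written before it is observed too.
   With a relaxed load instead, a weakly ordered CPU could return a stale
   timestamp. */
bool toy_fence_is_signaled(struct toy_fence *f, int64_t *timestamp_out) {
    unsigned long flags = atomic_load_explicit(&f->flags, memory_order_acquire);
    if (!(flags & (1UL << TOY_FENCE_SIGNALED_BIT)))
        return false;
    if (timestamp_out)
        *timestamp_out = f->timestamp;
    return true;
}
```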

14:36 <karolherbst> yeah so my complaint about dma_fence_signal specifically is, that it appears to be a locked operation but strictly isn't

14:37 <karolherbst> ehh

14:37 <karolherbst> I meant the other one

14:37 <karolherbst> dma_fence_is_signaled

14:37 <sima> yeah it's only a conditional barrier

14:37 <sima> or well, supposed to be, it's a bit buggy in that regard

14:37 <karolherbst> yep

14:37 <sima> this follows the design of waitqueue and completion and everything else in the linux kernel

14:38 <sima> so yeah this is how this works

14:38 <karolherbst> pain

14:38 <karolherbst> it shouldn't

14:38 <sima> imo it's the right semantics

14:38 alyssa has left #dri-devel [#dri-devel]

14:38 <sima> for completions or anything that looks like one

14:38 <karolherbst> I disagree :P

14:38 <sima> if your completion needs locking your design seriously smells

14:38 <karolherbst> yeah, probably

14:39 <karolherbst> I'm sure nouveaus code there is kinda wrong anyway, but oh well

14:39 <karolherbst> the future is just to use Lina's abstractions on this probably anyway :P

14:39 benjaminl has quit [Ping timeout: 480 seconds]

14:40 <sima> yeah

14:40 <sima> in general, if the barrier semantics of core primitives (completion, work, kref, anything really) don't work for you

14:40 <sima> you're doing something really fishy

14:40 <sima> the atomics are lolz because they don't match C11 and have inconsistent naming

14:41 <karolherbst> yeah... dunno.. maybe they work, but nouveau doesn't take fence locks but instead its own lock across a list of fences

14:41 <karolherbst> so that's kinda fishy

14:41 <sima> but the other stuff is imo solid

14:41 <karolherbst> maybe it does so in a few places.. I found one where it doesn't

14:41 <sima> yeah if that fence list lock keeps the fence alive, then the irq handler might need to take it too

14:41 <sima> karolherbst, btw you're volunteering for the dma_fence barrier review patch?

14:41 <sima> it's fun, I promise :-P

14:42 <karolherbst> https://gitlab.freedesktop.org/drm/nouveau/-/blob/nouveau-next/drivers/gpu/drm/nouveau/nouveau_fence.c#L54

14:42 <karolherbst> call to `dma_fence_signal_locked`

14:42 <karolherbst> doesn't take fence->base.lock

14:42 <karolherbst> so maybe that's the actual bug here and everything just works after solving that

14:42 <karolherbst> or it should just use dma_fence_signal instead.. dunno

14:43 <karolherbst> I'll check if lockdep screams at me

14:43 <sima> karolherbst, it should splat, so maybe it holds the lock somehow?

14:43 <sima> (assuming lockdep is on and all that)

14:43 <karolherbst> yeah.. dunno

14:47 <karolherbst> it doesn't seem to be at all.. so that's kinda confusing

14:47 sukrutb has quit [Ping timeout: 480 seconds]

14:48 <karolherbst> let's see how hard it screams at me...

14:49 <karolherbst> sima: uhh... you said it disables after the first splat?

14:50 <karolherbst> remembered that weird mm lockdep discussion we had a few weeks ago? :D

14:52 benjaminl has joined #dri-devel

14:52 <sima> karolherbst, no memories of that?

14:52 <karolherbst> that stuff happening on driver init loading firmware

14:53 <sima> oh request_firmware in the wrong place

14:53 <sima> yeah that'll kill lockdep

14:53 <karolherbst> :')

14:53 <karolherbst> pain

14:53 <karolherbst> I'll turn that one into a WARN_ON thing then (the dma-fence one) and see if it triggers

14:55 <sima> yeah

14:55 elongbug has quit [Read error: Connection reset by peer]

14:55 <sima> but also, you have to fix lockdep splats

14:55 <karolherbst> right...

14:55 <sima> or the bugs start piling in real bad, real fast

14:55 <karolherbst> let me try to do it over the next few weeks :')

14:55 <karolherbst> I hope this doesn't fix like most of the nouveau instability problems

14:55 elongbug has joined #dri-devel

14:56 <sima> karolherbst, how old is that fw lockdep splat?

14:56 idr has quit [Remote host closed the connection]

14:56 idr has joined #dri-devel

14:58 <karolherbst> uhhh... dunno

14:58 <karolherbst> potentially old? I have no idea

15:00 jewins has joined #dri-devel

15:04 mbrost has joined #dri-devel

15:08 Duke`` has joined #dri-devel

15:09 lynxeye has quit [Quit: Leaving.]

15:10 <sima> karolherbst, just to have a guess of how much pain awaits you

15:10 benjaminl has quit [Ping timeout: 480 seconds]

15:10 <karolherbst> a lot

15:10 <sima> lockdep splats are an excellent canary for bad design and busted data structures

15:14 bmodem1 has quit [Ping timeout: 480 seconds]

15:23 digetx has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

15:28 tzimmermann has quit [Quit: Leaving]

15:30 Leopold___ has joined #dri-devel

15:34 Leopold_ has quit [Ping timeout: 480 seconds]

15:35 digetx has joined #dri-devel

15:51 vliaskov has quit [Remote host closed the connection]

15:51 benjaminl has joined #dri-devel

15:52 Dr_Who has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

15:54 Dr_Who has joined #dri-devel

15:58 swalker__ has quit [Remote host closed the connection]

15:59 tursulin has quit [Ping timeout: 480 seconds]

16:00 anujp has joined #dri-devel

16:12 benjaminl has quit [Ping timeout: 480 seconds]

16:13 f11f12 has quit [Quit: Leaving]

16:18 Dr_Who has quit []

16:19 Dr_Who has joined #dri-devel

16:25 benjaminl has joined #dri-devel

16:40 <mbrost> dakr: where is the latest version of gpuva? Also any idea if / when you plan on landing this upstream? We are fairly close to trying to land Xe with it.

16:40 idr has quit [Ping timeout: 480 seconds]

16:43 benjaminl has quit [Quit: WeeChat 3.8]

16:43 benjaminl has joined #dri-devel

16:47 Dr_Who has quit []

16:52 Dr_Who has joined #dri-devel

17:01 frankbinns has quit [Remote host closed the connection]

17:06 AndroUser2 has quit [Remote host closed the connection]

17:06 smiles_1111 has quit [Ping timeout: 480 seconds]

17:07 AndroUser2 has joined #dri-devel

17:11 anujp has quit [Remote host closed the connection]

17:14 anujp has joined #dri-devel

17:21 Dr_Who has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

17:38 iive has joined #dri-devel

17:40 HerrSpliet is now known as RSpliet

17:59 Guest3147 has quit [Remote host closed the connection]

17:59 Daanct12 has joined #dri-devel

18:19 AndroUser2 has quit [Remote host closed the connection]

18:19 AndroUser2 has joined #dri-devel

18:27 gouchi has joined #dri-devel

18:27 gouchi has quit [Remote host closed the connection]

18:30 ced117 has joined #dri-devel

18:53 AndroUser2 has quit [Remote host closed the connection]

18:58 AndroUser2 has joined #dri-devel

19:01 jkrzyszt has quit [Ping timeout: 480 seconds]

19:23 djbw_ has joined #dri-devel

19:39 <airlied> karolherbst: there was a lockdep splat we were seeing that was being ignored i think

19:39 <airlied> pretty sure i dont see it now after the fix

19:39 <karolherbst> yeah....

19:39 <airlied> but there is a new one to fix

19:40 <karolherbst> mind trying to reproduce it, because I sure can't

19:47 <jannau> is there a way to tell Xorg/modesetting to not use a kms device? preferably in a generic way like MatchDriver in OutputClass

19:48 <jannau> one option might be to patch modesetting to ignore "non-desktop" device
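Patching modesetting to skip "non-desktop" outputs comes down to a per-connector property check. Below is a minimal sketch of that idea. It assumes a hypothetical `struct conn_prop` array that stands in for the property list a real implementation would build from libdrm's `drmModeObjectGetProperties()` and `drmModeGetProperty()`. It is not Xorg's or modesetting's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified connector property (name + value). With libdrm
 * these would come from drmModeObjectGetProperties()/drmModeGetProperty(). */
struct conn_prop {
    const char *name;
    uint64_t value;
};

/* Returns true if the connector advertises "non-desktop" = 1, i.e. the
 * kernel says it should not become part of the desktop. */
static bool connector_is_non_desktop(const struct conn_prop *props, size_t count)
{
    assert(props != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(props[i].name, "non-desktop") == 0)
            return props[i].value != 0;
    }
    return false; /* property absent: treat as a normal desktop output */
}
```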

19:48 heat has quit [Read error: Connection reset by peer]

19:48 heat_ has joined #dri-devel

19:50 <jannau> we have now a working minimal implementation for the touchbar on apple silicon macbooks

19:52 Company has joined #dri-devel

19:54 <karolherbst> jannau: I suspect the plan is to have a special daemon take care of displaying stuff on it?

20:00 <jannau> karolherbst: yes, that exists now: https://github.com/WhatAmISupposedToPutHere/tiny-dfr

20:00 <karolherbst> I wonder if a dirty enough hack would be to ensure it starts before X

20:01 <jannau> but sddm/Xorg either grabs the touchbar kms device or fails to start if the daemon is already running

20:01 <jannau> unfortunately not

20:01 <karolherbst> uhh... right

20:01 <karolherbst> there was this drm lease thing, could that be useful?

20:01 <karolherbst> not sure it ever landed though

20:02 <karolherbst> ahh it did

20:02 <karolherbst> wayland docs: https://wayland.app/protocols/drm-lease-v1

20:02 sima has quit [Ping timeout: 480 seconds]

20:02 <karolherbst> the basic idea behind DRM leases is to hand over control of a device to a client

20:04 <karolherbst> but no idea if that also works in X

20:04 axeldavy has joined #dri-devel

20:04 <airlied> karolherbst: https://paste.centos.org/view/raw/36a38a6b was the classic one I've been seeing, which I think is the fw one you mentioned

20:05 <airlied> but I'm not seeing it at the moment in that form

20:05 <jannau> it works with gdm + plasma under wayland. either since those respect a multi-seat config or because of "non-desktop" = 1

20:05 <karolherbst> airlied: yeah.. it's kinda random it seems

20:06 <karolherbst> jannau: right... I think DRM leases would allow you to do this without relying on any kind of configs. But I also see why compositors/X shouldn't use the touchbar anyway, because it's really not a display in the common sense and it probably messes up things

20:06 <airlied> but yeah once that goes down you ain't seeing anything else

20:06 <karolherbst> but I think with DRM leases the daemon won't need root privileges

20:07 <karolherbst> but no idea how mature all of this is

20:07 axeldavy is now known as adavy

20:08 <karolherbst> but yeah.. I guess Xorg shouldn't use `non-desktop` displays at all, but not sure if that's the responsibility of the compositor in an X world or not

20:08 <karolherbst> and also not sure if we even care enough

20:09 <karolherbst> just let X die already 🙃

20:10 <airlied> there's a difference between non-desktop displays and non-desktop kms drivers though

20:18 <jannau> we're ready to kill X on apple silicon devices (declaring it broken and unsupported) but we need a sddm release with wayland support first

20:18 <DavidHeidelberg[m]> any last reviews before I merge the new farm ON/OFF handling? https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23629

20:19 <karolherbst> uhh... sddm still doesn't support wayland? mhhh

20:19 <karolherbst> annoying

20:19 <jannau> airlied: what is a non-desktop kms driver?

20:19 <airlied> well a kms driver that is attached to a single small display that isn't for desktop use

20:20 <jannau> sddm git supports wayland but the latest release (~2 years old) does not

20:20 <karolherbst> ahh, fair

20:21 <jannau> fedora has afaik a git snapshot with wayland support

20:24 <karolherbst> I think if nobody has something against Xorg ignoring such devices then it's fine, probably

20:31 <karolherbst> gfxstrand: I'm sure you are going to like this: OpMemoryBarrier with non constant semantics

20:33 <karolherbst> apparently that's legal in CL SPIR-V :)

20:34 <karolherbst> whyyyyyyy

20:35 frankbinns has joined #dri-devel

20:38 frankbinns1 has joined #dri-devel

20:39 <cmarcelo> karolherbst: ugh :( do you have a test around for that?

20:40 f11f12 has joined #dri-devel

20:40 <karolherbst> yes, but it's SYCL

20:40 <cmarcelo> karolherbst: can I run it using rusticl?

20:41 <jenatali> karolherbst: just apply all?

20:41 <karolherbst> uhhhh...

20:41 <karolherbst> cmarcelo: https://gist.github.com/karolherbst/2965158c53342bb7b3f87b85d97f07e2

20:41 <karolherbst> I think this is easier

20:42 <gfxstrand> Applying all should be a legal implementation

20:42 <karolherbst> maybe we could scan all possible values...

20:42 <karolherbst> but uhhhh

20:43 <jenatali> I could see a pass that gathers bcsels and phis and ors all the constant values, but you'd have to give up pretty easily once you get out of that pattern

20:43 <karolherbst> yeah...

20:44 frankbinns has quit [Ping timeout: 480 seconds]

20:44 <karolherbst> well... I think it has to resolve to constants

20:44 <karolherbst> so it's just a phi of constants in the end

20:44 <jenatali> Oh then that's not so bad

20:45 <jenatali> Could even do that straight in vtn

20:45 <karolherbst> maybe we could do this: if it all resolves to constant, use that, otherwise do all

20:45 <karolherbst> jenatali: well.. I suspect it could also be nested phis

20:45 <jenatali> Sure

20:45 <karolherbst> maybe llvm gets smart and adds alus on it and we get cursed spir-v

20:46 <jenatali> And then we give up lol

20:46 <jenatali> Unless those alus are bcsels

20:46 <karolherbst> I think it would still all constant fold

20:46 <karolherbst> maybe we should have a scoped barrier intrinsics with variable semantics

20:46 <karolherbst> and then after constant folding we resolve it

20:47 <karolherbst> and we shall call it cursed_scoped_barrier

20:48 <jenatali> Appropriate name at least

20:48 <jenatali> I'd just as soon put it straight in vtn and not try hard at all before giving up
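The strategy discussed above works like this: if the semantics operand resolves to constants (possibly through nested phis or bcsels), OR them together; otherwise fall back to "apply all". A minimal sketch follows. The `struct value` type is hypothetical and is neither NIR nor vtn. The depth limit of 8 and the exact "apply all" mask are assumptions. The bit values are the MemorySemantics bits from the SPIR-V specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MemorySemantics bits as defined by the SPIR-V specification. */
enum {
    SEM_ACQUIRE                = 0x2,
    SEM_RELEASE                = 0x4,
    SEM_ACQUIRE_RELEASE        = 0x8,
    SEM_SEQ_CST                = 0x10,
    SEM_SUBGROUP_MEMORY        = 0x80,
    SEM_WORKGROUP_MEMORY       = 0x100,
    SEM_CROSS_WORKGROUP_MEMORY = 0x200,
    SEM_IMAGE_MEMORY           = 0x800,
};

#define SEM_ORDER_MASK (SEM_ACQUIRE | SEM_RELEASE | SEM_ACQUIRE_RELEASE | SEM_SEQ_CST)

/* Assumed "apply all" fallback: acquire+release on every storage class CL uses. */
#define SEM_ALL (SEM_ACQUIRE_RELEASE | SEM_SUBGROUP_MEMORY | SEM_WORKGROUP_MEMORY | \
                 SEM_CROSS_WORKGROUP_MEMORY | SEM_IMAGE_MEMORY)

/* Hypothetical, heavily simplified SSA value. */
enum value_kind { VAL_CONST, VAL_PHI, VAL_BCSEL, VAL_OTHER };

struct value {
    enum value_kind kind;
    uint32_t constant;            /* VAL_CONST only */
    const struct value *srcs[4];  /* phi sources, or the bcsel's two data sources */
    unsigned num_srcs;
};

/* ORs every constant reachable through phis/bcsels into *acc. Returns -1
 * (give up) on any other kind of value or once the chain gets too deep,
 * which also stops loop phis that feed back into themselves. */
static int gather_semantics(const struct value *v, unsigned depth, uint32_t *acc)
{
    if (depth > 8)
        return -1;
    switch (v->kind) {
    case VAL_CONST:
        *acc |= v->constant;
        return 0;
    case VAL_PHI:
    case VAL_BCSEL:
        for (unsigned i = 0; i < v->num_srcs; i++)
            if (gather_semantics(v->srcs[i], depth + 1, acc) != 0)
                return -1;
        return 0;
    default:
        return -1;
    }
}

static uint32_t resolve_barrier_semantics(const struct value *semantics)
{
    uint32_t acc = 0;
    assert(semantics != NULL);
    if (gather_semantics(semantics, 0, &acc) != 0)
        return SEM_ALL;

    /* SPIR-V allows at most one ordering bit; if the union has several,
     * promote to an order that covers all of them. */
    uint32_t order = acc & SEM_ORDER_MASK;
    if (order & (order - 1)) {
        acc &= ~(uint32_t)SEM_ORDER_MASK;
        acc |= (order & SEM_SEQ_CST) ? SEM_SEQ_CST : SEM_ACQUIRE_RELEASE;
    }
    return acc;
}
```

One design choice needs explaining. ORing constants from different paths can produce Acquire|Release, which is not a valid combination. The sketch therefore normalizes that case to AcquireRelease, which is at least as strong as either input.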

20:48 <karolherbst> I'll deal with lower hanging fruits for now

20:48 <karolherbst> jenatali: btw, are you subscribed to the OpenCL label?

20:49 <jenatali> I am

20:49 <karolherbst> ahh, okay

20:49 <karolherbst> I have more additions to clc :D

20:49 kasper93 has joined #dri-devel

20:49 <jenatali> I'll be honest though I haven't been paying too much attention, just following to make sure we don't get broken

20:50 <karolherbst> I'm just adding options to the validator

20:50 <karolherbst> as apparently the validator checks number of function args and....

20:50 <jenatali> Oh I saw that one. I'm not in the office today but remind me on Monday and I can take a look

20:50 <karolherbst> cool :)

20:51 <karolherbst> I also fixed a bug, because apparently if you pass options, the "src" is a null string killing the logger :)

20:51 Duke`` has quit [Ping timeout: 480 seconds]

20:51 <jenatali> Hah

20:51 <karolherbst> and I was confused about those "(file=" errors :)

21:07 mbrost_ has joined #dri-devel

21:12 mbrost has quit [Ping timeout: 480 seconds]

21:14 bgs has quit [Remote host closed the connection]

21:18 mbrost_ has quit [Ping timeout: 480 seconds]

21:21 frankbinns1 has quit [Remote host closed the connection]

21:21 rasterman has quit [Quit: Gettin' stinky!]

21:41 <jannau> airlied: looks like rejecting devices with max_width or max_height smaller than 120 pixels (arbitrarily chosen) would be easier than checking the non-desktop connector property

21:46 <jannau> a desktop at a resolution smaller than 320x240 would be painful; a threshold below a quarter of that hopefully doesn't hit too many fringe use cases
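The size heuristic can be expressed as a simple predicate on the device's maximum framebuffer dimensions. With libdrm those are the `max_width`/`max_height` fields of the `drmModeRes` returned by `drmModeGetResources()`. The 120-pixel threshold is the arbitrary value from the discussion. For reference, 160x120 is a quarter of 320x240 by area, so 120 matches its shorter side. This is only a sketch, not a proposed Xorg patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Threshold from the discussion, in pixels; chosen arbitrarily. */
#define MIN_DESKTOP_DIMENSION 120u

/* Returns true if a KMS device's maximum framebuffer size is too small
 * to host a desktop (e.g. a touchbar-style panel driver). */
static bool kms_device_too_small_for_desktop(unsigned max_width, unsigned max_height)
{
    return max_width < MIN_DESKTOP_DIMENSION || max_height < MIN_DESKTOP_DIMENSION;
}
```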

21:51 AndroUser2 has quit [Ping timeout: 480 seconds]

22:06 TMM has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

22:06 TMM has joined #dri-devel

22:11 kasper93_ has joined #dri-devel

22:16 kasper93 has quit [Ping timeout: 480 seconds]

22:19 kasper93_ has quit [Ping timeout: 480 seconds]

22:22 jfalempe has quit [Quit: Leaving]

22:26 <DavidHeidelberg[m]> Merged the updated farm handling. If for some reason you see an anomaly when testing an MR or running Marge, please ping me; it could be related. Just to be sure!

22:27 <DavidHeidelberg[m]> *anomaly = missing or having too many farms inside the pipeline

22:38 jewins has quit [Quit: jewins]

22:38 Kayden has quit [Remote host closed the connection]

22:38 jewins has joined #dri-devel

22:41 Kayden has joined #dri-devel

22:49 frankbinns1 has joined #dri-devel

22:57 jewins has quit [Quit: jewins]

22:57 jewins has joined #dri-devel

23:03 Company has quit [Quit: Leaving]

23:05 iive has quit [Quit: They came for me...]

23:16 Dr_Who has joined #dri-devel

23:20 <karolherbst> airlied: wanna review the llvmpipe part of https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22893 ? thanks

23:21 <karolherbst> Kayden: I know I kinda hand waved a lot on this topic in the past, but what's the actual deal with the SIMD situation with iris? 8, 16, 32 are generally supported? Or can certain CSOs only support certain SIMD sizes, and how would I know? And it appears that 16 is somehow the one which is preferred?

23:24 Dr_Who has quit [Ping timeout: 480 seconds]

23:27 benjaminl has quit [Ping timeout: 480 seconds]

23:29 <Kayden> for compute, it depends on the local workgroup size

23:29 <Kayden> the shaders can run in 8, 16, or 32 channels

23:30 <Kayden> well...be dispatched in groups of 8, 16, or 32 lanes

23:30 <karolherbst> right... so I started to implement CL subgroups and they are kinda annoying

23:31 <Kayden> not surprising

23:31 <karolherbst> sooo.. on some gpus it appears you can launch 56 subgroups max, which doesn't work with e.g. SIMD16 if you want to launch 1024 threads

23:31 <Kayden> yeah. for those, we force to SIMD32 :/
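The constraint here is that each hardware thread covers `width` lanes, so a workgroup needs ceil(size / width) threads, and that count must fit the per-workgroup thread limit. The sketch below prefers SIMD16 and falls back to SIMD32 only when forced, following the discussion. It deliberately ignores SIMD8, register pressure, and the other factors iris's compiler actually weighs. The function name is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: choose a dispatch width for a workgroup on a GPU
 * that allows at most max_hw_threads hardware threads per workgroup.
 * Preference order: SIMD16, then SIMD32 only when SIMD16 doesn't fit. */
static unsigned pick_simd_width(unsigned workgroup_size, unsigned max_hw_threads)
{
    static const unsigned widths[] = { 16, 32 };
    assert(workgroup_size > 0 && max_hw_threads > 0);
    for (unsigned i = 0; i < sizeof(widths) / sizeof(widths[0]); i++) {
        unsigned threads = (workgroup_size + widths[i] - 1) / widths[i]; /* ceil */
        if (threads <= max_hw_threads)
            return widths[i];
    }
    return 0; /* too large for any supported width */
}
```

Using the numbers from the log, a 1024-invocation workgroup needs 64 SIMD16 threads. On a part with a limit of 56, the sketch therefore falls back to SIMD32, which needs 32 threads. With a limit of 64, SIMD16 fits exactly.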

23:31 <karolherbst> so I'm mostly trying to understand what's like the optimal SIMD size in general

23:31 <karolherbst> or rather what gives more perf

23:32 <karolherbst> I'm planning to add a cb to get the SIMD size for a given block: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22893/diffs?commit_id=50245f910d14bba01393c871be75c71463a07701

23:32 <karolherbst> so atm it's mostly just "what's optimal"

23:32 <karolherbst> e.g. Intel seems to limit to 256 threads in CL

23:32 <karolherbst> and I suspect it's because of their messy SIMD situation :)

23:32 <Kayden> Almost nothing can run in SIMD32, so those programs are basically pairs of SIMD16 instructions for each half

23:32 <Kayden> so we only do that when necessary. it can perform better for things like simple pixel shaders

23:33 <Kayden> but it's often really bad for register pressure too

23:33 <karolherbst> I see

23:33 <Kayden> SIMD16 is usually the sweet spot as long as your register pressure isn't terrible

23:33 <Kayden> memory access can normally happen 16 channels at a time

23:33 <karolherbst> maybe I should do some benchmarks and force a SIMD mode and see how it goes

23:33 <karolherbst> okay

23:33 <karolherbst> mhhh

23:33 AndroUser2 has joined #dri-devel

23:34 <karolherbst> I wonder if I want to limit to max_subgroups * preferred_simd_size (being 16) for iris

23:34 <karolherbst> Kayden: btw, "devinfo->max_cs_workgroup_threads" is the amount of subgroups, right?

23:34 <karolherbst> or is there a different limit I'm not aware of

23:35 <Kayden> I guess so

23:35 <karolherbst> I was seeing worse perf compared to Intel's stack, so I'm actually wondering if this has anything to do with the SIMD size forced based on the block size

23:35 <Kayden> I was thinking max_cs_threads but it looks like that's basically clamped

23:35 <Kayden> err

23:35 <Kayden> max_cs_workgroup_threads is a clamped version of max_cs_threads

23:35 <Kayden> so it's probably what you want, yeah

23:35 <karolherbst> it reports 56 on gen0.5

23:35 <karolherbst> *9.5

23:36 <karolherbst> and 64 on gen12

23:36 <karolherbst> which kinda makes sense

23:36 <Kayden> mmm

23:37 <Kayden> yeah it's clamped to 64 based on the bit-width of a GPGPU_WALKER field...

23:37 <Kayden> you can actually have 112 threads on gen12...

23:37 frankbinns1 has quit [Remote host closed the connection]

23:37 <karolherbst> huh

23:37 <Kayden> I think we're dispatching height 1 row N grids, I guess we'd have to do a rectangular grid

23:38 <karolherbst> yeah well.. doesn't really matter as long as max_cs_workgroup_threads * 16 stays at or above the max threads

23:38 <karolherbst> on my 9.5 it's weird because 56 * 16 < 1024

23:38 frankbinns1 has joined #dri-devel

23:39 <karolherbst> so if I report 1024 threads, applications might just enqueue 1024 threads per block and force SIMD32
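The alternative raised above is to advertise no more than max_subgroups * preferred_simd_size. Any workgroup an application enqueues would then still fit at SIMD16, instead of forcing SIMD32. A minimal sketch, assuming a hypothetical helper and an API-level cap of 1024:

```c
#include <assert.h>

/* Hypothetical helper: the largest workgroup size to advertise so that any
 * workgroup up to that size still fits at the preferred SIMD width. */
static unsigned max_workgroup_size_at_width(unsigned max_subgroups,
                                            unsigned preferred_width,
                                            unsigned api_limit)
{
    assert(max_subgroups > 0 && preferred_width > 0);
    unsigned fits = max_subgroups * preferred_width;
    return fits < api_limit ? fits : api_limit;
}
```

With the values reported in the log, this gives 56 * 16 = 896 on gen9.5 and 64 * 16 = 1024 on gen12. Intel's own stack is more conservative still, reporting 256.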

23:41 <karolherbst> but again, intel only reports 256 threads and 8 as the subgroup size, so maybe that allows for even more optimized code? no clue.. it's kinda weird

23:43 <karolherbst> the one advantage I have is that if an application does not specify the block size I can pick freely, so for this alone it would already be helpful to know what's the best to pick

23:44 <karolherbst> if e.g. SIMD8 is generally the fastest, I'd just use that

23:44 <karolherbst> unless the app sets constraints making that impossible

23:45 jewins has quit [Ping timeout: 480 seconds]

23:48 <Kayden> yeah, awkwardly I think SIMD16 is the best, but it really depends :/

23:56 ngcortes has joined #dri-devel

23:59 AndroUser2 has quit [Remote host closed the connection]