* airlied opens a can of self-created worms in draw with primitive ids
<JoshuaAshton> airlied: Pixel ordered primitive IDs?
<airlied> JoshuaAshton: hehe, not just broken primitive id passing between tess/geom/frag, I kinda ignored it before, now I can't avoid it
<JoshuaAshton> ah
<danvet> mripard, do you have time perhaps for "[PATCH 05/11] drm/atomic-helper: make drm_gem_plane_helper_prepare_fb the default" and later patches in that series?
<mripard> danvet: acked-by for all of them
<danvet> mripard, did you double-check I didn't butcher logic too much, like what noralf spotted?
* danvet occasionally very blind
<drawat> Hi, could I get an ack/review for https://lists.freedesktop.org/archives/dri-devel/2021-May/307338.html ? It's been a while since I contributed to dri-devel, and I'm not sure if my commit rights to drm-misc are still there. Any way to check that?
<emersion> drawat: `ssh <username>@git.freedesktop.org` should tell you. "permission denied" means negative, "Missing original command" means positive
<emersion> well, that doesn't check whether you can push to drm-misc specifically
<thellstrom> danvet, airlied: Ack to move the fast WC memcpy from i915 to drm? https://patchwork.freedesktop.org/patch/435252/?series=90022&rev=4
<danvet> thellstrom, Documentation/gpu/drm-mm.rst include is missing
<danvet> also thematically I think this fits into the drm_clflush.c hacks we also have ...
<danvet> also I have no idea what _dbm means
<danvet> I think _iomem is the better suffix there?
<danvet> (maybe needs a mv drm_clflush.c drm_memory.c or something like that)
<danvet> aside from bikesheds I think it's all ok to have in drm core
<danvet> thellstrom, maybe one more: shouldn't we pull the fallback into these functions?
<danvet> fallback as in the right version of memcpy/memcpy_fromio
* danvet looks and realizes maybe shouldn't have looked
<danvet> I have no idea what the fallbacks even do from a cursory look ...
<pcercuei> dbm == decibels, no? :)
<thellstrom> We could probably move in the callbacks if needed, but at least some paths in i915 appear to deliberately not have fallbacks.
<danvet> yeah, but why?
<danvet> I'd expect such a helper to be essentially memcpy_but_faster
<danvet> but it's not
<danvet> so before we lift it to something subsystem-sanctioned, we should answer that
<danvet> and decide whether that's a good reason or not to have this explicit fallback
<danvet> or whether it's just complexity because we can, of which there's unfortunately way too much in i915 gem all over
<danvet> I can't tell from looking at it quickly, that's for sure ...
<thellstrom> OK, I'll take a look at that.
<danvet> thellstrom, we don't have to fix it all right away, we could leave the i915 version around and take a note about fixing it later imo
<danvet> if the answer is "doesn't make that much sense really"
<thellstrom> danvet, That sounds like a better option. Then we can do the dma-buf-map version only in drm and lift the fallbacks from TTM.
<mripard> danvet: yep, as far as I can tell they look good aside from what Noralf pointed out
<danvet> mripard, did you reply on-list?
* danvet somewhat buried ...
<karolherbst> dschuermann: I guess this is fine, but I can do a test run later
<dschuermann> if you prefer to, you can also create a bug report later :P
<karolherbst> dschuermann: well.. we are using nir on volta+
<pq> in "shared fence", does "shared" refer to letting multiple actors continue simultaneously or cross-device fences?
<pq> or something else?
<danvet> pq, multiple concurrent actors
<danvet> might or might not be cross device
<pq> thanks, so it does mean what I thought it means
<daniels> shared == read, excl == write
<daniels> so excl synchronises against everything before it and blocks everything after it, whereas shared only synchronises against any prior excl
adjtm_ is now known as adjtm
<emersion> is producer/consumer a good enough approximation?
<karolherbst> dschuermann: best case you get the result in a few hours
<dschuermann> karolherbst: thx!
<daniels> emersion: yeah, that works too
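The shared/exclusive semantics daniels describes can be sketched as a small model. This is a hypothetical illustration of the dependency rules (shared == read, excl == write), not actual dma_resv kernel code; the class and method names are made up:

```python
# Model of the fence semantics from the discussion above: an exclusive
# (write) fence synchronises against everything before it and blocks
# everything after it, while a shared (read) fence only synchronises
# against the prior exclusive fence.

class ResvModel:
    def __init__(self):
        self.excl = None        # last exclusive (write) fence
        self.shared = []        # shared (read) fences since the last excl

    def add_shared(self, fence):
        # A reader only needs to wait for the last writer.
        deps = [self.excl] if self.excl else []
        self.shared.append(fence)
        return deps

    def add_excl(self, fence):
        # A writer waits for the last writer and all readers since then.
        deps = ([self.excl] if self.excl else []) + self.shared
        self.excl = fence
        self.shared = []
        return deps
```

In this model two readers after a write both depend only on that write (they can run concurrently), matching the producer/consumer approximation emersion suggests.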
<alyssa> what's the semantic for load_barycentric_pixel?
<jekstrand> It loads the barycentric coordinates for the center of the pixel
<karolherbst> dschuermann: ehh.. I just wanted to say that I didn't find any regressions, but it failed to build and install :D
<jekstrand> As opposed to a particular sample or a particular offset from center
<karolherbst> llvmpipe build fails
<jekstrand> alyssa: ^^
<alyssa> jekstrand: Ah..
<alyssa> bifrost has a "load at center" mode and a "load at sample position if per-sample shading, else load at center" mode
<alyssa> and a particular optimization requires the latter mode is used
<karolherbst> alyssa: uhh.. that reminds me that our instructions are also a bit annoying here. So you can specify a mode on the load instruction, but you can overwrite it through metadata as well :/
<jekstrand> On older hardware, we go out of our way to look for interpolate+load_barycentric combinations and handle them as a fused op. On ICL+, we handle them as two things.
<jekstrand> For at_sample without MSAA, we have a pass which smashes to center
<alyssa> All of our hw is interpolate+load_barycentric fused :o
<alyssa> Mali tries really hard to avoid shader keys. It's kinda cute.
<alyssa> Just as the hw influences the sw, the sw (LLVM) influences the hw ;p
<karolherbst> alyssa: yeah.. it's also kind of fused for us, but also kind of not
<jekstrand> It's not actually fused for us. The barycentrics come in with the payload and we interpolate. It's just that we have PLN pre-ICL and it's a tricky instruction with some "special" semantics. It's easier to just fuse.
<imirkin> karolherbst: it's *pretty* fused. as opposed to e.g. amd where you can just get the i/j coords
<karolherbst> yeah
<karolherbst> sure, it's more fused than unfused
<karolherbst> some details are just a little annoying
<imirkin> like the offsets being in some weird fixed point format? :)
<karolherbst> yep
<dschuermann> karolherbst: the build should be fixed now, said pendingchaos (can I register somewhere as an IRC bot? :D )
<karolherbst> yeah, it's fine
<karolherbst> I am seeing a few regressions though...
<karolherbst> I will take a look and comment on the MR
<jekstrand> It's unfused on turing, though, isn't it?
<imirkin> jekstrand: what makes you say that?
<imirkin> i don't really see that, but perhaps it's hidden
<imirkin> or perhaps i don't understand what 'fused' is
<imirkin> (or the most likely option -- both!)
<karolherbst> it's less fused than previous gens
<karolherbst> imirkin: check the interp lowering for gv100
<jekstrand> imirkin: They added an NV extension for explicit barycentrics
<imirkin> karolherbst: yeah, for PINTERP, but it's not too different than usual
<imirkin> jekstrand: ah, probably that stuff is accessible now
<jekstrand> Which is even niftier than the AMD one
<karolherbst> jekstrand: I think this is a bit optional though
<imirkin> but it doesn't "have" to be used
<jekstrand> sure
<jekstrand> I could believe that
<imirkin> in a couple gens they'll drop the old way
<karolherbst> probably
<imirkin> or perhaps there's enough benefit in maintaining the "common" case in hw
<karolherbst> imirkin: I guess because general compute becomes more and more important, they will probably use the space for something else :D
<karolherbst> but maybe it doesn't matter..
<karolherbst> mhh dschuermann: "KHR-GL46.gpu_shader_fp64.builtin.isnan" regresses :/
<dschuermann> do you have the NIR shader at hand?
<karolherbst> currently bisecting
<karolherbst> it's probably something trivial though... oh well..
<karolherbst> ehhh
<karolherbst> not all commits compile
<dschuermann> lovely :)
<karolherbst> the vertex shader is a nop
<karolherbst> dschuermann: ahh, llvmpipe also regresses :)
<pendingchaos> seems it's because the comparison isn't marked exact, nir_opt_algebraic assumes the operands are not NaN
<karolherbst> probably
<karolherbst> pendingchaos: I just hope most of your stat changes are not because of isnan being wrongly implemented now :p
<Venemo> in NIR, do we have a good way to express that a divergent branch is always taken?
<Venemo> if not, what would be the right way to approach that?
<karolherbst> Venemo: we have a divergence analysis pass if that helps
<bnieuwenhuizen> a branch that is always taken is not divergent right?
<karolherbst> but it's mainly to find uniform values
<karolherbst> so if the condition is uniform your know which branch will be taken
<karolherbst> sometimes
<Venemo> not exactly
<karolherbst> ehh wait
<pendingchaos> Venemo: you mean that at least one invocation in the subgroup takes the branch?
<Venemo> yes, that's what I mean
<karolherbst> you know if the branch is uniformly taken
<karolherbst> ahh
<Venemo> I know I know about the divergence analysis, but that's not what I need now
<karolherbst> yeah sorry.. I misunderstood what you asked for :)
<Venemo> sometimes you can know that at least 1 invocation will be active in a block, always
<Venemo> for example, with elect, or thread_id<N (where you can prove N!=0)
<karolherbst> yeah.. I guess for that you probably have to write a pass analysing the conditions
<Venemo> these are divergent
<Venemo> would it be allright to add a boolean field to nir_if to let the backend know this?
<karolherbst> we already added stuff for divergence, so I guess it would be :D
<Venemo> this is not the same as divergence
<karolherbst> I know
<karolherbst> but if you are worried about using more space
<karolherbst> the divergent bool is 1 byte and you can bitfield it
<Venemo> I'm not worried :)
<karolherbst> ehh wait
<karolherbst> wrong struct
<karolherbst> Venemo: okay.. :D
<Venemo> currently we always emit a branching instruction for divergent branches. the reason I'm interested in this is in order not to emit that instruction when we know a branch is always taken by at least 1 invocation
<dschuermann> Venemo: pendingchaos wrote something like that for the atomic optimization
<dschuermann> that detects if some branch is taken by one invocation. it's not used for anything but to skip the optimization in this case ;)
<Venemo> can you point me to where that is?
<Venemo> is it nir_opt_uniform_atomics?
<pendingchaos> yes, is_atomic_already_optimized()
<pendingchaos> it can return true for branches which are taken by more than one invocation though
<Venemo> that's not an issue
<pendingchaos> (not likely in any realistic code though)
<Venemo> I'd like true for branches that are taken by >= 1 invocations
<dschuermann> we could probably generalize some analysis like that and flag cf_nodes which are always taken by at least one invocation
<dschuermann> Venemo: I don't really see that giving an edge over the vskip heuristic, though
<pendingchaos> not sure if it's possible for is_atomic_already_optimized() to return true for branches which are not taken by any invocations
<Venemo> if the branch isn't taken by any invocations, then it should be false, yeah.
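The analysis Venemo and pendingchaos discuss — proving that at least one invocation in the subgroup takes a divergent branch — can be sketched as a condition classifier. This is a hypothetical model; the condition encodings and the function name are illustrative, not actual NIR API:

```python
# Decide whether a branch condition guarantees >= 1 active invocation.
# Conditions are modelled as tuples like ("elect",) or
# ("invocation_id_lt", N), standing in for real NIR instructions.

def at_least_one_invocation(cond):
    kind, *args = cond
    if kind == "elect":
        # subgroupElect() is true for exactly one active invocation.
        return True
    if kind == "invocation_id_lt":
        # invocation_id < N is taken by invocation 0 whenever N > 0
        # (this is the "thread_id < N where you can prove N != 0" case).
        (n,) = args
        return isinstance(n, int) and n > 0
    # Anything else: can't prove it, conservatively assume the branch
    # may be empty.
    return False
```

A pass generalising this could flag qualifying `nir_if` nodes, as dschuermann suggests, so a backend can skip emitting the branching instruction.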
<danvet> daniels, since you typed up the X11 and wayland version, how's Xwayland different?
<danvet> for completeness
* danvet enjoyed that read
<pendingchaos> karolherbst: I've added "glsl,glsl/nir: emit exact comparisons for isnan() and isinf()" to the MR and the test now passes on radeonsi
<karolherbst> pendingchaos: yeah, that fixes it for llvmpipe and nouveau as well
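The isnan() bug pendingchaos diagnosed can be illustrated in a few lines. This is a model of the miscompile, not the actual Mesa lowering code:

```python
# isnan(x) is conventionally lowered to (x != x): NaN is the only
# floating-point value that compares unequal to itself.

def isnan_lowered(x):
    return x != x

# If the comparison is not marked exact, an algebraic optimizer that
# assumes operands are never NaN may fold (x != x) to false -- the
# regression seen in KHR-GL46.gpu_shader_fp64.builtin.isnan.

def isnan_after_bad_fold(x):
    return False
```

Marking the comparison exact, as the "glsl,glsl/nir: emit exact comparisons for isnan() and isinf()" commit does, prevents that fold.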
<thellstrom> danvet, airlied: Also ack for Maarten's drm vma patch https://patchwork.freedesktop.org/patch/435262/?series=90022&rev=4 ?
<danvet> a-b: me
<danvet> maybe include a bit more blabla on this one, since iirc the plan is that this is only temporary?
<danvet> or am I confused
<daniels> danvet: the common case for Xwayland is that it's just passthrough to the compositor, so if you were to type up DRI3.2 with UMF, then you pass the UMF through and let upstream compositor deal with it; if anyone ever does XGetImage, or if you need to do X11-internal composition (e.g. subwindow trees), you fall back to a dumb spin
<danvet> daniels, so all the X11 fun you described just doesn't apply to Xwayland?
<danvet> everyone gets their own private buffer and all that goes along with that?
<alyssa> this might be the most cryptic commit I've seen in ages 😁
<ccr> eh
<karolherbst> yeah...
<karolherbst> that's not how you do open source :p
<alyssa> 🍿
<jekstrand> uh....
<jekstrand> Feel free to comment and ask for a better commit message
<alyssa> jekstrand: I'm not sure I want one 😉
<daniels> danvet: it's another boundary condition, really
<daniels> danvet: if you just do the sensible straight-line thing of having a single non-parented window which you send DRI content to, you get the lovely fast path
<jekstrand> alyssa: It's probably just "Oops, we can't actually do that many threads"
<daniels> danvet: if you want to get weird and do XGetImage to ask the X server to tell you what you just sent it because you're amnesiac, or you want to do X11 native rendering, or you have X11 subwindows, then you get to eat the pain of the X server blocking on you
<alyssa> jekstrand: Yeah, I gathered that from 14013840143
<bnieuwenhuizen> jekstrand: also fun if you have chicken bits in registers whose name basically consists of the workaround number :P
<daniels> jekstrand: ah yes, the classic off-by-twelve
<mwk> well they overestimated the capacity of their shader geometry by 12, obviously
<mwk> that's what this commit is doing
<alyssa> jekstrand: Basically I'm trying to understand the rules for fragment inputs that are neither centroid nor sample
<alyssa> NIR is feeding them in as `pixel` but that seems to be a choice on our part, not a spec requirement? At least in ESSL?
<jekstrand> Uh....
<imirkin> alyssa: you're supposed to interpolate at the center
<imirkin> except for various cases
<jekstrand> ^^
<jekstrand> That's always the kicker, isn't it?
<imirkin> like MSAA you're supposed to interpolate at sample, no matter what
<jekstrand> Those "various cases" :P
<alyssa> imirkin: various cases indeed
<imirkin> except ... not no matter what?
<imirkin> i forget
<imirkin> i've long-ago paged out those rules
<alyssa> bleh
<imirkin> BUT
<imirkin> as long as you're not trying to CHANGE the rules
<alyssa> hw fast path is for "sample if per-sample shading, center otherwise"
<idr> imirkin: MSAA is still pixel center. That's why centroid was invented... because the pixel center might not be covered by any of the samples.
<imirkin> idr: right, i realized that as i remembered the extence of the 'sample' thing
<idr> MSAA w/o per-sample shading, anyway.
<alyssa> and I have no idea what that corresponds to in GLSL/NIR
<imirkin> and yeah, i should have said per-sample shading
<imirkin> idr: like what happens if you do per-sample shading but don't have any qualifiers? still center?
<imirkin> i forget :)
<alyssa> I think that's supposed to be sample
<idr> The great thing about standards. ;)
<imirkin> alyssa: i think it depends
<imirkin> but mesa normalizes all that for you
<imirkin> which is nice.
<alyssa> usually, yes
<imirkin> alyssa: for example, there's a rast->force_persample_interp
<imirkin> which is a rasterizer-level setting which forces you to interpolate at sample rather than at center
<alyssa> now it's less nice because I need to unnormalize it for the opt >_>
<imirkin> now you could go around complaining about its existence
<imirkin> but the reality is that it exists :)
<imirkin> (actually there's a PIPE_CAP for it ... if you don't support it, you just get shader recompiles)
<alyssa> sure
<imirkin> so basically just do the interp that the shader tells you
<imirkin> and all will be well
<alyssa> unfortunately that never corresponds to the fast path unless I add is_sample_shading to the key
<alyssa> (i.e. `set_min_samples` triggering recompiles)
<imirkin> right
<imirkin> so the thing is
<jekstrand> So, I suspect that was invented for the "default" case
<imirkin> a shader may be used per-sample
<imirkin> and not-per-sample
<alyssa> yep
<imirkin> so if you want diff code for those cases
<imirkin> then a shader key feels like the only way (or binary fixup)
<imirkin> for nouveau, we do binary fixups :)
<imirkin> slight change to the interp op iirc
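The two strategies imirkin contrasts — a shader key versus a binary fixup — can be sketched side by side. This is an illustrative model only; the function names and the instruction encoding are made up, not nouveau or panfrost code:

```python
# Shader-key approach: the interpolation location is baked in at compile
# time, so toggling per-sample shading (set_min_samples) means compiling
# a second variant of the same shader.

def compile_fs(key_per_sample):
    loc = "sample" if key_per_sample else "center"
    return [("interp", loc)]

# Binary-fixup approach (what nouveau does): compile once, then flip the
# interp location in place when the rasterizer state changes, avoiding a
# recompile.

def fixup_binary(binary, per_sample):
    loc = "sample" if per_sample else "center"
    return [("interp", loc) if op[0] == "interp" else op for op in binary]
```

Hardware with a "sample if per-sample shading, else center" interpolation mode, like the bifrost path alyssa describes, effectively does the fixup for free.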
<jekstrand> If no one uses any sample qualifiers and there's no gl_SampleID, then you only run once per-fragment unless....
<alyssa> arm why couldn't you have just given me an extra sample bit
<alyssa> jekstrand: unless set_min_samples(N>1)
<karolherbst> alyssa: you know what we do? we flip bits in the compiled shader :D
<jekstrand> someone smashes the shading rate thing (can't remember what it's called), in which case you run MSAA and everything's per-sample.
<karolherbst> ahh imirkin already mentioned it
<imirkin> =]
<alyssa> jekstrand: right.. but then the NIR code is still interp_pixel
<imirkin> karolherbst: almost as if i know something about that one...
<karolherbst> :D
<karolherbst> I had too much fun with that myself
<imirkin> karolherbst: it's a good one.
<alyssa> jekstrand: so if I follow the NIR, I have to do all interp at center. But would it also be ok to interp at the samples? I dunno!
<imirkin> alyssa: you think that's annoying?
<imirkin> try this -- gl_SampleMaskIn
<imirkin> most hardware provides a coverage mask
<imirkin> but you're supposed to give it a 1-bit mask when per-sample shading
<jekstrand> alyssa: Do you need to use this fancy opcode or can you just shader key it?
<imirkin> HOWEVER
<imirkin> a single shader may be used both per-sample and per-pixel
<imirkin> so ... yeah. fun times.
<karolherbst> it was a mistake to make the shader depend on runtime settings :p
<imirkin> even worse -- there's no hardware (on nvidia) way of determining which samples are actually going to be "used" for output if the sample rate is below the "msaa" rate
<imirkin> so you have to force min samples == total samples if gl_SampleMaskIn is used
<jekstrand> imirkin: It gets even better on Intel. On ICL, they "accidentally" anded the coverage mask with some stuff to make it even more useless. :D
<karolherbst> isn't stuff like that actually fixed with vulkan?
<alyssa> jekstrand: I can key it, I'm just hoping I can spec lawyer my way out of it
<karolherbst> or are there still those implicit recompiles?
<imirkin> jekstrand: lol
<imirkin> jekstrand: accidentally on purpose? :)
<jekstrand> I don't know.
<jekstrand> It was some fallout from when they added VRS/CPS
<alyssa> jekstrand: fwiw the fast path is not a fancy opcode, it's a mechanism to preload r0 with the results of a varying or texture prefetched lookup
<imirkin> jekstrand: does intel hw allow running fragment shaders at e.g. 2x msaa even though the surface is 4x msaa?
<imirkin> nvidia allows that
<imirkin> (why? no clue. but it's there.)
<jekstrand> imirkin: Not until they added VRS/CPS
<alyssa> The idea being, if the entire shader is `gl_FragColor = texture2D(tex, v_TexCoord)`, this eliminates half of the instructions
<jekstrand> AFAIK, Nvidia's the only ones that can do that
<danvet> daniels, I can't grok your latest mail
<danvet> feels like you're mixing up a few too many things
<daniels> which bit?
<danvet> personally I think mixing up userspace memory fences with implicit sync is a very bad idea
<danvet> mostly because I don't want to think through the options
<daniels> I totally agree
<daniels> but I'm not talking about userspace fences here :)
<daniels> I'm talking about the import/export ioctl which is the subject of the actual patch
<danvet> well your 1. is that
<danvet> if you live in a dma-fence world, _every_ CS gets dma_fence, which we attach to _every_ buffer's dma_resv
<danvet> that's how it works
<danvet> if you want something else you very quickly end up in the UMF world
<danvet> the more we look at these at least the two models are 100% incompatible it's one or the other, no mixing
<danvet> the only thing we can do is label them with "relevant for implicit sync" and "not relevant for implicit sync"
<daniels> so, I agree on no mixing, there is no way you can ever bridge the two worlds (sorry jekstrand)
<danvet> and I'm still not seeing where your example oversyncs
<daniels> hence the suggestion that rather than trying to generate a fence-alike for UMF, we just do purely userspace sync, and give the consumer a nice hammer to zap the producer's ctx if it doesn't deliver in time
<danvet> like even if you're extremely dumb about it and pass the explicit sync fence to both libva encoders
<danvet> and they both set it at import time
<danvet> the kernel sees that, realizes you're a bit silly and de-dupes it all
<daniels> the 1. about not dumping a fence for every CS into the resv in the non-UMF case was because I'd understood from prior discussion that there was a plan to do the amdgpu thing and skip resv for CS which is 'known' to not need to participate in implicit sync
<danvet> so you end up with a no-op ioctl 2nd time around
<danvet> or I'm confused
<daniels> ok, so 'to both libva encoders' ... but you then need to export the fence from the libva read CS, right
<danvet> ok with jekstrand current patch to import sync_file you get oversync issues with 2 libva encoders
<daniels> so you can synchronise further use against that
<danvet> but once it's fixed with my suggestion it should be fine
<danvet> yeah
<danvet> so if you want the buffer back in vk
<danvet> there's 2 ways
<danvet> one is yolo and broken
<daniels> if you don't clearly know whether your next (temporally, not in single-thread code flow) use is going to be implicit/explicit, you're going to need to do import/export at essentially every boundary
<danvet> the other is you refcount all the libva users, decrement until it hits zero
<danvet> and only _then_ grab the shared sync_file and give the buffer back to vk
<daniels> so I can very much see people being pessimistic and just constant import/export dumps, which end up serialising reads against each other
<daniels> yeah
<danvet> well for the 2 encoder use case if you're dumb about it
<danvet> then the first grab of sync_file without all the shared fences from the 2nd is just ... wrong
<daniels> so that refcount-then-export totally wfm as a Wayland compositor, but people doing media pipelines with like 57 threads are going to be srsly unhappy
<daniels> the first grab isn't wrong tho, because import is additive not replacement, right?
<daniels> so if you do race, then you end up with one submitting everything and the second submitting a no-op
<danvet> why do you import anything when you get it back from libva?
<danvet> I'm assuming here libva is implicit synced
<daniels> yeah
<danvet> so there's nothing to import here
<daniels> but say your pipeline is all explicit, because it contains explicit elements
<jenatali> If it helps you guys at all, Windows has both implicit sync (GDI) and explicit sync (D3D/VK), and we support mixing them. The only caveat is that when you try to submit something that's implicit sync, we'll make sure that it's not going to depend on any explicit sync that isn't guaranteed to be drainable
<danvet> or we're talking about wrong direction of import
<jenatali> If you try to submit not-guaranteed-drainable implicit sync work, we block/stall until we can detect that it's drainable
<daniels> so at every point where you enter implicit world, you bracket it with import (based on last-known fence) and export (based on what you just generated)
<danvet> jenatali, mostly we're having fun with the warts of our implicit sync model unfortunately
<danvet> I think the other pieces are fairly clear, if not yet typed up
<jenatali> Heh, "fun"
<daniels> jenatali: yeah, unfortunately we didn't have a clear cut between the two, and now we're stuck with them forever
<danvet> daniels, ok so import/export here from the dma-buf/kernel pov
<danvet> so why do you import a sync_file into the dma-buf after you get the dma-buf back from libva?
<daniels> jenatali: and Wayland API precludes client-side threaded/delayed submit
<danvet> for an encode session
<danvet> decode it makes sense
<daniels> jenatali: but luckily the kernel is magic and fixes everything for us \o/
<jenatali> daniels: Ah right, forgot about that
<bnieuwenhuizen> daniels: dumb question but would it make sense to adjust that in the wayland protocol instead of trying to do kernel heroics?
<daniels> bnieuwenhuizen: no
<daniels> that's the short answer :P
<danvet> imo "implicit sync as a very funny IPC" is ok wrt kernel heroics
<danvet> implicit sync hiding userspace memory fences isn't
<danvet> former is ok because most of the complexity we need to solve in the kernel anyway, for various reasons
<daniels> the long answer is that we'd either have to build an actual IPC/semaphore mechanism into Wayland itself (no), or that we'd have to have everyone who might do threaded submit do cross-thread callbacks, and make clients push their work into those, which if we weren't using C might be viable but ...
<bnieuwenhuizen> yeah mostly talking about the umf stuff
<daniels> shrug, we don't need kernel heroics for UMF
<daniels> the compositor deals with UMF up front
<daniels> it's the only way which is even a little bit viable, and we're perfectly OK to do it
<daniels> danvet: wrt your 'why do you import a sync_file back' - I think the only way to handle explicit sync in an arbitrary pipeline framework (let's call it GStreamer) is that you add explicit-sync awareness to the framework itself, so you can e.g. mix Vulkan and VA. so you do pretty much what DRM did with the BKL - if your element declares that it's explicit-aware, then you pass fences in and out of it, but if it's not, you bracket the
<daniels> accesses with import (from the last explicit fence you got from upstream) and export (to the next downstream)
<daniels> danvet: it seems, especially if you have multiple elements accessing the same BO simultaneously from multiple threads because they know it's safe to do so when they're not racing read vs. write, that most implementations of that would end up eating their own dogfood
<daniels> and that you'd export shared+excl, import that back into excl, and then you're totally serialised
<danvet> ok, I think you can make this work if you know slightly more about the expected access
<danvet> if your implicit synced pipeline element only reads, then
<danvet> - don't import any fence (because the read fences are all there already)
<danvet> (or at least import only shared fences)
<danvet> - take out _only_ the shared sync_file
<danvet> if it writes, then you need both
<daniels> yeah
<danvet> or something like that
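The bracketing rules danvet sketches above can be written down concretely. A minimal sketch in C, assuming the dma-buf sync_file export/import uAPI from the patch series under discussion (struct layout, ioctl numbers, and flag values copied from those patches, so treat them as illustrative rather than final):

```c
/* Bracketing an explicit-sync access between implicit-sync pipeline
 * elements, assuming the proposed DMA_BUF_IOCTL_{EXPORT,IMPORT}_SYNC_FILE
 * uAPI. Names and numbers are taken from the patches being discussed and
 * may not match what eventually lands. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/ioctl.h>

struct dma_buf_sync_file {
	uint32_t flags;	/* DMA_BUF_SYNC_READ and/or DMA_BUF_SYNC_WRITE */
	int32_t fd;	/* sync_file fd */
};

#define DMA_BUF_SYNC_READ	(1 << 0)
#define DMA_BUF_SYNC_WRITE	(2 << 0)
#define DMA_BUF_IOCTL_EXPORT_SYNC_FILE	_IOWR('b', 2, struct dma_buf_sync_file)
#define DMA_BUF_IOCTL_IMPORT_SYNC_FILE	_IOW('b', 3, struct dma_buf_sync_file)

/* danvet's rule: a reader only needs to order against the exclusive
 * (write) fence; a writer against everything. The uAPI encodes that by
 * flagging the access you are about to perform. */
static uint32_t bracket_flags(bool element_writes)
{
	return element_writes ? DMA_BUF_SYNC_WRITE : DMA_BUF_SYNC_READ;
}

static int bracket_explicit_access(int dmabuf_fd, bool writes, int done_fd)
{
	struct dma_buf_sync_file args = { .flags = bracket_flags(writes) };

	/* before: pull out the fences this access must wait on */
	if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &args) < 0)
		return -1;
	/* ... submit explicit work that waits on args.fd and signals
	 * done_fd ... */

	/* after: push the completion fence back into the dma-buf so the
	 * next implicit-sync element orders against it (shared slot for a
	 * read, exclusive slot for a write) */
	args.flags = bracket_flags(writes);
	args.fd = done_fd;
	return ioctl(dmabuf_fd, DMA_BUF_IOCTL_IMPORT_SYNC_FILE, &args);
}
```
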
<danvet> tbh my brain is a bit toast right now
<daniels> mine too :)
<bnieuwenhuizen> danvet: the read fences are not there because in the explicit case we may avoid the implicit sync fences altogether? so you can't avoid taking out the shared fences
<danvet> bnieuwenhuizen, right now you can't ever avoid the shared fences
<daniels> I definitely think there's a solution in there, but it's the balance of whether we inflict mutex death on userspace for the benefit (?) of not populating the shared slot in the kernel
<danvet> daniels, I think for full glorious future we do need the shared import
<danvet> the problem is just that with the current drivers, it's a no-op
<danvet> since the fence will be there already (except if you do something really dumb like adding arbitrary unrelated fences)
<daniels> yeah, I agree
<daniels> but I'd rather surface things to userspace and have them do it properly from the get-go
<danvet> which also means userspace wont use it, so if we later on do add the distinction between implicit sync relevant/not-relevant for fences in dma_resv
<danvet> we're screwed
<daniels> rather than having people rely on always exclusive
<daniels> meh, if we ship exclusive-only import now, then it'll take us 5 years to get userspace across to choosing the right thing between either exclusive or shared
<danvet> yeah, if we don't do this, then we're also screwed because we've locked down the semantics
<danvet> the other problem is: too many drivers which don't even opt-out of implicit sync (like amdgpu right now)
<danvet> so it's all fairly hopeless in reality anyway :-/
<daniels> so yeah, whilst it's useless now (with everyone always populating the resv on every access), letting us choose the right thing for import gives userspace the tools it needs to one day _not_ populate the resv on every access, which is something that a) we want (I think), and b) is going to be forced on us by UMF hardware models anyway
<daniels> heh
<daniels> well, at least we can put a plausible model together and give people compelling reasons as to why they should use it
<danvet> daniels, ok got my example wrong: the fence slot you pick for import/export depends upon what the previous/next pipeline element will do, not what the current one has done
* jekstrand will read backlog eventually. On the phone with the internet people
<danvet> I think
<danvet> e.g. if the next one only reads, you only need the explicit sync slot
<danvet> *exclusive
<daniels> jekstrand: good luck!
<danvet> why does both start with ex*
<daniels> danvet: uh?
<daniels> danvet: surely it's on your behaviour?
<daniels> if you write, you sync against both slots & populate exclusive
<daniels> if you read, you sync against excl slot & populate shared
<danvet> nah, the implicit pipeline element will set the right one for its own access
<danvet> but how you sync depends upon what the previous thing did
<danvet> daniels, the kernel does that for you for the implicit pipeline element
<daniels> right, but if you have a Vulkan read, then you use that to populate the shared slot, and then implicit will DTRT
<danvet> also if you do explicit sync with lots of parallel access
<danvet> you'll have to keep track of a pile of fences
<danvet> like for readers all previous relevant writers (if you do it parallel/tiled or whatever)
<danvet> and for writers all previous access for their area
<danvet> so juggling multiple fences and picking the right one should be ok
<daniels> ulimit -n 0xffffffff
<danvet> at least
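The fence juggling danvet describes (readers track the previous writers, writers track all previous access) can be sketched as plain bookkeeping. All names here are invented for illustration, with integers standing in for fence handles; no real driver API is implied:

```c
/* Hedged sketch of per-BO fence bookkeeping for fully explicit sync:
 * readers wait on the last writer, writers wait on everything, and a
 * write's completion fence replaces all prior fences. */
#include <assert.h>
#include <string.h>

typedef unsigned fence_t;	/* fake fence handle; 0 = none */

struct bo_sync {
	fence_t last_write;	/* most recent writer ("exclusive") */
	fence_t readers[16];	/* in-flight readers ("shared") */
	unsigned n_readers;
};

/* a new reader only needs to wait on the last writer */
static fence_t bo_reader_dep(const struct bo_sync *s)
{
	return s->last_write;
}

static void bo_note_read(struct bo_sync *s, fence_t done)
{
	s->readers[s->n_readers++] = done;
}

/* a new writer must wait on the previous writer and every reader */
static unsigned bo_writer_deps(const struct bo_sync *s, fence_t *out)
{
	unsigned n = 0;

	if (s->last_write)
		out[n++] = s->last_write;
	memcpy(out + n, s->readers, s->n_readers * sizeof(*out));
	return n + s->n_readers;
}

/* ... and its completion fence then replaces all of them */
static void bo_note_write(struct bo_sync *s, fence_t done)
{
	s->last_write = done;
	s->n_readers = 0;
}

static unsigned demo_writer_dep_count(void)
{
	struct bo_sync s = { 0 };
	fence_t deps[17];

	bo_note_write(&s, 1);		/* a write lands */
	if (bo_reader_dep(&s) != 1)	/* readers order against that write */
		return 0;
	bo_note_read(&s, 2);		/* two readers pile on in parallel */
	bo_note_read(&s, 3);
	return bo_writer_deps(&s, deps);	/* next writer waits on all 3 */
}
```
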
<danvet> if your vulkan read is before the implicit element
<danvet> then doing nothing already takes care of everything
<danvet> since not even vulkan can avoid the shared slot here, because it's always set
<daniels> right, but then is there a future where it's not always set on CS?
<danvet> if the vulkan read is after the implicit sync, you export the read sync_file (which actually exports the exclusive slot)
<daniels> and that it's only populated by userspace which takes care to fill resv (on explicit -> implicit transition) and pull from resv (on implicit -> explicit transition), but explicit CS doesn't need to populate resv itself?
<danvet> but otoh if your vk element writes, you need the read-write slot, which exports all the implicit fences into your sync_file
<danvet> the future where it's not set means UMF
<danvet> 99% sure on that statement
<daniels> heh ok, I'd figured there was a transitional world where people wanted to not populate resv if not necessary, but before UMF
<danvet> it's necessary
<danvet> if you don't you either a) pin all the memory (not so much appreciated in upstream)
<danvet> or b) have gpu page fault support, which _requires_ umf
<daniels> I get that it's necessary for UMF, but I thought it would be current world -> driver optimistically doesn't populate resv if userspace is smart but we still have normal fences -> full UMF world
<daniels> ah right
<danvet> also, umf _requires_ that you either have a) gpu page fault support or b) pin everything or c) only attach ctx preempt fences which are useless for sync
<daniels> so we need resv populated on every single CS no matter what, because relocation fences
<danvet> so goes both ways
<daniels> they need to sync against that so they can swap backing storage out, because until we have actual demand paging from GPUs it needs to be a stop-the-world stall event
<danvet> hm this little argument actually convinced me that shared import is useless
<danvet> since it can't ever happen
<daniels> s/they need to sync against that/relocation fences need to synchronise against every single prior access and preclude future access/
<danvet> ah no, we can mark the implicit shared fences up, but atm no one does that
<danvet> yup
<danvet> also memory management fences
<daniels> what are MM fences?
<danvet> since at least i915 has gpu relocations of bo addresses
<danvet> dma_fence that the kernel uses to track bo moves
<danvet> i.e. your relocation fence I think
<danvet> my relocation fence is used for this code I want to disable https://lore.kernel.org/dri-devel/20210526163730.3423181-1-daniel.vetter@ffwll.ch/T/#u
<daniels> right :)
<daniels> that actually explains quite a lot, because I'd previously thought reloc fences were Christian's MM/paging fences
<danvet> what's the paging fence?
<daniels> you want to unpin a BO
<daniels> or, well, you want to pin a bO
<daniels> either way, backing storage has changed
<danvet> in the kernel that one is called ttm_bo->moving right now
<danvet> mostly
<daniels> yep
<danvet> minus bugs in drivers
<daniels> and is a hard barrier so you can exchange the backing storage with no race
<danvet> daniels, jekstrand so assuming we'd not have shared import from the get-go
<danvet> could we have an upgrade path for later on
<danvet> ?
<danvet> hm I think we're hosed already
<danvet> currently vk says "everything explicit"
<danvet> but if you render on _any_ current driver
<danvet> export to dma-buf and then use it in libva
<danvet> it will work
<danvet> rendering with libva and then reading from vk already needs explicit action from apps (currently poll() on the dma-buf or something like that)
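The "poll() on the dma-buf" danvet mentions works because the dma-buf fd itself is pollable against its implicit fences. A minimal sketch; the POLLIN/POLLOUT mapping below is my reading of the kernel's dma-buf poll implementation, so verify before relying on it:

```c
/* Wait CPU-side for the implicit fences on a dma-buf via poll(2).
 * Assumed semantics: POLLIN blocks until reading is safe (the
 * exclusive/write fence has signaled), POLLOUT until writing is safe
 * (all shared + exclusive fences have signaled). */
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdbool.h>

static short dmabuf_poll_events(bool for_write)
{
	/* a writer must wait for all access, a reader only for writers */
	return for_write ? POLLOUT : POLLIN;
}

static int dmabuf_cpu_wait(int dmabuf_fd, bool for_write, int timeout_ms)
{
	struct pollfd pfd = {
		.fd = dmabuf_fd,
		.events = dmabuf_poll_events(for_write),
	};
	int ret;

	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	return ret;	/* >0: fences signaled, 0: timeout, <0: error */
}
```
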
* jekstrand gives up on backlog
<daniels> I think the answer is to rev the dmabuf vk exts and make them explicit-only
<daniels> jekstrand: welcome to the party!
<danvet> daniels, yeah
<danvet> or well more explicit
<jekstrand> danvet, daniels: To be clear, I don't expect mixing dma_fence with UMF. What I meant is that if we want any sort of "it'll be here soon" with a timeout, that needs to originate on the client side.
<jekstrand> Regardless of how it's implemented.
<danvet> for winsys where the vk winsys imports/exports the fence we could upgrade and make it even better
<daniels> jekstrand: the client has to originate a promise, for sure
<jekstrand> We may have a kernel object which gets created and sent to the compositor but if it's just a compositor trywait, it's tricky. We'd really like to VK_ERROR_DEVICE_LOST if a client promises to finish rendering in 100us and doesn't follow-through.
<jekstrand> But how that promise is communicated is an implementation detail.
<daniels> jekstrand: but the promise can be that the client hands the winsys a hammer to destroy the client ctx, and if the winsys is ever dismayed at the client, it can whack that hard
<jekstrand> daniels: Yup
<daniels> and then ... don't anger the winsys
<jekstrand> daniels: And that hammer could be killing its Wayland connection
<jekstrand> One of the things we do have to think through, though, is a UMF-based driver on an implicit sync window-system. Once we convert ANV to UMF, it's going to be UMF all the time regardless of what it's running on.
<jekstrand> That doesn't mean importing a UMF into implicit sync. It may very well mean we wait on the UMF in a thread.
<jekstrand> Except for those wayland cases where that's not allowed in which case I guess vkQueuePresent stalls.
<jekstrand> The big thing I was trying (and maybe failed) to communicate with my e-mail is that I don't think timeline syncobj is useful for WSI.
<jekstrand> Maybe it is if you want to pass the object once and then just pass u64 serials rather than passing sync_file.
<jekstrand> But I don't think there's anything truly useful as a transition between sync_file and UMF.
<daniels> well, it's useful in the sense that it exists today, and UMF doesn't :P
<daniels> so it's something that we can build out and test against, and then the conversion is much closer to a sed job
<jekstrand> Sure
<jekstrand> If it helps with prototyping, go for it.
<daniels> yeah, just a crutch, not a long-term useful plan
<jekstrand> Ok, as long as we're clear on that. :)
<jekstrand> In particular, don't design anything assuming that you have a "wait for a fence to materialize" ioctl.
<daniels> we're all agreeing with each other, in a very roundabout way
<daniels> yeah
<zmike> epic-handshake.jpg
<daniels> I still hold out hope that people are going to pull back from the brink and give us enough doorbell that epoll for fence materialising is a useful thing which can be implemented for efficiency improvement
<daniels> but meh
<jekstrand> Yeah, as I said in the mail, I think that can be done with some sort of scheduled vs. completed fence.
<jekstrand> But you still don't have real guarantees. It just lets you queue stuff up ahead of time a bit.
<jekstrand> Bad clients are still possible. If you get one, shoot it.
<danvet> jekstrand, I don't think you can do a UMF-only vk right now
<danvet> start out in UMF, convert over when anyone asks for anything related to dma_fence
<jekstrand> danvet: We can't convert over either
<danvet> probably should share that code across drivers
<danvet> jekstrand, what's the hold up?
<jekstrand> re-creating objects on-the-fly
<jekstrand> I think the solution there, as I said, is that we wait in a thread.
<danvet> hm that will suck quite a bit I think
<jekstrand> Yup
<danvet> and the thing is, UMF is going to happen rsn now for i915
<danvet> for some value of soon
<jekstrand> Yup
<danvet> gen12+
<jekstrand> We can start off with an environment variable or something which puts the driver in UMF mode and gives you all the toys but doesn't advertise WSI.
<danvet> jekstrand, also you then can't have drm_syncobj export/import anymore
<danvet> jekstrand, so which objects can't you convert?
<jekstrand> danvet: That's fine. drm_syncobj isn't exposed by Vulkan directly. sync_file is so we'll have to think about that a bit.
<danvet> you wont get it with UMF
<jekstrand> danvet: It's not that converting objects is impossible, it's that converting them on-the-fly is impossible.
<danvet> jekstrand, it would mean a spec breaking stall
<jekstrand> danvet: How am I supposed to do that if I don't know the dependency graph?
<danvet> seems less worse than picking an env variable and trying to set it right
<danvet> jekstrand, you stall everything
<danvet> because you just dont
<jekstrand> danvet: wait-before-signal
<danvet> uh, annoying
<jekstrand> danvet: They've got work in-flight depending on a UMF that they've not submitted work to signal yet. What am I supposed to do with that?
<danvet> hm I thought you must set at creation time whether you pick the "export to sync_file" option
<jekstrand> yes, that's a thing
<danvet> ok so you need a bit a fancier barrier, but this should work?
<danvet> like from that point on everything new is submitted/created with fence objects in the kernel
<danvet> for any in-flight vk semaphores and timelines you note whether it was an umf one or not
<danvet> for timeline this means you probably need to note the seqno of the first dma_fence you put in there
<danvet> if it's still from the umf world, push it off into the submit thread and wait there
<danvet> if you get one of these in the winsys do the same there too with submit thread
<danvet> so it's a rolling barrier
<danvet> why would this not work?
<danvet> (not saying it wouldn't be very nasty)
<jekstrand> It might be theoretically possible.
<jekstrand> We only have 7 VkSemaphore implementations in ANV, what's a half-dozen more?
<daniels> jekstrand: I think the answer is that you just fail exportable alloc unless the client also chains in the VK_EXT_i_promise_to_sync_explicitly_everywhere enable
<alyssa> jekstrand: that's the spirit
<danvet> jekstrand, I'm talking about i915 only here
<danvet> amd is stuck forever on amdkfd as their UMF thing, so forget porting vk
<danvet> the others aren't even close
<danvet> daniels, per-bo flag of how much you don't sync in anv
<danvet> we already deal with those more or less
<danvet> failing export of what previously worked isn't nice
<danvet> jekstrand, the thing is if we're not going to do the auto-upgrade to current mode for eventual UMF anv
<bnieuwenhuizen> danvet: what is the problem with amdkfd?
<danvet> I honestly don't see the point in trying to make dma_fence work better for implicit sync
<danvet> bnieuwenhuizen, separate world
<danvet> would be even more painful to cut over from amdkfd to amdgpu if you suddenly need to use dma_fence for sync
<bnieuwenhuizen> oh that'd be painful, I'd hope we grow UMF support on amdgpu
<jekstrand> danvet: The patch series I sent today really does solve an actual perf issue.
<danvet> bnieuwenhuizen, I'm trying to convince agd5f and felix kuehling to figure it out
<jekstrand> s/today/last week/
<danvet> but it's a bit a case of "we've already planned the next 5 years"
<jekstrand> the dma-buf sync_file export one
<danvet> hm
<jekstrand> import, less so
<jekstrand> The benefits to import are pretty theoretical, IMO.
<danvet> jekstrand, well the current import is only slightly better than your current trick
<danvet> but if no one hits that issue then even the better import isn't going to help much
<danvet> jekstrand, no one does post processing of reading that frame again before they finish getting the present call out?
<danvet> s/post processing/prep for next frame/
<jekstrand> danvet: Once they've presented, they aren't allowed to read it.
<jekstrand> Well, there are ownership rules that only jamesjones understands, IIRC.
<danvet> yeah, the hit is only if they read before they've done the present call
<jekstrand> But I don't think you're supposed to touch it after vkQueuePresent
<jekstrand> danvet: Actually, the hit is if they start rendering something new before the present call. The dummy submit serializes with said new rendering because it's all on the same queue
<jekstrand> But, again, I don't think apps are doing too much of that
<danvet> ah right it's any rendering
<jekstrand> And if they are, they're going to get burned if they ever hit a prime blit anyway. Nothing we can do about that.
<daniels> jekstrand: if you love X11 so much, why don't you just solve the perf issue by never doing any syncing ever :P
<danvet> prime blt?
<bnieuwenhuizen> danvet: copy from device tiled texture to gtt linear texture
<bnieuwenhuizen> for when you do DRI_PRIME stuff
<danvet> yeah but why is that causing a burn?
<jekstrand> danvet: I don't love X11. It's like the drunk uncle that keeps coming to the family gatherings even though you've relocated 6 times and not told him and hoped he'd get the hint.
<bnieuwenhuizen> because that also happens on said queue and hence serializes with new rendering before present
<danvet> jekstrand, wrong dan
<danvet> bnieuwenhuizen, uh, that sounds like driver bug
<danvet> can prime use a separate sdma/blt ctx?
<danvet> jekstrand, ^^
<bnieuwenhuizen> we can certainly make it so, not sure what we do now
<danvet> this is kinda why we have copy engines to no end on modern gpu
<bnieuwenhuizen> danvet: I thought this entire serialization talk was because intel had only 1 queue
<bnieuwenhuizen> otherwise the dummy submit to get implicit sync going can be on a random queue?
<jekstrand> danvet: Prime could, yeah. We've just not wanted to complicate the code even more.
<danvet> bnieuwenhuizen, blt is separate
<danvet> and it can preempt
<danvet> so if you whack that copy job into blt on a separate gpu ctx
<danvet> then compositor does a flip
<danvet> we'll boost it and you get ahead of the queue
<danvet> plus/minus some details
<bnieuwenhuizen> well, a dummy submit does even less than a copy so presumably it can run on whatever blit queue you have
<danvet> bnieuwenhuizen, well that's essentially what the import ioctl does
<bnieuwenhuizen> yes
<danvet> "run" your fake job on an "engine" out of thin air
<daniels> jekstrand: I know, I'm just shitposting whilst making dinner, sorry
<jekstrand> :)
<danvet> jekstrand, so I now have "someone shot my puppy" vibes about your nope on the umf->dma_fence autoupgrade
<jekstrand> danvet: Well, you keep shooting my puppy. Turn about is fair play. :-P
<danvet> uh, my puppy was meant to be the savior for all of your puppies I shot ...
<alyssa> ...wha?
<danvet> the only other thing is some flag at vkDevice creation time
<danvet> alyssa, it's a mess, don't look
<alyssa> ok
<danvet> and expecting apps to set it correctly is about as likely as expecting users to set it correctly
<danvet> since app really can't know what your winsys wants
<danvet> or whether your libva can deal with umf or not
<danvet> I expect a lot of "we totally have enabled modifiers, except not actually" vibes from this approach
<danvet> cool demo, useless product
<jekstrand> yeah
<danvet> like what do you do if e.g. something like blender uses one vkdevice for rendering with compute
<danvet> and another vkdevice for winsys display
<danvet> or something like that
<danvet> I expect a lot of "your compute job gets randomly killed by hangcheck"
<daniels> what I learned from modifiers is that if you don't make the transition between the worlds jarring and violent, you'll be lost in some kind of midpoint hell forever
<jekstrand> ugh
<danvet> daniels, we're on that
<danvet> at least for i915
<danvet> non-modifier on gen12+ sucks because you don't even get X-tiled
<bnieuwenhuizen> anything that gets you an image is not violent enough
<bnieuwenhuizen> a correct image*
<jekstrand> :D
<danvet> we could sample x-tiled by default
<danvet> gets the perf back to where it should
<bnieuwenhuizen> seriously, perf testing is hard and hence not frequently done
<danvet> and the jarring corruption :-)
<danvet> bnieuwenhuizen, oh we're better than that
<danvet> we just perf-test with modifiers enabled
<danvet> "look no problem"
<jekstrand> srly
<danvet> even better
<danvet> some internal jiras between arrogant/clueless with titles like "convince distros to enable modifiers by default"
<bnieuwenhuizen> I expect most testing on platform to be "full system testing gave us too much variance in results so we switch to a microbenchmark / specific test case that avoided all the modifier avoiding paths"
<danvet> as if we didn't disable this stuff in upstream compositors due to actual bug reports ...
<danvet> bnieuwenhuizen, ofc that too
<danvet> we're forever stuck trying to get a better cpu freq governor into upstream because all the testing is done with fixed freq below tdp
<danvet> and ofc the cpufreq people never test with any gpu workloads running on the same die, so don't hit the power sharing issues we have
<danvet> jekstrand, anyway, can't we at least save some of these puppies?
<danvet> they're cute ...
<alyssa> i like puppies
<karolherbst> danvet: don't get me started on freq stuff on intel :D
<karolherbst> alyssa: who doesn't?
<karolherbst> :p
<karolherbst> I actually want to have a fanless home server system here, but try to figure out what CPU isn't using more than twice the documented TDP in benchmarks and figure out which CPU performs well if the TDP is actually a hard cap and nothing you can ignore for a minute
<jekstrand> danvet: Uh... not sure
<jekstrand> danvet: As long as the Vulkan API gives us a point to do it, client-side wait wouldn't be the end of the world.
<danvet> you'd drop a bunch of sync file extensions
<jekstrand> Yeah....
<jekstrand> We might be able to make sync_file work, maybe
<danvet> which means interop with libva and everything would also mean threads + poll on dma-buf
<danvet> not for UMF
<jekstrand> At least well enough for Android
<bnieuwenhuizen> those sync file extensions have actual users though
* bnieuwenhuizen looks at Android
<jekstrand> Actually, for Android, we can stall and return -1 in right spot
<danvet> yeah that's only correct, not performant
<danvet> otoh intel and android SoC market ... lol
<bnieuwenhuizen> danvet: chromebooks
<danvet> oh right
<danvet> we might care about that
<danvet> otoh for cros we could do a -Danv_umf_default=nope at build time
<jekstrand> :(
<danvet> which is kinda my point
<danvet> I don't think we can switch the default
<danvet> not even on desktop linux
<danvet> which means umf anv is a neat tech demo
<danvet> and given the canyon i915 is in, my appetite for neat tech demo is a bit low
<jekstrand> Never give up! Never surrender!
<danvet> next time cubanismo shows up at an xdc I need to have a chat with him about why exactly you create the winsys after the vkdevice
<danvet> or dont pass a list of winsys for this vkdevice
<bnieuwenhuizen> because memory gets allocated as part of a device?
<danvet> or something like that
<jekstrand> danvet: I don't want fundamentals of the driver changing based on winsys
<danvet> we kinda have to
<jekstrand> No, we need to fix the winsys, at least a little.
<danvet> that means UMF in sync_file
<danvet> or something like that
<jekstrand> yeah.....
<jekstrand> If it makes you feel better (it won't), NV is doing UMF in sync_file today. :D
<danvet> nv as in blob or nouveau.ko?
<danvet> also they can hack up whatever they want
<danvet> if it's nv
<jekstrand> blob
<danvet> yeah not my problem
<daniels> also arguably not even in the top 10 of weird things they do
<danvet> so the killer is, and we've shot this puppy before
<danvet> umf in sync_file breaks atomic kms
<danvet> daniels, that too
<danvet> jekstrand, the locking rule is that for atomic flip you only get either 100% umf or 100% dma_fence in your inputs
<danvet> because we also have an out sync_file
<danvet> which especially android likes to use
<jekstrand> danvet: Naturally. :)
<danvet> so you're back to "rev the entire protocols and winsys extensions"
<danvet> which is another one of these "possible in theory" things
<jekstrand> :-/
<danvet> the magic vk umf->dma_fence barrier is at least only localized
<danvet> so a theoretical approach with bounded time to roll out, given infinite amount of people
<jekstrand> Or we could YOLO sync_file on chromeos and go full UMF
<danvet> or something like that
<alyssa> ==22378== Invalid address alignment at address 0x18011829
<alyssa> ==22378== at 0x5CD365C: __aarch64_ldadd4_acq_rel (in /home/alyssa/lib/dri/libgallium_dri.so)
<alyssa> this raises so many questions
<danvet> who cares about hanging the kernel in inappropriate places
<karolherbst> alyssa: why? :D
<karolherbst> and which ones
<alyssa> karolherbst: "How the heck did I corrupt memory so bad I got an unaligned pointer in my BO ref count and yet valgrind says nothing else"
<karolherbst> alyssa: try with libasan
<danvet> jekstrand, still feels like the vk umf->fence is the least impossible
<danvet> something like the protoctx you do
<danvet> except fastpath is just a few ordered loads and slow path takes the umf2fence_lock and rechecks
<danvet> on an object-by-object basis
<danvet> timeline waits would need to wait for both the fence to show up in the drm_syncobj
<danvet> and the old umf to signal
<danvet> until the old umf context has finished
<danvet> at which point we set another flag to stop with all the umf spinning
<danvet> in waits
<danvet> after that you're stuck with an ordered load and check in a bunch of places
<danvet> same if you never leave umf
<danvet> aside from the funny transition state all the switches are the same as the env variable default thing
<danvet> and the busy spin in the submit thread during transition doesn't matter because even if the app renders the load splash before it sets up winsys
<danvet> the load splash really shouldn't take that long to render
<danvet> and I think aside from the submit thread spinny thing this should all be shareable code I think
<airlied> did I go back to bed and wake up to it's all screwed and no puppies in the future?
<alyssa> airlied: yes.
<airlied> danvet, jekstrand, daniels : might be nice to make that irc conversation of doom conclude in an email
<alyssa> and the oscar goes to
<alyssa> list _safe not actually being safe?
<imirkin> or just not safe in the way you expect?
<alyssa> this is C apologia
<alyssa> :p
<imirkin> if you say so
<imirkin> or maybe you just have different-from-everyone-else for what _safe does?
<imirkin> expectations*
<imirkin> iirc safe means you can remove the node from the currently-being-processed list without screwing up iteration
<alyssa> but it doesn't mean you can insert safely at the node point apparently
<imirkin> yeah, definitely not
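A toy reproduction of the distinction being made here, with a minimal intrusive list and a made-up `for_each_safe` (not kernel list.h, but the same shape): removing the current node is covered because the next pointer was cached before the body ran, but that same cached pointer means a node inserted at the iteration point is silently skipped.

```c
/* Illustrating what _safe does and does not buy you. The list and
 * macro names here are invented for the example. */
#include <assert.h>
#include <stddef.h>

struct node { int val; struct node *next; };

/* "safe" variant: n caches it->next before the body runs, so the body
 * may unlink/free the *current* node without breaking iteration */
#define for_each_safe(it, n, head)				\
	for ((it) = (head), (n) = (it) ? (it)->next : NULL;	\
	     (it);						\
	     (it) = (n), (n) = (it) ? (it)->next : NULL)

static int visits_with_insert_after_current(void)
{
	struct node c = { 3, NULL }, b = { 2, &c }, a = { 1, &b };
	struct node x = { 99, NULL };
	struct node *it, *n;
	int visits = 0, inserted = 0;

	for_each_safe(it, n, &a) {
		visits++;
		if (!inserted) {
			/* insert x right after the current node */
			x.next = it->next;
			it->next = &x;
			inserted = 1;
		}
		/* n was cached before the body ran, so x is never
		 * visited -- insertion is not what _safe protects */
	}
	return visits;	/* 3, not 4: the inserted node was skipped */
}
```

(Deleting the *cached* next node from inside the body is the really nasty case: the iterator then walks freed memory, and no `_safe` variant helps.)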
<alyssa> ugh.
<daniels> airlied: yeah it will
<danvet> airlied, I'm not sure we've concluded on much yet
* alyssa wonders how this works in NIR
<daniels> danvet: I feel like we're at least circling the drain
<airlied> danvet: sounds like you were tending towards it's all screwed, and retiring
<danvet> most of the rehashed dead puppies we did document as part of indefinite fencing
<jani> alyssa: imirkin: _safe merely holds the current node in a temporary variable, that's all there is to it
<daniels> danvet: does Android still depend on swsync?
<danvet> daniels, we're definitely circling
<danvet> daniels, I think only on shit drivers
<alyssa> jani: RiiR.jpg
<daniels> danvet: is redefining 'shit' an option
<danvet> howIlearnedtolovethebomb.mkv you mean
<imirkin> jani: right
<imirkin> danvet: to stop worrying and love the bomb...
<alyssa> so long, mom
* jani reads list.h and lols at a _careful variant
<alyssa> _safe, _safer, _safest
<imirkin> tell that list to safen up!
<danvet> jani, yeah llist.h is absolute glorious in that regard
<danvet> to the point where I just don't trust it
<jani> danvet: pretty low rusty score
<alyssa> ok now I know I'm being bullshitted
<jani> hah, that's not even a rust lang reference
<alyssa> The assert fails.
<alyssa> bi_foreach_instr_global_safe corrupting random memory
<alyssa> what more could I want
<zmike> when are you seeing this?
<alyssa> I should've rewritten this compiler in Rust when I had the chance
<imirkin> alyssa: i think you have everything you need :)
<icecream95> glHint(GL_LIST_HINT, GL_NICEST)
<alyssa> imirkin: WHEN IT SEEMS THAT WE HAVE LOST OUR WAY
<alyssa> oh ffs()
<alyssa> not list.h's fault
<imirkin> big surprise.
<imirkin> the helpers everyone uses *aren't* broken
<alyssa> foreach_global is defined as
<alyssa> foreach_block() foreach_instr()
<alyssa> i.e. nested loops in terms of list.h's foreach
<alyssa> which means a break actually only breaks out of the inner loop, i.e. the current block
<alyssa> but keeps iterating instructions in the next block
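The hazard alyssa describes, boiled down (the macro below is invented to have the same shape as the compiler's iterator, not its actual code): because the "global" iterator is two nested for loops, `break` only leaves the inner loop and iteration resumes in the next block.

```c
/* Toy reproduction of the nested-foreach break hazard. */
#include <assert.h>

#define N_BLOCKS 3
#define N_INSTRS 4

/* built like foreach_instr_global: two plain nested loops */
#define foreach_instr_global(b, i)		\
	for (int b = 0; b < N_BLOCKS; b++)	\
		for (int i = 0; i < N_INSTRS; i++)

static int count_until_first_match(void)
{
	int visited = 0;

	foreach_instr_global(b, i) {
		if (b == 0 && i == 1)
			break;	/* intended: stop everything;
				 * actual: skip to the next block */
		visited++;
	}
	/* intended result would be 1; we actually keep iterating the
	 * remaining two blocks and visit 9 "instructions" */
	return visited;
}
```
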
<imirkin> this is why you should always use goto :p
<imirkin> less confusing.
<alyssa> srsly
<alyssa> now I could've sworn I'd cargo culted this pattern from somewhere...
<danvet> airlied, https://paste.debian.net/1198921/ maybe not entirely giving up just yet
<imirkin> the things they don't teach you in CS classes :)
<danvet> first half of that on dri-devel already
<cmarcelo> venemo: Kayden: how do you feel about "info.workgroup_size" instead of "info.local_workgroup_size"?
<alyssa> brw_cfg.h is probably the source of the cargo culting, with the helpful comment that _didn't_ get cargo culted:
<alyssa> /* Note that this is implemented with a double for loop -- break will break from the inner loop only! */
<imirkin> [btw, i hope you realize many of the things i say should be taken with a grain of salt... don't actually use goto a lot. sometimes useful.]
<alyssa> neither v3d nor ir3 have this pattern, that's good
<danvet> alyssa, iirc there's a very clever trick to compose for loops and still break correctly
Charlie_Wang has quit []
<idr> danvet: Orly?
<danvet> lemma check
<danvet> the one for if() in the macro vs. else blocks is fairly simple
<imirkin> in some languages (Java, Go), you can have named breaks
<alyssa> danvet: we could always rewrite IBC in Rust
<icecream95> break in bash supports breaking out of multiple levels
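C has neither Java/Go-style labeled breaks nor bash's multi-level `break N`; the closest idiom is the `goto` imirkin is (half) joking about. A minimal sketch, with hypothetical loop bounds:

```c
/* Jumping past both loops with goto: the C equivalent of a labeled
 * break out of a nested iteration. */
int count_with_goto(void)
{
    int visited = 0;
    for (int block = 0; block < 3; block++)
        for (int instr = 0; instr < 4; instr++) {
            visited++;
            goto done; /* exits both loops, like `break outer;` in Java */
        }
done:
    return visited;
}
```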
Viciouss has quit [Quit: The Lounge - https://thelounge.chat]
Viciouss has joined #dri-devel
<bnieuwenhuizen> daniels: wrt swsync I think for ChromeOS the answer is still yes ...
<daniels> job done *dusts hands off*
<imirkin> danvet: heh, but it doesn't let you use "break" directly. you have to use a MACRO_BREAK type of thing.
uzi has quit [Ping timeout: 480 seconds]
<imirkin> (not that that's particularly surprising...)
jekstrand has joined #dri-devel
<jenatali> Just needs a break_foreach_global which is implemented as break; break; :)
<danvet> imirkin, that's for your macro
<danvet> not in the actual code
<imirkin> danvet: oh, hm
<imirkin> i guess i didn't properly grok it in my scan
uzi has joined #dri-devel
<danvet> some of the macros need a manual break-rethrow ladder around them
<danvet> afair
<danvet> definitely not in the state of mind to understand this right now
<danvet> iirc the trick is to nest 2 loops and jump over the outermost one
<danvet> so that you know if you're in that loop, that was a result of a break
<danvet> and you can then jump to another place which is again dead code
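The flag-loop variant of the trick danvet is gesturing at can be sketched in a few lines. This is an illustrative macro, not the one from the paste: the body is wrapped in an extra one-iteration loop that sets a flag before the body runs and clears it afterwards, so a `break` from the body leaves the flag set and the enclosing loops see it and stop too:

```c
/* Break-detecting nested foreach: `_brk` is set before the body and
 * cleared only if the body finishes normally. A `break` in the body
 * exits the innermost loop with _brk still set, which terminates the
 * outer loops as well. */
#define foreach_all(b, i, nb, ni)                              \
    for (int _brk = 0, b = 0; b < (nb) && !_brk; b++)          \
        for (int i = 0; i < (ni) && !_brk; i++)                \
            for (_brk = 1; _brk; _brk = 0)

int count_with_break(void)
{
    int visited = 0;
    foreach_all(block, instr, 3, 4) {
        visited++;
        break; /* now terminates the whole iteration */
    }
    return visited;
}

int count_all(void)
{
    int visited = 0;
    foreach_all(block, instr, 3, 4)
        visited++;
    return visited; /* no break: all 3 * 4 iterations run */
}
```

The cost is that `continue` in the body also skips to the flag-clearing step (which happens to do the right thing here), and the hidden `_brk` name can collide with user code, which is the usual "you'll regret it later" C-macro caveat.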
<alyssa> danvet: I don't think I can un-read that webpage. Thank you.
<danvet> alyssa, it's horrible
<idr> I think this falls in the category of "We used cpp to re-invent C++ poorly."
<imirkin> are there any plans to repoint cbrill's dri-logger to OFTC?
<danvet> nah you can't do this in c++
<dcbaker> idr: which is itself a feat
<idr> You'd just use iterators, and you wouldn't need / want to do this.
<danvet> well yeah but where's the fun in that
<idr> Fair point.
<alyssa> Are C++'s iterators good?
<danvet> also the point is that it composes with control flow
<idr> I used to have a copy of the (long out of print) book "Obfuscated C and Other Mysteries."
<idr> It's pretty amazing.
<danvet> with the usual "you'll regret it later" C limitation
LaughingMan[m] has joined #dri-devel
<dcbaker> alyssa: they are once you get to c++11 and have range loops
thellstrom has quit [Remote host closed the connection]
<dcbaker> for (auto const &x : mycontainer) { ... }
<alyssa> dcbaker: Ah
<alyssa> Aesthetically Rust looks better :p
<dcbaker> sure, but Rust doesn't have 40+ years of baggage yet :)
<alyssa> dcbaker: True, but at the rate they're going they will in just a few years!
<alyssa> It's a high velocity language.
<jekstrand> dcbaker: No, but it's got cargo/crates so it can build up baggage much faster!
<danvet> lim_{XX->\inf} c++XX = rust or so
<danvet> we'll get there
<idr> It's good to have goals.
thellstrom has joined #dri-devel
<alyssa> Cyclone++
<danvet> unfortunately C++ isn't complete
<alyssa> i regret taking enough years of math to get that joke
<danvet> or maybe cauchy, for the joke to actually connect
* danvet should perhaps sleep
<alyssa> ....
<jekstrand> :D
<urja> i've watched enough youtube math stuff to know that nothing is complete
<alyssa> urja: R is
<dcbaker> jekstrand: I was working on some nodejs stuff over the weekend. I am even more terrified of Cargo/crates than I was before :/
<jekstrand> dcbaker: I know, right?
<danvet> my takeaway from math is pretty much 1. invent some funny new operator
<jekstrand> dcbaker: I built deqp-runner yesterday. It filled two pages in my terminal with crates it was pulling in.
<danvet> 2. realize it's not complete
<danvet> 3. spend next 300 years making it complete somehow
<dcbaker> jekstrand: I was trying to upgrade packages, and discovered that inevitably they either:
<dcbaker> 1. use X.Y.Z versions, but not semver semantics
<dcbaker> 2. don't attempt to maintain API at all
<dcbaker> 3. pull in hosts of specific versions of dependencies which sometimes means you *can't* avoid having a dependency with a critical flaw
<imirkin> alyssa: enter the super-reals and hyper-reals...
<ccr> 2, is the "new cool language" problem, only old farts care about APIs and stability
<dcbaker> right, that's a "feature"
<dcbaker> I forgot
<dcbaker> I guess I should go shave before my neck-stubble becomes a neck-beard :)
valentind has joined #dri-devel
tlwoerner has joined #dri-devel
<imirkin> anyone know offhand what "iadd!" means in nir?
<imirkin> (and how it differs from "iadd")
<pendingchaos> it means it's exact
<pendingchaos> for integer operations, it doesn't mean anything
<imirkin> cool
<imirkin> thanks
<alyssa> pendingchaos: does that imply it's closed?
<jekstrand> iadd! is a bit weird
<pendingchaos> closed?
<alyssa> pendingchaos: oh i thought we were still making math puns
<idr> imirkin: Any idea how it got like that?
ngcortes has quit [Ping timeout: 480 seconds]
<imirkin> idr: yea, i printed the nir?
<pendingchaos> probably just nir_propagate_invariant() or something not caring?
<imirkin> or you mean who added the "!" to the print? that i don't know
<idr> I mean... how it got the exact bit set in the instruction.
<imirkin> no clue.
<idr> Maybe the incoming source decorated it as precise?
danvet has quit [Ping timeout: 480 seconds]
<imirkin> it's some simple shader.
<idr> Hm...
<pendingchaos> it doesn't hurt to set it, and avoiding doing so requires work
<alyssa> imirkin: maybe it means NIR is really excited about adding integers
<imirkin> alyssa: that's what i assumed
<imirkin> ADD HARDER!
<alyssa> iadd!
<imirkin> youadd?
<imirkin> oh, failed opportunity...
<imirkin> uadd?
<idr> It just learned how, so it's very excited. "I add!"
<idr> weadd
<pendingchaos> imirkin: the lowered code created by nir_lower_idiv() marks all instructions as exact
<pendingchaos> the .length() probably creates divisions
<imirkin> pendingchaos: yeah, it does
<imirkin> it does (buf size - immediate) / struct size
<alyssa> iadd! uadd! we all add 4i ... add!
<idr> alyssa wins. :)
<ccr> oompa-loompa
<jljusten> uaddbro?
<alyssa> why you'd want to add 4i, idk
uzi has quit [Ping timeout: 480 seconds]
bcarvalho__ has joined #dri-devel
uzi has joined #dri-devel
bcarvalho_ has quit [Ping timeout: 480 seconds]
pnowack has quit [Quit: pnowack]
aaguilar has joined #dri-devel
aaguilar has quit []
mbrost has joined #dri-devel
uzi_ has joined #dri-devel
uzi has quit [Ping timeout: 480 seconds]
ngcortes has joined #dri-devel
cyrozap has joined #dri-devel
Anorelsan has joined #dri-devel
<cmarcelo> NIR poll: how do people feel about consolidating NIR into the name workgroup_size (among the names in the code base: "group_size", "local_size", "local_group_size")? compiler/glsl would not change, retaining the GLSL relevant names.
<zmike> 👍
<alyssa> 👍
<bnieuwenhuizen> +
<DrNick> what if instead you added nvidia's and Direct3D's nomenclature?
<alyssa> assuming those really are equivalent
<alyssa> DrNick: Metal too
<alyssa> threadgroup size
<DrNick> warp and weft is pretty good you have to admit
<alyssa> ....weft?
<DrNick> the weft is perpendicular to the warp
<alyssa> I feel like I'm missing a pun here
<alyssa> oh. weaving, ok.
<DrNick> yeah, nvidia used fabric names for their thread things
<DrNick> idk if they actually used weft
<DrNick> they definitely should have, though
<pcercuei> threads, fabric... that makes sense
<DrNick> call ARB_shader_ballot and similar weft operations
<jekstrand> cmarcelo: +1
<DrNick> because they operate horizontally across warps
<anholt> cmarcelo: +1
uzi_ has quit [Ping timeout: 480 seconds]
Anorelsan has quit [Quit: Leaving]
* alyssa stares at regalloc
marex has joined #dri-devel
<zmike> how is it possible for freedreno ci to flake this many times
<zmike> 😠
<airlied> now imagine that is in products shipping :-P
<zmike> freedreno ci is shipping in products?
<zmike> 😓
<zmike> that's it I need a vacation
<alyssa> airlied: the flakes are a5xx ime
<alyssa> a6xx is what's shipping
<airlied> ah maybe a6xx has better context separation
<anholt> we have per process pagetables on 6xx
<anholt> which, when you have things scribbling in piglit, matters a bunch.
uzi has joined #dri-devel
<airlied> bad ram usually
<anholt> there was a batch of them in that pipeline
<marex> a2xx works perfect for me
<idr> anholt: Was that in my MR?
<anholt> idr: nope
<idr> (I noticed that you sent it back to marge.)
uzi has quit [Ping timeout: 480 seconds]
uzi has joined #dri-devel
pcercuei has quit [Quit: dodo]
mbrost has quit [Remote host closed the connection]
uzi has quit [Ping timeout: 480 seconds]
spstarr has quit []
uzi has joined #dri-devel
jekstrand has quit [Ping timeout: 480 seconds]