<airlied>
well, it's more that you can take out the system if you happen to do it at the wrong time
<karolherbst>
ahh yeah
<karolherbst>
airlied: I am not waiting forever though
<karolherbst>
ehh, on the fence I do
<karolherbst>
anyway, yes, there is no timeout
<Kayden>
mattst88: Yeah, I think that was it...every thread could have its own generation-specific table. I can't really figure out why anybody would care about that feature though
<Kayden>
hmm, he was adding support for mapping hw opcode <-> IR opcode enum both ways
<Kayden>
since some opcodes started aliasing, where the same number meant one thing on one gen and another thing on a different one
<Kayden>
IIRC we suggested putting the tables in brw_compiler at the time but we might have had to plumb it through some places so he just went with TLS tricks
<Kayden>
I think it's pretty clearly just "clever" and not really necessary
<zmike>
mareko: pointsize should be done now, when you get a minute
<airlied>
Kayden: yeah that's pointlessly clever
<mattst88>
Kayden: oh yeah, that sounds familiar. I think it was about possibly loading the driver for 2 distinct GPU gens (since discrete is a thing) in the same process (???)
<Kayden>
that's also possible
<Kayden>
but very speculative, since said drivers have their own screens and own brw_compilers and there are a million other things that don't do this
<Kayden>
I haven't heard yet why it's an actual problem
<Kayden>
but it does seem like something we could change if we wanted
<airlied>
yeah seems like something you could just store in brw_compiler as a pointer
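For context, a minimal C sketch of the two approaches being contrasted here; the struct and field names are hypothetical stand-ins, not the actual brw_compiler layout.

```c
/* Hypothetical sketch of both approaches; names are illustrative. */
struct opcode_desc {
   const char *name;
   unsigned ir_opcode;
};

/* The TLS trick: each thread lazily holds its own gen-specific table. */
static __thread const struct opcode_desc *tls_opcode_table;

/* The suggested alternative: brw_compiler is already per-screen, so the
 * gen-specific table can be resolved once and passed around explicitly. */
struct brw_compiler_sketch {
   const struct opcode_desc *opcode_table; /* chosen by hw generation */
};

static inline const struct opcode_desc *
lookup_opcode(const struct brw_compiler_sketch *c, unsigned hw_opcode)
{
   return &c->opcode_table[hw_opcode];
}
```

The pointer variant would also stay correct if two screens for different GPU generations ever lived in one process, which is the concern mattst88 raises above.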
<danvet>
javierm, something went wrong, the revert patch is empty for me?
<danvet>
in your resend pile
<danvet>
but also since that patch is authored by me your sob counts as implied review anyway :-)
<javierm>
danvet: yeah, I'll add my Reviewed-by in the next version. Since I did in fact review it :)
<javierm>
danvet: unsure why it was empty... I got it correctly in my inbox
<javierm>
but I saw the same issue before: Chen-Yu's patches were all empty in my RH inbox, but I could read them from my personal inbox
<danvet>
hm maybe fdo misdelivered and lore picked up the one it got from lkml?
<javierm>
danvet: ah, that could be indeed
<javierm>
danvet: ah, btw. Checkpatch complains about your revert patch
<javierm>
WARNING: From:/Signed-off-by: email address mismatch: 'From: Daniel Vetter <daniel.vetter@ffwll.ch>' != 'Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>'
<danvet>
ah yes I'm evil
<javierm>
do you mind if I fix that in the next revision?
<danvet>
yeah you can
<javierm>
or is it OK to keep it as is?
<danvet>
I just sob them with both
<javierm>
ah, Ok will do that
<danvet>
i.e. pls keep the intel one, they have the copyright on this stuff :-)
<javierm>
danvet: sure, will add both
<javierm>
danvet: btw, if there's a v2 patch series that I reviewed already and nobody else chimed in for v1, how long should I wait to push it to drm-misc-next ?
<javierm>
I would like to land it ASAP so I can rebase my ssd130x SPI series on top
<danvet>
javierm, oh for drivers I wouldn't wait that long really
<danvet>
especially when it's all reviewed
<danvet>
the 1-2 weeks is more a rule of thumb for shared code or where you expect others might want to chime in
<javierm>
danvet: ah, Ok. I'll wait a day or two then and land it
<danvet>
javierm, well if v1 is a few days ago already imo just go ahead and land v2
<javierm>
danvet: roger that
<ifreund>
I can reliably reproduce an amdgpu hang/crash during a specific boss fight in elden ring played through steam's proton. I'm assuming there's a buggy shader used by the game there, but I'd like it if amdgpu recovered properly. I see several issues on the mesa tracker that seem similar but no real resolution.
<ifreund>
since I seem to have 100% reliable reproduction, is there any information I can collect to make debugging/fixing this easier?
<javierm>
ifreund: '[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!' seems to be the issue, the question is why timed out
<javierm>
ifreund: echo 0xff > /sys/module/drm/parameters/debug will give you more debug output
<ifreund>
thanks, will get new logs with that
<ifreund>
well, enabling debug output seems to have made the bug not reproduce, I love race conditions
<ifreund>
at least I can carry on with the game now I guess :/
<javierm>
ifreund: fun
<emersion>
ifreund: timeouts are sometimes due to driver bugs, sometimes to sync bugs in the vulkan client, afaik
<emersion>
ifreund: have you tried with mesa-git?
<emersion>
(driver bugs → user-space driver bugs i mean)
<ifreund>
emersion: no, i was using 21.3.7. If it happens again I will investigate further
<javierm>
mripard[m]: did you notice that vc4 is broken with latest drm-misc-next ?
<javierm>
don't know if Christian König is here to double check that patch
<javierm>
mripard[m]: nvm, I asked him in the list
<bcheng>
hey jekstrand, in ca791f5c it looks like from_wsi is never set. Shouldn't it be set in anv_image_init based on the presence of WSI_IMAGE_CREATE_INFO_MESA?
<jekstrand>
bcheng: Uh... maybe? From the comment, it looks like that was just to avoid an assert.
<jekstrand>
So if it's not asserting, we can drop the whole from_wsi bit
<bcheng>
it's not asserting because it never gets set to true, so it doesn't really serve a purpose atm
<bcheng>
and since it doesn't do anything meaningful, I was just wondering if we should use it as intended (by setting it somewhere) or just drop it
<jekstrand>
IMO, we should drop it unless proven otherwise.
<jekstrand>
There are cases on TGL where we may need it in the future, but we can drop it for now.
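For reference, a hedged sketch of the fix bcheng suggests, assuming Mesa's vk_find_struct_const() helper, the wsi_image_create_info chain struct, and the pCreateInfo/image variables in scope inside anv_image_init; note the conclusion above is to drop the bit instead.

```c
/* Sketch: inside anv_image_init(), derive from_wsi from the pNext
 * chain instead of leaving it permanently false. */
const struct wsi_image_create_info *wsi_info =
   vk_find_struct_const(pCreateInfo->pNext, WSI_IMAGE_CREATE_INFO_MESA);

image->from_wsi = wsi_info != NULL;
```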
<zmike>
ccr: what do you think about putting up a MR with your python trace stuff?
<zmike>
seems like it would be a good addition
<jekstrand>
mareko, Kayden: Is there a PIPE_BIND flag for ensuring something is CPU-visible (i.e., can be mapped)?
<jekstrand>
I'm not seeing one but I don't really understand how all that works.
<jekstrand>
Trying to figure out what to do with CL_MEM_ALLOC_HOST_PTR
<zmike>
jekstrand: it's not a bind flag
<zmike>
it's uhh... PIPE_USAGE?
<zmike>
there's DYNAMIC, STAGING, IMMUTABLE, ...
<jekstrand>
Yeah, that's a heuristic thing. This is a capability thing.
<jekstrand>
Or maybe gallium assumes everything is mappable all the time?
<zmike>
correct
<zmike>
up to drivers to handle it
<jekstrand>
Ideally, I'd like it to be something where, if you can't get a direct map (i.e., it might require a blit), the allocation would fail.
<jekstrand>
I think
<jekstrand>
I'm still not sure I grok what CL is doing there.
<zmike>
use the NO_WAIT flag?
<mareko>
jekstrand: dealing with invisible VRAM?
<jekstrand>
mareko: Trying to make sure CL can request visible VRAM
<jekstrand>
I'm not dealing with an actual problem yet besides looking at the spec and trying to figure out how to map it to gallium.
<mareko>
there is no such flag, but mapping a buffer for CPU access should force it into visible VRAM; if you set the persistent resource flag, the driver will map it directly, else it will map it using a temporary copy
<zmike>
could always create the resources as STAGING? pretty sure every driver will give you something directly mappable then :D
<jekstrand>
mareko: Ok, so maybe I want to turn MEM_ALLOC_HOST_PTR into "always set MAP_PERSISTENT"? That would probably work.
* jekstrand
is too used to Vulkan where these things are explicit and not the GL world of "just do a thing; it'll work, promise."
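A hedged gallium C sketch of the persistent-map route mareko describes; the flag choices follow the discussion above and are untested, and `size`, `screen`, and `pipe` are assumed to be in scope.

```c
/* Ask for a persistently mappable placement at allocation time so the
 * map below can return a direct CPU pointer (visible VRAM/GTT) rather
 * than bouncing through a temporary copy. */
struct pipe_resource tmpl = {
   .target     = PIPE_BUFFER,
   .format     = PIPE_FORMAT_R8_UNORM, /* raw bytes */
   .width0     = size,
   .height0    = 1,
   .depth0     = 1,
   .array_size = 1,
   .usage      = PIPE_USAGE_DEFAULT,
   .flags      = PIPE_RESOURCE_FLAG_MAP_PERSISTENT |
                 PIPE_RESOURCE_FLAG_MAP_COHERENT,
};
struct pipe_resource *res = screen->resource_create(screen, &tmpl);

/* The persistent map stays valid while the GPU uses the buffer. */
struct pipe_transfer *xfer;
void *ptr = pipe_buffer_map(pipe, res,
                            PIPE_MAP_READ | PIPE_MAP_WRITE |
                            PIPE_MAP_PERSISTENT | PIPE_MAP_COHERENT,
                            &xfer);
```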
<DanaG>
huh, there's a new package, libglx-amber0. What the heck is amber?
<jekstrand>
It's old drivers that have been discontinued from main Mesa development.
<jekstrand>
You probably don't have any hardware that requires amber.
<DanaG>
The package description didn't really describe that at all, it just gave generic stuff about what Mesa is.
<jekstrand>
Someone should probably fix that....
<DanaG>
There's an aarch64 package of it for some reason; the only .so file in it is: /usr/lib/aarch64-linux-gnu/libGLX_amber.so.0.0.0
<DanaG>
`strings` of that file includes: Override the DRI driver to load i830 i965 crocus iris radeon r200 r300 r600 radeonsi nouveau_vieux nouveau virtio_gpu
<DanaG>
Old drivers, you mean things like S3 Savage and old ATI Rage and such?
<DanaG>
Now I'm picturing somebody putting an ancient GPU in an ARM box... doubt it would work, given the lack of EFI.
<kisak>
DanaG: Mesa 22.0.0 dropped support for "classic" (non-gallium based OpenGL) drivers. The mesa 21.3 release branch has been fitted with an amber mode to allow it to slot in beside a newer mesa build to fill that missing support.
<Kayden>
it's not old as in S3/Rage/etc - those DRI1 drivers were dropped long ago
<Kayden>
amber is all the classic drivers
<DanaG>
I see, like the ones shown in `strings`.
<DanaG>
Is nouveau non-gallium? If so, what's the gallium one? Or does the strings output show gallium ones too?
<jekstrand>
There's an old nouveau that's classic
<jekstrand>
But modern nouveau is gallium
<Kayden>
seems like it's showing gallium ones too...radeonsi, iris, crocus, etc should come from upstream mesa still, not the amber branch.
<DanaG>
Yeah, I wonder why those are in strings? But maybe they're just coming from a list of known args of things, rather than from the actual modules.
<DanaG>
Weird, vulkaninfo is crashing on my arm64 machine (Radeon WX 4100 in the PCIe slot). http://dpaste.com//6S43TAHTE
<ajax>
yes, just the list of known drivers on that branch
<ajax>
we haven't nerfed the rest, but -Damber=true builds won't try to build them
<nanonyme>
Hey, would anyone be able to explain what this _XkeyTable thing is that keeps getting flagged in our ABI tests for libX11? Is it something super-internal, or is it a problem that its size keeps changing?
<ajax>
not a problem as long as it doesn't shrink
<ajax>
and not, in fact, a problem; it's only declared in a non-sdk header, so any non-libX11 code that touches it is already in a state of sin
<nanonyme>
Okay, sounds like with 99% certainty it can be completely ignored for the purpose we run ABI checks for.
<nanonyme>
I guess this is the same category as when various other projects use a FOO_INTERNAL symbol version and whatnot
<ajax>
yeah. the problem is nobody wants to actually go through libX11 and fix visibility for non-sdk things, both because it's boring and because libX11 is such a long-baked de facto abi that you're just going to cause yourself problems
<nanonyme>
That's fair. Thanks, this allows us to keep updating libX11
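For reference, what "fixing visibility for non-sdk things" would mean mechanically, using the standard GCC/Clang attribute; the declaration shown is illustrative, not libX11's actual one.

```c
/* _XkeyTable is exported today only because ELF visibility defaults to
 * "default". Marking non-SDK globals hidden drops them from the dynamic
 * symbol table, so ABI checkers (and out-of-tree code) stop seeing them. */
__attribute__((visibility("hidden")))
extern const unsigned char _XkeyTable[];
```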
<zmike>
dcbaker: ping re: #6269
<zmike>
any thoughts on this?
<jekstrand>
Kayden: Struggling with iris_map_copy_region. It doesn't seem to support writes. Am I crazy?
* jekstrand
is very confused
<nchery>
jekstrand: writes happen in iris_transfer_flush_region
<Kayden>
jekstrand: Yeah. map does an optional read of the existing data into the staging area. unmap writes any updated data back
<Kayden>
iris_transfer_flush_region also lets you write back things early
<Kayden>
iris_flush_staging_region does the actual write
<jekstrand>
Ok, that makes sense. I didn't realize there was another stage
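A hedged gallium-level sketch of the three-stage flow Kayden describes (newer trees split transfer_map into buffer_map/texture_map); variables like `res`, `src`, and `cpp` are assumed, and error handling is omitted.

```c
struct pipe_transfer *xfer;
struct pipe_box box;
u_box_2d(x, y, w, h, &box);

/* map: iris_map_copy_region allocates a staging resource; with
 * PIPE_MAP_READ it would also blit the current contents into it. */
uint8_t *ptr = pipe->texture_map(pipe, res, /*level=*/0,
                                 PIPE_MAP_WRITE | PIPE_MAP_FLUSH_EXPLICIT,
                                 &box, &xfer);

memcpy(ptr, src, (size_t)w * h * cpp); /* CPU writes land in staging only */

/* write-back: flush_region writes back early; unmap writes back any
 * remaining dirty range (iris_flush_staging_region does the copy). */
pipe->transfer_flush_region(pipe, xfer, &box);
pipe->texture_unmap(pipe, xfer);
```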
<jekstrand>
karolherbst: Found another flush bug
* jekstrand
needs to read more about opencl queues
<jekstrand>
karolherbst: Ok, so CL queues are pretty much like Vulkan queues.
<jekstrand>
karolherbst: I'm starting to wonder if we don't want to require all drivers that want to do CL things to support syncobj and use that for events.
<jekstrand>
Then you really could have, say, two AMD cards going at the same time and synchronizing back-and-forth without having to stall in userspace between things.
<jenatali>
jekstrand: You can have an unlimited number of CL queues though, where Vulkan has caps to report how many there are
<jenatali>
I couldn't map them to D3D queues in any kind of sane world, I had to completely virtualize them and only translate to D3D when the queues actually drain to the "device"
<karolherbst>
jekstrand: we have to stall in userspace anyway
<karolherbst>
there are still user events where we block on the application marking an event as done
<karolherbst>
and what if we do some stuff in sw only?
<karolherbst>
jekstrand: also, I am still super unhappy about the helper context situation. Currently what I am doing with the pipe_transfers is not thread safe at all :/
<jekstrand>
jenatali: Yeah, maybe. Vulkan (and probably D3D) are weirder than they need to be there, though. And, yeah, we may need to virtualize.
<jenatali>
D3D doesn't have a limited number, but yeah it's still not quite a good enough mapping
<jekstrand>
karolherbst: Application marking an event as done can be done with syncobj too.
<karolherbst>
mhhh
<jekstrand>
You just need each event to start off in the reset state and have the queue wait until the point has materialized.
* jekstrand
really needs to write that kernel patch.
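A hedged sketch of the user-event idea using the real libdrm syncobj API; `fd` is an open DRM device fd, the two halves run on different threads, and error handling is omitted.

```c
#include <stdint.h>
#include <xf86drm.h>

uint32_t ev;
drmSyncobjCreate(fd, 0, &ev); /* binary syncobj, starts unsignaled */

/* Queue side: block until a fence materializes and signals.
 * WAIT_FOR_SUBMIT tolerates no fence being attached yet. */
drmSyncobjWait(fd, &ev, 1, INT64_MAX,
               DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT, NULL);

/* clSetUserEventStatus(CL_COMPLETE) side: signal from userspace. */
drmSyncobjSignal(fd, &ev, 1);

drmSyncobjDestroy(fd, ev);
```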
<karolherbst>
I suppose we don't have gallium interfaces for that yet, right?
<jekstrand>
karolherbst: Uh... maybe? There's something for GL <-> Vulkan sharing but I don't know how it works.
<jekstrand>
We could also use sync_file
<jekstrand>
That might be easier, actually.
<jekstrand>
As long as we don't worry about running out of files.
<karolherbst>
I have no idea about those kernel interfaces anyway
<airlied>
let's not use sync_files
<airlied>
unless things are going inter-process
<jekstrand>
I think binary syncobj is what we really want
<karolherbst>
jekstrand: what's the big problem with gallium fences though?
<jekstrand>
karolherbst: Can you share them between devices?
<karolherbst>
I don't think so
<jekstrand>
There's your problem. :P
<karolherbst>
jekstrand: how we are making sure we follow the event status thing though?
<jekstrand>
What do you mean?
<karolherbst>
CL has this weird req that previous events have to be CL_COMPLETE before you can even submit them to the hardware.. well.. according to the spec
<karolherbst>
I think it's all stupid, but....
<airlied>
that sounds like the same thing as vulkan has for semaphore signalling
<karolherbst>
airlied: sure, but that doesn't happen on literally all operations, does it?
<airlied>
yes you have to submit the thing that signals the semaphore before you submit the thing that waits on it
<karolherbst>
I can't even imagine how people looked at the CL spec and thought "ahh yeah, that's great for throughput if _all_ events have to be finished on the hw before we start submitting new ones"
<karolherbst>
airlied: ohh sure, but for CL this is valid for _all_ events
<karolherbst>
not just dependencies
<karolherbst>
only _one_ event is CL_RUNNING
<karolherbst>
or well CL_SUBMITTED
<airlied>
you really don't want the hw blocking on a spinlock
<airlied>
esp a user controlled spinlock
<karolherbst>
I know
<airlied>
it makes things very difficult :-P
<karolherbst>
well
<karolherbst>
I am just saying what the CL spec says
<airlied>
I think some of those concepts were conceived when someone thought that was a good idea :-P
<karolherbst>
before you can start working on an event, all previous ones have to be complete
<karolherbst>
:D
<karolherbst>
yeah, hence me saying it's all stupid
<jekstrand>
karolherbst: spec text?
<jekstrand>
Something I can search for?
<karolherbst>
kind of the entire "5.11. Event Objects" section
* jekstrand
reads
<karolherbst>
the problem is, you can even install callbacks
<karolherbst>
which listens on events going from one to another status
<karolherbst>
and the CTS verifies some stuff there
<karolherbst>
so we can't even bend the rules unlimited
<karolherbst>
ahh well and there is "5.12. Markers, Barriers and Waiting for Events" as well
<karolherbst>
markers and barriers only matter for out of order queues though
<airlied>
"There is no guarantee that the callback functions
<airlied>
registered for various execution status values for an event will be called in the exact order that
<airlied>
the execution status of a command changes"
<karolherbst>
yeah, the order doesn't matter
<karolherbst>
there is one problem
<karolherbst>
you only submit when everything previous + deps are completed
<karolherbst>
but you can't flush, because that would mean if something fails previously you would change state
<karolherbst>
so you kind of have to wait
<airlied>
it's pretty much what syncobjs and the kernel scheduler do
<karolherbst>
can't flush a kernel call to hw if you are not sure that previous stuff completed
<karolherbst>
airlied: okay, so you can submit something and revoke the submission if some operation failed?
<karolherbst>
like if you have a user event in the middle and it gets set to an error state
<airlied>
nope, you just say it relies on this other thing, don't send it to hw, and if something fails you lost context
<airlied>
like failing seems like a pretty big thing to happen here, like a GPU crash
<karolherbst>
so I'd have to kill the entire context if the user event gets an error state?
<airlied>
seems about right
<karolherbst>
ehhh mhh
<jenatali>
karolherbst: Yeah I originally tried to write something that would be more efficient and batch together a chain of dependent commands into a single stream, but ended up following the spec text exactly and only submitting non-dependent work together at the same time
<karolherbst>
airlied: doesn't really sound like something we want to use then?
<karolherbst>
jenatali: yeah... CL is just broken in this regard sadly
* airlied
isn't sure what you think should gracefully degrade here
<karolherbst>
airlied: dependencies
<karolherbst>
ehh
<karolherbst>
the other thing
<airlied>
like why does a dependency fail?
<karolherbst>
airlied: so in a perfect world, you have like 10 CL events with work attached and you just flush it out to the hardware
<karolherbst>
and everything is fine, because you know, nothing fails in a perfect world
<karolherbst>
so you can just cheat and mark the status of those events according to the spec
<airlied>
if they have dependencies you probably never flush it all out to the hw
<karolherbst>
_but_ now you have user events
<karolherbst>
and user events are in the control of the application
<karolherbst>
airlied: why not?
<airlied>
yeah if you have a user controlled thing, you probably ain't sending it to hw at all
<karolherbst>
airlied: but the user thing depends on stuff and other things depend on that
<airlied>
karolherbst: because we learned that gpu semaphores are a bad idea
<airlied>
and wrote a gpu scheduler
<karolherbst>
but that's a different thing, isn't it?
<airlied>
no events are pretty much semaphores
<airlied>
you have one command stream with a wait for event, other command stream with a signal event
<karolherbst>
I can just call launch_grid 5 times in gallium and rely on things happening in order, no?
<airlied>
you submit both of them to separate queues in theory and they sync up
<airlied>
on the hw
<airlied>
in practice you should submit them to a gpu scheduler and it does the ordering
<karolherbst>
ehh, but I am only talking about one command stream and one queue
<jekstrand>
Yeah, I'm not seeing why CL events are any different than some smashing together of VkFence and VkSemaphore.
<karolherbst>
if you have multiple, sure, you have to sync and everything
<karolherbst>
but the case with just one queue is already bonkers
<airlied>
like if you set a user event and the user never signals it, you blow away the context
<airlied>
just like we do for vk events
<karolherbst>
CL doesn't have timeouts
<airlied>
but yeah it's also fine to just not submit it to hw at all
<airlied>
and wait for the user to signal it before you do
<airlied>
because I doubt anyone cares about optimallity in this path
<karolherbst>
right
<karolherbst>
but then we have to wait on the CPU side anyway, no?
<airlied>
yes
<airlied>
we always have to wait on the CPU side, waiting on the GPU is never a good idea
<karolherbst>
right
<karolherbst>
but that's kind of what I am doing atm anyway
<karolherbst>
not waiting until there is a need to
<karolherbst>
I think I am not really getting the point of what syncobjs or something else would win us here
<karolherbst>
so what would it change/improve?
<karolherbst>
(also I am sure that drivers will use syncobjs for gallium fences internally anyway, no?)
<jekstrand>
What it buys us is moving the CPU waiting closer to the hardware.
<jekstrand>
Into the kernel, in particular.
<karolherbst>
but the driver can do that
<jekstrand>
Not cross-device
<jekstrand>
Or even cross-cl_queue
<jekstrand>
(ok, maybe if we're virtualizing)
<jekstrand>
As far as callbacks go, I'm not that worried. We can kick off a thread that waits, fires the callbacks, and then dies.
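A hedged sketch of that fire-and-die callback thread against the real pipe_screen::fence_finish hook; the job struct and helper names are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

struct cb_job {
   struct pipe_screen *screen;
   struct pipe_fence_handle *fence;
   void (*callback)(void *data); /* stand-in for the CL event callback */
   void *data;
};

static void *
event_cb_thread(void *arg)
{
   struct cb_job *job = arg;

   /* Block until the fence signals, then fire the callback and die. */
   job->screen->fence_finish(job->screen, NULL, job->fence,
                             PIPE_TIMEOUT_INFINITE);
   job->callback(job->data);
   free(job);
   return NULL;
}

/* Launched detached, so nothing ever joins it:
 *   pthread_t t;
 *   pthread_create(&t, NULL, event_cb_thread, job);
 *   pthread_detach(t);
 */
```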
<jekstrand>
But I think the bigger problem to fix is default context usage
<jekstrand>
Right now, we're using it for all maps. We need to only use it for cl_mem initialization and nothing else.
<karolherbst>
yeah...
<karolherbst>
I honestly have no good solution here, because pipe_contexts are not thread safe at all
<karolherbst>
and CL applications can do queue stuff on every thread and assume it's all thread safe
<karolherbst>
also
<jekstrand>
So we have a thread per queue which manages the pipe_context
<karolherbst>
ehh no
<jekstrand>
Or we lock around the pipe_context
<karolherbst>
jekstrand: yes.. but you have to return the ptr
<jekstrand>
For maps?
<karolherbst>
yes
<jekstrand>
Yup. So we have to synchronize.
<karolherbst>
so we can't just push a task to the worker thread, because we need the result
<karolherbst>
yeah...
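A hedged C sketch of the constraint being discussed: the map runs on the queue's worker thread, which owns the pipe_context, but the calling thread still has to block for the resulting pointer. All names here are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

struct map_task {
   pthread_mutex_t lock;
   pthread_cond_t  done;
   bool            ready;
   void           *result; /* filled in by the worker thread */
   /* ...resource, box, usage flags... */
};

/* Worker thread: performs the map on its own pipe_context, publishes
 * the pointer, and wakes the caller. */
static void
complete_map(struct map_task *t, void *ptr)
{
   pthread_mutex_lock(&t->lock);
   t->result = ptr;
   t->ready = true;
   pthread_cond_signal(&t->done);
   pthread_mutex_unlock(&t->lock);
}

/* Application thread (clEnqueueMapBuffer): must wait for the result. */
static void *
wait_for_map(struct map_task *t)
{
   pthread_mutex_lock(&t->lock);
   while (!t->ready)
      pthread_cond_wait(&t->done, &t->lock);
   pthread_mutex_unlock(&t->lock);
   return t->result;
}
```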
<karolherbst>
thing is
<karolherbst>
there are also non blocking maps
<jekstrand>
How are those expected to work?!?
<karolherbst>
they don't flush the queue?
<karolherbst>
we have a PIPE_MAP flag for that
<airlied>
you get an event back and the map is only valid once the event is set
<karolherbst>
but...
<airlied>
so you'd really have to queue up the maps rather than asking for them
<karolherbst>
airlied: sure, you still need the pointer and for that a pipe_context
<jekstrand>
I mean, we could mmap(MAP_ANONYMOUS) to get some VA space and then hand that to the gallium map function somehow.
<karolherbst>
jekstrand: maybe
<jekstrand>
There are two other options:
<karolherbst>
I kind of like the idea of a worker thread, because throughput and we don't have to lock around a pipe_context... but....
<jekstrand>
1. Do a persistent map which is immediate and trust the client not to touch it. This only works for buffers, probably.
<jekstrand>
2. Create a temporary which we can persistently map, schedule a blit to that, and then return the mapped pointer.
<karolherbst>
we can't
<karolherbst>
map has to return the host ptr if that was used
<karolherbst>
so no temporary in that case
<jekstrand>
Yeah, so we short-circuit in that case
<jekstrand>
That's case 1
<karolherbst>
we could use PIPE_MAP_UNSYNCHRONIZED ...
<karolherbst>
like always
<jekstrand>
Yup
<jekstrand>
So for HOST_PTR, we return the host pointer. Else, for buffers, we MAP_UNSYNCHRONIZED always. Else, we make a temporary and schedule a blit.
<karolherbst>
ehhh
<karolherbst>
I don't like that temporary part :p
<jekstrand>
What do you think iris is doing for image maps?
<karolherbst>
drivers internally can do whatever they want :D
<jenatali>
Yep that's what I did
<jenatali>
I also have a single worker thread managing a context, and all the queues push work into a pool that the worker thread pulls from
<jekstrand>
Gallium is an immediate API. OpenCL isn't. We're going to have to do some drivery things.
<karolherbst>
jekstrand: there is one annoying thing about UNSYNCHRONIZED: "It should not be used with PIPE_MAP_READ."
<jekstrand>
karolherbst: *should*
<karolherbst>
:D
<karolherbst>
okay
<karolherbst>
fine
<jekstrand>
karolherbst: That's because it's likely going to be a write-combined pointer which is going to suck for perf on discrete cards.
<karolherbst>
right
<jekstrand>
That's where the create hints come in.
<karolherbst>
jenatali: okay, so for non blocking maps you create a temporary buffer and all that?
<jenatali>
Yep
<karolherbst>
mhhh
<karolherbst>
I guess we could do that
<jekstrand>
If we flag it as an upload resource, we'll probably get write-combine. If we flag it as a download resource, we'll probably get something stuck in system RAM which will suck from the GPU a bit but otherwise be fine.
<jenatali>
D3D doesn't have as much flexibility here anyway, we don't currently expose CPU-accessible vidmem
<jekstrand>
We can also do temporaries for buffers for read/write maps
<jekstrand>
And only do unsynchronized for write-only
<karolherbst>
yeah.. somebody or I need to optimize that flag situation anyway sooner or later
<jekstrand>
But integrated cards really want us to use PERSISTENT for everything.
<karolherbst>
jekstrand: can't we use PIPE_MAP_UNSYNCHRONIZED | PIPE_MAP_DONTBLOCK always?
<karolherbst>
although that still doesn't help with what context to use..
<karolherbst>
the non blocking case is what's annoying anyway
<nanonyme>
Is there btw any agreement on how to handle LLVM-support with Mesa amber? Is the intent that at some point amber and new releases will require different LLVM versions?
<jekstrand>
nanonyme: No amber drivers use LLVM.
<karolherbst>
okay.. so 1. if hostptr return hostptr, 2. if blocking do tricks with the event task and wait on that to get the ptr 3. if non blocking create a temporary resource and blit to that?
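A hedged pseudocode sketch of the three cases just listed; every helper here is hypothetical, and the staging buffer is freed on unmap as noted a few lines down.

```c
/* Sketch of the agreed strategy; all helpers are hypothetical. */
void *
map_mem(struct mem_obj *mem, bool blocking)
{
   /* 1. CL_MEM_USE_HOST_PTR: the spec requires returning the host ptr. */
   if (mem->host_ptr)
      return mem->host_ptr;

   /* 2. Blocking map: run the map as a task on the queue's worker
    *    thread and wait on it, so it is ordered against earlier work. */
   if (blocking)
      return run_map_task_and_wait(mem);

   /* 3. Non-blocking map: allocate a temporary resource now, enqueue a
    *    blit/transfer into it, and return the pointer immediately; the
    *    contents only become valid once the returned event completes. */
   mem->staging = alloc_staging(mem->size);
   enqueue_copy_into_staging(mem);
   return map_staging(mem->staging);
}
```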
<airlied>
except the GL_SELECT path :-P
<jekstrand>
Yeah....
<jekstrand>
But that works on basically any LLVM version
<nanonyme>
jekstrand, oh, it's *that* old? Well, that's a relief
<airlied>
jekstrand: the question is if we have to maintain amber for newer llvm
<jekstrand>
karolherbst: Roughly?
<jekstrand>
karolherbst: We may want a gallium cap for "are unsynchronized map reads fast?" and change behavior based on that.
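A sketch of what such a cap query could look like; PIPE_CAP_FAST_UNSYNC_MAP_READS is purely hypothetical and does not exist upstream.

```c
/* Hypothetical cap: not in enum pipe_cap today. Integrated GPUs would
 * report 1, discrete cards with write-combined maps would report 0. */
bool fast_unsync_reads =
   screen->get_param(screen, PIPE_CAP_FAST_UNSYNC_MAP_READS);

/* Pick between a direct unsynchronized read map and a normal read map
 * that the driver may bounce through staging. */
unsigned usage = fast_unsync_reads
   ? PIPE_MAP_READ | PIPE_MAP_UNSYNCHRONIZED
   : PIPE_MAP_READ;
```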
<karolherbst>
ehh I guess for the third we just allocate mem and transfer into it in the queue
<nanonyme>
airlied, right, was asking with packager hat on
<karolherbst>
ehh wait...
<karolherbst>
no, that's fine.. we deallocate on unmap then
<karolherbst>
jekstrand: yeah.. let me try to hack something up over the weekend then, got some ideas on how to do that without being too ugly
<jekstrand>
karolherbst: Cool
<jekstrand>
karolherbst: FYI, I pushed another patch to my rusticl/wip branch to fix more context flushing issues. :-/
<jekstrand>
I hate it but it fixes a real bug. I'm down to two known test_basic fails on iris now
<jekstrand>
I'm going to attack the libclc bug now
<karolherbst>
yeah, I saw it
<karolherbst>
but that's going to be fixed by the above anyway
<karolherbst>
ehh well
<karolherbst>
I'd consider it when fixing all of that
<karolherbst>
(or it needs to change regardless)
<karolherbst>
jekstrand: what I don't like is that we sometimes use pointers as HashMap keys :/
<karolherbst>
I think I'd really just use Arc<...>s for now, they are relatively cheap and just a pointer internally anyway. I was thinking about using references but that's like super ugly
<karolherbst>
ohhh.. I think I might be able to use a Vec instead for that...
<karolherbst>
mem objects aren't as annoying as program/kernels so we can rely on a device list from the context, mhhh
<karolherbst>
yeah.. I see a bigger project for the weekend :D
<jekstrand>
I don't see a way around having a HashMap which maps map pointers to the underlying pipe_transfer_map and relevant stuff.
<jekstrand>
At least not for images and other cases where we don't just use the host pointer and/or a persistent map
<karolherbst>
jekstrand: I was more referring to Mem.res
<jekstrand>
Yeah...
<karolherbst>
I am annoyed by the unsafe { **d } we do for the helper ctx :p
<jekstrand>
Yeah, that's not great
<karolherbst>
but I think I can just use a vec and do mem.res.iter().zip(mem.context.devices) and get a (resource, device) thing, but that sucks in the kernel case where we need a resource for a particular device
<karolherbst>
maybe I just use Arc and call it a day
<karolherbst>
I'll experiment a little
<jekstrand>
Ok, have wait_group_events hacked. test_basic should complete now.
<jekstrand>
FAILED 2 of 112 tests.
<jekstrand>
PASSED 112 of 112 tests.
<karolherbst>
:)
<karolherbst>
now to figure out the crashes with test_conversions and math_brute_force
<karolherbst>
I am sure it's some 64 bit stuff
<karolherbst>
or maybe that map race
<karolherbst>
(well I assume there are races)
<jekstrand>
Well, now that I have maps working....