#dri-devel on 2022-03-14 — irc logs at oftc.irclog.whitequark.org

2021-07-26 22:56 ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar

00:03 <karolherbst> ehhh I hoped for more passing tests, even though I only support global mem pointers as args atm

00:03 <karolherbst> but still 800 fails unrelated to that

00:12 heat_ has quit [Ping timeout: 480 seconds]

00:17 <alyssa> I have so many questions

00:17 <alyssa> all the colours on GNOME were wrong

00:17 <alyssa> and then I go clicking things in the dark and now.. it works? wat?

00:17 <alyssa> can't tell if panfrost bug or kernel drm driver bug

00:18 <airlied> why not both? :-)

00:18 <karolherbst> "Pass 1272 Fails 799 Crashes 102"

00:18 <alyssa> and when I click settings it goes back to being broken

00:19 <icecream95> what if you click it again?

00:19 <alyssa> doesn't seem to help, but if I click power off it goes back to working

00:19 <icecream95> :)

00:20 <icecream95> Yes, the screen is black, just as intended

00:20 <alyssa> lol

00:21 <alyssa> it's not black... something is seriously wrong but I can't tell what the effect is, it's an unfamiliar form of broken

00:21 <icecream95> Screenshot?

00:29 <alyssa> "disk i/o error" uh oh...

00:29 <icecream95> Kernel bug then?

00:30 <alyssa> no i ran out of storage on my other computer, unrelated

00:31 <HdkR> Need a couple of M.2 drives slapped in to USB enclosures? :P

00:31 <alyssa> icecream95: https://rosenzweig.io/foo.png

00:31 <alyssa> poking around, it looks like all text is being rendered black (instead of white)

00:31 <alyssa> why randomly clicking things makes it work again remains an open problem

00:32 <alyssa> HOWEVER

00:32 <alyssa> when I pkill xorg, what's left on the console is garbage in the same way!

00:32 <alyssa> which indicates kernel

00:33 <alyssa> https://rosenzweig.io/foo2.png

00:33 <alyssa> From the latter image, it looks like the DRM driver is using a busted pixel format?

00:34 <alyssa> (and when I run kmscube from that post-GNOME console, everything is garbled in the same way)

00:34 <alyssa> so that practically confirms it's kernel

00:34 <icecream95> Could just be that the gamma ramp is set wrong

00:34 <alyssa> (which implies I should stop hacking for the night, eat dinner, go to sleep, and bug my colleagues tomorrow :-p)

00:34 <alyssa> ooh actually kmscube is telling too

00:35 <alyssa> the artefacts on kmscube are /curved/

00:35 <alyssa> mediatek-drm is definitely busted

00:36 <icecream95> Does enabling "Night Light" in GNOME do anything?

00:37 <alyssa> possibly that's what I was hitting, uhh

00:38 mclasen has quit []

00:38 mclasen has joined #dri-devel

00:40 <alyssa> enabling or disabling triggers the issue (correct colours -> screwed up)

00:40 <alyssa> hitting power twice (I.e. turn display off and on) fixes the colours

00:41 <icecream95> Oh, you mean that the power key is mapped to suspend?

00:41 * icecream95 lives dangerously and has it set to power off

00:41 <alyssa> lol

00:41 <alyssa> defaults

00:42 <alyssa> ok for funsies, let's mess with the DT

00:43 <icecream95> ..Why not just mess about with kernel memory without rebooting?

00:51 Haaninjo has quit [Quit: Ex-Chat]

01:02 <karolherbst> hey.. why am I hitting memory corruptions in Rust, I thought it's magic and that should never happen

01:02 * karolherbst blinks at git grep unsafe *.rs | wc: 117

01:02 <HdkR> You've seen past the magic veil

01:03 <karolherbst> I think the problem is that I use unsafe nearly as often as a rust dev inside a single official rust source file

01:03 <karolherbst> ehh wait

01:03 <karolherbst> the driver deletes my nir

01:03 <karolherbst> how rude

01:04 <karolherbst> I have to dup it or something, no?

01:04 <imirkin> iirc it's the driver's nir.

01:04 <karolherbst> nah, it's mine

01:04 <imirkin> you just think that.

01:04 <karolherbst> I gave it to the driver

01:04 <imirkin> and now you want it back?

01:04 <imirkin> no such luck.

01:04 <imirkin> it's the driver's :)

01:04 <karolherbst> I don't, it just deletes it

01:04 <karolherbst> I still own it

01:04 <karolherbst> and have it

01:05 <imirkin> i don't think those are the semantics...

01:05 <imirkin> but maybe?

01:05 <karolherbst> I am sure they aren't

01:05 <imirkin> iirc the driver owns it.

01:05 <imirkin> and can do whatever it likes

01:05 <karolherbst> yeah I know

01:05 <karolherbst> it makes sense

01:05 <karolherbst> I just have to duplicate it

01:09 <karolherbst> how that nir_shader_clone magically fixes those crashes

01:19 slattann has joined #dri-devel

01:23 i-garrison has quit [Ping timeout: 480 seconds]

01:39 <karolherbst> mhh.. memory problems

01:40 co1umbarius has joined #dri-devel

01:42 columbarius has quit [Ping timeout: 480 seconds]

01:44 pcercuei has quit [Quit: dodo]

01:46 i-garrison has joined #dri-devel

01:46 slattann has quit [Read error: Connection reset by peer]

01:52 <karolherbst> ahh.. some memory leak somewhere

01:53 mclasen has quit []

01:53 mclasen has joined #dri-devel

02:38 ella-0 has joined #dri-devel

02:41 ella-0_ has quit [Remote host closed the connection]

02:43 mclasen has quit []

02:44 mclasen has joined #dri-devel

02:51 kts has joined #dri-devel

03:13 camus has quit [Remote host closed the connection]

03:23 kts has quit [Ping timeout: 480 seconds]

03:37 camus has joined #dri-devel

03:42 kts has joined #dri-devel

03:52 sdutt has quit []

03:52 sdutt has joined #dri-devel

03:53 aravind has joined #dri-devel

03:53 kts has quit [Ping timeout: 480 seconds]

03:57 kts has joined #dri-devel

04:14 rgallaispou1 has quit [Ping timeout: 480 seconds]

04:15 rgallaispou has joined #dri-devel

04:20 camus has quit []

04:24 aravind has quit [Ping timeout: 480 seconds]

04:33 Danct12 has quit [Remote host closed the connection]

04:37 mclasen has quit [Ping timeout: 480 seconds]

04:42 shankaru has joined #dri-devel

04:43 kts has quit [Ping timeout: 480 seconds]

05:02 jewins has quit [Ping timeout: 480 seconds]

05:15 shankaru1 has joined #dri-devel

05:15 shankaru has quit [Read error: Connection reset by peer]

05:40 digetx is now known as Guest2079

05:40 Guest2079 has quit [Read error: Connection reset by peer]

05:40 digetx has joined #dri-devel

05:46 rabbitz has joined #dri-devel

05:53 rabbitz has quit []

05:58 Duke`` has joined #dri-devel

06:12 aravind has joined #dri-devel

06:13 rabbitz has joined #dri-devel

06:23 itoral has joined #dri-devel

06:27 danvet has joined #dri-devel

06:35 camus has joined #dri-devel

06:36 rabbitz has quit []

06:56 pnowack has joined #dri-devel

07:02 thellstrom has joined #dri-devel

07:05 Duke`` has quit [Ping timeout: 480 seconds]

07:09 karolherbst has quit [Remote host closed the connection]

07:10 karolherbst has joined #dri-devel

07:28 aravind has quit [Ping timeout: 480 seconds]

07:35 alanc has quit [Remote host closed the connection]

07:36 alanc has joined #dri-devel

07:43 Haaninjo has joined #dri-devel

07:45 jkrzyszt has joined #dri-devel

07:51 frieder has joined #dri-devel

07:52 MajorBiscuit has joined #dri-devel

07:54 itoral_ has joined #dri-devel

07:54 kchibiso- has quit [Read error: No route to host]

07:54 kchibisov_ has joined #dri-devel

07:57 aravind has joined #dri-devel

07:59 kchibisov_ has quit []

08:00 itoral has quit [Ping timeout: 480 seconds]

08:00 kchibisov_ has joined #dri-devel

08:08 lemonzest has joined #dri-devel

08:08 tzimmermann has joined #dri-devel

08:17 digetx is now known as Guest2084

08:17 Guest2084 has quit [Remote host closed the connection]

08:17 digetx has joined #dri-devel

08:20 jernej has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]

08:20 jernej has joined #dri-devel

08:34 itoral_ has quit [Read error: Connection reset by peer]

08:59 lynxeye has joined #dri-devel

09:10 mvlad has joined #dri-devel

09:11 tursulin has joined #dri-devel

09:13 kj has quit [Quit: Page closed]

09:13 kj has joined #dri-devel

09:25 <tomeu> zmike: anholt: for testing stability of a MR, I just schedule a pipeline hourly (or so) after hacking the .gitlab-ci.yml to leave only the required tests, and have them start automatically

09:26 <tomeu> I do have scripts for monitoring a bunch of specific issues in LAVA jobs, but I guess these won't be useful here

09:26 <tomeu> you could write something similar via the gitlab API though and parse the logs

09:26 rkanwal has joined #dri-devel

09:29 rasterman has joined #dri-devel

09:58 thellstrom1 has joined #dri-devel

09:58 thellstrom has quit [Read error: Connection reset by peer]

10:03 rabbitz has joined #dri-devel

10:06 thellstrom1 has quit [Ping timeout: 480 seconds]

10:08 rabbitz has quit [autokilled: This host violated network policy and has been banned. Mail support@oftc.net if you think this is in error. (2022-03-14 10:08:48)]

10:38 Danct12 has joined #dri-devel

10:38 sdutt has quit [Read error: Connection reset by peer]

10:44 mclasen has joined #dri-devel

10:49 adjtm has quit [Quit: Leaving]

10:50 adjtm has joined #dri-devel

10:50 flacks has quit [Quit: Quitter]

10:52 flacks has joined #dri-devel

11:18 pcercuei has joined #dri-devel

11:36 rkanwal has quit [Ping timeout: 482 seconds]

11:39 kts has joined #dri-devel

11:39 mclasen has quit []

11:40 mclasen has joined #dri-devel

11:40 kts has quit []

11:43 rkanwal has joined #dri-devel

11:47 <karolherbst> jekstrand: ahhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh I figured it out

11:47 <karolherbst> event.write_checked(cl_event::from_arc(e.clone())) if event is null, we create a dangling arc nobody ever retains :(

11:47 <karolherbst> and I am sure we will hit this in different places

11:50 <karolherbst> I think we need a new API specific to the pointers "event.leak_ref(e)" and implement the proper thing there
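The leak described above can be shown in a few lines of Rust. This is a minimal sketch under assumptions: `Event`, `leak_ref` and `release` are hypothetical stand-ins (not rusticl's actual types or API). The point is that a new strong reference is handed out via `Arc::into_raw` only when the client actually passed a non-null out-pointer; creating it first and then skipping the write leaves a reference nobody can ever release.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a CL event object.
pub struct Event {
    pub id: u32,
}

/// Hypothetical helper in the spirit of the proposed `event.leak_ref(e)`.
/// The buggy pattern called `Arc::into_raw(e.clone())` unconditionally and
/// then skipped the write when `out` was null, leaking that reference.
///
/// # Safety
/// `out` must be null or valid for writing a `*const Event`.
pub unsafe fn leak_ref(out: *mut *const Event, e: &Arc<Event>) {
    if !out.is_null() {
        // The client now owns this strong reference and must release it.
        unsafe { *out = Arc::into_raw(Arc::clone(e)) };
    }
}

/// Hypothetical release path (what a clReleaseEvent-style call would do).
///
/// # Safety
/// `p` must come from `leak_ref` and must not have been released already.
pub unsafe fn release(p: *const Event) {
    drop(unsafe { Arc::from_raw(p) });
}
```

With a null out-pointer the strong count stays at 1; with a real one it rises to 2 until the client releases it.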

12:08 pcercuei has quit [Remote host closed the connection]

12:09 pcercuei has joined #dri-devel

12:26 shankaru1 has quit []

12:29 shankaru has joined #dri-devel

12:29 almos has joined #dri-devel

12:31 almos has quit []

12:43 simon-perretta-img has quit [Quit: Leaving]

12:44 simon-perretta-img has joined #dri-devel

12:48 Surkow|laptop has quit [Quit: 418 I'm a teapot - NOP NOP NOP]

12:50 simon-perretta-img has quit [Quit: Leaving]

12:50 simon-perretta-img has joined #dri-devel

12:55 simon-perretta-img has quit []

12:55 simon-perretta-img has joined #dri-devel

12:56 simon-perretta-img has quit []

12:56 almos has joined #dri-devel

12:56 simon-perretta-img has joined #dri-devel

12:58 agd5f has joined #dri-devel

12:58 pcercuei has quit [Ping timeout: 480 seconds]

13:01 pcercuei has joined #dri-devel

13:04 Danct12 has quit [Quit: Quit]

13:11 macromorgan has quit [Quit: Leaving]

13:20 JohnnyonFlame has joined #dri-devel

13:20 <jekstrand> karolherbst: I'm going to need more context on that. Or maybe I just need to read the week-end scrollback.

13:21 <karolherbst> jekstrand: we leaked a pointer which the client never asked for

13:21 <karolherbst> I was just hitting memory leaks

13:22 <karolherbst> anyway, kernels are now running :)

13:24 <karolherbst> jekstrand: anyway.. changes I made: https://gitlab.freedesktop.org/karolherbst/mesa/-/commits/rusticl/wip

13:24 <karolherbst> "rusticl: implement clFinish and clFlush" is the first one

13:25 <karolherbst> and I think the first 5 are safe, the others are more about getting something working

13:25 <karolherbst> the cts runner also has a weirdo bug where some processes are just... "lost" no idea what those are all about

13:26 <karolherbst> I suspect faulty timeout handling

13:26 <jekstrand> karolherbst: Cool. I think I'll be back to CL hacking tomorrow. Today is going to be 90% meetings. :-/

13:27 <karolherbst> for those commits I am more or less interested in what you think about some of the changes. I think the nir related stuff is quite "fine" but maybe we can make it easier to call passes with a macro

13:28 <karolherbst> the last CTS run with my script took like 50 minutes on llvmpipe :( I think I have to use my desktop from now on :D

13:33 fxkamd has joined #dri-devel

13:35 aravind has quit [Ping timeout: 480 seconds]

13:43 <karolherbst> okay.. pushed a fixed runner :)

13:43 adjtm is now known as Guest2106

13:43 adjtm has joined #dri-devel

13:47 <karolherbst> jekstrand: ohh and we have to stop using unsigned in structs when we actually mean enums :D

13:47 <jekstrand> karolherbst: oh?

13:47 <karolherbst> that makes generating bindings and using those fields in rust much easier

13:47 <jekstrand> Oh, right.

13:47 <jekstrand> NIR only does that a couple places and only for bitfields.

13:47 <karolherbst> bindgen doesn't generate stuff for everything, so everything not pulled in by patterns has to be added explicitly

13:47 <jekstrand> Because if we don't, it messes up MSVC.

13:48 <karolherbst> ehhh...

13:48 <karolherbst> so "nir_variable_mode mode:15;" won't work?

13:48 <jekstrand> not with MSVC

13:48 <jekstrand> it sucks

13:48 <karolherbst> I stop recognizing MSVC as a C compiler

13:48 <karolherbst> :D

13:48 Haaninjo has quit [Read error: Connection reset by peer]

13:48 <karolherbst> mhh...

13:48 <karolherbst> annoying

13:48 jewins has joined #dri-devel

13:48 <alyssa> remind me why we build with MSVC

13:49 <karolherbst> because some people prefer MSVC over a working compiler

13:49 <jekstrand> TBH, though, I don't think most of those bitfields in NIR are that useful. Most shaders, with a bit of optimization, don't have many variables so if the size bloats a bit for the sake of compatibility, it's probably ok.

13:49 Haaninjo has joined #dri-devel

13:49 <jekstrand> alyssa: VMWare does. Microsoft stuff, too.

13:49 <karolherbst> jekstrand: I suspect shader caching as one reason

13:50 <karolherbst> jekstrand: I mean.. we can deal with it inside rust, it's just... more annoying

13:50 macromorgan has joined #dri-devel

13:51 Guest2106 has quit [Ping timeout: 480 seconds]

13:51 <karolherbst> jekstrand: ohh.. and meson doesn't support adding include paths for generated headers, so there is another hack with paths in meson.build

13:51 <karolherbst> "include_directories('../../../../build/src/compiler/nir/')," maybe it works for you, maybe not

13:51 <karolherbst> :D

13:52 <jekstrand> karolherbst: I use _build

13:52 <karolherbst> ahh

13:53 <karolherbst> also.. I wrote a rust Iterator for exec_lists.....

13:53 <karolherbst> surprised how much more pleasant that is to look at than our C macros for the same thing :D

13:54 <karolherbst> not sure if we need it besides iterating vars.. but let's see
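For readers unfamiliar with the pattern, here is a rough sketch of what a Rust `Iterator` over an intrusive, sentinel-terminated list can look like. Assumptions: `ExecNode` and `ExecListIter` are simplified, hypothetical types loosely modelled on Mesa's exec_node (a head sentinel whose `next` is the first element, and a tail sentinel recognised by a null `next`); they are not the bindgen-generated types or karolherbst's actual code.

```rust
use std::marker::PhantomData;

/// Simplified intrusive list node (hypothetical, not the bindgen type).
#[repr(C)]
pub struct ExecNode {
    pub next: *mut ExecNode,
    pub prev: *mut ExecNode,
}

/// Walks from the first real node up to, but not including, the tail
/// sentinel, which is recognised by its null `next` pointer.
pub struct ExecListIter<'a> {
    node: *mut ExecNode,
    _list: PhantomData<&'a ExecNode>,
}

impl<'a> ExecListIter<'a> {
    /// # Safety
    /// `head_sentinel` must start a well-formed list whose nodes outlive `'a`.
    pub unsafe fn new(head_sentinel: &'a ExecNode) -> Self {
        ExecListIter { node: head_sentinel.next, _list: PhantomData }
    }
}

impl<'a> Iterator for ExecListIter<'a> {
    type Item = &'a ExecNode;

    fn next(&mut self) -> Option<&'a ExecNode> {
        // SAFETY: upheld by the contract of `ExecListIter::new`.
        let node: &'a ExecNode = unsafe { &*self.node };
        if node.next.is_null() {
            return None; // reached the tail sentinel
        }
        self.node = node.next;
        Some(node)
    }
}
```

Once the unsafe constructor is wrapped, callers get ordinary `for`, `map` and `filter` instead of a foreach macro; recovering the containing struct from a node (as Mesa's exec_node_data does) is left out here.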

13:56 <karolherbst> also.. I expect nearly nothing of that new stuff to work on an actual GPU

13:57 <karolherbst> I don't do fencing at all atm, so this might be something to look at

13:57 <jekstrand> sure

13:57 <jekstrand> I'm happy to work on some of that.

13:58 <jekstrand> We just need to get to "we can run some kernels" so we have a base to expand from.

13:58 <jekstrand> So I'll focus on reviewing that stuff first

13:58 <karolherbst> yeah

13:58 <karolherbst> cool

14:00 sdutt has joined #dri-devel

14:03 Danct12 has joined #dri-devel

14:15 <almos> hi, I have a question: amdgpu used to be able to show a table of the dpm frequencies, but now I can't find it anywhere, was that feature removed?

14:18 Surkow|laptop has joined #dri-devel

14:30 lemes has joined #dri-devel

14:33 qyliss has quit [Quit: bye]

14:35 <Sachiel> mareko: I don't know if you care about this, but I've been asked to forward this https://github.com/KhronosGroup/VK-GL-CTS/issues/308

14:36 qyliss has joined #dri-devel

14:36 qyliss has quit [Remote host closed the connection]

14:40 almos has quit [Remote host closed the connection]

14:40 qyliss has joined #dri-devel

14:42 <jekstrand> daniels: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15305 has been sitting for 45min waiting for sanity to finish. :-/

14:44 <daniels> jekstrand: yeah, VirGL were unintentionally spamming CI to death, which they're fixing

14:44 <kisak> Sachiel: CTS run on a mesa branch that hasn't been supported in 4 years is a flat no. Update your system and test with mesa 21.3 or 22.0.

14:44 <kisak> (or git main)

14:44 mattrope has joined #dri-devel

14:45 sdutt has quit []

14:45 sdutt has joined #dri-devel

14:45 almos has joined #dri-devel

14:57 libv has quit [Ping timeout: 480 seconds]

15:00 aravind has joined #dri-devel

15:04 libv has joined #dri-devel

15:11 <tzimmermann> danvet, if you have a bit of time, i'd appreciate if you could comment on https://lore.kernel.org/dri-devel/20220303205839.28484-1-tzimmermann@suse.de/

15:13 <tzimmermann> mlankhorst_, do you have the time to send out drm-misc-next-fixes? there are two patches since february

15:21 karolherbst_ has joined #dri-devel

15:21 karolherbst has quit [Read error: Connection reset by peer]

15:21 karolherbst_ is now known as karolherbst

15:22 <danvet> tzimmermann, looks all reasonable?

15:23 <tzimmermann> danvet, to me :)

15:23 <danvet> I mean I could throw a bikeshed onto it about how to shuffle stuff around or whether we should reuse fbdev code or not

15:23 <danvet> but that seems silly

15:23 <tzimmermann> it's ok if you don't have comment. i just didn't want to go ahead without considering your feedback.

15:25 <danvet> it's annoying that we can't do a full generic vma mkwrite wrapper, but oh well

15:25 <danvet> also I didn't review whether the code works :-)

15:27 <jekstrand> karolherbst: As far as what passes to run, I can help with that. I think my intel_clc.c has a decent set. One thing we want to do with rusticl is to integrate better w/ gallium drivers here and not lower absolutely everything up-front.

15:30 mclasen has quit []

15:30 mclasen has joined #dri-devel

15:31 <jekstrand> karolherbst: Yeah, we'll need the global binding API or something like it. For images, however, I'm hoping we can start looking like a "normal" gallium driver.

15:42 ifreund has quit [Remote host closed the connection]

15:42 ifreund has joined #dri-devel

15:45 loki_val has joined #dri-devel

15:49 crabbedhaloablut has quit [Ping timeout: 480 seconds]

15:51 JohnnyonFlame has quit [Ping timeout: 480 seconds]

15:51 <karolherbst> "Pass 1301 Fails 808 Crashes 39 Timeouts 29" :3

15:55 aravind has quit [Ping timeout: 480 seconds]

15:55 aravind has joined #dri-devel

15:56 <alyssa> Woo

15:56 <alyssa> What set is that?

15:57 <karolherbst> alyssa: almost all of CL CTS

15:57 <karolherbst> don't get distracted by the low number, that stuff ran for 1.5 hours

15:58 <karolherbst> (the full CTS even needs more)

15:58 <karolherbst> jekstrand: there is one annoying thing btw, and I think I have a deadlock somewhere once the client multithreads stuff. CL_TEST_SINGLE_THREADED=1 helps as a workaround

16:00 gpiccoli_ has joined #dri-devel

16:00 gpiccoli has quit [Read error: Connection reset by peer]

16:00 <karolherbst> I think I will wire up constant memory support next and play around using const buffers and the likes, but not quite sure yet how all of that will fit into place

16:03 <karolherbst> but I kind of fear that we have to treat constant as global :(

16:03 <karolherbst> another thing I was wondering about is to use a constant buffer instead of that input stuff

16:03 <karolherbst> oh well

16:04 <jekstrand> karolherbst: But it's rust code! That shouldn't happen!

16:04 <jekstrand> :(

16:04 <jekstrand> Maybe we're too unsafe? (-:

16:04 <karolherbst> not at all

16:04 <karolherbst> I use unsafe less often than rust devs

16:04 aravind has quit [Ping timeout: 480 seconds]

16:04 <jekstrand> lol

16:05 <karolherbst> but I do a lot of hand waving

16:05 <karolherbst> because 1. mesa code is safe, obviously 2. I just rely on sane clients as everything else is UB to begin with

16:05 <karolherbst> but yeah.. we have to make sure we use our own code correctly

16:13 <karolherbst> ehhh host ptr...

16:14 <karolherbst> jekstrand: soo.. there is one issue I don't have a good solution for yet. I create the proper buffer objects when the client creates a buffer. Thing is.. there is stuff like CL_MEM_COPY_HOST_PTR where we should init the content with the provided data. Problem is, buffer_map and buffer_subdata require a pipe_context which we don't have at that time

16:15 <karolherbst> but I also don't want to go the clover route and have a shadow buffer on the CPU, which we would have to manage then

16:15 <karolherbst> and lazy init the content later

16:16 <karolherbst> so I was thinking if we might want to have a helper pipe_context on the Device objects for this kind of stuff

16:16 gpiccoli_ is now known as gpiccoli

16:17 frieder has quit [Remote host closed the connection]

16:18 <karolherbst> good thing is, the API expects us to hard sync so if we always flush and wait on the fence of this helper pipe_context we should be fine
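The helper-context idea can be modelled roughly as follows. This is only a sketch under stated assumptions: `Device`, `HelperContext` and `Buffer` are hypothetical, a `Vec<u8>` stands in for GPU memory, and the copy and counter stand in for buffer_subdata and the flush-plus-fence-wait. What it shows is the shape: one device-wide context behind a lock (since many client threads may create buffers concurrently), with the upload finished before the create call returns so the client can free its host pointer.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for a device-owned helper pipe_context.
pub struct HelperContext {
    uploads: u32,
}

/// Hypothetical device: one helper context shared by all threads.
pub struct Device {
    helper: Mutex<HelperContext>,
}

/// Fake buffer whose `storage` plays the role of GPU memory.
pub struct Buffer {
    pub storage: Vec<u8>,
}

impl Device {
    pub fn new() -> Self {
        Device { helper: Mutex::new(HelperContext { uploads: 0 }) }
    }

    /// Sketch of clCreateBuffer with CL_MEM_COPY_HOST_PTR: the upload goes
    /// through the helper context and completes before returning, so the
    /// caller may free `host_ptr` immediately afterwards.
    pub fn create_buffer_copy_host_ptr(&self, host_ptr: &[u8]) -> Buffer {
        let mut buf = Buffer { storage: vec![0; host_ptr.len()] };
        // A context must not be used from several threads at once, so
        // concurrent create calls are serialised on this lock.
        let mut ctx = self.helper.lock().unwrap();
        buf.storage.copy_from_slice(host_ptr); // stands in for buffer_subdata
        ctx.uploads += 1; // stands in for flush + waiting on the fence
        buf
    }

    pub fn uploads(&self) -> u32 {
        self.helper.lock().unwrap().uploads
    }
}
```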

16:23 frieder has joined #dri-devel

16:26 JohnnyonFlame has joined #dri-devel

16:29 <karolherbst> anyway.. supporting COPY_HOST_PTR is quite important as a lot of tests rely on it

16:29 <karolherbst> so kind of have to find a good solution for that problem soonish

16:32 <jekstrand> karolherbst: Ugh...

16:32 <jekstrand> karolherbst: Why don't we create a pipe_context at device init time?

16:35 mclasen has quit []

16:35 mclasen has joined #dri-devel

16:42 frieder has quit [Remote host closed the connection]

16:45 <karolherbst> jekstrand: because we have to submit jobs from each queue, so a context is usually tied to that

16:45 shankaru has quit [Quit: Leaving.]

16:45 <karolherbst> maybe we could share a context for all queues? but not sure if that's how it's supposed to work

16:51 <jekstrand> I think we want a hidden queue for resource uploads

16:51 <karolherbst> yeah... that was my first thought as well

16:51 <karolherbst> shouldn't be too hard to do actually

16:51 <jekstrand> And then we can have an event for initialized resources for when the upload finishes and everything waits on that event. After 2ms, the wait is a no-op but oh, well.

16:51 <karolherbst> we have to do it blocking

16:52 <jekstrand> why?

16:52 <karolherbst> once createBuffer returns we should be done

16:52 <karolherbst> spec says so

16:52 <karolherbst> so the client can free the host_ptr

16:52 <jekstrand> Yes, but that only means we have to be done if we userptr it

16:52 <karolherbst> no

16:52 <jekstrand> If we copy into a temp buffer, there may still be a GPU copy in-flight.

16:53 nchery has joined #dri-devel

16:53 Duke`` has joined #dri-devel

16:53 <karolherbst> although not sure what we should do if the buffer is really still in use.. mhhh

16:54 <karolherbst> annoying :(

16:54 <karolherbst> ehh wait

16:54 <karolherbst> jekstrand: that's for _create_ buffer

16:54 <karolherbst> it's not in use

16:55 <jekstrand> karolherbst: For Buffers, we probably want to always map and memcpy

16:55 <karolherbst> CL_MEM_USE_HOST_PTR is already implemented via create_buffer_from_user, but needs a fallback if the driver can't fulfill that request. What I am talking about is just createBuffer with CL_MEM_COPY_HOST_PTR

16:55 <jekstrand> For images, though, we need to copy into a temporary and then blit on the GPU

16:55 mclasen has quit []

16:55 <karolherbst> I use buffer_subdata for writes to a resource

16:56 mclasen has joined #dri-devel

16:56 <karolherbst> not sure if that's a bad idea or not

16:56 <karolherbst> but yeah.. nothing of that is tested for images

16:56 <karolherbst> or.. wait..

16:57 <karolherbst> yeah.. shouldn't

16:57 <jekstrand> buffer_subdata should work if the driver doesn't get too confused about all the contexts we have floating around. :)

16:57 <karolherbst> :D

16:57 <karolherbst> yeah..

16:57 <karolherbst> we can always change that later though

16:57 <jekstrand> sure

16:58 <karolherbst> I am more concerned about USE_HOST_PTR fallbacks and how ugly it could make the code

16:58 <karolherbst> for COPY_HOST_PTR having a helper context and blocking on all operations _should_ be good enough

16:58 <karolherbst> we just need to be careful to not use it for random stuff

16:58 <karolherbst> like initing a buffer with a helper context shouldn't mess up things

17:00 <karolherbst> I think iris fails create_buffer_from_user for quite a lot of reasons, and I am not sure if the approach should be to improve create_buffer_from_user or to fall back and use a shadow buffer for that case :/

17:00 <jekstrand> I can look at that tomorrow

17:01 mclasen has quit []

17:01 mclasen has joined #dri-devel

17:04 ybogdano has joined #dri-devel

17:05 almos has quit [Quit: Page closed]

17:09 rgallaispou1 has joined #dri-devel

17:12 rgallaispou has quit [Ping timeout: 480 seconds]

17:31 mclasen has quit []

17:31 mclasen has joined #dri-devel

17:39 jkrzyszt has quit [Ping timeout: 480 seconds]

17:46 mclasen has quit []

17:47 mclasen has joined #dri-devel

17:51 JohnnyonFlame has quit [Ping timeout: 480 seconds]

17:57 libv is now known as libv_

18:09 suma has joined #dri-devel

18:10 suma has quit []

18:12 libv_ is now known as libv

18:16 <karolherbst> jekstrand: yeah.. so helper context works quite fine

18:18 ybogdano has quit [Ping timeout: 480 seconds]

18:18 <karolherbst> uhh..... I think that just fixed tons of fails, nice

18:20 <karolherbst> mhh constant_source is still crashing in test_basic

18:22 <karolherbst> I think the biggest problem we need to solve is to move gallium to 64 bit pointers :( that's going to be a huge pita

18:31 <karolherbst> soo.. nir_gather_explicit_io_initializers :)

18:51 ppascher has quit [Ping timeout: 480 seconds]

18:51 crabbedhaloablut has joined #dri-devel

18:54 loki_val has quit [Ping timeout: 480 seconds]

18:56 ybogdano has joined #dri-devel

19:00 <alyssa> wasn't that already a problem with clover?

19:01 <karolherbst> alyssa: yeah, but it goes deeper than just the bindings

19:01 <karolherbst> a lot of args in the gallium API are 32 bit

19:01 <karolherbst> offsets and sizes and the likes

19:02 <karolherbst> drivers also kind of only place buffers in a 32 bit VM because of reasons

19:02 <karolherbst> like nouveau only allocates in a 32 bit address space

19:02 <karolherbst> the ISA supports both even

19:03 ppascher has joined #dri-devel

19:04 <karolherbst> jekstrand: there are already tests I pass fully like geometrics and relationals :)

19:05 <karolherbst> "Pass 1400 Fails 744 Crashes 34 Timeouts 0: 100%" :)

19:06 <airlied> karolherbst: advertising 3.0?

19:06 <karolherbst> airlied: on the platform 3.0, on the device 1.0

19:06 <karolherbst> I am not crazy enough for 3.0 yet :D

19:06 iive has joined #dri-devel

19:06 <karolherbst> but there are enough tests I fixed using some bits of the 3.0 API

19:07 <karolherbst> or 1.1/1.2

19:07 <karolherbst> and I implement according to 3.0 directly

19:07 <karolherbst> supporting spirv binaries shouldn't be difficult either, so...

19:08 lynxeye has quit []

19:08 <karolherbst> airlied: if you want I can run with 3.0 and see what happens :D

19:08 <airlied> just gives a better idea for what the pass/fail rate really is :-P

19:10 <karolherbst> it all crashes

19:10 <karolherbst> :D

19:10 <karolherbst> ahh not all

19:11 <karolherbst> if I am confident enough...

19:11 <karolherbst> but it really shouldn't be too much work

19:13 <karolherbst> ahh clCreateCommandQueueWithProperties.. yeah, I guess I could implement that already

19:13 <karolherbst> in a crappy way

19:19 <karolherbst> okay.. seems to work :D

19:23 <alyssa> karolherbst: That reminds me, if I could ignore overflow (64-bit addresses but buffers stay within a single 32-bit page) might save some ALU...

19:24 <karolherbst> alyssa: the reason drivers do 32 bit VMs in OpenGL

19:24 <karolherbst> but it needs to be supported by the UAPI

19:25 <karolherbst> we do have flags on memory instructions to tell if we hand in a 32 or 64 bit pointer
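To make the "ignore overflow" trade-off concrete, here is a small arithmetic illustration (not code from any driver; both function names are made up). A full 64-bit address add has to propagate a carry from the low into the high 32 bits; if every buffer is guaranteed to sit inside a single 4 GiB-aligned region, only the low 32 bits need to be added and the high half can be reused unchanged.

```rust
/// Full 64-bit address computation: the carry out of the low 32 bits
/// must be propagated into the high 32 bits.
pub fn addr_full(base: u64, offset: u32) -> u64 {
    base.wrapping_add(offset as u64)
}

/// Cheaper variant that ignores overflow: only the low 32 bits are
/// added and the high half of `base` is kept as-is. It matches
/// `addr_full` only if base..base+offset does not cross a 4 GiB boundary.
pub fn addr_low32(base: u64, offset: u32) -> u64 {
    let high = base & !0xffff_ffffu64;
    let low = (base as u32).wrapping_add(offset);
    high | low as u64
}
```

For a base of 0x1_0000_1000 both give 0x1_0000_1020; for 0x1_ffff_fff0 plus 0x20 the full add yields 0x2_0000_0010 while the shortcut wraps to 0x1_0000_0010, which is why the UAPI or allocator has to guarantee the placement.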

19:25 rasterman has quit [Ping timeout: 480 seconds]

19:25 <Lyude> Hm, I don't see any immediate issues with this but: is there any issue with DRM helpers computing atomic state in the atomic commit path, as long as said state is guaranteed to have no effect on whether the commit succeeds or fails? The helpers in question in this case are the DRM MST helpers, and I'm thinking the easiest way of keeping track of VC start slots would likely just to

19:25 <Lyude> be to update their info in the MST atomic state at commit time rather than try to figure it out during the atomic check phase

19:27 <Lyude> (also, no idea why the MST protocol seems to want implementors to keep track of starting slots for VCs, despite also mandating that branches re-allocate streams VC slots to remove unused space between streams. seems like it'd be better to leave that up to branch devices, but oh well)

19:28 gawin has joined #dri-devel

19:30 <Lyude> would still be good to know the answer to whether that's OK in atomic, but at the very least it seems like I might be able to avoid needing to do this

19:32 JohnnyonFlame has joined #dri-devel

19:33 <Lyude> danvet: ^ any idea about this btw? since you're usually the one I throw MST related ideas at

19:41 rasterman has joined #dri-devel

19:54 <danvet> Lyude, it gets tricky, but it's doable

19:55 <danvet> the thing to keep in mind is there's no locking for state structs

19:55 <danvet> so if you change them in commit, then your check needs to assume it's all in-flight garbage potentially

19:55 <danvet> and so your duplicate_state should probably clear it all out with extreme prejudice

19:55 <danvet> but aside from this we do have a bunch of fields in state structs which are for commit code

19:56 <danvet> like crtc_state->event

19:56 <Lyude> danvet: "no locking"? I had assumed we'd hold the lock for an atomic struct until we've finished the commit tail and performed the actual state swap

19:56 <jekstrand> karolherbst: \o/

19:56 <danvet> Lyude, drm_crtc->state is protected by the lock

19:57 <danvet> but not the struct itself

19:57 <danvet> the struct itself is largely protected by "you're not allowed to change it once other threads can see it"

19:57 krushia has joined #dri-devel

19:57 <danvet> nonblocking commit does _not_ hold the locks

19:58 <danvet> otherwise a nonblocking commit would hold up the next atomic check

19:58 <danvet> which for TEST_ONLY would hold up the compositor's render loop

19:58 <danvet> which is Not Good
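The locking rule danvet describes (the lock guards the *pointer* to the current state, while a published state struct is protected only by nobody being allowed to change it) maps loosely onto a `Mutex<Arc<T>>` in Rust. This is an analogy under assumptions, not DRM code: `Crtc` and `CrtcState` are toy types. The nonblocking "commit" reads its state without any lock, while the next check duplicates state under a brief lock and is never held up by it.

```rust
use std::sync::{Arc, Mutex};

/// Toy analogue of a CRTC state struct (not the real drm_crtc_state).
#[derive(Clone, Debug, PartialEq)]
pub struct CrtcState {
    pub mode: u32,
}

/// Toy analogue of drm_crtc: only the pointer to the current state is
/// guarded by the lock, not the state struct itself.
pub struct Crtc {
    state: Mutex<Arc<CrtcState>>,
}

impl Crtc {
    pub fn new(mode: u32) -> Self {
        Crtc { state: Mutex::new(Arc::new(CrtcState { mode })) }
    }

    /// Check phase: take the lock briefly and copy the current state.
    pub fn duplicate_state(&self) -> CrtcState {
        let current = self.state.lock().unwrap();
        (**current).clone()
    }

    /// Swap in a new state under the lock and return the published state
    /// for the commit to read without locking. Once shared via `Arc`, it
    /// can no longer be mutated, mirroring "don't change it once other
    /// threads can see it".
    pub fn swap_state(&self, new: CrtcState) -> Arc<CrtcState> {
        let new = Arc::new(new);
        *self.state.lock().unwrap() = Arc::clone(&new);
        new
    }
}
```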

19:59 <Lyude> suddenly seeing actual reasons why payloads had their own locks :(

19:59 <danvet> Lyude, also commits can happen in parallel

19:59 <danvet> if they touch different crtc

19:59 <danvet> but that shouldn't be an issue for mst

20:00 <danvet> ordering between commits is also funky, since it's multi-stage ordered

20:00 <danvet> flip_done vs hw_done vs cleanup_done

20:00 <danvet> so even on the same crtc you can have multiple commits in flight, but in different phases

20:00 <danvet> if that wouldn't work, then we'd run at half refresh rate on most hw

20:01 <danvet> which is why plane cleanup needs to have its own locking (if it's not a no-op)

20:01 <Lyude> danvet: mhm, I would assume that can't happen here since we've got our own private state object, and I'm fairly sure turning mst displays on/off basically requires vcpi calculations which locks the mgr for the state

20:01 <danvet> but usually that's taken care of by the buffer mgr layer like gem shmem or ttm

20:01 <danvet> Lyude, that = parallel commit

20:01 <danvet> or that = commit without holding the lock

20:02 <danvet> the locking design is the same for all modeset objects, including private ones

20:02 <danvet> nonblocking modesets especially is something that is very badly tested (igt maybe in some cases, but by far not everything)

20:03 <danvet> and historically it's blown up plenty of times because people forget that it exists

20:04 <Lyude> danvet: as in, I don't -think- we can do parallel modesets with CRTCs if they're on the same topology mgr, since the atomic check phase for each CRTC should require pulling in the mst private state object (which I assume does at least mean we'll wait for commits using the mst private state object to complete before starting our own commit, right?)

20:04 <danvet> lemma check

20:04 <danvet> wasn't the case in the past

20:05 <danvet> nope still not there yet

20:06 <Lyude> drm_dp_atomic_find_vcpi_slots and drm_dp_atomic_release_vcpi_slots are the functions I'm thinking of that would pull the mst mgr's state in for any modeset

20:06 <danvet> you can probably rely on the ordering due to the connector moving between crtc though

20:06 <danvet> but yeah private state objects which move between crtc are right now very dangerous

20:06 <danvet> I thought mripard was looking into addressing that

20:07 <danvet> but I guess the fix was vc4 specific

20:07 <danvet> essentially you need to add drm_crtc_commit * pointers and make sure you wait for them in all the right places
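A minimal userspace sketch of the "add commit pointers and wait for them" idea, assuming a shared private state object (such as an MST topology manager's state) that remembers the last commit which touched it. Each new commit swaps itself in and gets back the previous one to wait on. `Completion`, `PrivateObj` and `chain` are hypothetical names built on `std::sync`; they only mimic the role that `drm_crtc_commit` plays in the kernel.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// Hypothetical stand-in for a commit's completion event.
#[derive(Default)]
pub struct Completion {
    done: Mutex<bool>,
    cv: Condvar,
}

impl Completion {
    /// Mark the commit finished and wake every waiter.
    pub fn complete(&self) {
        *self.done.lock().unwrap() = true;
        self.cv.notify_all();
    }

    /// Block until `complete` has been called.
    pub fn wait(&self) {
        let mut done = self.done.lock().unwrap();
        while !*done {
            done = self.cv.wait(done).unwrap();
        }
    }
}

/// Hypothetical shared private state that tracks the last commit using it.
#[derive(Default)]
pub struct PrivateObj {
    last_commit: Mutex<Option<Arc<Completion>>>,
}

impl PrivateObj {
    /// Register `new` as the latest commit and return the one it must wait
    /// for before touching the shared state, if any.
    pub fn chain(&self, new: Arc<Completion>) -> Option<Arc<Completion>> {
        let mut last = self.last_commit.lock().unwrap();
        last.replace(new)
    }
}
```

Each commit would call `chain` during setup, wait on the returned completion before touching shared payload state, and call `complete` on its own completion when done. As the discussion notes, doing this for real needs generic infrastructure in the atomic helpers or changes in every driver using the MST helpers.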

20:07 <danvet> but like I said, since mst state is attached to a specific connector maybe it's all taken care of already

20:07 * danvet not sure

20:08 <danvet> hm on 2nd thought the connector is the mst end point

20:08 <danvet> not the physical dp connector

20:08 <danvet> so probably busted?

20:09 <danvet> I guess queue up piles of nonblocking modeset commits where you change the crtc for an mst connector and watch the world burn

20:09 <Lyude> danvet: note it's not always a connector I don't think, iirc nouveau does one MST mgr for each encoder

20:10 <danvet> yeah it's broken I think

20:10 edrex[m] has joined #dri-devel

20:11 <danvet> maybe we need to lift the drm_crtc_commit tracking that mripard has done into drm_private_state

20:11 <Lyude> to be honest, I don't think that MST should be allowing parallel modesets in general. or at least not ones that change payloads, but probably all of them

20:12 <Lyude> either way though I could try sort of keeping around the payload table in the mst mgr, although I'm trying very hard to keep as little mst info as possible outside of the atomic state

20:13 <Lyude> mripard: btw ^ any take on all of this?

20:19 MajorBiscuit has quit [Ping timeout: 480 seconds]

20:20 <Lyude> danvet: also, what drivers actually use parallel modesetting right now?

20:22 <karolherbst> airlied: "Pass 922 Fails 1206 Crashes 47 Timeouts 3: 100%" for CL 3.0

20:22 <danvet> Lyude, all of them support it

20:22 <karolherbst> c11_atomics, long, non_uniform_work_group, image stuff, printf, profiling, is what seems to be regressing

20:23 <karolherbst> and then random API bits

20:23 <danvet> it's part of atomic uapi :-)

20:23 <danvet> the thing is that not much userspace uses them, because there are some warts

20:23 <karolherbst> I could probably just turn on long support and it would already fix most of them

20:23 <danvet> but that just means it's largely attack surface :-/

20:24 <karolherbst> ehh.. add that long64 lowering

20:24 <karolherbst> maybe I just do it

20:25 <Lyude> danvet: to me this just kind of sounds like I need to make sure that multiple commits using the same mst topology mgr private state object can't happen in parallel

20:25 <danvet> Lyude, yeah

20:26 <danvet> and that kind of ordering is done with drm_crtc_commit in atomic helpers

20:26 <danvet> but the infra isn't really there yet

20:26 <karolherbst> soo.. load_constant_base_ptr

20:26 <danvet> since iirc mripard has spent a lot of time wrestling this for vc4 (iirc) might be best to team up

20:27 <danvet> especially since this is in mst helper code it's much harder to put the waits in the right places without some generic infra

20:27 <emersion> what is parallel modesetting?

20:29 <Lyude> danvet: this being said though, I wonder if this should be a blocker or not for trying to convert the rest of the MST stuff to atomic

20:29 rasterman has quit [Quit: Gettin' stinky!]

20:29 <mareko> Sachiel: it's r600, which AMD doesn't maintain anymore

20:29 <danvet> Lyude, it's maybe busted already

20:29 <danvet> or maybe it's a regression since old mst had its own locking?

20:30 <danvet> if continued conversion regresses locking/correctness I don't think it's a good idea

20:30 <Lyude> yeah it still does - I was hoping to get rid of it because the payload management code is getting unwieldy, and strongly want to discourage certain contributors from trying to add more bandaids to solve issues with it by adding more code that doesn't/shouldn't need to be there…

20:32 <Lyude> emersion: btw - parallel modesetting is the ability to perform modesetting on different CRTCs in parallel

20:32 <Lyude> so you don't need to wait for one CRTC to finish turning on before starting to turn on the next

20:32 <HdkR> oooo, how many display controllers get angry at that? :D

20:33 <Lyude> well -ideally- we would have it so that things that can't run in parallel don't

20:33 <Lyude> but from the sound of it we're not really there yet, which I didn't realize until just now

20:33 <HdkR> I figured :(

20:35 <Lyude> I'm -really- set on trying to get this mst stuff done asap though, so I very much hope there's some hack I could use in the mean time to just block atomic commits from operating in parallel when it comes to MST

20:35 <danvet> Lyude, embed drm_crtc_commit, chain them up

20:36 <danvet> except that needs some serious work in atomic helpers

20:36 <danvet> or patches in every driver using mst helpers

20:36 <danvet> I'm not sure a quick hack or some locks somewhere would help or work

20:37 gouchi has joined #dri-devel

20:39 <danvet> anyway time to ^Z here now, ttyt

20:42 <emersion> Lynne: hm what's the upside of doing this vs. multiple NONBLOCK reqs?

20:43 risks has joined #dri-devel

20:44 risks has left #dri-devel [#dri-devel]

20:45 risks has joined #dri-devel

20:45 risks has quit []

20:47 mclasen has quit []

20:47 danvet has quit [Ping timeout: 480 seconds]

20:47 mclasen has joined #dri-devel

20:48 ngcortes has joined #dri-devel

20:48 <Lynne> emersion: err, do you mean Lyude?

20:55 gawin has quit [Remote host closed the connection]

20:56 gawin has joined #dri-devel

21:01 <emersion> err, yes, sorry!

21:01 <emersion> Lyude: hm what's the upside of doing this vs. multiple NONBLOCK reqs?

21:01 <Lyude> emersion: I think they're the same thing?

21:01 <emersion> is it a good idea to do NONBLOCK modesets even?

21:01 <Lyude> I'm not entirely sure though, I haven't been on the non-kernel side of this anywhere near as much

21:01 <emersion> (assuming you're doing TEST_ONLY before-hand)

21:02 <karolherbst> jekstrand: I am currently thinking if we should add our internal kernel args before running nir_lower_vars_to_explicit_types on nir_var_uniform, but not sure how tricky that can be, because it requires doing a lot of lowering earlier (like constant initializers, printf or image stuff)

21:02 <emersion> ok

21:03 <karolherbst> although maybe we get away by just running nir_remove_dead_variables quite late

21:04 <co1umbarius> is there a way to get a zeroed dmabuf?

21:04 <karolherbst> and run it for mem_constant once with a cb filtering against mem_constant with initializers

21:04 <karolherbst> we need the proper driver_location data out of the dir, so...

21:04 <karolherbst> *nir

21:06 <karolherbst> although maybe that's fine. we just need to keep the uniform vars around for long enough

21:08 rkanwal has quit [Ping timeout: 480 seconds]

21:14 kchibisov_ has quit [Read error: No route to host]

21:15 kchibisov_ has joined #dri-devel

21:15 Jonathan_Cavitt has joined #dri-devel

21:17 Haaninjo has quit [Quit: Ex-Chat]

21:21 danvet has joined #dri-devel

21:23 kchibisov_ has quit [Ping timeout: 480 seconds]

21:25 <jekstrand> karolherbst: idk. I think we want to rethink kernel arguments anyway so this may be a good opportunity to do that.

21:25 <jekstrand> karolherbst: In particular, I think we want to stick them in a UBO like st/mesa does and get rid of kernel inputs as a separate concept.

21:29 kchibisov_ has joined #dri-devel

21:31 <karolherbst> jekstrand: sure, that's what I might just do, shouldn't be difficult. But that's quite unrelated to the question of what args are there and where we have to put those into the buffer (whatever that may be)

21:31 kchibisov_ has quit []

21:32 kchibisov_ has joined #dri-devel

21:33 <karolherbst> jekstrand: the only problem is just, that we do have a concept of uniforms which drivers might handle differently from const buffers. Like for nouveau they just start at index 1, not 0

21:34 eukara_ has joined #dri-devel

21:34 eukara has quit [Remote host closed the connection]

21:34 ZeZu has quit [Quit: off to see the wizard]

21:35 kchibisov has quit []

21:35 ZeZu has joined #dri-devel

21:36 <karolherbst> although I am not quite sure on how all of that works out.. maybe st/mesa is doing it like this?

21:37 kchibisov_ has left #dri-devel [#dri-devel]

21:38 <karolherbst> anyway.. that's a different topic

21:38 <jekstrand> karolherbst: I'm pretty sure st/mesa puts all your GL uniforms in cbuf0

21:39 <jekstrand> And then UBOs are cbuf1+

21:39 <zmike> this is correct

21:39 <karolherbst> yeah, I think so as well

21:39 <karolherbst> I just wasn't sure

21:39 kchibisov has joined #dri-devel

21:39 <karolherbst> okay, so this input thing needs to go anyway as this clashes with set_constant_buffer

21:39 <karolherbst> (and would allow drivers to drop some code)

21:40 <jekstrand> Yup

21:40 <karolherbst> we just need to manager another buffer, but oh well

21:40 <karolherbst> *manage

21:40 <jekstrand> It does mean that iris needs to grow support for pushing cbuf0 for compute shaders but I can make that happen.

21:40 <jekstrand> Or bother Kayden into doing it. :P

21:40 <karolherbst> jekstrand: yeah.. I was wondering about how to make use of push constants

21:41 <karolherbst> but _maybe_ we can change gallium and make cbuf0 "special" in a sense, that it can have a different size

21:41 <karolherbst> and drivers could report something smaller

21:41 <jekstrand> idk

21:41 <karolherbst> yeah.. me neither, just a random thought

21:41 <jekstrand> for iris, we just push some subset of it and pull the rest based on a heuristic

21:41 <Kayden> going to be a pain

21:42 <karolherbst> ahh

21:42 <jekstrand> Kayden: Starting with gfx12.5+ (which, of course, I don't have), we should be able to say "if (num_uniforms == 0) push cbuf0"

21:42 <jekstrand> Kayden: Trying to do it together with sysvals isn't going to be possible, though.

21:42 <karolherbst> jekstrand: anyway.. we can keep using input as long as drivers aren't ready for using cbufs, but my question was more towards what passes we can run in nir without messing up uniforms

21:42 <zmike> karolherbst: lavapipe puts push constants into ubo0 currently

21:43 <Kayden> and sysvals contains the thread IDs, unless that's changed on 125

21:43 <jekstrand> Kayden: That changes on 125

21:43 <Kayden> okay, cool

21:43 <jekstrand> Kayden: Thread id is a magic reg now. It's SOOOOOO nice

21:43 <jekstrand> COMPUTE_WALKER finally made the Intel compute interfaces not totally suck.

21:43 <Kayden> (where has this been all these years)

21:43 <jekstrand> They still kinda suck but they don't totally suck. :)

21:44 <jekstrand> Actually... never mind. 12.5 also got rid of push constants for compute.

21:44 <jekstrand> So we should just do nothing and iris will suck ever so slightly because it has to pull a few things. Meh.

21:44 <karolherbst> my initial idea was, that we have this list of "spirv" args we enhance with whatever info we get from nir (location and size) and then push private args on top and mark them as "internal" so clients can't set them

21:44 <karolherbst> insert those into the nir before assigning locations and be done with it
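A sketch of the layout step this idea relies on: client-visible args first, internal args appended after them, and each arg placed at its natural alignment. In the plan above NIR would compute this (e.g. via nir_lower_vars_to_explicit_types); the code below is only a hand-rolled illustration of the same kind of offset calculation, and `ArgLayout` / `assign_offsets` are hypothetical names.

```rust
/// Hypothetical per-argument layout info: origin, size and alignment in bytes.
#[derive(Debug, Clone, PartialEq)]
pub struct ArgLayout {
    pub internal: bool,
    pub size: usize,
    pub align: usize,
}

/// Round `x` up to the next multiple of `a` (`a` must be a power of two).
fn align_up(x: usize, a: usize) -> usize {
    debug_assert!(a.is_power_of_two());
    (x + a - 1) & !(a - 1)
}

/// Assign offsets in the given order and return them with the total size.
/// Expects internal args to be appended after all client-visible ones.
pub fn assign_offsets(args: &[ArgLayout]) -> (Vec<usize>, usize) {
    debug_assert!(
        args.windows(2).all(|w| !(w[0].internal && !w[1].internal)),
        "internal args must come after client-visible ones"
    );
    let mut offsets = Vec::with_capacity(args.len());
    let mut cur = 0;
    for arg in args {
        cur = align_up(cur, arg.align);
        offsets.push(cur);
        cur += arg.size;
    }
    (offsets, cur)
}
```

For a 4-byte arg, an 8-byte arg and a 4-byte internal arg this yields offsets 0, 8 and 16 (the 8-byte arg skips 4 bytes of padding) and a 20-byte input buffer.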

21:45 <karolherbst> although theoretically we could also dce those uniforms... but I think CL has a strict req on where everything lives for.... stupid? reasons

21:45 <zmike> I wouldn't be opposed to adding a push constant block interface to gallium

21:45 <karolherbst> although not quite sure

21:45 <karolherbst> because the CTS can't check if we keep dead uniforms

21:45 <karolherbst> because how would the CTS even know

21:45 <zmike> piglit does

21:46 <jekstrand> karolherbst: Right...

21:46 <karolherbst> zmike: how are they dead if you use them inside the shader?

21:46 <karolherbst> and how can you make sure they are still there without using them?

21:46 <jekstrand> karolherbst: It's also not hard once we have our internal representation of inputs to append stuff and bump the cbuf0 size at the same time.

21:46 <zmike> no, it does getuniformlocation in gl

21:46 <zmike> and if they've been eliminated then that call errors

21:46 <karolherbst> zmike: sure, but that doesn't say anything about placing them into some buffer

21:47 <karolherbst> that they can be set via the API is of course something we need to support

21:47 <zmike> well I didn't have that part of the context :)

21:47 <karolherbst> but they won't have to end up in any buffer

21:47 <karolherbst> jekstrand: thing is.. I don't want to append stuff after nir assigned locations

21:48 <karolherbst> I just want nir to do all the offset+size magic

21:48 <karolherbst> and just use that info

21:48 <jekstrand> Sure. Just keeping the option open. Not claiming it's a great option. :)

21:48 <karolherbst> what I don't know is how strict the spec is on arg locations, but I think the only thing the CTS does is checking for alignment

21:49 <karolherbst> which I don't even know why it cares about it, but here we are

21:49 <dcbaker> jekstrand, karolherbst: the meson structured_sources landed and will be in meson 0.62-rc2 (due out tomorrow), in case that's of interest with bindgen

21:50 danvet has quit [Ping timeout: 480 seconds]

21:52 <jekstrand> karolherbst: Is the client even aware of arg locations?

21:52 <jekstrand> I guess they are a bit because there's a limit on the number of inputs specified in bytes

21:52 <karolherbst> jekstrand: what I totally don't like about how things are is that the order of stuff is.... important. So we assume that whatever the CL APIs thinks is at index 2, we have to make sure that it stays at 2 all the way down

21:53 <karolherbst> so the trick we did was, we just make sure to never DCE until after we were able to extract size and offset information

21:53 <jekstrand> Right

21:53 <karolherbst> _although_ maybe location stays the same even if we DCE?

21:53 <jekstrand> Yeah, DCE doesn't change locations

21:54 <karolherbst> mhhhh

21:54 <karolherbst> maybe we should just make this a supported thing

21:54 <karolherbst> we just deal with nir uniforms being DCEed

21:54 <karolherbst> and write our code around that

21:54 <jekstrand> Nothing in NIR automatically changes locations. There are a few very specific passes that do in a controlled way, but you won't run any of them by accident.

21:54 <jekstrand> Yeah, that seems reasonable. As long as CL doesn't care about positions and only argument index, that should be fine.

21:54 <karolherbst> so if we don't have an uniform at loc 2, we just don't fill the input buffer with that

21:55 <jekstrand> Get a location out of spirv_to_nir and base everything on that

21:55 <karolherbst> yep

21:55 <karolherbst> yeah.. I think that should make stuff a lot simpler

21:55 <karolherbst> then I just add two new arg types: DCEed/Gone/whatever and Internal

21:56 <karolherbst> Internal doesn't have spirv info attached and the former one just doesn't get copied into the input buffer

21:56 <karolherbst> or well

21:56 <karolherbst> doesn't have nir infos attached

21:57 <jekstrand> Sure

21:57 <jekstrand> I believe this is what Option is for :D

21:57 <karolherbst> :D

21:57 <jekstrand> Or you can make your own tri-state enum
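One way the tri-state could look, assuming each client-visible arg keeps its SPIR-V argument index and either gets an offset from the compiler or is marked as removed by DCE. `ArgSlot` and `pack_inputs` are hypothetical names, not the actual rusticl types; the point is that a dead arg simply contributes nothing to the input buffer.

```rust
/// Hypothetical state of a kernel argument after compilation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArgSlot {
    /// Client-visible arg that survived DCE: its data goes at `offset`.
    Live { offset: usize },
    /// Client-visible arg the compiler removed: nothing to upload.
    Dead,
    /// Runtime-provided arg with no SPIR-V info; clients can't set it.
    Internal { offset: usize },
}

/// Copy client-set arg data into the input buffer, skipping dead args.
/// `values[i]` holds the bytes the client set for arg `i`. Internal args
/// are written separately by the runtime, so they are ignored here.
pub fn pack_inputs(slots: &[ArgSlot], values: &[Vec<u8>], buf: &mut [u8]) {
    for (slot, val) in slots.iter().zip(values) {
        if let ArgSlot::Live { offset } = *slot {
            buf[offset..offset + val.len()].copy_from_slice(val);
        }
    }
}
```

An explicit enum keeps the "removed" and "internal" cases distinct in the type, which a plain `Option` around the layout info could not express on its own.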

21:57 <karolherbst> a bit pointless on primitives though

21:57 <karolherbst> Option still reserves the full space

21:58 <karolherbst> well.. sometimes even more

21:58 <karolherbst> like if you only got primitives

21:58 Jonathan_Cavitt has quit []

21:58 <karolherbst> except a single pointer
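The size point about `Option` can be checked directly. A `u32` has no spare bit pattern, so `Option<u32>` needs a separate discriminant and is larger than a bare `u32` (typically 8 bytes once padding is included), while `Option<&T>` and `Option<Box<T>>` reuse the null pointer value and cost nothing extra. The helper name `option_overhead` is ours.

```rust
use std::mem::size_of;

/// Extra bytes `Option<T>` costs over a bare `T` on the current target.
pub fn option_overhead<T>() -> usize {
    size_of::<Option<T>>() - size_of::<T>()
}
```

So `option_overhead::<u32>()` is non-zero, while `option_overhead::<&u8>()` and `option_overhead::<Box<u64>>()` are zero, which is why wrapping a larger struct like `SPIRVKernelArg` is the cheaper place to use `Option`.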

21:59 mhenning has joined #dri-devel

21:59 <karolherbst> I do have a SPIRVKernelArg struct though which is a bit bigger, so I was already thinking about wrapping that with Option for internal args

21:59 <karolherbst> anyway... now I just need to find that test checking kernel args

22:01 <karolherbst> kernel_memory_alignment_*

22:01 <karolherbst> which I only pass for global and private

22:01 <karolherbst> but private are kernel inputs afaik

22:01 <karolherbst> yep

22:04 <karolherbst> jekstrand: okay.... yeah, seems like the OpenCL C spec really doesn't specify anything about location

22:05 mvlad has quit [Remote host closed the connection]

22:10 <mareko> jekstrand: cbuf0 is already special

22:12 <jekstrand> karolherbst: Cool. So we can throw the argument index in location and DCE all we want.

22:12 <karolherbst> yeah

22:12 <jekstrand> karolherbst: And maybe have a convention like "internal arguments start at 1024" or something.

22:13 <karolherbst> jekstrand: why?

22:14 <karolherbst> don't see why that would be needed

22:20 <jekstrand> If it's helpful

22:20 <jekstrand> In GL, we have lots of named locations for specific things.

22:20 <jekstrand> Like VARYING_SLOT_PSIZ or whatever

22:21 <zmike> gasp

22:21 <zmike> the forbidden varying

22:21 <jekstrand> :P

22:21 gouchi has quit [Remote host closed the connection]

22:24 <alyssa> zmike: psiz

22:24 <icecream95> * zmike dies

22:24 Duke`` has quit [Ping timeout: 480 seconds]

22:25 <zmike> no, no, I've put that one behind me

22:25 <zmike> it can no longer harm me

22:29 <karolherbst> jekstrand: ehhh.. you wouldn't believe why we can't DCE those things early :(

22:29 <jekstrand> uh oh...

22:29 <karolherbst> apparently we have to validate things.. like if the size args to setKernelArg match

22:30 <karolherbst> sooo.... maybe we should just skip validating? dunno :D

22:30 <karolherbst> but maybe we can do two passes

22:30 <karolherbst> one to just fetch sizes

22:31 <karolherbst> and a later one to calc offsets

22:31 mhenning has quit [Quit: mhenning]

22:31 <karolherbst> mhh but that already becomes annoying

22:32 mhenning has joined #dri-devel

22:32 <jekstrand> We could build a list of kernel args visible to the client early, before DCE.

22:32 <jekstrand> Then let the compiler do whatever it wants to do and some of those args may never get assigned an offset
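A sketch of that two-stage approach: record every client-visible arg with its SPIR-V size before DCE, validate set-arg calls against that list, and only later fill in offsets, leaving `None` for args the compiler removed. `ClientArg`, `SetArgError` and `validate_set_arg` are hypothetical; the two error variants mirror `CL_INVALID_ARG_INDEX` and `CL_INVALID_ARG_SIZE` from `clSetKernelArg`. The exact-size check is a simplification that ignores `__local` args, whose size parameter means something else.

```rust
/// Hypothetical client-visible arg record, built from SPIR-V before DCE.
#[derive(Debug, Clone, PartialEq)]
pub struct ClientArg {
    pub size: usize,
    /// Filled in after compilation; `None` if the compiler removed the arg.
    pub offset: Option<usize>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum SetArgError {
    InvalidIndex,
    InvalidSize,
}

/// Validate a set-arg call against the pre-DCE list, so a removed arg is
/// still checked exactly like a live one.
pub fn validate_set_arg(
    args: &[ClientArg],
    index: usize,
    size: usize,
) -> Result<(), SetArgError> {
    let arg = args.get(index).ok_or(SetArgError::InvalidIndex)?;
    if arg.size != size {
        return Err(SetArgError::InvalidSize);
    }
    Ok(())
}
```

Because validation never looks at `offset`, DCE can remove args freely without changing what the API accepts.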

22:32 <karolherbst> yeah...

22:33 * karolherbst reads the spec

22:34 <karolherbst> I like how GL actually allows uniforms to become DCEed and stuff

22:36 ybogdano has quit [Ping timeout: 480 seconds]

22:45 <robclark> karolherbst, jekstrand: btw andrey-konovalov was asking about __constant ptrs in #freedreno .. and why they aren't lowered to UBO (which could be lowered to push consts and be *much* faster).. I guess there are some edge cases where that would be hard, but plenty of low hanging fruit there.. which would presumably benefit other drivers too.. was there a reason clover does that, or just no one got around to typing patches?

22:45 <robclark> *doesn't do that

22:46 <karolherbst> robclark: 1. generic pointers 2. SVM 3. CL in general

22:46 <karolherbst> we could optimize them to UBOs, but that's an optimization generally

22:46 <robclark> all things that are edge cases ;-)

22:47 <robclark> I assume we'd not completely remove load_global_const, just lower the "easy" ones

22:47 <karolherbst> I think it would be fine to optimize them to UBOs, but for that the kernel has to tell us it's actually safe

22:47 <robclark> hmm, isn't __constant enough for that?

22:47 <karolherbst> nope

22:47 ybogdano has joined #dri-devel

22:48 <karolherbst> in a CL 1.2 world maybe, but even then

22:48 <karolherbst> you can have like in kernel constants and pass around addresses

22:48 <jekstrand> Yeah, optimizing to UBOs is something we probably want to do eventually.

22:49 <jekstrand> If we can somehow statically determine the size of the access, it should be possible.

22:49 <jekstrand> Won't help Intel because of stupid push restrictions for compute but it could help freedreno/panfrost.

22:49 <karolherbst> nvidia has some checks when to do it as well afaik

22:49 <jekstrand> But it can only ever be an optimization

22:49 <robclark> it seems to be how blob cl has half as many `ldg` and is twice as fast (at some particular kernel that someone cares about)

22:49 <karolherbst> but generic pointers really make all of that super annoying

22:50 <jekstrand> karolherbst: Meh. We get rid of most of the generics during optimization and lower to UBOs very late.

22:50 <jekstrand> s/lower/optimize

22:50 <karolherbst> robclark: yeah.. I think for the normal CL 1.2 and nothing fancy situation we could use UBOs

22:50 <robclark> yeah, I was thinking we'd never completely remove the load_global_const.. just hopefully optimize away enough of them in cases that people care about

22:51 <jekstrand> ANV already does something like that where we punt variable pointers off to global but "optimize" to a bound SSBO when we can.

22:51 <jekstrand> It's not hard

22:51 <jekstrand> It's a bit harder in CL because you have to figure out buffer bounds.

22:52 <karolherbst> jekstrand: I think optimizing a complete buffer to an UBO might be good enough for now

22:52 <robclark> yeah, we already have range analysis to decide what parts of UBO to lower to push consts

22:52 <karolherbst> constant buffers are limited in size and OOB is just undefined

22:52 <robclark> (in ir3)
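A simplified picture of splitting constant data between push constants and pull loads: given the byte ranges a shader reads and a push budget, push what fits and leave the rest as memory loads. The greedy first-fit rule and the names `Range` / `split_push_pull` are hypothetical stand-ins; this is neither ir3's range analysis nor iris's heuristic.

```rust
/// Hypothetical byte range of a constant buffer that a shader reads.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Range {
    pub start: u32,
    pub end: u32, // exclusive
}

/// Greedy first-fit: push ranges in order while they fit in `budget` bytes;
/// everything else stays a pull (memory) load.
pub fn split_push_pull(ranges: &[Range], budget: u32) -> (Vec<Range>, Vec<Range>) {
    let mut push = Vec::new();
    let mut pull = Vec::new();
    let mut used = 0;
    for &r in ranges {
        let len = r.end - r.start;
        if used + len <= budget {
            used += len;
            push.push(r);
        } else {
            pull.push(r);
        }
    }
    (push, pull)
}
```

With a 96-byte budget and ranges of 64, 128 and 16 bytes, the first and last are pushed (80 bytes total) and the 128-byte range is pulled.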

22:53 <karolherbst> so if anybody wants to write that optimization they could just do it

22:53 <jekstrand> karolherbst: That's a good point. We can just make the buffer view the max object size all the time.

22:53 <karolherbst> jekstrand: why?

22:54 <karolherbst> I thought const buffer OOB is safe by definition

22:54 <karolherbst> or do drivers have to make sure of that?

22:54 <jekstrand> karolherbst: The question is if we need to do shader analysis to figure out the access bounds.

22:54 <jekstrand> If OOB is undefined, we can just make the buffer binding the size of the whole buffer and not care about bounds.

22:55 <karolherbst> I know that on nv hw you can specify OOB behavior on cbufs afaik

22:55 <karolherbst> and just let it return 0 or whatever

22:56 <karolherbst> imirkin: or do I misremember something here? I thought we kind of have this level of protection inside hw, no?

22:56 <karolherbst> I am not sure if all hw is seeing it that way

23:02 <robclark> so, OoB access is kinda easier with UBO than ldg.. at least for us, the hw will clamp UBO access but not global

23:03 <jekstrand> The problem with UBO in CL isn't adding clamps. It's making sure your UBO is big enough that you don't get unintended clamps.

23:04 <karolherbst> jekstrand: why would that happen?

23:04 <karolherbst> constant buffers are not infinite in size

23:04 <karolherbst> the runtime reports back a max size per constant buffer and the application has to make sure to not access outside of it

23:05 rabbitz has joined #dri-devel

23:05 <karolherbst> and you can report your UBO size as the max size for constant buffers

23:05 <karolherbst> (which I think clover already does)
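A sketch of the binding decision this implies, assuming the runtime reports the UBO size limit as `CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE` and that some earlier analysis decides whether every access goes directly through the `__constant` argument (no generic pointers, SVM, or escaped addresses). When both hold, the whole buffer is bound as a UBO with no per-access bounds analysis, since OOB is undefined anyway; otherwise it stays on global loads. `ConstBinding`, `choose_binding` and the `simple_access` flag are hypothetical.

```rust
/// Hypothetical binding choice for a `__constant` pointer argument.
#[derive(Debug, PartialEq, Eq)]
pub enum ConstBinding {
    /// Bind the whole buffer as a UBO covering `range` bytes.
    Ubo { range: u64 },
    /// Keep plain global constant loads.
    Global,
}

/// `simple_access` stands for "the compiler proved all accesses go through
/// this argument directly"; `max_ubo_size` is the reported constant buffer limit.
pub fn choose_binding(buffer_size: u64, max_ubo_size: u64, simple_access: bool) -> ConstBinding {
    if simple_access && buffer_size <= max_ubo_size {
        ConstBinding::Ubo { range: buffer_size }
    } else {
        ConstBinding::Global
    }
}
```

This only ever acts as an optimization: the global-load path stays as the fallback for anything the analysis cannot prove.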

23:06 <jekstrand> Yeah, that would work.

23:07 <karolherbst> nice.. kernel args packing implemented :)

23:07 <jekstrand> \o/

23:08 <karolherbst> it was quite easy even

23:08 <karolherbst> https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/c4b6940bc6c376c01117b26f97cd339d9d831d9b

23:11 <karolherbst> now imagine doing the same in clover

23:11 * karolherbst hides

23:13 <karolherbst> now let's see if that broke something

23:15 <karolherbst> at least my desktop runs the CTS tests like 3 times faster than my laptop

23:17 rabbitz has quit [autokilled: This host violated network policy. Mail support@oftc.net if you think this is in error. (2022-03-14 23:17:26)]

23:17 lemonzest has quit [Quit: WeeChat 3.4]

23:22 pnowack has quit [Quit: pnowack]

23:47 mclasen has quit []

23:47 mclasen has joined #dri-devel

23:54 kchibisov has quit [Quit: Huh]

23:56 kchibisov has joined #dri-devel