itoral_ has quit [Read error: Connection reset by peer]
lynxeye has joined #dri-devel
mvlad has joined #dri-devel
tursulin has joined #dri-devel
kj has quit [Quit: Page closed]
kj has joined #dri-devel
<tomeu>
zmike: anholt: for testing stability of a MR, I just schedule a pipeline hourly (or so) after hacking the .gitlab-ci.yml to leave only the required tests, and have them start automatically
<tomeu>
I do have scripts for monitoring a bunch of specific issues in LAVA jobs, but I guess these won't be useful here
<tomeu>
you could write something similar via the gitlab API though and parse the logs
rkanwal has joined #dri-devel
rasterman has joined #dri-devel
thellstrom1 has joined #dri-devel
thellstrom has quit [Read error: Connection reset by peer]
rabbitz has joined #dri-devel
thellstrom1 has quit [Ping timeout: 480 seconds]
rabbitz has quit [autokilled: This host violated network policy and has been banned. Mail support@oftc.net if you think this is in error. (2022-03-14 10:08:48)]
Danct12 has joined #dri-devel
sdutt has quit [Read error: Connection reset by peer]
mclasen has joined #dri-devel
adjtm has quit [Quit: Leaving]
adjtm has joined #dri-devel
flacks has quit [Quit: Quitter]
flacks has joined #dri-devel
pcercuei has joined #dri-devel
rkanwal has quit [Ping timeout: 482 seconds]
kts has joined #dri-devel
mclasen has quit []
mclasen has joined #dri-devel
kts has quit []
rkanwal has joined #dri-devel
<karolherbst>
jekstrand: ahhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh I figured it out
<karolherbst>
event.write_checked(cl_event::from_arc(e.clone())) if event is null, we create a dangling arc nobody ever retains :(
<karolherbst>
and I am sure we will hit this in different palces
<karolherbst>
I think we need a new API specific to the pointers "event.leak_ref(e)" and implement the proper thing there
pcercuei has quit [Remote host closed the connection]
pcercuei has joined #dri-devel
shankaru1 has quit []
shankaru has joined #dri-devel
almos has joined #dri-devel
almos has quit []
simon-perretta-img has quit [Quit: Leaving]
simon-perretta-img has joined #dri-devel
Surkow|laptop has quit [Quit: 418 I'm a teapot - NOP NOP NOP]
simon-perretta-img has quit [Quit: Leaving]
simon-perretta-img has joined #dri-devel
simon-perretta-img has quit []
simon-perretta-img has joined #dri-devel
simon-perretta-img has quit []
almos has joined #dri-devel
simon-perretta-img has joined #dri-devel
agd5f has joined #dri-devel
pcercuei has quit [Ping timeout: 480 seconds]
pcercuei has joined #dri-devel
Danct12 has quit [Quit: Quit]
macromorgan has quit [Quit: Leaving]
JohnnyonFlame has joined #dri-devel
<jekstrand>
karolherbst: I'm going to need more context on that. Or maybe I just need to read the week-end scrollback.
<karolherbst>
jekstrand: we leaked a pointer which the client never asked for
<karolherbst>
"rusticl: implement clFinish and clFlush" is the first one
<karolherbst>
and I think the first 5 are safe, the others are more like to get something working
<karolherbst>
the cts runner also has a weirdo bug where some processes are just... "lost" no idea what those are all about
<karolherbst>
I suspect faulty timeout handling
<jekstrand>
karolherbst: Cool. I think I'll be back to CL hacking tomorrow. Today is going to be 90% meetings. :-/
<karolherbst>
for those commits I am more or less interested in what you think about some of the changes. I think the nir related stuff is quite "fine" but maybe we can make it easier to call passes with a macro
<karolherbst>
the last CTS run with my script took like 50 minutes on llvmpipe :( I think I have to use my desktop from now on :D
fxkamd has joined #dri-devel
aravind has quit [Ping timeout: 480 seconds]
<karolherbst>
okay.. pushed a fixed runner :)
adjtm is now known as Guest2106
adjtm has joined #dri-devel
<karolherbst>
jekstrand: ohh and we have to stop using unsigned in structs when we actually mean enums :D
<jekstrand>
karolherbst: oh?
<karolherbst>
that makes generating bindings and using those fields it in rust much easier
<jekstrand>
Oh, right.
<jekstrand>
NIR only does that a couple places and only for bitfields.
<karolherbst>
bindgen doesn't generate stuff for everything, so everything not pulled in by patterns has to be added explicitly
<jekstrand>
Because if we don't, it messes up MSVC.
<karolherbst>
ehhh...
<karolherbst>
so "nir_variable_mode mode:15;" won't work?
<jekstrand>
not with MSVC
<jekstrand>
it sucks
<karolherbst>
I stop recognizing MSVC as a C compiler
<karolherbst>
:D
Haaninjo has quit [Read error: Connection reset by peer]
<karolherbst>
mhh...
<karolherbst>
annoying
jewins has joined #dri-devel
<alyssa>
remind me why we build with MSVC
<karolherbst>
because some people prefer MSVC over a working compiler
<jekstrand>
TBH, though, I don't think most of those bitfields in NIR are that useful. Most shaders, with a bit of opimization, don't have many variables so if the size bloats a bit fot the sake of compatibility, it's probably ok.
Haaninjo has joined #dri-devel
<jekstrand>
alyssa: VMWare does. Microsoft stuff, too.
<karolherbst>
jekstrand: I suspect shader caching as one reason
<karolherbst>
jekstrand: I mean.. we can deal with it inside rust, it's just... more annoying
macromorgan has joined #dri-devel
Guest2106 has quit [Ping timeout: 480 seconds]
<karolherbst>
jekstrand: ohh.. and meson doesn't support adding include paths for generated headers, so there is another hack with paths in meson.build
<karolherbst>
"include_directories('../../../../build/src/compiler/nir/')," maybe it works for you, maybe not
<karolherbst>
:D
<jekstrand>
karolherbst: I use _build
<karolherbst>
ahh
<karolherbst>
also.. I wrote a rust Iterator for exec_lists.....
<karolherbst>
surprised how much more pleasent that is to look at than our C macros for the same thing :D
<karolherbst>
not sure if we need it besides iterating vars.. but let's see
<karolherbst>
also.. I expect nearly nothing of that new stuff to work on an actual GPU
<karolherbst>
I don't do fencing at all atm, so this might be something to look at
<jekstrand>
sure
<jekstrand>
I'm happy to work on some of that.
<jekstrand>
We just need to get to "we can run some kernels" so we have a base to expand from.
<jekstrand>
So I'll focus on reviewing that stuff first
<karolherbst>
yeah
<karolherbst>
cool
sdutt has joined #dri-devel
Danct12 has joined #dri-devel
<almos>
hi, I have a question: amdgpu used to be able to show a table of the dpm frequencies, but now I can't find it anywhere, was that feature removed?
<tzimmermann>
mlankhorst_, do you have the time to send out drm-misc-next-fixes? there are two patches since february
karolherbst_ has joined #dri-devel
karolherbst has quit [Read error: Connection reset by peer]
karolherbst_ is now known as karolherbst
<danvet>
tzimmermann, looks all reasonable?
<tzimmermann>
danvet, to me :)
<danvet>
I mean I could throw a bikeshed onto it about how to shuffle stuff around or whether we should reuse fbdev code or not
<danvet>
but that' seems silly
<tzimmermann>
it's ok if you don't have comment. i just didn't want to go ahead without considering your feedback.
<danvet>
it's annoying that we can't do a full generic vma mkwrite wrapper, but oh well
<danvet>
also I didn't review whether the code works :-)
<jekstrand>
karolherbst: As far as what passes to run, I can help with that. I think my intel_clc.c has a decent set. One thing we want to do with rusticl is to integrate better w/ gallium drivers here and not lower absolutely everything up-front.
mclasen has quit []
mclasen has joined #dri-devel
<jekstrand>
karolherbst: Yeah, we'll need the global binding API or something like it. For images, however, I'm hoping we can start looking like a "normal" gallium driver.
ifreund has quit [Remote host closed the connection]
ifreund has joined #dri-devel
loki_val has joined #dri-devel
crabbedhaloablut has quit [Ping timeout: 480 seconds]
JohnnyonFlame has quit [Ping timeout: 480 seconds]
<karolherbst>
don't get distracted by the low number, that stuff ran for 1.5 hours
<karolherbst>
(the full CTS even needs more)
<karolherbst>
jekstrand: there is one annoying thing btw, and I think I have a deadlock somewhere once the client multithreads stuff. CL_TEST_SINGLE_THREADED=1 helps as a workaround
gpiccoli_ has joined #dri-devel
gpiccoli has quit [Read error: Connection reset by peer]
<karolherbst>
I think I will wire up constant memory support up next and play around using const buffers and the likes, but not quite sure yet how all of that will fit into place
<karolherbst>
but I kind of fear that we have to treat constant as global :(
<karolherbst>
another thing I was wondering about is to use a constant buffer instead of that input stuff
<karolherbst>
oh well
<jekstrand>
karolherbst: But it's rust code! That shouldn't happen!
<jekstrand>
:(
<jekstrand>
Maybe we're too unsafe? (-:
<karolherbst>
not at all
<karolherbst>
I use unsafe less often than rust devs
aravind has quit [Ping timeout: 480 seconds]
<jekstrand>
lol
<karolherbst>
but I do a lot of hand waving
<karolherbst>
because 1. mesa code is safe, obviously 2. I just rely on sane clients as everything else is UB to begin with
<karolherbst>
but yeah.. we have to make sure we use our own code correctly
<karolherbst>
ehhh host ptr...
<karolherbst>
jekstrand: soo.. there is one issue I don't have a good solution for yet. I create the proper buffer objects when the client creates a buffer. Thing is.. there is stuff like CL_MEM_COPY_HOST_PTR where we should init the content with the provided data. Problem is, buffer_map and buffer_subdata require a pipe_context which we don't have at that time
<karolherbst>
but I also don't want to go the clover route and have a shadow buffer on the CPU, which we would have to manage then
<karolherbst>
and lazy init the content later
<karolherbst>
so I was thinking if we might want to have a helper pipe_context on the Device objects for this kind of stuff
gpiccoli_ is now known as gpiccoli
frieder has quit [Remote host closed the connection]
<karolherbst>
good thing is, the API expects us to hard sync so if we always flush and wait on the fence of this helper pipe_context we should be fine
frieder has joined #dri-devel
JohnnyonFlame has joined #dri-devel
<karolherbst>
anyway.. supporting COPY_HOST_PTR is quite important as a lot of tests rely on it
<karolherbst>
so kind of have to find a good solution for that problem soonish
<jekstrand>
karolherbst: Ugh...
<jekstrand>
karolherbst: Why don't we create a pipe_context at device init time?
mclasen has quit []
mclasen has joined #dri-devel
frieder has quit [Remote host closed the connection]
<karolherbst>
jekstrand: because we have to submit jobs from each queue, so a context is usually tied to that
shankaru has quit [Quit: Leaving.]
<karolherbst>
maybe we could share a context for all queues? but not sure if that's how it's supposed to work
<jekstrand>
I think we want a hidden queue for resource uploads
<karolherbst>
yeah... that was my first thought as well
<karolherbst>
shouldn't be too hard to do actually
<jekstrand>
And then we can have an event for initialized resources for when the upload finishes and everything waits on that event. After 2ms, the wait is a no-op but oh, well.
<karolherbst>
we have to do it blocking
<jekstrand>
why?
<karolherbst>
once createBuffer returns we should be done
<karolherbst>
spec says so
<karolherbst>
so the client can free the host_ptr
<jekstrand>
Yes, but that only means we have to be done if we userptr it
<karolherbst>
no
<jekstrand>
If we copy into a temp buffer, there may still be a GPU copy in-flight.
nchery has joined #dri-devel
Duke`` has joined #dri-devel
<karolherbst>
although not sure what we should do if the buffer is really still in use.. mhhh
<karolherbst>
annoying :(
<karolherbst>
ehh wait
<karolherbst>
jekstrand: that's for _create_ buffer
<karolherbst>
it's not in use
<jekstrand>
karolherbst: For Buffers, we probably want to always map and memcpy
<karolherbst>
CL_MEM_USE_HOST_PTR is already implemented via create_buffer_from_user, but needs a fallback if the driver can't fullfill that request. What I am talking about is just createBuffer with CL_MEM_COPY_HOST_PTR
<jekstrand>
For images, though, we need to copy into a temporary and then blit on the GPU[
mclasen has quit []
<karolherbst>
I use buffer_subdata for writes to a resource
mclasen has joined #dri-devel
<karolherbst>
not sure if that's a bad idea or not
<karolherbst>
but yeah.. nothing of that is tested for images
<karolherbst>
or.. wait..
<karolherbst>
yeah.. shouldn't
<jekstrand>
buffer_subdata should work if the driver doesn't get too confused about all the contexts we have floating around. :)
<karolherbst>
:D
<karolherbst>
yeah..
<karolherbst>
we can always change that later though
<jekstrand>
sure
<karolherbst>
I am more concerned about USE_HOST_PTR fallbacks and how ugly it could make the code
<karolherbst>
for COPY_HOST_PTR having a helper context and block on all operations _should_ be good enough
<karolherbst>
we just need to be careful to not use it for random stuff
<karolherbst>
like initing a buffer with a helper context shouldn't mess up tings
<karolherbst>
*things
<karolherbst>
I think iris fails create_buffer_from_user for quite a lot of reasons, and I am not sure if the approach should be to improve create_buffer_from_user or to fallback and use a shadow buffer for that case :
<karolherbst>
/
<jekstrand>
I can look at that tomorrow
mclasen has quit []
mclasen has joined #dri-devel
ybogdano has joined #dri-devel
almos has quit [Quit: Page closed]
rgallaispou1 has joined #dri-devel
rgallaispou has quit [Ping timeout: 480 seconds]
mclasen has quit []
mclasen has joined #dri-devel
jkrzyszt has quit [Ping timeout: 480 seconds]
mclasen has quit []
mclasen has joined #dri-devel
JohnnyonFlame has quit [Ping timeout: 480 seconds]
libv is now known as libv_
suma has joined #dri-devel
suma has quit []
libv_ is now known as libv
<karolherbst>
jekstrand: yeah.. so helper context works quite fine
ybogdano has quit [Ping timeout: 480 seconds]
<karolherbst>
uhh..... I think that just fixed tons of fails, nice
<karolherbst>
mhh constant_source is still crashing in test_basic
<karolherbst>
I think the biggest problem we need to solve is to move gallium to 64 bit pointers :( that's going to be a huge pita
<karolherbst>
airlied: on the platform 3.0, on the device 1.0
<karolherbst>
I am not crazy enough for 3.0 yet :D
iive has joined #dri-devel
<karolherbst>
but there are enough tests I fixed using some bits of the 3.0 API
<karolherbst>
or 1.1/1.2
<karolherbst>
and I implement according to 3.0 directly
<karolherbst>
supporting spirv binaries shouldn't be difficult either, so...
lynxeye has quit []
<karolherbst>
airlied: if you want I can run with 3.0 and see what happens :D
<airlied>
just gives a better idea for what the pass/fail rate really is :-P
<karolherbst>
it all crashes
<karolherbst>
:D
<karolherbst>
ahh not all
<karolherbst>
if I am confident enough...
<karolherbst>
but it really shouldn't be too much work
<karolherbst>
ahh clCreateCommandQueueWithProperties.. yeah, I guess I could implement that already
<karolherbst>
in a crappy way
<karolherbst>
okay.. seems to work :D
<alyssa>
karolherbst: That reminds me, if I could ignore overflow (64-bit addresses but buffers stay within a single 32-bit page) might save some ALU...
<karolherbst>
alyssa: the reason drivers do 32 bit VMs in OpenGL
<karolherbst>
but it needs to be supported by the UAPI
<karolherbst>
we do have flips on memory instructions to tell if we hand in a 32 or 64 bit pointer
rasterman has quit [Ping timeout: 480 seconds]
<Lyude>
Hm, I don't see any immediate issues with this but: is there any issue with DRM helpers computing atomic state in the atomic commit path, as long as said state is guaranteed to have no effect on whether the commit succeeds or fails? The helpers in question in this case are the DRM MST helpers, and I'm thinking the easiest way of keeping track of VC start slots would likely just to
<Lyude>
be to update their info in the MST atomic state at commit time rather then try to figure it out during the atomic check phase
<Lyude>
(also, no idea why the MST protocol seems to want implementors to keep track of starting slots for VCs, despite also mandating that branches re-allocate streams VC slots to remove unused space between streams. seems like it'd be better to leave that up to branch devices, but oh well)
gawin has joined #dri-devel
<Lyude>
would still be good to know the answer to whether that's OK in atomic, but at the very least it seems like I might be able to avoid needing to do this
JohnnyonFlame has joined #dri-devel
<Lyude>
danvet: ^ any idea about this btw? since you're usually the one I throw MST related ideas at
rasterman has joined #dri-devel
<danvet>
Lyude, it gets tricky, but it's doable
<danvet>
the thing to keep in mind is there's no locking for state structs
<danvet>
so if you change them in commit, then your check needs to assume it's all in-flight garbage potentially
<danvet>
and so your duplicate_state should probably clear it all out with extreme prejudice
<danvet>
but aside from this we do have a bunch of fields in state structs which are for commit code
<danvet>
like crtc_state->event
<Lyude>
danvet: "no locking"? I had assumed we'd hold the lock for an atomic struct until we've finished the commit tail and performed the actual state swap
<jekstrand>
karolherbst: \o/
<danvet>
Lyude, drm_crtc->state is protected by the lock
<danvet>
but not the struct itself
<danvet>
the struct itself is largely protected by "you're not allowed to change it once other threads can see it"
krushia has joined #dri-devel
<danvet>
nonblocking commit does _not_ hold the locks
<danvet>
otherwise a nonblocking commit would hold up the next atomic check
<danvet>
which for TEST_ONLY would hold up the compositor's render loop
<danvet>
which is Not Good
<Lyude>
suddenly seeing actual reasons why payloads had their own locks :(
<danvet>
Lyude, also commits can happen in parallel
<danvet>
if they touch different crtc
<danvet>
but that shouldn't be an issue for mst
<danvet>
ordering between commits is also funky, since it's multi-stage ordered
<danvet>
flip_done vs hw_done vs cleanup_done
<danvet>
so even on the same crtc you can have multiple commits in flight, but in different phases
<danvet>
if that wouldn't work, then we'd run at half refresh rate on most hw
<danvet>
which is why plane cleanup needs to have it's own locking (if it's not a no-op)
<Lyude>
danvet: mhm, I would assume that can't happen here since we've got our own private state object, and I'm fairly sure turning mst displays on/off basically requires vcpi calculations which locks the mgr for the state
<danvet>
but usually that's take care of by the buffer mgr layer like gem shmem or ttm
<danvet>
Lyude, that = parallel commit
<danvet>
or that = commit without holding the lock
<danvet>
the locking design is the same for all modeset objects, including private ones
<danvet>
nonblocking modesets especially is something that is very badly tested (igt maybe in some cases, but by far not everything)
<danvet>
and historically it's blown up plenty of times because people forget that it exists
<Lyude>
danvet: as in, I don't -think- we can do parallel modesets with CRTCs if they're on the same topology mgr, since the atomic check phase for each CRTC should require pulling in the mst private state object (which I assume does at least mean we'll wait for commits using the mst private state object to complete before starting our own commit, right?)
<danvet>
lemma check
<danvet>
wasn't the case in the past
<danvet>
nope still not there yet
<Lyude>
drm_dp_atomic_find_vcpi_slots and drm_dp_atomic_release_vcpi_slots are the functions I'm thinking of that would pull the mst mgr's state in for any modeset
<danvet>
you can probably rely on the ordering due to the connector moving between crtc though
<danvet>
but yeah private state objects which move between crtc are right now very dangerous
<danvet>
I thought mripard was looking into addressing that
<danvet>
but I guess the fix was vc4 specific
<danvet>
essentially you need to add drm_crtc_commit * pointers and make sure you wait for them in all the right places
<danvet>
but like I said, since mst state is attached to a specific connector maybe it's all taken care of already
* danvet
not sure
<danvet>
hm on 2nd thought the connector is the mst end point
<danvet>
not the physical dp connector
<danvet>
so probably busted?
<danvet>
I guess queue up piles of nonblocking modeset commits where you change the crtc for an mst connector and watch the world burn
<Lyude>
danvet: note it's not always a connector I don't think, iirc nouveau does one MST mgr for each encoder
<danvet>
yeah it's broken I think
edrex[m] has joined #dri-devel
<danvet>
maybe we need to lift the drm_crtc_commit tracking that mripard has done int drm_private_state
<Lyude>
to be honest, I don't think that MST should be allowing parallel modesets in general. or at least not ones that change payloads, but probably all of them
<Lyude>
either way though I could try sort of keeping around the payload table in the mst mgr, although I'm trying very hard to keep as little mst info as possible outside of the atomic state
<Lyude>
mripard: btw ^ any take on all of this?
MajorBiscuit has quit [Ping timeout: 480 seconds]
<Lyude>
danvet: also, what drivers actually use parallel modesetting right now?
<karolherbst>
c11_atomics, long, non_uniform_work_group, image stuff, printf, profiling, is what seems to be regressing
<karolherbst>
and then random API bits
<danvet>
it's part of atomic uapi :-)
<danvet>
the thing is that not much userspace uses them, because there are some warts
<karolherbst>
I could probably just turn on long support and it would already fix most of them
<danvet>
but that just means it's largely attack surface :-/
<karolherbst>
ehh.. add that long64 lowering
<karolherbst>
maybe I just do it
<Lyude>
danvet: to me this just kind of sounds like I need to make sure that multiple commits using the same mst topology mgr private state object can't happen in paralle
<danvet>
Lyude, yeah
<danvet>
and that kind of ordering is done with drm_crtc_commit in atomic helpers
<danvet>
but the infra isn't really there yet
<karolherbst>
soo.. load_constant_base_ptr
<danvet>
since iirc mripard has spent a lot of time wrestling this for vc4 (iirc) might be best to team up
<danvet>
especially since this is in mst helper code it's much harder to put the waits in the right places without some generic infra
<emersion>
what is parallel modesetting?
<Lyude>
danvet: this being said though, I wonder if this should be a blocker or not for trying to convert the rest of the MST stuff to atomic
rasterman has quit [Quit: Gettin' stinky!]
<mareko>
Sachiel: it's r600, which AMD doesn't maintain anymore
<danvet>
Lyude, it's maybe busted already
<danvet>
or maybe it's a regression since old mst had it's own locking?
<danvet>
if continued conversion regresses locking/correctness I don't think it's a good idea
<Lyude>
yeah it still does - I was hoping to get rid of it because the payload management code is getting unwildly, and strongly want to discourage certain contributors from trying to add more bandaids to solve issues with it by adding more code that doesn't/shouldn't need to be there…
<Lyude>
emersion: btw - parallel modesetting is the ability to perform modesetting on different CRTCs in parallel
<Lyude>
so you don't need to wait for one CRTC to finish turning on before starting to turn on the next
<HdkR>
oooo, how many display controllers get angry at that? :D
<Lyude>
well -ideally- we would have it so that things that can't run in parallel don't
<Lyude>
but from the sound of it we're not really there yet, which I didn't realize until just now
<HdkR>
I figured :(
<Lyude>
I'm -really- set on trying to get this mst stuff done asap though, so I very much hope there's some hack I could use in the mean time to just block atomic commits from operating in parallel when it comes to MST
<danvet>
Lyude, embed drm_crtc_commit, chain them up
<danvet>
except that needs some serious work in atomic helpers
<danvet>
or patches in every driver using mst helpers
<danvet>
I'm not sure a quick hack or some locks somewhere would help or work
gouchi has joined #dri-devel
<danvet>
anyway time to ^Z here now, ttyt
<emersion>
Lynne: hm what's the upside of doing this vs. multiple NONBLOCK reqs?
risks has joined #dri-devel
risks has left #dri-devel [#dri-devel]
risks has joined #dri-devel
risks has quit []
mclasen has quit []
danvet has quit [Ping timeout: 480 seconds]
mclasen has joined #dri-devel
ngcortes has joined #dri-devel
<Lynne>
emersion: err, do you mean Lyude?
gawin has quit [Remote host closed the connection]
gawin has joined #dri-devel
<emersion>
err, yes, sorry!
<emersion>
Lyude: hm what's the upside of doing this vs. multiple NONBLOCK reqs?
<Lyude>
emersion: I think they're the same thing?
<emersion>
is it a good idea to do NONBLOCK modesets even?
<Lyude>
I'm not entirely sure though, I haven't been on the non-kernel side of this anywhere near as much
<karolherbst>
jekstrand: I am currently thinking if we should add our internal kerenl args before running nir_lower_vars_to_explicit_types on nir_var_uniform, but not sure how tricky that can be, because it requires to do a lot of lowering earlier (like constant initializers, printf or image stuff)
<emersion>
ok
<karolherbst>
although maybe we get away by just running nir_remove_dead_variables quite late
<co1umbarius>
is there a way to get a zeroed dmabuf?
<karolherbst>
and run it for mem_constant once with a cb filtering against mem_constant with initializers
<karolherbst>
we need the proper driver_location data out of the dir, so...
<karolherbst>
*nir
<karolherbst>
although maybe that's fine. we just need to keep the uniform vars around for long enough
rkanwal has quit [Ping timeout: 480 seconds]
kchibisov_ has quit [Read error: No route to host]
kchibisov_ has joined #dri-devel
Jonathan_Cavitt has joined #dri-devel
Haaninjo has quit [Quit: Ex-Chat]
danvet has joined #dri-devel
kchibisov_ has quit [Ping timeout: 480 seconds]
<jekstrand>
karolherbst: idk. I think we want to rethink kernel arguments anyway so this may be a good opportunity to do that.
<jekstrand>
karolherbst: In particular, I think we want to stick them in a UBO like st/mesa does and get rid of kernel inputs as a separate concept.
kchibisov_ has joined #dri-devel
<karolherbst>
jekstrand: sure, that's what I might just do, shouldn't be difficult. But that's quite unrelated to the question of what args are there and where we have to put those into the buffer (whatever that may be)
kchibisov_ has quit []
kchibisov_ has joined #dri-devel
<karolherbst>
jekstrand: the only problem is just, that we do have a concept of uniforms which drivers might handle different than const buffers. Like for nouveau they just start at index 1, not 0
eukara_ has joined #dri-devel
eukara has quit [Remote host closed the connection]
ZeZu has quit [Quit: off to see the wizard]
kchibisov has quit []
ZeZu has joined #dri-devel
<karolherbst>
although I am not quite sure on how all of that works out.. maybe st/mesa is doing it like this?
kchibisov_ has left #dri-devel [#dri-devel]
<karolherbst>
anyway.. that's a different topic
<jekstrand>
karolherbst: I'm pretty sure st/mesa puts all your GL uniforms in cbuf0
<jekstrand>
And then UBOs are cbuf1+
<zmike>
this is correct
<karolherbst>
yeah, I think so as well
<karolherbst>
I just wasn't sure
kchibisov has joined #dri-devel
<karolherbst>
okay, so this input thing needs to go anyway as this clashes with set_constant_buffer
<karolherbst>
(and would allow drivers to drop some code)
<jekstrand>
Yup
<karolherbst>
we just need to manager another buffer, but oh well
<karolherbst>
*manage
<jekstrand>
It does mean that iris needs to grow support for pushing cbuf0 for compute shaders but I can make that happen.
<jekstrand>
Or bother Kayden into doing it. :P
<karolherbst>
jekstrand: yeah.. I was wondering about how to make use of push constants
<karolherbst>
but _maybe_ we can change gallium and make cbuf0 "special" in a sense, that it can have a different size
<karolherbst>
and drivers could report something smaller
<jekstrand>
idk
<karolherbst>
yeah.. me neither, just a random thought
<jekstrand>
for iris, we just push some subset of it and pull the rest based on a heuristic
<Kayden>
going to be a pain
<karolherbst>
ahh
<jekstrand>
Kayden: Starting with gfx12.5+ (which, of course, I don't have), we should be able to say "if (num_uniforms == 0) push cbuf0"
<jekstrand>
Kayden: Trying to do it together with sysvals isn't going to be possible, though.
<karolherbst>
jekstrand: anyway.. we can keep using input as long as drivers aren't ready for using cbufs, but my question was more towards what passes we can run in nir without messing up uniforms
<zmike>
karolherbst: lavapipe puts push constants into ubo0 currently
<Kayden>
and sysvals contains the thread IDs, unless that's changed on 125
<jekstrand>
Kayden: That changes on 125
<Kayden>
okay, cool
<jekstrand>
Kayden: Thread id is a magic reg now. It's SOOOOOO nice
<jekstrand>
COMPUTE_WALKER finally made the Intel compute interfaces not totally suck.
<Kayden>
(where has this been all these years)
<jekstrand>
They still kinda suck but they don't totally suck. :)
<jekstrand>
Actually... never mind. 12.5 also got rid of push constants for compute.
<jekstrand>
So we should just do nothing and iris will suck ever so slightly because it has to pull a few things. Meh.
<karolherbst>
my initial idea was, that we have this list of "spirv" args we enhance with whatever info we get from nir (location and size) and then push private args on top and mark them as "internal" so clients can't set them
<karolherbst>
insert those into the nir befor assigning locations and be done with it
<karolherbst>
although theoretically we could also dce those uniforms... but I think CL has a strict req on where everything lives for.... stupid? reasons
<zmike>
I wouldn't be opposed to adding a push constant block interface to gallium
<karolherbst>
although not quite sure
<karolherbst>
because the CTS can't check if we keep dead uniforms
<karolherbst>
because how would the CTS even know
<zmike>
piglit does
<jekstrand>
karolherbst: Right...
<karolherbst>
zmike: how are they dead if you use them inside the shader?
<karolherbst>
and how can you make sure they are still there without using them?
<jekstrand>
karolherbst: It's also not hard once we have our internal representation of inputs to append stuff and bump the cbuf0 size at the same time.
<zmike>
no, it does getuniformlocation in gl
<zmike>
and if they've been eliminated then that call errors
<karolherbst>
zmike: sure, but that doesn't say anything about placing them into some buffer
<karolherbst>
that they can be set via the API is of course something we need to support
<zmike>
well I didn't have that part of the context :)
<karolherbst>
but they won't have to end up in any buffer
<karolherbst>
jekstrand: thing is.. I don't want to append stuff after nir assigned locations
<karolherbst>
I just want nir to do all the offset+size magic
<karolherbst>
and just use that info
<jekstrand>
Sure. Just keeping the option open. Not claiming it's a great option. :)
<karolherbst>
what I don't know is how strict the spec is on arg locations, but I think the only thing the CTS does is checking for alignment
<karolherbst>
which I don't even know why it cares about it, but here we are
<dcbaker>
jekstrand, karolherbst: the meson structured_sources landed and will be in meson 0.62-rc2 (due out tomorrow), in case that's of interested with bindgen
danvet has quit [Ping timeout: 480 seconds]
<jekstrand>
karolherbst: Is the client even aware of arg locations?
<jekstrand>
I guess they are a bit because there's a limit on the number of inputs specified in bytes
<karolherbst>
jekstrand: what I totally don't like about how things are is that the order of stuff is.... important. So we assume that whatever the CL APIs thinks is at index 2, we have to make sure that it stays at 2 all the way down
<karolherbst>
so the trick we did was, we just make sure to never DCE until after we were able to extract size and offset information
<jekstrand>
Right
<karolherbst>
_although_ maybe location stays the same even if we DCE?
<jekstrand>
Yeah, DCE doesn't change locations
<karolherbst>
mhhhh
<karolherbst>
maybe we should just make this a supported thing
<karolherbst>
we just deal with nir uniforms being DCEed
<karolherbst>
and write our code around that
<jekstrand>
Nothing in NIR automatically changes locations. There are a few very specific passes that do in a controlled way, but you won't run any of them by accident.
<jekstrand>
Yeah, that seems reasonable. As long as CL doesn't care about positions and only argument index, that should be fine.
<karolherbst>
so if we don't have an uniform at loc 2, we just don't fill the input buffer with that
<jekstrand>
Get a location out of spirv_to_nir and base everything on that
<karolherbst>
yep
<karolherbst>
yeah.. I think that should make stuff a lot simplier
<karolherbst>
then I just add two new arg types: DCEed/Gone/whatever and Internal
<karolherbst>
Internal doesn't have spirv info attached and the former one just doesn't get copied into the input buffer
<karolherbst>
or well
<karolherbst>
doesn't have nir infos attached
<jekstrand>
Sure
<jekstrand>
I believe this is what Option is for :D
<karolherbst>
:D
<jekstrand>
Or you can make your own tri-state enum
<karolherbst>
a bit pointless on primitives though
<karolherbst>
Option still reserves the full space
<karolherbst>
well.. sometimes even more
<karolherbst>
like if you only got primitives
Jonathan_Cavitt has quit []
<karolherbst>
execpt a single pointer
mhenning has joined #dri-devel
<karolherbst>
I do have a SPIRVKernelArg struct though which is a bit bigger, so I was already thinking about wrapping that with Option for internal args
<karolherbst>
anyway... now I just need to find that test checking kernel args
<karolherbst>
kernel_memory_alignment_*
<karolherbst>
which I only pass for global and private
<karolherbst>
but private are kernel inputs afiak
<karolherbst>
yep
<karolherbst>
jekstrand: okay.... yeah, seems like the OpenCL C spec really doesn't specify anything about location
mvlad has quit [Remote host closed the connection]
<mareko>
jekstrand: cbuf0 is already special
<jekstrand>
karolherbst: Cool. So we can throw the argument index in location and DCE all we want.
<karolherbst>
yeah
<jekstrand>
karolherbst: And maybe have a convention like "internal arguments start at 1024" or something.
<karolherbst>
jekstrand: why?
<karolherbst>
don't see why that would be needed
<jekstrand>
If it's helpful
<jekstrand>
In GL, we have lots of named locations for specific things.
<jekstrand>
Like VARYING_SLOT_PSIZ or whatever
<zmike>
gasp
<zmike>
the forbidden varying
<jekstrand>
:P
gouchi has quit [Remote host closed the connection]
<alyssa>
zmike: psiz
<icecream95>
* zmike dies
Duke`` has quit [Ping timeout: 480 seconds]
<zmike>
no, no, I've put that one behind me
<zmike>
it can no longer harm me
<karolherbst>
jekstrand: ehhh.. you wouldn't believe why we can't DCE those things early :(
<jekstrand>
uh oh...
<karolherbst>
apparently we have to validate things.. like if the size args to setKernelArg matches
<karolherbst>
sooo.... maybe we should just skip validating? dunno :D
<karolherbst>
but maybe we can do two passes
<karolherbst>
one two just fetch sizes
<karolherbst>
and a later one to calc offsets
mhenning has quit [Quit: mhenning]
<karolherbst>
mhh but that already becomes annoying
mhenning has joined #dri-devel
<jekstrand>
We could build a list of kernel args visible to the client early, before DCE.
<jekstrand>
Then let the compiler do whatever it wants to do and some of those args may never get assigned an offset
<karolherbst>
yeah...
* karolherbst
reads the spec
<karolherbst>
I like how GL actually allows uniforms to become DCEed and stuff
ybogdano has quit [Ping timeout: 480 seconds]
<robclark>
karolherbst, jekstrand: btw andrey-konovalov was asking about __constant ptrs in #freedreno .. and why they aren't lowered to UBO (which could be lowered to push consts and be *much* faster).. I guess there are some edge cases where that would be hard, but plenty of low hanging fruit there.. which would presumably benefit other drivers too.. was there a reason clover does that, or just no one got around to typing patches?
<robclark>
*doesn't do that
<karolherbst>
robclark: 1. generic pointers 2. SVM 3. CL in general
<karolherbst>
we could optimize them to UBOs, but that's an optimization generally
<robclark>
all things that are edge cases ;-)
<robclark>
I assume we'd not completely remote load_global_const, just lower the "easy" ones
<karolherbst>
I think it would be fine to optimize them to UBOs, but for that the kernel has to tell us it's actually safe
<robclark>
hmm, isn't __constant enough for that?
<karolherbst>
nope
ybogdano has joined #dri-devel
<karolherbst>
in a CL 1.2 world maybe, but even then
<karolherbst>
you can have like in kernel constants and pass around addresses
<jekstrand>
Yeah, optimizing to UBOs is something we probably want to do eventually.
<jekstrand>
If we can somehow statically determine the size of the access, it should be possible.
<jekstrand>
Won't help Intel because of stupid push restrictions for compute but it could help freedreno/panfrost.
<karolherbst>
nvidia has some checks when to do it as well afaik
<jekstrand>
But it can only ever be an optimization
<robclark>
it seems to be how blob cl has half as many `ldg` and is twice as fast (at some particular kernel that someone cares about)
<karolherbst>
but generic pointers really make all of that super annoying
<jekstrand>
karolherbst: Meh. We get rid of most of the generics during optimzation and lower to UBOs very late.
<jekstrand>
s/lower/optimize
<karolherbst>
robclark: yeah.. I think for the normal CL 1.2 and nothing fancy situation we could use UBOs
<robclark>
yeah, I was thinking we'd never completely remove the load_global_const.. just hopefully optimize away enough of them in cases that people care about
<jekstrand>
ANV already does something like that where we punt variable pointers off to global but "optimze" to a bound SSBO when we can.
<jekstrand>
It's not hard
<jekstrand>
It's a bit harder in CL because you have to figure out buffer bounds.
<karolherbst>
jekstrand: I think optimizing a complete buffer to an UBO might be good enough for now
<robclark>
yeah, we already have range analysis to decide what parts of UBO to lower to push consts
<karolherbst>
constant buffers are limited in size and OOB is just undefined
<robclark>
(in ir3)
<karolherbst>
so if anybody wants to write that optimization they could just do it
<jekstrand>
karolherbst: That's a good point. We can just make the buffer view the max object size all the time.
<karolherbst>
jekstrand: why?
<karolherbst>
I thought const buffer OOB is safe by definition
<karolherbst>
or do driver have to make sure of that?
<jekstrand>
karolherbst: The question is if we need to do shader analysis to figure out the access bounds.
<jekstrand>
If we can OOB is undefined, we can just make the buffer binding the size of the whole buffer and not care about bounds.
<karolherbst>
I know that on nv hw you can specify OOB behavior on cbufs afaik
<karolherbst>
and just let it return 0 or whatever
<karolherbst>
imirkin: or do I missremember something here? I thought we kind of have this level of protection inside hw, no?
<karolherbst>
I am not sure if all hw is seeing it that way
<robclark>
so, OoB access is kinda easier with UBO than ldg.. at least for us, the hw will clamp UBO access but not global
<jekstrand>
The problem with UBO in CL isn't adding clamps. It's making sure your UBO is big enough that you don't get unintended clamps.
<karolherbst>
jekstrand: why would that happen?
<karolherbst>
constant buffers are not infinite in size
<karolherbst>
the runtime reports back a max size per constant buffer and the applicatin has to make sure to not access outside of it
rabbitz has joined #dri-devel
<karolherbst>
and you can report your UBO size as the max size for constant buffers