<airlied>
JoshuaAshton: hehe, not just broken primitive id passing between tess/geom/frag, I kinda ignored it before, now I can't avoid it
uzi_ has quit [Read error: Connection reset by peer]
uzi has joined #dri-devel
<JoshuaAshton>
ah
bcarvalho_ has joined #dri-devel
bcarvalho has quit [Read error: Connection reset by peer]
pekkari has joined #dri-devel
uzi has quit [Remote host closed the connection]
uzi has joined #dri-devel
blue__penquin has quit []
aravind has joined #dri-devel
mlankhorst has joined #dri-devel
aiddamse has quit [Ping timeout: 480 seconds]
evadot has joined #dri-devel
lynxeye has joined #dri-devel
turol has joined #dri-devel
pcercuei has joined #dri-devel
dolphin has joined #dri-devel
SanchayanM has joined #dri-devel
dolphin has left #dri-devel [#dri-devel]
arnd has quit [Quit: Updating details, brb]
arnd has joined #dri-devel
dolphin has joined #dri-devel
pcercuei has quit [Quit: brb]
pcercuei has joined #dri-devel
pcercuei has quit []
xantoz has joined #dri-devel
pcercuei has joined #dri-devel
Surkow|laptop has joined #dri-devel
pcercuei has quit []
pcercuei has joined #dri-devel
bl4ckb0ne has quit [Remote host closed the connection]
emersion has quit [Remote host closed the connection]
bl4ckb0ne has joined #dri-devel
emersion has joined #dri-devel
<danvet>
mripard, do you have time perhaps for "[PATCH 05/11] drm/atomic-helper: make drm_gem_plane_helper_prepare_fb the default" and later patches in that series?
frieder has joined #dri-devel
SanchayanM has quit []
SanchayanM has joined #dri-devel
ceyusa has joined #dri-devel
pcercuei has quit [Quit: brb]
ceyusa has quit [Remote host closed the connection]
pcercuei has joined #dri-devel
ceyusa has joined #dri-devel
rossy has joined #dri-devel
<mripard>
danvet: acked-by for all of them
<danvet>
mripard, did you double-check I didn't butcher logic too much, like what noralf spotted?
<karolherbst>
dschuermann: I guess this is fine, but I can do a test run later
<dschuermann>
if you prefer to, you can also create a bug report later :P
adjtm_ has joined #dri-devel
<karolherbst>
dschuermann: well.. we are using nir on volta+
adjtm has quit [Remote host closed the connection]
<pq>
in "shared fence", does "shared" refer to letting multiple actors continue simultaneously or cross-device fences?
<pq>
or something else?
xp4ns3 has joined #dri-devel
<danvet>
pq, multiple concurrent actors
<danvet>
might or might not be cross device
<pq>
thanks, so it does mean I thought it means
<daniels>
shared == read, excl == write
<daniels>
so excl synchronises against everything before it and blocks everything after it, whereas shared only synchronises against any prior excl
adjtm_ is now known as adjtm
<emersion>
is producer/consumer a good enough approximation?
SolarAquarion has joined #dri-devel
adjtm has quit [Quit: Leaving]
adjtm has joined #dri-devel
<karolherbst>
dschuermann: best case you get the result in a few hours
<dschuermann>
karolherbst: thx!
alatiera5 has left #dri-devel [#dri-devel]
alatiera has joined #dri-devel
<daniels>
emersion: yeah, that works too
heat_ has joined #dri-devel
heat has quit [Read error: Connection reset by peer]
heat_ has quit []
heat has joined #dri-devel
robher has joined #dri-devel
tango_ has quit [Quit: I'm never quite so stupid as when I'm being smart (Linus van Pel)]
tango_ has joined #dri-devel
pmoreau[m] is now known as Guest8
heat_ has joined #dri-devel
heat_ has quit []
heat_ has joined #dri-devel
heat_ has quit [Remote host closed the connection]
heat_ has joined #dri-devel
xp4ns3 has quit [Quit: Konversation terminated!]
xp4ns3 has joined #dri-devel
adjtm is now known as Guest17
adjtm has joined #dri-devel
heat has quit [Ping timeout: 480 seconds]
Guest17 has quit [Remote host closed the connection]
heat_ has quit []
heat has joined #dri-devel
vivijim has quit []
vivijim has joined #dri-devel
<alyssa>
what's the semantic for load_barycentric_pixel?
karolherbst has quit [Read error: Connection reset by peer]
karolherbst has joined #dri-devel
<jekstrand>
It loads the barycentric coordinates for the center of the pixel
<karolherbst>
dschuermann: ehh.. I just wanted to tell that I didn't find any regressions, but it failed to build and install :D
<jekstrand>
As opposed to a particular sample or a particular offset from center
<karolherbst>
llvmpipe build fails
<jekstrand>
alyssa: ^^
<alyssa>
jekstrand: Ah..
<alyssa>
bifrost has a "load at center" mode and a "load at sample position if per-sample shading, else load at center" mode
<alyssa>
and a particular optimization requires the latter mode is used
<karolherbst>
alyssa: uhh.. that remindes me that our instructions are also a bit annoying here. So you can specify a mode on the load instruction, but can overwrite it through metadata as well :/
<jekstrand>
On older hardware, we go out of our way to look for interpolate+load_barycentric combinations and handle them as a fused op. On ICL+, we handle them as two things.
<jekstrand>
For at_sample without MSAA, we have a pass which smashes to center
<alyssa>
All of our hw is interpolate+load_barycentric fused :o
<alyssa>
Mali tries really hard to avoid shader keys. It's kinda cute.
<alyssa>
Just as the hw influences the sw, the sw (LLVM) influences the hw ;p
<karolherbst>
alyssa: yeah.. it's also kind of fused for us, but also kind of not
<jekstrand>
It's not actually fused for us. The barycentrics come in with the payload and we interpolate. It's just that we have PLN pre-ICL and it's a tricky instruction with some "special" semantics. It's easier to just fuse.
Chaekyung has quit [Remote host closed the connection]
bcarvalho_ has joined #dri-devel
bcarvalho has quit [Read error: Connection reset by peer]
<imirkin>
karolherbst: it's *pretty* fused. as opposed to e.g. amd where you can just get the i/j coords
<karolherbst>
yeah
<karolherbst>
sure, it's more fused than unfused
<karolherbst>
some details are just a little annoying
<imirkin>
like the offsets being in some weird fixed point format? :)
<karolherbst>
yep
FireBurn has quit [Ping timeout: 480 seconds]
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
<dschuermann>
karolherbst: the build should be fixed now, said pendingchaos (can I register somewhere as IRC bot? :D )
<karolherbst>
yeah, it's fine
<karolherbst>
I am seeing a few regressions though...
<karolherbst>
I will take a look and comment on the MR
<jekstrand>
It's unfused on turing, though, isn't it?
<imirkin>
jekstrand: what makes you say that?
<imirkin>
i don't really see that, but perhaps it's hidden
<imirkin>
or perhaps i don't understand what 'fused' is
<imirkin>
(or the most likely option -- both!)
<karolherbst>
it's less fused than previous gens
<karolherbst>
imirkin: check the interp lowering for gv100
<jekstrand>
imirkin: They added an NV extension for explicit barycentrics
<imirkin>
karolherbst: yeah, for PINTERP, but it's not too different than usual
<imirkin>
jekstrand: ah, probably that stuff is accessible now
<jekstrand>
Which is even niftier than the AMD one
heat_ has joined #dri-devel
<karolherbst>
jekstrand: I think this is a bit optional though
<imirkin>
but it doesn't "have" to be used
<jekstrand>
sure
<jekstrand>
I could believe that
<imirkin>
in a couple gens they'll drop the old way
heat has quit [Remote host closed the connection]
<karolherbst>
probably
<imirkin>
or perhaps there's enough benefit in maintaining the "common" case in hw
<karolherbst>
imirkin: I guess because general compute becomes more and more important, they will probably use the space for something else :D
<karolherbst>
dschuermann: ahh, llvmpipe also regresses :)
FireBurn has quit [Read error: Connection reset by peer]
FireBurn has joined #dri-devel
<pendingchaos>
seems it's because the comparison isn't marked exact, nir_opt_algebraic assumes the operands are not NaN
<karolherbst>
probably
<karolherbst>
pendingchaos: I just hope most of your stats are not because of isnan being wrongly implemented now :p
<karolherbst>
*stat changes
<Venemo>
in NIR, do we have a good way to express that a divergent branch is always taken?
frieder has quit [Remote host closed the connection]
<Venemo>
if not, what would be the right way to approach that?
<karolherbst>
Venemo: we have a divergency analysis pass if that helps
<bnieuwenhuizen>
a branch that is always taken is not idvergent right?
<karolherbst>
but it's mainly to find uniform values
<karolherbst>
so if the condition is uniform your know which branch will be taken
<karolherbst>
sometimes
<Venemo>
not exactly
<karolherbst>
ehh wait
<pendingchaos>
Venemo: you mean that at least one invocation in the subgroup takes the branch?
<Venemo>
yes, that's what I mean
<karolherbst>
you know if the branch taken is uniformly taken
<karolherbst>
ahh
<Venemo>
I know I know about the divergence analysis, but that's not what I need now
<karolherbst>
yeah sorry.. I missunderstood what you asked for :)
<Venemo>
sometimes you can know that at least 1 invocation will be active in a block, always
<Venemo>
for example, with elect, or thread_id<N (where you can prove N!=0)
<karolherbst>
yeah.. I guess for that you probably have to write a pass analysing the conditions
<Venemo>
these are divergent
<Venemo>
would it be allright to add a boolean field to nir_if to let the backend know this?
<karolherbst>
we already added stuff for divergency, so I guess it would be :D
<Venemo>
this is not the same as divergency
<Venemo>
divergence*
<karolherbst>
I know
<karolherbst>
but if you are worried about using more spacve
<karolherbst>
the divergent bool is 1 byte and you can bitfield it
<Venemo>
I'm not worried :)
<karolherbst>
ehh wait
<karolherbst>
wrong struct
<karolherbst>
Venemo: okay.. :D
<Venemo>
currently we always emit a branching instruction for divergent branches. the reason I'm interested in this is in order not to emit that instruction when we know a branch is always taken by at least 1 invocation
<dschuermann>
Venemo: pendingchaos wrote something like that for the atomic optimization
<dschuermann>
that detects if some branch is taken by one invocation. it's not used for anything but to skip the optimization in this case ;)
<Venemo>
can you point me to where that is?
<Venemo>
is it nir_opt_uniform_atomics?
<pendingchaos>
yes, is_atomic_already_optimized()
<pendingchaos>
it can return true for branches which are taken by more than one invocation though
<Venemo>
that's not an issue
<pendingchaos>
(not likely in any realistic code though)
<Venemo>
I'd like true for branches that are taken by >= 1 invocations
<dschuermann>
we could probably generalize some analysis like that and flag cf_nodes which are always taken by at least one invocation
idr has joined #dri-devel
<dschuermann>
Venemo: I don't really see that giving an edge over the vskip heuristik, though
<pendingchaos>
not sure if it's possible for is_atomic_already_optimized() to return true for branches which are not taken by any invocations
<Venemo>
if the branch isn't take by any invocations, then it should be false yeah.
<danvet>
daniels, since you typed up the X11 and wayland version, how's Xwayland different?
<danvet>
for completeness
* danvet
enjoyed that read
ngcortes has quit [Remote host closed the connection]
lynxeye has quit []
xp4ns3 has quit [Quit: Konversation terminated!]
xp4ns3 has joined #dri-devel
<pendingchaos>
karolherbst: I've added "glsl,glsl/nir: emit exact comparisons for isnan() and isinf()" to the MR and the test now passes on radeonsi
<karolherbst>
pendingchaos: yeah, that fixes it for llvmpipe and nouveau as well
<danvet>
maybe include a bit more blabla on this one, since iirc the plan is that this is only temporary?
<danvet>
or am I confused
<daniels>
danvet: the common case for Xwayland is that it's just passthrough to the compositor, so if you were to type up DRI3.2 with UMF, then you pass the UMF through and let upstream compositor deal with it; if anyone ever does XGetImage, or if you need to do X11-internal composition (e.g. subwindow trees), you fall back to a dumb spin
<danvet>
daniels, so all the X11 fun you described just doesn't apply to Xwayland?
<danvet>
everyone gets their own private buffer and all that goes along with that?
<alyssa>
this might be the most cryptic commit I've seen in ages 😁
<ccr>
eh
<karolherbst>
yeah...
<karolherbst>
that's not how you do open source :p
<alyssa>
🍿
neonking has joined #dri-devel
<jekstrand>
uh....
<jekstrand>
Feel free to comment and ask for a better commit message
<alyssa>
jekstrand: I'm not sure I want one 😉
<daniels>
danvet: it's another boundary condition, really
<daniels>
danvet: if you just do the sensible straight-line thing of having a single non-parented window which you send DRI comment to, you get the lovely fast path
<jekstrand>
alyssa: It's probably just "Oops, we can't actually do that many threads"
<daniels>
danvet: if you want to get weird and do XGetImage to ask the X server to tell you what you just sent it because you're amnesiac, or you want to do X11 native rendering, or you have X11 subwindows, then you get to eat the pain of the X server blocking on you
<alyssa>
jekstrand: Yeah, I gathered that from 14013840143
<bnieuwenhuizen>
jekstrand: also fun if you have chicken bits in registers whose name basically consists of the workaround number :P
xp4ns3 has quit [Quit: Konversation terminated!]
<daniels>
jekstrand: ah yes, the classic off-by-twelve
xp4ns3 has joined #dri-devel
<mwk>
well they overestimated the capacity of their shader geometry by 12, obviously
<mwk>
that's what this commit is doing
<alyssa>
jekstrand: Basically I'm trying to understand the rules for fragment inputs that are neither centroid nor sample
uzi has quit [Ping timeout: 480 seconds]
<alyssa>
NIR is feeding them in as `pixel` but that seems to be a choice on our part, not a spec requirement? At least in ESSL?
<jekstrand>
Uh....
<imirkin>
alyssa: you're supposed to interpolate at the center
<imirkin>
except for various cases
<jekstrand>
^^
<jekstrand>
That's always the kicker, isn't it?
<imirkin>
like MSAA you're supposed to interpolate at sample, no matter what
<jekstrand>
Those "various cases" :P
<alyssa>
imirkin: various cases indeed
<imirkin>
except ... not no matter what?
<imirkin>
i forget
<imirkin>
i've long-ago paged out those rules
<alyssa>
bleh
<imirkin>
BUT
<imirkin>
as long as you're not trying to CHANGE the rules
<alyssa>
hw fast path is for "sample if per-sample shading, center otherwise"
<idr>
imirkin: MSAA is still pixel center. That's why centroid was invented... because the pixel center might not be covered by any of the samples.
<imirkin>
idr: right, i realized that as i remembered the extence of the 'sample' thing
<idr>
MSAA w/o per-sample shading, anyway.
<alyssa>
and I have no idea what that corresponds to in GLSL/NIR
<imirkin>
and yeah, i should have said per-sample shading
<imirkin>
idr: like what happens if you do per-sample shading but don't have any qualifiers? still center?
<imirkin>
i forget :)
<alyssa>
I think that's supposed to be sample
<idr>
The great thing about standards. ;)
<imirkin>
alyssa: i think it depends
<imirkin>
but mesa normalizes all that for you
<imirkin>
which is nice.
<alyssa>
usually, yes
<imirkin>
alyssa: for example, there's a rast->force_persample_interp
<imirkin>
which is a rasterizer-level setting which forces you to interpret at sample rather than at center
<alyssa>
now it's less nice because I need to unnormalize it for the opt >_>
<imirkin>
now you could go around complaining about its existence
<imirkin>
but the reality is that it exists :)
<imirkin>
(actually there's a PIPE_CAP for it ... if you don't support it, you just get shader recompiles)
<alyssa>
sure
<imirkin>
so basically just do the interp that the shader tells you
<imirkin>
and all will be well
<alyssa>
unfortunately that never corresponds to the fast path unless I add is_sample_shading to the key
<jekstrand>
So, I suspect that was invented for the "default" case
<imirkin>
a shader may be used per-sample
<imirkin>
and not-per-sample
<alyssa>
yep
<imirkin>
so if you want diff code for those cases
<imirkin>
then a shader key feels like the only way (or binary fixup)
<imirkin>
for nouveau, we do binary fixups :)
<imirkin>
slight change to the interp op iirc
<jekstrand>
If no one uses any sample qualifiers and there's no gl_SampleID, then you only one once per-fragment unless....
<alyssa>
arm why couldn't you just given me an extra sample bit
<alyssa>
jekstrand: unless set_min_samples(N>1)
<karolherbst>
alyssa: you know what we do? we flip bits in the compiled shader :D
<jekstrand>
someone smashes the shading rate thing (can't remember what it's called) in whic case, you run MSAA and everything's per-sample.
<karolherbst>
ahh imirkin already mentioned it
<imirkin>
=]
<alyssa>
jekstrand: right.. but then the NIR code is still interp_pixel
<imirkin>
karolherbst: almost as if i know something about that one...
<karolherbst>
:D
<karolherbst>
I had too much fun with that myself
<imirkin>
karolherbst: it's a good one.
<alyssa>
jekstrand: so if I follow the NIR, I have to do all interp at center. But would it also be ok to interp at the samples? I dunno!
<imirkin>
alyssa: you think that's annoying?
<imirkin>
try this -- gl_SampleMaskIn
<imirkin>
most hardware provides a coverage mask
<imirkin>
but you're supposed to give it a 1-bit mask when per-sample shading
<jekstrand>
alyssa: Do you need to use this fancy opcode or can you just shader key it?
<imirkin>
HOWEVER
<imirkin>
a single shader may be used both per-sample and per-pixel
<imirkin>
so ... yeah. fun times.
<karolherbst>
it was a mistake to make the shader depend on runtime settings :p
<imirkin>
even worse -- there's no hardware (on nvidia) way of determining which samples are actually going to be "used" for output if the sample rate is below the "msaa" rate
<imirkin>
so you have to force min samples == total samples if gl_SampleMaskIn is used
<jekstrand>
imirkin: It gets even better on Intel. On ICL, they "accidentally" anded the coverage mask with some stuff to make it even more useless. :D
<karolherbst>
isn't stuff like that actually fixed with vulkan?
<alyssa>
jekstrand: I can key it, I'm just hoping I can spec lawyer my way out of it
<karolherbst>
or are there still those implicit recompiles?
<imirkin>
jekstrand: lol
<imirkin>
jekstrand: accidentally on purpose? :)
<jekstrand>
I don't know.
<jekstrand>
It was some fallout from when they added VRS/CPS
<alyssa>
jekstrand: fwiw the fast path is not a fancy opcode, it's a mechanism to preload r0 with the results of a varying or texture prefetched lookup
<imirkin>
jekstrand: does intel hw allow running fragment shaders at e.g. 2x msaa even though the surface is 4x msaa?
<imirkin>
nvidia allows that
<imirkin>
(why? no clue. but it's there.)
<jekstrand>
imirkin: Not until they added VRS/CPS
<alyssa>
The idea being, if the entire shader is `gl_FragColor = texture2D(tex, v_TexCoord)`, this eliminateshalf of the instructions
<jekstrand>
AFAIK, Nvidia's the only ones that can do that
uzi has joined #dri-devel
buhman has joined #dri-devel
spstarr has joined #dri-devel
ngcortes has joined #dri-devel
<danvet>
daniels, I can't grok your latest mail
<danvet>
feels like you're mixing up a few too many things
<daniels>
which bit?
<danvet>
personally I think mixing up userspace memory fences with implicit sync is a very bad idea
<danvet>
mostly because I don't want to think through the options
<daniels>
I totally agree
<daniels>
but I'm not talking about userspace fences here :)
<daniels>
I'm talking about the import/export ioctl which is the subject of the actual patch
<danvet>
well your 1. is that
<danvet>
if you live in a dma-fence world, _every_ CS gets dma_fence, which we attach to _every_ buffer's dma_resv
<danvet>
that's how it works
<danvet>
if you want something else you very quickly end up in the UMF world
<danvet>
the more we look at these at least the two models are 100% incompatible it's one or the other, no mixing
flibitijibibo has joined #dri-devel
<danvet>
the only thing we can do is label them with "relevant for implicit sync" and "not relevant for implicit sync"
<daniels>
so, I agree on no mixing, there is no way you can ever bridge the two worlds (sorry jekstrand)
<danvet>
and I'm still not seeing where your example oversyncs
<daniels>
hence the suggestion that rather than trying to generate a fence-alike for UMF, we just do purely userspace sync, and give the consumer a nice hammer to zap the producer's ctx if it doesn't deliver in time
<danvet>
like even if you're extremely dumb about it and pass the explicit sync fence to both libva encoders
<danvet>
and they both set it at import time
<danvet>
the kernel sees that, realizes you're a bit silly and de-dupes it all
<daniels>
the 1. about not dumping a fence for every CS into the resv in the non-UMF case was because I'd understood from prior discussion that there was a plan to do the amdgpu thing and skip resv for CS which is 'known' to not need to participate in implicit sync
<danvet>
so you end up with a no-op ioctl 2nd time around
<danvet>
or I'm confused
<daniels>
ok, so 'to both libva encoders' ... but you then need to export the fence from the libva read CS, right
<danvet>
ok with jekstrand current patch to import sync_file you get oversync issues with 2 libva encoders
<daniels>
so you can synchronise further use against that
<danvet>
but once it's fixed with my suggestion it should be fine
<danvet>
yeah
<danvet>
so if you want the buffer back in vk
<danvet>
there's 2 ways
<danvet>
one is yolo and broken
<daniels>
if you don't clearly know whether your next (temporally, not in single-thread code flow) use is going to be implicit/explicit, you're going to need to do import/export at essentially every boundary
<danvet>
the other is you refcount all the libva users, decrement until it hits zero
<danvet>
and only _then_ grab the shared sync_file and give the buffer back to vk
<daniels>
so I can very much see people being pessimistic and just constant import/export dumps, which end up serialising reads against each other
<daniels>
yeah
<danvet>
well for the 2 encoder use case if you're dumb about it
<danvet>
then the first grab of sync_file without all the shared fences from the 2nd is just ... wrong
<daniels>
so that refcount-then-export totally wfm as a Wayland compositor, but people doing media pipelines with like 57 threads are going to be srsly unhappy
<daniels>
the first grab isn't wrong tho, because import is additive not replacement, right?
<daniels>
so if you do race, then you end up with one submitting everything and the second submitting a no-op
<danvet>
why do you import anything when you get it back from libva?
<danvet>
I'm assuming here libva is implicit synced
<daniels>
yeah
<danvet>
so there's nothing to import here
<daniels>
but say your pipeline is all explicit, because it contains explicit elements
<jenatali>
If it helps you guys at all, Windows has both implicit sync (GDI) and explicit sync (D3D/VK), and we support mixing them. The only caveat is that when you try to submit something that's implicit sync, we'll make sure that it's not going to depend on any explicit sync that isn't guaranteed to be drainable
<danvet>
or we're talking about wrong direction of import
<jenatali>
If you try to submit not-guaranteed-drainable implicit sync work, we block/stall until we can detect that it's drainable
<daniels>
so at every point where you enter implicit world, you bracket it with import (based on last-known fence) and export (based on what you just generated)
<danvet>
jenatali, mostly we're having fun with the warts of our implicit sync model unfortunately
<danvet>
I think the other pieces are fairly clear, if not yet typed up
<jenatali>
Heh, "fun"
<daniels>
jenatali: yeah, unfortunately we didn't have a clear cut between the two, and now we're stuck with them forever
<danvet>
daniels, ok so import/export here from the dma-buf/kernel pov
<danvet>
so why do you import a sync_file into the dma-buf after you get the dma-buf back from libva?
<daniels>
jenatali: and Wayland API precludes client-side threaded/delayed submit
<danvet>
for an encode session
<danvet>
decode it makes sense
<daniels>
jenatali: but luckily the kernel is magic and fixes everything for us \o/
<jenatali>
daniels: Ah right, forgot about that
uzi has quit [Ping timeout: 480 seconds]
<bnieuwenhuizen>
daniels: dumb question but would it make sense to adjust that in the wayland protocol instead of trying to do kernel heroics?
<daniels>
bnieuwenhuizen: no
<daniels>
that's the short answer :P
<danvet>
imo "implicit sync as a very funny IPC" is ok wrt kernel heroics
<danvet>
former is ok because most of the complexity we need to solve in the kernel anyway, for various reasons
<daniels>
the long answer is that we'd either have to build an actual IPC/semaphore mechanism into Wayland itself (no), or that we'd have to have everyone who might do threaded submit do cross-thread callbacks, and make clients push their work into those, which if we weren't using C might be viable but ...
<bnieuwenhuizen>
yeah mostly talking about the umf stuff
<daniels>
shrug, we don't need kernel heroics for UMF
<daniels>
the compositor deals with UMF up front
<daniels>
it's the only way which is even a little bit viable, and we're perfectly OK to do it
radii has joined #dri-devel
<daniels>
danvet: wrt your 'why do you import a sync_file back' - I think the only way to handle explicit sync in an arbitrary pipeline framework (let's call it GStreamer) is that you add explicit-sync awareness to the framework itself, so you can e.g. mix Vulkan and VA. so you do pretty much what DRM did with the BKL - if your element declares that it's explicit-aware, then you pass fences in and out of it, but if it's not, you bracket the
<daniels>
accesses with import (from the last explicit fence you got from upstream) and export (to the next downstream)
<daniels>
danvet: it seems, especially if you have multiple elements accessing the same BO simultaneously from multiple threads because they know it's safe to do so when they're not racing read vs. write, that most implementations of that would end up eating their own dogfood
<daniels>
and that you'd export shared+excl, import that back into excl, and then you're totally serialised
<danvet>
ok, I think you can make this work if you know slightly more about the expected access
<danvet>
if your implicit synced pipeline element only reads, then
<danvet>
- don't import any fence (because the read fences are all there already)
<danvet>
(or at least import only shared fences)
<danvet>
- take out _only_ the shared sync_file
<danvet>
if it writes, then you need both
<daniels>
yeah
<danvet>
or something like that
<danvet>
tbh my brain is a bit toast right now
<daniels>
mine too :)
<bnieuwenhuizen>
danvet: the read fences are not there because in the explicit case we may avoid the implicit sync fences altogether? so you can't avoid taking out the shared fences
<danvet>
bnieuwenhuizen, right now you can't ever avoid the shared fences
<daniels>
I definitely think there's a solution in there, but it's the balance of whether we inflict mutex death on userspace for the benefit (?) of not populating the shared slot in the kernel
<danvet>
daniels, I think for full glorious future we do need the shared import
<danvet>
the problem is just that with the current drivers, it's a no-op
<danvet>
since the fence will be there already (except if you do something really dumb like adding arbitrary unrelated fences)
<daniels>
yeah, I agree
<daniels>
but I'd rather surface things to userspace and have them do it properly from the get-go
<danvet>
which also means userspace wont use it, so if we later on do add the distinction between implicit sync relevant/not-relevant for fences in dma_resv
<danvet>
we're screwed
<daniels>
rather than having people rely on always exclusive
<daniels>
meh, if we ship exclusive-only import now, then it'll take us 5 years to get userspace across to choosing the right thing between either exclusive or shared
<danvet>
yeah, if we don't do this, then we're also screwed because we've locked down the semantics
<danvet>
the other problem is: too many drivers which don't even opt-out of implicit sync (like amdgpu right now)
<danvet>
so it's all fairly hopeless in reality anyway :-/
<daniels>
so yeah, whilst it's useless now (with everyone always populating the resv on every access), letting us choose the right thing for import gives userspace the tools it needs to one day _not_ populate the resv on every access, which is something that a) we want (I think), and b) is going to be forced on us by UMF hardware models anyway
<daniels>
heh
<daniels>
well, at least we can put a plausible model together and give people compelling reasons as to why they should use it
<danvet>
daniels, ok got my example wrong: the fence slot you pick for import/export depends upon what the previous/next pipelime element will do, not what the current one has done
* jekstrand
will read backlog eventually. On the phone with the internet people
<danvet>
I think
<danvet>
e.g. if the next one only reads, you only need the explicit sync slot
<danvet>
*exclusive
<daniels>
jekstrand: good luck!
<danvet>
why does both start with ex*
<daniels>
danvet: uh?
<daniels>
danvet: surely it's on your behaviour?
<daniels>
if you write, you sync against both slots & populate exclusive
<daniels>
if you read, you sync against excl slot & populate shared
<danvet>
nah, the implicit pipeline element will set the right one for its own access
<danvet>
but how you sync depends upon what the previous thing did
<danvet>
daniels, the kernel does that for you for the implicit pipeline element
<daniels>
right, but if you have a Vulkan read, then you use that to populate the shared slot, and then implicit will DTRT
<danvet>
also if you do explicit sync with lots of parallel access
<danvet>
you'll have to keep track of a pile of fences
<danvet>
like for readers all previous relevant writers (if you do it parallel/tiled or whatever)
<danvet>
and for writers all previous access for their area
<danvet>
so juggling multiple fences and picking the right one should be ok
<daniels>
ulimit -n 0xffffffff
<danvet>
at least
<danvet>
if your vulkan read is before the implicit element
<danvet>
then doing nothing already takes care of everything
<danvet>
since not even vulkan can avoid the shared slot here, because it's always set
<daniels>
right, but then is there a future where it's not always set on CS?
<danvet>
if the vulkan read is after the implicit sync, you export the read sync_file (which actually exports the exlusive slot)
uzi has joined #dri-devel
<daniels>
and that it's only populated by userspace which takes care to fill resv (on explicit -> implicit transition) and pull from resv (on implicit -> explicit transition), but explicit CS doesn't need to populate resv itself?
<danvet>
but otoh if your vk element writes, you need the read-write slot, which exports all the implicit fences into your sync_file
<danvet>
the future where it's not set means UMF
<danvet>
99% sure on that statement
<daniels>
heh ok, I'd figured there was a transitional world where people wanted to not populate resv if not necessary, but before UMF
<danvet>
it's necessary
<danvet>
if you don't you either a) pin all the memory (not so much appreciated in upstream)
<danvet>
or b) have gpu page fault support, which _requires_ umf
<daniels>
I get that it's necessary for UMF, but I thought it would be current world -> driver optimistically doesn't populate resv if userspace is smart but we still have normal fences -> full UMF world
<daniels>
ah right
<danvet>
also, umf _requires_ that you either have a) gpu page fault support or b) pin everything or c) only attach ctx preempt fences which are useless for sync
<daniels>
so we need resv populated on every single CS no matter what, because relocation fences
<danvet>
so goes both ways
<daniels>
they need to sync against that so they can swap backing storage out, because until we have actual demand paging from GPUs it needs to be a stop-the-world stall event
<danvet>
hm this little argument actually convinced me that shared import is useless
<danvet>
since it can't ever happen
<daniels>
s/they need to sync against that/relocation fences need to synchronise against every single prior access and preclude future access/
<danvet>
ah no, we can mark the implicit shared fences up, but atm no one does that
<danvet>
yup
<danvet>
also memory management fences
<daniels>
what are MM fences?
<danvet>
since at least i915 has gpu relocations of bo addresses
<danvet>
dma_fence that the kernel uses to track bo moves
<daniels>
that actually explains quite a lot, because I'd previously thought reloc fences were Christian's MM/paging fences
<danvet>
what's the paging fence?
<daniels>
you want to unpin a BO
<daniels>
or, well, you want to pin a bO
<daniels>
either way, backing storage has changed
<danvet>
in the kernel that one is called ttm_bo->moving right now
<danvet>
mostly
<daniels>
yep
<danvet>
minus bugs in drivers
<daniels>
and is a hard barrier so you can exchange the backing storage with no race
<danvet>
daniels, jekstrand so assuming we'd not have shared import from the get-go
<danvet>
could we have an upgrade path for later on
<danvet>
?
<danvet>
hm I think we're hosed already
<danvet>
currently vk says "everything explicit"
<danvet>
but if you render on _any_ current driver
<danvet>
export to dma-buf and then use it in libva
<danvet>
it will work
mlankhorst has quit [Ping timeout: 480 seconds]
<danvet>
rendering with libva and then reading from vk already needs explicit action from apps (currently poll() on the dma-buf or something like that)
uzi has quit [Ping timeout: 480 seconds]
* jekstrand
gives up on backlog
<daniels>
I think the answer is to rev the dmabuf vk exts and make them explicit-only
<daniels>
jekstrand: welcome to the party!
<danvet>
daniels, yeah
<danvet>
or well more explicit
<jekstrand>
danvet, daniels: To be clear, I don't expect mixing dma_fence with UMF. What I meant is that if we want any sort of "it'll be here soon" with a timeout, that needs to originate on the client side.
<jekstrand>
Regardless of how it's implemented.
<danvet>
for winsys where the vk winsys imports/exports the fence we could upgrade and make it even better
<daniels>
jekstrand: the client has to originate a promise, for sure
<jekstrand>
We may have a kernel object which gets created and sent to the compositor but if it's just a compositor trywait, it's tricky. We'd really like to VK_ERROR_DEVICE_LOST if a client promises to finish rendering in 100us and doesn't follow-through.
<jekstrand>
But how that promise is communicated is an implementation detail.
<daniels>
jekstrand: but the promise can be that the client hands the winsys a hammer to destroy the client ctx, and if the winsys is ever dismayed at the client, it can whack that hard
<jekstrand>
daniels: Yup
<daniels>
and then ... don't anger the winsys
<jekstrand>
daniels: And that hammer could be killing its Wayland connection[
<jekstrand>
One of the things we do have to think through, though, is a UMF-based driver on an implicit sync window-system. Once we convert ANV to UMF, it's going to be UMF all the time regardless of what it's running on.
<jekstrand>
That doesn't mean importing a UMF into implicit sync. It may very well mean we wait on the UMF in a thread.
<jekstrand>
Except for those wayland cases where that's not allowed in which case I guess vkQueuePresent stalls.
<jekstrand>
The big thing I was trying (and maybe failed) to communicate with my e-mail is that I don't think timeline syncobj is useful for WSI.
<jekstrand>
Maybe it is if you want to pass the object once and then just pass u64 serials rather than passing sync_file.
<jekstrand>
But I don't think there's anything truly useful as a transition between sync_file and UMF.
<daniels>
well, it's useful in the sense that it exists today, and UMF doesn't :P
<daniels>
so it's something that we can build out and test against, and then the conversion is much closer to a sed job
<jekstrand>
Sure
<jekstrand>
If it helps with prototyping, go for it.
<daniels>
yeah, just a crutch, not a long-term useful plan
<jekstrand>
Ok, as long as we're clear on that. :)
uzi has joined #dri-devel
<jekstrand>
In particular, don't design anything assuming that you have a "wait for a fence to materialize" ioctl.
<daniels>
we're all agreeing with each other, in a very roundabout way
<daniels>
yeah
<zmike>
epic-handshake.jpg
ngcortes has quit [Remote host closed the connection]
<daniels>
I still hold out hope that people are going to pull back from the brink and give us enough doorbell that epoll for fence materialising is a useful thing which can be implemented for efficiency improvement
<daniels>
but meh
<jekstrand>
Yeah, as I said in the mail, I think that can be done with some sort of scheduled vs. completed fence.
<jekstrand>
But you still don't have real guarantees. It just lets you queue stuff up ahead of time a bit.
<jekstrand>
Bad clients are still possible. If you get one, shoot it.
Toast has joined #dri-devel
Toast has quit []
<danvet>
jekstrand, I don't think you can do an UMF-only vk right now
<danvet>
start out in UMF, convert over when anyone asks for anything related to dma_fence
<jekstrand>
danvet: We can't convert over either
<danvet>
probably should share that code across drivers
<danvet>
jekstrand, what's the hold up?
<jekstrand>
re-creating objects on-the-fly
<jekstrand>
I think the solution there, as I said, is that we wait in a thread.
<danvet>
hm that will suck quite a bit I think
<jekstrand>
Yup
<danvet>
and the thing is, UMF is going to happen rsn now for i915
<danvet>
for some value of soon
<jekstrand>
Yup
<danvet>
gen12+
<jekstrand>
We can start off with an environment variable or something which puts the driver in UMF mode and gives you all the toys but doesn't advertise WSI.
<danvet>
jekstrand, also you then can't have drm_syncob export/import anymore
<danvet>
jekstrand, so which objects can't you convert?
<jekstrand>
danvet: That's fine. drm_syncobj isn't exposed by Vulkan directly. sync_file is so we'll have to think about that a bit.
<danvet>
you wont get it with UMF
<jekstrand>
danvet: It's not that converting objects is impossible, it's that converting them on-the-fly is impossible.
<danvet>
jekstrand, it would mean a spec breaking stall
<jekstrand>
danvet: How am I supposed to do that if I don't know the dependency graph?
<danvet>
seems less worse than picking an env variable and trying to set it right
<danvet>
jekstrand, you stall everything
<danvet>
because you just dont
<jekstrand>
danvet: wait-before-signal
<danvet>
uh, annoying
<jekstrand>
danvet: They've got work in-flight depending on a UMF that they've not submitted work to signal yet. What am I supposed to do with that?
<danvet>
hm I thought you must set at creation time whether you pick the "export to sync_file" option
<jekstrand>
yes, that's a thing
<danvet>
ok so you need a bit a fancier barrier, but this should work?
<danvet>
like from that point on everything new is submitted/created with fence objects in the kernel
<danvet>
for any in-flight vk semaphores and timelines you note whether it was an umf one or not
<danvet>
for timeline this means you probably need to note the seqno of the first dma_fence you put in there
<danvet>
if it's still from the umf world, push it off into the submit thread and wait there
<danvet>
if you get one of these in the winsys do the same there too with submit thread
<danvet>
so it's a rolling barrier
<danvet>
why would this not work?
<danvet>
(not saying it wouldn't be very nasty)
<jekstrand>
It might be theoretically possible.
<jekstrand>
We only have 7 VkSemaphore implementations in ANV, what's a half-dozen more?
<daniels>
jekstrand: I think the answer is that you just fail exportable alloc unless the client also chains in the VK_EXT_i_promise_to_sync_explicitly_everywhere enable
<alyssa>
jekstrand: that's the spirit
<danvet>
jekstrand, I'm talking about i915 only here
<danvet>
amd is stuck forever on amdkfd as their UMF thing, so forget porting vk
<danvet>
the others aren't even close
<danvet>
daniels, per-bo flag of how much you don't sync in anv
<danvet>
we already deal with those more or less
<danvet>
failing export of what previously worked isn't nice
<danvet>
jekstrand, the thing is if we're not going to do the auto-upgrade to current mode for eventual UMF anv
<bnieuwenhuizen>
danvet: what is the problem with amdkfd?
<danvet>
I honestly dont see the point it trying to make dma_fence work better for implicit sync
<danvet>
bnieuwenhuizen, separate world
<danvet>
would be even more painful to cut over from amdkfd to amdgpu if you suddenly need to use dma_fence for sync
ngcortes has joined #dri-devel
<bnieuwenhuizen>
oh that'd be painful, I'd hope we grow UMF support on amdgpu
<jekstrand>
danvet: The patch series I sent today really does solve an actual perf issue.
<danvet>
bnieuwenhuizen, I'm trying to convince agd5f and felix kuehling to figure it out
<jekstrand>
s/today/last week/
<danvet>
but it's a bit a case of "we've already planned the next 5 years"
uzi has quit [Remote host closed the connection]
<jekstrand>
the dma-buf sync_file export one
<danvet>
hm
<jekstrand>
import, less so
<jekstrand>
The benefits to import are pretty theoretical, IMO.
<danvet>
jekstrand, well the current import is only slightly better than your current trick
<danvet>
but if no one hits that issue then even the better import isn't going to help much
<danvet>
jekstrand, no one does post processing of reading that frame again before they finish getting the present call out?
<danvet>
s/post processing/prep for next frame/
<jekstrand>
danvet: Once they've presented, they aren't allowed to read it.
<jekstrand>
Well, there are ownership rules that only jamesjones understands, IIRC.
<danvet>
yeah, the hit is only if they read before they've done the present call
<jekstrand>
But I don't think you're supposed to touch it after vkQueuePresent
<jekstrand>
danvet: Actually, the hit is if they start rendering something new before the present call. The dummy submit serializes with said new rendering because it's all on the same queue
sagar_ has quit [Quit: WeeChat 3.0.1]
<jekstrand>
But, again, I don't think apps are doing too much of that
<danvet>
ah right it's any rendering
<jekstrand>
And if they are, they're going to get burned if they ever hit a prime blit anyway. Nothing we can do about that.
<daniels>
jekstrand: if you love X11 so much, why don't you just solve the perf issue by never doing any syncing ever :P
<danvet>
prime blt?
<bnieuwenhuizen>
danvet: copy from device tiled texture to gtt linear texture
<bnieuwenhuizen>
for when you do DRI_PRIME stuff
<danvet>
yeah but why is that causing a burn?
<jekstrand>
danvet: I don't love X11. It's like the drunk uncle that keeps coming to the family gatherings even though you've relocated 6 times and not told him and hoped he'd get the hint.
Charlie_Wang has joined #dri-devel
<bnieuwenhuizen>
because that also happens on said queue and hence serializes with new rendering before present
<danvet>
jekstrand, wrong dan
<danvet>
bnieuwenhuizen, uh, that sounds like driver bug
<danvet>
can prime use a separate sdma/blt ctx?
<danvet>
jekstrand, ^^
<bnieuwenhuizen>
we can certainly make it so, not sure what we do now
<danvet>
this is kinda why we have copy engines to no end on modern gpu
<bnieuwenhuizen>
danvet: I thought this entire serialization talk was because intel had only 1 queue
<bnieuwenhuizen>
otherwise the dummy submit to get implicit sync going can be on a random queue?
<jekstrand>
danvet: Prime could, yeah. We've just not wanted to complicate the code even more.
<danvet>
bnieuwenhuizen, blt is separate
<danvet>
and it can preempt
<danvet>
so if you whack that copy job into blt on a separate gpu ctx
<danvet>
then compositor does a flip
<danvet>
we'll boost it and you get ahead of the queue
<danvet>
plus/minus some details
<bnieuwenhuizen>
well, a dummy submit does even less than a copy so presumably it can run on whatever blit queue you have
<danvet>
bnieuwenhuizen, well that's essentially what the import ioctl does
<bnieuwenhuizen>
yes
<danvet>
"run" your fake job on an "engine" out of thin air
<daniels>
jekstrand: I know, I'm just shitposting whilst making dinner, sorry
<jekstrand>
:)
<danvet>
jekstrand, so I now have "someone shot my puppy" vibes about your nope on the umf->dma_fence autoupgrade
<jekstrand>
danvet: Well, you keep shooting my puppy. Turn about is fair play. :-P
<danvet>
uh, my puppy was meant to be the savior for all of your puppies I shot ...
<alyssa>
...wha?
<danvet>
the only other thing is some flag at vkDevice creation time
<danvet>
alyssa, it's a mess, don't look
<alyssa>
ok
<danvet>
and expecting apps to set it correctly is about as likely as expecting users to set it correctly
<danvet>
since app really can't know what your winsys wants
<danvet>
or whether your libva can deal with umf or not
<danvet>
I expect a lot of "we totally have enabled modifiers, expect not actually" vibes from this approach
<danvet>
cool demo, useless product
<jekstrand>
yeah
<danvet>
like what do you do if e.g. something like blender uses one vkdevice for rendering with compute
<danvet>
and another vkdevice for winsys display
<danvet>
or something like that
<danvet>
I expect a lot of "your compute j
<danvet>
ob gets randomly killed by hangcheck"
<daniels>
what I learned from modifiers is that if you don't make the transition between the worlds jarring and violent, you'll be lost in some kind of midpoint hell forever
<jekstrand>
ugh
<danvet>
daniels, we're on that
sagar_ has joined #dri-devel
<danvet>
at least for i915
<danvet>
non-modifier on gen12+ sucks because you don't even get X-tiled
<bnieuwenhuizen>
anything that gets you an image is not violent enough
uzi has joined #dri-devel
<bnieuwenhuizen>
a correct image*
<jekstrand>
:D
<danvet>
we could sample x-tiled by default
<danvet>
gets the perf back to where it should
<bnieuwenhuizen>
seriously, perf testing is hard and hence not frequently done
<danvet>
and the jarring corruption :-)
<danvet>
bnieuwenhuizen, oh we're better than that
<danvet>
we just perf-test with modifiers enabled
<danvet>
"look no problem"
<jekstrand>
srly
<danvet>
even better
<danvet>
some internal jiras between arrogant/clueless with titles like "convince distros to enable modifiers by default"
<bnieuwenhuizen>
I expect most testing on platform to be "full system testing gave us too much variance in results so we switch to a microbenchmark / specific test case that avoided all the modifier avoiding paths"
<danvet>
as if we didn't disable this stuff in upstream compositors due to actual bug reports ...
<danvet>
bnieuwenhuizen, ofc that too
<danvet>
we're forever stuck trying to get a better cpu freq governor into upstream because all the testing is done with fixed freq below tdp
<danvet>
and ofc the cpufreq people never test with any gpu workloads running on the same die, so don't hit the power sharing issues we have
<danvet>
jekstrand, any can't we at least save some of these puppies?
<danvet>
they're cute ...
flto has quit [Ping timeout: 480 seconds]
<alyssa>
i like puppies
<karolherbst>
danvet: don't get me started on freq stuff on intel :D
<karolherbst>
alyssa: who doesn't?
<karolherbst>
:p
<karolherbst>
I actually want to have a fanless home server system here, but try to figure out what CPU isn't using more than twice the documented TDP in benchmarks and figure out which CPU performs well if the TDP is actually a hard cap and nothing you can ignore for a minute
<jekstrand>
danvet: Uh... not sure
<jekstrand>
danvet: As long as the Vulkan API gives us a point to do it, client-side wait wouldn't be the end of the world.
<danvet>
you'd drop a bunch of sync file extensions
<jekstrand>
Yeah....
<jekstrand>
We might be able to make sync_file work, maybe
<danvet>
which means interop with libva and everything would also mean threads + poll on dma-buf
<danvet>
not for UMF
<jekstrand>
At least well enough for Android
<bnieuwenhuizen>
those sync file extensions have actual users though
* bnieuwenhuizen
looks at Android
<jekstrand>
Actually, for Android, we can stall and return -1 in right spot
<danvet>
yeah that's only correct, not performant
<danvet>
otoh intel and android SoC market ... lol
<bnieuwenhuizen>
danvet: chromebooks
i-garrison has quit []
<danvet>
oh right
<danvet>
we might care about that
i-garrison has joined #dri-devel
<danvet>
otoh for cros we could do a -Danv_umf_default=nope at build time
flto has joined #dri-devel
<jekstrand>
:(
<danvet>
which is kinda my point
<danvet>
I don't think we can switch the default
<danvet>
not even on desktop linux
<danvet>
which means umf anv is a neat tech demo
<danvet>
and given the canyon i915 is in, my appetite for neat tech demo is a bit low
<jekstrand>
Never give up! Never surrender!
<danvet>
next time cubanismo shows up at an xdc I need to have a chat with him about why exactly you create the winsys after the vkdevice
<danvet>
or dont pass a list of winsys for this vkdevice
<bnieuwenhuizen>
because memory gets allocated as part of a device?
<danvet>
or something like that
<jekstrand>
danvet: I don't want fundamentals of the driver changing based on winsys
<danvet>
we kinda have to
uzi has quit [Ping timeout: 480 seconds]
<jekstrand>
No, we need to fix the winsys, at least a little.
<danvet>
that means UMF in sync_file
<danvet>
or something like that
<jekstrand>
yeah.....
<jekstrand>
If it makes you feel better (it won't), NV is doing UMF in sync_file today. :D
<danvet>
nv as in blob or nouveau.ko?
<danvet>
also they can hack up whatever they want
<danvet>
if it's nv
<jekstrand>
blob
<danvet>
yeah not my problem
<daniels>
also arguably not even in the top 10 of weird things they do
<danvet>
so the killer is, and we've shot this puppy before
<danvet>
umf in sync_file breaks atomic kms
<danvet>
daniels, that too
<danvet>
jekstrand, the locking rule is that for atomic flip you only get either 100% umf or 100% dma_fence in your inputs
<danvet>
because we also have an out sync_file
<danvet>
which especially android likes to use
<jekstrand>
danvet: Naturally. :)
<danvet>
so you're back to "rev the entire protocols and winsys extensions"
<danvet>
which is another one of these "possible in theory" things
<jekstrand>
:-/
<danvet>
the magic vk umf->dma_fence barrier is at least only localized
<danvet>
so a theoretical approach with bounded time to roll out, given infinite amount of people
<jekstrand>
Or we could YOLO sync_file on chromeos and go full UMF
<danvet>
or something like that
<alyssa>
==22378== Invalid address alignment at address 0x18011829
<alyssa>
==22378== at 0x5CD365C: __aarch64_ldadd4_acq_rel (in /home/alyssa/lib/dri/libgallium_dri.so)
<alyssa>
this raises so many questions
<danvet>
who cares about hanging the kernel in inappropriate places
<karolherbst>
alyssa: why? :D
ngcortes has quit [Ping timeout: 480 seconds]
<karolherbst>
and which ones
<alyssa>
karolherbst: "How the heck did I corrupt memory so bad I got an unaligned pointer in my BO ref count and yet valgrind says nothinge else"
<karolherbst>
alyssa: try with libasan
<danvet>
jekstrand, still feels like the vk umf->fence is the least impossible
<danvet>
something like the protoctx you do
<danvet>
except fastpath is just a few ordered loads and slow path takes the umf2fence_lock and rechecks
<danvet>
on an object-by-object basis
<danvet>
timeline waits would need to wait for both the fence to show up in the drm_syncobj
uzi has joined #dri-devel
<danvet>
and the old umf to signal
jjardon has joined #dri-devel
<danvet>
until the old umf context has finished
<danvet>
at which point we set another flag to stop with all the umf spinning
<danvet>
in waits
<danvet>
after that you're stuck with an odered load and check in a bunch of places
<danvet>
same if you never leave umf
xp4ns3 has quit []
ngcortes has joined #dri-devel
<danvet>
aside from the funny transition state all the switches are the same as the env variable default thing
<danvet>
and the busy spin in the submit thread during transition doesn't matter because even if the app renders the load splash before it sets up winsys
<danvet>
the load splash really shouldn't take that long to render
<danvet>
and I think aside from the submit thread spinny thing this should all be shareable code I think
Toast has joined #dri-devel
Toast has quit []
Toasty has joined #dri-devel
Toasty has quit []
<airlied>
did I go back to bed and wake up to it's all screwed and no puppies in the future?
<alyssa>
airlied: yes.
uzi has quit [Ping timeout: 480 seconds]
<airlied>
danvet, jekstrand, daniels : might be nice to make that irc conversation of doom conclude in an email
Toasted has joined #dri-devel
<alyssa>
and the oscar goes to
<alyssa>
list _safe not actually being safe?
<imirkin>
or just not safe in the way you expect?
<alyssa>
this is C apologia
<alyssa>
:p
<imirkin>
if you say so
<imirkin>
or maybe you just have different-from-everyone-else for what _safe does?
<imirkin>
expectations*
<imirkin>
iirc safe means you can remove the node from the currently-being-processed list without screwing up iteration
<alyssa>
but it doesn't mean you can insert safely at the node point apparently
<imirkin>
yeah, definitely not
<alyssa>
ugh.
<daniels>
airlied: yeah it will
<danvet>
airlied, I'm not sure we've concluded on much yet
* alyssa
wonders how this works in NIR
<daniels>
danvet: I feel like we're at least circling the drain
<airlied>
danvet: sounds like you were tending towards it's all screwed, and retiring
<danvet>
most of the rehashed dead puppies we did document as part of indefinite fencing
<jani>
alyssa: imirkin: _safe merely holds the current node in a temporary variable, that's all there is to it
<daniels>
danvet: does Android still depend on swsync?
<danvet>
daniels, we're definitely circling
<danvet>
daniels, I think only on shit drivers
<alyssa>
jani: RiiR.jpg
<daniels>
danvet: is redefining 'shit' an option
<danvet>
howIlearnedtolovethebomb.mkv you mean
<imirkin>
jani: right
<imirkin>
danvet: to stop worrying and love the bomb...
<alyssa>
so long, mom
uzi has joined #dri-devel
Toasted has left #dri-devel [#dri-devel]
* jani
reads list.h and lols at a _careful variant
<alyssa>
_safe, _safer, _safest
<imirkin>
tell that list to safen up!
<danvet>
jani, yeah llist.h is absolute glorious in that regard
<alyssa>
imirkin: maybe it means NIR is really excited about adding integers
<imirkin>
alyssa: that's what i assumed
<imirkin>
ADD HARDER!
<alyssa>
iadd!
<imirkin>
youadd?
<imirkin>
oh, failed opportunity...
<imirkin>
uadd?
<idr>
It just learned how, so it's very excited. "I add!"
<idr>
weadd
<pendingchaos>
imirkin: the lowered code created by nir_lower_idiv() marks all instructions as exact
<pendingchaos>
the .length() probably creates divisions
<imirkin>
pendingchaos: yeah, it does
<imirkin>
it does (buf size - immediate) / struct size
<alyssa>
iadd! uadd! we all add 4i ... add!
<idr>
alyssa wins. :)
<ccr>
oompa-loompa
<alyssa>
why you'd w
<jljusten>
uaddbro?
<alyssa>
ant to add 4i, idk
uzi has quit [Ping timeout: 480 seconds]
bcarvalho__ has joined #dri-devel
uzi has joined #dri-devel
bcarvalho_ has quit [Ping timeout: 480 seconds]
pnowack has quit [Quit: pnowack]
aaguilar has joined #dri-devel
aaguilar has quit []
mbrost has joined #dri-devel
aaguilar has joined #dri-devel
aaguilar has quit [Remote host closed the connection]
aaguilar has joined #dri-devel
aaguilar has quit []
uzi_ has joined #dri-devel
aaguilar has joined #dri-devel
uzi has quit [Ping timeout: 480 seconds]
aaguilar has quit []
ngcortes has joined #dri-devel
aaguilar has joined #dri-devel
aaguilar has quit []
cyrozap has joined #dri-devel
aaguilar has joined #dri-devel
aaguilar has quit []
Anorelsan has joined #dri-devel
<cmarcelo>
NIR poll: how do people feel about consolidating NIR into the name workgroup_size (among the names in the code base: "group_size", "local_size", "local_group_size")? compiler/glsl would not change, retaining the GLSL relevant names.
<zmike>
👍
Anorelsan has quit []
Anorelsan has joined #dri-devel
<alyssa>
👍
<bnieuwenhuizen>
+
<DrNick>
what if instead you added nvidia's and Direct3D's nomenclature?
<alyssa>
assuming those really are wquivalent
<alyssa>
DrNick: Metal too
<alyssa>
threadgroup size
<DrNick>
warp and weft is pretty good you have to admit
<alyssa>
....weft?
<DrNick>
the weft is perpendicular to the warp
<alyssa>
I feel like I'm missing a pun here
<alyssa>
oh. weaving, ok.
<DrNick>
yeah, nivida used fabric names for their thread things
<DrNick>
idk if they actually used weft
<DrNick>
they definitely should have, though
<pcercuei>
threads, fabric... that makes sense
<DrNick>
call ARB_shader_ballot and similar weft operations
<jekstrand>
cmarcelo: +1
<DrNick>
because they operate horizontally across warps
<anholt>
cmarcelo: +1
uzi_ has quit [Ping timeout: 480 seconds]
Anorelsan has quit [Quit: Leaving]
* alyssa
stares at regalloc
marex has joined #dri-devel
<zmike>
how is it possible for freedreno ci to flake this many times
<zmike>
😠
<airlied>
now imagine that is in products shipping :-P
<zmike>
freedreno ci is shipping in products?
<zmike>
😓
<zmike>
that's it I need a vacation
<alyssa>
airlied: the flakes are a5xx ime
<alyssa>
a6xx is what's shipping
<airlied>
ah maybe a6xx has better context separation
<anholt>
we have per process pagetables on 6xx
<anholt>
which, when you have things scribbling in piglit, matters a bunch.\