<airlied>
just going to try and get anv to pass cts before I get to that
<Lynne>
kk
<Lynne>
what is radv waiting for to get the patchset merged?
<airlied>
more review I think
<airlied>
so really either bnieuwenhuizen or hakzsam to throw some more criticism at it :-P
<Lynne>
btw you can replace libx264 with libx265 for hevc, just keep in mind it'll generate a RExt-profile file due to a bug
<airlied>
though in fixing anv I'm seeing some minor fixes for radv
<Lynne>
you can still decode them fine (probably) by passing -hwaccel_flags +allow_profile_mismatch before -i when decoding
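A minimal command-line sketch of what this amounts to, assuming the Vulkan hwaccel decode path being worked on here; the filenames are hypothetical.

```sh
# encode with libx265; due to the bug the output gets tagged with the RExt profile
ffmpeg -i input.y4m -c:v libx265 test-hevc.mkv
# decode via the Vulkan hwaccel anyway, tolerating the wrong profile tag;
# note the flag has to come before -i
ffmpeg -hwaccel vulkan -hwaccel_flags +allow_profile_mismatch -i test-hevc.mkv -f null -
```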
jfalempe has joined #dri-devel
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
kzd has quit [Quit: kzd]
fab has quit [Quit: fab]
sgruszka has joined #dri-devel
itoral_ has joined #dri-devel
itoral has quit [Ping timeout: 480 seconds]
mvlad has joined #dri-devel
sghuge has joined #dri-devel
rasterman has joined #dri-devel
a1batross has joined #dri-devel
<airlied>
okay anv passes all the current CTS tests
<airlied>
or will once I rebase/push it
* airlied
is wondering can I do separate dpb/dst on intel
<airlied>
ah no I just misread my code, coincide it is
fab has joined #dri-devel
<dj-death>
airlied: it's just on SKL?
nuh^ has quit [Remote host closed the connection]
<airlied>
dj-death: SKL+ though I have to test it on DG2 to make sure it works on both ends of the spectrum
* airlied
doesn't have an integrated gen11/12
<airlied>
h265 is also going to be messy as I think the vulkan API as specced would need HuC
<airlied>
I have to work out if my DG2 board loads huc at all, I fell down the twisty mei paths
<airlied>
dj-death: Lynne tested on an SKL, and I've tested on a whiskeylake so far
pochu has joined #dri-devel
kts has quit [Quit: Leaving]
frankbinns has joined #dri-devel
kts has joined #dri-devel
kts has quit [Remote host closed the connection]
tursulin has joined #dri-devel
lynxeye has joined #dri-devel
<dj-death>
airlied: appears to load fine on my dg2
<dj-death>
airlied: I'm on drm-tip 6.2.0-rc3+
<dj-death>
just need the right version of the blob
<MrCooper>
daniels danvet emersion jekstrand: if user-mode queues & fences and Vulkan wait-before-submit semantics can be handled in the kernel (which seems required for UMF to be usable by display servers), do we really need explicit sync in the display protocols?
ice99 has joined #dri-devel
vliaskov has joined #dri-devel
jkrzyszt has joined #dri-devel
srslypascal has quit [Ping timeout: 480 seconds]
<danvet>
MrCooper, can it be handled in the kernel?
<MrCooper>
which part specifically?
<danvet>
well thus far I've seen some hand-waving that we just stuff umf into dma_resv somehow on the sideline
<danvet>
and pretend everything keeps working
<danvet>
but also the handle umf wait before submit in the kernel
<danvet>
like generally with umf you handle this in userspace by putting the right waits into the userspace queue
<MrCooper>
if UMF is to be usable for display servers, the kernel has to be involved somehow, doesn't it?
<danvet>
I also haven't seen a reasonable plan for umf vs dma_fence compat mode
<danvet>
the hand-waving just assumes everything is umf in your system
<danvet>
why?
<danvet>
also how?
<danvet>
with umf you get no guarantee it'll ever happen
<MrCooper>
Wayland compositors want something which can be plugged into an event loop, not "putting the right waits into the userspace queue"
<danvet>
so either compositor waits until it's signalled before it submits
<danvet>
or you put a magic queue wait with timeout into the command queue
<danvet>
then it's not really pure umf anymore
<MrCooper>
yeah, "pure UMF" seems unusable for Wayland compositors
<danvet>
and the trouble with augmented umf so that it kinda looks like a futex with pollable fd
<danvet>
you get into the entire "looks almost like dma_fence, but is entirely incompatible with that" mess
<danvet>
and since we need legacy dma_fence supporting mode anyway for the foreseeable future
<MrCooper>
entirely incompatible how?
<danvet>
I'd just use that and call it done
<danvet>
kernel deadlocks in memory reclaim
<danvet>
so you can do dma_fence built using umf primitives
<danvet>
even with userspace submit and all that
<MrCooper>
even with a timeout?
<danvet>
but then you ditch the entire nice future fences semantics of umf
<danvet>
mutex_lock_timeout does not fix a deadlock
<MrCooper>
sounds like Vulkan wait-before-submit semantics can't be safely supported at all?
<danvet>
there's the tdr timeout for jobs that take too long, but usually when people talk about timeout in umf context they mean essentially replacing mutex_lock with mutex_lock_timeout and having no plan for what happens when the timeout fails because you've managed to hit the architectural deadlock
<danvet>
yes
<danvet>
which is the big scary thing and the reason jekstrand wants to ditch
<MrCooper>
somebody should tell James Jones that
<danvet>
well, wait-before-submit can be supported, that's what the entire drm_syncobj and mesa submit thread stuff is about
<danvet>
but there's some older wait before submit vk stuff which wasn't this carefully engineered (mostly könig saying no really)
<danvet>
and which just works mostly, but not by design
pcercuei has joined #dri-devel
<danvet>
i.e. if you get a cpu fault at just the right time that gets into memory reclaim and hits just the right dma_fence, you functionally deadlock
<danvet>
tdr will clean up the mess, but you have a reset for something that vk spec says should work
<MrCooper>
submit thread won't fly either I'm afraid, per discussion on the explicit sync Wayland protocol
<danvet>
hm link for me to catch up?
<MrCooper>
sec
<danvet>
MrCooper, for the vk feature that's very gray on linux-dma_fence world ask jekstrand
<MrCooper>
vkQueuePresentKHR is not supposed to block either
srslypascal has joined #dri-devel
<MrCooper>
danvet: FWIW, I can't seem to find the reference right now, but I understand AMD's plan for user-space queues is to signal dma_fences from user space for implicit sync
<danvet>
yeah you can do that
<danvet>
if you do it right
<danvet>
I'm honestly not confident from the discussions thus far
<MrCooper>
isn't it fundamentally the same thing as wait-before-signal though?
<danvet>
not if you do it right :-)
<danvet>
essentially what you need is a) pin memory like with ioctl submit until each request is done
<danvet>
have an in-kernel timeline to make sure your ctx is advancing monotonically and in finite time
<danvet>
(that was b))
<danvet>
c) nuke the entire gpu context if userspace violates this
<danvet>
d) which means any wait-before-signal trick you do in the command stream that doesn't work with ioctl submit will also not work with this
<danvet>
at least work reliably as in "guaranteed to not occasionally deadlock and end up nuking the app"
<danvet>
MrCooper, ok read the wl proto issue, and yes daniels is right
<danvet>
unless you handle the dma_fence materialization in the protocol on the compositor side
<danvet>
then there's no way to implement this client-side only without breaking some guarantee somewhere
<MrCooper>
dma_fence materialization as in once the signal has been submitted for the wait?
<danvet>
yeah
<MrCooper>
would still need something usable in an event loop for that
<danvet>
yeah that's why I put my POLL_PRI comment at the bottom
<danvet>
if we want to tack it onto the drm_syncobj fd
<danvet>
or we do a new one
<danvet>
whatever compositors like really
<MrCooper>
but that still couldn't avoid the deadlock issues in the kernel, could it?
<danvet>
MrCooper, I think for compositor design it's useful to distinguish dma_fence + drm_syncobj case, where we can do all kinds of things like pollable fd and clear point of "the kernel is committed and guarantees it'll happen"
<danvet>
and pure umf, which is a memory location with no guarantees whatsoever about what'll happen to it in the future
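A minimal C sketch of what a wait on such a "pure UMF" amounts to, assuming the fence is nothing more than a 64-bit monotonically increasing counter at a shared GPU-visible address; the names and layout here are hypothetical, not any driver's actual ABI.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical user-mode fence: just a shared memory word the GPU
 * (or another process) will eventually write. Nothing guarantees
 * the write ever happens, which is the whole problem. */
struct umf {
    _Atomic uint64_t *va;   /* mapped, GPU-visible 64-bit counter */
};

/* Spin until the counter reaches wait_value or the timeout expires.
 * Returns true on success, false on timeout. */
static bool umf_wait(struct umf *f, uint64_t wait_value, long timeout_ms)
{
    struct timespec start, now;
    clock_gettime(CLOCK_MONOTONIC, &start);
    while (atomic_load_explicit(f->va, memory_order_acquire) < wait_value) {
        clock_gettime(CLOCK_MONOTONIC, &now);
        long elapsed_ms = (now.tv_sec - start.tv_sec) * 1000 +
                          (now.tv_nsec - start.tv_nsec) / 1000000;
        if (elapsed_ms > timeout_ms)
            return false;   /* no guarantee of forward progress */
    }
    return true;
}
```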
<danvet>
MrCooper, drm_syncobj doesn't deadlock, because you cannot submit an un-materialized fence as a dependency anywhere
<danvet>
the "wait for fence materialization" is why the mesa submit thread is a thing
<danvet>
which means the kernel doesn't die, but otoh mesa might not be able to guarantee what the vk/wl specs say
<danvet>
which is why this is all a bit annoying :-/
<MrCooper>
yeah, seems like a thread can't really work, and it's not supposed to block either
<danvet>
yeah I mean fundamentally the problem doesn't disappear
<danvet>
by moving it into userspace
<danvet>
so you still have the mismatch between what the spec wants and what linux delivers
<MrCooper>
James Jones still seems to be under the impression it can work, that's why he's insisting on explicit sync support (in X as well)
<danvet>
I think fundamentally you can't have a) dynamic gpu memory management b) future fences c) compositors not getting stuck (i.e. cross security boundary guarantees) d) bottomless queues everywhere (no blocking)
<danvet>
nvidia blob largely drops a) on the floor
<danvet>
then this is all trivial
<MrCooper>
what a mess :/
<danvet>
I think they defacto also drop c) and just shrug
mauld has quit [Quit: WeeChat 3.5]
<danvet>
but one of these you really have to drop or it boils down to fully abstract diagram that demonstrates the deadlock
frieder has joined #dri-devel
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
<emersion>
the protocol should support submitting fences that didn't materialize yet
<emersion>
what's wrong with waiting for UMF in the kernel with a timeout, like a Wayland compositor would?
<dolphin>
emersion: that's indeed the question :)
<dolphin>
I guess it boils down to a lot of code has been designed that just assumes that fence always materializes in the next 10 seconds, so that UMF would have to be a new construct compared to dma_fence
<emersion>
is a 10s timeout bad?
<dolphin>
the waiters don't have a timeout, the exporter gives a guarantee to either fulfill or fail the fence in that time
Duke`` has quit [Ping timeout: 480 seconds]
Duke`` has joined #dri-devel
<emersion>
i don't really understand any of this
<danvet>
emersion, yeah if you wait for umf in the kernel you just rebuild the mesa submit thread in the kernel
<emersion>
yes
<danvet>
that's a pile of code on the wrong side of a security boundary
<emersion>
that bad?
<emersion>
well, it would only be for backwards compat
<danvet>
mesa exists already
<emersion>
is waiting a security issue?
<danvet>
nah but doing all the parsing and copying and stuffing it into a kthread :-)
<emersion>
oh there is parsing involved?
<danvet>
if you want perfect backwards compat with existing ioctl at least
<MrCooper>
it would waste CPU cycles too?
<danvet>
if not, then ... mesa submit thread and drm_syncobj wait-for-materialize all exists and works
<emersion>
i thought it was just waiting for a bit in some chunk of memory
<danvet>
MrCooper, well if you don't write a fastpath
<danvet>
so more complexity in a legacy submit ioctl which a) tend to be complex already b) why would you touch them
<emersion>
i mean there is probably a way to wait for a UMF without wasting CPU cycles?
<danvet>
emersion, what dolphin said, you need to wait for umf at a different place than where all the current drivers wait for dma_fence
<emersion>
where is that place exactly?
<danvet>
emersion, the cpu wasting is copying all the stuff to the kthread
<danvet>
after drm_sched_job_submit() essentially
<danvet>
umf wait must be before that
<danvet>
which also means you get propagating umf, because the dma_fence you hand back to userspace also becomes an umf
<danvet>
it's an infectious property
<danvet>
and with drm_syncobj we have the in-kernel semantics to at least stall in the right place
<danvet>
with sync_file not even that is a thing
<emersion>
okay, then we are in a situation similar to format modifiers, where everybody needs to support it in the whole stack before they can be used?
<danvet>
neither in dma_resv
<danvet>
and if you don't patch all these and the drivers using them, it's not good for more than a quick "hey this looks easy" demo
<danvet>
emersion, I think so
<danvet>
but I think I'm also more pessimistic on this than others
mauld has joined #dri-devel
<danvet>
I don't think it's more code in the kernel than userspace
<danvet>
but also not less at all, and putting all that for compat in the kernel instead of userspace just feels rather wrong
<emersion>
i'm not convinced about that, but i'm no kernel dev
<bnieuwenhuizen>
on the other hand if nobody comes up with alternatives for years ...
<danvet>
bnieuwenhuizen, I haven't seen more than handwaving on the kernel side either
<Lynne>
how would mcbp work with user-mode submits?
<MrCooper>
so to recap, it sounds like Vulkan wait-before-signal can't be fully supported with upstream drivers; assuming user-mode fences are handled at the kernel level in a way suitable for Wayland compositors, is explicit sync really needed in the display protocols?
<MrCooper>
Lynne: GPU FW needs to handle it presumably
<emersion>
wait-before-submit* (?)
<MrCooper>
wait-before-signal as in the GPU wait is submitted before the corresponding signal
<danvet>
MrCooper, imo yes
<emersion>
i mean that's what's happening in general, people wait before signal, and the wait is unblocked when the signal happens
<emersion>
or maybe my terminology is completely wrong
<danvet>
like if protocols really can't be fixed, then I guess we can add an umf slot to dma_buf as another iteration of the most hilarious ipc ever
<danvet>
which mesa stuffs in on one side and takes out on the other
Haaninjo has joined #dri-devel
<danvet>
kinda like the import/export ioctl that jekstrand landed
<danvet>
but I'm really not sure whether ipc-on-dmabuf is a great design
<emersion>
yeah, i'd rather not
<MrCooper>
danvet: the protocols can be fixed, I'm just wondering if there will be any tangible benefit from the churn
<dolphin>
right, but if kernel is just metadata carrier, still can't do dynamic memory management really
<danvet>
it's probably the quickest hack forward which isn't a design dumpster fire at a fundamental level though
<danvet>
dolphin, you'd need to use the mesa submit threads still
<emersion>
well, having a good design sounds like a tangible benefit to me
<emersion>
instead of stashing more stuff onto dmabufs
<danvet>
it's really just a "wl proto is immutable, let's add the missing field to the dma-buf ipc sidechannel" :-)
<emersion>
wl proto is not immutable
<emersion>
i can work on the proto side
<emersion>
i just need folks to fix the rest :P
* danvet
firmly tongue-in-cheek :-)
<dolphin>
emersion: asking everyone to move away from dma-buf to new_fence will be a bit of a job
<danvet>
but yeah the dma-buf ipc approach would only need changes to dma-buf.c and mesa winsys
<danvet>
well, more or less in exactly the same places as the dma-buf fence import/export
<MrCooper>
wouldn't it use the dma-buf fence import/export?
<emersion>
it doesn't have to be a flagday migration
srslypascal is now known as Guest2178
<danvet>
ok gtg now for lunch/workout/sauna, ttyl
srslypascal has joined #dri-devel
<danvet>
MrCooper, it's not a dma_fence/sync_file, so it'd be new
<emersion>
it can be a gradual migration like format modifiers
<danvet>
or you're back to the "magic compat layer in the kernel" pandora's box
<MrCooper>
emersion: nvidia users would keep suffering from synchronization artifacts until the gradual migration completes
<emersion>
is nvidia migrating to UMF today?
<emersion>
hm, or do you mean nvidia is completely broken today?
<MrCooper>
emersion: re wait-before-signal, so far the kernel supports submitting GPU waits only after the corresponding signal was submitted
<emersion>
i'm not following
<emersion>
i submit a page-flip IOCTL before the GPU work for the frame has completed
<dolphin>
emersion: today you're not allowed to publish a fence unless you guarantee to be able to resolve it
<dolphin>
emersion: well, the problem is if you export the fence, it can be imported by pretty much anybody
<dolphin>
so you'd have to change the rules based on who imports it
alarumbe has quit [Ping timeout: 480 seconds]
<dolphin>
suddenly it would be your responsibility as exporter to ensure the 10 second timeout, compared to the importer deciding depending on whether they are up for an indefinite timeout
<emersion>
there are multiple ways to go around this
<emersion>
hm, are you talking about export inside the kernel, or export as a FD?
<dolphin>
FD, mostly
<emersion>
so you could have a new FD type for new-fence
<emersion>
and exporting to a dma_fence would change the rules
<dolphin>
well, then you can loop back to what I said about getting all the userspace libraries to jump to a new fence FD type
<dolphin>
everyone seems to be figuring things out by doing downstream hacks on top of dma_buf for now
<dolphin>
in upstream, you only export immovable pinned memory
<emersion>
there are not that many userspace libraries
<X512>
I asked about fence file and syncobj accessed from userland. Why not use a syncobj FD for the Wayland explicit sync protocol?
<emersion>
X512: because UMF may need a New Thing
<X512>
It has no wait-before-submit problem.
<emersion>
so the wl work is stalled on kernel folks figuring out what the New Thing can be
<dolphin>
emersion: how's that? if you include media and compute
<kchibisov>
How would I debug memory issues within just libEGL? Are there options to build only libEGL under ASAN?
jdavies has joined #dri-devel
jdavies is now known as Guest2180
<kchibisov>
I have libEGL segfaulting when handling linux dmabuf v4 with a specific client on my system due to malloc errors.
<emersion>
dolphin: i mostly care about KMS and Vulkan
<dolphin>
emersion: and with compute, I also mean the scale-out network stuff
<dolphin>
well, that doesn't mean the libraries are not there, just means you don't care about them :)
jdavies_ has joined #dri-devel
<emersion>
do you mean oneAPI crap?
<emersion>
VA-API?
<emersion>
or something else?
<emersion>
in any case, i don't care if these degrade to the backwards compat path
<dolphin>
well, the kernel backwards compatibility rules really apply to everyone
<dolphin>
I think libfabric is the hip thing these days for compute
<MrCooper>
X512: don't think you can get a syncobj before the signalling work has been submitted; anyway, that particular use cases seems a lost cause for upstream drivers
<dolphin>
the problem is that those libraries and the compute in general is the reason why folks want the new fence
<emersion>
MrCooper: you can with timelines
<emersion>
the syncobj is a container and the dma_fence in it can be NULL
alarumbe has joined #dri-devel
<X512>
WAIT_FOR_SUBMIT will cause the wait to block until a dma_fence is inserted into the syncobj.
<emersion>
it's just the mesa implementation-defined behavior
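A small libdrm sketch of the syncobj behaviour being referred to; error handling is omitted, and the fd and handle are assumed to come from whatever driver created the syncobj.

```c
#include <stdint.h>
#include <xf86drm.h>

/* Block until a dma_fence has been attached to the syncobj *and* it has
 * signaled. Without WAIT_FOR_SUBMIT, waiting on a syncobj with no fence
 * attached fails with EINVAL instead of blocking. */
static int wait_for_submit_and_signal(int drm_fd, uint32_t syncobj_handle,
                                      int64_t timeout_abs_ns)
{
    uint32_t first_signaled;
    return drmSyncobjWait(drm_fd, &syncobj_handle, 1, timeout_abs_ns,
                          DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT,
                          &first_signaled);
}
```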
<X512>
It can be qualified by a new Vulkan extension, like the dma_buf export FD extension.
<emersion>
yes, but jekstrand rejected that idea
<emersion>
on the basis that UMF will likely need a new fence, and we should skip directly to that
<emersion>
the tl;dr is that all explicit sync work is stalled on UMF being solved, as you can see
MajorBiscuit has joined #dri-devel
<bnieuwenhuizen>
I feel like we're falling into a trap of blocking everything on UMF and then being very slow to get UMF through
<emersion>
yes, i agree
<dolphin>
the real question is, is there some use case you can't implement without new fence when it comes to 3D/compositors
<bnieuwenhuizen>
like Jason mentioned 2-4 years for UMF yesterday, which kinda sucks to block any ecosystem improvements
<dolphin>
usually, new fence enters the discussion when you want to do multiple-hour compute workloads
sgruszka has joined #dri-devel
<emersion>
dolphin: fix NVIDIA, and future AMD hw
<emersion>
also wait-before-submit for WSI
<X512>
What is Vulkan semaphore FD for proprietary Nvidia drivers?
<emersion>
X512: no idea
<dolphin>
emersion: nvidia can also adapt the same model everyone else is doing
<emersion>
dolphin: future hw won't support the current model
<dolphin>
why wouldn't 10 second timeout be supported?
<emersion>
might as well move on now, instead of staying with the old model which will become obsolete eventually
<dolphin>
if one failed to perform an action in 10 seconds, your compositor/desktop experience is not going to be great
<emersion>
i don't know the details, i just know AMD will require UMF in the future
<dolphin>
they probably would like to take advantage of HW features that you can achieve with UMF (like everyone else)
<dolphin>
doesn't make it any more required for the desktop compositing and 3D workloads
<emersion>
the ML has more info
<dolphin>
no matter how complex your hardware, you can always assert a 10 second timeout
<X512>
Does it mean that applications will be able to freeze whole GUI for 10 seconds?
<emersion>
no
<dolphin>
if you do your compositor right, it should just freeze a single window
<dolphin>
but if your compositor has a bug (or the underlying userspace driver), then you might
<dolphin>
meaning if the compositor itself hangs, then you may get 10 second stutter
<dolphin>
and that is the rationale for the timeout too, any more time and the user is going to hit the power button
<dolphin>
emersion: long story short, you can always assert a timeout even if your hardware could do something else, and you can make/keep the fence waiting kernel code more straightforward
<dolphin>
as a new FD would be needed, sharing the same name in kernel internally may just add confusion
<dolphin>
everyone would probably just want the new fence, but nobody has enough incentive to do it as everyone would have to be on board to get the benefits
<dolphin>
as one can always resolve the problems just for your own driver and stack in downstream
frieder has quit [Ping timeout: 480 seconds]
<MrCooper>
emersion: future AMD HW can work fine with what we have now (and so can nvidia HW, if they want to)
frieder has joined #dri-devel
<emersion>
AMD devs are saying otherwise
<MrCooper>
upstream drivers can't stop supporting dma_fence anyway, it would be a UAPI regression
<dolphin>
emersion: probably some miscommunication going on
<dolphin>
nothing stops anybody from starting the 10 second timer when creating the fence
<MrCooper>
AMD devs seem to understand this perfectly
<emersion>
alright, then sounds like a good idea to just let explicit sync work bitrot a few more years
<X512>
I am working on a proof-of-concept Radeon GPU driver that runs in userland as a server (daemon) process.
<emersion>
less work for me, i won't complain :)
<X512>
Targeted GPU (Radeon SI) seems too old for prototyping UMF :(
<dolphin>
well, as far as I know, (apart from the corner cases that Vulkan API allows) there's really no use-case for 3D/desktop compositing where you need indefinite workloads
<dolphin>
emersion: the answer to "why?", is really the compute libraries
<X512>
Haiku has no implicit sync or X11 legacy, so the whole synchronization model can be designed from scratch.
<dolphin>
X512: in userspace, unless you can pin All the Memory (TM), you can't do anything but "new fence"
<dolphin>
as you won't have any guarantees for memory being available
<dolphin>
and if you are doing shmem() + futex() in userspace, then it goes back to danvet's point about why make it in KMD?
<X512>
My driver guarantees that all buffer objects have actually allocated memory, and creating a new buffer will fail if there is no more physical memory.
<X512>
For simplicity.
<dolphin>
if the kernel does overcommit, the allocated memory is a lie
<dolphin>
your thread will be paused, memory taken away, and returned at later point
<X512>
No overcommit. GTT buffers must be locked in physical memory.
<X512>
Driver reject CPU memory that is not locked.
<dolphin>
right, if you allow userspace to lock an unbounded amount of memory, then a lot of things are possible, but that's another story
<dolphin>
however your CPU thread may still be paused due to CPU timeslicing, so even if the GPU work submitted finishes, the CPU thread may not be there to pick it up
<dolphin>
so still can't give a guarantee of 10 seconds
<X512>
Do GPU preemption?
<dolphin>
hm?
<dolphin>
just that the userspace CPU thread can't make promises to complete anything in N seconds
<X512>
Then it is only a problem for one particular userland process, and other processes are unaffected?
<dolphin>
yes, but it is the reason why you can't guarantee to signal a fence in 10 seconds
<dolphin>
aka. you can only do a "new fence" really if you do it in userspace
<X512>
What is the problem with doing it in userspace, if ignoring legacy compatibility problems?
<X512>
Userland processes can be fully isolated by separate GPU contexts and GPU context switching.
<dolphin>
doing what, exactly?
<dolphin>
you mean the old style dma_fence?
<X512>
No. Memory mapped userland fences.
<dolphin>
hmm, I don't think there is a problem. every waiter is responsible for specifying a timeout
<dolphin>
you don't have any hard guarantees of the system making forward progress, though
<X512>
If a freeze is isolated to a single GPU context and doesn't affect other GPU contexts, then it seems like no problem.
<dolphin>
yeah, if it's not your compositor, then probably not a problem
<X512>
The compositor can render the old surface buffer (double buffering) if rendering of the new buffer by the client process is frozen for some reason.
<dolphin>
yeah, but if your compositor itself is frozen, then user sees no new frame :)
<dolphin>
of course if you assign the compositor to a different cgroup and allow it to mlock memory, the chances will be lower
<X512>
I think that it is a good idea to prohibit overcommit for the compositor and design it so every allocation failure is gracefully handled (fail to open a window etc.).
<X512>
The compositor is a critical real-time task.
kts has quit [Quit: Leaving]
itoral_ has quit [Remote host closed the connection]
<Lynne>
umf isn't that far away, it's just a few patchsets away from being usable
tzimmermann has quit [Quit: Leaving]
tzimmermann has joined #dri-devel
<MrCooper>
danvet: hmm, can the kernel prevent wait-before-signal with user-mode queues though?
srslypascal has joined #dri-devel
epoll has quit [Ping timeout: 480 seconds]
frieder has quit [Ping timeout: 480 seconds]
fxkamd has joined #dri-devel
agd5f_ has joined #dri-devel
epoll has joined #dri-devel
srslypascal has quit [Ping timeout: 480 seconds]
frieder has joined #dri-devel
srslypascal has joined #dri-devel
agd5f has quit [Ping timeout: 480 seconds]
agd5f has joined #dri-devel
agd5f_ has quit [Ping timeout: 480 seconds]
kem has quit [Ping timeout: 480 seconds]
krushia has quit [Quit: Konversation terminated!]
alarumbe has quit [Ping timeout: 480 seconds]
sarnex has joined #dri-devel
sarnex_ has quit [Ping timeout: 480 seconds]
jewins has joined #dri-devel
srslypascal has quit [Ping timeout: 480 seconds]
rasterman has quit [Read error: Connection reset by peer]
kts has joined #dri-devel
rasterman has joined #dri-devel
fab has quit [Quit: fab]
<agd5f>
AMD hardware going back to navi1x can do user mode queues for gfx in theory
<ishitatsuyuki>
but does that imply errata-free? :p
<DavidHeidelberg[m]>
Sergi stress-tested it a lot; I also gave it a few runs, so stability turned out to be good.
natto has quit []
kzd has joined #dri-devel
<agd5f>
what we did to support implicit sync is to add a new secure semaphore packet which you put into the user mode queue. UMD tells the KMD what the ring write pointer value is for the user queue associated with the implicit sync. The new secure packet then writes the wptr value to a location in the KMD's GPU address space. KMD can then use that value to sync against when it needs to. That memory can also be mapped RO into the GPU address space of other processes and they can use wait packets to sync against it
natto has joined #dri-devel
fab has quit [Read error: No route to host]
boqun has joined #dri-devel
pochu has joined #dri-devel
<LaserEyess>
agd5f: are there any circumstances where the driver, given access to PIXEL_ENCODING_RGB, *should* not use that? Assuming, of course, the display supports it
<LaserEyess>
shouldn't these checks on YCRCB444 and RGB be reversed?
jewins has joined #dri-devel
<LaserEyess>
if it *cannot* use it, I understand, but I don't understand why RGB wouldn't be preferred here
frieder has quit [Remote host closed the connection]
fab has joined #dri-devel
bgs has joined #dri-devel
fab has quit [Read error: No route to host]
<danvet>
MrCooper, it can't prevent that right now for most drivers either, you'd need a cmd parser to make sure userspace didn't sneak in a wait-before-submit
<danvet>
so all we really need is an uapi contract that userspace must not do that, and the pieces I listed on the kernel side to make sure the resulting dma_fence don't break any rules
<danvet>
if userspace breaks the contract it gets to keep the pieces
<danvet>
which results in garbage rendered into that buffer
<danvet>
which userspace can do anyway if it feels like
<danvet>
agd5f, I still think not even that was really necessary
<danvet>
you can also kmap that page and leave it in the gpu va to write into
<danvet>
and then have a little bit of validation on the kernel side to make sure the fence timeline doesn't walk backwards
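A purely conceptual sketch of that kernel-side validation; this is not actual amdgpu code, just the invariant that the userspace/GPU-written value may only ever advance.

```c
#include <stdbool.h>
#include <stdint.h>

/* last_seen: the highest wptr/seqno the kernel has observed so far.
 * new_value: the value just read from the shared, userspace/GPU-written page. */
static bool timeline_advances(uint64_t *last_seen, uint64_t new_value)
{
    if (new_value < *last_seen)
        return false;        /* walked backwards: violation, nuke the context */
    *last_seen = new_value;
    return true;
}
```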
<agd5f>
LaserEyess, not sure. question for hwentlan
<agd5f>
danvet, yeah, it just guarantees a monotonically increasing value
<LaserEyess>
ok thanks
<danvet>
agd5f, oh the secure write is an rmw cycle to make sure it's only ever going up?
<danvet>
still needs a bunch of kernel code to make sure the timeout&ctx nuking happens, but I guess you can in some cases consume them directly somewhere else
<MrCooper>
danvet: so if there is explicit sync in the display protocol, and user space is silly enough to attempt wait-before-signal, it may appear to work fine most of the time?
<danvet>
MrCooper, yeah
<danvet>
well, wait-before-signal even works nowadays most of the time
<danvet>
until you stack things deep enough and run out of luck at the right time
<danvet>
you don't really need a protocol to shovel them around, between engines/context in one process is good enough
<danvet>
or just engine/cpu
<danvet>
which vk allows
<MrCooper>
well, this is specifically about wait-before-signal drawing to a BO shared between a client and a display server
<danvet>
it's just wait-before-signal
<danvet>
no further conditions needed to go boom
<MrCooper>
I'm afraid I'm also starting to get more confused than I was when I started asking questions this morning :/
<danvet>
so the problem is that if you do wait before signal using dma-fence, then that fundamentally deadlocks
<danvet>
so you need to unwind it in userspace and do the entire drm_syncobj fence materialization dance
<danvet>
which mostly works, except protocols
<danvet>
and it's also a mess
<danvet>
but nothing is stopping userspace from just ignoring all these rules, and happily mixing wait-before-signal and dma_fence
<danvet>
and it will mostly work
<danvet>
even under memory pressure
<danvet>
until your luck runs out
<danvet>
so the protocol thing isn't really fundamental, it just forces these various gaps and mismatch more clearly into the light
fab has joined #dri-devel
gawin has joined #dri-devel
iive has joined #dri-devel
devilhorns has quit []
lynxeye has quit [Quit: Leaving.]
MajorBiscuit has quit [Ping timeout: 480 seconds]
kem has joined #dri-devel
tursulin has quit [Ping timeout: 480 seconds]
pochu has quit []
JohnnyonFlame has joined #dri-devel
djbw has joined #dri-devel
<agd5f>
danvet, yeah the write pointer is a 64 bit monotonic number since vega
srslypascal has joined #dri-devel
<DemiMarie>
bnieuwenhuizen emersion: I strongly recommend not waiting on UMF
<DemiMarie>
danvet: is the problem that dma-fence is a terrible API?
Kayden has quit [Quit: to the office, where the computers are not, nor are the radio signals]
agd5f_ has joined #dri-devel
agd5f has quit [Ping timeout: 480 seconds]
kts has quit [Quit: Leaving]
Kayden has joined #dri-devel
ngcortes has joined #dri-devel
agd5f_ has quit []
agd5f has joined #dri-devel
<airlied>
dj-death: testing on my dg2 throws up a tiling error, since the engine wants Y tiling; will have to figure out what it wants on dg2
vliaskov has quit []
lemonzest has quit [Quit: WeeChat 3.6]
rmckeever has joined #dri-devel
gouchi has joined #dri-devel
srslypascal has quit [Ping timeout: 480 seconds]
tzimmermann has quit [Quit: Leaving]
ybogdano has quit [Ping timeout: 480 seconds]
bgs has quit [Remote host closed the connection]
mvlad has quit [Remote host closed the connection]
gouchi has quit [Remote host closed the connection]
aswar002 has joined #dri-devel
<dcbaker>
zmike: I had to pull a couple of not-nominated patches to get one of the nominated zink patches to apply. It's the top 3 on the staging/23.0 branch currently, could you let me know if you're okay with that?
X512 has quit [Quit: Vision[]: i've been blurred!]
ybogdano has joined #dri-devel
<zmike>
dcbaker: whoops, I thought I was on top of conflicts there
<zmike>
dcbaker: ah yeah, that was originally one patch so the fixes tag didn't get propagated when it was split
<zmike>
lgtm
fab has quit [Quit: fab]
sarnex has quit [Quit: Quit]
sarnex has joined #dri-devel
orbea has quit [Remote host closed the connection]
orbea has joined #dri-devel
junaid has quit [Ping timeout: 480 seconds]
<jekstrand>
danvet: Bad tilers!
<danvet>
jekstrand, I just realized that maybe I shouldn't think about memory handling deadlocks that much
<jekstrand>
heh
<danvet>
I also wonder how many kinda funny-to-buggy drivers we have already
<jekstrand>
danvet: I'm not aware of any tilers that absolutely have to allocate mid-batch.
<zmike>
jekstrand: I pinged you on a zink MR, any chance you could take a look in the next couple days
<jekstrand>
Some of them can go faster if they up a pool size
<danvet>
jekstrand, hm right, so GFP_NORECLAIM is good enough
gawin has joined #dri-devel
<danvet>
I still don't want to read all the drivers to check that :-)
<jekstrand>
bnieuwenhuizen: On Mali, varying buffers are allocated in userspace
<jekstrand>
Apple and IMG have a heap that gets allocated by the kernel based on metrics but those can both handle OOM by spilling part-way through the render pass.
<jekstrand>
So if the kernel goes to allocate and fails, it's ok. It just needs to be careful to not throw the old buffer away until it's sure it has the new one.
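A tiny illustrative sketch of that ordering, with malloc standing in for whatever allocator a kernel driver would actually use; purely hypothetical, not any driver's code.

```c
#include <stdlib.h>
#include <string.h>

struct heap { void *buf; size_t size; };

/* Grow the heap, but never release the old buffer until the new one
 * definitely exists: on allocation failure the old contents stay valid
 * and the hardware can spill instead. */
static int heap_grow(struct heap *h, size_t new_size)
{
    void *new_buf = malloc(new_size);
    if (!new_buf)
        return -1;              /* OOM: keep the old buffer */
    memcpy(new_buf, h->buf, h->size);
    free(h->buf);               /* old buffer released only now */
    h->buf = new_buf;
    h->size = new_size;
    return 0;
}
```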