ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
jewins has quit [Ping timeout: 480 seconds]
jewins has joined #dri-devel
danvet has quit [Ping timeout: 480 seconds]
<DemiMarie> jenatali: I had not thought of secondary command buffers.
<DemiMarie> I thought that the command buffers were handled by what is essentially a hardware state machine.
kts has quit [Quit: Leaving]
iive has quit [Quit: They came for me...]
kts has joined #dri-devel
cphealy has joined #dri-devel
fodasso has quit [Remote host closed the connection]
kts has quit [Quit: Leaving]
jewins has quit [Ping timeout: 480 seconds]
jdavies__ has quit [Remote host closed the connection]
zzoon[m] is now known as zzoon_holidays_till_Tuesday[m]
jewins has joined #dri-devel
Daanct12 has joined #dri-devel
co1umbarius has joined #dri-devel
columbarius has quit [Ping timeout: 480 seconds]
kts has joined #dri-devel
Jeremy_Rand_Talos has quit [Remote host closed the connection]
Jeremy_Rand_Talos has joined #dri-devel
kts has quit [Quit: Leaving]
Leopold_ has quit [Ping timeout: 480 seconds]
Daanct12 has quit [Ping timeout: 480 seconds]
Daanct12 has joined #dri-devel
Company has quit [Quit: Leaving]
Daaanct12 has joined #dri-devel
Daanct12 has quit [Ping timeout: 480 seconds]
jewins has quit [Ping timeout: 480 seconds]
lemonzest has quit [Quit: WeeChat 3.6]
heat_ has quit [Ping timeout: 480 seconds]
lemonzest has joined #dri-devel
Daaanct12 has quit [Quit: Leaving]
<DemiMarie> bnieuwenhuizen: what I really want is some evidence from Intel of the security of the firmware.
dviola has quit [Ping timeout: 480 seconds]
dviola has joined #dri-devel
dviola has left #dri-devel [#dri-devel]
dviola has joined #dri-devel
<Lynne> airlied: for some reason rdna2 supports av1 at 422, maybe we should wire support for it
<Lynne> or at least vadumpcaps says it does, no idea if it lies or not, but it does say it outputs 422 with nv12 and p010, which is wrong
unerlige has quit []
sumits has joined #dri-devel
srslypascal has quit [Ping timeout: 480 seconds]
YuGiOhJCJ has joined #dri-devel
<airlied> Lynne: not sure vaapi always reports that I think it'll do internal copies
itoral has joined #dri-devel
tzimmermann has joined #dri-devel
kts has joined #dri-devel
ondracka has quit [Quit: Page closed]
Jeremy_Rand_Talos has quit [Remote host closed the connection]
Jeremy_Rand_Talos has joined #dri-devel
<Lynne> to convert pixfmts? yuck
<Lynne> it does advertise p016 for all other codecs, though, which is direct
<Lynne> probably a missing line
agd5f has joined #dri-devel
bgs has joined #dri-devel
<airlied> Lynne: I'd have to test vaapi with 420 and 422 to see what it actually programs the hw
kts has quit [Quit: Leaving]
<airlied> Lynne: h264 might support it as well
vyivel has quit [Remote host closed the connection]
vyivel has joined #dri-devel
TMM has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
TMM has joined #dri-devel
agd5f_ has quit [Ping timeout: 480 seconds]
pret7 has quit []
kts has joined #dri-devel
<airlied> Lynne: status queries for anv should work now
fab has joined #dri-devel
<Lynne> I'll test it
<Lynne> yup, it's working
<Lynne> do you need a command line to generate 422 samples?
danvet has joined #dri-devel
<airlied> Lynne: yeah throw some at me
srslypascal has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
<Lynne> ffmpeg -i <input> -c:v libx264 -pix_fmt yuv422p -y test_422.mkv
bgs has quit [Remote host closed the connection]
<Lynne> for 10bit 422, s/yuv422p/yuv422p10
<airlied> just going to try and get anv to pass cts before I get to that
<Lynne> kk
<Lynne> what is radv waiting for to get the patchset merged?
<airlied> more review I think
<airlied> so really either bnieuwenhuizen or hakzsam to throw some more criticism at it :-P
<Lynne> btw you can replace libx264 with libx265 for hevc, just keep in mind it'll generate a rext file due to a bug
<airlied> though in fixing anv I'm seeing some minor fixes for radv
<Lynne> you can still decode them fine (probably) by passing -hwaccel_flags +allow_profile_mismatch before -i when decoding
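The encode commands Lynne gives above can be generated for each codec and bit depth with a small helper. This is only a sketch: the function name and the `input.mkv` placeholder are assumptions, and only the ffmpeg options quoted in the log (`-i`, `-c:v`, `-pix_fmt`, `-y`) are used.

```python
import shlex


def encode_422_cmd(input_path, codec="libx264", ten_bit=False, output="test_422.mkv"):
    """Return the argv list for the 4:2:2 encode command quoted in the log.

    The helper name and defaults are illustrative; the options come from the log.
    """
    # Lynne: for 10-bit 4:2:2, swap yuv422p for yuv422p10.
    pix_fmt = "yuv422p10" if ten_bit else "yuv422p"
    return ["ffmpeg", "-i", input_path, "-c:v", codec, "-pix_fmt", pix_fmt, "-y", output]


if __name__ == "__main__":
    # "input.mkv" is a placeholder for <input>; libx265 gives HEVC per the log.
    for codec in ("libx264", "libx265"):
        for ten_bit in (False, True):
            print(shlex.join(encode_422_cmd("input.mkv", codec, ten_bit)))
```

Each printed line reuses the same output name, so `-y` overwrites the previous sample; pass a different `output` to keep all four.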
jfalempe has joined #dri-devel
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
kzd has quit [Quit: kzd]
fab has quit [Quit: fab]
sgruszka has joined #dri-devel
itoral_ has joined #dri-devel
itoral has quit [Ping timeout: 480 seconds]
mvlad has joined #dri-devel
sghuge has joined #dri-devel
rasterman has joined #dri-devel
a1batross has joined #dri-devel
<airlied> okay anv passes all the current CTS tests
<airlied> or will once I rebase/push it
* airlied is wondering can I do separate dpb/dst on intel
<airlied> ah no I just misread my code, coincide it is
fab has joined #dri-devel
<dj-death> airlied: it's just on SKL?
nuh^ has quit [Remote host closed the connection]
<airlied> dj-death: SKL+ though I have to test it on DG2 to make sure it works on both ends of the spectrum
* airlied doesn't have an integrated gen11/12
<airlied> h265 is also going to be messy as I think for the as specced vulkan API it would need HuC
<airlied> I have to work out if my DG2 board loads huc at all, I fell down the twisty mei paths
<airlied> dj-death: Lynne tested on an SKL, and I've been testing on a Whiskey Lake so far
pochu has joined #dri-devel
kts has quit [Quit: Leaving]
frankbinns has joined #dri-devel
kts has joined #dri-devel
kts has quit [Remote host closed the connection]
tursulin has joined #dri-devel
lynxeye has joined #dri-devel
<dj-death> airlied: appears to load fine on my dg2
<dj-death> airlied: I'm on drm-tip 6.2.0-rc3+
<dj-death> just need the right version of the blob
<MrCooper> daniels danvet emersion jekstrand: if user-mode queues & fences and Vulkan wait-before-submit semantics can be handled in the kernel (which seems required for UMF to be usable by display servers), do we really need explicit sync in the display protocols?
ice99 has joined #dri-devel
vliaskov has joined #dri-devel
jkrzyszt has joined #dri-devel
srslypascal has quit [Ping timeout: 480 seconds]
<danvet> MrCooper, can it be handled in the kernel?
<MrCooper> which part specifically?
<danvet> well thus far I've seen some hand-waving that we just stuff umf into dma_resv somehow on the sideline
<danvet> and pretend everything keeps working
<danvet> and the same for handling umf wait-before-submit in the kernel
<danvet> like generally with umf you handle this in userspace by putting the right waits into the userspace queue
<MrCooper> if UMF is to be usable for display servers, the kernel has to be involved somehow, doesn't it?
<danvet> I also haven't seen a reasonable plan for umf vs dma_fence compat mode
<danvet> the hand-waving just assumes everything is umf in your system
<danvet> why?
<danvet> also how?
<danvet> with umf you get no guarantee it'll ever happen
<MrCooper> Wayland compositors want something which can be plugged into an event loop, not "putting the right waits into the userspace queue"
<danvet> so either compositor waits until it's signalled before it submits
<danvet> or you put a magic queue wait with timeout into the command queue
<danvet> then it's not really pure umf anymore
<MrCooper> yeah, "pure UMF" seems unusable for Wayland compositors
<danvet> and the trouble with augmented umf so that it kinda looks like a futex with pollable fd
<danvet> you get into the entire "looks almost like dma_fence, but is entirely incompatible with that" mess
<danvet> and since we need legacy dma_fence supporting mode anyway for the foreseeable future
<MrCooper> entirely incompatible how?
<danvet> I'd just use that and call it done
<danvet> kernel deadlocks in memory reclaim
<danvet> so you can do dma_fence built using umf primitives
<danvet> even with userspace submit and all that
<MrCooper> even with a timeout?
<danvet> but then you ditch the entire nice future fences semantics of umf
<danvet> mutex_lock_timeout does not fix a deadlock
<MrCooper> sounds like Vulkan wait-before-submit semantics can't be safely supported at all?
<danvet> there's the tdr timeout for jobs that take too long, but usually when people talk about timeout in umf context they mean essentially replacing mutex_lock with mutex_lock_timeout and having no plan for what happens when the timeout fails because you've managed to hit the architectural deadlock
<danvet> yes
<danvet> which is the big scary thing and the reason jekstrand wants to ditch
<MrCooper> somebody should tell James Jones that
<danvet> well, wait-before-submit can be supported, that's what the entire drm_syncobj and mesa submit thread stuff is about
<danvet> but there's some older wait before submit vk stuff which wasn't this carefully engineered (mostly könig saying no really)
<danvet> and which just works mostly, but not by design
pcercuei has joined #dri-devel
<danvet> i.e. if you get a cpu fault at just the right time that gets into memory reclaim and hits just the right dma_fence, you functionally deadlock
<danvet> tdr will clean up the mess, but you have a reset for something that vk spec says should work
<MrCooper> submit thread won't fly either I'm afraid, per discussion on the explicit sync Wayland protocol
<danvet> hm link for me to catch up?
<MrCooper> sec
<danvet> MrCooper, for the vk feature that's very gray on linux-dma_fence world ask jekstrand
<danvet> I always forget the exact name
<MrCooper> vkQueuePresentKHR is not supposed to block either
srslypascal has joined #dri-devel
<MrCooper> danvet: FWIW, I can't seem to find the reference right now, but I understand AMD's plan for user-space queues is to signal dma_fences from user space for implicit sync
<danvet> yeah you can do that
<danvet> if you do it right
<danvet> I'm honestly not confident from the discussions thus far
<MrCooper> isn't it fundamentally the same thing as wait-before-signal though?
<danvet> not if you do it right :-)
<danvet> essentially what you need is a) pin memory like with ioctl submit until each request is done
<danvet> b) have an in-kernel timeline to make sure your ctx is advancing monotonically and in finite time
<danvet> c) nuke the entire gpu context if userspace violates this
<danvet> d) which means any wait-before-signal trick you do in the command stream that doesn't work with ioctl submit will also not work with this
<danvet> at least work reliably as in "guaranteed to not occasionally deadlock and end up nuking the app"
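Rules a) to d) above can be modelled as a toy state machine. This is a rough sketch, not kernel code: the class and method names are made up, the 10 second bound is an assumption standing in for "finite time", and "pinning" is just a dict holding buffer names until the request completes.

```python
class ContextNuked(Exception):
    """The context broke the timeline rules and was torn down (rule c)."""


class UserQueueTimeline:
    def __init__(self, deadline_s=10.0):
        self.deadline_s = deadline_s  # assumed bound for "finite time"
        self.last_submitted = 0
        self.last_completed = 0
        self.pinned = {}              # seqno -> (submit time, pinned buffers), rule a
        self.nuked = False

    def _nuke(self, why):
        self.nuked = True
        self.pinned.clear()           # nothing stays pinned once the context is gone
        raise ContextNuked(why)

    def _check_alive(self):
        if self.nuked:
            raise ContextNuked("context already torn down")

    def submit(self, seqno, buffers, now):
        self._check_alive()
        if seqno != self.last_submitted + 1:   # rule b: monotonic timeline
            self._nuke("submission seqno is not monotonic")
        self.last_submitted = seqno
        self.pinned[seqno] = (now, tuple(buffers))

    def complete(self, seqno, now):
        self._check_alive()
        if seqno != self.last_completed + 1 or seqno not in self.pinned:
            self._nuke("completion out of order")
        del self.pinned[seqno]                 # request done, buffers may move again
        self.last_completed = seqno

    def tick(self, now):
        """Periodic check: any request past the deadline kills the context."""
        self._check_alive()
        stale = any(now - t > self.deadline_s for t, _ in self.pinned.values())
        if stale:                              # rule b: must advance in finite time
            self._nuke("request did not finish in finite time")
```

A command stream that waits on something which never signals simply trips `tick()`, which is rule d): the trick fails here for the same reason it would with ioctl submit.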
<danvet> MrCooper, ok read the wl proto issue, and yes daniels is right
<danvet> unless you handle the dma_fence materialization in the protocol on the compositor side
<danvet> then there's no way to implement this client-side only without breaking some guarantee somewhere
<MrCooper> dma_fence materialization as in once the signal has been submitted for the wait?
<danvet> yeah
<MrCooper> would still need something usable in an event loop for that
<danvet> yeah that's why I put my POLL_PRI comment at the bottom
<danvet> if we want to tack it onto the drm_syncobj fd
<danvet> or we do a new one
<danvet> whatever compositors like really
<MrCooper> but that still couldn't avoid the deadlock issues in the kernel, could it?
<danvet> MrCooper, I think for compositor design it's useful to distinguish dma_fence + drm_syncobj case, where we can do all kinds of things like pollable fd and clear point of "the kernel is committed and guarantees it'll happen"
<danvet> and pure umf, which is a memory location with no guarantees whatsoever about what'll happen to it in the future
<danvet> MrCooper, drm_syncobj doesn't deadlock, because you cannot submit an un-materialized fence as a dependency anywhere
<danvet> the "wait for fence materialization" is why the mesa submit thread is a thing
<danvet> which means the kernel doesn't die, but otoh mesa might not be able to guarantee what the vk/wl specs say
<danvet> which is why this is all a bit annoying :-/
<MrCooper> yeah, seems like a thread can't really work, and it's not supposed to block either
<danvet> yeah I mean fundamentally the problem doesn't disappear
<danvet> by moving it into userspace
<danvet> so you still have the mismatch between what the spec wants and what linux delivers
<MrCooper> James Jones still seems to be under the impression it can work, that's why he's insisting on explicit sync support (in X as well)
<danvet> I think fundamentally you can't have a) dynamic gpu memory management b) future fences c) compositors not getting stuck (i.e. cross security boundary guarantees) d) bottomless queues everywhere (no blocking)
<danvet> nvidia blob largely drops a) on the floor
<danvet> then this is all trivial
<MrCooper> what a mess :/
<danvet> I think they de facto also drop c) and just shrug
mauld has quit [Quit: WeeChat 3.5]
<danvet> but one of these you really have to drop or it boils down to a fully abstract diagram that demonstrates the deadlock
frieder has joined #dri-devel
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
<emersion> the protocol should support submitting fences that didn't materialize yet
<emersion> what's wrong with waiting for UMF in the kernel with a timeout, like a Wayland compositor would?
<dolphin> emersion: that's indeed the question :)
<dolphin> I guess it boils down to a lot of code has been designed that just assumes that fence always materializes in the next 10 seconds, so that UMF would have to be a new construct compared to dma_fence
<emersion> is a 10s timeout bad?
<dolphin> the waiters don't have a timeout, the exporter gives a guarantee to either fulfill or fail the fence in that time
Duke`` has quit [Ping timeout: 480 seconds]
Duke`` has joined #dri-devel
<emersion> i don't really understand any of this
<danvet> emersion, yeah if you wait for umf in the kernel you just rebuild the mesa submit thread in the kernel
<emersion> yes
<danvet> that's a pile of code on the wrong side of a security boundary
<emersion> that bad?
<emersion> well, it would only be for backwards compat
<danvet> mesa exists already
<emersion> is waiting a security issue?
<danvet> nah but doing all the parsing and copying and stuffing it into a kthread :-)
<emersion> oh there is parsing involved?
<danvet> if you want perfect backwards compat with existing ioctl at least
<MrCooper> it would waste CPU cycles too?
<danvet> if not, then ... mesa submit thread and drm_syncobj wait-for-materialize all exists and works
<emersion> i thought it was just waiting for a bit in some chunk of memory
<danvet> MrCooper, well if you don't write a fastpath
<danvet> so more complexity in a legacy submit ioctl which a) tend to be complex already b) why would you touch them
<emersion> i mean there is probably a way to wait for a UMF without wasting CPU cycles?
<danvet> emersion, what dolphin said, you need to wait for umf at a different place than where all the current drivers wait for dma_fence
<emersion> where is that place exactly?
<danvet> emersion, the cpu wasting is copying all the stuff to the kthread
<danvet> after drm_sched_job_submit() essentially
<danvet> umf wait must be before that
<danvet> which also means you get propagating umf, because the dma_fence you hand back to userspace also becomes an umf
<danvet> it's an infectious property
<danvet> and with drm_syncobj we have the in-kernel semantics to at least stall in the right place
<danvet> with sync_file not even that is a thing
<emersion> okay, then we are in a situation similar to format modifiers, where everybody needs to support it in the whole stack before they can be used?
<danvet> neither in dma_resv
<danvet> and if you don't patch all these and the drivers using them, it's not good for more than a quick "hey this looks easy" demo
<danvet> emersion, I think so
<danvet> but I think I'm also more pessimistic on this than others
mauld has joined #dri-devel
<danvet> I don't think it's more code in the kernel than userspace
<danvet> but also not less at all, and putting all that for compat in the kernel instead of userspace just feels rather wrong
<emersion> i'm not convinced about that, but i'm no kernel dev
<bnieuwenhuizen> on the other hand if nobody comes up with alternatives for years ...
<danvet> bnieuwenhuizen, I haven't seen more than handwaving on the kernel side either
<Lynne> how would mcbp work with user-mode submits?
<MrCooper> so to recap, it sounds like Vulkan wait-before-signal can't be fully supported with upstream drivers; assuming user-mode fences are handled at the kernel level in a way suitable for Wayland compositors, is explicit sync really needed in the display protocols?
<MrCooper> Lynne: GPU FW needs to handle it presumably
<emersion> wait-before-submit* (?)
<MrCooper> wait-before-signal as in the GPU wait is submitted before the corresponding signal
<danvet> MrCooper, imo yes
<emersion> i mean that's what's happening in general, people wait before signal, and the wait is unblocked when the signal happens
<emersion> or maybe my terminology is completely wrong
<danvet> like if protocols really can't be fixed, then I guess we can add an umf slot to dma_buf as another iteration of the most hilarious ipc ever
<danvet> which mesa stuffs in on one side and takes out on the other
Haaninjo has joined #dri-devel
<danvet> kinda like the import/export ioctl that jekstrand landed
<danvet> but I'm really not sure whether ipc-on-dmabuf is a great design
<emersion> yeah, i'd rather not
<MrCooper> danvet: the protocols can be fixed, I'm just wondering if there will be any tangible benefit from the churn
<dolphin> right, but if kernel is just metadata carrier, still can't do dynamic memory management really
<danvet> it's probably the quickest hack forward which isn't a design dumpster fire at a fundamental level though
<danvet> dolphin, you'd need to use the mesa submit threads still
<emersion> well, having a good design sounds like a tangible benefit to me
<emersion> instead of stashing more stuff onto dmabufs
<danvet> it's really just a "wl proto is immutable, let's add the missing field to the dma-buf ipc sidechannel" :-)
<emersion> wl proto is not immutable
<emersion> i can work on the proto side
<emersion> i just need folks to fix the rest :P
* danvet firmly tongue-in-cheek :-)
<dolphin> emersion: asking everyone to move away from dma-buf to new_fence, will be a bit of a job
<danvet> but yeah the dma-buf ipc approach would only need changes to dma-buf.c and mesa winsys
<danvet> well, more or less in exactly the same places as the dma-buf fence import/export
<MrCooper> wouldn't it use the dma-buf fence import/export?
<emersion> it doesn't have to be a flagday migration
srslypascal is now known as Guest2178
<danvet> ok gtg now for lunch/workout/sauna, ttyl
srslypascal has joined #dri-devel
<danvet> MrCooper, it's not a dma_fence/sync_file, so it'd be new
<emersion> it can be a gradual migration like format modifiers
<danvet> or you're back to the "magic compat layer in the kernel" pandora's box
<MrCooper> emersion: nvidia users would keep suffering from synchronization artifacts until the gradual migration completes
<emersion> is nvidia migrating to UMF today?
<emersion> hm, or do you mean nvidia is completely broken today?
<MrCooper> emersion: re wait-before-signal, so far the kernel supports submitting GPU waits only after the corresponding signal was submitted
<emersion> i'm not following
<emersion> i submit a page-flip IOCTL before the GPU work for the frame has completed
<dolphin> emersion: today you're not allowed to publish a fence unless you guarantee to be able to resolve it
<emersion> the kernel will wait until the completion is signalled
<emersion> this is wait-before-signal
<MrCooper> not the same thing
<MrCooper> wait-before-signal is about GPU semaphores
<emersion> what is signal() then? it's not signalling completion of the fence?
<emersion> is signal() the thing that materializes the fence?
<MrCooper> emersion: BTW, nvidia doesn't even do implicit sync for page flips yet, so may get incomplete frames even for normal compositor drawing
<MrCooper> signalling the semaphore
Guest2178 has quit [Remote host closed the connection]
sgruszka has quit [Ping timeout: 480 seconds]
<emersion> somehow i'm more confused at the end of the discussion than at the start
<emersion> okay, so nvidia is broken, but since this is a sync issue, it only manifests itself in some edge cases
<MrCooper> it is a complex topic :)
<dolphin> emersion: just having a fence somewhere returned by IOCTL means it'll have to be signalled in <10 seconds
<dolphin> if you have a reference to a fence, it exists and you can expect it to fulfil or fail in that time period
<emersion> right, if we have UMF waits in the kernel, we can keep that guarantee
<dolphin> so you can do stuff like just plain wait it, holding the system making forward progress until it signals
kts has joined #dri-devel
<emersion> yes
<dolphin> well, you'd have to fix all users of dma fences in the kernel not to do that
<emersion> why?
<emersion> can't the dma_fence emulation for UMF record the creation time
<emersion> and time out if the wait exceeds that timestamp+10s?
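emersion's idea of recording the creation time and failing after timestamp+10s could look roughly like the sketch below. Everything here is hypothetical (class name, the `read_value` callable standing in for the UMF memory location, the injectable clock), and it only models the timeout itself, not the memory-reclaim deadlock danvet describes earlier.

```python
import time


class CompatFence:
    """Toy dma_fence-style wrapper around a user-mode fence value."""

    TIMEOUT_S = 10.0  # the 10 second figure discussed in the log

    def __init__(self, read_value, target, clock=time.monotonic):
        self.read_value = read_value   # callable returning the current UMF value
        self.target = target           # value at which the fence counts as signalled
        self.clock = clock
        self.deadline = clock() + self.TIMEOUT_S  # fixed at creation time
        self._state = "pending"

    def status(self):
        # Once resolved, the state is sticky: a late signal cannot undo a timeout.
        if self._state == "pending":
            if self.read_value() >= self.target:
                self._state = "signaled"
            elif self.clock() >= self.deadline:
                self._state = "timed_out"
        return self._state
```

Because the deadline is fixed when the fence is created, every waiter sees the same bound regardless of when it started waiting.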
<dolphin> UMF fence means there's no guarantee when it will complete => potential memory management deadlocks
<emersion> UMF with a timeout fixes that issue
<dolphin> well, everybody is mostly after UMF for the fact that there would be no exporter enforced timeout
<dolphin> so UMF with fixed timeout doesn't really sound that much better than current dma fences
devilhorns has joined #dri-devel
<emersion> i mean, if there is a timeout for backwards-compat only, could be fine?
heat has joined #dri-devel
<dolphin> well, then you would essentially have two classes of dma fence
<dolphin> which really is kind of equal to the new fence which danvet and I referred to
<emersion> yeah, it would be a new fence, but with possible backwards compat with old dma_fence
<emersion> and then we can have a new-fence wl proto, KMS uAPI, etc
<dolphin> the thing is, because you can chain those fences
<dolphin> you basically have to fix pretty much all of it before you get the benefit
X512 has joined #dri-devel
srslypascal has quit [Ping timeout: 480 seconds]
<emersion> yeah, you need to fix one driver + one compositor + mesa
<emersion> to not hit the backwards compat dma_fence codepath
<X512> Why do 2 types of synchronization FD exist (fence file, syncobj)? Is fence file obsolete?
<emersion> X512: i'd suggest reading the kernel docs for drm_syncobj
<X512> It seems that it is possible to implement everything in userland by using only syncobj FDs.
<X512> fence file is redundant.
<dolphin> emersion: well, the problem is if you export the fence, it can be imported by pretty much anybody
<dolphin> so you'd have to change the rules based on who imports it
alarumbe has quit [Ping timeout: 480 seconds]
<dolphin> suddenly it would be your responsibility as exporter to ensure 10 second timeout, compared to the importer deciding it depending on if they are up for indefinite timeout
<emersion> there are multiple ways to go around this
<emersion> hm, are you talking about export inside the kernel, or export as a FD?
<dolphin> FD, mostly
<emersion> so you could have a new FD type for new-fence
<emersion> and exporting to a dma_fence would change the rules
<dolphin> well, then you can loop back to what I said about getting all the userspace libraries to jump to a new fence FD type
<dolphin> everyone seems to be figuring things out by doing downstream hacks on top of dma_buf for now
<dolphin> in upstream, you only export immovable pinned memory
<emersion> there are not that many userspace libraries
<X512> I asked about fence file and syncobj accessed from userland. Why not use syncobj FD for Wayland explicit sync protocol?
<emersion> X512: because UMF may need a New Thing
<X512> It has no wait before submit problem.
<emersion> so the wl work is stalled on kernel folks figuring out what the New Thing can be
<dolphin> emersion: how's that? if you include media and compute
<kchibisov> How would I debug memory issues within just libEGL? Are there options to build only libEGL under ASAN?
jdavies has joined #dri-devel
jdavies is now known as Guest2180
<kchibisov> I have libEGL segfaulting when handling linux dmabuf v4 with a specific client on my system due to malloc errors.
<emersion> dolphin: i mostly care about KMS and Vulkan
<dolphin> emersion: and with compute, I also mean the scale-out network stuff
<dolphin> well, that doesn't mean the libraries are not there, just means you don't care about them :)
jdavies_ has joined #dri-devel
<emersion> do you mean oneAPI crap?
<emersion> VA-API?
<emersion> or something else?
<emersion> in any case, i don't care if these degrade to the backwards compat path
<dolphin> well, the kernel backwards compatibility rules really apply to everyone
<dolphin> I think libfabric is the hip thing these days for compute
<MrCooper> X512: don't think you can get a syncobj before the signalling work has been submitted; anyway, that particular use case seems a lost cause for upstream drivers
<dolphin> the problem is that those libraries and the compute in general is the reason why folks want the new fence
<emersion> MrCooper: you can with timelines
<emersion> the syncobj is a container and the dma_fence in it can be NULL
alarumbe has joined #dri-devel
<X512> WAIT_FOR_SUBMIT will cause a wait until a dma_fence is inserted into the syncobj.
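The container behaviour emersion and X512 describe (a syncobj whose fence slot may be empty, plus a wait that blocks until a fence is inserted) can be mimicked in userspace like this. It is only an analogy under assumed names; the real object lives in the kernel and none of these methods are its API.

```python
import threading


class SyncobjSketch:
    """Toy container whose fence slot starts out empty (None)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._fence = None  # no fence materialized yet

    def replace_fence(self, fence):
        # The submitter attaches a fence once the signalling work is queued.
        with self._cond:
            self._fence = fence
            self._cond.notify_all()

    def wait_for_submit(self, timeout):
        """Block until a fence has been inserted, or return None on timeout."""
        with self._cond:
            ok = self._cond.wait_for(lambda: self._fence is not None, timeout)
            return self._fence if ok else None


if __name__ == "__main__":
    s = SyncobjSketch()
    # Attach a placeholder fence from another thread after a short delay.
    threading.Timer(0.05, s.replace_fence, args=("fence",)).start()
    print(s.wait_for_submit(2.0))
```

The wait only covers materialization; waiting for the fence itself to signal would be a second, separate step.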
Guest2180 has quit [Ping timeout: 480 seconds]
Company has joined #dri-devel
<MrCooper> maybe it's "just" prone to deadlocks in the kernel then
<emersion> X512: btw, the wl proto based on drm_syncobj is here: https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/90
frieder has quit [Ping timeout: 480 seconds]
<MrCooper> or maybe you can get a syncobj like that, but still can't actually submit the GPU wait work before the signal work
<emersion> vulkan lacks an ext to import a drm_syncobj, so yeah you can't do that right now
<emersion> you can if you stay in vulkan-land
frieder has joined #dri-devel
<X512> Isn't Vulkan timeline semaphore exported opaque FD actually syncobj FD?
<emersion> oh, right
<emersion> but there's no guarantee it will be
<emersion> it's just the mesa implementation-defined behavior
<X512> It can be qualified by new Vulkan extension like dma_buf export FD extension.
<emersion> yes, but jekstrand rejected that idea
<emersion> on the basis that UMF will likely need a new fence, and we should skip directly to that
<emersion> the tl;dr is that all explicit sync work is stalled on UMF being solved, as you can see
MajorBiscuit has joined #dri-devel
<bnieuwenhuizen> I feel like we're falling into a trap of blocking everything on UMF and then being very slow to get UMF though
<emersion> yes, i agree
<dolphin> the real question is, is there some use case you can't implement without new fence when it comes to 3D/compositors
<bnieuwenhuizen> like Jason mentioned 2-4 years for UMF yesterday, which kinda sucks to block any ecosystem improvements
<dolphin> usually, new fence enters the discussion when you want to do multiple-hour compute workloads
sgruszka has joined #dri-devel
<emersion> dolphin: fix NVIDIA, and future AMD hw
<emersion> also wait-before-submit for WSI
<X512> What is Vulkan semaphore FD for proprietary Nvidia drivers?
<emersion> X512: no idea
<dolphin> emersion: nvidia can also adopt the same model everyone else is doing
<emersion> dolphin: future hw won't support the current model
<dolphin> why wouldn't 10 second timeout be supported?
<emersion> might as well move on now, instead of staying with the old model which will become obsolete eventually
<dolphin> if one failed to perform an action in 10 seconds, your compositor/desktop experience is not going to be great
<emersion> i don't know the details, i just know AMD will require UMF in the future
<dolphin> they probably would like to take advantage of HW features that you can achieve with UMF (like everyone else)
<dolphin> doesn't make it any more required for the desktop compositing and 3D workloads
<emersion> the ML has more info
<dolphin> no matter how complex your hardware, you can always assert 10 second timeout
<X512> Does it mean that applications will be able to freeze whole GUI for 10 seconds?
<emersion> no
<dolphin> if you do your compositor right, should just freeze single window
<dolphin> but if your compositor has a bug (or the underlying userspace driver), then you might
<dolphin> meaning if the compositor itself hangs, then you may get 10 second stutter
<dolphin> and that is the rationale for the timeout too, any more time and the user is going to hit the power button
<dolphin> emersion: long story short, you can always assert a timeout even if your hardware could do something else, and you can make/keep the fence waiting kernel code more straightforward
<dolphin> as a new FD would be needed, sharing the same name in kernel internally may just add confusion
<dolphin> everyone would probably just want the new fence, but nobody has enough incentive to do it as everyone would have to be on board to get the benefits
<dolphin> as one can always resolve the problems just for your own driver and stack in downstream
frieder has quit [Ping timeout: 480 seconds]
<MrCooper> emersion: future AMD HW can work fine with what we have now (and so can nvidia HW, if they want to)
frieder has joined #dri-devel
<emersion> AMD devs are saying otherwise
<MrCooper> upstream drivers can't stop supporting dma_fence anyway, it would be a UAPI regression
<dolphin> emersion: probably some miscommunication going on
<dolphin> nothing stops anybody from starting the 10 second timer when creating the fence
<MrCooper> AMD devs seem to understand this perfectly
<emersion> alright, then sounds like a good idea to just let explicit sync work bitrot a few more years
<X512> I am working on proof-of-concept Radeon GPU driver that run in userland as server (daemon) process.
<emersion> less work for me, i won't complain :)
<X512> Targeted GPU (Radeon SI) seems too old for prototyping UMF :(
<dolphin> well, as far as I know, (apart from the corner cases that Vulkan API allows) there's really no use-case for 3D/desktop compositing where you need indefinite workloads
<dolphin> emersion: the answer to "why?", is really the compute libraries
<X512> Haiku has no implicit sync and X11 legacy so whole synchronization model can be designed from scratch.
<dolphin> X512: in userspace, unless you can pin All the Memory (TM), you can't do anything but "new fence"
<dolphin> as you won't have any guarantees for memory being available
<dolphin> and if you are doing shmem() + futex() in userspace, then it goes back to danvet's point about why make it in KMD?
<X512> My driver guarantees that all buffer objects have actually allocated memory and creating new buffer will fail if no more physical memory.
<X512> For simplicity.
<dolphin> if the kernel does overcommit, the allocated memory is a lie
<dolphin> your thread will be paused, memory taken away, and returned at later point
<X512> No overcommit. GTT buffers must be locked in physical memory.
<X512> Driver rejects CPU memory that is not locked.
<dolphin> right, if you allow userspace to lock unbounded amount of memory, then a lot of things are possible but that's another story
<dolphin> however your CPU thread may still be paused due to CPU timeslicing, so even if the GPU work submitted finishes, the CPU thread may not be there to pick it up
<dolphin> so still can't give a guarantee of 10 seconds
<X512> Do GPU preemption?
<dolphin> hm?
<dolphin> just that the userspace CPU thread can't make promises to complete anything in N seconds
<X512> Then it is only a problem of one particular userland process and other processes unaffected?
<dolphin> yes, but is the reason why you can't guarantee to signal a fence in 10 seconds
<dolphin> aka. you can only do a "new fence" really if you do it in userspace
<X512> What is the problem with doing it in userspace, ignoring legacy compatibility problems?
<X512> Userland processes can be fully isolated by separate GPU contexts and GPU context switching.
<dolphin> doing what, exactly?
<dolphin> you mean the old style dma_fence?
<X512> No. Memory mapped userland fences.
<dolphin> hmm, I don't think there is a problem. every waiter is responsible for specifying a timeout
<dolphin> you don't have any hard guarantees of the system making forward progress, though
<X512> If a freeze is isolated to a single GPU context and doesn't affect other GPU contexts, then it seems to be no problem.
<dolphin> yeah, if it's not your compositor, then probably not a problem
<X512> The compositor can render the old surface buffer (double buffering) if rendering of the new buffer by the client process is frozen for some reason.
<dolphin> yeah, but if your compositor itself is frozen, then user sees no new frame :)
<dolphin> of course if you assign the compositor to a different cgroup and allow it to mlock memory, the chances will be lower
<X512> I think that it is a good idea to prohibit overcommit for the compositor and design it so every allocation failure must be gracefully handled (fail to open window etc.).
<X512> The compositor is a critical real-time task.
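A minimal sketch of the two points made above: every waiter specifies its own timeout, and the compositor falls back to the previous buffer if the client's new frame is not ready in time. The names `pick_buffer`, `frame_ready` and the use of `threading.Event` as a stand-in for a fence are assumptions for illustration only, not an API of any driver or compositor.

```python
import threading

def pick_buffer(frame_ready, new_buf, old_buf, timeout_s):
    """Return the buffer the compositor should present.

    frame_ready is a stand-in for a fence: the client sets it when
    rendering into new_buf has finished. The waiter (the compositor)
    owns the timeout, so a frozen client only costs it timeout_s.
    """
    if frame_ready.wait(timeout_s):
        return new_buf
    # Client did not finish in time: keep showing the previous frame.
    return old_buf
```

With this shape a stalled client only delays its own surface; it does not help if the compositor's own thread is the one that is descheduled.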
kts has quit [Quit: Leaving]
itoral_ has quit [Remote host closed the connection]
<Lynne> umf isn't that far away, it's just a few patchsets away from being usable
tzimmermann has quit [Quit: Leaving]
tzimmermann has joined #dri-devel
<MrCooper> danvet: hmm, can the kernel prevent wait-before-signal with user-mode queues though?
srslypascal has joined #dri-devel
epoll has quit [Ping timeout: 480 seconds]
frieder has quit [Ping timeout: 480 seconds]
fxkamd has joined #dri-devel
agd5f_ has joined #dri-devel
epoll has joined #dri-devel
srslypascal has quit [Ping timeout: 480 seconds]
frieder has joined #dri-devel
srslypascal has joined #dri-devel
agd5f has quit [Ping timeout: 480 seconds]
agd5f has joined #dri-devel
agd5f_ has quit [Ping timeout: 480 seconds]
kem has quit [Ping timeout: 480 seconds]
krushia has quit [Quit: Konversation terminated!]
alarumbe has quit [Ping timeout: 480 seconds]
sarnex has joined #dri-devel
sarnex_ has quit [Ping timeout: 480 seconds]
jewins has joined #dri-devel
srslypascal has quit [Ping timeout: 480 seconds]
rasterman has quit [Read error: Connection reset by peer]
kts has joined #dri-devel
rasterman has joined #dri-devel
fab has quit [Quit: fab]
<agd5f> AMD hardware going back to navi1x can do user mode queues for gfx in theory
<ishitatsuyuki> but does that imply errata-free? :p
<MrCooper> Lyude: FYI, https://gitlab.freedesktop.org/drm/amd/-/issues/2171#note_1734122 and its grand-parent comment indicate hwentlan's series doesn't fully fix the MST regressions
alarumbe has joined #dri-devel
<nroberts> are there any plans to make meson support generating the rustdoc documentation?
greenjustin_ is now known as greenjustin
ybogdano has joined #dri-devel
sgruszka has quit [Remote host closed the connection]
fab has joined #dri-devel
jewins has quit [Ping timeout: 480 seconds]
<DavidHeidelberg[m]> going to merge kernel uprev from 6.0 to 6.1, any objections? https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20855
pochu has quit []
<DavidHeidelberg[m]> Sergi stress-tested it a lot; I also gave it a few runs, so stability turned out to be good.
natto has quit []
kzd has joined #dri-devel
<agd5f> what we did to support implicit sync is to add a new secure semaphore packet which you put into the user mode queue. UMD tells the KMD what ring write pointer value is for the user queue associated with the implicit sync. The new secure packet then writes the wptr value to a location in the KMD's GPU address space. KMD can then use that value to sync against when it needs to. That memory can also be mapped RO into the GPU address space of
<agd5f> other processes and they can use wait packets to sync against
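A rough model of the scheme agd5f describes, as a sketch only: a user mode queue contains a secure semaphore packet, and when the engine reaches it, the ring write pointer value for that point is written to a slot that only the KMD side owns. Other parties read the slot and compare it against the sync point. The class and method names (`FenceSlot`, `UserQueue`, `push_secure_semaphore`, `is_signaled`) are hypothetical, and the real packet format and address-space handling are not modelled.

```python
class FenceSlot:
    """Stand-in for the location in the KMD's GPU address space.

    Other processes would see it read-only; here that is only a convention.
    """
    def __init__(self):
        self._value = 0

    def read(self):
        return self._value

    def secure_write(self, wptr):
        # Only the secure packet is supposed to reach this.
        self._value = wptr


class UserQueue:
    """Toy user mode queue: a list of packets plus a read pointer."""
    def __init__(self, slot):
        self.slot = slot
        self.ring = []
        self.rptr = 0

    def push_work(self, name):
        self.ring.append(("work", name))

    def push_secure_semaphore(self):
        self.ring.append(("secure_sem", None))
        # The UMD reports this write pointer value to the KMD as the sync point.
        return len(self.ring)

    def step(self):
        """Execute one packet, as the engine would."""
        kind, _ = self.ring[self.rptr]
        self.rptr += 1
        if kind == "secure_sem":
            self.slot.secure_write(self.rptr)


def is_signaled(slot, sync_point):
    # What the KMD, or a wait packet in another process, would check.
    return slot.read() >= sync_point
```

The sync point becomes visible only once the engine has consumed everything queued before the semaphore packet, which is what lets the KMD treat it like an implicit-sync fence.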
natto has joined #dri-devel
fab has quit [Read error: No route to host]
boqun has joined #dri-devel
pochu has joined #dri-devel
<LaserEyess> agd5f: are there any circumstances where the driver, given access to PIXEL_ENCODING_RGB, *should* not use that? Assuming, of course, the display supports it
<LaserEyess> for example
<LaserEyess> shouldn't these checks on YCRCB444 and RGB be reversed?
jewins has joined #dri-devel
<LaserEyess> if it *cannot* use it, I understand, but I don't understand why RGB wouldn't be preferred here
frieder has quit [Remote host closed the connection]
fab has joined #dri-devel
bgs has joined #dri-devel
fab has quit [Read error: No route to host]
<danvet> MrCooper, it can't prevent that right now for most drivers either, you'd need a cmd parser to make sure userspace didn't sneak in a wait-before-submit
<danvet> so all we really need is an uapi contract that userspace must not do that, and the pieces I listed on the kernel side to make sure the resulting dma_fence don't break any rules
<danvet> if userspace breaks the contract it gets to keep the pieces
<danvet> which results in garbage rendered into that buffer
<danvet> which userspace can do anyway if it feels like
<danvet> agd5f, I still think not even that was really necessary
<danvet> you can also kmap that page and leave it in the gpu va to write into
<danvet> and then have a little bit of validation on the kernel side to make sure the fence timeline doesn't walk backwards
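The "little bit of validation" danvet mentions could look roughly like the sketch below: the kernel side keeps the last value it accepted and ignores any raw value from the shared page that would move the timeline backwards. `ValidatedTimeline` and its methods are hypothetical names, and a real implementation would also have to handle timeouts and context teardown, which are left out here.

```python
class ValidatedTimeline:
    """Kernel-side view of a fence timeline backed by a userspace-writable page."""
    def __init__(self):
        self.last_seen = 0

    def observe(self, raw_value):
        """Feed in the raw value read from the page; return the validated value."""
        if raw_value < self.last_seen:
            # Timeline tried to walk backwards: keep the last good value.
            return self.last_seen
        self.last_seen = raw_value
        return raw_value

    def is_signaled(self, seqno):
        return self.last_seen >= seqno
```

Because only forward movement is accepted, a fence that has been reported as signaled can never become unsignaled again, whatever userspace writes into the page afterwards.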
<agd5f> LaserEyess, not sure. question for hwentlan
<agd5f> danvet, yeah, it just guarantees a monotonically increasing value
<LaserEyess> ok thanks
<danvet> agd5f, oh the secure write is an rmw cycle to make sure it's only ever going up?
<danvet> still needs a bunch of kernel code to make sure the timeout&ctx nuking happens, but I guess you can in some cases consume them directly somewhere else
<MrCooper> danvet: so if there is explicit sync in the display protocol, and user space is silly enough to attempt wait-before-signal, it may appear to work fine most of the time?
<danvet> MrCooper, yeah
<danvet> well, wait-before-sync even works nowadays most of the time
<danvet> until you stack things deep enough and run out of luck at the right time
<danvet> you don't really need a protocol to shovel them around, between engines/context in one process is good enough
<danvet> or just engine/cpu
<danvet> which vk allows
<MrCooper> well, this is specifically about wait-before-signal drawing to a BO shared between a client and a display server
<danvet> it's just wait-before-signal
<danvet> no further conditions needed to go boom
<MrCooper> I'm afraid I'm also starting to get more confused than I was when I started asking questions this morning :/
<danvet> so the problem is that if you do wait before signal using dma-fence, then that fundamentally deadlocks
<danvet> so you need to unwind it in userspace and do the entire drm_syncobj fence materialization dance
<danvet> which mostly works, except protocols
<danvet> and it's also a mess
<danvet> but nothing is stopping userspace from just ignoring all these rules, and happily mixing wait-before-signal and dma_fence
<danvet> and it will mostly work
<danvet> even under memory pressure
<danvet> until your luck works out
<danvet> *runs out
<danvet> so the protocol thing isn't really fundamental, it just forces these various gaps and mismatches more clearly into the light
fab has joined #dri-devel
gawin has joined #dri-devel
iive has joined #dri-devel
devilhorns has quit []
lynxeye has quit [Quit: Leaving.]
MajorBiscuit has quit [Ping timeout: 480 seconds]
kem has joined #dri-devel
tursulin has quit [Ping timeout: 480 seconds]
pochu has quit []
JohnnyonFlame has joined #dri-devel
djbw has joined #dri-devel
<agd5f> danvet, yeah the write pointer is a 64 bit monotonic number since vega
srslypascal has joined #dri-devel
<DemiMarie> bnieuwenhuizen emersion: I strongly recommend not waiting on UMF
<DemiMarie> danvet: is the problem that dma-fence is a terrible API?
Kayden has quit [Quit: to the office, where the computers are not, nor are the radio signals]
agd5f_ has joined #dri-devel
agd5f has quit [Ping timeout: 480 seconds]
kts has quit [Quit: Leaving]
Kayden has joined #dri-devel
ngcortes has joined #dri-devel
agd5f_ has quit []
agd5f has joined #dri-devel
<airlied> dj-death: testing on my dg2 throws up a tiling error, since the engine wants Y tiling; will have to figure out what it wants on dg2
vliaskov has quit []
lemonzest has quit [Quit: WeeChat 3.6]
rmckeever has joined #dri-devel
gouchi has joined #dri-devel
srslypascal has quit [Ping timeout: 480 seconds]
tzimmermann has quit [Quit: Leaving]
ybogdano has quit [Ping timeout: 480 seconds]
bgs has quit [Remote host closed the connection]
mvlad has quit [Remote host closed the connection]
srslypascal has joined #dri-devel
junaid has joined #dri-devel
Haaninjo has quit [Quit: Ex-Chat]
Mangix has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
Mangix has joined #dri-devel
srslypascal has quit [Ping timeout: 480 seconds]
gawin has quit [Ping timeout: 480 seconds]
libv_ has joined #dri-devel
libv has quit [Ping timeout: 480 seconds]
jluthra has quit [Remote host closed the connection]
jluthra has joined #dri-devel
<danvet> jekstrand, tilers that allocate more memory while executing a batch
ice99 has quit [Ping timeout: 480 seconds]
jkrzyszt has quit [Ping timeout: 480 seconds]
aswar002 has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
Duke`` has quit [Ping timeout: 480 seconds]
gouchi has quit [Remote host closed the connection]
aswar002 has joined #dri-devel
<dcbaker> zmike: I had to pull a couple of not-nominated patches to get one of the nominated zink patches to apply. It's the top 3 on the staging/23.0 branch currently, could you let me know if you're okay with that?
X512 has quit [Quit: Vision[]: i've been blurred!]
ybogdano has joined #dri-devel
<zmike> dcbaker: whoops, I thought I was on top of conflicts there
<zmike> dcbaker: ah yeah, that was originally one patch so the fixes tag didn't get propagated when it was split
<zmike> lgtm
fab has quit [Quit: fab]
sarnex has quit [Quit: Quit]
sarnex has joined #dri-devel
orbea has quit [Remote host closed the connection]
orbea has joined #dri-devel
junaid has quit [Ping timeout: 480 seconds]
<jekstrand> danvet: Bad tilers!
<danvet> jekstrand, I just realized that maybe I shouldn't think about memory handling deadlocks that much
<jekstrand> heh
<danvet> I also wonder how many kinda funny-to-buggy drivers we have already
<jekstrand> danvet: I'm not aware of any tilers that absolutely have to allocate mid-batch.
<zmike> jekstrand: I pinged you on a zink MR, any chance you could take a look in the next couple days
<jekstrand> Some of them can go faster if they up a pool size
<danvet> jekstrand, hm right, so GFP_NORECLAIM is good enough
gawin has joined #dri-devel
<danvet> I still don't want to read all the drivers to check that :-)
fab has joined #dri-devel
fab has quit []
<jekstrand> zmike: translated, more-or-less.
<jekstrand> bnieuwenhuizen: On Mali, varying buffers are allocated in userspace
<jekstrand> Apple and IMG have a heap that gets allocated by the kernel based on metrics but those can both handle OOM by spilling part-way through the render pass.
<jekstrand> So if the kernel goes to allocate and fails, it's ok. It just needs to be careful to not throw the old buffer away until it's sure it has the new one.
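The rule jekstrand states (do not drop the old buffer until the new one is known to exist) can be sketched as follows. `TilerHeap`, `try_grow` and `make_limited_allocator` are hypothetical names, and the allocator returning `None` stands in for a kernel allocation that fails instead of blocking on reclaim.

```python
def make_limited_allocator(limit):
    """Hypothetical allocator that fails (returns None) above a size limit."""
    def alloc(size):
        if size > limit:
            return None
        return bytearray(size)
    return alloc


class TilerHeap:
    def __init__(self, size, allocator):
        self.allocator = allocator
        self.buffer = allocator(size)
        if self.buffer is None:
            raise MemoryError("initial heap allocation failed")

    def try_grow(self, new_size):
        """Try to replace the heap with a bigger one; never lose the old one."""
        new_buffer = self.allocator(new_size)
        if new_buffer is None:
            # Allocation failed: keep the old heap. The hardware is expected
            # to cope by spilling part-way through the render pass.
            return False
        # Only now is it safe to let go of the old buffer.
        self.buffer = new_buffer
        return True
```

A failed grow is therefore not an error, only a slower path for the render pass.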
YuGiOhJCJ has joined #dri-devel
ngcortes has quit [Ping timeout: 480 seconds]
TMM has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
HerrSpliet has joined #dri-devel
TMM has joined #dri-devel
<danvet> lina, ^^ might also need more GFP_NORECLAIM ...
RSpliet has quit [Ping timeout: 480 seconds]
ngcortes has joined #dri-devel
danvet has quit [Ping timeout: 480 seconds]
<zmike> jekstrand: awesome, thanks
Jeremy_Rand_Talos has quit [Remote host closed the connection]
gawin has quit [Ping timeout: 480 seconds]
Jeremy_Rand_Talos has joined #dri-devel
jdavies__ has joined #dri-devel
frankbinns1 has joined #dri-devel
frankbinns has quit [Remote host closed the connection]
jdavies_ has quit [Remote host closed the connection]
ngcortes has quit [Ping timeout: 480 seconds]
rasterman has quit [Quit: Gettin' stinky!]
pcercuei has quit [Quit: dodo]
ngcortes has joined #dri-devel
gawin has joined #dri-devel
<airlied> dj-death: okay dg2 needs some work, once I got past all the missing MOCS, will try and figure that out
<dj-death> airlied: thanks
ngcortes has quit [Remote host closed the connection]
ngcortes has joined #dri-devel