ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
nchery has quit [Ping timeout: 480 seconds]
helmhotz has joined #dri-devel
epoch101_ has quit [Ping timeout: 480 seconds]
epoch101 has joined #dri-devel
helmhotz has quit [Ping timeout: 480 seconds]
helmhotz has joined #dri-devel
epoch101 has quit [Ping timeout: 480 seconds]
epoch101 has joined #dri-devel
epoch101 has quit []
epoch101 has joined #dri-devel
iive has quit [Ping timeout: 480 seconds]
thegeeko has joined #dri-devel
nchery has joined #dri-devel
thegeeko has quit []
thegeeko has joined #dri-devel
<thegeeko>
Hi .. I'm trying to understand what vmids in amdgpus are .. I understood that they're vm created auto by the driver and u can ask and reserve some .. then I created an amdgpu device a bo and mapped the buffer object to cpu mem and wrote to the buffer using umr to read the mem I can see my writes are there .. the issue is when I read from vmid 1 it works and 2, 3, and 4 .. why so ? shouldn't it be 1 vmid per application ?
<soreau>
apparently, the change to GL name reuse is causing some nasty artifacts on gles compositors and want to make sure how to set this before trying latest mesa with the switch to test. related: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12707
thegeeko has quit [Quit: Konversation terminated!]
thegeeko has joined #dri-devel
heat has quit [Ping timeout: 480 seconds]
DragoonAethis has quit [Quit: hej-hej!]
DragoonAethis has joined #dri-devel
alanc has quit [Remote host closed the connection]
<soreau>
I guess gstreamer vaapi waylandsink is broken on mesa git too but that's a bisect for another day :P
thegeeko has quit [Quit: Konversation terminated!]
thegeeko has joined #dri-devel
mbrost has joined #dri-devel
nchery has quit [Ping timeout: 480 seconds]
helmhotz has joined #dri-devel
benjaminl has quit [Remote host closed the connection]
benjaminl has joined #dri-devel
sguddati has joined #dri-devel
kts has joined #dri-devel
kts has quit [Quit: Konversation terminated!]
helmhotz has quit [Ping timeout: 480 seconds]
thegeeko has quit [Quit: Konversation terminated!]
thegeeko has joined #dri-devel
a_fantom has joined #dri-devel
fantom has quit [Ping timeout: 480 seconds]
mbrost_ has joined #dri-devel
alane_ has joined #dri-devel
sguddati has quit [Ping timeout: 480 seconds]
mbrost has quit [Ping timeout: 480 seconds]
alane has quit [Ping timeout: 480 seconds]
epoch101 has quit [Ping timeout: 480 seconds]
kts has joined #dri-devel
smaeul_ has joined #dri-devel
smaeul has quit [Ping timeout: 480 seconds]
nerdopolis has quit [Ping timeout: 480 seconds]
<jadahl>
so with mutter more or less giving up on using atomic KMS for cursor updates for the time being, are there any guarantees that mixing legacy and atomic API is actually expected o work across all drivers?
mbrost_ has quit [Ping timeout: 480 seconds]
mbrost has joined #dri-devel
u-amarsh04 has quit []
amarsh04 has joined #dri-devel
zsoltiv_ has quit [Ping timeout: 480 seconds]
The_Company has quit [Read error: Connection reset by peer]
kts has quit [Ping timeout: 480 seconds]
thegeeko has quit [Quit: Konversation terminated!]
thegeeko has joined #dri-devel
kzd has quit [Ping timeout: 480 seconds]
kts has joined #dri-devel
smaeul_ has quit [Read error: Connection reset by peer]
smaeul has joined #dri-devel
mbrost has quit [Ping timeout: 480 seconds]
jsa1 has joined #dri-devel
smaeul_ has joined #dri-devel
smaeul has quit [Read error: Connection reset by peer]
smaeul_ has quit [Ping timeout: 480 seconds]
Jeremy_Rand_Talos_ has quit [Remote host closed the connection]
Jeremy_Rand_Talos_ has joined #dri-devel
smaeul_ has joined #dri-devel
fab has joined #dri-devel
smaeul has joined #dri-devel
smaeul has quit []
smaeul_ has quit [Read error: No route to host]
mbrost has joined #dri-devel
kts has quit [Ping timeout: 480 seconds]
mehdi-djait3397165695212282475 has joined #dri-devel
thegeeko has quit [Quit: Konversation terminated!]
thegeeko has joined #dri-devel
mbrost has quit [Ping timeout: 480 seconds]
mszyprow has joined #dri-devel
gnuiyl_ has quit [Remote host closed the connection]
amarsh04 has quit []
glennk has joined #dri-devel
iive has joined #dri-devel
jsa1 has quit [Ping timeout: 480 seconds]
amarsh04 has joined #dri-devel
gnuiyl has joined #dri-devel
iive has quit [Ping timeout: 480 seconds]
thegeeko has quit [Quit: Konversation terminated!]
thegeeko has joined #dri-devel
jsa1 has joined #dri-devel
thegeeko has quit [Quit: Konversation terminated!]
thegeeko has joined #dri-devel
fab has quit [Quit: fab]
vliaskov has joined #dri-devel
kugel has quit [Ping timeout: 480 seconds]
smaeul has joined #dri-devel
vliaskov_ has joined #dri-devel
kugel has joined #dri-devel
vliaskov has quit [Ping timeout: 480 seconds]
kasper93 has quit [Ping timeout: 480 seconds]
tzimmermann has joined #dri-devel
thegeeko has quit [Quit: Konversation terminated!]
<tzimmermann>
shouldn't we do this in drm_dev_unplug() already ?
thegeeko has quit [Quit: Konversation terminated!]
thegeeko has joined #dri-devel
sghuge has quit [Remote host closed the connection]
sghuge has joined #dri-devel
sima has joined #dri-devel
thegeeko has quit [Quit: Konversation terminated!]
thegeeko has joined #dri-devel
sally has quit []
thegeeko has quit [Quit: Konversation terminated!]
thegeeko has joined #dri-devel
frankbinns has quit [Ping timeout: 480 seconds]
mvlad has joined #dri-devel
thegeeko has quit [Quit: Konversation terminated!]
thegeeko has joined #dri-devel
frankbinns has joined #dri-devel
illwieckz has quit [Remote host closed the connection]
thegeeko has quit [Quit: Konversation terminated!]
illwieckz has joined #dri-devel
apinheiro has joined #dri-devel
yrlf has quit [Ping timeout: 480 seconds]
yrlf has joined #dri-devel
sally has joined #dri-devel
<javierm>
tzimmermann: isn't drm_dev_unplug() only to make the DRM device not accessible to user-space? Why would the dev->dev kref be decremented for the parent device at this point?
<javierm>
or maybe I misunderstood the question
jsa1 has quit [Ping timeout: 480 seconds]
<tzimmermann>
javierm, sima, drm_dev_unplug() happens on pci/usb/etc device removal. i was a bit surprosed that we're not immediatelly removing our device references. that put_device() only happens when the drm_device is being cleaned up AFAIU.
<tzimmermann>
and drm_dev_unplug() signals drivers to not use the hardware device any longer. so i was under the impressino that we'd also release the hardware device then.
jsa1 has joined #dri-devel
jkrzyszt has joined #dri-devel
sgruszka has joined #dri-devel
mbrost has joined #dri-devel
mehdi-djait3397165695212282475 has quit [Ping timeout: 480 seconds]
<sima>
tzimmermann, it's complicated
<sima>
and buggy
<sima>
tzimmermann, the short summary is that drm_device is both the hw device
<sima>
and the lifetime of that ends after ->remove/unplug/whatever is finishes
<sima>
*finished
jsa1 has quit [Ping timeout: 480 seconds]
<sima>
and so that's what drm_dev_unplug does essentially, tell all concurrent code that the hw device is gone
<sima>
argh, I already got it wrong before I started :-/
<sima>
I typed this up somewhere, it's really complicated
<sima>
tzimmermann, while I'm trying to find a reference, might be better to start with why you're pondering this?
<for_opel_astra>
I decided to also quit talking about those things, as i am not being keenly expected to participate on other matters either it seems, even though mareko asked if i would participate and though it was very kind remark imo as the man himself is also cool guy but we'd just end up splitting our ways, and if that is not going to work out , it's a fight called in, i ain't gonna let myself
<for_opel_astra>
consistently humiliated and assaulted at all. I do not possibly see issues you see and that is clear underneath my soul.
<sima>
javierm, ah yeah that's the proper write up
<sima>
I tried to find it in our docs somewhere
<sima>
javierm, one part this is missing is lifetime of the module itself aka try_module_get()
<sima>
which is intentionally broken because developers prefer module unload convenience over correctness
<sima>
tzimmermann, maybe we should convert the mail javierm dug out into a doc section somewhere?
<tzimmermann>
sima, javierm, thanks. i'll read through this. i'm not saying it's broken. i looked through this code and that caught my eye. it is not what i expected
<sima>
tzimmermann, it's trapdoors all the way down in this area unfortunately
<sima>
maybe in a separate section for drivers about how to implement this
<tzimmermann>
sima, javierm, on that url: point 1: "devm is for hardware stuff, [...] _even_ when you hold onto a struct device reference." that nicely avoids answering my question. :p when all the HW resources are gone. what's the point of keeping the reference. shouldn't we at least _try_ to put the device reference here?
<sima>
tzimmermann, that's an uaf
<sima>
at least with current drivers
<sima>
if you have a driver that's entirely using drmm and devm, then yeah the very next thing will be the drm_dev_put()
<sima>
but we're not there yet, we'd need a devm_drm_dev_register for that
<sima>
iirc
* sima
not entirely awake, still a bit a stuffy nose
<sima>
so it's essentially 1. drm_dev_unplug 2. legacy cleanup for drivers that aren't fully using drmm or devm 3. drm_dev_put
<sima>
tzimmermann, there's also the hilarious confusion that most developers insist on drm_dev_unregister without the hotunplug, so that drm_atomic_helper_shutdown does something useful
<sima>
it's ... a complete mess
<tzimmermann>
it's ok. my take-away is that not all drivers support an early put_device()
mszyprow has quit [Remote host closed the connection]
<sima>
yeah I think in theory your understanding is the conceptually clean approach
<tzimmermann>
hence it's done later
<sima>
and what I'm trying to push drivers towards, eventually
mszyprow has joined #dri-devel
<tzimmermann>
sima, great. thanks a lot
<sima>
like one practical approach is that for production we really want drm_dev_unplug
<sima>
but developers really want drm_dev_unregister + hw shutdown
<sima>
and you can't have both
<sima>
and that's a bit the hold-up for converting the last bits over to drmm/devm for at least simpler drivers
<tzimmermann>
i personally do dev_unplug and that's it
<sima>
tzimmermann, but yeah might be worth it to review some of the simpler drivers and see what exactly we still have between the drm_dev_unplug and the drm_dev_put
<sima>
tzimmermann, it's a pain if you want to use module reload for testing new driver code without a full reboot
<javierm>
sima: even module removal I found that is usually not well tested for drivers in embedded plaforms, because people usually just built-in
<sima>
dakr, airlied I dropped a reply somewhere in that very big and very confused nova-core hotunplug discussion
<sima>
probably need more docs, see also discussion right now here
<sima>
javierm, yeah it's all very messy
<sima>
javierm, but driver unload should work, because that's usually subsystem levels of broken if it doesn't
<sima>
*cough* drm_bridge *cough*
<sima>
dakr, if I should reply somewhere else in there pls holler
<sima>
tzimmermann, looking at some tiny drivers I think what we'd need is a devm_drm_mode_config_reset (calls drm_atomic_helper_shutdown as devm cleanup action)
<sima>
and devm_drm_dev_register() (calls drm_dev_unplug as cleanup action)
<sima>
and we'd get to the nirvana world of "look nothing in ->remove callbacks anymore"
<sima>
oh also that's kinda the reason for the drm_dev_put not being immediate: drm_atomic_helper_shutdown not only shuts down hw
<sima>
but also cleans up a bunch of lingering references so that you don't hit any of the WARN_ON that detect leaks later on
<sima>
and you need a still-valid drm_device reference for calling that
<sima>
but yeah with that simple drivers would have 3 devm_drm_ calls int total (1. one is devm_drm_dev_alloc)
<sima>
and all the other sw stuff is drmm_
<sima>
and all the hw stuff devm_ (like gpio)
jsa1 has joined #dri-devel
<sima>
and pls don't look at drm_bridge, because that's totally broken still rn :-/
fab has quit [Quit: fab]
<javierm>
sima: and on top of that, the fbdev emulation layer requires the fbdev to outlive a a drm_dev_unplug() since user-space might still had a /dev/fb0 open and mmap()'ed
<javierm>
I remember some UAF that fixed somewhere due that and not the driver not waiting for the fd close
<javierm>
*and the driver not
<sima>
javierm, I thought we refcount that one already?
<sima>
like fb_open grabs a drm_device refcount, or at least it really, really should
<sima>
maybe with the exception of fb_open for fbcon, but that's a completely different can of worms with imo unfixable locking
rasterman has joined #dri-devel
<javierm>
sima: yes I think is fixed already, but was just mentioning as a reason for drm_dev_put not being immediate
<sima>
javierm, we have a few more iirc, I think dma_buf exports also should grab a drm_device reference, just to be safe
<javierm>
right
<sima>
with dma_fence the plan is different, but not yet implemented, because there doing lots of references for these short-lived things might not be the best
<sima>
javierm, well we only take a module refcount right now, so this might not be the right thing
<sima>
but fbdev unregister is also fairly synchronous
<sima>
so maybe it's enough
<sima>
tzimmermann, since you've done these, this might be a confusion between drm_device lifetime and underlying module lifetime, which is a confusing mess
<sima>
could be interesting to see whether a sysfs driver unbind can still hit the bugs, in which case we'd also need a drm_device_get/put in the various fb_open implementations
<sima>
also a bit funny we have three copies for these, they all look the same between dma/shmem/ttm to mem
<javierm>
so we have the DRM dev, the HW dev, the client DRM fbdev and DRM module lifetimes to take into account. My brain hurts right now :)
<jadahl>
re-asking the same question again when more people are awake: so with mutter more or less giving up on using atomic KMS for cursor updates for the time being, are there any guarantees that mixing legacy and atomic API is actually supported usage and expected o work across all drivers forever?
<sima>
jadahl, not really
<sima>
it's piles of hacks and mostly works on the drivers people care about for mutter and cros (since they do the same)
<sima>
or at least cros did the same for the longest time
<sima>
jadahl, I actually deleted a lot of the hacks a while ago because it was just too broken
<jadahl>
MrCooper is arguing that it is in practice, but it seems fragile. the problem is that the atomic uapi just isn't good enough of a replacement in this particular situation
<sima>
jadahl, I think drivers which implement async_flip on cursor planes might fare better
<sima>
jadahl, the practice is shrinking afaict
<sima>
or at least it includes the occasional kernel oops
<jadahl>
is it though? what we need is doing a cursor plane movement, that still lets us do a "real" commit in the same refresh cycle targeting the same vblank
<sima>
javierm, it's a lot of fun
<sima>
jadahl, I meant wrt driver support
<jadahl>
ah, I see
<sima>
the trouble is that on modern hw there's no cursor plane, so you need to support both commits on that plane and cursor semantics
<sima>
without going boom
mbrost has quit [Ping timeout: 480 seconds]
<sima>
back when I designed this all we assumed we'd only need this until Xorg is dead as a legacy hack
<jadahl>
I would imagine those wouldn't actually expose a cursor plane to begin with
<sima>
so I didn't put much thought into it and got it all wrong
<jadahl>
I can imagine that, yes, but seems we can't bend atomic to work well enough here :(
<sima>
jadahl, those =?
<jadahl>
those drivers which don't have a cursor plane
<jadahl>
hardware, I mean
<sima>
uh they do
<sima>
there's not endless amounts of planes, and some customers want all the planes as real planes
<sima>
and mutter wants a cursor plane
<sima>
so we make one do for both and suffer
<jadahl>
the apple hw one, IIRC, doesn't, because an o plane wasn't bendy enough to be seen as a cursor plane
<jadahl>
but maybe that is the special case, I dunno
<sima>
jadahl, yeah sometimes you don't have any cursor suitable plane at all
<sima>
but this is the crux, it's even harder in the kernel to do this right because the problem has become harder
<sima>
since more generic
for_opel_astra has quit [Read error: Connection reset by peer]
<jadahl>
from a userspace perspective, i'd rather have no cursor plane than a cursor plane that "arbitrarily" doesn't work
<sima>
if we do get it right we can sneak in late updates, and async_flip is trying to make that happen for real, properly
<sima>
jadahl, lotta people disagree
<sima>
hence endless hacks
<sima>
defacto on modern-ish hw a cursor plane can have all the same funny plane limitations like a real plane
<sima>
sometimes even when it's a dedicated cursor plane, due to hw bugs
<jadahl>
anyway, i'm somewhat reading your take on "atomic uapi + drmModeMoveCursor()" as "not really supported, but might work, for now."
<sima>
like some intel atom gpu on the 3rd pipe the cursor can't be off-screen partially
<sima>
jadahl, I think actual async_flip might be eventually, since there we could even do a test_only mode so you know whether the cursor plane supports your change
<jadahl>
can't be off screen IIRC was the apple hw limitation, or something along those lines. that just makes it impossible to use as a cursor plane
<sima>
but yeah atomic uapi plus legacy cursor is good luck, you'll need it
<jadahl>
because cursors really do go partially off screen
<sima>
jadahl, yeah xorg modesetting falls back to sw cursor any time the kernel ioctl fail
<sima>
so that's kinda the uapi contract for legacy cursor
<sima>
"exactly whatever nonsense xorg did, mostly as implemented by -modesetting"
<sima>
it's the "I want Xorg cursor uapi" uapi
<jadahl>
some very similar thing that is likely being used in mutter to get low latency cursor without running into atomic KMS problems
<sima>
jadahl, so maybe we should start out with documenting this stuff as it's used
kts has joined #dri-devel
<jadahl>
sima: that would be good
<sima>
we still have a lot of these legacy uapi warts, e.g. the recent patch from ville to no-op out dpms calls because apparently some xorg or whatever loves to do them
<jadahl>
fun
<sima>
well it's how legacy kms happened
<sima>
in theory it was a cross-driver api
<sima>
but in practice it started with every driver having it's matching xorg userspace, and so there was some really bad proliferation of messy and sometimes incompatible semantics
<sima>
and only very slowly we've nailed these down
<sima>
atomic tries to do a lot better
<sima>
we're probably still about as bad, since atomic is a lot more complex :-/
<jadahl>
it's more complex, but still not complex enough to handle this use case :P
<sima>
jadahl, well part is the intertia here of "legacy cursor seems good enough" and so no one put in the work to spec out what that should actually do
<sima>
and whether drivers actually implement it
<sima>
like you might randomly stall with legacy cursors, there's zero guarantees it wont eat a full vblank/atomic commit
mehdi-djait3397165695212282475 has joined #dri-devel
<jadahl>
you mean there are zero guarantees legacy cursor movements won't eat a full vblank?
amarsh04 has quit []
frankbinns has quit [Ping timeout: 480 seconds]
amarsh04 has joined #dri-devel
fab has joined #dri-devel
kts has quit [Ping timeout: 480 seconds]
fab has quit []
fab has joined #dri-devel
pcercuei has joined #dri-devel
mszyprow has quit [Read error: Connection reset by peer]
mszyprow has joined #dri-devel
<sima>
jadahl, not zero, but there's no way for userspace to find out
<sima>
and not zero in the sense of "some drivers try pretty hard"
<sima>
with no userspace accessible definition of "some" and "try pretty hard"
<sima>
this is why I think we need cursor as async_flip within atomic and clear semantics
<MrCooper>
and the async flag alone is not equivalent
<MrCooper>
anyway gotta go, bbl
<sima>
MrCooper, oh I know it's needed, people wouldn't keep using it otherwise
<sima>
it's just, as long as we have that "you get what xorg driver-specific umd originally wanted" uapi, you won't get anything with clearly defined semantics
<sima>
it's kinda like atomic except no ALLOW_MODESET flag
<jadahl>
sima: well, I'm more or less trying to figure out what response to expect if drmModeMoveCursor() regresses when used with atomic, i.e. if the response will be "will fix" or "that worked by accident, you're on your own"
<sima>
MrCooper, also what you explain in that comment is pretty much what async_flip is for
<sima>
except not just base address, but x/y coordinates too
<sima>
jadahl, the future is hard to predict
<jadahl>
indeed
<jadahl>
if it's documented expected usage, it's easier
<sima>
you'll probably get signed up to define what exact semantics it is you want
<sima>
and then implement it
<sima>
currently it's a pile of hacks
<sima>
MrCooper, afaik on intel display cursor isn't async like on amdgpu btw
<sima>
jadahl, ^^ so another fun one
guludo has joined #dri-devel
frankbinns has joined #dri-devel
helmhotz has joined #dri-devel
<zamundaaa[m]>
sima: TIL "xorg modesetting falls back to dw cursor any time the kernel ioctl fails"
<zamundaaa[m]>
That's more than most Wayland compositors do. Afaik KWin and Weston are still the only ones that do this with atomic even...
<sima>
with atomic?
<jadahl>
sima: how likely would it be to get a "DRM_MODE_CURSOR_MOVE is expected to work together without resulting in the next DRM_IOCTL_MODE_ATOMIC returning EBUSY or getting delayd one refresh cycle" doc patch accepted? :P
<sima>
we really should start documenting all these fallback expectations between kernel and compositors
<sima>
jadahl, nope
<zamundaaa[m]>
sima: most compositors expect being able to enable and move the cursor plane to any place without a test commit
<sima>
jadahl, well so the EBUSY would be a bug
<jadahl>
zamundaaa[m]: mutter tries to too automagically fall back to software cursors too
<sima>
that shouldn't happen
<sima>
zamundaaa[m], the other one is really best effort, and I think the only way to fix that is with atomic asyc_flip and some very explicit flags/semantics about how/when it should fail and what should happen
<sima>
zamundaaa[m], yeah that's not a thing
<zamundaaa[m]>
jadahl: good, so that got fixed at least. Definitely still plenty of compositors out there that don't though :/
sgruszka has left #dri-devel [Leaving]
<jadahl>
zamundaaa[m]: I think it has for a long time. maybe it stopped and got fixed. it was added due to some arm driver long long ago IIRC
<sima>
this is why I think vkms with arbitrary atomic_check restrictions implemented in ebpf would be really good for compositor testing
<sima>
because the list of things that work mostly, except on some hw is very, very long
<sima>
and so unless every compositor dev has a warehouse full of machines, you cannot test it
<jadahl>
sima: wouldn't that be nice :P
<javierm>
sima, zamundaaa[m] I think they did because cursor hotspots was not supported for atomic
<emersion>
sima: completely agree re: cursor API
<sima>
javierm, that's fixed now I thought?
<sima>
or at least the internal atomic machinery has hotspot support now
<zamundaaa[m]>
jadahl: definitely possible. This did regress in KWin a few times too, because it's so seldomly required in practice...
<javierm>
sima: yes, that's why I said "did", but I remember that argument was brought up when zackr added the hotspot support to atomic KMS
<sima>
another fun one that apple can bring back is connectors with status = unknown
<jadahl>
it might have regressed with the KMS cursor thread short cut, will have to check...
<jannau>
the apple HW limitation is 32 pixels (width and height) need to remain on screen. If there was a way to guarantee enough horizontal padding for the cursor fb it would work. vertical padding can be added in the driver. there is no cursor plane just overlays
<jadahl>
IIRC I added code to when adding the atomic support to bail on hw cursors if the atomic commit with a cursor failed
<sima>
afaict a lot of soc hw looks a lot more like apple than amd's display block
<sima>
jadahl, the biggest issue with legacy cursor is really that you cannot tell what it will do
<sima>
it might be an async x/y position update like in the comment MrCooper linked
<zamundaaa[m]>
For hw like that I really prefer to not have a cursor plane but just the universal / overlay planes
<sima>
or just async against concurrent atomic and wont eat a vblank
<sima>
or it'll eat a full vblank
<sima>
or it might "randomly" fail for arbitrary reasons
<sima>
there's just no rules really
<jadahl>
sima: I guess a deny/allow list is the best we can do then
<sima>
jadahl, nope, please nope
<sima>
we've had that with hotspot, it's just pain
<sima>
real semantic flags please instead of mutual guessing games
<sima>
hence async_flip in atomic
<sima>
with like actual docs of what it should do
<sima>
and a real contract that drivers need to obey or reject the request at least so userspace can fall back
<javierm>
jadahl: yeah, the deny list is painful because as sima said, when hotspot was fixed it didn't change anything in mutter until was patched to remove for example virtio-gpu from the list
<jadahl>
sima: we already need it. nvidia-drm apparently doesn't like drmModeMoveCursor()
* sima
sighs
<sima>
also lunch is ready, I'll be back later
<sima>
jadahl, I'd like to avoid it for upstream at least, because of what javierm said
<sima>
it just cements the current disaster uapi forever
<sima>
and fragments the uapi landscape even more
<sima>
because no one will make sure the deny list is consistent across compositors
<jadahl>
we'll see. if some upstream driver starts to delay atomic commits that gets a drmModeMoveCursor() early, we simply need to add them to said list
<jadahl>
or s/starts //
kts has joined #dri-devel
<jannau>
think the correct thing on driver side would be not to expose legacy cursor uapi. the only issue might that drm helpers make that available as soon as a cursor plane exists
<jannau>
I can't remember any issues reported due to the lack of support for legacy cursors on apple hw
<sima>
essentially the legacy cursor hack I've done is fundamentally busted
<sima>
and so you'd need an allow-list
<sima>
it's pretty much just i915, msm, amdgpu afaik
<sima>
everything else is "don't do that"
<sima>
otoh that patch is 7 years old by now, I guess people are just ok with being able to oops kms drivers if you try hard enough :-)
<jadahl>
MrCooper: sounds like maybe that deny list should be an allow list? ^^^
<sima>
jadahl, also for that special thing where the xy position update is async, I think it's really only amdgpu
<jadahl>
seems to have worked with other drivers in that mr
<sima>
jadahl, I mean worst case we'll just fake update the position and return immediately
<sima>
that's pretty easy to do
<sima>
well actually no
<sima>
jadahl, currently it works with most
<sima>
(if you ignore that you can blow up most)
<jadahl>
I'm getting mixed signals here it feels like :P
<sima>
jadahl, essentially from my pov it was a mistake, and the only easy fix is to just do stalls for drivers that don't have special cursor code
<sima>
jadahl, but the more userspace insists on no stalls, the more we just need to keep the oops in the kernel
for_opel_astra has joined #dri-devel
<sima>
so it's a case of very hard rock and hammer
kts has quit [Ping timeout: 480 seconds]
for_opel_astra has quit [Remote host closed the connection]
<jadahl>
doing kms is hitting rock bottom? :P
<sima>
jadahl, eventually we might get a cve and just have to land that, and break the world
<sima>
so I'd much prefer if the world insists a bit less on fast cursor updates concurrently with atomic
<pac85>
sima async flips on cursor means it could get torn right? Afaik legacy kms does effectively mailbox semantic
<sima>
pac85, undefined
<sima>
legacy does whatever legacy does essentially
<jadahl>
sima: people like their cursor movements smooth though
<pac85>
Though I think mailbox is the desidered semantics for cursor
<sima>
we have drivers that copy the provided image into the hw cursor, that definitely tears
<sima>
jadahl, I know
<pac85>
what about changing position?
<sima>
but they don't like them enough to be smooth to fix the mess we have
<sima>
pac85, who knows, I don't
<pac85>
I think that's what people want to optimize for
<pac85>
for minimizing cursor latency
<sima>
I know
<sima>
but as long as the contract is entirely undefined or just in random allow/deny lists in random compositors
<sima>
we'll keep spinning wheels
<sima>
and I'll keep sitting on an oops fix
<pac85>
I see
<sima>
and I've tried to fix this fully generically, it's kinda not doable
<sima>
it's really that much of a mess between inconsistent legacy semantics and very random hw limitations
<pac85>
I see
<jadahl>
as with the hotspot and cursor planes, I suspect the only viable way forward is still to have a list of allow/deny, and eventually an atomic replacement where one can predict whether it'll work as expected (not stall)
<sima>
jadahl, yeah and atm I think the only allow list is amdgpu and maybe i915/msm
<jadahl>
eventually the list would be defunct, since all drivers would support the non-legacy api, be it saying "can't" or "can"
<pac85>
what does the allow list do exactly?
kts has joined #dri-devel
<sima>
or other option is we go and nuke a lot of the atomic cursor support, but no idea what that could break
<jadahl>
allow drmModeMoveCursor() long before drmModeAtomicCommit() of a cycle
<sima>
pac85, whether you can use the legacy cursor together with atomic
<pac85>
ah I see
<sima>
jadahl, so that's the crux, the use-after-free fix is to just stall to the next atomic commit :-/
<sima>
or the previous one, wasn't sure anymore which one it is
<jadahl>
:(
<sima>
but iirc if you do a legacy cursor in between two atomic cursor plane updates, you just go boom
NiGaR has quit [Ping timeout: 480 seconds]
NiGaR has joined #dri-devel
<jadahl>
I mean there is a risk that that'd happen. set cursor fb id to A, *commit*, *move*, set cursor fb to B *commit*
<jadahl>
why would that go boom?
* zamundaaa[m]
is happy to have dropped atomic and legacy mixing in KWin a long time ago
phasta has joined #dri-devel
<jadahl>
zamundaaa[m]: I've kept it out until now :|
<sima>
jadahl, so the design idea of atomic is that you have a string of updates
<sima>
which are ordered through completions
<sima>
specifically the hw_done and flip_done completions
<sima>
the legacy cursor hack that makes almost all other drivers work just short-circuits that chain in the middle
<sima>
which kinda works as long as you don't mix it
<sima>
but if you do, and the atomic commits around the cursor update are nonblocking
<sima>
then the chain of completions is broken
<sima>
and commit B starts cleaning up shit before A has finished
<sima>
lolz ensures
<jadahl>
commits only happen once per cycle. why would there be a chain of them?
<sima>
it might be that it's enough to block on A finishing, but tbh it's so messy it's really not clear to me
<pac85>
does atomic still have the limitation that async flips cannot happen together with cursor updates?
<sima>
jadahl, they can overlap in their phases
<jadahl>
hrm, ok
<sima>
pac85, not sure, wasn't designed like that, but maybe is the case if your driver does cursors with the legacy hack
<sima>
jadahl, it's really hard to hit, all I have is a bunch of kasan reports from various users
<sima>
and CI occasionally blowing up in i915 until it stopped using that stuff
<sima>
hm maybe I could split that patch up
<sima>
jadahl, how bad is stalling on the previous atomic commit?
<jadahl>
the *previous*?
<sima>
yeah
<sima>
that might be enough to stop the uaf
<jadahl>
any stalling would be terrible
<jadahl>
every missed frame is terrible
<sima>
yeah I can't fix the uaf without some stalling
<sima>
or we latch the entire thing into a thread
<sima>
at that point you'll never get a failure value though if it goes wrong
<jadahl>
compare the cursor missing a frame, and the whole screen animation missing a frame
<jadahl>
apparently nouveau seems to handle the mixing well as well...
<sima>
jadahl, they all do as long as my hack is still there
<sima>
but they also almost all have an uaf in the kernel
<sima>
so right now your allow list is pretty much everything
<sima>
except I have a 7 year old bug on my hands I dunno how to fix
<jadahl>
ah, I see
<zamundaaa[m]>
pac85: yes, you can't do cursor updates with async right now
<zamundaaa[m]>
KWin "solves" that by assuming that it won't work and falling back to a software cursor while trying to do tearing
<jadahl>
tricky .. uaf and a smooth system, or no uaf and laggy system
<sima>
jadahl, well since userspace seems to insist that legacy cursor is the best we can ever do, I'm pondering whether I can fix this with some absolute horror show of in-kernel threading
<javierm>
pac85, zamundaaa[m]: yeah, I think is not limited to only cursor but legacy KMS does not really support async page flip
<pac85>
zamundaaa[m] ah I see. A lot of wayland compositors seems to just do tearing in a very unreliable way (like, they don't do it when the cursor is visible but also in some games it will just never work)
<javierm>
I remember had this issue when tried to add support in mutter to damage handling on legacy KMS
<sima>
it's not going to make the semantics of legacy cursors any better though
<pac85>
javierm legacy can do async flips fine all the time at least on amd
<javierm>
pac85: hmm, maybe was only related to dirtyfb that damage clips needed
<pac85>
javierm not sure what those things are, a bit outside my area of knowledge. I suppose related to compositor's damage tracking?
<jadahl>
sima: what userspace is predictability. if we can get a cursor plane, make it behave like one, in whatever way you do. if not, give us an overlay plane, and we'll put a cursor there and move it a bit slower (ideally, mutter doesn't but I plan to add it)
<sima>
hm
<sima>
well I just stumbled over the dumpster fire of DIRTYFB legacy semantics getting worse too
<sima>
yay
* sima
cries over 9e4dde28e9cd3
<sima>
the docs of dirtyfb are very explicit that yes it stalls, and it does so intentionally
<pac85>
uhm so reading a bit, if you scribble in the front buffer there is no guarantee it will show up on the display until dirtyfb is called?
<sima>
pac85, yes
<sima>
well, you can do atomic updates
<sima>
today is friday, and I feel like kms was a mistake
<pac85>
dirtyfb stalls on vblank? Then how does Xorg work? It can blit frames to the front buffer as soon as they are presented independently of vblank
<sima>
dirtyfb is called from your block handler
<sima>
so it batches up
<sima>
and there's been bug reports that it has to, or some dump shit application redraws at 100% cpu
<sima>
except everyone disagrees for their own little setup, so since we've done that drivers are randomly growing hacks to disable this again
<sima>
because having a consistent kms uapi is apparently not something folks want
<sima>
at least for legacy
<sima>
"eglgears is limited to 120fps"
<sima>
hence we must break dirtyfb for other people
* sima
apologizes for the sarcasm
<pac85>
I guess dirtyfb should have had a flags arg and accept async in it?
<sima>
it's called atomic, we have it
<sima>
and can't fix old uapi
<pac85>
I mean if you are not doing front buffer rendering async flips are actually supported better on legacy kms since they work with cursor updates
<sima>
oh I thought you've meant "nonblocking" when you've said "async"
<sima>
they're not the same
<sima>
there's no async dirty upload anywhere really
<sima>
robclark, 9e4dde28e9cd3 is this really required, and why for msm only?
<robclark>
sima: yes, really required.. and other drivers that support a combination of push and pull (command vs video) type displays likely want something similar.. but depends on how hw latches the changes, I suppose
tlwoerner has quit [Quit: Leaving]
<sima>
robclark, well the commit justifies it with "eglgears is limited to 120fps", which really, do we care that much for a legacy ioctl?
<robclark>
vblank_mode=0 should not be limited
<sima>
currently userspace has no way to find this out
<sima>
robclark, well when I did the atomic version for dirtyfb there was apparently some clear evidence that ratelimiting was needed
<robclark>
userspace should unconditionally dirtyfb.. kernel should nop it when it is not needed
<sima>
so something is not quite right
<robclark>
userspace isn't flipping, because vblank_mode=0
<sima>
yeah it's never flipping if it's dirtyfb
<robclark>
but dirtyfb should nop when it is not needed.. that is what that patch implements.. kms core doesn't really know, so I did it in driver
<sima>
robclark, yeah, but that's kinda not great if dirty ioctl can do stuff atomic userspace cannot
<sima>
all that achieves is further fragment semantics of legacy ioctls and proliferation of hacks
<sima>
(I started looking into this all again due to the endless legacy cursor saga)
<sima>
so really not a big fan if we merge new special semantics for legacy ioctl for one driver in 2022 or so
<sima>
we're trying to get away from that since a decade, and it's really hard
<robclark>
I'd have to go back and look at how this works w/ atomic, maybe it needs a similar trick.. but for sure _something_ is needed
<sima>
we = well maybe that many people perhaps
nerdopolis has joined #dri-devel
<sima>
robclark, maybe just flips lazily?
<sima>
instead of on every update
<sima>
or maybe a case of "xorg-modesetting doesn't do atomic, so hacking legacy is enough"
<robclark>
some combinations of wsi + compositor do that
<sima>
there's also intel's dirty fb, which is full blast nonblocking always
<sima>
so maybe that's the rule, but then we should implement that in the helper
<sima>
just push to a worker, fbcon already does that, we actually discussed whether we should except it looked like mostly dirtyfb should block
<robclark>
if compositor is managing things paced on vbl then all of a sudden we don't need hacks in the driver.. so I guess you could look at it as a workaround for legacy userspace
<sima>
yeah but then why just one driver
<robclark>
because I guess at the time dirtyfb was no-op for most anyone who didn't support dsi cmd mode panels?
* robclark
bbiab
tlwoerner has joined #dri-devel
<sima>
there's a lot of drivers that have some ->dirty
<sima>
and we have a fairly common one for all atomic drivers, except i915
<sima>
and since 2022 msm
<sima>
and since 2023 amdgpu
<sima>
it's like ... going in the wrong direction here
<sima>
robclark, was even you who implemented the generic version ...
<sima>
so robclark from 2018 disagrees with the one from 2022
<sima>
hwentlan_, agd5f 1c6b6bd0780f2 looks funny for similar reasons of "why" and "are you sure"
<sima>
but it's after the msm one landed
<pac85>
I had this issue running X on msm of vblank_mode=0 not being unthrottled as it should, is that what that patch fixes?
<pac85>
seems to be from the commit message. FWIW Xorg on amdgpu doesn't behave like that
kts has quit [Ping timeout: 480 seconds]
<pac85>
Also the fact that you get two frames in quick succession then one at the correct rate is probably why I was seeing the animation break in glxgears
mehdi-djait3397165695212282475 has quit []
<sima>
pac85, yeah amd has a hack too
helmhotz_ has joined #dri-devel
<pac85>
sima: If I can "vote" I'd want the non blocking behavior, it makes X quite broken on msm and I'd imagine X is the biggest user of those APIs
<sima>
pac85, then I kinda wonder where we had that "must block" idea from
<pac85>
I'd imagine X is the oldest user too
helmhotz has quit [Ping timeout: 480 seconds]
<pq>
jadahl, sima, how about using eBPF programs to let the kernel do the input device -> cursor position without roundtripping to userspace at all. ;-p
<pac85>
brilliant idea. Get rid of kms, bring back user mode setting but through eBPF. Now the kernel can't be blamed for any bug
<jadahl>
lets go HID-BPF directly to KMS-BPF, bypass libinput too
<jadahl>
well, that is just a different way of saying what pq meant I guess :P
<agd5f>
to add to the fun, the cursor plane on AMD hw is not actually a fully independent plane. It inherits a lot of attributes from the plane it's enabled on
Guest10339 has joined #dri-devel
<robclark>
sima: I guess if I disagree with myself, it is only because kms core doesn't know about push vs pull outputs? But non-blocking dirtyfb might be nice (although would need a kernel thread since from hw pov it is blocking.. unless you go thru the pain that msm does to make cursor updates non-blocking
<sima>
robclark, more wondering why we didn't do it in 2018 and thought blocking is the right semantics?
chaos_princess has quit [Quit: chaos_princess]
chaos_princess has joined #dri-devel
<robclark>
yeah, idk.. blocking fits w/ how the hw works, at least in my case.. blocking for cmd mode panel is less problematic because you only need to block until you've squirted the pixels to the panel (more or less).. so blocking-but-nop-for-video-mode is a practical soln
Company has joined #dri-devel
kzd has joined #dri-devel
<MrCooper>
no time to read through all scrollback, sorry; if I missed something important, please raise it again
<MrCooper>
sima: keep in mind mutter doesn't use overlay planes yet, so I fail to see a qualitative difference between my mutter MR and Xorg using the legacy page flip ioctl
bolson has joined #dri-devel
<sima>
MrCooper, oh it's the same issue, just fewer people using Xorg
<sima>
*ever fewer
<MrCooper>
(quantitatively, my MR is easier on the kernel than Xorg, because it calls atomic commits and legacy cursor ioctl from the same thread)
<MrCooper>
still a lot of them though, I'd export to see reports of such failures
<jadahl>
MrCooper: the tl;dr seems to be, "kind of safe for amdgpu, i915 and msm; not safe for the rest if sima's patch ever lands"
<MrCooper>
should I make it an allowlist for those drivers then?
<jadahl>
could do. sad part is that it apparently helps with some nouveau issue too. until sima's patch lands, if it does...
helmhotz_ has quit [Ping timeout: 480 seconds]
<sima>
jadahl, MrCooper I guess I accepted the reality that I need to do some horrible thread hacks in the kernel
<sima>
just no idea yet how
<sima>
so ... eh
<MrCooper>
sima: also, the async flip stuff isn't enough to get all the same benefits, see the earlier MR thread about VRR
<Ermine>
seems like a topic for xdc talk
<sima>
MrCooper, kinda not sure why VRR doesn't ramp up frequency when you do async flips ... that seems like a driver bug
<MrCooper>
other way around, it does but shouldn't
<sima>
oh
<MrCooper>
(for cursor plane moves)
<sima>
hm why?
<sima>
sluggish cursor sounds kinda bad
<MrCooper>
guess it's counter-intuitive
<MrCooper>
it actually helps for that
<sima>
huh, link for context?
<MrCooper>
the fundamental problem is that we don't want to start a refresh cycle for only moving the cursor plane if there's new plane content coming "soon"
<alyssa>
can we get Xinhui.Pan@amd.com removed from MAINTAINERS? emails keep bouncing
<sima>
agd5f, ^^
<MrCooper>
so without truly asynchronous cursor plane moves, the compositor has to hold back cursor plane moves in case new plane contents show up
<sima>
oh
<sima>
so more flags needed I guess
<MrCooper>
I'm skeptical those flags are useful for other use cases though
<MrCooper>
so it's not obvious to me that it's really the way to go
<zamundaaa[m]>
MrCooper: cursor-only atomic commits don't actually change the refresh rate with VRR
<sima>
well random special-case semantics isn't great either
<zamundaaa[m]>
Not with amdgpu at least; I know because I hit that case
<MrCooper>
zamundaaa[m]: that would be a bug, we talked about it at XDC
<zamundaaa[m]>
Yeah, it is. But that's to say, "it does but it shouldn't" isn't right, we need both behaviors
<MrCooper>
atomic KMS API is that every commit for the CRTC starts a cycle
<MrCooper>
hence the (for cursor plane moves) clarification
<zamundaaa[m]>
While I suppose the compositor could commit the primary plane when it wants the refresh rate to be affected, that would at least technically be an API break
<MrCooper>
sima: BTW, by "thread hacks" do you mean kernel-internal threads?
heat has joined #dri-devel
<sima>
MrCooper, yeah most likely just the late commit trickery mutter does
<sima>
or lazy committery
<sima>
not sure what we can do really
<sima>
the problem is really that every driver that looked into this grew it's own set of hacks
<sima>
so there's really no consistent uapi contract for legacy cursor at all
<sima>
even on atomic drivers
<sima>
MrCooper, your assumption that legacy cursor move does not trigger vrr uprates is very bold
<sima>
for an uapi that has pretty much zero documented rules
<MrCooper>
FWIW, if a driver can't program the cursor plane move immediately, I'd expect it to just remember the position and program the latest position next chance it gets
<pac85>
I guess with legacy cursor api if you have low framerate compensation going with vrr your cursor actually moves at the "real" refresh rate?
<sima>
yeah that's not what's happening for most drivers
<MrCooper>
sima: I didn't originally make that assumption, it's really kind of a tangent
<sima>
MrCooper, well it's kinda the entire story for legacy cursor
<sima>
there's no clear rules at all, it's just both sides piling up hacks as we go
<MrCooper>
I'd argue it's the only thing that makes sense though
<sima>
and so every round this gets worse
<sima>
not sure legacy cursor is still in the "makes sense" territory
<pac85>
uh that seems to be the case with legacy api, cursor moves at 130 with 65fps content
<sima>
legacy cursor is mostly just random semantics + ever more hacks
<MrCooper>
pac85: yep, that's one benefit with amdgpu
<sima>
on both kmd (per driver hacks only ofc) and compositor side (also seemingly moving to per-driver hacks)
<pac85>
I guess legacy cursor on amd semantics is what they meant
<sima>
jadahl, btw out of morbid curiosity, how does legacy cursor blow up on nvidia-drm?
<MrCooper>
sima: getting tired of the "Wayland cursor latency sucks" complaints :(
<sima>
MrCooper, oh I know the issue
<jadahl>
sima: dunno exactly, I'm playing catch up after being somewhat away and still in atimezone where every one but me sleeps for most of the day
<sima>
it's just, we're ever more walking away from making kms an actual cross-driver api you can use with all these fixes
<MrCooper>
sima: no special handling, so presumably every legacy cursor ioctl results in a full-blown atomic commit
<jadahl>
and I've been trying to smack this god damn mosquitoe that is taunting me for 2 hours now
<pac85>
btw from the little I know about amd hw doing the mailbox cursor movement semantics legacy kms has is just a matter of changing the coordinate regs, then when something else flips the move takes effect
<sima>
MrCooper, I think the only realistic way out is to do legacy cursor with async_flip, fix the semantics of that
<sima>
and force a consistent emulation on all other drivers
<sima>
in the kernel
<pac85>
uhm I think async flip wouldn't be as good
<sima>
and then add flags to async_flip as needed to make it to the right thing
<pac85>
it needs to be "passive" for vrr and not tear
<pac85>
(unless the main plane tears)
<sima>
from an sw pov the main thing with async is that it doesn't hold up atomic commits but free-wheels
<sima>
everything else is hw features
<sima>
which might or might not exist
<pac85>
well, for main plane you want async to tear, for cursor you want it to not tear
<MrCooper>
sima: I'm not against that, just afraid I'm not gonna be the one pushing that in the near future, and too impatient to wait for it myself :)
<pac85>
kms gives you that on amd
jsa1 has quit [Ping timeout: 480 seconds]
<MrCooper>
sima: pac85 makes a good point though, async_flip would allow tearing, whereas we don't want cursor plane moves to tear (the legacy cursor ioctl isn't supposed to)
<MrCooper>
so that would make yet another flag: "async, no tearing though, no new refresh cycle please, and feel free to program before previous commits"
<pac85>
yeah, I think it would be accepatble for it to tear if you are also tearing the main fb but not otherwise (I'd imagine that's how and hw would work).
<MrCooper>
I suspect AMD HW doesn't tear the cursor regardless, not sure though
<pac85>
MrCooper: IMHO it should be a flag called mailbox, and then something you attach to the whole commit to say "no new refresh cycle for vrr" which I'd call passive update. I say this because having mailbox semantics that trigger a new refresh cycle could be useful to have for plane updates. It would allow to simplify how direct scanout is done (no need for the compositor to guess a deadline, just commit any frame) and achieve lower latency
helmhotz_ has joined #dri-devel
<sima>
MrCooper, I'm mostly worried about how to make the kernel side no longer have uaf, and async_flip infrastructure is about the only way out there
<sima>
unless you just require that drivers implement bespoke cursors and get it all wrong in worse ways
<sima>
how you exactly smash this into the hw isn't my top concern, I'd just like to retire an 8 years old uaf eventually
<sima>
well known since 8 years at least
<sima>
meanwhile compositors seem to holler ever louder that they really rely on the semantics that uaf provides
<sima>
and mostly test on amd/i915 and msm, which all have bespoke legacy cursor handling anyway
<sima>
and then a quick test somewhere else, where "hey it works there too" because I haven't ripped out the uaf yet
jsa1 has joined #dri-devel
<MrCooper>
if I make the mutter MR active only for those 3 drivers, would you be comfortable with that for now?
<pac85>
sima: I know very little of this area of the kernel but from what you said above, it sounds like having a chain of updates that is tied to flip_done (which I suppose means either blank or an event that happens immediately for async) for completion is fundamentally incompatible with the desidered mailbox semantics for cursors? Would it be possible to lift that restriction? A mailbox is fundamentally not a chain of updates
<sima>
MrCooper, it's a bunch more drivers actually, trying to swap in all the context again
<sima>
MrCooper, it's more me lamenting that we're still piling up hacks, and that I'm doomed to write one for the kernel to ever get the uaf fix in
<MrCooper>
sima: all the ones which reference legacy_cursor_update or atomic_async_check?
<sima>
pac85, yeah that's what the new infra does
<sima>
it's also not async_flip but async_update because of exactly the tearing vs not-tearing and other questions
<sima>
except I'm not sure amd does this right ...
<sima>
MrCooper, all those which have an atomic_async_check or a bespoke ->update_plane for their cursor (I think only i915 is in the latter)
<sima>
the legacy_cursor_check is mostly trying to handle fallout from the hack and doing slightly fewer uaf
<sima>
except amdgpu hand-rolling a bit too much of their atomic_check implementation
<MrCooper>
I see amdgpu, loongson, mediatek, msm, rockchip, tegra & vc4 referencing atomic_async_check
<sima>
nouveau hand-rolls something funny too
<sima>
MrCooper, essentially my problem is that I have no idea what the legacy cursor uapi is, and whether I can emulate it at all without an uaf
<sima>
and I'm not sure you're clear on the first part either
<pac85>
sima: ok so it sounds to me like it is a matter of making the distinction between async with tearing and without explicit so it becomes possible to request an async flip that doesn't tear on hw that supports tearing. That + lifting all the restrictions that make cursor updates not work with async flip would fix this properly I guess?
<sima>
because amd, i915, msm and nouveau all have very custom hacks
<sima>
and so that doesn't yet even cover a driver that does the new atomic_async_check in it's pristine form
<sima>
so maybe that one is still wrong for what you want
rsalvaterra has quit []
rsalvaterra has joined #dri-devel
<sima>
plus we have a lot of atomic drivers with cursor planes that don't have an atomic_async_check
rsalvaterra has quit []
rsalvaterra has joined #dri-devel
<sima>
and without atomic_async_check best I can do is either uaf or a thread that lazily pushes an update and hopes
<sima>
I think, not entirely sure
<sima>
MrCooper, I think the uapi you want is one where you get the return of atomic_async_check and not have an in-kmd fallback to something you don't like
<sima>
because I pretty much guarantee you that one is wrong for some cases or drivers no matter what
<MrCooper>
the kernel offering functionality to user space but hoping for the latter not to use it (while Xorg has been all along) isn't really great either
<sima>
oh it's all around bad, I know
<sima>
I'm trying to find a way that's not just making it worse
<MrCooper>
making anything worse is definitely not my intention
<sima>
like I guess a minimal uapi would be that we expose "is this a driver with cursor that has atomic_async_check" except that leaves the hacks in i915 and nouveau out
<sima>
plus you still don't know when it fails and falls back to something you don't want, like a full committ
<sima>
I think ideally we have an uapi where userspace either gets a real cursor (for a sufficiently clear definition of real) or errno
<sima>
and then we just emulate something that's a bit less buggy for legacy ioctl
<sima>
but yeah if mutter starts using legacy ioctl the uaf might pop up again a lot more
<sima>
and the only fix I can come up with for drivers that don't have atomic_async_update is going to have at least some stalls in some cases
<agd5f>
sima, alyssa patch was included in this week's -fixes PR
<sima>
stalls or lagging cursor, it's a bit a sliding scale
helmhotz_ has quit [Ping timeout: 480 seconds]
<sima>
agd5f, thx
<alyssa>
agd5f: cool thx
<sima>
MrCooper, so I guess at least for current kernels you have the following set of legacy cursor uapi
<sima>
1. amdgpu dc
<sima>
2. nouveau
<sima>
3. msm
<sima>
4. i915
<sima>
5. all the others with atomic_async_commit
<sima>
6. atomic drivers with cursors not yet listed above
<sima>
7. whatever all the non-atomic drivers do, but I guess those don't matter since we're talking about drivers with atomic only
<sima>
none of these uapi are documented
<sima>
6 is buggy
<sima>
2 maybe too, haven't looked
<sima>
if you also include older kernels, it's more messy since the drivers with bespoke hacks all evolved on their own
<sima>
oh 6 is atomic drivers with cursor planes
mszyprow has quit [Ping timeout: 480 seconds]
<sima>
there's also an msm variant where the cursor exists, but it is not a cursor plane
<sima>
not sure how many of those we have in other places
<sima>
MrCooper, so the above mess is why I'm not super enthusiastic about you looking at this legacy cursor + atomic uapi combo and going "this is what I want"
<MrCooper>
that's not what I'm saying at all
<MrCooper>
I want comparable cursor latency as with Xorg, and this is the only way I can get it ATM
<sima>
yeah that's the other side, I'd like to give you that, hence why I've been pushing for ->atomic_async_check and things like that
<agd5f>
AMD cursors are double buffered and they latch on vblank so you can update them whenever you want for the most part
<MrCooper>
agd5f: latching only on vblank can't explain the numbers
<sima>
except there's no way for you to know you'll get a good cursor
<sima>
and not so much what it exactly means
<sima>
MrCooper, and if we go with an allow-list in mutter the incentives to create more drivers with good cursors are essentially just not there
<agd5f>
IIRC, there is a lock driver takes and it won't switch unless the lock is not held by driver
<MrCooper>
my assumption is that it can latch any time the current scanout line doesn't overlap with the cursor
<agd5f>
maybe it's changed on newer chips. My knowledge of this dates back the the pre-DC days :)
<pac85>
MrCooper: why not? with atomic you can only commit once per cycle and no matter how close you try to get to the actual limit there would be some time between your commit and actual vblank. With legacy you just keep changing the registers and it gets latched by hw exactly at vblank
<sima>
0b8de7a04f7c1 maybe relevant? agd5f?
<MrCooper>
pac85: why not what? :) sounds like what I'm saying
<sima>
there's been a bunch of recent-ish changes
<MrCooper>
pac85: ah, I'm saying it can latch even after vblank
<pac85>
MrCooper: I'm saying that even if it doesn't it still explains lower latency
<MrCooper>
otherwise I can't explain the numbers measured with Xorg
Duke`` has joined #dri-devel
<MrCooper>
pac85: nope, in that case mutter would be much closer to Xorg
<pac85>
uhm
<pac85>
can you share the numbers?
<MrCooper>
mutter commits ~1 ms before start of vblank and uses the latest cursor position available at that point
<pac85>
I suppose what you propose makes sense since amd has dedicated cursor hw
<MrCooper>
actually generally less than 1 ms before my MR
<pac85>
.9 of a frame yeah
<MrCooper>
right
<MrCooper>
consistent with the cursor being near the bottom, as on the picture
<MrCooper>
(the difference should be smaller near the top)
helmhotz_ has joined #dri-devel
<MrCooper>
and the difference was more noticeable than I'd expected at 60 Hz
<anholt>
mairacanal: ack, I meant to be removed from all kernel maintainership long ago.
mbrost has joined #dri-devel
kts has joined #dri-devel
kts has quit []
cphealy_ has joined #dri-devel
kzd has quit [Quit: kzd]
mbrost has quit [Ping timeout: 480 seconds]
kzd has joined #dri-devel
davispuh has joined #dri-devel
dsimic is now known as Guest10352
dsimic has joined #dri-devel
Calandracas_ has joined #dri-devel
Guest10352 has quit [Ping timeout: 480 seconds]
mbrost has joined #dri-devel
Calandracas has quit [Ping timeout: 480 seconds]
oneforall2 has quit [Read error: Connection reset by peer]
oneforall2 has joined #dri-devel
<sima>
tzimmermann, random thing I've noticed: I think you could unify a lot of the drm_fb_helper_funcs->fb_dirty implementations, I think only the ttm one still is special
<sima>
or I'm blind
<tzimmermann>
sima, i want this to go away entirely. but we're not there yet.
<sima>
even the damage_blit is probably fairly generic
<sima>
tzimmermann, ah even better :-)
<tzimmermann>
but now it's weekend here :)
<sima>
oh yeah here too
tzimmermann has quit [Quit: Leaving]
rsalvaterra has quit []
rsalvaterra has joined #dri-devel
mbrost has quit [Ping timeout: 480 seconds]
<karolherbst>
airlied: can I have multiple csos of the same nir_shader with llvmpipe? Because uhm.. that seems to be crashing for me inside LLVM
<karolherbst>
mhh actually the issue might be something else
<zmike>
I don't see why you couldn't
cyrinux has quit []
jsa1 has quit [Ping timeout: 480 seconds]
cyrinux has joined #dri-devel
jkrzyszt has quit [Ping timeout: 480 seconds]
<alyssa>
does anybody like XSD? if so can you explain why
<alyssa>
i am trying to have an open mind here but
mbrost has joined #dri-devel
cyrinux has quit []
cyrinux has joined #dri-devel
mbrost has quit [Ping timeout: 480 seconds]
phasta has quit [Quit: Leaving]
guludo has quit [Ping timeout: 480 seconds]
guludo has joined #dri-devel
mbrost has joined #dri-devel
heat is now known as Guest10360
Guest10360 has quit [Read error: Connection reset by peer]
heat has joined #dri-devel
<alyssa>
relaxng wins :3
helmhotz has joined #dri-devel
helmhotz_ has quit [Ping timeout: 480 seconds]
Guest10339 has quit [Remote host closed the connection]
mbrost_ has joined #dri-devel
nitikesh_ has joined #dri-devel
mbrost has quit [Ping timeout: 480 seconds]
cyrinux has quit []
cyrinux has joined #dri-devel
mbrost_ has quit [Ping timeout: 480 seconds]
jhli has quit [Remote host closed the connection]
<alyssa>
btw, is there policy for python package build-time deps for Mesa?
<alyssa>
dcbaker: ^
<alyssa>
I'm pulling in a new package, it's in debian oldstable & fedora in addition to pip, but idk what the e.g. android build situation is like
benjaminl has quit [Read error: Connection reset by peer]
benjaminl has joined #dri-devel
<alyssa>
(I have it as an optional dep rn but obviously it's easier to require things instead of falling back)
<dcbaker>
alyssa: Not that I'm aware of? I'd guess people would probably frown at adding a dependency that does the same thing as an existing dependency (say adding jinja when we already have mako)
<alyssa>
sure
<alyssa>
it's rnc2rng & lxml, which together gives us rnc validation
<alyssa>
optional in the sense that you don't need validation if you don't change the xml file
<alyssa>
(i'm also falling in love with rnc, and discovered a new hatred of XSD, and i would like to start doing formal rnc schemas for more XML in tree.)
<mattst88>
idea is: you configure mesa with meson against the Android NDK to generate build.ninja files
<mattst88>
then ninja-to-soong translates that to Android.bp that you check into your mesa repo
<mattst88>
code generated during the build is pre-generated using the system tools, and then checked into the mesa repo along with Android.bp
<mattst88>
currently working on reducing the amount of stuff that needs to be pre-generated
Calandracas_ has quit [Ping timeout: 480 seconds]
<dcbaker>
mattst88: I’ve also been talking to Google people about a meson -> hermetic build thing that would use meson to generate song and bazel? Is that different effort?
<mattst88>
I know there's a "meson2hermetic" tool in AOSP's /external/mesa3d repo -- is that it?
<mattst88>
we looked at that and ... were not impressed
<mattst88>
IIRC it converted a meson build system into a single massive python program that would generate Android.bp, I think
<Lyude>
alyssa: yeah honestly most of what's left is just writing more kms bindings, but I think the hardest parts of that are more or less complete at this pint
<Lyude>
the actual abstraction stuff is afaict mostly figured out
<Lyude>
though I've only gotten review from sima and a few rust people
<Lyude>
(I will be sending out a new version of the patch series next week without the WIP tag btw)
<alyssa>
Lyude: yay!
<alyssa>
..what are you replying to
<Lyude>
11:18 <alyssa> also, we already have rust GPU driver that needs to get upstreamed as rust
<Lyude>
11:18 <alyssa> and once that's upstream, together with Lyude's KMS bindings upstreamed
<alyssa>
ah right
<alyssa>
Lyude: I am like DRAM
<alyssa>
I will forget things unless you are constantly refreshing my memory
<Lyude>
np :P, I am the same
apinheiro has quit [Quit: Leaving]
<Lyude>
but yeah - granted, the stuff we have right now is very bare (we don't even really have proper state iterators), but I've been focusing on trying to get dependencies and what we have so far upstream which is why there's not much more yet.
<alyssa>
yeah =D
<Lyude>
i'm kind of glad I've been keeping it at this state because it's already a lot of jumping around just handling the handful of deps I have for this :p
<alyssa>
yeah, for sure
<pinchartl>
alyssa: it is. I wish a "google problem" meant a problem for google to solve, but more often than not it's a problem created by google that we then have to deal with :-(
<alyssa>
...that too :(
<pinchartl>
mattst88: I suppose we shouldn't hope for a solution that will not require committing generated files ?
fab has quit [Quit: fab]
<mattst88>
pinchartl: maybe, but the current situation with intel_clc/mesa_clc and their dependence on llvm means we either have to (1) build and maintain llvm as part of the build process or (2) just check in generated files
<mattst88>
the rest of the generated files are things I have short term plans to stop checking in
guludo has quit [Quit: WeeChat 4.5.2]
guludo has joined #dri-devel
<pinchartl>
how will that work ? will they be generated from android.bp ?
<mattst88>
pinchartl: yeah
<mattst88>
AFAIU (I'm admittedly still very new to Android), you can generate sources in Android.bp with genrule/cc_genrule
<mattst88>
e.g. with flex/bison
<pinchartl>
I know there's limited support for file generation in soong. last time I checked, custom generators were supported by had to be written in go
<pinchartl>
we use python + jinja in libcamera to generate source files, and android builds are a real pain
<pinchartl>
the fact that the android build system team deprecated and drops features as soon as we start using them doesn't help
<pinchartl>
it's almost as if they were tracking our work and actively invested in making our life painful
<mattst88>
100% agreed, and I'm inside google...
<mattst88>
personally, I think it's crazy that there's a policy in place that all software in Android must build via this one build system, and there's seemingly no effort to provide tooling to help make that possible
<mattst88>
i.e., if this is the policy and you expect multiple separate teams (both inside and outside of google) to need to build various projects using cmake/meson/etc in Android... then you should provide a tool to make this a reasonably simple process
<pinchartl>
yes...
<mattst88>
but instead what I see is a lot of hand-rolled Android.bp and a large number of custom (and bad) tooling to do these conversions
<pinchartl>
I wish the merging of android and chromeos could have switched android to using portage :-)
<alyssa>
mattst88: btw intel_clc is dead
<mattst88>
pinchartl: you and me both!
<mattst88>
alyssa: yeah, I know :)
<alyssa>
I wonder if report_fossil.py should live in Mesa.