ChanServ changed the topic of #wayland to: https://wayland.freedesktop.org | Discussion about the Wayland protocol and its implementations, plus libinput
co1umbarius has joined #wayland
columbarius has quit [Ping timeout: 480 seconds]
fmuellner has quit [Ping timeout: 480 seconds]
Brainium has joined #wayland
Moprius has joined #wayland
Brainium has quit [Ping timeout: 480 seconds]
Moprius has quit [Quit: bye]
Dami-star has quit []
jess has quit []
jess has joined #wayland
nerdopolis has joined #wayland
Kerr has quit [Read error: Connection reset by peer]
manuel__ has quit [Remote host closed the connection]
manuel1985 has joined #wayland
MajorBiscuit has joined #wayland
carbonfiber has joined #wayland
<pq>
Reading the GPU reset discussion on dri-devel@, if gfx API starts ignoring calls, leading to application window looking as if it was frozen, is there anything a compositor could or should do to help the end user? E.g. a dialog "This app seems to be stuck. Do you want to: - try to close it gracefully, - terminate it?"
<pq>
Ignoring gfx API calls means the Wayland client is not stuck from a protocol perspective, it just happens to not be posting any new surface contents.
<pq>
I can't think of a really good way to detect that situation in a Wayland compositor.
<orowith2os>
pq I don't think a compositor has any sane way to know anything's going on, realistically?
<emersion>
what would eglSwapBuffers do?
<pq>
not without some kind of "send me a new buffer now" event, anyway
<emersion>
a commit with no new buffer?
<emersion>
what if the compositor wants to resize?
<pq>
emersion, good question.
<emersion>
it seems like it could pretty easily end up with a protocol error
<orowith2os>
all it can do is see if it responds to pings across Wayland - unless somehow that stops responding too, which sounds more common?
<orowith2os>
based on my current knowledge
<orowith2os>
what are the chances that OpenGL or Vulkan dies, and the window containing their contexts doesn't stop functioning and responding?
<pq>
I'm thinking of the UX here, which would be horrible: the window looks frozen but the app is still acting on all input as it would normally
<emersion>
right. so from UX PoV, it would be best to detect the reset as soon as it happens, and then gray out the window
<pq>
what would stop the end user from clicking all over the place before they try to close the window? Possibly doing things they never intended.
<emersion>
also, the reset can be per-context, so per-window, rather than per-process
<emersion>
pq, if the compositor greys out the window, it can also block input
<pq>
I don't think the compositor would know if the app is actually using robustness and has handled the reset already
<emersion>
i'm purely talking from a UX PoV for now
<emersion>
ie, ignoring current APIs and technical limitations
<pq>
yes, that would be ideal, but how could we get there?
<emersion>
so, to react, the compositor needs some kind of signal
<pq>
or a better question: is this a problem worth solving?
<emersion>
i'm not sure
<pq>
me neither
<jadahl>
seems worth solving IMO. a frozen window with reactive invisible buttons seems bad
<emersion>
i've seen GPU resets only affect the whole GPU for now
<emersion>
jadahl: but have you seen this in practice so far?
<jadahl>
emersion: personally no, I have an intel GPU and the only hiccups I have are crashes and deadlocks deep in iris code
<emersion>
any user bug reports?
<jadahl>
mostly nvidia proprietary and amd, but we don't really handle resets very well
<emersion>
all i've seen so far from my side (personally and bug reports) are whole GPU reset
<jadahl>
yea, whole gpu
<pq>
I've seen games on Proton/Xorg get stuck leading exactly to this UX, and I suspect it's not a GPU hang or reset.
<jadahl>
but let's say we have an OpenGL client that doesn't know how to handle resets. how do we know a surface invalidation response has content from a new and shiny context?
cvmn has joined #wayland
<pq>
I'd assume gfx APIs refuse to post a buffer if they cannot actually draw it.
<emersion>
jadahl: there cannot be content from an old context, since the old context is gone
caveman has quit [Ping timeout: 480 seconds]
<pq>
maybe a compositor would take re-using old buffers as not having handled the reset
<pq>
or does reset handling not require re-creating gfx API surfaces too? e.g. EGLSurface
<emersion>
pretty sure it does
<emersion>
the GPU memory is gone
<emersion>
the whole EGL context needs to be re-created, and EGLSurface depends on the EGLContext
<emersion>
i haven't checked vulkan
<emersion>
(but would assume something similar)
<jadahl>
emersion: so if a new buffer is attached, it should be fine then
<emersion>
i think so, yeah
<jadahl>
just need to verify it's actually *new* and not attaching some old buffer
<jadahl>
so if it is dumb and re-attaches an old buffer, it must continue to be greyed out
<jadahl>
only that we won't know that because it might have attached new valid buffers before the invalidation.. hmm...
<emersion>
vulkan: the whole logical device is lost, which the swapchain depends on
<emersion>
right, the ordering between compositor and client reset notifications is undefined
<jadahl>
an easy way out is to say that it's the responsibility of the client to have an up-to-date graphics context etc., and if it doesn't support that, it shouldn't implement that extension
<jadahl>
for wl_shm ones it's trivial since the CPU is the graphics context
<jadahl>
but then I guess there is a chance that the compositor sees the reset, sends an invalidation, before the client knows about the reset
<pq>
emersion, does an EGLSurface actually depend on an EGLContext?
<pq>
eglMakeCurrent can mix and match arbitrarily
rederick29 has joined #wayland
<emersion>
ehhh
<emersion>
> Any EGL rendering context that was created with respect to config can be used to render into the surface.
<pq>
it kinda depends on EGLConfig, yes, but... :-)
<pq>
emersion, a GPU reset might not invalidate *all* GPU memory. But I think it should invalidate all memory a failing context may have been writing to. OTOH, the allocation might still persist and only contents are lost, which might be remedied by simply writing it all again, meaning the wl_buffer is still the same. Something would need to not do such shortcuts.
<emersion>
pq, eglMakeCurrent will report EGL_CONTEXT_LOST
<emersion>
but yeah the EGLSurface may continue to work with another context
<emersion>
in particular, with a fresh new context
orbit has joined #wayland
orbit has quit []
<emersion>
i don't know enough to say whether a buffer can survive a context loss
<pq>
exactly, and are buffers associated with EGLSurface, not context?
<emersion>
buffers are not exposed
<emersion>
it's all hidden inside the EGL impl
<pq>
yeah, but they exist
<pq>
does the app need to destroy EGLSurface to ensure no old buffers are re-used?
<emersion>
right, so you mean EGL impls store buffers per-surface
orbit has joined #wayland
<pq>
does context loss also require apps to destroy EGLSurfaces?
orbit has quit [Remote host closed the connection]
<pq>
if yes and yes, then a compositor could be certain that when it sees a brand new buffer, it cannot come from the failed client context
<jadahl>
pq: how can it know the "brand new" buffer is from just before or just after the gpu reset?
<pq>
no and yes would be fine too. Yes and no would not.
<pq>
jadahl, exercise left to the reader :-p
<jadahl>
:P
<MrCooper>
the compositor can't know if the client has created a new context, or even if the client needs to create a new context
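[Editor's sketch] The heuristic floated above ("keep the window greyed out until a buffer the compositor has never seen before is attached") could look roughly like this on the compositor side. Everything here is hypothetical: the struct, the function names, and the idea that a plain integer id stands in for a wl_buffer. It does not solve the race jadahl points out (a buffer created just before the reset still looks "new"), nor MrCooper's objection that the compositor cannot tell whether the client needed a new context at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_MAX_BUFFERS 16

/* Hypothetical per-surface state kept by a compositor. */
struct example_surface {
	uint32_t seen[EXAMPLE_MAX_BUFFERS]; /* recently attached buffer ids */
	size_t n_seen;                      /* total ids remembered so far */
	bool suspect;                       /* true: grey out, block input */
};

static bool
example_seen_before(const struct example_surface *s, uint32_t id)
{
	size_t n = s->n_seen < EXAMPLE_MAX_BUFFERS ? s->n_seen : EXAMPLE_MAX_BUFFERS;
	for (size_t i = 0; i < n; i++) {
		if (s->seen[i] == id)
			return true;
	}
	return false;
}

/* Called when the compositor learns about a GPU reset. */
void
example_on_gpu_reset(struct example_surface *s)
{
	assert(s != NULL);
	s->suspect = true;
}

/* Called when the client attaches a buffer and commits. */
void
example_on_attach(struct example_surface *s, uint32_t buffer_id)
{
	assert(s != NULL);
	bool known = example_seen_before(s, buffer_id);

	if (s->suspect && !known) {
		/* Never-seen buffer: assume (optimistically) that the client
		 * recovered, and forget the pre-reset buffers. */
		s->suspect = false;
		s->n_seen = 0;
	}
	if (!known) {
		/* Ring buffer: the oldest id is forgotten once the array is full. */
		s->seen[s->n_seen % EXAMPLE_MAX_BUFFERS] = buffer_id;
		s->n_seen++;
	}
}
```

With this sketch, re-attaching a pre-reset buffer leaves the surface suspect, while attaching an unknown one clears the flag. As pq notes later, an allocation might survive a reset with only its contents lost, in which case buffer identity alone proves nothing.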
<pq>
emersion, swick[m], what's the design for libdisplay-info high level API returning structs? Should I store them in struct di_info to have implicit lifetime, or let the caller free() them, or?
<emersion>
i think it depends whether they are the same for the whole lifetime of the di_info
<pq>
they are the same
<emersion>
if that's the case, we've usually just stored them in di_info/di_edid/etc and then returned a const pointer… but now that i think about it…
<emersion>
this case is a tad different, since the struct is computed from di_edid
<pq>
from di_info, yeah
<emersion>
returning an alloc'ed struct would work, but if we add some more alloc'ed fields in there, then we'll leak, so would probably need to expose a _destroy() function for it too
<pq>
yes
<emersion>
taking a pointer and filling it would cause more issues, ABI-wise and alloc-wise
<emersion>
i think the simpler solution is still to return a const pointer, with everything in it owner by di_info
<pq>
I could store a pointer in struct di_info that is NULL until the getter is called, and have it all automatically freed when di_info is destroyed?
<emersion>
owned*
<emersion>
yeah, that would work too
<emersion>
and would just be an impl detail, the API would be the same
<pq>
yup
<emersion>
i suppose one detail is whether this function would return NULL on failure
<emersion>
if the struct is alloc'ed, it needs to be able to return NULL
<emersion>
if the struct is embedded in di_info, and the function cannot fail, it can guarantee that it never returns NULL
<emersion>
but this also restricts our future API extensions
<emersion>
as always, it's a balance
<emersion>
i'd be fine with it either way
<pq>
we have precedence of high level API returning NULL already
<pq>
*precedent
<pq>
do you mean the caller should be able to tell the difference between alloc fail and no info?
<emersion>
hm, no
<pq>
I'd pick dyn alloc, store pointer in struct di_info for automatic free, and doc the API as maybe returning NULL.
<emersion>
i mean that if the function never returns NULL, it'd save callers from doing NULL checks (at the cost of restricting our extensibility)
<emersion>
yeaah, that's fine by me
<pq>
cool
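[Editor's sketch] The pattern agreed on above (allocate on first use, keep the pointer in the parent object, return a const pointer that may be NULL) might look roughly like this. All names are hypothetical stand-ins, not the real libdisplay-info API; the getter here takes a non-const parent purely to keep the sketch simple.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical derived data; not a real libdisplay-info struct. */
struct example_summary {
	int max_luminance;
};

/* Stand-in for struct di_info, reduced to what the sketch needs. */
struct example_info {
	int parsed_luminance;            /* pretend this came from the EDID */
	struct example_summary *summary; /* NULL until the getter is called */
};

struct example_info *
example_info_create(int parsed_luminance)
{
	struct example_info *info = calloc(1, sizeof(*info));
	if (info == NULL)
		return NULL;
	info->parsed_luminance = parsed_luminance;
	return info;
}

/* Computes the struct on first use and caches it in the parent object.
 * Returns NULL if the allocation fails; the caller never frees the result. */
const struct example_summary *
example_info_get_summary(struct example_info *info)
{
	assert(info != NULL);
	if (info->summary == NULL) {
		struct example_summary *s = calloc(1, sizeof(*s));
		if (s == NULL)
			return NULL;
		s->max_luminance = info->parsed_luminance;
		info->summary = s;
	}
	return info->summary;
}

/* Destroying the parent frees everything the getter handed out. */
void
example_info_destroy(struct example_info *info)
{
	if (info == NULL)
		return;
	free(info->summary);
	free(info);
}
```

Repeated calls to the getter return the same pointer, and its lifetime is tied to the parent, so callers only need a NULL check.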
<emersion>
also, fwiw, while we are on this topic… i've been wondering whether i want to guarantee ABI stability for my libraries across major and minor releases
<emersion>
(API stability for sure, but ABI is different)
<emersion>
the consequence would be that dependent binaries would need a rebuild for major and minor library upgrades
Major_Biscuit has joined #wayland
<emersion>
and that it would allow us to be less constrained when designing APIs (especially when structs are fed to the library)
<emersion>
but yeah, it's an Unpopular Opinion™
MajorBiscuit has quit [Ping timeout: 480 seconds]
<pq>
you'd be using something else than semantic versioning if I understood right?
<pq>
I'd assume distributions would hate it as much as they hate bundling libs in apps.
MrCooper has quit [Ping timeout: 480 seconds]
Momentum has joined #wayland
<pq>
E-EDID... I guess priorities are DDDB > DI-EXT > base
<emersion>
semantic versioning is just about API AFAIK
<emersion>
not about ABI
<emersion>
so it's still semantic versioning:
<emersion>
- major versions break API
<emersion>
- major and minor versions break ABI
<emersion>
IOW, minor versions never require downstream code to change, just a rebuild
<emersion>
example ABI change which isn't an API change: adding a field to a struct
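[Editor's sketch] To make the "adding a field to a struct" example concrete: the two structs below are hypothetical versions of the same public type, given different names only so they can sit side by side in one file. Code written against the older one still compiles against the newer one (no API break), but the size the caller allocates no longer matches what a newer library expects (ABI break).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical public struct as shipped in library version 1.0. */
struct example_mode_v1_0 {
	int32_t width;
	int32_t height;
};

/* The same struct in version 1.1, with one field appended. */
struct example_mode_v1_1 {
	int32_t width;
	int32_t height;
	int32_t refresh_mhz;
};

/* Existing fields keep their offsets, so old source code is unaffected... */
static_assert(offsetof(struct example_mode_v1_1, height) ==
	      offsetof(struct example_mode_v1_0, height),
	      "appending a field must not move existing fields");

/* ...but the object size changes, which is what breaks old binaries that
 * allocate the struct themselves and pass it to the newer library. */
static_assert(sizeof(struct example_mode_v1_1) > sizeof(struct example_mode_v1_0),
	      "the newer layout is larger");

/* Size a binary built against 1.0 reserves for the struct. */
size_t
example_size_in_old_binary(void)
{
	return sizeof(struct example_mode_v1_0);
}

/* Size a library built from 1.1 assumes the caller reserved. */
size_t
example_size_in_new_library(void)
{
	return sizeof(struct example_mode_v1_1);
}
```

A rebuild of the dependent binary against the 1.1 header makes both sides agree again, which is the "just a rebuild" case described above.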
<pq>
I've always thought the ABI is the most important thing to keep stable.
sbine has left #wayland [#wayland]
sbine has joined #wayland
<pq>
maybe that's just a quirk of C again, where ABI and API can be distinguished
<davidre>
Qt and KDE Frameworks for example keep ABI between Major versions
<davidre>
And we have a fun page that explains what you can and cannot do
<emersion>
pq, why is the ABI important to keep stable?
<emersion>
in all other languages, one doesn't need to concern itself with ABI
<pq>
emersion, so that distributions (and users) can upgrade libs without rebuilding the world.
<emersion>
badly phrased
<emersion>
but that's not an issue with Rust, or Go, or…
<pq>
yes, the C world is weird but it's also big
<pq>
yeah, it's not an issue if shared libs do not exist in the first place. You cannot update any single dependency without rebuilding a whole lot.
<pq>
I don't think you install any Rust or Go deps via distros?
kts has joined #wayland
<kennylevinsen>
you do, but it's a bit weird
<kennylevinsen>
I think that's why rust added support for rust shared libs
<emersion>
debian insists on packaging each and every Go dependency, as well as JS, etc
<kennylevinsen>
just like how some distros package python packages outside pip
<pq>
python packages I kinda understand since you never statically link anything... do you?
<kennylevinsen>
different technology, same intent
<pq>
Rust and Go are built on statically linking a monolithic executable, and anything aside from that is "you keep the pieces", isn't it? So "installing deps" is really just getting the sources, and not build artifacts?
<pq>
If "linking" and "build from source" are not distinguishable steps, then ABI in libs seems completely irrelevant.
kts has quit [Ping timeout: 480 seconds]
<kennylevinsen>
Rust supports building shared libs for use as Rust dependencies
<kennylevinsen>
Not sure if any distro uses it currently, but I imagine it allows packaging similar to C shared libraries
<pq>
kennylevinsen, yeah, but I understood that it is also absolutely unstable to changes in anything.
<pq>
so it might as well not exist for practical purposes
<pq>
It's hard for me to argue in any direction here, because I am only assuming what distributions want, and I've been criticised for that before.
<kennylevinsen>
I suppose that issue goes away when the distro picks compiler and source, but it does start to feel like a tangent. :)
<pq>
yeah, and then every app uses a slightly different version of the lib :-p
<pq>
I guess the fundamental reason to do anything is to be able to address (security) bugs with minimum work.
<emersion>
note, bugfix releases can still guarantee ABI stability
<pq>
I suppose we're not talking about languages which do not meaningfully have shared libs like C does.
<pq>
so, yes
<pq>
I guess distros do exactly that, backport patches to libs themselves.
fmuellner has quit [Remote host closed the connection]
<pq>
hmm, but if you want to read, shouldn't you ensure that the writer has finished? That is, check that no-one is holding the exclusive fence? There was something weird about this which may have been an AMD quirk. Polling per se does not "take locks" on the buffer, it's just a way to check if anyone else has a fence open.
nerdopolis has joined #wayland
<emersion>
AMD used to have a bug where POLLIN would always return immediately
<emersion>
aha, thanks, Sebastian has said the same thing
kts has joined #wayland
<MrCooper>
emersion: FWIW, shared libraries must change SONAME on backward incompatible ABI changes, or distros will complain (which they might also if the SONAME changes too often)
<emersion>
yes
<emersion>
with my scheme, i'd bump SONAME on major and minor versions
<daniels>
Momentum: if your focus is really just on the apps, and all you need is a launcher and a clock, then sure. if you want more detailed stuff like network/audio/etc control, notifications, etc, then it doesn't have those features
<Momentum>
i see
<Momentum>
i thought those are generally not related to the compositor
r00tobo[BNC] has joined #wayland
r00tobo[BNC] has quit [Remote host closed the connection]
<MrCooper>
emersion: don't expect enthusiasm for that from distros, e.g. they'll have to make sure a process doesn't end up linking in multiple versions of that library
<emersion>
same when a new major version is shipped
Moprius has joined #wayland
kts has quit [Quit: Konversation terminated!]
<daniels>
Momentum: some compositors allow you to add external panels/etc, but none of the libweston-based ones do
<kchibisov>
Assuming I have 3 configure events [A, B, C] that arrived in the said order, is it fine if I ack B, draw with B, and then on the next size of the iteration I'll ack C and draw with C?
carbonfiber has joined #wayland
<kchibisov>
s/next size/next loop iteration/
<kchibisov>
From what I read it's fine, but I'm not sure how sane it would be to do so. My case is: my loop and window are on the main thread, but I render from some other thread and commit a surface from it as well, thus acking a configure from it would be a more robust solution from what I can see?
<kchibisov>
And the reason I use `B` is because I've started rendering operation with B, but got C while doing the rendering.
<kennylevinsen>
I don't see why not - seems no different than C racing with a single-threaded client busy acking and rendering for B
<kchibisov>
It's a bit weird to design an API in library for that though.
<kchibisov>
Since most libraries do ack unconditionally, but they assume single threaded sometimes.