ChanServ changed the topic of #wayland to: https://wayland.freedesktop.org | Discussion about the Wayland protocol and its implementations, plus libinput
nerdopolis has joined #wayland
shoragan has quit [Quit: quit]
shoragan has joined #wayland
shoragan has quit [Remote host closed the connection]
shoragan has joined #wayland
co1umbarius has joined #wayland
columbarius has quit [Ping timeout: 480 seconds]
nerdopolis has quit [Ping timeout: 480 seconds]
nerdopolis has joined #wayland
Company has quit [Remote host closed the connection]
sargoe has quit [Remote host closed the connection]
nerdopolis has quit [Ping timeout: 480 seconds]
fmuellner has quit [Ping timeout: 480 seconds]
Brainium has quit [Quit: Konversation terminated!]
Guest6008 has quit [Remote host closed the connection]
cool110 has joined #wayland
cool110 is now known as Guest6199
<kchibisov>
emersion: you could create a surface of size 1x1, then you resize to 800x600 with scale 2, then you swap buffers -> you crashed, because you've committed 1x1.
<kchibisov>
Or, you have 2 windows rendering in your application, each window has the same context, you get a resize with the scale of 2, you call eglMakeCurrent, you resize, you apply scale -> you've crashed, because you've committed the old size with scale 2.
<kchibisov>
Because eglMakeCurrent latched the buffer.
<kchibisov>
So unless you study mesa's EGL code and bugs you'll have a very good chance to get crashes.
sima has joined #wayland
Plasmoduck has joined #wayland
bodiccea has quit [Ping timeout: 480 seconds]
tzimmermann has joined #wayland
Plasmoduck has quit [Ping timeout: 480 seconds]
junaid has joined #wayland
junaid has quit [Remote host closed the connection]
junaid has joined #wayland
andyrtr_ has joined #wayland
andyrtr- has joined #wayland
andyrtr has quit [Ping timeout: 480 seconds]
andyrtr has joined #wayland
junaid has quit [Remote host closed the connection]
andyrtr_ has quit [Ping timeout: 480 seconds]
heapify has joined #wayland
andyrtr- has quit [Ping timeout: 480 seconds]
bodiccea has joined #wayland
heapify has quit [Quit: heapify]
<emersion>
kchibisov: I don't understand what this buffer latching is about
<kchibisov>
Though, the toolkits could work around that by setting viewporter dst_size to the one they've resized to. So when everything matches compositors will ignore it, but if you got a mismatch due to EGL stuff it'll 'amortise' the issue. Though, I'd rather have mesa fixed wrt that context stuff.
bodiccea has quit [Quit: Leaving]
<emersion>
i still don't understand why locking/unlocking buffers has any impact on what happens on the wire Wayland-wise
<kchibisov>
The user might think that they've resized the buffer to account for the new scale.
<kchibisov>
But in reality they haven't.
<kchibisov>
However the scale will be applied.
<emersion>
so, makecurrent, resize, set_scale, swapbuffers, and you got a wrong size?
<kchibisov>
So if you do (your surface was 501x401) eglMakeCurrent(), wl_egl_window_resize(800x600), wl_surface.set_buffer_scale(2), eglSwapBuffers() -> crash, 501 not divisible by 2.
<kchibisov>
yes.
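The failure described above comes down to a divisibility check on the buffer that actually gets committed. A minimal sketch of that check follows; the function name buffer_size_fits_scale is hypothetical and is not part of Wayland, EGL, or Mesa, and real compositors may validate more than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: returns true when a buffer of width x height can be
 * committed with the given integer buffer scale, i.e. both dimensions are
 * positive multiples of the scale. */
static bool buffer_size_fits_scale(int width, int height, int scale)
{
    assert(scale > 0);
    return width > 0 && height > 0 &&
           width % scale == 0 && height % scale == 0;
}
```

With the numbers from the discussion, the stale 501x401 buffer fails the check at scale 2, while the intended 800x600 buffer passes.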
<emersion>
why do you resize while the context is current?
<emersion>
it sounds purely like a client design issue
<kchibisov>
If you have 2 windows at the same time one could assume that to start working with the surface you need to make it current.
<kchibisov>
That's what Qt for example did.
<kchibisov>
And that's what I've assumed until I've read all the mesa's wayland egl code.
<kchibisov>
And you can clearly resize once the context is current, you just can't call the function to make it current.
<kchibisov>
One other case is if you don't do eglMakeCurrent, but do wl_egl_window_create.
<kchibisov>
Because it'll do the same to the actively current context.
<kchibisov>
Hm, maybe it was for eglCreateContext actually, so if you have a current context and you create a new one, it'll latch the current one.
<kchibisov>
One could make context current to compile shaders for example, then make it not current, but the buffer will still be latched, because you've called eglMakeCurrent.
<MrCooper>
"latched" as in "attached and committed"?
<kchibisov>
latched as in resizing won't apply until the eglSwapBuffers is called.
<kchibisov>
So your resize will go into the next frame.
<kchibisov>
But your scale can go to this frame.
<kchibisov>
Because scale is set by client, but resize is done by mesa.
<MrCooper>
so the client must only change buffer scale if it also calls eglSwapBuffers?
<kchibisov>
They must change the buffer scale before the buffer gets latched.
<kchibisov>
So before the operations listed in the EGL spec (buffer age, etc), eglMakeCurrent (probably a bug), eglCreateContext (if there's any other current context on the calling thread, probably also a bug).
<kchibisov>
The client though, could ensure that their size actually applied by querying the size.
<kchibisov>
So they could try to resize, check if they resized, and if the sizes match they could set buffer scale.
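The "resize, check, then set scale" idea can be illustrated with a toy model of the latching behaviour described in this conversation. This is only a sketch: struct window_model and the model_* functions are hypothetical and do not reflect Mesa's real data structures. A real client would presumably query the size through the wl_egl_window API, for example wl_egl_window_get_attached_size(), whose exact semantics should be checked against the Mesa source.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (an assumption, not Mesa's implementation). */
struct window_model {
    int pending_w, pending_h;   /* last size requested by the client */
    int attached_w, attached_h; /* size of the buffer that is latched */
    bool latched;               /* true once a back buffer was acquired */
};

/* Models any call that acquires the back buffer (e.g. a buffer age query). */
static void model_latch(struct window_model *w)
{
    if (!w->latched) {
        w->attached_w = w->pending_w;
        w->attached_h = w->pending_h;
        w->latched = true;
    }
}

/* Models a resize request: it only changes the pending size. */
static void model_resize(struct window_model *w, int width, int height)
{
    w->pending_w = width;
    w->pending_h = height;
}

/* Models a buffer swap: the next frame acquires a fresh buffer. */
static void model_swap(struct window_model *w)
{
    w->latched = false;
}

/* True when the size that would be committed next is the size the client
 * asked for and is also a multiple of the scale. */
static bool model_safe_to_set_scale(const struct window_model *w, int scale)
{
    assert(scale > 0);
    int cw = w->latched ? w->attached_w : w->pending_w;
    int ch = w->latched ? w->attached_h : w->pending_h;
    return cw == w->pending_w && ch == w->pending_h &&
           cw % scale == 0 && ch % scale == 0;
}
```

In this model, a resize issued after the buffer was latched is not safe to pair with a new scale until the next swap.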
junaid has joined #wayland
junaid has quit []
iomari892 has joined #wayland
<kchibisov>
Maybe I should stop caring about that and tell every 'my egl application crashes, your toolkit is bugged' bug report that it's not my issue and close them.
<kchibisov>
Once fractional scaling is more widely adopted it won't be a thing anyway.
<emersion>
it is a client bug though
<emersion>
and i will not stop crashing your client
<kchibisov>
My client is fine.
<kchibisov>
Because I know how to write and when it'll latch.
<kchibisov>
But I still believe that eglCreateContext must not latch.
<kchibisov>
Because it doesn't even make any sense for it to latch other arbitrary context.
<pq>
_DOOM_ should not use libwayland-server as a generic utility library because if those utilities are lacking in some way, enhancements will probably not be accepted. That goes especially with the event loop stuff, and the other stuff is likely frozen by ABI anyway. So it's a risk having to use something else in the future anyway as your requirements grow.
<jadahl>
emersion: i'm not a flatpak maintainer, I can only try to nag again
iomari892 has joined #wayland
iomari891 has quit [Read error: Connection reset by peer]
cmichael has joined #wayland
<pq>
kchibisov, you have to be really *really* careful with EGL in order to be sure the buffer size after resize is what you expect. Once EGL or GL internally needs a buffer, its size is locked until the next swapbuffers. Use wl_egl_window API to query it, I think? Or always wl_egl_window_resize() immediately after eglSwapBuffers and live with that.
<kchibisov>
pq: I know, I'm just afraid that aggressive 'kill the client' logic could do more harm with egl.
<kchibisov>
Like you update your kwin, and then none of the Qt apps launch anymore.
<kchibisov>
Because they have broken EGL handling.
<kchibisov>
Which worked for them on anything other than Wayland.
<kchibisov>
pq: my issue though, is not gl operations, but egl operations which don't do any rendering or affect the buffer.
<kchibisov>
While it's not an issue right now, because you can't possibly crash due to that with the scaling from the wl_surface.enter, once wl_surface.preferred_buffer_scale is used it'll start crashing in addition.
<kchibisov>
Simply because you render the first frame with the scale of 1, and scaling of 1 + broken size is not a real issue other than a 'glitch', but with the new event, it'll be delivered to you along with the first configure, you'll apply that scaling and crash if the latched buffer was not divisible.
<pq>
No, you definitely should crash regardless of where you got your scale factor from.
<kchibisov>
I mean, that with legacy you render first frame at scale of 1.
<pq>
The protocol error is that the client is internally inconsistent: it promises the buffer size + viewporter results in integer surface size, and it doesn't.
<kchibisov>
I understand, but I'm saying that client doesn't really control the buffer size with mesa.
<pq>
oh, the initial scale 1
<pq>
client does control the buffer size with Mesa EGL, but it is very hard to get it right.
<kchibisov>
You have a broken result even when you 1) Create an EGL surface of size 800x600 2) and the next operation you do is an instant resize to 900x700.
<kchibisov>
So whatever you've passed initially will be used.
<pq>
yeah, you'd better use the right size from the beginning.
<kchibisov>
Right, but it's sooo common, because folks don't want to have a Nullable type.
<pq>
It's really wasteful to create one size and then immediately resize.
<kchibisov>
The issue I'd at least fix is the eglCreateContext/Surface latching other context.
<pq>
I don't understand how a Nullable type relates here.
<emersion>
pq, if you create an EGLSurface with arbitrary size at init time, your surface is guaranteed to never be null
<emersion>
but yeah it's not great
<pq>
yes, I'd say those are bugs
<emersion>
allocating large slices of GPU memory is slow
<kchibisov>
I've seen a lot of folks doing a 1x1 wl_egl_window stuff.
<kchibisov>
And then they resize to the right thing on new configure.
<kchibisov>
I've fixed 3-4 other folks' clients myself due to that...
<kchibisov>
And there's still a Qt issue due to exact same reason...
<pq>
What do you want to happen? Compositors stop sending the protocol error for a while?
<kchibisov>
I would at least not send them for dmabuf for a while.
<pq>
Compositors certainly can stop sending protocol errors if they want to.
<kchibisov>
If it was an error from wl_shm they must kill the client.
<pq>
but if compositors stop sending the error, then the problem disappears. Why would any client side get fixed then?
<emersion>
that sounds like a bad workaround
<pq>
can people actually see the glitch?
<kchibisov>
I can.
<pq>
oooh, that gives me an idea
<kchibisov>
That's the reason I'm aware of all of that, because I was fixing the glitch.
liquidh20 has joined #wayland
<emersion>
EGL can do shm, too
<kchibisov>
Hm, right...
<kchibisov>
And it'll have the exact same logic as dmabuf path.
<pq>
instead of a protocol error, a compositor could use placeholder "this window is bugged" graphics on the wl_surface until the size requirement is respected again.
<pq>
make the glitch much more severe while not killing the app
<kchibisov>
Just draw a bright pink window, it's really annoying.
<pq>
yeah, or if you want it fancy, have a text in it saying to file a bug or something - so if it doesn't go away immediately and the window becomes unusable, the user gets a clue.
<kchibisov>
Though, the thing is that once fractional scaling is in use, this glitch sort of goes away.
<pq>
why?
<kchibisov>
you'll have a wrong scaled client, but you can't kill it anymore.
<pq>
huh? why?
<kchibisov>
Because the client asked to be scaled to dst via viewporter.
iomari892 has quit [Read error: Connection reset by peer]
iomari892 has joined #wayland
<pq>
where does the requirement of integer wl_surface size disappear?
<pq>
doesn't fractional scaling depend on viewporter, which still guarantees integer wl_surface size?
<kchibisov>
I mean, it's all integer, you just can submit a buffer of wrong size.
<kchibisov>
And all the buffers are scale 1.
<pq>
of course, but that's not a protocol error
<kchibisov>
exactly.
<kchibisov>
So the issue won't get fixed, but masked.
<pq>
we're not after wrongly scaled clients, we're after protocol errors
<kchibisov>
yeah, that's true.
<pq>
wrongly scaled clients are client bugs, we compositors don't care
<kchibisov>
I'm just saying that the root cause might not get actually fixed.
<pq>
right
<kchibisov>
Like if all toolkits do fractional scaling and all compositors do fractional scaling, the real issues will be tolerated.
<pq>
it just joins the set of other client bugs that a compositor cannot detect or warn about - life as usual
<kchibisov>
And given that Qt and kwin can do fractional scaling, the issue is not that big of a deal for kwin.
<pq>
yup, unless people see the glitch, which the compositor now cannot make more visible either
<kchibisov>
The only compositor where you can really observe glitches is sway, because it's tiling.
<pq>
oh, I thought you were able to see the glitch on a floating window
<kchibisov>
I know that glitch is observable in gnome and sway.
<kchibisov>
And in gnome the way to observe it is to try to start your window maximized.
<kchibisov>
If the buffer is not matching what gnome wants it won't make it maximized.
<q234rty>
tbh I'm running a patched wlroots with that particular protocol error removed since the hidpi Xwayland patch I'm using also triggers that
<riteo>
the popup gets done, it gets a new buffer attached, gets destroyed, commits, syncs (I don't remember if I added it for good luck) aaand... Errno 32.
<riteo>
(the segfault is artificial)
<riteo>
uh by get destroyed I meant damaged, sorry
Nosrep has quit [Remote host closed the connection]
<kennylevinsen>
are you reading/flushing the display yourself? if this had been a dispatch, it should have shown the protocol error that caused the disconnect
<riteo>
I'm doing the event handling myself, yes
<riteo>
but I do get protocol errors
<kennylevinsen>
it could look like you flushed yourself, and exploded immediately as write failed despite there still being stuff to read
<riteo>
Actually I don't think I ever manually flushed
<kennylevinsen>
I don't see a protocol error here, just "error 32 while flushing the Wayland display"
<riteo>
sorry, I meant that I usually get protocol errors
<kennylevinsen>
(protocol error means that you received a message from the server telling you exactly what crime the client committed, followed by disconnection)
Nosrep has joined #wayland
<kennylevinsen>
right
<riteo>
oh I wrote flushing in the error message thinking about it
<riteo>
lemme check
<riteo>
oops I indeed flush in the main polling thread lol
<riteo>
there's an error condition in there though
<kennylevinsen>
there are times it makes sense to flush, but you should not die on the failed *write* - wait for the failed dispatch/read
<riteo>
and as I said I sometimes get protocol errors, also I have other logic to read the error so I'm not sure what I'm doing wrong there
<kennylevinsen>
without the debug log *with* the protocol error, I cannot say what is done wrong
<kennylevinsen>
but the complete log should show the exact sequence of events causing the issue
<riteo>
oh wait, so I should keep dispatching even after a failed flush?
<kennylevinsen>
dispatch until failed dispatch
<riteo>
all right, let me change the code
<kennylevinsen>
if you stop after failed flush, you might still have stuff to read (such as the protocol error!)
<riteo>
ohhh I see now, thanks for letting me know
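The advice above (do not give up on a failed flush, keep dispatching until dispatch itself fails) can be sketched with a stand-in connection. Everything here is hypothetical: struct fake_conn, fake_flush and fake_dispatch only imitate the return conventions of wl_display_flush() and wl_display_dispatch() (negative on failure) and are not libwayland code. In a real client the error would then be read with wl_display_get_error() or wl_display_get_protocol_error().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a Wayland connection. */
struct fake_conn {
    int events_left;  /* events the server queued before hanging up */
    int flush_broken; /* nonzero: every write fails, like EPIPE (errno 32) */
    int events_seen;  /* events the client managed to dispatch */
};

/* Imitates a flush: negative return means the write failed. */
static int fake_flush(struct fake_conn *c)
{
    return c->flush_broken ? -1 : 0;
}

/* Imitates a dispatch: handles one queued event, or fails when the
 * connection is dead and nothing is left to read. */
static int fake_dispatch(struct fake_conn *c)
{
    if (c->events_left == 0)
        return -1;
    c->events_left--;
    c->events_seen++;
    return 1;
}

/* Keep going after a failed flush; stop only when dispatch fails.
 * Returns how many events were dispatched in total. */
static int drain_connection(struct fake_conn *c)
{
    int flush_failed = 0;

    assert(c != NULL);
    for (;;) {
        if (!flush_failed && fake_flush(c) < 0)
            flush_failed = 1; /* remember it, but do not bail out yet */
        if (fake_dispatch(c) < 0)
            break;            /* now the connection is really finished */
    }
    return c->events_seen;
}
```

In this model the one queued event, which stands for the protocol error message, is still delivered even though the flush failed first.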
<riteo>
noooooo
<riteo>
> xdg_surface@71: error 3: xdg_surface has never been configured
<riteo>
it is indeed the dreaded issue I linked then
<riteo>
thanks a lot for helping me with debugging the issue btw!
DPA2 has joined #wayland
<riteo>
sooo... I can only hope that mutter/kwin have some saner behaviour, otherwise we're in a bad situation
<riteo>
but first I'll make a new log
DPA has quit [Ping timeout: 480 seconds]
junaid has quit [Quit: leaving]
junaid has joined #wayland
<riteo>
wait all of a sudden it looks like it doesn't dispatch anymore, it just spins...
<kennylevinsen>
Unless it's a compositor bug, you're always in a bad place when you have protocol errors - even if other compositors are more lenient... :)
<kennylevinsen>
it dispatches until there is nothing left to dispatch (first loop), then writes, waits for data, and dispatches regardless of whether poll failed
heapify has quit [Quit: heapify]
<kennylevinsen>
Dispatching events first ensures that you do not block waiting for new stuff to read when you already have old stuff to do (deadlock), and dispatching after is... Well because you just read stuff so you have work to do.
<riteo>
I put this into a while loop though
DPA2 has quit [Ping timeout: 480 seconds]
<riteo>
so it'll prepare read and dispatch right away
<riteo>
it's a dedicated thread
<kennylevinsen>
In that case you can just as well call wl_dispatch
<riteo>
twice?
<kennylevinsen>
*wl_display_dispatch
<kennylevinsen>
Instead of rolling your own loop
<riteo>
oh there's a reason
<riteo>
we need a mutex
<riteo>
the main thread can do some stuff so locking avoids changing/reading data while the events thread is still changing it
<kennylevinsen>
Either way, follow the suggested standard method unless you're certain you do not need it - and certain you want to debug thsy yourself ;)
<kennylevinsen>
*that you dumb phone
* riteo
shrugs
<riteo>
I'll just implement it as the docs say
<riteo>
well, as I said I'll make a new log, send it here and... dunno, the issue is known
<riteo>
it's definitely a bug though as inert objects should just take requests as no-ops
<riteo>
that's just the nature of the asynchronous protocol
<emersion>
drakulix[m], d_ed[m]: do you want to meet sometime this week?
<drakulix[m]>
I was under the impression that there is a KDE conference this week and thus they couldn't make it.
<drakulix[m]>
And given the protocols we want to talk about, I was thinking next week might be a better time for the next w-p meeting.
<emersion>
oh right
<emersion>
sorry, i didn't remember
<drakulix[m]>
btw, do I remember correctly, that you wanted to post the minutes to the wiki?