<fdobridge>
<redsheep> Thanks, now that I'm not going office space on a printer I can head home and test
<fdobridge>
<gfxstrand> @zmike. No courage to review my damage MR? 😛
<fdobridge>
<zmike.> look if I couldn't do math 3 years ago when my brain actually worked I'm not gonna be able to do it now when I can barely remember what I had for lunch
<fdobridge>
<redsheep> Almost exactly getting PC load letter, felt like the twilight zone
<fdobridge>
<gfxstrand> But I fixed it by deleting all your math. 😛
<fdobridge>
<gfxstrand> Maybe we can con Ken into reviewing
<fdobridge>
<zmike.> maybe contact mensa
<fdobridge>
<zmike.> simple arithmetic seems like something they could handle
<fdobridge>
<zmike.> but me?
<fdobridge>
<zmike.> no way
<fdobridge>
<gfxstrand> lmao
gfxstrand has joined #zink
<fdobridge>
<gfxstrand> I poked Ken on IRC.
<fdobridge>
<gfxstrand> He might have the courage to review damage code. Or the blind *faith* to just trust me. :frog_upside_down:
<fdobridge>
<Owo> I wanna do some testing with having mutter re-render the ENTIRE surface
<fdobridge>
<gfxstrand> Just put a `return;` at the top of the function
<fdobridge>
<Owo> ~~forgot to add a "without patching the code" clause~~
<fdobridge>
<gfxstrand> But also that doesn't guarantee anything. The damage is just a hint. It doesn't actually make mutter re-draw
<fdobridge>
<gfxstrand> In order to do that, you need to disable the buffer age extension
<fdobridge>
<Owo> if I disable passing damage regions in zink, mutter *should* redraw everything, unless it does a diff on the buffers on its end, right?
<fdobridge>
<gfxstrand> You need to disable the EGL_buffer_age extension
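A toy model of why buffer age, rather than the damage hint, decides how much a compositor redraws. This is only a sketch of the semantics of `EGL_EXT_buffer_age` as commonly described; the function name, the rectangle tuples, and the history list are all made up for illustration and are not mutter or Mesa code.

```python
def repaint_region(buffer_age, current_damage, damage_history, full_surface):
    """Return the rectangles a compositor would have to redraw (toy model).

    buffer_age      -- 0 if the buffer contents are undefined, otherwise how
                       many frames old the buffer's contents are
    current_damage  -- rectangles that changed in the frame being drawn
    damage_history  -- damage lists of earlier frames, most recent first
    full_surface    -- one rectangle covering the whole surface
    """
    # Age 0 (or no buffer-age support at all): nothing in the buffer can be
    # trusted, so everything is redrawn.
    if buffer_age <= 0:
        return [full_surface]
    # A buffer that is N frames old is missing the damage of the N - 1 frames
    # drawn since. Without that much history, fall back to a full redraw.
    missed = buffer_age - 1
    if missed > len(damage_history):
        return [full_surface]
    region = list(current_damage)
    for past in damage_history[:missed]:
        region.extend(past)
    return region
```

In this model, a driver that never reports a usable age behaves like `buffer_age == 0` on every frame, which is why hiding the extension forces a full redraw while dropping the damage hint alone does not.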
<fdobridge>
<Owo> grr
<fdobridge>
<Owo> lemme take a look
<fdobridge>
<gfxstrand> platform_wayland.c:3093
<fdobridge>
<Owo> no environment variable I can use to disable them without recompiling?
<fdobridge>
<gfxstrand> I don't think so
<fdobridge>
<gfxstrand> Or at least not that I'm aware of
<fdobridge>
<Owo> maybe I should make that one of my first few Mesa PRs :wires:
<fdobridge>
<gfxstrand> Also, I think we still have some sort of buffer_age/damage issue. It's way more minor but I'm still sometimes seeing my screen go back by a frame or two.
<fdobridge>
<redsheep> I'm so glad you're looking at these because you're describing exactly the things that have been the most annoying on nvk+zink for like... probably the whole last year
<fdobridge>
<redsheep> About like, thanos snapping the zink bugs in half a few times
<fdobridge>
<redsheep> At least when it comes to sessions and ui
<fdobridge>
<Owo> ugh
<fdobridge>
<Owo> even unsetting all the variables, firefox is still on sw
<fdobridge>
<gfxstrand> Actually, I'm starting to suspect this one is an sRGB issue. Usually when I see a flash, it's mostly the right thing, just not quite right. That doesn't scare the hell out of me...
<fdobridge>
<Owo> oh. haha.
<fdobridge>
<Owo> Flatpak didn't have any GL drivers installed.
<fdobridge>
<Owo> and it still doesn't work
<fdobridge>
<Owo> whyyyyyy
<fdobridge>
<Owo> Vulkan works fine
<fdobridge>
<mhenning> what are you running under flatpak? I don't actually know how to get flatpak to use a user-built graphics driver
<fdobridge>
<Owo> uhm, everything
<fdobridge>
<mhenning> I tend to avoid flatpak for that reason
<fdobridge>
<Owo> you just build it yourself under org.freedesktop.Platform.GL and set FLATPAK_GL_DRIVERS to the name
<fdobridge>
<Owo> I got it working, `flatpak update` worked
<fdobridge>
<Owo> turns out I needed the 23.08 branch, Firefox was still on that
<fdobridge>
<mhenning> Okay, as long as you know you need to do additional steps for flatpak to get the new driver
<fdobridge>
<Owo> everything app-wise on my system is running under Flatpak except a system resources monitor (which has some issues under flatpak due to glibc shenanigans)
<fdobridge>
<Owo> well, it's working now
<fdobridge>
<Owo> let me go build that zink branch
<fdobridge>
<Owo> If anybody ever wants help with Flatpak bs btw, just @ me.
<fdobridge>
<Owo> The situation is either underdocumented (and I probably have docs saved in my brain or smth), or `flatpak update`
<fdobridge>
<Owo> @redsheep @ermine1716 if yall want to test Firefox on your end with the fixes ^^
<fdobridge>
<Owo> (or just strip the sources from the linked file if you want to do it system-wide)
<fdobridge>
<Owo> (yall might have to change things up for NVK and ANV, since my manifest there only builds AMD and zink)
<fdobridge>
<redsheep> I can get to it in a little bit. I never was having any crashing so I can probably just use the arch package just the same as before
<fdobridge>
<gfxstrand> Remember to grab the branch with everything instead of pulling individual MRs.
<fdobridge>
<gfxstrand> Yay
<fdobridge>
<Owo> The Mutter bug is still there, I guess, but we're not hitting it anymore?
<fdobridge>
<Owo> Is that what's going on?
<fdobridge>
<gfxstrand> I'm not sure. I need to dig into that tomorrow. It may have been a ghost.
<fdobridge>
<gfxstrand> I thought doing double commits made everything worse but I'm not convinced anymore.
<fdobridge>
<redsheep> Hmm. I must be having different issues. My desktop is quite unresponsive intermittently, which seems to be new. Could be an update to something else, so if I am the only one seeing that on nvk+zink in particular probably no cause for concern yet
<fdobridge>
<redsheep> But also, my particular firefox rendering issues remain unchanged, so far as I can tell
<fdobridge>
<redsheep> New mega-fix branch from Faith bug grinding seemingly all day
<fdobridge>
<Sid> interesting
<fdobridge>
<redsheep> I was pretty hopeful all of my issues overlapped with these fixes, but unfortunately it appears they do not
<fdobridge>
<Sid> f
<fdobridge>
<redsheep> it's annoying how fast my ssd and filesystem are, I can only get like 15 good seconds of fast scrolling out of my terminal by typing tree
<fdobridge>
<redsheep> I'm attempting to replicate the more general desktop flicker. I think that miiiight be fixed for me? That is good if true but was also so difficult for me to replicate I can't really call it yet
<fdobridge>
<Sid> let's try the branch
<fdobridge>
<redsheep> Wow this last time I went to open firefox it briefly had a load of corrupted stuff appearing that pulled from discord on my other monitor. It flickered that stuff in and out for a few seconds, before stabilizing on just being firefox
<fdobridge>
<redsheep> If you don't find anything wrong try playing some games. I am getting wicked long stutters, and pretty frequently when under load.
<fdobridge>
<Sid> opengl games?
<fdobridge>
<redsheep> Well I typically test minecraft when I am quickly checking a game but I don't think it's specific to that, I think it's something hitching the display server (edited)
<fdobridge>
<Sid> so opengl games, got it
<fdobridge>
<Sid> (since they go through zink)
<fdobridge>
<redsheep> Right but with how the freezing behaves it's clearly not the game freezing, it's the game loading *something* and causing the entire session to briefly stall
<fdobridge>
<Sid> fwiw MR 33855 regressed plasma wayland for me
<fdobridge>
<Sid> oh wait wtf
<fdobridge>
<Sid> nvm I'm still on x11 apparently (edited)
<fdobridge>
<Sid> :doomthink:
<fdobridge>
<redsheep> So far I am testing x11 as well since that is the more stable session for me and that is the baseline for most of my testing, I will do wayland next. I believe I have confirmed the hanging is not specific to gl games going through zink
<fdobridge>
<Sid> ogay
<fdobridge>
<gfxstrand> I'm starting to think there's something even more seriously wrong with your multi-monitor setup. Possibly affecting other stuff. If Firefox is pulling from other windows, that's either memory that should have been zeroed and wasn't or something is broken with process isolation.
<fdobridge>
<redsheep> It's an X session so I just assumed it was compositing breaking, but now that you mentioned it if it is app specific that does seem bad
<fdobridge>
<gfxstrand> Yeah, with X weird things can happen
<fdobridge>
<redsheep> Moving onto wayland now
<fdobridge>
<redsheep> Oh dear. Um. Yeah I can't test a damn thing
<fdobridge>
<mhenning> the nouveau kernel driver doesn't zero vram. It's pretty common to get garbage across processes with it
<fdobridge>
<gfxstrand> I was seeing some GPU hangs with Zink on ANV today, which I think are unrelated to the WSI stuff I fixed. We may not be out of the woods on Zink bugs just yet.
<fdobridge>
<redsheep> The Wayland session is pretty angry
<fdobridge>
<gfxstrand> Uh... Pretty colors?
<fdobridge>
<Sid> bisexual lighting
<fdobridge>
<Sid> anyway, reboot and log in again
<fdobridge>
<redsheep> I think it's my background just going apeshit
<fdobridge>
<Sid> there's a plasma bug that breaks the wayland session if you log out of an x11 session and switch to wayland
<fdobridge>
<Sid> I've hit it on nv prop too, where it fails to put up the wayland session entirely
<fdobridge>
<redsheep> Wow this lack of zeroing is really something. I soft rebooted so I could have a fresh go at x11 but I accidentally did Wayland again, and it shows the same pixels... Including the ones that appear to be vram from discord... From before the soft reboot
<fdobridge>
<redsheep> If so that bug also survives soft reboots
<fdobridge>
<Sid> session specific?
<fdobridge>
<redsheep> Wayland was the first session for that
<fdobridge>
<Sid> ..nuts
<fdobridge>
<Sid> bah
<fdobridge>
<redsheep> Well it was x11 since Wayland is kill
<fdobridge>
<Sid> debuginfod is taking five billion years to download
<fdobridge>
<redsheep> I'll try a complete reboot and start with wayland
<fdobridge>
<redsheep> Just have to wait 75 years for my bios
<fdobridge>
<Sid> w/ the all-the-fixes branch my x11 session has become worse :ahh:
<fdobridge>
<Sid> kwin has honest-to-god crashes now
<fdobridge>
<redsheep> Oh wow, I'm not crashing at all. How are we all seeing such wildly different results?
<fdobridge>
<Sid> am🅱️ere vs a🇼a
<fdobridge>
<Sid> /j
<fdobridge>
<redsheep> Yeah no, Wayland first hits the same issue, just a lot more black without lots of garbage already in vram
<fdobridge>
<redsheep> Interestingly this profoundly corrupt session is actually functional, I just have to be kind of blindly using it. I managed to get Firefox open and um. Yeah I can't tell if it's flickering through the more complete corruption covering it
<fdobridge>
<Sid> funnily enough
<fdobridge>
<Sid> sddm-wayland is perfectly fine
<fdobridge>
<redsheep> How do
<fdobridge>
<Owo> I'd try gamescope with an OpenGL client (fuck it, Vulkan too, just to see if anything changes), and I'm pretty sure there are other KMS clients out there
<fdobridge>
<Owo> That one is what I'm talking about ^
<fdobridge>
<Owo> Not sure if it uses a graphics API tho
<fdobridge>
<mhenning> Are y'all connecting your displays to your nvidia gpu, or is a different card driving the display?
<fdobridge>
<Sid> :frog_nvidia:
<fdobridge>
<Sid> over displayport
<fdobridge>
<Owo> Ah, yeah, gl, it's good then
<fdobridge>
<redsheep> No other GPU enabled
<fdobridge>
<Sid> ~~gamescope is also a compositor btw~~
<fdobridge>
<gfxstrand> I think some of what we're seeing is some sort of KMS interaction.
<fdobridge>
<gfxstrand> Does Weston work?
<fdobridge>
<Sid> let me check
<fdobridge>
<Owo> That's partially why I recommended gamescope, because I expected it to be simpler than kwin or mutter in terms of how it does things under kms
<fdobridge>
<gfxstrand> (I ask because that's a little easier to repro with than GNOME or KDE)
<fdobridge>
<Owo> Weston completely left my mind :wires:
<fdobridge>
<redsheep> Appears the kmscube aur package doesn't work
<fdobridge>
<Sid> BESTon seems to be fine
<fdobridge>
<Sid> like, perfectly fine
<fdobridge>
<Sid> let's try sway for the hell of it
<fdobridge>
<Sid> wlroots based
<fdobridge>
<Sid> and also try gnome
<fdobridge>
<Sid> :ha:
<fdobridge>
<redsheep> Um I don't have Weston but I have openbox? It's just a black screen with cursor but I don't remember how to use this at all so for all I know that's normal
<fdobridge>
<gfxstrand> So that's definitely a tiling issue. It's rendering to tiled and displaying linear or vice versa. The good news is that shouldn't be total hell to track down if we can reproduce with something I can reasonably debug.
<fdobridge>
<Sid> openbox is x11 however
<fdobridge>
<redsheep> Oh like 3 minutes later it went gray and started responding to the right click context menu, it does work. It's odd that it was delayed
<fdobridge>
<Sid> does look like it, yeah
<fdobridge>
<Sid> gonna try sway and mutter too
<fdobridge>
<Sid> to check one wlroots comp and mutter
<fdobridge>
<Sid> sway is also fine
<fdobridge>
<gfxstrand> As long as it doesn't crash, debugging KWin might not be terrible. I haven't done much with it before, though.
<fdobridge>
<Sid> kwin does crash on x11 sometimes, wayland doesn't however
<fdobridge>
<Sid> for me at least
<fdobridge>
<redsheep> Fascinating, openbox seems to flicker Firefox more rapidly than plasma did, but it's about the same effect otherwise
<fdobridge>
<redsheep> I think that may just be the compositor going faster
<fdobridge>
<Sid> ~~mother~~ mutter is also fine
<fdobridge>
<gfxstrand> If it's a regression, you might be able to bisect. Nothing I did today affects tiling.
<fdobridge>
<Sid> so looks like only 🅱️lasma is running into this tiling issue
<fdobridge>
<Sid> will do
<fdobridge>
<Sid> though, it only broke on your branch :D
<fdobridge>
<Sid> can try current main again to confirm
<fdobridge>
<gfxstrand> Did you test main from today?
<fdobridge>
<gfxstrand> Because I'm gonna be surprised if a tiling issue bisects to one of today's patches.
<fdobridge>
<Sid> on it
<fdobridge>
<gfxstrand> Anyway, I'm off to sleep now. I'll check back in in the morning.
<fdobridge>
<Sid> have yesterday's main on disk, will check today's
<fdobridge>
<Sid> goob night :hug:
<fdobridge>
<Sid> it is a regression
<fdobridge>
<Sid> happened somewhere in the last 50 commits
<fdobridge>
<ermine1716> Good night
<fdobridge>
<Sid> this regression might be unrelated to mesa
<fdobridge>
<Sid> :doomthink:
<fdobridge>
<Sid> euh
<fdobridge>
<Sid> :ConfusedDoggy:
<fdobridge>
<Sid> oh, nvm
<fdobridge>
<Sid> am dumb
<fdobridge>
<Sid> time to restart this bisect
<fdobridge>
<ermine1716> firefox doesn't seem to crash anymore for me
<fdobridge>
<ermine1716> i can try to run sway
<fdobridge>
<ermine1716> it seems to work
<fdobridge>
<Sid> tempted to buy a fingerprint reader for my pc so I don't have to enter my password so many times
<fdobridge>
<ermine1716> i have one on my laptop but i don't use it
<fdobridge>
<ermine1716> KWin crashes though
<fdobridge>
<Sid> @gfxstrand here's the offending commit that breaks kwin wayland
<fdobridge>
<Sid> ```
<fdobridge>
<Sid> df1ff3c711459467432fcd48f7348a8aa78de814 is the first bad commit
<austriancoder>
DodoGTA: looking at mesa's sources I do not see any clear path where libglvnd is used for opengl entry points (without using glx)
Sid127 has joined #zink
<fdobridge>
<Sid> austriancoder: have you looked at the glvnd source/readme?
<fdobridge>
<Sid> > libglvnd is a vendor-neutral dispatch layer for arbitrating OpenGL API calls between multiple vendors. It allows multiple drivers from different vendors to coexist on the same filesystem, and determines which vendor to dispatch each API call to at runtime.
<fdobridge>
<Sid> > Both GLX and EGL are supported, in any combination with OpenGL and OpenGL ES.
<austriancoder>
I have even built it from the sources for my non-linux target :)
<austriancoder>
and even gdb'ed into it
<austriancoder>
and even looked at the mesa sources I am building
<austriancoder>
that's why I am asking such a technical question
<austriancoder>
for libegl i see how its done in mesa - but not for libgl without any glx impl
<zmike>
probably ask in #dri-devel
<austriancoder>
zmike: will do
<fdobridge>
<Sid> ah, apologies 😅
<austriancoder>
np - been doing too much gdb'ing lately .. need a coffee :)
<fdobridge>
<gfxstrand> @zmike. Is that patch doing what I think it's doing? If Zink gets a dma-buf import without modifiers specified by the client, it now looks at the gallium driver and uses whatever modifiers it supports? If so, that's very very bogus.
<fdobridge>
<zmike.> **export**
<fdobridge>
<gfxstrand> That's still very surprising from an EGL client's PoV. It created an image without modifiers, goes to export it, and boom! modifiers.
<fdobridge>
<fooishbar> that's why that extension is criminally terrible and should not exist
<fdobridge>
<gfxstrand> Really, EGLImage is the terrible thing.
<fdobridge>
<fooishbar> EGLImage is fine
<fdobridge>
<fooishbar> just use it as a reference to something you've exported, rather than trying to turn it into some kind of external-facing allocation API
<fdobridge>
<fooishbar> anyway, I have nfi the implications of that commit, but if it's to support that export, it should only apply to a resource which was created internally with no modifiers set, and also never previously exported (edited)
<fdobridge>
<zmike.> and this is what it does
<fdobridge>
<zmike.> it's for enabling compression on CL exports
<fdobridge>
<zmike.> though also it will potentially impact anyone using this
<fdobridge>
<fooishbar> I can't see KWin calling ExportDMABUF thus far
<fdobridge>
<zmike.> but idgaf too much there
<MoeIcenowy>
EGLImage is surely the terrible thing
<MoeIcenowy>
but it has to exist
<MoeIcenowy>
KWin uses it to import windows contents
<MoeIcenowy>
Xorg uses it to implement front buffer rendering with EGL
<fdobridge>
<fooishbar> @tiredchiku I think you want to insert a breakpoint in the changed bit and get a backtrace from there
<MoeIcenowy>
EGL_MESA_image_dma_buf_export isn't so bad then -- it does not promise success
<fdobridge>
<gfxstrand> That would help. Though I think I may have an idea what's going on.
<fdobridge>
<fooishbar> it doesn't seem like KWin calls ExportDMABUF anywhere, and it's presumably not trying to share an image between GL & CL, so presumably 'let's funge modifiers' is happening in a context it shouldn't
<MoeIcenowy>
and surely we should be able to export something
<fdobridge>
<fooishbar> export it to where?
<MoeIcenowy>
any other APIs
<MoeIcenowy>
or even other devices
<fdobridge>
<fooishbar> right
<fdobridge>
<fooishbar> so do you export in the optimal format that is going to work well for you, or do you assume everyone always supports linear so only ever export to that?
<MoeIcenowy>
Well you can say that EGLImages should be allocated with specialized APIs
<MoeIcenowy>
such as GBM
<fdobridge>
<fooishbar> either you're broken or you're slow
<fdobridge>
<fooishbar> which is why every other API has explicit negotiation
<fdobridge>
<gfxstrand> @zmike. What's the difference between `zink_resource` and `zink_resource_object`?
<fdobridge>
<Sid> :birdnotes:
<fdobridge>
<zmike.> the former is the public state tracker object, the latter is the internal state tracker object (edited)
<MoeIcenowy>
fooishbar: for other APIs on the same device, the answer is the former; for other devices, it's the latter
<fdobridge>
<Sid> I'll do it tomorrow morning, I was gonna go ~~play~~ test a game for a bit and then wind down with S2 of The Expanse
<fdobridge>
<fooishbar> yes, but when you say 'export this image as a dmabuf', you don't know which device(s) or API(s) it'll be imported into
<fdobridge>
<Sid> (it's 2104 where I am)
<fdobridge>
<fooishbar> so Mesa can pick the option which might not work, or the option that will be slow
<fdobridge>
<zmike.> S2 is when it starts to get good
<fdobridge>
<fooishbar> hence why that API is fundamentally misconceived
<MoeIcenowy>
well, do we have anything more than EGL_MESA_image_dma_buf_export ?
<fdobridge>
<fooishbar> gbm
<fdobridge>
<Sid> I've heard 😅
<MoeIcenowy>
gbm can only allocate things
<fdobridge>
<fooishbar> also vk alloc, or pretty much everything else
<fdobridge>
<fooishbar> yes
<MoeIcenowy>
it cannot export things already in some API
<MoeIcenowy>
(and I think the implementation of gbm is also a hack)
<fdobridge>
<fooishbar> that's why you allocate with gbm (a real allocator) and import into whatever 'some API' is
<MoeIcenowy>
gbm cannot cooperate with another libEGL
<MoeIcenowy>
e.g. gbm is built from one Mesa version and libEGL is built from another
<fdobridge>
<fooishbar> you can use gbm to allocate without having to touch any of the parts of GBM which interact with EGL
<MoeIcenowy>
then export dmabuf from gbm and import dmabuf in EGL?
<fdobridge>
<fooishbar> anyway, between an API that has an implementation which could be improved for the edge case where you have multiple versions of things and you only want to install some of them, or a completely conceptually broken API, I'm going to take the first one
<fdobridge>
<fooishbar> yes, that
<fdobridge>
<Sid> just so I know I understand correctly, I have to set the breakpoint w/ `break add_resource_bind`?
<MoeIcenowy>
BTW I think EGL_KHR_platform_gbm is broken too
<fdobridge>
<fooishbar> break on the line where it hits the 'I'm going to invent my own modifiers' case
<fdobridge>
<fooishbar> you don't have to use it if you don't like it?
<fdobridge>
<zmike.> it's a bit more complex; you need to also check what modifier the import is using
<fdobridge>
<Sid> okay, I think I got you
<fdobridge>
<Sid> hmm
<MoeIcenowy>
fooishbar: well yes
<fdobridge>
<zmike.> I'm guessing kwin is assuming that if it didn't explicitly create with a modifier then it's safe to use LINEAR
<fdobridge>
<fooishbar> which definitely seems like an improvement over swrast
<MoeIcenowy>
(offtopic) I start to wonder whether the usage of EGL_KHR_platform_gbm in Glamor is valid -- it does not use gbm_surface at all, only gbm_bo, but I don't think the definition of EGL_KHR_platform_gbm mentions BO at all
<fdobridge>
<fooishbar> you need an EGLDisplay to have a GL context, and you need a GL context to do any rendering
<fdobridge>
<gfxstrand> I've definitely found their GBM code
<fdobridge>
<gfxstrand> Okay, there's the EGL code
<fdobridge>
<gfxstrand> Okay, so here's a theory: Maybe they support modifiers but only for clients and not for scanout and they're assuming that a blind `gbm_bo_create()` and `gbm_bo_get_fd()` will give them an image they can pass to KMS?
<fdobridge>
<gfxstrand> That's starting to look likely...
<fdobridge>
<zmike.> the world is not ready for zink
<fdobridge>
<gfxstrand> Nah, the legacy paths just aren't capable of modifiers.
<fdobridge>
<gfxstrand> The legacy paths suck
<fdobridge>
<gfxstrand> And I'm a little annoyed kwin is maybe still using them.
<MoeIcenowy>
BTW I wonder why Zink cannot work with an X server capable of modifiers
<MoeIcenowy>
but it works when the DDX gains a dummy support for modifiers (which only allows linear/invalid)
<MoeIcenowy>
well I may say Kopper instead of Zink here?
<fdobridge>
<zmike.> ...?
<MoeIcenowy>
s/may say/should say/
<fdobridge>
<fooishbar> `src/core/gbmgraphicsbufferallocator.cpp` calls `gbm_bo_create_with_modifiers()`, if any modifiers are advertised by the implementation (edited)
<fdobridge>
<zmike.> did you perhaps mean why zink does not work on an xserver that is NOT capable of modifiers?
<fdobridge>
<zmike.> because if so, it's the same issue that is being investigated now
<MoeIcenowy>
zmike.: in the case that things are NOT capable of modifiers, should everything be linear?
<fdobridge>
<zmike.> ideally, though some drivers do not universally interpret INVALID = LINEAR
<MoeIcenowy>
or is everything some undefined format that can be exported by X and imported by app by accident?
<fdobridge>
<zmike.> without explicit modifiers, DRM_FORMAT_MOD_INVALID is used
<fdobridge>
<zmike.> which is "idklol maybe use a modifier or don't?"
<fdobridge>
<zmike.> zink maintains a list of platforms which have consistent behavior here
<MoeIcenowy>
so are these "some drivers" the reason explicit LINEAR is here?
<fdobridge>
<zmike.> drivers which aren't on the list don't work
<fdobridge>
<zmike.> and need to use LIBGL_KOPPER_DRI2=1
<MoeIcenowy>
well my dummy DRI 1.2 implementation only returns INVALID here...
<fdobridge>
<zmike.> yes, so you need to set that env var
<fdobridge>
<zmike.> or, if your platform guarantees that INVALID==LINEAR, add yourself to the list of allowed drivers
<MoeIcenowy>
zmike: well with this env var I can get around w/o LIBGL_KOPPER_DRI2
<MoeIcenowy>
even without LINEAR (although I know these buffers are linear, because they are used for exchanging data between three IP cores from two vendors: a 3D GPU from IMG and a 2D GPU / display from VeriSilicon)
<fdobridge>
<zmike.> you can try adding your driver to the `can_do_invalid_linear_modifier` case and see what happens
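A rough sketch of the INVALID-versus-LINEAR distinction being discussed. The two modifier values are the ones defined in the kernel's `drm_fourcc.h`; everything else (the allowlist contents, the driver names, the function name) is hypothetical and is not Zink's actual `can_do_invalid_linear_modifier` logic.

```python
# Values as defined in the kernel's drm_fourcc.h.
DRM_FORMAT_MOD_LINEAR = 0x0
DRM_FORMAT_MOD_INVALID = 0x00FFFFFFFFFFFFFF

# Hypothetical allowlist: drivers assumed to guarantee that a buffer with no
# explicit modifier (INVALID) is laid out linearly. Names are placeholders.
INVALID_MEANS_LINEAR = {"driver_a", "driver_b"}


def can_treat_as_linear(modifier, driver):
    """Decide whether a buffer may be read as linear (illustrative only)."""
    if modifier == DRM_FORMAT_MOD_LINEAR:
        # Explicitly linear: safe on any driver.
        return True
    if modifier == DRM_FORMAT_MOD_INVALID:
        # "No modifier given": the layout is whatever the driver chose, so
        # only trust it on drivers known to behave consistently.
        return driver in INVALID_MEANS_LINEAR
    # Any other explicit modifier describes some non-linear layout.
    return False
```

The point of the model is only that INVALID carries no layout information by itself; whether it happens to mean linear is a per-driver promise, which is why an allowlist (or an override such as `LIBGL_KOPPER_DRI2=1`) comes up at all.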
<fdobridge>
<gfxstrand> Yeah, reading the code it looks like kwin should be doing modifiers all the way through. I think we need a backtrace to know what's going on.
<fdobridge>
<gfxstrand> It's more "This is an ancient decision, made by monks in the Himalayas that somehow the kernel and userspace have agreed on and it probably works for scanout." (edited)
<fdobridge>
<zmike.> were there any example apps that are supposed to exhibit the tiling issue?
<fdobridge>
<zmike.> I'm trying weston demo apps on kwin+drm and it seems to work
<fdobridge>
<Sid> no, it's kwin itself
<fdobridge>
<zmike.> hm
<fdobridge>
<zmike.> seems to be working on radv, but maybe I need to do a full plasma session and not just kwin
<fdobridge>
<zmike.> plasma works great too
<fdobridge>
<Sid> :hmmmg:
<fdobridge>
<Sid> nvk issue? :doomthink:
<fdobridge>
<zmike.> installing there next
<fdobridge>
<zmike.> can't actually get it to start :migraine:
<fdobridge>
<zmike.> oh I see, on radv it wasn't actually using the mesa I told it to use
<fdobridge>
<zmike.> ...and even when I tell it to use the right mesa it still refuses
<fdobridge>
<zmike.> think that's a sign that it's time to go back to the things I'm able to sometimes handle: eating lunch
<fdobridge>
<redsheep> The sometimes is concerning. I wonder if that's the same qt bug that's causing us to have to set the vulkan icds explicitly?
DodoGTA has left #zink [#zink]
<fdobridge>
<gfxstrand> kwin is working okay for me
<fdobridge>
<gfxstrand> But I think it uses logind to get access to the tty and that might be scrubbing things
<fdobridge>
<gfxstrand> But again, it's not scrubbing `VK_ICD_FILENAMES` and yet it's loading the system Vulkan driver. This all makes no sense!
<fdobridge>
<gfxstrand> Looks like it talks to logind itself so no library loaded for that
<fdobridge>
<!DodoNVK (she) 🇱🇹> It looks like the Vulkan Mesa device select layer may be problematic with zink in some cases (its Wayland event loop may freeze a Wayland compositor)
<fdobridge>
<Owo> @gfxstrand off work now. Any new things I should test on radv+Zink?
<fdobridge>
<Owo> I'm about to see if I can repro the performance stuff in a bit and file an issue if so
<fdobridge>
<gfxstrand> Yeah, depending on when it gets invoked with respect to other stuff in the compositor setup.
<fdobridge>
<zmike.> pretty sure I already fixed all of that
<fdobridge>
<!DodoNVK (she) 🇱🇹> In Git code (or the 25.0 release)?
<fdobridge>
<zmike.> I haven't touched it in a while
karolherbst has quit [Quit: Ping timeout (120 seconds)]
<fdobridge>
<gfxstrand> Wait... Is kwin implicitly using Flatpak or something crazy like that?
<fdobridge>
<gfxstrand> I see a lot of flatpak in the strace
<fdobridge>
<redsheep> I don't think I even have flatpak installed, so I doubt it
chiku has joined #zink
karolherbst has joined #zink
<fdobridge>
<Owo> what about it?
<fdobridge>
<Owo> kwin shouldn't have anything to do with flatpak, aside from maybe security-context stuff
<fdobridge>
<gfxstrand> I saw a bunch of flatpak stuff in the strace and it threw me
<fdobridge>
<gfxstrand> I don't think it actually is.
<fdobridge>
<gfxstrand> Something somewhere is just stomping LD_LIBRARY_PATH
<fdobridge>
<gfxstrand> Okay, I think I got it to load my libEGL but now it's failing. 😢
<fdobridge>
<gfxstrand> The Vulkan loader is using `secure_getenv()`
<fdobridge>
<gfxstrand> But also, regular `getenv()` is returning NULL for `LD_LIBRARY_PATH`
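One thing that could be checked here, offered as a guess rather than a diagnosis: on glibc, `secure_getenv()` returns NULL when the process runs in secure-execution mode (`AT_SECURE` set, for example for setuid binaries or binaries with file capabilities), and the dynamic loader also drops variables such as `LD_LIBRARY_PATH` in that mode. The sketch below reads all three through ctypes. It assumes a glibc Linux system; the helper name `env_report` is made up, and nothing in the chat confirms that this is what happens with kwin.

```python
import ctypes
import os

AT_SECURE = 23  # auxiliary-vector key, from <elf.h>

# CDLL(None) exposes the symbols of the running process, including libc.
libc = ctypes.CDLL(None)
libc.getenv.restype = ctypes.c_char_p
libc.getenv.argtypes = [ctypes.c_char_p]
libc.secure_getenv.restype = ctypes.c_char_p
libc.secure_getenv.argtypes = [ctypes.c_char_p]
libc.getauxval.restype = ctypes.c_ulong
libc.getauxval.argtypes = [ctypes.c_ulong]


def _decode(value):
    # c_char_p results arrive as bytes, or None for a NULL pointer.
    return value.decode() if value is not None else None


def env_report(name):
    """Compare how one variable looks through the different lookups."""
    key = name.encode()
    return {
        "at_secure": bool(libc.getauxval(AT_SECURE)),
        "getenv": _decode(libc.getenv(key)),
        "secure_getenv": _decode(libc.secure_getenv(key)),
        "os.environ": os.environ.get(name),
    }
```

Calling `env_report("LD_LIBRARY_PATH")` and `env_report("VK_ICD_FILENAMES")` only describes the Python process it runs in, so a result from a normal shell says nothing about kwin itself; the equivalent check would have to be made from inside the affected process, for example from gdb.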
<fdobridge>
<gfxstrand> I'm pretty close to giving up for the moment
<fdobridge>
<gfxstrand> Or maybe I have to just bite my lip and install system-wide
<fdobridge>
<gfxstrand> Here's hoping I don't regret this...
<fdobridge>
<gfxstrand> Okay, system-wide install and I'm now getting my custom Zink+NVK and... kwin is fine.
<fdobridge>
<zmike.> what about:
<fdobridge>
<zmike.> the issue is caused by kwin being system mesa and apps being non-system mesa
<fdobridge>
<zmike.> and somehow it's a local issue that doesn't affect us
<fdobridge>
<gfxstrand> #11 0x00007fa5e44fa6ca in KWin::EglGbmLayer::doBeginFrame() () at /lib64/libkwin.so.6
<fdobridge>
<gfxstrand> #12 0x00007fa5e4272e88 in KWin::OutputLayer::beginFrame() () at /lib64/libkwin.so.6
<fdobridge>
<gfxstrand> #13 0x00007fa5e42617a7 in KWin::WaylandCompositor::composite(KWin::RenderLoop*) () at /lib64/libkwin.so.6
<fdobridge>
<gfxstrand> #14 0x00007fa5e155a26e in void doActivate<false>(QObject*, int, void**) () at /lib64/libQt6Core.so.6
<fdobridge>
<gfxstrand> #15 0x00007fa5e42759b4 in KWin::RenderLoop::frameRequested(KWin::RenderLoop*) () at /lib64/libkwin.so.6
<fdobridge>
<gfxstrand> #16 0x00007fa5e427a9d2 in KWin::RenderLoopPrivate::dispatch() () at /lib64/libkwin.so.6
<fdobridge>
<gfxstrand> #17 0x00007fa5e155a26e in void doActivate<false>(QObject*, int, void**) () at /lib64/libQt6Core.so.6
<fdobridge>
<gfxstrand> Looks like it's using `gbm_bo_create()` for display buffers and not using modifiers
<fdobridge>
<gfxstrand> Which works fine on every driver except Zink
<fdobridge>
<gfxstrand> Why isn't it using modifiers? I have no idea...
<fdobridge>
<gfxstrand> Pardon me while gdb downloads debug symbols...
<fdobridge>
<gfxstrand> And it's KDE so it's a lot of C++. :frog_weary:
<fdobridge>
<gfxstrand> Okay, I know what's going on.
<fdobridge>
<gfxstrand> Yeah, we need to just revert that commit
<fdobridge>
<zmike.> But why?
<fdobridge>
<gfxstrand> When KWin can't find usable atomic mode-setting, it disables modifiers for display and falls back to the legacy APIs. This means `gbm_bo_create()` and `gbm_bo_get_fd()` without querying the modifier. This is perfectly reasonable because those APIs are expected to magically create scanout-capable buffers which you can pass off to KMS. When Zink picks a random modifier and assigns it, there's no guarantee that the resulting buffer will be scan
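The fallback described above can be pictured with a small model. It is only a sketch of the behaviour as described in this chat, not KWin or Mesa code: the function names, the `IMPLICIT` marker, and the layout strings are all invented for illustration, and only `gbm_bo_create()` and `gbm_bo_create_with_modifiers()` are real GBM entry points (named here as strings, not called).

```python
IMPLICIT = None  # no modifier negotiated; the layout is left to the driver


def pick_allocation(atomic_modeset_usable, advertised_modifiers):
    """Which GBM entry point a compositor would use (toy model)."""
    if atomic_modeset_usable and advertised_modifiers:
        # Modifier-aware path: allocator and display agree on a layout.
        return ("gbm_bo_create_with_modifiers", list(advertised_modifiers))
    # Legacy path: no modifier is requested or queried afterwards.
    return ("gbm_bo_create", IMPLICIT)


def scanout_is_safe(requested, actual_layout, implicit_scanout_layout):
    """Does the layout the allocator really used match what display expects?

    requested               -- modifier list, or IMPLICIT for the legacy path
    actual_layout           -- layout the allocator ended up using
    implicit_scanout_layout -- layout the legacy display path assumes
    """
    if requested is IMPLICIT:
        return actual_layout == implicit_scanout_layout
    return actual_layout in requested
```

With `requested` left as `IMPLICIT`, the buffer is only displayable if the allocator sticks to the layout the legacy path assumes. A driver that quietly assigns some other modifier to such a buffer makes `scanout_is_safe` false in this model even though the compositor did nothing unusual.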
<fdobridge>
<zmike.> Hmmm
<fdobridge>
<zmike.> Seems like that should just add a case depending on the export handle type
<fdobridge>
<gfxstrand> Well, the handle type in this case will be dma-buf
<fdobridge>
<zmike.> 🤕
<fdobridge>
<zmike.> Alright so @karolherbst you really need to plumb that CL screen create param
<fdobridge>
<karolherbst> for the refcounting rework you are doing?
<fdobridge>
<karolherbst> why is that such a mess
<fdobridge>
<gfxstrand> Why can we not tell the difference between CL<->GL sharing and GBM?
<fdobridge>
<gfxstrand> Why does CL<->GL sharing have to go through the import/export APIs anyway? Isn't this just for Zink to share with itself?
<fdobridge>
<karolherbst> we could pass in a special usage flag?
<fdobridge>
<karolherbst> isolation mostly, but also because it's all a pain
<fdobridge>
<zmike.> Yeah maybe add a different handle type or something
<fdobridge>
<karolherbst> just a custom `PIPE_HANDLE_USAGE_*` no?
<fdobridge>
<karolherbst> though I think dma-buf is saner here, because I don't really want to share the pipe_resource the GL side is using and hope that nothing fucks it up badly
<fdobridge>
<karolherbst> but I'm considering doing it anyway, because mipmaps are pure pain (tm)
<fdobridge>
<gfxstrand> There's no safe way to share images without sharing the pipe_resource
<fdobridge>
<gfxstrand> Not outside of 2D and maybe 2D arrays (but I wouldn't bet on it)
<fdobridge>
<gfxstrand> CL<->GL sharing was a mistake
<fdobridge>
<karolherbst> that's why zink does linear except for 2D
<fdobridge>
<karolherbst> or something
<fdobridge>
<gfxstrand> The one thing NVIDIA can do linear on. :frog_upside_down:
<fdobridge>
<karolherbst> there are new sharing extensions and they just use dma-buf on linux. They were mostly added for vulkan
<fdobridge>
<gfxstrand> Yes there are
<fdobridge>
<gfxstrand> Those are basically the CL version of GL_EXT_external_objects
<fdobridge>
<gfxstrand> Only maybe better? I'm not actually sure if they're better.
<fdobridge>
<karolherbst> yeah.. and atm they force linear, because somehow modifiers aren't part of it
<fdobridge>
<gfxstrand> But they're definitely more sane than the legacy mess.
<fdobridge>
<karolherbst> oh, for sure
<fdobridge>
<karolherbst> but also they don't support modifiers yet
<fdobridge>
<gfxstrand> Last I knew, they used the same driver/deviceUUID stuff and did it the Vulkan way.
<fdobridge>
<karolherbst> well.. you just tell the CL runtime to import an fd
<fdobridge>
<gfxstrand> But it's been a loooong time since I looked at a draft of that extension
<fdobridge>
<gfxstrand> Like Ben and I talked about it 6 years ago sort of long. 😅
<fdobridge>
<karolherbst> at least there is also an external semaphore part of it, so syncing isn't a disaster
<fdobridge>
<karolherbst> at some point I want to support it, but we'd still end up doing dma-buf
<fdobridge>
<karolherbst> so not sure if special casing GL is really such a great idea there
<fdobridge>
<karolherbst> also because I think having proper isolation there is actually not a terrible idea... it's just...
<fdobridge>
<karolherbst> if you ignore zink it all works well
<fdobridge>
<karolherbst> just zink is a problem child
<fdobridge>
<gfxstrand> Hey, look! My name is on that extension. 🙈
<fdobridge>
<karolherbst> native gallium drivers can just give all the information to properly import the resource
<fdobridge>
<karolherbst> there is `mesa_glinterop_export_out` and `mesa_glinterop_export_in` to deal with all that
<fdobridge>
<gfxstrand> Yeah, so if it's imported with `CL_EXTERNAL_MEMORY_HANDLE_OPAQUE_FD_KHR`, it works the same way as Vulkan. You just assume that, given identical parameters, the two drivers will compute identical image layouts, cross your fingers, and hope for the best.
<fdobridge>
<karolherbst> well..
<fdobridge>
<karolherbst> sure, but...
<fdobridge>
<karolherbst> I think the only sane way is to do linear unless you use private exts to transport the metadata
<fdobridge>
<gfxstrand> Nope
<fdobridge>
<gfxstrand> You assume both drivers do identical calculations
<fdobridge>
<karolherbst> pain
<fdobridge>
<gfxstrand> Yup
<fdobridge>
<karolherbst> I'd rather use a private ext as we do for cl_gl_sharing 🙃
<fdobridge>
<gfxstrand> That's why all the drivers have image layout libraries that are separate from the GL/Vulkan driver now.
<fdobridge>
<gfxstrand> One of the reasons, anyway.
<fdobridge>
<gfxstrand> But, annoyingly, there are ways that those can break because image creation is annoyingly complicated.
<fdobridge>
<karolherbst> I'm wondering why we only have one stride field and not two 🙃