<fdobridge>
<zmike.> this isn't too surprising to me
<fdobridge>
<zmike.> zink is mostly used for gaming and not on Wayland
<fdobridge>
<gfxstrand> @orowith2os In your testing, were you running just firefox with Zink or both firefox and the compositor?
<fdobridge>
<gfxstrand> Yeah, and EGL is hard and barely tested. The only way any of this shit is robust on other GL drivers is because users get annoyed and file bugs if we break it. And... that's exactly what's happening with zink right now. 😂
<fdobridge>
<zmike.> yup
<fdobridge>
<zmike.> that's the only reason it works at all right now
<fdobridge>
<zmike.> see also: the massive pile of closed wayland/egl/kopper/zink tickets
<fdobridge>
<zmike.> bugs drive technology
<fdobridge>
<gfxstrand> I tried for a while at Intel to convince people we needed to write a Wayland E2E test suite. I even told them how to do it. And then management decided that what we really should be testing was SurfaceFlinger. And then the team got moved on to something else.
<fdobridge>
<gfxstrand> Womp womp
<fdobridge>
<zmike.> sounds right
<fdobridge>
<zmike.> I tried doing a compositor test suite when I was at Samsung
<fdobridge>
<zmike.> but I had more autonomy than most
<fdobridge>
<ermine1716> since i've got this bug too, i'll answer: firefox crashes in both cases
<fdobridge>
<gfxstrand> Yup. GNOME on nouveau GL and firefox on Zink and everything goes blinky blinky
<fdobridge>
<gfxstrand> And it's weird because it's blinking in rectangles:
<fdobridge>
<gfxstrand> @redsheep Want to give that a go ^^
<fdobridge>
<gfxstrand> @orowith2os ^^
<fdobridge>
<gfxstrand> Using everyone's favorite solution to bugs: Just delete all the hand-rolled math.
<fdobridge>
<gfxstrand> I would love it if this fixes Discord, too
<fdobridge>
<redsheep> Yeah, let's hope. I can test sometime this evening.
<fdobridge>
<redsheep> Other than the Qt-caused bug when you have the Nvidia driver installed, and whatever is going on with the Plasma display setup, I think that probably gets the Zink session working just as well as nouveau, if it works.
<fdobridge>
<gfxstrand> I'm hoping so
<fdobridge>
<redsheep> I hammered on it for a bit trying to find other flaws and pretty much didn't. At least, none worth noting over stuff nouveau gl does.
<fdobridge>
<gfxstrand> I'd love it if we can say Mesa 25.0.1 has a usable Zink
<fdobridge>
<gfxstrand> I might blog about this
<fdobridge>
<redsheep> Defaulting next cycle might be good
<fdobridge>
<ermine1716> I'm compiling it rn
<fdobridge>
<gfxstrand> Yeah, we won't default in a stable build
<fdobridge>
<gfxstrand> But if y'all can do some testing and give us the 👍🏻 then maybe we can switch the default in main now and let it bake for a couple months between now and the 25.1 branchpoint.
<fdobridge>
<redsheep> That would be awesome
<fdobridge>
<ermine1716> firefox flickers went away
<fdobridge>
<ermine1716> it's still crashing tho
<fdobridge>
<redsheep> Can you get a backtrace? I've never tried with a browser, and mine wasn't crashing
<fdobridge>
<ermine1716> ... but this patch wasn't supposed to fix crashes anyway
<fdobridge>
<gfxstrand> Yeah, I think the crashes are different. Backtrace would be good
<fdobridge>
<gfxstrand> Looks about right. But before my MR, the containers were the right size, just sideways and not actually in the boats. :frog_upsidedown:
<fdobridge>
<zmike.> magnets.
<fdobridge>
<redsheep> Magnetized to the bottom of the boats? That's cool
<fdobridge>
<ermine1716> if I attach a file, it shouldn't be printed on irc side, right?
<fdobridge>
<redsheep> It will leave a discord link on that end that irc users can click to download it
<fdobridge>
<gfxstrand> @zmike. Do most apps get the kopper path or do we prefer direct import/export when modifiers are in play?
<fdobridge>
<zmike.> it varies?
<fdobridge>
<zmike.> dmabuf import/export is usually a platform or app thing
<fdobridge>
<zmike.> so like on some platforms you always get pre-allocated dmabufs from the loader to import
<fdobridge>
<gfxstrand> Maybe I just need to run FF in Zink for a bit and see if I can catch the blow-up
<fdobridge>
<zmike.> reasonable chance
<fdobridge>
<gfxstrand> Here's hoping it reproduces on Zink+ANV (it probably does)
<fdobridge>
<zmike.> ahhhh I remember when I could do fun things like debug apps
* fdobridge
<zmike.> sighs wistfully
<fdobridge>
<Owo> The latter, but I should probably test the former too. I also see you have a patch for me to test, I'll give it a shot.
<fdobridge>
<gfxstrand> Yeah, it crashes instantly on ANV+Zink
<fdobridge>
<gfxstrand> The crash only happens when the compositor supports explicit sync so nouveau GL for the compositor won't trigger it.
<fdobridge>
<gfxstrand> But I can crash with ANV+Zink
<fdobridge>
<Owo> Also for Zink system-wide, the overview in gnome refuses to go brrrrr with more than a few windows open
<fdobridge>
<gfxstrand> I suspect it's something with swapchain recreation
<fdobridge>
<Owo> Not a new issue iirc
<fdobridge>
<gfxstrand> Okay, that's probably an issue with stuff getting pushed out to system RAM when it shouldn't be.
<fdobridge>
<Owo> Where did you come from, where did you go? Where did you come from, gnome Wayland overview stutters :wires:
<fdobridge>
<Owo> You want this on Firefox or system-wide?
<fdobridge>
<gfxstrand> Start with firefox
<fdobridge>
<gfxstrand> System-wide and I think you'll still see the crash
<fdobridge>
<gfxstrand> But it won't hurt system-wide
<fdobridge>
<Owo> Alrighty, give me a bit
<fdobridge>
<Owo> The radeonsi maths should be fine, right? I only need the Zink changes?
<fdobridge>
<gfxstrand> I have no idea
<zmike>
radeonsi doesn't implement
<fdobridge>
<gfxstrand> Yeah, radeonsi ignores it
<fdobridge>
<Owo> Wait, what am I talking about. What does radeonsi even have to do with Zink here. Radeonsi is fine, I think.
<fdobridge>
<gfxstrand> Zink+RADV won't ignore it but radeonsi does
<fdobridge>
<Owo> I just woke up from a 12hr eep, so I'm running slow :ferrisCozy:
<zmike>
radeonsi too strong to need partial updates
<fdobridge>
<ermine1716> Plasma doesn't start for me
<fdobridge>
<Owo> But would radeonsi go even faster if it had partial updates :akipeek:
<zmike>
impossible
<fdobridge>
<gfxstrand> Maybe?
<fdobridge>
<gfxstrand> Probably not
<fdobridge>
<gfxstrand> It's mostly for GLES hardware to let it avoid touching too much memory
<fdobridge>
<gfxstrand> IMRs don't care
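A back-of-the-envelope sketch of why partial updates matter mostly to tilers: a tile-based GPU loads and stores whole tiles, so given a damage region it only has to touch the tiles that region overlaps, while an IMR writes pixels in place either way. The helpers `tile_grid` and `tiles_touched` are hypothetical, and the 16x16 tile size is an illustrative assumption rather than any particular GPU's value.

```python
from typing import Iterable, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def tile_grid(width: int, height: int, tile: int) -> Tuple[int, int]:
    """Number of tile columns and rows needed to cover the whole surface."""
    return ((width + tile - 1) // tile, (height + tile - 1) // tile)


def tiles_touched(width: int, height: int, tile: int, damage: Iterable[Rect]) -> int:
    """Count distinct tiles that intersect at least one damage rectangle."""
    touched = set()
    for x, y, w, h in damage:
        # Clamp the rectangle to the surface; skip empty or off-screen rects.
        x0, y0 = max(x, 0), max(y, 0)
        x1, y1 = min(x + w, width), min(y + h, height)
        if x0 >= x1 or y0 >= y1:
            continue
        # Map the inclusive pixel span [x0, x1 - 1] x [y0, y1 - 1] to tile indices.
        for ty in range(y0 // tile, (y1 - 1) // tile + 1):
            for tx in range(x0 // tile, (x1 - 1) // tile + 1):
                touched.add((tx, ty))
    return len(touched)
```

For a 1920x1080 surface with 16x16 tiles, `tile_grid` gives 120 x 68 = 8160 tiles, while a single 64x32 damage rect at (100, 100) touches only 15 of them.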
<fdobridge>
<Sid> building to test on x11 rn
<fdobridge>
<gfxstrand> Okay, it's definitely crashing on the swapchain re-create path. This'll be entertaining...
<fdobridge>
<Sid> I've been maining nvk+zink for 3 days now
<fdobridge>
<Sid> mostly solid? there's random freezes (rare), random firefox crashes (rare) and sometimes the display doesn't come up from suspend (50-50)
<fdobridge>
<zmike.> in the immortal words of Sir Adam Jackson:
<fdobridge>
<zmike.> IT'S KOPPERIN' TIME
<fdobridge>
<gfxstrand> I'm looking at mutter code. What have you done to me?!?
<fdobridge>
<zmike.> did you switch places with the me from 10 years ago?!?
<fdobridge>
<gfxstrand> I hope not!
<fdobridge>
<zmike.> me too
<fdobridge>
<zmike.> samsung hq would not take kindly to your freewheeling, unfocused, un-product-driven approach to the graphics ecosystem
<fdobridge>
<gfxstrand> @fooishbar Is there any way to unset a time point? Or is `wl_surface.commit()` supposed to unset any previously set sync points?
<fdobridge>
<fooishbar> assuming you're talking about `wp_linux_drm_syncobj`, then: `The acquire point is double-buffered state, and will be applied on the next wl_surface.commit request for the associated surface. Thus, it applies only to the buffer that is attached to the surface at commit time.`
<fdobridge>
<gfxstrand> Yes, I read that. It doesn't answer my question
<fdobridge>
<fooishbar> define 'clear'?
<fdobridge>
<fooishbar> I mean, `If at surface commit time there is a pending buffer attached but no pending acquire timeline point set, the no_acquire_point protocol error is raised.` tells you that you need to call `set_acquire_point` for every commit, and that it doesn't carry over
<fdobridge>
<gfxstrand> Okay, so here's what's happening...
<fdobridge>
<fooishbar> or that if you aren't committing (i.e. you aren't calling `vkQueuePresentKHR`), you can just call `set_acquire_point` with a different point to replace the old one
<fdobridge>
<gfxstrand> But the reason we're hitting it is because there's a `wl_surface_commit()` which is being called even though we already called `wl_surface_commit()`.
<fdobridge>
<gfxstrand> But no one is changing buffers or time points or anything.
<fdobridge>
<gfxstrand> If I just call `wl_surface_commit()` twice, it blows up reliably
<fdobridge>
<fooishbar> I make that a Mutter bug, then
<fdobridge>
<fooishbar> clients are allowed to commit at totally arbitrary points, even through WSI/EGL, and that's OK as long as they aren't attaching a buffer
<fdobridge>
<gfxstrand> The Mesa code always calls `set_acquire_point`, `set_release_point`, and `attach` together so it's not a mesa bug. But we sometimes call `commit` twice
<fdobridge>
<fooishbar> Mesa sometimes calls commit twice ... ?!
<fdobridge>
<fooishbar> oh yeah ok, fifo
<fdobridge>
<gfxstrand> And I'm starting to think maybe someone other than Mesa is calling commit for some reason.
<fdobridge>
<gfxstrand> Like maybe FF is doing it?
<fdobridge>
<gfxstrand> That would be mean but maybe not totally invalid?
<fdobridge>
<gfxstrand> But if I make Mesa double-call commit every time, it blows up instantly
<fdobridge>
<fooishbar> yeah, that definitely sounds like a Mutter bug then
<fdobridge>
<gfxstrand> Oh joy...
<fdobridge>
<gfxstrand> Okay, I'll file a bug.
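The rules fooishbar quotes can be captured in a small model of a surface's pending state: `attach` and the timeline points stay pending until `wl_surface.commit`, each commit consumes them, and a commit only needs timeline points if a buffer is pending. Under that reading a second, bare commit is legal, which is why the blow-up looks like a Mutter bug. This is a hypothetical Python model (`SyncobjSurfaceModel` is not a real API), not Mutter's or Mesa's code; the `no_release_point` check is an assumption that the release point follows the same rule as the quoted acquire-point text.

```python
class ProtocolError(Exception):
    """Stands in for a compositor posting a protocol error (which kills the client)."""


class SyncobjSurfaceModel:
    """Toy model of the pending/current state of one wl_surface with explicit sync.

    Hypothetical: models only the rules quoted in the chat. The no_release_point
    check assumes the release point mirrors the acquire point; attaching a NULL
    buffer and any rule about points set without a buffer are not modelled.
    """

    def __init__(self):
        self.pending_buffer = None  # None means "nothing attached since last commit"
        self.pending_acquire = None
        self.pending_release = None
        self.current_buffer = None
        self.current_acquire = None

    def attach(self, buffer):
        self.pending_buffer = buffer

    def set_acquire_point(self, point):
        # Overwrites any earlier pending point; nothing takes effect until commit.
        self.pending_acquire = point

    def set_release_point(self, point):
        self.pending_release = point

    def commit(self):
        if self.pending_buffer is not None:
            # A newly attached buffer needs both timeline points in this commit.
            if self.pending_acquire is None:
                raise ProtocolError("no_acquire_point")
            if self.pending_release is None:
                raise ProtocolError("no_release_point")
            self.current_buffer = self.pending_buffer
            self.current_acquire = self.pending_acquire
        # Double-buffered state is consumed by every commit and never carries over,
        # so a following commit with no attach has no buffer to validate.
        self.pending_buffer = None
        self.pending_acquire = None
        self.pending_release = None
```

In this model, calling `commit()` twice after one attach/set-points sequence succeeds, matching fooishbar's point that clients may commit at arbitrary points as long as they are not attaching a buffer.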
<fdobridge>
<Owo> Is this regarding the crash?
<fdobridge>
<gfxstrand> It's one of the crashes
<fdobridge>
<gfxstrand> What compositor are you using?
<fdobridge>
<Owo> Oh, so you're telling me there's MORE
<fdobridge>
<Owo> Mutter
<fdobridge>
<gfxstrand> There are two crashes as far as I can see.
<fdobridge>
<gfxstrand> One is this `set_acquire/release_point` thing.
<fdobridge>
<Owo> Ermine's issue was another, separate issue, but looked like the same one?
<fdobridge>
<gfxstrand> The other has to do with swapchain re-creation.
<fdobridge>
<Owo> They had it crashing on non-explicit sync stuff
<fdobridge>
<Owo> But still Wayland protocol shenanigans
<fdobridge>
<gfxstrand> Yeah, so I think the one I'm easily reproducing is @ermine1716's
<fdobridge>
<gfxstrand> But there's another one hidden in here and I suspect it might be a kopper bug
<fdobridge>
<gfxstrand> It's bugs all the way down
<fdobridge>
<Owo> Maybe Zink was a mistake, we should all be using ANGLE for browsers on Linux :wires:
<fdobridge>
<Owo> I really really want to say it should be an easy fix
<fdobridge>
<Owo> But I also have a lot more on my plate and that probably adds onto it
<fdobridge>
<zmike.> could it be
<fdobridge>
<zmike.> is this the fabled NOTMYBUG ???
<fdobridge>
<Owo> At least, a third of it is probably a not-your-bug ;)
<fdobridge>
<Sid> this doesn't seem to help whatever I was seeing on x11
<fdobridge>
<Sid> will file a proper report in the morning
<fdobridge>
<gfxstrand> The final crash I suspect is Kopper not really guaranteeing one VkSurface exists at a time.
<fdobridge>
<gfxstrand> Which was never a problem until we started attaching syncobj surfaces to them.
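Some context on why Kopper's surface lifetimes suddenly matter: as I understand the linux-drm-syncobj-v1 protocol, `wp_linux_drm_syncobj_manager_v1.get_surface` fails with a `surface_exists` error if the `wl_surface` already has a syncobj surface object. If each `VkSurfaceKHR` creates one, two live surfaces for the same window trip that rule. The sketch below is a hypothetical Python model of that invariant (`SyncobjManagerModel` and `VkSurfaceModel` are made-up names), not Kopper's or Mesa's WSI code.

```python
class ProtocolError(Exception):
    """Stands in for a compositor posting a protocol error."""


class SyncobjManagerModel:
    """Hypothetical model: at most one syncobj surface object per wl_surface."""

    def __init__(self):
        self._owner_by_surface = {}

    def get_surface(self, wl_surface, owner):
        if wl_surface in self._owner_by_surface:
            raise ProtocolError("surface_exists")
        self._owner_by_surface[wl_surface] = owner

    def release(self, wl_surface, owner):
        # Destroying the syncobj surface object frees the wl_surface for a new one.
        if self._owner_by_surface.get(wl_surface) is owner:
            del self._owner_by_surface[wl_surface]


class VkSurfaceModel:
    """Hypothetical stand-in for a VkSurfaceKHR that creates a syncobj surface."""

    def __init__(self, manager, wl_surface):
        self.manager = manager
        self.wl_surface = wl_surface
        manager.get_surface(wl_surface, self)

    def destroy(self):
        self.manager.release(self.wl_surface, self)
```

Destroying the old surface before creating the new one keeps the model happy; overlapping lifetimes on the same `wl_surface` do not, which is the ordering Kopper would need to guarantee under this reading.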
<fdobridge>
<gfxstrand> What GPU are you seeing that with?
<fdobridge>
<Owo> Laptop, Ryzen 5 7530U
<fdobridge>
<Owo> Balanced/Performance mode
<fdobridge>
<gfxstrand> Are you running your compositor on Zink+RADV or Zink+NVK?
<fdobridge>
<Owo> RADV
<fdobridge>
<Owo> No Nvidia GPU, so nothing to do with nvk
<fdobridge>
<Owo> I have my system running on mesa-git
<fdobridge>
<Owo> I should really try picking up a laptop with a newer Intel CPU, and maybe nvidia
<fdobridge>
<gfxstrand> Interesting. I'd file a Zink+RADV bug.
<fdobridge>
<gfxstrand> Discrete card or APU?
<fdobridge>
<Owo> APU
<fdobridge>
<Owo> Vega... 7? I think?
<fdobridge>
<Owo> If it matters, I have the allocated VRAM set to the max, which happens to be only 512MB
<fdobridge>
<gfxstrand> That shouldn't matter
<fdobridge>
<gfxstrand> Both should be fast(ish)
<fdobridge>
<gfxstrand> In any case, it's worth someone looking into
<fdobridge>
<gfxstrand> It's not immediately obvious to me why Zink would be slower than radeonsi there. Maybe it's getting linear tiling for some reason? :shrug_anim:
<fdobridge>
<Owo> If there's a debug flag I can set to maybe force it, or check how it's doing things, I can run with it for a sec
<fdobridge>
<gfxstrand> I don't know RADV well enough to know what to have you set. Just file an issue and one of the RADV devs can try to debug it with you.
<fdobridge>
<gfxstrand> Okay, I've got both MRs stacked and I'm running firefox on Zink+ANV right now
<fdobridge>
<gfxstrand> We'll see if it dies
<fdobridge>
<Owo> Can you put together a patch file containing everything I should apply to mesa?
<fdobridge>
<Owo> I'm at home now, about to start up my laptop
<fdobridge>
<Owo> Or, well, I guess you don't have to, I can just grab the patch files myself.
<fdobridge>
<Owo> Unless they conflict
<fdobridge>
<Owo> I forgot you could append `.patch` to a GitLab MR URL to get a patch file from it
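Owo's trick works because GitLab serves a merge request in `git format-patch` style when `.patch` is appended to the MR URL, so the result can be fed to `git am`. A small Python sketch that normalizes a pasted MR link (which may point at a sub-page such as `/diffs`) to that URL and downloads it; `mr_patch_url` and `download_patch` are hypothetical helpers, and the MR number used in the tests is a placeholder, not one of the MRs discussed here. Stacked MRs can conflict, as noted above, so apply them in the order they were stacked.

```python
import re
import urllib.request

# Matches ".../<group>/<project>/-/merge_requests/<number>", an optional ".patch",
# and ignores any trailing sub-page, query string, or fragment.
_MR_URL = re.compile(
    r"^(https?://[^/]+/.+?/-/merge_requests/\d+)(?:\.patch)?(?:[/?#].*)?$"
)


def mr_patch_url(mr_url: str) -> str:
    """Return the .patch URL for a GitLab merge request link."""
    m = _MR_URL.match(mr_url.strip())
    if m is None:
        raise ValueError(f"not a GitLab merge request URL: {mr_url!r}")
    return m.group(1) + ".patch"


def download_patch(mr_url: str, dest: str) -> None:
    """Fetch the patch series so it can be applied with `git am <dest>`."""
    with urllib.request.urlopen(mr_patch_url(mr_url)) as resp, open(dest, "wb") as out:
        out.write(resp.read())
```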
<fdobridge>
<gfxstrand> I'm still fixing bugs.
<fdobridge>
<gfxstrand> I've got one more that's a much trickier one to reproduce.
<fdobridge>
<gfxstrand> Something with re-creating the `VkSurfaceKHR`
<fdobridge>
<gfxstrand> But I've got ff running in a debugger with a breakpoint set now so we'll see if I hit it.
<fdobridge>
<Owo> Alright, let me know when, with a link of MRs to apply, and I'll try em all at once
<fdobridge>
<Owo> Can't do much about the explicit sync fuckery though, but so long as the surface keeps getting callbacks consistently, it should be okay
<fdobridge>
<gfxstrand> Found the last bug. Yay Kopper plumbing... 😫