ChanServ changed the topic of #zink to: official development channel for the mesa3d zink driver || https://docs.mesa3d.org/drivers/zink.html
Sid127 has quit [Quit: ZNC - https://znc.in]
Sid127 has joined #zink
<fdobridge> <z​mike.> this isn't too surprising to me
<fdobridge> <z​mike.> zink is mostly used for gaming and not on Wayland
<fdobridge> <g​fxstrand> @orowith2os In your testing, were you running just firefox with Zink or both firefox and the compositor?
<fdobridge> <g​fxstrand> Yeah, and EGL is hard and barely tested. The only way any of this shit is robust on other GL drivers is because users get annoyed and file bugs if we break it. And... that's exactly what's happening with zink right now. 😂
<fdobridge> <z​mike.> yup
<fdobridge> <z​mike.> that's the only reason it works at all right now
<fdobridge> <z​mike.> see also: the massive pile of closed wayland/egl/kopper/zink tickets
<fdobridge> <z​mike.> bugs drive technology
<fdobridge> <g​fxstrand> I tried for a while at Intel to convince people we needed to write a Wayland E2E test suite. I even told them how to do it. And then management decided that what we really should be testing was SurfaceFlinger. And then the team got moved on to something else.
<fdobridge> <g​fxstrand> Womp womp
<fdobridge> <z​mike.> sounds right
<fdobridge> <z​mike.> I tried doing a compositor test suite when I was at Samsung
<fdobridge> <z​mike.> but I had more autonomy than most
<fdobridge> <e​rmine1716> since i've got this bug too, i'll answer: firefox crashes in both cases
<fdobridge> <g​fxstrand> Yup. GNOME on nouveau GL and firefox on Zink and everything goes blinky blinky
<fdobridge> <g​fxstrand> And it's weird because it's blinking in rectangles:
<fdobridge> <g​fxstrand> (The "Latest Council Video" thing in the upper-right)
<fdobridge> <z​mike.> damage regions probably
<fdobridge> <g​fxstrand> Yeah. I suspect they're doing partial updates
<fdobridge> <g​fxstrand> Maybe Zink + damage is broken? That's plausible.
<fdobridge> <z​mike.> you could try disabling handling for that by toggling zink_set_damage_region()
<fdobridge> <r​edsheep> That's similar to the behavior I was seeing with discord
<fdobridge> <z​mike.> but also few drivers in mesa implement this so it's possible there are more general issues
<fdobridge> <g​fxstrand> Yup. Commenting out `zink_set_damage_region()` gets rid of the flicker
<fdobridge> <S​id> I've seen that on plasma x11 too
<fdobridge> <g​fxstrand> iris uses it but IDK what it would do with the information
<fdobridge> <z​mike.> I'm also not sure who in the vulkan world would be using this api
<fdobridge> <z​mike.> so it's possible there's wsi issues there
<fdobridge> <g​fxstrand> I guess iris clamps the render area to it
<fdobridge> <g​fxstrand> I'm also suspicious of buffer_age
<zmike> fortunately there is cts for buffer age
<fdobridge> <g​fxstrand> Me: I should just copy the code from iris since iris's damage is working properly.
<fdobridge> <g​fxstrand> *Looks at iris*
<fdobridge> <g​fxstrand> How is shit this broken and still supposedly working?!? (edited)
<fdobridge> <m​agic_rb.> Hey, you know what they say, don't look a working GPU driver at its code
<fdobridge> <m​agic_rb.> Or was it "don't look a gift horse in its mouth"? Dunno :P
<fdobridge> <g​fxstrand> Yeah, pretty sure both of them are broken. They'll both have the same code when I'm done.
<fdobridge> <g​fxstrand> The good news with iris is that it's broken in a way that causes it to damage too much
<fdobridge> <g​fxstrand> Ugh... Still blinking...
<fdobridge> <e​rmine1716> is it hard to fix?
<fdobridge> <g​fxstrand> Nah. Just have to figure out what's going on with Zink's y-flip
<fdobridge> <z​mike.> https://imgur.com/vsoDJlz
<fdobridge> <g​fxstrand> flipping damage doesn't fix it
<fdobridge> <g​fxstrand> Let's see if Intel is broken with iris' damage code "fixed"
<fdobridge> <g​fxstrand> @zmike. Is there a bit on the pipe_resource that says whether or not it's flipped?
<fdobridge> <z​mike.> uhhhh
<fdobridge> <z​mike.> I don't think so?
<fdobridge> <z​mike.> that all happens in the frontend
<fdobridge> <z​mike.> iirc
<fdobridge> <g​fxstrand> Can we assume damage is always upside down?
<fdobridge> <g​fxstrand> Because the front-end isn't flipping it
<fdobridge> <z​mike.> good q
<fdobridge> <z​mike.> I think I made that assumption
<fdobridge> <g​fxstrand> You did
<fdobridge> <g​fxstrand> Iris doesn't flip them
<fdobridge> <g​fxstrand> Both were wrong before I started this process.
<fdobridge> <z​mike.> vulkan api requires flipping
<fdobridge> <g​fxstrand> I don't know who to trust anymore. 😢
<fdobridge> <z​mike.> vs egl
<fdobridge> <g​fxstrand> Yes, as does all hardware
<fdobridge> <g​fxstrand> Except qualcomm which is wonky
<fdobridge> <z​mike.> @fooishbar reviewed the flipping in zink, and I trust him
<fdobridge> <g​fxstrand> lol
<fdobridge> <g​fxstrand> You have more faith than I do. I don't trust anyone!
<fdobridge> <z​mike.> are you seeing wrong y somewhere?
<fdobridge> <g​fxstrand> I'm seeing it be flipped.
<fdobridge> <g​fxstrand> But I don't know if that's guaranteed
<fdobridge> <g​fxstrand> I'm gonna go digging
<fdobridge> <z​mike.> there's some y-flip code in mesa/st ?
<fdobridge> <m​henning> just always damage both the flipped and unflipped region /s
<fdobridge> <z​mike.> insert <stop doing damage> meme here
<fdobridge> <g​fxstrand> Damage hurts
<fdobridge> <g​fxstrand> Literally the only thing that ever calls it is the DRI code
<fdobridge> <g​fxstrand> So I'm gonna assume it's always flipped
<fdobridge> <f​ooishbar> EGL damage is lower-left origin
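The origin mismatch discussed above (EGL damage uses a lower-left origin, while Vulkan expects an upper-left one) comes down to a single subtraction on the y coordinate. A minimal sketch in C, assuming a hypothetical `damage_rect` struct and helper name that are purely illustrative and are not the actual Mesa or Zink code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical rectangle type for illustration; not a Mesa or Vulkan struct. */
struct damage_rect {
   int32_t x, y;
   int32_t width, height;
};

/* Convert a rectangle between a lower-left origin (EGL damage) and an
 * upper-left origin for a surface of the given height. Only y changes:
 * the new y is the distance from the opposite surface edge to the
 * rectangle's far edge (y + height).
 */
static struct damage_rect
flip_damage_rect_y(struct damage_rect r, int32_t surface_height)
{
   /* Assumption: the rectangle has already been clamped to the surface. */
   assert(r.y >= 0 && r.height >= 0 && r.y + r.height <= surface_height);

   r.y = surface_height - (r.y + r.height);
   return r;
}
```

The flip is its own inverse, so applying it twice returns the original rectangle. That also means a rectangle flipped one time too many or too few still looks valid, just in the wrong place.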
<fdobridge> <g​fxstrand> @redsheep Want to give that a go ^^
<fdobridge> <g​fxstrand> @orowith2os ^^
<fdobridge> <g​fxstrand> Using everyone's favorite solution to bugs: Just delete all the hand-rolled math.
<fdobridge> <g​fxstrand> I would love it if this fixes Discord, too
<fdobridge> <r​edsheep> Yeah, let's hope. I can test sometime this evening.
<fdobridge> <r​edsheep> Other than the QT caused bug when you have the Nvidia driver installed and whatever is going on with the plasma display setup I think that probably gets the zink session working just as well as nouveau, if it works.
<fdobridge> <g​fxstrand> I'm hoping so
<fdobridge> <r​edsheep> I hammered on it for a bit trying to find other flaws and pretty much didn't. At least, none worth noting over stuff nouveau gl does.
<fdobridge> <g​fxstrand> I'd love it if we can say Mesa 25.0.1 has a useable Zink
<fdobridge> <g​fxstrand> I might blog about this
<fdobridge> <r​edsheep> Defaulting next cycle might be good
<fdobridge> <e​rmine1716> I'm compiling it rn
<fdobridge> <g​fxstrand> Yeah, we won't default in a stable build
<fdobridge> <g​fxstrand> But if y'all can do some testing and give us the 👍🏻 then maybe we can switch the default in main now and let it bake for a couple months between now and the 25.1 branchpoint. (edited)
<fdobridge> <r​edsheep> That would be awesome
<fdobridge> <e​rmine1716> firefox flickers went away
<fdobridge> <e​rmine1716> it's still crashing tho
<fdobridge> <r​edsheep> Can you get a backtrace? I've never tried with a browser, and mine wasn't crashing
<fdobridge> <e​rmine1716> ... but this patch wasn't supposed to fix crashes anyway
<fdobridge> <g​fxstrand> Yeah, I think the crashes are different. Backtrace would be good
<fdobridge> <g​fxstrand> Looks about right. But before my MR, the containers were the right size, just sideways and not actually in the boats. :frog_upsidedown:
<fdobridge> <z​mike.> magnets.
<fdobridge> <r​edsheep> Magnetized to the bottom of the boats? That's cool
<fdobridge> <e​rmine1716> if I attach a file, it shouldn't be printed on irc side, right?
<fdobridge> <r​edsheep> It will leave a discord link on that end that irc users can click to download it
<fdobridge> <e​rmine1716> ok. here are logs
<fdobridge> <e​rmine1716> no core dumps tho
<fdobridge> <r​edsheep> Surface already exists, interesting
<fdobridge> <g​fxstrand> @zmike. Do most apps get the kopper path or do we prefer direct import/export when modifiers are in play?
<fdobridge> <z​mike.> it varies?
<fdobridge> <z​mike.> dmabuf import/export is usually a platform or app thing
<fdobridge> <z​mike.> so like on some platforms you always get pre-allocated dmabufs from the loader to import
<fdobridge> <g​fxstrand> Maybe I just need to run FF in Zink for a bit and see if I can catch the blow-up
<fdobridge> <z​mike.> reasonable chance
<fdobridge> <g​fxstrand> Here's hoping it reproduces on Zink+ANV (it probably does)
<fdobridge> <z​mike.> ahhhh I remember when I could do fun things like debug apps
* fdobridge <z​mike.> sighs wistfully
<fdobridge> <O​wo> The latter, but I should probably test the former too. I also see you have a patch for me to test, I'll give it a shot.
<fdobridge> <g​fxstrand> Yeah, it crashes instantly on ANV+Zink
<fdobridge> <g​fxstrand> The crash only happens when the compositor supports explicit sync so nouveau GL for the compositor won't trigger it.
<fdobridge> <g​fxstrand> But I can crash with ANV+Zink
<fdobridge> <O​wo> Also for Zink system-wide, the overview in gnome refuses to go brrrrr with more than a few windows open
<fdobridge> <g​fxstrand> I suspect it's something with swapchain recreation
<fdobridge> <O​wo> Not a new issue iirc
<fdobridge> <g​fxstrand> Okay, that's probably an issue with stuff getting pushed out to system RAM when it shouldn't be.
<fdobridge> <O​wo> Where did you come from, where did you go? Where did you come from, gnome Wayland overview stutters :wires:
<fdobridge> <O​wo> You want this on Firefox or system-wide?
<fdobridge> <g​fxstrand> Start with firefox
<fdobridge> <g​fxstrand> System-wide and I think you'll still see the crash
<fdobridge> <g​fxstrand> But it won't hurt system-wide
<fdobridge> <O​wo> Alrighty, give me a bit
<fdobridge> <O​wo> The radeonsi maths should be fine, right? I only need the Zink changes?
<fdobridge> <g​fxstrand> I have no idea
<zmike> radeonsi doesn't implement
<fdobridge> <g​fxstrand> Yeah, radeonsi ignores it
<fdobridge> <O​wo> Wait, what am I talking about. What does radeonsi even have to do with Zink here. Radeonsi is fine, I think.
<fdobridge> <g​fxstrand> Zink+RADV won't ignore it but radeonsi does
<fdobridge> <O​wo> I just woke up from a 12hr eep, so I'm running slow :ferrisCozy:
<zmike> radeonsi too strong to need partial updates
<fdobridge> <e​rmine1716> Plasma doesn't start for me
<fdobridge> <O​wo> But would radeonsi go even faster if it had partial updates :akipeek:
<zmike> impossible
<fdobridge> <g​fxstrand> Maybe?
<fdobridge> <g​fxstrand> Probably not
<fdobridge> <g​fxstrand> It's mostly for GLES hardware to let it avoid touching too much memory
<fdobridge> <g​fxstrand> IMRs don't care
<fdobridge> <S​id> building to test on x11 rn
<fdobridge> <g​fxstrand> Okay, it's definitely crashing on the swapchain re-create path. This'll be entertaining...
<fdobridge> <S​id> I've been maining nvk+zink for 3 days now
<fdobridge> <S​id> mostly solid? there's random freezes (rare), random firefox crashes (rare) and sometimes the display doesn't come up from suspend (50-50)
<fdobridge> <g​fxstrand> We're definitely getting kopper
<fdobridge> <z​mike.> in the immortal words of Sir Adam Jackson:
<fdobridge> <z​mike.> IT'S KOPPERIN' TIME
<fdobridge> <g​fxstrand> I'm looking at mutter code. What have you done to me?!?
<fdobridge> <z​mike.> did you switch places with the me from 10 years ago?!?
<fdobridge> <g​fxstrand> I hope not!
<fdobridge> <z​mike.> me too
<fdobridge> <z​mike.> samsung hq would not take kindly to your freewheeling, unfocused, un-product-driven approach to the graphics ecosystem
<fdobridge> <g​fxstrand> @fooishbar Is there any way to unset a time point? Or is `wl_surface.commit()` supposed to unset any previously set sync points?
<fdobridge> <f​ooishbar> assuming you're talking about `wp_linux_drm_syncobj`, then: `The acquire point is double-buffered state, and will be applied on the next wl_surface.commit request for the associated surface. Thus, it applies only to the buffer that is attached to the surface at commit time.`
<fdobridge> <g​fxstrand> Yes, I read that. It doesn't answer my question
<fdobridge> <f​ooishbar> define 'clear'?
<fdobridge> <f​ooishbar> I mean, `If at surface commit time there is a pending buffer attached but no pending acquire timeline point set, the no_acquire_point protocol error is raised.` tells you that you need to call `set_acquire_point` for every commit, and that it doesn't carry over
<fdobridge> <g​fxstrand> Okay, so here's what's happening...
<fdobridge> <f​ooishbar> or that if you aren't committing (i.e. you are not `vkQueuePresentKHR`), you can just call `set_acquire_point` with a different point to replace the old one
<fdobridge> <g​fxstrand> But the reason we're hitting it is because there's a `wl_surface_commit()` which is being called even though we already called `wl_surface_commit()`.
<fdobridge> <g​fxstrand> But no one is changing buffers or time points or anything.
<fdobridge> <g​fxstrand> If I just call `wl_surface_commit()` twice, it blows up reliably
<fdobridge> <f​ooishbar> I make that a Mutter bug, then
<fdobridge> <f​ooishbar> clients are allowed to commit at totally arbitrary points, even through WSI/EGL, and that's OK as long as they aren't attaching a buffer
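The commit rule quoted above can be modelled as a small state machine. This is a toy sketch in C under stated assumptions: the struct, enum, and function names are hypothetical, it covers only the `no_acquire_point` rule quoted in this conversation, and it is not Mutter's or Mesa's implementation:

```c
#include <stdbool.h>

/* Hypothetical model of a surface's pending (double-buffered) state. */
struct pending_surface_state {
   bool buffer_attached;
   bool acquire_point_set;
};

enum commit_result {
   COMMIT_OK,
   COMMIT_ERROR_NO_ACQUIRE_POINT,
};

/* Model of a commit: a pending buffer without a pending acquire point is a
 * protocol error; a commit with no pending buffer is always fine. Pending
 * state is consumed by the commit and does not carry over to the next one.
 */
static enum commit_result
model_surface_commit(struct pending_surface_state *pending)
{
   enum commit_result result = COMMIT_OK;

   if (pending->buffer_attached && !pending->acquire_point_set)
      result = COMMIT_ERROR_NO_ACQUIRE_POINT;

   pending->buffer_attached = false;
   pending->acquire_point_set = false;
   return result;
}
```

Under this model, a second back-to-back commit sees no pending buffer and succeeds, which matches the reading that a compositor rejecting it is at fault.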
<fdobridge> <g​fxstrand> The Mesa code always calls `set_acquire_point`, `set_release_point`, and `attach` together so it's not a mesa bug. But we sometimes call `commit` twice
<fdobridge> <f​ooishbar> Mesa sometimes calls commit twice ... ?!
<fdobridge> <f​ooishbar> oh yeah ok, fifo
<fdobridge> <g​fxstrand> And I'm starting to think maybe someone other than Mesa is calling commit for some reason.
<fdobridge> <g​fxstrand> Like maybe FF is doing it?
<fdobridge> <g​fxstrand> That would be mean but maybe not totally invalid?
<fdobridge> <g​fxstrand> But if I make Mesa double-call commit every time, it blows up instantly
<fdobridge> <f​ooishbar> yeah, that definitely sounds like a Mutter bug then
<fdobridge> <g​fxstrand> Oh joy...
<fdobridge> <g​fxstrand> Okay, I'll file a bug.
<fdobridge> <O​wo> Is this regarding the crash?
<fdobridge> <g​fxstrand> It's one of the crashes
<fdobridge> <g​fxstrand> What compositor are you using?
<fdobridge> <O​wo> Oh, so you're telling me there's MORE
<fdobridge> <O​wo> Mutter
<fdobridge> <g​fxstrand> There are two crashes as far as I can see.
<fdobridge> <g​fxstrand> One is this `set_acquire/release_point` thing.
<fdobridge> <O​wo> Ermine's issue was another, separate issue, but looked like the same one?
<fdobridge> <g​fxstrand> The other has to do with swapchain re-creation.
<fdobridge> <O​wo> They had it crashing on non-explicit sync stuff
<fdobridge> <O​wo> But still Wayland protocol shenanigans
<fdobridge> <g​fxstrand> Yeah, so I think the one I'm easily reproducing is @ermine1716's
<fdobridge> <g​fxstrand> But there's another one hidden in here and I suspect it might be a kopper bug
<fdobridge> <g​fxstrand> It's bugs all the way down
<fdobridge> <O​wo> Maybe Zink was a mistake, we should all be using ANGLE for browsers on Linux :wires:
<fdobridge> <O​wo> I really really want to say it should be an easy fix
<fdobridge> <O​wo> But I also have a lot more on my plate and that probably adds onto it
<fdobridge> <z​mike.> could it be
<fdobridge> <z​mike.> is this the fabled NOTMYBUG ???
<fdobridge> <O​wo> At least, a third of it is probably a not-your-bug ;)
<fdobridge> <S​id> this doesn't seem to help whatever I was seeing on x11
<fdobridge> <S​id> will file a proper report in the morning
<fdobridge> <g​fxstrand> The final crash I suspect is Kopper not really guaranteeing one VkSurface exists at a time. (edited)
<fdobridge> <g​fxstrand> Which was never a problem until we started attaching syncobj surfaces to them.
<fdobridge> <g​fxstrand> What GPU are you seeing that with?
DodoGTA has quit [Quit: DodoGTA]
DodoGTA has joined #zink
<fdobridge> <O​wo> Laptop, Ryzen 5 7530U
<fdobridge> <O​wo> Balanced/Performance mode
<fdobridge> <g​fxstrand> Are you running your compositor on Zink+RADV or Zink+NVK?
<fdobridge> <O​wo> RADV
<fdobridge> <O​wo> No Nvidia GPU, so nothing to do with nvk
<fdobridge> <O​wo> I have my system running on mesa-git
<fdobridge> <O​wo> I should really try picking up a laptop with a newer Intel CPU, and maybe nvidia
<fdobridge> <g​fxstrand> Interesting. I'd file a Zink+RADV bug.
<fdobridge> <g​fxstrand> Discrete card or APU?
<fdobridge> <O​wo> APU
<fdobridge> <O​wo> Vega... 7? I think?
<fdobridge> <O​wo> If it matters, I have the allocated vram set to the max, which only happens to be 512MB
<fdobridge> <g​fxstrand> That shouldn't matter
<fdobridge> <g​fxstrand> Both should be fast(ish)
<fdobridge> <g​fxstrand> In any case, it's worth someone looking into
<fdobridge> <g​fxstrand> It's not immediately obvious to me why Zink would be slower than radeonsi there. Maybe it's getting linear tiling for some reason? :shrug_anim:
<fdobridge> <O​wo> If there's a debug flag I can set to maybe force it, or check how it's doing things, I can run with it for a sec
<fdobridge> <g​fxstrand> I don't know RADV well enough to know what to have you set. Just file an issue and one of the RADV devs can try to debug it with you.
<fdobridge> <g​fxstrand> It's a firefox bug
<fdobridge> <g​fxstrand> Damn...
<fdobridge> <g​fxstrand> Looks like it might be fixed on nightly builds.
<fdobridge> <g​fxstrand> But it's been 8 months. I'd think that would have propagated by now.
<fdobridge> <g​fxstrand> @zmike. How (if at all) does `u_threaded_context` interact with EGL and Kopper?
<fdobridge> <g​fxstrand> Looks like kopper launches a thread. 😬
<fdobridge> <O​wo> You can test with this
<fdobridge> <O​wo> Install mesa-git from flathub-beta if you're feeling quirky
<fdobridge> <g​fxstrand> Nah, I just pulled the tar.xz
<fdobridge> <g​fxstrand> I found the Zink bug, though. I'm going to file a new one.
<fdobridge> <g​fxstrand> @zmike. @fooishbar ^^
<fdobridge> <g​fxstrand> tl;dr: Threaded present considered harmful if used with Wayland.
<fdobridge> <z​mike.> it's supposed to be disabled on wayland though?
<fdobridge> <g​fxstrand> We enable it unconditionally if threaded submit is enabled
<fdobridge> <z​mike.> maybe I broke it again at some point
<fdobridge> <z​mike.> it should check the display type
<fdobridge> <g​fxstrand> Feel free to tell me to do it differently
<fdobridge> <z​mike.> do it differently
<fdobridge> <g​fxstrand> Feel free to tell me how to do it differently. 😛
<fdobridge> <z​mike.> the present function already does all the locking, so skip the thread entirely on wayland
<fdobridge> <g​fxstrand> We have a lot of `util_queue_is_initialized()` checks. Those are all going to have to be replaced.
<fdobridge> <z​mike.> yeah so probably add a check in displaytarget_create for queue_is_initialized && !is_wayland and switch everything to that
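The suggestion above amounts to computing one predicate at display-target creation and checking it everywhere instead of the raw queue state. A rough sketch in C, where the function and parameter names are hypothetical stand-ins rather than the real Kopper or Zink symbols:

```c
#include <stdbool.h>

/* Hypothetical helper: decide once whether presentation should go through
 * the worker thread. The present function is said to do its own locking,
 * so on Wayland the thread is skipped entirely.
 */
static bool
should_use_present_thread(bool queue_is_initialized, bool is_wayland)
{
   return queue_is_initialized && !is_wayland;
}
```

Storing the result in a single flag would let the scattered `util_queue_is_initialized()` checks be replaced by one consistent test.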
<fdobridge> <z​mike.> I wonder if threaded present is even useful now that x11 wsi is more threadful
<fdobridge> <z​mike.> I guess on non-mesa drivers
<fdobridge> <g​fxstrand> Pushed a take 2
<fdobridge> <z​mike.> perfection
<fdobridge> <g​fxstrand> Okay, I've got both MRs stacked and I'm running firefox on Zink+ANV right now
<fdobridge> <g​fxstrand> We'll see if it dies
<fdobridge> <O​wo> Can you put together a patch file containing everything I should apply to mesa?
<fdobridge> <O​wo> I'm at home now, about to start up my laptop
<fdobridge> <O​wo> Or, well, I guess you don't have to, I can just grab the patch files myself.
<fdobridge> <O​wo> Unless they conflict
<fdobridge> <O​wo> I forgot you could append .patch to a gitlab PR to get a patch file from it
<fdobridge> <g​fxstrand> I'm still fixing bugs.
<fdobridge> <g​fxstrand> I've got one more that's a much trickier one to reproduce.
<fdobridge> <g​fxstrand> Something with re-creating the `VkSurfaceKHR`
<fdobridge> <g​fxstrand> But I've got ff running in a debugger with a breakpoint set now so we'll see if I hit it.
<fdobridge> <O​wo> Alright, let me know when, with a link of MRs to apply, and I'll try em all at once
<fdobridge> <O​wo> Can't do much about the explicit sync fuckery though, but so long as the surface keeps getting callbacks consistently, it should be okay
<fdobridge> <g​fxstrand> Found the last bug. Yay Kopper plumbing... 😫
<fdobridge> <O​wo> At least you found it ;)
<fdobridge> <O​wo> Thanks a bunch, Faith
<fdobridge> <g​fxstrand> @orowith2os Pull my zink/all-the-fixes branch
<fdobridge> <g​fxstrand> @redsheep You, too, when you get around to testing
<fdobridge> <O​wo> matrix bridge no worky, bleh
<fdobridge> <O​wo> I'm putting it together on my system now