ChanServ changed the topic of #zink to: official development channel for the mesa3d zink driver || https://docs.mesa3d.org/drivers/zink.html
<fdobridge> <a​irlied> @marysaka any ideas on emitting nops after hmma opcodes? nvidia seems to do it, but I'm not seeing anything on why
<fdobridge> <m​arysaka> I think @karolherbst mentioned that this is needed for scheduling as it can take more cycles than what we can define with one instruction
<fdobridge> <m​arysaka> so might be worth checking what the scheduling of those are
<fdobridge> <a​irlied> the docs don't seem to mention it, and I've got the correct scheduling in my branch for everything on Turing, but I did notice in the dumps you did the NOPs were there
<fdobridge> <a​irlied> oh wait, is there a limit on delays in one instruction?
<fdobridge> <a​irlied> do we handle adding NOPs if we get a larger delay?
<fdobridge> <m​arysaka> I don't think so because it never really happened so far
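A minimal sketch of the padding idea raised above. The premise comes from the discussion: an HMMA result can take longer than one instruction's scheduling control field is able to express, so the remaining latency has to be carried by trailing NOPs. The 15-cycle cap is an illustrative assumption (a 4-bit stall field), and the function names are hypothetical; this is not NAK's actual scheduling code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed upper bound on the stall count a single instruction can encode.
 * Illustrative value only (a 4-bit stall field). */
#define MAX_STALL_CYCLES 15u

/* Number of NOPs needed after an instruction whose result must not be
 * consumed for `latency` cycles, when the instruction itself and each
 * NOP can stall for at most `max_stall` cycles. */
static unsigned
nops_for_latency(unsigned latency, unsigned max_stall)
{
   assert(max_stall > 0);
   if (latency <= max_stall)
      return 0; /* fits in the producer's own control field */
   unsigned remaining = latency - max_stall;
   return (remaining + max_stall - 1) / max_stall; /* ceil division */
}

/* Fill `stalls` with the stall count for the producer (index 0) followed
 * by one entry per NOP. Returns the number of entries written, or 0 if
 * `cap` is too small to hold them. */
static unsigned
split_stalls(unsigned latency, unsigned max_stall, uint8_t *stalls,
             unsigned cap)
{
   assert(max_stall > 0 && max_stall <= UINT8_MAX);
   unsigned n = 1 + nops_for_latency(latency, max_stall);
   if (n > cap)
      return 0;
   for (unsigned i = 0; i < n; i++) {
      unsigned s = latency < max_stall ? latency : max_stall;
      stalls[i] = (uint8_t)s;
      latency -= s;
   }
   return n;
}
```

With a 15-cycle cap, a 31-cycle latency becomes stalls of 15 on the producer and 15 and 1 on two NOPs, which would match the pattern of NOPs showing up in the NVIDIA dumps without anything in the docs calling them out.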
<fdobridge> <m​arysaka> (also uuum this is the zink channel just noticed)
<fdobridge> <a​irlied> oops doh!
<fdobridge> <z​mike.> @gfxstrand I have my post all ready to go, just need a link to your post
<fdobridge> <g​fxstrand> I just assigned Marge and told Kara to go post the blog
<fdobridge> <z​mike.> 🔗 🔗 🔗 🔗 🔗
<fdobridge> <g​fxstrand> The Collabora blogging process requires way more manual effort than you'd like to think. It'll be a few minutes.
<fdobridge> <z​mike.> :sweatytowelguy:
<fdobridge> <S​id> :myy_RunAway:
<fdobridge> <g​fxstrand> Oh no!
<fdobridge> <g​fxstrand> What happens if you put a `return;` at the top of `zink_set_damage_region()`?
<fdobridge> <g​fxstrand> Ugh... Firefox won't even start for me on X11
<fdobridge> <S​id> let's see..
<fdobridge> <g​fxstrand> I think the device select layer got screwed up for me.
<fdobridge> <g​fxstrand> I had too many Vulkan drivers. I have FF now
<fdobridge> <S​id> that's a flawless experience
<fdobridge> <g​fxstrand> damn...
<fdobridge> <g​fxstrand> What if you comment out the line right after the "we need to flip it" line?
<fdobridge> <S​id> compiling
<fdobridge> <g​fxstrand> Okay, that seems worse
<fdobridge> <S​id> hm?
<fdobridge> <S​id> I meant that I was compiling w/ the change :P
<fdobridge> <g​fxstrand> Oh, I mean I tested and it seems worse
<fdobridge> <S​id> ah
<fdobridge> <g​fxstrand> Got rid of damage and it's perfect for me
<fdobridge> <g​fxstrand> Ugh...
<fdobridge> <S​id> can confirm
<fdobridge> <S​id> this squished cursor is really funny tbh
<fdobridge> <m​henning> yeah, it's not a bug, it's a feature 😛
<fdobridge> <S​id> I can't help but giggle every time it turns into the text-input-box cursor
<fdobridge> <S​id> I-beam pointer, I believe it's called
<fdobridge> <O​wo> @gfxstrand have you considered only enabling the NVK+Zink combo for Wayland, since X11 seems to be a pain?
<fdobridge> <O​wo> Or a case to disable damage on X11 if you want to keep it there.
<fdobridge> <g​fxstrand> Let's see if we can just fix the bug. Choosing which to use based on window system is pain
<fdobridge> <m​henning> Yeah, we don't really want to support both nouveau gl and zink (on the same hardware) in the long term. Ideally we fix the bugs
<fdobridge> <O​wo> Yeah. Still, if it's the easiest in the short-term, I can try hacking something up just so it works? Try n get a Mesa contribution under my belt, even if it's a hack :akipeek:
<fdobridge> <O​wo> Toss in an env var to make it do damage again so that it can still be tested without patching
<fdobridge> <z​mike.> NO 👏 MORE 👏 ENV 👏 VARS
<fdobridge> <O​wo> ~~`ZINK_ENABLE_DAMAGE`~~
<fdobridge> <g​fxstrand> I'm going to try and find/fix the bug today
<fdobridge> <z​mike.> smh didn't even mention that IMG has been shipping zink as their GL driver for years
<fdobridge> <O​wo> @gfxstrand `other driver teams follow suite` suite or suit?
<fdobridge> <S​id> should be the latter
<fdobridge> <g​fxstrand> On what devices?
<fdobridge> <g​fxstrand> IMG hasn't been shipping in years. 😛 (edited)
<fdobridge> <z​mike.> they're still submitting conformance for products even now https://www.khronos.org/opengl/adopters/login/submissions/#submission_369
<fdobridge> <g​fxstrand> And I did say "Nouveau is the first *Mesa* driver stack..."
<fdobridge> <g​fxstrand> Conformant and shipping aren't the same thing. 😛
<fdobridge> <z​mike.> https://www.imaginationtech.com/product/img-bxm-8-256/ sure seems like it has shipped
<fdobridge> <z​mike.> anyway here's your reblog https://www.supergoodcode.com/znvk/
<fdobridge> <S​id> the moniker is official then
<fdobridge> <S​id> znvk
Sid127- has joined #zink
Sid127 has quit [Read error: Connection reset by peer]
<fdobridge> <O​wo> Wait, why does the collabora post say december 2...?
<fdobridge> <g​fxstrand> Wait, what?
<fdobridge> <g​fxstrand> Says March 11 here
<fdobridge> <O​wo> Oh, weird. It's working now.
<fdobridge> <O​wo> It showed December 2, 2024 for a bit
<fdobridge> <g​fxstrand> 🤷🏻‍♀️
<fdobridge> <g​fxstrand> Perfect! 😂
jhli has joined #zink
<fdobridge> <z​mike.> HAHAHA
<fdobridge> <z​mike.> didn't link your blog post AND mentioned IMG
<fdobridge> <S​id> phoronix is a zmike alt confirmed
<fdobridge> <k​arolherbst> more than an hour? I'm not impressed
<fdobridge> <h​untercz122> waiting for phoronix comments about nvidia and nak being woke cuz of rust
<fdobridge> <k​arolherbst> 🍿
<fdobridge> <g​fxstrand> I did all that work and he STILL didn't link to the blog post. 😂
<fdobridge> <g​fxstrand> I should have put it in the commit message.
<zmike> you can just mail him and ask him to add a link
<fdobridge> <g​fxstrand> I wonder...
<fdobridge> <r​edsheep> Somebody here has braved the phoronix comments to try to get the blog added to the article
<fdobridge> <g​fxstrand> hehe
<fdobridge> <g​fxstrand> This time it might actually be a firefox bug.
<fdobridge> <S​id> ban
<fdobridge> <S​id> how dare they
<fdobridge> <g​fxstrand> Returning 0 from buffer_age doesn't make things work
<fdobridge> <g​fxstrand> Glad I looked at the MR. We almost got blocked by a Zink ADL flake
<fdobridge> <g​fxstrand> Also that
<fdobridge> <g​fxstrand> And... I think the OOM killer just killed deqp
<fdobridge> <g​fxstrand> Just cancelled and restarted the job. Here's hoping I did it fast enough that Marge won't notice
<fdobridge> <g​fxstrand> This is funky...
<fdobridge> <g​fxstrand> ```
<fdobridge> <g​fxstrand> 18:22:28.786: Running dEQP on 9 threads in 239-test groups
<fdobridge> <g​fxstrand> 18:22:28.786: Running dEQP on 9 threads in 500-test groups
<fdobridge> <g​fxstrand> 18:22:28.786: Running dEQP on 9 threads in 500-test groups
<fdobridge> <g​fxstrand> ```
<fdobridge> <g​fxstrand> Is it running deqp-runner 3 times in parallel for a total of 27 concurrent copies of the CTS?
<fdobridge> <g​fxstrand> ```
<fdobridge> <g​fxstrand> 18:22:33.868: Running dEQP on 9 threads in 500-test groups
<fdobridge> <g​fxstrand> 18:22:33.868: Running dEQP on 9 threads in 4-test groups
<fdobridge> <g​fxstrand> 18:22:33.868: Running dEQP on 9 threads in 3-test groups
<fdobridge> <g​fxstrand> 18:22:33.868: Running 1972 Piglit tests on 9 threads
<fdobridge> <g​fxstrand> ```
<fdobridge> <g​fxstrand> That can't be right...
<fdobridge> <g​fxstrand> Oh, no. It's the "suite" feature
<fdobridge> <g​fxstrand> Okay
<fdobridge> <g​fxstrand> So presumably not running it all simultaneously
<fdobridge> <g​fxstrand> Yeah, it looks like it's one deqp-runner so it'll only run on the 9 threads. Had me worried for a second there. :frog_sweat:
<fdobridge> <g​fxstrand> Oh, yeah. I thought that was bad with dEQP-VK but with piglit it's literally every test. 🤡
<fdobridge> <g​fxstrand> Merged!
<fdobridge> <z​mike.> monumental
<fdobridge> <g​fxstrand> Now we watch as the bug reports roll in...
<fdobridge> <S​id> they started rolling in even before it got merged
<fdobridge> <S​id> :3
<fdobridge> <g​fxstrand> 😛
<fdobridge> <g​fxstrand> And we appreciate it! 💜
<fdobridge> <g​fxstrand> Am I really downloading the Firefox source code?
<fdobridge> <g​fxstrand> Yes, yes I am...
<fdobridge> <O​wo> Faith, whyyyy
<fdobridge> <g​fxstrand> Because there's a damage bug and it very much looks like it's not ours
<fdobridge> <S​id> :saigeheart:
<fdobridge> <r​edsheep> I guess the bugs gotta get fixed somehow...
<fdobridge> <S​id> doing all I can to be helpful with whatever limited energy I have left after existing and adulting all day
<fdobridge> <g​fxstrand> mood
<fdobridge> <S​id> yeah, especially with things in personal life sapping so much of my energy and me contributing in my own free time, it's.. difficult .-.
<fdobridge> <g​fxstrand> ```
<fdobridge> <g​fxstrand> At all times, any client API rendering which falls outside of the damage
<fdobridge> <g​fxstrand> region results in undefined framebuffer contents for the entire framebuffer.
<fdobridge> <g​fxstrand> It is the client's responsibility to ensure that rendering is confined to
<fdobridge> <g​fxstrand> the current damage area.
<fdobridge> <g​fxstrand> ```
<fdobridge> <g​fxstrand> So if returning early from `set_damage()` fixes something then either we screwed up the damage regions somehow or the client is rendering outside of them which is illegal.
<fdobridge> <g​fxstrand> That or there's a crazy zink bug hiding somewhere
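The quoted rule reduces to a containment test: every pixel a client renders must fall inside the union of the rects it passed as damage. A sketch of that test, of the kind one might drop into a debug build to tell "we screwed up the damage regions" apart from "the client drew outside them". The `struct rect` layout follows the (x, y, width, height) form EGL_KHR_partial_update uses, but `rect_inside_damage()` is a hypothetical helper, not a Mesa or EGL function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* (x, y, width, height), as in EGL_KHR_partial_update damage rects.
 * The origin convention does not matter as long as all rects agree. */
struct rect {
   int x, y, w, h;
};

static int
cmp_int(const void *a, const void *b)
{
   int ia = *(const int *)a, ib = *(const int *)b;
   return (ia > ib) - (ia < ib);
}

/* True if every pixel of `draw` lies inside the union of the `n` damage
 * rects. Coordinate compression: the damage edges that fall inside `draw`
 * split it into cells whose coverage is uniform, so testing one corner
 * of each cell is enough. */
static bool
rect_inside_damage(struct rect draw, const struct rect *damage, int n)
{
   if (draw.w <= 0 || draw.h <= 0)
      return true; /* an empty draw touches nothing */

   int max_edges = 2 * n + 2;
   int *xs = malloc(sizeof(int) * max_edges);
   int *ys = malloc(sizeof(int) * max_edges);
   assert(xs && ys);
   int nx = 0, ny = 0;

   xs[nx++] = draw.x; xs[nx++] = draw.x + draw.w;
   ys[ny++] = draw.y; ys[ny++] = draw.y + draw.h;
   for (int i = 0; i < n; i++) {
      int ex[2] = { damage[i].x, damage[i].x + damage[i].w };
      int ey[2] = { damage[i].y, damage[i].y + damage[i].h };
      for (int k = 0; k < 2; k++) {
         /* Only edges strictly inside the draw rect split it. */
         if (ex[k] > draw.x && ex[k] < draw.x + draw.w) xs[nx++] = ex[k];
         if (ey[k] > draw.y && ey[k] < draw.y + draw.h) ys[ny++] = ey[k];
      }
   }
   qsort(xs, nx, sizeof(int), cmp_int);
   qsort(ys, ny, sizeof(int), cmp_int);

   bool covered = true;
   for (int i = 0; covered && i + 1 < nx; i++) {
      for (int j = 0; covered && j + 1 < ny; j++) {
         if (xs[i] == xs[i + 1] || ys[j] == ys[j + 1])
            continue; /* zero-size cell from duplicate edges */
         int cx = xs[i], cy = ys[j];
         bool hit = false;
         for (int r = 0; r < n && !hit; r++) {
            hit = cx >= damage[r].x && cx < damage[r].x + damage[r].w &&
                  cy >= damage[r].y && cy < damage[r].y + damage[r].h;
         }
         covered = hit;
      }
   }
   free(xs);
   free(ys);
   return covered;
}
```

A draw spanning two adjacent damage rects passes, while any draw against a 1x1 damage rect larger than one pixel fails, which is why a 1x1 damage region makes nearly all rendering undefined under this rule.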
<fdobridge> <g​fxstrand> It did look in some of the flickering that it's literally rendering to the wrong spot
<fdobridge> <S​id> I even had frames showing from login sessions that I had logged out of
<fdobridge> <S​id> i.e. I could see parts of the wallpaper I've set in plasma in the flickery regions when I logged into i3 (no wallpaper)
<fdobridge> <m​henning> that sounds like uninitialized memory. you could try seeing if NVK_DEBUG=zero_vram makes any difference
<fdobridge> <S​id> :neko_salute:
<fdobridge> <O​wo> Shouldn't the kernel be zeroing it anyways?
<fdobridge> <O​wo> :akipeek:
<fdobridge> <r​edsheep> Without nouveau being ready to do crazy async page cleaning allocation magic, the performance cost is too high
<fdobridge> <O​wo> Rayon to the rescue? :p
<fdobridge> <O​wo> (if someone wants to either write C bindings for it, or rewrite those parts of nouveau in rust)
<fdobridge> <O​wo> Probably overkill. But a thought.
<fdobridge> <r​edsheep> That looks like something that would only have to do with cpu side stuff? From what I understand zeroing pages should be happening pretty nearly all on the gpu side, it's just complicated to make the juggle work right
<fdobridge> <g​fxstrand> Yeah, that's just uninitialized memory
<fdobridge> <g​fxstrand> Which FF should also be rendering over since it's getting a buffer age of 0
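For reference, the buffer-age contract the last message leans on (EGL_EXT_buffer_age): an age of 0 means the back buffer's contents are undefined, so the client must repaint the whole surface, and stale pixels from an old session should never survive that. An age of N means the buffer holds the frame from N frames ago, so the client must repaint the current damage plus everything damaged in the N-1 frames since. A minimal client-side sketch, assuming damage is tracked as one bounding box per frame; the struct names and the history depth are hypothetical.

```c
#include <assert.h>

/* Half-open box [x0, x1) x [y0, y1). */
struct box { int x0, y0, x1, y1; };

#define HISTORY 4 /* hypothetical number of past frames remembered */

struct damage_history {
   struct box frames[HISTORY]; /* frames[0] = most recently presented */
   int count;                  /* number of valid entries */
};

static struct box
box_union(struct box a, struct box b)
{
   if (a.x0 >= a.x1 || a.y0 >= a.y1) return b; /* a is empty */
   if (b.x0 >= b.x1 || b.y0 >= b.y1) return a; /* b is empty */
   struct box u = {
      a.x0 < b.x0 ? a.x0 : b.x0, a.y0 < b.y0 ? a.y0 : b.y0,
      a.x1 > b.x1 ? a.x1 : b.x1, a.y1 > b.y1 ? a.y1 : b.y1,
   };
   return u;
}

/* Region that must be repainted into a back buffer of the given age. */
static struct box
repaint_region(const struct damage_history *h, struct box current,
               int buffer_age, int width, int height)
{
   struct box full = { 0, 0, width, height };
   /* Age 0 (undefined contents) or older than our history: redraw all. */
   if (buffer_age <= 0 || buffer_age - 1 > h->count)
      return full;
   struct box r = current;
   for (int i = 0; i < buffer_age - 1; i++)
      r = box_union(r, h->frames[i]);
   return r;
}

/* Record the damage of a frame that was just presented. */
static void
push_damage(struct damage_history *h, struct box b)
{
   for (int i = HISTORY - 1; i > 0; i--)
      h->frames[i] = h->frames[i - 1];
   h->frames[0] = b;
   if (h->count < HISTORY)
      h->count++;
}
```

Under this contract a client that receives age 0 and still shows uninitialized memory is either not honoring the age or declaring damage that does not cover what it needs to redraw.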
<fdobridge> <S​id> sounds stinky
<fdobridge> <g​fxstrand> pulling debug symbols....
<fdobridge> <S​id> I'm sorry for unleashing this onto you :p
<fdobridge> <g​fxstrand> Yeah... Firefox is giving me 1x1 damage regions. Something's not right. (edited)
<fdobridge> <g​fxstrand> Yeah, this is a firefox bug.
<fdobridge> <g​fxstrand> There's nothing I can do if FF gives me a 1x1 damage region
<fdobridge> <g​fxstrand> I'm surprised anything is working at all, TBH
<fdobridge> <S​id> ban
<fdobridge> <r​edsheep> Is there anything in mesa for application workarounds for damage?
<fdobridge> <S​id> ban firefox from using mesa :wolfFIRE: (edited)
<fdobridge> <r​edsheep> Is a per application exception for damage even possible?
<fdobridge> <S​id> Thunderbird might be affected too
<fdobridge> <S​id> and any firefox based browser
<fdobridge> <S​id> so.. firefox, librewolf (is affected, I use it), tor browser, icecat, whateverelse exists
<fdobridge> <g​fxstrand> We could add a driconf
<fdobridge> <g​fxstrand> Still doing it on Nightly
<fdobridge> <r​edsheep> Might be good, for the sake of quickly having it working for anyone testing main.
<fdobridge> <g​fxstrand> I wonder how hard it is to build firefox..
<fdobridge> <g​fxstrand> If I get a Firefox patch out of this...
<fdobridge> <r​edsheep> You're just taking a tour of the entire stack to get your driver working lol
<fdobridge> <r​edsheep> I guess that's just what it takes
<fdobridge> <g​fxstrand> :shrug_anim:
<fdobridge> <g​fxstrand> The things I'm willing to do for the Linux graphics stack...
<fdobridge> <r​edsheep> What I don't get is how firefox isn't broken for a whole lot of other people if it's doing damage wrong
<fdobridge> <r​edsheep> Surely there are people on TBR gpus relying on damage who are using mesa and running firefox
<fdobridge> <g​fxstrand> Are they using X11?
<fdobridge> <g​fxstrand> I'm pretty sure this has to do with X *somehow*
<fdobridge> <g​fxstrand> How? I'll let you know when I figure it out
<fdobridge> <z​mike.> Nobody else does damage
<fdobridge> <m​henning> ~~do we need to do damage then~~
<fdobridge> <r​edsheep> We need a "Stop doing damage" meme
<fdobridge> <r​edsheep> It writes itself
<fdobridge> <r​edsheep> Should the theoretical IMG+zink users have had damage this whole time?
<fdobridge> <z​mike.> I don't think I had implemented it when they started using it
<fdobridge> <z​mike.> Also they're pinned on a mesa version from 2023 afaik
<fdobridge> <a​irlied> feels like IMG shipped zink first should come with a few asterisks 😛
<fdobridge> <g​fxstrand> Kopper bug!
<fdobridge> <g​fxstrand> That's what I said!
<fdobridge> <r​edsheep> Kopper bug?
<fdobridge> <g​fxstrand> Yeah
<fdobridge> <g​fxstrand> It's everyone's favorite X11 race
<fdobridge> <g​fxstrand> I frickin' hate X11
<fdobridge> <g​fxstrand> *sigh*
<fdobridge> <r​edsheep> Is this about the cursor, or firefox, or something else?
<fdobridge> <g​fxstrand> firefox
<fdobridge> <r​edsheep> Like... it's not actually a firefox bug after all?
<fdobridge> <r​edsheep> :headache:
<fdobridge> <g​fxstrand> The problem is that there's a race inherent in X11 when a window is initially created where they always start off as 1x1 and then someone changes the size to whatever. We, as Mesa, have no idea what the size of the X window is so we have to query the X server for it. If we lose the race, we get 1x1 instead of the actual size. Normally this is kinda okay because we re-query as part of present and, if it's wrong, we adjust so we're only ever wr
<fdobridge> <k​arolherbst> maybe just query over and over again if it's 1x1?
<fdobridge> <k​arolherbst> like....
<fdobridge> <k​arolherbst> not sure how many real world applications would use 1x1 windows in X...... who am I kidding, I'm sure applications are doing it for real for weirdo hacks
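Karol's suggestion, sketched under stated assumptions: retry the size query a bounded number of times while it still reports the initial 1x1 size. The query callback stands in for a round trip to the X server and is hypothetical, as are the other names; this is not how Mesa or kopper currently query window geometry. It only narrows the race rather than closing it, and, as noted above, a window that really is 1x1 would burn every retry.

```c
#include <assert.h>
#include <stdbool.h>

struct extent { unsigned width, height; };

/* Hypothetical stand-in for asking the X server for the window size. */
typedef bool (*query_size_fn)(void *ctx, struct extent *out);

/* Query the size, retrying up to `max_retries` extra times while the
 * answer is still the initial 1x1. Returns false only if a query fails;
 * a persistent 1x1 is reported as-is and left to the caller. */
static bool
query_size_retrying(query_size_fn query, void *ctx, unsigned max_retries,
                    struct extent *out)
{
   assert(query && out);
   for (unsigned attempt = 0; attempt <= max_retries; attempt++) {
      if (!query(ctx, out))
         return false;
      if (out->width != 1 || out->height != 1)
         return true; /* got a size other than the initial 1x1 */
   }
   return true; /* still 1x1: lost the race every time, or truly 1x1 */
}

/* Test double for illustration: replays a scripted sequence of sizes. */
struct scripted {
   const struct extent *sizes;
   unsigned n, next;
};

static bool
scripted_query(void *ctx, struct extent *out)
{
   struct scripted *s = ctx;
   if (s->next >= s->n)
      return false;
   *out = s->sizes[s->next++];
   return true;
}
```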
<fdobridge> <r​edsheep> That sounds like something I would expect to go away when I resize as well, but I have had it continue after resizing...
<fdobridge> <g​fxstrand> That's because kopper never queries it again
<fdobridge> <k​arolherbst> trusting the x server? bold
<fdobridge> <g​fxstrand> Kopper has lots of "We don't actually need this code. We're Kopper." paths.
<fdobridge> <z​mike.> It's supposed to get that info from SUBOPTIMAL acquire/present returns
<fdobridge> <z​mike.> Which trigger surface reinit
<fdobridge> <z​mike.> It's funny because the ticket mentions this error is printed on startup, though typically that only happens from manual resize
<fdobridge> <g​fxstrand> Yes, and we re-init just fine
<fdobridge> <g​fxstrand> We just don't update surf->Width/Height
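A sketch of the shape of the fix the last two messages point at: when a present or acquire result such as VK_SUBOPTIMAL_KHR or VK_ERROR_OUT_OF_DATE_KHR triggers a surface reinit, the size the GL side caches has to be refreshed along with the swapchain. Everything here is hypothetical: the enum merely mirrors those Vulkan result codes, and `struct cached_surface` borrows the `Width`/`Height` names from the chat but is not Mesa's actual kopper or framebuffer structure.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins mirroring VK_SUCCESS, VK_SUBOPTIMAL_KHR and
 * VK_ERROR_OUT_OF_DATE_KHR; these are NOT Mesa's kopper structures. */
enum present_status { PRESENT_OK, PRESENT_SUBOPTIMAL, PRESENT_OUT_OF_DATE };

struct extent { unsigned width, height; };

struct cached_surface {
   struct extent swapchain_extent; /* size the swapchain was created with */
   unsigned Width, Height;         /* size the GL side believes in */
   unsigned reinit_count;
};

/* Recreate the swapchain at `current` (the size the window system now
 * reports) and propagate that size to the cached Width/Height. Skipping
 * the second half reproduces the bug described above: the swapchain
 * recovers, but the stale 1x1 size lives on. */
static void
surface_reinit(struct cached_surface *surf, struct extent current)
{
   assert(current.width > 0 && current.height > 0);
   surf->swapchain_extent = current;
   surf->reinit_count++;
   surf->Width = current.width;
   surf->Height = current.height;
}

/* Returns true if the result forced a reinit. */
static bool
handle_present(struct cached_surface *surf, enum present_status status,
               struct extent current)
{
   if (status == PRESENT_OK)
      return false;
   surface_reinit(surf, current);
   return true;
}
```

Updating both sizes in the same function keeps them from drifting apart, whichever path (suboptimal acquire, suboptimal present, or a manual resize) triggers the reinit.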