#zink on 2025-03-11 — irc logs at oftc.irclog.whitequark.org

2024-07-16 04:51 ChanServ changed the topic of #zink to: official development channel for the mesa3d zink driver || https://docs.mesa3d.org/drivers/zink.html

05:42 <fdobridge> <airlied> @marysaka any ideas on emitting nops after hmma opcodes? nvidia seems to do it, but I'm not seeing anything on why

08:04 <fdobridge> <marysaka> I think @karolherbst mentioned that this is needed for scheduling as it can take more cycles than what we can define with one instruction

08:04 <fdobridge> <marysaka> so might be worth checking what the scheduling of those are

08:22 <fdobridge> <airlied> the docs don't seem to mention it, and I've got the correct scheduling in my branch for everything on Turing, but I did notice in the dumps you did the NOPs were there

08:22 <fdobridge> <airlied> oh wait, is there a limit on delays in one instruction?

08:22 <fdobridge> <airlied> do we handle adding NOPs if we get a larger delay

08:22 <fdobridge> <marysaka> I don't think so because it never really happened so far
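
[editor's sketch] The question above (what to do when a required delay exceeds what one instruction can encode) can be illustrated with a small Python sketch. The cap `MAX_DELAY = 15` and the function name `split_delay` are assumptions made for illustration; the log does not state the real limit or how the compiler's scheduler is structured. The sketch also assumes that each padding NOP can carry its own delay of up to the same cap.

```python
MAX_DELAY = 15  # hypothetical cap on the delay a single instruction can encode


def split_delay(required, max_delay=MAX_DELAY):
    """Split a required delay into (delay on the instruction, [delay per NOP]).

    The instruction itself takes as much of the delay as it can encode;
    whatever is left over is spread across trailing NOPs.
    """
    if required < 0:
        raise ValueError("delay must be non-negative")
    first = min(required, max_delay)
    remaining = required - first
    nops = []
    while remaining > 0:
        step = min(remaining, max_delay)
        nops.append(step)
        remaining -= step
    return first, nops
```

With the assumed cap, a 10-cycle delay needs no NOPs, while a 32-cycle delay becomes 15 cycles on the instruction plus two NOPs.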

08:22 <fdobridge> <marysaka> (also uuum this is the zink channel just noticed)

08:22 <fdobridge> <airlied> oops doh!

13:51 <fdobridge> <zmike.> @gfxstrand I have my post all ready to go, just need a link to your post

14:36 <fdobridge> <gfxstrand> I just assigned Marge and told Kara to go post the blog

14:37 <fdobridge> <zmike.> 🔗 🔗 🔗 🔗 🔗

14:40 <fdobridge> <gfxstrand> The Collabora blogging process requires way more manual effort than you'd like to think. It'll be a few minutes.

14:40 <fdobridge> <zmike.> :sweatytowelguy:

14:49 <fdobridge> <Sid> https://tenor.com/view/polish-observe-cat-black-gif-5927409

14:49 <fdobridge> <Sid> https://gitlab.freedesktop.org/mesa/mesa/-/issues/12797

14:50 <fdobridge> <Sid> :myy_RunAway:

14:52 <fdobridge> <gfxstrand> Oh no!

14:53 <fdobridge> <gfxstrand> What happens if you put a `return;` at the top of `zink_set_damage_region()`?

15:01 <fdobridge> <gfxstrand> Ugh... Firefox won't even start for me on X11

15:06 <fdobridge> <Sid> let's see..

15:15 <fdobridge> <gfxstrand> I think the device select layer got screwed up for me.

15:20 <fdobridge> <gfxstrand> I had too many Vulkan drivers. I have FF now

15:20 <fdobridge> <Sid> that's a flawless experience

15:20 <fdobridge> <gfxstrand> damn...

15:21 <fdobridge> <gfxstrand> What if you comment out the line right after the "we need to flip it" line?

15:22 <fdobridge> <Sid> compiling

15:22 <fdobridge> <gfxstrand> Okay, that seems worse

15:23 <fdobridge> <Sid> hm?

15:23 <fdobridge> <Sid> I meant that I was compiling w/ the change :P

15:23 <fdobridge> <gfxstrand> Oh, I mean I tested and it seems worse

15:23 <fdobridge> <Sid> ah

15:24 <fdobridge> <gfxstrand> Got rid of damage and it's perfect for me

15:24 <fdobridge> <gfxstrand> Ugh...

15:24 <fdobridge> <Sid> can confirm

15:26 <fdobridge> <Sid> this squished cursor is really funny tbh

15:35 <fdobridge> <mhenning> yeah, it's not a bug, it's a feature 😛

15:36 <fdobridge> <Sid> I can't help but giggle every time it turns into the text-input-box cursor

15:36 <fdobridge> <Sid> I-beam pointer, I believe it's called

15:53 <fdobridge> <Owo> @gfxstrand have you considered only enabling the NVK+Zink combo for Wayland, since X11 seems to be a pain?

15:53 <fdobridge> <Owo> Or a case to disable damage on X11 if you want to keep it there.

15:55 <fdobridge> <gfxstrand> Let's see if we can just fix the bug. Choosing which to use based on window system is pain

15:57 <fdobridge> <mhenning> Yeah, we don't really want to support both nouveau gl and zink (on the same hardware) in the long term. Ideally we fix the bugs

16:03 <fdobridge> <Owo> Yeah. Still, if it's the easiest in the short-term, I can try hacking something up just so it works? Try n get a Mesa contribution under my belt, even if it's a hack :akipeek:

16:05 <fdobridge> <Owo> Toss in an env var to make it do damage again so that it can still be tested without patching

16:05 <fdobridge> <zmike.> NO 👏 MORE 👏 ENV 👏 VARS

16:05 <fdobridge> <Owo> ~~`ZINK_ENABLE_DAMAGE`~~

16:08 <fdobridge> <gfxstrand> Blog is blogged: https://www.collabora.com/news-and-blog/news-and-events/goodbye-nouveau-gl-hello-zink.html

16:08 <fdobridge> <gfxstrand> I'm going to try and find/fix the bug today

16:10 <fdobridge> <zmike.> smh didn't even mention that IMG has been shipping zink as their GL driver for years

16:10 <fdobridge> <Owo> @gfxstrand `other driver teams follow suite` suite or suit?

16:11 <fdobridge> <Sid> should be the latter

16:12 <fdobridge> <gfxstrand> On what devices?

16:13 <fdobridge> <gfxstrand> IMG hasn't been shipping in years. 😛 (edited)

16:13 <fdobridge> <zmike.> they're still submitting conformance for products even now https://www.khronos.org/opengl/adopters/login/submissions/#submission_369

16:13 <fdobridge> <gfxstrand> And I did say "Nouveau is the first *Mesa* driver stack..."

16:14 <fdobridge> <gfxstrand> Conformant and shipping aren't the same thing. 😛

16:15 <fdobridge> <zmike.> https://www.imaginationtech.com/product/img-bxm-8-256/ sure seems like it has shipped

16:19 <fdobridge> <zmike.> anyway here's your reblog https://www.supergoodcode.com/znvk/

16:21 <fdobridge> <Sid> the moniker is official then

16:22 <fdobridge> <Sid> znvk

16:24 Sid127- has joined #zink

16:24 Sid127 has quit [Read error: Connection reset by peer]

16:30 <fdobridge> <Owo> Wait, why does the collabora post say december 2...?

16:44 <fdobridge> <gfxstrand> Wait, what?

16:44 <fdobridge> <gfxstrand> Says March 11 here

16:45 <fdobridge> <Owo> Oh, weird. It's working now.

16:45 <fdobridge> <Owo> It showed December 2, 2024 for a bit

16:46 <fdobridge> <gfxstrand> 🤷🏻‍♀️

16:46 <fdobridge> <gfxstrand> Perfect! 😂

16:57 jhli has joined #zink

17:07 <fdobridge> <zmike.> HAHAHA

17:07 <fdobridge> <zmike.> https://www.phoronix.com/news/Nouveau-Turing-Zink-NVK-OpenGL

17:07 <fdobridge> <zmike.> didn't link your blog post AND mentioned IMG

17:07 <fdobridge> <Sid> phoronix is a zmike alt confirmed

17:12 <fdobridge> <karolherbst> more than an hour? I'm not impressed

17:15 <fdobridge> <huntercz122> waiting for phoronix comments about nvidia and nak being woke cuz of rust

17:16 <fdobridge> <karolherbst> 🍿

17:17 <fdobridge> <gfxstrand> I did all that work and he STILL didn't link to the blog post. 😂

17:17 <fdobridge> <gfxstrand> I should have put it in the commit message.

17:17 <zmike> you can just mail him and ask him to add a link

17:48 <fdobridge> <gfxstrand> I wonder...

17:52 <fdobridge> <redsheep> Somebody here has braved the phoronix comments to try to get the blog added to the article

17:54 <fdobridge> <gfxstrand> hehe

18:05 <fdobridge> <gfxstrand> This time it might actually be a firefox bug.

18:06 <fdobridge> <Sid> ban

18:06 <fdobridge> <Sid> how dare they

18:07 <fdobridge> <gfxstrand> Returning 0 from buffer_age doesn't make things work

18:15 <fdobridge> <gfxstrand> Glad I looked at the MR. We almost got blocked by a Zink ADL flake

18:16 <zmike> see also https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33554

18:16 <fdobridge> <gfxstrand> Also that

18:19 <fdobridge> <gfxstrand> And... I think the OOM killer just killed deqp

18:20 <fdobridge> <gfxstrand> Just cancelled and restarted the job. Here's hoping I did it fast enough that Marge won't notice

18:23 <fdobridge> <gfxstrand> This is funky...

18:23 <fdobridge> <gfxstrand> ```

18:23 <fdobridge> <gfxstrand> 18:22:28.786: Running dEQP on 9 threads in 239-test groups

18:23 <fdobridge> <gfxstrand> 18:22:28.786: Running dEQP on 9 threads in 500-test groups

18:23 <fdobridge> <gfxstrand> ```

18:24 <fdobridge> <gfxstrand> Is it running deqp-runner 3 times in parallel for a total of 27 concurrent copies of the CTS?

18:24 <fdobridge> <gfxstrand> ```

18:24 <fdobridge> <gfxstrand> 18:22:33.868: Running dEQP on 9 threads in 500-test groups

18:24 <fdobridge> <gfxstrand> 18:22:33.868: Running dEQP on 9 threads in 4-test groups

18:24 <fdobridge> <gfxstrand> 18:22:33.868: Running dEQP on 9 threads in 3-test groups

18:24 <fdobridge> <gfxstrand> 18:22:33.868: Running 1972 Piglit tests on 9 threads

18:24 <fdobridge> <gfxstrand> ```

18:24 <fdobridge> <gfxstrand> That can't be right...

18:25 <fdobridge> <gfxstrand> Oh, no. It's the "suite" feature

18:25 <fdobridge> <gfxstrand> Okay

18:25 <fdobridge> <gfxstrand> So presumably not running it all simultaneously

18:27 <fdobridge> <gfxstrand> Yeah, it looks like it's one deqp-runner so it'll only run on the 9 threads. Had me worried for a second there. :frog_sweat:

18:28 <fdobridge> <gfxstrand> Oh, yeah. I thought that was bad with dEQP-VK but with piglit it's literally every test. 🤡

18:29 <fdobridge> <gfxstrand> Merged!

18:29 <fdobridge> <gfxstrand> https://tenor.com/view/snoopy-party-gif-960466014556350386

18:29 <fdobridge> <zmike.> monumental

18:29 <fdobridge> <gfxstrand> Now we watch as the bug reports roll in...

18:30 <fdobridge> <Sid> they started rolling in even before it got merged

18:30 <fdobridge> <Sid> :3

18:30 <fdobridge> <gfxstrand> 😛

18:31 <fdobridge> <gfxstrand> And we appreciate it! 💜

18:33 <fdobridge> <gfxstrand> Am I really downloading the Firefox source code?

18:33 <fdobridge> <gfxstrand> Yes, yes I am...

18:35 <fdobridge> <Owo> Faith, whyyyy

18:36 <fdobridge> <gfxstrand> Because there's a damage bug and it very much looks like it's not ours

18:49 <fdobridge> <Sid> :saigeheart:

18:49 <fdobridge> <redsheep> I guess the bugs gotta get fixed somehow...

18:49 <fdobridge> <Sid> doing all I can to be helpful with whatever limited energy I have left after existing and adulting all day

19:26 <fdobridge> <gfxstrand> mood

19:28 <fdobridge> <Sid> yeah, especially with things in personal life sapping so much of my energy and me contributing in my own free time, it's.. difficult .-.

21:19 <fdobridge> <gfxstrand> ```

21:19 <fdobridge> <gfxstrand> At all times, any client API rendering which falls outside of the damage

21:19 <fdobridge> <gfxstrand> region results in undefined framebuffer contents for the entire framebuffer.

21:19 <fdobridge> <gfxstrand> It is the client's responsibility to ensure that rendering is confined to

21:19 <fdobridge> <gfxstrand> the current damage area.

21:19 <fdobridge> <gfxstrand> ```

21:19 <fdobridge> <gfxstrand> So if returning early from `set_damage()` fixes something then either we screwed up the damage regions somehow or the client is rendering outside of them which is illegal.

21:27 <fdobridge> <gfxstrand> That or there's a crazy zink bug hiding somewhere

21:27 <fdobridge> <gfxstrand> It did look in some of the flickering that it's literally rendering to the wrong spot
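
[editor's sketch] The rule quoted above (rendering must stay inside the damage region) can be expressed as a simple containment check. This is a Python illustration only; `Rect`, `contains`, and `draw_is_confined` are hypothetical names, not Mesa or EGL API. The check is deliberately conservative: a draw that spans two adjacent damage rectangles is reported as not confined.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: int
    y: int
    w: int
    h: int


def contains(outer, inner):
    """True if `inner` lies entirely within `outer`."""
    return (
        inner.x >= outer.x
        and inner.y >= outer.y
        and inner.x + inner.w <= outer.x + outer.w
        and inner.y + inner.h <= outer.y + outer.h
    )


def draw_is_confined(draw, damage_rects):
    """Conservative check: the draw must fit inside one single damage rect."""
    return any(contains(d, draw) for d in damage_rects)
```

Under this check, a 1x1 damage rectangle makes almost any real draw fall outside the damage region, which matches the flickering described in this log.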

21:31 <fdobridge> <Sid> I even had frames showing from login sessions that I had logged out of

21:31 <fdobridge> <Sid> i.e. I could see parts of the wallpaper I've set in plasma in the flickery regions when I logged into i3 (no wallpaper)

21:36 <fdobridge> <mhenning> that sounds like uninitialized memory. you could try seeing if NVK_DEBUG=zero_vram makes any difference

21:36 <fdobridge> <Sid> :neko_salute:

21:45 <fdobridge> <Owo> Shouldn't the kernel be zeroing it anyways?

21:45 <fdobridge> <Owo> :akipeek:

21:47 <fdobridge> <redsheep> Without nouveau being ready to do crazy async page cleaning allocation magic the performance cost is too high

21:48 <fdobridge> <Owo> Rayon to the rescue? :p

21:49 <fdobridge> <Owo> (if someone wants to either write C bindings for it, or rewrite those parts of nouveau in rust)

21:49 <fdobridge> <Owo> Probably overkill. But a thought.

21:53 <fdobridge> <redsheep> That looks like something that would only have to do with cpu side stuff? From what I understand zeroing pages should be happening pretty nearly all on the gpu side, it's just complicated to make the juggle work right

21:57 <fdobridge> <gfxstrand> Yeah, that's just uninitialized memory

22:02 <fdobridge> <gfxstrand> Which FF should also be rendering over since it's getting a buffer age of 0

22:03 <fdobridge> <Sid> sounds stinky

22:04 <fdobridge> <gfxstrand> pulling debug symbols....

22:05 <fdobridge> <Sid> I'm sorry for unleashing this onto you :p

22:08 <fdobridge> <gfxstrand> Yeah... Firefox is giving me 1x1 damage regions. Something's not right. (edited)

22:14 <fdobridge> <gfxstrand> Yeah, this is a firefox bug.

22:14 <fdobridge> <gfxstrand> There's nothing I can do if FF gives me a 1x1 damage region

22:14 <fdobridge> <gfxstrand> I'm surprised anything is working at all, TBH

22:23 <fdobridge> <Sid> ban

22:23 <fdobridge> <redsheep> Is there anything in mesa for application workarounds for damage?

22:23 <fdobridge> <Sid> ban firefox from using mesa :wolfFIRE: (edited)

22:24 <fdobridge> <redsheep> Is a per application exception for damage even possible?

22:24 <fdobridge> <Sid> Thunderbird might be affected too

22:24 <fdobridge> <Sid> and any firefox based browser

22:25 <fdobridge> <Sid> so.. firefox, librewolf (is affected, I use it), tor browser, icecat, whatever else exists

22:29 <fdobridge> <gfxstrand> We could add a driconf

22:31 <fdobridge> <gfxstrand> Still doing it on Nightly

22:34 <fdobridge> <redsheep> Might be good, for the sake of quickly having it working for anyone testing main.

22:35 <fdobridge> <gfxstrand> I wonder how hard it is to build firefox..

22:37 <fdobridge> <gfxstrand> If I get a Firefox patch out of this...

22:38 <fdobridge> <redsheep> You're just taking a tour of the entire stack to get your driver working lol

22:40 <fdobridge> <redsheep> I guess that's just what it takes

22:40 <fdobridge> <gfxstrand> :shrug_anim:

22:40 <fdobridge> <gfxstrand> The things I'm willing to do for the Linux graphics stack...

22:40 <fdobridge> <redsheep> What I don't get is how firefox isn't broken for a whole lot of other people if it's doing damage wrong

22:41 <fdobridge> <redsheep> Surely there are people on TBR gpus relying on damage who are using mesa and running firefox

22:43 <fdobridge> <gfxstrand> Are they using X11?

22:43 <fdobridge> <gfxstrand> I'm pretty sure this has to do with X *somehow*

22:43 <fdobridge> <gfxstrand> How? I'll let you know when I figure it out

22:45 <fdobridge> <zmike.> Nobody else does damage

22:46 <fdobridge> <mhenning> ~~do we need to do damage then~~

22:46 <fdobridge> <redsheep> We need a "Stop doing damage" meme

22:46 <fdobridge> <redsheep> It writes itself

22:49 <fdobridge> <redsheep> Should the theoretical IMG+zink users have had damage this whole time?

22:52 <fdobridge> <zmike.> I don't think I had implemented it when they started using it

22:56 <fdobridge> <zmike.> Also they're pinned on a mesa version from 2023 afaik

23:15 <fdobridge> <airlied> feels like IMG shipped zink first should come with a few asterisks 😛

23:25 <fdobridge> <gfxstrand> Kopper bug!

23:25 <fdobridge> <gfxstrand> That's what I said!

23:25 <fdobridge> <redsheep> Kopper bug?

23:35 <fdobridge> <gfxstrand> Yeah

23:35 <fdobridge> <gfxstrand> It's everyone's favorite X11 race

23:35 <fdobridge> <gfxstrand> I frickin' hate X11

23:35 <fdobridge> <gfxstrand> *sigh*

23:36 <fdobridge> <redsheep> Is this about the cursor, or firefox, or something else?

23:37 <fdobridge> <gfxstrand> firefox

23:37 <fdobridge> <redsheep> Like... it's not actually a firefox bug after all?

23:38 <fdobridge> <redsheep> :headache:

23:40 <fdobridge> <gfxstrand> The problem is that there's a race inherent in X11 when a window is initially created where they always start off as 1x1 and then someone changes the size to whatever. We, as Mesa, have no idea what the size of the X window is so we have to query the X server for it. If we lose the race, we get 1x1 instead of the actual size. Normally this is kinda okay because we re-query as part of present and, if it's wrong, we adjust so we're only ever wr

23:40 <fdobridge> <karolherbst> maybe just query over and over again if it's 1x1?

23:41 <fdobridge> <karolherbst> like....

23:41 <fdobridge> <karolherbst> not sure how many real world applications would use 1x1 windows in X...... who am I kidding, I'm sure applications are doing it for real for weirdo hacks

23:41 <fdobridge> <redsheep> That sounds like something I would expect to go away when I resize as well, but I have had it continue after resizing...

23:42 <fdobridge> <gfxstrand> That's because kopper never queries it again

23:42 <fdobridge> <karolherbst> trusting the x server? bold

23:42 <fdobridge> <gfxstrand> Kopper has lots of "We don't actually need this code. We're Kopper." paths.

23:44 <fdobridge> <zmike.> It's supposed to get that info from SUBOPTIMAL acquire/present returns

23:44 <fdobridge> <zmike.> Which trigger surface reinit

23:46 <fdobridge> <zmike.> It's funny because the ticket mentions this error is printed on startup, though typically that only happens from manual resize

23:48 <fdobridge> <gfxstrand> Yes, and we re-init just fine

23:48 <fdobridge> <gfxstrand> We just don't update surf->Width/Height
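
[editor's sketch] The last two messages describe a surface that is re-initialized correctly while its cached width and height stay at the initial 1x1. A minimal Python model of that failure mode follows. The `Surface` class, its `reinit` method, and the `flip_y` helper are hypothetical and do not mirror the actual kopper or zink code. That the cached height feeds a Y flip of damage rectangles is likewise an assumption, used here only to show how a stale size could put a rectangle in the wrong place.

```python
class Surface:
    """Toy model of a window surface with a cached size (hypothetical)."""

    def __init__(self):
        # X11 windows start out as 1x1; losing the race caches that size.
        self.width = 1
        self.height = 1
        self.swapchain_extent = (1, 1)

    def reinit(self, new_width, new_height, update_cached_size):
        # The swapchain is always recreated at the new size...
        self.swapchain_extent = (new_width, new_height)
        # ...but the cached size only follows if we remember to update it.
        if update_cached_size:
            self.width = new_width
            self.height = new_height


def flip_y(surface_height, y, h):
    """Convert a rect's Y between bottom-left and top-left origin."""
    return surface_height - (y + h)
```

In this model, a surface re-initialized to 800x600 without updating the cached size still reports a height of 1, so flipping a 100-pixel-tall rectangle at y=0 yields a negative coordinate instead of 500.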