#zink on 2022-04-25 — irc logs at oftc.irclog.whitequark.org

2022-03-22 11:52 ChanServ changed the topic of #zink to: official development channel for the mesa3d zink driver || https://docs.mesa3d.org/drivers/zink.html

02:16 eukara_ has quit [Remote host closed the connection]

02:17 eukara_ has joined #zink

08:11 cheako has quit [Quit: Connection closed for inactivity]

13:18 <zmike> ajax: I'm looking at the glxgears thing again and I'm not sure exactly how we want to approach this; basically the issue is that glxgears never checks any error states, therefore it would be up to mesa to abort/exit/whatever in this scenario

13:18 <zmike> and we don't

13:19 <zmike> I'm not sure there's any mechanism in mesa to trigger a shutdown like that at all?

14:03 <ajax> zmike: the default xlib error handler is printf + exit, so if a glx call hits a condition described as "GLXBadWhatever is generated" then that's where glxgears would normally crash out

14:03 <ajax> something like we bubble enough error back up through SwapBuffers and teach libGL to turn that into a synthetic protocol error

14:03 <zmike> ajax: yeah, I'm assuming nv wsi has a handler that changes that behavior

14:04 <zmike> ugh

14:04 <ajax> i can take that on if you want, i started down that path late last week

14:04 <zmike> https://pastebin.com/j2k6c59w is the trace

14:04 <zmike> I'm working on figuring out texture from pixmap now

14:05 <zmike> can check this once I'm done

14:05 <zmike> really need your focus on the xcb ordering issue and swapinterval handling

14:05 <ajax> k

14:18 <zmike> ugh this is terrible

14:21 cheako has joined #zink

15:17 <zmike> hmm so I'm looking at two options for tfp, and they both seem terrible:

15:17 <zmike> 1) implement tfp in kopper

15:17 <zmike> 2) reuse dmabuf importing from dri3

15:18 <zmike> 1 is terrible because obviously it is

15:18 <zmike> 2 is terrible because none of the dri3 stuff is at all reusable if you aren't actually dri3, which means probably lots of awful hacky subclassing or related tricks to try and make it work

15:19 <zmike> if I'm not missing something, then I guess when I get back from running a couple errands I'm gonna flip a coin and implement whichever result I get

15:23 <ajax> ngh yeah

15:57 <zmike> as much as I really don't want to reinvent the wheel I'm thinking that might be the best choice here

16:02 <ajax> i guess i was hoping to not think about tfp until i could rely on the server also being zink

16:03 <ajax> because then it's just EXT_external_memory_fd hooked up to dri3, i think

16:05 <zmike> yeah that's more or less what it'll end up being except I gotta punt it through gallium interfaces

16:05 <zmike> so I think either way this work has to happen

16:09 <zmike> (dri3 still won't work with zink no matter what without huge refactoring since it takes all dri3 struct types)

17:19 LexSfX has quit []

17:24 LexSfX has joined #zink

17:33 LexSfX has quit [Ping timeout: 480 seconds]

17:34 LexSfX has joined #zink

18:06 <zmike> $ MESA_LOADER_DRIVER_OVERRIDE=zink bin/glx-tfp -auto

18:06 <zmike> PIGLIT: {"result": "pass" }

18:58 <zmike> ajax: re: glxgears closing, what if when it's detected that the swapchain is dead in kopper (e.g., your patch) we just trigger a more severe xerror from kopper

18:59 <ajax> sounds like the right plan to me

18:59 <zmike> seems like that would be easier than trying to pipe errors back up to a handler that doesn't exist yet

19:00 <zmike> any ideas for errors that would be catastrophic enough that even the most braindead wsi would exit?

19:00 <ajax> BadWindow, you'd think
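(Note: a toy model of the plan discussed above, not Xlib or Mesa code. It assumes only what ajax describes: the default error handler prints and exits, so an application that never checks error state still terminates once an error is raised on its behalf. The names `default_error_handler`, `set_error_handler`, and `swap_buffers` are hypothetical; `set_error_handler` loosely mirrors how XSetErrorHandler returns the previously installed handler.)

```python
import sys


def default_error_handler(error_name):
    """Toy stand-in for a print-and-exit default handler."""
    print(f"X Error of failed request: {error_name}", file=sys.stderr)
    sys.exit(1)


# Currently installed handler; a client may replace it.
_handler = default_error_handler


def set_error_handler(new_handler):
    """Install new_handler and return the one that was installed before."""
    global _handler
    previous = _handler
    _handler = new_handler
    return previous


def swap_buffers(swapchain_alive):
    """Hypothetical swap path: a dead swapchain becomes a synthetic error.

    The caller never has to inspect a return value; with the default
    handler installed, the process exits inside the handler.
    """
    if not swapchain_alive:
        _handler("BadWindow")
        return False  # only reached if a custom handler chose not to exit
    return True
```

With the default handler in place, `swap_buffers(False)` ends the process. A client that installs its own handler through `set_error_handler` keeps running, which is one possible explanation for a permissive wsi never exiting.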

19:00 <zmike> hm

19:01 <ajax> so the problem i'm having is

19:01 <ajax> i'm trying to get swapinterval working, and a side effect of that is the default changes to actually vsync'd

19:01 <zmike> sounds normal enough

19:01 <ajax> which means wsi takes a different path, the one where you're using a thread to manage the present events

19:02 <zmike> oh no

19:02 <ajax> and _only_ with that path, can i reproduce this class of hang-at-exit that i'm currently looking at

19:02 <zmike> threads and wsi, name a more awful combo

19:06 <zmike> maybe the solution is to disable present thread with zink?

19:06 <zmike> (for now)

19:07 <ajax> when you say present thread, do you mean the one in wsi for the swapchain, or do you mean one in zink for async gl command dispatch?

19:07 <zmike> wsi

19:08 <ajax> i don't think the non-present-thread code path would behave properly for FIFO queues

19:09 <zmike> awkward

19:09 <ajax> if i could make it work without a thread believe me i'd love to

19:09 <zmike> I was imagining it might be a good enough bandaid to slap on for now since we're holding up 22.1 release

19:10 <ajax> scheisse

19:10 <ajax> okay so

19:10 <ajax> this is still basically https://gitlab.freedesktop.org/mesa/mesa/-/issues/6139

19:11 <zmike> I guess

19:11 <ajax> it's stopped at exactly the same point. xcb_wait_for_special_event is a terrible api

19:11 <zmike> yeah

19:12 <zmike> I just found more fuckups with resizing if you do it fast enough

19:12 <zmike> so there goes my week

19:12 <zmike> but I think tfp is done at least

19:12 <ajax> nice!

19:13 <ajax> i need to figure out how to repro whatever was going wrong with my previous attempt, all i remember is someone bisected it guilty and it got reverted

19:14 <zmike> yeah I think it was fps drops in some game or something like that?

19:14 <ajax> i _think_ the way i wrote it this time is immune to the race michel pointed out

19:14 <zmike> "this time"

19:14 <ajax> look

19:14 <zmike> ptsd intensifies

19:16 <zmike> did you put up a MR yet or just branch?

19:17 <ajax> current state isn't pushed yet, give me a few

19:17 <zmike> no rush, I'm trying to figure out how to do dynamically resizable depth buffers

19:40 <zmike> oooh this is really fucking gross but I nailed it

19:49 <zmike> ajax: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16151

19:49 <zmike> try this out

19:50 <ajax> ew

19:50 <zmike> yes

19:51 <zmike> it's basically the non-wsi version of the existing swapchain buffer code

19:51 <zmike> I'm wondering if I should just remove the special casing in kopper now and let zink fully manage the depth buffer with this handler

19:52 <zmike> since now it'll be kopper recreating depth buffers on window resize but then also zink internally recreating depth buffers at the Truly Correct Size and sneaking them in behind kopper's back anyway

19:53 <ajax> yeah i feel like you can just go ahead and resize zs when you resize front/back?

19:53 <zmike> that's basically what this is doing

19:53 <zmike> so I think I'm gonna test out dropping the special casing since potentially now kopper might alloc a depth buffer that never gets used

19:54 <ajax> ack

19:59 <zmike> ajax: tfp here https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16152

20:01 <zmike> looking at the remaining ci regressions from kopper and I don't see these failing locally?

20:01 <zmike> 🤔

20:04 <ajax> hm hm

20:05 <ajax> zmike: !15800 has current bits for swapinterval. zero regressions for me with piglit quick_gl and radv.

20:08 <zmike> I'm on it

20:12 <zmike> ajax: looks good overall, did you ever figure out what happens on resize?

20:12 <zmike> wonder if the interval should just be included in the loader info so zink_kopper can just apply it then

20:13 <ajax> resizing being the sequence number abort? not yet but apparently it's still easy to hit

20:13 <zmike> also does it fix the piglit regressions listed in the ci baseline for swap control?

20:13 <zmike> no, not that

20:13 <zmike> resize+swapinterval

20:14 <zmike> hm

20:14 <zmike> actually, I guess it wouldn't matter would it since zink handles the resizing and thus the interval should be preserved there

20:14 <zmike> DISREGARD

20:14 <ajax> oh that yeah. yes, look at the extra vtable nonsense i had to do in the egl code to keep it away from the dri2 protocol

20:14 <zmike> yeah gross

20:14 <zmike> but expected

20:14 <zmike> it's not dri without a vtable

20:15 <zmike> or 5

20:15 <ajax> that tfp code looks... plausible.

20:15 <ajax> that work on nvidia?

20:15 <ajax> i kind of imagine no

20:19 <zmike> it does not

20:19 <zmike> though that test fails the same way on nv native?

20:19 <zmike> 🤔

20:20 <ajax> heh

20:20 <ajax> yeah it's not really "good"

20:20 <zmike> X Error of failed request: BadMatch (invalid parameter attributes)

20:20 <zmike> Major opcode of failed request: 152 (GLX)

20:20 <zmike> Minor opcode of failed request: 22 (X_GLXCreatePixmap)

20:20 <zmike> yeah but it's the only "simple" test I have for it

20:21 <zmike> does it work for you on radv?

20:21 <ajax> firefox's o-o-p rendering uses it i think?

20:21 <zmike> I only have nv and intel up right now

20:21 <zmike> uhhhhhhh

20:21 <zmike> I don't think I've ever tested running ff on zink

20:21 <zmike> not sure this is the time to start haha

20:21 <ajax> buk buk buk buk braawwwwwk

20:21 <zmike> oh no you didn't

20:22 <zmike> fine

20:22 <zmike> FINE

20:22 <zmike> well ff works

20:22 <zmike> I'm slamming webgl aquarium on it

20:23 <ajax> yep, works with radv

20:23 <zmike> cool

20:23 <ajax> and with anv

20:28 <zmike> okay, so the only remaining release blockers (https://gitlab.freedesktop.org/mesa/mesa/-/issues/6267) are the disconnects and the auto-loading fallback

20:28 <zmike> I can tackle the fallback tomorrow

20:28 <zmike> and maybe try to throw a BadWindow into glxgears somehow

20:29 <ajax> i still don't understand how you're having that problem

20:29 <zmike> yeah it's baffling

20:29 <zmike> I guess nv wsi is just insanely permissive

20:33 <zmike> that special_event patch in your MR is pretty gnarly

20:33 <ajax> didn't say i liked it

20:34 <ajax> i could add like a usleep(100) or something to make it less busy-waity but i have trouble caring

20:34 <zmike> haha

20:34 <zmike> yeah

20:34 <ajax> you shouldn't get to that spot, and if you do this is not why you're slow

20:34 <ajax> you have fewer images than threads wanting to work on them. resize your shiz.
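(Note: a minimal sketch of the "add a short sleep to make it less busy-waity" idea mentioned above. It is not the code from the MR; `poll_until` and its parameters are hypothetical, and the default interval of 100 microseconds only echoes the usleep(100) figure from the conversation.)

```python
import time


def poll_until(predicate, interval_s=0.0001, timeout_s=1.0):
    """Call predicate() repeatedly, sleeping briefly between attempts.

    Returns the number of attempts it took for predicate() to become
    true, or None if timeout_s elapsed first. The sleep yields the CPU
    instead of spinning at full speed.
    """
    deadline = time.monotonic() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        if predicate():
            return attempts
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval_s)
```

As ajax points out, a loop like this should rarely be reached at all, so the sleep mainly trades a little wake-up latency for lower CPU use while waiting.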

20:35 <zmike> hopefully this doesn't cause more mysterious regressions

20:36 <zmike> whew there's a light at the end of the tunnel finally

20:36 <ajax> i couldn't run quick_gl to completion without it so i'm pretty hopeful

20:37 <zmike> awesome

20:37 <zmike> I'm queuing up some runs

20:53 eukara has joined #zink

20:55 eukara_ has quit [Ping timeout: 480 seconds]

21:02 eukara has quit [Ping timeout: 480 seconds]

21:02 eukara has joined #zink

21:07 eukara_ has joined #zink

21:11 eukara has quit [Ping timeout: 480 seconds]

21:12 eukara__ has joined #zink

21:18 eukara_ has quit [Ping timeout: 480 seconds]

21:22 eukara__ has quit [Read error: Connection reset by peer]

21:27 <ajax> ngh

21:27 <ajax> so i can pretty reliably make kopper glxgears crash just by rapid resizing

21:27 <ajax> and if i forcibly call XInitThreads() from libGL's constructor, it doesn't crash

21:28 <ajax> so fundamentally the issue here is it's not possible to use xcb safely without knowing whether it's the backend for some xlib display

21:28 <ajax> which there's no API for

21:29 <ajax> i could scrape /proc/pid/maps for my own heap and try to find my own xcb_connection_t* somewhere aligned. it's 64 bits, it's not going to false-positive.

21:30 <ajax> but then the other problem is i don't think XInitThreads can help you if a display was already created before you called it

21:30 <ajax> which means if libGL was dlopen'd you might just be out of luck.
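(Note: a rough sketch of the first step of the /proc/pid/maps idea ajax floats above, under the assumption that the mapping of interest is the one labelled `[heap]`; allocations can also live in anonymous mmap regions, which this ignores. The helper names `find_heap_range` and `aligned_slots` are hypothetical. It only locates the range and enumerates pointer-aligned addresses; actually reading process memory and comparing against an xcb_connection_t* is left out.)

```python
def find_heap_range(maps_text):
    """Return (start, end) of the [heap] mapping, or None if absent.

    maps_text is the content of a /proc/<pid>/maps file, where each line
    looks like: 'start-end perms offset dev inode  pathname'.
    """
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[5] == "[heap]":
            start, end = (int(part, 16) for part in fields[0].split("-"))
            return start, end
    return None


def aligned_slots(start, end, word=8):
    """Addresses in [start, end) where a word-sized, aligned pointer fits."""
    first = (start + word - 1) // word * word
    return range(first, end - word + 1, word)
```

On Linux, the input would come from something like `open("/proc/self/maps").read()`. Each address from `aligned_slots` is a candidate location for a 64-bit pointer value.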

21:32 <zmike> starting to feel like we're getting into the territory of the truly insane

21:34 <daniels> ajax: just ram through the patch to make every display threadsafe?

21:34 <daniels> I posted it a while ago but decided I didn’t care when it became about nop chicken on various platforms

21:34 <ajax> daniels: i mean. yeah. i kind of hate saying everyone gets to upgrade their libX11...

21:35 <ajax> my other insane plan was to just import the _Display struct ABI into libxcb and use that for xcb_connection_t's actual storage

21:36 <ajax> no question of request numbers getting out of sync if they're only stored in one place!

21:37 <ajax> but there too you're forcing everyone to update libxcb just to make this one driver work

21:39 <ajax> afk for the evening. branch updated with some minor cleanups, but ci choked on the last version and i didn't fix anything about those fails yet

21:39 <daniels> tbf we are talking about an unreleased version of Mesa here … ?

21:39 <daniels> so upgrades are fairly implied

21:57 eukara has joined #zink

22:08 eukara_ has joined #zink

22:09 eukara has quit [Read error: No route to host]

22:23 eukara_ has quit [Ping timeout: 480 seconds]