<anholt>
before I go typing a thing: do zink's descriptor template updates update every descriptor in the set?
<zmike>
yep
<anholt>
looks like making use of that will be good for 74 -> 102 fps on gfxbench driveroverhead (submits disabled. we also suck at cmd streams and so you only get 64ish fps when processing submits)
<zmike>
suppose it'd be possible to create incremental templates for doing only partial updates, but the overhead of tracking more precise changes would probably negate the gains
<zmike>
what exactly is this case doing that's hitting perf issues in descriptor updates?
<anholt>
the issue we have is a partial update means you have to memcpy the un-updated descriptors, and that's a read of WC memory.
<zmike>
is it just doing like 1 sampler update for a set of 16?
<zmike>
i.e., only updating the first sampler in a set
<anholt>
there are only 5 descriptors in the set I think, let's see what they are.
<zmike>
be interested to know types and which ones are being updated
<anholt>
5 UBOs, apparently.
<zmike>
all updated?
<zmike>
should be possible to check this trivially with GALLIUM_TRACE btw
<anholt>
actually not sure where 5 ubos are even coming from
<anholt>
looks like it's just a loop of set_constant_buffer(VERTEX, 0), draw_vbo.
<zmike>
hm so that should be updating the push set
<zmike>
which is its own set
<zmike>
ahhh
<zmike>
okay because it'll update all the stages yeah I see
<zmike>
so I think the solution to this would be something like
<zmike>
in zink_descriptors_init create more push templates in different combinations
<zmike>
store them to an array
<anholt>
but you're always going to have the layout have descriptors for every possible stage?
<zmike>
yeah that's kinda unavoidable
<anholt>
prediction: partial update's going to be slower for us than that.
<anholt>
since partial update means WC read. better to just send us the whole mess then.
<zmike>
ah
<zmike>
well I think that's what it should be doing currently?
<zmike>
the issue is it's currently optimized to just slam in the same layout for every pipeline
<anholt>
so I just need to check that the template covers the full layout, at which point I can skip the WC read.
<zmike>
and changing that adds a lot of complexity
<zmike>
yeah probably
<zmike>
you can see in zink_descriptors_init the push template that gets created for it
<zmike>
and it'll always update all the descriptors every time
<zmike>
so it's never partial
<zmike>
zink never does partial set updates except for bindless
<zmike>
so you should never have wc read
<anholt>
ok, so if I fix this, I think we're at no-submit driveroverhead 102 fps zink+tu vs 108 fps freedreno, and yes-hw driveroverhead 64 vs 69 fps.
<zmike>
ooo
<zmike>
that's pretty competitive
<anholt>
(well, I also had to force sysmem. so two things to fix, at which point I will declare driveroverhead Fine)
<zmike>
nice 💪
<zmike>
anholt: btw there's cases for this in vkoverhead too now
<zmike>
template and no-template
<anholt>
looks like it doesn't, that uses the non-push path.
<zmike>
ohh it's unique to push descriptors?
<zmike>
ok, I should add cases for that too
<anholt>
for push, we have to copy the old set over to the new space. if you're doing non-push update, then it's in place.
<anholt>
(and the lifetimes are in your hands)
<zmike>
gotcha
<zmike>
didn't expect any other drivers would prefer the non-push path
<zmike>
there's a driver workaround already to disable it on amd
<anholt>
with my upcoming MR, I think full-update push should work out the same as non-push where you manually manage the space for your full updates.