ChanServ changed the topic of #zink to: official development channel for the mesa3d zink driver || https://docs.mesa3d.org/drivers/zink.html
LexSfX has quit []
LexSfX has joined #zink
Simonx22 has joined #zink
zmike has joined #zink
Sachiel has joined #zink
fahien has joined #zink
fahien has quit [Ping timeout: 480 seconds]
fahien has joined #zink
xroumegue has quit [Ping timeout: 480 seconds]
xroumegue has joined #zink
fahien has quit [Ping timeout: 480 seconds]
<anholt> before I go typing a thing: do zink's descriptor template updates update every descriptor in the set?
<zmike> yep
<anholt> looks making use of that will be good for 74 -> 102 fps on gfxbench driveroverhead (submits disabled. we also suck at cmd streams and so you only get 64ish fps when processing submits)
<zmike> suppose it'd be possible to create incremental templates for doing only partial updates, but the overhead of tracking more precise changes would probably negate the gains
<zmike> what exactly is this case doing that's hitting perf issues in descriptor updates?
<anholt> the issue we have is a partial update means you have to memcpy the un-updated descriptors, and that's a read of WC memory.
<zmike> is it just doing like 1 sampler update for a set of 16?
<zmike> i.e., only updating the first sampler in a set
<anholt> there are only 5 descriptors in the set I think, let's see what they are.
<zmike> be interested to know types and which ones are being updated
<anholt> 5 UBOs, apparently.
<zmike> all updated?
<zmike> should be possible to check this trivially with GALLIUM_TRACE btw
<anholt> actually not sure where 5 ubos are even coming from
<anholt> looks like it's just a loop of set_constant_buffer(VERTEX, 0), draw_vbo.
<zmike> hm so that should be updating the push set
<zmike> which is its own set
<zmike> ahhh
<zmike> okay because it'll update all the stages yeah I see
<zmike> interesting
<zmike> I'd never considered that before
<anholt> yeah, it's tu_CmdPushDescriptorSetWithTemplateKHR
<zmike> right
<zmike> so I think the solution to this would be something like
<zmike> in zink_descriptors_init create more push templates in different combinations
<zmike> store them to an array
<anholt> but you're always going to have the layout have descriptors for every possible stage?
<zmike> yeah that's kinda unavoidable
<anholt> prediction: partial update's going to be slower for us than that.
<anholt> since partial update means WC read. better to just send us the whole mess then.
<zmike> ah
<zmike> well I think that's what it should be doing currently?
<zmike> the issue is it's currently optimized to just slam in the same layout for every pipeline
<anholt> so I just need to check that the template covers the full layout, at which point I can skip the WC read.
<zmike> and changing that adds a lot of complexity
<zmike> yeah probably
<zmike> you can see in zink_descriptors_init the push template that gets created for it
<zmike> and it'll always update all the descriptors every time
<zmike> so it's never partial
<zmike> zink never does partial set updates except for bindless
<zmike> so you should never have wc read
<anholt> ok, so if I fix this, I think we're at no-submit driveroverhead 102 fps zink+tu vs 108 fps freedreno, and yes-hw driveroverhead 64 vs 69 fps.
<zmike> ooo
<zmike> that's pretty competitive
<anholt> (well, I also had to force sysmem. so two things to fix, at which point I will declare driveroverhead Fine)
<zmike> nice 💪
<zmike> anholt: btw there's cases for this in vkoverhead too now
<zmike> template and no-template
<anholt> looks like it doesn't, that uses the non-push path.
<zmike> ohh it's unique to push descriptors?
<zmike> ok, I should add cases for that too
<anholt> for push, we have to copy the old set over to the new space. if you're doing non-push update, then it's in place.
<anholt> (and the lifetimes are in your hands)
<zmike> gotcha
<zmike> didn't expect any other drivers might prefer the non-push path
<zmike> there's a driver workaround already to disable it on amd
<anholt> I think full-update push should work out the same as a non-push with you manually managing the space for your full updates with my upcoming MR.
<zmike> would be trivial to test, just disable push descriptor ext in tu and zink will use regular sets
<zmike> that's a huge win though
<zmike> nice find