ChanServ changed the topic of #zink to: official development channel for the mesa3d zink driver || https://docs.mesa3d.org/drivers/zink.html
Dark-Show has joined #zink
Dark-Show has quit [Ping timeout: 480 seconds]
pac85 has joined #zink
<pac85> zmike: for precompilation I had to deal with user clip planes. Given that there can be 8 and each is a vec4, it adds up to 128 bytes, which is too much for push consts.
<pac85> To handle them I'm creating a uniform buffer, but this means we need to support one more uniform buffer than the current maximum (which my code should already handle). Do you think this approach will work or do you see problems with it?
<zmike> uhhh
<zmike> pac85: remind me again why it needs a vec4?
<pac85> zmike: each clip plane is represented by a plane equation which has 4 parameters
<zmike> hm
<zmike> technically aren't the 4 values doubles?
<zmike> so that would need a dvec4 per plane
<zmike> in any case yeah this isn't ideal
<zmike> how much space do you need aside from that?
<pac85> pipe_clip_state uses floats
<zmike> yeah so I see
<pac85> zmike: for ucp? it's those vecs and then a one byte bitmask
<zmike> I meant for all your precompile stuff
<zmike> I'm somewhat tempted to say that this feature just requires 256 bytes of push constant space and anything that doesn't support that doesn't get precompile
<zmike> vkd3d already requires 256 minimum so all the drivers I care about already support this
<zmike> since the drivers that don't support this probably also don't/won't support DS3/GPL
<zmike> which are required for precompile anyway
<pac85> zmike: everything else fits in 4 x uint32_t
<zmike> okay, then let's just do push constants
<zmike> it'll be much simpler
<pac85> I think amdvlk exposes the minimum
<zmike> if that's true then nobody's doing much gaming there anyway
<pac85> BTW I've already implemented the uniform buffer stuff, so the implementation isn't the problem, but do you see any potential issues with what I described above? If so, we can go with that requirement on the push consts size instead
<zmike> it's not gonna be great for descriptor update overhead
<zmike> vs just memcpy in some push constant data
<zmike> not to mention pipeline layout compatibility
<zmike> so...yeah, not really in favor of the uniform buffer method
<pac85> mmm I see, thanks. Although I keep thinking that not having a way to send arbitrary amounts of data to shaders feels wrong, and eventually something will need to be done about it
<zmike> you might be right, but so far it's been an avoidable scenario
<pac85> Also, besides the precompilation stuff, push consts are used for other things, so I think we need a bit more than 256 bytes
<zmike> hm I don't think so? the current usage is well below 128 bytes
<zmike> so there should be plenty of space available
<pac85> Oh, for whatever reason I counted 3 uint32_t as 128 bytes... yeah, right, 256 is enough
<zmike> cool
<pac85> I haven't talked too much about what I'm doing. Basically I have this "st key" struct; the first 2 bytes (of which I'm currently using 1) are either push-const'd or inlined (depending on whether they are used for the uber shader or the optimized ones), and the remaining ones are always push-const'd. The first part also acts as a key for the optimized shaders. It's already like 60 commits on top of main, so I'm doing cleanup and bugfixing
<pac85> and hopefully I can open an MR before too long
<zmike> oh awesome, I didn't realize you were that far along
<pac85> much of the work went into making lowering passes dynamic (using sysvals, or sometimes rewriting them from scratch for things such as flatshading)
<anholt> pac85: really excited to see your work on this! draw-time compiles for alpha test are the worst thing in cs:go replay on anv.
<pac85> anholt: thanks! you gave me one more thing that I can use for testing
<zmike> I'd guess a bunch of the traces in the db should use legacy features
i509vcb has joined #zink
<pac85> Yeah I've collected quite a few that do
pac85 has quit [Quit: Konversation terminated!]
<zmike> eyyyyy
<neobrain> FEX can now run 32-bit x86-zink on ARM-Vulkan drivers :)
<zmike> 💪
sinanmohd has quit [Read error: No route to host]
sinanmohd has joined #zink
<zmike> anholt: I don't suppose you've taken any kind of deep look at glmark2 beyond basic benching? I've been digging into this for the past couple days and it seems like there's some kind of massive bottleneck that continues to elude me
<anholt> I've been on a "make intel's compiler suck less" kick. might pull back out of this shortly, but also about to head out camping for a week.
<zmike> both are certainly good ways to spend time
<zmike> aha!
<zmike> anholt: you were right, ANI is indeed the bottleneck for at least some things https://gitlab.freedesktop.org/mesa/mesa/-/issues/9201
<airlied> zmike: add another image to the swapchain :)
<zmike> always a solution
<zmike> but if they have the same swapchain size I don't get why the perf is so different
<airlied> just wasn't sure you had same swapchain sizes
<airlied> but yeah could be the thread in wsi maybe
<zmike> I'm pretty sure they use the same size?
<zmike> this uses IMMEDIATE, so there's no wsi thread
<zmike> at least, if they don't use the same size then it'd be because mesa/gl has a swapchain size override specifically for glmark, and I'm not inclined to believe that's a thing
<zmike> in driconf
<zmike> and indeed it is not
<airlied> oh I just wasn't sure how radeonsi did swapchain sizing
<airlied> vs vulkan explicitly doing it
<zmike> radeonsi just goes along with whatever the dri frontend decides
<zmike> which, to my knowledge, is the same as it is in vk wsi?
<zmike> cc daniels for fact check
<daniels> si has no opinion on the matter; it allocates when the frontend tells it to
<daniels> same for all gallium except zink because kopper is special with a capital special