<i509vcb>
nir_load_global_constant can be used to read from a buffer, but how do I tell nir to store that value into some uniform registers?
<i509vcb>
I have sorted sysvals so that uniform registers can be assigned, but I need to load a value from the sysval buffer and store it in a uniform reg for use
<i509vcb>
agx_emit_store_preamble?
<alyssa>
i509vcb: short answer is "you don't"
<alyssa>
if you emit load_global_constant, that means the value is read /in the shader/ directly
<alyssa>
and there are no uniform registers involved from the driver's perspective
<alyssa>
no layouts for the driver to consider, no usc_uniform calls
<i509vcb>
hmm okay, so really the only usc_uniform call I'd probably do then is tell the driver where I uploaded the agxv_sysvals_whatever
<alyssa>
in practice, the compiler will internally generate store_preamble instructions to optimize the load into a uniform, but that's completely invisible to the driver
<alyssa>
right
<alyssa>
otoh, if you want to calculate layouts yourself, then you use usc_uniform in the driver and you have to emit load_preamble directly -- not load_global_constant
<alyssa>
and set the compiler input accordingly so your uniforms don't get stomped on
<i509vcb>
I guess part of this question was reading
<i509vcb>
> agx_usc_uniform is then used to copy from that struct to uniform registers according to the layout that you picked earlier
<i509vcb>
But from a theoretical performance standpoint, loading that value into a uniform register would just generate a redundant mov?
<alyssa>
uhh there are two options
<alyssa>
1. you do not assign uniforms, you just generate load_global_const and let the compiler go to town
<alyssa>
2. you do assign uniforms, and instead of generating load_global_const, you generate load_preamble and the compiler won't touch that
<alyssa>
in the second case only you use agx_usc_uniform to copy things into place
<alyssa>
in the first case the compiler deals with it
<i509vcb>
I guess the former sounds better from a maintenance standpoint since we already upload all the stuff into a buffer
<alyssa>
sure, it's less work
<i509vcb>
And I doubt I can do much better than the compiler right now
<alyssa>
sure
<alyssa>
in #1, the only load_preamble you generate is for the address of the sysval buffer
<alyssa>
and the only usc_uniform is for that single 64-bit address at u0_u1
<alyssa>
probably not optimal for a production vk driver but the optimizations can come later more easily
<i509vcb>
I've left the bound buffers in the uniforms for now
<i509vcb>
Mainly because of the compatibility stuff currently there
<i509vcb>
although it is definitely chunky
<i509vcb>
32 uint64_t means 128 uniform registers just for buffers
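The register count above can be checked with a little arithmetic. This is only a sketch: it assumes uniform registers are 16 bits wide, which is the width that makes 32 uint64_t come out to 128 registers, and `uniform_regs_for_bytes` is a hypothetical helper, not a Mesa function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: one uniform register holds 16 bits (2 bytes). */
#define UNIFORM_REG_BYTES 2u

/* Hypothetical helper: how many uniform registers `bytes` bytes occupy.
 * The data is assumed to be a whole number of registers. */
static unsigned uniform_regs_for_bytes(size_t bytes)
{
    assert(bytes % UNIFORM_REG_BYTES == 0);
    return (unsigned)(bytes / UNIFORM_REG_BYTES);
}

/* 32 bound-buffer addresses, 8 bytes each. */
static unsigned bound_buffer_uniform_regs(void)
{
    return uniform_regs_for_bytes(32 * sizeof(uint64_t));
}
```

Under the same assumption, a single 64-bit address (such as the sysval buffer address) occupies only 4 registers.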
<i509vcb>
I'm going to take moving over fully to the sysval buffer a little more slowly, I've found trying to do push constants, descriptors and vbo stuff at the same time as num_workgroups to be a bit hard to test properly
<alyssa>
this would load the /address/ of the sysval, you need to wrap this whole thing in a load_const_agx
<alyssa>
at least as it's used in the GL driver, load_sysval_agx is intended to return the value
<alyssa>
for load_num_workgroups that means there would be 2 dependent loads in this case
<alyssa>
load_global_const(load_global_const(load_preamble(0) + offset to the num_workgroups))
<alyssa>
since you put the address in the buffer and then you need to load from the address in the buffer
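The two dependent loads can be modelled on the CPU. This is a toy sketch, not driver or compiler code: "GPU memory" is a byte array, addresses are offsets into it, and every name here (`toy_mem`, `toy_load_num_workgroups`, the offset 8, the grid values) is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy "GPU memory"; a "GPU address" is just an offset into this array. */
static uint8_t toy_mem[256];

/* Hypothetical offset of the num_workgroups pointer in the sysval buffer. */
#define TOY_NUM_WORKGROUPS_OFFSET 8u

/* Stand-ins for load_global_const at two widths. */
static uint64_t toy_load64(uint64_t addr)
{
    uint64_t v;
    assert(addr + sizeof(v) <= sizeof(toy_mem));
    memcpy(&v, &toy_mem[addr], sizeof(v));
    return v;
}

static uint32_t toy_load32(uint64_t addr)
{
    uint32_t v;
    assert(addr + sizeof(v) <= sizeof(toy_mem));
    memcpy(&v, &toy_mem[addr], sizeof(v));
    return v;
}

/* Fill memory: grid size (7, 3, 1) lives at address 160, and the sysval
 * buffer at address 64 stores that address. Returns the sysval buffer
 * address, i.e. what load_preamble(0) would yield in this model. */
static uint64_t toy_setup_example(void)
{
    const uint64_t sysval_base = 64, grid_addr = 160;
    const uint32_t grid[3] = {7, 3, 1};
    memcpy(&toy_mem[grid_addr], grid, sizeof(grid));
    memcpy(&toy_mem[sysval_base + TOY_NUM_WORKGROUPS_OFFSET], &grid_addr,
           sizeof(grid_addr));
    return sysval_base;
}

/* load_global_const(load_global_const(load_preamble(0) + offset)) */
static uint32_t toy_load_num_workgroups(uint64_t preamble0, unsigned component)
{
    /* First load: fetch the address stored in the sysval buffer. */
    uint64_t grid_addr = toy_load64(preamble0 + TOY_NUM_WORKGROUPS_OFFSET);
    /* Second load depends on the first: fetch the value at that address. */
    return toy_load32(grid_addr + 4u * component);
}
```

Stopping after the first load would hand back the address 160 rather than the workgroup count.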
<alyssa>
agx_usc_uniform works like a memcpy: move the contents of this GPU memory into these uniforms
<alyssa>
if you want to put an address in a uniform, you need to pool_upload the address to the cmdbuf pool and pass /that/ address to usc_uniform
<alyssa>
that 0000 is a hexdump of the content of the uniform
<alyssa>
(alternatively, you could pass the entirety of the sysval buffer to agx_usc_uniform and then use the load_sysval_agx you implemented above, without the load. but that is very chunky since you don't use most of those uniforms in a given pipeline, which is why the gl driver has the layout code.)
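The memcpy description above can be sketched as a toy CPU model. None of these functions are Mesa APIs: `toy_pool_upload` and `toy_usc_uniform` are hypothetical stand-ins for the pool upload and agx_usc_uniform, "GPU memory" is a byte array, and uniform registers are assumed to be 16 bits wide.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model: a "GPU address" is an offset into toy_mem. */
static uint8_t toy_mem[256];
static uint16_t toy_uniforms[16];   /* assumed 16-bit uniform registers */
static uint64_t toy_pool_next = 128; /* uploads are placed from here on */

/* Stand-in for a pool upload: copy CPU data into GPU memory and return
 * the GPU address where it now lives. */
static uint64_t toy_pool_upload(const void *data, size_t size)
{
    uint64_t addr = toy_pool_next;
    assert(addr + size <= sizeof(toy_mem));
    memcpy(&toy_mem[addr], data, size);
    toy_pool_next += size;
    return addr;
}

/* Stand-in for usc_uniform: memcpy from GPU memory at `src` into `count`
 * uniform registers starting at register `start`. */
static void toy_usc_uniform(unsigned start, unsigned count, uint64_t src)
{
    assert(start + count <= 16);
    assert(src + count * sizeof(uint16_t) <= sizeof(toy_mem));
    memcpy(&toy_uniforms[start], &toy_mem[src], count * sizeof(uint16_t));
}

/* Read a 64-bit value back from four consecutive uniform registers. */
static uint64_t toy_read_uniform64(unsigned start)
{
    uint64_t v;
    assert(start + 4 <= 16);
    memcpy(&v, &toy_uniforms[start], sizeof(v));
    return v;
}

/* Put the address of the sysval buffer into the first uniforms. */
static uint64_t toy_bind_sysval_address(uint64_t sysval_addr)
{
    /* The address itself has to be in GPU memory first... */
    uint64_t addr_of_addr = toy_pool_upload(&sysval_addr, sizeof(sysval_addr));
    /* ...and that location, not sysval_addr, is the source of the copy. */
    toy_usc_uniform(0, 4, addr_of_addr);
    return toy_read_uniform64(0);
}
```

In this model, calling `toy_usc_uniform(0, 4, sysval_addr)` directly would copy the first 8 bytes of the buffer's contents into the uniforms instead of its address, which is how a hexdump of the uniform can end up showing zeros.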