sarnex has quit [Read error: Connection reset by peer]
samuelig has quit [Quit: Bye!]
samuelig has joined #dri-devel
sarnex has joined #dri-devel
kzd has quit [Ping timeout: 480 seconds]
Duke`` has quit [Ping timeout: 480 seconds]
fab has quit [Ping timeout: 480 seconds]
pjakobsson has joined #dri-devel
tzimmermann has joined #dri-devel
rasterman has joined #dri-devel
warpme has joined #dri-devel
bbrezillon has joined #dri-devel
YuGiOhJCJ has joined #dri-devel
sghuge has quit [Remote host closed the connection]
sghuge has joined #dri-devel
warpme has quit []
kts has joined #dri-devel
kts has quit []
sgruszka has joined #dri-devel
fab has joined #dri-devel
LeviYun has quit [Ping timeout: 480 seconds]
jfalempe has joined #dri-devel
warpme has joined #dri-devel
kts has joined #dri-devel
LeviYun has joined #dri-devel
frieder has joined #dri-devel
lynxeye has joined #dri-devel
eukara has quit []
jsa has joined #dri-devel
anujp has quit [Ping timeout: 480 seconds]
vliaskov has joined #dri-devel
lemonzest1 has quit []
davispuh has joined #dri-devel
mvlad has joined #dri-devel
coldfeet has joined #dri-devel
lemonzest has joined #dri-devel
Company has joined #dri-devel
coldfeet has quit [Remote host closed the connection]
<haasn>
karolherbst: the problem with av1 on gpu is that we want to do almost exclusively 8/16 bit integer ops
<haasn>
In my current code I just store everything as an int
<haasn>
But that’s wasteful, 75% of the ALU is unused
<haasn>
I was thinking that using u8vec4 would let the gpu pack four 8 bit ints into a single register
<haasn>
Basically upping throughput from 32 pixels / warp / cycle to 128
<haasn>
But doing something like a convolution with this design is very hard
<haasn>
Can’t just do your usual subgroup sum etc
<haasn>
And you need to widen it to 16 bit intermediates
<HdkR>
Also most GPUs these days don't allow you to do much SWAR anymore
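[editor's sketch] The packing idea haasn describes (four 8-bit values in one 32-bit register, widened to 16-bit intermediates when sums can overflow) can be illustrated on the CPU. This is a minimal SWAR model in Python, not code from the shaders under discussion; every function name here is hypothetical, and whether a given GPU benefits at all is exactly what HdkR questions.

```python
LOW7 = 0x7F7F7F7F    # low 7 bits of every 8-bit lane
HIGH1 = 0x80808080   # top bit of every 8-bit lane
EVEN16 = 0x00FF00FF  # lanes 0 and 2, each with 16 bits of headroom


def pack_u8x4(a, b, c, d):
    """Pack four 8-bit values into one 32-bit word (lane 0 in the low byte)."""
    return (a & 0xFF) | ((b & 0xFF) << 8) | ((c & 0xFF) << 16) | ((d & 0xFF) << 24)


def unpack_u8x4(w):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(w >> shift) & 0xFF for shift in (0, 8, 16, 24)]


def swar_add_u8x4(x, y):
    """Lane-wise wrapping add of two packed words using one 32-bit add."""
    # Adding only the low 7 bits keeps any carry inside its own lane.
    low = (x & LOW7) + (y & LOW7)
    # The top bit of each lane is then fixed up with XOR (add without carry out).
    return low ^ ((x ^ y) & HIGH1)


def widen_u8x4(w):
    """Split into two words holding two 16-bit lanes each (even, odd lanes)."""
    return w & EVEN16, (w >> 8) & EVEN16


def unpack_u16x2(w):
    """Split a 32-bit word into its two 16-bit lanes."""
    return [w & 0xFFFF, (w >> 16) & 0xFFFF]
```

After widening, an ordinary integer add or multiply works per lane as long as each lane's result stays below 65536, which is why a convolution needs the 16-bit intermediates and loses half of the packing density.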
anujp has joined #dri-devel
rasterman has quit [Quit: Gettin' stinky!]
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
kts has quit [Ping timeout: 480 seconds]
warpme has quit []
rsalvaterra has quit []
rsalvaterra has joined #dri-devel
rasterman has joined #dri-devel
alliancemade has joined #dri-devel
bbrezillon has quit [Quit: WeeChat 4.3.0]
bbrezillon has joined #dri-devel
bbrezillon has quit []
bbrezillon has joined #dri-devel
alliancemade has quit [Remote host closed the connection]
alliancemade has joined #dri-devel
<karolherbst>
haasn: most GPUs don't have any benefit doing 8/16 bit alu over 32 bit, unless you can use those fancy AI matrix multiplication ops
alliancemade has quit [Remote host closed the connection]
jfalempe has quit [Quit: jfalempe]
alliancemade has joined #dri-devel
alliancemade has quit [Ping timeout: 480 seconds]
<Company>
is there a technical reason why amd and nvidia don't implement GL_EXT_shader_framebuffer_fetch ?
<Company>
I'm trying to figure out the best way to do HDR colorspace conversions without needing an extra buffer
<Company>
ie I have an srgb buffer and want to convert it to rec2020, or sth along those lines
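[editor's sketch] For reference, the per-pixel math behind an sRGB to Rec. 2020 conversion is small: decode the sRGB transfer function to linear light, then apply a 3x3 primaries matrix. The Python below is a minimal sketch under stated assumptions: the matrix is the BT.709 to BT.2020 matrix rounded to four decimals, the result is linear light (no PQ or HLG encoding, no tone mapping), and the function names are hypothetical. It says nothing about how Company's renderer does or should do it.

```python
# BT.709/sRGB primaries to BT.2020 primaries, linear light, rounded to 4 decimals.
BT709_TO_BT2020 = (
    (0.6274, 0.3293, 0.0433),
    (0.0691, 0.9195, 0.0114),
    (0.0164, 0.0880, 0.8956),
)


def srgb_to_linear(c):
    """Decode one sRGB-encoded channel in [0, 1] to linear light."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def srgb_to_linear_rec2020(rgb):
    """Convert an sRGB-encoded (r, g, b) triple to linear Rec. 2020 RGB."""
    linear = [srgb_to_linear(c) for c in rgb]
    return [sum(m * v for m, v in zip(row, linear)) for row in BT709_TO_BT2020]
```

Each matrix row sums to 1, so white stays white; the open question in the discussion is where this runs (fragment shader, compute shader, or blending path), not the arithmetic.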
<dj-death>
compute shader + storage image?
yyds has quit [Remote host closed the connection]
<Company>
then I need to add compute support - which is gonna happen long term I guess
<karolherbst>
can also use storage image in fp programs I think
<karolherbst>
or not?
<Company>
i have no idea - it's probably a driver question too
<Company>
and/or a version question, because we use GLES these days
<Company>
I'm just trying to find the smartest way to do this, so HDR can work on the worst possible hardware without lags
<Company>
with the least amount of work
<pq>
GL_EXT_shader_framebuffer_fetch_non_coherent any better?
<karolherbst>
nvidia hardware doesn't support fbfetch natively anyway, and in nouveau we simply bind the framebuffer as a texture to read from
<Company>
pq: same drivers
<Company>
more or less
<pq>
those that don't support fbfetch, do they also have problems with using a temporary buffer?
<karolherbst>
it's kepler+ in nouveau anyway
<karolherbst>
I doubt that reading from the framebuffer is efficient on any hardware
<karolherbst>
I might be wrong
<karolherbst>
maybe tilers do better here
<pq>
they have to blend somehow...
zamundaaa[m] has joined #dri-devel
<Company>
I have no idea - the naive solution is to use an extra temporary buffer and that's always gonna work
<pq>
it's just blending, except one would want to add custom code to mangle the read and written values of the destination - especially the read values
<Company>
but it means there's an extra buffer involved
<pq>
I guess it's a problem if blending happens by fixed-function.
<Company>
I also haven't looked yet at how this works in Vulkan and if/how I can have an image be input and output attachment at the same time
<karolherbst>
maybe just use an image as in read-write image?
<dj-death>
karolherbst: we can read the render target cache on Intel, but it doesn't support MSAA very well
<dj-death>
karolherbst: you have to do non-uniform lowering to fetch each sample
<karolherbst>
uhh
<dj-death>
so we've been doing the same as nouveau
<dj-death>
use the sampler
<karolherbst>
yeah, I think that's the sanest solution if you only want to have it supported :D
<dj-death>
but that means you have to be careful with compression
fab has quit [Quit: fab]
fab has joined #dri-devel
fab has quit []
fab has joined #dri-devel
<Company>
oh, I didn't think about that yet
<Company>
we've been thinking about using msaa or supersampling for higher quality output, and that obviously interacts
<Company>
but that's a long way off - first inkscape needs a renderer using gtk instead of cairo
coldfeet has joined #dri-devel
fireburn has joined #dri-devel
Dark-Show has joined #dri-devel
bbrezillon has quit [Quit: WeeChat 4.3.0]
simon-perretta-img has quit [Ping timeout: 480 seconds]
coldfeet has quit [Remote host closed the connection]
jsa has quit [Ping timeout: 480 seconds]
<fireburn>
Is the AMD 7900M in any laptops? There was exclusivity with Alienware but it looks like they've stopped selling them now
jsa has joined #dri-devel
kzd has joined #dri-devel
guludo has joined #dri-devel
simon-perretta-img has quit [Ping timeout: 480 seconds]
simon-perretta-img has joined #dri-devel
ced117 has quit [Ping timeout: 480 seconds]
smpl has joined #dri-devel
simon-perretta-img has quit [Ping timeout: 480 seconds]
simon-perretta-img has joined #dri-devel
fireburn has quit [Remote host closed the connection]
cascardo has quit [Ping timeout: 480 seconds]
kchibisov has quit [Remote host closed the connection]
mainiomano has quit [Remote host closed the connection]
ella-0 has quit [Remote host closed the connection]
cmarcelo has quit [Remote host closed the connection]
rosefromthedead has quit [Remote host closed the connection]
kennylevinsen has quit [Remote host closed the connection]
atiltedtree has quit [Remote host closed the connection]
kuruczgy has quit [Remote host closed the connection]
nucfreq has quit [Remote host closed the connection]
rpigott has quit [Remote host closed the connection]
pitust has quit [Remote host closed the connection]
sumoon has quit [Remote host closed the connection]
ifreund has quit [Remote host closed the connection]
alethkit has quit [Remote host closed the connection]
itoral has quit [Remote host closed the connection]
cmarcelo has joined #dri-devel
ella-0 has joined #dri-devel
kuruczgy has joined #dri-devel
rpigott has joined #dri-devel
kchibisov has joined #dri-devel
mainiomano has joined #dri-devel
pitust has joined #dri-devel
cyrinux30 has quit []
cyrinux30 has joined #dri-devel
ifreund has joined #dri-devel
rosefromthedead has joined #dri-devel
sumoon has joined #dri-devel
mripard has joined #dri-devel
kts has quit [Ping timeout: 480 seconds]
nucfreq has joined #dri-devel
atiltedtree has joined #dri-devel
alethkit has joined #dri-devel
kennylevinsen has joined #dri-devel
feaneron has joined #dri-devel
kts has joined #dri-devel
<dliviu>
mlankhorst: I see that my patch got removed from drm-misc-next-fixes, am I correct that it hasn't been added anywhere else? Can I push it into drm-misc-next?
<mripard>
dliviu: it's in drm-misc-fixes
<dliviu>
mripard: thanks
epoch101 has joined #dri-devel
rgallaispou has joined #dri-devel
epoch101 has quit []
epoch101 has joined #dri-devel
<karolherbst>
haasn: maybe instead of looping inside looprestoration.glsl, you could see if having one thread do one iteration would be a feasible rework here, and how much of a difference that would make
<karolherbst>
or at least if you have loops, make sure that threads don't diverge (e.g. having the same iteration across all threads)
<haasn>
karolherbst: that loop primarily exists because the input size exceeds the output size
<haasn>
I map threads onto output pixels in a 1:1 manner
<haasn>
could also try the alternative of mapping threads onto input pixels and then dealing with the fact that some invocations will be idle during the last convolution
<karolherbst>
yeah, I think that might be fine
<mripard>
dliviu: it was meant for drm-misc-next?
<karolherbst>
what matters is, that threads don't diverge in control flow. Like if they all execute loops in lock-step that's fine
<dliviu>
I was not in a rush
<karolherbst>
or rather
<karolherbst>
all threads within a subgroup
<karolherbst>
if entire subgroups diverge, that's still fine
<dliviu>
mripard: it doesn't "fix" anything other than adding more compilation exposure to more functions in komeda
<karolherbst>
e.g. have a block size of the subgroup size, and make sure that each block executes the same path through the shader
<karolherbst>
and then if entire blocks have nothing to do at the end, they can exit early without having to wait for other threads
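[editor's sketch] karolherbst's point about divergence can be shown with a toy cost model: threads in a subgroup run in lock-step, so a subgroup is busy until its slowest lane finishes, while a subgroup whose lanes all have nothing to do can exit immediately. The Python below is only that toy model, with hypothetical names; it ignores latency hiding, scheduling, and everything else real hardware does.

```python
def subgroup_cost(lane_iterations):
    """A lock-step subgroup runs for as long as its slowest lane."""
    return max(lane_iterations, default=0)


def dispatch_cost(iterations, subgroup_size):
    """Sum the cost of consecutive subgroups over a flat list of per-thread loop counts."""
    total = 0
    for start in range(0, len(iterations), subgroup_size):
        total += subgroup_cost(iterations[start:start + subgroup_size])
    return total


# Eight threads, half of which have 8 iterations of work and half have none.
mixed = [8, 0, 8, 0, 8, 0, 8, 0]        # busy and idle lanes share subgroups
grouped = sorted(mixed, reverse=True)   # busy lanes packed into the same subgroup

mixed_cost = dispatch_cost(mixed, 4)
grouped_cost = dispatch_cost(grouped, 4)
```

With a subgroup size of 4, the mixed layout keeps both subgroups busy for 8 iterations, while the grouped layout lets the second subgroup exit at once.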
kugel is now known as Guest9918
<haasn>
the internal block size of this algorithm is 8x8 so I set my WG size to that
kugel has joined #dri-devel
<karolherbst>
yeah, should be fine, although there is hardware with bigger subgroup sizes and I don't know how much that matters there
<haasn>
there will be some partial executions during the initial "assemble input" phase, that's almost unavoidable because the input pixel count is not a clean multiple of 32
<karolherbst>
though they might be able to execute two blocks at once?
<haasn>
AMD used to use 64 sized subgroups, I don't think there are more than that?
<karolherbst>
broadcom has 128
<karolherbst>
apparently
cascardo has joined #dri-devel
<karolherbst>
and I can see people using rpis as their media center thing, so it might even matter, but honestly don't know if the hw has some smartness here to counteract this
<haasn>
something I just randomly thought of, for film grain it might be possible to split it into two passes, one to cover all non-edge pixels and a second, separate pass to deal only with edge pixels (that need to be blurred with the previously processed non-edge pixels)
<karolherbst>
mhh, yeah, might be worth a shot
<haasn>
the film grain shader diverges quite badly
Guest9918 has quit [Ping timeout: 480 seconds]
<karolherbst>
on some GPUs (nvidia e.g.) having more specialized programs allocating fewer GPRs also leads to being able to run more threads in parallel
<haasn>
right
<haasn>
actually, that might be worth doing for loop restoration as well
<haasn>
instead of if (cond) { path A } else { path B }; invoke two shaders: if (!cond) return; /* path A */ and: if (cond) return; /* path B */
<haasn>
on the premise that inactive workgroups will very quickly return out, freeing up those resources for more groups
<karolherbst>
yeah, but sometimes you have to check if some trade-offs are worth doing. It might also make sense to group/sort input and reshuffle data so that blocks stay uniform
<haasn>
the condition here is on a scalar value shared by the entire work group
<haasn>
the condition is granular on an 8x8 level
<haasn>
(that's why the workgroup size is set to 8x8)
<karolherbst>
right, in which case it should be fine
<karolherbst>
just overallocating gprs might be a problem then if there is an imbalance between those two paths
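[editor's sketch] The split haasn proposes (two dispatches, each returning early for workgroups on the other path) can be modeled on the CPU to check that it produces the same output as a single branching dispatch. In this Python sketch `path_a` and `path_b` are hypothetical stand-ins for the two shader paths, each list entry stands for one 8x8 workgroup, and `conds` holds the per-workgroup scalar condition. It models correctness only, not register allocation or occupancy.

```python
def path_a(block):
    """Hypothetical stand-in for shader path A."""
    return [p + 1 for p in block]


def path_b(block):
    """Hypothetical stand-in for shader path B."""
    return [p * 2 for p in block]


def single_dispatch(blocks, conds):
    """One shader: if (cond) { path A } else { path B }."""
    return [path_a(b) if c else path_b(b) for b, c in zip(blocks, conds)]


def dispatch_a(blocks, conds, out):
    """First specialized shader: if (!cond) return; /* path A */"""
    for i, (block, cond) in enumerate(zip(blocks, conds)):
        if not cond:
            continue  # the whole workgroup exits early
        out[i] = path_a(block)


def dispatch_b(blocks, conds, out):
    """Second specialized shader: if (cond) return; /* path B */"""
    for i, (block, cond) in enumerate(zip(blocks, conds)):
        if cond:
            continue  # the whole workgroup exits early
        out[i] = path_b(block)


def two_dispatches(blocks, conds):
    """Run both specialized shaders over the same grid of workgroups."""
    out = [None] * len(blocks)
    dispatch_a(blocks, conds, out)
    dispatch_b(blocks, conds, out)
    return out
```

Because the condition is uniform per workgroup, every output block is written by exactly one of the two dispatches, so no ordering between them is needed in this model.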
<mripard>
dliviu: why did it end up in drm-misc-next-fixes then
jsa has quit [Ping timeout: 480 seconds]
chaos_princess has quit [Quit: chaos_princess]
chaos_princess has joined #dri-devel
simon-perretta-img has quit [Ping timeout: 480 seconds]
simon-perretta-img has joined #dri-devel
KarenTheDorf has quit [Remote host closed the connection]
fab has quit [Quit: fab]
rgallaispou has quit [Read error: Connection reset by peer]
rgallaispou has joined #dri-devel
rgallaispou has quit [Read error: Connection reset by peer]
rgallaispou has joined #dri-devel
cmichael has joined #dri-devel
kts has quit [Ping timeout: 480 seconds]
fab has joined #dri-devel
<dliviu>
mripard: see my exchange with mlankhorst on Thursday. I've mistakenly assumed that putting things into -fixes is the thing to do for patches that do cleanups, I should have put it into misc-next
kts has joined #dri-devel
jsa has joined #dri-devel
fireburn has joined #dri-devel
fireburn has quit []
fireburn has joined #dri-devel
jfalempe has joined #dri-devel
bbrezillon has quit [Quit: WeeChat 4.3.2]
warpme has joined #dri-devel
bbrezillon has joined #dri-devel
fireburn has quit [Quit: Konversation terminated!]
fireburn has joined #dri-devel
simon-perretta-img has quit [Ping timeout: 480 seconds]
<zackr>
sima: i thought i already rb that patch twice. did boris make him take down rb's for one of the versions? iirc, that series is blocking a work i actually wanted alexey to get done so i'm happy to see it go in
<sima>
zackr, hm no idea, maybe complain why the r-b is getting dropped each version?
l276 has joined #dri-devel
bolson has quit [Remote host closed the connection]
bolson has joined #dri-devel
l276 has left #dri-devel [#dri-devel]
Duke`` has joined #dri-devel
simon-perretta-img has quit [Ping timeout: 480 seconds]
rasterman has quit [Quit: Gettin' stinky!]
warpme has joined #dri-devel
simon-perretta-img has joined #dri-devel
oneforall2 has quit [Remote host closed the connection]
tzimmermann has quit [Quit: Leaving]
oneforall2 has joined #dri-devel
childrenatwar has joined #dri-devel
coldfeet has quit [Remote host closed the connection]
warpme has quit []
simon-perretta-img has quit [Ping timeout: 480 seconds]
sgruszka has quit [Quit: Leaving]
bolson has quit [Ping timeout: 480 seconds]
Calandracas has quit [Remote host closed the connection]
Calandracas has joined #dri-devel
<childrenatwar>
so all together the zero bit is carried amongst the fields, for an example 65+64 is 129 for first bit, 66+65 is second bit , so 64 and 65 are the states that are not toggled, so minuend or addend or adder and subtrahand needs -1 shifted encoder to annotate bit states that are not toggled in, and that way everything is going to work out, if subtrahand is 66-1 for third bit, minuend is 0
<childrenatwar>
or vice versa.
simon-perretta-img has joined #dri-devel
cmichael has quit [Quit: Leaving]
<childrenatwar>
for untoggled bits, so toggled bit state is 66 for both
childrenatwar has quit [Remote host closed the connection]
childrenatwar has joined #dri-devel
<childrenatwar>
toggled bit state for second bit is in both cases 66 so untoggled is 0 and 65 or 65 and 0
riteo_ has quit [Ping timeout: 480 seconds]
simon-perretta-img has quit [Ping timeout: 480 seconds]
simon-perretta-img has joined #dri-devel
<sima>
airlied, btw just checked, the kms_lease test already checks for double-leasing, so running that should cover all relevant edge cases
<sima>
aside from the silly one of having an enormous pile of duplicated ids :-)
riteo has joined #dri-devel
nerdopolis has quit [Ping timeout: 480 seconds]
jsa has quit [Ping timeout: 480 seconds]
Dark-Show has quit [Ping timeout: 480 seconds]
Dark-Show has joined #dri-devel
kts has quit [Quit: Konversation terminated!]
Dark-Show has quit [Read error: No route to host]
simon-perretta-img has quit [Ping timeout: 480 seconds]
Dark-Show has joined #dri-devel
simon-perretta-img has joined #dri-devel
simon-perretta-img has quit [Ping timeout: 480 seconds]
warpme has joined #dri-devel
simon-perretta-img has joined #dri-devel
OftenTimeConsuming has quit [Remote host closed the connection]
coldfeet has joined #dri-devel
warpme has quit []
OftenTimeConsuming has joined #dri-devel
childrenatwar has quit [Ping timeout: 480 seconds]
sghuge_ has joined #dri-devel
sghuge_ has quit []
sghuge_ has joined #dri-devel
sghuge_ has left #dri-devel [#dri-devel]
sghuge_ has joined #dri-devel
lynxeye has quit [Quit: Leaving.]
sghuge has quit [Ping timeout: 480 seconds]
sghuge_ has left #dri-devel [#dri-devel]
sghuge has joined #dri-devel
simon-perretta-img has quit [Ping timeout: 480 seconds]
simon-perretta-img has joined #dri-devel
agd5f_ has quit []
agd5f has joined #dri-devel
nerdopolis has joined #dri-devel
KarenTheDorf has joined #dri-devel
guludo has quit [Quit: WeeChat 4.2.2]
guludo has joined #dri-devel
childrenatwar has joined #dri-devel
simon-perretta-img has quit [Ping timeout: 480 seconds]
simon-perretta-img has joined #dri-devel
rasterman has joined #dri-devel
rasterman has quit []
rasterman has joined #dri-devel
childrenatwar has quit [Remote host closed the connection]
feaneron has quit [Ping timeout: 480 seconds]
ced117 has joined #dri-devel
simon-perretta-img has quit [Ping timeout: 480 seconds]
simon-perretta-img has joined #dri-devel
Haaninjo has joined #dri-devel
coldfeet has quit [Remote host closed the connection]
<airlied>
haasn: I think intel media-driver has some filmgrain shaders, but not sure if they have source, or if they use special intel media shader features
simon-perretta-img has quit [Ping timeout: 480 seconds]