sghuge has quit [Remote host closed the connection]
sghuge has joined #dri-devel
Piraty_ has joined #dri-devel
Piraty has quit [Ping timeout: 480 seconds]
junaid has quit [Remote host closed the connection]
Haaninjo has joined #dri-devel
Leopold has joined #dri-devel
kzd has quit [Ping timeout: 480 seconds]
Leopold__ has quit [Ping timeout: 480 seconds]
djbw has quit [Remote host closed the connection]
djbw has joined #dri-devel
djbw has quit [Remote host closed the connection]
<MrCooper>
jenatali: Marge doesn't actively watch the CI, it just gets the green/red result at the end; besides, sometimes a human watching can retry a job which failed due to a flake and prevent the pipeline from failing
Swivel_ has quit [Remote host closed the connection]
swivel has joined #dri-devel
karolherbst has joined #dri-devel
rcf has quit [Ping timeout: 480 seconds]
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
gouchi has joined #dri-devel
rcf has joined #dri-devel
oneforall2 has quit [Remote host closed the connection]
oneforall2 has joined #dri-devel
dcz_ has joined #dri-devel
rasterman has joined #dri-devel
dcz_ has quit [Ping timeout: 480 seconds]
Duke`` has quit [Ping timeout: 480 seconds]
dv_ has quit [Ping timeout: 480 seconds]
dv_ has joined #dri-devel
luc has quit [Remote host closed the connection]
tobiasjakobi has joined #dri-devel
tobiasjakobi has quit []
rasterman has quit [Quit: Gettin' stinky!]
dumarrrrrrrrrrrrrrrrrrrr^ has quit [Remote host closed the connection]
sima has quit [Ping timeout: 480 seconds]
Jeremy_Rand_Talos_ has quit [Remote host closed the connection]
Jeremy_Rand_Talos_ has joined #dri-devel
gouchi has quit [Remote host closed the connection]
pcercuei has quit [Quit: dodo]
<alyssa>
currently doing screen capture in firefox in sway
<alyssa>
ahaha the future is now
<alyssa>
:D
Haaninjo has quit [Quit: Ex-Chat]
<alyssa>
anholt_: ok, so the hsl algorithm fail I have on AGX is similar to what you had on turnip
<alyssa>
The shader does `mediump float foo = r - g;` where r and g are loaded from a mediump vertex input backed by an R32_FLOAT vertex buffer
<alyssa>
So logically, the test expects to do fsub(f2f16(r), f2f16(g))
<alyssa>
However, our backend copyprop is implementing this effectively as f2f16(fsub(r, g))
<alyssa>
which, now I'm wondering if that's legal. certainly not if the op is marked exact. probably fine for gles, and technically a test bug, even though there's also a driver bug that vk would hit
<alyssa>
The shader code *looks* innocuous enough, something like
<alyssa>
fadd32 r0l, r1, r2
<alyssa>
but.. maybe promoting 16-bit ALU to 32-bit ALU to fold away f2f16 sources isn't kosher after all
<alyssa>
similar problem with the destination... If we have f2f32(fadd(x, y)) the backend will fold that to
<alyssa>
fadd32 r0, r1l, r2l
<alyssa>
but again doing the add at higher precision than the NIR specifies
<alyssa>
unclear to me if/when doing stuff at higher precision would ever not be ok
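A minimal numpy sketch of that destination-fold case (illustrative only, not driver code): folding f2f32(fadd16(x, y)) into a fadd32 with 16-bit sources makes the add exact at fp32, so the fp16 rounding step the NIR asked for disappears and the result can change.

```python
import numpy as np

x = np.float16(2048.0)
y = np.float16(1.0)

# What the NIR asks for: an fp16 add, then widen the result.
# 2048 + 1 = 2049 is not representable in fp16 (spacing is 2 in this range),
# so the fp16 add rounds back down to 2048 before the f2f32.
nir_result = np.float32(x + y)

# What the folded fadd32 does: widen the sources, add at fp32.
# The fp32 add is exact, so the intermediate fp16 rounding disappears.
folded_result = np.float32(x) + np.float32(y)

print(nir_result, folded_result)  # 2048.0 vs 2049.0
```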
<alyssa>
this also affects midgard which architecturally lacks fp16 scalar arithmetic and instead does fp32 with f2f16/f2f32 on the inputs/outputs
<alyssa>
(though I don't think the test is failing there, possibly by pure luck of getting vectorized and using the true fp16 vector units)
<alyssa>
so.. all in all, this is possibly both a driver bug and a CTS bug. unsure what to do about either one
<alyssa>
(By the way, why are no other gles drivers affected? because normal GPUs would fold the f2f16 into the vertex load, since it'd be dedicated vertex fetch hardware that does the memory float32 -> register fp16 internally. AGX does vertex fetch in software which makes the conversions all explicit ALU.)
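For illustration, a rough Python sketch of why the conversion is visible at all on AGX (made-up helper names, assuming a mediump attribute backed by an R32_FLOAT buffer): with dedicated fetch hardware the float32 -> fp16 narrowing happens inside the fetch unit, while a software fetch emits a plain 32-bit load plus an explicit f2f16 ALU op that copyprop can then see and try to fold.

```python
# Rough sketch, not the real AGX lowering: where the float32 -> fp16
# conversion lives for a mediump attribute read from an R32_FLOAT buffer.
import struct
import numpy as np

def hw_fetch_mediump(vertex_buffer: bytes, offset: int) -> np.float16:
    # Dedicated vertex-fetch hardware: the narrowing happens inside the
    # fetch unit, so the shader itself never contains an f2f16.
    (value,) = struct.unpack_from("<f", vertex_buffer, offset)
    return np.float16(value)

def sw_fetch_mediump(vertex_buffer: bytes, offset: int) -> np.float16:
    # Software vertex fetch (AGX-style): a plain 32-bit load...
    (value,) = struct.unpack_from("<f", vertex_buffer, offset)
    loaded = np.float32(value)
    # ...followed by an explicit f2f16 ALU op in the shader, which is
    # exactly what the backend copyprop later tries to fold away.
    return np.float16(loaded)

buf = struct.pack("<f", 123.456)
assert hw_fetch_mediump(buf, 0) == sw_fetch_mediump(buf, 0)
```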
<alyssa>
I'm now also wondering if this would run afoul of gl invariance rules, if we swap in a fast linked program (vertex fetch separate from the main VS, f2f16 not folded in) for a monolithic one (f2f16 folded in by promoting some other ALU to fp32)
<alyssa>
we don't do this yet but will soon to deal with recompile jank, and down the line zink+agxv+gpl would do this too
<alyssa>
so maybe the backend copyprop is bogus... but it's still not obvious to me when promoting internal operation precision is exact and when it's not
<alyssa>
I guess the spicy case is something like r = g = 10^6 and calculating fsub(f2f16(r), f2f16(g))
<alyssa>
should be nan or inf
<alyssa>
but f2f16(fsub(r, g)) = 0.0
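A quick numpy check of that case (illustrative only; numpy may emit overflow/invalid-value warnings here, but the resulting values are the point):

```python
import numpy as np

r = np.float32(1.0e6)
g = np.float32(1.0e6)

# What the test expects: narrow first, then subtract in fp16.
# 10^6 overflows fp16 (max finite value ~65504), so both operands
# become inf, and inf - inf is NaN.
expected = np.float16(r) - np.float16(g)

# What the folded code computes: subtract in fp32, then narrow.
folded = np.float16(r - g)

print(expected, folded)  # nan vs 0.0
```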
<alyssa>
so am I not allowed to fold conversions at all? :|
<alyssa>
hmm, well, not quite
<alyssa>
I can fold alu32(f2f32(x))
<alyssa>
and I can fold f2f16(alu32(x))
<alyssa>
since we were already doing a 32-bit operation, there's no difference
<alyssa>
the problem case is only when we promote a 16-bit operation to 32-bit
<alyssa>
so can't fold f2f32(alu16(x)) or alu16(f2f16(x))
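Stated as a rule of thumb, a hypothetical helper (made-up names, not the actual copyprop pass) capturing the four cases above: a conversion may be folded only if the ALU op keeps executing at the bit size NIR asked for.

```python
# Hypothetical sketch, not the real backend: folding a conversion into an
# ALU op is only safe when the op's working precision doesn't change.
def can_fold(nir_alu_bits: int, exec_bits_after_fold: int) -> bool:
    return exec_bits_after_fold == nir_alu_bits

# alu32(f2f32(x)): fold the widen into a 32-bit op's source -> still 32-bit.
assert can_fold(32, 32)
# f2f16(alu32(x)): fold the narrow into a 32-bit op's destination -> still 32-bit.
assert can_fold(32, 32)
# f2f32(alu16(x)): would require the op itself to run at 32-bit -> rejected.
assert not can_fold(16, 32)
# alu16(f2f16(x)): ditto, the 16-bit op would be promoted to 32-bit -> rejected.
assert not can_fold(16, 32)
```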
<alyssa>
for fadd/fmul/ffma I have separate fp16/fp32, so that's a hard and fast rule
<alyssa>
but for all other alu, it's all internally 32-bit (even if you convert both source and destination)
<alyssa>
that.. should still be fine? like, that should just be an implementation detail at that point. although, ugh, hm
<alyssa>
No, the invariance issue from this backend optimization in particular is specifically from the fp16 alu and fp32 alu being different hardware, and changing the opcode isn't ok
<alyssa>
The other cases have nothing to do with the optimizer and amount to me asking "is this hw a valid implementation of the fp16 op in NIR at all"
<alyssa>
so I think as long as I disallow the opcode-switching cases I should be in the clear. I think. mediump melts my head.
<alyssa>
(and the opcode-switching cases are probably valid in gles if not for invariance issues with fast linking, but definitely not valid in vulkan with strict float rules)