graphitemaster has joined #zink
<graphitemaster>
I'm trying to build a stand-alone zink that will leverage nvidia provided vulkan and I'm running into some issues, mesa keeps trying to load swrast_dri
<graphitemaster>
My build command is meson --prefix=/tmp/zink -Dgallium-drivers=zink -Dvulkan-drivers= -Ddri-drivers= build-zink; ninja -C build-zink/ install
<graphitemaster>
Testing it with LD_LIBRARY_PATH=/tmp/zink/lib MESA_LOADER_DRIVER_OVERRIDE=zink glxinfo
<graphitemaster>
libGL error: failed to load driver: swrast
kusma has joined #zink
<kusma>
I think you need to use the GALLIUM_DRIVER=zink approach on NVIDIA until Penny/Copper has landed... ajax_?
<graphitemaster>
That works. Looks like mesa + zink doesn't do mesa_glthread=true ?
<graphitemaster>
"dri_create_context: requested glthread but driver is missing backgroundCallable V2 extension"
<kusma>
graphitemaster: That might be because you're down a swrast codepath now :-/
<kusma>
And we don't really have a better option for NVIDIA yet. Penny/Copper will fix that, but we're not there yet.
<graphitemaster>
I can't imagine the WSI blit is that SLOW
<graphitemaster>
Anyways, I'm just surprised it works XD
<kusma>
Penny/Copper is ajax's patches to hook up to the Vulkan WSI code instead
<kusma>
graphitemaster: The good news is that most of your perf is bound to blitting the frontbuffer, so you can probably do much more heavy rendering at not much less perf! ;)
<graphitemaster>
Was going to say, gsync is totally broken too
<kusma>
BTW, just checked, mesa_glthread=true seems to work fine on Intel with DRI
<kusma>
Because our winsys stuff should kinda be on par with llvmpipe (modulo some details that probably don't matter)
<graphitemaster>
Could try llvmpipe there, I dunno how gsync would work with software rendering though
<graphitemaster>
Unless you're using hw to present to swapchain
<kusma>
Yeah, so maybe that's the reason. And again, I guess Penny/Copper is the fix ;)
<kusma>
It's becoming a bit of a meme; Penny/Copper will fix EVERYTHING ;)
<kusma>
I'm sure Ajax is working out the interactions it has with flying cars and jetpacks right now.
<graphitemaster>
Apparently in llvmpipe all my glTextureSubImage3D calls are straight up INVALID_VALUE / INVALID_OPERATION, but also glthread does not appear to work with GALLIUM_DRIVER=llvmpipe either
<kusma>
graphitemaster: Yeah, that's kinda what I expected... Try MESA_DEBUG to figure out what more precisely is wrong with the glTextureSubImage3D calls...
<kusma>
(env var, set it to something like 1)
<graphitemaster>
Mesa: User error: GL_INVALID_VALUE in glTexStorage3D(invalid width, height or depth)
<graphitemaster>
Time to print what values I'm passing there.
<graphitemaster>
It's weird it's printing that specific function because I don't use it, I use glTextureStorage3D here.
<kusma>
Yeah, that could be a reporting-bug
<graphitemaster>
Mesa debug output: GL_INVALID_VALUE in glTexStorage3D(invalid width, height or depth)
<graphitemaster>
w=32,h=32,d=4096
<graphitemaster>
The values I'm passing to it.
<kusma>
that d=4096 sounds like a LOT
<graphitemaster>
That is correct, 128 packed 32x32x32 3D textures :P
<kusma>
LLVMpipe has a max 3D texture size of 2k
<kusma>
(per axis)
<kusma>
So... that's your problem :-)
<graphitemaster>
Well that is not min-spec conformant XD
<graphitemaster>
4096 is min-spec XD
<kusma>
for which spec version?
<graphitemaster>
> The value must be at least 1024
<graphitemaster>
Oh my god
<graphitemaster>
OpenGL is ridiculous sometimes.
<graphitemaster>
The min-spec is even worse than 2k
<kusma>
Sounds like your application either needs to reject drivers with too low limits, or change the texture-packing ;)
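To make that limit check concrete, here is a minimal sketch assuming a current desktop GL context; atlas_fits and the sizes are illustrative names taken from this conversation, not engine code:

    #include <GL/gl.h>   /* assumes a current GL context; link with -lGL */

    /* Check the packed-3D-texture layout against the driver limit before
     * calling glTextureStorage3D().  llvmpipe reports 2048 here, so
     * w=32, h=32, d=4096 fails with GL_INVALID_VALUE. */
    static int atlas_fits(GLsizei w, GLsizei h, GLsizei d)
    {
        GLint max3d = 0;
        glGetIntegerv(GL_MAX_3D_TEXTURE_SIZE, &max3d);
        return w <= max3d && h <= max3d && d <= max3d;
    }

If the check fails, the packing can be split across multiple shallower atlases (or a 2D array texture) instead of rejecting the driver outright.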
<graphitemaster>
Yeah fixed, ez.
<kusma>
...or just not care about LLVMpipe, which is a totally acceptable solution for some applications ;)
<kusma>
Cool :)
<graphitemaster>
I'm impressed with how well llvmpipe runs until I look at top and I see that my ThreadRipper is dying.
<graphitemaster>
zink fails my lookdev tests btw
<graphitemaster>
that's where I compare different rendered frames of the same scene to see how close the frames are for different renderers
<graphitemaster>
I guess rasterization / fill rules might be different
<graphitemaster>
or the multisample pattern is different
<graphitemaster>
hummm
<kusma>
What kind of primitives? If it's lines or points, then... uh yeah ;)
<kusma>
Triangles should be identical.
<graphitemaster>
Looks more like texture filtering
<kusma>
Hmm, that should be the same...
<graphitemaster>
Oh
<kusma>
Maybe we're not exposing the same levels of anisotropic filtering or something?
<graphitemaster>
What do dFdx, dFdy map to in zink, Coarse or Fine
<kusma>
Coarse by default, I suspect.
<kusma>
Ah, I don't think mesa supports the glHint for this...
<graphitemaster>
Yep, looks like you ignore GL_FRAGMENT_SHADER_DERIVATIVE_HINT
<graphitemaster>
That's the issue, changing shaders fixed it :P
<kusma>
Yeah, I think that would be a very welcome fix :)
<kusma>
Shouldn't be too hard to fix, I think.
<graphitemaster>
Beautiful.
<kusma>
OK, seems i965 supports the hint, but not any Gallium drivers.
<graphitemaster>
It seems simple to support in theory but you can set it before a draw call and that has to patch the shader :|
<kusma>
graphitemaster: Yeah, but we have stuff to handle these kinds of things
<graphitemaster>
*nod*
<graphitemaster>
Might end up learning mesa...
<kusma>
I think st_update_fp needs to check the state and lower the instructions to either the fine or coarse versions.
<kusma>
So that would probably end up as a bit in st_fp_variant_key or something like that...
<kusma>
Or... maybe that's a bit heavy handed... That won't let us deduplicate the variants when uses_fddx_fddy is false...
<kusma>
Ah, maybe it does let us...
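For reference, the two sides of this: the GL hint (which i965 honored but Gallium drivers ignored at the time) and the shader-side workaround graphitemaster applied; the GLSL line is an illustrative fragment, not the engine's shader:

    /* App side: ask for accurate (fine) derivatives.  At the time of this
     * log, Gallium drivers ignored GL_FRAGMENT_SHADER_DERIVATIVE_HINT. */
    glHint(GL_FRAGMENT_SHADER_DERIVATIVE_HINT, GL_NICEST);

    /* Shader-side workaround (GLSL 4.50 / ARB_derivative_control): bypass
     * the hint by calling the explicit variants directly:
     *     vec2 grad = vec2(dFdxFine(v), dFdyFine(v));
     */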
<graphitemaster>
I'm ignorant so just going to nod along. Really curious about the penny/copper thing too.
<graphitemaster>
Found a legit bug, passing 16 to GL_UNPACK_ALIGNMENT
<graphitemaster>
Dunno who to thank, Mesa for being strict or NV for allowing that >_>
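For context, glPixelStorei only accepts 1, 2, 4 or 8 for the pack/unpack alignments, so Mesa is right to reject 16:

    /* GL_UNPACK_ALIGNMENT must be 1, 2, 4 or 8; 16 is GL_INVALID_VALUE
     * per spec, even though NVIDIA's GL driver lets it slide. */
    glPixelStorei(GL_UNPACK_ALIGNMENT, 8);   /* largest legal value */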
<graphitemaster>
kusma, building now, should I also test switching the hint before a draw call just to make extra sure?
<graphitemaster>
Well they must be working because lookdev tests passed for GL Collabora Ltd 4.6 (Core Profile) Mesa 21.3.0-devel (git-e0b45bf2ff) zink (NVIDIA GeForce RTX 2070)
<graphitemaster>
But that is just set once globally, I dunno if it works for switching the hint before a draw call yet.
<kusma>
Yeah, good point. I guess I should verify that we're invalidating the right state here.
<kusma>
I suspect we are, because i965 doesn't do anything magical here as far as I can tell, but let's find out!
<kusma>
Hmm, no. Doesn't look right to me.
<graphitemaster>
Something else is wrong lookdev wise too in another scene, something wonky with texture lod
<graphitemaster>
Should map textureLod(u_prefilter, r, roughness * 5.0).rgb, but it doesn't appear to have the same value
<kusma>
Fixed the state-update bug in the MR, BTW
<graphitemaster>
u_prefilter is a cubemap sampler.
<kusma>
does that cubemap care about seamless vs non-seamless?
<graphitemaster>
seamless cubemap filtering is required, enabled globally at context init
<graphitemaster>
this just looks like it's using a lower lod level though
<graphitemaster>
lower than the one it should be using
<kusma>
We only support seamless cubemaps for Zink, because Vulkan. In theory we could implement non-seamless per texture by lowering to some ALU code and a layered 2D texture lookup instead, but meh.
<kusma>
OK, so that shouldn't be a problem.
<kusma>
graphitemaster: Do you specify any lod-bias anywhere?
<kusma>
Could be that we're missing one... I remember having a problem like that in the very similar D3D12 driver...
<kusma>
IIRC, there was one LOD bias in the... texture objects(?) that we didn't have an obvious place to account for, so we needed some shader-lowering...
<graphitemaster>
No lod bias here.
<kusma>
OK, then that's probably not it either...
<graphitemaster>
min lod is -1000 and max lod is 1000 (gl defaults)
<graphitemaster>
lod bias = 0.0
<kusma>
So one obvious thing is that cubemaps have an input coordinate per face of -1 to 1 instead of 0 to 1...
<graphitemaster>
Was going to say, cubemap uvw's are implicitly supposed to be normalized by the textureLod call
<kusma>
So if the lod is calculated without taking the cubemapness into account, you'd get an off-by-one in the LOD...
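A sketch of that off-by-one, assuming the standard screen-space LOD formula: for an N-texel cube face, the per-face coordinate spans [-1, 1] (length 2) rather than [0, 1], so a LOD computed naively from coordinate-space derivatives is shifted by exactly one level:

    \lambda = \log_2(\rho N), \qquad
    \lambda_{\text{naive}} = \log_2\bigl((2\rho) N\bigr) = \lambda + 1

where \rho is the per-pixel footprint in normalized face coordinates and N is the face size in texels.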
<graphitemaster>
The normalize is not being thrown out is it?
<kusma>
The normalize happens below us, we're just emitting SPIR-V opcodes to sample.
<kusma>
So... this sounds like it could be a bug in the NVIDIA Vulkan driver, perhaps?
<kusma>
There are some cases where Mesa drivers lower texturing stuff like that, but I don't think we do for Zink...
<kusma>
You can use ZINK_DEBUG=spirv to have Zink dump the SPIR-V modules
<graphitemaster>
Walking back up, can see the OpFMul %float %394 %395
<graphitemaster>
Which is the "roughness * 5.0" presumably
<graphitemaster>
I assume uintBitsToFloat(1084227584) is 5.0
<kusma>
Yeah, and 1084227584 = 0x40a00000 = 5.0
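A quick standalone check of that decoding; the memcpy is the C equivalent of GLSL's uintBitsToFloat:

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t bits = 1084227584u;   /* 0x40A00000 */
        float f;
        memcpy(&f, &bits, sizeof f);   /* reinterpret, like uintBitsToFloat() */
        printf("0x%08X -> %f\n", (unsigned)bits, f);   /* 0x40A00000 -> 5.000000 */
        return 0;
    }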
<graphitemaster>
roughness is pulled out of the b channel of the normal texture, which is what I assume %393 = OpCompositeExtract %uint %59 2 is doing, the 2 = b
<graphitemaster>
So the spirv code is fine
<kusma>
Hmm, but I think I see a smoking gun: "%58 = OpImageSampleImplicitLod %v4float %57 %56 None"
<kusma>
Do you really want to sample your gbuffer (I assume that's what this is?) with LODs?
<kusma>
I guess if you have none, then it might be fine... But otherwise...?
<kusma>
That's where your "roughness" seems to be coming from...
<graphitemaster>
No mips or lods, all nearest filter, I don't specify Lod in that case
<kusma>
OK, then that should be fine.
<kusma>
So, yeah. Then I wonder if this is a problem in the Vulkan driver...
<graphitemaster>
I highly doubt that
<kusma>
Having worked with NVIDIAs Vulkan driver, I'm not so sure I doubt it ;)
<kusma>
I had a lot of issues along these lines in the past with it.
<kusma>
In this case, it would be an OpImageSampleExplicitLod + cubemap special case. I guess it's worth checking if that's covered in the CTS.
<graphitemaster>
I know there is some whacky crap in OpenGL with min/max lod, it doesn't just map directly
<graphitemaster>
directly being TIC here on the Maxwell NV GPU
<graphitemaster>
There is a min/max field for lod but the GL values from the NV driver do not actually produce the equivalent values in the TIC.
<kusma>
Seems the VK CTS does test the combination of textureLod and cubemaps.
<kusma>
So.. are you using nearest filtering of the mipmaps?
<kusma>
I mean, I think this is doing the same in Vulkan and GL, so I don't think that's it either. Just trying to figure out if there's more to dig into here...
<graphitemaster>
looks like bilinear + trilinear + mipmaps
<kusma>
right, but that ceil stuff is inside a "TEXTURE_MIN_FILTER is NEAREST_MIPMAP_NEAREST or LINEAR_MIPMAP_NEAREST" conditional, and neither of those is trilinear...
<kusma>
So, Vulkan specifies the same here, `ceil(d' + 0.5) - 1`, but also *allows* for dropping the -1...
<graphitemaster>
I also force aniso 16x on it...
<graphitemaster>
I dunno why I do that
<graphitemaster>
Huh
<graphitemaster>
I'm looking at UE4 source code because I'm confused
<kusma>
Hmm...
<graphitemaster>
They set minLod to 0.0 and maxLod to 1000 in Vulkan, but -1000.0 and 1000.0 in OpenGL
<kusma>
Aniso is another candidate for lod bias issues, indeed...
<kusma>
Mesa ends up clamping the min/max lod range to 0...num_miplevels anyway
<graphitemaster>
UE4 universally sets mipLodBias to clamp(user_lod_bias, -maxSamplerLodBias, maxSamplerLodBias)
<graphitemaster>
So just clamps it to the device limits I guess
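In GL sampler-object terms, the UE4-style setup described above would look roughly like this; sampler and user_lod_bias are placeholders, and it needs <math.h> plus a GL 3.3+ sampler object:

    /* Floor the LOD range at 0 and clamp the user bias to the device
     * limit, mirroring what UE4 does on its Vulkan backend. */
    GLfloat max_bias = 0.0f;
    glGetFloatv(GL_MAX_TEXTURE_LOD_BIAS, &max_bias);
    GLfloat bias = fminf(fmaxf(user_lod_bias, -max_bias), max_bias);
    glSamplerParameterf(sampler, GL_TEXTURE_MIN_LOD, 0.0f);
    glSamplerParameterf(sampler, GL_TEXTURE_MAX_LOD, 1000.0f);
    glSamplerParameterf(sampler, GL_TEXTURE_LOD_BIAS, bias);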
<graphitemaster>
I want to test something, I'm going to set minLod to 0 in my GL code, just to rule out that this clamping behavior is the culprit
<kusma>
Good idea
<graphitemaster>
Not that :|
<graphitemaster>
All this time we've been thinking it's Lod related, what if it's the generation of the texture itself too.
<graphitemaster>
Haven't ruled that out yet
<kusma>
Yeah, that could be!
<kusma>
I'm far from sure Zink always accesses the right 2D image when combining things like miplevels, texture-arrays and cubemaps... As in, blits to the right subresource etc.
<kusma>
IIRC there were recently some fixes there, but with a comment from zmike that there probably was more similar stuff...
<kusma>
I think that was about cubemap-arrays, but I'm not sure.
<graphitemaster>
Okay just to rule that out, it does not happen with 2D textures
<graphitemaster>
This is specifically a problem with cubemap samplers.
<graphitemaster>
So now I'm believing it could be a NV Vulkan driver bug more
<graphitemaster>
At first I was like no way this is a NV driver bug because that would've been found immediately ... for 2D textures :P
<kusma>
Yeah... Both GL and Vulkan defines the LOD as being calculated in pixel-space, and this seems to be the kind of problem you get if you calculate it from coordinate-space...
<kusma>
Yeah. But the VK CTS does test textureLod + cubemap, so I'm not 100% sure...
<kusma>
Seems like something NV should be aware of then...
<kusma>
We don't do dithering at all, as it's completely undefined and not really supported in Vulkan. In theory we could create some bayer-matrix and apply a bias at the end of the fragment shader and kinda-sorta get the right behavior... but meh.
<kusma>
Ah, right. Yeah, so that's you, not GL :)
<graphitemaster>
Nice catch though
<kusma>
Then I guess the noise here is mostly due to the lod issue?
<graphitemaster>
Having visual tests (RMSE, MAE, etc.) for comparing different rendering backends was one of those "sounds really good on paper" ideas: I can ensure consistent visual results across all our platforms, and artists love it. But it's also been an absolute nightmare, because it's found several driver bugs, API quirks, and just bullshit that I really don't feel like fixing.
<kusma>
Oh I absolutely hate image based testing. It's something that has haunted me my entire career, and in every case we've ended up abandoning it because of all the issues it comes with.
<kusma>
Theory: you notice when rendering changes. Practice: you notice months later that something regressed, because a trivial change pushed the diff over the error threshold and nobody inspected the results in between, or blindly accepted new results because validating the new ones is hard.
<graphitemaster>
Oh I broke zink hard
<kusma>
And no amount of making fancy UIs to inspect the changes or clever comparison algorithm etc seems to help with that.
<zmike>
yeah so it looks like your BAR is getting blown out
<zmike>
128mb I'd guess
<graphitemaster>
Why are the resources not being released when I literally shutdown the context..
<zmike>
resources are allocated by the screen, not the context
<graphitemaster>
The screen in this case is what?
<graphitemaster>
I can launch the engine 3 times fine without running out of memory
<graphitemaster>
But if I relaunch the same instance 3 times I run out of memory
<zmike>
overall gl creation
<kusma>
I think there's some confusion here. @zmike is talking about pipe_context... I have a feeling graphitemaster is referring to the GL context, which isn't quite the same.
<kusma>
Deleting the GL context should indeed lead to all resources allocated in that GL context to be deleted. Question is do we actually do that?
<graphitemaster>
If I just launch the engine multiple times from the command line I can run several instances simultaneously without running out of memory, but simply relaunching the same instance twice (only one instance running) is enough to OOM.
<graphitemaster>
I have 39 instances of it running right now, no OOM.
<graphitemaster>
Relaunch just one of them, OOM.
<kusma>
My guess would be yeah, but who knows? :)
<zmike>
like I said, I'd guess it's not destroying the screen object between
<zmike>
zink_destroy_screen
<zmike>
should be easy to verify
<kusma>
zmike: Yeah, but that's not something the application controls... I think the state-tracker should delete the resources...
<zmike>
seems improbable or else cts would've been exploding
<kusma>
Fair point.
<zmike>
🤔
<kusma>
Maybe it's a... different kind of leak?
<kusma>
Like, something that doesn't just happen to all applications, but some sort of corner-case that the CTS doesn't trigger?
<kusma>
This is a buffer object... A fairly large one... 16 MB...
<kusma>
Nah, that doesn't make much sense.
<graphitemaster>
Simple gdb script shows there are more calls to vkAllocateMemory than vkFreeMemory
<kusma>
That's bad
<zmike>
na that's expected
<kusma>
Maybe we can tag these calls with some valgrind magic to track leaks?
<zmike>
the memory is cached
<kusma>
@zmike: not across screen deletes, is it?
<zmike>
well no, is the script counting that?
<graphitemaster>
I mean you can test this yourself: b vkAllocateMemory; commands; silent; continue; end; (do this for vkFreeMemory too), then just `info break n` and n+1 to see the number of times the bp was hit
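Written out as a gdb script, the one-liner above becomes (repeat the block for vkFreeMemory):

    break vkAllocateMemory
    commands
      silent
      continue
    end

`info breakpoints` then shows "breakpoint already hit N times" for each, and the two counts should match once the screen is destroyed.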
<graphitemaster>
They do not match when the screen is destroyed
<kusma>
That sounds like a smoking gun to me...
<graphitemaster>
And yes, it's 16 MiB, the engine only has one unified vertex buffer
<graphitemaster>
It will resize it if it gets too small
<graphitemaster>
So there should only be one whole buffer in this whole thing
<graphitemaster>
I guess it doesn't like that XD
<kusma>
I mean, there's no logical reason why that shouldn't work... This sounds like a bug to me ;)
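A minimal in-process repro sketch of the pattern described here, assuming GLX, Mesa's directly exported GL 1.5 entrypoints, and that "relaunch" means tearing down and recreating the GL context inside one process; the 16 MiB buffer stands in for the engine's unified vertex buffer:

    #define GL_GLEXT_PROTOTYPES 1
    #include <X11/Xlib.h>
    #include <GL/gl.h>
    #include <GL/glx.h>

    int main(void)   /* compile with: cc repro.c -lGL -lX11 */
    {
        Display *dpy = XOpenDisplay(NULL);
        int attribs[] = { GLX_RGBA, GLX_DOUBLEBUFFER, None };
        XVisualInfo *vi = glXChooseVisual(dpy, DefaultScreen(dpy), attribs);
        Window win = XCreateSimpleWindow(dpy, DefaultRootWindow(dpy),
                                         0, 0, 64, 64, 0, 0, 0);

        for (int i = 0; i < 64; i++) {
            GLXContext ctx = glXCreateContext(dpy, vi, NULL, True);
            glXMakeCurrent(dpy, win, ctx);

            GLuint buf;
            glGenBuffers(1, &buf);
            glBindBuffer(GL_ARRAY_BUFFER, buf);
            glBufferData(GL_ARRAY_BUFFER, 16 << 20, NULL, GL_STATIC_DRAW);
            glDeleteBuffers(1, &buf);

            /* Destroying the context should release the backing memory;
             * compare vkAllocateMemory/vkFreeMemory counts per iteration. */
            glXMakeCurrent(dpy, None, NULL);
            glXDestroyContext(dpy, ctx);
        }
        return 0;
    }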
<graphitemaster>
I really feel like an ass for running into bugs and just wasting your time with them XD
<graphitemaster>
yak shaving now, I was supposed to figure out what was wrong with lod