ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
<jekstrand> Yeah, chromebooks aren't exactly optimized for storing data on the device. You just need to figure out how to do builds directly in your Google Drive. :D
<airlied> just use a jupyter notebook
<alyssa> oops i broke the entire cts
<alyssa> this is somehow marge's fault
<jekstrand> alyssa: From your vec4 patch:
<jekstrand> total instructions in shared programs: 15970411 -> 15970928 (<.01%)
<jekstrand> helped: 635
<jekstrand> instructions in affected programs: 124197 -> 124714 (0.42%)
<jekstrand> HURT: 673
<jekstrand> helped stats (abs) min: 1 max: 7 x̄: 1.89 x̃: 1
<jekstrand> helped stats (rel) min: 0.15% max: 16.67% x̄: 4.44% x̃: 3.26%
<jekstrand> HURT stats (abs) min: 1 max: 24 x̄: 2.55 x̃: 1
<jekstrand> HURT stats (rel) min: 0.26% max: 27.38% x̄: 2.80% x̃: 2.13%
<jekstrand> 95% mean confidence interval for instructions value: 0.23 0.57
<jekstrand> 95% mean confidence interval for instructions %-change: -0.98% -0.46%
<jekstrand> Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).
<alyssa> jekstrand: Curious. Okay, was worth a shot :)
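The net figures in the shader-db report above can be checked by hand. This is a minimal sketch that only redoes the arithmetic on the totals quoted in the log; the variable and function names are illustrative and not part of shader-db.

```python
# Totals copied from the shader-db report quoted in the log.
total_before, total_after = 15970411, 15970928
affected_before, affected_after = 124197, 124714


def pct_change(before, after):
    """Relative change in percent."""
    return (after - before) / before * 100.0


total_delta = total_after - total_before
affected_delta = affected_after - affected_before

# Both lines describe the same net change; only the denominator differs.
print(f"total:    {total_delta:+d} ({pct_change(total_before, total_after):.4f}%)")
print(f"affected: {affected_delta:+d} ({pct_change(affected_before, affected_after):.2f}%)")
```

The net change is small compared with the number of programs that moved in either direction (635 helped, 673 hurt), which fits the "Inconclusive result" verdict in the report.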
rsalvaterra_ has quit []
rsalvaterra has joined #dri-devel
illwieckz has quit [Ping timeout: 480 seconds]
<airlied> robclark: I'll hold that next pull until you can track down the devfreq regression
<robclark> airlied: devfreq regression?
<robclark> oh
* airlied looks like I was being proactive by doing nothing :-P
<robclark> hmm, I wonder if a630 has some extra restrictions about changing freq.. I thought it should otherwise be pretty similar to a618
<imirkin_> alyssa: why did you tag me on ?
<imirkin_> alyssa: also did you mean to assign to marge rather than request her review?
Lightkey has quit [Ping timeout: 480 seconds]
illwieckz has joined #dri-devel
<alyssa> ummmm
<alyssa> i er did mean to assign to marge
<alyssa> though I'm happy to have her review too
<imirkin_> might have to wait a while for that to happen :)
<imirkin_> i'm sure anholt_ is working on the auto-review feature...
<alyssa> neural networks
<zf> <reject>
Lightkey has joined #dri-devel
pnowack has quit [Quit: pnowack]
<idr> alyssa: Better her than Patty or Selma.
<alyssa> ;D
<alyssa> Oh, Homie...
nchery has quit [Quit: Leaving]
sdutt has quit [Remote host closed the connection]
sdutt has joined #dri-devel
boistordu has joined #dri-devel
ngcortes has quit [Remote host closed the connection]
boistordu_ex has quit [Ping timeout: 480 seconds]
<imirkin> graphitemaster: out of curiosity, what nvidia hw are you seeing fine/coarse derivatives differences on?
<imirkin> pretty sure i checked on kepler or so, and there's no diff in generated code
<imirkin> it might affect sampler settings actually
<graphitemaster> imirkin, 2070 RTX
<graphitemaster> It does affect sampler settings though, it does not change the generated code.
<imirkin> graphitemaster: ah that makes sense
<imirkin> yeah, samplers on nvidia have tons of knobs
<graphitemaster> Yeah, looking at the TSC, it looks like the nouveau guys didn't figure it out either
<graphitemaster> Between mag and min filter here is 2 bits that define the partial derivatives used
<graphitemaster> Don't ask how I know please.
<imirkin> not everything we know is documented there :p
<graphitemaster> :)
<graphitemaster> Okay well, guess what I'm writing tonight, a derivative test for textures at init to fallback to software :P
<graphitemaster> BTW on AMD it appears like Fine derivatives are always used for mipmap lod.
<graphitemaster> Just looking at the visual results here, that's radeonsi.
<imirkin> radeonsi ignores the hint
<imirkin> (all st/mesa drivers ignore the hint)
<imirkin> (since the hint is in GL, and st doesn't pass it through to the gallium driver atm)
* airlied had an MR to pass it through
<airlied> but never really cared enough to push it
<graphitemaster> The sample instruction in RDNA (_D suffix) supports 2, 4, or 6 slopes for LOD calculation.
<airlied> granted that was TGSI so very old
<graphitemaster> My guess is nothing uses the 6 slopes? since Vulkan specs fine as 4 slope and coarse as 2 slope in the quad
<graphitemaster> Maybe 6 slopes is uses for MSAA textures or something.
<graphitemaster> s/uses/used
<graphitemaster> Oh for 3D
<graphitemaster> Derp
<graphitemaster> Okay yeah, so there is fine and coarse selection for LOD derivatives in the Sample instruction itself on RDNA.
<alyssa> graphitemaster: what mesa driver are you interested in? radeonsi? nouveau?
<graphitemaster> None of them specifically. I'm just an engine dev trying to make sure my stuff looks correct on open source drivers and Zink specifically since I need a solution for when Apple kills OpenGL.
<graphitemaster> And that has led me to looking at all of them XD
<alyssa> gotcha
<graphitemaster> The danger is I actually learn all this stuff and become a driver developer.
<graphitemaster> It looks like coarse / fine derivative selection for textures is just a complete oversight by all the APIs except OpenGL which makes the hint optional (granted).
<graphitemaster> And the only reason it works at all in GL is because the hint affects texture sampling indirectly. It doesn't appear to be an intended use of the hint.
<graphitemaster> Does that mean I can propose an extension? XD
lemonzest has joined #dri-devel
cef is now known as Guest2649
cef has joined #dri-devel
<mareko> RDNA can't select coarse/fine for sample instructions
Guest2649 has quit [Ping timeout: 480 seconds]
<mareko> yes, only RDNA1 has them
<mareko> and older chips
<graphitemaster> RDNA2 has it too.
<graphitemaster> GCN1, GCN2, and GCN3 as well (going backwards)
<mareko> there is no _CD
<graphitemaster> Oh
<graphitemaster> I see now.
<graphitemaster> So it's always Fine there.
<graphitemaster> Which is consistent with what I was seeing
<graphitemaster> This is frustrating.
<alyssa> graphitemaster: *sips*
<alyssa> This is Fine.
<graphitemaster> Okay, I'm just going to do the really dumb thing I was thinking of: draw a fullscreen triangle with a two-mip texture, read back the result to determine whether coarse or fine derivatives are being used, and then emulate fine derivatives for LOD calculation when they are obviously coarse.
<graphitemaster> And that's totally going to be a lot of overhead
<imirkin> graphitemaster: can probably cut out the middleman and use textureQueryLod?
<imirkin> erm, i guess nevermind on that.
<imirkin> er no
<imirkin> that should work
<imirkin> at least on nvidia
<graphitemaster> The thing that also uses partial derivatives :P
<imirkin> it still takes all the regular texturing details
<imirkin> and then compare what that function produces relative to what you'd produce by hand?
<imirkin> using dfdx/dy
<graphitemaster> Can LOD values be barycentrically interpolated? I could just compute the LOD on the CPU using backward finite difference and pack it in with the UVs in my vertex soup. I actually don't know how the GPU calculates the gradients, I guess it's just the luminance of the RGB pixels much like edge-detection methods work in some of the ugly AA algorithms. Writing a pretty basic 2x2 rasterizer isn't out of the question here, already have a depth-only rasterizer for occlusion culling I could use, this is getting too big brain, never mind.
<graphitemaster> Compute shader can calculate them actually and just store to the vertex soup just before a draw.
<graphitemaster> NV absolutely hates interleaved compute and draws though
<graphitemaster> I don't think they can be interpolated anyways because partial derivatives are non-linear.
<graphitemaster> Why else would you analytically compute them per-fragment if they can be done per-vertex.
<graphitemaster> So that was a stupid idea in hindsight, never mind.
<graphitemaster> Oh my god
<graphitemaster> I forgot about textureGrad
<graphitemaster> By the way, textureLod() is a horrible idea even with emulation because it disables anisotropic filtering.
<graphitemaster> So I'd have to use textureGrad actually.
<imirkin> or use textureQueryLod
<imirkin> and then you don't have to worry about texturing at all
<imirkin> since you get at the computed lod info
<imirkin> without worrying about doing actual texture lookups
<graphitemaster> The result of that still has to be used with textureLod
<imirkin> why
<imirkin> why are you looking anything up in the texture
<graphitemaster> Because I'm sampling the texture?
<imirkin> compare the result of that to your manually-computed lod
<imirkin> why are you sampling the texture?
<graphitemaster> So I can texture something
<imirkin> i thought you were trying to build a detector for the fine thing
<imirkin> just look for textureQueryLod != manual lod
<graphitemaster> Right if I just wanted to determine if the coarse or fine derivatives are being used I could do that. I'm just saying emulating the fine derivatives for mip lod selection is broken anyways
<graphitemaster> It's broken because when I do end up using textureLod with fine computed lod I lose anisotropic filtering
<imirkin> but again, that's not what you're testing
<imirkin> you're testing lod computation
<imirkin> whether it's per-quad or per-frag
<imirkin> i guess?
<graphitemaster> The purpose of the test is so I can fall back to doing something alternative to emulate the behavior I want, if I can't even emulate the behavior correctly then the test is not very helpful.
<imirkin> ah
<imirkin> then yes, textureGrad is for you :)
<graphitemaster> So vec4 sample_emulate(sampler2D tex, vec2 uv) { return textureGrad(tex, dFdxFine(uv), dFdyFine(uv)); }
<graphitemaster> oops, forgot to pass uv there
<graphitemaster> I wonder if this pattern is detected by the compiler better
<graphitemaster> And optimized correctly if fine derivatives are default
<graphitemaster> Shit that works correctly and the NVfp does compile it away!
<graphitemaster> Hallelujah!
<graphitemaster> Looks like on modern AMD it does optimize away too
<graphitemaster> Since fine derivatives are the default, so it just uses regular image_sample_d
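The coarse/fine distinction discussed above can be illustrated with a small CPU-side model of one 2x2 pixel quad. This is a sketch under stated assumptions, not how any particular GPU or driver does it: "coarse" is modelled as always differencing the top row and left column (one common choice; implementations may pick differently), the texture is square, and the LOD uses the simplified isotropic formula log2(max(|dUV/dx|, |dUV/dy|) * size). All function names are hypothetical.

```python
import math


def quad_derivatives(uv, fine):
    """uv[row][col] holds the (u, v) coordinate of each pixel in a 2x2 quad.
    Returns per-pixel (ddx, ddy) pairs in the same layout."""
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1])

    out = [[None, None], [None, None]]
    for row in range(2):
        for col in range(2):
            r = row if fine else 0  # coarse: always use the top row for d/dx
            c = col if fine else 0  # coarse: always use the left column for d/dy
            ddx = sub(uv[r][1], uv[r][0])
            ddy = sub(uv[1][c], uv[0][c])
            out[row][col] = (ddx, ddy)
    return out


def lod_from_gradients(ddx, ddy, tex_size):
    """Simplified isotropic LOD; assumes a square texture and non-zero gradients."""
    rho = max(math.hypot(ddx[0] * tex_size, ddx[1] * tex_size),
              math.hypot(ddy[0] * tex_size, ddy[1] * tex_size))
    return math.log2(rho)


def quad_lods(uv, tex_size, fine):
    grads = quad_derivatives(uv, fine)
    return [[lod_from_gradients(*grads[row][col], tex_size) for col in range(2)]
            for row in range(2)]


# A quad whose u coordinate changes faster along the bottom row than the top row.
s = 1.0 / 256.0
uv = [[(0.0, 0.0), (1 * s, 0.0)],
      [(0.0, 1 * s), (4 * s, 1 * s)]]

coarse_lods = quad_lods(uv, 256, fine=False)
fine_lods = quad_lods(uv, 256, fine=True)
print("coarse:", coarse_lods)
print("fine:  ", fine_lods)
```

With coarse derivatives every pixel in the quad gets the same LOD, while fine derivatives let the bottom row pick a higher mip level. Passing explicitly computed fine gradients to textureGrad, as in the GLSL one-liner above, corresponds to the `fine=True` path.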
<mareko> graphitemaster: quad_perm is dfdx/dfdy
<graphitemaster> Oh :(
<graphitemaster> Suppose I wanted to play around with mesa to learn it a little, where would I read and attempt to add an optimization for this specific pattern in the IR
<graphitemaster> When the derivatives of a textureGrad are computed immediately from the same uv given to textureGrad but with dFd* intrinsics, just strength reduce it to a regular implicit texture lod.
<imirkin> graphitemaster: you'd add something to nir, or to the driver's backend compilers
soreau has quit [Read error: No route to host]
soreau has joined #dri-devel
<mareko> if you find the texture opcode in nir, you just change nir_tex_instr::op from txd to tex and set num_srcs to 1
<graphitemaster> So turn the grad instruction to a regular tex one, and then presumably the partial derivative functions called will be no longer referenced so the optimizer will remove them?
<mareko> yes
<graphitemaster> Makes sense.
<graphitemaster> I see a replace_gradient_with_lod
<mareko> yeah something like that
<graphitemaster> As an aside, that lower_gradient_cube_map looks wrong to me
<graphitemaster> It computes lod = -1.0 + 0.5 * log2(L * L * M)
Duke`` has joined #dri-devel
Company has quit [Read error: Connection reset by peer]
<graphitemaster> Why is there a -1.0 there
<graphitemaster> The dots compute squared lengths, the 0.5 * log2(x) is the same as log2(sqrt(x))
<graphitemaster> I wonder if this is what is causing the broken lod I see on Zink XD
<imirkin> are you sure that pass even runs with zink?
<graphitemaster> Must not be, since it's not hitting the bp
<graphitemaster> nir_lower_tex is never called in Zink so can't be anything here
<graphitemaster> So attempting optimizations here with the pattern I talked about is probably not a good idea until I can figure out how to get this to go.
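The logarithm identity mentioned in the lower_gradient_cube_map discussion is easy to confirm numerically. The sketch below checks only the math: 0.5 * log2(x) equals log2(sqrt(x)), and a constant -1.0 offset is the same as halving the argument of the log. It makes no claim about whether the -1.0 in the Mesa pass is correct; the function names are illustrative.

```python
import math


def half_log2(x):
    return 0.5 * math.log2(x)


def log2_sqrt(x):
    return math.log2(math.sqrt(x))


def lod_with_offset(x):
    # Same shape as the expression quoted in the log: -1.0 + 0.5 * log2(x)
    return -1.0 + 0.5 * math.log2(x)


for x in (0.25, 1.0, 16.0, 1234.5):
    # 0.5 * log2(x) == log2(sqrt(x))
    assert math.isclose(half_log2(x), log2_sqrt(x), abs_tol=1e-12)
    # -1.0 + log2(y) == log2(y / 2), with y = sqrt(x)
    assert math.isclose(lod_with_offset(x), math.log2(math.sqrt(x) / 2.0), abs_tol=1e-12)
```

So whatever its purpose, the -1.0 term selects exactly one mip level lower than the plain log2 of the gradient length would.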
mlankhorst has joined #dri-devel
danvet has joined #dri-devel
mattrope has quit [Read error: Connection reset by peer]
mattrope has joined #dri-devel
<graphitemaster> Does this appear correct
<graphitemaster> Trying to emulate fine derivatives for a comparison bench I'm making
Putti has joined #dri-devel
<graphitemaster> er, coarse, bad name
jessica_24 has quit [Quit: Connection closed for inactivity]
Duke`` has quit [Ping timeout: 480 seconds]
bcarvalho has quit [Remote host closed the connection]
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
frieder has joined #dri-devel
jkrzyszt has joined #dri-devel
aravind has joined #dri-devel
mattrope has quit [Remote host closed the connection]
rpigott has quit [Remote host closed the connection]
rpigott has joined #dri-devel
rasterman has joined #dri-devel
sdutt has quit [Ping timeout: 480 seconds]
enunes has joined #dri-devel
sdutt has joined #dri-devel
thellstrom has joined #dri-devel
enunes has quit [Read error: Connection reset by peer]
enunes has joined #dri-devel
sumits has quit [Quit: ZNC -]
Terman has joined #dri-devel
pnowack has joined #dri-devel
xexaxo has joined #dri-devel
sumits has joined #dri-devel
enunes has quit [Quit: ZNC -]
Lucretia has joined #dri-devel
sumits has quit [Quit: ZNC -]
enunes has joined #dri-devel
bcarvalho has joined #dri-devel
<pq> bl4ckb0ne, all the DRM formats are listed in drm_fourcc.h - I don't recall seeing one for depth.
xexaxo has quit [Ping timeout: 480 seconds]
tursulin has joined #dri-devel
pcercuei has joined #dri-devel
sumits has joined #dri-devel
agx has quit [Read error: Connection reset by peer]
agx has joined #dri-devel
pcercuei has quit [Ping timeout: 480 seconds]
pcercuei has joined #dri-devel
mlankhorst has quit [Ping timeout: 480 seconds]
aissen has quit [Ping timeout: 480 seconds]
<emersion> alyssa: does your comment count as a A-b here?
<emersion> daniels: and yours as a R-b?
<emersion> (for the panfrost bits only ofc)
K`den has joined #dri-devel
Kayden has quit [Read error: Connection reset by peer]
xexaxo has joined #dri-devel
vivijim has joined #dri-devel
sdutt has quit [Ping timeout: 480 seconds]
nirmoy has joined #dri-devel
camus has joined #dri-devel
xexaxo has quit [Remote host closed the connection]
xexaxo has joined #dri-devel
camus has quit [Ping timeout: 480 seconds]
<daniels> emersion: yep, R-b from me, thankyou
<emersion> thx
<pendingchaos> is there a MSVC equivalent to GCC's -fwrapv? I'm thinking that maybe we should compile NIR's constant folding with this option
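For context, GCC's -fwrapv makes signed integer overflow wrap around in two's complement instead of being undefined behaviour in C. A minimal sketch of the wrapping semantics a constant folder would want for 32-bit integer adds, with hypothetical helper names (this is not NIR code):

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -2**31


def wrap_i32(value):
    """Reduce an arbitrary Python int to the signed 32-bit two's-complement range."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def iadd_i32(a, b):
    """32-bit integer add that wraps on overflow, as -fwrapv would give in C."""
    return wrap_i32(a + b)


print(iadd_i32(INT32_MAX, 1))   # wraps to INT32_MIN
print(iadd_i32(INT32_MIN, -1))  # wraps to INT32_MAX
```

In C the same result can be had without compiler flags by doing the arithmetic on unsigned types and converting back, which is one way to stay portable across compilers.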
camus has joined #dri-devel
frieder has quit [Ping timeout: 480 seconds]
rg3igalia has joined #dri-devel
frieder has joined #dri-devel
camus has quit [Ping timeout: 480 seconds]
ppascher has quit [Quit: Gateway shutdown]
ella-0 has quit [Remote host closed the connection]
ella-0 has joined #dri-devel
camus has joined #dri-devel
Net147 has quit [Quit: Quit]
Net147 has joined #dri-devel
ppascher has joined #dri-devel
camus has quit [Ping timeout: 480 seconds]
<kusma> Uh, it seems like the debian-gallium job on Mesa CI is currently failing with an ir3_ra() failed message...
<kusma> Oh, retrying printed the same message, but the job passed??
ella-0 has quit [Remote host closed the connection]
ella-0 has joined #dri-devel
<daniels> kusma: job link please
<kusma> It seems like the ir3_ra() thing might be unrelated to the failure. I already posted this on the CI tracker issue.
flacks has quit [Quit: Quitter]
flacks has joined #dri-devel
<daniels> yeah, that's an intel drm-shim crash
<dj-death> I can look into it
<dj-death> I bet we started requiring a new feature and the stub doesn't have it
Company has joined #dri-devel
dviola has quit [Ping timeout: 480 seconds]
dviola has joined #dri-devel
<zmike> anyone else having issues pulling from mesa git now?
dviola has quit [Ping timeout: 480 seconds]
<MrCooper> zmike: might be related to this from #freedesktop: <bentiss> sigh large-5 is now down, ceph is timing out
<zmike> huh sounds probable
dviola has joined #dri-devel
ella-0_ has joined #dri-devel
ella-0 has quit [Read error: Connection reset by peer]
ezequielg has quit []
ezequielg has joined #dri-devel
<jenatali> pendingchaos: Looks like no
<alyssa> I guess I should whip up a panfrost drm-shim at some point
<alyssa> I've dragged my feet since I can do shader-db from any mali gpu on my older mali laptop, just by faking the gpu id instead of the whole drm-shim
<alyssa> but... m1 can do faster shader-db runs...
<bl4ckb0ne> pq: yeah, thinking about adding some
<bl4ckb0ne> i have a weird use case
<pq> bl4ckb0ne, you got a literal 3D monitor? :-)
<bl4ckb0ne> almost
<pq> hmm, yeah, actually 3D is not new, there are 3D video modes, but they tend to be stereo
<bl4ckb0ne> i have a weird combo of wayland linux dmabuf used with eglCreateImageKHR as a depth buffer
<bl4ckb0ne> buffer is allocated with DRM_FORMAT_R16, and when attached to the framebuffer it fails because the format is GL_RGB
<bl4ckb0ne> so either i find a way inside mesa to transmute the PIPE_R16_UNORM into PIPE_Z16_UNORM
<bl4ckb0ne> or DRM_FORMAT_Z16
<bl4ckb0ne> PIPE_FORMAT*
camus has joined #dri-devel
ella-0_ is now known as ella-0
<dj-death> can't repro the drm-shim crash :/
mlankhorst has joined #dri-devel
<dj-death> ah no, looks like a fstat issue
tzimmermann has joined #dri-devel
iive has joined #dri-devel
nchery has joined #dri-devel
<dj-death> intel stuff might not be the only affected stuff
MrCooper has quit [Quit: Leaving]
boistordu has quit [Remote host closed the connection]
boistordu has joined #dri-devel
MrCooper has joined #dri-devel
<emersion> pq, the idea is to pass depth DMA-BUFs from client to compositor
Peste_Bubonica has joined #dri-devel
<pq> cool, so that one is flying again
alyssa has left #dri-devel [#dri-devel]
<pq> but do they need to be imported as depth buffers?
<emersion> they can be passed around as color buffers, but then that requires blitting
<emersion> danvet: are depth buffer formats something that drm_fourcc.h would be suitable for?
sdutt has joined #dri-devel
<danvet> emersion, no one tried yet, I think that's all
<danvet> might need some egl ext rewording and a pile of patches
<emersion> i guess my main question is, maybe the single-channel formats should be used instead?
<bl4ckb0ne> im surprised nobody tried
<emersion> like, R16 is red, but could very well be depth
<emersion> is it better to add EGL APIs to say "this is R16 but i want depth", or is it better to add the depth formats to drm_fourcc.h
<pq> emersion, why would it need blitting?
alyssa has joined #dri-devel
<alyssa> jekstrand: bbrezillon and I are talking about structuring a Vulkan driver for per-gen GenXML compiles, mind if we pick your brain?
<bl4ckb0ne> depth is handled as a regular texture
<emersion> pq, essentially an EGL/GL/Mesa limitation. DMA-BUFs can't be imported/attached as depth buffers
<emersion> instead, you need to blit the depth buffer to a color buffer, then export it
<pq> emersion, why would you need it *imported* as a depth buffer?
<emersion> the client exports its depth buffer to the compositor
<pq> oooh, you mean for the rendering?
<emersion> yea
<pq> ok, I was only thinking about using it in a compositor or such :-)
<emersion> to use the depth buffer when compositing multiple clients, each of which have a color+depth buffer
<bl4ckb0ne> told you it was almost a 3d monitor ;)
<emersion> pq, maybe we should've started with the obvious, bl4ckb0ne is working on a VR compositor
<pq> sure
<emersion> well, obvious for us
<pq> I guess you want the fixed-function depth stuff to use a dmabuf in the client that renders the 3D image.
<emersion> yeah
<pq> for a compositor to read a color buffer as depth is no problem, just plug it an the compositing shader
<pq> *in
<emersion> hm, i guess if you hand-roll your GL_DEPTH_TEST, should work yea
<bl4ckb0ne> yup that's what I have atm
<pq> emersion, do you need even that? Just read the color texture with the depth values and assign to glFragDepth?
<pq> or is the performance hit of that significant?
<emersion> hmm. tbh bl4ckb0ne needs to try a lot of things and see what happens
<emersion> throw a lot of stuff at the wall and see what sticks
<imirkin_> when i was reviewing the nouveau modifier patches
<imirkin_> depth came up as a problem
<imirkin_> but it was stated that this was "out of bounds" for things that can be exported
<imirkin_> so i didn't worry too much about it
<emersion> ah, oops.
<pq> fragment depth value is totally arbitrarily writable in a frag shader in GL :-)
<imirkin_> (and i checked - no depth formats in drm_fourcc, etc)
<emersion> pq, so we'd still need a blit in the compositor?
mattrope has joined #dri-devel
<pq> after which I think it goes through all the fixed-function depth stuff
<pq> no
<pq> no blit
<emersion> there may be multiple clients
<jekstrand> alyssa: Sure
<pq> sure
<imirkin_> so at least with nouveau, depth + dmabuf won't work trivially
<emersion> to composite them together correctly, each needs a proper depth buffer
<pq> you still composite them one by one, right?
<imirkin_> not to say it can't be made to work, just that the current code won't support it
<pq> each needs proper depth - not necessarily as a depth buffer per se
<emersion> pq, likely, yeah
sdutt has quit []
sdutt has joined #dri-devel
<emersion> hm
<emersion> so hand-roll that depth test?
<pq> just bind the R32 buffer as another regular texture, sample it, and write the result into glFragDepth suitably scaled if needed
<pq> no, that should just work
frieder has quit [Ping timeout: 480 seconds]
<imirkin_> note that depth buffers are traditionally Z24, so there's no natural color format to put it into
<emersion> how does GL know that only the pixels whose depth is > than the destination's depth should be blitted?
<emersion> s/>/</
<imirkin_> emersion: GL_DEPTH_TEST + glDepthFunc?
<pq> that ^
<emersion> GL wouldn't know about the source texture's depth, since it would just be a regular color texture
<imirkin_> if you're emulating the depth test, you also need the depth test parameters
<pq> emersion, it doesn't need to. You specify the depth value in the fragment shader by writing to glFragDepth.
<emersion> oh.
<emersion> thanks for your patience
<imirkin_> gl_FragDepth, to be pedantic
<emersion> just understood what you meant
<imirkin_> and iirc that's not a thing on unextended GLES2? i forget.
<emersion> i thought gl_FragDepth wrote to the destination buffer
<emersion> but it just sets the depth test's input
<pq> imirkin_, thanks - it's been over 10 years since I used it... if I used it. :-)
<imirkin_> emersion: gl_FragDepth is indeed the output value to write to the depth buffer
<emersion> ah.
<imirkin_> gl_FragCoord.z is the "input" value of depth
<imirkin_> the two need not have any connection with each other
<pq> imirkin_, huh??
<emersion> so just set gl_FragCoord.z and let GL do its depth test?
<emersion> is gl_FragCoord.z writable?
<imirkin_> no
<imirkin_> gl_FragDepth is writable.
<pq> imirkin_, I mean, sure, that's the value to be written *if* the fragment passes all tests, right?
<imirkin_> there are both early and "late" depth tests
<imirkin_> if the shader writes depth, the depth tests are indeed done after
<emersion> ok, but i want to set the depth test input, or else need to re-implement it myself
<imirkin_> yeah, so you can write gl_FragDepth and that will be the input into the depth test.
<emersion> ah, ok
<imirkin_> (and ultimately be written to the depth surface, if it passes the test)
<emersion> cool
<emersion> thanks!
<pq> that! ^ \o/
<imirkin_> and gl_FragCoord.z is the "natural" depth after viewport transform/etc.
<imirkin_> (and is an input into the shader)
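The compositing scheme worked out above (sample each client's depth from a regular colour texture, write it to gl_FragDepth, and let the fixed-function depth test decide) can be modelled on the CPU. This is a sketch, not GL code: it assumes a GL_LESS depth function, a depth clear value of 1.0, and that every client uses the same depth convention (smaller means nearer). The function name and data layout are made up for illustration.

```python
def composite(clients, width, height, clear_depth=1.0, clear_color="."):
    """clients is a list of (color, depth) pairs, each a height x width grid.
    Returns the composited (color, depth) grids."""
    color = [[clear_color] * width for _ in range(height)]
    depth = [[clear_depth] * width for _ in range(height)]
    for client_color, client_depth in clients:  # one draw per client
        for y in range(height):
            for x in range(width):
                # What the fragment shader would write to gl_FragDepth.
                frag_depth = client_depth[y][x]
                # Late depth test with GL_LESS against the destination buffer.
                if frag_depth < depth[y][x]:
                    color[y][x] = client_color[y][x]
                    depth[y][x] = frag_depth
    return color, depth


client_a = ([["A", "A"]], [[0.2, 0.8]])
client_b = ([["B", "B"]], [[0.5, 0.5]])

out_color, out_depth = composite([client_a, client_b], width=2, height=1)
print(out_color, out_depth)
```

Because the depth test resolves visibility per pixel, the result does not depend on the order in which the clients are drawn, at least for opaque content.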
Duke`` has joined #dri-devel
<pq> So the only problem left is how the client is doing to write the depth buffer that can be shared. I'm guessing you need MRT for it and write it out explicitly?
<pq> *goint to
<pq> *going to
<emersion> MRT?
<imirkin_> multiple render targets
<imirkin_> i.e. multiple color outputs from a single fragment shader
<emersion> hm, maybe
<imirkin_> (also not a core GLES2 feature iirc)
<bl4ckb0ne> i moved to gles3 recently
<imirkin_> welcome :)
<emersion> but does the frag shader run late enough for this?
<emersion> i think the safe thing to do is just another render pass
<imirkin_> emersion: no, but the depth test will reject writes
<emersion> hmm.
<imirkin_> i'm not 100% clear on what you're trying to do tbh ;)
<alyssa> jekstrand: How to handle driver structures (e.g. anv_private.h) that contain prepacked hardware state? particularly when the size of the state differs across gens?
<emersion> imirkin_: have multiple 3D scenes. each rendered in one process. composite them together in a compositor
<alyssa> I guess anv just open codes u32[] but that seems... unsafe
<emersion> the 3D scenes are rendered with the same mvp matrices and stuff
<jekstrand> alyssa: We have a way to do it in anv but it's terrible
<alyssa> bbrezillon: ^^
<alyssa> I found anv_graphics_pipeline which is... not amazing...
<imirkin_> emersion: ... why
<jekstrand> alyssa: Fixed sizes aren't really a problem if you STATIC_ASSERT(ARRAY_SIZE(foo) >= GENX(MY_PACKET_length));
<bl4ckb0ne> why not :D
<imirkin_> bl4ckb0ne: because it's a pain
<bl4ckb0ne> i know, i know
<alyssa> jekstrand: Ah, of course!
<alyssa> and the static assert can be per gen even if the header is not, got it got it got it
<jekstrand> Yup
<bl4ckb0ne> but it works :p
<pq> emersion, you may want to pay attention if we are talking about the compositor or the client rendering here. ;-)
<pq> \o.
<imirkin_> bl4ckb0ne: how do you ensure depth "compatibility" between the two scenes?
<bl4ckb0ne> for now i dont, but that's what I want to do
dviola has quit [Quit: WeeChat 3.2]
thellstrom has quit [Quit: thellstrom]
<imirkin_> bl4ckb0ne: i mean ... what if one thing is centered around 0.5 and the other around 0.25. what if one has inverted depth. etc.
<bl4ckb0ne> centered?
<imirkin_> like, the raw depth values
<imirkin_> how do you ensure they are "compatible" between scenes?
ezequielg has quit []
<bl4ckb0ne> compositor shares the values
ezequielg has joined #dri-devel
<emersion> imirkin_: i guess it just boils down to writing down the expectations so that all clients agree?
<zf> did this ever happen in any form?
<imirkin_> emersion: i guess
<dcbaker> virgl people: I'm assuming that Revert "Revert "virgl: Cache depth and stencil buffers"" shouldn't be backported to 21.2?
<jenatali> venemo: Sounds like something for GitLab to solve, not something Freedesktop would solve in its instance
K`den is now known as Kayden
<MrCooper> Venemo: I think you need to authenticate with NickServ, didn't see what you wrote that jenatali responded to
Peste_Bubonica has quit [Quit: Leaving]
jessica_24 has joined #dri-devel
<shadeslayer> dcbaker: nope, it depends on the previous commit in the history
tobiasjakobi has joined #dri-devel
<dcbaker> shadeslayer: yeah, that's not backported. We have this... feature where all reverts are automatically nominated for stable. eric_engestrom was working on fixing that I think
<dcbaker> I'll go ahead and denominate
<shadeslayer> ahh, I wasn't aware of that
aissen has joined #dri-devel
tobiasjakobi has quit [Remote host closed the connection]
macromorgan has quit [Remote host closed the connection]
macromorgan has joined #dri-devel
<alyssa> jekstrand: Oh, hm. It occurs to me if we're playing really fast 'n loose, we could save 2 instructions for dfdx_coarse
<alyssa> currently we emit
<alyssa> broadcast(lane_id & ~1, x) - broadcast(lane_id | 1, x)
<alyssa> (ish)
<alyssa> whereas i guess the specs are loose enough for coarse we could do
<alyssa> broadcast(0, x) - broadcast(1, x)
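The two broadcast patterns above can be compared with a tiny model of one quad. This is a sketch under assumptions: lanes 0 and 1 form the top row and lanes 2 and 3 the bottom row, `broadcast` is a hypothetical stand-in for the hardware op, and the subtraction is written as right minus left so the result has the usual sign of dFdx (the expressions in the chat are explicitly approximate).

```python
def broadcast(values, lane):
    """Hypothetical stand-in for a cross-lane read within one quad."""
    return values[lane]


def dfdx_fine(values, lane_id):
    # Right neighbour minus left neighbour within this lane's own row.
    return broadcast(values, lane_id | 1) - broadcast(values, lane_id & ~1)


def dfdx_coarse(values, lane_id):
    # Always difference the top row; no per-lane index math needed.
    return broadcast(values, 1) - broadcast(values, 0)


# Per-lane values of some varying x across the quad.
x = [1.0, 3.0, 10.0, 15.0]

fine = [dfdx_fine(x, lane) for lane in range(4)]
coarse = [dfdx_coarse(x, lane) for lane in range(4)]
print("fine:  ", fine)
print("coarse:", coarse)
```

The coarse variant uses constant lane indices, which is where the saved instructions come from: there is no need to materialise lane_id and mask it.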
<shadeslayer> dcbaker: maybe revert'ing reverts can have a "Needs" field that can depend on another commit :)
<jekstrand> alyssa: Do you always have a subgroup size of 4?
<alyssa> jekstrand: No, but our broadcast has a subgroup_size parameter so we can always do logical quads.
<jekstrand> right
<jekstrand> Yeah, that's basically what we do on Intel.
<alyssa> (I guess it just does `lane_id & (subgroup_size - 1)` in hardware)
<imirkin_> on nvidia, there's a special mode to the readInvocation equivalent which lets you read the other "x" lane or "y" lane
<alyssa> oh, derp... I guess we have that too and we don't take advantage of it
<imirkin_> heh
<imirkin_> "oops"
<alyssa> oh, now i remember why - because then you have sign trouble.
<jekstrand> Yeah
<imirkin_> ah yeah. we can flip the sign. it's a neat op.
<alyssa> should use it for fwidth or somethin
<jekstrand> You have to subtract consistently
<imirkin_> FSWZADD on newer GPUs is what it's called
<dcbaker> shadeslayer: I think the plan is to just treat them like other commits, either a "cc" or a "fixes" is required for them to be backported
<dcbaker> I mean, reverts basically come in two flavors
<dcbaker> 1. said patch was wrong, revert
<jekstrand> alyssa: prior to Ice Lake, we just used the vec4 hardware with a swizzle. 'cause we can do that in scalar mode. :)
<dcbaker> 2. I rewrote everything, and now the old behavior is correct
<alyssa> jekstrand: Haha
<jekstrand> alyssa: Yeah, it's a pretty neat trick and basically the only interesting use of vec4 in fragment shaders on Intel. :)
<alyssa> jekstrand: Ok, right now we need 5* instructions for a ddx .. mov lane_id, iadd, broadcast, broadcast, fsub
<alyssa> * amortized 3 instructions since the mov/iadd gets cse'd for subsequent ddx
<imirkin_> so ... 4 amortized ops then?
<alyssa> sure
<alyssa> if we use the "get other" one, we could do... er... broadcast, fsub, fsub, iand, csel ?
<alyssa> it's not clear that's better :-p
<jekstrand> alyssa: But it's GL_FASTEST!
<alyssa> wonder what the ddk does
<alyssa> this "lane_id ^ src0" mode for broadcast is clearly intended for something hah
<imirkin_> yeah, being able to add while fetching other lanes is ... convenient
<jekstrand> And super-important for all those derivative-bound shaders. :)
<imirkin_> it's basically the most common op
<imirkin_> some people just can't stay in their lane...
<alyssa> oh, er.
<alyssa> imirkin_: lol
<jekstrand> There's an olympic swimming joke in here somewhere.
<alyssa> DDK's coarse is `broadcast(1) - broadcast(0)`
<alyssa> which is 3 ops and no CSE opportunities
<bnieuwenhuizen> the real trick with derivatives is to do them analytically and do away with helper lanes. Helps with tons of small triangles :)
<alyssa> bnieuwenhuizen: i can't tell if this is sarcasm or not
<bnieuwenhuizen> alyssa: not sarcasm, but a bit out there :) UE5 switched to partial rasterization in a compute shader because of too many small triangles (roughly 1 per pixel) causing large helper lane overheads
<alyssa> jekstrand: I guess the only neat trick is that abs(dFdx(x)) can be done as "abs(broadcast(lane ^ 1) - x)" and since the lane ^ 1 is for free, the whole fwidth() dance becomes
<alyssa> broadcast, broadcast, fsub, fsub, fadd
<jekstrand> But that would require adding nir_op_fwidth
<alyssa> so 5 ops for fwidth. as opposed to the current fwidth impl which would be 10 ops.
<alyssa> jekstrand: or just detect fabs(dfdx) and fabs(dfdy) in isel
<jekstrand> alyssa: Yeah, that sounds better. :)
<alyssa> there are almost no uses of fwidth in my shader-db so going to say... not bothering :p
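The fwidth trick described above relies on the absolute value hiding the sign problem of the "other lane" read. A minimal model, assuming lanes 0 and 1 are the top row and lanes 2 and 3 the bottom row of a quad, so that `lane ^ 1` is the horizontal neighbour and `lane ^ 2` the vertical one. The function names are hypothetical and the reference uses fine derivatives.

```python
def fwidth_xor(values, lane_id):
    """abs(dFdx) + abs(dFdy) using only 'read the other lane' accesses."""
    dx = abs(values[lane_id ^ 1] - values[lane_id])
    dy = abs(values[lane_id ^ 2] - values[lane_id])
    return dx + dy


def fwidth_reference(values, lane_id):
    """Same quantity built from consistently signed fine derivatives."""
    dfdx = values[lane_id | 1] - values[lane_id & ~1]
    dfdy = values[lane_id | 2] - values[lane_id & ~2]
    return abs(dfdx) + abs(dfdy)


x = [1.0, 3.0, 10.0, 15.0]
xor_result = [fwidth_xor(x, lane) for lane in range(4)]
ref_result = [fwidth_reference(x, lane) for lane in range(4)]
print(xor_result, ref_result)
```

Without the abs, the xor form would give opposite signs on the left and right lanes of a row, which is the "sign trouble" mentioned earlier in the discussion.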
<ajax> imirkin_: re depth buffer convention above: that's not buffer sharing's job to care about? all you're doing is transporting the data in a given format. it's not like you're required to treat "rgba" buffers as literally red green blue and opacity
<imirkin_> ajax: yes. i was asking about how it was going to be used in practice.
<imirkin_> with rgba, there's _some_ expectation that there's a connection to red/green/blue.
<imirkin_> esp for display surfaces
<imirkin_> sent to a compositor
<imirkin_> whereas there's no single overriding convention for depth
<Venemo> MrCooper: thanks for letting me know. I'm pretty sure I was identified before, but it forgot... Do you see this message now?
<ajax> imirkin_: fair. i'm having trouble thinking of a good way to signal that in-band though.
<pepp> Venemo: yes
<imirkin_> ajax: and even with rgba, there's srgb, color spaces, bla bla bla. so it's a problem there too.
<Venemo> Thx
ngcortes has joined #dri-devel
<bl4ckb0ne> isnt depth/stencil more straightforward than colorspace?
<imirkin_> bl4ckb0ne: not really...
<imirkin_> you still have the question of how to interpret the values
<imirkin_> does higher depth = near or far?
<bnieuwenhuizen> also what are the near plane and the far plane
<bnieuwenhuizen> (or was that only for the .w?)
<ajax> struct pipe_depth_stencil_alpha_state vs struct st_visual
<ajax> one of these has several more members than the other
tzimmermann has quit [Ping timeout: 480 seconds]
aravind has quit [Ping timeout: 480 seconds]
aravind has joined #dri-devel
<bl4ckb0ne> so that can't really be well represented in a 32-bit format
adjtm has quit [Quit: Leaving]
xexaxo has quit [Ping timeout: 480 seconds]
tobiasjakobi has joined #dri-devel
nirmoy has quit []
phomes has joined #dri-devel
<phomes> Can I ask for someone to assign to marge for me?
<alyssa> 👀
<jekstrand> done
Peste_Bubonica has joined #dri-devel
<alyssa> dcbaker: eric_engestrom: fair warning, i'm about to cc mesa-stable on a fairly large (line count) patch
<alyssa> It should cherrypick cleanly, though
Kayden has quit [Quit: to lunch and the office]
<alyssa> Adds a workaround for a hardware bug (well, "feature", but.....) that we only understood, uh, yesterday
<alyssa> (Certain in-spec shaders cause GPU faults. It's rare, but occurs on the CTS and could occur in real workloads. It's not pretty.)
<alyssa> Just giving a heads up why I'm nominating a +71 insertions patch for stable
<dcbaker> alyssa: thanks! generally if it's in a specific driver and the main devs are nominating it I just assume you know what you're doing. If you nominated a 71 line NIR change I might get nervous :)
<alyssa> dcbaker: haha, fair enough
rsalvaterra_ has joined #dri-devel
tzimmermann has joined #dri-devel
rsalvaterra has quit [Ping timeout: 480 seconds]
frieder has joined #dri-devel
agx has quit [Read error: Connection reset by peer]
agx has joined #dri-devel
agx has quit [Read error: Connection reset by peer]
agx has joined #dri-devel
gpoo has quit [Ping timeout: 480 seconds]
gpoo has joined #dri-devel
tzimmermann has quit [Quit: Leaving]
Kayden has joined #dri-devel
frieder has quit [Remote host closed the connection]
tobiasjakobi has quit [Remote host closed the connection]
dviola has joined #dri-devel
vivijim has quit [Remote host closed the connection]
rasterman has quit [Quit: Gettin' stinky!]
phomes has quit [Remote host closed the connection]
adjtm has joined #dri-devel
<robclark> airlied: looks like Caleb found the root issue, it wasn't drm/msm pull req, see "drm/msm: Disable frequency clamping on a630" thread.. but if danvet doesn't need scheduler conversion in drm-next it is ok to hold off
vivijim has joined #dri-devel
rasterman has joined #dri-devel
<danvet> robclark, the bikeshed settled on some naming, so I can rebase the set
<danvet> so would still be good to have msm scheduler stuff in there, but doesn't need to be right now
<robclark> ok, it's your call
<robclark> I'll try and actually have a look at your series next week..
thellstrom has joined #dri-devel
agx has quit [Read error: Connection reset by peer]
agx has joined #dri-devel
rsalvaterra has joined #dri-devel
vivijim has quit [Quit: Lost terminal]
vivijim has joined #dri-devel
rsalvaterra_ has quit [Ping timeout: 480 seconds]
<mlankhorst> airlied: bit late, but hopefully still in time with my pull req!
rasterman has quit [Quit: Gettin' stinky!]
alyssa has left #dri-devel [#dri-devel]
lemonzest has quit [Quit: Quitting]
danvet has quit [Ping timeout: 480 seconds]
aravind has quit [Ping timeout: 480 seconds]
<JoshuaAshton> agd5f: May I nudge you about ? ( cc: hakzsam )
pnowack has quit [Quit: pnowack]
mlankhorst has quit [Ping timeout: 480 seconds]
vivijim has quit [Ping timeout: 480 seconds]
Duke`` has quit [Ping timeout: 480 seconds]
agx has quit [Remote host closed the connection]
jkrzyszt has quit [Ping timeout: 480 seconds]
agx has joined #dri-devel
pcercuei has quit [Quit: dodo]
Peste_Bubonica has quit [Quit: Leaving]
iive has quit []
idr has quit [Quit: Leaving]
Kayden has quit [Quit: go home]
mbrost has joined #dri-devel
Lucretia has quit []
camus has quit []
<graphitemaster> jekstrand, Do Gallium drivers really not know what derivatives are used by texture sampling? My comment on 12097 was based on the assumption that this capability bit is fed to Mesa such that Mesa could make better optimizations.
<graphitemaster> It feels like you can't do the right thing here unless the driver knows what the hardware is actually going to use for implicit lod. Otherwise you're potentially emitting different code when it's not necessary.
<jenatali> It sounds like (for some hardware at least) there's no way to modify the derivative operation done by the sampler unit when you request implicit derivatives
<jenatali> Which is likely to be more efficient than explicitly computing derivatives and feeding them into the sampler op
<imirkin_> graphitemaster: the glHint is not used anywhere by gallium drivers
<imirkin_> (nor is it made available to them)
<imirkin_> the sampler state definition does not include a quality bit
<graphitemaster> Exactly. My hope was that drivers could report what derivative operation is done by implicit and then Mesa could just generate explicit textureGrad for texture, to support derivatives that do not match what the driver has, and then for there to be a NIR lowering pass for code that explicitly uses derivative control which happens to match the hardware. This would be ideal and consistent then.
<imirkin_> textureGrad is *super* expensive to support on some hardware (e.g. nvidia)