#dri-devel on 2021-07-29 — irc logs at oftc.irclog.whitequark.org

2021-07-26 22:56 ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar

00:00 <jekstrand> Yeah, chromebooks aren't exactly optimized for storing data on the device. You just need to figure out how to do builds directly in your Google Drive. :D

00:00 <airlied> just use a jupyter notebook

00:01 <alyssa> oops i broke the entire cts

00:01 <alyssa> this is somehow marge's fault

00:03 <jekstrand> alyssa: From your vec4 patch:

00:03 <jekstrand> total instructions in shared programs: 15970411 -> 15970928 (<.01%)

00:03 <jekstrand> helped: 635

00:03 <jekstrand> instructions in affected programs: 124197 -> 124714 (0.42%)

00:03 <jekstrand> HURT: 673

00:03 <jekstrand> helped stats (abs) min: 1 max: 7 x̄: 1.89 x̃: 1

00:03 <jekstrand> helped stats (rel) min: 0.15% max: 16.67% x̄: 4.44% x̃: 3.26%

00:03 <jekstrand> HURT stats (abs) min: 1 max: 24 x̄: 2.55 x̃: 1

00:03 <jekstrand> HURT stats (rel) min: 0.26% max: 27.38% x̄: 2.80% x̃: 2.13%

00:03 <jekstrand> 95% mean confidence interval for instructions value: 0.23 0.57

00:03 <jekstrand> 95% mean confidence interval for instructions %-change: -0.98% -0.46%

00:03 <jekstrand> Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).

00:05 <alyssa> jekstrand: Curious. Okay, was worth a shot :)

00:34 rsalvaterra_ has quit []

00:34 rsalvaterra has joined #dri-devel

00:38 illwieckz has quit [Ping timeout: 480 seconds]

00:40 <airlied> robclark: I'll hold that next pull until you can track down the devfreq regression

00:41 <robclark> airlied: devfreq regression?

00:41 <robclark> oh

00:42 * airlied looks like I was being proactive by doing nothing :-P

00:43 <robclark> hmm, I wonder if a630 has some extra restrictions about changing freq.. I thought it should otherwise be pretty similar to a618

00:44 <imirkin_> alyssa: why did you tag me on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12082 ?

00:45 <imirkin_> alyssa: also did you mean to assign to marge rather than request her review?

00:45 Lightkey has quit [Ping timeout: 480 seconds]

00:48 illwieckz has joined #dri-devel

00:49 <alyssa> ummmm

00:50 <alyssa> i er did mean to assign to marge

00:50 <alyssa> though I'm happy to have her review too

00:50 <imirkin_> might have to wait a while for that to happen :)

00:50 <imirkin_> i'm sure anholt_ is working on the auto-review feature...

00:51 <alyssa> neural networks

00:51 <zf> <reject>

00:54 Lightkey has joined #dri-devel

00:58 pnowack has quit [Quit: pnowack]

01:27 <idr> alyssa: Better her than Patty or Selma.

01:49 <alyssa> ;D

01:49 <alyssa> Oh, Homie...

01:50 nchery has quit [Quit: Leaving]

02:01 sdutt has quit [Remote host closed the connection]

02:01 sdutt has joined #dri-devel

02:11 boistordu has joined #dri-devel

02:16 ngcortes has quit [Remote host closed the connection]

02:17 boistordu_ex has quit [Ping timeout: 480 seconds]

02:18 <imirkin> graphitemaster: out of curiosity, what nvidia hw are you seeing fine/coarse derivatives differences on?

02:19 <imirkin> pretty sure i checked on kepler or so, and there's no diff in generated code

02:19 <imirkin> it might affect sampler settings actually

02:21 <graphitemaster> imirkin, 2070 RTX

02:21 <graphitemaster> It does affect sampler settings though, it does not change the generated code.

02:21 <imirkin> graphitemaster: ah that makes sense

02:22 <imirkin> yeah, samplers on nvidia have tons of knobs

02:26 <graphitemaster> Yeah, looking at the TSC, it looks like the nouveau guys didn't figure it out either

02:26 <graphitemaster> https://github.com/envytools/envytools/blob/master/rnndb/graph/g80_texture.xml#L392

02:26 <graphitemaster> Between mag and min filter here is 2 bits that define the partial derivatives used

02:26 <graphitemaster> Don't ask how I know please.

02:26 <imirkin> not everything we know is documented there :p

02:27 <graphitemaster> :)

02:27 <graphitemaster> Okay well, guess what I'm writing tonight, a derivative test for textures at init to fallback to software :P

02:30 <graphitemaster> BTW on AMD it appears like Fine derivatives are always used for mipmap lod.

02:30 <graphitemaster> Just looking at the visual results here, that's radeonsi.

02:31 <imirkin> radeonsi ignores the hint

02:31 <imirkin> (all st/mesa drivers ignore the hint)

02:32 <imirkin> (since the hint is in GL, and st doesn't pass it through to the gallium driver atm)

02:33 * airlied had an MR to pass it through

02:33 <airlied> but never really cared enough to push it

02:34 <graphitemaster> The sample instruction in RDNA (_D suffix) supports 2, 4, or 6 slopes for LOD calculation.

02:34 <airlied> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3052

02:34 <airlied> granted that was TGSI so very old

02:34 <graphitemaster> My guess is nothing uses the 6 slopes? since Vulkan specs fine as 4 slope and coarse as 2 slope in the quad

02:34 <graphitemaster> Maybe 6 slopes is uses for MSAA textures or something.

02:35 <graphitemaster> s/uses/used

02:35 <graphitemaster> Oh for 3D

02:35 <graphitemaster> Derp

02:36 <graphitemaster> Okay yeah, so there is fine and coarse selection for LOD derivatives in the Sample instruction itself on RDNA.

02:36 <alyssa> graphitemaster: what mesa driver are you interested in? radeonsi? nouveau?

02:37 <graphitemaster> None of them specifically. I'm just an engine dev trying to make sure my stuff looks correct on open source drivers and Zink specifically since I need a solution for when Apple kills OpenGL.

02:37 <graphitemaster> And that has led me to looking at all of them XD

02:39 <alyssa> gotcha

02:40 <graphitemaster> The danger is I actually learn all this stuff and become a driver developer.

02:45 <graphitemaster> It looks like coarse / fine derivative selection for textures is just a complete oversight by all the APIs except OpenGL which makes the hint optional (granted).

02:49 <graphitemaster> And the only reason it works at all in GL is because the hint affects texture sampling indirectly. It doesn't appear to be an intended use of the hint.

02:49 <graphitemaster> Does that mean I can propose an extension? XD

02:54 lemonzest has joined #dri-devel

02:56 cef is now known as Guest2649

02:57 cef has joined #dri-devel

02:57 <mareko> RDNA can't select coarse/fine for sample instructions

02:59 <graphitemaster> page 82, table 44: https://developer.amd.com/wp-content/resources/RDNA_Shader_ISA.pdf

03:03 Guest2649 has quit [Ping timeout: 480 seconds]

03:04 <mareko> yes, only RDNA1 has them

03:04 <mareko> and older chips

03:04 <graphitemaster> RDNA2 has it too.

03:05 <graphitemaster> page 73, table 44: https://developer.amd.com/wp-content/resources/RDNA2_Shader_ISA_November2020.pdf

03:06 <graphitemaster> GCN1, GCN2, and GCN3 as well (going backwards)

03:06 <mareko> there is no _CD

03:07 <graphitemaster> Oh

03:07 <graphitemaster> I see now.

03:08 <graphitemaster> So it's always Fine there.

03:08 <graphitemaster> Which is consistent with what I was seeing

03:09 <graphitemaster> This is frustrating.

03:10 <alyssa> graphitemaster: *sips*

03:10 <alyssa> This is Fine.

03:12 <graphitemaster> Okay I'm just going to do the really dumb thing I was thinking of drawing a fullscreen triangle with two mip texture and reading back the result to determine if coarse or fine derivatives are being used and then emulating fine derivatives for lod calculation when obviously coarse.

03:13 <graphitemaster> And that's totally going to be a lot of overhead

03:18 <imirkin> graphitemaster: can probably cut out the middleman and use textureQueryLod?

03:18 <imirkin> erm, i guess nevermind on that.

03:18 <imirkin> er no

03:18 <imirkin> that should work

03:18 <imirkin> at least on nvidia

03:18 <graphitemaster> The thing that also uses partial derivatives :P

03:19 <imirkin> it still takes all the regular texturing details

03:20 <imirkin> and then compare what that function produces relative to what you'd produce by hand?

03:20 <imirkin> using dfdx/dy

03:22 <graphitemaster> Can LOD values be barycentrically interpolated? I could just compute the LOD on the CPU using backward finite difference and pack it in with the UVs in my vertex soup. I actually don't know how the GPU calculates the gradients, I guess it's just the luminance of the RGB pixels much like edge-detection methods work in some of the ugly AA algorithms. Writing a pretty basic 2x2 rasterizer isn't out of the question here, already have

03:22 <graphitemaster> a depth-only rasterizer for occlusion culling I could use, this is getting too big brain, never mind.

03:23 <graphitemaster> Compute shader can calculate them actually and just store to the vertex soup just before a draw.

03:23 <graphitemaster> NV absolutely hates interleaved compute and draws though

03:25 <graphitemaster> I don't think they can be interpolated anyways because partial derivatives are non-linear.

03:26 <graphitemaster> Why else would you analytically compute them per-fragment if they can be done per-vertex.

03:26 <graphitemaster> So that was a stupid idea in hindsight, never mind.

03:31 <graphitemaster> Oh my god

03:32 <graphitemaster> I forgot about textureGrad

03:34 <graphitemaster> By the way, textureLod() is a horrible idea even with emulation because it disables anisotropic filtering.

03:34 <graphitemaster> So I'd have to use textureGrad actually.

03:35 <imirkin> or use textureQueryLod

03:35 <imirkin> and then you don't have to worry about texturing at all

03:35 <imirkin> since you get at the computed lod info

03:35 <imirkin> without worrying about doing actual texture lookups

03:36 <graphitemaster> The result of that still has to be used with textureLod

03:36 <imirkin> why

03:36 <imirkin> why are you looking anything up in the texture

03:36 <graphitemaster> Because I'm sampling the texture?

03:36 <imirkin> compare the result of that to your manually-computed lod

03:36 <imirkin> why are you sampling the texture?

03:36 <graphitemaster> So I can texture something

03:37 <imirkin> i thought you were trying to build a detector for the fine thing

03:37 <imirkin> just look for textureQueryLod != manual lod

03:37 <graphitemaster> Right if I just wanted to determine if the coarse or fine derivatives are being used I could do that. I'm just saying emulating the fine derivatives for mip lod selection is broken anyways

03:38 <graphitemaster> It's broken because when I do end up using textureLod with fine computed lod I lose anisotropic filtering

03:38 <imirkin> but again, that's not what you're testing

03:38 <imirkin> you're testing lod computation

03:38 <imirkin> whether it's per-quad or per-frag

03:39 <imirkin> i guess?
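
A rough illustration of the detector idea discussed above: compute a LOD by hand from per-fragment ("fine") differences and compare it with what a per-quad ("coarse") scheme would give. This is a Python sketch rather than shader code. The quad values, the simplified 1D LOD formula, the texture size, and the choice of the top row for the coarse difference are all assumptions made for illustration; real hardware may choose differently.

```python
import math

# u coordinates for one 2x2 quad, indexed [row][col]. Made-up values chosen so
# the two rows have different horizontal slopes.
QUAD_U = [[0.0, 0.1],
          [0.0, 0.4]]
TEXTURE_SIZE = 256  # texels along u (assumed)

def ddx_fine(quad, row):
    """Per-fragment style: each row uses its own horizontal difference."""
    return quad[row][1] - quad[row][0]

def ddx_coarse(quad, row):
    """Per-quad style: one difference (here the top row's) shared by all rows."""
    return quad[0][1] - quad[0][0]

def lod(dudx, size=TEXTURE_SIZE):
    """Simplified 1D LOD: log2 of the footprint measured in texels."""
    return math.log2(abs(dudx) * size)

def looks_coarse(reported_lods, quad=QUAD_U, eps=1e-6):
    """True if per-row LODs match the coarse prediction but not the fine one."""
    coarse = [lod(ddx_coarse(quad, r)) for r in range(2)]
    fine = [lod(ddx_fine(quad, r)) for r in range(2)]
    matches_coarse = all(abs(a - b) < eps for a, b in zip(reported_lods, coarse))
    matches_fine = all(abs(a - b) < eps for a, b in zip(reported_lods, fine))
    return matches_coarse and not matches_fine
```

In this toy setup, `reported_lods` stands in for whatever a query such as textureQueryLod would return per row. If it tracks the coarse prediction on the bottom row, the derivatives are probably being shared across the quad.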

03:39 <graphitemaster> The purpose of the test is so I can fall back to doing something alternative to emulate the behavior I want, if I can't even emulate the behavior correctly then the test is not very helpful.

03:39 <imirkin> ah

03:39 <imirkin> then yes, textureGrad is for you :)

03:40 <graphitemaster> So vec4 sample_emulate(sampler2D tex, vec2 uv) { return textureGrad(tex, dFdxFine(uv), dFdyFine(uv)); }

03:41 <graphitemaster> oops, forgot to pass uv there

03:41 <graphitemaster> I wonder if this pattern is detected by the compiler better

03:41 <graphitemaster> And optimized correctly if fine derivatives are default

03:48 <graphitemaster> Shit that works correctly and the NVfp does compile it away!

03:48 <graphitemaster> Hallelujah!

03:55 <graphitemaster> http://shader-playground.timjones.io/742c0548f171944fb54a0ab2d10d1395

03:55 <graphitemaster> Looks like on modern AMD it does optimize away too

03:55 <graphitemaster> Since fine derivatives are the default, so it just uses regular image_sample_d

03:59 <mareko> graphitemaster: quad_perm is dfdx/dfdy

03:59 <graphitemaster> Oh :(

04:01 <graphitemaster> Suppose I wanted to play around with mesa to learn it a little, where would I read and attempt to add an optimization for this specific pattern in the IR

04:02 <graphitemaster> When the derivatives of a textureGrad are computed immediately from the same uv given to textureGrad but with dFd* intrinsics, just strength reduce it to a regular implicit texture lod.

04:03 <imirkin> graphitemaster: you'd add something to nir, or to the driver's backend compilers

04:06 soreau has quit [Read error: No route to host]

04:07 soreau has joined #dri-devel

04:11 <mareko> if you find the texture opcode in nir, you just change nir_tex_instr::op from txd to tex and set num_srcs to 1

04:13 <graphitemaster> So turn the grad instruction to a regular tex one, and then presumably the partial derivative functions called will be no longer referenced so the optimizer will remove them?

04:13 <mareko> yes

04:13 <graphitemaster> Makes sense.

04:15 <graphitemaster> I see a replace_gradient_with_lod

04:16 <mareko> yeah something like that

04:17 <graphitemaster> As an aside, that lower_gradient_cube_map looks wrong to me

04:17 <graphitemaster> It computes lod = -1.0 + 0.5 * log2(L * L * M)

04:21 Duke`` has joined #dri-devel

04:27 Company has quit [Read error: Connection reset by peer]

04:33 <graphitemaster> Why is there a -1.0 there

04:33 <graphitemaster> The dots compute squared lengths, the 0.5 * log2(x) is the same as log2(sqrt(x))
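
The identity mentioned here is easy to check numerically. The sketch below is only a Python sanity check with hypothetical helper names, not Mesa code. It also shows that subtracting 1.0 from a log2 is arithmetically the same as halving the argument, without claiming that this is the reason the pass does it.

```python
import math

def half_log2(x):
    """0.5 * log2(x), applied to a squared length x."""
    return 0.5 * math.log2(x)

def log2_of_sqrt(x):
    """log2(sqrt(x)), i.e. log2 of the actual length."""
    return math.log2(math.sqrt(x))

def minus_one_form(x):
    """-1.0 + 0.5 * log2(x), the shape of the expression quoted above."""
    return -1.0 + 0.5 * math.log2(x)

def halved_length_form(x):
    """log2(sqrt(x) / 2): the same value written as a halved length."""
    return math.log2(math.sqrt(x) / 2.0)
```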

04:34 <graphitemaster> I wonder if this is what is causing the broken lod I see on Zink XD

04:37 <imirkin> are you sure that pass even runs with zink?

04:40 <graphitemaster> Must not be, since it's not hitting the bp

04:47 <graphitemaster> nir_lower_tex is never called in Zink so can't be anything here

04:49 <graphitemaster> So attempting optimizations here with the pattern I talked about is probably not a good idea until I can figure out how to get this to go.

05:05 mlankhorst has joined #dri-devel

05:09 danvet has joined #dri-devel

05:13 mattrope has quit [Read error: Connection reset by peer]

05:26 mattrope has joined #dri-devel

05:27 <graphitemaster> Does this appear correct https://pastebin.com/raw/wJHEdGSP

05:28 <graphitemaster> Trying to emulate fine derivatives for a comparison bench I'm making

05:28 Putti has joined #dri-devel

05:28 <graphitemaster> er, coarse, bad name

05:35 jessica_24 has quit [Quit: Connection closed for inactivity]

06:14 Duke`` has quit [Ping timeout: 480 seconds]

06:21 bcarvalho has quit [Remote host closed the connection]

06:31 alanc has quit [Remote host closed the connection]

06:31 alanc has joined #dri-devel

06:32 frieder has joined #dri-devel

06:33 jkrzyszt has joined #dri-devel

06:37 aravind has joined #dri-devel

06:43 mattrope has quit [Remote host closed the connection]

07:07 rpigott has quit [Remote host closed the connection]

07:08 rpigott has joined #dri-devel

07:13 rasterman has joined #dri-devel

07:13 sdutt has quit [Ping timeout: 480 seconds]

07:19 enunes has joined #dri-devel

07:19 sdutt has joined #dri-devel

07:19 thellstrom has joined #dri-devel

07:21 enunes has quit [Read error: Connection reset by peer]

07:22 enunes has joined #dri-devel

07:25 sumits has quit [Quit: ZNC - http://znc.in]

07:27 Terman has joined #dri-devel

07:39 pnowack has joined #dri-devel

07:39 xexaxo has joined #dri-devel

07:40 sumits has joined #dri-devel

07:46 enunes has quit [Quit: ZNC - https://znc.in]

07:47 Lucretia has joined #dri-devel

07:48 sumits has quit [Quit: ZNC - http://znc.in]

07:48 enunes has joined #dri-devel

07:50 bcarvalho has joined #dri-devel

07:50 <pq> bl4ckb0ne, all the DRM formats are listed in drm_fourcc.h - I don't recall seeing one for depth.

07:53 xexaxo has quit [Ping timeout: 480 seconds]

08:03 tursulin has joined #dri-devel

08:04 pcercuei has joined #dri-devel

08:14 sumits has joined #dri-devel

08:24 agx has quit [Read error: Connection reset by peer]

08:26 agx has joined #dri-devel

08:36 pcercuei has quit [Ping timeout: 480 seconds]

08:39 pcercuei has joined #dri-devel

08:39 mlankhorst has quit [Ping timeout: 480 seconds]

08:54 aissen has quit [Ping timeout: 480 seconds]

08:56 <emersion> alyssa: does your comment count as a A-b here? https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12074

08:56 <emersion> daniels: and yours as a R-b?

08:57 <emersion> (for the panfrost bits only ofc)

09:00 K`den has joined #dri-devel

09:00 Kayden has quit [Read error: Connection reset by peer]

09:10 xexaxo has joined #dri-devel

09:26 vivijim has joined #dri-devel

09:29 sdutt has quit [Ping timeout: 480 seconds]

09:32 nirmoy has joined #dri-devel

09:42 camus has joined #dri-devel

09:53 xexaxo has quit [Remote host closed the connection]

09:53 xexaxo has joined #dri-devel

09:54 camus has quit [Ping timeout: 480 seconds]

09:55 <daniels> emersion: yep, R-b from me, thankyou

09:58 <emersion> thx

10:26 <pendingchaos> is there a MSVC equivalent to GCC's -fwrapv? I'm thinking that maybe we should compile NIR's constant folding with this option
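
For context, -fwrapv tells GCC to treat signed integer overflow as two's-complement wraparound instead of undefined behavior. The sketch below shows that wrapping behavior for a 32-bit add in Python; the helper names are hypothetical and this is not how NIR's constant folding is written.

```python
def wrap_i32(value):
    """Reduce an integer to the signed 32-bit range with two's-complement wrap."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        return value - 0x100000000
    return value

def iadd_i32(a, b):
    """Signed 32-bit add that wraps on overflow instead of being undefined."""
    return wrap_i32(a + b)
```

With these helpers, `iadd_i32(2147483647, 1)` wraps around to the most negative 32-bit value rather than overflowing.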

10:41 camus has joined #dri-devel

10:42 frieder has quit [Ping timeout: 480 seconds]

10:48 rg3igalia has joined #dri-devel

10:51 frieder has joined #dri-devel

10:54 camus has quit [Ping timeout: 480 seconds]

10:55 ppascher has quit [Quit: Gateway shutdown]

10:58 ella-0 has quit [Remote host closed the connection]

10:59 ella-0 has joined #dri-devel

11:04 camus has joined #dri-devel

11:06 Net147 has quit [Quit: Quit]

11:07 Net147 has joined #dri-devel

11:12 ppascher has joined #dri-devel

11:26 camus has quit [Ping timeout: 480 seconds]

11:34 <kusma> Uh, it seems like the debian-gallium job on Mesa CI is currently failing with an ir3_ra() failed message...

11:34 <kusma> Oh, retrying printed the same message, but the job passed??

11:38 ella-0 has quit [Remote host closed the connection]

11:39 ella-0 has joined #dri-devel

11:47 <daniels> kusma: job link please

11:51 <kusma> daniels: https://gitlab.freedesktop.org/mesa/mesa/-/jobs/12303378#L4178

11:51 <kusma> It seems like the ir3_ra() thing might be unrelated to the failure. I already posted this on the CI tracker issue.

11:54 flacks has quit [Quit: Quitter]

11:56 flacks has joined #dri-devel

11:58 <daniels> yeah, that's an intel drm-shim crash

12:00 <dj-death> I can look into it

12:00 <dj-death> I bet we started requiring a new feature and the stub doesn't have it

12:36 Company has joined #dri-devel

12:37 dviola has quit [Ping timeout: 480 seconds]

12:38 dviola has joined #dri-devel

12:59 <zmike> anyone else having issues pulling from mesa git now?

12:59 dviola has quit [Ping timeout: 480 seconds]

13:00 <MrCooper> zmike: might be related to this from #freedesktop: <bentiss> sigh large-5 is now down, ceph is timeing out

13:00 <zmike> huh sounds probable

13:10 dviola has joined #dri-devel

13:13 ella-0_ has joined #dri-devel

13:14 ella-0 has quit [Read error: Connection reset by peer]

13:22 ezequielg has quit []

13:22 ezequielg has joined #dri-devel

13:24 <jenatali> pendingchaos: Looks like no

13:28 <alyssa> I guess I should whip up a panfrost drm-shim at some point

13:29 <alyssa> I've dragged my feet since I can do shader-db from any mali gpu on my older mali laptop, just by faking the gpu id instead of the whole drm-shim

13:29 <alyssa> but... m1 can do faster shader-db runs...

13:29 <bl4ckb0ne> pq: yeah, thinking about adding some

13:30 <bl4ckb0ne> i have a weird use case

13:38 <pq> bl4ckb0ne, you got a literal 3D monitor? :-)

13:38 <bl4ckb0ne> almost

13:39 <pq> hmm, yeah, actually 3D is not new, there are 3D video modes, but they tend to be stereo

13:41 <bl4ckb0ne> i have a weird combo of wayland linux dmabuf used with eglCreateImageKHR as a depth buffer

13:41 <bl4ckb0ne> buffer is allocated with DRM_FORMAT_R16, and when attached to the framebuffer it fails because the format is GL_RGB

13:42 <bl4ckb0ne> so either i find a way inside mesa to transmute the PIPE_R16_UNORM into PIPE_Z16_UNORM

13:42 <bl4ckb0ne> or DRM_FORMAT_Z16

13:42 <bl4ckb0ne> PIPE_FORMAT*

13:49 camus has joined #dri-devel

14:10 ella-0_ is now known as ella-0

14:17 <dj-death> can't repro the drm-shim crash :/

14:23 mlankhorst has joined #dri-devel

14:23 <dj-death> ah no, looks like a fstat issue

14:25 tzimmermann has joined #dri-devel

14:35 iive has joined #dri-devel

14:35 nchery has joined #dri-devel

14:36 <dj-death> found it : https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12129

14:37 <dj-death> intel stuff might not be the only affected stuff

14:38 MrCooper has quit [Quit: Leaving]

14:38 boistordu has quit [Remote host closed the connection]

14:38 boistordu has joined #dri-devel

14:40 MrCooper has joined #dri-devel

14:41 <emersion> pq, the idea is to pass depth DMA-BUFs from client to compositor

14:44 Peste_Bubonica has joined #dri-devel

14:47 <pq> cool, so that one is flying again

14:47 alyssa has left #dri-devel [#dri-devel]

14:47 <pq> but do they need to be imported as depth buffers?

14:48 <emersion> they can be passed around as color buffers, but then that requires blitting

14:51 <emersion> danvet: are depth buffer formats something that drm_fourcc.h would be suitable for?

14:53 sdutt has joined #dri-devel

14:53 <danvet> emersion, no one tried yet, I think that's all

14:54 <danvet> might need some egl ext rewording and a pile of patches

14:54 <emersion> i guess my main question is, maybe the single-channel formats should be used instead?

14:54 <bl4ckb0ne> im surprised nobody tried

14:55 <emersion> like, R16 is red, but could very well be depth

14:55 <emersion> is it better to add EGL APIs to say "this is R16 but i want depth", or is it better to add the depth formats to drm_fourcc.h

14:59 <pq> emersion, why would it need blitting?

14:59 alyssa has joined #dri-devel

15:00 <alyssa> jekstrand: bbrezillon: and I are talking about structuring a Vulkan driver for per-gen GenXML compiles, mind if we pick your brain?

15:00 <bl4ckb0ne> depth is handled as a regular texture

15:00 <emersion> pq, essentially an EGL/GL/Mesa limitation. DMA-BUFs can't be imported/attached as depth buffers

15:00 <emersion> instead, you need to blit the depth buffer to a color buffer, then export it

15:00 <pq> emersion, why would you need it *imported* as a depth buffer?

15:01 <emersion> the client exports its depth buffer to the compositor

15:01 <pq> oooh, you mean for the rendering?

15:01 <emersion> yea

15:01 <pq> ok, I was only thinking about using it in a compositor or such :-)

15:01 <emersion> to use the depth buffer when compositing multiple clients, each of which have a color+depth buffer

15:01 <bl4ckb0ne> told you it was almost a 3d monitor ;)

15:02 <emersion> pq, maybe we should've started with the obvious, bl4ckb0ne is working on a VR compositor

15:02 <pq> sure

15:02 <emersion> well, obvious for us

15:03 <pq> I guess you want the fixed-function depth stuff to use a dmabuf in the client that renders the 3D image.

15:03 <emersion> yeah

15:03 <pq> for a compositor to read a color buffer as depth is no problem, just plug it an the compositing shader

15:03 <pq> *in

15:04 <emersion> hm, i guess if you hand-roll your GL_DEPTH_TEST, should work yea

15:04 <bl4ckb0ne> yup that's what I have atm

15:10 <pq> emersion, do you need even that? Just read the color texture with the depth values and assign to glFragDepth?

15:11 <pq> or is the performance hit of that significant?

15:11 <emersion> hmm. tbh bl4ckb0ne needs to try a lot of things and see what happens

15:11 <emersion> throw a lot of stuff at the wall and see what sticks

15:11 <imirkin_> when i was reviewing the nouveau modifier patches

15:11 <imirkin_> depth came up as a problem

15:12 <imirkin_> but it was stated that this was "out of bounds" for things that can be exported

15:12 <imirkin_> so i didn't worry too much about it

15:12 <emersion> ah, oops.

15:12 <pq> fragment depth value is totally arbitrarily writable in a frag shader in GL :-)

15:12 <imirkin_> (and i checked - no depth formats in drm_fourcc, etc)

15:12 <emersion> pq, so we'd still need a blit in the compositor?

15:12 mattrope has joined #dri-devel

15:13 <pq> after which I think it goes through all the fixed-function depth stuff

15:13 <pq> no

15:13 <pq> no blit

15:13 <emersion> there may be multiple clients

15:13 <jekstrand> alyssa: Sure

15:13 <pq> sure

15:13 <imirkin_> so at least with nouveau, depth + dmabuf won't work trivially

15:13 <emersion> to composite them together correctly, each needs a proper depth buffer

15:13 <pq> you still composite them one by one, right?

15:13 <imirkin_> not to say it can't be made to work, just that the current code won't support it

15:14 <pq> each needs proper depth - not necessarily as a depth buffer per se

15:14 <emersion> pq, likely, yeah

15:14 sdutt has quit []

15:14 sdutt has joined #dri-devel

15:14 <emersion> hm

15:15 <emersion> so hand-roll that depth test?

15:15 <pq> just bind the R32 buffer as another regular texture, sample it, and write the result into glFragDepth suitably scaled if needed

15:15 <pq> no, that should just work

15:15 frieder has quit [Ping timeout: 480 seconds]

15:16 <imirkin_> note that depth buffers are traditionally Z24, so there's no natural color format to put it into

15:16 <emersion> how does GL know that only the pixels whose depth is > than the destination's depth should be blitted?

15:16 <emersion> s/>/</

15:16 <imirkin_> emersion: GL_DEPTH_TEST + glDepthFunc?

15:16 <pq> that ^

15:16 <emersion> GL wouldn't know about the source texture's depth, since it would just be a regular color texture

15:17 <imirkin_> if you're emulating the depth test, you also need the depth test parameters

15:17 <pq> emersion, it doesn't need to. You specify the depth value in the fragment shader by writing to glFragDepth.

15:17 <emersion> oh.

15:17 <emersion> thanks for your patience

15:18 <imirkin_> gl_FragDepth, to be pedantic

15:18 <emersion> just understood what you meant

15:18 <imirkin_> and iirc that's not a thing on unextended GLES2? i forget.

15:18 <emersion> i thought gl_FragDepth wrote to the destination buffer

15:18 <emersion> but it just sets the depth test's input

15:18 <pq> imirkin_, thanks - it's been over 10 years since I used it... if I used it. :-)

15:18 <imirkin_> emersion: gl_FragDepth is indeed the output value to write to the depth buffer

15:19 <emersion> ah.

15:19 <imirkin_> gl_FragCoord.z is the "input" value of depth

15:19 <imirkin_> the two need not have any connection with each other

15:19 <pq> imirkin_, huh??

15:19 <emersion> so just set gl_FragCoord.z and let GL do its depth test?

15:19 <emersion> is gl_FragCoord.z writable?

15:19 <imirkin_> no

15:19 <imirkin_> gl_FragDepth is writable.

15:19 <pq> imirkin_, I mean, sure, that's the value to be written *if* the fragment passes all tests, right?

15:20 <imirkin_> there are both early and "late" depth tests

15:20 <imirkin_> if the shader writes depth, the depth tests are indeed done after

15:20 <emersion> ok, but i want to set the depth test input, or else need to re-implement it myself

15:20 <imirkin_> yeah, so you can write gl_FragDepth and that will be the input into the depth test.

15:20 <emersion> ah, ok

15:20 <imirkin_> (and ultimately be written to the depth surface, if it passes the test)

15:20 <emersion> cool

15:20 <emersion> thanks!

15:20 <pq> that! ^ \o/

15:21 <imirkin_> and gl_FragCoord.z is the "natural" depth after viewport transform/etc.

15:21 <imirkin_> (and is an input into the shader)
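
The approach settled on above can be modelled outside GL. The sketch below is a Python simulation under simplifying assumptions: a 1D "framebuffer", a less-than depth function, and each client's depth arriving as a plain array sampled like a color texture. The sampled value plays the role of what the compositing shader would write to gl_FragDepth; the function and variable names are made up.

```python
def composite(clients, width, clear_depth=1.0):
    """Composite clients one by one with a less-than depth test.

    Each client is a (colors, depths) pair of equal-length lists.
    Returns the final (colors, depths) of the destination.
    """
    dst_color = [None] * width
    dst_depth = [clear_depth] * width
    for colors, depths in clients:
        for x in range(width):
            frag_depth = depths[x]  # value the shader would write to gl_FragDepth
            if frag_depth < dst_depth[x]:  # depth test, GL_LESS style
                dst_color[x] = colors[x]
                dst_depth[x] = frag_depth  # written only when the test passes
    return dst_color, dst_depth
```

Compositing client "a" with depths `[0.2, 0.8, 0.5]` and then client "b" with depths `[0.5, 0.3, 0.5]` keeps whichever fragment is nearer at each position. Ties go to the client drawn first because the test is strict.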

15:22 Duke`` has joined #dri-devel

15:22 <pq> So the only problem left is how the client is doing to write the depth buffer that can be shared. I'm guessing you need MRT for it and write it out explicitly?

15:22 <pq> *goint to

15:22 <pq> *going to

15:23 <emersion> MRT?

15:24 <imirkin_> multiple render targets

15:24 <imirkin_> i.e. multiple color outputs from a single fragment shader

15:24 <emersion> hm, maybe

15:24 <imirkin_> (also not a core GLES2 feature iirc)

15:24 <bl4ckb0ne> i moved to gles3 recently

15:24 <imirkin_> welcome :)

15:24 <emersion> but does the frag shader run late enough for this?

15:25 <emersion> i think the safe thing to do is just another render pass

15:25 <imirkin_> emersion: no, but the depth test will reject writes

15:25 <emersion> hmm.

15:25 <imirkin_> i'm not 100% clear on what you're trying to do tbh ;)

15:26 <alyssa> jekstrand: How to handle driver structures (e.g. anv_private.h) that contain prepacked hardware state? particularly when the size of the state differs across gens?

15:26 <emersion> imirkin_: have multiple 3D scenes. each rendered in one process. composite them together in a compositor

15:26 <alyssa> I guess anv just open codes u32[] but that seems... unsafe

15:26 <emersion> the 3D scenes are rendered with the same mvp matrices and stuff

15:26 <jekstrand> alyssa: We have a way to do it in anv but it's terrible

15:27 <alyssa> bbrezillon: ^^

15:27 <alyssa> I found anv_graphics_pipeline which is... not amazing...

15:27 <imirkin_> emersion: ... why

15:28 <jekstrand> alyssa: Fixed sizes aren't really a problem if you STATIC_ASSERT(ARRAY_SIZE(foo) >= GENX(MY_PACKET_length));

15:28 <bl4ckb0ne> why not :D

15:28 <imirkin_> bl4ckb0ne: because it's a pain

15:29 <bl4ckb0ne> i know, i know

15:29 <alyssa> jekstrand: Ah, of course!

15:29 <alyssa> and the static assert can be per gen even if the header is not, got it got it got it

15:29 <jekstrand> Yup

15:30 <bl4ckb0ne> but it works https://l.sr.ht/tmFK.png :p

15:30 <pq> emersion, you may want to pay attention if we are talking about the compositor or the client rendering here. ;-)

15:31 <pq> \o.

15:48 <imirkin_> bl4ckb0ne: how do you ensure depth "compatibility" between the two scenes?

15:51 <bl4ckb0ne> for now i dont, but that's what I want to do

15:52 dviola has quit [Quit: WeeChat 3.2]

15:52 thellstrom has quit [Quit: thellstrom]

15:54 <imirkin_> bl4ckb0ne: i mean ... what if one thing is centered around 0.5 and the other around 0.25. what if one has inverted depth. etc.

15:57 <bl4ckb0ne> centered?

15:59 <imirkin_> like, the raw depth values

16:00 <imirkin_> how do you ensure they are "compatible" between scenes?

16:00 ezequielg has quit []

16:00 <bl4ckb0ne> compositor shares the values

16:01 ezequielg has joined #dri-devel

16:02 <emersion> imirkin_: i guess it just boils down to writing down the expectations so that all clients agree?

16:02 <zf> <https://community.khronos.org/t/gldrawelementsindirect-with-element-array-buffer-offset/69421>

16:02 <zf> did this ever happen in any form?

16:03 <imirkin_> emersion: i guess

16:11 <dcbaker> virgl people: I'm assuming that Revert "Revert "virgl: Cache depth and stencil buffers"" shouldn't be backported to 21.2?

16:17 <jenatali> venemo: Sounds like something for GitLab to solve, not something Freedesktop would solve in its instance

16:33 K`den is now known as Kayden

16:36 <MrCooper> Venemo: I think you need to authenticate with NickServ, didn't see what you wrote that jenatali responded to

16:45 Peste_Bubonica has quit [Quit: Leaving]

16:56 jessica_24 has joined #dri-devel

16:58 <shadeslayer> dcbaker: nope, it depends on the previous commit in the history

16:59 <shadeslayer> https://gitlab.freedesktop.org/mesa/mesa/-/commit/a425c5df789f2b28fdf9e61f108418b6b01e10a9

17:00 tobiasjakobi has joined #dri-devel

17:02 <dcbaker> shadeslayer: yeah, that's not backported. We have this... feature where all reverts are automatically nominated for stable. eric_engestrom was working on fixing that I think

17:02 <dcbaker> I'll go ahead and denominate

17:03 <shadeslayer> ahh, I wasn't aware of that

17:07 aissen has joined #dri-devel

17:14 tobiasjakobi has quit [Remote host closed the connection]

17:20 macromorgan has quit [Remote host closed the connection]

17:21 macromorgan has joined #dri-devel

17:22 <alyssa> jekstrand: Oh, hm. It occurs to me if we're playing really fast 'n loose, we could save 2 instructions for dfdx_coarse

17:22 <alyssa> currently we emit

17:22 <alyssa> broadcast(lane_id & ~1, x) - broadcast(lane_id | 1, x)

17:22 <alyssa> (ish)

17:23 <alyssa> whereas i guess the specs are loose enough for coarse we could do

17:23 <alyssa> broadcast(0, x) - broadcast(1, x)

17:23 <shadeslayer> dcbaker: maybe revert'ing reverts can have a "Needs" field that can depend on another commit :)

17:23 <jekstrand> alyssa: Do you always have a subgroup size of 4?

17:23 <alyssa> jekstrand: No, but our broadcast has a subgroup_size parameter so we can always do logical quads.

17:24 <jekstrand> right

17:24 <jekstrand> Yeah, that's basically what we do on Intel.

17:24 <alyssa> (I guess it just does `lane_id & (subgroup_size - 1)` in hardware)
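
The two coarse-derivative forms discussed above can be sketched as a small Python simulation over two logical quads. The `broadcast` helper, its `subgroup_size` wrapping, and the quad layout (lanes 0,1 on the top row, lanes 2,3 on the bottom row) are assumptions modeled on the description in the log, not a real hardware or Mesa interface. The subtraction is written as right minus left; the expression quoted in the log is only "(ish)".

```python
def broadcast(values, self_lane, index, subgroup_size=4):
    """Hypothetical broadcast: read `index` within the caller's logical subgroup."""
    base = self_lane & ~(subgroup_size - 1)            # first lane of the caller's quad
    return values[base | (index & (subgroup_size - 1))]

def ddx_fine(values, lane):
    # Right neighbour minus left neighbour, taken from the lane's own row.
    return broadcast(values, lane, lane | 1) - broadcast(values, lane, lane & ~1)

def ddx_coarse(values, lane):
    # Every lane in the quad reuses the top row's difference.
    return broadcast(values, lane, 1) - broadcast(values, lane, 0)

# Two quads of made-up per-lane values.
lanes = [1.0, 3.0, 10.0, 15.0, 2.0, 2.5, 7.0, 9.0]
fine = [ddx_fine(lanes, lane) for lane in range(len(lanes))]
coarse = [ddx_coarse(lanes, lane) for lane in range(len(lanes))]
```

In this sketch the coarse form needs no per-lane index arithmetic, which is where the saved instructions would come from; the cost is that the bottom row of each quad reports the top row's derivative.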

17:24 <imirkin_> on nvidia, there's a special mode to the readInvocation equivalent which lets you read the other "x" lane or "y" lane

17:25 <alyssa> oh, derp... I guess we have that too and we don't take advantage of it

17:26 <imirkin_> heh

17:26 <imirkin_> "oops"

17:26 <alyssa> oh, now i remember why - because then you have sign trouble.

17:26 <jekstrand> Yeah

17:26 <imirkin_> ah yeah. we can flip the sign. it's a neat op.

17:26 <alyssa> should use it for fwidth or somethin

17:26 <jekstrand> You have to subtract consistently

17:26 <imirkin_> FSWZADD on newer GPUs is what it's called

17:26 <dcbaker> shadeslayer: I think the plan is to just treat them like other commits, either a "cc" or a "fixes" is required for them to be backported

17:27 <dcbaker> I mean, reverts basically come in two flavors

17:27 <dcbaker> 1. said patch was wrong, revert

17:27 <jekstrand> alyssa: prior to Ice Lake, we just used the vec4 hardware with a swizzle. 'cause we can do that in scalar mode. :)

17:27 <dcbaker> 2. I rewrote everything, and now the old behavior is correct
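
The rule dcbaker describes (a revert is only backported if it carries a "cc" or a "fixes" tag, like any other commit) could be sketched roughly as below. This is not Mesa's actual tooling: the function name `is_stable_nominated`, the exact tag spellings matched, and the placeholder hash are all assumptions for illustration.

```python
import re

# Assumed tag spellings; the real conventions may differ in detail.
CC_STABLE = re.compile(r"^cc:\s*.*mesa-stable", re.IGNORECASE | re.MULTILINE)
FIXES = re.compile(r"^fixes:\s*[0-9a-f]{7,40}\b", re.IGNORECASE | re.MULTILINE)

def is_stable_nominated(message):
    """Return True only if the commit message carries an explicit nomination tag."""
    return bool(CC_STABLE.search(message) or FIXES.search(message))

# Placeholder commit messages; 0123456789ab is not a real commit.
plain_revert = 'Revert "some change"\n\nThis reverts commit 0123456789ab.\n'
cc_revert = plain_revert + "\nCc: mesa-stable\n"
fixes_revert = plain_revert + '\nFixes: 0123456789ab ("some change")\n'
```

Under this scheme a bare revert is no longer picked up automatically, which covers the second flavor above, where the revert only makes sense on top of a rewrite that was never backported.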

17:27 <alyssa> jekstrand: Haha

17:28 <jekstrand> alyssa: Yeah, it's a pretty neat trick and basically the only interesting use of vec4 in fragment shaders on Intel. :)

17:28 <alyssa> jekstrand: Ok, right now we need 5* instructions for a ddx .. mov lane_id, iadd, broadcast, broadcast, fsub

17:28 <alyssa> * amortized 3 instructions since the mov/iadd gets cse'd for subsequent ddx

17:29 <imirkin_> so ... 4 amortized ops then?

17:29 <alyssa> sure

17:29 <alyssa> if we use the "get other" one, we could do... er... broadcast, fsub, fsub, iand, csel ?

17:29 <alyssa> it's not clear that's better :-p
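
A minimal sketch of the "sign trouble" and the five-op sequence alyssa lists (broadcast, fsub, fsub, iand, csel), simulated on one quad. `read_other_x` is a hypothetical stand-in for the "lane_id ^ src0" broadcast mode, and the quad layout is assumed to put horizontal neighbours at lanes differing in bit 0.

```python
def read_other_x(values, lane):
    """Hypothetical 'read the horizontal neighbour' op: fetch from lane ^ 1."""
    return values[lane ^ 1]

def ddx_via_xor(values, lane):
    x = values[lane]
    other = read_other_x(values, lane)   # broadcast
    if_left = other - x                  # fsub: right - left when this lane is the left one
    if_right = x - other                 # fsub: right - left when this lane is the right one
    is_right = lane & 1                  # iand
    return if_right if is_right else if_left  # csel

quad = [1.0, 3.0, 10.0, 15.0]
# Subtracting the same way in every lane flips the sign in odd lanes.
naive = [read_other_x(quad, lane) - quad[lane] for lane in range(4)]
fixed = [ddx_via_xor(quad, lane) for lane in range(4)]
```

The naive form is what "you have to subtract consistently" rules out: both lanes of a row must agree on right minus left, so the extra select is needed unless the hardware can flip the sign as part of the op.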

17:30 <jekstrand> alyssa: But it's GL_FASTEST!

17:30 <alyssa> wonder what the ddk does

17:30 <alyssa> this "lane_id ^ src0" mode for broadcast is clearly intended for something hah

17:30 <imirkin_> yeah, being able to add while fetching other lanes is ... convenient

17:31 <jekstrand> And super-important for all those derivative-bound shaders. :)

17:31 <imirkin_> it's basically the most common op

17:31 <imirkin_> some people just can't stay in their lane...

17:31 <alyssa> oh, er.

17:32 <alyssa> imirkin_: lol

17:32 <jekstrand> There's an olympic swimming joke in here somewhere.

17:32 <alyssa> DDK's coarse is `broadcast(1) - broadcast(0)`

17:32 <alyssa> which is 3 ops and no CSE opportunities

17:32 <bnieuwenhuizen> the real trick with derivatives is to do them analytically and do away with helper lanes. Helps with tons of small triangles :)

17:32 <alyssa> bnieuwenhuizen: i can't tell if this is sarcasm or not

17:34 <bnieuwenhuizen> alyssa: not sarcasm, but a bit out there :) UE5 switched to partial rasterization in a compute shader because of too many small triangles (roughly 1 per pixel) causing large helper lane overheads

17:36 <alyssa> jekstrand: I guess the only neat trick is that abs(dFdx(x)) can be done as "abs(broadcast(lane ^ 1) - x)" and since the lane ^ 1 is for free, the whole fwidth() dance becomes

17:36 <alyssa> broadcast, broadcast, fsub, fsub, fadd

17:37 <jekstrand> But that would require adding nir_op_fwidth

17:37 <alyssa> so 5 ops for fwidth. as opposed to the current fwidth impl which would be 10 ops.

17:37 <alyssa> jekstrand: or just detect fabs(dfdx) and fabs(dfdy) in isel

17:37 <jekstrand> alyssa: Yeah, that sounds better. :)

17:38 <alyssa> there are almost no uses of fwidth in my shader-db so going to say... not bothering :p
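
The fwidth trick above can be checked with a short simulation: once the absolute value is taken, the per-lane sign of the neighbour difference no longer matters, so no select is needed. The quad layout (horizontal neighbour at `lane ^ 1`, vertical neighbour at `lane ^ 2`) and the idea that abs is a free source modifier on the final add are assumptions inferred from the op count in the log.

```python
def fwidth_via_xor(values, lane):
    x = values[lane]
    dx = values[lane ^ 1] - x    # broadcast + fsub; sign differs between lanes
    dy = values[lane ^ 2] - x    # broadcast + fsub; sign differs between lanes
    return abs(dx) + abs(dy)     # fadd, with abs assumed to be free

def fwidth_reference(values, lane):
    # Consistently signed fine derivatives, for comparison.
    dx = values[lane | 1] - values[lane & ~1]
    dy = values[lane | 2] - values[lane & ~2]
    return abs(dx) + abs(dy)

quad = [1.0, 3.0, 10.0, 15.0]
fast = [fwidth_via_xor(quad, lane) for lane in range(4)]
reference = [fwidth_reference(quad, lane) for lane in range(4)]
```

Both versions agree on every lane of the sample quad, which is the property the isel pattern match for fabs(dfdx) and fabs(dfdy) would rely on.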

17:43 <ajax> imirkin_: re depth buffer convention above: that's not buffer sharing's job to care about? all you're doing is transporting the data in a given format. it's not like you're required to treat "rgba" buffers as literally red green blue and opacity

17:45 <imirkin_> ajax: yes. i was asking about how it was going to be used in practice.

17:45 <imirkin_> with rgba, there's _some_ expectation that there's a connection to red/green/blue.

17:45 <imirkin_> esp for display surfaces

17:45 <imirkin_> sent to a compositor

17:46 <imirkin_> whereas there's no single overriding convention for depth

17:47 <Venemo> MrCooper: thanks for letting me know. I'm pretty sure I was identified before, but it forgot... Do you see this message now?

17:51 <ajax> imirkin_: fair. i'm having trouble thinking of a good way to signal that in-band though.

17:51 <pepp> Venemo: yes

17:52 <imirkin_> ajax: and even with rgba, there's srgb, color spaces, bla bla bla. so it's a problem there too.

17:54 <Venemo> Thx

17:56 ngcortes has joined #dri-devel

17:57 <bl4ckb0ne> isn't depth/stencil more straightforward than colorspace?

17:58 <imirkin_> bl4ckb0ne: not really...

17:58 <imirkin_> you still have the question of how to interpret the values

17:58 <imirkin_> does higher depth = near or far?

17:59 <bnieuwenhuizen> also what are the near plane and the far plane

18:00 <bnieuwenhuizen> (or was that only for the .w?)
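
To make the interpretation problem concrete, here is a sketch of how the same raw depth value maps to different view-space distances depending on the convention. It assumes a standard perspective projection with a [0, 1] depth range; the function name `view_distance` and the near/far values are made up for illustration and do not come from any protocol or driver.

```python
def view_distance(depth, near, far, reversed_z=False):
    """Map a [0, 1] depth-buffer value back to a view-space distance.

    Assumes a standard perspective projection with a [0, 1] depth range.
    With reversed_z, 1.0 is the near plane and 0.0 is the far plane.
    """
    if reversed_z:
        depth = 1.0 - depth
    return near * far / (far - depth * (far - near))

raw = 0.25
standard = view_distance(raw, near=1.0, far=100.0)
flipped = view_distance(raw, near=1.0, far=100.0, reversed_z=True)
other_planes = view_distance(raw, near=0.1, far=10.0)
```

The three results differ even though the stored value is identical, so two clients sharing depth buffers would need to agree on the direction and on the near and far planes, or communicate them out of band.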

18:01 <ajax> struct pipe_depth_stencil_alpha_state vs struct st_visual

18:02 <ajax> one of these has several more members than the other

18:02 tzimmermann has quit [Ping timeout: 480 seconds]

18:04 aravind has quit [Ping timeout: 480 seconds]

18:04 aravind has joined #dri-devel

18:04 <bl4ckb0ne> so that can't really be well represented in a 32-bit format

18:11 adjtm has quit [Quit: Leaving]

18:19 xexaxo has quit [Ping timeout: 480 seconds]

18:29 tobiasjakobi has joined #dri-devel

18:32 nirmoy has quit []

18:33 phomes has joined #dri-devel

18:36 <phomes> Can I ask for someone to assign https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11880 to marge for me?

18:37 <alyssa> 👀

18:37 <jekstrand> done

18:40 Peste_Bubonica has joined #dri-devel

18:42 <alyssa> dcbaker: eric_engestrom: fair warning, i'm about to cc mesa-stable on a fairly large (line count) patch

18:43 <alyssa> It should cherrypick cleanly, though

18:43 Kayden has quit [Quit: to lunch and the office]

18:43 <alyssa> Adds a workaround for a hardware bug (well, "feature", but.....) that we only understood, uh, yesterday

18:44 <alyssa> (Certain in-spec shaders cause GPU faults. It's rare, but occurs on the CTS and could occur in real workloads. It's not pretty.)

18:45 <alyssa> Just giving a heads up why I'm nominating a +71 insertions patch for stable

18:46 <dcbaker> alyssa: thanks! generally if it's in a specific driver and the main devs are nominating it I just assume you know what you're doing. If you nominated a 71 line NIR change I might get nervous :)

18:46 <alyssa> dcbaker: haha, fair enough

18:49 rsalvaterra_ has joined #dri-devel

18:51 tzimmermann has joined #dri-devel

18:55 rsalvaterra has quit [Ping timeout: 480 seconds]

19:00 frieder has joined #dri-devel

19:07 agx has quit [Read error: Connection reset by peer]

19:08 agx has joined #dri-devel

19:12 agx has quit [Read error: Connection reset by peer]

19:13 agx has joined #dri-devel

19:15 gpoo has quit [Ping timeout: 480 seconds]

19:16 gpoo has joined #dri-devel

19:33 tzimmermann has quit [Quit: Leaving]

19:41 Kayden has joined #dri-devel

19:45 frieder has quit [Remote host closed the connection]

19:46 tobiasjakobi has quit [Remote host closed the connection]

19:48 dviola has joined #dri-devel

19:49 vivijim has quit [Remote host closed the connection]

19:59 rasterman has quit [Quit: Gettin' stinky!]

20:09 phomes has quit [Remote host closed the connection]

20:16 adjtm has joined #dri-devel

20:21 <robclark> airlied: looks like Caleb found the root issue, it wasn't drm/msm pull req, see "drm/msm: Disable frequency clamping on a630" thread.. but if danvet doesn't need scheduler conversion in drm-next it is ok to hold off

20:22 vivijim has joined #dri-devel

20:30 rasterman has joined #dri-devel

20:35 <danvet> robclark, the bikeshed settled on some naming, so I can rebase the set

20:35 <danvet> so would still be good to have msm scheduler stuff in there, but doesn't need to be right now

20:36 <robclark> ok, it's your call

20:37 <robclark> I'll try and actually have a look at your series next week..

20:37 thellstrom has joined #dri-devel

20:43 agx has quit [Read error: Connection reset by peer]

20:43 agx has joined #dri-devel

20:47 rsalvaterra has joined #dri-devel

20:48 vivijim has quit [Quit: Lost terminal]

20:49 vivijim has joined #dri-devel

20:53 rsalvaterra_ has quit [Ping timeout: 480 seconds]

20:58 <mlankhorst> airlied: bit late, but hopefully still in time with my pull req!

21:23 rasterman has quit [Quit: Gettin' stinky!]

21:31 alyssa has left #dri-devel [#dri-devel]

21:33 lemonzest has quit [Quit: Quitting]

21:35 danvet has quit [Ping timeout: 480 seconds]

21:36 aravind has quit [Ping timeout: 480 seconds]

21:54 <JoshuaAshton> agd5f: May I nudge you about https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/168 ? ( cc: hakzsam )

21:57 pnowack has quit [Quit: pnowack]

22:07 mlankhorst has quit [Ping timeout: 480 seconds]

22:10 vivijim has quit [Ping timeout: 480 seconds]

22:34 Duke`` has quit [Ping timeout: 480 seconds]

22:36 agx has quit [Remote host closed the connection]

22:38 jkrzyszt has quit [Ping timeout: 480 seconds]

22:41 agx has joined #dri-devel

22:44 pcercuei has quit [Quit: dodo]

22:57 Peste_Bubonica has quit [Quit: Leaving]

23:01 iive has quit []

23:13 idr has quit [Quit: Leaving]

23:16 Kayden has quit [Quit: go home]

23:24 mbrost has joined #dri-devel

23:25 Lucretia has quit []

23:37 camus has quit []

23:51 <graphitemaster> jekstrand, Do Gallium drivers really not know what derivatives are used by texture sampling? My comment on 12097 was based on the assumption that this capability bit is fed to Mesa such that Mesa could make better optimizations.

23:52 <graphitemaster> It feels like you can't do the right thing here unless the driver knows what the hardware is actually going to use for implicit lod. Otherwise you're potentially emitting different code when it's not necessary.

23:57 <jenatali> It sounds like (for some hardware at least) there's no way to modify the derivative operation done by the sampler unit when you request implicit derivatives

23:57 <jenatali> Which is likely to be more efficient than explicitly computing derivatives and feeding them into the sampler op

23:58 <imirkin_> graphitemaster: the glHint is not used anywhere by gallium drivers

23:59 <imirkin_> (nor is it made available to them)

23:59 <imirkin_> the sampler state definition does not include a quality bit

23:59 <graphitemaster> Exactly. My hope was that drivers could report which derivative operation implicit sampling performs, and then Mesa could just generate explicit textureGrad for texture() calls, to support derivatives that do not match what the driver has, and then for there to be a NIR lowering pass for code that explicitly uses derivative control which happens to match the hardware. This would be ideal and consistent then.

23:59 <imirkin_> textureGrad is *super* expensive to support on some hardware (e.g. nvidia)