#dri-devel on 2021-08-11 — irc logs at oftc.irclog.whitequark.org

2021-07-26 22:56 ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar

00:02 hanetzer has joined #dri-devel

00:12 mbrost has quit [Ping timeout: 480 seconds]

00:16 hanetzer has quit []

00:22 mbrost has joined #dri-devel

00:48 imirkin_ has joined #dri-devel

00:51 Lucretia has quit []

00:51 hanetzer has joined #dri-devel

00:54 <imirkin_> nir question...

00:54 <imirkin_> vec1 32 ssa_27 = deref_var &img (uniform imageBuffer)

00:54 <imirkin_> intrinsic image_deref_store (ssa_27, ssa_29, ssa_0, ssa_26, ssa_28) (0, 0, 0, 0, 0) /* image_dim=1D */ /* image_array=false */ /* format=none */ /* access=0 */ /* src_type=invalid */

00:54 <imirkin_> why does it say "image_dim=1D"?

00:54 <imirkin_> is that separate from the deref var?

00:55 <imirkin_> this continues after the gl_nir_lower_images pass which converts it into a image_store intrinsic

00:56 <imirkin_> hm, i see. glsl_to_nir explicitly calls nir_intrinsic_set_image_dim.

00:58 flto has quit [Remote host closed the connection]

00:58 flto has joined #dri-devel

01:00 <imirkin_> yeah, that fixes it.

01:00 <imirkin_> thanks, rubber duck.

01:28 khfeng has joined #dri-devel

01:29 mceier has quit [Remote host closed the connection]

01:29 mceier has joined #dri-devel

01:36 pnowack_ has joined #dri-devel

01:43 pnowack has quit [Ping timeout: 480 seconds]

01:49 <imirkin_> this makes no sense... bumping up the no-attachment fb's layers seems to help the pbo download on nvc0. but ... why. shouldn't matter. grr.

01:50 ngcortes has quit [Remote host closed the connection]

01:51 pnowack_ has quit []

01:51 <imirkin_> ah no. it does not.

01:51 <jekstrand> imirkin_: More parallelism?

01:51 <jekstrand> Oh, or that. :)

01:51 <imirkin_> so wtf did i do to make it always pass

01:51 <imirkin_> and then re-break

01:51 <imirkin_> grrrr

01:52 <imirkin_> well at least the thing that didn't make sense isn't the case anymore. that's nice.

01:52 <mareko> grrr is the correct swizzle

01:52 <imirkin_> ah yes. there ya go!

01:53 <imirkin_> th-arrr ya go, that is.

01:53 <imirkin_> arrg is another good swizzle.

01:54 <imirkin_> and the ever-popular "stab" swizzle

01:55 <imirkin_> oh wait, no. you can't mix/match. o well.

02:09 <alyssa> mareko: hehe

02:10 <imirkin_> ok, did another random thing. it randomly fixes it.

02:10 <imirkin_> sigh

02:10 <imirkin_> now i need to wait 5 minutes, since that's what "broke" my fix last time...

02:14 mbrost has quit [Ping timeout: 480 seconds]

02:14 <imirkin_> hm, passing/failing a bit more randomly now. and mostly passing. progress!

02:16 <imirkin_> hey, should it be possible to do a render which writes to an imagebuf, and then map it and expect to see shader results? what barriers, if any, would be required?

02:17 boistordu_ex has quit [Ping timeout: 480 seconds]

02:29 <mareko> CPU reads don't need any barriers

02:32 <imirkin_> that's what i was afraid of... gr

02:32 <imirkin_> i think we're forgetting to mark the buffers. oops.

02:33 <imirkin_> not sure how we've survived this long

02:36 <imirkin_> ah no. nevermind. it's there.

02:36 Company has quit [Read error: Connection reset by peer]

02:40 mbrost has joined #dri-devel

02:42 <alyssa> rrrr

03:25 JohnnyonFlame has quit [Ping timeout: 480 seconds]

03:26 gregoy has joined #dri-devel

03:36 <gregoy> would this be an appropriate place to ask a noob question?

03:37 <jekstrand> sure

03:39 <gregoy> I am trying to build (meson) d3dadapter9 and it can't find a subproject directory or llvm.wrap

03:41 <gregoy> what did I mess up

03:45 muhomor has joined #dri-devel

03:50 muhomor has quit [Remote host closed the connection]

03:50 muhomor_ has quit [Ping timeout: 480 seconds]

03:52 jessica_24 has quit [Quit: Connection closed for inactivity]

03:55 <gregoy> I understand I have to update llvm now. how do none of you cry

03:55 mbrost has quit [Remote host closed the connection]

03:59 <HdkR> gregoy: The biggest PCs possible usually

03:59 <airlied> or use a distro prepackaged llvm

04:00 <gregoy> I just thought of that, I tried finding it using apt but couldn't, found it with Synaptic

04:00 tzimmermann has joined #dri-devel

04:03 <gregoy> thx frens

04:03 <gregoy> haha still found only version 10. I'll stop bothering you guys though

04:56 Duke`` has joined #dri-devel

05:16 gregoy has quit [Remote host closed the connection]

05:20 thellstrom1 has joined #dri-devel

05:20 thellstrom has quit [Remote host closed the connection]

05:23 itoral has joined #dri-devel

05:40 macromorgan has quit [Read error: Connection reset by peer]

06:15 Duke`` has quit [Ping timeout: 480 seconds]

06:33 alanc has quit [Remote host closed the connection]

06:34 alanc has joined #dri-devel

06:39 jkrzyszt has joined #dri-devel

06:40 frieder has joined #dri-devel

06:40 frieder_ has joined #dri-devel

06:41 <krh> jekstrand: yes, it's quite useful

06:44 mlankhorst has joined #dri-devel

06:52 gouchi has joined #dri-devel

06:52 frieder_ has quit []

06:53 pnowack has joined #dri-devel

06:57 camus1 has joined #dri-devel

06:58 camus has quit [Remote host closed the connection]

07:05 gouchi has quit [Quit: Quitte]

07:20 pcercuei has joined #dri-devel

07:20 rasterman has joined #dri-devel

07:25 thellstrom1 has quit [Ping timeout: 480 seconds]

07:32 thellstrom has joined #dri-devel

07:36 mattrope has quit [Read error: Connection reset by peer]

07:47 jagan_ has joined #dri-devel

07:47 lynxeye has joined #dri-devel

07:50 lemonzest has joined #dri-devel

07:51 camus1 has quit [Remote host closed the connection]

07:51 camus has joined #dri-devel

07:52 Ahuj has joined #dri-devel

08:12 danvet has joined #dri-devel

08:14 tzimmermann has quit [Quit: Leaving]

08:24 jagan_ has quit [Remote host closed the connection]

08:35 Lucretia has joined #dri-devel

08:51 pochu has joined #dri-devel

09:06 thellstrom has quit [Remote host closed the connection]

09:19 JohnnyonFlame has joined #dri-devel

09:20 K`den has joined #dri-devel

09:20 Kayden has quit [Read error: Connection reset by peer]

09:21 thellstrom has joined #dri-devel

09:22 K`den is now known as Kayden

09:36 <dv_> is it generally possible to dup a dmabuf fd with an applied offset?

09:37 <daniels> no

09:37 <dv_> for example, when one component produces single-planar frames (= all planes in one dmabuf), and another expects multi-planar frames (one dmabuf per plane)

09:37 <dv_> hm

09:37 <dv_> so, such cases can only be handled by copying the pixels?

09:37 <daniels> dup() gives you a new file descriptor (i.e. number in your process's fd table) referring to the same underlying file description (kernel data structure)

09:37 <daniels> seek position is a property of the file description
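
Editorial sketch of the point daniels makes above: a dup()ed descriptor shares its file offset with the original, so dup() cannot carry a separate per-plane offset. This is a minimal illustration on an ordinary temporary file, not on a real dmabuf; the function name `dup_shares_offset` is made up for the example.

```python
import os
import tempfile


def dup_shares_offset() -> bool:
    """Return True if seeking through a dup()ed fd also moves the original fd."""
    with tempfile.TemporaryFile() as f:
        f.write(b"x" * 4096)
        f.flush()
        fd = f.fileno()
        os.lseek(fd, 0, os.SEEK_SET)          # start from a known position
        dup_fd = os.dup(fd)                   # new fd number, same file description
        try:
            os.lseek(dup_fd, 1024, os.SEEK_SET)
            # Query the position through the *original* fd.
            return os.lseek(fd, 0, os.SEEK_CUR) == 1024
        finally:
            os.close(dup_fd)
```

On a POSIX system this should return True, which is why an API has to take the offset as an explicit per-plane parameter rather than encoding it in the descriptor.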

09:38 <emersion> what components are you talking about?

09:38 <daniels> you need to fix your API to take per-plane offsets

09:38 <daniels> as all the others do

09:38 <emersion> DMA-BUF APIs need to take one FD and one offset per plane

09:38 <dv_> ahh good point

09:38 <emersion> then the FDs can refer to the same buffer object, or not

09:39 <emersion> if you're importing to an API that doesn't support multiple FDs, like Vulkan without disjoint VkImages, you can check whether all FDs refer to the same buffer object with inode numbers
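
Editorial sketch of the inode check emersion mentions: fstat() each FD and compare the results. This assumes, as the message above implies, that FDs for the same buffer object report the same inode number; temporary files stand in for dmabufs here, and `all_same_buffer` and `demo` are hypothetical names.

```python
import os
import tempfile


def all_same_buffer(fds):
    """True if every fd in a non-empty list reports the same (st_dev, st_ino)."""
    idents = {(st.st_dev, st.st_ino) for st in map(os.fstat, fds)}
    return len(idents) == 1


def demo():
    """Compare an fd with its dup, then with an unrelated file."""
    with tempfile.TemporaryFile() as a, tempfile.TemporaryFile() as b:
        dup_a = os.dup(a.fileno())
        try:
            same = all_same_buffer([a.fileno(), dup_a])
            different = all_same_buffer([a.fileno(), b.fileno()])
            return same, different
        finally:
            os.close(dup_a)
```

An importer that cannot accept disjoint planes could run a check like this first and reject the import when it fails, instead of silently using only the first FD.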

09:40 <dv_> alright

09:41 <dv_> a second, similar case I have here (most of this is proprietary stuff unfortunately) is about using v4l2 mem2mem based video decoding when the video decoder only supports the multi-planar API

09:41 <dv_> and other bits expect decoded frames in a single-planar fashion

09:41 <emersion> ah, proprietary stuff

09:41 * emersion walks out

09:42 <dv_> if I want to import dmabufs that were allocated by something else,

09:42 <dv_> I would have to queue a v4l2_buffer with the v4l2_planes containing the same dmabuf FD N times (N = number of planes), but with different data_offset values?

09:42 <dv_> ah well the v4l2 bit is not proprietary

09:43 <dv_> the docs say: "Offset in bytes to video data in the plane. Drivers must set this field when type refers to a capture stream, applications when it refers to an output stream."

09:43 <daniels> yep, correct

09:43 <dv_> but, from what I gather, in v4l2 lingo, "capture" is the side that provides the -decoded- frames

09:43 <dv_> so .. according to the docs, I cannot set data_offset, because the driver will?

09:44 <dv_> or is this an oversight in the docs and this does not apply to mem2mem based decoding?

09:44 <daniels> it certainly applies to anything you provide as the source for m2m decoding

09:44 <daniels> I'm not sure about the dest
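
Editorial sketch of the layout dv_ describes at 09:42: one single-planar allocation described as N planes that all carry the same dmabuf FD but different offsets. It only computes the per-plane records for an NV12 frame; it does not call any V4L2 ioctl, and `PlaneRef`, `size`, and `nv12_planes` are hypothetical names rather than fields of the real `v4l2_plane` struct. Whether a given driver honours `data_offset` on the capture side is left open in the discussion above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaneRef:
    fd: int           # dmabuf fd, identical for every plane of the frame
    data_offset: int  # byte offset of this plane inside the buffer
    size: int         # bytes occupied by this plane


def nv12_planes(fd, stride, height):
    """Describe a contiguous NV12 frame as two planes sharing one fd."""
    if stride <= 0 or height <= 0 or height % 2:
        raise ValueError("NV12 needs a positive stride and an even, positive height")
    luma_size = stride * height          # full-resolution Y plane
    chroma_size = stride * height // 2   # interleaved UV plane, half the rows
    return [
        PlaneRef(fd, 0, luma_size),
        PlaneRef(fd, luma_size, chroma_size),
    ]
```

For example, `nv12_planes(7, 1920, 1080)` puts the chroma plane at byte offset 1920 * 1080 in the same buffer as the luma plane.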

09:49 shfil has joined #dri-devel

09:51 mlankhorst has quit [Ping timeout: 480 seconds]

10:18 thellstrom has quit [Remote host closed the connection]

10:21 <tomeu> airlied: btw, I'm having to move some work from the record stage to the execute one, so we don't have lvp-specific stuff in the common code

10:21 <tomeu> nothing so far seems specially computation-intensive, but I guess you have a better idea of what a perf impact this will have

10:30 mlankhorst has joined #dri-devel

10:41 flacks has quit [Quit: Quitter]

10:42 flacks has joined #dri-devel

10:49 camus1 has joined #dri-devel

10:49 camus has quit [Read error: Connection reset by peer]

10:51 imre has joined #dri-devel

11:26 itoral has quit []

11:58 thellstrom has joined #dri-devel

12:33 JohnnyonFlame has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

12:34 JohnnyonFlame has joined #dri-devel

13:08 vivijim has joined #dri-devel

13:09 mlankhorst has quit [Ping timeout: 480 seconds]

13:15 mlankhorst has joined #dri-devel

13:30 sdutt_ has joined #dri-devel

13:30 sdutt has quit [Read error: Connection reset by peer]

13:33 macromorgan has joined #dri-devel

13:57 Kayden has quit [Quit: Leaving]

13:57 Kayden has joined #dri-devel

14:20 Ahuj has quit [Ping timeout: 480 seconds]

14:23 <tomeu> airlied: I think this won't end up being a problem with a void *driver_data member in the cmd structs, for drivers to cache data if that makes a difference

14:31 ezequielg has quit []

14:31 ezequielg has joined #dri-devel

14:32 alyssa has left #dri-devel [#dri-devel]

14:38 Daanct12 has joined #dri-devel

14:44 Danct12 has quit [Ping timeout: 480 seconds]

14:46 mattrope has joined #dri-devel

14:55 Duke`` has joined #dri-devel

15:14 bluebugs has quit [Remote host closed the connection]

15:30 vivek has joined #dri-devel

15:32 sdutt_ has quit []

15:32 sdutt has joined #dri-devel

15:33 nirmoy has joined #dri-devel

15:36 <ajax> jekstrand: what exactly do we want to do with i965 support in the amber branch?

15:36 jessica_24 has joined #dri-devel

15:36 <ajax> jekstrand: now that there's a crocus i'm kind of inclined to say anv is main-only and amber just cuts off i965 after gen7/chv

15:37 <jekstrand> ajax: That works for me.

15:37 <ajax> so then there's no interop question of anv from one tree and gl from another

15:38 <jekstrand> ajax: We should make sure crocus has EXT_external_objects so we don't lose functionality.

15:38 <ajax> it sets the bit

15:38 <ajax> case PIPE_CAP_MEMOBJ:

15:38 <ajax> return devinfo->ver >= 7;

15:38 <jekstrand> I feel pretty sorry for shadeslayer that he's spent so much time on it only to get cut off. It was just really poor timing. :(

15:38 <jekstrand> ajax: Sure, but has it been tested and do we know it works?

15:38 <ajax> right, no idea about testing

15:39 <ajax> yeah that really sucks

15:39 <jekstrand> hikiko, shadeslayer, tpalli: Has anyone tested crocus with EXT_external_objects?

15:39 <jekstrand> I think crocus has a much better chance of working more-or-less out-of-the-box than i965 since it's modeled on iris.

15:39 pekkari has joined #dri-devel

15:41 <ajax> alright, i'll redo the amber patch to make that happen

15:45 <jekstrand> sgtm

15:46 <jekstrand> Actually, we don't have to cut off i965 as long as we always prefer iris/crocus

15:46 <ajax> true

15:46 <jekstrand> As long as we make amber not build ANV without lots of option smashing.

15:46 <ajax> well right now amber means no gallium drivers

15:47 <imirkin> jekstrand: you probably know this, but crocus isn't nearly as production-ready as i965 is atm. it's being improved, but i don't think it's ready to be used in a distro as a default...

15:47 <jekstrand> imirkin: Yeah, that's why I kind-of think amber and/or dropping classic is a bit premature yet.

15:47 <jekstrand> I was hoping to give crocus another 6 months to bake.

15:48 <imirkin> although it's probably in a better spot than i965 was 6-7 years ago? dunno :)

15:49 <jekstrand> I also kind-of think the only way to get it fully baked is to get people using it and filing bugs.

15:49 <ajax> huh, i guess !10153 was only april

15:49 <jekstrand> The best way to do that is for a distro to try shipping it. :D

15:49 <imirkin> jekstrand: well, there are people filing bugs now

15:49 <imirkin> i don't think we need a fatter bug-pipe unless it will also come with more people looking at those bugs

15:50 <jekstrand> heh

15:50 <jekstrand> sure

15:50 <ajax> i feel pretty comfortable saying that, if 21.2.x is the amber branch, that the crocus in 21.3.0/22.0 (whichever happens first) will be plenty baked

15:50 <shfil> not sure if it's possible for mesa, but artifacts per each commit would help with testing

15:50 <ajax> so that by the time you _need_ to decide on mesa or amber for your crocus-era support, crocus will be a no-brainer

15:51 <ajax> maybe that's optimistic

15:52 <imirkin> i'd point you at the current list of crocus bugs, but gitlab is not responding

15:52 <imirkin> i think basically only airlied is looking at them

15:53 <ajax> yeah that basketball team is terrible, they only have magic johnson

15:53 <ajax> i take your point

15:53 <imirkin> well ... he is doing the work of 10 men. should his availability drop slightly to only be able to do the work of 9 men, crocus may get the axe :)

15:54 <jekstrand> Isn't the work of 10 men only 20% of airlied?

15:54 <imirkin> that's why they call it a "20% project"

15:54 <ajax> i'm just really itchy to chop classic

15:55 <ajax> the number of times i go to fix something and discover it's broken entirely differently in classic in a way that invalidates how i wanted to fix it the first time...

15:56 <jekstrand> ajax: You're not alone there

15:56 <ajax> and if the cost of chopping that off is a somewhat rough support experience for people running hot-off-the-presses mesa on nine year old silicon... sure?

16:05 <jekstrand> Ugh... Just came across param compaction again. Let's nuke i965!

16:05 <jekstrand> *grumble*

16:05 ezequielg has quit []

16:05 <imirkin> jekstrand: if you guys are all itching to make i965 go away, maybe you can put some elbow grease towards fixing crocus?

16:05 <hikiko> jekstrand, no, I was planning to but I need to make a few fixes to run Vulkan first (I use crocus on a FreeBSD laptop, not on Linux)

16:06 ezequielg has joined #dri-devel

16:06 <imirkin> https://gitlab.freedesktop.org/mesa/mesa/-/issues?scope=all&state=opened&label_name[]=crocus

16:06 <imirkin> mostly gen4/4.5 issues, although could just be a side-effect of this being the hw the people testing have

16:08 alyssa has joined #dri-devel

16:08 <alyssa> ERROR: No suitable configuration found for GL config rgba8888d24s8ms0 at glcTestRunner.cpp:341

16:08 <alyssa> does cts-runner not do surfaceless?

16:08 <imirkin> cts-runner does what you tell it to do

16:08 <imirkin> how did you build it?

16:09 <alyssa> with DEQP_TARGET=surfaceless

16:09 <alyssa> normal deqp runs ok but requires a boatload of options:

16:09 <imirkin> did you pass those to cts-runner?

16:09 <alyssa> EGL_PLATFORM=surfaceless ./deqp-gles* --deqp-surface-type=pbuffer --deqp-visibility=hidden --deqp-gl-config-name=rgba8888d24s8ms0 --deqp-surface-width=256 --deqp-surface-height=256

16:09 <imirkin> i actually haven't really used cts-runner much.

16:09 <alyssa> cts-runner doesn't have options for that stuff

16:09 <imirkin> why pbuffer surface type?

16:10 <alyssa> dunno

16:10 <imirkin> heh

16:10 <ajax> because surfaceless doesn't have any other kind?

16:10 <daniels> the thing about surfaceless is that it has no surfaces

16:10 <alyssa> daniels: pikachu

16:10 <jekstrand> imirkin: Yeah, so there's a gallium HUD issue, Gimp, OpenMW, and the rest are Nine. I don't think I care about Nine at the moment.

16:10 <ajax> it has no windows, really

16:10 <daniels> alyssa: https://youtu.be/IfQumd_o0Gk

16:10 <imirkin> jekstrand: yeah, don't worry about nine :)

16:10 <daniels> ajax: nor pixmaps

16:10 <imirkin> jekstrand: a bunch of those are in various stages of fixing

16:10 <ajax> but i thought there was --deqp-surface-type=fbo ?

16:11 <imirkin> jekstrand: but as they get fixed, more get opened :)

16:12 <imirkin> basically two guys have been filing the majority of the issues (Wladislav and Angelo). they seem to have been running into problems fairly "early" into their testing/usage

16:12 <imirkin> i'm guessing more will come as the "obvious" stuff gets fixed

16:12 <imirkin> jekstrand: also some sort of CI to track regressions would be great. i think this has been discussed in the past, but i'm not aware of any clear action on that point.

16:13 <jekstrand> imirkin: I think we're running it in CI

16:13 <alyssa> so err... should I be building CTS for X11 or Wayland instead?

16:13 <imirkin> ah cool

16:14 khfeng has quit [Ping timeout: 480 seconds]

16:21 pekkari has quit [Quit: Konversation terminated!]

16:22 <imirkin> jekstrand: do you know where?

16:22 <imirkin> i'm looking at e.g. https://mesa-ci.01.org/mesa_master/builds/26972/group/63a9f0ea7bb98050796b649e85481845#platforms

16:23 <jekstrand> imirkin: Maybe we're not yet

16:23 <hikiko> jekstrand, I was actually wondering about the state of Vulkan on GPUs that can run crocus. I suppose they support 1.0, but do they also support the extensions for external memory and external memory capabilities?

16:23 <jekstrand> Yes, they should

16:25 <hikiko> hmm, ok I'll try to run Vulkan on FreeBSD and check this then, I had some issues with the loader so far but I think/hope they aren't hard to fix :)

16:40 xexaxo has quit [Ping timeout: 480 seconds]

16:42 mbrost has joined #dri-devel

16:45 <robclark> ajax: there is --deqp-surface-type=fbo but last I looked deqp had a lot of fbo vs window/pbuffer bugs (which admittedly gl doesn't make easy because of small diffs btwn fbo vs !fbo)

16:46 frieder has quit [Remote host closed the connection]

16:49 <zmike> iris/anv don't seem to work with pbuffer+surfaceless

16:49 <imirkin> alyssa: i always build x11_egl_glx

16:49 <zmike> posted a bug about that a while ago

16:49 <imirkin> it avoids me having to debug CTS

16:49 <imirkin> i assume that's what everyone else is doing

16:50 <robclark> In CI we are using surfaceless except for egl tests, iirc? So I guess surfaceless gets the most regular usage

16:51 <imirkin> (and when i say "everyone else", i mean "the people who use/develop CTS")

16:51 <jekstrand> imirkin: I can't repro the GALLIUM_HUD bug on my ILK. :(

16:51 <imirkin> jekstrand: i think airlied fixed that specific issue, and it was on gen4/45

16:51 <imirkin> oh wait no

16:51 <imirkin> wrong bug

16:51 <imirkin> just kidding

16:52 <imirkin> this one's on SNB. i should be able to check it out i guess.

16:52 <imirkin> jekstrand: wait. which issue? 5120? that one's gen4

16:53 <imirkin> fixed by https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12058

16:53 <imirkin> but then the person had more issues

16:53 <imirkin> and since filing a new bug is an extremely expensive operation, opted to add follow-on things to the same bug

16:59 <alyssa> jekstrand: do you understand khronos conformance rules?

16:59 <alyssa> because I've been building the latest opengl cts tag for the past hour

16:59 <alyssa> and just now realize there's a separate *much older* opengl es cts tag

16:59 <alyssa> am I building something that's too new?

16:59 <alyssa> (for GLES conformance)

17:00 <imirkin> the rule is simple... pay money to khronos, get conformance.

17:00 <alyssa> imirkin: heh

17:00 <alyssa> building opengl-cts-4.6.1.0

17:00 <alyssa> (from Jul 1)

17:01 <jekstrand> imirkin: Yes, more-or-less.

17:01 <alyssa> but on Feb 4 there was opengl-es-cts-3.2.7.0

17:01 <alyssa> and I'm trying for OpenGL ES 3.1 conformance

17:01 <alyssa> complicating matters, I think a bug fix was merged in May that applies to us.

17:01 <imirkin> the more money you pay, the more exceptions you get

17:01 <imirkin> for enough money, you get to just write the tests

17:02 * zmike submits gallium noop for conformance

17:02 <imirkin> alyssa: afaik you can apply patches on top of cts

17:02 <imirkin> and just mention that you did so in your submission

17:03 <alyssa> imirkin: i just don't want to resume this build since this is so painful on my little board

17:03 <alyssa> and it's almost done

17:03 <imirkin> alyssa: look into cross-compiles?

17:03 <alyssa> I mean the right answer at this point is to build from the M1 but yeah :p

17:04 <alyssa> Anyway I errrr

17:04 <alyssa> I can't tell if using an OpenGL CTS release for an OpenGL ES submission is valid. These rules are complicated.

17:07 <jekstrand> alyssa: What API are you wanting to get conformance on? GL or ES?

17:08 nroberts has quit [Ping timeout: 480 seconds]

17:08 <jekstrand> alyssa: GL, ES, and Vulkan are effectively different projects with different release cycles that live in the same repo.

17:08 ceyusa has quit [Ping timeout: 480 seconds]

17:08 * airlied has the fix for qt fonts on gen4 but wanted to try an alternate

17:08 <airlied> that still allows blt usage

17:10 <alyssa> jekstrand: ES

17:10 <alyssa> ES3.1

17:10 <jekstrand> Then use an opengl-es-cts tag

17:10 <daniels> (the latest one)

17:11 gouchi has joined #dri-devel

17:11 <alyssa> well starting a fresh build from scratch then.

17:19 <imirkin> airlied: one approach is to just push the change as you have and file a "investigate later" type of bug

17:24 vivek has quit [Ping timeout: 480 seconds]

17:27 ngcortes has joined #dri-devel

17:27 <airlied> imirkin: indeed, I found it late in the day, then took a day off :-P

17:28 <imirkin> airlied: yeah, didn't intend it as a criticism :)

17:28 <airlied> meanwhile my CL CTS run on llvmpipe is running conversions tests for all the day off

17:29 <imirkin> no rest for the computer?

17:33 <ajax> we taught the sand to think, we didn't teach it to slack off

17:33 ngcortes has quit [Remote host closed the connection]

17:33 dogukan has joined #dri-devel

17:33 <imirkin> ah, so the deep learning hasn't learned yet...

17:33 <dcbaker> ajax: but then we do slack off, and we taught the sand, so...

17:34 <imirkin> (i guess that's why they call it learn*ing*...)

17:36 xexaxo has joined #dri-devel

17:39 dogukan has quit [Quit: Konversation terminated!]

17:40 dogukan has joined #dri-devel

17:41 <FLHerne> robclark: https://arstechnica.com/gadgets/2021/04/pixel-5-sees-dramatically-improved-gpu-performance-after-april-patch/ <- I was wondering, is this related to it using Freedreno before/after/both?

17:41 <FLHerne> or just blob-driver screwiness?

17:42 <robclark> the android phones are (so far) still using blob driver.. but I suppose there is also lots of room for devfreq to screw up performance

17:45 dogukan has quit []

17:45 dogukan has joined #dri-devel

17:49 dogukan has quit []

17:49 dogukan has joined #dri-devel

17:49 dogukan has quit []

17:50 dogukan has joined #dri-devel

17:50 <airlied> for crocus I'm reasonably happy on ivb/hsw, snb is still a bit of a pita, and gen4/5 is mostly just small diffs from 965 that I'm ironing out, and then you have weirdness like the idr compiler fixes :-P

17:50 <imirkin> airlied: the intel ci bits never came through right?

17:51 <airlied> imirkin: I think they were running but having some issues

17:51 <imirkin> ah ok

17:51 <imirkin> do you remember offhand what the issues are with snb?

17:51 <imirkin> is xfb still broken?

17:52 <airlied> yeah snb still has the deqp hang

17:52 <imirkin> bleh

17:52 <airlied> I think I know why it happens with crocus not 965, but how to fix it is still no clue

17:53 <airlied> it's nearly at the beg someone to figure out SNB simulator

17:53 dogukan has quit []

17:53 <FLHerne> robclark: thanks

17:54 * robclark not actually aware of *what* the perf improvement was for the p5's.. I'd actually be a bit curious if it was userspace or kernel (or something outside of the gfx driver stack)

17:55 <airlied> okay posted the gen4/5 blt corruption fix

17:55 <imirkin> airlied: any issues on snb that i could look at? i.e. not the hang? :)

17:57 <airlied> imirkin: I think the biggest missing piece on gen6 is push constant handling

17:57 pochu has quit [Ping timeout: 480 seconds]

17:57 dogukan has joined #dri-devel

17:57 <airlied> since we want to use the ubo push constant lowering not the old uniform compaction code

17:58 <imirkin> airlied: does i965 have it?

17:58 <airlied> imirkin: no

17:58 <airlied> i965 uses the old uniform compaction code

17:59 <airlied> lols the one gitlab issue mentions crocus on snb, but crocus works where 965 hangs :-P

18:00 <ajax> sounds like it's production-quality to me

18:00 <imirkin> yea i'm not debugging that one.

18:00 * airlied is nine avoidant also

18:01 <airlied> at some point I suppose I should work out to install it

18:01 <imirkin> i'll try to check out the nine issues

18:01 <imirkin> i couldn't repro anything with Xnine

18:02 <imirkin> the problem is that i probably need a 32-bit build

18:02 <imirkin> and we dropped autotools

18:02 <ajax> why is autotools related

18:02 <imirkin> because autotools makes cross-compile / 32-bit builds easy

18:03 <imirkin> whereas it's a substantial fight (which i generally lose) each time with meson

18:03 <zmike> it's easy with meson too

18:03 <zmike> https://gist.github.com/Venemo/a9483106565df3a83fc67a411191edbd

18:03 <zmike> fire up your copy/paste and get in there

18:04 <imirkin> don't think i have a llvm-config-32

18:04 camus has joined #dri-devel

18:04 <imirkin> nor a pkg-config-32

18:04 <zmike> yea me neither, just change to use lib paths instead

18:04 alyssa has left #dri-devel [#dri-devel]

18:05 <zmike> e.g., PKG_CONFIG_PATH=/usr/lib or /usr/lib32 or whatever

18:05 <imirkin> yeah, but then it's not baked into the build iirc

18:05 <zmike> shrug

18:05 <imirkin> there is no "configure PKG_CONFIG_PATH=/bla" anymore

18:05 <zmike> just a guy over here debugging 32bit steam games every day on a meson build

18:05 <imirkin> anyways, i know it's possible

18:05 <imirkin> i might have even done it once or twice

18:06 <imirkin> it's just a fight each time

18:06 <airlied> meson --pkg-config-path ?

18:06 camus1 has quit [Read error: Connection reset by peer]

18:06 <airlied> meson configure --pkg-config-path I suppose

18:06 <zmike> I still do it the same way as autotools with the env vars

18:07 <airlied> and of course not having to deal with 32-bit llvm is an advantage :-P

18:08 <imirkin> yeah, that's what i usually opt for i think?

18:08 <emersion> > Running Starcraft 2

18:08 <imirkin> otherwise it just doesn't work

18:09 <emersion> lol

18:14 <idr> airlied: Is there a "need" for pre-SNB systems in gitlab CI?

18:14 <idr> I might have a person motherboard + CPU that I could donate to the cause... I'll have to look for it, though.

18:15 <idr> *a personal

18:15 <idr> #motherboardsarepeopletoo

18:17 iive has joined #dri-devel

18:21 <airlied> idr: I think for gitlab CI it's more of a where to host it and how much infrastructure it needs

18:22 <idr> Right.

18:22 <idr> Do we still have the ability to host machines at PSU?

18:22 <airlied> not for gitlab I don't think

18:22 <idr> mattst88 and I can drive down there and work on stuff. Not to volunteer him or anything. :)

18:23 <airlied> gitlab CI requires more hands on maint

18:23 <daniels> not really

18:23 <daniels> we don't have constant physical access to most of those machines

18:23 <airlied> or at least good remote power :-P

18:23 <daniels> yes

18:23 <daniels> very good remote power + console + ability to drive your bootloader

18:24 <daniels> 'wait for someone to drive down and press reset' is a total non-starter

18:24 <idr> Right... I was more thinking of cases where the system has major problem and, for example, needs RAM replaced.

18:25 ngcortes has joined #dri-devel

18:25 <daniels> imirkin: $PKG_CONFIG_PATH _is_ baked into the build

18:25 <idr> There are IP-enabled power switches. Is that a problem X.org could solve with money?

18:26 jhli has joined #dri-devel

18:26 <daniels> just don't buy the wrong IP-enabled power switch which periodically requires you to drive in and press its physical reset button because it's forgotten how to do fucking DHCP

18:26 lemonzest has quit [Quit: Quitting]

18:26 <idr> D'oh

18:26 <daniels> I mean, like, hypothetically that might be a problem you could face, I guess

18:26 <imirkin> daniels: ok. that must be new (as of the last time i looked at this stuff, which isn't frequent)

18:27 <daniels> imirkin: it's always been that way

18:27 <idr> This also seems like a problem that could be solved with a RPi and a couple relays.

18:27 <daniels> yeah, until your RPi falls over, etc

18:27 <daniels> anyway, the bigger issue is scale - SNB is not super quick ...

18:27 <imirkin> daniels: ok. then i was doing something wrong :)

18:28 <idr> And G45 and ILK are even worse...

18:28 <idr> But, like i915, far far fewer tests run on them.

18:28 <daniels> by comparison, the AMD Stoney Islands Chromebooks have absolutely terrible CPUs to be fair, but several years post-SNB, and even 10 of them isn't enough to keep us from having to massively decimate the test suite to keep it running

18:28 <idr> You don't have to wait for 4 billion OpenGL 4.6 or 5 trillion Vulkan tests. :)

18:28 <daniels> IIRC there are something like 30 RPi4s, and honestly they're probably not uncompetitive with SNB

18:29 <idr> What GL version do the Stoney Islands GPUs support?

18:29 xexaxo has quit [Remote host closed the connection]

18:29 <airlied> yeah getting a gm45 or ilk piglit run down to 10mins would be a bit of work

18:30 <imirkin> that's the saving grace of gen4/5 -- no GL3+ ;)

18:30 * airlied listening to his gm45/ilk machines grind their fans

18:30 <idr> I bet it's more than GL 3.3 (SNB) or GL 2.1 + many extensions (G45).

18:30 <airlied> imirkin: hey it has EXT_gpu_shader4 now :-P

18:30 <agd5f> idr, stoneyridge? 4.6

18:31 <daniels> oh yeah, for sure - maybe a better metric is the RK3399 Panfrost SoCs at ES3.1, and IIRC despite fairly heavy decimation we have something like 7 of them?

18:31 <imirkin> airlied: thankfully that has few tests

18:31 <idr> Yeah... so that will run probably 3x or 4x the tests.

18:31 <daniels> idr: except that we only run something like 1/10th of the tests to keep the runtime acceptable :P

18:31 <airlied> imirkin: once I enabled all the texture tests it takes a bit more

18:32 <idr> Fair.

18:32 <idr> Either way... it sounds like there are other problems to solve first.

18:32 <airlied> daniels: it does seem at least for the chromebooks there'd be a case to stick a real AMD gpu in a real machine alongside them to cover a bunch of other cases

18:32 <daniels> anyway, point is 1 machine definitely works for local testing (e.g. anholt has some personal ILK/PNV hooked up IIRC?), but for putting it in the mainstream tree 1 would become a capacity/reliability problem super quick

18:33 <daniels> and reading all the pings I get about CI is already super motivational as is

18:33 <idr> Still better than watching the news.

18:33 <idr> Lol

18:33 <daniels> airlied: yeah, we've thought about that on our side, but the server room is pretty physically full and surprisingly it's taking a while to get our office reconstructed to make it way larger

18:34 <daniels> so we can just about squeeze in more SBC/Chromebook, but not much more

18:34 <mdnavare> agd5f: jekstrand: If Wayland does per display buffer even in clone mode for Multi GPU case then we wont run into the buffer sharing issues right?

18:34 <daniels> (also getting a backup fibre provider after we lost all connectivity for a few days because someone severed the underground cable; looking forward to the backup provider severing the main link whilst installing the new one)

18:35 <agd5f> mdnavare, right

18:35 <daniels> mdnavare: yes it does do per-CRTC buffers for composition; in direct-scanout cases a client buffer may be used on both GPUs, but you can just refuse that and it'll fall back gracefully

18:37 <glennk> how are the slow runners scheduled? do they attempt to run on all commits, or just the most recent one each time it is free?

18:37 <mdnavare> daniels: And the existing wayland stack handles that already where it scans out from per CRTC buffer on respective displays connected to IGPU and DGPU?

18:37 <daniels> mdnavare: when composition is being used, yes

18:37 <daniels> we don't share composition target buffers between CRTCs because tearing is bad

18:38 <daniels> glennk: they run on all commits, and we decimate test coverage to bring runtime to an acceptable level

18:38 <daniels> glennk: we could do daily runs, but realistically it'd go red tomorrow and no-one would ever pay any attention to it

18:38 <glennk> from experience, i can recommend trying the opposite and run full coverage at a lower rate

18:38 <mdnavare> daniels: agd5f: So to verify this with Wayland on Gnome desktop, in a system with IGPU and DGPU, we can set join displays for extended case and mirror displays to clone?

18:39 <daniels> glennk: realistically it'd go red tomorrow and no-one would ever pay any attention to it

18:39 <daniels> it works well when you have small/dedicated/focused teams with people incentivised to care

18:39 <daniels> that is ... not Mesa

18:40 <airlied> yeah we'd totally be the gitlab CI equiv of the llvm buildbot

18:40 <agd5f> mdnavare, yes

18:41 pochu has joined #dri-devel

18:41 <glennk> daniels, i'll add to my recommendation that you would still run the quick sanity tests frequently

18:41 <airlied> I think there's likely scope for nightlies on some of the more corner case hw

18:41 <daniels> mdnavare: yep

18:41 <mdnavare> agd5f: And no specific changes needed here in Mesa or Gem right? The compositors already handle this gracefully?

18:41 <agd5f> mdnavare, yes

18:42 <daniels> glennk: yeah so we are in fact looking at doing longer post-merge runs periodically, but ... only backed up by people incentivised to care

18:42 mlankhorst has quit [Ping timeout: 480 seconds]

18:42 <glennk> nifty

18:43 xexaxo has joined #dri-devel

18:44 <mdnavare> agd5f: daniels: Also when we force render on dGPU with DRI_PRIME=1 for say a full screen gl application, in that case it will render on this buffer in lmem but now in clone or extended mode, how will the buffer be scanned out on IGPU display from smem as well as dgpu display from lmem?

18:45 <mdnavare> does the compositor just create a copy of that buffer onto which rendering happens and make that a per CRTC buffer?

18:46 <daniels> mdnavare: either KMS accepts the request to directly scan it out in which case the driver makes it happen somehow, or it denies the request to directly scan it out in which case we fall back to composition into per-CRTC buffers

18:49 <mdnavare> daniels: So when it falls back in case of prime rendering, the composition somehow accesses the rendered buffer which is on lmem?

18:50 * glennk raises hand asking what is lmem?

18:50 <mdnavare> That's where the concern is: how does it do the buffer migration between lmem and smem when prime is set to 1 and rendering is forced to happen only on lmem

18:50 <jenatali> Local memory, i.e. VRAM (guessing from context)

18:50 <mdnavare> glennk: Local memory or the Video RAM

18:50 <daniels> mdnavare: the compositor doesn't know anything about buffer placement

18:50 <glennk> can someone send intel the memo that this has been called vram since at least the 70s?

18:51 <daniels> mdnavare: the client sends the compositor a dmabuf; the compositor imports that dmabuf as an EGLImage for every EGL context it would need to be used in, and also attempts to import it as a KMS FB for every KMS context it would need to be used in

18:51 <daniels> if the buffer needs migration, then the two options are a) the driver fails the import (benign for KMS since the compositor can fall back to GPU composition, fatal for EGL since there is no fallback available), or b) magically does migration under the hood

18:54 <mdnavare> daniels: So from what I understand if our client is sending the compositor a dmabuf for the rendered buffer, then compositor should be able to access that rendered buffer in vram/lmem and scanout on IGPU display as well as direct scanout from vram to dgpu display?

18:55 <daniels> mdnavare: if you have one client sending a buffer (dmabuf) which needs to be displayed on two GPUs, it is _mandatory_ for EGLImage imports to succeed on both GPUs, else the client will be killed

18:56 <daniels> it is _optimal_ for KMS FB imports to succeed on both GPUs, but not required
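
The import rules daniels describes above can be sketched as a small decision function. This is only an illustration in Python, not compositor code: `plan_outputs` and the two dictionaries are hypothetical stand-ins for the results of real EGLImage and KMS framebuffer import attempts on each GPU.

```python
def plan_outputs(gpus, egl_import_ok, kms_import_ok):
    """Decide how a single client dmabuf reaches each GPU's display.

    All names are hypothetical; the dicts model whether the EGLImage and
    KMS FB imports succeeded on a given GPU.
    """
    plan = {}
    for gpu in gpus:
        # The EGLImage import has no fallback: the compositor must be able
        # to sample the buffer on every GPU it composites on.
        if not egl_import_ok[gpu]:
            raise RuntimeError(f"EGLImage import failed on {gpu}: client is killed")
        # A failed KMS FB import is benign: composite into a per-CRTC buffer.
        plan[gpu] = "direct-scanout" if kms_import_ok[gpu] else "gpu-composition"
    return plan


print(plan_outputs(["igpu", "dgpu"],
                   egl_import_ok={"igpu": True, "dgpu": True},
                   kms_import_ok={"igpu": True, "dgpu": False}))
```

In this model a driver that cannot migrate the buffer for scanout only costs the direct-scanout path on that GPU, while a driver that cannot import it for rendering takes the client down.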

18:58 <airlied> idr: should land those gfx5 compiler fixes as well, they don't seem to be making things worse here :-P

18:59 vivek has joined #dri-devel

18:59 <idr> airlied: I was just waiting for jekstrand... but I know he's been super busy lately.

18:59 pzanoni has quit [Ping timeout: 480 seconds]

19:02 <mdnavare> daniels: Okay, currently we see a failure to light up the display connected to the DGPU when we try join displays; it just gives a black screen. We need to look at the client logs and kernel logs to see where the modeset is failing; maybe the KMS import or the dmabuf import is failing

19:03 xexaxo has quit [Read error: Connection reset by peer]

19:03 <daniels> mdnavare: sure, that makes sense

19:03 slattann1 has joined #dri-devel

19:04 <daniels> mdnavare: Wayland compositors (with the possible exception of gamescope?) make no expectation that KMS scanout will succeed however, so as long as the atomic commit or pageflip returns with an error, we will fall back to GPU composition

19:08 <mdnavare> daniels: Hmm, maybe our driver is not returning an error there, or it's not propagating correctly to the compositor for it to fall back on GPU composition

19:08 <mdnavare> daniels: What logs can we look at to understand whats happening from the compositor end?

19:08 <mdnavare> slattann1: FYI

19:09 <mdnavare> slattann1 and others are trying to get this up and running so any debug ideas from compositor/ userspace side will be helpful to understand why the join display/mirror mode not working

19:10 <daniels> mdnavare: GNOME uses journalctl to log

19:10 <mdnavare> daniels: Actually on gnome, in mirror displays, it doesnt even show the second display connected to card 1, thats only seen in join displays

19:14 <daniels> you can also use $MUTTER_DEBUG per https://gitlab.gnome.org/GNOME/mutter/-/blob/main/src/core/util.c#L48

19:14 pzanoni has joined #dri-devel

19:15 <mdnavare> daniels: okay we will collect these logs to see where in the modeset/composition its failing

19:16 <mdnavare> daniels: either some failures in KMS handling or during composition fallback

19:16 <mdnavare> slattann1: Can we try collecting the above logs in the join displays/ mirror mode cases?

19:17 <mdnavare> daniels: We should be able to play a bit with xrandr here too right to output on a particular display or to force extended or clone modes?

19:17 <daniels> mdnavare: no, xrandr is only for X11

19:18 <mdnavare> daniels: Is there something equivalent for Mutter/wayland stack with Ubuntu 21

19:19 <daniels> mdnavare: just the GNOME control panel

19:21 <mdnavare> daniels: wonder why the mirror mode there doesn't even detect the second display on dGPU, since in the i915 display info we see that it's connected but in sys class device enabled we only see the iGPU display

19:24 <jekstrand> idr: What are you waiting on me for?

19:26 <airlied> jekstrand: ack/rb on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12191

19:28 nirmoy has quit []

19:31 <ajax> speaking of continuous tracking, is anyone looking at doing regular shaderdb stats across every driver?

19:33 <ajax> seems like drm-shim means we should be able to keep up with that

19:36 <robclark> we do some shader-db in CI using drm-shim.. not sure we keep stats, I think it is more just making sure things don't start crashing

19:37 <robclark> there is some initial trace based benchmarking on real hw that tomeu put together

19:40 <jekstrand> airlied: ack

19:45 <daniels> the trace profiling is on Grafana

19:45 <daniels> generated from https://gitlab.freedesktop.org/gfx-ci/mesa-performance-tracking

19:46 <daniels> mdnavare: might be a Mutter bug

19:47 <robclark> daniels: too bad traces don't get magically faster when you override GL_RENDERER string :-P

19:48 <daniels> shhh, we’re saving that for next quarter

19:48 <robclark> heheh

19:54 rasterman has quit [Quit: Gettin' stinky!]

20:00 slattann1 has quit []

20:01 <emersion> daniels: fwiw, gamescope has proper composition fallback, w/ vulkan compute

20:01 <emersion> it does not yet support mixed composition/planes though

20:02 <daniels> gotcha! thanks for confirming

20:05 <idr> jekstrand: Were you okay with my response to your question about the vec4 patch too?

20:05 <jekstrand> yeah

20:06 <idr> With the if-statement changed, Rb?

20:06 <jekstrand> sure

20:12 <idr> sweet

20:19 jkrzyszt has quit [Ping timeout: 480 seconds]

20:34 Daaanct12 has joined #dri-devel

20:34 <airlied> dcbaker: I created a crocus 21.2 MR with all outstanding fixes from master

20:34 <airlied> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12330

20:39 Danct12 has joined #dri-devel

20:39 <jenatali> Is there already handling for emulating RGBX with RGBA and just having Mesa disable the alpha channel for me?

20:40 Daanct12 has quit [Ping timeout: 480 seconds]

20:42 lynxeye has quit [Quit: Leaving.]

20:43 Daaanct12 has quit [Ping timeout: 480 seconds]

20:51 alyssa has joined #dri-devel

20:51 <alyssa> Is there a GLES extension to rotate the framebuffer ?

20:51 <alyssa> I see the arm blob implementing dfdx as `transform_matrix * [dfdx; dfdy]` where the matrix is a sysval

20:52 <alyssa> which seems .. well, inefficient

20:53 Company has joined #dri-devel

20:53 <imirkin> alyssa: sometimes done for fbo flipping?

20:54 <imirkin> mesa does something similar for the y component

20:54 <alyssa> imirkin: flip, ok, but rotate?

20:54 <imirkin> just flip.

20:55 gouchi has quit [Remote host closed the connection]

20:55 <alyssa> thought so

20:59 Duke`` has quit [Ping timeout: 480 seconds]

21:00 <imirkin> alyssa: i'm also unsure how things like ssaa work, perhaps it applies there with scaling?

21:00 <imirkin> alyssa: i.e. hw supports up to 8x msaa, but nvidia blob reports 32 (or 64?)

21:00 <imirkin> they do this by having "fat" pixels

21:03 <ajax> alyssa: GL_MESA_framebuffer_{flip_{x,y},_swap_xy} ?

21:04 <imirkin> there's a _swap_xy now?

21:04 <ajax> https://www.khronos.org/registry/OpenGL/extensions/MESA/MESA_framebuffer_swap_xy.txt

21:04 <imirkin> someone had fun.

21:05 <zmike> jenatali: in what context

21:05 <zmike> I don't think there's generic handling for it if you're talking about stuff like blend states

21:06 <jenatali> zmike: I have an app looking specifically for a config that's 8 bit RGB 0 bit alpha, and we don't report that because D3D12 doesn't have RGBX formats

21:06 <zmike> sounds like you should be rejecting 3-component rgb formats like all the cool kids

21:06 <jenatali> So I was wondering if Mesa would just automatically turn off blending for me if I just reported an RGBA format instead :)

21:06 <jekstrand> I think maybe it does

21:06 <jekstrand> Well, not turn off blending, but smash alpha to 1

21:07 <jenatali> Even better

21:07 <zmike> it'll give you RGBX if you don't support RGB

21:07 <jenatali> Right, but I don't have RGBX either, I want RGBA instead

21:07 <jenatali> But the app wants RGB

21:07 <imirkin> jenatali: if you support per-RT blend settings, it does

21:07 <imirkin> jenatali: if you don't, you're SOL

21:07 <jenatali> imirkin: Cool, we do

21:08 <imirkin> per-RT blend is d3d10.1 i think?

21:08 <jenatali> Well, D3D does, not sure if the Gallium backend has it hooked up

21:08 <jenatali> I'd assume so

21:08 <imirkin> it's required for GL 4.x

21:08 <jenatali> Hm, then maybe it's not hooked up, I'll take a look

21:08 <imirkin> one of the advanced blend exts

21:08 <imirkin> there's like 10 of them, i can never remember which is which

21:09 <imirkin> jenatali: PIPE_CAP_INDEP_BLEND_FUNC is enabled, so you should be good.

21:09 <jenatali> imirkin: Yeah we report PIPE_CAP_INDEP_BLEND_ENABLE

21:09 <jenatali> Awesome

21:09 <imirkin> INDEP_BLEND_ENABLE is not enough. that's part of d3d10.0 i think

21:09 <jenatali> Oh I see, independent blend functions, sure

21:09 <jenatali> To force the RGBX to use alpha of 1

21:09 <imirkin> the independent blend funcs is what you need to support weird blend settings and missing RGBX support

21:10 <imirkin> st/mesa will smash DST_ALPHA to ONE for such formats iirc

21:10 <robclark> alyssa, imirkin, swap_xy and friends were intended for android "pre-rotation" (ie. to avoid a rotate blit for things like tablets that can have 0/90/180/270 rotation).. there is a (somewhat out of date by now) mesa MR to implement it.. but there are a lot of sharp corners

21:10 <jenatali> Perfect, thanks!

21:11 <imirkin> jenatali: https://cgit.freedesktop.org/mesa/mesa/tree/src/mesa/state_tracker/st_atom_blend.c#n272

21:11 <jenatali> imirkin: Awesome, that's what I was hoping for

21:11 <imirkin> jenatali: make sure you also set PIPE_CAP_RGB_OVERRIDE_DST_ALPHA_BLEND

21:12 <imirkin> (looks like you do)
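
The blend-factor rewrite discussed above follows from treating a missing destination alpha channel as a constant 1. The sketch below is a Python model of that idea, not the code in st_atom_blend.c: the function name and the string factor names are assumptions made for illustration.

```python
def fix_xrgb_blend_factor(factor, format_has_alpha):
    """Rewrite blend factors that read destination alpha when the
    application-visible format has no alpha channel (alpha reads as 1)."""
    if format_has_alpha:
        return factor
    remap = {
        "DST_ALPHA": "ONE",            # Ad = 1
        "INV_DST_ALPHA": "ZERO",       # 1 - Ad = 0
        "SRC_ALPHA_SATURATE": "ZERO",  # min(As, 1 - Ad) = 0
    }
    return remap.get(factor, factor)


for f in ("DST_ALPHA", "INV_DST_ALPHA", "SRC_ALPHA", "ONE"):
    print(f, "->", fix_xrgb_blend_factor(f, format_has_alpha=False))
```

Because the rewrite depends on the format of each render target, it needs blend functions that can differ per render target, which is why the independent blend function cap matters in the exchange above.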

21:18 ayaka has quit [Ping timeout: 480 seconds]

21:19 ayaka has joined #dri-devel

21:19 Viciouss has quit [Quit: The Lounge - https://thelounge.chat]

21:20 Viciouss has joined #dri-devel

21:20 Viciouss has quit []

21:25 Viciouss has joined #dri-devel

21:27 iive has quit []

21:28 <alyssa> OA

21:29 <alyssa> robclark: Ah...

21:29 <alyssa> Display controllers can't do that themselves?

21:29 <robclark> *sometimes*.. and sometimes if they can it is less efficient

21:30 <emersion> intel and amd can only rotate tiled buffers iirc

21:31 <robclark> right, w/ tiled + enough buffering in the display controller you can implement rotated scanout without horrible memory access patterns

21:45 <alyssa> makes sense, yeah

21:47 shfil has quit [Quit: Konversation terminated!]

21:48 <alyssa> jekstrand: I ended up writing that fwidth optimization I mentioned

21:49 <alyssa> in the most extreme case, if my performance model is correct*, on one chip cycle count is halved with fwidth

21:52 pochu has quit [Ping timeout: 480 seconds]

21:57 ngcortes has quit [Ping timeout: 480 seconds]

22:03 <alyssa> does anybody else benefit from arbitrary sign fddx?

22:03 <alyssa> (so I know if I should suffix _mali or not)

22:09 <alyssa> Looking at actual shader-db --- almost nothing is affected

22:09 <alyssa> but there is 1 shader in chromeos that's affected, and that one shader has a 21% reduction in cycle count

22:15 vivijim has quit [Ping timeout: 480 seconds]

22:29 <imirkin> airlied: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11505 -- has anyone commented outside of gitlab?

22:47 ngcortes has joined #dri-devel

22:47 pcercuei has quit [Quit: dodo]

22:49 <imirkin> idr: perhaps you can take a glance? --^

23:01 <airlied> imirkin: I was mostly happy to agree with mareko in that

23:02 <imirkin> airlied: ah ok. so basically just claim support and take the piglit failures?

23:04 <airlied> imirkin: yup

23:09 <idr> I can try to dig through the wayback machine to formulate an opinion.

23:09 <idr> I don't know why we made the choice we made for i965, but there's better than 50% chance we had a good reason.

23:09 danvet has quit [Ping timeout: 480 seconds]

23:10 <dcbaker> airlied: I assigned that to marge

23:18 * alyssa stares at lower_fsqrt

23:19 <alyssa> does.. does the CTS not check sqrt(0)?

23:19 <idr> Lolololol

23:20 <alyssa> idr: guess that's a no :p

23:21 <idr> At least some early hardware only had frsq and frcp... and there was no NaN... so maybe on that hardware frcp(frsq(0)) does the right thing?

23:21 <alyssa> for that note

23:21 <alyssa> if we're throwing out NaN correctness, you might as well do fmul(x, frsq(x)) ...

23:21 <alyssa> unless that has precision issues

23:22 <idr> Hm... can't be worse than frcp.

23:22 <alyssa> maybe it can

23:22 <idr> Fair enough. :)

23:22 <alyssa> The precision of sqrt in the GLSL spec is defined as "inherited from 1 / inversesqrt(x)"

23:22 <alyssa> so frcp(frsq(x)) is blessed..

23:23 <alyssa> should I check 4 billion floats? eh sure

23:23 <idr> Meaning that it has to be at least as good as frcp(frsq(x)).

23:23 <alyssa> Yeah

23:23 <idr> They don't have to produce identical results.

23:25 <idr> Grepping for PIPE_SHADER_CAP_TGSI_SQRT_SUPPORTED... it looks like a bunch of GPUs don't have sqrt that probably have NaN, so that's a bummer.

23:27 <idr> Spec says inversesqrt(x) is undefined if x <= 0. I guess if those GPUs return Inf or HUGE_VAL and rcp returns zero...

23:28 <idr> Seems like a lot of ifs. :(

23:29 <idr> Some piglit tests of edge values should be interesting... 0, smallest x such that sqrt(x) is normal, and HUGE_VAL could all cause problems.

23:33 <alyssa> idr: it looks like the mul is sometimes more precise and sometimes less precise

23:35 <alyssa> wait derp

23:41 <alyssa> uhhhm

23:43 <idr> It seems like an "always safe" lowering would be (x * fmax(0, frcp(x))). That should be correct for x=0 and x=NaN.

23:43 <alyssa> It's the precision issue I'm worried about

23:44 <idr> Spec says 1/x is 2.5 ULP... multiply had better be at least that good. :)

23:44 <alyssa> https://rosenzweig.io/foo.c

23:44 <alyssa> this fails

23:47 <idr> The actual answer doesn't have to be closer to correct... it just has to be within 4.5 ULP of correct. (I think the precision is additive...)

23:48 <alyssa> Hmm.

23:48 <idr> 2 ULP for inversesqrt, and 2.5 for 1/x.

23:48 dogukan has quit [Quit: Konversation terminated!]

23:48 <imirkin_> nvidia up to moderately recent GPUs didn't have a built-in sqrt

23:49 <idr> And I suspect the compiler is transforming 1/(1/x) to just x.

23:49 <idr> But I didn't actually run the program. :)

23:50 <alyssa> ok, reading the CPU asm found a bug yep

23:51 <idr> Hm...

23:52 <idr> x*rsq(x) might not work for Inf... since rsq(Inf) is probably 0, and Inf*0 is NaN.

23:52 <imirkin_> we used to lower it like that

23:52 <imirkin_> but it doesn't handle some edge cases

23:52 <imirkin_> blob does it as rcp(rsq(x))

23:53 <imirkin_> (we even did it using the inf*0=0 mul, still some oddness iirc)

23:53 <idr> imirkin_: Any idea what rsq(0) does?

23:53 <alyssa> imirkin_: mumble

23:53 <idr> Is it Inf or NaN?

23:53 <imirkin_> (or maybe we don't have that on nv50 so it didn't work? i forget)

23:53 <alyssa> idr: Either way I have a funny 0*inf=0, 0*nan=0 mul I can use

23:53 <imirkin_> idr: not 100% sure, sorry

23:53 <imirkin_> idr: wait, we have a functional model of it

23:53 <imirkin_> let's see

23:55 <imirkin_> idr: https://github.com/envytools/envytools/blob/master/nvhw/sfu.c#L102

23:55 <imirkin_> this applies to the tesla family

23:55 <imirkin_> rsq(0) -> Inf, rsq(-0) -> -Inf

23:55 <imirkin_> i strongly doubt it'd be different on fermi and later though

23:56 <imirkin_> unless there was something specifically in DX which changed from DX10 to DX11

23:56 <idr> That's the only behavior that would make rcp(rsq(x)) Just Work.

23:57 <idr> alyssa: I guess to answer your previous question... the CTS might test sqrt(0), and the lowering might Just Work. :)
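
The edge cases in this thread can be checked with a small model. The sketch below is Python using doubles rather than GPU float32, and it assumes the rsq behaviour imirkin_ quotes (rsq(0) -> Inf, rsq(-0) -> -Inf) plus rsq(Inf) = 0; `mul_zero_wins` is a hypothetical stand-in for the "0*inf=0, 0*nan=0" multiply alyssa mentions. It says nothing about precision, only about zero, infinity and NaN handling.

```python
import math


def rsq(x):
    """Reciprocal square root: rsq(+0) = +Inf, rsq(-0) = -Inf, rsq(Inf) = 0."""
    if math.isnan(x) or x < 0.0:
        return math.nan
    if x == 0.0:
        return math.copysign(math.inf, x)
    if math.isinf(x):
        return 0.0
    return 1.0 / math.sqrt(x)


def rcp(x):
    """Reciprocal with IEEE-style results for zero and infinity."""
    if x == 0.0:
        return math.copysign(math.inf, x)
    return 1.0 / x  # 1/Inf gives 0, 1/NaN gives NaN


def mul_zero_wins(a, b):
    """Assumed non-IEEE multiply where a zero operand always yields zero."""
    return 0.0 if a == 0.0 or b == 0.0 else a * b


def sqrt_rcp_rsq(x):
    return rcp(rsq(x))


def sqrt_mul_ieee(x):
    return x * rsq(x)


def sqrt_mul_zero_wins(x):
    return mul_zero_wins(x, rsq(x))


for x in (0.0, 4.0, math.inf):
    print(x, sqrt_rcp_rsq(x), sqrt_mul_ieee(x), sqrt_mul_zero_wins(x))
```

Under these assumptions rcp(rsq(x)) is the only one of the three that handles both 0 and Inf; the IEEE multiply turns both into NaN, and the zero-wins multiply fixes 0 but maps Inf to 0.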

23:59 <imirkin_> whoa weird. looking at the ancient code, we didn't actually make use of the 0*x = 0 mul variant

23:59 <imirkin_> that's probably an oversight. i flipped it to rcp(rsq(x)) since that's what blob used though