ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
hanetzer has joined #dri-devel
mbrost has quit [Ping timeout: 480 seconds]
hanetzer has quit []
mbrost has joined #dri-devel
imirkin_ has joined #dri-devel
Lucretia has quit []
hanetzer has joined #dri-devel
<imirkin_> nir question...
<imirkin_> vec1 32 ssa_27 = deref_var &img (uniform imageBuffer)
<imirkin_> intrinsic image_deref_store (ssa_27, ssa_29, ssa_0, ssa_26, ssa_28) (0, 0, 0, 0, 0) /* image_dim=1D */ /* image_array=false */ /* format=none */ /* access=0 */ /* src_type=invalid */
<imirkin_> why does it say "image_dim=1D"?
<imirkin_> is that separate from the deref var?
<imirkin_> this continues after the gl_nir_lower_images pass which converts it into an image_store intrinsic
<imirkin_> hm, i see. glsl_to_nir explicitly calls nir_intrinsic_set_image_dim.
flto has quit [Remote host closed the connection]
flto has joined #dri-devel
<imirkin_> yeah, that fixes it.
<imirkin_> thanks, rubber duck.
khfeng has joined #dri-devel
mceier has quit [Remote host closed the connection]
mceier has joined #dri-devel
pnowack_ has joined #dri-devel
pnowack has quit [Ping timeout: 480 seconds]
<imirkin_> this makes no sense... bumping up the no-attachment fb's layers seems to help the pbo download on nvc0. but ... why. shouldn't matter. grr.
ngcortes has quit [Remote host closed the connection]
pnowack_ has quit []
<imirkin_> ah no. it does not.
<jekstrand> imirkin_: More parallelism?
<jekstrand> Oh, or that. :)
<imirkin_> so wtf did i do to make it always pass
<imirkin_> and then re-break
<imirkin_> grrrr
<imirkin_> well at least the thing that didn't make sense isn't the case anymore. that's nice.
<mareko> grrr is the correct swizzle
<imirkin_> ah yes. there ya go!
<imirkin_> th-arrr ya go, that is.
<imirkin_> arrg is another good swizzle.
<imirkin_> and the ever-popular "stab" swizzle
<imirkin_> oh wait, no. you can't mix/match. o well.
<alyssa> mareko: hehe
<imirkin_> ok, did another random thing. it randomly fixes it.
<imirkin_> sigh
<imirkin_> now i need to wait 5 minutes, since that's what "broke" my fix last time...
mbrost has quit [Ping timeout: 480 seconds]
<imirkin_> hm, passing/failing a bit more randomly now. and mostly passing. progress!
<imirkin_> hey, should it be possible to do a render which writes to an imagebuf, and then map it and expect to see shader results? what barriers, if any, would be required?
boistordu_ex has quit [Ping timeout: 480 seconds]
<mareko> CPU reads don't need any barriers
<imirkin_> that's what i was afraid of... gr
<imirkin_> i think we're forgetting to mark the buffers. oops.
<imirkin_> not sure how we've survived this long
<imirkin_> ah no. nevermind. it's there.
Company has quit [Read error: Connection reset by peer]
mbrost has joined #dri-devel
<alyssa> rrrr
JohnnyonFlame has quit [Ping timeout: 480 seconds]
gregoy has joined #dri-devel
<gregoy> would this be an appropriate place to ask a noob question?
<jekstrand> sure
<gregoy> I am trying to build (meson) d3dadapter9 and it can't find a subproject directory or llvm.wrap
<gregoy> what did I mess up
muhomor has joined #dri-devel
muhomor has quit [Remote host closed the connection]
muhomor_ has quit [Ping timeout: 480 seconds]
jessica_24 has quit [Quit: Connection closed for inactivity]
<gregoy> I understand I have to update llvm now. how do none of you cry
mbrost has quit [Remote host closed the connection]
<HdkR> gregoy: The biggest PCs possible usually
<airlied> or use a distro prepackaged llvm
<gregoy> I just thought of that, I tried finding it using apt but couldn't, found it with Synaptic
tzimmermann has joined #dri-devel
<gregoy> thx frens
<gregoy> haha still found only version 10. I'll stop bothering you guys though
Duke`` has joined #dri-devel
gregoy has quit [Remote host closed the connection]
thellstrom1 has joined #dri-devel
thellstrom has quit [Remote host closed the connection]
itoral has joined #dri-devel
macromorgan has quit [Read error: Connection reset by peer]
Duke`` has quit [Ping timeout: 480 seconds]
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
jkrzyszt has joined #dri-devel
frieder has joined #dri-devel
frieder_ has joined #dri-devel
<krh> jekstrand: yes, it's quite useful
mlankhorst has joined #dri-devel
gouchi has joined #dri-devel
frieder_ has quit []
pnowack has joined #dri-devel
camus1 has joined #dri-devel
camus has quit [Remote host closed the connection]
gouchi has quit [Quit: Quitte]
pcercuei has joined #dri-devel
rasterman has joined #dri-devel
thellstrom1 has quit [Ping timeout: 480 seconds]
thellstrom has joined #dri-devel
mattrope has quit [Read error: Connection reset by peer]
jagan_ has joined #dri-devel
lynxeye has joined #dri-devel
lemonzest has joined #dri-devel
camus1 has quit [Remote host closed the connection]
camus has joined #dri-devel
Ahuj has joined #dri-devel
danvet has joined #dri-devel
tzimmermann has quit [Quit: Leaving]
jagan_ has quit [Remote host closed the connection]
Lucretia has joined #dri-devel
pochu has joined #dri-devel
thellstrom has quit [Remote host closed the connection]
JohnnyonFlame has joined #dri-devel
K`den has joined #dri-devel
Kayden has quit [Read error: Connection reset by peer]
thellstrom has joined #dri-devel
K`den is now known as Kayden
<dv_> is it generally possible to dup a dmabuf fd with an applied offset?
<daniels> no
<dv_> for example, when one component produces single-planar frames (= all planes in one dmabuf), and another expects multi-planar frames (one dmabuf per plane)
<dv_> hm
<dv_> so, such cases can only be handled by copying the pixels?
<daniels> dup() gives you a new file descriptor (i.e. number in your process's fd table) referring to the same underlying file description (kernel data structure)
<daniels> seek position is a property of the file description
<emersion> what components are you talking about?
<daniels> you need to fix your API to take per-plane offsets
<daniels> as all the others do
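The point daniels makes about dup() can be checked with an ordinary file. This is a minimal sketch using a temporary file rather than a dmabuf (an assumption made so it runs anywhere); the helper name is hypothetical. It seeks through the duplicate and then reads the position back through the original descriptor.

```python
import os
import tempfile


def dup_shares_offset():
    """Return (offset via original fd, offset via dup'd fd) after seeking the dup."""
    with tempfile.TemporaryFile() as f:
        f.write(b"0123456789")
        f.flush()
        fd = f.fileno()
        dup_fd = os.dup(fd)  # new descriptor number, same open file description
        try:
            # Move the position through the duplicate only.
            os.lseek(dup_fd, 4, os.SEEK_SET)
            # Query the current position through each descriptor.
            via_original = os.lseek(fd, 0, os.SEEK_CUR)
            via_dup = os.lseek(dup_fd, 0, os.SEEK_CUR)
            return via_original, via_dup
        finally:
            os.close(dup_fd)
```

Both values come back equal, which is why a dup'd FD cannot carry its own per-plane offset and the offset has to travel next to the FD in the API.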
<emersion> DMA-BUF APIs need to take one FD and one offset per plane
<dv_> ahh good point
<emersion> then the FDs can refer to the same buffer object, or not
<emersion> if you're importing to an API that doesn't support multiple FDs, like Vulkan without disjoint VkImages, you can check whether all FDs refer to the same buffer object with inode numbers
<dv_> alright
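The inode check emersion describes could look roughly like the following. It is a sketch, not code from any compositor or driver: the function name is hypothetical, and it compares the device number as well as the inode, which is an extra precaution not mentioned in the conversation.

```python
import os


def same_buffer_object(fds):
    """True if every fd in `fds` refers to the same underlying file.

    Compares (st_dev, st_ino) from fstat(); an empty list yields False.
    """
    ids = {(st.st_dev, st.st_ino) for st in map(os.fstat, fds)}
    return len(ids) == 1
```

An importer that only supports a single buffer object could call this on the per-plane FDs and reject the frame (or take another path) when it returns False.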
<dv_> a second, similar case I have here (most of this is proprietary stuff unfortunately) is about using v4l2 mem2mem based video decoding when the video decoder only supports the multi-planar API
<dv_> and other bits expect decoded frames in a single-planar fashion
<emersion> ah, proprietary stuff
* emersion walks out
<dv_> if I want to import dmabufs that were allocated by something else,
<dv_> I would have to queue a v4l2_buffer with the v4l2_planes containing the same dmabuf FD N times (N = number of planes), but with different data_offset values?
<dv_> ah well the v4l2 bit is not proprietary
<dv_> the docs say: "Offset in bytes to video data in the plane. Drivers must set this field when type refers to a capture stream, applications when it refers to an output stream."
<daniels> yep, correct
<dv_> but, from what I gather, in v4l2 lingo, "capture" is the side that provides the -decoded- frames
<dv_> so .. according to the docs, I cannot set data_offset, because the driver will?
<dv_> or is this an oversight in the docs and this does not apply to mem2mem based decoding?
<daniels> it certainly applies to anything you provide as the source for m2m decoding
<daniels> I'm not sure about the dest
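What dv_ describes (the same dmabuf FD repeated once per plane, each with a different data_offset) can be written down as plain data. The sketch below is illustrative only: `PlaneRef` and `nv12_plane_refs` are hypothetical names, the NV12 layout with the chroma plane directly after the luma plane is an assumption, and nothing here talks to a V4L2 device. Whether a driver honours an application-set data_offset on the capture queue is exactly the point left open above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaneRef:
    fd: int           # dmabuf file descriptor (the same one for every plane here)
    data_offset: int  # byte offset of this plane's data inside that dmabuf


def nv12_plane_refs(fd, stride, height):
    """Describe a single-dmabuf NV12 frame as two per-plane references.

    Assumes the interleaved CbCr plane starts right after the Y plane,
    with no padding between them.
    """
    y_size = stride * height
    return [
        PlaneRef(fd=fd, data_offset=0),       # Y plane
        PlaneRef(fd=fd, data_offset=y_size),  # CbCr plane
    ]
```

For a 1920x1080 frame with a stride of 1920 bytes, the second plane lands at byte 2073600 of the same buffer.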
shfil has joined #dri-devel
mlankhorst has quit [Ping timeout: 480 seconds]
thellstrom has quit [Remote host closed the connection]
<tomeu> airlied: btw, I'm having to move some work from the record stage to the execute one, so we don't have lvp-specific stuff in the common code
<tomeu> nothing so far seems especially computation-intensive, but I guess you have a better idea of what a perf impact this will have
mlankhorst has joined #dri-devel
flacks has quit [Quit: Quitter]
flacks has joined #dri-devel
camus1 has joined #dri-devel
camus has quit [Read error: Connection reset by peer]
imre has joined #dri-devel
itoral has quit []
thellstrom has joined #dri-devel
JohnnyonFlame has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
JohnnyonFlame has joined #dri-devel
vivijim has joined #dri-devel
mlankhorst has quit [Ping timeout: 480 seconds]
mlankhorst has joined #dri-devel
sdutt_ has joined #dri-devel
sdutt has quit [Read error: Connection reset by peer]
macromorgan has joined #dri-devel
Kayden has quit [Quit: Leaving]
Kayden has joined #dri-devel
Ahuj has quit [Ping timeout: 480 seconds]
<tomeu> airlied: I think this won't end up being a problem with a void *driver_data member in the cmd structs, for drivers to cache data if that makes a difference
ezequielg has quit []
ezequielg has joined #dri-devel
alyssa has left #dri-devel [#dri-devel]
Daanct12 has joined #dri-devel
Danct12 has quit [Ping timeout: 480 seconds]
mattrope has joined #dri-devel
Duke`` has joined #dri-devel
bluebugs has quit [Remote host closed the connection]
vivek has joined #dri-devel
sdutt_ has quit []
sdutt has joined #dri-devel
nirmoy has joined #dri-devel
<ajax> jekstrand: what exactly do we want to do with i965 support in the amber branch?
jessica_24 has joined #dri-devel
<ajax> jekstrand: now that there's a crocus i'm kind of inclined to say anv is main-only and amber just cuts off i965 after gen7/chv
<jekstrand> ajax: That works for me.
<ajax> so then there's no interop question of anv from one tree and gl from another
<jekstrand> ajax: We should make sure crocus has EXT_external_objects so we don't lose functionality.
<ajax> it sets the bit
<ajax> case PIPE_CAP_MEMOBJ:
<ajax> return devinfo->ver >= 7;
<jekstrand> I feel pretty sorry for shadeslayer that he's spent so much time on it only to get cut off. It was just really poor timing. :(
<jekstrand> ajax: Sure, but has it been tested and do we know it works?
<ajax> right, no idea about testing
<ajax> yeah that really sucks
<jekstrand> hikiko, shadeslayer, tpalli: Has anyone tested crocus with EXT_external_objects?
<jekstrand> I think crocus has a much better chance of working more-or-less out-of-the-box than i965 since it's modeled on iris.
pekkari has joined #dri-devel
<ajax> alright, i'll redo the amber patch to make that happen
<jekstrand> sgtm
<jekstrand> Actually, we don't have to cut off i965 as long as we always prefer iris/crocus
<ajax> true
<jekstrand> As long as we make amber not build ANV without lots of option smashing.
<ajax> well right now amber means no gallium drivers
<imirkin> jekstrand: you probably know this, but crocus isn't nearly as production-ready as i965 is atm. it's being improved, but i don't think it's ready to be used in a distro as a default...
<jekstrand> imirkin: Yeah, that's why I kind-of think amber and/or dropping classic is a bit premature yet.
<jekstrand> I was hoping to give crocus another 6 months to bake.
<imirkin> although it's probably in a better spot than i965 was 6-7 years ago? dunno :)
<jekstrand> I also kind-of think the only way to get it fully baked is to get people using it and filing bugs.
<ajax> huh, i guess !10153 was only april
<jekstrand> The best way to do that is for a distro to try shipping it. :D
<imirkin> jekstrand: well, there are people filing bugs now
<imirkin> i don't think we need a fatter bug-pipe unless it will also come with more people looking at those bugs
<jekstrand> heh
<jekstrand> sure
<ajax> i feel pretty comfortable saying that, if 21.2.x is the amber branch, that the crocus in 21.3.0/22.0 (whichever happens first) will be plenty baked
<shfil> not sure if it's possible for mesa, but artifacts per each commit would help with testing
<ajax> so that by the time you _need_ to decide on mesa or amber for your crocus-era support, crocus will be a no-brainer
<ajax> maybe that's optimistic
<imirkin> i'd point you at the current list of crocus bugs, but gitlab is not responding
<imirkin> i think basically only airlied is looking at them
<ajax> yeah that basketball team is terrible, they only have magic johnson
<ajax> i take your point
<imirkin> well ... he is doing the work of 10 men. should his availability drop slightly to only be able to do the work of 9 men, crocus may get the axe :)
<jekstrand> Isn't the work of 10 men only 20% of airlied?
<imirkin> that's why they call it a "20% project"
<ajax> i'm just really itchy to chop classic
<ajax> the number of times i go to fix something and discover it's broken entirely differently in classic in a way that invalidates how i wanted to fix it the first time...
<jekstrand> ajax: You're not alone there
<ajax> and if the cost of chopping that off is a somewhat rough support experience for people running hot-off-the-presses mesa on nine year old silicon... sure?
<jekstrand> Ugh... Just came across param compaction again. Let's nuke i965!
<jekstrand> *grumble*
ezequielg has quit []
<imirkin> jekstrand: if you guys are all itching to make i965 go away, maybe you can put some elbow grease towards fixing crocus?
<hikiko> jekstrand, no, I was planning to but I need to make a few fixes to run Vulkan first (I use crocus on a FreeBSD laptop, not on Linux)
ezequielg has joined #dri-devel
<imirkin> mostly gen4/4.5 issues, although could just be a side-effect of this being the hw the people testing have
alyssa has joined #dri-devel
<alyssa> ERROR: No suitable configuration found for GL config rgba8888d24s8ms0 at glcTestRunner.cpp:341
<alyssa> does cts-runner not do surfaceless?
<imirkin> cts-runner does what you tell it to do
<imirkin> how did you build it?
<alyssa> with DEQP_TARGET=surfaceless
<alyssa> normal deqp runs ok but requires a boatload of options:
<imirkin> did you pass those to cts-runner?
<alyssa> EGL_PLATFORM=surfaceless ./deqp-gles* --deqp-surface-type=pbuffer --deqp-visibility=hidden --deqp-gl-config-name=rgba8888d24s8ms0 --deqp-surface-width=256 --deqp-surface-height=256
<imirkin> i actually haven't really used cts-runner much.
<alyssa> cts-runner doesn't have options for that stuff
<imirkin> why pbuffer surface type?
<alyssa> dunno
<imirkin> heh
<ajax> because surfaceless doesn't have any other kind?
<daniels> the thing about surfaceless is that it has no surfaces
<alyssa> daniels: pikachu
<jekstrand> imirkin: Yeah, so there's a gallium HUD issue, Gimp, OpenMW, and the rest are Nine. I don't think I care about Nine at the moment.
<ajax> it has no windows, really
<imirkin> jekstrand: yeah, don't worry about nine :)
<daniels> ajax: nor pixmaps
<imirkin> jekstrand: a bunch of those are in various stages of fixing
<ajax> but i thought there was --deqp-surface-type=fbo ?
<imirkin> jekstrand: but as they get fixed, more get opened :)
<imirkin> basically two guys have been filing the majority of the issues (Wladislav and Angelo). they seem to have been running into problems fairly "early" into their testing/usage
<imirkin> i'm guessing more will come as the "obvious" stuff gets fixed
<imirkin> jekstrand: also some sort of CI to track regressions would be great. i think this has been discussed in the past, but i'm not aware of any clear action on that point.
<jekstrand> imirkin: I think we're running it in CI
<alyssa> so err... should I be building CTS for X11 or Wayland instead?
<imirkin> ah cool
khfeng has quit [Ping timeout: 480 seconds]
pekkari has quit [Quit: Konversation terminated!]
<imirkin> jekstrand: do you know where?
<jekstrand> imirkin: Maybe we're not yet
<hikiko> jekstrand, I was actually wondering about the state of Vulkan on GPUs that can run crocus. I suppose they support 1.0, but do they also support the extensions for external memory and external memory capabilities?
<jekstrand> Yes, they should
<hikiko> hmm, ok I'll try to run Vulkan on FreeBSD and check this then, I had some issues with the loader so far but I think/hope they aren't hard to fix :)
xexaxo has quit [Ping timeout: 480 seconds]
mbrost has joined #dri-devel
<robclark> ajax: there is --deqp-surface-type=fbo but last I looked deqp had a lot of fbo vs window/pbuffer bugs (which admittedly gl doesn't make easy because of small diffs btwn fbo vs !fbo)
frieder has quit [Remote host closed the connection]
<zmike> iris/anv don't seem to work with pbuffer+surfaceless
<imirkin> alyssa: i always build x11_egl_glx
<zmike> posted a bug about that a while ago
<imirkin> it avoids me having to debug CTS
<imirkin> i assume that's what everyone else is doing
<robclark> In CI we are using surfaceless except for egl tests, iirc? So I guess surfaceless gets the most regular usage
<imirkin> (and when i say "everyone else", i mean "the people who use/develop CTS")
<jekstrand> imirkin: I can't repro the GALLIUM_HUD bug on my ILK. :(
<imirkin> jekstrand: i think airlied fixed that specific issue, and it was on gen4/45
<imirkin> oh wait no
<imirkin> wrong bug
<imirkin> just kidding
<imirkin> this one's on SNB. i should be able to check it out i guess.
<imirkin> jekstrand: wait. which issue? 5120? that one's gen4
<imirkin> but then the person had more issues
<imirkin> and since filing a new bug is an extremely expensive operation, opted to add follow-on things to the same bug
<alyssa> jekstrand: do you understand khronos conformance rules?
<alyssa> because I've been building the latest opengl cts tag for the past hour
<alyssa> and just now realize there's a separate *much older* opengl es cts tag
<alyssa> am I building something that's too new?
<alyssa> (for GLES conformance)
<imirkin> the rule is simple... pay money to khronos, get conformance.
<alyssa> imirkin: heh
<alyssa> building opengl-cts-4.6.1.0
<alyssa> (from Jul 1)
<jekstrand> imirkin: Yes, more-or-less.
<alyssa> but on Feb 4 there was opengl-es-cts-3.2.7.0
<alyssa> and I'm trying for OpenGL ES 3.1 conformance
<alyssa> complicating matters, I think a bug fix was merged in May that applies to us.
<imirkin> the more money you pay, the more exceptions you get
<imirkin> for enough money, you get to just write the tests
* zmike submits gallium noop for conformance
<imirkin> alyssa: afaik you can apply patches on top of cts
<imirkin> and just mention that you did so in your submission
<alyssa> imirkin: i just don't want to resume this build since this is so painful on my little board
<alyssa> and it's almost done
<imirkin> alyssa: look into cross-compiles?
<alyssa> I mean the right answer at this point is to build from the M1 but yeah :p
<alyssa> Anyway I errrr
<alyssa> I can't tell if using an OpenGL CTS release for an OpenGL ES submission is valid. These rules are complicated.
<jekstrand> alyssa: What API are you wanting to get conformance on? GL or ES?
nroberts has quit [Ping timeout: 480 seconds]
<jekstrand> alyssa: GL, ES, and Vulkan are effectively different projects with different release cycles that live in the same repo.
ceyusa has quit [Ping timeout: 480 seconds]
* airlied has the fix for qt fonts on gen4 but wanted to try an alternate
<airlied> that still allows blt usage
<alyssa> jekstrand: ES
<alyssa> ES3.1
<jekstrand> Then use an opengl-es-cts tag
<daniels> (the latest one)
gouchi has joined #dri-devel
<alyssa> well starting a fresh build from scratch then.
<imirkin> airlied: one approach is to just push the change as you have and file a "investigate later" type of bug
vivek has quit [Ping timeout: 480 seconds]
ngcortes has joined #dri-devel
<airlied> imirkin: indeed, I found it late in the day, then took a day off :-P
<imirkin> airlied: yeah, didn't intend it as a criticism :)
<airlied> meanwhile my CL CTS run on llvmpipe is running conversions tests for all the day off
<imirkin> no rest for the computer?
<ajax> we taught the sand to think, we didn't teach it to slack off
ngcortes has quit [Remote host closed the connection]
dogukan has joined #dri-devel
<imirkin> ah, so the deep learning hasn't learned yet...
<dcbaker> ajax: but we do slack off, and we taught the sand, so...
<imirkin> (i guess that's why they call it learn*ing*...)
xexaxo has joined #dri-devel
dogukan has quit [Quit: Konversation terminated!]
dogukan has joined #dri-devel
<FLHerne> robclark: https://arstechnica.com/gadgets/2021/04/pixel-5-sees-dramatically-improved-gpu-performance-after-april-patch/ <- I was wondering, is this related to it using Freedreno before/after/both?
<FLHerne> or just blob-driver screwiness?
<robclark> the android phones are (so far) still using blob driver.. but I suppose there is also lots of room for devfreq to screw up performance
dogukan has quit []
dogukan has joined #dri-devel
dogukan has quit []
dogukan has joined #dri-devel
dogukan has quit []
dogukan has joined #dri-devel
<airlied> for crocus I'm reasonably happy on ivb/hsw, snb is still a bit of a pita, and gen4/5 is mostly just small diffs from 965 that I'm ironing out, and then you have weirdness like the idr compiler fixes :-P
<imirkin> airlied: the intel ci bits never came through right?
<airlied> imirkin: I think they were running but having some issues
<imirkin> ah ok
<imirkin> do you remember offhand what the issues are with snb?
<imirkin> is xfb still broken?
<airlied> yeah snb still has the deqp hang
<imirkin> bleh
<airlied> I think I know why it happens with crocus not 965, but how to fix it is still no clue
<airlied> it's nearly at the beg someone to figure out SNB simulator
dogukan has quit []
<FLHerne> robclark: thanks
* robclark not actually aware of *what* the perf improvement was for the p5's.. I'd actually be a bit curious if it was userspace or kernel (or something outside of the gfx driver stack)
<airlied> okay posted the gen4/5 blt corruption fix
<imirkin> airlied: any issues on snb that i could look at? i.e. not the hang? :)
<airlied> imirkin: I think the biggest missing piece on gen6 is push constant handling
pochu has quit [Ping timeout: 480 seconds]
dogukan has joined #dri-devel
<airlied> since we want to use the ubo push constant lowering not the old uniform compaction code
<imirkin> airlied: does i965 have it?
<airlied> imirkin: no
<airlied> i965 uses the old uniform compaction code
<airlied> lols the one gitlab issue mentions crocus on snb, but crocus works where 965 hangs :-P
<ajax> sounds like it's production-quality to me
<imirkin> yea i'm not debugging that one.
* airlied is nine avoidant also
<airlied> at some point I suppose I should work out how to install it
<imirkin> i'll try to check out the nine issues
<imirkin> i couldn't repro anything with Xnine
<imirkin> the problem is that i probably need a 32-bit build
<imirkin> and we dropped autotools
<ajax> why is autotools related
<imirkin> because autotools makes cross-compile / 32-bit builds easy
<imirkin> whereas it's a substantial fight (which i generally lose) each time with meson
<zmike> it's easy with meson too
<zmike> fire up your copy/paste and get in there
<imirkin> don't think i have a llvm-config-32
camus has joined #dri-devel
<imirkin> nor a pkg-config-32
<zmike> yea me neither, just change to use lib paths instead
alyssa has left #dri-devel [#dri-devel]
<zmike> e.g., PKG_CONFIG_PATH=/usr/lib or /usr/lib32 or whatever
<imirkin> yeah, but then it's not baked into the build iirc
<zmike> shrug
<imirkin> there is no "configure PKG_CONFIG_PATH=/bla" anymore
<zmike> just a guy over here debugging 32bit steam games every day on a meson build
<imirkin> anyways, i know it's possible
<imirkin> i might have even done it once or twice
<imirkin> it's just a fight each time
<airlied> meson --pkg-config-path ?
camus1 has quit [Read error: Connection reset by peer]
<airlied> meson configure --pkg-config-path I suppose
<zmike> I still do it the same way as autotools with the env vars
<airlied> and of course not having to deal with 32-bit llvm is an advantage :-P
<imirkin> yeah, that's what i usually opt for i think?
<emersion> > Running Starcraft 2
<imirkin> otherwise it just doesn't work
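The two approaches mentioned here (the PKG_CONFIG_PATH environment variable zmike uses, and the --pkg-config-path option airlied points at) can be put side by side. This sketch only builds the command line and environment; it does not run meson. The helper name and the /usr/lib32/pkgconfig directory are assumptions, since the right directory varies by distro, and a real 32-bit build also needs a matching compiler setup and LLVM, which are not covered here.

```python
import os


def meson_32bit_setup(build_dir, pc_dir="/usr/lib32/pkgconfig", use_env=True):
    """Return (argv, env) for a `meson setup` that looks for 32-bit .pc files.

    use_env=True  -> pass the directory via the PKG_CONFIG_PATH environment variable
    use_env=False -> pass it via meson's --pkg-config-path option
    """
    env = dict(os.environ)
    argv = ["meson", "setup", build_dir]
    if use_env:
        env["PKG_CONFIG_PATH"] = pc_dir
    else:
        argv += ["--pkg-config-path", pc_dir]
    return argv, env
```

The result could be handed to `subprocess.run(argv, env=env, check=True)` from the source tree.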
<emersion> lol
<idr> airlied: Is there a "need" for pre-SNB systems in gitlab CI?
<idr> I might have a person motherboard + CPU that I could donate to the cause... I'll have to look for it, though.
<idr> *a personal
<idr> #motherboardsarepeopletoo
iive has joined #dri-devel
<airlied> idr: I think for gitlab CI it's more of a where to host it and how much infrastructure it needs
<idr> Right.
<idr> Do we still have the ability to host machines at PSU?
<airlied> not for gitlab I don't think
<idr> mattst88 and I can drive down there and work on stuff. Not to volunteer him or anything. :)
<airlied> gitlab CI requires more hands on maint
<daniels> not really
<daniels> we don't have constant physical access to most of those machines
<airlied> or at least good remote power :-P
<daniels> yes
<daniels> very good remote power + console + ability to drive your bootloader
<daniels> 'wait for someone to drive down and press reset' is a total non-starter
<idr> Right... I was more thinking of cases where the system has major problem and, for example, needs RAM replaced.
ngcortes has joined #dri-devel
<daniels> imirkin: $PKG_CONFIG_PATH _is_ baked into the build
<idr> There are IP-enabled power switches. Is that a problem X.org could solve with money?
jhli has joined #dri-devel
<daniels> just don't buy the wrong IP-enabled power switch which periodically requires you to drive in and press its physical reset button because it's forgotten how to do fucking DHCP
lemonzest has quit [Quit: Quitting]
<idr> D'oh
<daniels> I mean, like, hypothetically that might be a problem you could face, I guess
<imirkin> daniels: ok. that must be new (as of the last time i looked at this stuff, which isn't frequent)
<daniels> imirkin: it's always been that way
<idr> This also seems like a problem that could be solved with a RPi and a couple relays.
<daniels> yeah, until your RPi falls over, etc
<daniels> anyway, the bigger issue is scale - SNB is not super quick ...
<imirkin> daniels: ok. then i was doing something wrong :)
<idr> And G45 and ILK are even worse...
<idr> But, like i915, far far fewer tests run on them.
<daniels> by comparison, the AMD Stoney Islands Chromebooks have absolutely terrible CPUs to be fair, but several years post-SNB, and even 10 of them isn't enough to keep us from having to massively decimate the test suite to keep it running
<idr> You don't have to wait for 4 billion OpenGL 4.6 or 5 trillion Vulkan tests. :)
<daniels> IIRC there are something like 30 RPi4s, and honestly they're probably not uncompetitive with SNB
<idr> What GL version do the Stoney Islands GPUs support?
xexaxo has quit [Remote host closed the connection]
<airlied> yeah getting a gm45 or ilk piglit run down to 10mins would be a bit of work
<imirkin> that's the saving grace of gen4/5 -- no GL3+ ;)
* airlied listening to his gm45/ilk machines grind their fans
<idr> I bet it's more than GL 3.3 (SNB) or GL 2.1 + many extensions (G45).
<airlied> imirkin: hey it has EXT_gpu_shader4 now :-P
<agd5f> idr, stoneyridge? 4.6
<daniels> oh yeah, for sure - maybe a better metric is the RK3399 Panfrost SoCs at ES3.1, and IIRC despite fairly heavy decimation we have something like 7 of them?
<imirkin> airlied: thankfully that has few tests
<idr> Yeah... so that will run probably 3x or 4x the tests.
<daniels> idr: except that we only run something like 1/10th of the tests to keep the runtime acceptable :P
<airlied> imirkin: once I enabled all the texture tests it takes a bit more
<idr> Fair.
<idr> Either way... it sounds like there are other problems to solve first.
<airlied> daniels: it does seem at least for the chromebooks there'd be a case to stick a real AMD gpu in a real machine alongside them to cover a bunch of other cases
<daniels> anyway, point is 1 machine definitely works for local testing (e.g. anholt has some personal ILK/PNV hooked up IIRC?), but for putting it in the mainstream tree 1 would become a capacity/reliability problem super quick
<daniels> and reading all the pings I get about CI is already super motivational as is
<idr> Still better than watching the news.
<idr> Lol
<daniels> airlied: yeah, we've thought about that on our side, but the server room is pretty physically full and surprisingly it's taking a while to get our office reconstructed to make it way larger
<daniels> so we can just about squeeze in more SBC/Chromebook, but not much more
<mdnavare> agd5f: jekstrand: If Wayland does per display buffer even in clone mode for Multi GPU case then we won't run into the buffer sharing issues right?
<daniels> (also getting a backup fibre provider after we lost all connectivity for a few days because someone severed the underground cable; looking forward to the backup provider severing the main link whilst installing the new one)
<agd5f> mdnavare, right
<daniels> mdnavare: yes it does do per-CRTC buffers for composition; in direct-scanout cases a client buffer may be used on both GPUs, but you can just refuse that and it'll fall back gracefully
<glennk> how are the slow runners scheduled? do they attempt to run on all commits, or just the most recent one each time it is free?
<mdnavare> daniels: And the existing wayland stack handles that already where it scans out from per CRTC buffer on respective displays connected to IGPU and DGPU?
<daniels> mdnavare: when composition is being used, yes
<daniels> we don't share composition target buffers between CRTCs because tearing is bad
<daniels> glennk: they run on all commits, and we decimate test coverage to bring runtime to an acceptable level
<daniels> glennk: we could do daily runs, but realistically it'd go red tomorrow and no-one would ever pay any attention to it
<glennk> from experience, i can recommend trying the opposite and run full coverage at a lower rate
<mdnavare> daniels: agd5f: So to verify this with Wayland on Gnome desktop, in a system with IGPU and DGPU, we can set join displays for extended case and mirror displays to clone?
<daniels> glennk: realistically it'd go red tomorrow and no-one would ever pay any attention to it
<daniels> it works well when you have small/dedicated/focused teams with people incentivised to care
<daniels> that is ... not Mesa
<airlied> yeah we'd totally be the gitlab CI equiv of the llvm buildbot
<agd5f> mdnavare, yes
pochu has joined #dri-devel
<glennk> daniels, i'll add to my recommendation that you would still run the quick sanity tests frequently
<airlied> I think there's likely scope for nightlies on some of the more corner case hw
<daniels> mdnavare: yep
<mdnavare> agd5f: And no specific changes needed here in Mesa or Gem right? The compositors already handle this gracefully?
<agd5f> mdnavare, yes
<daniels> glennk: yeah so we are in fact looking at doing longer post-merge runs periodically, but ... only backed up by people incentivised to care
mlankhorst has quit [Ping timeout: 480 seconds]
<glennk> nifty
xexaxo has joined #dri-devel
<mdnavare> agd5f: daniels: Also when we force render on dGPU with DRI_PRIME=1 for say a full screen gl application, in that case it will render on this buffer in lmem but now in clone or extended mode, how will the buffer be scanned out on IGPU display from smem as well as dgpu display from lmem?
<mdnavare> does the compositor just create a copy of that buffer onto which rendering happens and make that a per CRTC buffer?
<daniels> mdnavare: either KMS accepts the request to directly scan it out in which case the driver makes it happen somehow, or it denies the request to directly scan it out in which case we fall back to composition into per-CRTC buffers
<mdnavare> daniels: So when it falls back in case of prime rendering, the composition somehow accesses the rendered buffer which is on lmem?
* glennk raises hand asking what is lmem?
<mdnavare> That's where the concern is: how does it do the buffer migration between lmem and smem when prime is set to 1 and rendering is forced to happen only on lmem
<jenatali> Local memory, i.e. VRAM (guessing from context)
<mdnavare> glennk: Local memory or the Video RAM
<daniels> mdnavare: the compositor doesn't know anything about buffer placement
<glennk> can someone send intel the memo that this has been called vram since at least the 70s?
<daniels> mdnavare: the client sends the compositor a dmabuf; the compositor imports that dmabuf as an EGLImage for every EGL context it would need to be used in, and also attempts to import it as a KMS FB for every KMS context it would need to be used in
<daniels> if the buffer needs migration, then the two options are a) the driver fails the import (benign for KMS since the compositor can fall back to GPU composition, fatal for EGL since there is no fallback available), or b) magically does migration under the hood
<mdnavare> daniels: So from what I understand if our client is sending the compositor a dmabuf for the rendered buffer, then compositor should be able to access that rendered buffer in vram/lmem and scanout on IGPU display as well as direct scanout from vram to dgpu display?
<daniels> mdnavare: if you have one client sending a buffer (dmabuf) which needs to be displayed on two GPUs, it is _mandatory_ for EGLImage imports to succeed on both GPUs, else the client will be killed
<daniels> it is _optimal_ for KMS FB imports to succeed on both GPUs, but not required
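[editor's sketch] The import policy daniels describes above can be illustrated roughly as follows. This is a minimal sketch, not compositor code: `plan_presentation`, `try_egl_import` and `try_kms_import` are hypothetical names, and the two callables simply stand in for whatever the driver reports when asked to import the dmabuf.

```python
from typing import Callable, Dict, List


def plan_presentation(
    gpus: List[str],
    try_egl_import: Callable[[str], bool],  # hypothetical: True if EGLImage import works
    try_kms_import: Callable[[str], bool],  # hypothetical: True if KMS FB import works
) -> Dict[str, str]:
    """Decide, per GPU, how one client dmabuf would be shown."""
    plan = {}
    for gpu in gpus:
        # EGLImage import is mandatory: there is no fallback if it fails.
        if not try_egl_import(gpu):
            raise RuntimeError(f"EGLImage import failed on {gpu}: no fallback")
        # KMS FB import is optional: failure only costs direct scanout.
        plan[gpu] = "direct-scanout" if try_kms_import(gpu) else "gpu-composition"
    return plan


# Example: both GPUs can sample the buffer, only the dGPU can scan it out.
example_plan = plan_presentation(
    ["igpu", "dgpu"],
    try_egl_import=lambda gpu: True,
    try_kms_import=lambda gpu: gpu == "dgpu",
)
print(example_plan)
```

In this toy model a failed KMS import is benign (that output is composited on the GPU), while a failed EGL import is fatal, which matches the mandatory/optimal split stated above.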
<airlied> idr: should land those gfx5 compiler fixes as well, they don't seem to be making things worse here :-P
vivek has joined #dri-devel
<idr> airlied: I was just waiting for jekstrand... but I know he's been super busy lately.
pzanoni has quit [Ping timeout: 480 seconds]
<mdnavare> daniels: Okay, currently we see a failure to light up the display connected to the dGPU when we try join displays; it just gives a black screen. We need to look at the client logs and kernel logs to see where the modeset is failing; maybe the KMS import or the dmabuf import is failing
xexaxo has quit [Read error: Connection reset by peer]
<daniels> mdnavare: sure, that makes sense
slattann1 has joined #dri-devel
<daniels> mdnavare: Wayland compositors (with the possible exception of gamescope?) make no expectation that KMS scanout will succeed however, so as long as the atomic commit or pageflip returns with an error, we will fall back to GPU composition
<mdnavare> daniels: Hmm, maybe our driver is not returning an error there, or it's not propagating correctly to the compositor for it to fall back to GPU composition
<mdnavare> daniels: What logs can we look at to understand what's happening from the compositor end?
<mdnavare> slattann1: FYI
<mdnavare> slattann1 and others are trying to get this up and running, so any debug ideas from the compositor/userspace side will be helpful to understand why the join display/mirror mode is not working
<daniels> mdnavare: GNOME uses journalctl to log
<mdnavare> daniels: Actually on GNOME, in mirror displays, it doesn't even show the second display connected to card 1; that's only seen in join displays
pzanoni has joined #dri-devel
<mdnavare> daniels: okay we will collect these logs to see where in the modeset/composition its failing
<mdnavare> daniels: either some failures in KMS handling or during composition fallback
<mdnavare> slattann1: Can we try collecting the above logs in the join displays/ mirror mode cases?
<mdnavare> daniels: We should be able to play a bit with xrandr here too right to output on a particular display or to force extended or clone modes?
<daniels> mdnavare: no, xrandr is only for X11
<mdnavare> daniels: Is there something equivalent for the Mutter/Wayland stack with Ubuntu 21?
<daniels> mdnavare: just the GNOME control panel
<mdnavare> daniels: wonder why the mirror mode there doesn't even detect the second display on the dGPU, since in the i915 display info we see that it's connected but in sys class device enabled we only see the iGPU display
<jekstrand> idr: What are you waiting on me for?
nirmoy has quit []
<ajax> speaking of continuous tracking, is anyone looking at doing regular shaderdb stats across every driver?
<ajax> seems like drm-shim means we should be able to keep up with that
<robclark> we do some shader-db in CI using drm-shim.. not sure we keep stats, I think it is more just making sure things don't start crashing
<robclark> there is some initial trace based benchmarking on real hw that tomeu put together
<jekstrand> airlied: ack
<daniels> the trace profiling is on Grafana
<daniels> mdnavare: might be a Mutter bug
<robclark> daniels: too bad traces don't get magically faster when you override GL_RENDERER string :-P
<daniels> shhh, we’re saving that for next quarter
<robclark> heheh
rasterman has quit [Quit: Gettin' stinky!]
slattann1 has quit []
<emersion> daniels: fwiw, gamescope has proper composition fallback, w/ vulkan compute
<emersion> it does not yet support mixed composition/planes though
<daniels> gotcha! thanks for confirming
<idr> jekstrand: Were you okay with my response to your question about the vec4 patch too?
<jekstrand> yeah
<idr> With the if-statement changed, Rb?
<jekstrand> sure
<idr> sweet
jkrzyszt has quit [Ping timeout: 480 seconds]
Daaanct12 has joined #dri-devel
<airlied> dcbaker: I created a crocus 21.2 MR with all outstanding fixes from master
Danct12 has joined #dri-devel
<jenatali> Is there already handling for emulating RGBX with RGBA and just having Mesa disable the alpha channel for me?
Daanct12 has quit [Ping timeout: 480 seconds]
lynxeye has quit [Quit: Leaving.]
Daaanct12 has quit [Ping timeout: 480 seconds]
alyssa has joined #dri-devel
<alyssa> Is there a GLES extension to rotate the framebuffer ?
<alyssa> I see the arm blob implementing dfdx as `transform_matrix * [dfdx; dfdy]` where the matrix is a sysval
<alyssa> which seems .. well, inefficient
Company has joined #dri-devel
<imirkin> alyssa: sometimes done for fbo flipping?
<imirkin> mesa does something similar for the y component
<alyssa> imirkin: flip, ok, but rotate?
<imirkin> just flip.
gouchi has quit [Remote host closed the connection]
<alyssa> thought so
Duke`` has quit [Ping timeout: 480 seconds]
<imirkin> alyssa: i'm also unsure how things like ssaa work, perhaps it applies there with scaling?
<imirkin> alyssa: i.e. hw supports up to 8x msaa, but nvidia blob reports 32 (or 64?)
<imirkin> they do this by having "fat" pixels
<ajax> alyssa: GL_MESA_framebuffer_{flip_{x,y},_swap_xy} ?
<imirkin> there's a _swap_xy now?
<imirkin> someone had fun.
<zmike> jenatali: in what context
<zmike> I don't think there's generic handling for it if you're talking about stuff like blend states
<jenatali> zmike: I have an app looking specifically for a config that's 8 bit RGB 0 bit alpha, and we don't report that because D3D12 doesn't have RGBX formats
<zmike> sounds like you should be rejecting 3-component rgb formats like all the cool kids
<jenatali> So I was wondering if Mesa would just automatically turn off blending for me if I just reported an RGBA format instead :)
<jekstrand> I think maybe it does
<jekstrand> Well, not turn off blending, but smash alpha to 1
<jenatali> Even better
<zmike> it'll give you RGBX if you don't support RGB
<jenatali> Right, but I don't have RGBX either, I want RGBA instead
<jenatali> But the app wants RGB
<imirkin> jenatali: if you support per-RT blend settings, it does
<imirkin> jenatali: if you don't, you're SOL
<jenatali> imirkin: Cool, we do
<imirkin> per-RT blend is d3d10.1 i think?
<jenatali> Well, D3D does, not sure if the Gallium backend has it hooked up
<jenatali> I'd assume so
<imirkin> it's required for GL 4.x
<jenatali> Hm, then maybe it's not hooked up, I'll take a look
<imirkin> one of the advanced blend exts
<imirkin> there's like 10 of them, i can never remember which is which
<imirkin> jenatali: PIPE_CAP_INDEP_BLEND_FUNC is enabled, so you should be good.
<jenatali> imirkin: Yeah we report PIPE_CAP_INDEP_BLEND_ENABLE
<jenatali> Awesome
<imirkin> INDEP_BLEND_ENABLE is not enough. that's part of d3d10.0 i think
<jenatali> Oh I see, independent blend functions, sure
<jenatali> To force the RGBX to use alpha of 1
<imirkin> the independent blend funcs is what you need to support weird blend settings and missing RGBX support
<imirkin> st/mesa will smash DST_ALPHA to ONE for such formats iirc
<robclark> alyssa, imirkin, swap_xy and friends were intended for android "pre-rotation" (ie. to avoid a rotate blit for things like tablets that can have 0/90/180/270 rotation).. there is a (somewhat out of date by now) mesa MR to implement it.. but there are a lot of sharp corners
<jenatali> Perfect, thanks!
<jenatali> imirkin: Awesome, that's what I was hoping for
<imirkin> jenatali: make sure you also set PIPE_CAP_RGB_OVERRIDE_DST_ALPHA_BLEND
<imirkin> (looks like you do)
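[editor's sketch] The idea of "smashing DST_ALPHA to ONE" discussed above can be illustrated like this. It is a toy model only, not st/mesa code: the `Factor` enum and `fix_factor_for_missing_alpha` are hypothetical names. The only mapping taken from the conversation is DST_ALPHA to ONE; mapping INV_DST_ALPHA to ZERO is an assumption that follows from treating destination alpha as always 1.

```python
from enum import Enum


class Factor(Enum):
    # Hypothetical subset of blend factors, for illustration only.
    ZERO = "zero"
    ONE = "one"
    SRC_ALPHA = "src_alpha"
    DST_ALPHA = "dst_alpha"
    INV_DST_ALPHA = "inv_dst_alpha"


def fix_factor_for_missing_alpha(factor: Factor) -> Factor:
    """Rewrite a blend factor for an RGBX surface emulated with RGBA.

    The app believes there is no alpha channel, so destination alpha
    must behave as if it were always 1.0.
    """
    if factor is Factor.DST_ALPHA:
        return Factor.ONE   # dst.a == 1
    if factor is Factor.INV_DST_ALPHA:
        return Factor.ZERO  # 1 - dst.a == 0 (assumption, see lead-in)
    return factor           # factors that never read dst.a are untouched


fixed = [fix_factor_for_missing_alpha(f) for f in Factor]
print([f.name for f in fixed])
```

Because the rewrite depends on the format of each render target, it needs blend functions that can differ per target, which is why independent blend functions (not just independent blend enable) come up above.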
ayaka has quit [Ping timeout: 480 seconds]
ayaka has joined #dri-devel
Viciouss has quit [Quit: The Lounge - https://thelounge.chat]
Viciouss has joined #dri-devel
Viciouss has quit []
Viciouss has joined #dri-devel
iive has quit []
<alyssa> OA
<alyssa> robclark: Ah...
<alyssa> Display controllers can't do that themselves?
<robclark> *sometimes*.. and sometimes if they can it is less efficient
<emersion> intel and amd can only rotate tiled buffers iirc
<robclark> right, w/ tiled + enough buffering in the display controller you can implement rotated scanout without horrible memory access patterns
<alyssa> makes sense, yeah
shfil has quit [Quit: Konversation terminated!]
<alyssa> jekstrand: I ended up writing that fwidth optimization I mentioned
<alyssa> in the most extreme case, if my performance model is correct*, on one chip cycle count is halved with fwidth
pochu has quit [Ping timeout: 480 seconds]
ngcortes has quit [Ping timeout: 480 seconds]
<alyssa> does anybody else benefit from arbitrary sign fddx?
<alyssa> (so I know if I should suffix _mali or not)
<alyssa> Looking at actual shader-db --- almost nothing is affected
<alyssa> but there is 1 shader in chromeos that's affected, and that one shader has a 21% reduction in cycle count
vivijim has quit [Ping timeout: 480 seconds]
<imirkin> airlied: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11505 -- has anyone commented outside of gitlab?
ngcortes has joined #dri-devel
pcercuei has quit [Quit: dodo]
<imirkin> idr: perhaps you can take a glance? --^
<airlied> imirkin: I was mostly happy to agree with mareko in that
<imirkin> airlied: ah ok. so basically just claim support and take the piglit failures?
<airlied> imirkin: yup
<idr> I can try to dig through the wayback machine to formulate an opinion.
<idr> I don't know why we made the choice we made for i965, but there's better than 50% chance we had a good reason.
danvet has quit [Ping timeout: 480 seconds]
<dcbaker> airlied: I assigned that to marge
* alyssa stares at lower_fsqrt
<alyssa> does.. does the CTS not check sqrt(0)?
<idr> Lolololol
<alyssa> idr: guess that's a no :p
<idr> At least some early hardware only had frsq and frcp... and there was no NaN... so maybe on that hardware frcp(frsq(0)) does the right thing?
<alyssa> for that note
<alyssa> if we're throwing out NaN correctness, you might as well do fmul(x, frsq(x)) ...
<alyssa> unless that has precision issues
<idr> Hm... can't be worse than frcp.
<alyssa> maybe it can
<idr> Fair enough. :)
<alyssa> The precision of sqrt in the GLSL spec is defined as "inherited from 1 / inversesqrt(x)"
<alyssa> so frcp(frsq(x)) is blessed..
<alyssa> should I check 4 billion floats? eh sure
<idr> Meaning that it has to be at least as good as frcp(frsq(x)).
<alyssa> Yeah
<idr> They don't have to produce identical results.
<idr> Grepping for PIPE_SHADER_CAP_TGSI_SQRT_SUPPORTED... it looks like a bunch of GPUs don't have sqrt that probably have NaN, so that's a bummer.
<idr> Spec says inversesqrt(x) is undefined if x <= 0. I guess if those GPUs return Inf or HUGE_VAL and rcp returns zero...
<idr> Seems like a lot of ifs. :(
<idr> Some piglit tests of edge values should be interesting... 0, smallest x such that sqrt(x) is normal, and HUGE_VAL could all cause problems.
<alyssa> idr: it looks like the mul is sometimes more precise and sometimes less precise
<alyssa> wait derp
<alyssa> uhhhm
<idr> It seems like an "always safe" lowering would be (x * fmax(0, frcp(x))). That should be correct for x=0 and x=NaN.
<alyssa> It's the precision issue I'm worried about
<idr> Spec says 1/x is 2.5 ULP... multiply had better be at least that good. :)
<alyssa> this fails
<idr> The actual answer doesn't have to be closer to correct... it just has to be within 4.5 ULP of correct. (I think the precision is additive...)
<alyssa> Hmm.
<idr> 2 ULP for inversesqrt, and 2.5 for 1/x.
dogukan has quit [Quit: Konversation terminated!]
<imirkin_> nvidia up to moderately recent GPUs didn't have a built-in sqrt
<idr> And I suspect the compiler is transforming 1/(1/x) to just x.
<idr> But I didn't actually run the program. :)
<alyssa> ok, reading the CPU asm found a bug yep
<idr> Hm...
<idr> x*rsq(x) might not work for Inf... since rsq(Inf) is probably 0, and Inf*0 is NaN.
<imirkin_> we used to lower it like that
<imirkin_> but it doesn't handle some edge cases
<imirkin_> blob does it as rcp(rsq(x))
<imirkin_> (we even did it using the inf*0=0 mul, still some oddness iirc)
<idr> imirkin_: Any idea what rsq(0) does?
<alyssa> imirkin_: mumble
<idr> Is it Inf or NaN?
<imirkin_> (or maybe we don't have that on nv50 so it didn't work? i forget)
<alyssa> idr: Either way I have a funny 0*inf=0, 0*nan=0 mul I can use
<imirkin_> idr: not 100% sure, sorry
<imirkin_> idr: wait, we have a functional model of it
<imirkin_> let's see
<imirkin_> this applies to the tesla family
<imirkin_> rsq(0) -> Inf, rsq(-0) -> -Inf
<imirkin_> i strongly doubt it'd be different on fermi and later though
<imirkin_> unless there was something specifically in DX which changed from DX10 to DX11
<idr> That's the only behavior that would make rcp(rsq(x)) Just Work.
<idr> alyssa: I guess to answer your previous question... the CTS might test sqrt(0), and the lowering we have just might Just Work. :)
<imirkin_> whoa weird. looking at the ancient code, we didn't actually make use of the 0*x = 0 mul variant
<imirkin_> that's probably an oversight. i flipped it to rcp(rsq(x)) since that's what blob used though
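[editor's sketch] The special-case behaviour of the sqrt lowerings discussed above can be compared with a small model. This is a sketch under stated assumptions, not any driver's code: `rsq(0) = Inf` and `rsq(-0) = -Inf` follow what imirkin_ reports for the tesla functional model, `rsq(Inf) = 0` is idr's "probably", and `mul_zero_wins` is a hypothetical stand-in for the "0*x = 0" multiply. It uses Python doubles, so it says nothing about float32 precision or ULP bounds.

```python
import math


def rcp(x: float) -> float:
    """Reciprocal with IEEE-style special cases: 1/±0 = ±Inf, 1/±Inf = ±0."""
    if math.isnan(x):
        return math.nan
    if x == 0.0:
        return math.copysign(math.inf, x)
    return 1.0 / x


def rsq(x: float) -> float:
    """Reciprocal square root; special cases are assumptions (see lead-in)."""
    if math.isnan(x) or x < 0.0:
        return math.nan
    if x == 0.0:
        return math.copysign(math.inf, x)  # rsq(0) = Inf, rsq(-0) = -Inf
    if math.isinf(x):
        return 0.0
    return 1.0 / math.sqrt(x)


def mul_zero_wins(a: float, b: float) -> float:
    """Hypothetical multiply where a zero operand always yields 0."""
    return 0.0 if a == 0.0 or b == 0.0 else a * b


def sqrt_rcp_rsq(x: float) -> float:
    return rcp(rsq(x))


def sqrt_mul_rsq(x: float) -> float:
    return x * rsq(x)  # IEEE multiply: 0*Inf and Inf*0 are NaN


def sqrt_mul_zero_wins(x: float) -> float:
    return mul_zero_wins(x, rsq(x))


for x in (0.0, 4.0, math.inf):
    print(x, sqrt_rcp_rsq(x), sqrt_mul_rsq(x), sqrt_mul_zero_wins(x))
```

Under these assumptions only `rcp(rsq(x))` gives 0 for x=0 and Inf for x=Inf; the plain multiply produces NaN in both cases, and the zero-wins multiply fixes x=0 but turns x=Inf into 0.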