#dri-devel on 2022-04-19 — irc logs at oftc.irclog.whitequark.org

2022-03-22 11:57 ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar

00:00 icecream95 has joined #dri-devel

00:01 nchery is now known as Guest2248

00:01 nchery has joined #dri-devel

00:07 Guest2248 has quit [Ping timeout: 480 seconds]

00:09 slattann has joined #dri-devel

00:21 co1umbarius has joined #dri-devel

00:22 columbarius has quit [Ping timeout: 480 seconds]

00:29 alatiera5 has joined #dri-devel

00:30 alatiera has quit [Ping timeout: 480 seconds]

00:35 ybogdano has quit [Ping timeout: 480 seconds]

00:44 mdnavare has quit [Remote host closed the connection]

00:45 mdnavare has joined #dri-devel

00:55 TD-Linux has quit []

00:57 TD-Linux has joined #dri-devel

01:01 <jekstrand> karolherbst: Ugh... This rebase isn't working. :-/

01:04 lemonzest has quit [Quit: WeeChat 3.4]

01:04 <jekstrand> Ok, got it rebased by deleting some things

01:07 * jekstrand installs rustup

01:10 <jekstrand> karolherbst: Doesn't build with latest rust. :(

01:13 * jekstrand is confused

01:13 slattann has quit []

01:13 Daanct12 has joined #dri-devel

01:16 <karolherbst> jekstrand: ehh.. you need to change rust_std

01:17 <karolherbst> meson configure -Drust_std=2021

01:18 <karolherbst> ported to 2018, because that won't require "extern crate" decls anymore

01:18 <karolherbst> ported to 2021, because you won't have to import TryFrom/TryInto anymore

01:21 * jekstrand needs to update meson, apparently

01:23 TD-Linux_ has joined #dri-devel

01:27 TD-Linux has quit []

01:27 <jekstrand> Ok, meson updated. Rust updated.

01:31 TD-Linux_ has left #dri-devel [#dri-devel]

01:56 <karolherbst> jekstrand: not sure about rust 2021 though as it's quite new and the benefits are weaker than for 2018 (which also introduced futures)

01:57 <karolherbst> though rust 2021 has disjoint captures, which might come in handy

02:01 <jekstrand> Given that it's new, I think we can require as recent a version of Rust as we want

02:03 <karolherbst> yeah.. it also doesn't have this toolchain mess like C/C++ do

02:04 <airlied> yay aco builds a shader without dying, now to figure out how to launch it

02:05 elongbug has joined #dri-devel

02:08 <jekstrand> airlied: \o/

02:13 * jekstrand kicks off a full run and hopes it doesn't torch his kernel

02:13 <jekstrand> karolherbst: I've got a NUC showing up on Thursday so I'll have a separate test machine and can maybe do some kernel hacking if needed.

02:32 jimjams has joined #dri-devel

02:33 anarsoul has quit [Ping timeout: 480 seconds]

02:33 anarsoul has joined #dri-devel

02:47 Daaanct12 has joined #dri-devel

02:51 heat has quit [Ping timeout: 480 seconds]

02:52 Danct12 has quit [Ping timeout: 480 seconds]

03:00 maxzor has quit [Ping timeout: 480 seconds]

03:02 ngcortes has quit [Ping timeout: 480 seconds]

03:37 aravind has joined #dri-devel

03:49 <jekstrand> karolherbst: Pass 2144 Fails 26 Crashes 1 Timeouts 0

03:53 mclasen has quit [Ping timeout: 480 seconds]

03:57 <jekstrand> karolherbst: Weirdly, the UNORM image tests seem to pass even with NORMALIZED

03:57 <jekstrand> idk what's different

03:58 <jekstrand> Might be a hint, though.

04:03 * jekstrand is so glad CL doesn't have border color...

04:08 elongbug_ has joined #dri-devel

04:08 elongbug has quit [Read error: Connection reset by peer]

04:13 rgallaispou has quit [Ping timeout: 480 seconds]

04:14 rgallaispou has joined #dri-devel

04:16 <Kayden> neat!

04:29 slattann has joined #dri-devel

04:29 <airlied> okay, writes to one dword of a buffer, now to write to the second one

04:36 Duke`` has joined #dri-devel

04:36 elongbug__ has joined #dri-devel

04:39 shankaru has joined #dri-devel

04:43 elongbug_ has quit [Ping timeout: 480 seconds]

04:49 danvet has joined #dri-devel

04:55 rgallaispou has quit [Read error: Connection reset by peer]

04:57 rgallaispou has joined #dri-devel

05:04 jewins1 has joined #dri-devel

05:04 jewins has quit [Remote host closed the connection]

05:06 tlwoerner_ has joined #dri-devel

05:10 tlwoerner has quit [Ping timeout: 480 seconds]

05:15 khfeng has joined #dri-devel

05:17 shankaru has quit [Quit: Leaving.]

05:22 itoral has joined #dri-devel

05:26 <dschuermann> airlied: cool :D

05:28 <dschuermann> airlied: variable workgroup size should be easy, but aco will have to assume some max workgroup size which restricts occupancy

05:30 <dschuermann> you might just pass max workgroup size in case of variable to make it work

05:31 <dschuermann> err, it doesn't restrict occupancy, quite the opposite: it can create spilling as it needs to be able to launch potentially lots of waves
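
A rough illustration of dschuermann's point: if the workgroup size is only known at dispatch time, the compiler has to assume the largest possible size, and all waves of that workgroup must be resident on one compute unit at once. That caps the registers each wave may use and can force spilling. The sketch assumes GCN-like numbers (wave64, 4 SIMDs per CU, 256 VGPRs per lane per SIMD) and ignores allocation granularity, SGPRs and LDS; the constants and function names are illustrative, not taken from ACO.

```c
/* Assumed GCN-like parameters; real values depend on the chip generation. */
#define WAVE_SIZE      64u  /* lanes per wave (wave64) */
#define SIMDS_PER_CU   4u   /* SIMDs per compute unit */
#define VGPRS_PER_SIMD 256u /* VGPRs available per lane on one SIMD */

static unsigned div_round_up(unsigned a, unsigned b)
{
   return (a + b - 1) / b;
}

/* Upper bound on VGPRs per wave so that every wave of a workgroup with
 * `wg_size` invocations fits on one CU at the same time. Ignores allocation
 * granularity, SGPRs and LDS. */
static unsigned max_vgprs_per_wave(unsigned wg_size)
{
   unsigned waves = div_round_up(wg_size ? wg_size : 1, WAVE_SIZE);
   unsigned waves_per_simd = div_round_up(waves, SIMDS_PER_CU);
   return VGPRS_PER_SIMD / waves_per_simd;
}
```

Under these assumptions a 1024-invocation maximum leaves 64 VGPRs per wave, while a 256-invocation workgroup keeps the full 256, which is why passing a tighter maximum from clover would help.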

05:32 shankaru has joined #dri-devel

05:35 <airlied> dschuermann, jekstrand, karolherbst : aco just executed its first opencl kernel :-P

05:37 <dschuermann> \o/

05:38 <airlied> my tree is a wasteland, but it does run a basic test

05:40 jewins1 has quit [Ping timeout: 480 seconds]

05:42 sadlerap has quit [Ping timeout: 480 seconds]

05:42 sadlerap has joined #dri-devel

05:42 <airlied> https://gitlab.freedesktop.org/airlied/mesa/-/commits/radeonsi-aco-clover

05:45 <dschuermann> airlied: there are two options: either clover can pass max_workgroup_size in case of variable, or it already calculates a suitable max depending on the required shared memory (otherwise, aco will have to do that)

06:01 sarnex has quit [Quit: Quit]

06:01 sarnex has joined #dri-devel

06:05 airlied has quit [Remote host closed the connection]

06:05 sdutt has quit [Read error: Connection reset by peer]

06:05 airlied has joined #dri-devel

06:08 dj-death has joined #dri-devel

06:24 mvlad has joined #dri-devel

06:27 Duke`` has quit [Ping timeout: 480 seconds]

06:27 alanc has quit [Remote host closed the connection]

06:28 alanc has joined #dri-devel

06:33 fxkamd has quit []

06:42 lemonzest has joined #dri-devel

06:42 shankaru has quit [Quit: Leaving.]

06:56 <neonking> hello everyone o/ i just made a mesa 22.0.1 build with libglvnd enabled but i'm not really getting the point of libglvnd

06:57 <neonking> as i am running on BSD, i'm not expecting to run binary proprietary drivers, so is it really worth it ?

06:57 tzimmermann has joined #dri-devel

07:00 <neonking> also, the point of my mesa build was to make a wayland-enabled build so i disabled x11 related options, which makes me wonder even more whether libglvnd is necessary||useful ?

07:10 jkrzyszt has joined #dri-devel

07:10 nchery has quit [Read error: Connection reset by peer]

07:13 JohnnyonF has joined #dri-devel

07:14 i-garrison has quit []

07:15 i-garrison has joined #dri-devel

07:15 jfalempe has joined #dri-devel

07:18 shankaru has joined #dri-devel

07:20 JohnnyonFlame has quit [Ping timeout: 480 seconds]

07:21 nchery has joined #dri-devel

07:21 itoral has quit [Remote host closed the connection]

07:22 itoral has joined #dri-devel

07:27 jkrzyszt has quit [Remote host closed the connection]

07:27 ahajda has joined #dri-devel

07:29 itoral has quit [Remote host closed the connection]

07:30 itoral has joined #dri-devel

08:09 dj-death has quit [Ping timeout: 480 seconds]

08:14 rpigott has quit [Ping timeout: 480 seconds]

08:18 rasterman has joined #dri-devel

08:18 Gorg has quit [Read error: Connection reset by peer]

08:31 itoral has quit [Remote host closed the connection]

08:31 itoral has joined #dri-devel

08:38 itoral has quit [Remote host closed the connection]

08:38 itoral has joined #dri-devel

08:39 rpigott has joined #dri-devel

08:41 jkrzyszt has joined #dri-devel

08:42 maxzor has joined #dri-devel

08:46 itoral has quit [Remote host closed the connection]

08:47 itoral has joined #dri-devel

08:48 itoral has quit [Remote host closed the connection]

08:48 itoral has joined #dri-devel

08:49 shankaru has quit [Quit: Leaving.]

08:50 pcercuei has joined #dri-devel

08:50 itoral has quit [Remote host closed the connection]

08:50 itoral has joined #dri-devel

08:54 thellstrom has joined #dri-devel

08:57 apinheiro has joined #dri-devel

09:05 Daanct12 has quit [Quit: Leaving]

09:10 dj-death has joined #dri-devel

09:13 itoral has quit [Remote host closed the connection]

09:14 itoral has joined #dri-devel

09:19 dviola has joined #dri-devel

09:30 i-garrison has quit [Read error: Connection reset by peer]

09:31 i-garrison has joined #dri-devel

09:31 shankaru has joined #dri-devel

09:38 <bbrezillon> kusma, jenatali: any objection to merging !15911? I have a bunch of other MRs depending on this one...

09:45 itoral has quit [Remote host closed the connection]

09:46 itoral has joined #dri-devel

09:48 <jenatali> bbrezillon: ugh notifications got disabled for me on that one. I can re-review in a few hours when I'm awake for real

09:52 itoral has quit [Remote host closed the connection]

09:53 <kusma> bbrezillon: looks reasonable enough to me...

09:53 itoral has joined #dri-devel

09:54 itoral has quit [Remote host closed the connection]

09:55 itoral has joined #dri-devel

09:56 <bbrezillon> jenatali: np, it can wait a few more hours ;-)

09:56 <bbrezillon> kusma: thanks

09:59 rkanwal has joined #dri-devel

10:04 maxzor has quit [Ping timeout: 480 seconds]

10:08 jimjams has quit [Quit: Connection closed for inactivity]

10:20 flacks has quit [Quit: Quitter]

10:21 flacks has joined #dri-devel

10:28 icecream95 has quit [Ping timeout: 480 seconds]

10:29 pallavim has joined #dri-devel

10:32 itoral has quit [Remote host closed the connection]

10:33 itoral has joined #dri-devel

10:34 mclasen has joined #dri-devel

10:35 jkrzyszt has quit [Remote host closed the connection]

10:36 alatiera5 has quit []

10:36 alatiera5 has joined #dri-devel

10:37 jkrzyszt has joined #dri-devel

10:37 alatiera5 is now known as alatiera

10:48 <HdkR> Anyone know if there were any DRM ioctl additions between kernel 5.16 and current 5.18-rc3? I've been busy this last month and haven't had the time to look.

10:49 <HdkR> I'm sort of guessing virtio had some changes, but other than that?

10:55 shankaru has quit [Quit: Leaving.]

11:02 Company has joined #dri-devel

11:03 The_Company has joined #dri-devel

11:05 The_Company has quit []

11:05 Company has quit []

11:05 Company has joined #dri-devel

11:10 shankaru has joined #dri-devel

11:30 slattann has quit [Ping timeout: 480 seconds]

11:37 slattann has joined #dri-devel

11:37 rgallaispou1 has joined #dri-devel

11:39 rgallaispou has quit [Remote host closed the connection]

11:41 shankaru has quit []

11:42 shankaru has joined #dri-devel

11:42 MajorBiscuit has joined #dri-devel

11:46 itoral has quit [Remote host closed the connection]

11:47 itoral has joined #dri-devel

11:48 itoral has quit [Remote host closed the connection]

11:49 itoral has joined #dri-devel

11:52 itoral has quit [Remote host closed the connection]

11:53 itoral has joined #dri-devel

11:55 itoral has quit [Remote host closed the connection]

11:55 itoral has joined #dri-devel

12:00 itoral has quit [Remote host closed the connection]

12:01 itoral has joined #dri-devel

12:01 lemonzest has quit [Quit: WeeChat 3.4]

12:02 itoral has quit [Remote host closed the connection]

12:03 itoral has joined #dri-devel

12:03 FireBurn has quit [Quit: Konversation terminated!]

12:04 <karolherbst> jekstrand: yeah... dunno if 26 fails are better or not as with 128 read only images you get a full profile and some things might get tested in more depth

12:04 <karolherbst> but if you have something which works, I can take a look and check

12:05 itoral has quit [Remote host closed the connection]

12:06 <karolherbst> airlied: nice

12:06 itoral has joined #dri-devel

12:08 itoral has quit [Remote host closed the connection]

12:09 itoral has joined #dri-devel

12:12 <alyssa> Dumb floating point question

12:13 <alyssa> Is f2f16(f_32(f2f32(x), f2f32(y))) necessarily equal to f_16(x, y)?

12:13 <alyssa> (where f is fsub in this case but I don't know if that matters)

12:13 <alyssa> tarceri_: ^ investigating the bump shaders

12:13 <karolherbst> alyssa: it's not, question is, if it matters

12:15 <alyssa> karolherbst: context is maybe https://mesa.pages.freedesktop.org/-/mesa/-/jobs/21395833/artifacts/results/summary/results/trace@gl-panfrost-t860@glmark2@bump_bump-render=height.trace.html

12:15 <karolherbst> like imagine x and y being in this "normal in f32 but subnormal in f16 domain"

12:15 <karolherbst> alyssa: f_f32 stands for any float op, correct?

12:15 <daniels> jenatali: hmm, I'm not sure when this started happening, but windows-vs2019 builds are now pulling & building zlib patches with every build, rather than it being available system-wide ... is that new or am I just hallucinating?

12:16 <karolherbst> anyway.. precision matters for floats and stuff can be weird. but GL generally doesn't care

12:16 <jenatali> daniels: not new AFAIK

12:16 <karolherbst> things can get even weirder if you have 1 / 0.00000000x (which is 0.0 in f16, but not 0 in f32 and the fp16 op generates NaN)

12:17 <daniels> jenatali: fair enough

12:17 <jenatali> daniels: we could (and maybe should) add it to the container rather than pulling it at build time

12:18 itoral has quit []

12:18 <daniels> jenatali: I only noticed because pulling it appears to be heroically slow

12:18 <daniels> like, about as long as the entire Mesa compilation

12:18 <daniels> (could just be a one-off)

12:18 <alyssa> karolherbst: aaah, right..

12:18 <jenatali> Ooh fun

12:18 <karolherbst> alyssa: but we do a lot of optimizations with floats where the result does change, but things are fine, because people still see the "correct" thing

12:19 <alyssa> yeah but some of these could cause unnoticeable differences that would cause a trace job to fail, yeah?

12:19 <karolherbst> that's why all of this is quite hard to CI, because things do change, and people have to check if anything broke

12:19 <karolherbst> alyssa: sure, that's why people check the results

12:19 <jenatali> daniels: of course adding it to the container requires rebuilding llvm currently, which is also heroically slow

12:20 <alyssa> tarceri_: ^^ I think you should update the checksum, then

12:20 <alyssa> Slight scheduling differences lead to differences in FP16 vs FP32 use in that shader, and apparently that's not ok

12:20 <karolherbst> uhh

12:21 <karolherbst> scheduling as in "it affects future opts"?

12:21 <karolherbst> because normally scheduling itself shouldn't affect that stuff

12:21 <karolherbst> or at least I would assume it doesn't

12:22 <alyssa> karolherbst: I thought the hw had a "fp16 or fp32" slot

12:22 <alyssa> actually it's an fp32-only slot, that lets you fold in f2f32 on the inputs and f2f16 on the output for free

12:22 <karolherbst> ahh

12:22 <karolherbst> "fun"

12:23 <alyssa> which means the scheduler is turning "fsub16(x, y)" into "f2f16(fsub32(f2f32(x), f2f32(y)))" and while that's legal in GLES it's not necessarily the same

12:23 <karolherbst> yeah..

12:23 <karolherbst> for GL that's probably totally fine

12:23 <karolherbst> not sure about CL :) but you can also just say you support round to nearest but none of that denormal business

12:24 <karolherbst> not sure how well that would work for the fp16 ext though

12:26 <alyssa> if we wanted conformant CL on this chip (we don't), we'd disable this optimization

12:27 <alyssa> or rather, gate it on !exact and pipe through exact into the IR

12:27 <karolherbst> yeah.. probably

12:27 <alyssa> ^backend IR

12:27 <alyssa> tomeu: It does raise some questions about the validity of tracie on t860

12:28 <tomeu> well, right now what it does is to warn you of a possibly unintended change in rendering

12:28 <tomeu> it cannot figure by itself if the change is "good" or "bad"

12:28 <alyssa> sure, I know what the behaviour is, but it raises questions about whether it's acceptable to have midgard+tracie+fp16 in pre-merge CI

12:29 <tomeu> we will need a morality ISA extension in CPUs before then

12:29 <alyssa> (if correct changes to common NIR passes can change things in such a way that the scheduler does something slightly different and then the checksum changes imperceptibly)

12:29 <tomeu> well, how does the value of the capacity to know of unintended changes in rendering compare to the hassle of reacting to them?

12:30 <daniels> well, we could force it to be no-fp16, but then it's only testing things people don't run

12:31 <alyssa> tomeu: TBD.

12:31 <alyssa> This particular issue wasn't on my radar when we started doing trace testing, and in all that time, this is the first time it's caused a possible issue

12:32 <alyssa> So maybe it's irrelevant

12:32 <alyssa> Maybe we want fuzzy image comparisons (like dEQP) instead of straight up checksums, though...?

12:32 <tomeu> hard to say, up to the maintainer of the driver I would say

12:33 <tomeu> we can do fuzzy without much problems, but then we are just changing the problem slightly, but not the fundamental problem

12:33 <alyssa> (In the case under discussion, tracie already says 0 failed pixels and it's unclear what changed at all to my human eyes.)

12:33 <tomeu> but if we can reduce the hassle while maintaining the value, then it could be a good thing to do

12:33 <tomeu> yeah, sorry, that's a bug I should fix

12:33 <alyssa> what bug..?

12:34 <tomeu> the JS that compares the image has a tolerance value, but the job fails without any tolerance

12:34 <alyssa> Ahh, right
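
To illustrate the difference tomeu describes between the tolerant image comparison and the exact check: a checksum flags any change to any byte, while a fuzzy comparison only counts pixels whose channels differ by more than a threshold. This is a minimal sketch assuming tightly packed RGBA8 images of equal size. The function names are hypothetical, and FNV-1a only stands in for whatever checksum tracie actually uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not tracie's code: counts pixels of two RGBA8 images
 * (same dimensions) where any channel differs by more than `tolerance`. */
static size_t count_failed_pixels(const uint8_t *a, const uint8_t *b,
                                  size_t num_pixels, unsigned tolerance)
{
   size_t failed = 0;
   for (size_t i = 0; i < num_pixels; i++) {
      for (size_t c = 0; c < 4; c++) {
         int diff = abs((int)a[i * 4 + c] - (int)b[i * 4 + c]);
         if ((unsigned)diff > tolerance) {
            failed++;
            break; /* count each pixel at most once */
         }
      }
   }
   return failed;
}

/* Exact comparison, which is effectively what a checksum does.
 * 64-bit FNV-1a is used here as a stand-in hash. */
static uint64_t fnv1a64(const uint8_t *data, size_t len)
{
   uint64_t h = 0xcbf29ce484222325ull; /* FNV offset basis */
   for (size_t i = 0; i < len; i++) {
      h ^= data[i];
      h *= 0x100000001b3ull; /* FNV prime */
   }
   return h;
}
```

An image that differs by one unit in a single channel changes the checksum but reports zero failed pixels with a tolerance of 1, which matches the "0 failed pixels, job still fails" situation above.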

12:55 tony[m]12 has joined #dri-devel

12:56 Daaanct12 is now known as Danct12

12:58 sdutt has joined #dri-devel

12:59 tony[m]12 has left #dri-devel [#dri-devel]

12:59 sdutt has quit []

12:59 sdutt has joined #dri-devel

13:00 rgallaispou has joined #dri-devel

13:02 <neonking> gentle ping about some pointer to libglvnd role ?

13:02 <neonking> maybe i'm asking the wrong place?

13:03 rgallaispou1 has quit [Ping timeout: 480 seconds]

13:03 agd5f has quit [Read error: Connection reset by peer]

13:04 agd5f has joined #dri-devel

13:09 shankaru has quit []

13:10 tlwoerner_ has quit []

13:11 tlwoerner has joined #dri-devel

13:14 Emmy_ has quit [Remote host closed the connection]

13:16 Emmy_ has joined #dri-devel

13:24 tonyk has quit []

13:25 tonyk has joined #dri-devel

13:34 alyssa has left #dri-devel [#dri-devel]

14:06 <marex> robertfoss: thanks for the lt9211 review

14:07 <robertfoss> marex: no worries, sorry about being a bit slow to get started.

14:07 <marex> robertfoss: no worries

14:07 <robertfoss> marex: it's in better shape than most drivers :)

14:08 <marex> MESA-LOADER: failed to open swrast: /usr/lib/dri/swrast_dri.so: cannot open shared object file: Permission denied (search paths /usr/lib/dri, suffix _dri)

14:08 <marex> what ?

14:08 <marex> chromium what kind of oddity is this new

14:08 <marex> *now

14:11 Haaninjo has joined #dri-devel

14:12 neonking has quit [Remote host closed the connection]

14:13 lemonzest has joined #dri-devel

14:14 alarumbe has joined #dri-devel

14:15 jewins has joined #dri-devel

14:26 neonking has joined #dri-devel

14:27 markyacoub has left #dri-devel [#dri-devel]

14:32 ella-0_ has joined #dri-devel

14:35 <jekstrand> airlied: \o/

14:35 ella-0 has quit [Read error: Connection reset by peer]

14:36 <ajax> neonking: the other thing glvnd (will) get used for is maintaining the drivers for pre-dx9 hardware like first-gen radeons and stuff

14:37 <ajax> neonking: but yeah, if you're building your own everything anyway then glvnd doesn't win you anything, you can safely build without it

14:39 <ajax> neonking: that said, i'm pretty sure you can get nvidia drivers for freebsd still, so if that's the bsd in question and you're planning on sharing this build and/or its recipe with others, maybe keep it

14:43 <neonking> ajax, thanks for the detailed answer ! I'm running OpenBSD here, so unfortunately proprietary drivers aren't a thing. Trying to get a wayland enabled mesa here :)

14:43 fxkamd has joined #dri-devel

14:45 <jekstrand> karolherbst: I got ahold of a friend on the CL team at Intel who's going to see if their driver does something funny for those normalized sampling tests.

14:46 <karolherbst> cool :)

14:46 <karolherbst> jekstrand: do you have a branch with all those sampler/image changes btw? I want to clean up my branch a little

14:50 alyssa has joined #dri-devel

14:51 <alyssa> jekstrand: I kind of want to make nir_block_worklist generic and stick it in util/ (or compiler/util/)

14:51 <jekstrand> karolherbst: rusticl/wip

14:51 <alyssa> I keep open coding work list data structures, wrongly.

14:53 <alyssa> it just treats nir_block as a black box *except* for ->index

14:54 <alyssa> but maybe some macro abomination can workaround that

14:54 <alyssa> `u_block_worklist_push_head(w, block, nir_block, index)`

14:54 <alyssa> util/list style

14:54 <alyssa> and then just `#define nir_block_worklist_push_head(w, block) u_block_worklist_push_head(w, block, nir_block, index)`

14:54 <alyssa> again like we do with util/list
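
One way the util/list-style idea could look: a FIFO of opaque element pointers plus a per-index "present" flag, so the only thing the worklist needs to know about an element is the name of its unsigned index field. Everything here (`u_worklist_*`, `my_block`) is a hypothetical sketch that borrows jekstrand's naming suggestion; it is not necessarily the API Mesa ended up with.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical generic worklist: a FIFO ring buffer of element pointers plus
 * a per-index "present" flag, so each element is queued at most once. */
struct u_worklist {
   void **queue;   /* ring buffer of element pointers */
   bool *present;  /* present[index] is true while the element is queued */
   unsigned size;  /* capacity; also the number of distinct indices */
   unsigned start; /* ring position of the head */
   unsigned count; /* number of queued elements */
};

static bool u_worklist_init(struct u_worklist *w, unsigned size)
{
   w->queue = calloc(size, sizeof(*w->queue));
   w->present = calloc(size, sizeof(*w->present));
   w->size = size;
   w->start = 0;
   w->count = 0;
   return w->queue != NULL && w->present != NULL;
}

static void u_worklist_fini(struct u_worklist *w)
{
   free(w->queue);
   free(w->present);
}

static bool u_worklist_is_empty(const struct u_worklist *w)
{
   return w->count == 0;
}

/* Every index is queued at most once, so count never exceeds size. */
static void u_worklist_push_tail_idx(struct u_worklist *w, void *elem,
                                     unsigned index)
{
   if (w->present[index])
      return;
   w->queue[(w->start + w->count) % w->size] = elem;
   w->count++;
   w->present[index] = true;
}

/* Caller must check u_worklist_is_empty() first. */
static void *u_worklist_pop_head_off(struct u_worklist *w, size_t index_offset)
{
   void *elem = w->queue[w->start];
   w->start = (w->start + 1) % w->size;
   w->count--;
   unsigned index = *(const unsigned *)((const char *)elem + index_offset);
   w->present[index] = false;
   return elem;
}

/* Type-generic wrappers in the style of util/list: `field` names the index
 * member, which must be an unsigned. `elem` is evaluated twice. */
#define u_worklist_push_tail(w, elem, field) \
   u_worklist_push_tail_idx((w), (elem), (elem)->field)
#define u_worklist_pop_head(w, type, field) \
   ((type *)u_worklist_pop_head_off((w), offsetof(type, field)))

/* Hypothetical stand-in for nir_block, plus thin wrappers. */
struct my_block {
   unsigned index;
};
#define my_block_worklist_push_tail(w, b) u_worklist_push_tail(w, b, index)
#define my_block_worklist_pop_head(w) \
   u_worklist_pop_head(w, struct my_block, index)
```

The per-IR wrappers are the only place that knows the element type, in the same way `list_for_each_entry` is told the type and member name at each call site.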

14:56 <alyssa> that data structure makes liveness analysis a lot nicer, for example

14:57 <alyssa> my drivers use sets which is dubious and bit me last night, ir3 is lazy about progress and thus does way more work than it should,

14:57 <jekstrand> alyssa: I guess I'm fine with it.

14:57 <jekstrand> alyssa: I'd call it u_worklist, not u_block_worklist, though. No reason why it needs to be blocks. :)

14:57 <alyssa> lima/ppir is lazy like ir3

14:58 <alyssa> v3d ""

14:58 <jekstrand> alyssa: The only question I have is if we should assume a uint32 index or if we should have a u_worklist_link and then have nir_block have a `union { struct u_worklist_link worklist_link; uint32_t index; };`

14:59 <alyssa> honestly dunno what aco does I can't read aco code

14:59 khfeng has quit [Ping timeout: 480 seconds]

15:00 <alyssa> I think intel is lazy, at least I don't see an explicit worklist

15:01 <alyssa> ...point is, lots of drivers would benefit from u_worklist :-p

15:01 <dschuermann> we usually just use an index and have the blocks strictly enumerated

15:01 * alyssa blinks

15:02 <dschuermann> and all blocks are in an std::vector with program->blocks[block_idx] ;)

15:02 <alyssa> sure... what do you use for a worklist though? a BITSET?

15:05 <dschuermann> no, just the index. updating the index becomes something like worklist = std::max(worklist, preds[i])

15:05 <alyssa> so with complex CF you reprocess some blocks unnecessarily?

15:06 <dschuermann> I think in loops, we could skip nested cf as long as phis stay untouched, but otherwise... meh

15:07 <dschuermann> maybe I can change it to idom on second iterations

15:09 <alyssa> NIR's algorithm seems pretty reasonable

15:10 neonking has quit [Remote host closed the connection]

15:11 <dschuermann> I think with a second index, you can entirely remove bitset or set. just keep track of what has yet to be visited
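
dschuermann's scheme, sketched for a backward liveness pass: blocks sit in an array in program order, the "worklist" is a single index walked downward, and when a block's live-in set changes the index is raised to its highest predecessor. A back edge therefore causes every block in between to be revisited, which is the trade-off alyssa asks about. Assumptions: a hypothetical `struct blk`, at most two predecessors and successors per block, and variable sets stored as 64-bit masks (so at most 64 variables). This is not ACO's actual code.

```c
#include <stdint.h>

#define MAX_EDGES 2

/* Hypothetical block: blocks live in an array in program order. */
struct blk {
   uint64_t use, def;          /* upward-exposed uses / definitions */
   uint64_t live_in, live_out; /* results, start out empty */
   unsigned num_preds, num_succs;
   unsigned preds[MAX_EDGES];
   unsigned succs[MAX_EDGES];
};

/* Backward liveness where the worklist is just one index: visit blocks from
 * the highest index down; when live_in changes, bump the index back up to the
 * highest predecessor. Returns the number of block visits. */
static unsigned liveness(struct blk *blocks, unsigned num_blocks)
{
   unsigned visits = 0;
   int worklist = (int)num_blocks - 1;

   while (worklist >= 0) {
      struct blk *b = &blocks[worklist--];
      visits++;

      uint64_t out = 0;
      for (unsigned i = 0; i < b->num_succs; i++)
         out |= blocks[b->succs[i]].live_in;
      b->live_out = out;

      uint64_t in = b->use | (out & ~b->def);
      if (in != b->live_in) {
         b->live_in = in;
         /* worklist = max(worklist, pred) for every predecessor */
         for (unsigned i = 0; i < b->num_preds; i++)
            if ((int)b->preds[i] > worklist)
               worklist = (int)b->preds[i];
      }
   }
   return visits;
}
```

For a CFG of entry, loop header, loop body (reading a variable the entry block defines) and exit, the pass converges after six block visits: header and body are each visited once more because of the back edge.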

15:20 <karolherbst> jekstrand: mhh "Pass 2165 Fails 11 Crashes 0 Timeouts 0"

15:20 <karolherbst> jekstrand: contractions contractions_float_4 to 7 are real iris bugs btw

15:20 <alyssa> Wooo!

15:20 <karolherbst> regressions from switching to FULL_PROFILE

15:21 <karolherbst> "0) Error for float kernel5: -(-0x1.834b5ap-7 * -0x1.89133ep-70 + -0x1.2955e4p-76) = *-0x0p+0 vs. 0x0p+0" :(

15:21 <karolherbst> jekstrand: not sure why you got more fails though

15:21 <karolherbst> :D

15:22 MajorBiscuit has quit [Ping timeout: 480 seconds]

15:23 * karolherbst updated the tracker

15:23 <karolherbst> soo 4 fails are fp32 precision things

15:23 <karolherbst> 5 are clamping

15:23 <karolherbst> 1 is an llvm bug

15:23 <karolherbst> 1 is potentially a CTS bug

15:24 <jekstrand> karolherbst: I may have a different opencl-c-base.h or something. A bunch of my fails look like they're macro issues.

15:24 <karolherbst> ohh compiler fails?

15:24 <jekstrand> karolherbst: For contractions, yeah, I need to look at that one.

15:24 <karolherbst> jekstrand: you don't have a patched spirv-link, correct?

15:24 <jekstrand> karolherbst: Likely, we're not obeying nir_alu_instr::exact

15:24 <karolherbst> https://github.com/karolherbst/SPIRV-Tools/commit/cc21829ede1001b757807d90acc2fff017a32ce0

15:24 <jekstrand> karolherbst: I don't

15:25 <karolherbst> this one you need

15:25 <karolherbst> fixes all those compiler linking issues

15:26 <jekstrand> cool

15:28 <jekstrand> karolherbst: Hrm... We're not fusing mul+add. :-/

15:28 <karolherbst> mhh? is that a problem?

15:29 <jekstrand> karolherbst: I'm pretty sure that test is making sure you don't

15:29 <karolherbst> btw.. I think we still use libclcs fma emulation

15:29 <jekstrand> And we don't

15:29 <jekstrand> so I'm confused

15:29 <jekstrand> Ugh... it's -0 vs. +0

15:29 <karolherbst> yeah...

15:30 <karolherbst> the most annoying kind of fails

15:30 <jekstrand> I bet this is idr's negative re-distribution

15:30 <karolherbst> like who cares, but then: CL is like: yeah I do

15:30 <karolherbst> potentially

15:30 <jekstrand> Does it pass on llvmpipe?

15:30 <karolherbst> it fails even worse

15:31 <karolherbst> "Error for float kernel5: -(-0x1.56ad4ep-46 * 0x1.0efd88p-101 + 0x0p+0) = *-0x0p+0 vs. 0x1.8p-147"

15:31 <jekstrand> Bingo
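
Why distributing a negation can flip the sign of zero, as in the failing `-(a * b + c)` case above: when `a * b` and `c` cancel exactly, the sum is +0 under round-to-nearest, so negating it gives -0, whereas `(-a) * b + (-c)` also cancels but to +0. The sketch uses small exact values to avoid any rounding; that this is exactly the rewrite happening in NIR is jekstrand's hypothesis, not something verified here.

```c
#include <math.h>

/* Original expression: negate after the add. */
static float neg_outside(float a, float b, float c)
{
   return -(a * b + c);
}

/* "Redistributed" form: the negation pushed into the operands. */
static float neg_distributed(float a, float b, float c)
{
   return (-a) * b + (-c);
}
```

With `a = 2`, `b = 3`, `c = -6` both results compare equal to zero, but only the first has its sign bit set. Rewrites like this are only valid when signed zeros may be ignored, e.g. when a kernel is built with `-cl-no-signed-zeros`.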

15:34 maxzor has joined #dri-devel

15:35 <karolherbst> wow.. the CTS just needs twice as much time when running with -j2 instead of -j24 :D

15:35 <karolherbst> guess we are not CPU bottlenecked at all

15:36 <karolherbst> -j4 should be a good enough workaround for that kernel bug then

15:37 <karolherbst> ohh.. I should drop -w and see how long that takes with kernel caching

15:37 <karolherbst> chances are it's fast

15:37 slattann has quit []

15:39 <karolherbst> let's go then

15:40 heat has joined #dri-devel

15:40 <karolherbst> ahh yeah.. conversions are CPU bottlenecked because the CTS calculates all the stuff as well

15:40 <karolherbst> annoying

15:42 <jekstrand> The next question is why is multiply distribution wrong. What miserable little bit about -0 am I missing? :'(

15:43 <jekstrand> I know fadd(x, 0) != fadd(x, -0) but I don't remember why.

15:43 nchery is now known as Guest2307

15:43 nchery has joined #dri-devel

15:44 Duke`` has joined #dri-devel

15:44 <alyssa> jekstrand: x = 0

15:44 nchery has quit []

15:44 <alyssa> er

15:44 <alyssa> x = -0

15:44 <karolherbst> jekstrand: Inf + 0.0 != Inf -0.0, no?

15:44 nchery has joined #dri-devel

15:44 <alyssa> -0 + 0 = +0

15:44 <karolherbst> ehh wait

15:44 <alyssa> -0 + -0 = -0

15:44 <karolherbst> yeah, that

15:44 <karolherbst> Inf was a problem elsewhere
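
The exchange above in code: under the default round-to-nearest mode, `-0 + +0` is `+0`, so `x + 0.0` is not an identity when `x` is `-0`, while `x + -0.0` returns `x` for every `x`. Minimal sketch; `same_float` is a hypothetical helper that treats +0 and -0 as distinct (NaN is never "same" here, for brevity).

```c
#include <math.h>

/* Same value, with +0 and -0 treated as different. */
static int same_float(float a, float b)
{
   return a == b && !signbit(a) == !signbit(b);
}

static float add_pos_zero(float x) { return x + 0.0f; }
static float add_neg_zero(float x) { return x + -0.0f; }
```

This is why folding `fadd(x, -0.0)` to `x` is safe under round-to-nearest, while folding `fadd(x, 0.0)` to `x` requires signed zeros to be irrelevant.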

15:44 <alyssa> whyyy did i read the ieee 754 spec

15:45 <karolherbst> alyssa: why did I read the CL spec :p

15:45 <jekstrand> alyssa: Because it's useful, sadly.

15:46 <karolherbst> it's annoying that most of the ieee stuff even has a good reason for being like that

15:47 <karolherbst> jekstrand: we might need to revisit adding a "CL vs GL precision rules" flag or something :( it's just annoying that CL allows you to optimize according to weaker precision

15:47 <karolherbst> well.. if you specify the compiler flag

15:49 <karolherbst> there is also like "-cl-no-signed-zeros" :(

15:49 elongbug__ has quit [Read error: Connection reset by peer]

15:49 <karolherbst> so you can even decide what opts to turn on

15:50 elongbug__ has joined #dri-devel

15:50 Guest2307 has quit [Ping timeout: 480 seconds]

15:51 i-garrison has quit [Remote host closed the connection]

15:51 i-garrison has joined #dri-devel

15:53 <cwabbott> karolherbst: you know we have the same thing in vulkan, right?

15:53 <karolherbst> I don't

15:53 <alyssa> float_controls

15:54 <karolherbst> ahh

15:54 <alyssa> nir_is_float_control_signed_zero_inf_nan_preserve, etc

15:54 pallavim_ has joined #dri-devel

15:54 <cwabbott> you should be trying to map OpenCL semantics onto the NIR version of that, and maybe expanding it if there's something missing

15:55 <karolherbst> it's not covering everything though

15:55 <cwabbott> what's it missing?

15:55 <cwabbott> I believe it does really cover everything, it allows for ieee-compliant behavior

15:56 <cwabbott> (except for exceptions, which... I hope we never have to support)

15:56 <karolherbst> allow fmul+fadd to fmad/ffma and splitting no-signed-zero and finite-math

15:56 <jekstrand> I thought for sure I saved off a copy of IEEE 754 before I left Intel...

15:57 <karolherbst> looks like float_controls either disables -0.0 and NaN/Inf, or enables both, no?

15:57 <cwabbott> yes, it's not split out

15:57 pallavim has quit [Ping timeout: 480 seconds]

15:57 <cwabbott> but you can still fuse even if NaN/Inf and signed zero are preserved

15:57 <karolherbst> yeah

15:57 <cwabbott> that wasn't spelled out, but it will be soon hopefully

15:58 <karolherbst> anyway, it would need a bit of work, but seems like we already have some of that stuff

15:58 <cwabbott> you can disallow it be setting NoContraction (which maps to nir_alu_instr::exact)

15:58 <cwabbott> *by
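
To make karolherbst's concern concrete: a Vulkan-style control (SignedZeroInfNanPreserve) couples signed zeros with Inf/NaN, while OpenCL exposes independent build options. The sketch parses the relevant OpenCL build options into separate flags, applying the implications the OpenCL spec defines (`-cl-unsafe-math-optimizations` includes `-cl-no-signed-zeros` and `-cl-mad-enable`; `-cl-fast-relaxed-math` adds `-cl-finite-math-only` on top). The flag names and functions are hypothetical, not Mesa code.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical flags; each CL relaxation stays independent. */
enum cl_fp_relax {
   FP_ALLOW_MAD        = 1u << 0, /* -cl-mad-enable */
   FP_NO_SIGNED_ZEROS  = 1u << 1, /* -cl-no-signed-zeros */
   FP_FINITE_MATH_ONLY = 1u << 2, /* -cl-finite-math-only */
};

/* Parse a space-separated CL build option string. */
static unsigned parse_cl_fp_options(const char *options)
{
   unsigned flags = 0;
   const char *p = options;

   while (*p) {
      while (*p == ' ')
         p++;
      size_t len = strcspn(p, " ");
      if (len == 0)
         break;

#define IS(opt) (len == strlen(opt) && strncmp(p, opt, len) == 0)
      if (IS("-cl-mad-enable"))
         flags |= FP_ALLOW_MAD;
      else if (IS("-cl-no-signed-zeros"))
         flags |= FP_NO_SIGNED_ZEROS;
      else if (IS("-cl-finite-math-only"))
         flags |= FP_FINITE_MATH_ONLY;
      else if (IS("-cl-unsafe-math-optimizations"))
         flags |= FP_NO_SIGNED_ZEROS | FP_ALLOW_MAD;
      else if (IS("-cl-fast-relaxed-math"))
         flags |= FP_FINITE_MATH_ONLY | FP_NO_SIGNED_ZEROS | FP_ALLOW_MAD;
#undef IS
      p += len;
   }
   return flags;
}

/* A coupled control can only drop preservation when both relaxations were
 * requested; anything less keeps signed zeros *and* Inf/NaN. */
static bool needs_signed_zero_inf_nan_preserve(unsigned flags)
{
   const unsigned both = FP_NO_SIGNED_ZEROS | FP_FINITE_MATH_ONLY;
   return (flags & both) != both;
}
```

With only `-cl-no-signed-zeros`, the coupled control still has to preserve everything, so that CL relaxation is lost; a split representation would keep it. NoContraction (`nir_alu_instr::exact`) is a separate axis: an exact instruction should not be fused regardless of these flags.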

15:58 <karolherbst> yeah... clang might actually do something like that already

15:59 <karolherbst> I know that there is something in the spirv about it

15:59 <cwabbott> OpenCL has a whole different thing

15:59 <karolherbst> would need to take another look at some point

15:59 <cwabbott> which was copied from LLVM

15:59 <cwabbott> and it's precise-by-default

15:59 <karolherbst> doesn't matter if we only operate on spirvs

15:59 <cwabbott> nope, it's a whole different thing *in SPIR-V*

15:59 <karolherbst> ahh

15:59 <karolherbst> could be

15:59 <cwabbott> there's a different spir-v instruction

15:59 <cwabbott> and different default

16:00 <cwabbott> vtn has to behave differently if ingesting CL SPIR-V and map it to the appropriate NIR thing

16:00 <karolherbst> could be :)

16:00 <cwabbott> so sad that it wasn't unified at the time, but that's how it is

16:00 <karolherbst> yeah

16:01 <cwabbott> also the CL SPIR-V thing can be per-instruction, which we can't do

16:01 <karolherbst> not anymore

16:01 <karolherbst> it was removed from the CL spec

16:01 <cwabbott> ?

16:01 <karolherbst> I think CL 1.0 supports it

16:01 <karolherbst> but not 1.1

16:02 <karolherbst> they didn't bother putting it in the spec that they removed it afaik

16:03 <alyssa> round modes got yeeted in CL1.1 right?

16:03 <karolherbst> per instruction round modes, yes

16:03 <karolherbst> you still have them at converts though

16:03 <cwabbott> it's in the SPIR-V spec for sure

16:03 <cwabbott> https://www.khronos.org/registry/SPIR-V/specs/unified1/SPIRV.html#FP_Fast_Math_Mode

16:04 <cwabbott> in fact it's only per-instruction, I think

16:04 <karolherbst> cwabbott: it's in the spec, but with CL1.1 you can't generate such spir-vs

16:04 <tango_> rounding modes control was part of an 1.0 extension

16:04 <karolherbst> it's 1.0 only

16:04 <cwabbott> wow, that's confusing

16:04 <tango_> and it was a 1.0-only extension

16:04 <karolherbst> tango_: I thought it was actually core

16:04 <cwabbott> karolherbst: but how do you even ask for no-nan/no-inf globally?

16:05 <tango_> karolherbst: cl_khr_select_fprounding_mode

16:05 <karolherbst> yeah

16:05 <cwabbott> I don't see where it allows you to decorate anything except an individual floating-point instruction

16:05 <karolherbst> it's a pragma

16:05 <tango_> honestly I was rather annoyed by the deprecation of it. my rant: http://wok.oblomov.eu/tecnologia/gpgpu/opencl-rounding-modes/

16:05 <tango_> karolherbst: it's an extension. not all devices are required to support it

16:05 <karolherbst> ahh you were that :D

16:05 <tango_> you need to enable the extension first

16:06 <tango_> (which requires device support)

16:06 <tango_> yeah, I'm that guy 8-D

16:06 <karolherbst> cwabbott: also intel has a CL extension for some of that stuff

16:06 <karolherbst> per instruction I think

16:06 <karolherbst> but this old thing is just gone

16:06 <karolherbst> basically, and I think nobody uses that

16:06 tjmercier has quit [Quit: Page closed]

16:07 <cwabbott> karolherbst: yeah, looks like it adds some of the newer llvm flags

16:07 <cwabbott> the CL fast-math stuff is copied from llvm

16:07 <cwabbott> vulkan is from-scratch

16:07 <cwabbott> well, afaict from-scratch

16:08 <karolherbst> there is something the translator sets for global thing..

16:08 <cwabbott> hence the differences like vulkan being per-shader and CL being per instruction and having separate inf/nan and signed-zero flags

16:08 <karolherbst> ContractionOff

16:09 <karolherbst> cwabbott: CL is practically per kernel though

16:09 <jekstrand> karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16041

16:09 <karolherbst> \o/

16:09 ybogdano has joined #dri-devel

16:11 <tango_> FP contraction should be a different thing from inf/nan/signed-zero handling though, isn't it?

16:11 <tango_> (and from rounding modes)

16:11 <karolherbst> yeah.. should be

16:11 <karolherbst> I need to take a look at all the details here at some point and do the right thing (tm)

16:11 <jekstrand> Yes, but those tests trick us with redistribution rules.

16:11 <alyssa> jekstrand: the other question with u_worklist is that it's designed for IRs that store blocks as a linked list

16:12 <karolherbst> oh wow.. now I crashed my machine with -j4

16:12 <alyssa> for backends that don't do much CF manipulation, it's more efficient to store as an array, in which case some of the complexity of nir_worklist wouldn't be needed

16:12 <alyssa> (Namely, storing pointers to the blocks, as opposed to just a queue of integer indices)

16:13 <alyssa> (and in that case there's no need for any macros/templating/etc)

16:14 elongbug_ has joined #dri-devel

16:15 <karolherbst> jekstrand: I just hope that this clamping bug isn't like terribly annoying to fix

16:16 <alyssa> OTOH, there's not much overhead from doing it nir_worklist style even if you have an array of blocks
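alyssa's point about backends that keep blocks in an array can be made concrete. Below is a minimal sketch, assuming blocks are numbered 0..n-1. The worklist is a FIFO of integer block indices plus a per-block "queued" flag, so it needs no block pointers and no macro templating. The struct and function names are hypothetical and are not the u_worklist or nir_worklist API.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical index-based worklist: a FIFO of block indices in which each
 * block can be queued at most once at a time. Not the u_worklist API. */
struct index_worklist {
   unsigned *queue;    /* ring buffer of block indices */
   bool *present;      /* present[i] is true while block i is queued */
   unsigned capacity;  /* number of blocks, which is also the max entries */
   unsigned start;     /* position of the oldest entry */
   unsigned count;     /* number of queued entries */
};

static bool
index_worklist_init(struct index_worklist *w, unsigned num_blocks)
{
   w->queue = calloc(num_blocks, sizeof(*w->queue));
   w->present = calloc(num_blocks, sizeof(*w->present));
   w->capacity = num_blocks;
   w->start = 0;
   w->count = 0;
   return w->queue != NULL && w->present != NULL;
}

static void
index_worklist_fini(struct index_worklist *w)
{
   free(w->queue);
   free(w->present);
}

static bool
index_worklist_is_empty(const struct index_worklist *w)
{
   return w->count == 0;
}

/* Queue block 'idx' unless it is already queued. */
static void
index_worklist_push(struct index_worklist *w, unsigned idx)
{
   if (w->present[idx])
      return;
   w->present[idx] = true;
   w->queue[(w->start + w->count) % w->capacity] = idx;
   w->count++;
}

/* Remove and return the oldest queued block. Check is_empty first. */
static unsigned
index_worklist_pop(struct index_worklist *w)
{
   unsigned idx = w->queue[w->start];
   w->start = (w->start + 1) % w->capacity;
   w->count--;
   w->present[idx] = false;
   return idx;
}
```

Because each block is queued at most once at a time, the ring never holds more than num_blocks entries and cannot overflow.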

16:16 <karolherbst> like "you have to calculate coords in the kernel" kind of annoying

16:17 shankaru has joined #dri-devel

16:19 <karolherbst> "Pass 2169 Fails 7 Crashes 0 Timeouts 0: 100%" :)

16:21 <karolherbst> once the clamping stuff is fixed I will do a CTS run for real

16:21 <karolherbst> I can workaround the other issues locally

16:21 elongbug__ has quit [Ping timeout: 480 seconds]

16:37 * alyssa shuffles the data structures harder, I think this is good now

16:38 <karolherbst> jekstrand: https://github.com/intel/compute-runtime/blob/82f27e882d863a322e5534bf01113f5eb298ad1c/shared/source/gen12lp/os_agnostic_hw_info_config_gen12lp.inl#L20

16:38 <karolherbst> that's something I found inside intels stack

16:38 <karolherbst> might be a good hint on what might go wrong :)

16:39 <karolherbst> like maybe the filtering filter is always low for mesa

16:41 <karolherbst> but besides that there doesn't seem to be anything they are doing... unless I missed it

16:41 <jekstrand> karolherbst: Pretty sure iris sets to always high. :-/

16:41 <karolherbst> ahh :(

16:41 <karolherbst> maybe we have to set low in some cases

16:41 <karolherbst> :D

16:42 <karolherbst> seems like a debug var though

16:42 <karolherbst> :(

16:42 <jekstrand> We set it to FULL. I'll play with others.

16:42 tzimmermann has quit [Quit: Leaving]

16:44 <jekstrand> karolherbst: Yeah, all modes there fail

16:44 <karolherbst> :(

16:44 gouchi has joined #dri-devel

16:44 <karolherbst> jekstrand: is anything of that implemented inside shaders? like coords adjustments? Or is that all the hw?

16:45 <jekstrand> Should all be HW

16:45 <karolherbst> mhh

16:45 <jekstrand> OpenCL meeting is going on right now. I'm hoping to hear from bashbaugh after it's over.

16:45 <karolherbst> cool

16:49 aravind has quit [Ping timeout: 480 seconds]

16:50 <cwabbott> karolherbst: looks like contraction is per-kernel and zero/inf/nan is per-instruction

16:50 <karolherbst> yeah.. possibly

16:50 <cwabbott> except the intel extension allows contraction per-instruction, I think

16:51 <cwabbott> that's annoyingly different to how nir works (which is derived from vulkan float_controls)

16:51 <karolherbst> it's a bit hard to care about per instruction zero/inf/nan if nothing uses it though :) Although I would be interested in what that looks like in the SPIR-V tbh

16:51 <karolherbst> like if you do the global falg

16:51 <karolherbst> *flag

16:52 <cwabbott> right, sounds like you still need to conservatively derive the global flag from the per-instruction flag

16:52 <cwabbott> or sth like that
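A minimal sketch of the conservative derivation cwabbott describes: a NaN, Inf, or signed-zero relaxation is only safe shader-wide if every floating-point instruction allows it, so the per-instruction FPFastMathMode masks are ANDed together and anything not allowed everywhere must be preserved. The mask bits are the SPIR-V FPFastMathMode values; the PRESERVE_* output flags are hypothetical and are not NIR's real float_controls enum. Treating the Fast bit as implying NotNaN, NotInf, and NSZ is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* SPIR-V FPFastMathMode mask bits, as listed in the SPIR-V specification. */
enum {
   FP_FAST_MATH_NOT_NAN     = 0x01,
   FP_FAST_MATH_NOT_INF     = 0x02,
   FP_FAST_MATH_NSZ         = 0x04,
   FP_FAST_MATH_ALLOW_RECIP = 0x08,
   FP_FAST_MATH_FAST        = 0x10,
};

/* Hypothetical shader-wide "must preserve" flags (not NIR's real enum). */
enum {
   PRESERVE_NAN         = 0x1,
   PRESERVE_INF         = 0x2,
   PRESERVE_SIGNED_ZERO = 0x4,
};

#define RELAX_MASK (FP_FAST_MATH_NOT_NAN | FP_FAST_MATH_NOT_INF | FP_FAST_MATH_NSZ)

/* Fold per-instruction FPFastMathMode masks into one shader-wide setting.
 * The masks are ANDed; any relaxation missing from even one instruction
 * turns into a "preserve" flag for the whole shader. */
static uint32_t
derive_shader_preserve_flags(const uint32_t *inst_masks, size_t count)
{
   uint32_t allowed = RELAX_MASK;
   for (size_t i = 0; i < count; i++) {
      uint32_t m = inst_masks[i];
      /* Assumption: treat Fast as implying NotNaN, NotInf and NSZ. */
      if (m & FP_FAST_MATH_FAST)
         m |= RELAX_MASK;
      allowed &= m;
   }

   uint32_t preserve = 0;
   if (!(allowed & FP_FAST_MATH_NOT_NAN))
      preserve |= PRESERVE_NAN;
   if (!(allowed & FP_FAST_MATH_NOT_INF))
      preserve |= PRESERVE_INF;
   if (!(allowed & FP_FAST_MATH_NSZ))
      preserve |= PRESERVE_SIGNED_ZERO;
   return preserve;
}
```

An instruction with no FPFastMathMode decoration has a mask of 0, so a single undecorated instruction forces all three preserve flags on.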

16:52 dllud has quit [Ping timeout: 480 seconds]

16:52 <karolherbst> yeah.. dunno

16:54 <alyssa> me: Now that I have u_worklist in bifrost, I can make this data flow pass so much simpler

16:55 <alyssa> also me: Wait this pass doesn't need a worklist at all, just a trivial recursion

16:55 <alyssa> Good work Alyssa from 2020

17:00 dllud has joined #dri-devel

17:08 stuartsummers has quit []

17:21 shankaru has quit [Quit: Leaving.]

17:22 MajorBiscuit has joined #dri-devel

17:22 mclasen has quit [Remote host closed the connection]

17:27 dllud has quit [Ping timeout: 480 seconds]

17:36 mi6x3m has joined #dri-devel

17:37 <mi6x3m> hey friends, can anyone tell me where libGL is linked against libgallium-dri?

17:37 <mi6x3m> or does it happen during runtime?

17:38 <anholt> dri drivers are dlopened by GL at runtime.

17:38 dllud has joined #dri-devel

17:38 <mi6x3m> seems to be different for gallium because it statically links all drivers

17:38 <mi6x3m> or does it just delegate?

17:39 <anholt> what I said is true, and all the gallium drivers are linked together into a single dri driver to save disk space.

17:39 <mi6x3m> ok, so it all goes through loader.c as the old drivers?

17:45 <jekstrand> karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16044

17:45 <jekstrand> karolherbst: That's what I did for panfrost and, given that iris uses memcpy for buffer_subdata, I don't figure it's worse. So maybe all that blorp code I wrote isn't really needed.

17:46 <jekstrand> Still need to figure out clear_texture on panfrost but that probably does need to happen on the GPU.

17:47 Kayden has quit [Quit: reboot]

17:51 eletrotupi has quit [Remote host closed the connection]

17:51 clever has quit [Ping timeout: 480 seconds]

17:57 <mi6x3m> anholt, I see it now, libgallium_dri is linked to lib<driver>_dri.so

17:57 <mi6x3m> in install_megadrivers.py

17:57 <mi6x3m> so when you load lib<driver>_dri.so you always load libgallium_dri and you get the respective config
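To tie anholt's answer (drivers are dlopened at runtime) to what mi6x3m found in install_megadrivers.py, the sketch below builds the two strings a loader needs: the per-driver file name, which is one of several names linked to the single megadriver, and the per-driver entrypoint to look up with dlsym(). The "<dir>/<driver>_dri.so" layout and the "__driDriverGetExtensions_<driver>" name with '-' mapped to '_' are modeled on Mesa's loader, but treat them as assumptions and check loader.c. The helper names are hypothetical, and the dlopen()/dlsym() calls themselves are left out.

```c
#include <stddef.h>
#include <stdio.h>

/* Build "<dir>/<driver>_dri.so". Returns 0 on success, -1 if truncated. */
static int
dri_driver_path(char *out, size_t size, const char *dir, const char *driver)
{
   int n = snprintf(out, size, "%s/%s_dri.so", dir, driver);
   return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Build the per-driver entrypoint name (assumed naming scheme), replacing
 * '-' with '_' so the result is a valid C identifier. */
static int
dri_entrypoint_name(char *out, size_t size, const char *driver)
{
   int n = snprintf(out, size, "__driDriverGetExtensions_%s", driver);
   if (n < 0 || (size_t)n >= size)
      return -1;
   for (char *p = out; *p; p++) {
      if (*p == '-')
         *p = '_';
   }
   return 0;
}
```

Since every per-driver name resolves to the same shared object, it is the entrypoint symbol, not the file, that selects which driver's configuration is used.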

17:58 mbrost has joined #dri-devel

17:59 eletrotupi has joined #dri-devel

18:01 <karolherbst> jekstrand: cool

18:01 LexSfX has quit [Ping timeout: 480 seconds]

18:01 LexSfX has joined #dri-devel

18:04 mclasen has joined #dri-devel

18:04 Kayden has joined #dri-devel

18:06 <anholt> whee, looks like my MR is going to time out on windows again.

18:06 shankaru has joined #dri-devel

18:11 dllud has quit [Read error: Connection reset by peer]

18:14 ybogdano has quit [Ping timeout: 480 seconds]

18:16 clever has joined #dri-devel

18:17 <alyssa> wee :(

18:18 <karolherbst> jekstrand: ohh.. I found something interesting in the compute runtime..

18:18 <karolherbst> https://github.com/intel/compute-runtime/blob/2c1bfbb5b223a2b3aa6e0b65135eed9bdf465558/opencl/source/kernel/kernel.cpp#L1662

18:18 <karolherbst> getSnapWaValue: https://github.com/intel/compute-runtime/blob/25c71a6c13ce6e636d687e8e1dc560250af652a3/opencl/source/sampler/sampler.cpp#L155

18:18 dllud has joined #dri-devel

18:19 <karolherbst> samplerSnapWa set here: https://github.com/intel/compute-runtime/blob/090bfb9642bf60b74db0ab30335343a24746e2ee/shared/source/kernel/kernel_descriptor_from_patchtokens.cpp#L414

18:19 <karolherbst> not quite sure what it does yet

18:20 <jekstrand> karolherbst: uh oh... I think you may have found it

18:20 <karolherbst> yeah

18:20 <jekstrand> Let me dig and see if I can figure out what that does

18:21 <karolherbst> there are too many abstraction layers in the compute runtime :(

18:21 <jekstrand> yeah

18:21 <jekstrand> Welcome to Intel driver code

18:21 * jekstrand pulls IGC

18:23 <karolherbst> apparently you can dump those tokens

18:23 <karolherbst> coordinateSnapWaRequired is referenced in shared/source/device_binary_format/patchtokens_dumper.cpp

18:23 heat_ has joined #dri-devel

18:23 <karolherbst> what the hell is DATA_PARAMETER_SAMPLER_COORDINATE_SNAP_WA_REQUIRED

18:24 <karolherbst> feels like there is either macro magic or some other lib involved

18:24 heat has quit [Read error: No route to host]

18:25 ahajda has quit [Quit: Going offline, see ya! (www.adiirc.com)]

18:25 <karolherbst> uhh.. maybe I just dug too deep now

18:26 <karolherbst> or it comes out of llvm...

18:27 <karolherbst> no clue..

18:32 eletrotupi has quit [Quit: Bye]

18:32 * karolherbst clones igc

18:33 <karolherbst> bingo

18:33 heat_ has quit [Ping timeout: 480 seconds]

18:33 <karolherbst> jekstrand: sooo.. they add a new kernel arg

18:33 <karolherbst> KernelArg::ArgType::IMPLICIT_SAMPLER_SNAP_WA

18:34 <karolherbst> but what runtime value do they bind to it...

18:35 <karolherbst> IGC/Compiler/Optimizer/OpenCLPasses/ImageFuncs/ImageFuncResolution.cpp

18:35 <karolherbst> IGC/Compiler/Optimizer/OpenCLPasses/ImageFuncs/ImageFuncsAnalysis.cpp

18:35 <karolherbst> both reference SAMPLER_SNAP_WA

18:35 mi6x3m has quit [Quit: Leaving]

18:36 <karolherbst> __builtin_IB_get_snap_wa_reqd

18:36 <jekstrand> IGC/Compiler/tests/ImageFuncResolution/get_image_snap_wa_required.ll

18:36 <karolherbst> :)

18:36 <karolherbst> we arrived at the same thing at the same time :)

18:37 <jekstrand> hrm... that's a test. :-/

18:37 <karolherbst> jekstrand: IGC/BiFModule/Implementation/IGCBiF_Intrinsics.cl

18:37 <karolherbst> ehh wait

18:37 <karolherbst> there it's just declared

18:37 ybogdano has joined #dri-devel

18:37 <karolherbst> jekstrand: next step: their llvm fork

18:38 <jekstrand> karolherbst: THere is no llvm fork

18:38 <karolherbst> mhh

18:38 <jekstrand> Or at least you shouldn't need one

18:38 <karolherbst> ohh?

18:38 <alyssa> this is horrifying

18:38 <alyssa> whatever you're talking about

18:38 <karolherbst> what's that then? https://github.com/intel/llvm

18:38 <alyssa> horrifying

18:38 <karolherbst> maybe just random stuff then

18:39 <karolherbst> jekstrand: ahh now

18:39 <karolherbst> *no

18:39 <karolherbst> that intrinsic is a red herring

18:39 <jekstrand> IGC/Compiler/Optimizer/OpenCLPasses/ImageFuncs/ImageFuncResolution.cpp

18:39 <karolherbst> it gets resolved to imageFunc = &m_argMap[ImplicitArg::SAMPLER_SNAP_WA];

18:39 <karolherbst> so it just loads the arg

18:40 <jekstrand> IGC/BiFModule/Implementation/images.cl rather

18:40 <karolherbst> yeah

18:40 <karolherbst> at least tells us how it's used

18:40 <jekstrand> float4 SPIRV_OVERLOADABLE SPIRV_BUILTIN(ImageSampleExplicitLod, _v4f32_img2d_ro_v2f32_i32_f32, _Rfloat4)(__spirv_SampledImage_2D SampledImage, float2 Coordinate, int ImageOperands, float Lod)

18:41 <jekstrand> {

18:41 <jekstrand> int image_id = (int)__builtin_IB_get_image(SampledImage);

18:41 <jekstrand> int sampler_id = (int)__builtin_IB_get_sampler(SampledImage);

18:41 <jekstrand> float2 snappedCoords = Coordinate;

18:41 <jekstrand> if (__builtin_IB_get_snap_wa_reqd(sampler_id) != 0)

18:41 <jekstrand> {

18:41 <jekstrand> snappedCoords.x = (Coordinate.x < 0) ? -1.0f : Coordinate.x;

18:41 <jekstrand> snappedCoords.y = (Coordinate.y < 0) ? -1.0f : Coordinate.y;

18:41 <jekstrand> }

18:41 <jekstrand> return __builtin_IB_OCL_2d_sample_l(image_id, sampler_id, snappedCoords, Lod);

18:41 <jekstrand> }

18:41 <Kayden> Yeah, I've built IGC with upstream LLVM...11. It's been a while since I've looked at it.

18:42 <Kayden> IIRC they have a patched version that tunes some thresholds like copy propagation passes to be tuned less for CPUs with limited registers and toward GPUs with more, and things like that, but they're not necessary to run

18:43 <airlied> so all vals < 0 get rounded to -1.0?

18:43 <jekstrand> airlied: Apparently

18:43 <jekstrand> I can wire that up in iris but eewww
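The IGC snippet pasted above snaps every negative coordinate component to -1.0f before sampling whenever the sampler's snap flag is set. A plain-C sketch of just that coordinate transform follows. The coord2 struct stands in for OpenCL's float2, and the function name is hypothetical. It does not attempt to say what iris should do.

```c
#include <stdbool.h>

/* Hypothetical two-component coordinate, standing in for OpenCL's float2. */
struct coord2 {
   float x, y;
};

/* Mirror of the snapping in the pasted images.cl snippet: when the sampler
 * needs the workaround, each negative component becomes -1.0f and
 * non-negative components pass through unchanged. */
static struct coord2
snap_coords(struct coord2 c, bool snap_wa_required)
{
   struct coord2 out = c;
   if (snap_wa_required) {
      out.x = (c.x < 0.0f) ? -1.0f : c.x;
      out.y = (c.y < 0.0f) ? -1.0f : c.y;
   }
   return out;
}
```

Because the test is a plain `< 0.0f` comparison, -0.0f and NaN are not snapped; both compare false and pass through unchanged.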

18:44 <karolherbst> mhhh

18:44 <jekstrand> I think iris is about to grow a new magic system value. Kayden, I'm sorry.

18:44 <karolherbst> LOL

18:45 <karolherbst> jekstrand: btw.. for CL runtimes we declare the COMPUTE_ONLY screen flag or whatever it's called

18:45 <karolherbst> not sure if that's useful here or if we have to change the sampler API

18:45 <karolherbst> probably don't want that for GL

18:45 <karolherbst> or vk

18:46 <jekstrand> Yeah, ver much no.

18:46 <jekstrand> *very

18:46 <jekstrand> But, also, kind-of meh. It's only for rectangle textures.

18:46 <jekstrand> hrm...

18:46 <jekstrand> Yeah, I think we only want this for CL

18:46 iive has joined #dri-devel

18:46 <jekstrand> I may be able to key it off SHADER_STAGE_KERNEL

18:46 <karolherbst> probably

18:47 <karolherbst> yeah.. sounds like the better idea

18:47 <Kayden> I'm not finding where this is actually applied

18:47 <jekstrand> Kayden: https://github.com/intel/intel-graphics-compiler/blob/master/IGC/BiFModule/Implementation/images.cl#L39

18:48 <Kayden> yes but what platforms

18:48 <jekstrand> Kayden: All, AFAICT

18:48 <jekstrand> Why they haven't fixed the HW, I have no idea. Doesn't seem like it'd be many gates

18:49 <Kayden> but only for OpenCL?

18:49 <jekstrand> yup

18:49 <Kayden> how is OpenCL any different?

18:49 <jekstrand> It has very nasty tests

18:49 <karolherbst> jekstrand: maybe nobody who pays money uses those clamp modes :p

18:50 eletrotupi has joined #dri-devel

18:50 mbrost has quit [Read error: Connection reset by peer]

18:50 <karolherbst> and also CL_FILTER_NEAREST _and_ CL_ADDRESS_CLAMP

18:50 <jekstrand> yup

18:50 mbrost has joined #dri-devel

18:50 <karolherbst> I guess nobody cares

18:50 <jekstrand> Don't want that for REPEAT

18:50 <airlied> is opencl clamp the crappy clamp to border?

18:51 <jekstrand> airlied: No

18:51 <airlied> oh no there is no borders

18:51 <jekstrand> airlied: CL doesn't have border at all

18:51 <airlied> so it's clamp to edge

18:51 <karolherbst> no

18:51 <karolherbst> it's clamp to border

18:51 <karolherbst> SAMPLER_ADDRESSING_MODE_CLAMP == CL_ADDRESS_CLAMP

18:51 <karolherbst> just border is all 0

18:52 <Kayden> and you've translated CL_ADDRESS_CLAMP to CLAMP_TO_BORDER?

18:52 <karolherbst> ehh wait..

18:52 <karolherbst> I looked at the wrong code

18:52 <karolherbst> but yeah

18:52 <karolherbst> CL_ADDRESS_CLAMP == PIPE_TEX_WRAP_CLAMP_TO_BORDER

18:52 <karolherbst> CL_ADDRESS_CLAMP_TO_EDGE is PIPE_TEX_WRAP_CLAMP_TO_EDGE,

18:53 <Kayden> Yep. "CL_ADDRESS_CLAMP - Out-of-range image coordinates are assigned a border color value."

18:53 <karolherbst> there is just no border color value :)

18:53 <karolherbst> like what's a border color?

18:53 <karolherbst> CL never says anything about that

18:53 maxzor has quit [Ping timeout: 480 seconds]

18:54 <Kayden> it really doesn't

18:54 <jenatali> Pretty sure it does say it's zero somewhere

18:54 <karolherbst> maybe

18:54 <karolherbst> "images have a 0 border*" the *: some ext might allow you to change that

18:54 <karolherbst> :P

18:55 <karolherbst> jenatali: but the CL spec contains the word "border" exactly once

18:55 <Kayden> https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_API.html#CL_ADDRESS_CLAMP - the word "border" appears exactly one time

18:56 <jenatali> Huh. Yeah nevermind. Guess I'd just assumed it said zero when I hooked it up

18:56 <karolherbst> maybe it's phrased like "OOB image accesses return the color 0,0,0,0" or something

18:57 <karolherbst> ahh

18:57 <airlied> like it seems like GL should be able to hit that case as well

18:57 shankaru has quit []

18:57 <karolherbst> the OpenCL C spec has more to say

18:57 <karolherbst> airlied: GL probably doesn't care

18:57 <karolherbst> Kayden, jenatali: "CLK_ADDRESS_CLAMP - out-of-range image coordinates will return a border color [66]" 66: "This is similar to the GL_ADDRESS_CLAMP_TO_BORDER addressing mode." :D

18:58 <karolherbst> and: https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#determining-the-border-color-or-value

18:58 <karolherbst> the heck...

18:58 <jenatali> Yeah, still doesn't say what that border is though

18:59 <karolherbst> "the border color is (0.0f, 0.0f, 0.0f, 0.0f)."?
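The thread converges on CL_ADDRESS_CLAMP behaving like clamp-to-border with a (0, 0, 0, 0) border color, while CL_ADDRESS_CLAMP_TO_EDGE returns the nearest edge texel. Below is a minimal sketch of the difference for a 1D image with nearest filtering. It assumes unnormalized coordinates and RGBA float texels. The linked spec section distinguishes channel orders, and this sketch only models the RGBA case. The types and helper names are hypothetical.

```c
/* Hypothetical RGBA texel and addressing modes for this sketch only. */
struct rgba {
   float r, g, b, a;
};

enum addr_mode { ADDR_CLAMP_TO_BORDER, ADDR_CLAMP_TO_EDGE };

/* floor() without libm; assumes |v| fits in an int. */
static int
floor_to_int(float v)
{
   int i = (int)v;               /* truncates toward zero */
   return (v < (float)i) ? i - 1 : i;
}

/* Nearest-filtered 1D lookup with an unnormalized coordinate x. */
static struct rgba
sample_1d_nearest(const struct rgba *texels, int width, float x,
                  enum addr_mode mode)
{
   int i = floor_to_int(x);
   if (i < 0 || i >= width) {
      if (mode == ADDR_CLAMP_TO_BORDER) {
         /* RGBA border color: transparent black. */
         struct rgba border = { 0.0f, 0.0f, 0.0f, 0.0f };
         return border;
      }
      i = (i < 0) ? 0 : width - 1;   /* clamp to the edge texel */
   }
   return texels[i];
}
```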

18:59 MajorBiscuit has quit [Ping timeout: 480 seconds]

18:59 <jekstrand> karolherbst: Can CL generate anything besides tex and txl?

18:59 <karolherbst> jekstrand: txf for samplerless

18:59 <karolherbst> or is that tex?

18:59 <jekstrand> that's txf

18:59 <karolherbst> k

19:00 <jekstrand> It doesn't have bias or tg4 or anything, right?

19:00 <karolherbst> uhm... I don't think so

19:00 <karolherbst> there are some extensions though

19:00 <karolherbst> but not sure if they add that kind of stuff

19:00 <jekstrand> of course there are....

19:00 <karolherbst> the ext I know of add things like msaa and depth and the likes

19:00 <karolherbst> maybe there is one for tg4 and bias as well

19:01 <karolherbst> maybe gl_Sharing defines random stuff... dunno

19:01 <karolherbst> we can deal with it when we find something :)

19:01 <jekstrand> yup

19:02 <karolherbst> my hope is that we can implement gl sharing as a basic noop :D

19:02 <karolherbst> will be fun

19:03 <jenatali> Oh, it does say the border color

19:03 <jenatali> jekstrand: No, I don't think there's a tg4

19:04 <jenatali> Pretty sure it's just txf, tex, txl, txs

19:05 MajorBiscuit has joined #dri-devel

19:22 ngcortes has joined #dri-devel

19:23 dllud has quit [Ping timeout: 480 seconds]

19:28 apinheiro has quit [Ping timeout: 480 seconds]

19:32 <jekstrand> karolherbst: Ugh... Clamp is per-coordinate-dimension, isn't it?

19:32 <karolherbst> jekstrand: sure it is

19:33 <jekstrand> Oh, well. I guess we're passing 3 32-bit bitfields, then.

19:33 <karolherbst> but it doesn't matter for border

19:33 <jekstrand> Yeah, idk about that

19:33 <karolherbst> as you always get the border color, no?

19:33 <karolherbst> per component should only matter for edge

19:33 <karolherbst> or well.. repeat

19:33 <jekstrand> Yeah, we only care about this for edge

19:35 <karolherbst> but shouldn't that be simply a bcsel on the coord vecs? or what do you mean by passing 3 bitfields?

19:36 <karolherbst> jekstrand: ohh wait

19:36 <karolherbst> I think I misunderstood your question

19:36 <karolherbst> CL has the same clamp method on all coords

19:36 <karolherbst> GL/gallium doesn't

19:36 <alyssa> jekstrand: "guess we're passing 3 32-bit bitfields," hello PIPE_CAP_GL_CLAMP speaking how may I help you

19:37 <Kayden> ah, that makes it easier then

19:37 <karolherbst> yep, very much so

19:37 <jekstrand> karolherbst: It does? Awesome!

19:37 <karolherbst> even min/mag is the same

19:38 <Kayden> also...32-bit bitfields? can't you have 128 images?

19:38 <karolherbst> Kayden: samplers

19:38 <karolherbst> you need 16 for full and 8 for embedded

19:38 <Kayden> oh, wow, okay

19:38 <karolherbst> 128 read images, 64 write images though

19:38 <karolherbst> for full

19:38 <karolherbst> 8/8 for embedded

19:39 <karolherbst> not sure if there is a test testing that though.. :D

19:39 <karolherbst> I mean ... per type yes

19:39 <karolherbst> but not 128 read + 64 write + 16 samplers
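Because CL uses one addressing mode for all coordinates and one filter for both min and mag, and a full-profile device has at most 16 samplers (8 for embedded) per the discussion, a single 32-bit mask with one bit per sampler slot is enough to tell a kernel which samplers need the snap workaround. Per the log, the workaround matters for CL_FILTER_NEAREST with CL_ADDRESS_CLAMP and not for REPEAT. The exact condition IGC derives (getSnapWaValue) was not spelled out, so the predicate below is an assumption. The enums are local stand-ins, not the real cl_addressing_mode or cl_filter_mode values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for cl_addressing_mode / cl_filter_mode. */
enum sampler_addr {
   SAMPLER_ADDR_NONE,
   SAMPLER_ADDR_CLAMP_TO_EDGE,
   SAMPLER_ADDR_CLAMP,
   SAMPLER_ADDR_REPEAT,
   SAMPLER_ADDR_MIRRORED_REPEAT,
};
enum sampler_filter { SAMPLER_FILTER_NEAREST, SAMPLER_FILTER_LINEAR };

struct cl_sampler_state {
   enum sampler_addr addr;     /* applies to every coordinate in CL */
   enum sampler_filter filter; /* applies to both min and mag in CL */
};

#define MAX_SAMPLERS 16 /* full-profile limit mentioned in the log */
_Static_assert(MAX_SAMPLERS <= 32, "mask must fit in 32 bits");

/* Assumed condition: nearest filtering combined with CL_ADDRESS_CLAMP. */
static bool
needs_snap_wa(const struct cl_sampler_state *s)
{
   return s->filter == SAMPLER_FILTER_NEAREST &&
          s->addr == SAMPLER_ADDR_CLAMP;
}

/* Set bit i if sampler slot i needs the workaround. */
static uint32_t
snap_wa_mask(const struct cl_sampler_state *samplers, unsigned count)
{
   uint32_t mask = 0;
   for (unsigned i = 0; i < count && i < MAX_SAMPLERS; i++) {
      if (needs_snap_wa(&samplers[i]))
         mask |= UINT32_C(1) << i;
   }
   return mask;
}
```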

19:40 dllud has joined #dri-devel

19:41 <karolherbst> jekstrand: btw.. if drivers have to add workarounds for wrapping modes, we might have to add a configurable default mode

19:42 <karolherbst> atm I default to PIPE_TEX_WRAP_CLAMP_TO_EDGE, but other hw might have to do workarounds there

19:42 <Kayden> spec says CLAMP is the default though

19:42 <karolherbst> it does? I thought it's impl defined

19:43 <karolherbst> "CL_ADDRESS_NONE - Behavior is undefined for out-of-range image coordinates."

19:43 <Kayden> https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_API.html#CL_ADDRESS_CLAMP

19:43 <Kayden> "The default is CL_ADDRESS_CLAMP."

19:43 <karolherbst> weird..

19:43 <karolherbst> why does the spec disagree with itself

19:43 <karolherbst> heh

19:43 <karolherbst> ohhh

19:44 <karolherbst> Kayden: it matters for samplerless I think...

19:44 <karolherbst> but you have to specify one value

19:44 <karolherbst> when creating a sampler

19:44 <karolherbst> ehh.. I guess with properties that's possible

19:44 <karolherbst> okay.. so on the API it's _CLAMP, but if you specify _NONE we can choose whatever

19:45 <Kayden> yep

19:45 mbrost has quit [Ping timeout: 480 seconds]

19:45 <karolherbst> yeah, I meant what we should do with _NONE :)
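A minimal sketch of the addressing-mode translation discussed above. CL_ADDRESS_CLAMP maps to clamp-to-border, CL_ADDRESS_CLAMP_TO_EDGE maps to clamp-to-edge, and CL_ADDRESS_NONE leaves out-of-range behavior undefined, so the driver may choose. That choice is exposed here as a parameter, following karolherbst's idea of a configurable default. His current choice is clamp-to-edge. This is separate from the API-level default of CL_ADDRESS_CLAMP that Kayden quoted. The REPEAT and MIRRORED_REPEAT cases and all enum names are illustrative stand-ins, not the real CL or gallium values.

```c
/* Local stand-ins for cl_addressing_mode and gallium's wrap modes. */
enum cl_addr {
   CL_ADDR_NONE,
   CL_ADDR_CLAMP_TO_EDGE,
   CL_ADDR_CLAMP,
   CL_ADDR_REPEAT,
   CL_ADDR_MIRRORED_REPEAT,
};
enum tex_wrap {
   TEX_WRAP_REPEAT,
   TEX_WRAP_CLAMP_TO_EDGE,
   TEX_WRAP_CLAMP_TO_BORDER,
   TEX_WRAP_MIRROR_REPEAT,
};

/* Translate a CL addressing mode; 'none_default' is what a driver picks
 * for CL_ADDRESS_NONE. */
static enum tex_wrap
cl_addr_to_wrap(enum cl_addr mode, enum tex_wrap none_default)
{
   switch (mode) {
   case CL_ADDR_CLAMP:           return TEX_WRAP_CLAMP_TO_BORDER; /* zero border */
   case CL_ADDR_CLAMP_TO_EDGE:   return TEX_WRAP_CLAMP_TO_EDGE;
   case CL_ADDR_REPEAT:          return TEX_WRAP_REPEAT;
   case CL_ADDR_MIRRORED_REPEAT: return TEX_WRAP_MIRROR_REPEAT;
   case CL_ADDR_NONE:
   default:
      /* Out-of-range results are undefined, so any mode is conformant;
       * pick whatever needs the fewest hardware workarounds. */
      return none_default;
   }
}
```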

19:45 mbrost has joined #dri-devel

19:52 dllud has quit [Ping timeout: 480 seconds]

20:05 <alyssa> Oh right I need to debug !15991

20:05 <alyssa> i wrote !16046 while waiting for it to compile, oops :p

20:07 neonking has joined #dri-devel

20:08 <karolherbst> *sigh* "error: options cl_khr_fp64 and __opencl_c_fp64 are set to different values"

20:08 <jekstrand> karolherbst: Ok, I've got the workaround typed up and it's still not passing. :-/

20:08 <jekstrand> More digging, I guess.

20:08 <karolherbst> :(

20:08 <karolherbst> does it change the outcome a little though?

20:10 <jekstrand> doesn't seem to

20:10 <jekstrand> But I may still be missing something

20:10 <karolherbst> potentially

20:11 MajorBiscuit has quit [Ping timeout: 480 seconds]

20:11 pcercuei has quit [Quit: brb]

20:12 <karolherbst> jekstrand: what if you always apply the workaround and only run NORMALIZED CLAMP tests?

20:12 <jekstrand> karolherbst: Tried that. :)

20:12 <karolherbst> ehh nearest

20:12 <karolherbst> mhh

20:12 <karolherbst> annoying

20:13 mvlad has quit [Remote host closed the connection]

20:13 <jekstrand> I'm going to try applying the workaround in the CTS test itself

20:13 <karolherbst> let's see what's actually happening on the tests

20:13 MajorBiscuit has joined #dri-devel

20:14 <karolherbst> jekstrand: yeah.. it makes no sense if that workaround should fix the fails we are seeing

20:14 <karolherbst> "Sample 0: coord {0.012158(0x1.8e6528p-7)} did not validate!"

20:14 <karolherbst> which.. is above 0

20:15 <jekstrand> wait... why is this resinfo in here?

20:15 <karolherbst> "test_image_streams read 1D CL_R CL_SIGNED_INT8 NORMALIZED CL_FILTER_NEAREST CL_ADDRESS_CLAMP"

20:15 <karolherbst> just look at this one

20:16 <karolherbst> this feels more like a rounding issue

20:16 <jekstrand> right, get_image_width()

20:16 <jekstrand> ugh

20:16 <jekstrand> Yeah, there's no clamping funnyness there. :-(

20:17 <karolherbst> I am sure we need the workaround later though :P

20:17 pcercuei has joined #dri-devel

20:17 <jekstrand> probably

20:18 <karolherbst> there is no processing on the coords at all.. mhh

20:18 <jekstrand> idk why I didn't think to check 1D first... Drp

20:19 <karolherbst> jekstrand: there are more workarounds in that file

20:20 <karolherbst> jekstrand: https://github.com/intel/intel-graphics-compiler/blob/master/IGC/BiFModule/Implementation/images.cl#L62

20:20 <karolherbst> that looks like... _fun_

20:21 <karolherbst> ohh wait.. that's for lod == 0.0

20:21 <karolherbst> not sure what the default lod is.. probably 1.0?

20:22 <jekstrand> 0

20:22 <karolherbst> ahh no.. 0

20:22 <karolherbst> yeah...

20:22 <karolherbst> explicit lod is behind cl_khr_mipmap_image

20:23 <karolherbst> yeah.. I guess you need to do this as well :)

20:24 <karolherbst> now that's annoying

20:24 <karolherbst> jekstrand: I think we should handle that in some kind of generic lowering

20:24 <karolherbst> ahh wait ... no

20:24 <karolherbst> I thought about something else

20:24 <jekstrand> karolherbst: Yeah, they're straight-up lowering normalized coordinates away in the shader. :facepalm:

20:25 <karolherbst> :)

20:25 <karolherbst> don't we have tex lowering for that?

20:25 <karolherbst> although I guess that's a bit too specific

20:26 <jekstrand> karolherbst: We have lowering going the other way. (-:

20:26 <karolherbst> also explains why it already fails at the first sample :)

20:26 <karolherbst> jekstrand: just mirror it then :P
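"Mirroring" the existing rectangle lowering means going from normalized to unnormalized coordinates. The core arithmetic is a multiply by the image size, and for nearest filtering the texel index is floor(u * width), before any addressing-mode handling. The sketch below shows that arithmetic on the CPU for one dimension. A real lowering would instead emit shader IR (presumably a size query such as txs plus a multiply), which is not shown. The helper names are hypothetical, and precision and rounding subtleties are ignored.

```c
/* floor() without libm; assumes |v| fits in an int. */
static int
floor_to_int(float v)
{
   int i = (int)v;               /* truncates toward zero */
   return (v < (float)i) ? i - 1 : i;
}

/* Normalized [0, 1) coordinate to unnormalized texel space. */
static float
unnormalize_coord(float u, int width)
{
   return u * (float)width;
}

/* Nearest texel index, before clamp/repeat handling is applied. */
static int
nearest_texel_index(float u, int width)
{
   return floor_to_int(unnormalize_coord(u, width));
}
```

Negative results (for example u = -0.01 with width 64 gives -1) are then handled by whatever the sampler's addressing mode says.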

20:27 <karolherbst> we have some ugly tex lowering as well.. I guess everybody has

20:27 <jekstrand> This is just truly rubbish, though.

20:27 <karolherbst> for images we have to do all of that inside the shader regardless

20:27 rasterman has quit [Quit: Gettin' stinky!]

20:27 <jekstrand> normalized coordinates are something everyone has in hardware since forever. Why can't Intel get it right?!?!

20:28 <jekstrand> *sigh*

20:28 <karolherbst> :D

20:28 <karolherbst> jekstrand: I don't think nv has it in hw either...

20:29 <karolherbst> ehh

20:29 <karolherbst> not for all targets

20:29 <karolherbst> seems like we have to normalize for cubes ourselves

20:29 <karolherbst> and for all images

20:33 <jekstrand> cubes are a bit of a different case

20:34 heat_ has joined #dri-devel

20:35 <alyssa> why would you need basic gfx functionality in a gpu??

20:36 apinheiro has joined #dri-devel

20:36 Duke`` has quit [Ping timeout: 480 seconds]

20:39 <Kayden> isn't that what adding an mgag200 chip is for?

20:44 * jekstrand is so very confused

20:45 jkrzyszt has quit [Ping timeout: 480 seconds]

20:45 <jekstrand> This all looks so incredibly reasonable and yet I look at the cl files and it looks like they're doing piles of stuff in shaders.

20:53 dllud has joined #dri-devel

20:57 Haaninjo has quit [Quit: Ex-Chat]

20:59 <karolherbst> mhhh.. so what am I doing wrong for get_kernel_arg_info

21:02 <karolherbst> ehh they always test read_write...

21:02 <karolherbst> but why...

21:03 dllud has quit [Ping timeout: 480 seconds]

21:04 <karolherbst> jenatali: can you remember issues with api get_kernel_arg_info?

21:04 maxzor has joined #dri-devel

21:05 <jenatali> Uh... I do remember it being tricky to get right, but not impossible

21:05 <karolherbst> it fails for me for read_write image tests

21:05 <jenatali> What's the failure?

21:05 <karolherbst> which I don't support, but they try to compile that stuff regardless

21:05 <jenatali> Oh that sounds familiar

21:05 <karolherbst> yeah...

21:06 <karolherbst> I guess your solution was to just support read write images? :D

21:06 <karolherbst> or don't claim CL 3.0?

21:06 <jenatali> Oh, no I was thinking of something else: https://github.com/KhronosGroup/OpenCL-CTS/issues/872

21:06 <jenatali> But I wouldn't be surprised if it's the same thing

21:06 <karolherbst> ahh

21:06 <karolherbst> probably

21:07 <karolherbst> anyway, something where one can say "uhh.. bug in the CTS"

21:07 <karolherbst> should probably just fix it then

21:08 <karolherbst> compiler features_macro is just screwed inside llvm :(

21:08 <jenatali> But yeah, read-write images is something that's optional in D3D12. I can support it on some hardware but not universally

21:08 <karolherbst> I'd rather add support for that stuff later

21:09 <karolherbst> fixing the CTS is trivial here

21:09 <airlied> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13449 ?

21:09 * airlied should actually land that

21:10 <karolherbst> probably

21:10 <karolherbst> or just use clc :P

21:10 <airlied> ah yeah clc already gets that right I think

21:10 <karolherbst> it does

21:10 <karolherbst> so now how to fix test_compiler features_macro ...

21:10 <karolherbst> I fixed the header, but now it fails for fp64 :(

21:10 <alyssa> panfrost has a lot of code to allow configuring the stride of linear images explicitly in vulkan... I can't find where Vulkan allows this though..?

21:11 <karolherbst> "error: options cl_khr_fp64 and __opencl_c_fp64 are set to different values" :(

21:12 <karolherbst> airlied: is there an API to disable CL exts in clang?

21:12 ybogdano has quit [Ping timeout: 480 seconds]

21:12 <karolherbst> I don't even know what enables cl_khr_fp64

21:12 <airlied> karolherbst: I think that's part of the pain of -base.h stuff

21:12 <karolherbst> it's not

21:13 <karolherbst> well.. fp64 isn't

21:13 <karolherbst> for the other fails I fixed the header

21:13 <airlied> I've vague memories of tracing through that one, but no ideas of what it was now

21:13 <karolherbst> but something enables the fp64 ext and it's not the header

21:14 <karolherbst> and I checked with clinfo that I don't advertise the ext either

21:14 <karolherbst> odd

21:16 <karolherbst> it's not the kernel either

21:17 <alyssa> oh. VkImageDrmFormatModifierExplicitCreateInfoEXT.

21:20 <karolherbst> airlied: ahhh.. I think I got it worked out now

21:21 <karolherbst> airlied: c->getTargetOpts().OpenCLExtensionsAsWritten.push_back("-all"); :)

21:21 <daniels> alyssa: yep, gbm

21:21 <alyssa> daniels: how... delightful.

21:22 <daniels> alyssa: yes, but no kmsro

21:22 <alyssa> \ o /

21:23 <karolherbst> what

21:23 <karolherbst> the

21:23 <karolherbst> heck

21:23 <karolherbst> who designed an API like that

21:24 <airlied> karolherbst: people munging thing into their compiler stacks without supervision :-P

21:24 <karolherbst> airlied: so if I pass -all as the first thing, and push +.. entries.. guess what happens?

21:26 <karolherbst> the fuck...

21:26 <karolherbst> with all it just writes into a different map...

21:26 <karolherbst> how does that make any sense

21:26 ybogdano has joined #dri-devel

21:27 <karolherbst> ehhh.. no... something else is weird

21:28 dllud has joined #dri-devel

21:28 <daniels> try -0.0

21:29 <karolherbst> shader caching happened :)

21:30 <karolherbst> but I still need to push two values

21:30 <karolherbst> cl_khr_3d_image_writes and __opencl_c_3d_image_writes

21:31 <karolherbst> "PASSED test." :)
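The fix for "options cl_khr_fp64 and __opencl_c_fp64 are set to different values" was to disable everything first ("-all") and then enable exactly what the device supports, listing both the extension and its matching __opencl_c_* feature (for example cl_khr_3d_image_writes and __opencl_c_3d_image_writes). In the log each entry is pushed into clang's OpenCLExtensionsAsWritten. The same comma-separated +/- list is also the format accepted by clang's -cl-ext frontend option. Below is a minimal sketch that builds such a list. The helper name is hypothetical, and the feature table is left to the caller.

```c
#include <stddef.h>
#include <stdio.h>

/* Build "-all,+name0,+name1,..." into out. Returns 0 on success, -1 if the
 * buffer is too small. The caller is expected to list each extension
 * together with its matching __opencl_c_* feature macro. */
static int
build_cl_ext_list(char *out, size_t size, const char *const *names,
                  size_t count)
{
   int n = snprintf(out, size, "-all");
   if (n < 0 || (size_t)n >= size)
      return -1;
   size_t len = (size_t)n;

   for (size_t i = 0; i < count; i++) {
      n = snprintf(out + len, size - len, ",+%s", names[i]);
      if (n < 0 || (size_t)n >= size - len)
         return -1;
      len += (size_t)n;
   }
   return 0;
}
```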

21:31 <karolherbst> jekstrand: soo.. clamping is the only thing left now :D

21:31 gouchi has quit [Remote host closed the connection]

21:35 danvet has quit [Ping timeout: 480 seconds]

21:38 MajorBiscuit has quit [Quit: WeeChat 3.4]

21:39 <jekstrand> karolherbst: ugh

21:39 <jekstrand> karolherbst: I so wish the Intel CL driver code were readable.

21:40 <jekstrand> Maybe I need to figure out how to build it?

21:40 <jekstrand> That sounds hellish

21:40 <karolherbst> it is

21:41 <jekstrand> I'm going to write a pass to lower away normalized texture coordinates entirely. We'll see how that goes.

21:42 <jekstrand> ugh...

21:42 <jekstrand> I'm really going to need to distinguish between CL and GL if I want to do that. :-/

21:45 <jekstrand> Oh, this is all truly horrible

21:49 <karolherbst> https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/fd9b0fc7861a10c0eba56115c0e0b2f5165604ce :3

21:50 <karolherbst> jenatali: any thoughts of an interface like this? ^^

21:50 <karolherbst> *on

21:51 <jenatali> karolherbst: LGTM

21:51 <karolherbst> cool

21:51 <karolherbst> will extract all my libclc changes then at some point this week

21:52 <karolherbst> that i915 bug is really annoying btw

21:53 <karolherbst> but fun that working on kernel caching made it even trigger in the first place

21:55 <karolherbst> ehh.. I need all those CL 1.0 and 1.1 exts as well :(

21:55 <karolherbst> but cool, we can be very explicit then

21:55 ahajda has joined #dri-devel

21:56 <karolherbst> jenatali: do you have any devices not supporting those CL 1.1 atomic exts or cl_khr_byte_addressable_store?

21:57 <jenatali> Don't think so

21:57 <jenatali> Well, we emulate byte addressable store, DXIL can't express it :(

21:57 <karolherbst> ahh

21:57 <jenatali> (Yet)
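When the target can only address memory in 32-bit words, a common way to emulate a byte store like the one cl_khr_byte_addressable_store requires is read-modify-write with a mask. The sketch below shows the plain, non-atomic version and assumes little-endian byte order within a word. It is not necessarily how the D3D12 backend does it. With concurrent writers to other bytes of the same word, this version can lose updates, so a real implementation would need atomic AND/OR or compare-and-swap.

```c
#include <stdint.h>

/* Store 'value' at byte offset 'byte_offset' in a buffer that can only be
 * accessed as 32-bit words. Assumes little-endian bytes within a word.
 * Not safe against concurrent writers to the same word. */
static void
store_byte_rmw(uint32_t *words, uint32_t byte_offset, uint8_t value)
{
   uint32_t word_index = byte_offset / 4;
   uint32_t shift = (byte_offset % 4) * 8;
   uint32_t mask = UINT32_C(0xff) << shift;
   uint32_t old = words[word_index];
   words[word_index] = (old & ~mask) | ((uint32_t)value << shift);
}

/* Matching byte load: shift the containing word and truncate. */
static uint8_t
load_byte(const uint32_t *words, uint32_t byte_offset)
{
   return (uint8_t)(words[byte_offset / 4] >> ((byte_offset % 4) * 8));
}
```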

21:57 apinheiro has quit [Quit: Leaving]

21:59 <karolherbst> this "Fails 0" looks so nice :D

22:00 <karolherbst> jekstrand: why do you need to build the intel stack though?

22:00 <jekstrand> I'm not going to. I'm just going to implement all the pain

22:00 <karolherbst> :D

22:00 <karolherbst> okay

22:00 ybogdano has quit [Ping timeout: 480 seconds]

22:03 <airlied> karolherbst: now to add it to CI :-P

22:03 <karolherbst> :D

22:03 <daniels> karolherbst: !

22:03 mbrost has quit [Remote host closed the connection]

22:03 <karolherbst> well.. the image tests are still failing, but it does show "fails 0" for quite some time

22:03 mbrost has joined #dri-devel

22:04 <karolherbst> but if nothing broke, only 5 failures remain

22:04 <karolherbst> and hopefully jekstrand will fix those :p

22:04 <karolherbst> airlied: we still need to fix LLVM :(

22:05 <karolherbst> and spirv-tools, but I do have a PR for that already: https://github.com/KhronosGroup/SPIRV-Tools/pull/4784

22:07 <karolherbst> guess I can cross "[ ] write a conformant CL 3.0 implementation" off my bucket list

22:08 <alyssa> :D

22:08 <karolherbst> nice "Pass 2171 Fails 5 Crashes 0 Timeouts 0"

22:09 <karolherbst> the thing is though... I don't think it makes sense to run the real CTS if my machine will just crash midway

22:12 mbrost has quit [Ping timeout: 480 seconds]

22:16 fxkamd has quit []

22:18 ahajda has quit [Quit: Going offline, see ya! (www.adiirc.com)]

22:19 mbrost has joined #dri-devel

22:27 maxzor has quit [Ping timeout: 480 seconds]

22:28 mbrost has quit [Ping timeout: 480 seconds]

22:31 lygstate has joined #dri-devel

22:31 ybogdano has joined #dri-devel

22:42 jewins has quit [Ping timeout: 480 seconds]

22:45 mbrost has joined #dri-devel

22:49 bgs has quit [Remote host closed the connection]

22:49 bgs has joined #dri-devel

22:51 lygstate has quit [Write error: connection closed]

22:55 mbrost has quit [Ping timeout: 480 seconds]

22:58 pcercuei has quit [Quit: dodo]

23:06 <karolherbst> jekstrand: uhhh....

23:06 <karolherbst> I think I found the i915 bug

23:06 <karolherbst> or at least part of it

23:06 jewins has joined #dri-devel

23:06 <karolherbst> i915_vma_reopen is just racy

23:07 <jekstrand> yeah

23:07 <karolherbst> calling i915_vma_is_closed without taking the lock

23:07 <karolherbst> so it checks if it's closed

23:07 <karolherbst> and if it is, it takes the lock and does this remove_closed thing

23:07 <karolherbst> but it needs to take the lock before checking

23:07 <karolherbst> there might be more, but that one looks obviously wrong
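The bug described above is a classic check-then-act race: the "is it closed?" test runs without the lock, so by the time the lock is taken the answer may be stale, and two callers can both act on it. The following is a minimal user-space sketch of that pattern using pthreads, assuming a simplified object with a `closed` flag and a counter standing in for the closed list. All names here (`struct vma`, `vma_reopen_racy`, `vma_reopen_fixed`, `close_then_race_reopens`, etc.) are hypothetical illustrations; this is not the actual i915 code or the patch under discussion.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for an i915 VMA; not the real struct. */
struct vma {
    pthread_mutex_t lock;   /* protects 'closed' and 'closed_list_len' */
    bool closed;            /* true while the object sits on the "closed" list */
    int closed_list_len;    /* stand-in for membership in that list */
};

static void vma_init(struct vma *v)
{
    pthread_mutex_init(&v->lock, NULL);
    v->closed = false;
    v->closed_list_len = 0;
}

/* Park the object on the closed list (idempotent, done under the lock). */
static void vma_close(struct vma *v)
{
    pthread_mutex_lock(&v->lock);
    if (!v->closed) {
        v->closed = true;
        v->closed_list_len++;
    }
    pthread_mutex_unlock(&v->lock);
}

/* Caller must hold v->lock. In a real list this would be a list_del(). */
static void vma_remove_closed_locked(struct vma *v)
{
    v->closed = false;
    v->closed_list_len--;
}

/* Racy pattern: 'closed' is read without the lock (itself a data race), so
 * several threads can all see true and all remove the object. */
static void vma_reopen_racy(struct vma *v)
{
    if (v->closed) {
        pthread_mutex_lock(&v->lock);
        vma_remove_closed_locked(v);
        pthread_mutex_unlock(&v->lock);
    }
}

/* Fixed pattern: the check and the removal happen under the same lock. */
static void vma_reopen_fixed(struct vma *v)
{
    pthread_mutex_lock(&v->lock);
    if (v->closed)
        vma_remove_closed_locked(v);
    pthread_mutex_unlock(&v->lock);
}

struct reopen_job {
    struct vma *v;
    void (*reopen)(struct vma *);
};

static void *reopen_thread(void *arg)
{
    struct reopen_job *job = arg;
    job->reopen(job->v);
    return NULL;
}

/* Close a fresh object once, then run 'nthreads' concurrent reopen calls.
 * Returns the final closed_list_len: 0 is correct, negative means the
 * object was removed more than once. */
static int close_then_race_reopens(void (*reopen)(struct vma *), int nthreads)
{
    enum { MAX_THREADS = 16 };
    pthread_t tids[MAX_THREADS];
    struct vma v;
    struct reopen_job job;
    int created = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    vma_init(&v);
    vma_close(&v);
    job.v = &v;
    job.reopen = reopen;

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, reopen_thread, &job) != 0)
            break;
        created++;
    }
    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);

    int len = v.closed_list_len;
    pthread_mutex_destroy(&v.lock);
    return len;
}
```

With the fixed variant, `close_then_race_reopens(vma_reopen_fixed, 8)` always returns 0. The racy variant only occasionally returns a negative value, because the window between the unlocked check and taking the lock is small; that intermittency is consistent with a crash that shows up in some runs but not others.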

23:08 <jekstrand> karolherbst: There's probably more

23:08 <karolherbst> I'll try that one change and see how that goes :D

23:09 <jekstrand> In general, I'd recommend against going down this rabbit-hole....

23:10 <karolherbst> I know

23:10 <karolherbst> but I just looked at it for 10 seconds and already found this one...

23:10 <karolherbst> if that's enough.. good

23:11 <karolherbst> if not.. I'll probably ignore it

23:11 <karolherbst> and if it makes the crash less likely, that already helps me anyway

23:11 icecream95 has joined #dri-devel

23:12 ramaling has quit [Remote host closed the connection]

23:12 ramaling has joined #dri-devel

23:21 morphis has quit [Ping timeout: 480 seconds]

23:21 morphis has joined #dri-devel

23:22 camus has quit [Remote host closed the connection]

23:22 camus has joined #dri-devel

23:23 <karolherbst> mhh, let me run it in a `while true` loop; at least the first run hasn't crashed my machine yet

23:23 bgs_ has joined #dri-devel

23:26 bgs has quit [Ping timeout: 480 seconds]

23:29 <karolherbst> jekstrand: yeah soo.. I think I fixed it :)

23:30 <jekstrand> I don't doubt that you fixed something. :)

23:30 <karolherbst> if my CTS runs don't crash my system anymore, that's good enough for me :)

23:30 <jekstrand> :)

23:30 iive has quit []

23:30 <karolherbst> the fix is trivial though, so I doubt anybody would mind it

23:31 <karolherbst> and the code looks obviously incorrect

23:31 <karolherbst> I'll let it do a few more rounds, but it looks better at least

23:35 rkanwal has quit [Quit: rkanwal]

23:35 rkanwal has joined #dri-devel

23:39 <karolherbst> yeah.. so it's quite stable now, it already survived 6 rounds, whereas before it crashed in about 2 out of 3 runs

23:45 <karolherbst> so let's see what others say about the patch