ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
icecream95 has joined #dri-devel
nchery is now known as Guest2248
nchery has joined #dri-devel
Guest2248 has quit [Ping timeout: 480 seconds]
slattann has joined #dri-devel
co1umbarius has joined #dri-devel
columbarius has quit [Ping timeout: 480 seconds]
alatiera5 has joined #dri-devel
alatiera has quit [Ping timeout: 480 seconds]
ybogdano has quit [Ping timeout: 480 seconds]
mdnavare has quit [Remote host closed the connection]
mdnavare has joined #dri-devel
TD-Linux has quit []
TD-Linux has joined #dri-devel
<jekstrand> karolherbst: Ugh... This rebase isn't working. :-/
lemonzest has quit [Quit: WeeChat 3.4]
<jekstrand> Ok, got it rebased by deleting some things
* jekstrand installs rustup
<jekstrand> karolherbst: Doesn't build with latest rust. :(
* jekstrand is confused
slattann has quit []
Daanct12 has joined #dri-devel
<karolherbst> jekstrand: ehh.. you need to change rust_std
<karolherbst> meson configure -Drust_std=2021
<karolherbst> ported to 2018, because that won't require "extern crate" decls anymore
<karolherbst> ported to 2021, because you won't have to import TryFrom/TryInto anymore
* jekstrand needs to update meson, apparently
TD-Linux_ has joined #dri-devel
TD-Linux has quit []
<jekstrand> Ok, meson updated. Rust updated.
TD-Linux_ has left #dri-devel [#dri-devel]
<karolherbst> jekstrand: not sure about rust 2021 though as it's quite new and the benefits are weaker than for 2018 (which also introduced futures)
<karolherbst> though rust 2021 has disjoint captures, which might come in handy
<jekstrand> Given that it's new, I think we can require as recent a version of Rust as we want
<karolherbst> yeah.. it also doesn't have this toolchain mess like C/C++ do
<airlied> yay aco builds a shader without dying, now to figure out how to launch it
elongbug has joined #dri-devel
<jekstrand> airlied: \o/
* jekstrand kicks off a full run and hopes it doesn't torch his kernel
<jekstrand> karolherbst: I've got a NUC showing up on Thursday so I'll have a separate test machine and can maybe do some kernel hacking if needed.
jimjams has joined #dri-devel
anarsoul has quit [Ping timeout: 480 seconds]
anarsoul has joined #dri-devel
Daaanct12 has joined #dri-devel
heat has quit [Ping timeout: 480 seconds]
Danct12 has quit [Ping timeout: 480 seconds]
maxzor has quit [Ping timeout: 480 seconds]
ngcortes has quit [Ping timeout: 480 seconds]
aravind has joined #dri-devel
<jekstrand> karolherbst: Pass 2144 Fails 26 Crashes 1 Timeouts 0
mclasen has quit [Ping timeout: 480 seconds]
<jekstrand> karolherbst: Weirdly, the UNORM image tests seem to pass even with NORMALIZED
<jekstrand> idk what's different
<jekstrand> Might be a hint, though.
* jekstrand is so glad CL doesn't have border color...
elongbug_ has joined #dri-devel
elongbug has quit [Read error: Connection reset by peer]
rgallaispou has quit [Ping timeout: 480 seconds]
rgallaispou has joined #dri-devel
<Kayden> neat!
slattann has joined #dri-devel
<airlied> okay writes to one dword of a buffer, now to write to the second one
Duke`` has joined #dri-devel
elongbug__ has joined #dri-devel
shankaru has joined #dri-devel
elongbug_ has quit [Ping timeout: 480 seconds]
danvet has joined #dri-devel
rgallaispou has quit [Read error: Connection reset by peer]
rgallaispou has joined #dri-devel
jewins1 has joined #dri-devel
jewins has quit [Remote host closed the connection]
tlwoerner_ has joined #dri-devel
tlwoerner has quit [Ping timeout: 480 seconds]
khfeng has joined #dri-devel
shankaru has quit [Quit: Leaving.]
itoral has joined #dri-devel
<dschuermann> airlied: cool :D
<dschuermann> airlied: variable workgroup size should be easy, but aco will have to assume some max workgroup size which restricts occupancy
<dschuermann> you might just pass max workgroup size in case of variable to make it work
<dschuermann> err, it doesn't restrict occupancy, quite the opposite: it can create spilling as it needs to be able to launch potentially lots of waves
shankaru has joined #dri-devel
<airlied> dschuermann, jekstrand, karolherbst : aco just executed its first opencl kernel :-P
<dschuermann> \o/
<airlied> my tree is a wasteland, but it does run a basic test
jewins1 has quit [Ping timeout: 480 seconds]
sadlerap has quit [Ping timeout: 480 seconds]
sadlerap has joined #dri-devel
<dschuermann> airlied: there are two options: either clover can pass max_workgroup_size in case of variable, or it already calculates a suitable max depending on the required shared memory (otherwise, aco will have to do that)
sarnex has quit [Quit: Quit]
sarnex has joined #dri-devel
airlied has quit [Remote host closed the connection]
sdutt has quit [Read error: Connection reset by peer]
airlied has joined #dri-devel
dj-death has joined #dri-devel
mvlad has joined #dri-devel
Duke`` has quit [Ping timeout: 480 seconds]
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
fxkamd has quit []
lemonzest has joined #dri-devel
shankaru has quit [Quit: Leaving.]
<neonking> hello everyone o/ i just made a mesa 22.0.1 build with libglvnd enabled but i'm not really getting the point of libglvnd
<neonking> as i am running on BSD, i'm not expecting to run binary proprietary drivers, so is it really worth it ?
tzimmermann has joined #dri-devel
<neonking> also, the point of my mesa build was to make a wayland-enabled build so i disabled x11 related options, making me wonder even more whether libglvnd is necessary or useful?
jkrzyszt has joined #dri-devel
nchery has quit [Read error: Connection reset by peer]
JohnnyonF has joined #dri-devel
i-garrison has quit []
i-garrison has joined #dri-devel
jfalempe has joined #dri-devel
shankaru has joined #dri-devel
JohnnyonFlame has quit [Ping timeout: 480 seconds]
nchery has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
jkrzyszt has quit [Remote host closed the connection]
ahajda has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
dj-death has quit [Ping timeout: 480 seconds]
rpigott has quit [Ping timeout: 480 seconds]
rasterman has joined #dri-devel
Gorg has quit [Read error: Connection reset by peer]
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
rpigott has joined #dri-devel
jkrzyszt has joined #dri-devel
maxzor has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
shankaru has quit [Quit: Leaving.]
pcercuei has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
thellstrom has joined #dri-devel
apinheiro has joined #dri-devel
Daanct12 has quit [Quit: Leaving]
dj-death has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
dviola has joined #dri-devel
i-garrison has quit [Read error: Connection reset by peer]
i-garrison has joined #dri-devel
shankaru has joined #dri-devel
<bbrezillon> kusma, jenatali: any objection to merging !15911? I have a bunch of other MRs depending on this one...
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
<jenatali> bbrezillon: ugh notifications got disabled for me on that one. I can re-review in a few hours when I'm awake for real
itoral has quit [Remote host closed the connection]
<kusma> bbrezillon: looks reasonable enough to me...
itoral has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
<bbrezillon> jenatali: np, it can wait a few more hours ;-)
<bbrezillon> kusma: thanks
rkanwal has joined #dri-devel
maxzor has quit [Ping timeout: 480 seconds]
jimjams has quit [Quit: Connection closed for inactivity]
flacks has quit [Quit: Quitter]
flacks has joined #dri-devel
icecream95 has quit [Ping timeout: 480 seconds]
pallavim has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
mclasen has joined #dri-devel
jkrzyszt has quit [Remote host closed the connection]
alatiera5 has quit []
alatiera5 has joined #dri-devel
jkrzyszt has joined #dri-devel
alatiera5 is now known as alatiera
<HdkR> Anyone know if there were any DRM ioctl additions between kernel 5.16 and current 5.18-rc3? I've been busy this last month and haven't had the time to look.
<HdkR> I'm sort of guessing virtio had some changes, but other than that?
shankaru has quit [Quit: Leaving.]
Company has joined #dri-devel
The_Company has joined #dri-devel
The_Company has quit []
Company has quit []
Company has joined #dri-devel
shankaru has joined #dri-devel
slattann has quit [Ping timeout: 480 seconds]
slattann has joined #dri-devel
rgallaispou1 has joined #dri-devel
rgallaispou has quit [Remote host closed the connection]
shankaru has quit []
shankaru has joined #dri-devel
MajorBiscuit has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
lemonzest has quit [Quit: WeeChat 3.4]
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
FireBurn has quit [Quit: Konversation terminated!]
<karolherbst> jekstrand: yeah... dunno if 26 fails are better or not as with 128 read only images you get a full profile and some things might get tested in more depth
<karolherbst> but if you have something which works, I can take a look and check
itoral has quit [Remote host closed the connection]
<karolherbst> airlied: nice
itoral has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
<alyssa> Dumb floating point question
<alyssa> Is f2f16(f_32(f2f32(x), f2f32(y))) necessarily equal to f_16(x, y)?
<alyssa> (where f is fsub in this case but I don't know if that matters)
<alyssa> tarceri_: ^ investigating the bump shaders
<karolherbst> alyssa: it's not, question is, if it matters
<karolherbst> like imagine x and y being in this "normal in f32 but subnormal in f16 domain"
<karolherbst> alyssa: f_f32 stands for any float op, correct?
<daniels> jenatali: hmm, I'm not sure when this started happening, but windows-vs2019 builds are now pulling & building zlib patches with every build, rather than it being available system-wide ... is that new or am I just hallucinating?
<karolherbst> anyway.. precision matters for floats and stuff can be weird. but GL generally doesn't care
<jenatali> daniels: not new AFAIK
<karolherbst> things can get even weirder if you have 1 / 0.00000000x (which is 0.0 in f16, but not 0 in f32 and the fp16 ops generates NaN)
<daniels> jenatali: fair enough
<jenatali> daniels: we could (and maybe should) add it to the container rather than pulling it at build time
itoral has quit []
<daniels> jenatali: I only noticed because pulling it appears to be heroically slow
<daniels> like, about as long as the entire Mesa compilation
<daniels> (could just be a one-off)
<alyssa> karolherbst: aaah, right..
<jenatali> Ooh fun
<karolherbst> alyssa: but we do a lot of optimizations with floats where the result does change, but things are fine, because people still see the "correct" thing
<alyssa> yeah but some of these could cause unnoticeable differences that would cause a trace job to fail, yeah?
<karolherbst> that's why all of this is quite hard to CI, because things do change, and people have to check if anything broke
<karolherbst> alyssa: sure, that's why people check the results
<jenatali> daniels: of course adding it to the container requires rebuilding llvm currently, which is also heroically slow
<alyssa> tarceri_: ^^ I think you should update the checksum, then
<alyssa> Slight scheduling differences lead to differences in FP16 vs FP32 use in that shader, and apparently that's not ok
<karolherbst> uhh
<karolherbst> scheduling as in "it affects future opts"?
<karolherbst> because normally scheduling itself shouldn't affect that stuff
<karolherbst> or at least I would assume it doesn't
<alyssa> karolherbst: I thought the hw had a "fp16 or fp32" slot
<alyssa> actually it's an fp32-only slot, that lets you fold in f2f32 on the inputs and f2f16 on the output for free
<karolherbst> ahh
<karolherbst> "fun"
<alyssa> which means the scheduler is turning "fsub16(x, y)" into "f2f16(fsub32(f2f32(x), f2f32(y)))" and while that's legal in GLES it's not necessarily the same
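A minimal sketch of how the two forms can diverge, assuming (hypothetically, this is not confirmed for the hardware discussed) a native fp16 subtract that flushes fp16 subnormals to zero while the fp32 path and the conversions preserve them. The helper names are made up for illustration; the inputs are the "normal in f32 but subnormal in f16" case mentioned above.

```python
import math
import struct

F16_MIN_NORMAL = 2.0 ** -14  # smallest normal binary16 magnitude


def to_f16(v):
    """Round a Python float to the nearest binary16 value."""
    return struct.unpack("<e", struct.pack("<e", v))[0]


def to_f32(v):
    """Round a Python float to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", v))[0]


def flush_f16(v):
    """Assumed flush-to-zero: fp16 subnormals become a signed zero."""
    return math.copysign(0.0, v) if abs(v) < F16_MIN_NORMAL else v


def fsub16_ftz(x, y):
    """Hypothetical native fp16 subtract that flushes subnormals."""
    return flush_f16(to_f16(flush_f16(x) - flush_f16(y)))


def fsub16_via_f32(x, y):
    """f2f16(fsub32(f2f32(x), f2f32(y))) with subnormals preserved."""
    return to_f16(to_f32(to_f32(x) - to_f32(y)))


x = 2.0 ** -14  # normal in fp16
y = 2.0 ** -15  # subnormal in fp16, normal in fp32
print(fsub16_ftz(x, y), fsub16_via_f32(x, y))
```

Under these assumptions the flushing fp16 path drops y entirely, while the fp32 path keeps it, so the two results differ even though both are plausible answers for GLES.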
<karolherbst> yeah..
<karolherbst> for GL that's probably totally fine
<karolherbst> not sure about CL :) but you can also just say you support round to nearest but none of that denormal business
<karolherbst> not sure how well that would work for the fp16 ext though
<alyssa> if we wanted conformant CL on this chip (we don't), we'd disable this optimization
<alyssa> or rather, gate it on !exact and pipe through exact into the IR
<karolherbst> yeah.. probably
<alyssa> ^backend IR
<alyssa> tomeu: It does raise some questions about the validity of tracie on t860
<tomeu> well, right now what it does is to warn you of a possibly unintended change in rendering
<tomeu> it cannot figure by itself if the change is "good" or "bad"
<alyssa> sure, I know what the behaviour is, but it raises questions about whether it's acceptable to have midgard+tracie+fp16 in pre-merge CI
<tomeu> we will need a morality ISA extension in CPUs before then
<alyssa> (if correct changes to common NIR passes can change things in such a way that the scheduler does something slightly different and then the checksum changes imperceptibly)
<tomeu> well, how does the value of the capacity to know of unintended changes in rendering compare to the hassle of reacting to them?
<daniels> well, we could force it to be no-fp16, but then it's only testing things people don't run
<alyssa> tomeu: TBD.
<alyssa> This particular issue wasn't on my radar when we started doing trace testing, and in all that time, this is the first time it's caused a possible issue
<alyssa> So maybe it's irrelevant
<alyssa> Maybe we want fuzzy image comparisons (like dEQP) instead of straight up checksums, though...?
<tomeu> hard to say, up to the maintainer of the driver I would say
<tomeu> we can do fuzzy without much problems, but then we are just changing the problem slightly, but not the fundamental problem
<alyssa> (In the case under discussion, tracie already says 0 failed pixels and it's unclear what changed at all to my human eyes.)
<tomeu> but if we can reduce the hassle while maintaining the value, then it could be a good thing to do
<tomeu> yeah, sorry, that's a bug I should fix
<alyssa> what bug..?
<tomeu> the JS that compares the image has a tolerance value, but the job fails without any tolerance
<alyssa> Ahh, right
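A rough sketch of the difference between a straight checksum and a tolerance-based comparison, as discussed above. This is not tracie's actual code; the function names, the flat list-of-channel-values image format, and the tolerance of 1 are assumptions made for illustration.

```python
import hashlib


def checksum(pixels):
    """Exact fingerprint: any single-bit change yields a different digest."""
    return hashlib.sha256(bytes(pixels)).hexdigest()


def failed_pixels(ref, new, tolerance):
    """Count channel values whose absolute difference exceeds tolerance."""
    return sum(1 for a, b in zip(ref, new) if abs(a - b) > tolerance)


ref = [10, 20, 30, 255]  # reference image, one byte per channel
new = [10, 21, 30, 255]  # same image with one channel off by one LSB

print(checksum(ref) == checksum(new))        # exact comparison
print(failed_pixels(ref, new, tolerance=1))  # fuzzy comparison
```

With these inputs the checksums differ while the fuzzy comparison reports no failed pixels, which matches the "0 failed pixels but the job fails" situation described above.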
tony[m]12 has joined #dri-devel
Daaanct12 is now known as Danct12
sdutt has joined #dri-devel
tony[m]12 has left #dri-devel [#dri-devel]
sdutt has quit []
sdutt has joined #dri-devel
rgallaispou has joined #dri-devel
<neonking> gentle ping about some pointer to libglvnd role ?
<neonking> maybe i'm asking the wrong place?
rgallaispou1 has quit [Ping timeout: 480 seconds]
agd5f has quit [Read error: Connection reset by peer]
agd5f has joined #dri-devel
shankaru has quit []
tlwoerner_ has quit []
tlwoerner has joined #dri-devel
Emmy_ has quit [Remote host closed the connection]
Emmy_ has joined #dri-devel
tonyk has quit []
tonyk has joined #dri-devel
alyssa has left #dri-devel [#dri-devel]
<marex> robertfoss: thanks for the lt9211 review
<robertfoss> marex: no worries, sorry about being a bit slow to get started.
<marex> robertfoss: no worries
<robertfoss> marex: it's in better shape than most drivers :)
<marex> MESA-LOADER: failed to open swrast: /usr/lib/dri/swrast_dri.so: cannot open shared object file: Permission denied (search paths /usr/lib/dri, suffix _dri)
<marex> what ?
<marex> chromium what kind of oddity is this now
Haaninjo has joined #dri-devel
neonking has quit [Remote host closed the connection]
lemonzest has joined #dri-devel
alarumbe has joined #dri-devel
jewins has joined #dri-devel
neonking has joined #dri-devel
markyacoub has left #dri-devel [#dri-devel]
ella-0_ has joined #dri-devel
<jekstrand> airlied: \o/
ella-0 has quit [Read error: Connection reset by peer]
<ajax> neonking: the other thing glvnd (will) get used for is maintaining the drivers for pre-dx9 hardware like first-gen radeons and stuff
<ajax> neonking: but yeah, if you're building your own everything anyway then glvnd doesn't win you anything, you can safely build without it
<ajax> neonking: that said, i'm pretty sure you can get nvidia drivers for freebsd still, so if that's the bsd in question and you're planning on sharing this build and/or its recipe with others, maybe keep it
<neonking> ajax, thanks for the detailed answer ! I'm running OpenBSD here, so unfortunately proprietary drivers aren't a thing. Trying to get a wayland enabled mesa here :)
fxkamd has joined #dri-devel
<jekstrand> karolherbst: I got ahold of a friend on the CL team at Intel who's going to see if their driver does something funny for those normalized sampling tests.
<karolherbst> cool :)
<karolherbst> jekstrand: do you have a branch with all those sampler/image changes btw? I want to clean up my branch a little
alyssa has joined #dri-devel
<alyssa> jekstrand: I kind of want to make nir_block_worklist generic and stick it in util/ (or compiler/util/)
<jekstrand> karolherbst: rusticl/wip
<alyssa> I keep open coding work list data structures, wrongly.
<alyssa> it just treats nir_block as a black box *except* for ->index
<alyssa> but maybe some macro abomination can workaround that
<alyssa> `u_block_worklist_push_head(w, block, nir_block, index)`
<alyssa> util/list style
<alyssa> and then just `#define nir_block_worklist_push_head(w, block) u_block_worklist_push_head(w, block, nir_block, index)`
<alyssa> again like we do with util/list
<alyssa> that data structure makes liveness analysis a lot nicer, for example
<alyssa> my drivers use sets which is dubious and bit me last night, ir3 is lazy about progress and thus does way more work than it should,
<jekstrand> alyssa: I guess I'm fine with it.
<jekstrand> alyssa: I'd call it u_worklist, not u_block_worklist, though. No reason why it needs to be blocks. :)
<alyssa> lima/ppir is lazy like ir3
<alyssa> v3d ""
<jekstrand> alyssa: The only question I have is if we should assume a uint32 index or if we should have a u_worklist_link and then have nir_block have a `union { struct u_worklist_link worklist_link; uint32_t index; };`
<alyssa> honestly dunno what aco does I can't read aco code
khfeng has quit [Ping timeout: 480 seconds]
<alyssa> I think intel is lazy, at least I don't see an explicit worklist
<alyssa> ...point is, lots of drivers would benefit from u_worklist :-p
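A minimal sketch of the kind of worklist being proposed, assuming the only thing it needs from an item is an integer `index`. The class and method names below are illustrative and are not the eventual u_worklist API; the idea shown is a queue plus a "present" flag per index so an item is never queued twice.

```python
from collections import deque


class Block:
    """Stand-in for an IR block; only .index is used by the worklist."""

    def __init__(self, index):
        self.index = index


class Worklist:
    def __init__(self, num_items):
        self.queue = deque()
        self.present = [False] * num_items  # one flag per item index

    def push_tail(self, item):
        # Ignore items that are already queued.
        if not self.present[item.index]:
            self.present[item.index] = True
            self.queue.append(item)

    def push_head(self, item):
        if not self.present[item.index]:
            self.present[item.index] = True
            self.queue.appendleft(item)

    def pop_head(self):
        item = self.queue.popleft()
        self.present[item.index] = False
        return item

    def is_empty(self):
        return not self.queue


blocks = [Block(i) for i in range(3)]
w = Worklist(len(blocks))
w.push_tail(blocks[1])
w.push_tail(blocks[1])  # duplicate push is a no-op
w.push_head(blocks[2])
order = [w.pop_head().index for _ in range(2)]
```

Because membership is tracked by index, nothing else about the block type leaks into the data structure, which is what makes a generic version plausible.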
<dschuermann> we usually just use an index and have the blocks strictly enumerated
* alyssa blinks
<dschuermann> and all blocks are in an std::vector with program->blocks[block_idx] ;)
<alyssa> sure... what do you use for a worklist though? a BITSET?
<dschuermann> no, just the index. updating the index becomes something like worklist = std::max(worklist, preds[i])
<alyssa> so with complex CF you reprocess some blocks unnecessarily?
<dschuermann> I think in loops, we could skip nested cf as long as phis stay untouched, but otherwise... meh
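A sketch of the single-index approach described above, applied to backward liveness. It assumes blocks are strictly enumerated and stored in an array, and it is only a guess at the shape of the technique, not aco's code. The dict layout and the `+ 1` adjustment (needed here because the index is decremented before use) are choices made for this example.

```python
def liveness(blocks):
    """blocks[i] is a dict with 'use', 'defs' (sets) and 'preds', 'succs' (index lists)."""
    live_in = [set() for _ in blocks]
    worklist = len(blocks)  # every block with index < worklist is pending
    while worklist > 0:
        worklist -= 1
        idx = worklist
        block = blocks[idx]
        live_out = set().union(*(live_in[s] for s in block["succs"]))
        new_in = block["use"] | (live_out - block["defs"])
        if new_in != live_in[idx]:
            live_in[idx] = new_in
            # Revisit predecessors; a back edge raises the index again.
            for pred in block["preds"]:
                worklist = max(worklist, pred + 1)
    return live_in


# b0 -> b1 -> b2 -> b3, with a loop back edge b2 -> b1
blocks = [
    {"use": set(), "defs": {"a", "b"}, "preds": [], "succs": [1]},
    {"use": {"a"}, "defs": set(), "preds": [0, 2], "succs": [2]},
    {"use": {"b"}, "defs": set(), "preds": [1], "succs": [1, 3]},
    {"use": set(), "defs": set(), "preds": [2], "succs": []},
]
result = liveness(blocks)
```

Raising the index re-queues every block below it, not just the predecessor that changed, which is the unnecessary reprocessing asked about above.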
<dschuermann> maybe I can change it to idom on second iterations
<alyssa> NIR's algorithm seems pretty reasonable
neonking has quit [Remote host closed the connection]
<dschuermann> I think with a second index, you can entirely remove bitset or set. just keep track of what has yet to be visited
<karolherbst> jekstrand: mhh "Pass 2165 Fails 11 Crashes 0 Timeouts 0"
<karolherbst> jekstrand: contractions contractions_float_4 to 7 are real iris bugs btw
<alyssa> Wooo!
<karolherbst> regressions from switching to FULL_PROFILE
<karolherbst> "0) Error for float kernel5: -(-0x1.834b5ap-7 * -0x1.89133ep-70 + -0x1.2955e4p-76) = *-0x0p+0 vs. 0x0p+0" :(
<karolherbst> jekstrand: not sure why you got more fails though
<karolherbst> :D
MajorBiscuit has quit [Ping timeout: 480 seconds]
* karolherbst updated the tracker
<karolherbst> soo 4 fails are fp32 precision things
<karolherbst> 5 are clamping
<karolherbst> 1 is an llvm bug
<karolherbst> 1 is potentially a CTS bug
<jekstrand> karolherbst: I may have a different opencl-c-base.h or something. A bunch of my fails look like they're macro issues.
<karolherbst> ohh compiler fails?
<jekstrand> karolherbst: For contractions, yeah, I need to look at that one.
<karolherbst> jekstrand: you don't have a patched spirv-link, correct?
<jekstrand> karolherbst: Likely, we're not obeying nir_alu_instr::exact
<jekstrand> karolherbst: I don't
<karolherbst> this one you need
<karolherbst> fixes all those compiler linking issues
<jekstrand> cool
<jekstrand> karolherbst: Hrm... We're not fusing mul+add. :-/
<karolherbst> mhh? is that a problem?
<jekstrand> karolherbst: I'm pretty sure that test is making sure you don't
<karolherbst> btw.. I think we still use libclc's fma emulation
<jekstrand> And we don't
<jekstrand> so I'm confused
<jekstrand> Ugh... it's -0 vs. +0
<karolherbst> yeah...
<karolherbst> the most annoying kind of fails
<jekstrand> I bet this is idr's negative re-distribution
<karolherbst> like who cares, but then: CL is like: yeah I do
<karolherbst> potentially
<jekstrand> Does it pass on llvmpipe?
<karolherbst> it fails even worse
<karolherbst> "Error for float kernel5: -(-0x1.56ad4ep-46 * 0x1.0efd88p-101 + 0x0p+0) = *-0x0p+0 vs. 0x1.8p-147"
<jekstrand> Bingo
maxzor has joined #dri-devel
<karolherbst> wow.. the CTS just needs twice as much time when running with -j2 instead of -j24 :D
<karolherbst> guess we are not CPU bottlenecked at all
<karolherbst> -j4 should be a good enough workaround for that kernel bug then
<karolherbst> ohh.. I should drop -w and see how long that takes with kernel caching
<karolherbst> chances are it's fast
slattann has quit []
<karolherbst> let's go then
heat has joined #dri-devel
<karolherbst> ahh yeah.. conversions are CPU bottlenecked because the CTS calculates all the stuff as well
<karolherbst> annoying
<jekstrand> The next question is why is multiply distribution wrong. What miserable little bit about -0 am I missing. :'(
<jekstrand> I know fadd(x, 0) != fadd(x, -0) but I don't remember why.
nchery is now known as Guest2307
nchery has joined #dri-devel
Duke`` has joined #dri-devel
<alyssa> jekstrand: x = 0
nchery has quit []
<alyssa> er
<alyssa> x = -0
<karolherbst> jekstrand: Inf + 0.0 != Inf -0.0, no?
nchery has joined #dri-devel
<alyssa> -0 + 0 = +0
<karolherbst> ehh wait
<alyssa> -0 + -0 = -0
<karolherbst> yeah, that
<karolherbst> Inf was a problem elsewhere
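The signed-zero rules quoted above can be checked directly with IEEE 754 doubles in round-to-nearest mode. The sketch below also shows, purely as an illustration and not as the confirmed cause of the CTS failure, that distributing a negation over an add can flip the sign of a zero result.

```python
import math


def sign(v):
    """Return 1.0 or -1.0, distinguishing +0.0 from -0.0."""
    return math.copysign(1.0, v)


neg_zero = -0.0
pos_zero = 0.0

a = neg_zero + pos_zero  # -0 + +0
b = neg_zero + neg_zero  # -0 + -0

# Negation redistribution: -(x + y) versus (-x) + (-y)
x, y = 0.0, -0.0
lhs = -(x + y)
rhs = (-x) + (-y)

print(sign(a), sign(b), sign(lhs), sign(rhs))
```

So fadd(x, -0) returns x unchanged for every x, while fadd(x, +0) turns x = -0 into +0, which is why the two are not interchangeable when signed zeros must be preserved.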
<alyssa> whyyy did i read the ieee 754 spec
<karolherbst> alyssa: why did I read the CL spec :p
<jekstrand> alyssa: Because it's useful, sadly.
<karolherbst> it's annoying that most of the ieee stuff even has a good reason for being like that
<karolherbst> jekstrand: we might need to revisit adding a "CL vs GL precision rules" flag or something :( it's just annoying that CL allows you to optimize according to weaker precision
<karolherbst> well.. if you specify the compiler flag
<karolherbst> there is also like "-cl-no-signed-zeros" :(
elongbug__ has quit [Read error: Connection reset by peer]
<karolherbst> so you can even decide what opts to turn on
elongbug__ has joined #dri-devel
Guest2307 has quit [Ping timeout: 480 seconds]
i-garrison has quit [Remote host closed the connection]
i-garrison has joined #dri-devel
<cwabbott> karolherbst: you know we have the same thing in vulkan, right?
<karolherbst> I don't
<alyssa> float_controls
<karolherbst> ahh
<alyssa> nir_is_float_control_signed_zero_inf_nan_preserve, etc
pallavim_ has joined #dri-devel
<cwabbott> you should be trying to map OpenCL semantics onto the NIR version of that, and maybe expanding it if there's something missing
<karolherbst> it's not covering everything though
<cwabbott> what's it missing?
<cwabbott> I believe it does really cover everything, it allows for ieee-compliant behavior
<cwabbott> (except for exceptions, which... I hope we never have to support)
<karolherbst> allow fmul+fadd to fmad/ffma and splitting no-signed-zero and finite-math
<jekstrand> I thought for sure I saved off a copy of IEEE 754 before I left Intel...
<karolherbst> looks like float_controls either disables -0.0 and NaN/Inf, or enables both, no?
<cwabbott> yes, it's not split out
pallavim has quit [Ping timeout: 480 seconds]
<cwabbott> but you can still fuse even if NaN/Inf and signed zero are preserved
<karolherbst> yeah
<cwabbott> that wasn't spelled out, but it will be soon hopefully
<karolherbst> anyway, it would need a bit of work, but seems like we already have some of that stuff
<cwabbott> you can disallow it by setting NoContraction (which maps to nir_alu_instr::exact)
<karolherbst> yeah... clang might actually do something like that already
<karolherbst> I know that there is something in the spirv about it
<cwabbott> OpenCL has a whole different thing
<karolherbst> would need to take another look at some point
<cwabbott> which was copied from LLVM
<cwabbott> and it's precise-by-default
<karolherbst> doesn't matter if we only operate on spirvs
<cwabbott> nope, it's a whole different thing *in SPIR-V*
<karolherbst> ahh
<karolherbst> could be
<cwabbott> there's a different spir-v instruction
<cwabbott> and different default
<cwabbott> vtn has to behave differently if ingesting CL SPIR-V and map it to the appropriate NIR thing
<karolherbst> could be :)
<cwabbott> so sad that it wasn't unified at the time, but that's how it is
<karolherbst> yeah
<cwabbott> also the CL SPIR-V thing can be per-instruction, which we can't do
<karolherbst> not anymore
<karolherbst> it was removed from the CL spec
<cwabbott> ?
<karolherbst> I think CL 1.0 supports it
<karolherbst> but not 1.1
<karolherbst> they didn't bother putting it in the spec that they removed it afaik
<alyssa> round modes got yeeted in CL1.1 right?
<karolherbst> per instruction round modes, yes
<karolherbst> you still have them at converts though
<cwabbott> it's in the SPIR-V spec for sure
<cwabbott> in fact it's only per-instruction, I think
<karolherbst> cwabbott: it's in the spec, but with CL1.1 you can't generate such spir-vs
<tango_> rounding modes control was part of an 1.0 extension
<karolherbst> it's 1.0 only
<cwabbott> wow, that's confusing
<tango_> and it was a 1.0-only extension
<karolherbst> tango_: I thought it was actually core
<cwabbott> karolherbst: but how do you even ask for no-nan/no-inf globally?
<tango_> karolherbst: cl_khr_select_fprounding_mode
<karolherbst> yeah
<cwabbott> I don't see where it allows you to decorate anything except an individual floating-point instruction
<karolherbst> it's a pragma
<tango_> honestly I was rather annoyed by the deprecation of it. my rant: http://wok.oblomov.eu/tecnologia/gpgpu/opencl-rounding-modes/
<tango_> karolherbst: it's an extension. not all devices are required to support it
<karolherbst> ahh you were that :D
<tango_> you need to enable the extension first
<tango_> (which requires device support)
<tango_> yeah, I'm that guy 8-D
<karolherbst> cwabbott: also intel has a CL extension for some of that stuff
<karolherbst> per instruction I think
<karolherbst> but this old thing is just gone
<karolherbst> basically, and I think nobody uses that
tjmercier has quit [Quit: Page closed]
<cwabbott> karolherbst: yeah, looks like it adds some of the newer llvm flags
<cwabbott> the CL fast-math stuff is copied from llvm
<cwabbott> vulkan is from-scratch
<cwabbott> well, afaict from-scratch
<karolherbst> there is something the translator sets for global thing..
<cwabbott> hence the differences like vulkan being per-shader and CL being per instruction and having separate inf/nan and signed-zero flags
<karolherbst> ContractionOff
<karolherbst> cwabbott: CL is practically per kernel though
<karolherbst> \o/
ybogdano has joined #dri-devel
<tango_> FP contraction should be a different thing from inf/nan/signed-zero handling though, isn't it?
<tango_> (and from rounding modes)
<karolherbst> yeah.. should be
<karolherbst> I need to take a look at all the details here at some point and do the right thing (tm)
<jekstrand> Yes, but those tests trick us with redistribution rules.
<alyssa> jekstrand: the other question with u_worklist is that it's designed for IRs that store blocks as a linked list
<karolherbst> oh wow.. now I crashed my machine with -j4
<alyssa> for backends that don't do much CF manipulation, it's more efficient to store as an array, in which case some of the complexity of nir_worklist wouldn't be needed
<alyssa> (Namely, storing pointers to the blocks, as opposed to just a queue of integer indices)
<alyssa> (and in that case there's no need for any macros/templating/etc)
elongbug_ has joined #dri-devel
<karolherbst> jekstrand: I just hope that this clamping bug isn't like terribly annoying to fix
<alyssa> OTOH, there's not much overhead from doing it nir_worklist style even if you have an array of blocks
<karolherbst> like "you have to calculate coords in the kernel" kind of annoying
shankaru has joined #dri-devel
<karolherbst> "Pass 2169 Fails 7 Crashes 0 Timeouts 0: 100%" :)
<karolherbst> once the clamping stuff is fixed I will do a CTS run for real
<karolherbst> I can workaround the other issues locally
elongbug__ has quit [Ping timeout: 480 seconds]
* alyssa shuffles the data structures harder, I think this is good now
<karolherbst> that's something I found inside Intel's stack
<karolherbst> might be a good hint on what might go wrong :)
<karolherbst> like maybe the filtering filter is always low for mesa
<karolherbst> but besides that there doesn't seem to be anything they are doing... unless I missed it
<jekstrand> karolherbst: Pretty sure iris sets to always high. :-/
<karolherbst> ahh :(
<karolherbst> maybe we have to set low in some cases
<karolherbst> :D
<karolherbst> seems like a debug var though
<karolherbst> :(
<jekstrand> We set it to FULL. I'll play with others.
tzimmermann has quit [Quit: Leaving]
<jekstrand> karolherbst: Yeah, all modes there fail
<karolherbst> :(
gouchi has joined #dri-devel
<karolherbst> jekstrand: is anything of that implemented inside shaders? like coords adjustments? Or is that all the hw?
<jekstrand> Should all be HW
<karolherbst> mhh
<jekstrand> OpenCL meeting is going on right now. I'm hoping to hear from bashbaugh after it's over.
<karolherbst> cool
aravind has quit [Ping timeout: 480 seconds]
<cwabbott> karolherbst: looks like contraction is per-kernel and zero/inf/nan is per-instruction
<karolherbst> yeah.. possibly
<cwabbott> except the intel extension allows contraction per-instruction, I think
<cwabbott> that's annoyingly different to how nir works (which is derived from vulkan float_controls)
<karolherbst> it's a bit hard to care about per instruction zero/inf/nan if nothing uses it though :) Although I would be interested in what that looks like in the spirv tbh
<karolherbst> like if you do the global flag
<cwabbott> right, sounds like you still need to conservatively derive the global flag from the per-instruction flag
<cwabbott> or sth like that
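One way to read "conservatively derive the global flag" is sketched below. Everything here is hypothetical: the `FloatInstr` record and its field names are invented for the example and do not correspond to real vtn or NIR structures. The assumption is a single combined per-shader "preserve signed zero/inf/nan" flag that must stay set unless every floating-point instruction waives both guarantees.

```python
from dataclasses import dataclass


@dataclass
class FloatInstr:
    op: str
    no_signed_zeros: bool  # instruction may ignore the sign of zero
    no_nans_infs: bool     # instruction may assume finite inputs/results


def must_preserve_signed_zero_inf_nan(instrs):
    """Per-shader flag: relax only if every instruction relaxes both."""
    return not all(i.no_signed_zeros and i.no_nans_infs for i in instrs)


all_fast = [
    FloatInstr("fadd", True, True),
    FloatInstr("fmul", True, True),
]
mixed = [
    FloatInstr("fadd", True, True),
    FloatInstr("fmul", True, False),  # still cares about inf/nan
]

print(must_preserve_signed_zero_inf_nan(all_fast))
print(must_preserve_signed_zero_inf_nan(mixed))
```

The cost of this choice is that one strict instruction forces strict handling for the whole kernel, which is the price of mapping per-instruction flags onto a per-shader control.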
dllud has quit [Ping timeout: 480 seconds]
<karolherbst> yeah.. dunno
<alyssa> me: Now that I have u_worklist in bifrost, I can make this data flow pass so much simpler
<alyssa> also me: Wait this pass doesn't need a worklist at all, just a trivial recursion
<alyssa> Good work Alyssa from 2020
dllud has joined #dri-devel
stuartsummers has quit []
shankaru has quit [Quit: Leaving.]
MajorBiscuit has joined #dri-devel
mclasen has quit [Remote host closed the connection]
dllud has quit [Ping timeout: 480 seconds]
mi6x3m has joined #dri-devel
<mi6x3m> hey friends, can anyone tell me where libGL is linked against libgallium-dri?
<mi6x3m> or does it happen during runtime?
<anholt> dri drivers are dlopened by GL at runtime.
dllud has joined #dri-devel
<mi6x3m> seems to be different for gallium because it statically links all drivers
<mi6x3m> or does it just delegate?
<anholt> what I said is true, and all the gallium drivers are linked together into a single dri driver to save disk space.
<mi6x3m> ok, so it all goes through loader.c as the old drivers?
<jekstrand> karolherbst: That's what I did for panfrost and, given that iris uses memcpy for buffer_subdata, I don't figure it's worse. So maybe all that blorp code I wrote isn't really needed.
<jekstrand> Still need to figure out clear_texture on panfrost but that probably does need to happen on the GPU.
Kayden has quit [Quit: reboot]
eletrotupi has quit [Remote host closed the connection]
clever has quit [Ping timeout: 480 seconds]
<mi6x3m> anholt, I see it now, libgallium_dri is linked to lib<driver>_dri.so
<mi6x3m> in install_megadrivers.py
<mi6x3m> so when you load lib<driver>_dri.so you always load libgallium_dri and you get the respective config
mbrost has joined #dri-devel
eletrotupi has joined #dri-devel
<karolherbst> jekstrand: cool
LexSfX has quit [Ping timeout: 480 seconds]
LexSfX has joined #dri-devel
mclasen has joined #dri-devel
Kayden has joined #dri-devel
<anholt> whee, looks like my MR is going to time out on windows again.
shankaru has joined #dri-devel
dllud has quit [Read error: Connection reset by peer]
ybogdano has quit [Ping timeout: 480 seconds]
clever has joined #dri-devel
<alyssa> wee :(
<karolherbst> jekstrand: ohh.. I found something interesting in the compute runtime..
dllud has joined #dri-devel
<karolherbst> not quite sure what it does yet
<jekstrand> karolherbst: uh oh... I think you may have found it
<karolherbst> yeah
<jekstrand> Let me dig and see if I can figure out what that does
<karolherbst> there are too many abstraction layers in the compute runtime :(
<jekstrand> yeah
<jekstrand> Welcome to Intel driver code
* jekstrand pulls IGC
<karolherbst> apparently you can dump those tokens
<karolherbst> coordinateSnapWaRequired is referenced in shared/source/device_binary_format/patchtokens_dumper.cpp
heat_ has joined #dri-devel
<karolherbst> what the hell is DATA_PARAMETER_SAMPLER_COORDINATE_SNAP_WA_REQUIRED
<karolherbst> feels like there is either macro magic or some other lib involved
heat has quit [Read error: No route to host]
ahajda has quit [Quit: Going offline, see ya! (www.adiirc.com)]
<karolherbst> uhh.. maybe I just dug too deep now
<karolherbst> or it comes out of llvm...
<karolherbst> no clue..
eletrotupi has quit [Quit: Bye]
* karolherbst clones igc
<karolherbst> bingo
heat_ has quit [Ping timeout: 480 seconds]
<karolherbst> jekstrand: sooo.. they add a new kernel arg
<karolherbst> KernelArg::ArgType::IMPLICIT_SAMPLER_SNAP_WA
<karolherbst> but what runtime value do they bind to it...
<karolherbst> IGC/Compiler/Optimizer/OpenCLPasses/ImageFuncs/ImageFuncResolution.cpp
<karolherbst> IGC/Compiler/Optimizer/OpenCLPasses/ImageFuncs/ImageFuncsAnalysis.cpp
<karolherbst> both reference SAMPLER_SNAP_WA
mi6x3m has quit [Quit: Leaving]
<karolherbst> __builtin_IB_get_snap_wa_reqd
<jekstrand> IGC/Compiler/tests/ImageFuncResolution/get_image_snap_wa_required.ll
<karolherbst> :)
<karolherbst> we arrived at the same thing at the same time :)
<jekstrand> hrm... that's a test. :-/
<karolherbst> jekstrand: IGC/BiFModule/Implementation/IGCBiF_Intrinsics.cl
<karolherbst> ehh wait
<karolherbst> there it's just declared
ybogdano has joined #dri-devel
<karolherbst> jekstrand: next step: their llvm fork
<jekstrand> karolherbst: There is no llvm fork
<karolherbst> mhh
<jekstrand> Or at least you shouldn't need one
<karolherbst> ohh?
<alyssa> this is horrifying
<alyssa> whatever you're talking about
<karolherbst> what's that then? https://github.com/intel/llvm
<alyssa> horrifying
<karolherbst> maybe just random stuff then
<karolherbst> jekstrand: ahh now
<karolherbst> *no
<karolherbst> that intrinsic is a red herring
<jekstrand> IGC/Compiler/Optimizer/OpenCLPasses/ImageFuncs/ImageFuncResolution.cpp
<karolherbst> it gets resolved to imageFunc = &m_argMap[ImplicitArg::SAMPLER_SNAP_WA];
<karolherbst> so it just loads the arg
<jekstrand> IGC/BiFModule/Implementation/images.cl rather
<karolherbst> yeah
<karolherbst> at least tells us how it's used
<jekstrand> float4 SPIRV_OVERLOADABLE SPIRV_BUILTIN(ImageSampleExplicitLod, _v4f32_img2d_ro_v2f32_i32_f32, _Rfloat4)(__spirv_SampledImage_2D SampledImage, float2 Coordinate, int ImageOperands, float Lod)
<jekstrand> {
<jekstrand> int image_id = (int)__builtin_IB_get_image(SampledImage);
<jekstrand> int sampler_id = (int)__builtin_IB_get_sampler(SampledImage);
<jekstrand> float2 snappedCoords = Coordinate;
<jekstrand> if (__builtin_IB_get_snap_wa_reqd(sampler_id) != 0)
<jekstrand> {
<jekstrand> snappedCoords.x = (Coordinate.x < 0) ? -1.0f : Coordinate.x;
<jekstrand> snappedCoords.y = (Coordinate.y < 0) ? -1.0f : Coordinate.y;
<jekstrand> }
<jekstrand> return __builtin_IB_OCL_2d_sample_l(image_id, sampler_id, snappedCoords, Lod);
<jekstrand> }
<Kayden> Yeah, I've built IGC with upstream LLVM...11. It's been a while since I've looked at it.
<Kayden> IIRC they have a patched version that tunes some thresholds, e.g. in the copy propagation passes, away from CPUs with limited registers and toward GPUs with more, and things like that, but the patches aren't necessary to run
<airlied> so all vals < 0 get rounded to -1.0?
<jekstrand> airlied: Apparently
<jekstrand> I can wire that up in iris but eewww
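For readers following along, the snapping in the pasted builtin can be restated as a small plain-C sketch. The `struct coord2` type and the `snap_coords` name are hypothetical stand-ins for OpenCL C's `float2` and the IGC builtin; this is only an illustration of the behavior quoted above, not how iris or IGC wires it up.

```c
#include <stdbool.h>

/* Hypothetical 2D coordinate type, standing in for OpenCL C's float2. */
struct coord2 {
    float x;
    float y;
};

/* Restates the snapping from the pasted images.cl snippet: when the
 * workaround is required, every negative coordinate becomes exactly -1.0f;
 * non-negative coordinates pass through untouched. */
static struct coord2 snap_coords(struct coord2 c, bool snap_wa_required)
{
    if (snap_wa_required) {
        c.x = (c.x < 0.0f) ? -1.0f : c.x;
        c.y = (c.y < 0.0f) ? -1.0f : c.y;
    }
    return c;
}
```

With the flag set, a coordinate such as (-0.25, 3.5) becomes (-1.0, 3.5); with the flag clear it is returned unchanged.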
<karolherbst> mhhh
<jekstrand> I think iris is about to grow a new magic system value. Kayden, I'm sorry.
<karolherbst> LOL
<karolherbst> jekstrand: btw.. for CL runtimes we declare the COMPUTE_ONLY screen flag or whatever it's called
<karolherbst> not sure if that's useful here or if we have to change the sampler API
<karolherbst> probably don't want that for GL
<karolherbst> or vk
<jekstrand> Yeah, ver much no.
<jekstrand> *very
<jekstrand> But, also, kind-of meh. It's only for rectangle textures.
<jekstrand> hrm...
<jekstrand> Yeah, I think we only want this for CL
iive has joined #dri-devel
<jekstrand> I may be able to key it off SHADER_STAGE_KERNEL
<karolherbst> probably
<karolherbst> yeah.. sounds like the better idea
<Kayden> I'm not finding where this is actually applied
<Kayden> yes but what platforms
<jekstrand> Kayden: All, AFAICT
<jekstrand> Why they haven't fixed the HW, I have no idea. Doesn't seem like it'd be many gates
<Kayden> but only for OpenCL?
<jekstrand> yup
<Kayden> how is OpenCL any different?
<jekstrand> It has very nasty tests
<karolherbst> jekstrand: maybe nobody who pays money uses those clamp modes :p
eletrotupi has joined #dri-devel
mbrost has quit [Read error: Connection reset by peer]
<karolherbst> and also CL_FILTER_NEAREST _and_ CL_ADDRESS_CLAMP
<jekstrand> yup
mbrost has joined #dri-devel
<karolherbst> I guess nobody cares
<jekstrand> Don't want that for REPEAT
<airlied> is opencl clamp the crappy clamp to border?
<jekstrand> airlied: No
<airlied> oh no there is no borders
<jekstrand> airlied: CL doesn't have border at all
<airlied> so it's clamp to edge
<karolherbst> no
<karolherbst> it's clamp to border
<karolherbst> SAMPLER_ADDRESSING_MODE_CLAMP == CL_ADDRESS_CLAMP
<karolherbst> just border is all 0
<Kayden> and you've translated CL_ADDRESS_CLAMP to CLAMP_TO_BORDER?
<karolherbst> ehh wait..
<karolherbst> I looked at the wrong code
<karolherbst> but yeah
<karolherbst> CL_ADDRESS_CLAMP == PIPE_TEX_WRAP_CLAMP_TO_BORDER
<karolherbst> CL_ADDRESS_CLAMP_TO_EDGE is PIPE_TEX_WRAP_CLAMP_TO_EDGE,
<Kayden> Yep. "CL_ADDRESS_CLAMP - Out-of-range image coordinates are assigned a border color value."
<karolherbst> there is just no border color value :)
<karolherbst> like what's a border color?
<karolherbst> CL never says anything about that
maxzor has quit [Ping timeout: 480 seconds]
<Kayden> it really doesn't
<jenatali> Pretty sure it does say it's zero somewhere
<karolherbst> maybe
<karolherbst> "images have a 0 border*" the *: some ext might allow you to change that
<karolherbst> :P
<karolherbst> jenatali: but the CL spec contains the word "border" exactly once
<jenatali> Huh. Yeah nevermind. Guess I'd just assumed it said zero when I hooked it up
<karolherbst> maybe it's phrased like "OOB image accesses return the color 0,0,0,0" or something
<karolherbst> ahh
<airlied> like it seems like GL should be able to hit that case as well
shankaru has quit []
<karolherbst> the OpenCL C spec has more to say
<karolherbst> airlied: GL probably doesn't care
<karolherbst> Kayden, jenatali: "CLK_ADDRESS_CLAMP - out-of-range image coordinates will return a border color [66]" 66: "This is similar to the GL_ADDRESS_CLAMP_TO_BORDER addressing mode." :D
<karolherbst> the heck...
<jenatali> Yeah, still doesn't say what that border is though
<karolherbst> "the border color is (0.0f, 0.0f, 0.0f, 0.0f)."?
MajorBiscuit has quit [Ping timeout: 480 seconds]
<jekstrand> karolherbst: Can CL generate anything besides tex and txl?
<karolherbst> jekstrand: txf for samplerless
<karolherbst> or is that tex?
<jekstrand> that's txf
<karolherbst> k
<jekstrand> It doesn't have bias or tg4 or anything, right?
<karolherbst> uhm... I don't think so
<karolherbst> there are some extensions though
<karolherbst> but not sure if they add that kind of stuff
<jekstrand> of course there are....
<karolherbst> the ext I know of add things like msaa and depth and the likes
<karolherbst> maybe there is one for tg4 and bias as well
<karolherbst> maybe gl_Sharing defines random stuff... dunno
<karolherbst> we can deal with it when we find something :)
<jekstrand> yup
<karolherbst> my hope is that we can implement gl sharing as a basic noop :D
<karolherbst> will be fun
<jenatali> Oh, it does say the border color
<jenatali> jekstrand: No, I don't think there's a tg4
<jenatali> Pretty sure it's just txf, tex, txl, txs
MajorBiscuit has joined #dri-devel
ngcortes has joined #dri-devel
dllud has quit [Ping timeout: 480 seconds]
apinheiro has quit [Ping timeout: 480 seconds]
<jekstrand> karolherbst: Ugh... Clamp is per-coordinate-dimension, isn't it?
<karolherbst> jekstrand: sure it is
<jekstrand> Oh, well. I guess we're passing 3 32-bit bitfields, then.
<karolherbst> but it doesn't matter for border
<jekstrand> Yeah, idk about that
<karolherbst> as you always get the border color, no?
<karolherbst> per component should only matter for edge
<karolherbst> or well.. repeat
<jekstrand> Yeah, we only care about this for edge
<karolherbst> but shouldn't that be simply a bcsel on the coord vecs? or what do you mean by passing 3 bitfields?
<karolherbst> jekstrand: ohh wait
<karolherbst> I think I misunderstood your question
<karolherbst> CL has the same clamp method on all coords
<karolherbst> GL/gallium doesn't
<alyssa> jekstrand: "guess we're passing 3 32-bit bitfields," hello PIPE_CAP_GL_CLAMP speaking how may I help you
<Kayden> ah, that makes it easier then
<karolherbst> yep, very much so
<jekstrand> karolherbst: It does? Awesome!
<karolherbst> even min/mag is the same
<Kayden> also...32-bit bitfields? can't you have 128 images?
<karolherbst> Kayden: samplers
<karolherbst> you need 16 for full and 8 for embedded
<Kayden> oh, wow, okay
<karolherbst> 128 read images, 64 write images though
<karolherbst> for full
<karolherbst> 8/8 for embedded
<karolherbst> not sure if there is a test testing that though.. :D
<karolherbst> I mean ... per type yes
<karolherbst> but not 128 read + 64 write + 16 samplers
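Since CL uses one addressing mode for all coordinates and a full profile needs only 16 samplers, a single 32-bit bitfield would be enough to tell a kernel which samplers need the snap workaround. The sketch below assumes the workaround applies to samplers using nearest filtering together with CLAMP addressing, which is an inference from this discussion rather than something confirmed here. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 16 samplers is the full-profile minimum mentioned above. */
#define SKETCH_MAX_SAMPLERS 16

/* Hypothetical per-sampler state; not a gallium or iris structure. */
struct sketch_sampler {
    bool nearest_filter;   /* sampler uses nearest filtering */
    bool clamp_addressing; /* sampler uses CLAMP (to border) addressing */
};

/* Builds one bit per sampler. ASSUMPTION: the workaround is wanted only
 * for nearest + clamp samplers. */
static uint32_t snap_wa_mask(const struct sketch_sampler *samplers,
                             unsigned count)
{
    assert(count <= SKETCH_MAX_SAMPLERS);

    uint32_t mask = 0;
    for (unsigned i = 0; i < count; i++) {
        if (samplers[i].nearest_filter && samplers[i].clamp_addressing)
            mask |= UINT32_C(1) << i;
    }
    return mask;
}
```

A shader could then test bit `sampler_id` of that value instead of receiving three per-dimension bitfields.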
dllud has joined #dri-devel
<karolherbst> jekstrand: btw.. if drivers have to add workarounds for wrapping modes, we might have to add a configurable default mode
<karolherbst> atm I default to PIPE_TEX_WRAP_CLAMP_TO_EDGE, but other hw might have to do workarounds there
<Kayden> spec says CLAMP is the default though
<karolherbst> it does? I thought it's impl defined
<karolherbst> "CL_ADDRESS_NONE - Behavior is undefined for out-of-range image coordinates."
<Kayden> "The default is CL_ADDRESS_CLAMP."
<karolherbst> weird..
<karolherbst> why does the spec disagree with itself
<karolherbst> heh
<karolherbst> ohhh
<karolherbst> Kayden: it matters for samplerless I think...
<karolherbst> but you have to specify one value
<karolherbst> when creating a sampler
<karolherbst> ehh.. I guess with properties that's possible
<karolherbst> okay.. so on the API it's _CLAMP, but if you specify _NONE we can choose whatever
<Kayden> yep
mbrost has quit [Ping timeout: 480 seconds]
<karolherbst> yeah, I meant what we should do with _NONE :)
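The mapping discussed here can be summarized in a short C sketch. The enums are local stand-ins with arbitrary values, used only so the example is self-contained; real code would use the definitions from the CL headers and gallium's p_defines.h. Only the three modes mentioned in this conversation are covered, and the choice for NONE reflects the current default karolherbst describes, which the spec leaves open.

```c
/* Stand-in enums for illustration only; the numeric values are arbitrary
 * and do NOT match CL/cl.h or gallium. */
enum sketch_cl_address {
    SKETCH_CL_ADDRESS_NONE,
    SKETCH_CL_ADDRESS_CLAMP_TO_EDGE,
    SKETCH_CL_ADDRESS_CLAMP,
};

enum sketch_pipe_wrap {
    SKETCH_PIPE_WRAP_CLAMP_TO_EDGE,
    SKETCH_PIPE_WRAP_CLAMP_TO_BORDER,
};

/* Translates a CL addressing mode to a gallium-style wrap mode as described
 * in the discussion: CLAMP means clamp-to-border, CLAMP_TO_EDGE maps
 * directly, and NONE (undefined out-of-range behavior) currently falls back
 * to clamp-to-edge. */
static enum sketch_pipe_wrap
sketch_translate_address(enum sketch_cl_address mode)
{
    switch (mode) {
    case SKETCH_CL_ADDRESS_CLAMP:
        return SKETCH_PIPE_WRAP_CLAMP_TO_BORDER;
    case SKETCH_CL_ADDRESS_CLAMP_TO_EDGE:
        return SKETCH_PIPE_WRAP_CLAMP_TO_EDGE;
    case SKETCH_CL_ADDRESS_NONE:
    default:
        return SKETCH_PIPE_WRAP_CLAMP_TO_EDGE;
    }
}
```

If some hardware needs workarounds for a particular wrap mode, the NONE case is the one place where a driver-configurable default could be swapped in.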
mbrost has joined #dri-devel
dllud has quit [Ping timeout: 480 seconds]
<alyssa> Oh right I need to debug !15991
<alyssa> i wrote !16046 while waiting for it to compile, oops :p
neonking has joined #dri-devel
<karolherbst> *sigh* "error: options cl_khr_fp64 and __opencl_c_fp64 are set to different values"
<jekstrand> karolherbst: Ok, I've got the workaround typed up and it's still not passing. :-/
<jekstrand> More digging, I guess.
<karolherbst> :(
<karolherbst> does it change the outcome a little though?
<jekstrand> doesn't seem to
<jekstrand> But I may still be missing something
<karolherbst> potentially
MajorBiscuit has quit [Ping timeout: 480 seconds]
pcercuei has quit [Quit: brb]
<karolherbst> jekstrand: what if you always apply the workaround and only run NORMALIZED CLAMP tests?
<jekstrand> karolherbst: Tried that. :)
<karolherbst> ehh nearest
<karolherbst> mhh
<karolherbst> annoying
mvlad has quit [Remote host closed the connection]
<jekstrand> I'm going to try applying the workaround in the CTS test itself
<karolherbst> let's see what's actually happening on the tests
MajorBiscuit has joined #dri-devel
<karolherbst> jekstrand: yeah.. it makes no sense if that workaround should fix the fails we are seeing
<karolherbst> "Sample 0: coord {0.012158(0x1.8e6528p-7)} did not validate!"
<karolherbst> which.. is above 0
<jekstrand> wait... why is this resinfo in here?
<karolherbst> "test_image_streams read 1D CL_R CL_SIGNED_INT8 NORMALIZED CL_FILTER_NEAREST CL_ADDRESS_CLAMP"
<karolherbst> just look at this one
<karolherbst> this feels more like a rounding issue
<jekstrand> right, get_image_width()
<jekstrand> ugh
<jekstrand> Yeah, there's no clamping funniness there. :-(
<karolherbst> I am sure we need the workaround later though :P
pcercuei has joined #dri-devel
<jekstrand> probably
<karolherbst> there is no processing on the coords at all.. mhh
<jekstrand> idk why I didn't think to check 1D first... Drp
<karolherbst> jekstrand: there are more workarounds in that file
<karolherbst> that looks like... _fun_
<karolherbst> ohh wait.. that's for lod == 0.0
<karolherbst> not sure what the default lod is.. probably 1.0?
<jekstrand> 0
<karolherbst> ahh no.. 0
<karolherbst> yeah...
<karolherbst> explicit lod is behind cl_khr_mipmap_image
<karolherbst> yeah.. I guess you need to do this as well :)
<karolherbst> now that's annoying
<karolherbst> jekstrand: I think we should handle that in some kind of generic lowering
<karolherbst> ahh wait ... no
<karolherbst> I thought about something else
<jekstrand> karolherbst: Yeah, they're straight-up lowering normalized coordinates away in the shader. :facepalm:
<karolherbst> :)
<karolherbst> don't we have tex lowering for that?
<karolherbst> although I guess that's a bit too specific
<jekstrand> karolherbst: We have lowering going the other way. (-:
<karolherbst> also explains why it already fails at the first sample :)
<karolherbst> jekstrand: just mirror it then :P
<karolherbst> we have some ugly tex lowering as well.. I guess everybody has
<jekstrand> This is just truly rubbish, though.
<karolherbst> for images we have to do all of that inside the shader regardless
rasterman has quit [Quit: Gettin' stinky!]
<jekstrand> normalized coordinates are something everyone has in hardware since forever. Why can't Intel get it right?!?!
<jekstrand> *sigh*
<karolherbst> :D
<karolherbst> jekstrand: I don't think nv has it in hw either...
<karolherbst> ehh
<karolherbst> not for all targets
<karolherbst> seems like we have to normalize for cubes ourselves
<karolherbst> and for all images
<jekstrand> cubes are a bit of a different case
heat_ has joined #dri-devel
<alyssa> why would you need basic gfx functionality in a gpu??
apinheiro has joined #dri-devel
Duke`` has quit [Ping timeout: 480 seconds]
<Kayden> isn't that what adding an mgag200 chip is for?
* jekstrand is so very confused
jkrzyszt has quit [Ping timeout: 480 seconds]
<jekstrand> This all looks so incredibly reasonable and yet I look at the cl files and it looks like they're doing piles of stuff in shaders.
dllud has joined #dri-devel
Haaninjo has quit [Quit: Ex-Chat]
<karolherbst> mhhh.. so what am I doing wrong for get_kernel_arg_info
<karolherbst> ehh they always test read_write...
<karolherbst> but why...
dllud has quit [Ping timeout: 480 seconds]
<karolherbst> jenatali: can you remember issues with api get_kernel_arg_info?
maxzor has joined #dri-devel
<jenatali> Uh... I do remember it being tricky to get right, but not impossible
<karolherbst> it fails for me for read_write image tests
<jenatali> What's the failure?
<karolherbst> which I don't support, but they try to compile that stuff regardless
<jenatali> Oh that sounds familiar
<karolherbst> yeah...
<karolherbst> I guess your solution was to just support read write images? :D
<karolherbst> or don't claim CL 3.0?
<jenatali> Oh, no I was thinking of something else: https://github.com/KhronosGroup/OpenCL-CTS/issues/872
<jenatali> But I wouldn't be surprised if it's the same thing
<karolherbst> ahh
<karolherbst> probably
<karolherbst> anyway, something where one can say "uhh.. bug in the CTS"
<karolherbst> should probably just fix it then
<karolherbst> compiler features_macro is just screwed inside llvm :(
<jenatali> But yeah, read-write images is something that's optional in D3D12. I can support it on some hardware but not universally
<karolherbst> I'd rather add support for that stuff later
<karolherbst> fixing the CTS is trivial here
* airlied should actually land that
<karolherbst> probably
<karolherbst> or just use clc :P
<airlied> ah yeah clc already gets that right I think
<karolherbst> it does
<karolherbst> so now how to fix test_compiler features_macro ...
<karolherbst> I fixed the header, but now it fails for fp64 :(
<alyssa> panfrost has a lot of code to allow configuring the stride of linear images explicitly in vulkan... I can't find where Vulkan allows this though..?
<karolherbst> "error: options cl_khr_fp64 and __opencl_c_fp64 are set to different values" :(
<karolherbst> airlied: is there an API to disable CL exts in clang?
ybogdano has quit [Ping timeout: 480 seconds]
<karolherbst> I don't even know what enables cl_khr_fp64
<airlied> karolherbst: I think that's part of the pain of -base.h stuff
<karolherbst> it's not
<karolherbst> well.. fp64 isn't
<karolherbst> for the other fails I fixed the header
<airlied> I've vague memories of tracing through that one, but no ideas of what it was now
<karolherbst> but something enables the fp64 ext and it's not the header
<karolherbst> and I checked with clinfo that I don't advertise the ext either
<karolherbst> odd
<karolherbst> it's not the kernel either
<alyssa> oh. VkImageDrmFormatModifierExplicitCreateInfoEXT.
<karolherbst> airlied: ahhh.. I think I got it worked out now
<karolherbst> airlied: c->getTargetOpts().OpenCLExtensionsAsWritten.push_back("-all"); :)
<daniels> alyssa: yep, gbm
<alyssa> daniels: how... delightful.
<daniels> alyssa: yes, but no kmsro
<alyssa> \ o /
<karolherbst> what
<karolherbst> the
<karolherbst> heck
<karolherbst> who designed an API like that
<airlied> karolherbst: people munging things into their compiler stacks without supervision :-P
<karolherbst> airlied: so if I pass -all as the first thing, and push +.. entries.. guess what happens?
<karolherbst> the fuck...
<karolherbst> with all it just writes into a different map...
<karolherbst> how does that make any sense
ybogdano has joined #dri-devel
<karolherbst> ehhh.. no... something else is weird
dllud has joined #dri-devel
<daniels> try -0.0
<karolherbst> shader caching happened :)
<karolherbst> but I still need to push two values
<karolherbst> cl_khr_3d_image_writes and __opencl_c_3d_image_writes
<karolherbst> "PASSED test." :)
<karolherbst> jekstrand: soo.. clamping is the only thing left now :D
gouchi has quit [Remote host closed the connection]
danvet has quit [Ping timeout: 480 seconds]
MajorBiscuit has quit [Quit: WeeChat 3.4]
<jekstrand> karolherbst: ugh
<jekstrand> karolherbst: I so wish the Intel CL driver code were readable.
<jekstrand> Maybe I need to figure out how to build it?
<jekstrand> That sounds hellish
<karolherbst> it is
<jekstrand> I'm going to write a pass to lower away normalized texture coordinates entirely. We'll see how that goes.
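As a rough illustration of what lowering normalized coordinates away means for a nearest-filtered 1D read: the normalized coordinate is scaled by the image width in the shader, and the texel index is the floor of the result. The function names below are hypothetical and this is a scalar CPU sketch of the arithmetic only, not the NIR pass being proposed.

```c
#include <assert.h>

/* Scales a normalized coordinate (nominally in [0, 1)) to texel space. */
static float sketch_unnormalize(float u, int size)
{
    assert(size > 0);
    return u * (float)size;
}

/* Floor without libm: truncate, then step down for negative fractions. */
static int sketch_floor_to_int(float x)
{
    int i = (int)x;
    return ((float)i > x) ? i - 1 : i;
}

/* Texel index a nearest filter would pick before any address-mode
 * clamping is applied. */
static int sketch_nearest_texel(float u, int size)
{
    return sketch_floor_to_int(sketch_unnormalize(u, size));
}
```

For example, a coordinate of 0.5 on an 8-texel-wide image lands on texel 4, and a slightly negative coordinate lands on texel -1, which is where the address mode (and the snap workaround) would take over.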
<jekstrand> ugh...
<jekstrand> I'm really going to need to distinguish between CL and GL if I want to do that. :-/
<jekstrand> Oh, this is all truly horrible
<karolherbst> jenatali: any thoughts of an interface like this? ^^
<karolherbst> *on
<jenatali> karolherbst: LGTM
<karolherbst> cool
<karolherbst> will extract all my libclc change then at some point this week
<karolherbst> that i915 bug is really annoying btw
<karolherbst> but fun that working on kernel caching made it even trigger in the first place
<karolherbst> ehh.. I need all those CL 1.0 and 1.1 exts as well :(
<karolherbst> but cool, we can be very explicit then
ahajda has joined #dri-devel
<karolherbst> jenatali: do you have any devices not supporting those CL 1.1 atomic exts or cl_khr_byte_addressable_store?
<jenatali> Don't think so
<jenatali> Well, we emulate byte addressable store, DXIL can't express it :(
<karolherbst> ahh
<jenatali> (Yet)
apinheiro has quit [Quit: Leaving]
<karolherbst> this "Fails 0" looks so nice :D
<karolherbst> jekstrand: why do you need to build the intel stack though?
<jekstrand> I'm not going to. I'm just going to implement all the pain
<karolherbst> :D
<karolherbst> okay
ybogdano has quit [Ping timeout: 480 seconds]
<airlied> karolherbst: now to add it to CI :-P
<karolherbst> :D
<daniels> karolherbst: !
mbrost has quit [Remote host closed the connection]
<karolherbst> well.. the image tests are still failing, but it does show "fails 0" for quite some time
mbrost has joined #dri-devel
<karolherbst> but if nothing broke only 5 fails remain
<karolherbst> and hopefully jekstrand will fix those :p
<karolherbst> airlied: we still need to fix LLVM :(
<karolherbst> and spirv-tools, but I do have a PR for that already: https://github.com/KhronosGroup/SPIRV-Tools/pull/4784
<karolherbst> guess I can cross that "[ ] write a conformant CL 3.0 implementation" on my bucket list
<alyssa> :D
<karolherbst> nice "Pass 2171 Fails 5 Crashes 0 Timeouts 0"
<karolherbst> the thing is though... I don't think it makes sense to run the real CTS if my machine will just crash midway
mbrost has quit [Ping timeout: 480 seconds]
fxkamd has quit []
ahajda has quit [Quit: Going offline, see ya! (www.adiirc.com)]
mbrost has joined #dri-devel
maxzor has quit [Ping timeout: 480 seconds]
mbrost has quit [Ping timeout: 480 seconds]
lygstate has joined #dri-devel
ybogdano has joined #dri-devel
jewins has quit [Ping timeout: 480 seconds]
mbrost has joined #dri-devel
bgs has quit [Remote host closed the connection]
bgs has joined #dri-devel
lygstate has quit [Write error: connection closed]
mbrost has quit [Ping timeout: 480 seconds]
pcercuei has quit [Quit: dodo]
<karolherbst> jekstrand: uhhh....
<karolherbst> I think I found the i915 bug
<karolherbst> or at least part of it
jewins has joined #dri-devel
<karolherbst> i915_vma_reopen is just racy
<jekstrand> yeah
<karolherbst> calling i915_vma_is_closed without taking the lock
<karolherbst> so it checks if it's closed
<karolherbst> and if it is, it takes the lock and does this remove_closed thing
<karolherbst> but it needs to take the lock before checking
<karolherbst> there might be more, but that one looks obviously wrong
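The pattern being described is a classic check-then-lock race. The sketch below shows the general shape in userspace C with a pthread mutex; `struct sketch_vma` and both functions are hypothetical illustrations, not the i915 code or the actual patch.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical object with a "closed" flag that is meant to be guarded
 * by the lock. */
struct sketch_vma {
    pthread_mutex_t lock;
    bool closed;
};

/* Racy shape: the flag is read before the lock is taken, so another
 * thread can change it between the check and the update. */
static bool sketch_reopen_racy(struct sketch_vma *vma)
{
    if (!vma->closed)
        return false;

    pthread_mutex_lock(&vma->lock);
    vma->closed = false;
    pthread_mutex_unlock(&vma->lock);
    return true;
}

/* Safer shape: take the lock first, then check and update under it. */
static bool sketch_reopen_locked(struct sketch_vma *vma)
{
    bool reopened = false;

    pthread_mutex_lock(&vma->lock);
    if (vma->closed) {
        vma->closed = false;
        reopened = true;
    }
    pthread_mutex_unlock(&vma->lock);
    return reopened;
}
```

In a single thread both variants behave the same; the difference only shows up when another thread flips the flag between the unlocked check and the locked update.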
<jekstrand> karolherbst: There's probably more
<karolherbst> I'll try that one change and see how that goes :D
<jekstrand> In general, I'd recommend against going down this rabbit-hole....
<karolherbst> I know
<karolherbst> but I just looked at it for 10 seconds and already found this one...
<karolherbst> if that's enough.. good
<karolherbst> if not.. I'll probably ignore it
<karolherbst> and if it makes the crash less likely, that already helps me anyway
icecream95 has joined #dri-devel
ramaling has quit [Remote host closed the connection]
ramaling has joined #dri-devel
morphis has quit [Ping timeout: 480 seconds]
morphis has joined #dri-devel
camus has quit [Remote host closed the connection]
camus has joined #dri-devel
<karolherbst> mhh, let me while true it, at least the first run didn't crash my machine yet
bgs_ has joined #dri-devel
bgs has quit [Ping timeout: 480 seconds]
<karolherbst> jekstrand: yeah soo.. I think I fixed it :)
<jekstrand> I don't doubt that you fixed something. :)
<karolherbst> if my CTS runs don't crash my system anymore, that's good enough for me :)
<jekstrand> :)
iive has quit []
<karolherbst> the fix is trivial though, so I doubt anybody would mind it
<karolherbst> and the code looks obviously incorrect
<karolherbst> I'll let it do a few more rounds, but it looks better at least
rkanwal has quit [Quit: rkanwal]
rkanwal has joined #dri-devel
<karolherbst> yeah.. so it's quite stable now, it already survived 6 rounds, where before it crashed in like 2 out of 3 runs
<karolherbst> so let's see what others say about the patch