ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
nchery is now known as Guest1875
nchery has joined #dri-devel
<karolherbst> wow rust even specifies true == 1
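For reference, Rust defines `bool` to integer casts as `false` → 0 and `true` → 1 on every target, and the standard library also implements `From<bool>` for the integer types. A minimal illustration (the function names are made up for this example):

```rust
/// Converts a bool to its integer value; Rust guarantees false -> 0, true -> 1.
pub fn bool_to_u32(b: bool) -> u32 {
    b as u32
}

/// `u8::from(bool)` is the lossless, cast-free spelling of the same conversion.
pub fn bool_to_u8(b: bool) -> u8 {
    u8::from(b)
}
```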
<airlied> jekstrand: functions won't start themselves :-P
* airlied started hacking it up in llvmpipe but it was too much for my brain at the time
<airlied> esp around passing implicit kernel args or context to each fn call
Guest1875 has quit [Ping timeout: 480 seconds]
<karolherbst> okay.. that CTS test is broken :)
<karolherbst> okay
khfeng has joined #dri-devel
<karolherbst> I think I will fix that image_format and image_order stuff next
<karolherbst> I just have no good idea on how to do it...
<karolherbst> I kind of hate what I've done for clover
<airlied> I also think in theory amd could do some of that on the hw side
<karolherbst> fun
<karolherbst> airlied: the thing is just, we have to map to those silly CL values
<karolherbst> something something was very ugly there
<airlied> yeah it's messy, and I'm fine with just lowering to const buffers
<airlied> since I doubt it's very used
<karolherbst> no
<karolherbst> ugly in a "llvm does magic" sense
<karolherbst> so we get magic + 0x000010d0 and stuff in the shader
<airlied> ah yeah clang magic
<karolherbst> I don't even know why they think it was a good idea?
<airlied> karolherbst: something with spir-v as well
<karolherbst> could be
<airlied> I think spir-v defines them at 0 base
<airlied> then has to add them to get CL
<karolherbst> I don't think so
<karolherbst> I had to add a isub to make it pass tests
<karolherbst> airlied: yeah.. so OpenCL C is 0 based
<karolherbst> i think?
<karolherbst> ohh wait no.. so something adds the add, and if I push the CL value in, I have to isub the base again..
<karolherbst> right
<karolherbst> that's how it was
<karolherbst> airlied: ahh yeah.. seems like you are right
<karolherbst> spir-v is indeed 0 based
<karolherbst> annoying
<karolherbst> oh well
<karolherbst> it's constant folded anyway
<karolherbst> I will sleep over it and come up with a good solution
mclasen has quit [Ping timeout: 480 seconds]
<airlied> karolherbst: I think the conversion code is in the translator
<airlied> SPIRVToOCLBase::visitCallSPIRVImageQueryBuiltIn
<airlied> OCLToSPIRVBase::visitCallGetImageChannel
co1umbarius has joined #dri-devel
columbarius has quit [Ping timeout: 480 seconds]
<karolherbst> yeah.. but it does seem to do the correct thing given the spir-v spec
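The offset discussed above, sketched under assumptions: SPIR-V's ImageChannelOrder and ImageChannelDataType enums start at 0, while OpenCL's cl_channel_order and cl_channel_type values start at CL_R (0x10B0) and CL_SNORM_INT8 (0x10D0) in CL/cl.h, and the members line up one-to-one. Converting is therefore a single add, or the isub karolherbst mentions when the CL value is pushed in directly. The helper names are hypothetical; this is not the translator's code.

```rust
// Values from the OpenCL headers (CL/cl.h).
pub const CL_R: u32 = 0x10B0; // first cl_channel_order
pub const CL_RGBA: u32 = 0x10B5;
pub const CL_SNORM_INT8: u32 = 0x10D0; // first cl_channel_type
pub const CL_UNORM_INT8: u32 = 0x10D2;

/// SPIR-V ImageChannelOrder (R == 0) -> OpenCL cl_channel_order.
pub fn spirv_order_to_cl(order: u32) -> u32 {
    order + CL_R
}

/// SPIR-V ImageChannelDataType (SnormInt8 == 0) -> OpenCL cl_channel_type.
pub fn spirv_type_to_cl(ty: u32) -> u32 {
    ty + CL_SNORM_INT8
}

/// Inverse: strip the CL base to get back the 0-based SPIR-V value.
/// Returns None for values below the base (not a valid channel order).
pub fn cl_order_to_spirv(cl: u32) -> Option<u32> {
    cl.checked_sub(CL_R)
}

/// Inverse for channel data types.
pub fn cl_type_to_spirv(cl: u32) -> Option<u32> {
    cl.checked_sub(CL_SNORM_INT8)
}
```

Since both directions are plain adds and subtracts of constants, they fold away whenever the input is known at compile time.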
Ristovski has quit [Ping timeout: 480 seconds]
tales__ has quit []
tales__ has joined #dri-devel
Ristovski has joined #dri-devel
tales__ has left #dri-devel [#dri-devel]
<jekstrand> airlied: I'm starting to think that functions may be something I need to start on sooner rather than later.
<jekstrand> Because if I don't, someone else will, badly, and then I'll get to clean up the mess. :-/
<jekstrand> Or maybe that's just too pessimistic of me.
<jekstrand> idk
<jekstrand> The first step, though, is to fix various optimization passes so they're valid to run pre-lowering. Lots of stuff assumes everything is inlined.
<jekstrand> Even if it has nothing to do with functions, lots of passes make implicit assumptions the author didn't realize.
<jekstrand> Maybe I should dig into that with the Intel compiler and see where it goes.
<jekstrand> I'll probably stick with compute for now. Other stages have lots of I/O which makes assuming inlining is really convenient.
tales-aparecida has joined #dri-devel
<icecream95> jekstrand: r u saying that my implementation of functions is going to be bad??
<jekstrand> icecream95: Not you in particular. I just tend to assume that people will forget things or not know how NIR works somewhere or not realize that they're running optimizations that aren't function-safe or whatever.
<jekstrand> It's not an insult.
<icecream95> jekstrand: I'm going to try to keep to the backend side, so hopefully will not make too many conflicting changes to NIR passes
<jekstrand> Yeah, the back-end is where a lot of the pain is.
<jekstrand> Retrofitting the Intel back-end will be... entertaining.
<jekstrand> Not especially looking forward to that TBH
<jekstrand> But I should be able to pull something off.
<icecream95> But first I need to bisect an issue around shadow samplers..
<icecream95> Let me guess.. GL_CLAMP emulation?
<karolherbst> icecream95: shadow samplers?
<icecream95> karolherbst: This isn't CL
<karolherbst> ahh
<karolherbst> jekstrand: I am wondering if we should just emit two arrays for the format+order thing and index it with the image index...
<karolherbst> I don't really like the approach I had for clover, where I was adding uniform variables for each image
<icecream95> zmike: lower_tex_to_txd seems to miss a number of fields that should be copied, such as is_shadow
<zmike> better copy them then
<zmike> the txb one probably does too
<airlied> jekstrand: I did have it pass functions through to llvmpipe and not have nir explode
<airlied> jekstrand: only 3-4 spirv/nir patches to set things up, it at least passed some basic tests :-P
<karolherbst> airlied: I'd throw luxmark at it, but llvmpipe kind of crashes with it :(
<airlied> that branch does the wrong thing in lots of places before that :)
<karolherbst> :D
<airlied> mixing llvmpipe simd flow control with LLVM flow control is tricky
<airlied> got to at least pass the current exec_mask into all functions
<karolherbst> something something is very wrong with buffers and I have no clue what
<airlied> then I went down the hole of passing in lots of things into every function
<karolherbst> ohh I have an idea what's broken
<airlied> and then I think I was like wtf global variables could save me
<airlied> then I realised I had other problems with global vars so just ran away
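A toy model of why the exec_mask has to be passed into every function when SIMD control flow is emulated on top of LLVM control flow: the callee must only touch the lanes that are active at the call site. The 8-lane width and all names are assumptions for illustration, not llvmpipe's actual interface.

```rust
/// Number of SIMD lanes in this toy model (an assumption; real widths vary).
pub const LANES: usize = 8;

/// A "function" in the SIMD model: it only writes lanes whose bit is set in
/// `exec_mask`, so calls made from inside divergent control flow stay correct.
pub fn add_one(exec_mask: u8, values: &mut [i32; LANES]) {
    for lane in 0..LANES {
        if exec_mask & (1 << lane) != 0 {
            values[lane] += 1;
        }
    }
}

/// Caller: emulates `if (x < 0) add_one(x);` by building a mask from the
/// condition and ANDing it with the mask the caller itself was invoked with.
pub fn caller(exec_mask: u8, values: &mut [i32; LANES]) {
    let mut cond_mask = 0u8;
    for lane in 0..LANES {
        if values[lane] < 0 {
            cond_mask |= 1 << lane;
        }
    }
    add_one(exec_mask & cond_mask, values);
}
```

With a caller mask of 0x0F, lanes 4..7 stay untouched even when their condition is true, which is exactly what would break if the callee ignored the mask.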
<karolherbst> oops
<karolherbst> airlied: https://i.imgur.com/Rdzoo8K.png :(
<karolherbst> I hope it has nothing to do with alignment
<karolherbst> airlied: what's that "Treating load_kernel_arg in control flow as uniform, results may be incorrect" message btw?
<karolherbst> ohhh
<airlied> load_kernel_arg expects the argument to be uniform across the wave
<karolherbst> yeah mhh
<airlied> and only picks the first active wave to use to load it
<airlied> not sure if kernel arguments can be dynamically indexed, don't think you can
<karolherbst> yeah.. no idea
<karolherbst> I will take a look at the test hitting this
<karolherbst> okay, it's a load_const (0x00000020 = 0.000000) :)
<airlied> yeah so that should be fine
* airlied isn't sure if you can have an array of kernel arguments :-P
<karolherbst> you can
<karolherbst> so, arrays are of course illegal as kernel args
<karolherbst> but structs aren't
<karolherbst> so you just wrap your array with a struct and indirectly access it
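A toy simulation of the load_kernel_arg uniformity warning: airlied describes the lowering as taking the value for the first active invocation, which is only correct while every active invocation agrees on the offset. An indirect index into a struct-wrapped array, as karolherbst describes, breaks that assumption. The buffer layout, the 32-lane limit, and the function names are assumptions.

```rust
/// Reads a little-endian u32 from the kernel input buffer at `offset`.
fn read_u32(input: &[u8], offset: usize) -> u32 {
    u32::from_le_bytes([input[offset], input[offset + 1], input[offset + 2], input[offset + 3]])
}

/// Per-lane load: every active lane uses its own byte offset, which is what an
/// indirect index into a struct-wrapped array really needs. Assumes <= 32 lanes.
pub fn load_per_lane(input: &[u8], offsets: &[usize], exec_mask: u32) -> Vec<Option<u32>> {
    offsets
        .iter()
        .enumerate()
        .map(|(lane, &off)| {
            if exec_mask & (1 << lane) != 0 {
                Some(read_u32(input, off))
            } else {
                None
            }
        })
        .collect()
}

/// "Uniform" load: the offset of the first active lane is used for every lane,
/// which is only correct if all active lanes agree on the offset.
pub fn load_uniform(input: &[u8], offsets: &[usize], exec_mask: u32) -> Vec<Option<u32>> {
    if exec_mask == 0 {
        return vec![None; offsets.len()];
    }
    let first = exec_mask.trailing_zeros() as usize;
    let value = read_u32(input, offsets[first]);
    (0..offsets.len())
        .map(|lane| if exec_mask & (1 << lane) != 0 { Some(value) } else { None })
        .collect()
}
```

With offsets [0, 4, 8, 12] over the array [10, 20, 30, 40] and mask 0b1110, the per-lane load yields 20, 30, 40 for lanes 1..3, while the uniform version hands 20 to all three.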
sadlerap1 has quit []
<karolherbst> but "allocations image2d_write" and "allocations image2d_read" fail and this worries me a bit
sadlerap has joined #dri-devel
<airlied> dcbaker: 3bbd404457e6e3278afd78f6721be9e174c6b777 still seems to be missing from 22.0 staging
<karolherbst> but that test is a little insane
<karolherbst> "Pixel 9440, 5, component 0, expected 47200, got 3873333467."
<karolherbst> it's always failing at that one pixel though
<karolherbst> maybe we overflow somewhere?
ngcortes has quit [Ping timeout: 480 seconds]
OftenTimeConsuming has quit [Remote host closed the connection]
OftenTimeConsuming has joined #dri-devel
rkanwal has quit [Ping timeout: 480 seconds]
<karolherbst> "kernel compilation time: 175809ms"
<karolherbst> I think we can do better
ybogdano has quit [Ping timeout: 480 seconds]
aravind has joined #dri-devel
Daanct12 has joined #dri-devel
mbrost has joined #dri-devel
mbrost has quit []
h0tc0d3 has quit [Remote host closed the connection]
Daanct12 has quit [Quit: Leaving]
mbrost has joined #dri-devel
Daanct12 has joined #dri-devel
h0tc0d3 has joined #dri-devel
Daanct12 has quit [Remote host closed the connection]
Daanct12 has joined #dri-devel
Daanct12 has quit []
Daanct12 has joined #dri-devel
Daanct12 has quit [Remote host closed the connection]
<jekstrand> airlied: I'm less worried about exploding than I am about silently and subtly optimizing something wrong and you have no clue why.
<jekstrand> airlied: Lots of stuff doesn't actually think through global variables very well, for instance.
<jekstrand> We tend to assume they're just like locals (global as in nir_var_shader_temp)
<jekstrand> We also used to have a bunch of metadata problems where it would or wouldn't get invalidated on a per-shader basis when it should have been per-function. I think that one's mostly sorted now.
<airlied> jekstrand: hopefully a full cl CTS would show up any major insanity, but sounds like a lot of auditing
<jekstrand> airlied: What CTS? OpenCL? Nah. it's not that complex.
<jekstrand> I think some of it will be shown by optimizing libclc more before we inline it all.
<jekstrand> And successfully running luxmark would give me some confidence.
<jekstrand> The Vulkan CTS might have enough function stuff going on; not sure.
<jekstrand> It either doesn't use functions at all or uses them for lots of stupid.
<Sachiel> the graphicsfuzz tests have plenty of those
mbrost has quit [Remote host closed the connection]
mbrost has joined #dri-devel
Company has quit [Quit: Leaving]
ppascher has joined #dri-devel
<jekstrand> Yeah, I think if I could run the full Vulkan CTS with zero inlining for compute shaders, I'd have a reasonable level of confidence that it was working.
neonking has quit [Ping timeout: 480 seconds]
Daanct12 has joined #dri-devel
tales-aparecida has quit [Remote host closed the connection]
shankaru has joined #dri-devel
tales_ has quit []
mbrost has quit [Remote host closed the connection]
mbrost has joined #dri-devel
Duke`` has joined #dri-devel
mbrost has quit [Read error: Connection reset by peer]
mhenning has quit [Remote host closed the connection]
Administrator has joined #dri-devel
Administrator has quit [Remote host closed the connection]
<airlied> robclark: can you put a small bit more summary info in fixes pull requests :-)
rgallaispou1 has joined #dri-devel
<robclark> airlied: last -fixes is "fail less at system suspend plus misc small fixes"?
<airlied> cool, I just stuck in some guess work anyways :-P
<robclark> system suspend is, tbh, something I wonder about with other drivers.. we've been finding some fun corner cases that we wouldn't have seen without umm.. crowd sourced debugging (ie. digging through crash reports from the field)
rgallaispou has quit [Ping timeout: 480 seconds]
rgallaispou1 has quit [Read error: Connection reset by peer]
rgallaispou has joined #dri-devel
lemonzest has joined #dri-devel
OftenTimeConsuming is now known as Guest1894
sdutt has quit [Read error: Connection reset by peer]
Guest1894 has quit [Remote host closed the connection]
OftenTimeConsuming has joined #dri-devel
Daanct12 has quit [Ping timeout: 480 seconds]
mszyprow has joined #dri-devel
shankaru has quit [Quit: Leaving.]
shankaru has joined #dri-devel
mszyprow has quit [Ping timeout: 480 seconds]
pnowack has joined #dri-devel
pnowack has quit [Remote host closed the connection]
pnowack has joined #dri-devel
mszyprow has joined #dri-devel
flto_ has joined #dri-devel
flto has quit [Ping timeout: 480 seconds]
flto has joined #dri-devel
flto_ has quit [Ping timeout: 480 seconds]
danvet has joined #dri-devel
Duke`` has quit [Ping timeout: 480 seconds]
sumoon has joined #dri-devel
paulk1 has quit [Ping timeout: 480 seconds]
frieder has joined #dri-devel
abhinav__ has quit [Quit: The Lounge - https://thelounge.chat]
jessica_24 has quit [Quit: The Lounge - https://thelounge.chat]
jessica_240 has quit []
jessica_24 has joined #dri-devel
mvlad has joined #dri-devel
neonking has joined #dri-devel
paulk1 has joined #dri-devel
maxzor has joined #dri-devel
fxkamd has quit []
jkrzyszt has joined #dri-devel
camus has joined #dri-devel
danvet has quit [Ping timeout: 480 seconds]
eukara has quit []
eukara has joined #dri-devel
oneforall2 has quit [Ping timeout: 480 seconds]
nchery has quit [Read error: Connection reset by peer]
lynxeye has joined #dri-devel
Haaninjo has joined #dri-devel
jewins has quit [Ping timeout: 480 seconds]
ppascher has quit [Ping timeout: 480 seconds]
vyivel has quit [Remote host closed the connection]
bl4ckb0ne has quit [Remote host closed the connection]
emersion has quit [Remote host closed the connection]
bl4ckb0ne has joined #dri-devel
emersion has joined #dri-devel
vyivel has joined #dri-devel
tobiasjakobi has joined #dri-devel
tobiasjakobi has quit []
pnowack has quit [Quit: pnowack]
jljusten has quit [Quit: WeeChat 3.4]
jljusten has joined #dri-devel
pnowack has joined #dri-devel
jkrzyszt has quit [Remote host closed the connection]
maxzor has quit [Ping timeout: 480 seconds]
dliviu has joined #dri-devel
pallavim has joined #dri-devel
nashpa has quit [Ping timeout: 480 seconds]
icecream95 has quit [Ping timeout: 480 seconds]
natto has quit [Ping timeout: 480 seconds]
mclasen has joined #dri-devel
jkrzyszt has joined #dri-devel
flacks has quit [Quit: Quitter]
flacks has joined #dri-devel
<karolherbst> dcbaker: I triggered an annoying bug. If rustc gets updated on the system in the meantime, meson doesn't recompile stuff. Not sure if I reported it in the past or not
<karolherbst> jekstrand: https://blog.rust-lang.org/2022/04/07/Rust-1.60.0.html#stabilized-apis there is some nice stuff in it :)
<karolherbst> like Arc::new_cyclic
<karolherbst> not sure if we still need it though?
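Arc::new_cyclic, stabilized in Rust 1.60, builds a value that holds a Weak reference to its own Arc, which avoids a two-phase init. The log doesn't say what rusticl wanted it for; the Device type here is hypothetical:

```rust
use std::sync::{Arc, Weak};

/// A hypothetical object that needs a handle to itself, e.g. to hand out new
/// strong references from inside its own methods.
pub struct Device {
    pub name: String,
    self_ref: Weak<Device>,
}

impl Device {
    pub fn new(name: &str) -> Arc<Device> {
        // new_cyclic passes a Weak to the value under construction; it cannot
        // be upgraded until the closure returns and the Arc exists.
        Arc::new_cyclic(|weak| Device {
            name: name.to_string(),
            self_ref: weak.clone(),
        })
    }

    /// Returns a strong reference to this device. Devices are only ever
    /// created inside an Arc (see `new`), so the upgrade succeeds while &self lives.
    pub fn arc(&self) -> Arc<Device> {
        self.self_ref.upgrade().expect("device is alive while &self exists")
    }
}
```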
rkanwal has joined #dri-devel
nchery has joined #dri-devel
Company has joined #dri-devel
ROw has joined #dri-devel
SR_71 has quit [Ping timeout: 480 seconds]
shankaru has quit [Quit: Leaving.]
mclasen has quit [Ping timeout: 480 seconds]
<jekstrand> karolherbst: ! I like! That's exactly what I wanted.
neonking has quit [Ping timeout: 480 seconds]
natto has joined #dri-devel
mclasen has joined #dri-devel
pcercuei has joined #dri-devel
* jekstrand starts a CL CTS run on panfrost
<jekstrand> Without clear_buffer or clear_texture, it's gonna be a bit busted but better than nothing, I guess.
karolherbst has quit [Ping timeout: 480 seconds]
karolherbst has joined #dri-devel
LexSfX has quit [Remote host closed the connection]
LexSfX has joined #dri-devel
neonking has joined #dri-devel
<karolherbst> jekstrand: we kind of need a better solution for those static inlines :(
<jekstrand> karolherbst: context?
<karolherbst> Also I kind of plan to port over to rust 2018, just don't know if I want to fix up history or do one mega commit
<karolherbst> jekstrand: like bindgen won't generate bindings for static inline functions
<jekstrand> karolherbst: right
ppascher has joined #dri-devel
Haaninjo has quit [Read error: Connection reset by peer]
Haaninjo has joined #dri-devel
Jasprose has joined #dri-devel
shankaru has joined #dri-devel
<dv_> does anyone know if it is valid to dup() the FD of an open dma-heap device node?
<karolherbst> jekstrand: I am seriously thinking about just emitting two u16 arrays for format+order and always put those into the input buffer.. because handling indirects any other way would be brutal
<jekstrand> karolherbst: I guess. I didn't think the input-per-image was too bad
<jekstrand> Depends on when you want to do the lowering, I guess.
<karolherbst> yeah, it's not, it just gets complicated in terms of DCE and what if you have an indirect
<karolherbst> anyway.. it's also just 32 bits per image argument
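A sketch of the layout karolherbst proposes: two u16 arrays in the kernel input buffer, one for the channel data types (what the log calls format) and one for the channel orders, indexed by image slot. That costs 32 bits per image argument and turns any lookup into a simple base + index * 2 load. The layout, little-endian encoding, and names are assumptions, not rusticl's final code.

```rust
/// Hypothetical layout: all channel data types first, then all channel orders,
/// each stored as a little-endian u16 and indexed by the image's slot.
pub fn pack_image_info(data_types: &[u16], orders: &[u16]) -> Vec<u8> {
    assert_eq!(data_types.len(), orders.len());
    let mut buf = Vec::with_capacity(4 * data_types.len());
    for t in data_types {
        buf.extend_from_slice(&t.to_le_bytes());
    }
    for o in orders {
        buf.extend_from_slice(&o.to_le_bytes());
    }
    buf
}

fn read_u16(buf: &[u8], off: usize) -> u16 {
    u16::from_le_bytes([buf[off], buf[off + 1]])
}

/// What a lowered get_image_channel_data_type() would compute: base + index * 2,
/// which works even when `image_index` isn't a compile-time constant.
pub fn load_data_type(buf: &[u8], image_index: usize) -> u16 {
    read_u16(buf, 2 * image_index)
}

/// Same for get_image_channel_order(); the order array starts after all data types.
pub fn load_order(buf: &[u8], num_images: usize, image_index: usize) -> u16 {
    read_u16(buf, 2 * num_images + 2 * image_index)
}
```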
* jekstrand decides to let Fedora download LLVM debug symbols this time
<karolherbst> big mistake :P
<karolherbst> nir_load_deref(nir_build_deref_array(nir_load_var)) is what I need to do, right?
<jekstrand> yup
<karolherbst> ehh s/nir_load_var/nir_build_deref_var/
<jekstrand> yeah
paulk1 has quit []
<karolherbst> nnoooo.. those #undefs in nir_builder.h are killing my wrapper :D
<jekstrand> Why are you trying to write NIR passes in Rust?
<jekstrand> You're asking for pain
<karolherbst> because I am not doing much
<karolherbst> but yeah.. maybe I should move the pass into C code and see how I deal with sharing data
<jekstrand> *sigh* Who thought vec4 for compute was a good idea? Apparently, Arm did...
<imirkin> it's 4x faster
<karolherbst> :) it's 4 times as fast as scalar, everybody knows that
<imirkin> lol
<imirkin> and two people can't both be wrong
<karolherbst> :D
mszyprow has quit [Ping timeout: 480 seconds]
<jekstrand> The annoying thing is that, if we want this mess to actually work, we either need to teach the bifrost compiler vec8 and vec16 or we need to make nir_lower_alu_to_scalar a generic narrowing pass that takes a maximum width or a callback or something.
Net147 has quit [Quit: Quit]
Net147 has joined #dri-devel
jewins has joined #dri-devel
<karolherbst> jekstrand: I'd guess the latter is better
<jekstrand> karolherbst: Yeah, I'm looking into that.
<jekstrand> karolherbst: I don't think it should be that hard to make it narrow instead of always scalarize
<karolherbst> shouldn't
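The generalization jekstrand describes, reduced to its core arithmetic: instead of always splitting to one component, split an op into chunks no wider than what the backend supports. This is a hypothetical helper, not the nir_lower_alu_to_scalar interface; a real version might take a per-instruction callback instead of a fixed max_width.

```rust
/// Splits an ALU op of `num_components` into (first_component, width) chunks
/// no wider than `max_width`. max_width == 1 reproduces full scalarization;
/// a backend that handles vec4 natively would pass 4, so vec16 becomes 4x vec4.
pub fn split_components(num_components: u8, max_width: u8) -> Vec<(u8, u8)> {
    assert!(max_width > 0, "max_width must be at least 1");
    let mut chunks = Vec::new();
    let mut start = 0u8;
    while start < num_components {
        let len = max_width.min(num_components - start);
        chunks.push((start, len));
        start += len;
    }
    chunks
}
```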
maxzor has joined #dri-devel
sdutt has joined #dri-devel
fxkamd has joined #dri-devel
tlwoerner has quit [Read error: Connection reset by peer]
flto has quit [Ping timeout: 480 seconds]
ella-0 has joined #dri-devel
<karolherbst> soo.. kernel side is all done for format and order :) now I just have to upload the values
flto has joined #dri-devel
ella-0_ has quit [Remote host closed the connection]
alyssa has joined #dri-devel
<alyssa> I find myself wanting to write developer docs for panfrost
<alyssa> The obvious place is https://docs.mesa3d.org/drivers/panfrost.html but I don't know how appropriate it is,
<alyssa> the info there is all targeted at end users
<alyssa> (How to build and run panfrost, not how to hack on it)
<alyssa> karolherbst: u64vec4x0a32B
<alyssa> the heck?
<karolherbst> yes...
<karolherbst> soo
<karolherbst> that's explicit stride stuff or something
<karolherbst> I thought it was some memory corruption somewhere, but Jason said that's how it's supposed to look
<karolherbst> PASSED 42 of 42 sub-tests. :)
<karolherbst> jenatali: do you have any data on what image formats/types applications are most interested in?
<karolherbst> uhhh "Returned array size did not validate (expected 53, got 0)" :(
<jenatali> karolherbst: no, not really. I'd assume the normal 8bpc unorm stuff
<karolherbst> yeah... I mean. CL already specifies what's required, I am just wondering what I should care about on top from the start
<karolherbst> and the 8bpc unorm stuff is already included in that afaik
<karolherbst> but like.. only CL_R, not CL_A
<karolherbst> although as long as stuff passes I can also just expose as much as possible
<jenatali> Yeah I doubt apps really care for much beyond the required
<karolherbst> I just don't have a nice way of declaring the CL -> pipe mappings, so every combination is a new entry :(
<jenatali> My read on how CL was designed was that the speccers just looked at what they *could* do without asking what people want
<karolherbst> ohh that's for sure
<jenatali> How else do you end up with CL2.x that nobody uses
<karolherbst> but some might want 2 channel images
<karolherbst> which are purely optional
<karolherbst> anyway, if you don't have any data on that, then I guess we have to see what people complain about :)
<jenatali> Oh didn't realize. I just hooked up whatever D3D supports, which covered all the required, I didn't look at which ones of them were optional
<karolherbst> yeah right..
mclasen has quit []
<karolherbst> in C you can also just do loops and macro magic
<karolherbst> rust macros can't create new tokens :(
<karolherbst> so you can't just concat names
mclasen has joined #dri-devel
<linkmauve> std::concat_idents!() is annoyingly nightly-only, but there is the concat-idents crate which makes it usable on stable.
<karolherbst> linkmauve: right.. which comes back to the issue that we can't use external crates with meson yet :)
<karolherbst> so I just ignore those issues unless it's important
<linkmauve> For now you could perhaps copy its code?
<karolherbst> it's not as simple
<karolherbst> although meson does support proc macros now I think :D
<linkmauve> Right, it depends on syn.
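One stable-Rust way around the missing identifier concatenation: let a macro_rules! macro generate the lookup function, but spell out each CL (order, type) to pipe format entry explicitly. The PipeFormat enum is a hypothetical stand-in for gallium's pipe_format; the CL constants are from CL/cl.h.

```rust
/// Hypothetical subset of gallium pipe formats, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PipeFormat {
    R8Unorm,
    R8G8B8A8Unorm,
    R32G32B32A32Float,
}

// Values from CL/cl.h.
pub const CL_R: u32 = 0x10B0;
pub const CL_RGBA: u32 = 0x10B5;
pub const CL_UNORM_INT8: u32 = 0x10D2;
pub const CL_FLOAT: u32 = 0x10DE;

/// macro_rules! can't paste identifiers together on stable, so every entry is
/// written out; the macro only removes the boilerplate of the lookup function.
macro_rules! cl_format_table {
    ($($order:expr, $ty:expr => $fmt:ident;)*) => {
        pub fn cl_to_pipe(order: u32, ty: u32) -> Option<PipeFormat> {
            $(
                if order == $order && ty == $ty {
                    return Some(PipeFormat::$fmt);
                }
            )*
            None
        }
    };
}

cl_format_table! {
    CL_R, CL_UNORM_INT8 => R8Unorm;
    CL_RGBA, CL_UNORM_INT8 => R8G8B8A8Unorm;
    CL_RGBA, CL_FLOAT => R32G32B32A32Float;
}
```

Combinations that aren't listed simply return None, so exposing a new format is one more table line.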
soreau has quit [Read error: No route to host]
khfeng has quit [Ping timeout: 480 seconds]
<karolherbst> proc macro support will help us with some stuff though
<karolherbst> maybe I hack something up
soreau has joined #dri-devel
<karolherbst> jekstrand: I think image_size is busted for array images :( maybe I'm doing something wrong, but it does work for read images.. mhh
pjakobsson has quit [Remote host closed the connection]
pjakobsson has joined #dri-devel
anarsoul has quit [Ping timeout: 480 seconds]
anarsoul has joined #dri-devel
<karolherbst> mhhhhh
<karolherbst> vec2 32 con ssa_3 = intrinsic image_size (ssa_0, ssa_0) (image_dim=1D /*0*/, image_array=true /*1*/, format=none /*0*/, access=8)
<karolherbst> vec2 64 con ssa_4 = intrinsic image_size (ssa_0, ssa_0) (image_dim=1D /*0*/, image_array=true /*1*/, format=none /*0*/, access=8)
<jekstrand> karolherbst: Do we have a pass for scalarizing I/O? nir_intrinsic_load_global and friends?
<karolherbst> uhm...
<karolherbst> I think so...
<karolherbst> if not in tree, there should be an MR
<karolherbst> I think I saw something at some point somewhere
<karolherbst> jekstrand: nir_opt_load_store_vectorize.c?
<karolherbst> ehh wait
<karolherbst> scalarizing, not vectorizing
<karolherbst> :(
<jekstrand> karolherbst: Yeah, that one might be able to scalarize too. I can't remember.
<karolherbst> jekstrand: nir_lower_io_to_scalar.c
shankaru has quit []
<karolherbst> but I guess it only does input atm
<karolherbst> I doubt it's hard to add support for global there
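A toy model of what scalarizing a vector global load produces: one load per component, each offset by the component size in bytes. This illustrates the idea behind extending nir_lower_io_to_scalar to global memory; it is not the pass itself, which would also have to carry over alignment and access flags.

```rust
/// One scalar load produced when splitting a vector load.
#[derive(Debug, PartialEq, Eq)]
pub struct ScalarLoad {
    pub addr: u64,
    pub bit_size: u32,
}

/// Splits a `num_components`-wide load at `base` into per-component loads.
pub fn scalarize_load(base: u64, num_components: u32, bit_size: u32) -> Vec<ScalarLoad> {
    assert!(bit_size % 8 == 0, "sub-byte components are not handled here");
    let stride = u64::from(bit_size / 8);
    (0..num_components)
        .map(|i| ScalarLoad { addr: base + u64::from(i) * stride, bit_size })
        .collect()
}
```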
* jekstrand will type something mali-specific for now
<karolherbst> nooo.. I broke stuff :(
HankB__ has quit [Remote host closed the connection]
HankB__ has joined #dri-devel
sdutt has quit []
sdutt has joined #dri-devel
jkrzyszt_ has joined #dri-devel
ybogdano has joined #dri-devel
<karolherbst> ehh.. crap
<karolherbst> we have separate numbering for readonly and writeonly images
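A sketch of what separate numbering for read-only and write-only images means: each image argument gets an index within its own class. Because OpenCL C forbids arrays of images (the spec text quoted a bit later in the log), the slot is always known statically. The enum and function are hypothetical, and the log doesn't say why the numbering is split.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ImageAccess {
    ReadOnly,
    WriteOnly,
}

/// Assigns each image kernel argument an index within its own class, so
/// read-only images count 0, 1, 2... independently of write-only images.
pub fn assign_image_slots(args: &[ImageAccess]) -> Vec<(ImageAccess, u32)> {
    let (mut next_read, mut next_write) = (0u32, 0u32);
    args.iter()
        .map(|&access| {
            let slot = match access {
                ImageAccess::ReadOnly => {
                    next_read += 1;
                    next_read - 1
                }
                ImageAccess::WriteOnly => {
                    next_write += 1;
                    next_write - 1
                }
            };
            (access, slot)
        })
        .collect()
}
```

A per-image table indexed by a single slot number therefore has to either pick one class or combine the two numberings.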
jkrzyszt has quit [Ping timeout: 480 seconds]
mclasen has quit [Ping timeout: 480 seconds]
kmn has quit [Quit: Leaving.]
Duke`` has joined #dri-devel
mbrost has joined #dri-devel
ybogdano has quit [Ping timeout: 480 seconds]
<jekstrand> uh... Why am I getting 64-bit immediates in this shader?!?
<karolherbst> jekstrand: you don't want them?
<jekstrand> The panfrost compiler doesn't seem to think so. (-:
<karolherbst> well that's just sad
<karolherbst> the hw is 64 bit though, no?
<jekstrand> The panfrost compiler also thinks it's lowering 64-bit stuff away and that's clearly not happening. :-/
<jekstrand> Ooh, because I added it! Drp.
<karolherbst> :D
* jekstrand needs to lower harder
<karolherbst> I think I overengineered again
<karolherbst> "An image type cannot be used to declare a variable, a structure or union field, an array of images, a pointer to an image, or the return type of a function."
<karolherbst> that makes things simple
<jekstrand> :)
Peuc has joined #dri-devel
<karolherbst> I still keep the array as this makes it easier in rusticl, but still :D I wanted to figure out how to properly solve the issue of indirects at readonly and writeonly images, but guess the spec solves that for me
<alyssa> jekstrand: sounds like you're having fun
<jekstrand> alyssa: more or less. :)
<karolherbst> ehh
<karolherbst> I think image_deref_format lowering is broken
lemonzest has quit [Quit: WeeChat 3.4]
<karolherbst> ehh maybe not
nchery has quit [Ping timeout: 480 seconds]
<karolherbst> I can rely on the access thing, no?
<jekstrand> I've got test_buffers buffer_map_write_float not dying, but it fails random test. :-/
<karolherbst> mhhh
<karolherbst> let me check what was a good test to start all of this
abhinav__ has joined #dri-devel
<karolherbst> jekstrand: does allocations buffer work?
<jekstrand> It's definitely allocating successfully
<karolherbst> I think that one also maps and verifies content
<karolherbst> it's doing weird things, but it's not as huge as the buffers tests
* karolherbst kicks off another CTS run
<karolherbst> alyssa: I got to say, that's one of the most fun projects I've been working on for quite some time :D
ybogdano has joined #dri-devel
<jekstrand> karolherbst: Maybe it is mapping fail. I'm seeing all zeros
<jekstrand> And it doesn't seem fully deterministic
<jekstrand> Unless someone's flushing these here denorms. :-/
* jekstrand tries the int test
<jekstrand> Yeah, either something's going really wrong launching my kernel or this map is bad.
<jekstrand> But not all my maps are bad
<jekstrand> But I don't know panfrost well enough to know which to distrust more. :-(
Jasprose has quit [Remote host closed the connection]
lemonzest has joined #dri-devel
<jekstrand> karolherbst: test_allocations buffer fails :-/
nchery has joined #dri-devel
<jekstrand> Oh, test_allocations buffer is hanging. Or, at least, timing out.
<jekstrand> I guess that poor little GPU doesn't want to checksum 512MB that fast.
anholt has quit [Remote host closed the connection]
mclasen has joined #dri-devel
<kisak> dcbaker: if you have a spare moment, can you check that the (staging/)mesa 22.1 branch went live, and the 22.1-branchpoint tag?
<dcbaker> kisak: they haven't yet, I ran out of time waiting for marge to merge the version bump last night so I'm making them right now
<karolherbst> jekstrand: mhh, could be some flushing issue
<karolherbst> or fencing
<kisak> thanks, I was just checking if it fell into the abyss by accident
<dcbaker> just the CI abyss :)
<karolherbst> "Pass 2122 Fails 24 Crashes 30 Timeouts 0" :)
<alyssa> :D
<alyssa> iris?
<kisak> over here, I completely missed that llvm 14.0.0 was released until there was 14.0.1 news
<jekstrand> karolherbst: I typed up a u_default_clear_buffer() helper for panfrost. Maybe we should do that for iris too?
<karolherbst> alyssa: yes
<karolherbst> jekstrand: maybe
<karolherbst> jekstrand: soo.. image_size is somewhat broken with iris
<jekstrand> karolherbst: It's identical to buffer_subdata except with the repeat
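A CPU-side sketch of the difference jekstrand describes between clear_buffer and buffer_subdata: instead of copying a full source buffer, a small clear pattern is written repeatedly across the range. This is not the actual u_default_clear_buffer() helper, whose code isn't shown here.

```rust
/// Fills `dst[offset..offset + size]` by repeating `pattern`.
/// Panics if the range is out of bounds or not a multiple of the pattern size.
pub fn clear_buffer(dst: &mut [u8], offset: usize, size: usize, pattern: &[u8]) {
    assert!(
        !pattern.is_empty() && size % pattern.len() == 0,
        "size must be a multiple of the pattern size"
    );
    for chunk in dst[offset..offset + size].chunks_exact_mut(pattern.len()) {
        chunk.copy_from_slice(pattern);
    }
}
```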
<jekstrand> karolherbst: That's entirely possible.
<jekstrand> karolherbst: I did think it worked, though.
<karolherbst> but only for 1d and 2d arrays
<karolherbst> and only for images
<karolherbst> those tests pass on llvmpipe
<jekstrand> hrm
anholt has joined #dri-devel
<dcbaker> kisak: branches and tags are up, the release is cutting right now
<dcbaker> karolherbst: that's an interesting issue... we don't track that explicitly for C-like languages either; I suspect that, as a side effect, a major gcc bump also changes some headers, so ninja decides to rebuild everything because the headers have changed...
<karolherbst> jekstrand: heh.. maybe I messed up.. on ADL-S it asserts
<karolherbst> dcbaker: potentially
<karolherbst> but gcc doesn't have this strong version check on object files
<jekstrand> modulo CL/Vulkan/GL differences, image_size should work. Vulkan tests it.
<karolherbst> rustc bails if a dep is compiled with a different compiler
<karolherbst> or well.. different version at least
<karolherbst> I see a crash inside brw_nir_clamp_image_1d_2d_array_sizes here
<karolherbst> I am sure I messed it up for good
Haaninjo has quit [Ping timeout: 480 seconds]
Haaninjo has joined #dri-devel
frieder has quit [Remote host closed the connection]
<karolherbst> jekstrand: what is a bit odd is that I get two image_size ops, one 32 and the other 64 bit
<jekstrand> that is odd
<jekstrand> Why is there a 64-bit one?
<karolherbst> because of the spir-v
<karolherbst> jekstrand: ahh.. I know
<karolherbst> get_image_* funcs return int
<karolherbst> get_image_array_size returns... size_t
<karolherbst> because.. you know
<jekstrand> Of course it does!
<karolherbst> this makes sense, because cl_image_desc.image_width is size_t
<karolherbst> (and the others)
<karolherbst> honestly...
<jekstrand> Yeah, we need to turn that into 32-bit in NIR somewhere.
<karolherbst> guess when handling OpImageQuerySizeLod
<karolherbst> (and OpImageQuerySize)
<jekstrand> Yeah, that would work.
<jekstrand> Or as some bit of lowering somewhere.
<jekstrand> Though spirv_to_nir seems as good a place as any for now.
<jekstrand> I can't envision us caring about 64-bit image dimensions any time soon
<karolherbst> CL doesn't care anyway
<karolherbst> the API allows it, but...
<karolherbst> but maybe they thought allowing that on arrays makes sense because....?
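A sketch of the fix being discussed, under assumptions: the API accepts size_t dimensions, but assuming the driver rejects anything that doesn't fit in 32 bits at image creation, image_size can always be queried at 32 bits and zero-extended (like NIR's u2u64) when the SPIR-V result type is 64-bit, as it is for get_image_array_size(). Function names are hypothetical.

```rust
use std::convert::TryFrom;

/// API side: cl_image_desc fields are size_t; reject anything that doesn't fit
/// in 32 bits so a 32-bit image_size query is always lossless.
pub fn validate_dim(dim: usize) -> Option<u32> {
    u32::try_from(dim).ok()
}

/// Shader side: widen a 32-bit image_size result for a 64-bit (size_t) SPIR-V
/// result type. Zero-extension is exact because sizes are never negative.
pub fn widen_size(size32: &[u32]) -> Vec<u64> {
    size32.iter().map(|&c| u64::from(c)).collect()
}
```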
rkanwal has quit [Ping timeout: 480 seconds]
<karolherbst> jekstrand: vtn_handle_image is kind of a messy hell, isn't it? :D
<jekstrand> A bit
<karolherbst> it passes now :)
<jekstrand> :)
<jekstrand> I'm starting to think something is wrong with panfrost compute
<jekstrand> Kernels aren't launching right or something
* jekstrand runs test_basic
<karolherbst> but everything else works?
<jekstrand> hard to tell
<jekstrand> fpmath fails. :-/
<karolherbst> :(
<karolherbst> ohhhhh
<karolherbst> I think I know what's up
<jekstrand> sometimes
<karolherbst> weird
<karolherbst> you are aware that I still use this input buffer thing? :P
<jekstrand> what input buffer thing?
<jekstrand> Oh, the grid inputs?
<jekstrand> Right...
<karolherbst> yeah...
<jekstrand> Those might not be hooked up. :)
<karolherbst> :)
<jekstrand> but, wait... If they weren't, it'd be crashing on the NIR intrinsic, right?
<karolherbst> jekstrand: yeah...
<karolherbst> I guess somebody already did something there
<karolherbst> but maybe there is a sync issue or whatever weird stuff is going on
<karolherbst> but I'd assume the kernel to just crash on hw anyway then
<karolherbst> jekstrand: are the int or conversion tests running? those are usually pretty trivial
<karolherbst> mhh.. I hope my USE_HOST_PTR emulation isn't broken, but I did do a run on iris with always using the shadow buffers and that worked fine
<karolherbst> the CTS kind of uses USE_HOST_PTR all over the place though
<karolherbst> jekstrand: didn't you have a patch for math_brute_force isnormal somewhere? or was that airlied?
<jekstrand> karolherbst: I've not touched that one
<karolherbst> I think I will go down the spirv-link hell...
<karolherbst> that's like 1 fail and 11 crashes
aravind has quit [Ping timeout: 480 seconds]
<karolherbst> but I think I will add a workaround to vtn so we don't depend on a fixed one
<karolherbst> shouldn't be too hard
<zmike> dcbaker: nice work on the release
<dcbaker> thanks!
<dcbaker> btw, could you look at the top commit of the staging/22.0 branch? I had to do some manual fixups on that
<zmike> looking
<zmike> did you get that llvmpipe patch into the 22.0 branch?
<dcbaker> I'
<dcbaker> what's in the staging is what's in right now
<dcbaker> I'm trying to work through my backlog of patches right now
<dcbaker> I unfortunately have a lot of them
<zmike> dcbaker: it looks like it hasn't landed then
<zmike> please make sure the next 22.0 release doesn't go out without "gallivm/sample: detect if rho is inf or nan and flush to zero"
<zmike> this is needed for conformance submissions
<zmike> and yeah that fixup looks 👍
<dcbaker> cool, I'll get the gallivm/sample patch in next
MajorBiscuit has joined #dri-devel
tjmercier has joined #dri-devel
<jekstrand> karolherbst: Can I specify a list of tests to run?
<karolherbst> jekstrand: yeah, kind of
<karolherbst> -i buffers
<karolherbst> not sure if I implemented subtests
<jekstrand> Ok, if I run fpmath_float fpmath_float2 fpmath_float4, it fails in FP_ADD float4.
<karolherbst> :(
<jekstrand> But they all pass individually
<jekstrand> So state's getting messed up somewhere
<karolherbst> oh no
<jekstrand> Uh oh... Now they all passed
<karolherbst> sounds like memory corruptions or something
<jekstrand> Quite possibly
* jekstrand runs with valgrind
<jekstrand> Valgrind on Arm... wah wah...
<daniels> it works fine
<jekstrand> Oh, I'm sure it works correctly. You just have to wait for it.
<HdkR> jekstrand: Time for an M1Ultra? :P
<jekstrand> HdkR: I keep telling people I'll buy an M1 once someone finishes writing the GPU kernel driver for it.
<jekstrand> And, no, I'm not going to sign up for that.
<jekstrand> Nor am I making any promises about signing up once there's a kernel driver.
<jekstrand> But it's not compelling so long as the options are MacOS vs. llvmpipe.
<HdkR> It's true, even a VM is a bit of a pain
<jekstrand> Once drm_agx.ko is alive and well, then it might be a compelling platform to hack on.
<karolherbst> mhhh
<karolherbst> annoying
* jekstrand should probably use ubsan... it's faster.
<jannau> jekstrand: https://www.youtube.com/AsahiLina and I took over Alyssa's driver for the annoying display controller
<jekstrand> karolherbst: ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
<jekstrand> karolherbst: :'(
<karolherbst> :(
<jekstrand> karolherbst: Are you not enabling disk cache for clc loading?
<karolherbst> I do
<jekstrand> hrm...
<karolherbst> why?
<jekstrand> I'm seeing SPIR-V warnings on every test startup
<karolherbst> weird
<karolherbst> I am using the drivers disk_cache in case that makes a difference
<jekstrand> karolherbst: Looks like panfrost isn't giving you a disk cache
<karolherbst> I am very sad about this
<jekstrand> Yeah, panfrost doesn't disk cache. :-(
<jekstrand> This is sad
<dcbaker> pepp: I'm looking at "3c3a8f853d gallium/tc: zero alloc transfers" for 22.0, but I'm not sure it applies, since the tc storage PR isn't in 22.0. Should I pull that series, or just forget about that patch?
<jekstrand> It doesn't take that long to build libclc but it's still a tad annoying
<karolherbst> yeah...
<karolherbst> I don't particularly like the way we convert libclc to nir, but it is a device specific thing, so I don't want to use a rusticl internal disk_cache for that where I am sure we wouldn't mess it up
<jekstrand> Running fpmath_float fpmath_float2 fpmath_float4 seems to fail about 1 in 5 or maybe a little less often.
<karolherbst> but I do plan to wire up OpenCL C to spir-v caching at some point
<jekstrand> yeah
tlwoerner has joined #dri-devel
<karolherbst> uhh.. I honestly have no good idea on how to fix this spirv-link stuff inside mesa :( My rough plan was to check if all needed variables were processed in vtn_handle_entry_point when calling another spir-v entrypoint, but ugh...
<karolherbst> OpVariables are already processed at this point
<karolherbst> mhhh, maybe not?
<karolherbst> ahh no, it already is
<karolherbst> guess fixing it inside spirv-link is the easier path (I hope)
garrison has joined #dri-devel
i-garrison has quit [Read error: Connection reset by peer]
* jekstrand wonders if there's something funny with synchronization and compute jobs
MajorBiscuit has quit [Ping timeout: 480 seconds]
oneforall2 has joined #dri-devel
* jekstrand is skeptical of panfrost_fence_finish
<jekstrand> Nah. It's fine. A little weird but fine.
<pepp> dcbaker: the bug predates the tc storage MR but it wasn't visible because the only app using this feature was viewperf. I think that pulling the 2 commits from !15298 makes sense
gawin has joined #dri-devel
MajorBiscuit has joined #dri-devel
pallavim_ has joined #dri-devel
lemonzest has quit [Quit: WeeChat 3.4]
pallavim has quit [Ping timeout: 480 seconds]
oilofparaf has joined #dri-devel
oilofparaf has left #dri-devel [#dri-devel]
ngcortes has joined #dri-devel
ds` has quit [Quit: ...]
ds` has joined #dri-devel
pallavim has joined #dri-devel
<dcbaker> pepp: sounds good, thanks
jkrzyszt_ has quit [Ping timeout: 480 seconds]
pallavim_ has quit [Ping timeout: 480 seconds]
mszyprow has joined #dri-devel
<karolherbst> jekstrand: does it work with clover?
<jekstrand> karolherbst: Haven't tried.
<karolherbst> ehh wait.. then you'd have to do this nir serialized stuff :)
<karolherbst> :(
<jekstrand> Yeah
<jekstrand> And I think I'm hitting a panfrost bug somewhere.
<karolherbst> potentially
<jekstrand> None of what I'm seeing looks like a rusticl bug
<karolherbst> or just some implicit gallium requirement I am not following
<jekstrand> Not with how well things are working on iris.
<karolherbst> yeah.. probably
<alyssa> jekstrand: is panfrost supposed to use a disk cache nobody told me
<karolherbst> could be that I don't set correct MEM_FLAGS or something weird, or panfrost doesn't sync on the correct combination or other random things :/
<jekstrand> alyssa: "supposed" is a strong word. I'd generally recommend it.
<alyssa> What does common do for the driver and what does the driver have to do?
<alyssa> and are there docs anywhere?
<karolherbst> alyssa: you just serialize your shader and cache it
<karolherbst> that's essentially it
<jekstrand> alyssa: src/util/disk_cache.h. It's better documented than most of Mesa. :-/
<anholt> jekstrand: that doesn't help make sense of how it fits in gallium drivers, unfortunately.
<jekstrand> anholt: :-/
<karolherbst> alyssa: you can also look at how we added support for nouveau for that, it's all in one MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4264
<alyssa> * WARNING: 3rd party applications might be reading the cache item metadata. * Do not change these values without making the change widely known. * Please contact Valve developers and make them aware of this change.
<karolherbst> last patch is the driver stuff
<alyssa> this does not inspire confidence
<alyssa> anholt: ^^ that
<anholt> alyssa: first step is you hook up the screen disk_cache. you have to do it in the driver, because you have to mix in your driver config knobs that might affect frontend shader compiles to the build id.
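A minimal sketch of anholt's point about the key: anything that can change what the compiler emits (the build id plus driver config or debug knobs) has to be mixed into the cache key together with the shader itself. FNV-1a is used here only to keep the example self-contained; Mesa's disk_cache has its own key format and hashing, and every hyp_* name below is hypothetical, not a Mesa API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FNV1A_OFFSET_BASIS 0xcbf29ce484222325ULL
#define FNV1A_PRIME        0x100000001b3ULL

/* Feed len bytes into a running 64-bit FNV-1a hash. */
static uint64_t fnv1a_update(uint64_t h, const void *data, size_t len)
{
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= FNV1A_PRIME;
    }
    return h;
}

/* Driver-wide part of the key: build id plus knobs that affect codegen. */
static uint64_t hyp_driver_key(const char *build_id, uint32_t debug_flags)
{
    uint64_t h = FNV1A_OFFSET_BASIS;
    h = fnv1a_update(h, build_id, strlen(build_id));
    h = fnv1a_update(h, &debug_flags, sizeof(debug_flags));
    return h;
}

/* Per-shader key: driver key first, then the shader's input bytes. */
static uint64_t hyp_shader_key(uint64_t driver_key, const void *shader, size_t len)
{
    uint64_t h = fnv1a_update(FNV1A_OFFSET_BASIS, &driver_key, sizeof(driver_key));
    return fnv1a_update(h, shader, len);
}
```

Flipping a debug flag changes the driver key, so stale entries compiled under different settings are never looked up.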
<alyssa> hrumble
<karolherbst> alyssa: that's just the metadata we add to cached entries internally
<karolherbst> alyssa: steam makes use of those
<anholt> with the screen stuff hooked up, mesa/st gets to cache the output of the frontend compiler->nir path.
* alyssa reads the reference implementation (v3d_disk_cache)
<karolherbst> yeah.. first step is just initing the cache and configure a proper hash key thingy
<karolherbst> alyssa: very very high level overview is.. store your compiler result in an uint8_t array and have a function doing the reverse :)
<anholt> alyssa: sp-disk-cache of my tree has a trivial version of doing it.
<jekstrand> Hrm... The pandecode.dump for all three of my compute dispatches has the same Push uniforms pointer. That seems a bit fishy.
<jekstrand> I would expect there to be a bit of ring buffering going on
<alyssa> anholt: karolherbst: ack, thank you
<alyssa> jekstrand: no ring buffer, all transient memory (ie. allocated off the batch) is freed when the batch is freed
<alyssa> and then the BO is returned to the BO cache
<jekstrand> alyssa: So I may just be getting the same BO over and over again?
<jekstrand> Hrm...
<alyssa> and compute jobs get put in their own batch right now for simplicity (this could be optimized)
<karolherbst> what I and Mark have done for nouveau was to 1. split structs into input and output ones, 2. write a hash function for the input 3. write serialize/deserialize 4. hook it up
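Step 3 of that recipe ("store your compiler result in an uint8_t array and have a function doing the reverse") might look like the sketch below. The struct layout and hyp_* names are assumptions for illustration, not nouveau's or panfrost's real code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical compiler output: some metadata plus a code blob. */
struct hyp_shader_out {
    uint32_t num_regs;
    uint32_t code_size;
    uint8_t *code;              /* code_size bytes */
};

#define HYP_HDR (2 * sizeof(uint32_t))

/* Flatten into a malloc'd byte array; returns NULL on allocation failure. */
static uint8_t *hyp_serialize(const struct hyp_shader_out *s, size_t *out_len)
{
    size_t len = HYP_HDR + s->code_size;
    uint8_t *buf = malloc(len);
    if (!buf)
        return NULL;
    memcpy(buf, &s->num_regs, sizeof(uint32_t));
    memcpy(buf + sizeof(uint32_t), &s->code_size, sizeof(uint32_t));
    if (s->code_size)
        memcpy(buf + HYP_HDR, s->code, s->code_size);
    *out_len = len;
    return buf;
}

/* The reverse; returns 0 on success, -1 on a malformed buffer or OOM. */
static int hyp_deserialize(const uint8_t *buf, size_t len, struct hyp_shader_out *s)
{
    if (len < HYP_HDR)
        return -1;
    memcpy(&s->num_regs, buf, sizeof(uint32_t));
    memcpy(&s->code_size, buf + sizeof(uint32_t), sizeof(uint32_t));
    if (len - HYP_HDR != s->code_size)
        return -1;
    s->code = malloc(s->code_size ? s->code_size : 1);
    if (!s->code)
        return -1;
    if (s->code_size)
        memcpy(s->code, buf + HYP_HDR, s->code_size);
    return 0;
}
```

The hash from step 2 would be computed over the input struct only, and the serialized output stored under that key.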
<alyssa> yes, if you're submitting the same stuff over and over
mclasen has quit [Ping timeout: 480 seconds]
<jekstrand> alyssa: Ok, that makes sense then.
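A toy model of what alyssa describes: transient memory is allocated off the batch, freed with it, and the BO goes back to a cache, so back-to-back single-dispatch batches can get the very same buffer (and therefore the same push-uniform pointer). The free list and hyp_* names are hypothetical; panfrost's real BO cache is more involved.

```c
#include <stddef.h>
#include <stdlib.h>

struct hyp_bo {
    size_t size;
    void *cpu;               /* stands in for the buffer's mapping */
    struct hyp_bo *next;     /* free-list link */
};

static struct hyp_bo *hyp_bo_cache;   /* released BOs, most recent first */

/* Hand out a cached BO of the same size if there is one, else allocate. */
static struct hyp_bo *hyp_bo_alloc(size_t size)
{
    for (struct hyp_bo **p = &hyp_bo_cache; *p; p = &(*p)->next) {
        if ((*p)->size == size) {
            struct hyp_bo *bo = *p;
            *p = bo->next;
            bo->next = NULL;
            return bo;
        }
    }
    struct hyp_bo *bo = calloc(1, sizeof(*bo));
    if (!bo)
        return NULL;
    bo->size = size;
    bo->cpu = malloc(size);
    if (!bo->cpu) {
        free(bo);
        return NULL;
    }
    return bo;
}

/* Called when the batch owning the BO is freed. */
static void hyp_bo_release(struct hyp_bo *bo)
{
    bo->next = hyp_bo_cache;
    hyp_bo_cache = bo;
}
```

Reuse is also where the false dependencies alyssa mentions come from: the next batch touches memory the previous one used, so the kernel has to order them.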
mbrost has quit [Ping timeout: 480 seconds]
<karolherbst> mhhh
<karolherbst> that's ehhh.
<karolherbst> wait..
<alyssa> yes, this means that there are false deps between batches.
<jekstrand> woo
<alyssa> might be leaving some perf on the table. don't know.
<karolherbst> yeah I guess that can break rusticl then :)
<karolherbst> jekstrand: I could imagine that calls to create_compute_state could overwrite the content of the bo before the hardware executes stuff, no?
<karolherbst> but mhh
<karolherbst> yeah I guess this can happen if there is no sync point in between
<karolherbst> jekstrand: you could try a ctx.flush().wait(); after launch_grid and see if that changes anything?
<alyssa> shouldn't happen, it's serialized in the kernel
<alyssa> there is a flush after launch_grid in panfrost
<karolherbst> yeah.. but I also unbind the entire state directly after launch_grid...
<jekstrand> PAN_MESA_DEBUG=sync seems to make the fails go away
<karolherbst> so wait might be important
<karolherbst> I copied clover design here, which I still don't like :)
<alyssa> jekstrand: That's... odd.
<jekstrand> I'm starting to think our clEnqueueReadBuffer is racing with the kernel
ybogdano has quit [Ping timeout: 480 seconds]
<alyssa> Plausible
<alyssa> depending on what transfer flags you use, gallium can optimize away your correctness ;)
<karolherbst> MAP_READ_WRITE
<jekstrand> pan_set_global_binding is calling panfrost_batch_write_rsrc so it should be getting a write fence set on it in the kernel ioctl
<jekstrand> Let me double-check that
<karolherbst> at some point I will optimize setting all those flags :)
<alyssa> karolherbst: sync? unsync? etc
<karolherbst> potentially PIPE_MAP_UNSYNCHRONIZED on non blocking maps
<karolherbst> but we do flush and wait later
<karolherbst> but clEnqueueReadBuffer is synced
<karolherbst> so it's just READ_WRITE
<alyssa> ok..
<karolherbst> should be just READ really, but...
<karolherbst> I don't fight those bugs yet
<karolherbst> *want to
mvlad has quit [Remote host closed the connection]
<karolherbst> " SPIRV-Headers was not found - please checkout a copy under external/." I already don't want to :D
<jekstrand> karolherbst: What are you building now?
<karolherbst> spirv-tools
<karolherbst> the linker is buggy :(
<jekstrand> :(
<karolherbst> yeah...
<karolherbst> I thought we could work around it in mesa, but... it's complicated
mclasen has joined #dri-devel
mbrost has joined #dri-devel
columbarius has joined #dri-devel
co1umbarius has quit [Ping timeout: 480 seconds]
nchery has quit [Ping timeout: 480 seconds]
nchery has joined #dri-devel
<karolherbst> " error: ‘spvValidate’ is not a member of ‘spvtools’; did you mean ‘spvValidate’?"
<karolherbst> it tries to tell me I shouldn't use the namespace thingy
<jekstrand> Ugh... Why don't GDB watchpoints work on this box?!?
<jekstrand> Or maybe they do and they're just that slow? idk.
<karolherbst> watchpoints are this kind of thing I set 5 times until I set them correctly
fw400 has joined #dri-devel
<karolherbst> but yeah
<karolherbst> watchpoints make things slow :)
fw400 has left #dri-devel [#dri-devel]
Duke`` has quit [Ping timeout: 480 seconds]
<jekstrand> someone is removing BOs from my batch
<jekstrand> I think that's why it's failing
<karolherbst> jekstrand: did you try flush().wait() after launch_grid and/or disable unbinding all the stuff?
<jekstrand> karolherbst: unbinding doesn't matter
<karolherbst> I hope you are right :)
<jekstrand> karolherbst: set_global_binding with resources == NULL is a no-op on panfrost
<karolherbst> ahh
<alyssa> ..how would BOs get removed from a batch
<jekstrand> idk
<jekstrand> I tried to set a watchpoint on batch->num_bos but GDB hates me
<alyssa> there's no api for that, for good reason
<karolherbst> jekstrand: watch -l &batch->num_bos ?
ybogdano has joined #dri-devel
pallavim has quit [Read error: Connection reset by peer]
<karolherbst> (at least I think this is the correct way of using watch)
<jekstrand> karolherbst: Yeah, I tried that. GDB hates me. :P
<karolherbst> annoying
<karolherbst> maybe it doesn't get written
<karolherbst> ahh check_interface_variable :)
<jekstrand> Found it!
<karolherbst> there it is
<karolherbst> jekstrand: \o/
<karolherbst> so, is it my fault or someone elses?
<jekstrand> Someone else
<karolherbst> yes
<alyssa> Is the someone else me
<jekstrand> Maybe
<alyssa> Shoot
<jekstrand> The someone else is whoever hooked up set_global_binding
<alyssa> Yeah that sounds like me
maxzor has quit [Ping timeout: 480 seconds]
<alyssa> git blame says Icecream95 actually, but I'll take the blame anyway if you want :p
<jekstrand> karolherbst: What's the cap for max global bindings?
<karolherbst> uhhh, there is one?
<alyssa> + /* The handle points to uint32_t, but space is allocated for 64 bits */
<alyssa> does rusticl not emulate that particular clover quirk? :p
h0tc0d3 has quit [Remote host closed the connection]
<karolherbst> alyssa: i do the same
<alyssa> ack
<karolherbst> but it doesn't support subbuffers :p
<jekstrand> alyssa: It'll be obvious as soon as I send the patch. :)
<karolherbst> jekstrand: btw, you might want to wire in support for sub buffers
<alyssa> jekstrand: :D
<karolherbst> jekstrand: that's the kind of terrible interface set_global_bindings is: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/d9b1a592fdd4ef734f77eb7395e7a589e9df38dc
<karolherbst> for subbuffers I have to offset with the offset.. obviously
<jekstrand> karolherbst: I don't understand what that does
maxzor has joined #dri-devel
<jekstrand> karolherbst: Is the handle an input also?
<karolherbst> jekstrand: reads out handles, adds the resource address, writes it back
<karolherbst> yes....
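The fixup karolherbst describes (read what the frontend wrote into the kernel input, add the resource address, write it back) roughly amounts to the loop below. Assumptions: the frontend stores a 32-bit byte offset (e.g. a sub-buffer offset) at *handles[i], with 64 bits of space behind it, as the quoted "handle points to uint32_t, but space is allocated for 64 bits" comment suggests; the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical set_global_binding-style fixup. memcpy avoids alignment
 * and aliasing problems with the uint32_t-typed handle pointers. */
static void hyp_fixup_global_handles(unsigned count, uint32_t **handles,
                                     const uint64_t *gpu_addresses)
{
    for (unsigned i = 0; i < count; i++) {
        uint32_t offset;
        memcpy(&offset, handles[i], sizeof(offset));     /* read the offset */
        uint64_t va = gpu_addresses[i] + offset;         /* add the base */
        memcpy(handles[i], &va, sizeof(va));             /* write 64 bits back */
    }
}
```

A driver that only writes the address (ignoring what is already there) would drop sub-buffer offsets, which matches the failing sub-buffer tests mentioned here.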
<jekstrand> Oh, that's truly horrible
<karolherbst> it is
<karolherbst> but it's required for subbuffers
<karolherbst> that was the reason those tests failed ...
<karolherbst> there isn't really a better way of doing it except fixing up the offset inside rusticl
<karolherbst> but clover adds the offset this way
h0tc0d3 has joined #dri-devel
<karolherbst> I wouldn't mind replacing it with something new :)
<karolherbst> or maybe we also change what clover is doing...
mbrost has quit [Ping timeout: 480 seconds]
rasterman has joined #dri-devel
maxzor has quit [Ping timeout: 480 seconds]
<h0tc0d3> Does anyone know why the release schedule is not working? https://docs.mesa3d.org/release-calendar.html
<h0tc0d3> The plan was to have a corrective release of 22.0.2, but there is none.
<airlied> I think dcbaker is working on it at present
* jekstrand kicks off a new full CTS run on panfrost
<jekstrand> This one should be much less full of random fail
<h0tc0d3> The 2 week release schedule hasn't been working for over a month now.
* jekstrand goes to find a nap while he waits for his panfrost CTS run. It'll probably take an hour or more.
<karolherbst> my name shows up more often than I'd like to in those spirv repos :D
<karolherbst> at least now I have a plan on fixing this linker bug
<jekstrand> karolherbst: \o/
maxzor has joined #dri-devel
<alyssa> jekstrand: ...so what we had before was more of a set_local_binding? :p
lynxeye has quit [Quit: Leaving.]
Major_Biscuit has joined #dri-devel
MajorBiscuit has quit [Ping timeout: 480 seconds]
<airlied> robclark: oh also in case you are wondering, agd5f PR also just say Fixes but provide a list in the signed tag
<airlied> I'm also happy with that :-)
freem_ has joined #dri-devel
mbrost has joined #dri-devel
<robclark> airlied: I guess I should figure out how to do the signed tag thing
dj-death has quit [Ping timeout: 480 seconds]
Lucretia-backup has joined #dri-devel
Lucretia has quit [Ping timeout: 480 seconds]
Major_Biscuit has quit [Ping timeout: 480 seconds]
ybogdano has quit [Ping timeout: 480 seconds]
nchery has quit [Ping timeout: 480 seconds]
garrison has quit [Read error: Connection reset by peer]
garrison has joined #dri-devel
<karolherbst> I was so close to reimplementing that stuff, but then I saw that this pass doesn't remove, but just recalculates
mszyprow has quit [Ping timeout: 480 seconds]
* karolherbst kicks off another run
ybogdano has joined #dri-devel
h0tc0d3 has quit [Quit: Leaving]
<jekstrand> Pass 1364 Fails 346 Crashes 466 Timeouts 0
nchery has joined #dri-devel
<karolherbst> jekstrand: that's not too bad
<karolherbst> meanwhile: Pass 2137 Fails 22 Crashes 17 Timeouts 0 :3
<karolherbst> jekstrand: the list becomes very short
<karolherbst> jekstrand: what's the biggest problem with panfrost though?
<karolherbst> anything standing out or just random stuff all over the place
<alyssa> broken compiler stuff I would wager? lot of code paths not previously exercised
<jekstrand> alyssa: Yeah, just made it lower 8-bit integers
<jekstrand> Something's wrong with vec3
<karolherbst> vec3?
<jekstrand> Error for vector size 3 found at 0x00000001: *0x01 vs 0x00
<jekstrand> Input value: 0x01 (convert_char3( char3 )) *** convert_charn( charn ) FAILED **
<karolherbst> I thought we lower that
<karolherbst> ehh
mbrost_ has joined #dri-devel
gawin has quit [Ping timeout: 480 seconds]
<karolherbst> jekstrand: btw, did you figure out why clamping is broken?
<jekstrand> karolherbst: Nope
<jekstrand> Those tests are evil
mbrost has quit [Read error: Connection reset by peer]
<karolherbst> yeah...
<jenatali> Yep
<karolherbst> jenatali: I will figure out the divide fails btw
<karolherbst> eh
<karolherbst> jekstrand: ^^
<karolherbst> jenatali: btw.. I think you could move to CL 3.0 by now :D
<jenatali> With LLVM 14 or 15, right?
<karolherbst> I have a fix for that linking issue
<karolherbst> but yeah.. it might require newer llvm as well
<jenatali> Yeah. I'll get around to it... eventually...
<jenatali> I hope
<karolherbst> I don't think it's much work tbh
<jenatali> Yeah but it's still a big context switch from my current priorities
<karolherbst> just some new APIs, but nothing really new
<karolherbst> I see
<jenatali> I've implemented CL3.0, but only exposing CL C 1.2
<jenatali> So the APIs are done
<alyssa> rusticl dx12 when
<karolherbst> alyssa: I was already wondering if I should push it through zink
* karolherbst hides
<jenatali> It's not out of the question
<alyssa> jenatali: The question was for jekstrand, unless you said yes ;-p
<jenatali> Just, I don't have nearly enough time in the day, and my personal time for this kind of stuff has completely evaporated
<karolherbst> but I don't think it would be a very good fit tbh
<karolherbst> it's so gallium specific
<jekstrand> TBH, once you've got the Mesa compiler, gallium doesn't buy you that much
<jenatali> ^^
<karolherbst> sure
<karolherbst> but most of the code isn't interfacing with the mesa compiler
<jenatali> The CL API surface area is so tiny compared to the compiler infrastructure
<karolherbst> yeah
<karolherbst> but what I meant is, if rusticl would run on dx12, we'd essentially have to write it from scratch as most things would have to change...
<karolherbst> maybe the api validation could stay
* karolherbst steals scale_fdiv
<jenatali> Oh I assumed alyssa meant on the d3d12 gallium backend. I don't see any point giving rusticl a direct DX12 backend
<karolherbst> jenatali: how do you want to implement set_global_bindings
<karolherbst> I am not doing globals to ssbo lowering
<jenatali> Yeah I'd want them as ssbos
<karolherbst> I guess we would have to take some of the passes from src/microsoft and have some flips
<karolherbst> maybe that could work, but...
<jenatali> Yeah. Or else move stuff from the frontend into the backend
<jenatali> But doesn't seem worth it. We're happy with our frontend for now
<karolherbst> yeah, and I don't want to bother backends with random stuff
<karolherbst> basing it on top of zink on the other hand....
<alyssa> jenatali: That is what I meant yes
gawin has joined #dri-devel
<karolherbst> I see we'd need some vulkan extensions
<jenatali> karolherbst: I assume zink would also want it lowered to ssbo
<karolherbst> but honestly.. we could just allow kernel spir-vs, do the 64 bit buffer stuff and that would be it
<jenatali> Or I guess you could do it with the BDA extensions
<karolherbst> or new extensions
<karolherbst> just push in the spir-v kernels
<jenatali> But also then you're just rewriting clspv :)
<jenatali> Er, clvk?
<karolherbst> I wouldn't
<alyssa> consonants sure do
<karolherbst> their idea was to not fix vulkan
<jekstrand> Yeah, you'd just emit a pile of bda
<karolherbst> ohh, so bdas are good enough for CL buffers?
<jenatali> "Fix..." their idea was to fix CL by making it work on Vulkan :)
<karolherbst> :D
<karolherbst> yeah well
<karolherbst> does anybody use it?
<jekstrand> anybody use what?
<karolherbst> clvk
<jenatali> CL's already pretty niche I feel like. I know they had a target use for clvk but I don't remember what it is. And I doubt they have many more customers
<jekstrand> I don't know about clvk but there's at least one major app doing serious compute work shipping on clspv
<karolherbst> honestly.. I think using zink is probably the best option here and just add a new extension for spirv kernels
<karolherbst> ahh
<karolherbst> jekstrand: but clspv is just kernel spirv to vulkan spirv, right?
<jenatali> Pretty much
<jekstrand> yup
<karolherbst> yeah I guess if that fits your use case
<karolherbst> anyway, I don't have any plans with rusticl anyway, I just wanted to learn rust :D
* jekstrand wants Mesa to have a competent compute story
* jekstrand still isn't quite sure what story that will be
<karolherbst> yeah...
<karolherbst> I think a CL stack at least as good as Intel's or AMD's would be a good starting point
<karolherbst> (making it pointless to install theirs, that is)
<jekstrand> alyssa: What splits vec3 loads/stores into scalar?
gawin has quit [Ping timeout: 480 seconds]
<jekstrand> Maybe LLVM is doing that?
<jekstrand> that's believable
<karolherbst> mhh
<karolherbst> scale_fdiv doesn't fix ERROR: divide: -16777216.000000 ulp error at {-0x1.fffffep+127, -0x1.fffffep+127}: *0x1p+0 vs. 0x0p+0 (0x00000000) at index: 198 :(
<karolherbst> but that's a subnormal, isn't it?
<jekstrand> I think that's just 1 vs 0
<karolherbst> yeah, but the inputs
<karolherbst> -0x1.fffffep+127 / -0x1.fffffep+127
<jekstrand> We may need to use the actual fdiv opcode
<karolherbst> ohh, you don't?
<jekstrand> Nope
<jekstrand> We don't for GL
<karolherbst> how can I flip it?
<jekstrand> We do mul+rcp
mbrost_ has quit [Remote host closed the connection]
<karolherbst> ahh
<jenatali> karolherbst: We don't support denormals, we always flush them
<karolherbst> yeah, that won't work
<jekstrand> It'll require some compiler work.
mbrost_ has joined #dri-devel
<karolherbst> okay
<jekstrand> Not too much but more than zero
<karolherbst> right
<karolherbst> yeah, we want real fdiv for CL :)
<jekstrand> We don't for GL because you can often CSE the RCP
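The divide failure quoted above (-0x1.fffffep+127 / -0x1.fffffep+127 giving 0 instead of 1) can be reproduced on the host by modelling the mul+rcp lowering plus denormal flushing: 1/FLT_MAX is 2^-128, below FLT_MIN (2^-126), so flushing the reciprocal to zero turns the quotient into 0. A sketch, assuming a build with FLT_EVAL_METHOD == 0 and without -ffast-math (which may enable flush-to-zero on the host); helper names are hypothetical.

```c
#include <float.h>

/* Emulate flush-to-zero of a subnormal value. */
static float hyp_flush_denorm(float x)
{
    if (x > -FLT_MIN && x < FLT_MIN)
        return x < 0.0f ? -0.0f : 0.0f;
    return x;
}

/* Correctly rounded IEEE division. */
static float div_exact(float a, float b) { return a / b; }

/* GL-style lowering: a * rcp(b), with a subnormal reciprocal preserved. */
static float div_rcp_mul(float a, float b) { return a * (1.0f / b); }

/* Same lowering when the reciprocal is flushed to zero. */
static float div_rcp_mul_ftz(float a, float b)
{
    return a * hyp_flush_denorm(1.0f / b);
}
```

Without flushing, mul+rcp lands on the adjacent float just below 1.0; with flushing it collapses to 0, which is what the CTS reports.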
<karolherbst> now what's up with "mem_host_flags mem_host_write_only_image"
<karolherbst> sure
<karolherbst> and a real fdiv is slow
<jekstrand> idk that it's that much slower than rcp
<jekstrand> But it's slower than fmul
<jekstrand> By a lot
<karolherbst> yeah
<karolherbst> not that luxmark perf tanks when we start using it :D
* jekstrand hates this kernel
<jekstrand> llvm turns a very simple char3 load/store into a giant pile of garbage
<karolherbst> classic llvm
h0tc0d3 has joined #dri-devel
<karolherbst> why though?
<jekstrand> idk
<jekstrand> Why does LLVM do anything it does?!?
<karolherbst> maybe it's not aligned
<karolherbst> because llvm is a smart compiler always doing the right thing, everybody knows that
<karolherbst> ehhh the remaining fails are painful
<jekstrand> Yeah, LLVM is definitely checking for alignment and emitting 64-bit load/store if it can.
<karolherbst> can we disable that?
<jekstrand> I think it decided this test was some poor soul's hand-rolled memcpy. :joy:
<karolherbst> basically I just want llvm to give us the plain thing, no idiotic postprocessing :D
<karolherbst> lol
<karolherbst> we do pass -O0 into llvm
<jekstrand> Maybe we need to pass -O-0?
<jekstrand> In any case, panfrost should be able to compile this even if it is stupid
mbrost_ has quit [Ping timeout: 480 seconds]
<karolherbst> I still don't want llvm to do silly things :D
* jekstrand views that as inevitable
<karolherbst> jekstrand: which test is it btw?
<jekstrand> test_conversions char_char
<jekstrand> char3 case, to be particular
<karolherbst> what the...
<jekstrand> int_int also fails
<jekstrand> So it's not an 8-bit problem
<karolherbst> ahh no
<karolherbst> it's not llvm
<karolherbst> it's the CTS
<jekstrand> wha?
<karolherbst> the CTS special cased vec3 and uses vloadn
<jekstrand> Of course it did...
<jekstrand> So it's probably a vloadn problem
<karolherbst> potentially
morphis has quit [Ping timeout: 480 seconds]
<karolherbst> vloadn doesn't guarantee alignment
* jekstrand looks at vloadn
<jekstrand> right
morphis has joined #dri-devel
<karolherbst> well besides what the base type needs
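For reference, OpenCL C's vload3 reads three consecutive elements starting at p + offset * 3 and only requires p to be aligned to the scalar type, whereas a char3 variable has the size and alignment of char4. A host-side model of the char case (type and function names are hypothetical):

```c
#include <stddef.h>

typedef struct { signed char x, y, z; } hyp_char3;

/* Model of vload3(offset, p) for char: packed 3-element read, no
 * vector alignment assumed beyond that of a single char. */
static hyp_char3 hyp_vload3_char(size_t offset, const signed char *p)
{
    const signed char *base = p + offset * 3;
    hyp_char3 v = { base[0], base[1], base[2] };
    return v;
}
```

Any lowering that turns this into a wider aligned load has to prove alignment first, which the second usage (p + 1) shows it generally cannot.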
<jekstrand> Do we implement vloadn ourselves or use libclc?
<jekstrand> we do it ourselves
<karolherbst> yeah
<karolherbst> not sure if libclc has an impl
<jekstrand> it appears to
<jekstrand> Ugh... Yeah, this is all in the CTS test
<jekstrand> Ok, this makes a lot more sense
<jekstrand> ironically, bifrost seems to have load/store_i24 opcodes. :-/
<jekstrand> I wonder if get_global_size is just wrong
h0tc0d3 has quit [Remote host closed the connection]
mbrost_ has joined #dri-devel
neonking has quit [Ping timeout: 480 seconds]
maxzor has quit [Ping timeout: 480 seconds]
<karolherbst> jekstrand: ohh so you include bound checks?
<karolherbst> eh wait
<karolherbst> get_global_size is this CL thing :D
maxzor has joined #dri-devel
Haaninjo has quit [Quit: Ex-Chat]
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
<karolherbst> soo.. why is api clone_kernel crashing...
khfeng has joined #dri-devel
<karolherbst> jekstrand: I can't use nir_opt_dead_write_vars before inlining, can I?
<karolherbst> it asserts on vec1 64 ssa_5 = deref_var &copy_in (function_temp struct.structArg)
ybogdano has quit [Ping timeout: 480 seconds]
<karolherbst> ehh intrinsic copy_deref (ssa_5, ssa_4) (dst_access=0, src_access=0)
<karolherbst> wait, it's not inlining, but I have to split
rasterman has quit [Quit: Gettin' stinky!]