<karolherbst>
yeah.. but it does seem to do the correct thing given the spir-v spec
Ristovski has quit [Ping timeout: 480 seconds]
tales__ has quit []
tales__ has joined #dri-devel
Ristovski has joined #dri-devel
tales__ has left #dri-devel [#dri-devel]
<jekstrand>
airlied: I'm starting to think that functions may be something I need to start on sooner rather than later.
<jekstrand>
Because if I don't, someone else will, badly, and then I'll get to clean up the mess. :-/
<jekstrand>
Or maybe that's just too pessimistic of me.
<jekstrand>
idk
<jekstrand>
The first step, though, is to fix various optimization passes so they're valid to run pre-lowering. Lots of stuff assumes everything is inlined.
<jekstrand>
Even if it has nothing to do with functions, lots of passes make implicit assumptions the author didn't realize they were making.
<jekstrand>
Maybe I should dig into that with the Intel compiler and see where it goes.
<jekstrand>
I'll probably stick with compute for now. Other stages have lots of I/O, which makes assuming inlining really convenient.
tales-aparecida has joined #dri-devel
<icecream95>
jekstrand: r u saying that my implementation of functions is going to be bad??
<jekstrand>
icecream95: Not you in particular. I just tend to assume that people will forget things or not know how NIR works somewhere or not realize that they're running optimizations that aren't function-safe or whatever.
<jekstrand>
It's not an insult.
<icecream95>
jekstrand: I'm going to try to keep to the backend side, so hopefully will not make too many conflicting changes to NIR passes
<jekstrand>
Yeah, the back-end is where a lot of the pain is.
<jekstrand>
Retrofitting the Intel back-end will be... entertaining.
<jekstrand>
Not especially looking forward to that TBH
<jekstrand>
But I should be able to pull something off.
<icecream95>
But first I need to bisect an issue around shadow samplers..
<icecream95>
Let me guess.. GL_CLAMP emulation?
<karolherbst>
icecream95: shadow samplers?
<icecream95>
karolherbst: This isn't CL
<karolherbst>
ahh
<karolherbst>
jekstrand: I am wondering if we should just emit two arrays for the format+order thing at index it with the image index...
<karolherbst>
*and
<karolherbst>
I don't really like the approach I had for clover, where I was adding uniform variables for each image
<icecream95>
zmike: lower_tex_to_txd seems to miss a number of fields that should be copied, such as is_shadow
<zmike>
better copy them then
<zmike>
the txb one probably does too
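The fields in question are the ones a freshly built nir_tex_instr does not pick up on its own when a lowering constructs a replacement instruction. A minimal sketch of the kind of copying meant here, with new_tex/old_tex as placeholder names and the exact field list depending on the lowering:

    new_tex->is_shadow           = old_tex->is_shadow;
    new_tex->is_new_style_shadow = old_tex->is_new_style_shadow;
    new_tex->is_array            = old_tex->is_array;
    new_tex->component           = old_tex->component;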
<airlied>
jekstrand: I did have it pass functions through to llvmpipe and not have nir explode
<karolherbst>
it's always failing at that one pixel though
<karolherbst>
maybe we overflow somewhere?
ngcortes has quit [Ping timeout: 480 seconds]
OftenTimeConsuming has quit [Remote host closed the connection]
OftenTimeConsuming has joined #dri-devel
rkanwal has quit [Ping timeout: 480 seconds]
<karolherbst>
"kernel compilation time: 175809ms"
<karolherbst>
I think we can do better
ybogdano has quit [Ping timeout: 480 seconds]
aravind has joined #dri-devel
Daanct12 has joined #dri-devel
mbrost has joined #dri-devel
mbrost has quit []
h0tc0d3 has quit [Remote host closed the connection]
Daanct12 has quit [Quit: Leaving]
mbrost has joined #dri-devel
Daanct12 has joined #dri-devel
h0tc0d3 has joined #dri-devel
Daanct12 has quit [Remote host closed the connection]
Daanct12 has joined #dri-devel
Daanct12 has quit []
Daanct12 has joined #dri-devel
Daanct12 has quit [Remote host closed the connection]
<jekstrand>
airlied: I'm less worried about exploding than I am about silently and subtly optimizing something wrong and you have no clue why.
<jekstrand>
airlied: Lots of stuff doesn't actually think through global variables very well, for instance.
<jekstrand>
We tend to assume they're just like locals (global as in nir_var_shader_temp)
<jekstrand>
We also used to have a bunch of metadata problems where it would or wouldn't get invalidated on a per-shader basis when it should have been per-function. I think that one's mostly sorted now.
<airlied>
jekstrand: hopefully a full cl CTS would show up any major insanity, but sounds like a lot of auditing
<jekstrand>
airlied: What CTS? OpenCL? Nah, it's not that complex.
<jekstrand>
I think some of it will be shown by optimizing libclc more before we inline it all.
<jekstrand>
And successfully running luxmark would give me some confidence.
<jekstrand>
The Vulkan CTS might have enough function stuff going on; not sure.
<jekstrand>
It either doesn't use functions at all or uses them for lots of stupid.
<Sachiel>
the graphicsfuzz tests have plenty of those
mbrost has quit [Remote host closed the connection]
mbrost has joined #dri-devel
Company has quit [Quit: Leaving]
ppascher has joined #dri-devel
<jekstrand>
Yeah, I think if I could run the full Vulkan CTS with zero inlining for compute shaders, I'd have a reasonable level of confidence that it was working.
neonking has quit [Ping timeout: 480 seconds]
Daanct12 has joined #dri-devel
tales-aparecida has quit [Remote host closed the connection]
shankaru has joined #dri-devel
tales_ has quit []
mbrost has quit [Remote host closed the connection]
mbrost has joined #dri-devel
Duke`` has joined #dri-devel
mbrost has quit [Read error: Connection reset by peer]
mhenning has quit [Remote host closed the connection]
Administrator has joined #dri-devel
Administrator has quit [Remote host closed the connection]
<airlied>
robclark: can you put a small bit more summary info in fixes pull requests :-)
rgallaispou1 has joined #dri-devel
<robclark>
airlied: last -fixes is "fail less at system suspend plus misc small fixes"?
<airlied>
cool, I just stuck in some guess work anyways :-P
<robclark>
system suspend is, tbh, something I wonder about with other drivers.. we've been finding some fun corner cases that we wouldn't have seen without umm.. crowd-sourced debugging (ie. digging through crash reports from the field)
rgallaispou has quit [Ping timeout: 480 seconds]
rgallaispou1 has quit [Read error: Connection reset by peer]
rgallaispou has joined #dri-devel
lemonzest has joined #dri-devel
OftenTimeConsuming is now known as Guest1894
sdutt has quit [Read error: Connection reset by peer]
Guest1894 has quit [Remote host closed the connection]
OftenTimeConsuming has joined #dri-devel
Daanct12 has quit [Ping timeout: 480 seconds]
mszyprow has joined #dri-devel
shankaru has quit [Quit: Leaving.]
shankaru has joined #dri-devel
mszyprow has quit [Ping timeout: 480 seconds]
pnowack has joined #dri-devel
pnowack has quit [Remote host closed the connection]
nchery has quit [Read error: Connection reset by peer]
lynxeye has joined #dri-devel
Haaninjo has joined #dri-devel
jewins has quit [Ping timeout: 480 seconds]
ppascher has quit [Ping timeout: 480 seconds]
vyivel has quit [Remote host closed the connection]
bl4ckb0ne has quit [Remote host closed the connection]
emersion has quit [Remote host closed the connection]
bl4ckb0ne has joined #dri-devel
emersion has joined #dri-devel
vyivel has joined #dri-devel
tobiasjakobi has joined #dri-devel
tobiasjakobi has quit []
pnowack has quit [Quit: pnowack]
jljusten has quit [Quit: WeeChat 3.4]
jljusten has joined #dri-devel
pnowack has joined #dri-devel
jkrzyszt has quit [Remote host closed the connection]
maxzor has quit [Ping timeout: 480 seconds]
dliviu has joined #dri-devel
pallavim has joined #dri-devel
nashpa has quit [Ping timeout: 480 seconds]
icecream95 has quit [Ping timeout: 480 seconds]
natto has quit [Ping timeout: 480 seconds]
mclasen has joined #dri-devel
jkrzyszt has joined #dri-devel
flacks has quit [Quit: Quitter]
flacks has joined #dri-devel
<karolherbst>
dcbaker: I triggered an annoying bug. If rustc gets updated on the system in the meantime, meson doesn't recompile stuff. Not sure if I reported it in the past or not
<karolherbst>
not sure if we still need it though?
rkanwal has joined #dri-devel
nchery has joined #dri-devel
Company has joined #dri-devel
ROw has joined #dri-devel
SR_71 has quit [Ping timeout: 480 seconds]
shankaru has quit [Quit: Leaving.]
mclasen has quit [Ping timeout: 480 seconds]
<jekstrand>
karolherbst: ! I like! That's exactly what I wanted.
neonking has quit [Ping timeout: 480 seconds]
natto has joined #dri-devel
mclasen has joined #dri-devel
pcercuei has joined #dri-devel
* jekstrand
starts a CL CTS run on panfrost
<jekstrand>
Without clear_buffer or clear_texture, it's gonna be a bit busted but better than nothing, I guess.
karolherbst has quit [Ping timeout: 480 seconds]
karolherbst has joined #dri-devel
LexSfX has quit [Remote host closed the connection]
LexSfX has joined #dri-devel
neonking has joined #dri-devel
<karolherbst>
jekstrand: we kind of need a better solution for those static inlines :(
<jekstrand>
karolherbst: context?
<karolherbst>
Also I kind of plan to port over to rust 2018, just don't know if I want to fix up history or do one mega commit
<karolherbst>
jekstrand: like bindgen won't generate bindings for static inline functions
<jekstrand>
karolherbst: right
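The usual workaround is a small C shim that re-exports the static inline as a real symbol bindgen can see. A minimal sketch, assuming nir_imm_int as the inline being wrapped; rusticl_nir_imm_int is a made-up wrapper name:

    /* shim.c: give bindgen a real symbol for a static inline helper */
    #include "nir_builder.h"

    nir_ssa_def *
    rusticl_nir_imm_int(nir_builder *b, int x)
    {
       return nir_imm_int(b, x); /* static inline in nir_builder.h */
    }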
ppascher has joined #dri-devel
Haaninjo has quit [Read error: Connection reset by peer]
Haaninjo has joined #dri-devel
Jasprose has joined #dri-devel
shankaru has joined #dri-devel
<dv_>
does anyone know if it is valid to dup() the FD of an open dma-heap device node?
<karolherbst>
jekstrand: I am seriously thinking about just emitting two u16 arrays for format+order and always putting those into the input buffer.. because handling indirects any other way would be brutal
<jekstrand>
karolherbst: I guess. I didn't think the input-per-image was too bad
<jekstrand>
Depends on when you want to do the lowering, I guess.
<karolherbst>
yeah, it's not, it just gets complicated in terms of DCE and what if you have an indirect
<karolherbst>
anyway.. it's also just 32 bits per image argument
* jekstrand
decides to let Fedora download LLVM debug symbols this time
<karolherbst>
big mistake :P
<karolherbst>
nir_load_deref(nir_build_deref_array(nir_load_var)) is what I need to do, right?
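The usual nir_builder pattern for that is deref_var -> deref_array -> load_deref (the inner call is nir_build_deref_var rather than nir_load_var). A minimal sketch, with var and idx as placeholders:

    nir_deref_instr *base = nir_build_deref_var(b, var);
    nir_deref_instr *elem = nir_build_deref_array(b, base, idx);
    nir_ssa_def *val = nir_load_deref(b, elem);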
<karolherbst>
nnoooo.. those #undefs in nir_builder.h are killing my wrapper :D
<jekstrand>
Why are you trying to write NIR passes in Rust?
<jekstrand>
You're asking for pain
<karolherbst>
because I am not doing much
<karolherbst>
but yeah.. maybe I should move the pass into C code and see how I deal with sharing data
<jekstrand>
*sigh* Who thought vec4 for compute was a good idea? Apparently, Arm did...
<imirkin>
it's 4x faster
<karolherbst>
:) it's 4 times as fast as scalar, everybody nows that
<imirkin>
lol
<karolherbst>
*knows
<imirkin>
and two people can't both be wrong
<karolherbst>
:D
mszyprow has quit [Ping timeout: 480 seconds]
<jekstrand>
The annoying thing is that, if we want this mess to actually work, we either need to teach the bifrost compiler vec8 and vec16 or we need to make nir_lower_alu_to_scalar a generic narrowing pass that takes a maximum width or a callback or something.
Net147 has quit [Quit: Quit]
Net147 has joined #dri-devel
jewins has joined #dri-devel
<karolherbst>
jekstrand: I'd guess the latter is better
<jekstrand>
karolherbst: Yeah, I'm looking into that.
<jekstrand>
karolherbst: I don't think it should be that hard to make it narrow instead of always scalarize
<karolherbst>
shouldn't
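nir_lower_alu_to_scalar already takes a filter callback, so you can at least pick which instructions get scalarized today; an actual "narrow to vecN" mode would still need changes to the pass itself. A minimal sketch, assuming the NIR API of this era; scalarize_wide_alu is a made-up name:

    static bool
    scalarize_wide_alu(const nir_instr *instr, const void *data)
    {
       const unsigned max_width = *(const unsigned *)data;
       if (instr->type != nir_instr_type_alu)
          return false;
       const nir_alu_instr *alu = nir_instr_as_alu(instr);
       /* returning true asks the pass to scalarize this instruction */
       return nir_dest_num_components(alu->dest.dest) > max_width;
    }

    /* caller side */
    unsigned max_width = 4;
    NIR_PASS_V(shader, nir_lower_alu_to_scalar, scalarize_wide_alu, &max_width);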
maxzor has joined #dri-devel
sdutt has joined #dri-devel
fxkamd has joined #dri-devel
tlwoerner has quit [Read error: Connection reset by peer]
flto has quit [Ping timeout: 480 seconds]
ella-0 has joined #dri-devel
<karolherbst>
soo.. kernel side is all done for format and order :) now I just have to upload the values
flto has joined #dri-devel
ella-0_ has quit [Remote host closed the connection]
alyssa has joined #dri-devel
<alyssa>
I find myself wanting to write developer docs for panfrost
<karolherbst>
that's explicit stride stuff or something
<karolherbst>
I thought it was some memory corruption somewhere, but Jason said that's how it's supposed to look
<karolherbst>
PASSED 42 of 42 sub-tests. :)
<karolherbst>
jenatali: do you have any data on what image formats/types applications are most interested in?
<karolherbst>
uhhh "Returned array size did not validate (expected 53, got 0)" :(
<jenatali>
karolherbst: no, not really. I'd assume the normal 8bpc unorm stuff
<karolherbst>
yeah... I mean. CL already specifies what's required, I am just wondering what I should care about on top from the start
<karolherbst>
and the 8bpc unorm stuff is already included in that afaik
<karolherbst>
but like.. only CL_R, not CL_A
<karolherbst>
although as long as stuff passes I can also just expose as much as possible
<jenatali>
Yeah I doubt apps really care for much beyond the required
<karolherbst>
I just don't have a nice way of declaring the CL -> pipe mappings, so every combination is a new entry :(
<jenatali>
My read on how CL was designed was that the speccers just looked at what they *could* do without asking what people want
<karolherbst>
ohh that's for sure
<jenatali>
How else do you end up with CL2.x that nobody uses
<karolherbst>
but some might want 2 channel images
<karolherbst>
which are purely optional
<karolherbst>
anyway, if you don't have any data on that, then I guess we have to see what people complain about :)
<jenatali>
Oh, didn't realize. I just hooked up whatever D3D supports, which covered all the required ones; I didn't look at which of them were optional
<karolherbst>
yeah right..
mclasen has quit []
<karolherbst>
in C you can also just do loops and macro magic
<karolherbst>
rust macros can't create new tokens :(
<jekstrand>
karolherbst: Yeah, that one might be able to scalarize too. I can't remember.
<karolherbst>
jekstrand: nir_lower_io_to_scalar.c
shankaru has quit []
<karolherbst>
but I guess it only does input atm
<karolherbst>
I doubt it's hard to add support for global there
* jekstrand
will type something mali-specific for now
<karolherbst>
nooo.. I broke stuff :(
HankB__ has quit [Remote host closed the connection]
HankB__ has joined #dri-devel
sdutt has quit []
sdutt has joined #dri-devel
jkrzyszt_ has joined #dri-devel
ybogdano has joined #dri-devel
<karolherbst>
ehh.. crap
<karolherbst>
we have separate numbering for readonly and writeonly images
jkrzyszt has quit [Ping timeout: 480 seconds]
mclasen has quit [Ping timeout: 480 seconds]
kmn has quit [Quit: Leaving.]
Duke`` has joined #dri-devel
mbrost has joined #dri-devel
ybogdano has quit [Ping timeout: 480 seconds]
<jekstrand>
uh... Why am I getting 64-bit immediates in this shader?!?
<karolherbst>
jekstrand: you don't want them?
<jekstrand>
The panfrost compiler doesn't seem to think so. (-:
<karolherbst>
well that's just sad
<karolherbst>
the hw is 64 bit though, no?
<jekstrand>
The panfrost compiler also thinks it's lowering 64-bit stuff away and that's clearly not happening. :-/
<jekstrand>
Ooh, because I added it! Drp.
<karolherbst>
:D
* jekstrand
needs to lower harder
<karolherbst>
I think I overengineered again
<karolherbst>
"An image type cannot be used to declare a variable, a structure or union field, an array of images, a pointer to an image, or the return type of a function."
<karolherbst>
that makes things simple
<jekstrand>
:)
Peuc has joined #dri-devel
<karolherbst>
I still keep the array as this makes it easier in rusticl, but still :D I wanted to figure out how to properly solve the issue of indirects on readonly and writeonly images, but I guess the spec solves that for me
<alyssa>
jekstrand: sounds like you're having fun
<jekstrand>
alyssa: more or less. :)
<karolherbst>
ehh
<karolherbst>
I think image_deref_format lowering is broken
lemonzest has quit [Quit: WeeChat 3.4]
<karolherbst>
ehh maybe not
nchery has quit [Ping timeout: 480 seconds]
<karolherbst>
I can rely on the access thing, no?
<jekstrand>
I've got test_buffers buffer_map_write_float not dying, but it fails a random test. :-/
<karolherbst>
mhhh
<karolherbst>
let me check what was a good test to start all of this
abhinav__ has joined #dri-devel
<karolherbst>
jekstrand: does allocations buffer work?
<jekstrand>
karolherbst: I did think it worked, though.
<karolherbst>
but only for 1d and 2d arrays
<karolherbst>
and only for images
<karolherbst>
those tests pass on llvmpipe
<jekstrand>
hrm
anholt has joined #dri-devel
<dcbaker>
kisak: branches and tags are up, the release is cutting right now
<dcbaker>
karolherbst: that's an interesting issue... we don't track that explicitly for C-like languages either. I suspect that, as a side effect, a major gcc bump also changes some headers, so ninja decides to rebuild everything because the headers have changed...
<karolherbst>
jekstrand: heh.. maybe I messed up.. on ADL-S it asserts
<karolherbst>
dcbaker: potentially
<karolherbst>
but gcc doesn't have this strong version check on object files
<jekstrand>
modulo CL/Vulkan/GL differences, image_size should work. Vulkan tests it.
<karolherbst>
rustc bails if a dep is compiled with a different compiler
<karolherbst>
or well.. different version at least
<karolherbst>
I see a crash inside brw_nir_clamp_image_1d_2d_array_sizes here
<karolherbst>
I am sure I messed it up for good
Haaninjo has quit [Ping timeout: 480 seconds]
Haaninjo has joined #dri-devel
frieder has quit [Remote host closed the connection]
<karolherbst>
jekstrand: what is a bit odd is that I get two image_size ops, one 32 and the other 64 bit
<karolherbst>
I guess somebody already did something there
<karolherbst>
but maybe there is a sync issue or whatever weird stuff is going on
<karolherbst>
but I'd assume the kernel to just crash on hw anyway then
<karolherbst>
jekstrand: are the int or conversion tests running? those are usually pretty trivial
<karolherbst>
mhh.. I hope my USE_HOST_PTR emulation isn't broken, but I did do a run on iris with always using the shadow buffers and that worked fine
<karolherbst>
the CTS kind of uses USE_HOST_PTR all over the place though
<karolherbst>
jekstrand: didn't you have a patch for math_brute_force isnormal somewhere? or was that airlied?
<jekstrand>
karolherbst: I've not touched that one
<karolherbst>
I think I will go down the spirv-link hell...
<karolherbst>
that's like 1 fail and 11 crashes
aravind has quit [Ping timeout: 480 seconds]
<karolherbst>
but I think I will add a workaround to vtn so we don't depend on a fixed one
<karolherbst>
shouldn't be too hard
<zmike>
dcbaker: nice work on the release
<dcbaker>
thanks!
<dcbaker>
btw, could you look at the top commit of the staging/22.0 branch? I had to do some manual fixups on that
<zmike>
looking
<zmike>
did you get that llvmpipe patch into the 22.0 branch?
<dcbaker>
I'
<dcbaker>
what's in the staging branch is what's in right now
<dcbaker>
I'm trying to work through my backlog of patches right now
<dcbaker>
I unfortunately have a lot of them
<zmike>
dcbaker: it looks like it hasn't landed then
<zmike>
please make sure the next 22.0 release doesn't go out without "gallivm/sample: detect if rho is inf or nan and flush to zero"
<zmike>
this is needed for conformance submissions
<zmike>
and yeah that fixup looks 👍
<dcbaker>
cool, I'll get the gallivm/sample patch in next
MajorBiscuit has joined #dri-devel
tjmercier has joined #dri-devel
<jekstrand>
karolherbst: Can I specify a list of tests to run?
<karolherbst>
jekstrand: yeah, kind of
<karolherbst>
-i buffers
<karolherbst>
not sure if I implemented subtests
<jekstrand>
Ok, if I run fpmath_float fpmath_float2 fpmath_float4, it fails in FP_ADD float4.
<karolherbst>
:(
<jekstrand>
But they all pass individually
<jekstrand>
So state's getting messed up somewhere
<karolherbst>
oh no
<jekstrand>
Uh oh... Now they all passed
<karolherbst>
sounds like memory corruptions or something
<jekstrand>
Quite possibly
* jekstrand
runs with valgrind
<jekstrand>
Valgrind on Arm... wah wah...
<daniels>
it works fine
<jekstrand>
Oh, I'm sure it works correctly. You just have to wait for it.
<HdkR>
jekstrand: Time for an M1Ultra? :P
<jekstrand>
HdkR: I keep telling people I'll buy an M1 once someone finishes writing the GPU kernel driver for it.
<jekstrand>
And, no, I'm not going to sign up for that.
<jekstrand>
Nor am I making any promises about signing up once there's a kernel driver.
<jekstrand>
But it's not compelling so long as the options are MacOS vs. llvmpipe.
<HdkR>
It's true, even a VM is a bit of a pain
<jekstrand>
Once drm_agx.ko is alive and well, then it might be a compelling platform to hack on.
<karolherbst>
mhhh
<karolherbst>
annoying
* jekstrand
should probably use ubsan... it's faster.
<jekstrand>
karolherbst: ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
<jekstrand>
karolherbst: :'(
<karolherbst>
:(
<jekstrand>
karolherbst: Are you not enabling disk cache for clc loading?
<karolherbst>
I do
<jekstrand>
hrm...
<karolherbst>
why?
<jekstrand>
I'm seeing SPIR-V warnings on every test startup
<karolherbst>
weird
<karolherbst>
I am using the driver's disk_cache in case that makes a difference
<jekstrand>
karolherbst: Looks like panfrost isn't giving you a disk cache
<karolherbst>
I am very sad about this
<jekstrand>
Yeah, panfrost doesn't disk cache. :-(
<jekstrand>
This is sad
<dcbaker>
pepp: I'm looking at "3c3a8f853d gallium/tc: zero alloc transfers" for 22.0, but I'm not sure it applies, since the tc storage PR isn't in 22.0. Should I pull that series, or just forget about that patch?
<jekstrand>
It doesn't take that long to build libclc but it's still a tad annoying
<karolherbst>
yeah...
<karolherbst>
I don't particularly like the way we convert libclc to nir, but it is a device specific thing, so I don't want to use a rusticl internal disk_cache for that where I am not sure we wouldn't mess it up
<jekstrand>
Running fpmath_float fpmath_float2 fpmath_float4 seems to fail about 1 in 5 or maybe a little less often.
<karolherbst>
but I do plan to wire up OpenCL C to spir-v caching at some point
<jekstrand>
yeah
tlwoerner has joined #dri-devel
<karolherbst>
uhh.. I honestly have no good idea on how to fix this spirv-link stuff inside mesa :( My rough plan was to check if all needed variables were processed in vtn_handle_entry_point when calling another spir-v entrypoint, but ugh...
<karolherbst>
OpVariables are already processed at this point
<karolherbst>
mhhh, maybe not?
<karolherbst>
ahh no, it already is
<karolherbst>
guess fixing it inside spirv-link is the easier path (I hope)
garrison has joined #dri-devel
i-garrison has quit [Read error: Connection reset by peer]
* jekstrand
wonders if there's something funny with synchronization and compute jobs
MajorBiscuit has quit [Ping timeout: 480 seconds]
oneforall2 has joined #dri-devel
* jekstrand
is skeptical of panfrost_fence_finish
<jekstrand>
Nah. It's fine. A little weird but fine.
<pepp>
dcbaker: the bug predates the tc storage MR but it wasn't visible because the only app using this feature was viewperf. I think that pulling the 2 commits from !15298 makes sense
gawin has joined #dri-devel
MajorBiscuit has joined #dri-devel
pallavim_ has joined #dri-devel
lemonzest has quit [Quit: WeeChat 3.4]
pallavim has quit [Ping timeout: 480 seconds]
oilofparaf has joined #dri-devel
oilofparaf has left #dri-devel [#dri-devel]
ngcortes has joined #dri-devel
ds` has quit [Quit: ...]
ds` has joined #dri-devel
pallavim has joined #dri-devel
<dcbaker>
pepp: sounds good, thanks
jkrzyszt_ has quit [Ping timeout: 480 seconds]
pallavim_ has quit [Ping timeout: 480 seconds]
mszyprow has joined #dri-devel
<karolherbst>
jekstrand: does it work with clover?
<jekstrand>
karolherbst: Haven't tried.
<karolherbst>
ehh wait.. then you'd have to do this nir serialized stuff :)
<karolherbst>
:(
<jekstrand>
Yeah
<jekstrand>
And I think I'm hitting a panfrost bug somewhere.
<karolherbst>
potentially
<jekstrand>
None of what I'm seeing looks like a rusticl bug
<karolherbst>
or just some implicit gallium requierement I am not following
<jekstrand>
Not with how well things are working on iris.
<karolherbst>
yeah.. probably
<alyssa>
jekstrand: is panfrost supposed to use a disk cache? nobody told me
<karolherbst>
could be that I don't set correct MEM_FLAGS or something weird, or panfrost doesn't sync on the correct combination or other random things :/
<jekstrand>
alyssa: "supposed" is a strong word. I'd generally recommend it.
<alyssa>
What does common do for the driver and what does the driver have to do?
<alyssa>
and are there docs anywhere?
<karolherbst>
alyssa: you just serialize your shader and cache it
<karolherbst>
that's essentially it
<jekstrand>
alyssa: src/util/disk_cache.h. It's better documented than most of Mesa. :-/
<anholt>
jekstrand: that doesn't help make sense of how it fits in gallium drivers, unfortunately.
<alyssa>
* WARNING: 3rd party applications might be reading the cache item metadata. * Do not change these values without making the change widely known. * Please contact Valve developers and make them aware of this change.
<karolherbst>
last patch is the driver stuff
<alyssa>
this does not inspire confidence
<alyssa>
anholt: ^^ that
<anholt>
alyssa: first step is you hook up the screen disk_cache. you have to do it in the driver, because you have to mix in your driver config knobs that might affect frontend shader compiles to the build id.
<alyssa>
hrumble
<karolherbst>
alyssa: that's just the metadata we add to cached entries internally
<karolherbst>
alyssa: steam makes use of those
<anholt>
with the screen stuff hooked up, mesa/st gets to cache the output of the frontend compiler->nir path.
* alyssa
reads the reference implementation (v3d_disk_cache)
<karolherbst>
yeah.. first step is just initing the cache and configuring a proper hash key thingy
<karolherbst>
alyssa: very very high level overview is.. store your compiler result in a uint8_t array and have a function doing the reverse :)
<anholt>
alyssa: sp-disk-cache of my tree has a trivial version of doing it.
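Roughly, the util/disk_cache.h flow a gallium driver hooks up looks like the sketch below; the cache is also exposed through pipe_screen::get_disk_shader_cache so mesa/st can cache its frontend output. The id string and the *_blob names are placeholders:

    struct disk_cache *cache =
       disk_cache_create("panfrost", build_id_str, driver_flags);

    cache_key key;
    disk_cache_compute_key(cache, key_blob, key_blob_size, key);

    size_t size;
    void *blob = disk_cache_get(cache, key, &size); /* NULL on a miss */
    if (!blob) {
       /* compile, serialize the binary into data/data_size, then: */
       disk_cache_put(cache, key, data, data_size, NULL);
    }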
<jekstrand>
Hrm... The pandecode.dump for all three of my compute dispatches has the same Push uniforms pointer. That seems a bit fishy.
<jekstrand>
I would expect there to be a bit of ring buffering going on
<alyssa>
anholt: karolherbst: ack, thank you
<alyssa>
jekstrand: no ring buffer, all transient memory (ie. allocated off the batch) is freed when the batch is freed
<alyssa>
and then the BO is returned to the BO cache
<jekstrand>
alyssa: So I may just be getting the same BO over and over again?
<jekstrand>
Hrm...
<alyssa>
and compute jobs get put in their own batch right now for simplicity (this could be optimized)
<karolherbst>
what Mark and I did for nouveau was to 1. split structs into input and output ones, 2. write a hash function for the input, 3. write serialize/deserialize, 4. hook it up
<alyssa>
yes, if you're submitting the same stuff over and over
mclasen has quit [Ping timeout: 480 seconds]
<jekstrand>
alyssa: Ok, that makes sense then.
mbrost has quit [Ping timeout: 480 seconds]
<karolherbst>
mhhh
<karolherbst>
that's ehhh.
<karolherbst>
wait..
<alyssa>
yes, this means that there are false deps between batches.
<jekstrand>
woo
<alyssa>
might be leaving some perf on the table. don't know.
<karolherbst>
yeah I guess that can break rusticl then :)
<karolherbst>
jekstrand: I could imagine that calls to create_compute_state could overwrite the content of the bo before the hardware executes it, no?
<karolherbst>
but mhh
<karolherbst>
yeah I guess this can happen if there is no sync point in between
<karolherbst>
jekstrand: you could try a ctx.flush().wait(); after launch_grid and see if that changes anything?
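In gallium terms that flush-and-wait is roughly the sketch below; pipe, screen, and fence are placeholders:

    struct pipe_fence_handle *fence = NULL;
    pipe->flush(pipe, &fence, 0);
    screen->fence_finish(screen, NULL, fence, PIPE_TIMEOUT_INFINITE);
    screen->fence_reference(screen, &fence, NULL);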
<alyssa>
shouldn't happen, it's serialized in the kernel
<alyssa>
there is a flush after launch_grid in panfrost
<karolherbst>
yeah.. but I also unbind the entire state directly after launch_grid...
<jekstrand>
PAN_MESA_DEBUG=sync seems to make the fails go away
<karolherbst>
so wait might be important
<karolherbst>
I copied clover's design here, which I still don't like :)
<alyssa>
jekstrand: That's... odd.
<jekstrand>
I'm starting to think our clEnqueueReadBuffer is racing with the kernel
ybogdano has quit [Ping timeout: 480 seconds]
<alyssa>
Plausible
<alyssa>
depending on what transfer flags you use, gallium can optimize away your correctness ;)
<karolherbst>
MAP_READ_WRITE
<jekstrand>
pan_set_global_binding is calling panfrost_batch_write_rsrc so it should be getting a write fence set on it in the kernel ioctl
<jekstrand>
Let me double-check that
<karolherbst>
at some point I will optimize setting all those flags :)
<alyssa>
karolherbst: sync? unsync? etc
<karolherbst>
potentially PIPE_MAP_UNSYNCHRONIZED on non blocking maps
<karolherbst>
but we do flush and wait later
<karolherbst>
but clEnqueueReadBuffer is synced
<karolherbst>
so it's just READ_WRITE
<alyssa>
ok..
<karolherbst>
should be just READ really, but...
<karolherbst>
I don't fight those bugs yet
<karolherbst>
*want to
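For reference, the blocking read path maps to roughly the sketch below; a non-blocking map could add PIPE_MAP_UNSYNCHRONIZED, which skips waiting on pending GPU work and is only safe if the wait happens somewhere else. Names are placeholders:

    struct pipe_transfer *xfer;
    void *ptr = pipe_buffer_map(pipe, res,
                                PIPE_MAP_READ /* | PIPE_MAP_WRITE */, &xfer);
    memcpy(host_dst, ptr, size);
    pipe_buffer_unmap(pipe, xfer);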
mvlad has quit [Remote host closed the connection]
<karolherbst>
" SPIRV-Headers was not found - please checkout a copy under external/." I already don't want to :D
<jekstrand>
karolherbst: What are you building now?
<karolherbst>
spirv-tools
<karolherbst>
the linker is buggy :(
<jekstrand>
:(
<karolherbst>
yeah...
<karolherbst>
I thought we could work around it in mesa, but... it's complicated
mclasen has joined #dri-devel
mbrost has joined #dri-devel
columbarius has joined #dri-devel
co1umbarius has quit [Ping timeout: 480 seconds]
nchery has quit [Ping timeout: 480 seconds]
nchery has joined #dri-devel
<karolherbst>
" error: ‘spvValidate’ is not a member of ‘spvtools’; did you mean ‘spvValidate’?"
<karolherbst>
it tries to tell me I shouldn't use the namespace thingy
<jekstrand>
Ugh... Why don't GDB watchpoints work on this box?!?
<jekstrand>
Or maybe they do and they're just that slow? idk.
<karolherbst>
watchpoints are this kind of thing I set 5 times until I set them correctly
fw400 has joined #dri-devel
<karolherbst>
but yeah
<karolherbst>
watchpoints make things slow :)
fw400 has left #dri-devel [#dri-devel]
Duke`` has quit [Ping timeout: 480 seconds]
<jekstrand>
someone is removing BOs from my batch
<jekstrand>
I think that's why it's failing
<karolherbst>
jekstrand: did you try flush().wait() after launch_grid and/or disable unbinding all the stuff?
<jekstrand>
karolherbst: unbinding doesn't matter
<karolherbst>
I hope you are right :)
<jekstrand>
karolherbst: set_global_binding with resources == NULL is a no-op on panfrost
<karolherbst>
ahh
<alyssa>
..how would BOs get removed from a batch
<jekstrand>
idk
<jekstrand>
I tried to set a watchpoint on batch->num_bos but GDB hates me
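For what it's worth, a location watchpoint is the usual tool here; when GDB can't get a hardware watchpoint slot it falls back to single-stepping the program, which is where the extreme slowness comes from:

    (gdb) watch -l batch->num_bos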
<karolherbst>
jekstrand: btw, did you figure out why clamping is broken?
<jekstrand>
karolherbst: Nope
<jekstrand>
Those tests are evil
mbrost has quit [Read error: Connection reset by peer]
<karolherbst>
yeah...
<jenatali>
Yep
<karolherbst>
jenatali: I will figure out the divide fails btw
<karolherbst>
eh
<karolherbst>
jekstrand: ^^
<karolherbst>
jenatali: btw.. I think you could move to CL 3.0 by now :D
<jenatali>
With LLVM 14 or 15, right?
<karolherbst>
I have a fix for that linking issue
<karolherbst>
but yeah.. it might require newer llvm as well
<jenatali>
Yeah. I'll get around to it... eventually...
<jenatali>
I hope
<karolherbst>
I don't think it's much work tbh
<jenatali>
Yeah but it's still a big context switch from my current priorities
<karolherbst>
just some new APIs, but nothing really new
<karolherbst>
I see
<jenatali>
I've implemented CL3.0, but only exposing CL C 1.2
<jenatali>
So the APIs are done
<alyssa>
rusticl dx12 when
<karolherbst>
alyssa: I was already wondering if I should push it through zink
* karolherbst
hides
<jenatali>
It's not out of the question
<alyssa>
jenatali: The question was for jekstrand, unless you said yes ;-p
<jenatali>
Just, I don't have nearly enough time in the day, and my personal time for this kind of stuff has completely evaporated
<karolherbst>
but I don't think it would be a very good fit tbh
<karolherbst>
it's so gallium specific
<jekstrand>
TBH, once you've got the Mesa compiler, gallium doesn't buy you that much
<jenatali>
^^
<karolherbst>
sure
<karolherbst>
but most of the code isn't interfacing with the mesa compiler
<jenatali>
The CL API surface area is so tiny compared to the compiler infrastructure
<karolherbst>
yeah
<karolherbst>
but what I meant is, if rusticl were to run on dx12, we'd essentially have to write it from scratch as most things would have to change...
<karolherbst>
maybe the api validation could stay
* karolherbst
steals scale_fdiv
<jenatali>
Oh I assumed alyssa meant on the d3d12 gallium backend. I don't see any point giving rusticl a direct DX12 backend
<karolherbst>
jenatali: how do you want to implement set_global_bindings
<karolherbst>
I am not doing globals to ssbo lowering
<jenatali>
Yeah I'd want them as ssbos
<karolherbst>
I guess we would have to take some of the passes from src/microsoft and have some flips
<karolherbst>
maybe that could work, but...
<jenatali>
Yeah. Or else move stuff from the frontend into the backend
<jenatali>
But doesn't seem worth it. We're happy with our frontend for now
<karolherbst>
yeah, and I don't want to bother backends with random stuff
<karolherbst>
basing it on top of zink on the other hand....
<alyssa>
jenatali: That is what I meant yes
gawin has joined #dri-devel
<karolherbst>
I see we'd need some vulkan extensions
<jenatali>
karolherbst: I assume zink would also want it lowered to ssbo
<karolherbst>
but honestly.. we could just allow kernel spir-vs, do the 64 bit buffer stuff and that would be it
<jenatali>
Or I guess you could do it with the BDA extensions
<karolherbst>
or new extensions
<karolherbst>
just push in the spir-v kernels
<jenatali>
But also then you're just rewriting clspv :)
<jenatali>
Er, clvk?
<karolherbst>
I wouldn't
<alyssa>
consonants sure do
<karolherbst>
their idea was to not fix vulkan
<jekstrand>
Yeah, you'd just emit a pile of bda
<karolherbst>
ohh, so bdas are good enough for CL buffers?
<jenatali>
"Fix..." their idea was to fix CL by making it work on Vulkan :)
<karolherbst>
:D
<karolherbst>
yeah well
<karolherbst>
does anybody use it?
<jekstrand>
anybody use what?
<karolherbst>
clvk
<jenatali>
CL's already pretty niche I feel like. I know they had a target use for clvk but I don't remember what it is. And I doubt they have many more customers
<jekstrand>
I don't know about clvk but there's at least one major app doing serious compute work shipping on clspv
<karolherbst>
honestly.. I think using zink is probably the best option here and just add a new extension for spirv kernels
<karolherbst>
ahh
<karolherbst>
jekstrand: but clspv is just kernel spirv to vulkan spirv, right?
<jenatali>
Pretty much
<jekstrand>
yup
<karolherbst>
yeah I guess if that fits your use case
<karolherbst>
anyway, I don't have any plans with rusticl, I just wanted to learn rust :D
* jekstrand
wants Mesa to have a competent compute story
* jekstrand
still isn't quite sure what story that will be
<karolherbst>
yeah...
<karolherbst>
I think a CL stack at least as good as Intel's or AMD's would be a good starting point
<karolherbst>
(making it pointless to install theirs, that is)
<jekstrand>
alyssa: What splits vec3 loads/stores into scalar?
gawin has quit [Ping timeout: 480 seconds]
<jekstrand>
Maybe LLVM is doing that?
<jekstrand>
that's believable
<karolherbst>
mhh
<karolherbst>
scale_fdiv doesn't fix ERROR: divide: -16777216.000000 ulp error at {-0x1.fffffep+127, -0x1.fffffep+127}: *0x1p+0 vs. 0x0p+0 (0x00000000) at index: 198 :(
<karolherbst>
but that's a subnormal, isn't it?
<jekstrand>
I think that's just 1 vs 0
<karolherbst>
yeah, but the inputs
<karolherbst>
-0x1.fffffep+127 / -0x1.fffffep+127
<jekstrand>
We may need to use the actual fdiv opcode
<karolherbst>
ohh, you don't?
<jekstrand>
Nope
<jekstrand>
We don't for GL
<karolherbst>
how can I flip it?
<jekstrand>
We do mul+rcp
mbrost_ has quit [Remote host closed the connection]
<karolherbst>
ahh
<jenatali>
karolherbst: We don't support denormals, we always flush them
<karolherbst>
yeah, that won't work
<jekstrand>
It'll require some compiler work.
mbrost_ has joined #dri-devel
<karolherbst>
okay
<jekstrand>
Not too much but more than zero
<karolherbst>
right
<karolherbst>
yeah, we want real fdiv for CL :)
<jekstrand>
We don't for GL because you can often CSE the RCP
<karolherbst>
now what's up with "mem_host_flags mem_host_write_only_image"
<karolherbst>
sure
<karolherbst>
and a real fdiv is slow
<jekstrand>
idk that it's that much slower than rcp
<jekstrand>
But it's slower than fmul
<jekstrand>
By a lot
<karolherbst>
yeah
<karolherbst>
not that luxmark perf tanks when we start using it :D
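The GL-style lowering being described here is the lower_fdiv flag in nir_shader_compiler_options, which lets nir_opt_algebraic rewrite fdiv(a, b) as fmul(a, frcp(b)); leaving it unset keeps a real fdiv, which is what CL's divide ULP rules want. A minimal sketch of the option side; the rest of the wiring is an assumption:

    /* fdiv(a, b) -> fmul(a, frcp(b)) only when lower_fdiv is set: cheap and
     * CSE-friendly for GL, but not accurate enough for CL's divide tests */
    static const struct nir_shader_compiler_options cl_nir_options = {
       .lower_fdiv = false, /* keep real fdiv; the backend must implement it */
    };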
* jekstrand
hates this kernel
<jekstrand>
llvm turns a very simple char3 load/store into a giant pile of garbage
<karolherbst>
classic llvm
h0tc0d3 has joined #dri-devel
<karolherbst>
why though?
<jekstrand>
idk
<jekstrand>
Why does LLVM do anything it does?!?
<karolherbst>
maybe it's not aligned
<karolherbst>
because llvm is a smart compiler always doing the right thing, everybody knows that
<karolherbst>
ehhh the remaining fails are painful
<jekstrand>
Yeah, LLVM is definitely checking for alignment and emitting 64-bit load/store if it can.
<karolherbst>
can we disable that?
<jekstrand>
I think it decided this test was some poor soul's hand-rolled memcpy. :joy:
<karolherbst>
basically I just want llvm to give us the plain thing, no idiotic postprocessing :D
<karolherbst>
lol
<karolherbst>
we do pass -O0 into llvm
<jekstrand>
Maybe we need to pass -O-0?
<jekstrand>
In any case, panfrost should be able to compile this even if it is stupid
mbrost_ has quit [Ping timeout: 480 seconds]
<karolherbst>
I still don't want llvm to do silly things :D
* jekstrand
views that as inevitable
<karolherbst>
jekstrand: which test is it btw?
<jekstrand>
test_conversions char_char
<jekstrand>
char3 case, to be particular
<karolherbst>
what the...
<jekstrand>
int_int also fails
<jekstrand>
So it's not an 8-bit problem
<karolherbst>
ahh no
<karolherbst>
it's not llvm
<karolherbst>
it's the CTS
<jekstrand>
wha?
<karolherbst>
the CTS special-cases vec3 and uses vloadn
<jekstrand>
Of course it did...
<jekstrand>
So it's probably a vloadn problem
<karolherbst>
potentially
morphis has quit [Ping timeout: 480 seconds]
<karolherbst>
vloadn doesn't guarantee alignment
* jekstrand
looks at vloadn
<jekstrand>
right
morphis has joined #dri-devel
<karolherbst>
well besides what the base type needs
<jekstrand>
Do we implement vloadn ourselves or use libclc?
<jekstrand>
we do it ourselves
<karolherbst>
yeah
<karolherbst>
not sure if libclc has an impl
<jekstrand>
it appears to
<jekstrand>
Ugh... Yeah, this is all in the CTS test
<jekstrand>
Ok, this makes a lot more sense
<jekstrand>
ironically, bifrost seems to have load/store_i24 opcodes. :-/
<jekstrand>
I wonder if get_global_size is just wrong
h0tc0d3 has quit [Remote host closed the connection]
mbrost_ has joined #dri-devel
neonking has quit [Ping timeout: 480 seconds]
maxzor has quit [Ping timeout: 480 seconds]
<karolherbst>
jekstrand: ohh so you include bounds checks?
<karolherbst>
eh wait
<karolherbst>
get_global_size is this CL thing :D
maxzor has joined #dri-devel
Haaninjo has quit [Quit: Ex-Chat]
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
<karolherbst>
soo.. why is api clone_kernel crashing...
khfeng has joined #dri-devel
<karolherbst>
jekstrand: I can't use nir_opt_dead_write_vars before inlining, can I?