ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
<pinchartl>
C++ also has parentheses :D
<karolherbst>
I mean, you can return nothing with ()
<karolherbst>
so you have "()" as the return type
<pinchartl>
but can you store a () in a field of a structure ?
<clever>
in haskell, that value is called `unit`
<karolherbst>
not very useful in functions itself, but for generic arguments
<karolherbst>
like Result<(), cl_int> e.g.
<clever>
it is both a value and a type
<clever>
and only something of that type, can hold that value
tzimmermann has quit [Ping timeout: 480 seconds]
<karolherbst>
pinchartl: you don't store it, that's the point
<clever>
yeah, its effectively a 0 bit value, and it just vanishes at compile time
<karolherbst>
you it gets used as if
<karolherbst>
clever: nope
<karolherbst>
it's _no_ value
<HdkR>
Ends up being annoying for template magic code since you need to special case void returns, usually isn't a huge issue though
<clever>
in haskell, its the only value for that type
<karolherbst>
in rust it doesn't have a value
<pinchartl>
HdkR: exactly
<karolherbst>
you just "return ()"
<clever>
karolherbst: i think `Result<(), cl_int>` is like `Either () Int`, where you can store 2 different things, so its a type + value pair
<karolherbst>
clever: yeah
<karolherbst>
but it's used like an exception mechanism
<clever>
so the `()` itself, is a 0 bit value, that vanishes at compile time
<karolherbst>
so if your function returns Result
<clever>
but the type tag, saying it was such, still persists
<karolherbst>
you can var.some_method()?; and if that method has an "error" your function returns immediately
<pinchartl>
HdkR: usually you can get away with a partial specialization, but sometimes the class(es) that need partial specialization are large, so it results in quite a bit of duplicated code
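A minimal sketch of the void special case discussed above, assuming C++17. The names `counted_call` and `calls_made` are hypothetical and only serve the illustration. A void result cannot be stored in a local, so the wrapper needs a separate branch; `if constexpr` keeps both branches inside one function rather than a partial specialization of a whole class.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical counter standing in for "work done after the call".
inline int calls_made = 0;

// Calls f(args...), bumps the counter afterwards, and forwards the result.
// The void case gets its own branch because `R result = ...` is ill-formed
// when R is void.
template <typename F, typename... Args>
decltype(auto) counted_call(F&& f, Args&&... args) {
    using R = std::invoke_result_t<F, Args...>;
    if constexpr (std::is_void_v<R>) {
        std::forward<F>(f)(std::forward<Args>(args)...);
        ++calls_made;
    } else {
        R result = std::forward<F>(f)(std::forward<Args>(args)...);
        ++calls_made;
        return result;
    }
}
```

Usage would look like `int sum = counted_call(add, 2, 3);` for a value-returning callable and `counted_call(log_something);` for a void one, with the same template serving both.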
<karolherbst>
but it's just an example where you use it
<karolherbst>
pinchartl: that sucks :p
<pinchartl>
the interesting part of the game is to design the classes so that partial specialization is only needed in small classes
<karolherbst>
I do think that rust at least solved this issue, so you really don't have to duplicate any code unless you really want to
<clever>
karolherbst: can rust macro's effectively run code at compile time, on compile-time constants? and then generate code?
<karolherbst>
clever: with proc_macros yes
<pinchartl>
clever: I hope so, otherwise it's not a macro if it's all done at runtime :-)
<karolherbst>
there you essentially write a compiler plugin and can do whatever you want
<clever>
ah, thats like TemplateHaskell
<karolherbst>
pinchartl: clever meant more than that
<karolherbst>
I think
<clever>
as an example, i need a string that contains v32st
gpoo has joined #dri-devel
<clever>
the 32, comes from the number of bits in a type T (32bit int)
<clever>
the problem, is that a C macro, cant evaluate sizeof(), and turn the number into a string
<karolherbst>
ahh yeah, a proc_macro has the ability to inspect the types you hand in
<clever>
and you cant printf in a C macro
<clever>
the printf happens at runtime, and then its too late to pass it to the assembler
<karolherbst>
"Procedural macros allow you to run code at compile time that operates over Rust syntax, both consuming and producing Rust syntax. You can sort of think of procedural macros as functions from an AST to another AST."
<clever>
i just barely managed to make it work, `v%[width]st` with inline asm, and then flag width as an immediate
<clever>
so gcc thinks your doing `mov r0, 42`
<clever>
but due to the wonky format string, youve tricked it into doing v32st
<karolherbst>
yeah...
<clever>
and i got lucky
<clever>
all of the docs, say to do H, HX, and HY
<clever>
but by chance, the assembler also supports H8, H16, and H32 as aliases
<clever>
there is no way that i know of, to generate the H/HX/HY in the inline asm, based on a sizeof
<karolherbst>
it's a bit sad that cpp is a different language or stage of the compilation
<karolherbst>
that... complicates things
<clever>
so i would have to write it 3 times, and let -O delete the if statements and carve it down to 1
<clever>
there is one trick i considered, but i dont think it works
<clever>
gcc, lets you do string concat at compile time, puts("foo" "bar")
<clever>
no operators, it just concats the strings for you
<karolherbst>
yeah..
<karolherbst>
it's wonky to use
<karolherbst>
normally you need like double indirection and stuff
<clever>
but, what about puts("foo" (42 > 5 ? "bar" : "baz"));
<clever>
would this work??
<karolherbst>
with the infamous str macro it might
<karolherbst>
ehh wait
<karolherbst>
yeah
<karolherbst>
should
<clever>
str macro?
<karolherbst>
#define str(x) x
<karolherbst>
or so
<clever>
that looks like a no-op
<clever>
there is also the #x thing, that ive used on occasion
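A small sketch of the `#x` stringification and the "double indirection" mentioned above, combined with adjacent-literal concatenation. The macro names `STR_`, `STR` and `WIDTH` are assumptions for illustration. Note that only adjacent string literal tokens are merged, so this works for a number the preprocessor already knows, not for a `sizeof` result or a parenthesized `?:` expression.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical macro names, for illustration only.
#define STR_(x) #x      // stringify the argument exactly as written
#define STR(x) STR_(x)  // expand the argument first, then stringify

#define WIDTH 32

// Adjacent string literals are merged by the compiler, so this returns the
// single literal "v32st".
inline const char* opcode_name() { return "v" STR(WIDTH) "st"; }

// Without the extra level, the macro name itself is stringified.
inline const char* unexpanded_name() { return "v" STR_(WIDTH) "st"; }
```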
<karolherbst>
but you do have this comparison thing there...
<karolherbst>
I think there are some limitations though :D
<karolherbst>
but you also have a SUB
<pinchartl>
karolherbst: is there any standard coding style guidelines for rust to avoid being too clever with syntax and making things unreadable ?
<karolherbst>
so you could do IF(SUB(MIN(42, 5), 5), "bar", "baz") ?
<karolherbst>
pinchartl: yes, there is clippy, which is a tool which annoys you about your stupid syntax
<karolherbst>
and it enforces rust's own guidelines
<clever>
karolherbst: for an actual example, i need to take the sizeof an int, and then return "H", "HY", or "HX", based on if its 8bit, 16bit, or 32bit
<karolherbst>
clever: mhhhhhh
<clever>
karolherbst: and the string must be in a form that is compatible with "foo" "bar" concats
<karolherbst>
clever: I don't think the preprocessor allows this kind of introspection on C types as... it doesn't know anything about C
<clever>
and yes, that sounds like it will get unreadable quickly
<pinchartl>
I mean something more high-level. I'm thinking about the google C++ guidelines for instance (but they mix low-level and high-level stuff)
<clever>
karolherbst: exactly why i think its impossible
<karolherbst>
clever: you could specify the bit size instead though
<karolherbst>
and generate int32_t and such out of it
<karolherbst>
stdint.h does exist!
<clever>
karolherbst: the function must also receive an uint32_t[16] as an argument, and i wanted the types to enforce that
<clever>
so it auto-detects the type of whatever you throw at it, and picks the right bit-width opcode
<karolherbst>
clever: uhhh... mhh, you know that C functions can't do arrays? :p
<karolherbst>
uint32_t[16] is just a plain pointer
<clever>
pointer, close enough
<karolherbst>
yeah, but you want a size argument to write safe code
<karolherbst>
can't rely on the array size
<clever>
the size of the type the array points to
<karolherbst>
it's 4 for uint32_t[16] :)
<clever>
yep
<clever>
template <typename T> static inline void vst(int x, int y, T *dst, bool xinc, bool yinc, int rep, bool horizontal, int stride) {
<karolherbst>
the only sane thing is to not use those
<clever>
for this, T must be 8bit/16bit/32bit, and it always points to an array of 16 (not enforced) objects
<karolherbst>
ahh
<karolherbst>
well, all I can think of is that you could abuse stdint.h here
<clever>
there are a total of $rep sets of 16, the start of each separated by $stride elements
<clever>
nothing is enforced, beyond detecting how wide T is, and picking the right opcode
<clever>
a stride of 16, would be a densely packed array, of 16 * rep elements
<clever>
a stride of 32, would be a sparse array, of say 32 * rep elements, pick 16, skip 16, repeat
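The rep/stride layout described above can be sketched in plain C++ (roughly the "c based" flavour of the helper). The function name `store_groups` and the assumption that the source holds the groups back to back are hypothetical; the real `vst` takes more parameters and reads from the vector register file.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the store: writes `rep` groups of 16 elements
// into dst, with group r starting at dst[r * stride].
// src is assumed to hold the groups back to back (16 * rep elements).
template <typename T>
void store_groups(const T* src, T* dst, int rep, int stride) {
    constexpr int kLanes = 16;
    for (int r = 0; r < rep; ++r) {
        for (int lane = 0; lane < kLanes; ++lane) {
            dst[static_cast<std::size_t>(r) * stride + lane] =
                src[r * kLanes + lane];
        }
    }
}
```

With `stride == 16` the destination ends up densely packed; with `stride == 32` each group of 16 written elements is followed by 16 untouched ones.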
<clever>
there are ~3 major steps to this whole problem
<clever>
1: writing these macro/template functions (twice, asm and c based)
<karolherbst>
clever: ohh wait.. I thought about your issue from a wrong perspective
<karolherbst>
so..
<clever>
2: giving that header to somebody that knows vector computation better, and having them write some algo using it (the c based version, lets them debug and test on x86 or arm)
<clever>
3: teaching gcc how to do 2 on its own
<karolherbst>
you have three versions of the same thing... one for 8/16/32 bits, and from your C code you want the compiler to pick the correct one, right?
<clever>
the problem, is that i dont want to write the same function 3 times
<karolherbst>
right
<clever>
i want the compiler to fill in the blanks
<karolherbst>
sooo
<karolherbst>
there is a way
<karolherbst>
you can use a macro to generate the three functions, with different names, and then write a _Generic wrapper which chooses the correct one depending on the C type
<clever>
basically, the difference between A mult(B b, C c) { return b * c; }
<clever>
and uint32_t mult88(uint8_t a, uint8_t b) { return a*b; } uint32_t mult816(uint8_t a, uint16_t b) { return a * b; } ........
<clever>
karolherbst: well, the 2 operations can each be of 3 different widths, as can the return type, so thats now 9 functions
<clever>
and its supposed to be fast, so whatever happens, it must inline down to 1 opcode, as-if none of that was ever there to begin with
<clever>
s/operations/operands/
<karolherbst>
mhh
<clever>
each operand, and the destination, also need an XY coord, and a direction
<karolherbst>
this decouples selection of the correct code from writing the correct code
<karolherbst>
so you write all the different functions (with macros or whatever) and use _Generic to select the correct one based on the input parameters
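A rough sketch of the "generate with a macro, then select by type" idea. `_Generic` is a C11 feature, so this C++ version swaps in an overload set as the selector; the macro `DEFINE_STORE`, the function names, and the returned mnemonic strings are all hypothetical stand-ins for the real inline-asm bodies.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical generator: each expansion produces one named function per
// width, plus an overload of opcode_for() that forwards to it. In C the
// overload would be replaced by a _Generic wrapper macro.
#define DEFINE_STORE(bits)                                             \
    inline const char* opcode_for_##bits() { return "v" #bits "st"; } \
    inline const char* opcode_for(const std::uint##bits##_t*) {       \
        return opcode_for_##bits();                                    \
    }

DEFINE_STORE(8)
DEFINE_STORE(16)
DEFINE_STORE(32)
```

A caller writes `opcode_for(ptr)` and the pointee type of `ptr` decides which generated function runs, so the selection logic stays separate from the per-width bodies.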
<clever>
this opcode, says that operand A is a row of 16bit ints, starting at 0,0, and spanning to 0,15 (row,column)
<clever>
when repeating, the row# will increment
<clever>
operand B, is an immediate of `2`
<clever>
treat both operands as signed ints, multiply, and then write it back to the same location in the matrix
<clever>
repeat that whole operation, a total of 64 times
<clever>
karolherbst: that all make sense?
<karolherbst>
I mean.. I don't need to know the specifics here :p you essentially just want to have static inline funcs doing some assembly, right? and you want to generate a pile of those functions depending on the types
<karolherbst>
and you essentially want function overloading when using it, no?
<clever>
the problem, is that operand B, can either be a vector like HX(0,0), an immediate like 42, or a scalar register like r0
<karolherbst>
mhhh
<clever>
can such a thing ever be specified, without exploding your function into needing 5 arguments?
<clever>
per argument
<karolherbst>
yes actually yes
<karolherbst>
sooo
<clever>
or would it be better to encode that into the symbol name
<karolherbst>
there is macro magic to detect a constant
<clever>
mult_vector_immediate() for ex
<karolherbst>
the kernel has it
<clever>
that solves 2 of the cases
<clever>
so it can switch between immediate and scalar
<karolherbst>
yes
<clever>
if its a constant, just immediate it into the asm
<clever>
if its not, load it into a reg, and use that reg
<clever>
so now i just need 2 variants, vector or scalar
adjtm has quit [Remote host closed the connection]
adjtm has joined #dri-devel
<clever>
vector would have a direction (row vs column), bit width, origin coord, (oh yeah) offset coord!, and which coord to auto-increment
columbarius has joined #dri-devel
<clever>
i forgot to mention offsets, you can do HX(0,0) + r0
<clever>
HX(0,0) uses immediates to encode a coord, but then +r0, passes a (packed) offset to add to those immediates
<clever>
so you can change the matrix coords at runtime
co1umbarius has quit [Ping timeout: 480 seconds]
<karolherbst>
yeah.. no idea here :D probably somebody could be able to come up with some macro magic, but I think it also depends on what you are willing to accept as the input syntax here
<clever>
yeah
<karolherbst>
if it's just a variable vs constant in C
<karolherbst>
then that's easy
<karolherbst>
variables have different types and you can select on that with _Generic
<clever>
the bigger problem, is that it really needs a register allocator
<karolherbst>
:D
<karolherbst>
well
<karolherbst>
that's a ... hard issue
<clever>
if there was some way to automatically pick a part of the matrix, and assign a symbol to that location
<clever>
and then pass that symbol into the function
<clever>
then things would be far simpler
<clever>
that feels like something mesa and shaders are kinda solving?
<clever>
basically, i need a type like vec16_32_t, and behind the scenes, it just contains a matrix coord+direction
<clever>
and at compile time, those coords get input into the asm, and the variable effectively vanishes
<clever>
then i can just have `vec16_32_t a,b,c; a = b * c;`
<clever>
and using c++ operators, i could implement my own * ....
<clever>
and operators, let you implement several versions, for each type!
<clever>
karolherbst: this might actually solve things!
<karolherbst>
maybe
<clever>
the tricky part, is getting enough things to be constexpr
<karolherbst>
but yeah.. operator overloading is quite powerful
<clever>
if my class has a constructor, and that constructor takes 2 ints
<clever>
can those ints become a constexpr for an operator?
<karolherbst>
clever: people seem to like "if constexpre (...)"
<karolherbst>
*constexpr
<clever>
ah yeah, that forces it to be a constexpr, and i think it will fail to compile if it isnt
<karolherbst>
it's more about checking compile time information
<karolherbst>
like existing methods or... if something is a constant
<karolherbst>
etc...
<karolherbst>
and select code at compile time
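A minimal sketch of compile-time selection with `if constexpr`, assuming C++17 and using the `H8`/`H16`/`H32` aliases mentioned earlier in the log. The names `vector_prefix` and `unsupported_width` are hypothetical. The returned pointer is an ordinary value, not a literal token, so it still cannot be pasted into an asm string by literal concatenation; in real code each branch would presumably have to hold its own asm statement.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Always-false helper so the static_assert below only fires for
// instantiations that actually reach the final branch.
template <typename T>
inline constexpr bool unsupported_width = false;

// Picks the vector prefix from the element width at compile time.
// Only the branch matching sizeof(T) is instantiated.
template <typename T>
constexpr const char* vector_prefix() {
    if constexpr (sizeof(T) == 1) {
        return "H8";
    } else if constexpr (sizeof(T) == 2) {
        return "H16";
    } else if constexpr (sizeof(T) == 4) {
        return "H32";
    } else {
        static_assert(unsupported_width<T>, "T must be 8, 16 or 32 bits wide");
    }
}
```

Something like `vector_prefix<std::uint64_t>()` would be rejected at compile time by the final branch rather than silently picking a width.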
<clever>
yeah
<karolherbst>
might help
<airlied>
anholt_, MrCooper : glsl tests hitting timeouts, did we ever work out what that was? got 3-4s locally hitting 30s in CI occasionally
<holmanb>
clever: I skipped a big chunk of the convo, sorry if this is no longer relevant. I think you could probably do something like sizeof(typeof(x)) to get your 8/16/32 sizes? Or do I misunderstand the issue?
<clever>
holmanb: the problem, is that i need to then pick a string, H/HY/HX, based on the size
<clever>
holmanb: and then insert that string into some inline asm
JohnnyonFlame has quit [Ping timeout: 480 seconds]
camus has quit [Remote host closed the connection]
pendingchaos has quit [Remote host closed the connection]
camus has joined #dri-devel
pendingchaos has joined #dri-devel
camus1 has joined #dri-devel
camus1 has quit []
camus has quit [Ping timeout: 480 seconds]
lemonzest has joined #dri-devel
gpoo has quit [Ping timeout: 480 seconds]
camus has joined #dri-devel
gpoo has joined #dri-devel
<holmanb>
clever: "so it auto-detects the type of whatever you throw at it, and picks the right bit-width opcode" -> sounds like you are trying to implement a template.... inside a template
<clever>
holmanb: heh :D
camus1 has joined #dri-devel
<karolherbst>
but honestly.. do you know why rust will kill C and C++? the compiler tells you how to fix your errors and warnings
camus has quit [Ping timeout: 480 seconds]
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
pendingchaos has quit [Ping timeout: 480 seconds]
Company has quit [Quit: Leaving]
<holmanb>
karolherbst: I want rust to fail to compile if there is a logic error, without humans telling it what logic is correct. Once it can do that, no other language will stand a chance ;)
<clever>
karolherbst: ive noticed newer gcc's giving better errors, for some things
<karolherbst>
clever: yeah...... they are trying
<karolherbst>
but it doesn't have this priority
<karolherbst>
holmanb: well.. it does find _some_ logic errors
<holmanb>
Heh, I know :) I like rust
<karolherbst>
but yeah... but going beyond the obvious stuff is.. well
<holmanb>
to read it ;)
<holmanb>
clever: you seem to be in search of runtime features at preprocess/compile time - short of a hacky templatized source code generation thing I doubt you're going to get that as a standard C/C++ toolchain feature. If I were trying to do what you are in stock C/C++ I'd probably accept the redundant implementations for readability, or get comfortable with generating functions from macros and hope that no one else ever has
<clever>
holmanb: modifying gcc directly, might be a simpler task, lol
<clever>
holmanb: so it just emits the right asm, based on the types present
lemonzest has quit [Quit: WeeChat 3.2]
<holmanb>
clever: I've done some preprocessor function generation, so lmk if you want some examples. It's not that bad to write once you've seen the general pattern, but debugging is not fun and it confounds code navigation tools (IDE/ctags/cscope).
<clever>
hence the 2021-09-06 00:21:47 < clever> holmanb: modifying gcc directly, might be a simpler task, lol
<karolherbst>
yeah..
<karolherbst>
it pushes the responsibility down to gcc, which is mostly a good thing :p
<clever>
gcc knows what types are involved, and is running code at compile-time
<clever>
so it can just make a choice, and emit the asm directly into the function body
<karolherbst>
that reminds me.. we also have those weirdo vector instructions on nv hw, but never bothered to actually use those
<karolherbst>
nvidia also doesn't... at least for shaders
<karolherbst>
maybe they are used for shader based video decoding/encoding accel?
<karolherbst>
dunno
<clever>
on the QPU (separate from the above), it is a vector core, wearing 16? scalar core masks
<clever>
if you dont look close enough, you could be fooled into thinking its just a scalar core
<karolherbst>
PTX does call them "video instructions"
<clever>
the QPU gets away with treating a vector core like 16 scalar cores, because thats where the fragment shaders run
<clever>
you have a very high chance of running the identical program 1000's of times
<holmanb>
clever: lol, idk what to even say then
<karolherbst>
yeah.. that's the weird thing.. the ISA is close to completely scalar, except those video instructions
<clever>
so it can just schedule 16 instances to a single vector unit, feed it 16 sets of varyings + uniforms, and then all of the docs claim its scalar
<clever>
holmanb: the only point where the vector nature becomes visible (that i have noticed), is that conditional execution has an extra "any lane" and "all lanes" flag
Lyude has quit [Quit: WeeChat 3.2]
<clever>
i think the condition impacts every lane
<clever>
or maybe thats just for when you want a conditional jump?
<clever>
mesa refers to the QPU as vc4 in most places
<clever>
if you want to see what mesa thinks of that
Lyude has joined #dri-devel
<clever>
karolherbst: the VPU is mostly scalar, true scalar, with vector tacked on the side
<clever>
while the QPU is fake scalar, with a vector core behind the curtain
aravind has joined #dri-devel
sturmmann has joined #dri-devel
APic has quit [Read error: Connection reset by peer]
APic has joined #dri-devel
Duke`` has joined #dri-devel
adjtm is now known as Guest6462
adjtm has joined #dri-devel
Guest6462 has quit [Ping timeout: 480 seconds]
agners has quit [Read error: Connection reset by peer]
camus1 has quit []
camus has joined #dri-devel
kevintang has joined #dri-devel
itoral has joined #dri-devel
camus has quit []
camus has joined #dri-devel
gpoo has quit [Ping timeout: 480 seconds]
thelounge53 has quit [Ping timeout: 480 seconds]
mlankhorst has joined #dri-devel
Hi-Angel has joined #dri-devel
lemonzest has joined #dri-devel
NiksDev has joined #dri-devel
xlei has quit [Ping timeout: 480 seconds]
alanc has quit [Remote host closed the connection]
kevintang has quit [Remote host closed the connection]
kevintang has joined #dri-devel
tursulin has joined #dri-devel
hansg has joined #dri-devel
pcercuei has joined #dri-devel
<airlied>
MrCooper: uggh do we have a way to not run those tests?
<MrCooper>
can't someone just fix them?
<airlied>
they work fine outside CI, rewriting standalone compiler seems a big ask to enable unrelated functionality
<airlied>
ill take a look tomorrow and see how long it'll take
thellstrom has quit [Ping timeout: 480 seconds]
<MrCooper>
does it really require a rewrite? Is it currently using stdin for something else?
<MrCooper>
another possibility might be some kind of trickery to make a pipe stand in for a file
pendingchaos has joined #dri-devel
<icecream95>
Wouldn't just putting the tempfile in a directory backed by tmpfs (maybe dir='/dev/shm'?) be enough?
<icecream95>
Otherwise, opening files/pipes without CLOEXEC and passing /proc/self/fd/N to the subprocess should work
plat has joined #dri-devel
hch12907 has joined #dri-devel
hch12907_ has quit [Ping timeout: 480 seconds]
<MrCooper>
icecream95: not sure tmpfs would help (the issue is the CI runners being extremely loaded), worth a shot though I guess
<MrCooper>
one issue with the latter idea is it's Python code which runs on Windows as well
Lucretia has quit []
Lucretia has joined #dri-devel
hch12907_ has joined #dri-devel
hch12907 has quit [Ping timeout: 480 seconds]
plat_ has joined #dri-devel
plat has quit [Read error: Connection reset by peer]
JohnnyonFlame has joined #dri-devel
thellstrom has joined #dri-devel
hch12907_ has quit [Remote host closed the connection]
aravind has quit []
thelounge53 has joined #dri-devel
thelounge53 is now known as alatiera
xlei has joined #dri-devel
rsalvaterra_ has joined #dri-devel
rsalvaterra has quit [Ping timeout: 480 seconds]
kevintang has quit [Ping timeout: 480 seconds]
hch12907_ has joined #dri-devel
gpoo has joined #dri-devel
Peste_Bubonica has joined #dri-devel
Akari has quit [Ping timeout: 480 seconds]
rsalvaterra_ has quit []
rsalvaterra has joined #dri-devel
<daniels>
not that those tests are particularly quick anyway, the compiler+asan test takes 80s to execute on my laptop which is pretty ludicrous for a single test
<HdkR>
Is it valgrind asan or llvm asan?
Ahuj has quit [Ping timeout: 480 seconds]
Company has joined #dri-devel
itoral has quit [Remote host closed the connection]
f11f12 has joined #dri-devel
camus has quit [Remote host closed the connection]
camus has joined #dri-devel
bcarvalho has joined #dri-devel
JohnnyonFlame has quit [Ping timeout: 480 seconds]
FireBurn has joined #dri-devel
co1umbarius has joined #dri-devel
columbarius has quit [Ping timeout: 480 seconds]
mlankhorst_ has joined #dri-devel
mlankhorst has quit [Read error: Connection reset by peer]
Peste_Bubonica has quit [Ping timeout: 480 seconds]
sturmmann has quit [Ping timeout: 480 seconds]
shashanks has quit [Ping timeout: 480 seconds]
JohnnyonFlame has joined #dri-devel
FireBurn has quit [Quit: Konversation terminated!]
f11f12 has quit [Quit: Leaving]
hansg has quit [Remote host closed the connection]
jewins has joined #dri-devel
iive has joined #dri-devel
frieder has quit [Remote host closed the connection]
Akari has joined #dri-devel
jkrzyszt has quit [Ping timeout: 480 seconds]
bcarvalho has quit [Ping timeout: 480 seconds]
shashanks has joined #dri-devel
jewins has quit [Ping timeout: 480 seconds]
<robclark>
danvet: is shrinker not allowed to acquire gem resv lock? That would be somewhat inconvenient..