jvesely has quit [Remote host closed the connection]
glennk has joined #dri-devel
Jeremy_Rand_Talos has quit [Remote host closed the connection]
Jeremy_Rand_Talos has joined #dri-devel
sudeepd has joined #dri-devel
Dark-Show has joined #dri-devel
Dark-Show has quit [Remote host closed the connection]
heat is now known as Guest2255
Guest2255 has quit [Read error: Connection reset by peer]
heat has joined #dri-devel
sudeepd has quit [Ping timeout: 480 seconds]
jurassic has quit []
Company has quit [Quit: Leaving]
cernico has joined #dri-devel
<cernico>
mwk released a stinky fart , can be felt from distance, your life will be complex after terrorizing me, airlied is nose in mwk's butt again investigating the components eaten. Computer users need to suffer on this endless butt and nose game.
<cernico>
This is a pretty rough concept, but based on the latency of each instruction it orders things statically so they get scheduled in order of lower stall counts, which technically should never need to be touched if you use a fast computation paradigm.
<cernico>
i.e where instructions are all with same latency
a-865 has left #dri-devel [#dri-devel]
kts has joined #dri-devel
<cernico>
so those blocks do not do much of difference , it releases youngest instructions with lower latency grouped together that are independent, and more latent async instructions together. So that requires code that is harder to maintain.
kts has quit [Quit: Konversation terminated!]
<cernico>
they do a bit difference but it's possible to manually sort the procedures for runtime without those control codes too. So regardless of whether it is control codes or just manual reordering those procedures would never have to be modified.
<cernico>
or maintained per output of end user program, it's always kept the same
macromorgan has quit [Ping timeout: 480 seconds]
rcf has quit [Quit: WeeChat 3.8]
rcf has joined #dri-devel
kts has joined #dri-devel
<cernico>
I do not know what compilation analyses it requires if there are no control codes either, so they are handy, but the programmer needs to know the hw like the back of their hand to get any benefit. So control codes order the independent instructions shortest clocks first, so it could seek forward more aggressively
<cernico>
and on paradigm where every alu is add or sub i.e minimal latency it needs to do an ordering for the sw scheduler only, and that i know only how it goes, this one is pretty simple
<cernico>
so there is a memory that gets read, memory and alu operation are separate, lds and global memory instructions for an example share the same alu units except for memory fetch only where there is no alu involved
<cernico>
so in that case, you should only be worried about issuing the independent instructions first.
kts has quit [Quit: Konversation terminated!]
Duke`` has joined #dri-devel
neobrain is now known as neobrain_
kts has joined #dri-devel
neobrain_ is now known as neobrain
<cernico>
so as register pressure never exists, all the lds is pointless by then and you only call global data share instructions to registers , that just internally somehow use the only meaningful part of lds, which is not compulsory , cause sw can do it as well
<cernico>
for an example it's useful to preload to lds at times
<cernico>
so all in all, programmers need something like llvm ir or an unoptimized machine layer ir to do the needed trick
<cernico>
the mesa optimized machine code is pointless
cheako has quit [Quit: Connection closed for inactivity]
<cernico>
since you do not program a nuclear reactor or centrifuge, you program a gpu that is beneficial to be kept always on best performance
simondnnsn has quit [Read error: Connection reset by peer]
ninjaaaaa has quit [Write error: connection closed]
ninjaaaaa has joined #dri-devel
simondnnsn has joined #dri-devel
<cernico>
yep, it's the memories only use as of now, to preload some global data, and all the global memory alus and lds is useless
<cernico>
otherwise
<cernico>
it only wins you on very deep loads you can skip the memory loads per iteration, as if you have very long pixel shader and iterates over the size of lds cache
<cernico>
you win with preloading all except the first iteration worth of memory throughput
sghuge has quit [Remote host closed the connection]
sghuge has joined #dri-devel
<cernico>
i am just saying that all the scheduling related complexity and LDS mubuf alus are functionality wise worthless
<cernico>
so is opencl 2.1, it's pointless unless you utilise the hardware as a frequency generator based on the chip's states somehow
<cernico>
in nuclear world you would for instance just need some of those frequencies
<cernico>
but not as a computer user
glennk has quit [Ping timeout: 480 seconds]
<cernico>
so in general my research is over in the era of using computers at home and energy savings and performance issues
<cernico>
the complexity to get performance is not there, i do not know well how to utilize the default complex modes, except for frequency generating things, such as piezo crystals in the shockwave and ultrasound era etc.
<cernico>
it's yes those modes are quite complex , i do not know these usages extremely well either
jsa has joined #dri-devel
junaid has joined #dri-devel
heat has quit [Remote host closed the connection]
<cernico>
who would use a gpu to do sound related things, makes no sense overall, they want just some diode based ic to do things alike, and in the world of security hardware states are not so good random number generators either, and for heaters gpus are still too expensive too :)
heat has joined #dri-devel
<cernico>
there is no point to communicate on the grounds of solving things that have been solved, or on problems that never existed, there is just no reason to allocate time for so silly thing
<cernico>
the correct would be to discuss problems that can be solved in every day life
<cernico>
since you are not honest as to why your criminal career started , and why you come to my territory to terror me, then i say it's not allowed and after 2.5 years of getting that on my vacation i do not accept excuses
<cernico>
and the final result is, that real problems you do not want to confess or discuss with me
<cernico>
and on the framerate issues i do not want to play the slave or clown to communicate over with
<cernico>
every proper employing company would clear that performance problem with 1month if they needed to or wanted, and that type of artist am I too, i am capable of moving things into that position too.
davispuh has joined #dri-devel
kts has quit [Ping timeout: 480 seconds]
kts has joined #dri-devel
amarsh04 has quit []
<cernico>
now if you still want to do it, and those short integer permutes, i am a bit busy, someone might want to communicate with the scala authors or read the documentation, i left my research in a position where i could generate the dictionaries per alu operation if that has not been done, but i do not have time to test this month
<cernico>
They already express those things in around solutions of rdf and even in pure scala core libs.
<cernico>
that object orientated programming language has support for such things
kts has quit [Quit: Konversation terminated!]
amarsh04 has joined #dri-devel
<cernico>
i have not found any AI models or solutions of that type that suit, in other words you likely just need an xml dictionary of openmath and are already on the way towards good performance
<cernico>
if not, those permutes can be generated with loops
<cernico>
those are meant so like signal collect and other you generate a dictionary for mathematical operation evaluators, and the backend just packs them
<cernico>
but i know only one such backend and it is currently compliant with 32bit
<cernico>
there are more , but elias fano does not do that by default
<cernico>
elias fano does no compression the way i finally looked
<cernico>
it just can store many small ints in the same machine word
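The remark above about storing many small ints in the same machine word can be sketched as plain fixed-width bit packing. This is only an illustration of that one idea and is not Elias-Fano encoding itself; the function names `pack` and `unpack`, the 64-bit word size, and the field width parameter are all assumptions made for this sketch and do not come from the discussion.

```python
def pack(values, bits, word_bits=64):
    """Pack unsigned integers, each `bits` wide, into one word.

    Hypothetical helper for illustration: the word is modelled as a
    Python int that is guaranteed to fit in `word_bits` bits.
    """
    if bits <= 0 or bits > word_bits:
        raise ValueError("field width must be between 1 and word_bits")
    if len(values) > word_bits // bits:
        raise ValueError("too many values for one word")
    mask = (1 << bits) - 1
    word = 0
    for i, v in enumerate(values):
        if v < 0 or v > mask:
            raise ValueError("value does not fit in the field width")
        # Place value i in bit range [i * bits, (i + 1) * bits).
        word |= v << (i * bits)
    return word


def unpack(word, bits, count):
    """Recover `count` fields of `bits` bits each from a packed word."""
    mask = (1 << bits) - 1
    return [(word >> (i * bits)) & mask for i in range(count)]


packed = pack([3, 7, 0, 5], bits=4)
print(unpack(packed, bits=4, count=4))
```

With 4-bit fields, up to sixteen values share one 64-bit word. Whether this counts as compression depends on how wide the values would otherwise be stored.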
cernico was kicked from #dri-devel by ChanServ [You are not permitted on this channel]
heat has quit [Remote host closed the connection]
heat has joined #dri-devel
simon-perretta-img has quit [Ping timeout: 480 seconds]
simon-perretta-img has joined #dri-devel
kzd has quit [Ping timeout: 480 seconds]
simon-perretta-img has quit [Ping timeout: 480 seconds]
glennk has joined #dri-devel
simon-perretta-img has joined #dri-devel
apinheiro has joined #dri-devel
simon-perretta-img has quit [Ping timeout: 480 seconds]
kts has joined #dri-devel
ungeskriptet is now known as Guest2269
ungeskriptet has joined #dri-devel
Guest2269 has quit [Ping timeout: 480 seconds]
simon-perretta-img has joined #dri-devel
sima has joined #dri-devel
hansg has joined #dri-devel
sukuna has quit [Ping timeout: 480 seconds]
pcercuei has joined #dri-devel
warpme has joined #dri-devel
dorcaslitunya has joined #dri-devel
JohnnyonFlame has quit [Read error: Connection reset by peer]
simon-perretta-img has quit [Ping timeout: 480 seconds]
simon-perretta-img has joined #dri-devel
davispuh has joined #dri-devel
guludo has joined #dri-devel
guludo has quit []
passimoto has joined #dri-devel
kts has joined #dri-devel
dorcaslitunya has quit [Remote host closed the connection]
dorcaslitunya has joined #dri-devel
heat has quit [Remote host closed the connection]
heat has joined #dri-devel
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
frankbinns has joined #dri-devel
riteo_ has joined #dri-devel
riteo has quit [Ping timeout: 480 seconds]
kts has quit [Read error: Connection reset by peer]
mripard has quit [Quit: mripard]
kts has joined #dri-devel
jsa has quit [Ping timeout: 480 seconds]
heat has quit [Remote host closed the connection]
heat has joined #dri-devel
apinheiro has quit [Quit: Leaving]
warpme has quit []
junaid has quit [Remote host closed the connection]
dorcaslitunya has quit [Remote host closed the connection]
glennk has quit [Ping timeout: 480 seconds]
heat is now known as Guest2286
Guest2286 has quit [Read error: Connection reset by peer]
heat has joined #dri-devel
cheako has joined #dri-devel
urja has quit [Read error: Connection reset by peer]
urja has joined #dri-devel
yyds has quit [Remote host closed the connection]
heat is now known as Guest2288
Guest2288 has quit [Read error: Connection reset by peer]
heat has joined #dri-devel
iive has joined #dri-devel
<passimoto>
https://kampersanda.github.io/pdf/InnovateData2017.pdf , so i drop another link, they claim to generate the dictionaries faster than others, however the approach is not the same as the neat method i described, training times would not matter if all alus are pretrained, so that would be compatible with mine with a hack, cause i already offer very quick assembly of dicts by core method too, but all links are listed in their current approach,
<passimoto>
which is just treating the strings as small ints. They describe some tech behind that.
anujp has quit [Ping timeout: 480 seconds]
hansg has quit [Quit: Leaving]
kts has quit [Ping timeout: 480 seconds]
glennk has joined #dri-devel
kts has joined #dri-devel
simon-perretta-img has quit [Ping timeout: 480 seconds]
simon-perretta-img has joined #dri-devel
simon-perretta-img has quit [Ping timeout: 480 seconds]
<karolherbst>
mareko: once it's merged I'll probably have to clean up a few casts, but it shouldn't cause any direct issues (except when drivers start to expose higher limits maybe?)
simon-perretta-img has joined #dri-devel
kts has quit [Ping timeout: 480 seconds]
<DavidHeidelberg>
<built-in>:1:10: fatal error: 'opencl-c.h': where did I make the mistake, in the mesa/clover compilation or the app compilation?
<karolherbst>
probably mesa
<karolherbst>
the packaging is quite broken in debian for all of this and you might have to reinstall some stuff, because reasons
kzd has joined #dri-devel
<DavidHeidelberg>
hmm, besides adding "-Dgallium-opencl=icd" I most likely needed to do something else (not Debian, Alpine)
<karolherbst>
ahh..
<karolherbst>
ohh, that's with clover?
<karolherbst>
mhhh
<karolherbst>
I have no idea :) that part is pretty broken in clover
<karolherbst>
do you have such a file installed?
<DavidHeidelberg>
I wanted to test on freedreno these 2/4/8 types w/ Clover
<DavidHeidelberg>
I installed it, but probably not a dep for Mesa build
<DavidHeidelberg>
(after building Mesa)
<karolherbst>
there is a `CLANG_RESOURCE_DIR` thing for clover and I suspect it points to the wrong directory
<karolherbst>
I've fixed it probably 5 times for src/compiler/clc already and clover didn't receive any of those
<DavidHeidelberg>
let me check :)
<DavidHeidelberg>
thx
<karolherbst>
it's a mess because every distribution does something different and the way we used to do it was working based on wishful thinking :)
<DavidHeidelberg>
Alpine originally didn't even build clover
<DavidHeidelberg>
on Alpine it's not llvm 17, but 17.0.3 in the path
<DavidHeidelberg>
*17.0.6, but doesn't matter :D
<Ristovski>
mareko: AMD_TEST=testdmaperf on gfx90c (Ryzen 5700G) causes a "no-retry page fault" when it hits "VRAM->VRAM CS x2". Last logged value is always under 4096K (replicated three times), which is extremely low (<100) before it dies. Tested on 6.7.0 up to 6.7.9. testdmaperf log: https://bpa.st/raw/FMBA, page fault: https://bpa.st/raw/74KQ. Do I file this under mesa or drm/amd?
<Ristovski>
A couple days back I triggered a nearly identical page fault messing around with AMD_pinned_memory
<mareko>
drm/amd
<Ristovski>
oh, "under 4096K" as in - it always dies on the 4096K test with CS x2
<mareko>
also mention that it's a trivial memcpy compute shader, and that it works on other gfx9 chips
<Ristovski>
Will do. Anything else I can quickly try/debug that might yield useful info?
<karolherbst>
DavidHeidelberg: yeah.... it was all super fragile and I hope I fixed it inside clc enough that we don't have any bugs anymore
<Ristovski>
mareko: One more question while I have you here, "fill->VRAM ,CS x64" (nothing under L2p) for example returns 534639 for 131072KB. How are such speeds even possible?
<mareko>
Ristovski: the L2 cache is faster than memory
<Ristovski>
Oh so it _is_ using L2 cache? I had assumed those tests are uncached
<mareko>
yes
<Ristovski>
That explains it, thanks :)
<mareko>
actually, it should not be using L2
<mareko>
but it looks like it's using it
<Ristovski>
Are file attachments borked on freedesktop gitlab? The button doesn't seem to do anything :P
<mareko>
the high result is bogus
<Ristovski>
It sure seems like it - the L2 cache on this APU isn't even that big
<Ristovski>
Seems like CS x4 is fine, but then x8 and above are bogus above 2048K, idk
<mareko>
same for Navi31, the test seems buggy
<mareko>
or rewriting the shader to NIR broke it
macromorgan has joined #dri-devel
heat has quit [Remote host closed the connection]
heat has joined #dri-devel
bolson has joined #dri-devel
<karolherbst>
mareko: should I fix the rust part of your MR or will you manage?
anujp has joined #dri-devel
i509vcb has quit []
benjaminl has quit [Remote host closed the connection]
benjaminl has joined #dri-devel
DodoGTA has quit [Quit: DodoGTA]
DodoGTA has joined #dri-devel
<mareko>
karolherbst: no idea how to fix that
<karolherbst>
mareko: I already posted a patch to the MR
simon-perretta-img has quit [Ping timeout: 480 seconds]
simon-perretta-img has joined #dri-devel
cheako has quit [Quit: Connection closed for inactivity]
simon-perretta-img has quit [Ping timeout: 480 seconds]
simon-perretta-img has joined #dri-devel
riteo_ is now known as riteo
warpme has quit []
simon-perretta-img has quit [Ping timeout: 480 seconds]
simon-perretta-img has joined #dri-devel
KDDLb has joined #dri-devel
KDDLb has left #dri-devel [#dri-devel]
KDDLb has joined #dri-devel
KDDLb has left #dri-devel [#dri-devel]
KDDLb has joined #dri-devel
Haaninjo has quit [Quit: Ex-Chat]
sima has quit [Ping timeout: 480 seconds]
KDDLB0 has joined #dri-devel
KDDLb has quit [Ping timeout: 480 seconds]
Duke`` has quit [Ping timeout: 480 seconds]
nick_ has joined #dri-devel
nick_ has quit []
Passage8775 has joined #dri-devel
<DavidHeidelberg>
karolherbst: up to you, I thought that these failing tests are important, if it's nothing serious, then I'm not the person who will use Clover+freedreno :D
<karolherbst>
looks like those are all image related anyway
<karolherbst>
might be real freedreno bugs even
<karolherbst>
like something busted with image arrays?