<karolherbst>
forget the benchmark, I want to see the simulation
<karolherbst>
I wonder if with rusticl it's faster than ROCm...
<karolherbst>
I think I have one of the GPUs listed there
<DottorLeo>
karolherbst: i asked you in the past whether rusticl can, in theory, use different gpus simultaneously, right?
<DottorLeo>
let's say nvidia+amd+intel igpu
<karolherbst>
yeah... though I think the memory model is a bit broken on this
<karolherbst>
not sure, but it e.g. worked with luxmark just fine
<karolherbst>
so if you see any issues there feel free to report it
<DottorLeo>
so that software could use ALL the compute units in a PC, CPU + all the gpus? :D
<karolherbst>
yeah
<karolherbst>
just
<karolherbst>
llvmpipe is slower than pocl :D
<karolherbst>
it got better, but there are still some issues left to resolve
<karolherbst>
llvmpipe is really bad at utilizing the CPU
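For reference, the "use every compute unit in the box" idea above is just standard OpenCL platform/device enumeration; the following is a minimal sketch (plain OpenCL host API, nothing rusticl-specific, error handling mostly omitted):

    /* list_devices.c - enumerate every OpenCL platform and device on the system.
     * Build (assuming an ICD loader is installed): cc list_devices.c -lOpenCL */
    #include <stdio.h>
    #include <CL/cl.h>

    int main(void)
    {
        cl_platform_id platforms[16];
        cl_uint num_platforms = 0;

        clGetPlatformIDs(16, platforms, &num_platforms);

        for (cl_uint p = 0; p < num_platforms; p++) {
            char name[256];
            clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME, sizeof(name), name, NULL);
            printf("platform %u: %s\n", p, name);

            cl_device_id devices[16];
            cl_uint num_devices = 0;
            /* CL_DEVICE_TYPE_ALL picks up CPU implementations (pocl, llvmpipe)
             * as well as every GPU */
            if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 16, devices,
                               &num_devices) != CL_SUCCESS)
                continue;

            for (cl_uint d = 0; d < num_devices; d++) {
                char dev_name[256];
                clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(dev_name),
                                dev_name, NULL);
                printf("  device %u: %s\n", d, dev_name);
            }
        }
        return 0;
    }

Note that devices from different platforms can't share a single cl_context, so an application like luxmark has to create one context per platform and split the work across them itself.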
<DottorLeo>
wow, you know that once rusticl is conformant on all the platforms it will be seen as the second rebirth of OpenCL? :D
<karolherbst>
maybe?
<karolherbst>
my goal is just to make _some_ compute API available to the linux desktop
<karolherbst>
as in: people can rely on it being functional
<karolherbst>
at this point, only Nvidia's and Intel's stacks are what I'd consider somewhat functional
<DottorLeo>
it's interesting that the author of FluidX3D used OpenCL instead of just CUDA, and not only for the multivendor support. He says that if done right, OpenCL on Nvidia is as good as CUDA
<karolherbst>
yeah, it is
<karolherbst>
most code is just bad
<karolherbst>
and most runtimes are
<karolherbst>
nvidia's CL impl is really the best so far
<karolherbst>
but they also have the compiler to back it up
<karolherbst>
I mean.. there are computationally heavy benchmarks where rusticl outperforms ROCm by 20%
<karolherbst>
it's a bit disappointing to be honest
<DottorLeo>
why?
<karolherbst>
I expected that a serious company like AMD would put more effort into this
<penguin42>
but there again I've got kernels where ROCm wins for me; so shrug
<DottorLeo>
Maybe rusticl will be used on some GPU compute farms instead of ROCm; i think it doesn't matter to the end user, only speed and correctness matter :)
<karolherbst>
penguin42: yeah.. but rusticl isn't optimized at all
<DottorLeo>
did you try it with blender?
<karolherbst>
blender dropped CL
<karolherbst>
so.. it's either CUDA or HIP
<karolherbst>
there is a HIP on CL implementation, but it's not ready for blender
<DottorLeo>
yeah, sorry, I meant the HIP implementation
<karolherbst>
but rusticl also still has huge issues so it's still gonna take a while
<DottorLeo>
and SYCL from intel?
<karolherbst>
it's progressing
<karolherbst>
the issue with SYCL from intel is that they produce invalid spir-v
<karolherbst>
a lot
<DottorLeo>
karolherbst: one last thing, when you merged the optional image support, did it also enable it on r600? it was one of the missing things in clover for those cards
<karolherbst>
ohh, images have been supported since day 1
<karolherbst>
the optional stuff are just more formats
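The distinction (baseline image support vs. the extra optional formats) is visible from the client side; a rough sketch using only standard OpenCL queries, where `ctx` is assumed to be a context that contains the device `dev`:

    /* Check whether a device exposes images at all, then count the image
     * formats the implementation offers; the "optional stuff" shows up as
     * additional entries in this list. */
    #include <stdio.h>
    #include <CL/cl.h>

    static void dump_image_support(cl_context ctx, cl_device_id dev)
    {
        cl_bool has_images = CL_FALSE;
        clGetDeviceInfo(dev, CL_DEVICE_IMAGE_SUPPORT, sizeof(has_images),
                        &has_images, NULL);
        if (!has_images) {
            printf("no image support\n");
            return;
        }

        cl_uint num_formats = 0;
        clGetSupportedImageFormats(ctx, CL_MEM_READ_WRITE, CL_MEM_OBJECT_IMAGE2D,
                                   0, NULL, &num_formats);
        printf("%u read/write 2D image formats\n", num_formats);
    }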
<DottorLeo>
yes, but when you merge a feature, is it enabled for all the supported platforms that use rusticl? Sorry, i'm trying to understand how it works when you add new stuff to rusticl :)
<karolherbst>
yeah
<karolherbst>
sometimes there are driver bits to it
<karolherbst>
but we try to be accurate in the features.txt file
<DottorLeo>
because @gerddie said on the r600 MR that image support was missing
<karolherbst>
doesn't list r600 yet, because it's broken
<karolherbst>
ehh.. should be fine
<karolherbst>
there is a bit missing for r600, I just don't have the hardware to test it
<penguin42>
karolherbst: If you need a test run on r600 I can do that for you, this <--- laptop has one
<DottorLeo>
i should have an old 5450 (cedar) to test it :D
<DottorLeo>
@illwieckz probably has all the R600 cards :D
<DottorLeo>
it's impressive
<penguin42>
'AMD Thames [Radeon HD 7550M/7570M/7650M]'
<karolherbst>
amazing.. Intel's CL stack ooms my system
<penguin42>
(Very oddly configured HP Elitebook I found in a 2nd hand shop; nice i7, 8G RAM, Radeon, every interface you can imagine, and a shit 1366x768 display...)
<karolherbst>
hashcat benchmarks in the most silly way though
<karolherbst>
so whatever
<karolherbst>
penguin42: yeah.. so somebody needs to implement the `get_compute_info` hook
* penguin42
tries to get himself past his existing patches first
<karolherbst>
the key to the compute info stuff is really calculating how many threads can be launched
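The launch limits such a hook ultimately has to report are what clients see through the standard work-group queries; a minimal sketch of those queries (plain OpenCL, with `dev` and `kernel` assumed to already exist):

    /* A work-group launch has to respect both the device-wide limit and the
     * per-kernel limit; the per-kernel value is usually lower once register
     * and shared-memory pressure are taken into account. */
    #include <stdio.h>
    #include <CL/cl.h>

    static void print_launch_limits(cl_device_id dev, cl_kernel kernel)
    {
        size_t dev_max_wg = 0;
        clGetDeviceInfo(dev, CL_DEVICE_MAX_WORK_GROUP_SIZE,
                        sizeof(dev_max_wg), &dev_max_wg, NULL);

        size_t kernel_max_wg = 0;
        clGetKernelWorkGroupInfo(kernel, dev, CL_KERNEL_WORK_GROUP_SIZE,
                                 sizeof(kernel_max_wg), &kernel_max_wg, NULL);

        printf("device max work-group size: %zu, this kernel: %zu\n",
               dev_max_wg, kernel_max_wg);
    }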
<penguin42>
karolherbst: Nora is asking for 'Btw, please add the CL_PLATFORM_HOST_TIMER_RESOLUTION and CL_PLATFORM_HOST_TIMER_RESOLUTION device info queries in api/device.rs' -- aren't those platform.rs queries?
<karolherbst>
yeah looks like CL_PLATFORM_HOST_TIMER_RESOLUTION is indeed a platform query
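For reference, CL_PLATFORM_HOST_TIMER_RESOLUTION is an OpenCL 2.1 platform query returning the host timer resolution in nanoseconds; a minimal sketch:

    #include <stdio.h>
    #include <CL/cl.h>

    static void print_host_timer_resolution(cl_platform_id platform)
    {
        cl_ulong res_ns = 0;  /* resolution in nanoseconds */
        if (clGetPlatformInfo(platform, CL_PLATFORM_HOST_TIMER_RESOLUTION,
                              sizeof(res_ns), &res_ns, NULL) == CL_SUCCESS)
            printf("host timer resolution: %llu ns\n",
                   (unsigned long long)res_ns);
    }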
<karolherbst>
jenatali: sooo.. I looked a bit into which LLVM passes actually help: EarlyCSEPass gives roughly a 10% cut in spir-v size, MergeFunctions roughly a 50% cut. A 10% cut is great, so enabling EarlyCSE is probably what we should do. However, MergeFunctions can generate function pointers, and the translator only lets us use those with SPV_INTEL_function_pointers
<karolherbst>
but a 50% reduction is kinda neat... though a lot of LLVM passes just generate random stuff we can't handle, so maybe it's better to rely on spirv-opt here instead :/
<jenatali>
I don't know that I see much point in running spirv-opt, vtn is pretty lightweight I feel like
<karolherbst>
it's more about reducing the size of the spir-v
<karolherbst>
like.. hashcat generates a 7MB spirv by default
<karolherbst>
for one hash function
<karolherbst>
but smaller spirv also means less time spent in clc_parse_spirv, which.. makes up a huge amount of the CPU overhead at that size
<karolherbst>
but smaller spirv also helps with the disk cache and everything
<karolherbst>
and I also suspect compilation gets quicker the earlier we drop massive amounts of code
<karolherbst>
but anyway...
<karolherbst>
would be cool to just be able to use MergeFunctions on the LLVM IR level
<karolherbst>
but... it generates function pointers :(
<karolherbst>
sometimes
<karolherbst>
another problem is that linking spirvs isn't cheap either :/ and even with a single spirv file we kinda have to do it, because... random nonsense
<karolherbst>
mhh GVN seems to also help a lot, nice
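A rough sketch of what running those passes could look like through the LLVM-C new-pass-manager entry point (LLVM 13+); the pipeline string uses the same pass names as `opt -passes=`, and this is only an illustration of the idea, not how rusticl actually wires it up:

    #include <stdio.h>
    #include <llvm-c/Core.h>
    #include <llvm-c/Error.h>
    #include <llvm-c/Transforms/PassBuilder.h>

    /* `mod` is an already-built LLVMModuleRef. early-cse and gvn are the
     * "safe" size cuts; appending ",mergefunc" gives the bigger reduction but
     * may introduce function pointers, which the SPIR-V translator only
     * accepts with SPV_INTEL_function_pointers. */
    static void shrink_module(LLVMModuleRef mod)
    {
        LLVMPassBuilderOptionsRef opts = LLVMCreatePassBuilderOptions();

        LLVMErrorRef err = LLVMRunPasses(mod, "early-cse,gvn", NULL, opts);
        if (err) {
            char *msg = LLVMGetErrorMessage(err);
            fprintf(stderr, "pass pipeline failed: %s\n", msg);
            LLVMDisposeErrorMessage(msg);
        }

        LLVMDisposePassBuilderOptions(opts);
    }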
<DemiMarie>
Is the compilation happening ahead of time or at runtime? If the latter, could LLVM IR be translated directly to NIR, without going through SPIR-V?
<karolherbst>
no
<karolherbst>
the point is to use spirv
<DemiMarie>
Ah
<DemiMarie>
sorry, I was missing some context