<Plagman>
it's running part of it on the gpu, but then runs on a single cpu thread for a long while and gives me ~10fps
<Plagman>
compared to 40 on CPU
Haaninjo has joined #dri-devel
coldfeet has quit [Remote host closed the connection]
simon-perretta-img has quit [Read error: Connection reset by peer]
simon-perretta-img has joined #dri-devel
simon-perretta-img has quit [Read error: Connection reset by peer]
konstantin_ has joined #dri-devel
konstantin is now known as Guest1852
konstantin_ is now known as konstantin
simon-perretta-img has joined #dri-devel
Guest1852 has quit [Ping timeout: 480 seconds]
pepp has quit [Ping timeout: 480 seconds]
simon-perretta-img has quit [Ping timeout: 480 seconds]
simon-perretta-img has joined #dri-devel
OftenTimeConsuming has quit [Remote host closed the connection]
OftenTimeConsuming has joined #dri-devel
fab has quit [Quit: fab]
moony has quit []
moony has joined #dri-devel
kts has quit [Ping timeout: 480 seconds]
sskras has quit []
<mareko>
Company: we provide a driver installer that installs Mesa on older distros of RHEL, SLES, and Ubuntu
<Company>
does that installer build its own llvm and Rust?
Duke`` has quit [Ping timeout: 480 seconds]
mljoke has joined #dri-devel
chloekek has quit [Remote host closed the connection]
sukuna has joined #dri-devel
sukuna1 has joined #dri-devel
sukuna has quit [Ping timeout: 480 seconds]
<karolherbst>
Plagman: is it any better with ROCm? Could also be just bad code, which is often the case with those AI/ML libs
<Plagman>
rocm instantly hangs my system so not really - i didn't get it working with it so far
<karolherbst>
:') sounds like rusticl is already better then
<Plagman>
bad code is likely, looking at a trace i see the actual gpu compute time is maybe 11ms total
<Plagman>
and all the other time might be spent in a slow readback
<karolherbst>
yeah.. the libs I've looked into often busy waited on the CPU or other crazy things
<Plagman>
seems like the opencl stuff was written for intel gpus primarily
<karolherbst>
ahh..
<karolherbst>
yeah, intel has kinda the best CL stack atm
<Plagman>
is there a way to force caching for allocations in rusticl somehow?
<karolherbst>
you mean like keeping a copy of the data in RAM?
<Plagman>
i'm still trying to trace the slow memmove()s
<Plagman>
more like the cache coherent flag on the mapping
<karolherbst>
ahh
riteo has quit [Remote host closed the connection]
<karolherbst>
if there is a special gallium flag I should set, that could help, but usually those things are kinda up to the driver otherwise
<karolherbst>
have you tried using zink?
<Plagman>
i tried, yeah - it runs one frame and then times out the gpu
<karolherbst>
mhh
<karolherbst>
are you using main or some release?
<Plagman>
it doesn't run directly on my host display because i'm using the amdgpu ddx so i had to point it to a gamescope display
<Plagman>
mesa is 24.0.5
<karolherbst>
do you have any more info on that program build failure btw? Not sure if "RUSTICL_DEBUG=program" already works on 24.0 or when I've added it, but often it's also helpful to run a build with asserts enabled to see why zink or other drivers are unhappy about things
<Plagman>
it seems non-fatal, so i'm guessing it's a test build to see if -cl-no-subgroup-ifp is supported
<karolherbst>
ahh
sima has quit [Ping timeout: 480 seconds]
<karolherbst>
maybe I should handle this flag then, if clang doesn't like it
riteo has joined #dri-devel
<airlied>
do we know what is ifp there?
<karolherbst>
independent forward progress
<karolherbst>
it's related to "CL_DEVICE_SUB_GROUP_INDEPENDENT_FORWARD_PROGRESS"
<Plagman>
fwiw trying to run that sample is real easy
<karolherbst>
yeah.. I could take a look tomorrow and see if it runs any better on iris or so
<Plagman>
from the face_detection_yunet of the repo above it's just `RUSTICL_ENABLE=radeonsi python demo.py`
<Plagman>
or building the cpp version works too
<Plagman>
the rest seems to just be working from arch packages of opencv/etc
<airlied>
ah CL_intel_subgroups related
<karolherbst>
yeah..
<karolherbst>
it's optional in CL 3.0 and I have no idea if I'm in the mood of wiring it up in gallium if nothing really needs it
<karolherbst>
so I'd just do nothing with that flag and prevent a compilation error
<airlied>
I wonder why intel added it, they must have some hw that can't do ifp
<karolherbst>
might be...
<karolherbst>
maybe I should ask Ben
<karolherbst>
uhh.. Ben Ashbaugh
<karolherbst>
Ben usually knows those things
<Plagman>
uhhh
<Plagman>
i'm guessing that finding that i'm in clCreateKernel() after breaking into the program once it's been running for a while is like.. bad
<Plagman>
right?
<Plagman>
like that's a pipeline build equivalent or whatever?
<karolherbst>
shouldn't matter
<karolherbst>
nah... clCreateKernel is run after all the shaders have been built
<Plagman>
ah ok
<karolherbst>
CL is a bit weird
<Plagman>
i know next to nothing about it so i thought it was a shader build
<karolherbst>
clCreateKernel is like.. creating an execution environment for a compiled entry point
<karolherbst>
clBuildProgram and clLinkProgram will generate the binaries in rusticl. a "cl_kernel" is more like a thing holding the kernel input parameters (like function parameters) and is the interface to launch code
<Plagman>
yeah ok
<Plagman>
i forgot to mention i had to edit the sample to use CL, a one-liner edit: [cv.dnn.DNN_BACKEND_OPENCV, cv.dnn.DNN_TARGET_OPENCL],
<karolherbst>
mhh.. might be untested
<Plagman>
it's cool that it works!
<karolherbst>
yeah.. that's the whole idea
<Plagman>
clvk looks very similar as well, so app-side being dumb seems likely
<karolherbst>
annoying that zink doesn't work then. I should take a look tomorrow at why zink breaks down
tobiasjakobi has joined #dri-devel
tobiasjakobi has quit [Remote host closed the connection]
<karolherbst>
Plagman: what is it using by default? CPU?
<Plagman>
yeah, seems like
<Plagman>
if the current compute runtime is any indication, maybe it'd be ~90fps on my 6900XT
<Plagman>
vs. the 40FPS it gets on my monster CPU going wide on all cores
<karolherbst>
mhh..
<karolherbst>
I'd hope for a bigger difference tbh
<karolherbst>
Plagman: how much is the GPU idle when running that stuff?
<Plagman>
it is a big cpu to be fair
<Plagman>
it's like 10ms of compute on the gpu then idle, usage doesn't go above 10% in umr
<Plagman>
that's how i'm theorizing 90fps if it wasn't idling
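[editor's note] The 90fps estimate can be reproduced from the figures quoted in the log. This is only a rough sketch: it assumes the ~11ms of GPU compute seen in the trace is the whole per-frame cost and that nothing else would limit throughput. The variable names are illustrative and not taken from any tool.

```python
# Figures quoted in the conversation; everything else is derived from them.
gpu_compute_ms = 11.0   # "actual gpu compute time is maybe 11ms total"
observed_fps = 10.0     # "gives me ~10fps" when running through rusticl

# Wall-clock time per frame at the observed frame rate.
observed_frame_ms = 1000.0 / observed_fps

# Frame rate if the GPU ran back to back with no idle gaps (assumption).
ideal_fps = 1000.0 / gpu_compute_ms

# Share of each frame the GPU is actually busy.
busy_fraction = gpu_compute_ms / observed_frame_ms

print(f"ideal: {ideal_fps:.1f} fps, GPU busy: {busy_fraction:.0%}")
```

The busy share comes out at about 11%, in line with the roughly 10% usage reported from umr, so nearly all of each frame is spent outside GPU compute.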
<karolherbst>
oof
<karolherbst>
I wonder if the main CPU thread is just busy all the time?
<karolherbst>
but anyway... could also be just terrible offloading