MrCooper has quit [Remote host closed the connection]
MrCooper has joined #dri-devel
xroumegue has quit [Ping timeout: 480 seconds]
MrCooper has quit [Remote host closed the connection]
ngcortes has quit [Read error: Connection reset by peer]
MrCooper has joined #dri-devel
xroumegue has joined #dri-devel
shashanks has joined #dri-devel
off^ has quit [Remote host closed the connection]
smiles_1111 has joined #dri-devel
shashanks has quit [Ping timeout: 480 seconds]
MrCooper has quit [Remote host closed the connection]
MrCooper has joined #dri-devel
akselmo has quit [Remote host closed the connection]
akselmo has joined #dri-devel
MrCooper has quit [Remote host closed the connection]
MrCooper has joined #dri-devel
donaldrobson has joined #dri-devel
jagan_ has quit [Remote host closed the connection]
penguin42 has joined #dri-devel
MrCooper has quit [Remote host closed the connection]
MrCooper has joined #dri-devel
MrCooper has quit [Remote host closed the connection]
jagan_ has joined #dri-devel
MrCooper has joined #dri-devel
smiles_1111 has quit [Read error: Connection reset by peer]
epony has quit [autokilled: This host violated network policy and has been banned. Mail support@oftc.net if you think this is in error. (2023-07-14 11:13:52)]
<pq>
emersion, Emil Velikov might be a better person than me to look at DRM device minor number stuff. I've never looked into that side.
<emersion>
me neither! :P
<emersion>
but noted
<pq>
IIRC Emil (xexaxo) has done things around libdrm device identification
Danyil has joined #dri-devel
<Danyil>
Hello people, I was wondering if there's any progress or info regarding hardware-accelerated GPU scheduling on Linux. I haven't seen any projects utilizing it.
<Danyil>
It makes a noticeable difference on the Windows desktop in terms of input latency, so I wanted to test it on Linux, but I haven't found any info about it...
donaldrobson has quit [Remote host closed the connection]
ahajda has joined #dri-devel
heat has joined #dri-devel
Danyil has quit [Read error: Connection reset by peer]
krushia has joined #dri-devel
<alyssa>
nir_opt_preamble why >_>
<alyssa>
I'm trying to refine this to reconstruct ifs when needed but not when not, and oof, this is starting to feel like homework for my algorithms class :p
donaldrobson has joined #dri-devel
bmodem has joined #dri-devel
camus has quit []
ahajda has quit [Remote host closed the connection]
<eric_engestrom>
alyssa: nice, thanks gfxstrand 🤗
kzd has joined #dri-devel
bmodem has quit [Quit: bmodem]
Haaninjo has joined #dri-devel
tursulin has quit [Ping timeout: 480 seconds]
swalker_ has joined #dri-devel
swalker_ is now known as Guest5976
Guest5953 has quit [Remote host closed the connection]
MrCooper has quit [Remote host closed the connection]
swalker__ has joined #dri-devel
kts has joined #dri-devel
MrCooper has joined #dri-devel
Guest5976 has quit [Ping timeout: 480 seconds]
fxkamd has joined #dri-devel
MrCooper has quit [Remote host closed the connection]
MrCooper has joined #dri-devel
tobiasjakobi has joined #dri-devel
tobiasjakobi has quit []
MrCooper has quit [Remote host closed the connection]
MrCooper has joined #dri-devel
benjamin1 has quit [Ping timeout: 480 seconds]
MrCooper has quit [Remote host closed the connection]
MrCooper has joined #dri-devel
benjamin1 has joined #dri-devel
benjamin1 has quit [Ping timeout: 480 seconds]
MrCooper has quit [Remote host closed the connection]
ezequielg has joined #dri-devel
MrCooper has joined #dri-devel
donaldrobson has quit [Remote host closed the connection]
kts has quit [Quit: Konversation terminated!]
apinheiro has quit [Ping timeout: 480 seconds]
apinheiro has joined #dri-devel
idr has joined #dri-devel
benjamin1 has joined #dri-devel
<karolherbst>
doesn't AMD hardware have an instruction to fetch the workgroup size?
<karolherbst>
so.. I have to redefine nir_intrinsic_load_workgroup_size to be the workgroup_size of the _current_ grid item, not just any grid item, so that the last_block (which might be smaller) can return its actual size
<karolherbst>
but radeonsi currently lowers it and just sets it to the normal block size, which is actually not what I need here
<karolherbst>
dschuermann: maybe you have ideas about what can be done here, and whether AMD hardware has some instructions to deal with last_block stuff?
<mareko>
karolherbst: the shader doesn't have that information
MrCooper has quit [Remote host closed the connection]
<karolherbst>
lowering that could get pretty messy if you have to figure out inside the shader if you are one of the last blocks
<karolherbst>
because that's a thing in each dimension
<mareko>
the last block feature was not meant to be used by frontends
<karolherbst>
well.. I need it for CL
<karolherbst>
and I already have it working on llvmpipe and radeonsi, just that the workgroup_size reported is wrong
<karolherbst>
on radeonsi that is
cmichael has quit [Quit: Leaving]
MrCooper has joined #dri-devel
<alyssa>
what.. was it for?
<karolherbst>
anyway, all of that nonsense is lowered in the frontend (and the optimized paths, as today, are taken if last_block is disabled), so it's zero cost if it's not used
<alyssa>
if not frontends?
<karolherbst>
_but_
<mareko>
driver blits
swalker__ has quit [Remote host closed the connection]
<karolherbst>
I rely on drivers to report proper workgroup sizes
<alyssa>
okie
<karolherbst>
well.. not anymore, because I'm planning to use it in a frontend
<karolherbst>
CL is just a bit annoying with this feature, as it has CLC queries to get the _current_ and the _enqueued_ workgroup size (and other things)
<karolherbst>
so I need both; just the enqueued one can be perfectly lowered in the frontend
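A minimal sketch of the two CLC queries being contrasted, assuming OpenCL C 2.0+ with non-uniform work-groups enabled (get_local_size and get_enqueued_local_size are the real built-ins; the kernel name and output layout are illustrative):

    kernel void wg_sizes(global uint *out)
    {
       if (get_global_id(0) == 0) {
          out[0] = get_local_size(0);          /* current group's size; smaller in a trailing group */
          out[1] = get_enqueued_local_size(0); /* size passed to clEnqueueNDRangeKernel */
       }
    }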
<mareko>
you basically have to lower it except that you don't have to put the whole shader into a conditional block
<karolherbst>
lower what?
<mareko>
the last block
<karolherbst>
I sure won't lower it
<karolherbst>
well.. if radeonsi needs it lowered, it's radeonsi's business
<karolherbst>
but I don't see why the frontend should lower it
<mareko>
whatever
<karolherbst>
there is hardware supporting it natively, so there is that
<alyssa>
karolherbst: tbh I'm on team "rusticl lowers it"
<karolherbst>
alyssa: last_block?
<alyssa>
yeah
<karolherbst>
why
<alyssa>
behind a compute cap
<karolherbst>
ahh
<alyssa>
because there's piles of hw that doesn't support it natively
<alyssa>
and it's not a thing in gl
<alyssa>
so either it's 1 lowering call in rusticl or N calls in every gallium driver that wants cl
<karolherbst>
it's an optional feature though
<alyssa>
in that case, compute cap and don't advertise on radeonsi?
<karolherbst>
well.. the hardware can actually do it
djbw has quit [Read error: Connection reset by peer]
<mareko>
"can"
<karolherbst>
so the restriction on AMD hardware is that there are no proper interfaces for those system values?
<mareko>
it wouldn't be difficult to lower it in radeonsi
<karolherbst>
I mean, lowering the system value is a different story than lowering the entire feature
<alyssa>
lowering the whole feature is just "round up and put the shader in a big conditional", right?
<karolherbst>
no
<mareko>
yes
<karolherbst>
there is more to it
* alyssa
watches rock paper scissors rematch
<karolherbst>
you have to calculate the workgroup size according to the disabled work items
<karolherbst>
so a simple x * y * z thing won't do
<DemiMarie>
Danyil: I don’t believe any of the existing GPU drivers for Linux use firmware scheduling. The in-development Xe and Asahi drivers do use it.
<karolherbst>
also the local group id has to be in order and everything
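(A concrete illustration of the above: enqueue a global size of 100 with a local size of 32 and the grid rounds up to four groups of 32, 32, 32, and 4 work items; the last group has to report size 4, get_local_id() still has to be dense in [0, 4), and the same remainder handling applies independently in y and z.)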
<mareko>
need "load_last_workgroup_size" in radeonsi (easy), then a hw-agnostic NIR pass which lowers it
<mareko>
it seems easy
<karolherbst>
yeah, something like that I guess
djbw has joined #dri-devel
<karolherbst>
thing is, it's just quite a bit of code and it would impact everything using nir_intrinsic_load_workgroup_size where nir_intrinsic_load_enqueued_workgroup_size isn't a proper replacement
<karolherbst>
but if there is enough hardware which actually doesn't have either of those it's getting a bit messy and I might have to redesign things
<mareko>
messy how? all drivers just need load_last_wg_size and the NIR pass can do the rest
<karolherbst>
yeah.. and on some drivers wg_id also explodes into more code
<karolherbst>
but I guess if there is no better alternative that's what needs to happen
<karolherbst>
I just wish drivers/hardware had a native system value for that
<mareko>
we don't even have a native value for num_wg
<mareko>
and wg_size
<karolherbst>
yeah, but that's cheap to put into a ubo
<mareko>
not cheap if the shader is tiny
<karolherbst>
just anything depending on the current block id is.. well.. I wish it were in hardware
<karolherbst>
or push constants or whatever drivers prefer to use there
<mareko>
user data SGPRs
<mareko>
any load would be slower
<karolherbst>
yeah.. I just forget that most hardware doesn't have UBOs with GPR access speed
<karolherbst>
I'd have to play around on nvidia as well with this feature, but I think nvidia has the stuff for it
<karolherbst>
iris will be interesting to figure out
<alyssa>
karolherbst: where are we at with deleting clover
<karolherbst>
somebody needs to figure out r600
<alyssa>
ah..
<mareko>
r600 is not compute hw
<alyssa>
there, official word from AMD, r600 is not compute hw
* alyssa
deletes clover
<karolherbst>
:D
<alyssa>
mareko: that's ok, mali isn't graphics hw ;)
<karolherbst>
there are apparently still users
<karolherbst>
and it kinda works, just somebody needs to figure out the remaining issues
<karolherbst>
probably a week of work? dunno
<mareko>
we should actually emulate the last block completely
<mareko>
in radeonsi
<karolherbst>
why?
<karolherbst>
seems to work well enough at least, or rather the shader header thing does what I need
<mareko>
there may be a perf penalty in some hw
<karolherbst>
mhhh
<karolherbst>
I'd kinda prefer to have more data on that
<mareko>
it will be revealed eventually, now is not the time
<karolherbst>
I think for now it's probably fine to use whatever there is until somebody has time to actually look into it
<karolherbst>
fair enough
<karolherbst>
the annoying part is simply that in CL, if you explicitly compile a CL 2.0 or CL 3.0 kernel, the compiler has to assume the last_block feature will be used unless it's disabled by the application, so I'd kinda prefer to not have to add overhead if it's not actually needed
<mareko>
it's just isub, ieq, bcsel per dimension, and other-than-radeonsi drivers also need UBO loads
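A hw-agnostic sketch of that recipe using the usual nir_builder helpers; it is schematic rather than compile-tested against a particular Mesa revision, and nir_load_last_workgroup_size() stands in for the proposed (not yet existing) load_last_workgroup_size intrinsic:

    /* Build the *current* workgroup size: per dimension, pick the trailing
     * block's size iff this is the last workgroup in that dimension. */
    static nir_ssa_def *
    build_current_workgroup_size(nir_builder *b)
    {
       nir_ssa_def *wg_id   = nir_load_workgroup_id(b);
       nir_ssa_def *num_wgs = nir_load_num_workgroups(b);
       nir_ssa_def *enq     = nir_load_workgroup_size(b);      /* enqueued size */
       nir_ssa_def *last    = nir_load_last_workgroup_size(b); /* proposed intrinsic */
       nir_ssa_def *comps[3];

       for (unsigned d = 0; d < 3; d++) {
          /* isub/ieq/bcsel per dimension, as described above */
          nir_ssa_def *is_last =
             nir_ieq(b, nir_channel(b, wg_id, d),
                        nir_isub(b, nir_channel(b, num_wgs, d),
                                    nir_imm_int(b, 1)));
          comps[d] = nir_bcsel(b, is_last, nir_channel(b, last, d),
                                           nir_channel(b, enq, d));
       }
       return nir_vec3(b, comps[0], comps[1], comps[2]);
    }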
<karolherbst>
yeah.. I guess it's not that bad given how little num_workgroups is used (probably)
<karolherbst>
I can add code for it to lower_system_values, that's not the big problem here
<karolherbst>
just have to rethink all the lowering here
<karolherbst>
but anyway, that's for next week.
rasterman has quit [Quit: Gettin' stinky!]
tzimmermann has quit [Quit: Leaving]
apinheiro has quit [Quit: Leaving]
lynxeye has quit [Quit: Leaving.]
jagan_ has quit [Remote host closed the connection]
tobiasjakobi has joined #dri-devel
tobiasjakobi has quit [Remote host closed the connection]
ngcortes has joined #dri-devel
AndrewR has joined #dri-devel
<AndrewR>
...it seems I compiled embree/luxcorerender for 32-bit Slackware. It even outputs something with
<karolherbst>
but yeah.. such an output is expected on the CPU if it's not fast enough
<karolherbst>
the quality of the image increases over time
<AndrewR>
karolherbst, for some reason (with both clover and rusticl active) Luxcorerender does not print opencl info w/o that variable set ...
<karolherbst>
it's a bit buggy...
<karolherbst>
luxcore doesn't query the devices correctly and then also has internal bugs, it's a bit of a problem... just keep only one impl active and it mostly works
<AndrewR>
karolherbst, yeah .... but for now workaround seems to work for me :)
<karolherbst>
ohh right.. if clover doesn't advertise devices it fails :)
Leopold_ has quit [Remote host closed the connection]
cambrian_invader has joined #dri-devel
Leopold_ has joined #dri-devel
alyssa has left #dri-devel [#dri-devel]
heat has quit [Remote host closed the connection]
heat has joined #dri-devel
i509vcb has joined #dri-devel
f11f12 has joined #dri-devel
sima has quit [Ping timeout: 480 seconds]
Daanct12 has joined #dri-devel
Danct12 has quit [Ping timeout: 480 seconds]
Duke`` has quit [Ping timeout: 480 seconds]
pzanoni has quit [Quit: Coyote finally caught me]
pzanoni has joined #dri-devel
f11f12 has quit [Quit: Leaving]
Cyrinux94 has quit []
Cyrinux94 has joined #dri-devel
pzanoni_ has joined #dri-devel
fxkamd has quit []
heat_ has joined #dri-devel
heat has quit [Read error: No route to host]
dmitz has left #dri-devel [#dri-devel]
heat_ has quit [Remote host closed the connection]
simon-perretta-img has quit [Quit: Leaving]
egbert has quit [Ping timeout: 480 seconds]
<cambrian_invader>
sanity check: I'm getting EACCES from DRM_IOCTL_GEM_OPEN because it doesn't have DRM_RENDER_ALLOW set and it's being called with the gpu_fd from lima_bo_import
<cambrian_invader>
the obvious fix is to use kms_fd instead
egbert has joined #dri-devel
<cambrian_invader>
but shouldn't this have been caught by anyone who runs lima?
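A minimal sketch of the failure mode, with assumed device paths and a placeholder flink name: DRM_IOCTL_GEM_OPEN is not marked DRM_RENDER_ALLOW in the kernel, so it fails with EACCES on a render node and must be issued on the primary (KMS) node instead:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <drm/drm.h>   /* kernel uapi; may be <libdrm/drm.h> on some setups */

    int main(void)
    {
       int render_fd = open("/dev/dri/renderD128", O_RDWR); /* render node (assumed path) */
       int kms_fd    = open("/dev/dri/card0", O_RDWR);      /* primary node (assumed path) */
       struct drm_gem_open args = { .name = 1 };            /* placeholder flink name */

       if (render_fd >= 0 && ioctl(render_fd, DRM_IOCTL_GEM_OPEN, &args) < 0)
          perror("GEM_OPEN on render node");   /* expected: EACCES */
       if (kms_fd >= 0 && ioctl(kms_fd, DRM_IOCTL_GEM_OPEN, &args) < 0)
          perror("GEM_OPEN on primary node");  /* ENOENT unless the name exists */
       return 0;
    }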
Ultrasauce has joined #dri-devel
omioceodeotm^ has joined #dri-devel
sauce has quit [Ping timeout: 480 seconds]
edt has joined #dri-devel
<edt>
Just installed an am5 mb with an rx7700 and an rx6600xt gpu. When looking in nvtop, I often see both gpus (builtin and 6600) active. This happens in both games and video playback. I am using mesa 21.1.3 on linux 6.4.3. How is mesa utilizing the builtin gpu? and for what?
<edt>
thats a ryzen 7700
<penguin42>
edt: I don't know the answer, but does radeontop allow you to see activity on each?
<edt>
never figured out how to get radeontop to show the builtin gpu - nvtop generally gives better info
* penguin42
didn't realise nvtop did anything on Radeon
<penguin42>
edt: I see radeontop has an option -b for bus and -p for path
* penguin42
only has the one card
<edt>
-p /dev/dri/card0 dies, card1 shows the rx6600xt
<edt>
card0 is the builtin
<edt>
I suggest you try nvtop - it surprised me how well it works with radeon (radv)
vliaskov has quit [Remote host closed the connection]
<edt>
apparently nv in nvtop stands for Neat Videocard
<penguin42>
haha
<edt>
and it works for amd, intel and nvidia
<penguin42>
hmm very pretty
<HdkR>
nvtop's name changed once it gained support for more than NVIDIA
<HdkR>
It also supports Adreno :)
<edt>
It's interesting that it shows both gpus (the builtin on the ryzen 7700 (2 CUs) and the rx6600xt (32 CUs)) as used. I'd like to understand what is happening.
<edt>
btw I use two displays, both are connected to the rx6600xt
<edt>
one via DP and the other via HDMI
<penguin42>
edt: So do you ever see the builtin doing anything on nvtop?
<edt>
I just see the gpu & memory traces moving up & down (like the other gpu, but not matching it). It's probably good. An extra 2 CUs probably helps in some cases. I'd just like to understand how/why this happens. From what I read, 'crossfire' does not work on linux. Looks like something like it does, though...
<edt>
searching a bit shows crossfire was replaced by XDMA, and there are patches for xdma in linux 6.3. They are for AMD-Xilinx Alveo cards. Wonder if these patches are also helping/working here???