<karolherbst>
something is broken and I have no idea what :)
co1umbarius has joined #dri-devel
<ecm`>
libEGL debug: EGL user error 0x300c (EGL_BAD_PARAMETER) in eglGetPlatformDisplay is the new error now
columbarius has quit [Ping timeout: 480 seconds]
benjaminl has joined #dri-devel
benjamin1 has quit [Ping timeout: 480 seconds]
ecm` has quit [Ping timeout: 480 seconds]
ecm has quit [Ping timeout: 480 seconds]
<airlied>
mlankhorst: care to dequeue drm-misc-fixes?
heat has quit [Read error: No route to host]
heat has joined #dri-devel
avoidr_ has joined #dri-devel
avoidr has quit [Ping timeout: 480 seconds]
yuq825 has joined #dri-devel
benjaminl has quit [Ping timeout: 480 seconds]
benjaminl has joined #dri-devel
benjaminl has quit [Ping timeout: 480 seconds]
Mal_ has joined #dri-devel
jewins has joined #dri-devel
Mal__ has joined #dri-devel
heat has quit [Ping timeout: 480 seconds]
benjaminl has joined #dri-devel
Mal_ has quit [Ping timeout: 480 seconds]
Mal__ has quit [Ping timeout: 480 seconds]
benjaminl has quit [Ping timeout: 480 seconds]
the_sea_peoples has quit [Quit: WeeChat 2.8]
the_sea_peoples has joined #dri-devel
Leopold_ has quit [Remote host closed the connection]
Leopold_ has joined #dri-devel
oneforall2 has quit [Remote host closed the connection]
oneforall2 has joined #dri-devel
dviola has quit [Ping timeout: 480 seconds]
kts has joined #dri-devel
Kayden has joined #dri-devel
Leopold_ has quit [Remote host closed the connection]
Leopold_ has joined #dri-devel
Mal__ has joined #dri-devel
benjaminl has joined #dri-devel
benjaminl has quit [Ping timeout: 480 seconds]
mbrost has quit [Ping timeout: 480 seconds]
bmodem has joined #dri-devel
Mal__ has quit [Ping timeout: 480 seconds]
Mal__ has joined #dri-devel
Company has joined #dri-devel
benjaminl has joined #dri-devel
dviola has joined #dri-devel
benjaminl has quit [Ping timeout: 480 seconds]
kts has quit [Ping timeout: 480 seconds]
YuGiOhJCJ has joined #dri-devel
sima has joined #dri-devel
fab has joined #dri-devel
Mal__ has quit [Ping timeout: 480 seconds]
aravind has joined #dri-devel
Mal__ has joined #dri-devel
jewins has quit [Ping timeout: 480 seconds]
kzd has quit [Ping timeout: 480 seconds]
ced117 has quit [Ping timeout: 480 seconds]
tzimmermann has joined #dri-devel
Mal__ has quit [Read error: Connection reset by peer]
benjaminl has joined #dri-devel
benjaminl has quit [Ping timeout: 480 seconds]
alanc has quit [Remote host closed the connection]
BobBeck is now known as Guest3236
BobBeck has joined #dri-devel
gerddie3 has joined #dri-devel
alanc has joined #dri-devel
fab has quit [Ping timeout: 480 seconds]
robobub_ has quit []
frankbinns has quit [Remote host closed the connection]
K`den has joined #dri-devel
Kayden has quit [Read error: Connection reset by peer]
K`den has quit []
K`den has joined #dri-devel
K`den is now known as Kayden
sghuge has quit [Remote host closed the connection]
sghuge has joined #dri-devel
pochu has joined #dri-devel
benjaminl has joined #dri-devel
Mal__ has joined #dri-devel
benjaminl has quit [Ping timeout: 480 seconds]
Mal__ has quit []
bgs has joined #dri-devel
frankbinns has joined #dri-devel
fab has joined #dri-devel
jkrzyszt has joined #dri-devel
rsalvaterra has quit []
rsalvaterra has joined #dri-devel
rasterman has joined #dri-devel
benjaminl has joined #dri-devel
<MrCooper>
AFAICT drm_syncobj fds can't be polled, can they?
<emersion>
if you mean poll(), no. i have a patch for that but it needs an IGT
<MrCooper>
right, thanks
benjaminl has quit [Ping timeout: 480 seconds]
lynxeye has joined #dri-devel
tursulin has joined #dri-devel
<RAOF>
Ok, amdgpu. Why are you allocating a buffer with tiling mode incompatible with scanout when I ask for a GBM_BO_USE_RENDERING | GBM_BO_USE_SCANOUT surface?!
<RAOF>
What am I doing differently to the working case?!
<MrCooper>
sounds like a radeonsi bug, it should pick a scanout capable modifier with GBM_BO_USE_SCANOUT
benjaminl has joined #dri-devel
<emersion>
how do you figure out that it's not scanout capable?
<RAOF>
Because when it's displayed it's garbled in a nice blocky tiling fashion.
<RAOF>
It's definitely the right device; I'm only opening a single drm node.
<MrCooper>
that sounds like an amdgpu kernel bug then; if the modifier isn't scanout capable, it should refuse to create a KMS FB for it
<emersion>
RAOF: are you stripping an explicit modifier by any chance?
<RAOF>
Or maybe it's the other way around? Maybe EGL is confused and it's rendering to it as if it's tiled?
<emersion>
ie, allocating with with_modifiers(), then importing it without passing the modifier?
<emersion>
amdgpu should reject buffers it cannot scanout, in theory
<RAOF>
emersion: Nope! I'm deliberately using non-modifiers path, (because there aren't any supported modifiers on amdgpu, on at least this card).
<emersion>
ok, GFX8-
<RAOF>
It might still be a bug on my end; a different branch (with significantly different code flow) does work, but I can't see any difference in the way I'm allocating the gbm_surface, nor in the way I'm using EGL.
<RAOF>
And none of the debugging I've tried has seen any differences, and it's difficult to introspect this state.
benjaminl has quit [Ping timeout: 480 seconds]
<RAOF>
If there's any magical MESA_DEBUG environment that will make some of these decisions more legible that'd be awesome 😐
AndroUser2 has quit [Remote host closed the connection]
<MrCooper>
AMD_DEBUG might be more relevant here, maybe check AMD_DEBUG=help and try some of those which sound related
AndroUser2 has joined #dri-devel
<MrCooper>
lynxeye: interesting plot twist on mesa#8729 :)
<lynxeye>
MrCooper: He, sorry about that, but I hadn't seen this discussion before you linked to it in here.
<MrCooper>
no worries, happens to me all the time
AndroUser2 has quit [Remote host closed the connection]
AndroUser2 has joined #dri-devel
benjaminl has joined #dri-devel
benjaminl has quit [Ping timeout: 480 seconds]
swivel has quit [Remote host closed the connection]
swivel has joined #dri-devel
jfalempe_ is now known as jfalempe
swalker__ has joined #dri-devel
avoidr_ has quit []
avoidr has joined #dri-devel
benjaminl has joined #dri-devel
AndroUser2 has quit [Remote host closed the connection]
AndroUser2 has joined #dri-devel
benjaminl has quit [Ping timeout: 480 seconds]
djbw_ has quit [Read error: Connection reset by peer]
<ishitatsuyuki>
i'm new to drm/ttm, but I wonder why the wait-wound business is needed instead of e.g. sorting locks by their pointer or other ID?
benjaminl has joined #dri-devel
benjaminl has quit [Ping timeout: 480 seconds]
bmodem1 has joined #dri-devel
bmodem has quit [Ping timeout: 480 seconds]
Company has quit [Read error: Connection reset by peer]
<airlied>
ishitatsuyuki: because sorting is expensive usually
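A rough sketch of the alternative ishitatsuyuki is asking about, acquiring a set of locks in one global order so that no two callers can deadlock on each other. It is a userspace Python toy, not kernel code: `lock_all_sorted` and `unlock_all` are hypothetical helpers, and `id()` stands in for "pointer or other ID". It assumes the whole set of locks is known before the first one is taken.

```python
import threading

def lock_all_sorted(locks):
    """Acquire every lock in one global order (by id()) and return that order."""
    by_id = {id(lock): lock for lock in locks}       # dedupe repeated locks
    ordered = [by_id[key] for key in sorted(by_id)]  # the sort is a per-call cost
    for lock in ordered:
        lock.acquire()
    return ordered

def unlock_all(ordered):
    """Release in reverse acquisition order."""
    for lock in reversed(ordered):
        lock.release()

# Two callers ask for the same pair of locks in opposite orders.
a, b = threading.Lock(), threading.Lock()
first = lock_all_sorted([a, b])
unlock_all(first)
second = lock_all_sorted([b, a])
unlock_all(second)

# Both callers took the locks in the same order, so neither could hold one
# lock while waiting for the other in the opposite order.
same_order = [id(lock) for lock in first] == [id(lock) for lock in second]
```

The sort runs on every acquisition and needs the complete list up front, which is the cost airlied points at.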
<karolherbst>
anybody ever thought about enforcing DRM API locking rules via `WARN_ON(spin_is_locked(lock))`? at least the dma-fence API looks very hard to actually use correctly and there seems to be plenty of code around that just doesn't care properly about locks
<airlied>
sima likely had
<airlied>
has
<karolherbst>
also.. I'm convinced that `dma_fence_is_signaled` has to go
<karolherbst>
(or to properly lock access to fence->flags)
<karolherbst>
dma-fence kinda smells too much like "we outsmart data races by using atomics"
<airlied>
sima: ^
<lynxeye>
karolherbst: Not a fan of the WARN_ON way to do this, but lockdep_assert_held is really useful.
vliaskov has joined #dri-devel
<karolherbst>
yeah.. so that's basically the same just triggers when lockdep is enabled, right?
<karolherbst>
or is it checking for deps?
<karolherbst>
because I don't see why lock dependencies matter here at all, it's just enforcing what the API states
<karolherbst>
*it should
<lynxeye>
karolherbst: Yea, triggers with lockdep and in contrast to spin_is_locked it actually verifies that it's your thread that has the lock and not someone else.
<karolherbst>
ahhh
<karolherbst>
okay
<karolherbst>
I expect this spamming warnings all over the place, so having it behind lockdep might be better here anyway
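The difference lynxeye describes can be modelled in a few lines. This is a userspace Python toy under stated assumptions, not the kernel implementation: `OwnedLock` is a hypothetical class, `is_locked()` plays the role of `spin_is_locked()` and `held_by_me()` plays the role of the check behind `lockdep_assert_held()`.

```python
import threading

class OwnedLock:
    """Toy lock that remembers which thread holds it (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None

    def acquire(self):
        self._lock.acquire()
        self._owner = threading.get_ident()

    def release(self):
        self._owner = None
        self._lock.release()

    def is_locked(self):
        # Like spin_is_locked(): true if *anyone* holds the lock.
        return self._lock.locked()

    def held_by_me(self):
        # Like the lockdep check: true only if the calling thread holds it.
        return self._owner == threading.get_ident()

lock = OwnedLock()
seen = {}

def other_thread():
    # This thread never took the lock, yet is_locked() still reports True.
    seen["is_locked"] = lock.is_locked()
    seen["held_by_me"] = lock.held_by_me()

lock.acquire()
worker = threading.Thread(target=other_thread)
worker.start()
worker.join()
lock.release()
```

A check built on `is_locked()` would have let `other_thread` through, while the ownership check catches it.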
benjaminl has joined #dri-devel
benjaminl has quit [Ping timeout: 480 seconds]
Haaninjo has joined #dri-devel
<cwabbott>
jenatali: wow, that spec seems to require some real driver heroics, especially around suspend/resume
<cwabbott>
implementing suspend/resume was already painful enough in turnip after they added it to Vulkan dynamic rendering to match DX12
alyssa has joined #dri-devel
<alyssa>
cwabbott: is now a good time to invoke the axiom of QDS?
<cwabbott>
now it sounds like they want the driver to compute tile layouts at submit time, because you can't compute the tile layout until you know all of the render passes you want to merge
<cwabbott>
alyssa: uhh, what is that?
benjaminl has joined #dri-devel
Haaninjo has quit [Quit: Ex-Chat]
<alyssa>
"now it sounds like they want the driver to compute tile layouts at submit time"
<alyssa>
IDK about Qualcomm but this would be extra spicy on Apple (and maybe Mali) because the layouts get baked into the fragment shaders
<alyssa>
what's that? you want even more FS prologs + FS epilogs?
benjaminl has quit [Ping timeout: 480 seconds]
<alyssa>
and somehow want to defer shader linking until submit time instead of just draw time?
smiles_1111 has quit [Remote host closed the connection]
<alyssa>
well, if you insist! (-:
smiles_1111 has joined #dri-devel
<alyssa>
(might be possible to push offsets as a uniform, but still.)
kxkamil has quit []
<mlankhorst>
airlied: sorry about that!
AndroUser2 has quit [Remote host closed the connection]
AndroUser2 has joined #dri-devel
lemonzest has quit [Quit: WeeChat 3.6]
benjaminl has joined #dri-devel
lemonzest has joined #dri-devel
benjaminl has quit [Ping timeout: 480 seconds]
kxkamil has joined #dri-devel
fab has quit [Quit: fab]
<jenatali>
cwabbott: Yeah, that's the impression that I get, but QC was okay with it so 🤷
<alyssa>
jenatali: so far i've not been impressed with QC's software (-:
<jenatali>
Yeah
benjaminl has joined #dri-devel
yuq825 has left #dri-devel [#dri-devel]
* alyssa
writes optimization passes Just For Fun because it's Friday and she did real work all week
<alyssa>
Do we have an easy way to find the first unconditional block executed after an instruction? Shouldn't be too hard to walk the cf list
<karolherbst>
alyssa: are you bored?
<alyssa>
karolherbst: tired
<karolherbst>
mhhh
<karolherbst>
I still have my subgroup MR, but you already reviewed quite a bit of it
<alyssa>
oh I can look at that today
<karolherbst>
but it also changed quite a bit
<karolherbst>
cool
<alyssa>
I feel like what I want is something dominance related maybe?
<alyssa>
although maybe not even
<karolherbst>
soooo
<karolherbst>
I have a fun optimization we need
<karolherbst>
but it's also quite a bit of work
<karolherbst>
but probably also fun
<karolherbst>
ever looked into loop merging?
<alyssa>
uh oh
<alyssa>
I already know what fun optimization I'm writing :-p
<karolherbst>
like merging the inner loop with the outer one
<alyssa>
i know better than to write loop opts
<karolherbst>
so threads taking longer in the inner loop don't stall threads waiting on the next outer iteration
benjaminl has quit [Ping timeout: 480 seconds]
alyssa has left #dri-devel [#dri-devel]
<karolherbst>
:D
kts has joined #dri-devel
<HdkR>
That sounds like a dependency tracking hellscape
<karolherbst>
it's what nvidia is doing
<karolherbst>
HdkR: but it's actually not that hard, you just turn the inner loop into predicated blocks
<karolherbst>
and build a little state machine deciding what outer+inner loop iteration the thread is at
<karolherbst>
and decouple threads like this
<karolherbst>
it's kinda fun from a concept perspective
camus has quit [Ping timeout: 480 seconds]
pochu has quit [Quit: leaving]
<karolherbst>
HdkR: it also helps with minimizing c/r stack usage in shaders
<HdkR>
I see the improvements. Sounds painful :D
<karolherbst>
:D
<karolherbst>
we have to do it though
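A minimal sketch of the loop merging karolherbst describes, in plain Python rather than NIR: the inner loop becomes a predicated block and an `(i, j)` pair acts as the small state machine. The function names and the cost model are assumptions made for illustration and are not taken from nvidia's compiler or any Mesa pass. The cost functions count inner-body slots only and ignore bookkeeping steps.

```python
def nested(trips):
    """Reference: classic nested loops; returns the visited (outer, inner) pairs."""
    visited = []
    for i, count in enumerate(trips):
        for j in range(count):
            visited.append((i, j))
    return visited

def merged(trips):
    """Same work as a single loop driven by an (i, j) state machine."""
    visited = []
    i, j = 0, 0
    while i < len(trips):
        if j < trips[i]:
            # Predicated inner-loop body.
            visited.append((i, j))
            j += 1
        else:
            # Predicated outer-loop step: move to the next outer iteration.
            i += 1
            j = 0
    return visited

def lockstep_cost_nested(lanes):
    # Lanes reconverge after every inner loop, so each outer iteration
    # costs as much as its slowest lane.
    return sum(max(trips[i] for trips in lanes) for i in range(len(lanes[0])))

def lockstep_cost_merged(lanes):
    # Each lane tracks its own state, so the total is set by the lane
    # with the most work overall.
    return max(sum(trips) for trips in lanes)

# Two lanes whose long inner loops fall in different outer iterations.
lanes = [[4, 1], [1, 4]]
nested_cost = lockstep_cost_nested(lanes)
merged_cost = lockstep_cost_merged(lanes)
```

In this toy model a lane that finishes its inner loop early starts the next outer iteration immediately instead of idling until the slowest lane catches up.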
AndroUser2 has quit [Remote host closed the connection]
AndroUser2 has joined #dri-devel
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
jkrzyszt has quit [Ping timeout: 480 seconds]
benjaminl has joined #dri-devel
<jenatali>
Have to?
benjaminl has quit [Ping timeout: 480 seconds]
alyssa has joined #dri-devel
<alyssa>
karolherbst: ...Lol
<alyssa>
I googled loop merging and the result is a presentation from a prof at my school :-p
* alyssa
recognized the name
<karolherbst>
:D
<karolherbst>
it's a sign
<karolherbst>
now you have to do it, it's the law
<karolherbst>
jenatali: well.. for getting more perf I mean
<jenatali>
Got it
<karolherbst>
it basically leads to threads getting diverged less often and you even need to converge them less
<alyssa>
karolherbst: The school I graduated from and am slowly recovering mentally from? Indeed a sign that I should not write the nir pass :-D
<karolherbst>
:D
<karolherbst>
oh well.. guess I'll have to do it sooner or later as it's more critical for compute anyway
elongbug has joined #dri-devel
jkrzyszt has joined #dri-devel
jkrzyszt has quit [Remote host closed the connection]
<sima>
(not yet enough oxygen in the brain after work out I guess)
<karolherbst>
ehhh.. yes, but I also found another nouveau bug I think...
<sima>
unless you do a weak reference protected by a lock or so, if you need locking to fix a use-after-free your design is very cursed
<karolherbst>
the fence lock needs to be taken before calling into dma_fence_signal_locked, right?
<karolherbst>
nouveau design here is a little cursed anyway
<karolherbst>
anyway, my point was rather, that those interfaces are hard to use correctly and we should at least try to warn on certain patterns violating API contracts
aravind has quit [Ping timeout: 480 seconds]
<karolherbst>
I look at this dma-fence code and it immediately rings "people try to outsmart locking" bells
<karolherbst>
and what nouveau seems to be doing is having a global fence list lock and taking that instead of the fences' own locks
<gfxstrand>
uh oh...
<sima>
karolherbst, it's maybe just badly documented, but we assume your driver can cope with concurrent calls to this
<gfxstrand>
Don't try to outsmart the locking. That will not go well for you.
<karolherbst>
well.. dma_fence_signal_locked states "Unlike dma_fence_signal(), this function must be called with &dma_fence.lock held" which nouveau absolutely doesn't do :)
<karolherbst>
and I'd rather have the kernel warn on violating this rule
<sima>
karolherbst, oh that's clearly a bug
<karolherbst>
or drop the rule and put whatever is the actual rule
<karolherbst>
okay :)
<karolherbst>
so we _should_ warn on using it incorrectly
<sima>
karolherbst, enable lockdep and it will
<karolherbst>
it didn't
<sima>
dma_fence_signal_timestamp_locked() has lockdep_assert_held(fence->lock);
<karolherbst>
huh...
<karolherbst>
maybe I should check that out again, but I was sure that lockdep didn't tell me anything
<karolherbst>
but yeah.. dma_fence_signal_timestamp_locked has indeed an assert here...
<sima>
karolherbst, you checked it's still running? lockdep gets disabled after the first splat
<karolherbst>
uhhh... dunno actually
<sima>
tbf it'd have surprised me if we'd indeed sucked that much
<sima>
I've been sprinkling lockdep_assert_held and encouraged others to do the same for years now
<sima>
they're both good documentation and good runtime checks
<karolherbst>
yeah.. let me check that out again just to be sure
<karolherbst>
maybe the problem was me running the full debug kernel and uhm.. doing other weird things
idr has joined #dri-devel
kzd has joined #dri-devel
benjaminl has joined #dri-devel
<sima>
karolherbst, so looking at this again I think we're missing a load_acquire barrier before the various test_bit()
<karolherbst>
potentially
<karolherbst>
but `test_bit` isn't strictly an atomic operation, is it?
<sima>
the test_and_set_bit is an rmw which on linux has full barriers
<sima>
it is
<karolherbst>
ahh, so it is
<sima>
linux atomic bitops do not have atomic_ anywhere in their name
<karolherbst>
silly
<sima>
for entertainment value
<sima>
the non-atomic versions have a __ prefix
<karolherbst>
....
<karolherbst>
maybe we need a "keep naming closer to C11" patch
<karolherbst>
the closest you can get to a CC all on the lkml
<sima>
correction, only the rmw with return value have full barrier semantics
<karolherbst>
pain
<karolherbst>
it's still feels wrong
<karolherbst>
s/it's/it/
<sima>
so yeah we need a pile of smp_mb_after_atomic I think
<sima>
for the "it's signalled already" case
<sima>
plus a pile of comments
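A toy model of the part of this that can be shown without real memory barriers: a read-modify-write with a return value, like `test_and_set_bit()`, lets exactly one caller observe the "not yet signaled" state. Python cannot express the ordering issue sima raises for the plain `test_bit()` reads, so this only sketches the "first signaler wins" property. `Flags`, `signal` and `SIGNALED_BIT` are hypothetical stand-ins, and a mutex stands in for the hardware atomic.

```python
import threading

SIGNALED_BIT = 0  # stand-in for the fence's "signaled" flag bit

class Flags:
    """Toy flags word; a mutex stands in for the atomic read-modify-write."""

    def __init__(self):
        self._value = 0
        self._guard = threading.Lock()

    def test_and_set_bit(self, bit):
        # Set the bit and return its previous value as one indivisible step.
        with self._guard:
            old = (self._value >> bit) & 1
            self._value |= 1 << bit
            return bool(old)

    def test_bit(self, bit):
        # Plain read; in the kernel this carries no ordering guarantee.
        return bool((self._value >> bit) & 1)

def signal(flags):
    """Return True only for the caller that actually flipped the bit."""
    return not flags.test_and_set_bit(SIGNALED_BIT)

flags = Flags()
results = []
threads = [threading.Thread(target=lambda: results.append(signal(flags)))
           for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Every later caller sees the bit already set and backs off, which is the "it's signalled already" case discussed above.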
<karolherbst>
yeah so my complaint about dma_fence_signal specifically is, that it appears to be a locked operation but strictly isn't
<karolherbst>
ehh
<karolherbst>
I meant the other one
<karolherbst>
dma_fence_is_signaled
<sima>
yeah it's only a conditional barrier
<sima>
or well, supposed to be, it's a bit buggy in that regard
<karolherbst>
yep
<sima>
this follows the design of waitqueue and completion and everything else in the linux kernel
<sima>
so yeah this is how this works
<karolherbst>
pain
<karolherbst>
it shouldn't
<sima>
imo it's the right semantics
alyssa has left #dri-devel [#dri-devel]
<sima>
for completions or anything that looks like one
<karolherbst>
I disagree :P
<sima>
if your completion needs locking your design seriously smells
<karolherbst>
yeah, probably
<karolherbst>
I'm sure nouveau's code there is kinda wrong anyway, but oh well
<karolherbst>
the future is just to use Lina's abstractions on this probably anyway :P
benjaminl has quit [Ping timeout: 480 seconds]
<sima>
yeah
<sima>
in general, if the barrier semantics of core primitives (completion, work, kref, anything really) don't work for you
<sima>
you're doing something really fishy
<sima>
the atomics are lolz because they don't match C11 and have inconsistent naming
<karolherbst>
yeah... dunno.. maybe they work, but nouveau doesn't take fence locks but instead its own lock across a list of fences
<karolherbst>
so that's kinda fishy
<sima>
but the other stuff is imo solid
<karolherbst>
maybe it does so in a few places.. I found one where it doesn't
<sima>
yeah if that fence list lock keeps the fence alive, then the irq handler might need to take it too
<sima>
karolherbst, btw you're volunteering for the dma_fence barrier review patch?
vliaskov has quit [Remote host closed the connection]
benjaminl has joined #dri-devel
Dr_Who has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
Dr_Who has joined #dri-devel
swalker__ has quit [Remote host closed the connection]
tursulin has quit [Ping timeout: 480 seconds]
anujp has joined #dri-devel
benjaminl has quit [Ping timeout: 480 seconds]
f11f12 has quit [Quit: Leaving]
Dr_Who has quit []
Dr_Who has joined #dri-devel
benjaminl has joined #dri-devel
<mbrost>
dakr: where is the latest version of gpuva? Also any idea if / when you plan on landing this upstream? We are fairly close to trying to land Xe with it.
idr has quit [Ping timeout: 480 seconds]
benjaminl has quit [Quit: WeeChat 3.8]
benjaminl has joined #dri-devel
Dr_Who has quit []
Dr_Who has joined #dri-devel
frankbinns has quit [Remote host closed the connection]
AndroUser2 has quit [Remote host closed the connection]
smiles_1111 has quit [Ping timeout: 480 seconds]
AndroUser2 has joined #dri-devel
anujp has quit [Remote host closed the connection]
anujp has joined #dri-devel
Dr_Who has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
iive has joined #dri-devel
HerrSpliet is now known as RSpliet
Guest3147 has quit [Remote host closed the connection]
Daanct12 has joined #dri-devel
AndroUser2 has quit [Remote host closed the connection]
AndroUser2 has joined #dri-devel
gouchi has joined #dri-devel
gouchi has quit [Remote host closed the connection]
ced117 has joined #dri-devel
AndroUser2 has quit [Remote host closed the connection]
AndroUser2 has joined #dri-devel
jkrzyszt has quit [Ping timeout: 480 seconds]
djbw_ has joined #dri-devel
<airlied>
karolherbst: there was a lockdep splat we were seeing that was being ignored i think
<airlied>
pretty sure i dont see it now after the fix
<karolherbst>
yeah....
<airlied>
but there is a new one to fix
<karolherbst>
mind trying to reproduce it, because I sure can't
<jannau>
is there a way to tell Xorg/modesetting to not use a kms device? preferably in a generic way like MatchDriver in OutputClass
<jannau>
one option might be to patch modesetting to ignore "non-desktop" device
heat has quit [Read error: Connection reset by peer]
heat_ has joined #dri-devel
<jannau>
we have now a working minimal implementation for the touchbar on apple silicon macbooks
Company has joined #dri-devel
<karolherbst>
jannau: I suspect the plan is to have a special daemon take care of displaying stuff on it?
<airlied>
but I'm not seeing it at the moment in that form
<jannau>
it works with gdm + plasma under wayland. either since those respect a multi-seat config or because of "non-desktop" = 1
<karolherbst>
airlied: yeah.. it's kinda random it seems
<karolherbst>
jannau: right... I think DRM leases would allow you to do this without relying on any kind of configs. But I also see why compositors/X shouldn't use the touchbar anyway, because it's really not a display in the common sense and it probably messes up things
<airlied>
but yeah once that goes down you ain't seeing anything else
<karolherbst>
but I think with DRM leases the daemon won't need root privileges
<karolherbst>
but no idea how mature all of this is
axeldavy is now known as adavy
<karolherbst>
but yeah.. I guess Xorg shouldn't use `non-desktop` displays at all, but not sure if that's the responsibility of the compositor in an X world or not
<karolherbst>
and also not sure if we even care enough
<karolherbst>
just let X die already 🙃
<airlied>
there's a difference between non-desktop displays and non-desktop kms drivers though
<jannau>
we're ready to kill X on apple silicon devices (declaring it broken and unsupported) but we need a sddm release with wayland support first
<gfxstrand>
Applying all should be a legal implementation
<karolherbst>
maybe we could scan all possible values...
<karolherbst>
but uhhhh
<jenatali>
I could see a pass that gathers bcsels and phis and ors all the constant values, but you'd have to give up pretty easily once you get out of that pattern
<karolherbst>
yeah...
frankbinns has quit [Ping timeout: 480 seconds]
<karolherbst>
well... I think it has to resolve to constants
<karolherbst>
so it's just a phi of constants in the end
<jenatali>
Oh then that's not so bad
<jenatali>
Could even do that straight in vtn
<karolherbst>
maybe we could do this: if it all resolves to constant, use that, otherwise do all
<karolherbst>
jenatali: well.. I suspect it could also be nested phis
<jenatali>
Sure
<karolherbst>
maybe llvm gets smart and adds alus on it and we get cursed spir-v
<jenatali>
And then we give up lol
<jenatali>
Unless those alus are bcsels
<karolherbst>
I think it would still all constant fold
<karolherbst>
maybe we should have a scoped barrier intrinsics with variable semantics
<karolherbst>
and then after constant folding we resolve it
<karolherbst>
and we shall call it cursed_scoped_barrier
<jenatali>
Appropriate name at least
<jenatali>
I'd just as soon put it straight in vtn and not try hard at all before giving up
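The idea from jenatali and karolherbst can be sketched on a made-up mini IR: walk phis and bcsels, OR together every constant they can select, and fall back to "apply all" as soon as anything else shows up. None of this is NIR or vtn code. The node classes, `ALL_SEMANTICS` and its value are hypothetical, and the sketch assumes the phi graph is acyclic.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Const:
    value: int

@dataclass(frozen=True)
class Phi:
    srcs: tuple

@dataclass(frozen=True)
class Bcsel:
    cond: object
    then: object
    other: object

@dataclass(frozen=True)
class Alu:
    op: str

ALL_SEMANTICS = 0xFF  # hypothetical "apply everything" mask

def gather(node):
    """OR of all constants reachable through phis/bcsels, or None to give up."""
    if isinstance(node, Const):
        return node.value
    if isinstance(node, Phi):
        options = node.srcs
    elif isinstance(node, Bcsel):
        options = (node.then, node.other)  # the condition itself is irrelevant
    else:
        return None  # any other instruction: outside the pattern
    mask = 0
    for src in options:
        sub = gather(src)
        if sub is None:
            return None
        mask |= sub
    return mask

def resolve(node):
    """Use the gathered mask if everything was constant, otherwise apply all."""
    mask = gather(node)
    return ALL_SEMANTICS if mask is None else mask

nested_phis = Phi((Const(0x2),
                   Bcsel(Alu("ilt"), Const(0x8), Phi((Const(0x2), Const(0x10))))))
with_alu = Phi((Const(0x2), Alu("iadd")))
```

ORing the candidates over-approximates whichever value is picked at run time, which leans on gfxstrand's point that applying more semantics than requested should still be legal.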
<karolherbst>
I'll deal with lower hanging fruits for now
<karolherbst>
jenatali: btw, are you subscribed to the OpenCL label?
<jenatali>
I am
<karolherbst>
ahh, okay
<karolherbst>
I have more additions to clc :D
kasper93 has joined #dri-devel
<jenatali>
I'll be honest though I haven't been paying too much attention, just following to make sure we don't get broken
<karolherbst>
I'm just adding options to the validator
<karolherbst>
as apparently the validator checks number of function args and....
<jenatali>
Oh I saw that one. I'm not in the office today but remind me on Monday and I can take a look
<karolherbst>
cool :)
<karolherbst>
I also fixed a bug, because apparently if you pass options, the "src" is a null string killing the logger :)
Duke`` has quit [Ping timeout: 480 seconds]
<jenatali>
Hah
<karolherbst>
and I was confused about those "(file=" errors :)
mbrost_ has joined #dri-devel
mbrost has quit [Ping timeout: 480 seconds]
bgs has quit [Remote host closed the connection]
mbrost_ has quit [Ping timeout: 480 seconds]
frankbinns1 has quit [Remote host closed the connection]
rasterman has quit [Quit: Gettin' stinky!]
<jannau>
airlied: looks like rejecting devices with max_width or max_height smaller than 120 pixels (arbitrarily chosen) would be easier than checking the non-desktop connector property
<jannau>
a desktop at a resolution smaller than 320x240 would be painful; rejecting only devices smaller than a quarter of that hopefully doesn't hit too many fringe use cases
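The size check jannau proposes boils down to a single predicate. A sketch with a hypothetical function name, using the 120 pixel threshold quoted above. The example sizes in the usage note are illustrative and not taken from any real device.

```python
MIN_DESKTOP_PIXELS = 120  # threshold from the discussion ("arbitrarily chosen")

def usable_for_desktop(max_width, max_height, minimum=MIN_DESKTOP_PIXELS):
    """Reject a KMS device if either maximum dimension is below the threshold."""
    return max_width >= minimum and max_height >= minimum
```

A long, thin strip such as 2000x60 is rejected because of its height, while a small 160x120 panel (a quarter of the 320x240 area) still passes.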
<DavidHeidelberg[m]>
Merged the updated farm handling, if for some reason you'll see some anomaly when testing MR or running Marge, please ping me, it could be related. Just for sure!
<DavidHeidelberg[m]>
*anomaly = missing or having too many farms inside the pipeline
jewins has quit [Quit: jewins]
Kayden has quit [Remote host closed the connection]
<karolherbst>
Kayden: I know I kinda hand waved a lot on this topic in the past, but what's the actual deal with the SIMD situation with iris? 8, 16, 32 are generally supported? Or can certain CSOs only support certain SIMD sizes, and how would I know? And it appears that 16 is somehow the one which is preferred?
Dr_Who has quit [Ping timeout: 480 seconds]
benjaminl has quit [Ping timeout: 480 seconds]
<Kayden>
for compute, it depends on the local workgroup size
<Kayden>
the shaders can run in 8, 16, or 32 channels
<Kayden>
well...be dispatched in groups of 8, 16, or 32 lanes
<karolherbst>
right... so I started to implement CL subgroups and they are kinda annoying
<Kayden>
not surprising
<karolherbst>
sooo.. on some gpus it appears you can launch 56 sub groups max, which doesn't work with e.g. SIMD16 if you want to launch 1024 threads
<Kayden>
yeah. for those, we force to SIMD32 :/
<karolherbst>
so I'm mostly trying to understand what's like the optimal SIMD size in general
<karolherbst>
so atm it's mostly just "what's optimal"
<karolherbst>
e.g. Intel seems to limit to 256 threads in CL
<karolherbst>
and I suspect it's because of their messy SIMD situation :)
<Kayden>
Almost nothing can run in SIMD32, so those programs are basically pairs of SIMD16 instructions for each half
<Kayden>
so we only do that when necessary. it can perform better for things like simple pixel shaders
<Kayden>
but it's often really bad for register pressure too
<karolherbst>
I see
<Kayden>
SIMD16 is usually the sweet spot as long as your register pressure isn't terrible
<Kayden>
memory access can normally happen 16 channels at a time
<karolherbst>
maybe I should do some benchmarks and force a SIMD mode and see how it goes
<karolherbst>
okay
<karolherbst>
mhhh
AndroUser2 has joined #dri-devel
<karolherbst>
I wonder if I want to limit to max_subgroups * preferred_simd_size (being 16) for iris
<karolherbst>
Kayden: btw, "devinfo->max_cs_workgroup_threads" is the amount of subgroups, right?
<karolherbst>
or is there a different limit I'm not aware of
<Kayden>
I guess so
<karolherbst>
I was seeing worse perf compared to Intel's stack, so I'm actually wondering if this has anything to do with the SIMD size forced based on the block size
<Kayden>
I was thinking max_cs_threads but it looks like that's basically clamped
<Kayden>
err
<Kayden>
max_cs_workgroup_threads is a clamped version of max_cs_threads
<Kayden>
so it's probably what you want, yeah
<karolherbst>
it reports 56 on gen0.5
<karolherbst>
*9.5
<karolherbst>
and 64 on gen12
<karolherbst>
which kinda makes sense
<Kayden>
mmm
<Kayden>
yeah it's clamped to 64 based on the bit-width of a GPGPU_WALKER field...
<Kayden>
you can actually have 112 threads on gen12...
frankbinns1 has quit [Remote host closed the connection]
<karolherbst>
huh
<Kayden>
I think we're dispatching height 1 row N grids, I guess we'd have to do a rectangular grid
<karolherbst>
yeah well.. doesn't really matter as long as max_cs_workgroup_threads * 16 stays at or above the max threads
<karolherbst>
on my 9.5 it's weird because 56 * 16 < 1024
frankbinns1 has joined #dri-devel
<karolherbst>
so if I report 1024 threads, applications might just enqueue 1024 threads per block and force SIMD32
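The arithmetic behind this can be written out. This is a sketch only: `pick_simd` is a hypothetical helper that prefers SIMD16 and widens when the subgroup count would exceed the limit. It is not the selection logic iris actually uses.

```python
def subgroups_needed(workgroup_size, simd_width):
    """Number of subgroups (hardware threads) a workgroup occupies, rounded up."""
    return -(-workgroup_size // simd_width)

def pick_simd(workgroup_size, max_subgroups, preferred=16, widths=(8, 16, 32)):
    """Prefer `preferred`; go wider only if the workgroup would not fit."""
    candidates = [preferred] + [w for w in sorted(widths) if w > preferred]
    for width in candidates:
        if subgroups_needed(workgroup_size, width) <= max_subgroups:
            return width
    return None  # workgroup does not fit at any width

# Limits mentioned in the discussion: 56 subgroups on gen9.5, 64 on gen12.
gen95 = pick_simd(1024, 56)   # 1024 / 16 = 64 subgroups > 56, so SIMD32
gen12 = pick_simd(1024, 64)   # 64 subgroups fit, so SIMD16
largest_simd16_wg = 56 * 16   # largest workgroup that stays at SIMD16 on gen9.5
```

Under these assumptions, capping the reported workgroup size at `max_subgroups * 16` (896 on gen9.5) is what would keep applications from forcing SIMD32.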
<karolherbst>
but again, intel only reports 256 threads and 8 as the subgroup size, so maybe that allows for even more optimized code? no clue.. it's kinda weird
<karolherbst>
the one advantage I have is, that if an application does not specify the block size I can pick freely, so for this alone it would already be helpful to know what's the best to pick
<karolherbst>
if e.g. SIMD8 is generally the fastest, I'd just use that
<karolherbst>
unless the app sets constraints making that impossible
jewins has quit [Ping timeout: 480 seconds]
<Kayden>
yeah, awkwardly I think SIMD16 is the best, but it really depends :/
ngcortes has joined #dri-devel
AndroUser2 has quit [Remote host closed the connection]