ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
Daaanct12 has quit [Quit: Quitting]
Danct12 has joined #dri-devel
heat_ has quit [Ping timeout: 480 seconds]
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
yuq825 has joined #dri-devel
Haaninjo has quit [Quit: Ex-Chat]
<ishitatsuyuki> Lynne, I probably need to understand what kind of structure your algorithm has. Can you write it out in pseudocode? e.g. Step 1: Do prefix sum for x[k][i]..x[k][j] for all k
columbarius has joined #dri-devel
co1umbarius has quit [Ping timeout: 480 seconds]
<Lynne> ishitatsuyuki: the opencl version is easier to read - https://github.com/FFmpeg/FFmpeg/blob/master/libavfilter/opencl/nlmeans.cl
<Lynne> step 1 is load all values needed from the source images and compute the s1 - s2 diff
<Lynne> store that per-pixel vector in the integral image
<Lynne> then compute the horizontal prefix sum, followed by the vertical prefix sum
<Lynne> which gives you the integral image
<Lynne> finally, get the a, b, c, d (a rectangle) vector values from the integral for each individual block at a given offset, compute the weight, and add it in the weights array
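The steps Lynne lists (horizontal prefix sum, then vertical prefix sum, then four corner reads per rectangle) can be sketched on the CPU as follows. This is only a minimal scalar illustration, not the FFmpeg shader: the function names `integral_image` and `rect_sum` and the corner naming (a = top-left, b = top-right, c = bottom-left, d = bottom-right) are assumptions made for this example.

```python
def integral_image(img):
    """Inclusive 2D prefix sum: horizontal pass, then vertical pass."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):              # horizontal prefix sum along each row
        for x in range(1, w):
            out[y][x] += out[y][x - 1]
    for x in range(w):              # vertical prefix sum along each column
        for y in range(1, h):
            out[y][x] += out[y - 1][x]
    return out


def rect_sum(ii, x0, y0, x1, y1):
    """Sum of the source over the inclusive rectangle (x0, y0)..(x1, y1)."""
    # Corner naming is an assumption of this sketch; reads outside the image count as 0.
    a = ii[y0 - 1][x0 - 1] if x0 > 0 and y0 > 0 else 0   # top-left
    b = ii[y0 - 1][x1] if y0 > 0 else 0                  # top-right
    c = ii[y1][x0 - 1] if x0 > 0 else 0                  # bottom-left
    d = ii[y1][x1]                                       # bottom-right
    return d - b - c + a


img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ii = integral_image(img)
```

The vertical pass reads values written by the horizontal pass in other rows, which is why every invocation must see the finished horizontal result before the vertical pass starts.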
<Lynne> opencl does it by having a separate pass for each step, and when I tried to do that, the vulkan version was 20 _times_ slower than opencl, because of all the pipeline barriers
<Lynne> even though I was using a prefix sum algorithm which is around 100 times faster than the naive sum that opencl uses
<Lynne> so I decided to merge the horizontal, vertical and weights passes into a single shader, to eliminate pretty much all of the pipeline barriers
Leopold__ has joined #dri-devel
<Lynne> merging the horizontal and vertical passes works fine, but merging the weights pass causes any integral image loads after height amount of _horizontal_ pixels to give old, pre-vertical pass data
Leopold_ has quit [Ping timeout: 480 seconds]
Zopolis4_ has quit []
jdavies has joined #dri-devel
jdavies is now known as Guest10794
<ishitatsuyuki> this does sound like a use case requiring pipeline barrier unfortunately
<ishitatsuyuki> because you need to wait for the entire prefix sum to finish before proceeding to the next stage
<ishitatsuyuki> there is https://w3.impa.br/~diego/projects/NehEtAl11/, which gives you reduced bandwidth for integral image by keeping the horizontal-only prefix sum in shared memory
<ishitatsuyuki> but it's significantly more complicated to implement
<ishitatsuyuki> another approach is to do decoupled lookback but with 2D indices, where you calculate in the order of block (0,0), (1,0),(0,1),(2,0),(1,1),(0,2),... which will give you the block on top and left on each step
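The block ordering ishitatsuyuki describes walks the grid one anti-diagonal at a time, so the blocks to the left of and above any block have already been visited. A small sketch of just that ordering (not of decoupled lookback itself) is below; `diagonal_block_order` is a hypothetical helper name.

```python
def diagonal_block_order(blocks_x, blocks_y):
    """Yield (x, y) block indices one anti-diagonal (x + y == d) at a time."""
    for d in range(blocks_x + blocks_y - 1):
        for y in range(d + 1):
            x = d - y
            if x < blocks_x and y < blocks_y:   # skip indices outside the grid
                yield (x, y)


order = list(diagonal_block_order(3, 3))
position = {block: i for i, block in enumerate(order)}
```

For a 3x3 grid this starts with (0,0), (1,0), (0,1), (2,0), (1,1), (0,2), matching the sequence given in the message.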
<ishitatsuyuki> at the end of day you will still need a barrier between generating the integral image and weight computation
Guest10794 has quit [Ping timeout: 480 seconds]
<Lynne> why there and not between the horizontal and vertical prefix sums too?
<ishitatsuyuki> for your current approach you need a barrier there too, but I described alternative algorithms that don't require synchronization or even a memory store between the hori/vert prefix sums
<Lynne> why need a pipeline barrier at all?
<ishitatsuyuki> because computing vertical prefix sum requires the entire horizontal prefix sum to finish?
<Lynne> why is barrier() not enough?
<ishitatsuyuki> barrier only synchronizes within the workgroup
<Lynne> we only have a single workgroup
<ishitatsuyuki> code says s->pl_int_hor.wg_size[2]
<Lynne> always set to 1
<Lynne> just copypaste from other code, I'll remove it to make it clearer
<ishitatsuyuki> a single workgroup does sound like why it's slow though
<ishitatsuyuki> a GPU has many WGPs (in amd terms) and a workgroup runs within a single WGP
ngcortes_ has joined #dri-devel
<Lynne> it's a max-sized workgroup, 1024, with each invocation handling 4 pixels at a time for a 4k image
<ishitatsuyuki> for a single workgroup, it should work if you put controlBarrier(wg,wg,buf,acqrel | makeavail | makevisible) between each pass
<Lynne> nope, doesn't, even if I splat it before each prefix_sum call
<Lynne> on neither nvidia nor radv
ngcortes has quit [Ping timeout: 480 seconds]
<Lynne> btw, each dispatch handles 4 displacements (xoffs/yoffs) at a time, and for a research radius of 3, there are 36 dispatches (((2*r)*(2*r) - 1)/4) that have to be done
<Lynne> so we still do multiple wgps, it's just that each dispatch handles one
<ishitatsuyuki> ok, utilization should be fine with that
<Lynne> the default research radius is 15 btw so that's 900ish dispatches, hence you can see why barriers kill performance -_-
<Lynne> needing to do 3 passes would result in 3x the number of dispatches and barriers
<ishitatsuyuki> use the meme split barrier feature that no one uses? :P (vkCmdSetEvent)
<Lynne> that would still need a memory barrier, though, wouldn't it?
<ishitatsuyuki> consider it an async barrier
<Lynne> I'll leave it as a challenge for someone to do better than my current approach
<ishitatsuyuki> yeah fair
jewins has quit [Ping timeout: 480 seconds]
<ishitatsuyuki> back to debugging, did you try putting the barrier right after prefix sum as well?
<Lynne> yup, splatted it everywhere, the integral image buffer has coherent flags too
<ishitatsuyuki> i'm afraid i'm out of ideas again
<Lynne> oh, hey, maybe I could test with llvmpipe
<Lynne> well, that's weird, the part which is broken on a GPU is also broken on lavapipe
<Lynne> but the part which is fine on a gpu is pure black on lavapipe
<Lynne> limiting lavapipe to a single thread doesn't help either
<Lynne> do none of the thousands of synchronization options actually do anything in vulkan?
<Lynne> I know a lot of them are there just to satisfy some alien hardware's synchronization requirements, but still
<HdkR> Gitlab having some performance issues right now?
<HdkR> Managing to clone at a blazing 38KB/s
Daanct12 has joined #dri-devel
aravind has joined #dri-devel
Daaanct12 has joined #dri-devel
bmodem has joined #dri-devel
Danct12 has quit [Ping timeout: 480 seconds]
Daanct12 has quit [Ping timeout: 480 seconds]
<Lynne> anyone with any ideas or willing to run my code?
<Lynne> it's the last roadblock to merging the entire video patchset in ffmpeg
<zmike> what "synchronization options" are you referring to
zf_ has joined #dri-devel
zf has quit [Read error: Connection reset by peer]
fxkamd has quit []
<Lynne> barrier(); memoryBarrier(); controlBarrier();
<Lynne> I still don't understand how this can fail so consistently on all hardware
<zmike> sounds like you're using it wrong if it's broken everywhere
<Lynne> I simplified the issue down to load the same value in all invocations, and put it on a pixel
<Lynne> and I got different values after height amount of invocations*rows
Company has quit [Quit: Leaving]
<Lynne> simplified it as much as possible - https://paste.debian.net/1277180/
<Lynne> load integral_img with vec4(1), do a vertical prefix, load the vec4 at 300,300, write the .x to all of weights[]
<Lynne> err, had a typo, https://paste.debian.net/1277181/
<Lynne> making the if on line 252 "if ((gl_GlobalInvocationID.x * 4) < height[0]) {" to always true seems to fix the issue
kts has joined #dri-devel
krushia has quit [Ping timeout: 480 seconds]
ngcortes_ has quit [Read error: Connection reset by peer]
Danct12 has joined #dri-devel
Danct12 has quit []
Danct12 has joined #dri-devel
Danct12 has quit []
Danct12 has joined #dri-devel
bgs has joined #dri-devel
kzd has quit [Ping timeout: 480 seconds]
khfeng has joined #dri-devel
apinheiro has quit [Quit: Leaving]
bgs has quit [Remote host closed the connection]
itoral has joined #dri-devel
pochu has quit [Ping timeout: 480 seconds]
camus1 has joined #dri-devel
camus has quit [Read error: Connection reset by peer]
pochu has joined #dri-devel
macromorgan has quit [Ping timeout: 480 seconds]
sgruszka has joined #dri-devel
frieder has joined #dri-devel
danvet has joined #dri-devel
macromorgan has joined #dri-devel
vliaskov_ has joined #dri-devel
rasterman has joined #dri-devel
sghuge has quit [Remote host closed the connection]
sghuge has joined #dri-devel
vliaskov__ has joined #dri-devel
MajorBiscuit has joined #dri-devel
vliaskov_ has quit [Ping timeout: 480 seconds]
<ishitatsuyuki> oh yeah, control barrier inside control flow is undefined behavior
<ishitatsuyuki> common practice is to hoist barrier/controlBarrier() outside if condition
tursulin has joined #dri-devel
karolherbst has quit [Read error: Connection reset by peer]
karolherbst has joined #dri-devel
pcercuei has joined #dri-devel
TMM has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
TMM has joined #dri-devel
kts has quit [Quit: Konversation terminated!]
yuq8251 has joined #dri-devel
MajorBiscuit has quit [Quit: WeeChat 3.6]
yuq825 has quit [Ping timeout: 480 seconds]
lynxeye has joined #dri-devel
jkrzyszt has joined #dri-devel
<Lynne> more gotchas than... I have no idea what, there's just too many
Haaninjo has joined #dri-devel
Danct12 has quit [Ping timeout: 480 seconds]
kugel_ has joined #dri-devel
sarahwalker has joined #dri-devel
swalker_ has joined #dri-devel
swalker_ is now known as Guest10811
kugel has quit [Ping timeout: 480 seconds]
sarahwalker has quit [Ping timeout: 480 seconds]
kugel_ is now known as kugel
Danct12 has joined #dri-devel
Haaninjo has quit [Quit: Ex-Chat]
paulk-bis has quit []
paulk has joined #dri-devel
Danct12 has quit [Quit: WeeChat 3.8]
swalker__ has joined #dri-devel
Guest10811 has quit [Remote host closed the connection]
swalker_ has joined #dri-devel
swalker_ is now known as Guest10814
swalker__ has quit [Ping timeout: 480 seconds]
swalker__ has joined #dri-devel
Guest10814 has quit [Remote host closed the connection]
sarahwalker has joined #dri-devel
Zopolis4_ has joined #dri-devel
swalker__ has quit [Ping timeout: 480 seconds]
ajhalaney[m] has quit [Quit: Bridge terminating on SIGTERM]
hch12907 has quit [Quit: Bridge terminating on SIGTERM]
bylaws has quit [Quit: Bridge terminating on SIGTERM]
DrNick has quit []
x512[m] has quit []
LinuxHackerman has quit [Quit: Bridge terminating on SIGTERM]
Wallbraker has quit [Quit: Bridge terminating on SIGTERM]
danylo has quit [Quit: Bridge terminating on SIGTERM]
vidal72[m] has quit []
martijnbraam has quit [Quit: Bridge terminating on SIGTERM]
Guest8249 has quit []
Newbyte has quit []
cmeissl[m] has quit [Quit: Bridge terminating on SIGTERM]
EricCurtin[m] has quit []
Mis012[m] has quit []
Hazematman has quit [Quit: Bridge terminating on SIGTERM]
jasuarez has quit [Quit: Bridge terminating on SIGTERM]
K0bin[m] has quit []
xerpi[m] has quit []
RAOF has quit [Quit: Bridge terminating on SIGTERM]
Vin[m] has quit []
ofirbitt[m] has quit []
egalli has quit [Quit: Bridge terminating on SIGTERM]
bubblethink[m] has quit []
nicofee[m] has quit []
ella-0[m] has quit []
ttayar[m] has quit []
jluthra has quit []
Sumera[m] has quit []
dabrain34[m] has quit []
jenatali has quit [Quit: Bridge terminating on SIGTERM]
AlexisHernndezGuzmn[m] has quit []
nyorain[m] has quit []
robertmader[m] has quit []
Guest7817 has quit []
naheemsays[m] has quit []
onox[m] has quit []
gallo[m] has quit []
talcohen[m] has quit []
Quinten[m] has quit []
aradhya7[m] has quit []
znullptr[m] has quit []
zzoon[m] has quit [Quit: Bridge terminating on SIGTERM]
KunalAgarwal[m][m] has quit []
Tooniis[m] has quit []
reactormonk[m] has quit []
moben[m] has quit []
doras has quit [Quit: Bridge terminating on SIGTERM]
DavidHeidelberg[m] has quit [Quit: Bridge terminating on SIGTERM]
enunes[m] has quit [Quit: Bridge terminating on SIGTERM]
zzxyb[m] has quit []
Eighth_Doctor has quit []
T_UNIX has quit []
ohadsharabi[m] has quit []
swick[m] has quit [Quit: Bridge terminating on SIGTERM]
yshui` has quit [Quit: Bridge terminating on SIGTERM]
MotiH[m] has quit []
viciouss[m] has quit []
kunal_10185[m] has quit []
sigmoidfunc[m] has quit []
dcbaker has quit [Quit: Bridge terminating on SIGTERM]
DUOLabs[m] has quit [Quit: Bridge terminating on SIGTERM]
gnustomp[m] has quit []
YaLTeR[m] has quit [Write error: connection closed]
heftig has quit [Write error: connection closed]
masush5[m] has quit [Write error: connection closed]
shoffmeister[m] has quit [Write error: connection closed]
eyearesee has quit [Write error: connection closed]
DemiMarie has quit [Write error: connection closed]
ids1024[m] has quit [Write error: connection closed]
sewn has quit [Write error: connection closed]
samueldr has quit [Write error: connection closed]
pushqrdx[m] has quit [Write error: connection closed]
devarsht[m] has quit [Write error: connection closed]
tleydxdy has quit [Write error: connection closed]
pp[m] has quit [Write error: connection closed]
q4a has quit [Write error: connection closed]
JosExpsito[m] has quit [Write error: connection closed]
fkassabri[m] has quit [Write error: connection closed]
aura[m] has quit [Write error: connection closed]
madhavpcm has quit [Write error: connection closed]
tomeu has quit [Write error: connection closed]
Anson[m] has quit [Write error: connection closed]
dantob has quit [Write error: connection closed]
daniliberman[m] has quit [Write error: connection closed]
cwfitzgerald[m] has quit [Write error: connection closed]
gdevi has quit [Write error: connection closed]
dhirschfeld2[m] has quit [Write error: connection closed]
zamundaaa[m] has quit [Write error: connection closed]
kusma has quit [Write error: connection closed]
ramprakash[m] has quit [Write error: connection closed]
undvasistas[m] has quit [Write error: connection closed]
Mershl[m] has quit [Write error: connection closed]
Nirvin[m] has quit [Write error: connection closed]
mripard has quit [Write error: connection closed]
kunal10710[m] has quit [Write error: connection closed]
knr has quit [Write error: connection closed]
kelbaz[m] has quit [Write error: connection closed]
nekit[m] has quit []
JPEW has quit [Read error: Connection reset by peer]
MTCoster has quit [Remote host closed the connection]
jbarnes has quit [Remote host closed the connection]
MTCoster has joined #dri-devel
jbarnes has joined #dri-devel
JPEW has joined #dri-devel
Celmor[m] has joined #dri-devel
iive has joined #dri-devel
devilhorns has joined #dri-devel
<Lynne> still 3x slower than opencl, I leave it as a challenge for anyone to make it faster -_-
<Lynne> ishitatsuyuki: could you take a look at it, just in case I missed something?
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
swalker_ has joined #dri-devel
swalker_ is now known as Guest10834
cheako has quit [Quit: Connection closed for inactivity]
sarahwalker has quit [Read error: Connection reset by peer]
<ishitatsuyuki> it's hard to guess about performance unless you're obviously underutilizing
swalker__ has joined #dri-devel
Guest10834 has quit [Remote host closed the connection]
<ishitatsuyuki> radv's profiler support requires the app to be presenting, but maybe nvidia has better tools on that side
dviola has left #dri-devel [#dri-devel]
iive has quit [Quit: They came for me...]
djbw has quit [Read error: Connection reset by peer]
smiles_ has joined #dri-devel
smilessh has quit [Ping timeout: 480 seconds]
<Lynne> ishitatsuyuki: if (first) is C, though
<ishitatsuyuki> ah ok, nevermind
<ishitatsuyuki> the practice of mixing glsl in C is making me uncomfortable :/
<Lynne> with rdna3, which has atomic float ops, barrier overhead is pretty much zero
<Lynne> it's also not 3x, but 2x slower than opencl
<ishitatsuyuki> well that's a huge red flag
<ishitatsuyuki> don't use float atomics
<Lynne> removing the prefix sums boosts fps to 20, which still sounds quite low to me
<Lynne> it's either barriers, or atomic float adds :/
<ishitatsuyuki> have you tried aggregating through shared memory first so only one thread per workgroup needs to do the global atomic?
<Lynne> 2 per dispatch, so it's not horrible
<Lynne> not quite sure what you mean
<ishitatsuyuki> i'd avoid float atomics at all cost
<ishitatsuyuki> the alternative is probably a workgroup barrier, not pipeline barrier, right?
<ishitatsuyuki> I don't know how float atomics are emulated but they seem to be insanely low throughput
<ishitatsuyuki> meanwhile, barrier only costs as much as it needs to wait, it's not like the instruction itself has any execution cost
<Lynne> sadly integral image calculations are not separable
<Lynne> I could calculate weights across multiple buffers and just merge them during the final step, plenty of descriptors left
<ishitatsuyuki> is integral image not separable?
<ishitatsuyuki> separable in the filter sense?
<Lynne> separable in that you can't independently compute a horizontal and vertical prefix sum and merge them
<ishitatsuyuki> the usual definition of separability for 2D IIR/FIR filters is a bit different, but ok
<ishitatsuyuki> separability there means if a NxN convolution can be done with a 1xN then Nx1 convolution
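The filter sense of separability mentioned here can be checked numerically: when an NxN kernel is the outer product of two 1D kernels, a 1xN pass followed by an Nx1 pass gives the same result as the full 2D pass. The sketch below assumes a symmetric example kernel (so no kernel flip is needed) and only computes the valid interior region; all names are made up for illustration.

```python
def conv_rows(img, k):
    """Apply the 1D kernel k along each row (valid region only)."""
    n = len(k)
    return [[sum(row[x + i] * k[i] for i in range(n))
             for x in range(len(row) - n + 1)] for row in img]


def transpose(m):
    return [list(col) for col in zip(*m)]


def conv_separable(img, kx, ky):
    """1xN pass along rows, then Nx1 pass along columns (as rows of the transpose)."""
    return transpose(conv_rows(transpose(conv_rows(img, kx)), ky))


def conv_2d(img, k2d):
    """Direct NxN pass over the valid region, for comparison."""
    n = len(k2d)
    h, w = len(img), len(img[0])
    return [[sum(img[y + j][x + i] * k2d[j][i] for j in range(n) for i in range(n))
             for x in range(w - n + 1)] for y in range(h - n + 1)]


kx = [1, 2, 1]
ky = [1, 2, 1]
k2d = [[a * b for b in kx] for a in ky]            # outer product of ky and kx
img = [[(3 * y + 5 * x) % 7 for x in range(5)] for y in range(5)]
```

The separable form costs 2N multiplies per output pixel instead of N*N, which is the usual reason to prefer it.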
<ishitatsuyuki> can you use ints for the atomics somehow?
<ishitatsuyuki> that should still be better than float atomics
<Lynne> removing the atomic float adds makes literally no difference, they're not the roadblock
<Lynne> *bottleneck
<ishitatsuyuki> ok
<ishitatsuyuki> you should probably try getting a profile
<Lynne> guess I could fire up the filter under mpv, the client will be presenting in that case
<Lynne> how do I use the radv profiler?
Company has joined #dri-devel
<Lynne> uuh, I think I found the bottleneck - resolution
<Lynne> I was testing on 4k video, but on 720p, I get 700fps with my code, while opencl gets 70
jfalempe has quit [Quit: Leaving]
sarnex has quit [Ping timeout: 480 seconds]
<ishitatsuyuki> does mpv present with vulkan though
smilessh has joined #dri-devel
smiles_ has quit [Read error: Connection reset by peer]
elongbug has joined #dri-devel
sarnex has joined #dri-devel
Zopolis4_ has quit []
Wallbraker has joined #dri-devel
ajhalaney[m] has joined #dri-devel
arisu has joined #dri-devel
Andy[m]1 has joined #dri-devel
aradhya7[m] has joined #dri-devel
aura[m] has joined #dri-devel
bluepqnuin has joined #dri-devel
bubblethink[m] has joined #dri-devel
chema has joined #dri-devel
bylaws has joined #dri-devel
RAOF has joined #dri-devel
cleverca22[m] has joined #dri-devel
cmeissl[m] has joined #dri-devel
Eighth_Doctor has joined #dri-devel
cwfitzgerald[m] has joined #dri-devel
dafna33[m] has joined #dri-devel
daniliberman[m] has joined #dri-devel
dantob has joined #dri-devel
dcbaker has joined #dri-devel
DemiMarieObenour[m] has joined #dri-devel
devarsht[m] has joined #dri-devel
Anson[m] has joined #dri-devel
dhirschfeld2[m] has joined #dri-devel
Guest10825 has joined #dri-devel
doras has joined #dri-devel
danylo has joined #dri-devel
DUOLabs[m] has joined #dri-devel
EricCurtin[m] has joined #dri-devel
egalli has joined #dri-devel
ella-0[m] has joined #dri-devel
Ella[m] has joined #dri-devel
eballetbo has joined #dri-devel
enunes[m] has joined #dri-devel
AlexisHernndezGuzmn[m] has joined #dri-devel
fkassabri[m] has joined #dri-devel
FloGrauper[m] has joined #dri-devel
gallo[m] has joined #dri-devel
gdevi has joined #dri-devel
gnustomp[m] has joined #dri-devel
Guest10844 has joined #dri-devel
MotiH[m] has joined #dri-devel
Harvey[m] has joined #dri-devel
Hazematman has joined #dri-devel
hch12907 has joined #dri-devel
heftig has joined #dri-devel
zzoon[m] has joined #dri-devel
ids1024[m] has joined #dri-devel
jasuarez has joined #dri-devel
jenatali has joined #dri-devel
jluthra has joined #dri-devel
JosExpsito[m] has joined #dri-devel
K0bin[m] has joined #dri-devel
kallisti5[m] has joined #dri-devel
madhavpcm has joined #dri-devel
kelbaz[m] has joined #dri-devel
kunal10710[m] has joined #dri-devel
kunal_10185[m] has joined #dri-devel
KunalAgarwal[m][m] has joined #dri-devel
kusma has joined #dri-devel
Labnan[m] has joined #dri-devel
LaughingMan[m] has joined #dri-devel
LinuxHackerman has joined #dri-devel
m00nlit[m] has joined #dri-devel
mairacanal[m] has joined #dri-devel
MarkCollins[m] has joined #dri-devel
marmarek[m] has joined #dri-devel
martijnbraam has joined #dri-devel
masush5[m] has joined #dri-devel
MayeulC has joined #dri-devel
Mershl[m] has joined #dri-devel
michael5050[m] has joined #dri-devel
Mis012[m] has joined #dri-devel
moben[m] has joined #dri-devel
mripard has joined #dri-devel
msizanoen[m] has joined #dri-devel
Vin[m] has joined #dri-devel
naheemsays[m] has joined #dri-devel
nekit[m] has joined #dri-devel
neobrain[m] has joined #dri-devel
Newbyte has joined #dri-devel
nicofee[m] has joined #dri-devel
eyearesee has joined #dri-devel
nielsdg has joined #dri-devel
Nirvin[m] has joined #dri-devel
nyorain[m] has joined #dri-devel
ofirbitt[m] has joined #dri-devel
ohadsharabi[m] has joined #dri-devel
DavidHeidelberg[m] has joined #dri-devel
onox[m] has joined #dri-devel
pac85[m] has joined #dri-devel
PiGLDN[m] has joined #dri-devel
pmoreau has joined #dri-devel
pp[m] has joined #dri-devel
pushqrdx[m] has joined #dri-devel
q4a has joined #dri-devel
Quinten[m] has joined #dri-devel
ramacassis[m] has joined #dri-devel
ram15[m] has joined #dri-devel
reactormonk[m] has joined #dri-devel
robertmader[m] has joined #dri-devel
samueldr has joined #dri-devel
dabrain34[m] has joined #dri-devel
sewn has joined #dri-devel
shoffmeister[m] has joined #dri-devel
siddh has joined #dri-devel
sigmoidfunc[m] has joined #dri-devel
Sofi[m] has joined #dri-devel
sergi has joined #dri-devel
Sumera[m] has joined #dri-devel
swick[m] has joined #dri-devel
knr has joined #dri-devel
T_UNIX has joined #dri-devel
talcohen[m] has joined #dri-devel
tintou has joined #dri-devel
underpantsgnome[m] has joined #dri-devel
tleydxdy has joined #dri-devel
tomba has joined #dri-devel
tomeu has joined #dri-devel
Tooniis[m] has joined #dri-devel
ttayar[m] has joined #dri-devel
tuxayo has joined #dri-devel
undvasistas[m] has joined #dri-devel
Soroush has joined #dri-devel
vidal72[m] has joined #dri-devel
viciouss[m] has joined #dri-devel
MatrixTravelerbot[m]1 has joined #dri-devel
Weiss-Fder[m] has joined #dri-devel
x512[m] has joined #dri-devel
xerpi[m] has joined #dri-devel
YaLTeR[m] has joined #dri-devel
yshui` has joined #dri-devel
zamundaaa[m] has joined #dri-devel
znullptr[m] has joined #dri-devel
zzxyb[m] has joined #dri-devel
pmoreau is now known as Guest10855
<Lynne> it does, how do I analyze the captures?
kts has joined #dri-devel
godvino has joined #dri-devel
minecrell has quit [Read error: Connection timed out]
itoral has quit [Remote host closed the connection]
minecrell has joined #dri-devel
fxkamd has joined #dri-devel
godvino has quit [Quit: WeeChat 3.6]
pochu has quit [Quit: leaving]
JohnnyonFlame has quit [Ping timeout: 480 seconds]
<ishitatsuyuki> open it with https://gpuopen.com/rgp/
<ishitatsuyuki> you first identify the slow pass (in your case, there should be only a single compute pass), then go to instruction timing
<ishitatsuyuki> the numbers should give you a rough idea of "cost" of instructions
JohnnyonFlame has joined #dri-devel
devilhorns has quit []
FireBurn has quit [Quit: Konversation terminated!]
Daaanct12 has quit [Remote host closed the connection]
Daaanct12 has joined #dri-devel
<tleydxdy> when I look at all the fds opened by a vulkan game and their corresponding drm_file, some of them have the correct pid but they don't seem to be doing any work, and some have the pid of Xwayland and are doing all the work. does anyone know why that is?
<tleydxdy> I assume it is because those fds are sent over by the X server, but why is it using those to do all the rendering work?
Daaanct12 has quit [Remote host closed the connection]
Daaanct12 has joined #dri-devel
<tleydxdy> how fun
<tleydxdy> I'm mostly curious about why this pattern exists
<tleydxdy> seeing how the application opens the render node itself anyway
<danvet> tleydxdy, I thought for vk the render operations should always go through a file that's directly opened
<danvet> and just winsys might go through one opened by Xwayland (if it's DRI3 proto)
<danvet> gfxstrand, ^^ or does this work differently?
rasterman has quit [Quit: Gettin' stinky!]
<tleydxdy> yeah, if I look at vram used reported by fdinfo for example, the directly opened ones only use 4K while the Xwayland one has >300MiB
<emersion> tleydxdy: that's how the X11 DRI3 proto was designed
<emersion> the Wayland protocol is different, for instance
<emersion> AFAIK, the X11 DRI3 protocol was designed with DRM authentication in mind, where the X11 server would send authenticated DRM FDs to clients
<emersion> before render node existed
alyssa has joined #dri-devel
<alyssa> jenatali: you've been conscripted to ack https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22123
kzd has joined #dri-devel
<alyssa> I think
<alyssa> Strictly I think maybe I could get away with xfb_size if I make my dispatch more complicated...
yuq8251 has left #dri-devel [#dri-devel]
<tleydxdy> emersion I see, so if the underlying wsi for vulkan is not X11 DRI3 then the pattern should not be seen?
<emersion> on Wayland, all DRM FDs should be opened by the client *
<emersion> ( * except DRM leasing stuff)
<emersion> (but that's only for VR apps, and the DRM FD sent by the compositor is read-only before the DRM lease is accepted by the compositor)
<tleydxdy> got it
<jenatali> alyssa: Ack, but without seeing how it's used it's hard for me to really get why it's needed
<jenatali> XFB is one of those things that's magic from my POV, I don't know any of the implementation details
<danvet> emersion, yeah but I thought for vk you still open the render node
<danvet> since winsys is only set up later
<emersion> what do you mean?
<danvet> emersion, like you can set up your entire render pipeline and allocate all the buffers without winsys connection
<emersion> maybe Mesa will open render nodes, but these won't be coming from the compositor
<danvet> and so no way to get the DRI3 fd
<danvet> and only when you set up winsys will that part happen
<danvet> so I kinda expected that the render nodes will have most buffers and the winsys one opened by xwayland just winsys buffers
<emersion> sure. the question was about FDs coming from Xwayland though
<emersion> ah
<danvet> but it seems to be the other way round
<danvet> per tleydxdy at least
<emersion> maybe the swapchain buffers are allocated via Xwayland's FD
<danvet> for glx/egl it'll all be on the xwayland fd
<emersion> the swapchain is tied to WSI
<danvet> 300mb swapchain seems a bit much :-)
<tleydxdy> yeah, also all the gfx engine time is logged on the xwayland fd
<tleydxdy> so it's also doing cs_ioctl on that
<emersion> that's weird
sgruszka has quit [Remote host closed the connection]
<tleydxdy> the game is unity, fwiw, and as far as I can tell it's not doing anything special
kts has quit [Quit: Leaving]
cheako has joined #dri-devel
aravind has quit [Ping timeout: 480 seconds]
<gfxstrand> danvet: That's how things work initially, yes. I think some drivers are trying to get a master and use that if they can these days.
<gfxstrand> They shouldn't be getting it from the winsys, though. That'd be weird.
<danvet> gfxstrand, well the master you only need for display, and for that you pretty much have to lease it
<danvet> unless bare metal winsys
<danvet> no one else should be master than the current compositor
<alyssa> jenatali: purely software xfb implementation, see linked MR :~)
<jenatali> Ah I missed that link
<alyssa> ~~just you wait for geometry shaders~~
<jenatali> Got it, so you run an additional VS with rast discard and just augment the VS to write out the xfb data?
<alyssa> Yep
sebastien has joined #dri-devel
<alyssa> I mean, that's conceptually how it works for GLES3.0 level transform feedback
<alyssa> and that's what panfrost does
<alyssa> all the real fun comes in when you want the full big GL thing
sebastien is now known as Guest10882
<jenatali> Hm. Can't you mix VS+XFB+rast?
<alyssa> indexed draws, triangle strips, all that fun stuff
<jenatali> I haven't looked at the ES limitations for XFB so I dunno
<alyssa> GLES3.0 is just glDrawArrays() with POINTS/LINES/STRIPS
<alyssa> 1 vertex in, 1 vertex our
<alyssa> *out
<alyssa> which is all panfrost does (and hence panfrost fails a bunch of piglits for xfb)
<MrCooper> emersion tleydxdy: there are pending kernel patches which will correctly attribute DRM FDs passed from an X server to a DRI3 client to the latter
<alyssa> s/STRIPS/TRIANGLES/
Guest10882 has quit []
<alyssa> for full GL there's all sorts of batshit interactions allowed, e.g. indirect indexed draw + primitive restart + TRIANGLE_STRIPS + XFB
<alyssa> how is that supposed to work? don't even worry about it ;-)
<emersion> MrCooper: my preference would've been to fix the X11 protocol, instead of fixing the kernel…
<alyssa> spec has a really inane requirement that you can draw strips/loops/fans but they need to be streamed out like plain lines/triangles
siddh has quit [Quit: Reconnecting]
siddh has joined #dri-devel
<alyssa> (e.g. drawing 4 vertices with TRIANGLE_STRIPS would emit 6 vertices for streamout, duplicating the shared edge)
<alyssa> in that case the linked MR does the stupid simple approach of invoking the transform feedback shader 6 times (instead of 4) and doing some arithmetic to work out which vertex should be processed in a given invocation
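The per-invocation arithmetic alyssa mentions could look roughly like the sketch below, which maps each streamout invocation of a triangle strip to the strip vertex it should process. This is an assumption-laden illustration rather than the code from the linked MR: the function names are hypothetical, and swapping the first two corners of odd triangles to keep the winding is one common convention that the real implementation may not follow.

```python
def strip_streamout_count(num_vertices):
    """Vertices written when a triangle strip is streamed out as plain triangles."""
    return max(num_vertices - 2, 0) * 3


def strip_vertex_for_invocation(i):
    """Strip vertex index handled by streamout invocation i."""
    prim, corner = divmod(i, 3)          # which triangle, and which of its 3 corners
    if prim % 2 == 1 and corner < 2:
        corner = 1 - corner              # assumed: swap first two corners on odd triangles
    return prim + corner


expanded = [strip_vertex_for_invocation(i) for i in range(strip_streamout_count(4))]
```

With 4 strip vertices this gives 6 invocations, and vertices 1 and 2 (the shared edge) are each processed twice.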
<alyssa> this is suboptimal but hopefully nothing ever hits this other than piglit
<alyssa> (..hopefully)
siddh has quit []
siddh has joined #dri-devel
jewins has joined #dri-devel
fxkamd has quit []
Duke`` has joined #dri-devel
<jenatali> alyssa: Right that all makes sense. But can you not mix XFB+rast?
<jenatali> Or alternatively, does GLES3 not allow SSBOs/atomics in VS?
<alyssa> VS side effects are optional in all the khronos apis
<alyssa> for mali, works on older hw but not newer ones due to arm's questionable interpretations of the spec
<alyssa> for agx, IDK, haven't tried, Metal doesn't allow it and I don't know what's happening internally
<alyssa> (VS side effects are unpredictable on tilers in general, the spec language is very 'forgiving' here)
<alyssa> wouldn't help a ton in every case, consider e.g. GL_TRIANGLE_FANS with 1000 triangles drawn
<alyssa> vertex 0 needs to be written out 1000 times
<alyssa> all other vertices are written out just once
<alyssa> re side effects being unpredictable, I *think* this means the decoupled approach is kosher even if we allow vertex shader side effects
<alyssa> but I'd need to spec lawyer to find out
<karolherbst> alyssa: are SSBO writes a thing in vertex shaders?
<alyssa> 15:23 < alyssa> VS side effects are optional in all the khronos apis
<karolherbst> right.. was more like about is it a thing in your driver/hardware
<alyssa> 15:23 < alyssa> for mali, works on older hw but not newer ones due to arm's questionable interpretations of the spec
<alyssa> 15:24 < alyssa> for agx, IDK, haven't tried, Metal doesn't allow it and I don't know what's happening internally
<alyssa> 15:24 < alyssa> (VS side effects are unpredictable on tilers in general, the spec language is very 'forgiving' here)
alyssa has left #dri-devel [#dri-devel]
<daniels> 'newer ones' being anything with IDVS?
<jenatali> I was very surprised when I learned that Vulkan not only allows side effects, but also wave ops and even quad ops in VS. Like wtf is the meaning of a quad of vertex invocations?
<jenatali> FWIW D3D does wave ops, but not quads
<gfxstrand> jenatali: Well, when you render with GL_QUADS...
* gfxstrand shows herself out
<jenatali> Which Vulkan doesn't have, right?
<jenatali> ... right?
<gfxstrand> There was a quads extension but we killed it. :)
<gfxstrand> Also, quad lane groups in a VS have nothing whatsoever to do with GL_QUAD. I was just making dumb jokes.
<gfxstrand> They're literally just groups of 4 lanes which you can do stuff on.
<jenatali> Yeah I know, I'm just also confirming :)
<gfxstrand> They do make sense with certain CS patterns you can do with the NV derivatives extension, though.
<jenatali> I guess I could lower quad ops to plain wave ops and support them in VS
<jenatali> Yeah D3D defines quad ops + derivatives in CS
dsrt^ has joined #dri-devel
<gfxstrand> Yeah, that's really all they are
<gfxstrand> In fact, I think we have NIR lowering for it
<gfxstrand> Yup. lower_quad
<mareko> quad ops work in VS if num_patches == 4 or 2 and TCS is present
<mareko> I mean num_input_cp
<jenatali> Oh cool, I should just run that on non-CS/FS and then I can support quad ops everywhere
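The idea of lowering quad ops to plain wave ops can be illustrated with a toy model. This is only a sketch of the concept, not what NIR's lower_quad actually emits: it assumes a subgroup is a Python list indexed by lane id, that its size is a multiple of 4, and that a quad is lanes 4k..4k+3. All function names here are hypothetical.

```python
# Toy model of one subgroup (wave): values[i] is the value held by lane i.
# Quad ops are expressed as ordinary cross-lane shuffles on groups of 4 lanes.
# Assumption: len(values) is a multiple of 4.

def shuffle_xor(values, mask):
    """Each lane reads the value held by lane (lane_id ^ mask)."""
    return [values[i ^ mask] for i in range(len(values))]

def quad_swap_horizontal(values):
    # Swap with the neighbour in the same row of the 2x2 quad.
    return shuffle_xor(values, 1)

def quad_swap_vertical(values):
    # Swap with the neighbour in the same column of the 2x2 quad.
    return shuffle_xor(values, 2)

def quad_swap_diagonal(values):
    # Swap with the diagonally opposite lane of the quad.
    return shuffle_xor(values, 3)

def quad_broadcast(values, index):
    """Each lane reads lane `index` (0..3) of its own quad."""
    return [values[(i & ~3) | index] for i in range(len(values))]

wave = list(range(8))  # two quads: lanes 0-3 and lanes 4-7
print(quad_swap_horizontal(wave))
print(quad_swap_vertical(wave))
print(quad_swap_diagonal(wave))
print(quad_broadcast(wave, 1))
```

Nothing in this model depends on the shader stage, which matches the point made above that quads are just groups of 4 lanes.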
djbw has joined #dri-devel
<macromorgan> so question... I'm trying to troubleshoot a problem that happens only on suspend and shutdown: the regulators report unbalanced disables.
MajorBiscuit has joined #dri-devel
<macromorgan> As best I can tell, when I try to shut down a panel, mipi_dsi_drv_shutdown gets called, which runs the panel_nv3051d_shutdown function, which calls drm_panel_unprepare, which calls panel_nv3051d_unprepare. Then I also see panel_bridge_post_disable calling panel_nv3051d_unprepare
<macromorgan> should there not be a shutdown function for the panel?
MajorBiscuit has quit []
MajorBiscuit has joined #dri-devel
<macromorgan> a lot of panels have a "prepared" or "enabled" flag, but when I was upstreaming the driver I was told not to do that
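The double-unprepare path described above can be modelled with a toy use counter. This is a sketch under stated assumptions, not kernel code: the Regulator and Panel classes and the shutdown_sequence helper are hypothetical stand-ins, and the only behaviour borrowed from the real problem is that disabling a regulator more often than it was enabled gets reported as unbalanced.

```python
class Regulator:
    """Hypothetical stand-in for a use-counted regulator."""

    def __init__(self):
        self.use_count = 0
        self.warnings = []

    def enable(self):
        self.use_count += 1

    def disable(self):
        if self.use_count == 0:
            # More disables than enables: report it instead of going negative.
            self.warnings.append("unbalanced disable")
            return
        self.use_count -= 1


class Panel:
    """Hypothetical panel whose prepare/unprepare toggle one regulator."""

    def __init__(self, regulator):
        self.regulator = regulator

    def prepare(self):
        self.regulator.enable()

    def unprepare(self):
        self.regulator.disable()


def shutdown_sequence(panel, driver_has_shutdown_hook):
    """Model the two unprepare paths seen in the trace."""
    if driver_has_shutdown_hook:
        panel.unprepare()  # path 1: the driver's own shutdown hook
    panel.unprepare()      # path 2: the bridge's post_disable


reg = Regulator()
panel = Panel(reg)
panel.prepare()
shutdown_sequence(panel, driver_has_shutdown_hook=True)
print(reg.warnings)
```

In this model, dropping the driver's own shutdown hook leaves a single unprepare call and no warning, which is the option raised in the question above.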
<jenatali> Has the branchpoint happened?
Leopold has joined #dri-devel
rasterman has joined #dri-devel
MajorBiscuit has quit [Ping timeout: 480 seconds]
<jenatali> Oh yep, there it is. Would be nice to have a dedicated label for the post-branch MR that bumps the version for the next release. eric_engestrom
<jenatali> I'd subscribe to that
Leopold__ has quit [Ping timeout: 480 seconds]
swalker__ has quit [Remote host closed the connection]
heat has joined #dri-devel
heat has quit [Remote host closed the connection]
heat has joined #dri-devel
JohnnyonFlame has quit [Ping timeout: 480 seconds]
bmodem has quit [Ping timeout: 480 seconds]
iive has joined #dri-devel
tursulin has quit [Ping timeout: 480 seconds]
stuarts has joined #dri-devel
vliaskov__ has quit [Ping timeout: 480 seconds]
kts has joined #dri-devel
kts has quit [Remote host closed the connection]
kts has joined #dri-devel
frieder has quit [Ping timeout: 480 seconds]
ngcortes has joined #dri-devel
jkrzyszt has quit [Ping timeout: 480 seconds]
MajorBiscuit has joined #dri-devel
lynxeye has quit [Quit: Leaving.]
jeeeun841351 has joined #dri-devel
jeeeun84135 has quit [Ping timeout: 480 seconds]
<eric_engestrom> jenatali: I've created the label ~mesa-release and I'll write up some doc in a bit, and hopefully we (dcbaker and I) won't forget to use it too often :]
<dcbaker> eric_engestrom: thanks for doing that
<jenatali> Thanks!
vliaskov__ has joined #dri-devel
MajorBiscuit has quit [Quit: WeeChat 3.6]
stuarts has quit [Remote host closed the connection]
tobiasjakobi has joined #dri-devel
tobiasjakobi has quit []
JohnnyonFlame has joined #dri-devel
<karolherbst> dcbaker: some hackish bindgen_version thing: https://github.com/mesonbuild/meson/pull/11679
jaganteki has joined #dri-devel
vliaskov__ has quit [Ping timeout: 480 seconds]
<dcbaker> karolherbst: I left you a few comments, it's really annoying that they treat the command line as always up for change
<karolherbst> yeah...
<karolherbst> get_version seems to work alright, nice
<dcbaker> sweet
<karolherbst> it's a bit annoying that rust.bindgen is already taken otherwise it could be some higher level struct and rust.bindgen => rust.bindgen.generate and we could just add rust.bindgen.version_compare().... but oh well...
<dcbaker> Yeah. A long time ago I'd written a find_program() cache, which would have made this a bit simpler since you could have done something like find_program('bindgen').get_version().version_compare(...) and since all calls to find_program would use the same cache the lookup would be effectively free and we could just recommend that
<dcbaker> unfortunately I never got it working quite right
<karolherbst> yeah.. but that's also kinda annoying
<karolherbst> I'd kinda prefer wrapping those things so it's always in control of meson
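The kind of version gate being discussed can be sketched outside Meson as follows. This is an illustration only, not Meson's implementation: parse_version and version_at_least are hypothetical helpers, the sample strings assume the tool prints something like "bindgen X.Y.Z", and the comparison is a plain numeric tuple comparison, which is simpler than what a build system would really do.

```python
import re

def parse_version(text):
    """Extract the first dotted numeric version from `text` as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))

def version_at_least(text, minimum):
    """Return True if the version found in `text` is >= `minimum`."""
    return parse_version(text) >= parse_version(minimum)

# Hypothetical sample outputs of a `--version` call.
print(parse_version("bindgen 0.65.1"))
print(version_at_least("bindgen 0.65.1", "0.65"))
print(version_at_least("bindgen 0.59.2", "0.65"))
```

Scraping the version once and caching the result is what would make repeated checks effectively free, as described above.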
mbrost has joined #dri-devel
Haaninjo has joined #dri-devel
Leopold has quit []
<karolherbst> dcbaker: anyway... would be cool to get my rust stuff resolved for 1.2 so I only have to bump the version once :D
<karolherbst> do I need to add any kwarg stuff?
Leopold_ has joined #dri-devel
<dj-death> oh noes, gitlab 504
JohnnyonFlame has quit [Ping timeout: 480 seconds]
stuarts has joined #dri-devel
<dcbaker> karolherbst: Yeah, I left you a comment, otherwise I think that looks good
gouchi has joined #dri-devel
<anholt> starting on the 1.3.5.1 CTS update (with a couple extra bugfixes pulled in)
rasterman has quit [Quit: Gettin' stinky!]
<karolherbst> now I need somebody else or me to figure out that isystem stuff ...
ngcortes has quit [Ping timeout: 480 seconds]
prahladk has joined #dri-devel
kts has quit [Quit: Leaving]
alyssa has joined #dri-devel
<alyssa> daniels: yeah, Arm's implementation of IDVS is "creative"
<alyssa> gfxstrand: lol at VK_QUADS ops
mbrost has quit [Remote host closed the connection]
prahladk has quit []
FloGrauper[m] has quit []
madhavpcm has quit []
MatrixTravelerbot[m]1 has quit []
tuxayo has quit []
LaughingMan[m] has quit []
cleverca22[m] has quit []
tintou has quit []
bluepqnuin has quit []
Celmor[m] has quit []
Harvey[m] has quit []
Guest10825 has quit []
arisu has quit []
zzxyb[m] has quit []
chema has quit []
hch12907 has quit []
cmeissl[m] has quit []
neobrain[m] has quit []
Andy[m]1 has quit []
kallisti5[m] has quit []
YuGiOhJCJ has joined #dri-devel
Ella[m] has quit []
bylaws has quit []
ngcortes has joined #dri-devel
danvet has quit [Ping timeout: 480 seconds]
Duke`` has quit [Ping timeout: 480 seconds]
ngcortes has quit [Remote host closed the connection]
ngcortes has joined #dri-devel
<karolherbst> quads? reasonable primitives
<alyssa> * Catmull-Clark has entered the chat
<robclark> alyssa: idvs sounds _kinda_ like qcom's VS vs binning shader (except that adreno VS also calcs position/psize)
<alyssa> robclark: same idea, yeah
<alyssa> the problem isn't the concept, it's an implementation detail :~)
<robclark> hmm, ok
gouchi has quit [Quit: Quitte]
Leopold_ has quit []
Leopold_ has joined #dri-devel
JohnnyonFlame has joined #dri-devel
bluetail42 has joined #dri-devel
bluetail4 has quit [Ping timeout: 480 seconds]
kts has joined #dri-devel
fxkamd has joined #dri-devel
<Kayden> there's some mention of nir_register in src/freedreno/ir3/* still, is this dead code now that the backend is SSA?
<anholt> Kayden: indirect temps are still registers
<Kayden> oh, interesting, ok
<gfxstrand> We should convert ir3 to load/store_scratch
<gfxstrand> Unless you really can indirect on the register file and really want to be doing that.
<karolherbst> anholt: or scratch mem if lowered to it
<Kayden> was just a little surprised to see it there still
<Kayden> wasn't sure if it was leftover or still used :)
<karolherbst> I don't know if I or somebody else ported codegen to scratch, but I think it was done...
<karolherbst> ahh nope
<gfxstrand> Kayden: I mean, the Intel vec4 back-end still uses it last I checked... 😭
<karolherbst> or was it...
<karolherbst> mhhh
<Kayden> gfxstrand: not surprised to see it in general, just in ir3 :)
<karolherbst> what was the pass again to lower to scratch?
<gfxstrand> Kayden: Yes, but we should kill NIR register indirects in general.
<Kayden> ah.
<Kayden> yeah, probably
<gfxstrand> I suppose I do have a haswell sitting in the corner over there.
<anholt> Kayden: register indirects turn into register array accesses. large temps get turned into scratch.
<karolherbst> I'm sure it's almost trivial to remove `nir_register` in codegen as it already supports scratch memory
<gfxstrand> NAK won't support it
<karolherbst> yeah, no point in supporting it on nv hw
<gfxstrand> There's no point on Intel, either. They go to scratch in the vec4 back-end it's just that the back-end code to do that has been around for a long time and no one has bothered to clean it up.
<gfxstrand> Technically, Intel can do indirect reads
<gfxstrand> And indirect stores if the indirect is uniform and the stars align.
<alyssa> I don't have a great plan for ir3 nir_register use.
pcercuei has quit [Quit: dodo]
elongbug has quit [Read error: Connection reset by peer]
iive has quit [Quit: They came for me...]
<karolherbst> anybody here ever played around with onednn? I kinda want to know what I need to actually use it
<karolherbst> bonus point: make it non painful
bluebugs has joined #dri-devel
<gfxstrand> That sounds like an Intel invention
<karolherbst> it is
<gfxstrand> Of course...
<gfxstrand> They have to have One of everything...
<karolherbst> but apparently it has a CL backend
<karolherbst> and it can run pytorch
<karolherbst> and stuff
<karolherbst> dunno
<karolherbst> just want to see what CL extensions it needs
<karolherbst> but I think it's INTEL_subgroup and INTEL_UVM
jewins1 has joined #dri-devel
pzanoni` has joined #dri-devel
jssummer has joined #dri-devel
mattrope_ has joined #dri-devel
djbw_ has joined #dri-devel
jhli_ has quit []
jhli has joined #dri-devel
jewins has quit [Ping timeout: 480 seconds]
pzanoni has quit [Ping timeout: 480 seconds]
stuarts has quit [Ping timeout: 480 seconds]
djbw has quit [Ping timeout: 480 seconds]
mattrope has quit [Ping timeout: 480 seconds]
jssummers has joined #dri-devel
pzanoni has joined #dri-devel
mattrope has joined #dri-devel
djbw__ has joined #dri-devel
<karolherbst> ehhh..
<karolherbst> why is oneDNN checking if the platform name is "Intel" 🙃
jssummer has quit [Ping timeout: 480 seconds]
pzanoni` has quit [Ping timeout: 480 seconds]
mattrope_ has quit [Ping timeout: 480 seconds]
jewins1 has quit [Ping timeout: 480 seconds]
djbw_ has quit [Ping timeout: 480 seconds]
<karolherbst> they even have a vendor id check 🙃
<psykose> because it was made by intel
<karolherbst> you know that this won't stop me!
<karolherbst> (but rusticl not having all the required features will! 🙃)
<gfxstrand> What features are you missing? Intel subgroups you should be able to pretty much just turn on
<gfxstrand> Even on non-intel
<karolherbst> I don't know yet
<karolherbst> the stuff just faults randomly
TMM has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
<karolherbst> but yeah.. maybe I check if there are any tests for that extension and just support it
TMM has joined #dri-devel
<karolherbst> the README mentions subgroups and UVM
<karolherbst> the CTS has tests for cl_khr_subgroups
<karolherbst> but there is of course cl_intel_subgroups
<karolherbst> and I don't even know if that's upstream llvm
<karolherbst> at least nowadays those things are open source
<karolherbst> ahh yes
<karolherbst> another "is intel platform" check 🙃
alyssa has left #dri-devel [#dri-devel]
Azirino has joined #dri-devel
<karolherbst> uhhhhh
<karolherbst> :pain:
<karolherbst> they are doing unholy things
Haaninjo has quit [Quit: Ex-Chat]
<karolherbst> they fetch the binary of the compiled CL program and check if it's some ELF thing to do more cursed stuff
<karolherbst> welll....
<karolherbst> I guess that's a WONTFIX then
<gfxstrand> ugh
<gfxstrand> Am I surprised? No.
<karolherbst> the file path gave me the rest: "/home/kherbst/git/oneDNN/src/gpu/jit/ngen/ngen_elf.hpp"
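For context, checking whether a program binary "is some ELF thing" usually comes down to looking at its first four bytes. The sketch below shows only that check and is not oneDNN's code: looks_like_elf is a hypothetical helper, and the byte strings stand in for whatever an OpenCL implementation hands back as the program binary.

```python
ELF_MAGIC = b"\x7fELF"  # the four magic bytes at the start of every ELF file

def looks_like_elf(binary):
    """Return True if `binary` (a bytes object) starts with the ELF magic."""
    return binary[:4] == ELF_MAGIC

# Stand-in payloads; a real caller would pass the bytes returned by the driver.
print(looks_like_elf(b"\x7fELF\x02\x01\x01\x00"))
print(looks_like_elf(b"not an elf at all"))
```

A driver whose program binaries use some other container would fail this kind of check regardless of which extensions it exposes.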
<karolherbst> anyway...
<karolherbst> they do seem to support other vendors via SYCL, but.. uhhh
<karolherbst> why...
<gfxstrand> It's a One* product
<gfxstrand> The open-source is a sham. It exists for vendor lock-in.
<karolherbst> I mean.. yes
<anholt> mupuf: are you intentionally rebooting the DUT after a GPU hang? I'm not getting any info on the test that hung, so I can't mark it a skip.
<zmike> I think he's still afk another week