#dri-devel on 2023-04-13 — irc logs at oftc.irclog.whitequark.org

2022-12-21 00:45 ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar

00:05 Daaanct12 has quit [Quit: Quitting]

00:05 Danct12 has joined #dri-devel

00:08 heat_ has quit [Ping timeout: 480 seconds]

00:29 alanc has quit [Remote host closed the connection]

00:29 alanc has joined #dri-devel

00:36 yuq825 has joined #dri-devel

00:37 Haaninjo has quit [Quit: Ex-Chat]

00:41 <ishitatsuyuki> Lynne, I probably need to understand what kind of structure your algorithm has. Can you write it out in pseudocode? e.g. Step 1: Do prefix sum for x[k][i]..x[k][j] for all k

00:52 columbarius has joined #dri-devel

00:53 co1umbarius has quit [Ping timeout: 480 seconds]

01:01 <Lynne> ishitatsuyuki: the opencl version is easier to read - https://github.com/FFmpeg/FFmpeg/blob/master/libavfilter/opencl/nlmeans.cl

01:02 <Lynne> step 1 is load all values needed from the source images and compute the s1 - s2 diff

01:02 <Lynne> store that per-pixel vector in the integral image

01:02 <Lynne> then compute the horizontal prefix sum, followed by the vertical prefix sum

01:03 <Lynne> which gives you the integral image

01:04 <Lynne> finally, get the a, b, c, d (a rectangle) vector values from the integral for each individual block at a given offset, compute the weight, and add it in the weights array
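
The steps Lynne lists (per-pixel difference, horizontal prefix sum, vertical prefix sum, then four corner lookups per block) can be sketched on the CPU as follows. This is a minimal illustration, not the FFmpeg code: the function names are hypothetical, the difference is squared (an assumption based on how non-local means usually defines patch distance), and out-of-range source coordinates are clamped to the edge.

```python
def diff_image(src, dx, dy):
    """Squared difference between each pixel and the pixel at offset (dx, dy)."""
    h, w = len(src), len(src[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            yy = min(max(y + dy, 0), h - 1)  # clamp to the image (assumption)
            xx = min(max(x + dx, 0), w - 1)
            d = src[y][x] - src[yy][xx]
            row.append(d * d)
        out.append(row)
    return out


def integral_image(diff):
    """Inclusive 2D prefix sum: horizontal pass, then vertical pass."""
    h, w = len(diff), len(diff[0])
    ii = [row[:] for row in diff]
    for y in range(h):                # horizontal prefix sum
        for x in range(1, w):
            ii[y][x] += ii[y][x - 1]
    for x in range(w):                # vertical prefix sum over the horizontal result
        for y in range(1, h):
            ii[y][x] += ii[y - 1][x]
    return ii


def rect_sum(ii, x0, y0, x1, y1):
    """Sum over the inclusive rectangle (x0, y0)..(x1, y1) from four corner values."""
    a = ii[y0 - 1][x0 - 1] if x0 > 0 and y0 > 0 else 0
    b = ii[y0 - 1][x1] if y0 > 0 else 0
    c = ii[y1][x0 - 1] if x0 > 0 else 0
    d = ii[y1][x1]
    return d - b - c + a
```

The vertical pass reads the finished output of the horizontal pass, which is why the two passes need some form of synchronization between them when they run in parallel.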

01:04 <Lynne> opencl does it by having a separate pass for each step, and when I tried to do that, the performance was 20 _times_ slower in vulkan than in opencl, because of all the pipeline barriers

01:05 <Lynne> even though I was using a prefix sum algorithm which is around 100 times faster than the naive sum that opencl uses

01:05 <Lynne> so I decided to merge the horizontal, vertical and weights passes into a single shader, to eliminate pretty much all of the pipeline barriers

01:06 Leopold__ has joined #dri-devel

01:07 <Lynne> merging the horizontal and vertical passes works fine, but merging the weights pass causes any integral image loads after height amount of _horizontal_ pixels to give old, pre-vertical pass data

01:10 <Lynne> this is the vulkan port of the code - https://github.com/cyanreg/FFmpeg/blob/vulkan/libavfilter/vf_nlmeans_vulkan.c

01:13 Leopold_ has quit [Ping timeout: 480 seconds]

01:16 Zopolis4_ has quit []

01:39 jdavies has joined #dri-devel

01:39 jdavies is now known as Guest10794

01:39 <ishitatsuyuki> this does sound like a use case requiring pipeline barrier unfortunately

01:40 <ishitatsuyuki> because you need to wait for the entire prefix sum to finish before proceeding to the next stage

01:42 <ishitatsuyuki> there is https://w3.impa.br/~diego/projects/NehEtAl11/, which gives you reduced bandwidth for integral image by keeping the horizontal-only prefix sum in shared memory

01:42 <ishitatsuyuki> but it's significantly more complicated to implement

01:46 <ishitatsuyuki> another approach is to do decoupled lookback but with 2D indices, where you calculate in the order of block (0,0), (1,0),(0,1),(2,0),(1,1),(0,2),... which will give you the block on top and left on each step
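
The block ordering ishitatsuyuki describes is an anti-diagonal walk over the block grid. A small sketch of that ordering, assuming a hypothetical helper name and a rectangular grid of blocks (the decoupled lookback itself is not shown):

```python
def diagonal_block_order(blocks_x, blocks_y):
    """Return (x, y) block indices ordered by anti-diagonal d = x + y."""
    order = []
    for d in range(blocks_x + blocks_y - 1):
        for y in range(d + 1):
            x = d - y
            if x < blocks_x and y < blocks_y:  # skip indices outside the grid
                order.append((x, y))
    return order
```

With this order, the blocks to the left of and above any block lie on the previous diagonal, so they are always scheduled earlier.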

01:47 <ishitatsuyuki> at the end of day you will still need a barrier between generating the integral image and weight computation

01:47 Guest10794 has quit [Ping timeout: 480 seconds]

01:47 <Lynne> why there and not between the horizontal and vertical prefix sums too?

01:48 <ishitatsuyuki> for your current approach you need a barrier there too, but I described alternative algorithms that don't require synchronization or even a memory store between the hori/vert prefix sums

01:50 <Lynne> why need a pipeline barrier at all?

01:50 <ishitatsuyuki> because computing vertical prefix sum requires the entire horizontal prefix sum to finish?

01:51 <Lynne> why is barrier() not enough?

01:51 <ishitatsuyuki> barrier only synchronizes within the workgroup

01:51 <Lynne> we only have a single workgroup

01:51 <ishitatsuyuki> code says s->pl_int_hor.wg_size[2]

01:51 <Lynne> always set to 1

01:52 <Lynne> just copypaste from other code, I'll remove it to make it clearer

01:52 <ishitatsuyuki> a single workgroup does sound like why it's slow though

01:52 <ishitatsuyuki> a GPU has many WGPs (in amd terms) and a workgroup runs within a single WGP

01:52 ngcortes_ has joined #dri-devel

01:53 <Lynne> it's a max-sized workgroup, 1024, with each invocation handling 4 pixels at a time for a 4k image

01:54 <ishitatsuyuki> for a single workgroup, it should work if you put controlBarrier(wg,wg,buf,acqrel | makeavail | makevisible) between each pass

01:59 <Lynne> nope, doesn't, even if I splat it before each prefix_sum call

01:59 <Lynne> on neither nvidia nor radv

01:59 ngcortes has quit [Ping timeout: 480 seconds]

02:05 <Lynne> btw, each dispatch handles 4 displacements (xoffs/yoffs) at a time, and for a research radius of 3, there are 36 dispatches (((2*r)*(2*r) - 1)/4) that have to be done

02:05 <Lynne> so we still do multiple wgps, it's just that each dispatch handles one

02:06 <ishitatsuyuki> ok, utilization should be fine with that

02:06 <Lynne> the default research radius is 15 btw so that's 900ish dispatches, hence you can see why barriers kill performance -_-

02:07 <Lynne> needing to do 3 passes would result in 3x the number of dispatches and barriers
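
To make the dispatch arithmetic concrete, here is a sketch that evaluates the formula from the chat literally. The function names are hypothetical, and rounding up to whole dispatches is an assumption. The figures quoted in the chat (36 and roughly 900) line up with (2*r)*(2*r), so they may be counting offsets rather than dispatches; the sketch reports both quantities.

```python
import math


def offset_count(radius):
    """Offsets as written in the chat formula: (2*r)*(2*r) - 1."""
    return (2 * radius) * (2 * radius) - 1


def dispatch_count(radius, offsets_per_dispatch=4, passes=1):
    """Dispatches needed when each one handles a fixed number of offsets."""
    return passes * math.ceil(offset_count(radius) / offsets_per_dispatch)
```

For example, `dispatch_count(15, passes=3)` triples the single-pass count, which is the cost Lynne is trying to avoid by merging passes.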

02:07 <ishitatsuyuki> use the meme split barrier feature that no one uses? :P (vkCmdSetEvent)

02:08 <Lynne> that would still need a memory barrier, though, wouldn't it?

02:08 <ishitatsuyuki> consider it an async barrier

02:09 <Lynne> I'll leave it as a challenge for someone to do better than my current approach

02:09 <ishitatsuyuki> yeah fair

02:09 jewins has quit [Ping timeout: 480 seconds]

02:10 <ishitatsuyuki> back to debugging, did you try putting the barrier right after prefix sum as well?

02:11 <Lynne> yup, splatted it everywhere, the integral image buffer has coherent flags too

02:13 <ishitatsuyuki> i'm afraid i'm out of ideas again

02:16 <Lynne> oh, hey, maybe I could test with llvmpipe

02:27 <Lynne> well, that's weird, the part which is broken on a GPU is also broken on lavapipe

02:27 <Lynne> but the part which is fine on a gpu is pure black on lavapipe

02:29 <Lynne> limiting lavapipe to a single thread doesn't help either

02:29 <Lynne> do none of the thousands of synchronization options actually do anything in vulkan?

02:29 <Lynne> I know a lot of them are there just to satisfy some alien hardware's synchronization requirements, but still

02:31 <HdkR> Gitlab having some performance issues right now?

02:32 <HdkR> Managing to clone at a blazing 38KB/s

02:33 Daanct12 has joined #dri-devel

02:33 aravind has joined #dri-devel

02:35 Daaanct12 has joined #dri-devel

02:37 bmodem has joined #dri-devel

02:39 Danct12 has quit [Ping timeout: 480 seconds]

02:41 Daanct12 has quit [Ping timeout: 480 seconds]

02:57 <Lynne> anyone with any ideas or willing to run my code?

02:57 <Lynne> it's the last roadblock to merging the entire video patchset in ffmpeg

03:00 <zmike> what "synchronization options" are you referring to

03:10 zf_ has joined #dri-devel

03:10 zf has quit [Read error: Connection reset by peer]

03:11 fxkamd has quit []

03:12 <Lynne> barrier(); memoryBarrier(); controlBarrier();

03:13 <Lynne> I still don't understand how this can fail so consistently on all hardware

03:21 <zmike> sounds like you're using it wrong if it's broken everywhere

03:28 <Lynne> I simplified the issue down to load the same value in all invocations, and put it on a pixel

03:29 <Lynne> and I got different values after height amount of invocations*rows

03:34 Company has quit [Quit: Leaving]

03:55 <Lynne> simplified it as much as possible - https://paste.debian.net/1277180/

03:56 <Lynne> load integral_img with vec4(1), do a vertical prefix, load the vec4 at 300,300, write the .x to all of weights[]

03:58 <Lynne> err, had a typo, https://paste.debian.net/1277181/

03:59 <Lynne> making the if on line 252 "if ((gl_GlobalInvocationID.x * 4) < height[0]) {" to always true seems to fix the issue

04:01 kts has joined #dri-devel

04:07 krushia has quit [Ping timeout: 480 seconds]

04:18 ngcortes_ has quit [Read error: Connection reset by peer]

04:38 Danct12 has joined #dri-devel

04:42 Danct12 has quit []

04:42 Danct12 has joined #dri-devel

04:43 Danct12 has quit []

04:43 Danct12 has joined #dri-devel

04:53 bgs has joined #dri-devel

05:19 kzd has quit [Ping timeout: 480 seconds]

05:25 khfeng has joined #dri-devel

05:49 apinheiro has quit [Quit: Leaving]

05:59 bgs has quit [Remote host closed the connection]

06:02 itoral has joined #dri-devel

06:25 pochu has quit [Ping timeout: 480 seconds]

06:29 camus1 has joined #dri-devel

06:29 camus has quit [Read error: Connection reset by peer]

06:34 pochu has joined #dri-devel

06:40 macromorgan has quit [Ping timeout: 480 seconds]

06:43 sgruszka has joined #dri-devel

06:46 frieder has joined #dri-devel

06:46 danvet has joined #dri-devel

06:48 macromorgan has joined #dri-devel

06:55 vliaskov_ has joined #dri-devel

06:57 rasterman has joined #dri-devel

06:58 sghuge has quit [Remote host closed the connection]

06:58 sghuge has joined #dri-devel

07:02 vliaskov__ has joined #dri-devel

07:04 MajorBiscuit has joined #dri-devel

07:08 vliaskov_ has quit [Ping timeout: 480 seconds]

07:23 <ishitatsuyuki> oh yeah, control barrier inside control flow is undefined behavior

07:24 <ishitatsuyuki> common practice is to hoist barrier/controlBarrier() outside if condition
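
A CPU analogy for hoisting the barrier out of the condition, using Python threads in place of shader invocations. This is only an illustration of the pattern: `run_workgroup` and its parameters are hypothetical, and `threading.Barrier` stands in for the shader's `barrier()`. The requirement on the GPU is that every invocation in the workgroup reaches the barrier, which is why it must not sit inside non-uniform control flow.

```python
import threading


def run_workgroup(num_threads, active_limit):
    barrier = threading.Barrier(num_threads)
    shared = [0] * num_threads
    results = [0] * num_threads

    def invocation(idx):
        # Divergent work stays inside the condition...
        if idx < active_limit:
            shared[idx] = idx + 1
        # ...but the barrier is outside it, so every thread reaches it.
        barrier.wait()
        # After the barrier, all writes to `shared` are complete.
        results[idx] = sum(shared)

    threads = [threading.Thread(target=invocation, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

If `barrier.wait()` were moved inside the `if`, only `active_limit` threads would arrive and they would wait forever for the rest. On a CPU that is a hang; in a shader the behavior is undefined.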

07:25 tursulin has joined #dri-devel

07:33 karolherbst has quit [Read error: Connection reset by peer]

07:33 karolherbst has joined #dri-devel

07:33 pcercuei has joined #dri-devel

07:34 TMM has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

07:34 TMM has joined #dri-devel

07:35 kts has quit [Quit: Konversation terminated!]

07:39 yuq8251 has joined #dri-devel

07:40 MajorBiscuit has quit [Quit: WeeChat 3.6]

07:45 yuq825 has quit [Ping timeout: 480 seconds]

07:48 lynxeye has joined #dri-devel

08:03 jkrzyszt has joined #dri-devel

08:08 <Lynne> more gotchas than... I have no idea what, there's just too many

08:11 Haaninjo has joined #dri-devel

08:13 Danct12 has quit [Ping timeout: 480 seconds]

08:18 kugel_ has joined #dri-devel

08:20 sarahwalker has joined #dri-devel

08:21 swalker_ has joined #dri-devel

08:22 swalker_ is now known as Guest10811

08:23 kugel has quit [Ping timeout: 480 seconds]

08:28 sarahwalker has quit [Ping timeout: 480 seconds]

08:40 kugel_ is now known as kugel

08:48 Danct12 has joined #dri-devel

09:10 Haaninjo has quit [Quit: Ex-Chat]

09:12 paulk-bis has quit []

09:12 paulk has joined #dri-devel

09:17 Danct12 has quit [Quit: WeeChat 3.8]

09:18 swalker__ has joined #dri-devel

09:23 Guest10811 has quit [Remote host closed the connection]

09:24 swalker_ has joined #dri-devel

09:25 swalker_ is now known as Guest10814

09:30 swalker__ has quit [Ping timeout: 480 seconds]

09:31 swalker__ has joined #dri-devel

09:35 Guest10814 has quit [Remote host closed the connection]

10:02 sarahwalker has joined #dri-devel

10:04 Zopolis4_ has joined #dri-devel

10:07 swalker__ has quit [Ping timeout: 480 seconds]

10:10 JPEW has quit [Read error: Connection reset by peer]

10:10 MTCoster has quit [Remote host closed the connection]

10:10 jbarnes has quit [Remote host closed the connection]

10:10 MTCoster has joined #dri-devel

10:12 jbarnes has joined #dri-devel

10:12 JPEW has joined #dri-devel

10:16 Celmor[m] has joined #dri-devel

10:23 iive has joined #dri-devel

10:25 devilhorns has joined #dri-devel

10:26 <Lynne> finally finished it, code is here - https://github.com/cyanreg/FFmpeg/blob/vulkan/libavfilter/vf_nlmeans_vulkan.c

10:27 <Lynne> still 3x slower than opencl, I leave it as a challenge for anyone to make it faster -_-

10:27 <Lynne> ishitatsuyuki: could you take a look at it, just in case I missed something?

10:30 YuGiOhJCJ has quit [Quit: YuGiOhJCJ]

10:33 swalker_ has joined #dri-devel

10:33 swalker_ is now known as Guest10834

10:33 cheako has quit [Quit: Connection closed for inactivity]

10:34 sarahwalker has quit [Read error: Connection reset by peer]

10:34 <ishitatsuyuki> it's hard to guess about performance unless you're obviously underutilizing

10:35 swalker__ has joined #dri-devel

10:35 Guest10834 has quit [Remote host closed the connection]

10:35 <ishitatsuyuki> radv's profiler support requires the app to be presenting, but maybe nvidia has better tools on that side

10:37 <ishitatsuyuki> looks like you still have barrier in CF though https://github.com/cyanreg/FFmpeg/blob/aeff8ad1de646f501a7d8d8b769b5533bb4ff08b/libavfilter/vf_nlmeans_vulkan.c#L96

10:49 dviola has left #dri-devel [#dri-devel]

10:56 iive has quit [Quit: They came for me...]

10:56 djbw has quit [Read error: Connection reset by peer]

10:58 smiles_ has joined #dri-devel

11:04 smilessh has quit [Ping timeout: 480 seconds]

11:15 <Lynne> ishitatsuyuki: if (first) is C, though

11:15 <ishitatsuyuki> ah ok, nevermind

11:16 <ishitatsuyuki> the practice of mixing glsl in C is making me uncomfortable :/

11:16 <Lynne> with rdna3, which has atomic float ops, barrier overhead is pretty much zero

11:16 <Lynne> it's also not 3x, but 2x slower than opencl

11:17 <ishitatsuyuki> well that's a huge red flag

11:17 <ishitatsuyuki> don't use float atomics

11:17 <Lynne> removing the prefix sums boosts fps to 20, which still sounds quite low to me

11:17 <Lynne> it's either barriers, or atomic float adds :/

11:17 <ishitatsuyuki> have you tried aggregating through shared memory first so only one thread per workgroup needs to do the global atomic?

11:17 <Lynne> 2 per dispatch, so it's not horrible

11:18 <Lynne> not quite sure what you mean

11:18 <ishitatsuyuki> https://developer.nvidia.com/blog/gpu-pro-tip-fast-histograms-using-shared-atomics-maxwell/
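
The idea behind aggregating through shared memory first can be sketched by counting global atomic operations in a sequential simulation. The names are hypothetical and `sum(group)` merely stands in for a shared-memory reduction inside one workgroup; nothing here models real GPU timing.

```python
def accumulate_direct(values):
    """Every invocation issues its own global atomic add."""
    total = 0.0
    global_atomics = 0
    for v in values:
        total += v
        global_atomics += 1
    return total, global_atomics


def accumulate_via_shared(values, workgroup_size):
    """Each workgroup reduces locally, then one invocation adds the partial sum."""
    total = 0.0
    global_atomics = 0
    for start in range(0, len(values), workgroup_size):
        group = values[start:start + workgroup_size]
        partial = sum(group)   # stands in for a shared-memory reduction
        total += partial       # the single global atomic for this workgroup
        global_atomics += 1
    return total, global_atomics
```

Both versions produce the same total, but the second issues one global atomic per workgroup instead of one per invocation.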

11:20 <ishitatsuyuki> i'd avoid float atomics at all cost

11:20 <ishitatsuyuki> the alternative is probably a workgroup barrier, not pipeline barrier, right?

11:21 <ishitatsuyuki> I don't know how float atomics are emulated but they seem to be insanely low throughput

11:22 <ishitatsuyuki> meanwhile, barrier only costs as much as it needs to wait, it's not like the instruction itself has any execution cost

11:25 <Lynne> sadly integral image calculations are not separable

11:26 <Lynne> I could calculate weights across multiple buffers and just merge them during the final step, plenty of descriptors left

11:27 <ishitatsuyuki> is integral image not separable?

11:27 <ishitatsuyuki> separable in the filter sense?

11:29 <Lynne> separable in that you can't independently compute a horizontal and vertical prefix sum and merge them

11:30 <ishitatsuyuki> the usual definition of separability for 2D IIR/FIR filters is a bit different, but ok

11:31 <ishitatsuyuki> separability there means if a NxN convolution can be done with a 1xN then Nx1 convolution
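
The filter sense of separability that ishitatsuyuki mentions can be checked numerically: a 2D kernel that is the outer product of a column and a row vector gives the same result as a 1xN pass followed by an Nx1 pass. A minimal sketch with hypothetical helper names, computing only the "valid" region and skipping the kernel flip (the comparison is unaffected because both sides use the same convention):

```python
def conv2d_valid(img, kernel):
    """Slide the kernel over the image without padding (no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    return [[sum(img[y + j][x + i] * kernel[j][i]
                 for j in range(kh) for i in range(kw))
             for x in range(w - kw + 1)]
            for y in range(h - kh + 1)]


def separable_conv(img, col, row):
    """Apply the row vector as a 1xN pass, then the column vector as an Nx1 pass."""
    horiz = conv2d_valid(img, [row])
    return conv2d_valid(horiz, [[c] for c in col])


def outer(col, row):
    """Full 2D kernel equivalent to the two 1D passes."""
    return [[c * r for r in row] for c in col]
```

The integral image has the same two-pass shape: the vertical prefix sum runs on the output of the horizontal one, so the passes are sequential rather than independent.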

11:31 <ishitatsuyuki> can you use ints for the atomics somehow?

11:32 <ishitatsuyuki> that should still be better than float atomics

11:34 <Lynne> removing the atomic float adds makes literally no difference, they're not the roadblock

11:34 <Lynne> *bottleneck

11:34 <ishitatsuyuki> ok

11:35 <ishitatsuyuki> you should probably try getting a profile

11:47 <Lynne> guess I could fire up the filter under mpv, the client will be presenting in that case

11:47 <Lynne> how do I use the radv profiler?

11:52 Company has joined #dri-devel

11:54 <Lynne> uuh, I think I found the bottleneck - resolution

11:55 <Lynne> I was testing on 4k video, but on 720p, I get 700fps with my code, while opencl gets 70

11:57 jfalempe has quit [Quit: Leaving]

11:59 sarnex has quit [Ping timeout: 480 seconds]

12:01 <ishitatsuyuki> does mpv present with vulkan though

12:01 <ishitatsuyuki> for instructions see https://docs.mesa3d.org/envvars.html#envvar-RADV_THREAD_TRACE

12:03 smilessh has joined #dri-devel

12:03 smiles_ has quit [Read error: Connection reset by peer]

12:05 elongbug has joined #dri-devel

12:12 sarnex has joined #dri-devel

12:13 Zopolis4_ has quit []

12:23 <Lynne> it does, how do I analyze the captures?

12:37 kts has joined #dri-devel

12:45 godvino has joined #dri-devel

12:46 minecrell has quit [Read error: Connection timed out]

12:46 itoral has quit [Remote host closed the connection]

12:46 minecrell has joined #dri-devel

12:58 fxkamd has joined #dri-devel

12:59 godvino has quit [Quit: WeeChat 3.6]

13:01 pochu has quit [Quit: leaving]

13:10 JohnnyonFlame has quit [Ping timeout: 480 seconds]

13:18 <ishitatsuyuki> open it with https://gpuopen.com/rgp/

13:19 <ishitatsuyuki> you first identify the slow pass (in your case, there should be only a single compute pass), then go to instruction timing

13:20 <ishitatsuyuki> the numbers should give you a rough idea of "cost" of instructions

13:22 JohnnyonFlame has joined #dri-devel

13:40 devilhorns has quit []

13:43 FireBurn has quit [Quit: Konversation terminated!]

13:58 Daaanct12 has quit [Remote host closed the connection]

13:58 Daaanct12 has joined #dri-devel

14:01 <tleydxdy> when I look at all the fds opened by a vulkan game and their corresponding drm_file, some of them have the correct pid but they don't seem to be doing any work, and some have the pid of Xwayland and are doing all the work. does anyone know why that is?

14:01 <tleydxdy> I assume it is because those fds are sent over by the X server, but why is it using those to do all the rendering work?

14:02 Daaanct12 has quit [Remote host closed the connection]

14:03 Daaanct12 has joined #dri-devel

14:05 <tursulin> tleydxdy: try if you want https://patchwork.freedesktop.org/patch/526752/?series=109902&rev=4

14:06 <tleydxdy> how fun

14:07 <tleydxdy> I'm mostly curious about why this pattern exists

14:08 <tleydxdy> seeing how the application opens the render node itself anyway

14:09 <danvet> tleydxdy, I thought for vk the render operations should always go through a file that's directly opened

14:09 <danvet> and just winsys might go through one opened by Xwayland (if it's DRI3 proto)

14:11 <danvet> gfxstrand, ^^ or does this work differently?

14:12 rasterman has quit [Quit: Gettin' stinky!]

14:12 <tleydxdy> yeah, if I look at vram used reported by fdinfo for example, the directly opened ones only use 4K while the Xwayland one has >300MiB

14:14 <emersion> tleydxdy: that's how the X11 DRI3 proto was designed

14:14 <emersion> the Wayland protocol is different, for instance

14:15 <emersion> AFAIK, the X11 DRI3 protocol was designed with DRM authentication in mind, where the X11 server would send authenticated DRM FDs to clients

14:15 <emersion> before render node existed

14:16 alyssa has joined #dri-devel

14:16 <alyssa> jenatali: you've been conscripted to ack https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22123

14:16 kzd has joined #dri-devel

14:16 <alyssa> I think

14:16 <alyssa> Strictly I think maybe I could get away with xfb_size if I make my dispatch more complicated...

14:17 yuq8251 has left #dri-devel [#dri-devel]

14:19 <tleydxdy> emersion I see, so if the underlying wsi for vulkan is not X11 DRI3 then the pattern should not be seen?

14:19 <emersion> on Wayland, all DRM FDs should be opened by the client *

14:20 <emersion> ( * except DRM leasing stuff)

14:21 <emersion> (but that's only for VR apps, and the DRM FD sent by the compositor is read-only before the DRM lease is accepted by the compositor)

14:21 <tleydxdy> got it

14:25 <jenatali> alyssa: Ack, but without seeing how it's used it's hard for me to really get why it's needed

14:25 <jenatali> XFB is one of those things that's magic from my POV, I don't know any of the implementation details

14:33 <danvet> emersion, yeah but I thought for vk you still open the render node

14:33 <danvet> since winsys is only set up later

14:33 <emersion> what do you mean?

14:34 <danvet> emersion, like you can set up your entire render pipeline and allocate all the buffers without winsys connection

14:34 <emersion> maybe Mesa will open render nodes, but these won't be coming from the compositor

14:34 <danvet> and so no way to get the DRI3 fd

14:34 <danvet> and only when you set up winsys will that part happen

14:35 <danvet> so I kinda expected that the render nodes will have most buffers and the winsys one opened by xwayland just winsys buffers

14:35 <emersion> sure. the question was about FDs coming from Xwayland though

14:35 <emersion> ah

14:35 <danvet> but it seems to be the other way round

14:35 <danvet> per tleydxdy at least

14:35 <emersion> maybe the swapchain buffers are allocated via Xwayland's FD

14:35 <danvet> for glx/egl it'll all be on the xwayland fd

14:35 <emersion> the swapchain is tied to WSI

14:36 <danvet> 300mb swapchain seems a bit much :-)

14:36 <tleydxdy> yeah, also all the gfx engine time is logged on the xwayland fd

14:36 <tleydxdy> so it's also doing cs_ioctl on that

14:36 <emersion> that's weird

14:39 sgruszka has quit [Remote host closed the connection]

14:39 <tleydxdy> the game is unity, fwiw, and as far as I can tell it's not doing anything special
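
The per-fd numbers tleydxdy refers to come from the `drm-` keys in `/proc/<pid>/fdinfo/<fd>`. A rough sketch of parsing that text follows. The helper names are hypothetical, the sample string used to exercise it is made up for illustration, and the exact set of keys depends on the driver and kernel version.

```python
def parse_drm_fdinfo(text):
    """Collect the 'drm-' key/value pairs from one fdinfo file's contents."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.startswith("drm-"):
            info[key.strip()] = value.strip()
    return info


def to_bytes(value):
    """Convert a value such as '4 KiB' or '300 MiB' to bytes."""
    units = {"KiB": 1024, "MiB": 1024 ** 2}
    parts = value.split()
    number = int(parts[0])
    unit = parts[1] if len(parts) > 1 else None
    return number * units.get(unit, 1)
```

Grouping the parsed entries by `drm-client-id` rather than by pid is one way to see which open file the memory and engine time are attributed to.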

14:39 kts has quit [Quit: Leaving]

14:40 cheako has joined #dri-devel

14:45 aravind has quit [Ping timeout: 480 seconds]

14:51 <gfxstrand> danvet: That's how things work initially, yes. I think some drivers are trying to get a master and use that if they can these days.

14:51 <gfxstrand> They shouldn't be getting it from the winsys, though. That'd be weird.

14:52 <danvet> gfxstrand, well the master you only need for display, and for that you pretty much have to lease it

14:52 <danvet> unless bare metal winsys

14:52 <danvet> no one else should be master than the current compositor

14:54 <alyssa> jenatali: purely software xfb implementation, see linked MR :~)

14:55 <jenatali> Ah I missed that link

14:57 <alyssa> ~~just you wait for geometry shaders~~

14:57 <jenatali> Got it, so you run an additional VS with rast discard and just augment the VS to write out the xfb data?

14:58 <alyssa> Yep

14:58 sebastien has joined #dri-devel

14:58 <alyssa> I mean, that's conceptually how it works for GLES3.0 level transform feedback

14:58 <alyssa> and that's what panfrost does

14:59 <alyssa> all the real fun comes in when you want the full big GL thing

14:59 sebastien is now known as Guest10882

14:59 <jenatali> Hm. Can't you mix VS+XFB+rast?

14:59 <alyssa> indexed draws, triangle strips, all that fun stuff

15:00 <jenatali> I haven't looked at the ES limitations for XFB so I dunno

15:00 <alyssa> GLES3.0 is just glDrawArrays() with POINTS/LINES/STRIPS

15:00 <alyssa> 1 vertex in, 1 vertex our

15:00 <alyssa> *out

15:00 <alyssa> which is all panfrost does (and hence panfrost fails a bunch of piglits for xfb)

15:00 <MrCooper> emersion tleydxdy: there are pending kernel patches which will correctly attribute DRM FDs passed from an X server to a DRI3 client to the latter

15:01 <alyssa> s/STRIPS/TRIANGLES/

15:01 Guest10882 has quit []

15:01 <alyssa> for full GL there's all sorts of batshit interactions allowed, e.g. indirect indexed draw + primitive restart + TRIANGLE_STRIPS + XFB

15:01 <alyssa> how is that supposed to work? don't even worry about it ;-)

15:01 <emersion> MrCooper: my preference would've been to fix the X11 protocol, instead of fixing the kernel…

15:01 <alyssa> spec has a really inane requirement that you can draw strips/loops/fans but they need to be streamed out like plain lines/triangles

15:02 siddh has quit [Quit: Reconnecting]

15:02 siddh has joined #dri-devel

15:02 <alyssa> (e.g. drawing 4 vertices with TRIANGLE_STRIPS would emit 6 vertices for streamout, duplicating the shared edge)

15:03 <alyssa> in that case the linked MR does the stupid simple approach of invoking the transform feedback shader 6 times (instead of 4) and doing some arithmetic to work out which vertex should be processed in a given invocation

15:03 <alyssa> this is suboptimal but hopefully nothing ever hits this other than piglit

15:03 <alyssa> (..hopefully)
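
The per-invocation arithmetic alyssa describes for strips can be sketched roughly as below. This is a minimal Python illustration, not the code from the linked MR: the function names are hypothetical, and swapping the first two corners on odd triangles to preserve winding is an assumed convention, not something stated in the log.

```python
def strip_vertex_for_invocation(invocation):
    """Map a streamout invocation to the strip vertex it should process."""
    # Three invocations per output triangle: which triangle, which corner.
    prim, corner = divmod(invocation, 3)
    # Assumption: odd triangles swap their first two corners to keep winding.
    if prim % 2 == 1 and corner < 2:
        corner = 1 - corner
    return prim + corner


def strip_streamout_order(vertex_count):
    """List the strip vertices written when streamed out as plain triangles."""
    prim_count = max(vertex_count - 2, 0)
    return [strip_vertex_for_invocation(i) for i in range(prim_count * 3)]


# 4 strip vertices become 2 triangles, so 6 invocations instead of 4.
print(strip_streamout_order(4))  # → [0, 1, 2, 2, 1, 3]
```

Vertices 1 and 2 form the shared edge, so they each appear twice in the output.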

15:04 siddh has quit []

15:04 siddh has joined #dri-devel

15:07 jewins has joined #dri-devel

15:08 fxkamd has quit []

15:12 Duke`` has joined #dri-devel

15:14 <jenatali> alyssa: Right that all makes sense. But can you not mix XFB+rast?

15:15 <jenatali> Or alternatively, does GLES3 not allow SSBOs/atomics in VS?

15:23 <alyssa> VS side effects are optional in all the khronos apis

15:23 <alyssa> for mali, works on older hw but not newer ones due to arm's questionable interpretations of the spec

15:24 <alyssa> for agx, IDK, haven't tried, Metal doesn't allow it and I don't know what's happening internally

15:24 <alyssa> (VS side effects are unpredictable on tilers in general, the spec language is very 'forgiving' here)

15:27 <alyssa> wouldn't help a ton in every case, consider e.g. GL_TRIANGLE_FANS with 1000 triangles drawn

15:27 <alyssa> vertex 0 needs to be written out 1000 times

15:27 <alyssa> all other vertices are written out just once
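
The fan case can be counted with a small sketch. It assumes the usual fan decomposition where triangle i is (0, i+1, i+2); the function name is hypothetical and none of this is taken from a driver.

```python
from collections import Counter


def fan_streamout_order(vertex_count):
    """List the fan vertices written when streamed out as plain triangles."""
    prim_count = max(vertex_count - 2, 0)
    order = []
    for prim in range(prim_count):
        # Assumed decomposition: every triangle shares vertex 0 as the hub.
        order.extend((0, prim + 1, prim + 2))
    return order


# 1000 triangles need 1002 fan vertices.
writes = Counter(fan_streamout_order(1002))
print(writes[0])  # → 1000
```

In this decomposition the hub vertex appears in every triangle, while each rim vertex appears in at most two, which is why running the vertex shader once per input vertex does not map directly onto the streamed-out layout.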

15:29 <alyssa> re side effects being unpredictable, I *think* this means the decoupled approach is kosher even if we allow vertex shader side effects

15:29 <alyssa> but I'd need to spec lawyer to find out

15:33 <karolherbst> alyssa: are SSBO writes a thing in vertex shaders?

15:33 <alyssa> 15:23 < alyssa> VS side effects are optional in all the khronos apis

15:34 <karolherbst> right.. was more like about is it a thing in your driver/hardware

15:35 <alyssa> 15:23 < alyssa> for mali, works on older hw but not newer ones due to arm's questionable interpretations of the spec

15:35 <alyssa> 15:24 < alyssa> for agx, IDK, haven't tried, Metal doesn't allow it and I don't know what's happening internally

15:35 <alyssa> 15:24 < alyssa> (VS side effects are unpredictable on tilers in general, the spec language is very 'forgiving' here)

15:37 alyssa has left #dri-devel [#dri-devel]

15:39 <daniels> 'newer ones' being anything with IDVS?

15:41 <jenatali> I was very surprised when I learned that Vulkan not only allows side effects, but also wave ops and even quad ops in VS. Like wtf is the meaning of a quad of vertex invocations?

15:42 <jenatali> FWIW D3D does wave ops, but not quads

15:51 <gfxstrand> jenatali: Well, when you render with GL_QUADS...

15:51 * gfxstrand shows herself out

15:51 <jenatali> Which Vulkan doesn't have, right?

15:51 <jenatali> ... right?

15:52 <gfxstrand> There was a quads extension but we killed it. :)

15:53 <gfxstrand> Also, quad lane groups in a VS have nothing whatsoever to do with GL_QUAD. I was just making dumb jokes.

15:53 <gfxstrand> They're literally just groups of 4 lanes which you can do stuff on.

15:53 <jenatali> Yeah I know, I'm just also confirming :)

15:54 <gfxstrand> They do make sense with certain CS patterns you can do with the NV derivatives extension, though.

15:54 <jenatali> I guess I could lower quad ops to plain wave ops and support them in VS

15:54 <jenatali> Yeah D3D defines quad ops + derivatives in CS

15:56 dsrt^ has joined #dri-devel

15:57 <gfxstrand> Yeah, that's really all they are

15:57 <gfxstrand> In fact, I think we have NIR lowering for it

15:57 <gfxstrand> Yup. lower_quad

15:58 <mareko> quad ops work in VS if num_patches == 4 or 2 and TCS is present

15:59 <mareko> I mean num_input_cp

16:00 <jenatali> Oh cool, I should just run that on non-CS/FS and then I can support quad ops everywhere
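
Lowering quad ops to plain wave ops amounts to index arithmetic on the lane ID, since a quad is just a group of 4 consecutive lanes. The sketch below simulates a wave as a Python list; the function names are hypothetical and this is not the NIR pass itself, only the general idea of replacing quad ops with shuffles.

```python
def quad_broadcast(wave, index):
    """Every lane reads the value held by lane `index` of its own quad."""
    # (lane & ~3) is the first lane of the quad this lane belongs to.
    return [wave[(lane & ~3) | index] for lane in range(len(wave))]


def quad_swap(wave, mask):
    """Swap within each quad: mask 1 horizontal, 2 vertical, 3 diagonal."""
    # XOR with 1, 2 or 3 never leaves the quad, so a plain shuffle suffices.
    return [wave[lane ^ mask] for lane in range(len(wave))]


wave = list(range(8))  # two quads; wave size is assumed to be a multiple of 4
print(quad_broadcast(wave, 2))  # → [2, 2, 2, 2, 6, 6, 6, 6]
print(quad_swap(wave, 1))       # → [1, 0, 3, 2, 5, 4, 7, 6]
```

Nothing here depends on the shader stage, which matches the point that quad lane groups are unrelated to quad primitives.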

16:01 djbw has joined #dri-devel

16:11 <macromorgan> so question... I'm trying to troubleshoot a problem with unbalanced regulator disables that happens only on suspend and shutdown.

16:12 MajorBiscuit has joined #dri-devel

16:13 <macromorgan> As best I can tell when I try to shut down a panel mipi_dsi_drv_shutdown is getting called which runs the panel_nv3051d_shutdown function which calls drm_panel_unprepare which calls panel_nv3051d_unprepare. Then, I also see panel_bridge_post_disable is calling panel_nv3051d_unprepare

16:13 <macromorgan> should there not be a shutdown function for the panel?

16:13 MajorBiscuit has quit []

16:13 MajorBiscuit has joined #dri-devel

16:16 <macromorgan> a lot of panels have a "prepared" or "enabled" flag, but when I was upstreaming the driver I was told not to do that
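
The double-unprepare path macromorgan describes can be modelled with a toy use counter. This is a simulation in Python, not kernel code: the `Regulator` and `Panel` classes and the supply name "vdd" are hypothetical, and the warning text only loosely mirrors what the kernel prints.

```python
class Regulator:
    """Toy model of a use-counted supply."""

    def __init__(self, name):
        self.name = name
        self.use_count = 0
        self.warnings = []

    def enable(self):
        self.use_count += 1

    def disable(self):
        if self.use_count == 0:
            # A disable without a matching enable is the reported symptom.
            self.warnings.append(f"unbalanced disables for {self.name}")
            return
        self.use_count -= 1


class Panel:
    """Toy panel whose prepare/unprepare toggle a single supply."""

    def __init__(self, supply):
        self.supply = supply

    def prepare(self):
        self.supply.enable()

    def unprepare(self):
        self.supply.disable()


vdd = Regulator("vdd")
panel = Panel(vdd)
panel.prepare()
panel.unprepare()  # one caller, e.g. the DSI driver shutdown hook
panel.unprepare()  # second caller, e.g. the panel bridge post_disable
print(vdd.warnings)  # → ['unbalanced disables for vdd']
```

A per-panel "prepared" flag would hide the second call, which is the pattern macromorgan says he was told to avoid; the alternative is making sure only one of the two paths reaches unprepare.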

16:26 <jenatali> Has the branchpoint happened?

16:26 Leopold has joined #dri-devel

16:26 rasterman has joined #dri-devel

16:27 MajorBiscuit has quit [Ping timeout: 480 seconds]

16:28 <jenatali> Oh yep, there it is. Would be nice to have a dedicated label for the post-branch MR that bumps the version for the next release. eric_engestrom

16:28 <jenatali> I'd subscribe to that

16:33 Leopold__ has quit [Ping timeout: 480 seconds]

16:36 swalker__ has quit [Remote host closed the connection]

16:39 heat has joined #dri-devel

16:39 heat has quit [Remote host closed the connection]

16:39 heat has joined #dri-devel

16:41 JohnnyonFlame has quit [Ping timeout: 480 seconds]

16:43 bmodem has quit [Ping timeout: 480 seconds]

16:46 iive has joined #dri-devel

16:53 tursulin has quit [Ping timeout: 480 seconds]

16:53 stuarts has joined #dri-devel

16:55 vliaskov__ has quit [Ping timeout: 480 seconds]

16:59 kts has joined #dri-devel

17:00 kts has quit [Remote host closed the connection]

17:00 kts has joined #dri-devel

17:01 frieder has quit [Ping timeout: 480 seconds]

17:07 ngcortes has joined #dri-devel

17:16 jkrzyszt has quit [Ping timeout: 480 seconds]

17:16 MajorBiscuit has joined #dri-devel

17:26 lynxeye has quit [Quit: Leaving.]

17:27 jeeeun841351 has joined #dri-devel

17:29 jeeeun84135 has quit [Ping timeout: 480 seconds]

17:39 <eric_engestrom> jenatali: I've created the label ~mesa-release and I'll write up some doc in a bit, and hopefully we (dcbaker and I) won't forget to use it too often :]

17:40 <dcbaker> eric_engestrom: thanks for doing that

17:40 <jenatali> Thanks!

17:51 vliaskov__ has joined #dri-devel

18:11 MajorBiscuit has quit [Quit: WeeChat 3.6]

18:15 stuarts has quit [Remote host closed the connection]

18:16 tobiasjakobi has joined #dri-devel

18:16 tobiasjakobi has quit []

18:17 JohnnyonFlame has joined #dri-devel

18:29 <karolherbst> dcbaker: some hackish bindgen_version thing: https://github.com/mesonbuild/meson/pull/11679

18:32 jaganteki has joined #dri-devel

18:34 vliaskov__ has quit [Ping timeout: 480 seconds]

18:35 <dcbaker> karolherbst: I left you a few comments, it's really annoying that they treat the command line as always up for change

18:36 <karolherbst> yeah...

18:37 <karolherbst> get_version seems to work alright, nice

18:37 <dcbaker> sweet

18:38 <karolherbst> it's a bit annoying that rust.bindgen is already taken otherwise it could be some higher level struct and rust.bindgen => rust.bindgen.generate and we could just add rust.bindgen.version_compare().... but oh well...

18:51 <dcbaker> Yeah. A long time ago I'd written a find_program() cache, which would have made this a bit simpler since you could have done something like find_program('bindgen').get_version().version_compare(...) and since all calls to find_program would use the same cache the lookup would be effectively free and we could just recommend that

18:51 <dcbaker> unfortunately I never got it working quite right

18:52 <karolherbst> yeah.. but that's also kinda annoying

18:52 <karolherbst> I'd kinda prefer wrapping those things so it's always in control of meson

18:54 mbrost has joined #dri-devel

18:58 Haaninjo has joined #dri-devel

19:00 Leopold has quit []

19:00 <karolherbst> dcbaker: anyway... would be cool to get my rust stuff resolved for 1.2 so I only have to bump the version once :D

19:01 <karolherbst> do I need to add any kwarg stuff?

19:02 Leopold_ has joined #dri-devel

19:04 <dj-death> oh noes, gitlab 504

19:08 JohnnyonFlame has quit [Ping timeout: 480 seconds]

19:17 stuarts has joined #dri-devel

19:22 <dcbaker> karolherbst: Yeah, I left you a comment, otherwise I think that looks good

19:25 gouchi has joined #dri-devel

19:29 <anholt> starting on the 1.3.5.1 CTS update (with a couple extra bugfixes pulled in)

19:35 rasterman has quit [Quit: Gettin' stinky!]

19:41 <karolherbst> now I need somebody else or me to figure out that isystem stuff ...

19:42 ngcortes has quit [Ping timeout: 480 seconds]

19:52 prahladk has joined #dri-devel

20:00 kts has quit [Quit: Leaving]

20:10 alyssa has joined #dri-devel

20:10 <alyssa> daniels: yeah, Arm's implementation of IDVS is "creative"

20:10 <alyssa> gfxstrand: lol at VK_QUADS ops

20:11 mbrost has quit [Remote host closed the connection]

20:19 prahladk has quit []

20:22 FloGrauper[m] has quit []

20:23 madhavpcm has quit []

20:23 MatrixTravelerbot[m]1 has quit []

20:24 tuxayo has quit []

20:25 LaughingMan[m] has quit []

20:27 cleverca22[m] has quit []

20:28 tintou has quit []

20:30 bluepqnuin has quit []

20:32 Celmor[m] has quit []

20:32 Harvey[m] has quit []

20:32 Guest10825 has quit []

20:32 arisu has quit []

20:33 zzxyb[m] has quit []

20:33 chema has quit []

20:34 hch12907 has quit []

20:34 cmeissl[m] has quit []

20:35 neobrain[m] has quit []

20:35 Andy[m]1 has quit []

20:36 kallisti5[m] has quit []

20:36 YuGiOhJCJ has joined #dri-devel

20:37 Ella[m] has quit []

20:37 bylaws has quit []

20:38 ngcortes has joined #dri-devel

20:39 danvet has quit [Ping timeout: 480 seconds]

20:43 Duke`` has quit [Ping timeout: 480 seconds]

20:45 ngcortes has quit [Remote host closed the connection]

20:46 ngcortes has joined #dri-devel

20:47 <karolherbst> quads? reasonable primitives

20:48 <alyssa> * Catmull-Clark has entered the chat

21:02 <robclark> alyssa: idvs sounds _kinda_ like qcom's VS vs binning shader (except that adreno VS also calcs position/psize)

21:11 <alyssa> robclark: same idea, yeah

21:12 <alyssa> the problem isn't the concept, it's an implementation detail :~)

21:12 <robclark> hmm, ok

21:24 gouchi has quit [Quit: Quitte]

21:31 Leopold_ has quit []

21:33 Leopold_ has joined #dri-devel

21:44 JohnnyonFlame has joined #dri-devel

21:46 bluetail42 has joined #dri-devel

21:53 bluetail4 has quit [Ping timeout: 480 seconds]

21:57 kts has joined #dri-devel

22:03 fxkamd has joined #dri-devel

22:06 <Kayden> there's some mention of nir_register in src/freedreno/ir3/* still, is this dead code now that the backend is SSA?

22:07 <anholt> Kayden: indirect temps are still registers

22:07 <Kayden> oh, interesting, ok

22:09 <gfxstrand> We should convert ir3 to load/store_scratch

22:09 <gfxstrand> Unless you really can indirect on the register file and really want to be doing that.

22:14 <karolherbst> anholt: or scratch mem if lowered to it

22:15 <Kayden> was just a little surprised to see it there still

22:15 <Kayden> wasn't sure if it was leftover or still used :)

22:15 <karolherbst> I don't know if I or somebody else ported codegen to scratch, but I think it was done...

22:15 <karolherbst> ahh nope

22:15 <gfxstrand> Kayden: I mean, the Intel vec4 back-end still uses it last I checked... 😭

22:16 <karolherbst> or was it...

22:16 <karolherbst> mhhh

22:16 <Kayden> gfxstrand: not surprised to see it in general, just in ir3 :)

22:16 <karolherbst> what was the pass again to lower to scratch?

22:16 <gfxstrand> Kayden: Yes, but we should kill NIR register indirects in general.

22:16 <Kayden> ah.

22:17 <Kayden> yeah, probably

22:17 <gfxstrand> I suppose I do have a haswell sitting in the corner over there.

22:17 <anholt> Kayden: register indirects turn into register array accesses. large temps get turned into scratch.

22:17 <karolherbst> I'm sure it's almost trivial to remove `nir_register` in codegen as it already supports scratch memory

22:17 <gfxstrand> NAK won't support it

22:18 <karolherbst> yeah, no point in supporting it on nv hw

22:19 <gfxstrand> There's no point on Intel, either. They go to scratch in the vec4 back-end it's just that the back-end code to do that has been around for a long time and no one has bothered to clean it up.

22:19 <gfxstrand> Technically, Intel can do indirect reads

22:19 <gfxstrand> And indirect stores if the indirect is uniform and the stars align.

22:22 <alyssa> I don't have a great plan for ir3 nir_register use.

22:23 pcercuei has quit [Quit: dodo]

22:35 elongbug has quit [Read error: Connection reset by peer]

22:37 iive has quit [Quit: They came for me...]

22:46 <karolherbst> anybody here ever played around with onednn? I kinda want to know what I need to actually use it

22:46 <karolherbst> bonus point: make it non painful

22:50 bluebugs has joined #dri-devel

22:51 <gfxstrand> That sounds like an Intel invention

22:51 <karolherbst> it is

22:51 <gfxstrand> Of course...

22:51 <gfxstrand> They have to have One of everything...

22:51 <karolherbst> but apparently it has a CL backend

22:51 <karolherbst> and it can run pytorch

22:51 <karolherbst> and stuff

22:51 <karolherbst> dunno

22:51 <karolherbst> just want to see what CL extensions it needs

22:51 <karolherbst> but I think it's INTEL_subgroup and INTEL_UVM

23:05 jewins1 has joined #dri-devel

23:05 pzanoni` has joined #dri-devel

23:05 jssummer has joined #dri-devel

23:06 mattrope_ has joined #dri-devel

23:06 djbw_ has joined #dri-devel

23:06 jhli_ has quit []

23:06 jhli has joined #dri-devel

23:11 jewins has quit [Ping timeout: 480 seconds]

23:12 pzanoni has quit [Ping timeout: 480 seconds]

23:12 stuarts has quit [Ping timeout: 480 seconds]

23:13 djbw has quit [Ping timeout: 480 seconds]

23:13 mattrope has quit [Ping timeout: 480 seconds]

23:16 jssummers has joined #dri-devel

23:16 pzanoni has joined #dri-devel

23:16 mattrope has joined #dri-devel

23:16 djbw__ has joined #dri-devel

23:22 <karolherbst> ehhh..

23:22 <karolherbst> why is oneDNN checking if the platform name is "Intel" 🙃

23:22 jssummer has quit [Ping timeout: 480 seconds]

23:23 pzanoni` has quit [Ping timeout: 480 seconds]

23:23 mattrope_ has quit [Ping timeout: 480 seconds]

23:23 jewins1 has quit [Ping timeout: 480 seconds]

23:23 djbw_ has quit [Ping timeout: 480 seconds]

23:24 <karolherbst> they even have a vendor id check 🙃

23:25 <psykose> because it was made by intel

23:25 <karolherbst> you know that this won't stop me!

23:25 <karolherbst> (but rusticl not having all the required features will! 🙃)

23:26 <gfxstrand> What features are you missing? Intel subgroups you should be able to pretty much just turn on

23:26 <gfxstrand> Even on non-intel

23:26 <karolherbst> I don't know yet

23:26 <karolherbst> the stuff just faults randomly

23:27 TMM has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

23:28 <karolherbst> but yeah.. maybe I check if there are any tests for that extension and just support it

23:28 TMM has joined #dri-devel

23:28 <karolherbst> the README mentions subgroups and UVM

23:29 <karolherbst> the CTS has tests for cl_khr_subgroups

23:29 <karolherbst> but there is of course cl_intel_subgroups

23:29 <karolherbst> and I don't even know if that's upstream llvm

23:31 <karolherbst> at least nowadays those things are open source

23:31 <karolherbst> ahh yes

23:31 <karolherbst> another "is intel platform" check 🙃

23:32 alyssa has left #dri-devel [#dri-devel]

23:32 Azirino has joined #dri-devel

23:35 <karolherbst> uhhhhh

23:35 <karolherbst> :pain:

23:35 <karolherbst> they are doing unholy things

23:35 Haaninjo has quit [Quit: Ex-Chat]

23:36 <karolherbst> they fetch the binary of the compiled CL program and check if it's some ELF thing to do more cursed stuff

23:36 <karolherbst> welll....

23:36 <karolherbst> I guess that's a WONTFIX then

23:36 <gfxstrand> ugh

23:36 <gfxstrand> Am I surprised? No.

23:36 <karolherbst> the file path gave me the rest: "/home/kherbst/git/oneDNN/src/gpu/jit/ngen/ngen_elf.hpp"
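
An "is this program binary an ELF" test of the kind mentioned above usually comes down to comparing the first four bytes against the ELF magic. The sketch below is an illustration in Python with a hypothetical function name; it is not oneDNN's code, and how the binary is obtained (presumably through the OpenCL program-binary query) is an assumption.

```python
ELF_MAGIC = b"\x7fELF"  # the four identification bytes every ELF file starts with


def looks_like_elf(binary):
    """Return True if a program binary blob starts with the ELF magic."""
    return binary[:4] == ELF_MAGIC


print(looks_like_elf(b"\x7fELF\x02\x01\x01"))  # → True
print(looks_like_elf(b"BC\xc0\xde"))           # → False
```

A runtime that returns some other container for its program binaries would fail such a check regardless of which extensions it exposes.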

23:37 <karolherbst> anyway...

23:37 <karolherbst> they do seem to support other vendors via SYCL, but.. uhhh

23:37 <karolherbst> why...

23:39 <gfxstrand> It's a One* product

23:39 <gfxstrand> The open-source is a sham. It exists for vendor lock-in.

23:39 <karolherbst> I mean.. yes

23:49 <anholt> mupuf: are you intentionally rebooting the DUT after a GPU hang? I'm not getting any info on the test that hung, so I can't mark it a skip.

23:53 <zmike> I think he's still afk another week