#dri-devel on 2021-08-14 — irc logs at oftc.irclog.whitequark.org

2021-07-26 22:56 ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar

00:20 dongwonk has joined #dri-devel

00:38 jhli has quit [Read error: Connection reset by peer]

00:41 <graphitemaster> Oh my god the PCI-E bus is SLOW.

00:41 <HdkR> Compared to memory, yes

00:42 <graphitemaster> Mobile systems map system memory as uncached write-combining memory for the framebuffer and it's damn fast (9000 MiB/s!)

00:42 <graphitemaster> Run the same code on a desktop 20 MiB/s

00:42 <graphitemaster> I'm so salty

00:42 <graphitemaster> I didn't think it was that slow

00:42 <HdkR> Small accesses over PCIe really fuck up your day

00:43 <graphitemaster> I can round trip download a larger bit of data from Australia over my shitty internet connection over like 100 hops faster than I can shove some data to my GPU

00:43 <graphitemaster> Who do we fire for this fail

00:43 <bnieuwenhuizen> graphitemaster: https://www.basnieuwenhuizen.nl/making-reading-from-vram-less-catastrophic/ to make it faster

00:44 <alyssa> graphitemaster: Nvidia tries to sell chips, a lot of chips into the Android market...

00:44 <graphitemaster> I just read this lol, someone legit linked this to me on the graphics programming discord today...

00:44 <bnieuwenhuizen> there is a graphics programming discord?

00:44 <graphitemaster> Yeah

00:44 <HdkR> There's a discord for everything

00:45 <Sachiel> is there a discord for indexing all the other discords?

00:45 <HdkR> https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/pcie-burst-transfer-paper.pdf This is also a good read

00:45 <graphitemaster> If you can barely shove more than 20 MiB/s over PCI-E why even bother with resizable bar, you can't fill that 128 MiB of mapped BAR fast enough.

00:45 <graphitemaster> This is real sad

00:45 <bnieuwenhuizen> graphitemaster: reads only

00:45 <bnieuwenhuizen> writes are much faster

00:46 <graphitemaster> So if uncached framebuffer reads are so fast on mobile, why even have shadow framebuffer layer in DDX drivers

00:46 <bnieuwenhuizen> also note that even uncached system memory is an issue and that doesn't go over PCIe for CPU accesses

00:47 <graphitemaster> Anyways what I've been working on https://gist.github.com/graphitemaster/5917148fa6c6c2fc59d9e5a2256ae0dd

00:47 <graphitemaster> I went down a dumb rabbit hole

00:48 <graphitemaster> Been tweaking this crap like crazy

00:53 <HdkR> I wish x86 devices were better at saturating memory on a single core. ARM devices are /really/ good at it

00:54 <graphitemaster> I found the right combination of blood sacrifices to almost ready 20 GiB/s on my DDR4 Ryzen (ThreadRipper) machine. Go run it on a system with the exact same memory but a different CPU (newer Zen Ryzen) and it's slower by more than half.

00:54 <graphitemaster> s/ready/reach/

00:54 <graphitemaster> It's absolutely insane how fickle memory optimizations are on x86

00:55 <HdkR> Meanwhile Apple M1 has a peak of 68.25GB/s and a basic memcpy without optimizing can hit 90% of that

00:55 <graphitemaster> That's absolutely incredible.

00:55 <HdkR> It really is

00:55 <graphitemaster> The dumb aarch64 I'm testing can reach 30 GiB/s with a dumb `while (src != end) *dst++ = *src++;`

00:56 <bnieuwenhuizen> any idea what the bottleneck typically is?

00:56 <bnieuwenhuizen> load-store queues or is the memory model the main cause?

00:57 <HdkR> graphitemaster: Unroll that about 4x and free speed boost :P
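
For reference, the plain loop quoted above and the 4x unrolled variant HdkR suggests might look like the sketch below. The function names and the choice of `uint64_t` elements are assumptions for illustration and are not taken from the linked gist; whether unrolling actually helps depends on the compiler and CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The plain loop from the log: one element per iteration. */
void copy_simple(uint64_t *dst, const uint64_t *src, size_t n)
{
    assert(n == 0 || (dst && src));
    const uint64_t *end = src + n;
    while (src != end)
        *dst++ = *src++;
}

/* Hypothetical 4x unrolled variant: four elements per iteration,
 * followed by a tail loop for the remaining 0..3 elements. */
void copy_unrolled4(uint64_t *dst, const uint64_t *src, size_t n)
{
    assert(n == 0 || (dst && src));
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i + 0] = src[i + 0];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }
    for (; i < n; i++)
        dst[i] = src[i];
}
```

Both functions copy the same data; only the loop structure differs, so any speed difference has to be measured on the target machine.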

00:57 <HdkR> I don't think TSO is the cause for x86 being slow here. Since if you throw more cores at the problem then it scales pretty well

00:58 <graphitemaster> The cache system of x86 is basically designed to make all memory slow and all thread scaling non-linear

00:58 <bnieuwenhuizen> well, given that the streaming memcpy is significantly faster for VRAM reads I expect TSO is an issue for uncached reads

00:59 <graphitemaster> I mean it's just cache pollution basically.

01:00 <graphitemaster> Anything larger than L2 should probably _always_ use non-temporal / streaming copies.
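
A minimal sketch of what a non-temporal / streaming copy could look like with SSE2 intrinsics. The helper name `copy_streaming` is hypothetical, the sketch assumes the destination is 16-byte aligned for the streaming path, and it falls back to `memcpy` when SSE2 is unavailable or the destination is unaligned. It does not implement the "larger than L2" size check mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: copy n bytes using non-temporal stores where
 * possible, so the destination does not displace useful cache lines. */
void copy_streaming(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst && src));
#if defined(__SSE2__)
    if (((uintptr_t)dst & 15) == 0) {
        unsigned char *d = dst;
        const unsigned char *s = src;
        size_t i = 0;
        for (; i + 16 <= n; i += 16) {
            /* Unaligned load from src, streaming store to aligned dst. */
            __m128i v = _mm_loadu_si128((const __m128i *)(s + i));
            _mm_stream_si128((__m128i *)(d + i), v);
        }
        _mm_sfence();                /* order the streaming stores */
        memcpy(d + i, s + i, n - i); /* tail of 0..15 bytes */
        return;
    }
#endif
    memcpy(dst, src, n); /* fallback path */
}
```

A real implementation would likely pick between this path and a regular copy based on the copy size and the cache sizes of the machine, which this sketch leaves out.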

01:01 <HdkR> graphitemaster: Did you test these optimizations on a system that reports `Fast REP MOVSB`? Like Icelake+?

01:01 <graphitemaster> Not yet, no. I don't own any Intel's anymore

01:01 <graphitemaster> Other than an old iGPU laptop.

01:01 <HdkR> That was about half my system's memory bandwidth on my Ice Lake system

01:02 <graphitemaster> Wrote this ages ago https://github.com/graphitemaster/xcpumemperf

01:02 <graphitemaster> Funny how correlated it is for single core memory performance

01:03 <HdkR> hah, fun

01:09 <graphitemaster> It's honestly impressive how bad Ryzen x86 is at copying and filling memory, especially considering the ridiculous memory clock of high end DDR4.

01:10 <graphitemaster> We're being robbed.

01:11 <graphitemaster> I want the bandwidth of GPUs XD

01:11 <graphitemaster> The L2 and L3 caches on these things are so massive you could just put the whole working set of application in them at this point unless your application is a webbrowser.

01:12 <graphitemaster> Then just have really high latency memory like GPUs

01:12 <HdkR> Apparently Zen3 has fast rep movsb. Which I've not tried :P

01:14 <HdkR> Probably means Zen3 entirely is just better in this regard

01:14 <graphitemaster> My 1950x https://pastebin.com/raw/AiFpYc6L

01:15 <HdkR> Right, Zen1 wouldn't have this

01:16 <graphitemaster> It's about the same on both AMD based game consoles btw

01:16 <graphitemaster> movsb is 3000 MiB/s on the PS5

01:16 <graphitemaster> While SSE2 nontemporal is 10000 MiB/s

01:17 <graphitemaster> I guess those are not Zen3 either

01:18 <graphitemaster> Anyways, chasing faster memcpy performance on x86 is a fool's errand I think.

01:18 <graphitemaster> It's all over the place, less stable than NV's frame timings.

01:20 Emantor has quit [Quit: ZNC - http://znc.in]

01:20 Emantor has joined #dri-devel

01:23 <HdkR> We just need to switch over to ARM and leave x86 behind :)

01:33 <graphitemaster> I have a better idea, what if we just put an x86 instruction decoder on an ARM

01:33 <graphitemaster> I know they have different memory and cache systems but it's probably possible with some tweaking.

01:35 <graphitemaster> The hardware people will figure it out, they work magic.

01:35 <graphitemaster> I have no clue how :P

01:36 <HdkR> oof

01:41 YuGiOhJCJ has joined #dri-devel

01:42 <alyssa> I like arm

01:42 alyssa has left #dri-devel [#dri-devel]

01:43 <HdkR> Me too

01:55 jkrzyszt has quit [Ping timeout: 480 seconds]

02:03 pcercuei has quit [Quit: dodo]

02:18 boistordu has quit [Ping timeout: 480 seconds]

02:25 <graphitemaster> I like aarch64.

02:25 <graphitemaster> Absolutely hate all the ARM before that

02:26 <graphitemaster> neon, vfp, the fifteen abis, the thousands of configurations and extensions, the thumb / non-thumb mode

02:26 <graphitemaster> big.LITTLE, the fact cores can have different endianness

02:26 <HdkR> AArch64 is its best form, the rest were just the larvae stage

02:26 <graphitemaster> The fact endianness can be changed at runtime >_>

02:27 <HdkR> Oh hey, there is going to be a period of big.little where some cores support AArch32, some don't

02:27 <HdkR> So have fun with that

02:29 <graphitemaster> I remember a buddy told me of the hardest bug he ever had to track down on a big.LITTLE was one where the cores had different cache-line sizes

02:29 <HdkR> yep. Exynos fucked that up for everyone

02:38 <graphitemaster> I was already upset at the mess that was development for the WiiU, PPC console with a semi-programmable ARM9 CPU for the hand-held tablet and the fact that endianness was different on both, cache-line size was different on both, and Nintendo in their infinite dumb wisdom used ILP32 on the console (long long being 64-bit), and ran the handheld in THUMB (16-bit mode) which treats everything but pointers as 16-bit >_>

02:39 <graphitemaster> Very few devs did anything with that, just let the dumb thing stream video and go home.

02:41 <graphitemaster> I swear when companies used to build an ARM they must've had something like kconfig where one dev spent five hours tweaking options for their custom built SoC.

02:41 <graphitemaster> Glad to see some standard fabric from aarch64.

02:44 reductum has joined #dri-devel

03:12 Company has quit [Quit: Leaving]

03:46 camus has joined #dri-devel

03:46 heat has quit [Ping timeout: 480 seconds]

03:46 mbrost has quit [Ping timeout: 480 seconds]

04:17 mbrost has joined #dri-devel

04:27 yogesh_mohan has joined #dri-devel

04:28 Duke`` has joined #dri-devel

04:54 mattrope has quit [Read error: Connection reset by peer]

05:28 mlankhorst has joined #dri-devel

06:05 jernej has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]

06:06 jernej has joined #dri-devel

06:10 tzimmermann has joined #dri-devel

06:24 agd5f has quit [Ping timeout: 480 seconds]

06:35 mbrost has quit [Ping timeout: 480 seconds]

06:37 danvet has joined #dri-devel

06:46 rasterman has joined #dri-devel

07:54 vivijim has quit [Read error: Connection reset by peer]

08:15 dliviu has quit [Ping timeout: 480 seconds]

08:15 dongwonk has quit [Remote host closed the connection]

08:18 dliviu has joined #dri-devel

08:20 mlankhorst has quit [Ping timeout: 480 seconds]

08:24 rasterman has quit [Quit: Gettin' stinky!]

08:28 gouchi has joined #dri-devel

08:44 hch12907 has joined #dri-devel

08:50 JohnnyonFlame has quit [Ping timeout: 480 seconds]

08:51 thellstrom has joined #dri-devel

09:05 pekkari has joined #dri-devel

09:16 pcercuei has joined #dri-devel

09:23 jkrzyszt has joined #dri-devel

09:25 pekkari has quit [Quit: Konversation terminated!]

09:42 mlankhorst has joined #dri-devel

10:18 thellstrom has quit [Remote host closed the connection]

10:19 thellstrom has joined #dri-devel

10:22 tzimmermann has quit [Quit: Leaving]

10:25 hch12907_ has joined #dri-devel

10:28 hch12907 has quit [Ping timeout: 480 seconds]

10:33 flacks has quit [Quit: Quitter]

10:35 flacks has joined #dri-devel

10:38 Company has joined #dri-devel

10:39 iive has joined #dri-devel

11:11 YuGiOhJCJ has quit [Quit: YuGiOhJCJ]

11:23 <emersion> nice, compiling mesa now hangs my machine when compiling aco_print_ir.cpp

11:24 <HdkR> Nice, running out of ram?

11:34 <emersion> not sure, wasn't able to ssh in

11:35 <emersion> but that sounds likely, works okay (even if very slow) from a clean state

11:51 tobiasjakobi has joined #dri-devel

12:13 <haasn> Is there any way to find out what the "correct" drm fourcc code for a particular opengl image format is? For example, GL_RG8 maps to DRM_FORMAT_GR88 (not DRM_FORMAT_RG88) and I have no idea how I would have known that without looking at the driver source code

12:14 <bnieuwenhuizen> haasn: driver code is sadly the only way I know of ...

12:15 <bnieuwenhuizen> most of the diff seems to be GL/Vulkan specifying things in reversed order of fourcc

12:15 <bnieuwenhuizen> (e.g. mostly in memory order, while fourcc mostly seems to assume little endian or so)
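
The naming difference can be sketched as follows. The `FOURCC` macro uses the same packing as `fourcc_code()` in drm_fourcc.h and is reproduced here only so the example is self-contained; `SKETCH_FORMAT_GR88`, `store_gr88`, `store_gl_rg8` and `check_same_layout` are hypothetical names. The sketch assumes GR88 is a little-endian 16-bit word with G in bits 15:8 and R in bits 7:0.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same packing as fourcc_code() in drm_fourcc.h (copied for the sketch). */
#define FOURCC(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
     ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* Stand-in for DRM_FORMAT_GR88: [15:0] G:R 8:8, little endian. */
#define SKETCH_FORMAT_GR88 FOURCC('G', 'R', '8', '8')

/* DRM view: a 16-bit word named from the high bits down (G, then R),
 * stored little endian, so the low byte (R) comes first in memory. */
void store_gr88(uint8_t out[2], uint8_t r, uint8_t g)
{
    uint16_t word = (uint16_t)((g << 8) | r);
    out[0] = (uint8_t)(word & 0xff); /* R */
    out[1] = (uint8_t)(word >> 8);   /* G */
}

/* GL view: GL_RG8 names components in memory order, R then G. */
void store_gl_rg8(uint8_t out[2], uint8_t r, uint8_t g)
{
    out[0] = r;
    out[1] = g;
}

/* Both conventions describe the same two bytes in memory. */
void check_same_layout(uint8_t r, uint8_t g)
{
    uint8_t a[2], b[2];
    store_gr88(a, r, g);
    store_gl_rg8(b, r, g);
    assert(memcmp(a, b, sizeof a) == 0);
}
```

Under these assumptions the two names read in opposite orders but refer to the same byte layout, which is consistent with GL_RG8 mapping to DRM_FORMAT_GR88 rather than DRM_FORMAT_RG88.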

12:16 <haasn> I suppose I just have to eglExportDMABUFImageQueryMESA() and see what fourcc it gives me..

12:18 <haasn> I guess this whole mess is why vulkan decided to just abandon the concept of fourccs in its public API

12:20 <emersion> haasn: https://afrantzis.com/pixel-format-guide/

12:21 <emersion> if you want a vulkan table, maybe look at mesa's wsi

12:21 <haasn> I'm not sure what that's supposed to be telling me

12:21 <emersion> https://gitlab.freedesktop.org/mesa/mesa/-/blob/0092edfec0895533260a52c84cb4ce098f6b6bea/src/vulkan/wsi/wsi_common_wayland.c#L127

12:21 <haasn> As an example, how would I use this site to figure out that mesa uses GR88 instead of RG88?

12:22 <emersion> look at the GL page, then look at the DRM page

12:22 <haasn> also I wrote that code :p

12:23 <emersion> if you want GL, maybe https://github.com/swaywm/wlroots/blob/master/render/gles2/pixel_format.c

12:24 <haasn> I didn't realize GL formats had a specified memory order

12:25 <haasn> wait that sentence makes no sense, of course they have a specified memory order, otherwise how would you upload them

12:28 <haasn> anyway I suppose that makes sense, basically because DRM doesn't follow the usual convention of specifying things in memory order for non-packed formats, we need to use inverted format names for non-packed gl formats

12:29 <emersion> and check endianness for packed formats

12:44 <haasn> (fortunately my use case doesn't have the concept of packed formats)

12:44 <haasn> (nor does OpenGL, afaict)

12:44 <haasn> (or if it does, I don't support any on OpenGL yet)

12:50 <emersion> opengl does, if you don't use UNSIGNED_BYTE

12:58 <glennk> hmm, glext.h doesn't declare glDrawArraysInstancedARB for GL_ARB_instanced_arrays, which leads to piglit's dispatch code not resolving the function

13:02 <emersion> does it declare the PROC?

13:03 <glennk> no, it just declares the one for glVertexAttribDivisorARB

13:03 <emersion> rip

13:04 <glennk> it's a function that is aliased in at least 5 different extensions/core/es versions...

13:04 <emersion> time to send a patch to khronos then i guess

13:13 Surkow|laptop has quit [Remote host closed the connection]

13:25 Surkow|laptop has joined #dri-devel

13:53 <karolherbst> ehhh.... so maybe I misunderstood how interrupting ioctl works, but if I signal a process with SIGKILL and it's busy with e.g. calling dma_resv_wait_timeout with intr = true isn't the expectation that the ioctl would be canceled asap?

14:08 mbrost has joined #dri-devel

14:14 <ishitatsuyuki> kernel threads/operations don't always get interrupted, although it sounds strange that a polling operation doesn't get interrupted

14:14 <karolherbst> yeah.. my thoughts exactly

14:15 heat has joined #dri-devel

14:17 mbrost_ has joined #dri-devel

14:17 mbrost has quit [Read error: Connection reset by peer]

14:22 sravn has quit []

15:01 tobiasjakobi has quit [Remote host closed the connection]

15:07 gouchi has quit [Remote host closed the connection]

15:24 gouchi has joined #dri-devel

15:34 <karolherbst> is there a way to debu dma_reservation objects?

15:34 <karolherbst> *debug

15:34 <karolherbst> I want to know who reserved one without unreserving it

15:34 <karolherbst> or well..

15:35 <karolherbst> just figuring out where it was reserved would help

15:46 slattann has joined #dri-devel

15:52 V has quit [Ping timeout: 480 seconds]

16:09 gpoo has quit [Ping timeout: 480 seconds]

16:14 mbrost_ has quit [Ping timeout: 480 seconds]

16:18 gpoo has joined #dri-devel

16:26 mlankhorst has quit [Remote host closed the connection]

16:26 mlankhorst has joined #dri-devel

16:30 i-garrison has quit []

16:30 Surkow|laptop has quit [Ping timeout: 480 seconds]

16:32 i-garrison has joined #dri-devel

16:48 Surkow|laptop has joined #dri-devel

16:52 jkrzyszt has quit [Ping timeout: 480 seconds]

17:03 JohnnyonFlame has joined #dri-devel

17:08 V has joined #dri-devel

17:19 V has quit [Ping timeout: 480 seconds]

18:05 tobiasjakobi has joined #dri-devel

18:13 slattann has quit []

18:23 gouchi has quit [Remote host closed the connection]

18:29 V has joined #dri-devel

18:32 V has quit [Remote host closed the connection]

18:34 sdutt has joined #dri-devel

18:34 V has joined #dri-devel

18:49 jernej has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]

18:55 austriancoder has quit [Ping timeout: 480 seconds]

18:56 markyacoub has quit [Ping timeout: 480 seconds]

18:58 krh has quit [Ping timeout: 480 seconds]

18:59 SanchayanMaity has quit [Ping timeout: 480 seconds]

18:59 jessica_24 has quit [Ping timeout: 480 seconds]

18:59 arnd has quit [Ping timeout: 480 seconds]

18:59 markyacoub has joined #dri-devel

19:00 austriancoder has joined #dri-devel

19:00 krh has joined #dri-devel

19:01 arnd has joined #dri-devel

19:05 jernej has joined #dri-devel

19:05 sagar_ has quit [Remote host closed the connection]

19:07 jessica_24 has joined #dri-devel

19:07 SanchayanMaity has joined #dri-devel

19:11 sagar_ has joined #dri-devel

19:37 alyssa has joined #dri-devel

19:37 <alyssa> anongit.fd.o down?

19:41 Danct12 has quit [Quit: Quitting]

19:41 Danct12 has joined #dri-devel

19:46 sdutt has quit [Ping timeout: 480 seconds]

19:50 <ccr> Connecting to anongit.freedesktop.org (port 9418) ... 131.252.210.161 done.

19:50 <ccr> fatal: Could not read from remote repository.

19:51 <ccr> that was a problem a day ago already iirc

19:52 <alyssa> ccr: Pulling from kernel.org instead \shrug/

19:53 <ccr> I just switched to a https://gitlab.fd.o uri instead (for libdrm)

20:30 rasterman has joined #dri-devel

20:44 jkrzyszt has joined #dri-devel

20:47 sravn has joined #dri-devel

21:00 Daanct12 has joined #dri-devel

21:06 Danct12 has quit [Ping timeout: 480 seconds]

21:16 <glennk> huh, piglit fbo-drawbuffers2-blend tests ARB_draw_buffers_blend rather than EXT_draw_buffers2

21:16 gouchi has joined #dri-devel

21:17 gouchi has quit [Remote host closed the connection]

21:18 gouchi has joined #dri-devel

21:21 danvet has quit [Ping timeout: 480 seconds]

21:33 Duke`` has quit [Ping timeout: 480 seconds]

21:42 xexaxo has quit [Ping timeout: 480 seconds]

21:47 heat has quit [Remote host closed the connection]

21:52 thellstrom has quit [Ping timeout: 480 seconds]

22:15 pcercuei has quit [Quit: dodo]

22:24 alanc has quit [Remote host closed the connection]

22:24 alanc has joined #dri-devel

22:49 gouchi has quit [Remote host closed the connection]

22:52 kem has quit [Ping timeout: 480 seconds]

23:18 Lucretia has quit []

23:20 iive has quit []

23:40 jkrzyszt has quit [Ping timeout: 480 seconds]

23:42 mlankhorst has quit [Ping timeout: 480 seconds]

23:44 <alyssa> Is there a linux kernel helper for

23:44 <alyssa> writel(readl(reg) | arg, reg)

23:44 <alyssa> (i.e. set a bit for some config reg)
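
The log does not record an answer. As an illustration only, a driver-local helper for this pattern could be sketched as below. `sketch_readl`/`sketch_writel` are user-space stand-ins for the kernel's `readl()`/`writel()` so the example builds outside the kernel, and `reg_set_bits`/`reg_clear_bits` are hypothetical names. For devices that go through regmap, `regmap_update_bits()` covers the same read-modify-write pattern.

```c
#include <assert.h>
#include <stdint.h>

/* User-space stand-ins for readl()/writel(); in a driver these would be
 * the real MMIO accessors operating on an __iomem pointer. */
static inline uint32_t sketch_readl(const volatile uint32_t *reg)
{
    return *reg;
}

static inline void sketch_writel(uint32_t val, volatile uint32_t *reg)
{
    *reg = val;
}

/* Hypothetical helper: set the bits in mask, leave the others untouched. */
void reg_set_bits(volatile uint32_t *reg, uint32_t mask)
{
    assert(reg);
    sketch_writel(sketch_readl(reg) | mask, reg);
}

/* Hypothetical counterpart: clear the bits in mask. */
void reg_clear_bits(volatile uint32_t *reg, uint32_t mask)
{
    assert(reg);
    sketch_writel(sketch_readl(reg) & ~mask, reg);
}
```

The read-modify-write is not atomic, so callers that can race on the same register would still need their own locking.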

23:53 tobiasjakobi has quit [Remote host closed the connection]