ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
dongwonk has joined #dri-devel
jhli has quit [Read error: Connection reset by peer]
<graphitemaster> Oh my god the PCI-E bus is SLOW.
<HdkR> Compared to memory, yes
<graphitemaster> Mobile systems map system memory as uncached write-combining memory for the framebuffer and it's damn fast (9000 MiB/s!)
<graphitemaster> Run the same code on a desktop: 20 MiB/s
<graphitemaster> I'm so salty
<graphitemaster> I didn't think it was that slow
<HdkR> Small accesses over PCIe really fuck up your day
<graphitemaster> I can round trip download a larger bit of data from Australia over my shitty internet connection over like 100 hops faster than I can shove some data to my GPU
<graphitemaster> Who do we fire for this fail
<bnieuwenhuizen> graphitemaster: https://www.basnieuwenhuizen.nl/making-reading-from-vram-less-catastrophic/ to make it faster
<alyssa> graphitemaster: Nvidia tries to sell chips, a lot of chips into the Android market...
<graphitemaster> I just read this lol, someone legit linked this to me on the graphics programming discord today...
<bnieuwenhuizen> there is a graphics programming discord?
<graphitemaster> Yeah
<HdkR> There's a discord for everything
<Sachiel> is there a discord for indexing all the other discords?
<graphitemaster> If you can barely shove more than 20 MiB/s over PCI-E, why even bother with resizable BAR? You can't fill that 128 MiB of mapped BAR fast enough.
<graphitemaster> This is real sad
<bnieuwenhuizen> graphitemaster: reads only
<bnieuwenhuizen> writes are much faster
<graphitemaster> So if uncached framebuffer reads are so fast on mobile, why even have a shadow framebuffer layer in DDX drivers
<bnieuwenhuizen> also note that even uncached system memory is an issue and that doesn't go over PCIe for CPU accesses
<graphitemaster> Anyways what I've been working on https://gist.github.com/graphitemaster/5917148fa6c6c2fc59d9e5a2256ae0dd
<graphitemaster> I went down a dumb rabbit hole
<graphitemaster> Been tweaking this crap like crazy
<HdkR> I wish x86 devices were better at saturating memory on a single core. ARM devices are /really/ good at it
<graphitemaster> I found the right combination of blood sacrifices to almost ready 20 GiB/s on my DDR4 Ryzen (ThreadRipper) machine. Go run it on a system with the exact same memory but a different CPU (newer Zen Ryzen) and it's slower by more than half.
<graphitemaster> s/ready/reach/
<graphitemaster> It's absolutely insane how fickle memory optimizations are on x86
<HdkR> Meanwhile Apple M1 has a peak of 68.25GB/s and a basic memcpy without optimizing can hit 90% of that
<graphitemaster> That's absolutely incredible.
<HdkR> It really is
<graphitemaster> The dumb aarch64 I'm testing can reach 30 GiB/s with a dumb `while (src != end) *dst++ = *src++;`
<bnieuwenhuizen> any idea what the bottleneck typically is?
<bnieuwenhuizen> load-store queues or is the memory model the main cause?
<HdkR> graphitemaster: Unroll that about 4x and free speed boost :P
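A minimal C sketch of the suggestion to unroll the naive copy loop about 4x. The uint64_t element type and the function names are illustrative assumptions; a modern compiler may already unroll or vectorize the naive loop on its own, so any speedup should be measured rather than assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* The naive element-by-element copy quoted in the log. */
static void copy_naive(uint64_t *dst, const uint64_t *src, size_t n)
{
    const uint64_t *end = src + n;
    while (src != end)
        *dst++ = *src++;
}

/* The same copy unrolled 4x: four independent loads, then four stores per
 * iteration, followed by a scalar tail for the remaining 0-3 elements. */
static void copy_unrolled4(uint64_t *restrict dst,
                           const uint64_t *restrict src, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        uint64_t a = src[i + 0];
        uint64_t b = src[i + 1];
        uint64_t c = src[i + 2];
        uint64_t d = src[i + 3];
        dst[i + 0] = a;
        dst[i + 1] = b;
        dst[i + 2] = c;
        dst[i + 3] = d;
    }
    for (; i < n; i++)
        dst[i] = src[i];
}
```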
<HdkR> I don't think TSO is the cause for x86 being slow here. Since if you throw more cores at the problem then it scales pretty well
<graphitemaster> The cache system of x86 is basically designed to make all memory slow and all thread scaling non-linear
<bnieuwenhuizen> well, given that the streaming memcpy is significantly faster for VRAM reads I expect TSO is an issue for uncached reads
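A sketch of the "streaming memcpy" idea for reads from VRAM, assuming an x86 CPU with SSE4.1. MOVNTDQA (`_mm_stream_load_si128`) is the instruction intended for fast reads from write-combining memory; on ordinary cacheable memory it behaves like a normal load. This is not the code from the linked blog post: the helper names are hypothetical, and the 16-byte source alignment requirement and memcpy fallback are assumptions of this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__) || defined(__i386__)
#include <smmintrin.h> /* SSE4.1: _mm_stream_load_si128 (MOVNTDQA) */

/* Copy using 16-byte streaming loads; src must be 16-byte aligned.
 * The target attribute lets this compile without -msse4.1 globally. */
__attribute__((target("sse4.1")))
static void copy_stream_load_sse41(void *dst, const void *src, size_t len)
{
    const __m128i *s = (const __m128i *)src;
    uint8_t *d = (uint8_t *)dst;
    size_t blocks = len / 16;
    for (size_t i = 0; i < blocks; i++) {
        __m128i v = _mm_stream_load_si128((__m128i *)(s + i));
        _mm_storeu_si128((__m128i *)(d + 16 * i), v);
    }
    /* Copy the trailing 0-15 bytes with ordinary loads. */
    memcpy(d + 16 * blocks, (const uint8_t *)src + 16 * blocks, len % 16);
}
#endif

/* Hypothetical entry point for reading from a WC mapping: use streaming
 * loads when the CPU has SSE4.1 and src is aligned, else plain memcpy. */
static void copy_from_wc(void *dst, const void *src, size_t len)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("sse4.1") && ((uintptr_t)src % 16) == 0) {
        copy_stream_load_sse41(dst, src, len);
        return;
    }
#endif
    memcpy(dst, src, len);
}
```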
<graphitemaster> I mean it's just cache pollution basically.
<graphitemaster> Anything larger than L2 should probably _always_ use non-temporal / streaming copies.
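A sketch of the "non-temporal copies above L2 size" rule of thumb, assuming x86 with SSE2. `_mm_stream_si128` writes around the cache, and `_mm_sfence` orders those weakly ordered stores before the function returns. The L2 size is passed in by the caller (this sketch does not detect it), and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2: _mm_stream_si128 (MOVNTDQ), _mm_sfence */
#endif

/* Copy with non-temporal stores so a large destination does not evict the
 * working set. The streaming path needs a 16-byte aligned dst. */
static void copy_nt_store(void *dst, const void *src, size_t len)
{
#if defined(__SSE2__)
    if (((uintptr_t)dst % 16) == 0) {
        uint8_t *d = dst;
        const uint8_t *s = src;
        size_t blocks = len / 16;
        for (size_t i = 0; i < blocks; i++) {
            __m128i v = _mm_loadu_si128((const __m128i *)(s + 16 * i));
            _mm_stream_si128((__m128i *)(d + 16 * i), v);
        }
        /* Non-temporal stores are weakly ordered; fence before others
         * may observe the data. */
        _mm_sfence();
        memcpy(d + 16 * blocks, s + 16 * blocks, len % 16);
        return;
    }
#endif
    memcpy(dst, src, len);
}

/* Hypothetical policy: stream only when the copy exceeds the L2 size. */
static void copy_large(void *dst, const void *src, size_t len, size_t l2_bytes)
{
    if (len > l2_bytes)
        copy_nt_store(dst, src, len);
    else
        memcpy(dst, src, len);
}
```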
<HdkR> graphitemaster: Did you test these optimizations on a system that reports `Fast REP MOVSB`? Like Icelake+?
<graphitemaster> Not yet, no. I don't own any Intel's anymore
<graphitemaster> Other than an old iGPU laptop.
<HdkR> That was about half my system's memory bandwidth on my Ice Lake system
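A sketch of how to check for the feature being discussed, assuming x86 and GCC or Clang's `<cpuid.h>`. Intel's name for the Ice Lake feature is "Fast Short REP MOV" (FSRM, CPUID leaf 7, EDX bit 4); the older "Enhanced REP MOVSB/STOSB" (ERMS) is EBX bit 9. On Linux the same bits appear as the `fsrm` and `erms` flags in /proc/cpuinfo. The `rep movsb` copy uses GCC-style inline assembly; other architectures fall back to memcpy. Struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

struct rep_movsb_caps {
    bool erms; /* CPUID.(EAX=7,ECX=0):EBX bit 9 */
    bool fsrm; /* CPUID.(EAX=7,ECX=0):EDX bit 4 */
};

static struct rep_movsb_caps query_rep_movsb_caps(void)
{
    struct rep_movsb_caps caps = { false, false };
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid_count returns 0 if leaf 7 is not supported. */
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        caps.erms = (ebx >> 9) & 1;
        caps.fsrm = (edx >> 4) & 1;
    }
#endif
    return caps;
}

/* Byte copy via REP MOVSB (RDI = dst, RSI = src, RCX = count). The ABI
 * guarantees the direction flag is clear on function entry. */
static void copy_rep_movsb(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    memcpy(dst, src, n);
#endif
}
```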
<graphitemaster> Wrote this ages ago https://github.com/graphitemaster/xcpumemperf
<graphitemaster> Funny how correlated it is for single core memory performance
<HdkR> hah, fun
<graphitemaster> It's honestly impressive how bad Ryzen x86 is at copying and filling memory, especially considering the ridiculous memory clock of high end DDR4.
<graphitemaster> We're being robbed.
<graphitemaster> I want the bandwidth of GPUs XD
<graphitemaster> The L2 and L3 caches on these things are so massive you could just put the whole working set of an application in them at this point, unless your application is a web browser.
<graphitemaster> Then just have really high latency memory like GPUs
<HdkR> Apparently Zen3 has fast rep movsb. Which I've not tried :P
<HdkR> Probably means Zen3 entirely is just better in this regard
<graphitemaster> My 1950x https://pastebin.com/raw/AiFpYc6L
<HdkR> Right, Zen1 wouldn't have this
<graphitemaster> It's about the same on both AMD based game consoles btw
<graphitemaster> movsb is 3000 MiB/s on the PS5
<graphitemaster> While SSE2 nontemporal is 10000 MiB/s
<graphitemaster> I guess those are not Zen3 either
<graphitemaster> Anyways, chasing faster memcpy performance on x86 is a fool's errand I think.
<graphitemaster> It's all over the place, less stable than NV's frame timings.
Emantor has quit [Quit: ZNC - http://znc.in]
Emantor has joined #dri-devel
<HdkR> We just need to switch over to ARM and leave x86 behind :)
<graphitemaster> I have a better idea, what if we just put an x86 instruction decoder on an ARM
<graphitemaster> I know they have different memory and cache systems but it's probably possible with some tweaking.
<graphitemaster> The hardware people will figure it out, they work magic.
<graphitemaster> I have no clue how :P
<HdkR> oof
YuGiOhJCJ has joined #dri-devel
<alyssa> I like arm
alyssa has left #dri-devel [#dri-devel]
<HdkR> Me too
jkrzyszt has quit [Ping timeout: 480 seconds]
pcercuei has quit [Quit: dodo]
boistordu has quit [Ping timeout: 480 seconds]
<graphitemaster> I like aarch64.
<graphitemaster> Absolutely hate all the ARM before that
<graphitemaster> neon, vfp, the fifteen abis, the thousands of configurations and extensions, the thumb / non-thumb mode
<graphitemaster> big.LITTLE, the fact cores can have different endianness
<HdkR> AArch64 is its best form, the rest were just the larvae stage
<graphitemaster> The fact endianness can be changed at runtime >_>
<HdkR> Oh hey, there is going to be a period of big.little where some cores support AArch32, some don't
<HdkR> So have fun with that
<graphitemaster> I remember a buddy telling me that the hardest bug he ever had to track down on a big.LITTLE was one where the cores had different cache-line sizes
<HdkR> yep. Exynos fucked that up for everyone
<graphitemaster> I was already upset at the mess that was development for the WiiU, a PPC console with a semi-programmable ARM9 CPU for the hand-held tablet, and the fact that endianness was different on both, cache-line size was different on both, and Nintendo in their infinite dumb wisdom used ILP32 on the console (long long being 64-bit), and ran the handheld in THUMB (16-bit mode) which treats everything but pointers as 16-bit >_>
<graphitemaster> Very few devs did anything with that, just let the dumb thing stream video and go home.
<graphitemaster> I swear when companies used to build an ARM they must've had something like kconfig where one dev spent five hours tweaking options for their custom built SoC.
<graphitemaster> Glad to see some standard fabric from aarch64.
reductum has joined #dri-devel
Company has quit [Quit: Leaving]
camus has joined #dri-devel
heat has quit [Ping timeout: 480 seconds]
mbrost has quit [Ping timeout: 480 seconds]
mbrost has joined #dri-devel
yogesh_mohan has joined #dri-devel
Duke`` has joined #dri-devel
mattrope has quit [Read error: Connection reset by peer]
mlankhorst has joined #dri-devel
jernej has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]
jernej has joined #dri-devel
tzimmermann has joined #dri-devel
agd5f has quit [Ping timeout: 480 seconds]
mbrost has quit [Ping timeout: 480 seconds]
danvet has joined #dri-devel
rasterman has joined #dri-devel
vivijim has quit [Read error: Connection reset by peer]
dliviu has quit [Ping timeout: 480 seconds]
dongwonk has quit [Remote host closed the connection]
dliviu has joined #dri-devel
mlankhorst has quit [Ping timeout: 480 seconds]
rasterman has quit [Quit: Gettin' stinky!]
gouchi has joined #dri-devel
hch12907 has joined #dri-devel
JohnnyonFlame has quit [Ping timeout: 480 seconds]
thellstrom has joined #dri-devel
pekkari has joined #dri-devel
pcercuei has joined #dri-devel
jkrzyszt has joined #dri-devel
pekkari has quit [Quit: Konversation terminated!]
mlankhorst has joined #dri-devel
thellstrom has quit [Remote host closed the connection]
thellstrom has joined #dri-devel
tzimmermann has quit [Quit: Leaving]
hch12907_ has joined #dri-devel
hch12907 has quit [Ping timeout: 480 seconds]
flacks has quit [Quit: Quitter]
flacks has joined #dri-devel
Company has joined #dri-devel
iive has joined #dri-devel
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
<emersion> nice, compiling mesa now hangs my machine when compiling aco_print_ir.cpp
<HdkR> Nice, running out of ram?
<emersion> not sure, wasn't able to ssh in
<emersion> but that sounds likely, works okay (even if very slow) from a clean state
tobiasjakobi has joined #dri-devel
<haasn> Is there any way to find out what the "correct" drm fourcc code for a particular opengl image format is? For example, GL_RG8 maps to DRM_FORMAT_GR88 (not DRM_FORMAT_RG88) and I have no idea how I would have known that without looking at the driver source code
<bnieuwenhuizen> haasn: driver code is sadly the only way I know of ...
<bnieuwenhuizen> most of the diff seems to be GL/Vulkan specifying things in reversed order of fourcc
<bnieuwenhuizen> (e.g. mostly in memory order, while fourcc mostly seems to assume little endian or so)
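A small C illustration of memory order vs. fourcc order, assuming 8-bit-per-channel, non-packed formats. drm_fourcc.h describes each format as a little-endian word with components listed from the most significant bits down, so DRM_FORMAT_GR88 ("[15:0] G:R 8:8 little endian") keeps R in the low byte, which lands first in memory. That matches GL_RG8, whose name gives memory order (R, then G). The macro mirrors the kernel's `fourcc_code()`; the other names are hypothetical.

```c
#include <stdint.h>

/* Same construction as fourcc_code() in drm_fourcc.h: four ASCII
 * characters packed little-endian into a 32-bit code. */
#define FOURCC(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
     ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

#define FMT_GR88 FOURCC('G', 'R', '8', '8') /* [15:0] G:R 8:8 little endian */
#define FMT_RG88 FOURCC('R', 'G', '8', '8') /* [15:0] R:G 8:8 little endian */

/* Store one GR88 pixel as drm_fourcc.h describes it: a 16-bit word with
 * G in bits 15:8 and R in bits 7:0, written low byte first. The bytes
 * come out as R, G, the same memory order as GL_RG8. */
static void store_gr88(uint8_t out[2], uint8_t r, uint8_t g)
{
    uint16_t word = (uint16_t)((g << 8) | r);
    out[0] = (uint8_t)(word & 0xff); /* low byte first (little endian) */
    out[1] = (uint8_t)(word >> 8);
}
```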
<haasn> I suppose I just have to eglExportDMABUFImageQueryMESA() and see what fourcc it gives me..
<haasn> I guess this whole mess is why vulkan decided to just abandon the concept of fourccs in its public API
<emersion> if you want a vulkan table, maybe look at mesa's wsi
<haasn> I'm not sure what that's supposed to be telling me
<haasn> As an example, how would I use this site to figure out that mesa uses GR88 instead of RG88?
<emersion> look at the GL page, then look at the DRM page
<haasn> also I wrote that code :p
<haasn> I didn't realize GL formats had a specified memory order
<haasn> wait that sentence makes no sense, of course they have a specified memory order, otherwise how would you upload them
<haasn> anyway I suppose that makes sense, basically because DRM doesn't follow the usual convention of specifying things in memory order for non-packed formats, we need to use inverted format names for non-packed gl formats
<emersion> and check endianness for packed formats
<haasn> (fortunately my use case doesn't have the concept of packed formats)
<haasn> (nor does OpenGL, afaict)
<haasn> (or if it does, I don't support any on OpenGL yet)
<emersion> opengl does, if you don't use UNSIGNED_BYTE
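For packed formats the endianness check matters. A sketch assuming `GL_RGB` with `GL_UNSIGNED_SHORT_5_6_5`, which GL reads as one 16-bit value in the host's native byte order (R in bits 15:11), and DRM_FORMAT_RGB565, which uses the same bit layout but is always little endian. The two byte sequences therefore agree only on little-endian hosts. Function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* GL_UNSIGNED_SHORT_5_6_5 with GL_RGB: R in bits 15:11, G in 10:5,
 * B in 4:0 of a single 16-bit value. */
static uint16_t pack_565(uint8_t r5, uint8_t g6, uint8_t b5)
{
    return (uint16_t)(((r5 & 0x1f) << 11) | ((g6 & 0x3f) << 5) | (b5 & 0x1f));
}

/* Bytes as GL would read them from client memory: native byte order. */
static void gl_bytes_565(uint8_t out[2], uint16_t px)
{
    memcpy(out, &px, sizeof px);
}

/* Bytes as DRM_FORMAT_RGB565 defines them: same bits, little endian. */
static void drm_bytes_rgb565(uint8_t out[2], uint16_t px)
{
    out[0] = (uint8_t)(px & 0xff);
    out[1] = (uint8_t)(px >> 8);
}

/* Returns 1 on little-endian hosts, where the two layouts coincide. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```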
<glennk> hmm, glext.h doesn't declare glDrawArraysInstancedARB for GL_ARB_instanced_arrays, which leads to piglits dispatch code not resolving the function
<emersion> does it declare the PROC?
<glennk> no, it just declares the one for glVertexAttribDivisorARB
<emersion> rip
<glennk> it's a function that is aliased in at least 5 different extensions/core/es versions...
<emersion> time to send a patch to khronos then i guess
Surkow|laptop has quit [Remote host closed the connection]
Surkow|laptop has joined #dri-devel
<karolherbst> ehhh.... so maybe I misunderstood how interrupting an ioctl works, but if I signal a process with SIGKILL and it's busy with e.g. calling dma_resv_wait_timeout with intr = true, isn't the expectation that the ioctl would be canceled asap?
mbrost has joined #dri-devel
<ishitatsuyuki> kernel threads/operations don't always get interrupted, although it sounds strange that a polling operation doesn't get interrupted
<karolherbst> yeah.. my thoughts exactly
heat has joined #dri-devel
mbrost_ has joined #dri-devel
mbrost has quit [Read error: Connection reset by peer]
sravn has quit []
tobiasjakobi has quit [Remote host closed the connection]
gouchi has quit [Remote host closed the connection]
gouchi has joined #dri-devel
<karolherbst> is there a way to debu dma_reservation objects?
<karolherbst> *debug
<karolherbst> I want to know who reserved one without unreserving it
<karolherbst> or well..
<karolherbst> just figuring out where it was reserved would help
slattann has joined #dri-devel
V has quit [Ping timeout: 480 seconds]
gpoo has quit [Ping timeout: 480 seconds]
mbrost_ has quit [Ping timeout: 480 seconds]
gpoo has joined #dri-devel
mlankhorst has quit [Remote host closed the connection]
mlankhorst has joined #dri-devel
i-garrison has quit []
Surkow|laptop has quit [Ping timeout: 480 seconds]
i-garrison has joined #dri-devel
Surkow|laptop has joined #dri-devel
jkrzyszt has quit [Ping timeout: 480 seconds]
JohnnyonFlame has joined #dri-devel
V has joined #dri-devel
V has quit [Ping timeout: 480 seconds]
tobiasjakobi has joined #dri-devel
slattann has quit []
gouchi has quit [Remote host closed the connection]
V has joined #dri-devel
V has quit [Remote host closed the connection]
sdutt has joined #dri-devel
V has joined #dri-devel
jernej has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]
austriancoder has quit [Ping timeout: 480 seconds]
markyacoub has quit [Ping timeout: 480 seconds]
krh has quit [Ping timeout: 480 seconds]
SanchayanMaity has quit [Ping timeout: 480 seconds]
jessica_24 has quit [Ping timeout: 480 seconds]
arnd has quit [Ping timeout: 480 seconds]
markyacoub has joined #dri-devel
austriancoder has joined #dri-devel
krh has joined #dri-devel
arnd has joined #dri-devel
jernej has joined #dri-devel
sagar_ has quit [Remote host closed the connection]
jessica_24 has joined #dri-devel
SanchayanMaity has joined #dri-devel
sagar_ has joined #dri-devel
alyssa has joined #dri-devel
<alyssa> anongit.fd.o down?
Danct12 has quit [Quit: Quitting]
Danct12 has joined #dri-devel
sdutt has quit [Ping timeout: 480 seconds]
<ccr> Connecting to anongit.freedesktop.org (port 9418) ... 131.252.210.161 done.
<ccr> fatal: Could not read from remote repository.
<ccr> that was a problem a day ago already iirc
<alyssa> ccr: Pulling from kernel.org instead \shrug/
<ccr> I just switched to a https://gitlab.fd.o uri instead (for libdrm)
rasterman has joined #dri-devel
jkrzyszt has joined #dri-devel
sravn has joined #dri-devel
Daanct12 has joined #dri-devel
Danct12 has quit [Ping timeout: 480 seconds]
<glennk> huh, piglit fbo-drawbuffers2-blend tests ARB_draw_buffers_blend rather than EXT_draw_buffers2
gouchi has joined #dri-devel
gouchi has quit [Remote host closed the connection]
gouchi has joined #dri-devel
danvet has quit [Ping timeout: 480 seconds]
Duke`` has quit [Ping timeout: 480 seconds]
xexaxo has quit [Ping timeout: 480 seconds]
heat has quit [Remote host closed the connection]
thellstrom has quit [Ping timeout: 480 seconds]
pcercuei has quit [Quit: dodo]
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
gouchi has quit [Remote host closed the connection]
kem has quit [Ping timeout: 480 seconds]
Lucretia has quit []
iive has quit []
jkrzyszt has quit [Ping timeout: 480 seconds]
mlankhorst has quit [Ping timeout: 480 seconds]
<alyssa> Is there a linux kernel helper for
<alyssa> writel(readl(reg) | arg, reg)
<alyssa> (i.e. set a bit for some config reg)
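A userspace sketch of the read-modify-write pattern being asked about, using hypothetical stand-ins for `readl()`/`writel()` on a volatile pointer (the kernel versions take an `__iomem` address and are not shown here). The sequence is not atomic, so concurrent updates to the same register need a lock; for registers accessed through regmap, `regmap_update_bits()` already wraps this pattern.

```c
#include <stdint.h>

/* Hypothetical userspace stand-ins for readl()/writel(): plain 32-bit
 * accesses through a volatile pointer. */
static inline uint32_t mmio_read32(const volatile uint32_t *reg)
{
    return *reg;
}

static inline void mmio_write32(volatile uint32_t *reg, uint32_t val)
{
    *reg = val;
}

/* writel(readl(reg) | mask, reg): set the bits in mask.
 * Not atomic; serialize concurrent writers to the same register. */
static inline void mmio_set_bits32(volatile uint32_t *reg, uint32_t mask)
{
    mmio_write32(reg, mmio_read32(reg) | mask);
}

/* The matching clear: writel(readl(reg) & ~mask, reg). */
static inline void mmio_clear_bits32(volatile uint32_t *reg, uint32_t mask)
{
    mmio_write32(reg, mmio_read32(reg) & ~mask);
}
```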
tobiasjakobi has quit [Remote host closed the connection]