#dri-devel on 2022-12-12 — irc logs at oftc.irclog.whitequark.org

2022-08-14 19:45 ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar

00:00 dliviu has quit [Ping timeout: 480 seconds]

00:08 dliviu has joined #dri-devel

00:11 jewins has joined #dri-devel

00:22 danvet has quit [Ping timeout: 480 seconds]

00:23 pcercuei has quit [Quit: dodo]

00:30 mhenning has joined #dri-devel

00:36 orbea1 has joined #dri-devel

00:37 orbea has quit [Ping timeout: 480 seconds]

00:38 orbea1 has quit []

00:38 <mareko> what defines __POPCNT__? I see in Mesa, but I don't see it defined

00:38 orbea has joined #dri-devel

00:40 <HdkR> mareko: Compiler

00:41 <HdkR> GCC defines it if the x86 host supports popcnt, so sse4.2 or abm

00:42 <mareko> it's not defined and I compile on zen2

00:43 <HdkR> march=native doesn't hit it? Guess it needs -mpopcnt explicitly which is an...oversight?

00:43 <mareko> it's not supported on all x86_64 CPUs

00:44 <mareko> or some of them rather

00:44 <airlied> yeah it needs sse4

00:45 <HdkR> Right, not all x86-64 CPUs support SSE4.2 or ABM :P

00:46 <airlied> SSE4a I think it is :-)

00:46 <HdkR> woof

00:47 <HdkR> ARM finally introduced a real popcount instruction in their 2022 ISA extension

00:47 <HdkR> prior to that you need to do GPR->FPR move, do popcount in vector instruction, then move back

00:49 <mareko> __builtin_popcount emulates it

00:50 <airlied> yeah the builtin should wrap it

00:51 <HdkR> https://godbolt.org/z/46jG4fM48 Indeed, it'll do the silly dance

00:53 <HdkR> -march=armv9.4-a will finally switch it to a single instruction op

00:54 <mareko> HdkR: how to know whether __builtin_popcount doesn't emulate it? __POPCNT__?

00:55 <mareko> oh yeah

00:55 <HdkR> That's only defined on x86. ARM land you don't get any indication

01:01 <HdkR> Actually, now I'm surprised there isn't a CSSC extension define for this

01:04 <HdkR> Looks like they forgot to add `__ARM_FEATURE_CSSC`
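
For reference, the check discussed above can be sketched in C. This is a minimal sketch, assuming GCC or Clang; `HAVE_HW_POPCNT` and `bitcount32` are illustrative names, not Mesa's. On x86, `__POPCNT__` is only predefined when the target flags enable POPCNT (for example `-mpopcnt`), so it indicates whether `__builtin_popcount` is expected to compile to the instruction rather than an emulation. As HdkR notes, the macro says nothing on ARM.

```c
#include <stdint.h>

/* 1 when the compiler was told the x86 target has POPCNT, else 0.
 * Illustrative macro name; on non-x86 targets this is always 0. */
#if defined(__POPCNT__)
#define HAVE_HW_POPCNT 1
#else
#define HAVE_HW_POPCNT 0
#endif

/* Count set bits in a 32-bit value. */
static inline unsigned bitcount32(uint32_t v)
{
#if defined(__GNUC__) || defined(__clang__)
   /* The builtin always works; without hardware support the
    * compiler emits a software sequence instead. */
   return (unsigned)__builtin_popcount(v);
#else
   /* Portable SWAR fallback for other compilers. */
   v = v - ((v >> 1) & 0x55555555u);
   v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);
   v = (v + (v >> 4)) & 0x0f0f0f0fu;
   return (unsigned)((v * 0x01010101u) >> 24);
#endif
}
```

Either path returns the same count; only the generated code differs.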

01:18 co1umbarius has joined #dri-devel

01:20 columbarius has quit [Ping timeout: 480 seconds]

01:24 YuGiOhJCJ has joined #dri-devel

01:42 OftenTimeConsuming has quit [Remote host closed the connection]

01:43 OftenTimeConsuming has joined #dri-devel

02:04 <kisak> wow, that's mean ... in the pursuit of trying to get my build environment compatible with llvm-spirv-15, I updated spirv-tools to 2022.3-1, but surprise, the newer version doesn't provide libSPIRV-Tools.so which is used by glslangValidator. Do I just rebuild glslang for it not to use the missing shared library?

02:05 <kisak> (glslangValidator is called by the mesa build)

02:08 <airlied> kisak: spirv tools doesn't always build shared by default, some distro packages have hacks

02:08 <airlied> also some hacks stop working from time to time

02:09 <kisak> this case is that the runtime dependency package kicked the .so out from under glslang, for https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=956510

02:13 <kisak> I think the right answer here is to just rebuild and test

02:16 YuGiOhJCJ has quit [Remote host closed the connection]

02:17 YuGiOhJCJ has joined #dri-devel

02:28 Company has quit [Quit: Leaving]

02:29 jewins has quit [Ping timeout: 480 seconds]

02:36 camus has joined #dri-devel

02:38 camus has quit [Read error: Connection reset by peer]

02:38 camus has joined #dri-devel

02:40 Danct12 has joined #dri-devel

02:42 Danct12 has quit []

02:43 Danct12 has joined #dri-devel

02:52 <kisak> A no change rebuild fixed the snafu.

02:58 egbert has quit [Remote host closed the connection]

03:01 <kisak> a day later of banging my head against the wall, an intel-clc enabled bionic build completed. Too bad I don't have any hardware to test if it actually works.

03:02 jkrzyszt has quit [Ping timeout: 480 seconds]

03:02 <anarsoul> hey folks, I'm trying to debug https://gitlab.freedesktop.org/mesa/mesa/-/issues/7862

03:04 <anarsoul> mpv CPU usage jumps x4 if it uses GL_ARB_buffer_storage

03:05 <anarsoul> perf says that most time is spent in decoding. Do I assume correctly that returning a pointer to write-combined BO from transfer_map() may not be a good idea?

03:08 <airlied> anarsoul: yes generally not a great plan

03:09 <anarsoul> how is it handled in other drivers?

03:09 egbert has joined #dri-devel

03:11 <airlied> anarsoul: copy to a staging cached one?

03:13 <anarsoul> how is it supposed to work for resources mapped with persistent | coherent?

03:14 Kayden has quit [Quit: Leaving]

03:14 Kayden has joined #dri-devel

03:19 <airlied> anarsoul: you are meant to have coherent cpu/gpu access to the resource

03:19 <airlied> on x86 that typically means the gpu is snooping on the cpu caches

03:22 <anarsoul> I mean when do I synchronize the copy if it's mapped as persistent and coherent, if coherent doesn't require memory_barrier() to be called?

03:24 <airlied> yeah you can't do a copy for coherent

03:25 <airlied> coherent mappings might be one of the reasons GLES doesn't have GL_ARB_buffer_storage afaik

03:25 Danct12 has quit [Quit: Quitting]

03:25 Danct12 has joined #dri-devel

03:27 <anarsoul> I guess the easiest workaround would be hiding PIPE_CAP_BUFFER_MAP_PERSISTENT_COHERENT on lima behind a debug flag

03:42 maxzor_ has quit [Ping timeout: 480 seconds]

03:43 YuGiOhJCJ has quit [Remote host closed the connection]

03:44 YuGiOhJCJ has joined #dri-devel

03:57 ran has joined #dri-devel

04:08 ran has quit [Remote host closed the connection]

04:10 lemonzest has joined #dri-devel

04:15 <Lynne> airlied: ping, could you look at the ref frame problem? I'm still working on encoding

04:15 <airlied> Lynne: been staring at it, but in the sense of void, not in a useful sense

04:16 <Lynne> stare a bit into encoding if that clears your mind

04:16 <Lynne> on a related note, the encoding api has some issues that have to be addressed before it's stable

04:17 <airlied> the video I'm looking at doesn't even decode its first frame properly

04:17 <Lynne> make sure B-frames are missing

04:17 <airlied> that nvidia fix looks very wrong though

04:17 <Lynne> no AFT = frames should be printed

04:17 <Lynne> ideally, the first frame should not have any references at all

04:17 <airlied> oh I thought we had B-frames :)

04:17 <Lynne> most hardware encoders generate that

04:18 <DavidHeidelberg[m]> <anarsoul> "I guess the easiest workaround..." <- Having different codepaths on release and debug sounds bad. What about enabling it with a driver-specific variable?

04:18 <Lynne> nope, let's tackle those after we get P-frames working, though they should just work once we do that

04:19 <Lynne> anyway, with encoding, the intention behind VK_VIDEO_ENCODE_RATE_CONTROL_MODE_NONE_BIT_KHR seems to be to let users implement their own rate control (as well they should), and letting them set quantizers

04:19 <Lynne> but there's no way for users to override the frame type the encoder uses

04:19 <Lynne> apart from doing a full RESET, but that's a nuclear option

04:20 <Lynne> if VK_VIDEO_ENCODE_RATE_CONTROL_MODE_NONE_BIT_KHR, there ought to be an enum in the same struct that can override the frame type used

04:26 <airlied> Lynne: still seems like slot index is wrong

04:30 <Lynne> I copied what nvdec does, since it's the closest API-wise to vulkan

04:38 <Lynne> if it's any consolation, av1 should be even simpler than h264, though it will require some invasive spec changes

04:39 <Lynne> (av1 requires you to be able to feed packets into an encoder without being able to output a frame)

04:41 aravind has joined #dri-devel

04:45 rsalvaterra_ has joined #dri-devel

04:45 rsalvaterra is now known as Guest1853

04:45 rsalvaterra_ is now known as rsalvaterra

04:49 Guest1854 has quit [Ping timeout: 480 seconds]

04:51 YuGiOhJCJ has quit [Remote host closed the connection]

04:51 YuGiOhJCJ has joined #dri-devel

05:00 YuGiOhJCJ has quit [Remote host closed the connection]

05:00 YuGiOhJCJ has joined #dri-devel

05:17 heat has quit [Ping timeout: 480 seconds]

05:18 bmodem has joined #dri-devel

05:28 SanchayanMaity_ has quit []

05:28 SanchayanMaity has joined #dri-devel

05:43 srslypascal is now known as Guest1859

05:43 srslypascal has joined #dri-devel

05:50 Guest1859 has quit [Ping timeout: 480 seconds]

05:54 Duke`` has joined #dri-devel

05:55 Jeremy_Rand_Talos__ has quit [Remote host closed the connection]

05:56 Jeremy_Rand_Talos__ has joined #dri-devel

06:03 mhenning has quit [Quit: mhenning]

06:16 itoral has joined #dri-devel

06:23 fab has joined #dri-devel

06:28 Duke`` has quit [Ping timeout: 480 seconds]

06:32 bgs has joined #dri-devel

06:50 aravind has quit [Ping timeout: 480 seconds]

07:00 YuGiOhJCJ has quit [Remote host closed the connection]

07:01 YuGiOhJCJ has joined #dri-devel

07:17 fab has quit [Quit: fab]

07:25 gouchi has joined #dri-devel

07:25 gouchi has quit [Remote host closed the connection]

07:28 <airlied> Lynne: my branch + https://paste.centos.org/view/raw/7af69b8c seems to get it working here

07:28 alanc has quit [Remote host closed the connection]

07:29 alanc has joined #dri-devel

07:30 <airlied> or at least seems to blow up a bit later for me

07:31 Jeremy_Rand_Talos_ has joined #dri-devel

07:32 Jeremy_Rand_Talos__ has quit [Remote host closed the connection]

07:33 <airlied> Lynne: good to know if it helps on the nvidia side

07:37 bgs has quit [Remote host closed the connection]

07:39 <airlied> Lynne: but still not sure I've wrapped my head around how it's meant to work yet!

07:49 rasterman has joined #dri-devel

07:58 frieder has joined #dri-devel

07:59 tursulin has joined #dri-devel

08:03 fab has joined #dri-devel

08:05 jfalempe has joined #dri-devel

08:05 Jeremy_Rand_Talos_ has quit [Read error: Connection reset by peer]

08:05 Jeremy_Rand_Talos_ has joined #dri-devel

08:07 danvet has joined #dri-devel

08:11 tzimmermann has joined #dri-devel

08:21 OftenTimeConsuming has quit [Remote host closed the connection]

08:23 OftenTimeConsuming has joined #dri-devel

08:33 narmstrong_ has quit []

08:33 narmstrong has joined #dri-devel

08:36 sgruszka has joined #dri-devel

08:50 sarahwalker has joined #dri-devel

08:52 swalker_ has joined #dri-devel

08:53 swalker_ is now known as Guest1869

08:55 dcz_ has joined #dri-devel

08:58 sarahwalker has quit [Ping timeout: 480 seconds]

09:00 lynxeye has joined #dri-devel

09:01 camus has quit [Remote host closed the connection]

09:01 camus has joined #dri-devel

09:06 vliaskov has joined #dri-devel

09:07 jkrzyszt has joined #dri-devel

09:17 ra has joined #dri-devel

09:17 ra has quit []

09:20 anon_apple has joined #dri-devel

09:25 mvlad has joined #dri-devel

09:30 anon_apple has quit []

09:33 apinheiro has joined #dri-devel

09:39 kts has joined #dri-devel

09:47 ahajda_ has joined #dri-devel

09:50 pcercuei has joined #dri-devel

09:57 fab has quit [Read error: Connection reset by peer]

09:57 fab_ has joined #dri-devel

09:57 fab_ is now known as Guest1875

10:10 repetitivestrain has quit [Read error: Connection reset by peer]

10:10 MajorBiscuit has joined #dri-devel

10:11 camus has quit [Remote host closed the connection]

10:12 camus has joined #dri-devel

10:17 heat has joined #dri-devel

10:25 kts has quit [Quit: Leaving]

10:36 jkrzyszt has quit [Remote host closed the connection]

10:42 devilhorns has joined #dri-devel

10:43 jkrzyszt has joined #dri-devel

10:57 tintou has joined #dri-devel

10:57 <tintou> Is there anything happening to the GitLab runners? My MR seems to be stuck https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20277

10:58 camus1 has joined #dri-devel

11:04 <daniels> tintou: most of them are fine, but Windows is currently stuck behind some long-running GStreamer jobs

11:05 camus has quit [Ping timeout: 480 seconds]

11:09 maxzor_ has joined #dri-devel

11:11 maxzor__ has joined #dri-devel

11:18 maxzor_ has quit [Ping timeout: 480 seconds]

11:38 Haaninjo has joined #dri-devel

12:00 YuGiOhJCJ has quit [Quit: YuGiOhJCJ]

12:06 jkrzyszt has quit [Remote host closed the connection]

12:07 devilhorns has quit [Remote host closed the connection]

12:08 Company has joined #dri-devel

12:13 devilhorns has joined #dri-devel

12:15 heat has quit [Read error: No route to host]

12:15 heat has joined #dri-devel

12:15 jkrzyszt has joined #dri-devel

12:18 junaid has joined #dri-devel

12:24 junaid has quit [Remote host closed the connection]

12:32 heat has quit [Remote host closed the connection]

12:33 heat has joined #dri-devel

12:37 djbw has quit [Read error: Connection reset by peer]

12:37 jkrzyszt has quit [Remote host closed the connection]

12:42 Danct12 has quit [Remote host closed the connection]

13:04 YuGiOhJCJ has joined #dri-devel

13:06 aravind has joined #dri-devel

13:12 itoral has quit [Remote host closed the connection]

13:12 jkrzyszt has joined #dri-devel

13:23 Kayden has quit [Quit: Leaving]

13:23 Kayden has joined #dri-devel

13:49 Akari has joined #dri-devel

13:52 <jenatali> daniels: I don't suppose you've come up with any way to un-prioritize those jobs?

13:54 fxkamd has quit []

13:59 fxkamd has joined #dri-devel

14:28 <daniels> jenatali: GitLab doesn't give us a way :( we just need to keep leaning on them to try to optimise them somehow

14:28 <daniels> or get another machine

14:29 <jenatali> :(

14:36 YuGiOhJCJ has quit [Remote host closed the connection]

14:36 YuGiOhJCJ has joined #dri-devel

14:41 maxzor__ has quit [Ping timeout: 480 seconds]

14:42 jagan_ has joined #dri-devel

14:47 pekkari has joined #dri-devel

14:49 heat has quit [Remote host closed the connection]

14:49 heat has joined #dri-devel

14:51 jkrzyszt has quit [Remote host closed the connection]

15:01 jkrzyszt has joined #dri-devel

15:01 jewins has joined #dri-devel

15:04 fxkamd has quit []

15:04 fxkamd has joined #dri-devel

15:08 YuGiOhJCJ has quit [Quit: YuGiOhJCJ]

15:10 cwabbott_ has quit []

15:10 maxzor__ has joined #dri-devel

15:11 cwabbott has joined #dri-devel

15:12 dcz_ has quit [Ping timeout: 480 seconds]

15:13 <cwabbott> do any WSI people know if I'd need to define a modifier for subsampled images on Qualcomm?

15:14 <cwabbott> it's an already-existing concept in VK_EXT_fragment_density_map, which is a normal image combined with some metadata telling the user to sample at reduced resolution in some regions

15:15 Guest1875 has quit []

15:16 <cwabbott> on qualcomm the implementation is pretty much entirely by the driver, so the format of the metadata is up to the driver

15:17 <cwabbott> the usecase for sharing would mainly be passing a subsampled image to a VR compositor to do barrel distortion

15:19 aravind has quit [Ping timeout: 480 seconds]

15:21 <cwabbott> it seems like things would already Just Work if producer and consumer both pretend it's a normal image but pass the subsampled bit when creating it, how we implement the metadata never changes, etc.

15:27 bgs has joined #dri-devel

15:28 kts has joined #dri-devel

15:30 kts has quit []

15:32 Leopold_ has quit [Remote host closed the connection]

15:32 kts has joined #dri-devel

15:33 Leopold_ has joined #dri-devel

15:41 sgruszka has quit [Ping timeout: 480 seconds]

15:46 JohnnyonFlame has joined #dri-devel

15:52 mbrost has joined #dri-devel

15:53 heat_ has joined #dri-devel

15:53 heat has quit [Read error: Connection reset by peer]

16:04 jkrzyszt has quit [Remote host closed the connection]

16:05 <jekstrand> cwabbott: Uh... Maybe?

16:05 <jekstrand> cwabbott: Depends on how much auto-magic you expect there to be.

16:05 <jekstrand> cwabbott: Is it going to be re-imported as a fragment density image? Or are you expecting it to be magic metadata that gets used by the sampler?

16:07 psykose has quit [Remote host closed the connection]

16:08 psykose has joined #dri-devel

16:08 <cwabbott> jekstrand: it would be re-imported and used as a subsampled image in the compositor

16:09 <jekstrand> cwabbott: If it's just a matter of sharing an R8 image that's going to be used as a fragment density map on both sides, you don't need anything special, I don't think, unless it has some special tiling or something like that.

16:10 <jekstrand> If it's being handled as a metadata plane attached to a shared color image, then you probably need a modifier for it.

16:10 <cwabbott> a fragment density map isn't anything special

16:10 <cwabbott> a subsampled image has a separate metadata plane (sort-of) with the density to sample at

16:10 <cwabbott> that's derived from but not the same as the fragment density map

16:10 jkrzyszt has joined #dri-devel

16:12 Duke`` has joined #dri-devel

16:12 <cwabbott> the FDM is just a normal 2-channel image that gets added as an extra attachment to the render pass

16:13 tzimmermann has quit [Quit: Leaving]

16:14 <jekstrand> Yeah, that sounds like a metadata plane which needs a modifier

16:18 fab has joined #dri-devel

16:23 maxzor__ has quit [Ping timeout: 480 seconds]

16:24 <emersion> how does it interact with other existing Qualcomm modifiers?

16:26 <emersion> a buffer can only have a single modifier

16:26 <cwabbott> that wouldn't be so bad I guess

16:26 <cwabbott> emersion: there's only one, for UBWC, so I guess we'd have to duplicate them

16:26 <emersion> a common strategy is to reserve bits for each use

16:26 <emersion> but you can just duplicate and come up with a bit layout later

16:26 <emersion> just don't paint yourself in a corner, it's good to think about future uses and extensibility

16:27 <jekstrand> It's also really easy to over-think it like the NV modifiers which have bits reserved for mipmapping as if that'll ever happen.
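
The "reserve bits for each use" strategy emersion mentions could look roughly like this. All names and values are made up for illustration (`EX_VENDOR_ID`, the width of the layout field, the subsampled flag); the only borrowed convention is keeping a vendor ID in the top 8 bits of the 64-bit modifier, as `drm_fourcc.h` does. It is not an actual or proposed Qualcomm modifier layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of a 64-bit modifier:
 *   bits 56-63: vendor ID
 *   bit  8:     "subsampled" flag
 *   bits 0-7:   tiling/compression layout
 * Bits 9-55 stay reserved (zero) for future uses. */
#define EX_VENDOR_SHIFT    56
#define EX_VENDOR_ID       UINT64_C(0x42)      /* made-up vendor */
#define EX_LAYOUT_MASK     UINT64_C(0xff)
#define EX_FLAG_SUBSAMPLED (UINT64_C(1) << 8)

/* Build a modifier from a layout value and the subsampled flag. */
static uint64_t ex_modifier(uint64_t layout, int subsampled)
{
   assert((layout & ~EX_LAYOUT_MASK) == 0);
   uint64_t mod = (EX_VENDOR_ID << EX_VENDOR_SHIFT) | layout;
   if (subsampled)
      mod |= EX_FLAG_SUBSAMPLED;
   return mod;
}

/* Extract the layout field. */
static uint64_t ex_layout(uint64_t mod)
{
   return mod & EX_LAYOUT_MASK;
}

/* Test the subsampled flag. */
static int ex_is_subsampled(uint64_t mod)
{
   return (mod & EX_FLAG_SUBSAMPLED) != 0;
}
```

With separate fields, the subsampled variant of each existing layout gets a distinct modifier without enumerating every combination by hand, which is the alternative to duplicating each modifier that cwabbott mentions.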

16:27 <flto> cwabbott: comparing UUIDs and creating the image with identical parameters is enough, you don't need a modifier (unless your use-case actually involves passing the images through WSI, which doesn't seem like a good idea)

16:28 <cwabbott> flto: I was asking the right way to do it, not whether it would work

16:29 <cwabbott> of course that would work but comparing UUIDs etc. is not expected usage for the dma-buf extensions

16:29 <emersion> the compositor and its clients might be running different versions of the drivers

16:30 ybogdano has joined #dri-devel

16:30 jkrzyszt has quit [Remote host closed the connection]

16:30 <emersion> cwabbott: is this Qualcomm-specific or not?

16:30 <cwabbott> emersion: iirc ARM also does subsampled images but I think their implementation is a bit different

16:31 <emersion> VK_EXT_fragment_density_map makes it sound like it's not, but i know nothing about this stuff

16:31 <cwabbott> ARM and Qualcomm are the two that implement VK_EXT_fragment_density_map

16:32 <emersion> hm, and VK_EXT_fragment_density_map is only about the shader invocation i suppose?

16:32 <emersion> not the buffer itself?

16:32 <cwabbott> it defines how to create a subsampled image, too

16:33 <cwabbott> I'm not sure if anyone thought about passing it between processes

16:33 <flto> cwabbott: it is not a "wrong" way to use dma bufs either (VK_EXT_external_memory_dma_buf can exist without VK_EXT_image_drm_format_modifier)

16:34 <cwabbott> flto: you can't bind an image without VK_EXT_image_drm_format_modifier

16:34 <cwabbott> so no, it is a wrong way

16:35 <cwabbott> on linux, you cannot get guarantees about image layout matching across processes without modifiers

16:35 camus1 has quit [Remote host closed the connection]

16:36 camus has joined #dri-devel

16:37 junaid has joined #dri-devel

16:38 <cwabbott> on ARM, I think subsampled images are like mipmapped images and there's still a metadata but it tells which "mip" to use

16:39 <cwabbott> not sure if it literally is mipmapped or not

16:39 jessica_248 is now known as jessica_24

16:42 <flto> cwabbott: dma bufs are just a type of external memory.. I don't think there's an exception that you can't use dma bufs for images like other external memory types

16:44 camus has quit [Ping timeout: 480 seconds]

16:46 <cwabbott> flto: I don't think it provides any guarantees that it'll work either, and in practice it won't be accepted upstream because no one does that

16:46 <cwabbott> the intended way of handling this is modifiers

16:47 <flto> cwabbott: there are guarantees that external memory can be used for images with matching UUIDs

16:50 lynxeye has quit [Quit: Leaving.]

16:51 cheako_ has quit []

16:52 cheako has joined #dri-devel

16:58 frieder has quit [Remote host closed the connection]

17:02 Leopold_ has quit [Ping timeout: 480 seconds]

17:03 alyssa has joined #dri-devel

17:04 <alyssa> Is it legal to call bind_sampler_states multiple times to update subranges?

17:04 <alyssa> Gallium docs say no:

17:04 Leopold_ has joined #dri-devel

17:04 jkrzyszt has joined #dri-devel

17:04 <alyssa> sampler states are bound... with the ``bind_sampler_states`` function. The ``start`` and ``num_samplers`` parameters indicate a range of samplers to change. NOTE: at this time, start is always zero and the CSO module will always replace all samplers at once (no sub-ranges). This may change in the future.

17:04 <alyssa> But Nine does it anyway.

17:05 <alyssa> Are the docs out of date? or is nine buggy?

17:05 <alyssa> softpipe/llvmpipe explicitly handle the subrange case

17:05 <alyssa> freedreno implicitly does

17:05 <alyssa> zink/v3d do not handle the case

17:06 <alyssa> Either we need to update the docs and then update n-3 drivers to handle subranges

17:06 <alyssa> or we need to update nine

17:06 djbw has joined #dri-devel

17:07 <alyssa> mareko: DavidHeidelberg[m]: robclark: Kayden: ^

17:08 mbrost has quit [Ping timeout: 480 seconds]

17:08 <alyssa> q4a[m]: sent a patch to make panfrost honour subranges like softpipe, which fixes nine rendering in a game

17:09 <alyssa> but merging the patch on its own seems surely wrong -- either docs + other drivers also need to be patched, or nine does.

17:09 <robclark> alyssa: `unsigned p = i + start`

17:10 <alyssa> robclark: The issue is what happens to existing samplers [start + num, ...)

17:10 <alyssa> are they implicitly unbound?

17:10 q4a has joined #dri-devel

17:10 vhebert has left #dri-devel [#dri-devel]

17:10 <alyssa> if not, the driver has to be a lot more careful when calculating num_samplers

17:11 <robclark> they are unchanged

17:12 <robclark> I'm pretty sure this topic comes up every couple of years ;-)

17:12 <alyssa> okay.. so it is zink and v3d and panfrost and etnaviv and the docs that are wrong?

17:14 <robclark> I guess the part about it "always zero" is wrong in the docs, if nine uses it

17:14 <alyssa> seemingly
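
The semantics robclark describes (a subrange bind replaces only slots `[start, start + count)` and leaves every other slot bound) can be modelled as below. This is a simplified sketch, not Gallium code: `struct sampler_table`, `MAX_SAMPLERS` and `bind_sampler_range` are hypothetical names, and real drivers track more state.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SAMPLERS 16

/* Hypothetical per-stage sampler state, not a Mesa type. */
struct sampler_table {
   void *samplers[MAX_SAMPLERS];
   unsigned num_samplers;   /* index of highest bound slot + 1 */
};

/* Replace slots [start, start + count); a NULL array unbinds them.
 * Slots outside the range are left untouched. */
static void bind_sampler_range(struct sampler_table *t, unsigned start,
                               unsigned count, void **states)
{
   assert(start + count <= MAX_SAMPLERS);

   for (unsigned i = 0; i < count; i++) {
      unsigned p = i + start;
      t->samplers[p] = states ? states[i] : NULL;
   }

   /* Samplers above the range may still be bound, so the count
    * cannot simply be start + count; rescan from the top. */
   unsigned n = MAX_SAMPLERS;
   while (n > 0 && t->samplers[n - 1] == NULL)
      n--;
   t->num_samplers = n;
}
```

The rescan is the extra care alyssa refers to: a driver that sets the count to `start + count` on every call would silently drop samplers that are still bound above the updated range.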

17:20 <anarsoul> DavidHeidelberg[m]: on the other hand, we already have MESA_EXTENSION_OVERRIDE

17:21 <jenatali> alyssa: Relevant: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/gallium/frontends/d3d10umd/Shader.cpp#L250

17:21 tursulin has quit [Ping timeout: 480 seconds]

17:21 <jenatali> Seems it was a change at some point

17:21 mbrost has joined #dri-devel

17:21 <anarsoul> alyssa: out of curiosity, have you tested mpv on panfrost with PIPE_CAP_BUFFER_MAP_PERSISTENT_COHERENT exposed?

17:21 flibit has quit []

17:21 flibitijibibo has joined #dri-devel

17:22 <anarsoul> I expect it to have the same issue as lima on platforms where it uses write-combined BOs

17:30 JohnnyonFlame has quit [Read error: No route to host]

17:31 Guest1869 has quit [Remote host closed the connection]

17:31 devilhorns has quit []

17:35 <alyssa> anarsoul: I don't think I have, no

17:35 <alyssa> What's the issue on lima?

17:36 <alyssa> jenatali: I see

17:37 <anarsoul> alyssa: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7862

17:38 junaid has quit [Ping timeout: 480 seconds]

17:38 <alyssa> ughhh

17:38 abhinav__3 is now known as abhinav__

17:38 <alyssa> not sure what the driver is supposed to do

17:38 <alyssa> they asked for a direct map of a persistent coherent buffer, we gave them one

17:39 ybogdano has quit [Ping timeout: 480 seconds]

17:39 <anarsoul> yeah, now they are decoding the video directly into the buffer

17:40 <alyssa> again I don't see how this is Mesa's fault

17:40 <anarsoul> the spec doesn't say anything on whether the buffer is expected to be cached or not

17:41 <alyssa> i guess if we're at a stalemate with mpv, we can driconf it..

17:41 <alyssa> frustrating though.

17:42 <anarsoul> I wonder if mpv devs would consider adding a command line (and config) option for that

17:43 <anholt_> one should expect the buffer to be uncached. it would be very rare for a persistent coherent mapped buffer to be cached. sounds like a busted app.

17:43 junaid has joined #dri-devel

17:44 <anarsoul> anholt_: it's mpv

17:44 kts has quit [Quit: Leaving]

17:45 MajorBiscuit has quit [Ping timeout: 480 seconds]

17:47 ybogdano has joined #dri-devel

17:47 <anarsoul> I believe if they used a staging buffer for decoder and memcpy-ed their decoded image into the buffer it would be faster than doing glTexSubImage()
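
anarsoul's suggestion can be sketched as follows: decode into ordinary cached memory, where the decoder's reads of its own output are cheap, then copy once into the mapped buffer with sequential writes, which is the access pattern write-combining favours. This is a minimal sketch under assumptions: `mapped` stands in for a pointer returned by a persistent mapping, and `decode_fn`, `toy_decode` and `upload_via_staging` are hypothetical names, not mpv or Mesa APIs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical decoder callback: fills dst with size bytes. */
typedef void (*decode_fn)(uint8_t *dst, size_t size, void *ctx);

/* Toy decoder that reads back what it just wrote, as real decoders
 * do for prediction. Such reads are slow on write-combined memory. */
static void toy_decode(uint8_t *dst, size_t size, void *ctx)
{
   (void)ctx;
   if (size == 0)
      return;
   dst[0] = 1;
   for (size_t i = 1; i < size; i++)
      dst[i] = (uint8_t)(dst[i - 1] + 1);
}

/* Decode into cached staging memory, then do one linear copy into
 * the mapped destination. Returns 0 on success, -1 on failure. */
static int upload_via_staging(uint8_t *mapped, size_t size,
                              decode_fn decode, void *ctx)
{
   if (size == 0)
      return 0;

   uint8_t *staging = malloc(size);
   if (!staging)
      return -1;

   decode(staging, size, ctx);      /* reads and writes hit CPU cache */
   memcpy(mapped, staging, size);   /* write-only, sequential */
   free(staging);
   return 0;
}
```

A real player would keep the staging allocation around between frames instead of calling `malloc` each time; that detail is left out here.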

17:51 bmodem has quit [Ping timeout: 480 seconds]

18:01 Akari has quit [Quit: segmentation fault (core dumped)]

18:01 junaid has quit [Ping timeout: 480 seconds]

18:23 junaid has joined #dri-devel

18:26 lemonzest has quit [Quit: WeeChat 3.6]

18:35 dcz_ has joined #dri-devel

18:44 <airlied> anholt_: huh? pretty sure they are cached on x86

18:45 <airlied> esp with pcie gpus

18:50 <jekstrand> Yeah, typically it'd be WC if you get an actual VRAM map and cached if it's in system RAM.

18:52 <jekstrand> On Mali, I'd hope you at least can do WC for staging stuff but IDK how the memory works there.

18:52 <jekstrand> On Intel, we had it really nice with shared LLC.

18:52 <alyssa> jekstrand: panfrost and presumably lima map everything WC

18:52 <alyssa> everything

18:53 <alyssa> the downstream Arm driver has more caching/coherency flags available but that was never ported upstream

18:53 <alyssa> and I'm not even sure what we would do with them in either GL or VK

18:56 <jekstrand> Yeah, IDK either, TBH.

18:57 <jekstrand> Maybe you can do something better than WC for CPU reads but if your app is CPU-read-of-GPU-mem-bound, you've got problems.

18:57 * jekstrand chuckles in vkCmdDrawIndexed()

18:58 <airlied> alyssa: there is no gpu snoops the cpu cache?

19:00 <HdkR> You're lucky to have that in ARM land :P

19:02 <graphitemaster> Is it a safe assumption that gl_LocalInvocationID / 32 (or 64) is the same as gl_SubgroupID? Like do most implementations that execute subgroups in lock-step allocate them in terms of full warps/waves?

19:02 <alyssa> airlied: "full system coherency" -- Mali supports that architecturally but not all(?) mali SoCs do

19:03 <alyssa> and even with full system coherency, I think WC slows down reads and helps streaming writes

19:03 <jekstrand> graphitemaster: Safe? Not necessarily. But with the right Vulkan version/extensions it is.

19:04 <graphitemaster> jekstrand, This is more for a fallback for when a GPU does not advertise the KHR extension and only the ARB ballot one.

19:04 <jekstrand> Yeah, no, nothing's safe there.

19:04 <alyssa> jekstrand: we don't joke about indexed draws on bifrost

19:05 pekkari has quit [Quit: Konversation terminated!]

19:05 <graphitemaster> Okay, maybe not safe, but on a scale of like 0 it won't work to 10 it will always work, what are we looking at here?

19:05 <jekstrand> alyssa: No, we just joke about all of bifrost. :P

19:05 <alyssa> graphitemaster: 7?

19:06 <alyssa> IDK

19:06 <graphitemaster> Hey that's pretty good!

19:06 <jekstrand> graphitemaster: I was going to go 9 but that 1 will bite you.

19:06 <graphitemaster> So far you two have given me a range of 70 to 90% success rate, I can take those chances.

19:06 <alyssa> jekstrand: uh oh

19:07 <alyssa> jekstrand: also, I'm pretty sure Arm just kinda retconned Bifrost out of existence

19:07 <jekstrand> graphitemaster: I'll give you a 98% chance someone will file a bug report, though. :)

19:07 <alyssa> Utgard -> Midgard -> Valhall is the official progression (-:

19:09 <airlied> alyssa: so you pick full coherent for coherent maps and wc for others id assume

19:11 <alyssa> oh, that's fair

19:11 <alyssa> still requires UAPI changes which i am not excited for

19:11 <alyssa> although I think these are properties of the BO, not the transfer..

19:13 <graphitemaster> jekstrand, The number of GPUs that don't support other vendor or the KHR extension already reduces the chances of even hitting a ARB_ballot only fallback, so Bayesian says 0% :P

19:14 <jekstrand> graphitemaster: Or it makes the bug report that much harder to reproduce. :P

19:14 <jekstrand> If there's that few of them, why have the fallback at all?

19:16 <agd5f> airlied, anholt_ PCI spec says device access should be cache coherent with the CPU. Non-coherent access is an optional feature which the platform can provide at its discretion.

19:17 <anholt_> agd5f: device access of main mem, right?

19:17 mbrost has quit [Ping timeout: 480 seconds]

19:19 <anholt_> but not cpu accesses of vram

19:19 flto has quit [Ping timeout: 480 seconds]

19:19 <anholt_> so, mapping some gpu buffer and assuming you'll get cached performance is super wrong.

19:19 <agd5f> anholt_, yes, device access to CPU memory

19:20 flibitijibibo has quit [Quit: Leaving]

19:22 <agd5f> I.e., pci devices should snoop the CPU caches by default

19:23 <alyssa> anholt_: i assume mpv has never seen a device that uses the GPU but not the VPU, gets WC persistent mappings from the GPU, and has a CPU slow enough to notice the problem

19:23 <alyssa> without all 4 of those, you don't get this

19:24 epoll has quit [Ping timeout: 480 seconds]

19:24 <anholt_> "video playback on linux just eats a ton of cpu" is unfortunately normal, it's true.

19:24 <alyssa> perf was fine before by accident, I guess ... wiring up ARB_buffer_storage to lima led to jenga

19:26 <airlied> yeah i question wiring up buffer storage on non coherent platforms

19:27 <alyssa> brrrrr

19:27 <alyssa> maybe I should just stop writing code

19:27 <alyssa> that way I can't introduce any more regressions

19:32 <agd5f> in practice, it seems like x86 and PPC are the only platforms with PCIe that seem to get this right. IIRC ARM requires some optional IP that no one uses and most others just assume non-coherent

19:33 epoll has joined #dri-devel

19:33 <alyssa> yeah most Arm hw is PCIe-deficient

19:34 <agd5f> ARM is actually probably 50/50. Depends on the SoC

19:34 <alyssa> fair

19:38 <DemiMarie> To what extent will i915 VFs be able to communicate directly with the GuC?

19:40 <airlied> don't think that is the design yet

19:40 <DemiMarie> That is good. I was worried that VFs could submit commands via the GuC and the GuC would parse them.

19:41 <DemiMarie> Even if those commands are restricted to unprivileged ones, there is still the worry of memory unsafety.

19:41 <DemiMarie> On the other hand, if the VFs can only talk to hardware, I am less worried.

19:42 <alyssa> ew, GuC :p

19:43 <airlied> actually not sure with SRIOV what the design is

19:43 junaid has quit [Ping timeout: 480 seconds]

19:44 <DemiMarie> Why did Intel start requiring the GuC anyway?

19:45 <alyssa> yeah what the GuC is up with that

19:46 <airlied> because their execlist driver was vendored beyond maintainable

19:49 mbrost has joined #dri-devel

19:49 <airlied> and their windows driver was moving that direction

19:51 <danvet> DemiMarie, vf submits to guc like pf

19:51 <danvet> like that's why this thing exists pretty much

19:55 <anarsoul> alyssa: perf was suboptimal even without ARB_buffer_storage

19:55 junaid has joined #dri-devel

19:57 <alyssa> anarsoul: apparently i wasn't even the one to hook up that CAP for lima, this one ain't on me :-D

19:57 <anarsoul> alyssa: yeah, that was me :)

19:57 <alyssa> :-D

19:59 <anarsoul> yet I'm not sure if it's better to disable it, driconf it for mpv or keep it as is

19:59 <alyssa> best is to patch mpv i suppose

19:59 <alyssa> driconf second best

20:00 <anarsoul> I'm hesitant to touch mpv code

20:01 <anarsoul> do we happen to have any mpv maintainers here? :)

20:03 <psykose> could open an issue on mpv they all read it

20:04 <DemiMarie> airlied: “vendored beyond maintainable”?

20:04 <DemiMarie> danvet: do you mean that VFs are why the GuC is present? Ouch.

20:05 <danvet> not entirely, but the vf design entirely relies on guc

20:05 <DemiMarie> marmarek: how worried are you about GuC firmware vulns?

20:05 <danvet> at least how you drive it, I have no idea how much of the isolation is hw assisted

20:06 <DemiMarie> Does the GuC firmware validate its inputs properly?

20:06 bgs has quit [Remote host closed the connection]

20:06 <airlied> DemiMarie: it's firmware, so yes it does, except when it doesn't and there's a CVE

20:08 <DemiMarie> airlied: that is obviously true, but not an answer 😆

20:13 <danvet> airlied, sometimes it's also a Schrödinger CVE

20:13 <DemiMarie> What is that?

20:13 <danvet> only stops being a superposition when someone looks

20:13 <danvet> then it either becomes correct code or a real cve :-)

20:14 <DemiMarie> I guess worst case is that a patched driver could force everything to be proxied through the host.

20:14 <DemiMarie> Which could validate the command stream

20:15 <danvet> nah, can't do that anymore

20:15 <DemiMarie> Why?

20:15 <danvet> we had to on gen7, and nowadays the cmd stream is just too flexible for you to parse it without time-of-check vs time-of-use issues

20:16 <DemiMarie> What do you mean? Too many pointers?

20:16 <danvet> like for rtx the gpu actually creates some of the batches and chains them, and I think media does similar stuff

20:17 <DemiMarie> Userspace direct submit is a nightmare

20:17 <danvet> but also cmdparser checking would mean the mmu is busted

20:17 <danvet> at that point it's hopeless

20:17 <DemiMarie> What do you mean?

20:17 <danvet> if it's just guc, then you "just" virtualize the guc queues, those are fairly simple

20:17 <danvet> DemiMarie, the stuff I mentioned pretty much means you either have a working mmu to separate userspace cmd submission

20:18 <danvet> or you don't have a secure gpu

20:18 <agd5f> plus MS is pushing scheduling in hw/fw: https://devblogs.microsoft.com/directx/hardware-accelerated-gpu-scheduling/

20:18 rmckeever has joined #dri-devel

20:18 <jenatali> Yep

20:18 <danvet> and so cmd parse would boil down to "implement the gpu in sw"

20:18 <danvet> yeah so hopefully it actually works :-)

20:18 <DemiMarie> danvet: I trust Intel to get the MMU right. It's buffer overflows in fw I am worried about.

20:19 <danvet> agd5f, question was more about VF/PF and fw stuff, which is another can of worms from userspace direct submit

20:19 <DemiMarie> Hence why I was specifically asking about the GuC.

20:19 <DemiMarie> For instance Apple GPU fw does not validate its inputs at all

20:20 <agd5f> Guc or similar is also a solution to the GPU scheduling

20:20 gouchi has joined #dri-devel

20:22 <danvet> agd5f, yeah, but at least with ours userspace never talks to guc directly

20:22 <danvet> it's only the kernel driver, and then guc just schedules the contexts which are (per the kernel driver) marked as runnable

20:22 <danvet> (we're not yet at hw doorbell actually working)

20:23 <danvet> so guc never (currently at least) sees anything created by userspace, only the kernel

20:24 <alyssa> danvet: oh this is going to be fun

20:24 <anarsoul> psykose: done

20:24 <alyssa> the new Malis are based around userspace submission with the doorbell piped directly to userspace

20:24 <psykose> anarsoul: saw :) nice

20:24 <alyssa> and firmware---userspace sync with the kernel completely out of the loop

20:24 <alyssa> Arm has this working in their downstream non-DRM kernel driver and their blob GL/VK

20:25 flibitijibibo has joined #dri-devel

20:25 <alyssa> I get dizzy thinking about what this means for mainline

20:25 <danvet> alyssa, ask jekstrand

20:25 <airlied> yeah then weep in a corner

20:26 <jenatali> That's the eventual goal for Windows too IIRC

20:26 <alyssa> airlied: that's basically what jekstrand said

20:26 <alyssa> jenatali: fwiw macOS does NOT do this at least for M1

20:27 <alyssa> and i have yet to see compelling evidence that it actually matters.

20:27 <jenatali> Yeah, I said eventual. We're not there yet :)

20:28 warpme_____ has joined #dri-devel

20:30 Akari has joined #dri-devel

20:35 flto has joined #dri-devel

20:46 apinheiro has quit [Ping timeout: 480 seconds]

20:48 <DemiMarie> alyssa: can you continue to not do that?

20:51 <alyssa> DemiMarie: what's this replying to?

20:51 <alyssa> I made a lot of comments about things I should continue to not do :-p

20:51 <DemiMarie> Because in Qubes one of our goals is secure GPU virt, and ( marmarek feel free to correct me) direct guest-firmware interaction makes this very hard

20:52 <alyssa> Oh, that's a very interesting point

20:52 <alyssa> Yes, we can continue to use the existing "submit ioctl" model

20:52 <alyssa> In effect, the stuff that Arm designed to run in userspace will just run in kernel space instead

20:53 <alyssa> with a possible performance cost due to extra ctx switching but I find it hard to be too bothered by that right now

20:55 <alyssa> I think we will support that model for the hw's lifetime regardless of if we ever add direct user-fw submit as a fast path later

20:55 <DemiMarie> My worry is that future hardware will require the fast path.

20:55 <alyssa> yes, I hear you

20:57 <DemiMarie> If I trusted the GPU firmware to do proper input validation that would be a totally different story.

20:58 <alyssa> I don't and you shouldn't

20:58 <DemiMarie> Have you found cases where it didn't?

21:00 <DemiMarie> danvet: could i915 virtualize the GuC queues? Would this be good from a security perspective?

21:01 <alyssa> DemiMarie: it's a blob, ain't like we've audited that thing

21:02 mbrost has quit [Ping timeout: 480 seconds]

21:03 pixelcluster has quit [Remote host closed the connection]

21:03 pixelcluster_ has joined #dri-devel

21:03 pixelcluster_ has quit [Remote host closed the connection]

21:03 pixelcluster has joined #dri-devel

21:08 junaid has quit [Ping timeout: 480 seconds]

21:16 hays_ has joined #dri-devel

21:20 mbrost has joined #dri-devel

21:20 maxzor has joined #dri-devel

21:27 <DemiMarie> alyssa: anyone considered clean-room REing an open-source replacement?

21:28 <DemiMarie> alyssa: do you fuzz the kernel driver?

21:31 <alyssa> FOSS replacement for that blob is definitely a possibility, yeah

21:31 <alyssa> I started work on it but right now more interested in supporting the hardware in mainline at all

21:33 <agd5f> alyssa, likely hw won't load it unless it's signed

21:33 <alyssa> agd5f: it will, we have code execution on the thing

21:33 <hays_> Is the Mali-G610 MP4 3D GPU under the provenance/support of panfrost?

21:33 <alyssa> hays_: Yes, that's the hardware under discussion right now

21:34 <hays_> joining a bit late

21:34 <alyssa> It isn't supported upstream yet, but we're working (actively) on it

21:34 <alyssa> If you need GPU acceleration, I recommend against buying RK3588 boards until there's upstream support

21:34 <hays_> I know next to nothing about the details, but I have the hardware and should be in a position to test soon

21:35 <alyssa> (Even aside from GPU, the mainline support isn't there yet for everything else)

21:35 <hays_> yeah im very aware. haha :)

21:36 mvlad has quit [Remote host closed the connection]

21:36 * CounterPillow nervously glanced at the unimplemented clock gating stuff

21:36 <alyssa> woof

21:36 <alyssa> I'd really rather not support the downstream kernel in upstream Mesa, so that's also a complicating factor for Mesa support

21:37 <alyssa> and in an unusual twist, the hard part is going to be kernel support, not Mesa

21:37 <alyssa> the Mesa side is "basically" identical to Mali-G57

21:37 srslypascal has quit [Ping timeout: 480 seconds]

21:38 <alyssa> (and by "basically" I mean a 5KLOC branch but shhh a big chunk of that is copy/paste...)

21:38 <alyssa> all of this is coming soon to upstream but not there yet

21:38 <CounterPillow> anarsoul: psykose: thanks for the mpv bug report, we're currently discussing disabling DR by default until we have an implementation that uses staging buffers. Pls don't driconf us

21:39 <anarsoul> CounterPillow: thanks for looking into it!

21:39 <alyssa> CounterPillow: sorry for suggesting driconf, I might've been a bit too trigger happy there ;-p

21:39 <hays_> alyssa: is the idea to use the mali_csffw.bin firmware or to avoid it

21:39 <alyssa> hays_: right, that's what we were discussing a few minutes ago

21:39 <hays_> i must have just missed it

21:39 <alyssa> yeah

21:39 <hays_> sorry

21:39 <alyssa> no worries

21:40 <alyssa> at a technical level, it's possible to avoid

21:40 <alyssa> there's no sig checking, we have code exec on the MCU

21:40 <alyssa> having *some* firmware is mandatory but it doesn't have to be Arm's

21:41 <alyssa> That said -- we expect the mali_csffw.bin <--> kernel interface to remain relatively stable, but the physical firmware <--> hardware interface to change unpredictably for future Malis

21:41 <alyssa> so bringing up a FOSS firmware right now seems like it might be painting ourselves into a corner

21:42 <alyssa> So I think the current plan is to teach the mainline kernel about mali_csffw.bin

21:42 <alyssa> and teach Mesa to use the mainline kernel with Mali-G610

21:43 <alyssa> after there's mainline support shipping, then we can revisit if we maybe want to free the firmware too, and I should have a lot of code to drop if fd.o folks want to pursue that

21:43 <alyssa> but we'd probably keep both code paths in the kernel if only to ease bring up of future Malis

21:43 apinheiro has joined #dri-devel

21:43 <alyssa> no sense cutting off our nose to spite our face.

21:44 <hays_> yeah all seems very sensible

21:48 <alyssa> bbrezillon: ^^

21:48 <alyssa> hays_: let's be clear they're not *my* sensible decisions ;)

21:48 <anarsoul> alyssa: does ARM allow mali_csffw.bin redistribution?

21:49 <alyssa> anarsoul: don't remember

21:49 <alyssa> hays_: if it were only up to me, well, as soon as I got my hands on the hardware https://rosenzweig.io/TheCSFMCU.png happened :-p

21:49 Duke`` has quit [Ping timeout: 480 seconds]

21:52 <hays_> anarsoul: i think i downloaded it from the rockchip website... so that's not firm confirmation but at least some people can redistribute

21:53 <hays_> alyssa: heh that screenshot is disorienting but i think i get the picture :)

21:57 <hays_> what is the relationship between these drivers and a BSP which appears to include a driver blob with patches to various userland apps most notably xorg

21:58 fab has quit [Quit: fab]

21:59 <hays_> crud like this https://github.com/JeffyCN/xorg-xserver

22:00 maxzor has quit [Ping timeout: 480 seconds]

22:00 <hays_> earlier mpv was under discussion: https://github.com/JeffyCN/mpv (these are i think rockchip forks/trees)

22:01 gouchi has quit [Remote host closed the connection]

22:02 <hays_> and how do you teach the kernel about mali_csffw.bin? is that straight up reverse engineering?

22:02 <mareko> alyssa: nobody had the energy to update all drivers when "start" was added

22:04 <mareko> if a driver can't handle sampler state start > 0, we just assume it's broken

22:13 ahajda_ has quit []

22:14 <alyssa> mareko: OK. The issue isn't start > 0, but not unbinding samplers [start + num, ...)

22:15 <alyssa> which matters for how ctx->num_samplers[stage] is calculated

22:15 <alyssa> hays_: We're trying to do the right thing. Mainline kernel driver, upstream Mesa driver, no vendor trees. That takes more time than shipping blobs.

22:18 <hays_> yeah im just trying to understand the landscape. it almost seems like the vendors have patched various OSS projects to work with the blob, and ship that as a Yocto layer or whatever, but unclear what happens next to all of those forks

22:18 <hays_> not limited to just gpu, but seem to be blobs for many other chips as well

22:18 <alyssa> they live a short life with an old kernel and then people ship to mainline when mainline is ready

22:21 <hays_> yeah and maybe you were saying this earlier, for the rk3588 the kernel seems to be crazy old so a heavy lift on that side

22:22 ybogdano has quit [Ping timeout: 480 seconds]

22:24 bskica has joined #dri-devel

22:29 <alyssa> I really wish people held off on buying rk3588 for a year or so

22:29 <alyssa> the software side really isn't ready

22:30 <alyssa> actually I wish people held off on all new hardware a year or so.

22:30 <psykose> whoa it's A76

22:30 <psykose> at last something not a72 /s

22:30 <alyssa> (people being "anyone not developing drivers for that hardware" I mean)

22:32 <danvet> alyssa, users, they're a menace :-P

22:35 <alyssa> danvet: am user, can confirm

22:39 dcz_ has quit [Ping timeout: 480 seconds]

22:41 apinheiro has quit [Quit: Leaving]

22:42 rasterman has quit [Quit: Gettin' stinky!]

22:45 danvet has quit [Ping timeout: 480 seconds]

23:15 <DemiMarie> alyssa: when you said you had code exec, my first thought was that there was a buffer overflow or similar allowing sig checks to be bypassed

23:16 <DemiMarie> if I may make a suggestion: have you considered using a safe language (such as Ada or Rust) for the rewritten firmware?

23:24 pcercuei has quit [Quit: dodo]

23:37 rsalvaterra has quit []

23:38 rsalvaterra has joined #dri-devel

23:39 <alyssa> it'd be in Rust, yes

23:40 <alyssa> less for safety and more because none of us want to write C code :p

23:50 <Kayden> alyssa: I think the docs for bind_sampler_state should drop that comment. I don't know why anyone would put a note in the docs saying "BTW, we have these parameters, but we don't use them, so feel free to ignore them and write bugs, haha!". if they're really not used, they should be eliminated. but they are actually used, so the docs are wrong

23:50 <Kayden> at least at a glance it looks like zink is handling [start_slot, start_slot + num_samplers). iris does too

23:51 <Kayden> samplers[] has at least num_samplers samplers, and samplers[i] maps to driver sampler [start_slot + i]

23:51 <Kayden> if samplers[i] is null it's an unbind
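
The slot semantics described above can be sketched roughly as follows. This is a minimal illustration, not Mesa code: `sketch_ctx`, `sketch_bind_samplers`, and `MAX_SAMPLERS` are hypothetical names invented for the example, and the way the bound-sampler count is recomputed is an assumption about one reasonable approach, not what any particular driver does.

```c
#include <stddef.h>

#define MAX_SAMPLERS 16 /* hypothetical limit, chosen only for this sketch */

/* Hypothetical per-stage state; real drivers keep this per shader stage. */
struct sketch_ctx {
    void *samplers[MAX_SAMPLERS]; /* bound sampler per slot, NULL if unbound */
    unsigned num_samplers;        /* index of highest bound slot, plus one */
};

/*
 * Update only slots [start_slot, start_slot + count).
 * samplers[i] maps to slot start_slot + i; a NULL entry unbinds that slot.
 * Slots outside the range are left untouched.
 */
static void sketch_bind_samplers(struct sketch_ctx *ctx, unsigned start_slot,
                                 unsigned count, void **samplers)
{
    for (unsigned i = 0; i < count && start_slot + i < MAX_SAMPLERS; i++)
        ctx->samplers[start_slot + i] = samplers[i];

    /*
     * Recompute the count from the whole table rather than assuming it is
     * start_slot + count: slots above the range may still be bound, and the
     * last slot in the range may have just been unbound.
     */
    unsigned n = MAX_SAMPLERS;
    while (n > 0 && ctx->samplers[n - 1] == NULL)
        n--;
    ctx->num_samplers = n;
}
```

Scanning the table from the top is the simplest way to keep the count right after a partial update; a driver that tracks a bitmask of bound slots could derive the same value more cheaply.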

23:56 fxkamd has quit []

23:57 <alyssa> yeah, I think Iris is doing the right thing

23:57 <alyssa> the biggest bug with panfrost was incorrectly calculating sampler_count

23:57 <alyssa> but Iris just.. doesn't calculate that ;-p

23:58 <alyssa> thanks to using blorp it doesn't need to. u_blitter needs it.

23:58 <alyssa> if we all agree the docs are wrong, could someone send a doc patch? thanks