ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
dliviu has quit [Ping timeout: 480 seconds]
dliviu has joined #dri-devel
jewins has joined #dri-devel
danvet has quit [Ping timeout: 480 seconds]
pcercuei has quit [Quit: dodo]
mhenning has joined #dri-devel
orbea1 has joined #dri-devel
orbea has quit [Ping timeout: 480 seconds]
orbea1 has quit []
<mareko> what defines __POPCNT__? I see it used in Mesa, but I don't see it defined
orbea has joined #dri-devel
<HdkR> mareko: Compiler
<HdkR> GCC defines it if the x86 host supports popcnt, so sse4.2 or abm
<mareko> it's not defined and I compile on zen2
<HdkR> march=native doesn't hit it? Guess it needs -mpopcnt explicitly which is an...oversight?
<mareko> it's not supported on all x86_64 CPUs
<mareko> or some of them rather
<airlied> yeah it needs sse4
<HdkR> Right, not all x86-64 CPUs support SSE4.2 or ABM :P
<airlied> SSE4a I think it is :-)
<HdkR> woof
<HdkR> ARM finally introduced a real popcount instruction in their 2023 ISA extension
<HdkR> prior to that you need to do a GPR->FPR move, do the popcount with a vector instruction, then move back
<mareko> __builtin_popcount emulates it
<airlied> yeah the builtin should wrap it
<HdkR> https://godbolt.org/z/46jG4fM48 Indeed, it'll do the silly dance
<HdkR> -march=armv9.4-a will finally switch it to a single instruction op
<mareko> HdkR: how to know whether __builtin_popcount doesn't emulate it? __POPCNT__?
<mareko> oh yeah
<HdkR> That's only defined on x86. ARM land you don't get any indication
<HdkR> Actually, now I'm surprised there isn't a CSSC extension define for this
<HdkR> Looks like they forgot to add `__ARM_FEATURE_CSSC`
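A minimal C sketch of the pattern being discussed, assuming a Mesa-style bit-count helper (the name bitcount32 is made up): __POPCNT__ is only defined by the compiler on x86 when popcnt is enabled (SSE4.2/ABM or -mpopcnt), while __builtin_popcount is always correct but may be emulated in software, or on pre-CSSC ARM lowered to a GPR->FPR->GPR round trip.

    #include <stdint.h>
    #if defined(__POPCNT__)
    #include <nmmintrin.h>   /* _mm_popcnt_u32 */
    #endif

    static inline unsigned
    bitcount32(uint32_t v)
    {
    #if defined(__POPCNT__)
       /* The compiler guarantees the popcnt instruction is available. */
       return (unsigned)_mm_popcnt_u32(v);
    #else
       /* Portable fallback; whether this becomes a single instruction
        * depends on the target and -march flags. */
       return (unsigned)__builtin_popcount(v);
    #endif
    }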
co1umbarius has joined #dri-devel
columbarius has quit [Ping timeout: 480 seconds]
YuGiOhJCJ has joined #dri-devel
OftenTimeConsuming has quit [Remote host closed the connection]
OftenTimeConsuming has joined #dri-devel
<kisak> wow, that's mean ... in the pursuit of trying to get my build environment compatible with llvm-spirv-15, I updated spirv-tools to 2022.3-1, but surprise, the newer version doesn't provide libSPIRV-Tools.so which is used by glslangValidator. Do I just rebuild glslang for it not to use the missing shared library?
<kisak> (glslangValidator is called by the mesa build)
<airlied> kisak: spirv tools doesn't always build shared by default, some distro packages have hacks
<airlied> also some hacks stop working from time to time
<kisak> this case is that the runtime dependency package kicked the .so out from under glslang, for https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=956510
<kisak> I think the right answer here is to just rebuild and test
YuGiOhJCJ has quit [Remote host closed the connection]
YuGiOhJCJ has joined #dri-devel
Company has quit [Quit: Leaving]
jewins has quit [Ping timeout: 480 seconds]
camus has joined #dri-devel
camus has quit [Read error: Connection reset by peer]
camus has joined #dri-devel
Danct12 has joined #dri-devel
Danct12 has quit []
Danct12 has joined #dri-devel
<kisak> A no change rebuild fixed the snafu.
egbert has quit [Remote host closed the connection]
<kisak> after a day of banging my head against the wall, an intel-clc enabled bionic build completed. Too bad I don't have any hardware to test if it actually works.
jkrzyszt has quit [Ping timeout: 480 seconds]
<anarsoul> hey folks, I'm trying to debug https://gitlab.freedesktop.org/mesa/mesa/-/issues/7862
<anarsoul> mpv CPU usage jumps x4 if it uses GL_ARB_buffer_storage
<anarsoul> perf says that most time is spent in decoding. Do I assume correctly that returning a pointer to a write-combined BO from transfer_map() may not be a good idea?
<airlied> anarsoul: yes generally not a great plan
<anarsoul> how is it handled in other drivers?
egbert has joined #dri-devel
<airlied> anarsoul: copy to a staging cached one?
<anarsoul> how is it supposed to work for resources mapped with persistent | coherent?
Kayden has quit [Quit: Leaving]
Kayden has joined #dri-devel
<airlied> anarsoul: you are meant to have coherent cpu/gpu access to the resource
<airlied> on x86 that typically means the gpu is snooping on the cpu caches
<anarsoul> I mean when do I synchronize the copy if it's mapped as persistent and coherent, if coherent doesn't require memory_barrier() to be called?
<airlied> yeah you can't do a copy for coherent
<airlied> coherent mappings might be one of the reasons GLES doesn't have GL_ARB_buffer_storage afaik
Danct12 has quit [Quit: Quitting]
Danct12 has joined #dri-devel
<anarsoul> I guess the easiest workaround would be hiding PIPE_CAP_BUFFER_MAP_PERSISTENT_COHERENT on lima behind a debug flag
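A rough sketch of the staging-copy idea airlied suggests above, with made-up names rather than real gallium entry points: hand the CPU a cached staging allocation at map time and do one streaming copy into the write-combined BO at unmap time. This is exactly what a PERSISTENT|COHERENT mapping forbids, which is why the direct WC pointer ends up exposed to mpv's decoder.

    #include <stdlib.h>
    #include <string.h>

    struct staging_map {
       void *staging;      /* malloc'd, CPU-cached */
       void *wc_bo_ptr;    /* mapping of the write-combined BO */
       size_t size;
    };

    /* Non-persistent map: give the caller cached memory to write into. */
    static void *
    map_for_cpu_write(struct staging_map *m)
    {
       m->staging = malloc(m->size);
       return m->staging;
    }

    /* Unmap: one streaming copy into WC memory instead of the decoder
     * doing many small uncached accesses directly. */
    static void
    unmap_and_flush(struct staging_map *m)
    {
       memcpy(m->wc_bo_ptr, m->staging, m->size);
       free(m->staging);
       m->staging = NULL;
    }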
maxzor_ has quit [Ping timeout: 480 seconds]
YuGiOhJCJ has quit [Remote host closed the connection]
YuGiOhJCJ has joined #dri-devel
ran has joined #dri-devel
ran has quit [Remote host closed the connection]
lemonzest has joined #dri-devel
<Lynne> airlied: ping, could you look at the ref frame problem? I'm still working on encoding
<airlied> Lynne: been staring at it, but in the sense of void, not in a useful sense
<Lynne> stare a bit into encoding if that clears your mind
<Lynne> on a related note, the encoding api has some issues that have to be addressed before it's stable
<airlied> the video I'm looking at doesn't even decode its first frame properly
<Lynne> make sure B-frames are missing
<airlied> that nvidia fix looks very wrong though
<Lynne> no AFT = frames should be printed
<Lynne> ideally, the first frame should not have any references at all
<airlied> oh I thought we had B-frames :)
<Lynne> most hardware encoders generate that
<DavidHeidelberg[m]> <anarsoul> "I guess the easiest workaround..." <- Having a different codepath on release and debug sounds bad. What about enabling it with a driver-related variable?
<Lynne> nope, let's tackle those after we get P-frames working, though they should just work once we do that
<Lynne> anyway, with encoding, the intention behind VK_VIDEO_ENCODE_RATE_CONTROL_MODE_NONE_BIT_KHR seems to be to let users implement their own rate control (as well they should), and letting them set quantizers
<Lynne> but there's no way for users to override the frame type the encoder uses
<Lynne> apart from doing a full RESET, but that's a nuclear option
<Lynne> if VK_VIDEO_ENCODE_RATE_CONTROL_MODE_NONE_BIT_KHR, there ought to be an enum in the same struct that can override the frame type used
<airlied> Lynne: still seems like slot index is wrong
<Lynne> I copied what nvdec does, since it's the closest API-wise to vulkan
<Lynne> if it's any consolation, av1 should be even simpler than h264, though it will require some invasive spec changes
<Lynne> (av1 requires you to be able to feed packets into an encoder without being able to output a frame)
aravind has joined #dri-devel
rsalvaterra_ has joined #dri-devel
rsalvaterra is now known as Guest1853
rsalvaterra_ is now known as rsalvaterra
Guest1854 has quit [Ping timeout: 480 seconds]
YuGiOhJCJ has quit [Remote host closed the connection]
YuGiOhJCJ has joined #dri-devel
YuGiOhJCJ has quit [Remote host closed the connection]
YuGiOhJCJ has joined #dri-devel
heat has quit [Ping timeout: 480 seconds]
bmodem has joined #dri-devel
SanchayanMaity_ has quit []
SanchayanMaity has joined #dri-devel
srslypascal is now known as Guest1859
srslypascal has joined #dri-devel
Guest1859 has quit [Ping timeout: 480 seconds]
Duke`` has joined #dri-devel
Jeremy_Rand_Talos__ has quit [Remote host closed the connection]
Jeremy_Rand_Talos__ has joined #dri-devel
mhenning has quit [Quit: mhenning]
itoral has joined #dri-devel
fab has joined #dri-devel
Duke`` has quit [Ping timeout: 480 seconds]
bgs has joined #dri-devel
aravind has quit [Ping timeout: 480 seconds]
YuGiOhJCJ has quit [Remote host closed the connection]
YuGiOhJCJ has joined #dri-devel
fab has quit [Quit: fab]
gouchi has joined #dri-devel
gouchi has quit [Remote host closed the connection]
<airlied> Lynne: my branch + https://paste.centos.org/view/raw/7af69b8c seems to get it working here
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
<airlied> or at least seems to blow up a bit later for me
Jeremy_Rand_Talos_ has joined #dri-devel
Jeremy_Rand_Talos__ has quit [Remote host closed the connection]
<airlied> Lynne: good to know if it helps on the nvidia side
bgs has quit [Remote host closed the connection]
<airlied> Lynne: but still not sure I've wrapped my head around how it's meant to work yet!
rasterman has joined #dri-devel
frieder has joined #dri-devel
tursulin has joined #dri-devel
fab has joined #dri-devel
jfalempe has joined #dri-devel
Jeremy_Rand_Talos_ has quit [Read error: Connection reset by peer]
Jeremy_Rand_Talos_ has joined #dri-devel
danvet has joined #dri-devel
tzimmermann has joined #dri-devel
OftenTimeConsuming has quit [Remote host closed the connection]
OftenTimeConsuming has joined #dri-devel
narmstrong_ has quit []
narmstrong has joined #dri-devel
sgruszka has joined #dri-devel
sarahwalker has joined #dri-devel
swalker_ has joined #dri-devel
swalker_ is now known as Guest1869
dcz_ has joined #dri-devel
sarahwalker has quit [Ping timeout: 480 seconds]
lynxeye has joined #dri-devel
camus has quit [Remote host closed the connection]
camus has joined #dri-devel
vliaskov has joined #dri-devel
jkrzyszt has joined #dri-devel
ra has joined #dri-devel
ra has quit []
anon_apple has joined #dri-devel
mvlad has joined #dri-devel
anon_apple has quit []
apinheiro has joined #dri-devel
kts has joined #dri-devel
ahajda_ has joined #dri-devel
pcercuei has joined #dri-devel
fab has quit [Read error: Connection reset by peer]
fab_ has joined #dri-devel
fab_ is now known as Guest1875
repetitivestrain has quit [Read error: Connection reset by peer]
MajorBiscuit has joined #dri-devel
camus has quit [Remote host closed the connection]
camus has joined #dri-devel
heat has joined #dri-devel
kts has quit [Quit: Leaving]
jkrzyszt has quit [Remote host closed the connection]
devilhorns has joined #dri-devel
jkrzyszt has joined #dri-devel
tintou has joined #dri-devel
<tintou> Is there anything happening to the GitLab runners? My MR seems to be stuck https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20277
camus1 has joined #dri-devel
<daniels> tintou: most of them are fine, but Windows is currently stuck behind some long-running GStreamer jobs
camus has quit [Ping timeout: 480 seconds]
maxzor_ has joined #dri-devel
maxzor__ has joined #dri-devel
maxzor_ has quit [Ping timeout: 480 seconds]
Haaninjo has joined #dri-devel
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
jkrzyszt has quit [Remote host closed the connection]
devilhorns has quit [Remote host closed the connection]
Company has joined #dri-devel
devilhorns has joined #dri-devel
heat has quit [Read error: No route to host]
heat has joined #dri-devel
jkrzyszt has joined #dri-devel
junaid has joined #dri-devel
junaid has quit [Remote host closed the connection]
heat has quit [Remote host closed the connection]
heat has joined #dri-devel
djbw has quit [Read error: Connection reset by peer]
jkrzyszt has quit [Remote host closed the connection]
Danct12 has quit [Remote host closed the connection]
YuGiOhJCJ has joined #dri-devel
aravind has joined #dri-devel
itoral has quit [Remote host closed the connection]
jkrzyszt has joined #dri-devel
Kayden has quit [Quit: Leaving]
Kayden has joined #dri-devel
Akari has joined #dri-devel
<jenatali> daniels: I don't suppose you've come up with any way to un-prioritize those jobs?
fxkamd has quit []
fxkamd has joined #dri-devel
<daniels> jenatali: GitLab doesn't give us a way :( we just need to keep leaning on them to try to optimise them somehow
<daniels> or get another machine
<jenatali> :(
YuGiOhJCJ has quit [Remote host closed the connection]
YuGiOhJCJ has joined #dri-devel
maxzor__ has quit [Ping timeout: 480 seconds]
jagan_ has joined #dri-devel
pekkari has joined #dri-devel
heat has quit [Remote host closed the connection]
heat has joined #dri-devel
jkrzyszt has quit [Remote host closed the connection]
jkrzyszt has joined #dri-devel
jewins has joined #dri-devel
fxkamd has quit []
fxkamd has joined #dri-devel
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
cwabbott_ has quit []
maxzor__ has joined #dri-devel
cwabbott has joined #dri-devel
dcz_ has quit [Ping timeout: 480 seconds]
<cwabbott> do any WSI people know if I'd need to define a modifier for subsampled images on Qualcomm?
<cwabbott> it's an already-existing concept in VK_EXT_fragment_density_map, which is a normal image combined with some metadata telling the user to sample at reduced resolution in some regions
Guest1875 has quit []
<cwabbott> on qualcomm the implementation is pretty much entirely by the driver, so the format of the metadata is up to the driver
<cwabbott> the usecase for sharing would mainly be passing a subsampled image to a VR compositor to do barrel distortion
aravind has quit [Ping timeout: 480 seconds]
<cwabbott> it seems like things would already Just Work if producer and consumer both pretend it's a normal image but pass the subsampled bit when creating it, since how we implement the metadata never changes, etc.
bgs has joined #dri-devel
kts has joined #dri-devel
kts has quit []
Leopold_ has quit [Remote host closed the connection]
kts has joined #dri-devel
Leopold_ has joined #dri-devel
sgruszka has quit [Ping timeout: 480 seconds]
JohnnyonFlame has joined #dri-devel
mbrost has joined #dri-devel
heat_ has joined #dri-devel
heat has quit [Read error: Connection reset by peer]
jkrzyszt has quit [Remote host closed the connection]
<jekstrand> cwabbott: Uh... Maybe?
<jekstrand> cwabbott: Depends on how much auto-magic you expect there to be.
<jekstrand> cwabbott: Is it going to be re-imported as a fragment density image? Or are you expecting it to be magic metadata that gets used by the sampler?
psykose has quit [Remote host closed the connection]
psykose has joined #dri-devel
<cwabbott> jekstrand: it would be re-imported and used as a subsampled image in the compositor
<jekstrand> cwabbott: If it's just a matter of sharing an R8 image that's going to be used as a fragment density map on both sides, you don't need anything special, I don't think, unless it has some special tiling or something like that.
<jekstrand> If it's being handled as a metadata plane attached to a shared color image, then you probably need a modifier for it.
<cwabbott> a fragment density map isn't anything special
<cwabbott> a subsampled image has a separate metadata plane (sort-of) with the density to sample at
<cwabbott> that's derived from but not the same as the fragment density map
jkrzyszt has joined #dri-devel
Duke`` has joined #dri-devel
<cwabbott> the FDM is just a normal 2-channel image that gets added as an extra attachment to the render pass
tzimmermann has quit [Quit: Leaving]
<jekstrand> Yeah, that sounds like a metadata plane which needs a modifier
fab has joined #dri-devel
maxzor__ has quit [Ping timeout: 480 seconds]
<emersion> how does it interact with other existing Qualcomm modifiers?
<emersion> a buffer can only have a single modifier
<cwabbott> that wouldn't be so bad I guess
<cwabbott> emersion: there's only one, for UBWC, so I guess we'd have to duplicate them
<emersion> a common strategy is to reserve bits for each use
<emersion> but you can just duplicate and come up with a bit layout later
<emersion> just don't paint yourself in a corner, it's good to think about future uses and extensibility
<jekstrand> It's also really easy to over-think it like the NV modifiers which have bits reserved for mipmapping as if that'll ever happen.
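A hypothetical illustration of the "reserve bits for each use" strategy emersion describes; the EXAMPLE_QCOM_* names and layout are invented, and only fourcc_mod_code() and the QCOM vendor ID are real drm_fourcc.h definitions. The idea is to encode independent features (UBWC compression, a subsampled/FDM metadata plane) as separate bit fields of the 56-bit vendor value instead of minting a new opaque modifier for every combination.

    #include <drm_fourcc.h>   /* fourcc_mod_code(), DRM_FORMAT_MOD_VENDOR_QCOM */

    /* Invented feature bits, for the sake of illustration only. */
    #define EXAMPLE_QCOM_MOD_UBWC        (1ULL << 0)  /* UBWC compression */
    #define EXAMPLE_QCOM_MOD_SUBSAMPLED  (1ULL << 1)  /* FDM metadata plane */

    #define EXAMPLE_QCOM_MOD(flags)  fourcc_mod_code(QCOM, (flags))

    /* e.g. a compressed, subsampled image:
     *   EXAMPLE_QCOM_MOD(EXAMPLE_QCOM_MOD_UBWC | EXAMPLE_QCOM_MOD_SUBSAMPLED) */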
<flto> cwabbott: comparing UUIDs and creating the image with identical parameters is enough, you don't need a modifier (unless your use-case actually involves passing the images through WSI, which doesn't seem like a good idea)
<cwabbott> flto: I was asking the right way to do it, not whether it would work
<cwabbott> of course that would work but comparing UUIDs etc. is not expected usage for the dma-buf extensions
<emersion> the compositor and its clients might be running different versions of the drivers
ybogdano has joined #dri-devel
jkrzyszt has quit [Remote host closed the connection]
<emersion> cwabbott: is this Qualcomm-specific or not?
<cwabbott> emersion: iirc ARM also does subsampled images but I think their implementation is a bit different
<emersion> VK_EXT_fragment_density_map makes it sound like it's not, but i know nothing about this stuff
<cwabbott> ARM and Qualcomm are the two that implement VK_EXT_fragment_density_map
<emersion> hm, and VK_EXT_fragment_density_map is only about the shader invocation i suppose?
<emersion> not the buffer itself?
<cwabbott> it defines how to create a subsampled image, too
<cwabbott> I'm not sure if anyone thought about passing it between processes
<flto> cwabbott: it is not a "wrong" way to use dma bufs either (VK_EXT_external_memory_dma_buf can exist without VK_EXT_image_drm_format_modifier)
<cwabbott> flto: you can't bind an image without VK_EXT_image_drm_format_modifier
<cwabbott> so no, it is a wrong way
<cwabbott> on linux, you cannot get guarantees about image layout matching across processes without modifiers
camus1 has quit [Remote host closed the connection]
camus has joined #dri-devel
junaid has joined #dri-devel
<cwabbott> on ARM, I think subsampled images are like mipmapped images and there's still metadata, but it tells you which "mip" to use
<cwabbott> not sure if it literally is mipmapped or not
jessica_248 is now known as jessica_24
<flto> cwabbott: dma bufs are just a type of external memory.. I don't think there's an exception that you can't use dma bufs for images like other external memory types
camus has quit [Ping timeout: 480 seconds]
<cwabbott> flto: I don't think it provides any guarantees that it'll work either, and in practice it won't be accepted upstream because no one does that
<cwabbott> the intended way of handling this is modifiers
<flto> cwabbott: there are guarantees that external memory can be used for images with matching UUIDs
lynxeye has quit [Quit: Leaving.]
cheako_ has quit []
cheako has joined #dri-devel
frieder has quit [Remote host closed the connection]
Leopold_ has quit [Ping timeout: 480 seconds]
alyssa has joined #dri-devel
<alyssa> Is it legal to call bind_sampler_states multiple times to update subranges?
<alyssa> Gallium docs say no:
Leopold_ has joined #dri-devel
jkrzyszt has joined #dri-devel
<alyssa> sampler states are bound... with the ``bind_sampler_states`` function. The ``start`` and ``num_samplers`` parameters indicate a range of samplers to change. NOTE: at this time, start is always zero and the CSO module will always replace all samplers at once (no sub-ranges). This may change in the future.
<alyssa> But Nine does it anyway.
<alyssa> Are the docs out of date? or is nine buggy?
<alyssa> softpipe/llvmpipe explicitly handle the subrange case
<alyssa> freedreno implicitly does
<alyssa> zink/v3d do not handle the case
<alyssa> Either we need to update the docs and then update n-3 drivers to handle subranges
<alyssa> or we need to update nine
djbw has joined #dri-devel
<alyssa> mareko: DavidHeidelberg[m]: robclark: Kayden: ^
mbrost has quit [Ping timeout: 480 seconds]
<alyssa> q4a[m]: sent a patch to make panfrost honour subranges like softpipe, which fixes nine rendering in a game
<alyssa> but merging the patch on its own seems surely wrong -- either docs + other drivers also need to be patched, or nine does.
<robclark> alyssa: `unsigned p = i + start`
<alyssa> robclark: The issue is what happens to existing samplers [start + num, ...)
<alyssa> are they implicitly unbound?
q4a has joined #dri-devel
vhebert has left #dri-devel [#dri-devel]
<alyssa> if not, the driver has to be a lot more careful when calculating num_samplers
<robclark> they are unchanged
<robclark> I'm pretty sure this topic comes up every couple of years ;-)
<alyssa> okay.. so it is zink and v3d and panfrost and etnaviv and the docs that are wrong?
<robclark> I guess the part about it "always zero" is wrong in the docs, if nine uses it
<alyssa> seemingly
<anarsoul> DavidHeidelberg[m]: on the other hand, we already have MESA_EXTENSION_OVERRIDE
tursulin has quit [Ping timeout: 480 seconds]
<jenatali> Seems it was a change at some point
mbrost has joined #dri-devel
<anarsoul> alyssa: out of curiosity, have you tested mpv on panfrost with PIPE_CAP_BUFFER_MAP_PERSISTENT_COHERENT exposed?
flibit has quit []
flibitijibibo has joined #dri-devel
<anarsoul> I expect it to have the same issue as lima on platforms where it uses write-combined BOs
JohnnyonFlame has quit [Read error: No route to host]
Guest1869 has quit [Remote host closed the connection]
devilhorns has quit []
<alyssa> anarsoul: I don't think I have, no
<alyssa> What's the issue on lima?
<alyssa> jenatali: I see
junaid has quit [Ping timeout: 480 seconds]
<alyssa> ughhh
abhinav__3 is now known as abhinav__
<alyssa> not sure what the driver is supposed to do
<alyssa> they asked for a direct map of a persistent coherent buffer, we gave them one
ybogdano has quit [Ping timeout: 480 seconds]
<anarsoul> yeah, now they are decoding the video directly into the buffer
<alyssa> again I don't see how this is Mesa's fault
<anarsoul> the spec doesn't say anything on whether the buffer is expected to be cached or not
<alyssa> i guess if we're at a stalemate with mpv, we can driconf it..
<alyssa> frustrating though.
<anarsoul> I wonder if mpv devs would consider adding a command line (and config) option for that
<anholt_> one should expect the buffer to be uncached. it would be very rare for a persistent coherent mapped buffer to be cached. sounds like a busted app.
junaid has joined #dri-devel
<anarsoul> anholt_: it's mpv
kts has quit [Quit: Leaving]
MajorBiscuit has quit [Ping timeout: 480 seconds]
ybogdano has joined #dri-devel
<anarsoul> I believe if they used a staging buffer for the decoder and memcpy-ed their decoded image into the buffer it would be faster than doing glTexSubImage()
bmodem has quit [Ping timeout: 480 seconds]
Akari has quit [Quit: segmentation fault (core dumped)]
junaid has quit [Ping timeout: 480 seconds]
junaid has joined #dri-devel
lemonzest has quit [Quit: WeeChat 3.6]
dcz_ has joined #dri-devel
<airlied> anholt_: huh? pretty sure they are cached on x86
<airlied> esp with pcie gpus
<jekstrand> Yeah, typically it'd be WC if you get an actual VRAM map and cached if it's in system RAM.
<jekstrand> On Mali, I'd hope you at least can do WC for staging stuff but IDK how the memory works there.
<jekstrand> On Intel, we had it really nice with shared LLC.
<alyssa> jekstrand: panfrost and presumably lima map everything WC
<alyssa> everything
<alyssa> the downstream Arm driver has more caching/coherency flags available but that was never ported upstream
<alyssa> and I'm not even sure what we would do with them in either GL or VK
<jekstrand> Yeah, IDK either, TBH.
<jekstrand> Maybe you can do something better than WC for CPU reads but if your app is CPU-read-of-GPU-mem-bound, you've got problems.
* jekstrand chuckles in vkCmdDrawIndexed()
<airlied> alyssa: there's no gpu snooping of the cpu caches?
<HdkR> You're lucky to have that in ARM land :P
<graphitemaster> Is it a safe assumption that gl_LocalInvocationID / 32 (or 64) is the same as gl_SubgroupID? Like do most implementations that execute subgroups in lock-step allocate them in terms of full warps/waves?
<alyssa> airlied: "full system coherency" -- Mali supports that architecturally but not all(?) mali SoCs do
<alyssa> and even with full system coherency, I think WC slows down reads and helps streaming writes
<jekstrand> graphitemaster: Safe? Not necessarily. But with the right Vulkan version/extensions it is.
<graphitemaster> jekstrand, This is more for a fallback for when a GPU does not advertise the KHR extension and only the ARB ballot one.
<jekstrand> Yeah, no, nothing's safe there.
<alyssa> jekstrand: we don't joke about indexed draws on bifrost
pekkari has quit [Quit: Konversation terminated!]
<graphitemaster> Okay, maybe not safe, but on a scale of like 0 it won't work to 10 it will always work, what are we looking at here?
<jekstrand> alyssa: No, we just joke about all of bifrost. :P
<alyssa> graphitemaster: 7?
<alyssa> IDK
<graphitemaster> Hey, that's pretty good!
<jekstrand> graphitemaster: I was going to go 9 but that 1 will bite you.
<graphitemaster> So far you two have given me a range of 70 to 90% success rate, I can take those chances.
<alyssa> jekstrand: uh oh
<alyssa> jekstrand: also, I'm pretty sure Arm just kinda retconned Bifrost out of existence
<jekstrand> graphitemaster: I'll give you a 98% chance someone will file a bug report, though. :)
<alyssa> Utgard -> Midgard -> Valhall is the official progression (-:
<airlied> alyssa: so you pick full coherent for coherent maps and wc for others, I'd assume
<alyssa> oh, that's fair
<alyssa> still requires UAPI changes which i am not excited for
<alyssa> although I think these are properties of the BO, not the transfer..
<graphitemaster> jekstrand, The number of GPUs that don't support other vendor or the KHR extension already reduces the chances of even hitting a ARB_ballot only fallback, so Bayesian says 0% :P
<jekstrand> graphitemaster: Or it makes the bug report that much harder to reproduce. :P
<jekstrand> If there's that few of them, why have the fallback at all?
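For reference, a small C sketch of the fallback being weighed (the helper name is invented): it reconstructs a subgroup ID from the flat local invocation index, which is only valid under the assumption, questioned above, that subgroups are packed as full, contiguous waves in linear invocation order.

    /* wave_size is 32 or 64 depending on the hardware; with only
     * ARB_shader_ballot there is no query guaranteeing this layout. */
    static unsigned
    guessed_subgroup_id(unsigned local_x, unsigned local_y, unsigned local_z,
                        unsigned size_x, unsigned size_y, unsigned wave_size)
    {
       unsigned flat = local_x +
                       local_y * size_x +
                       local_z * size_x * size_y;
       return flat / wave_size;
    }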
<agd5f> airlied, anholt_ PCI spec says device access should be cache coherent with the CPU. Non-coherent access is an optional feature which the platform can provide at its discretion.
<anholt_> agd5f: device access of main mem, right?
mbrost has quit [Ping timeout: 480 seconds]
<anholt_> but not cpu accesses of vram
flto has quit [Ping timeout: 480 seconds]
<anholt_> so, mapping some gpu buffer and assuming you'll get cached performance is super wrong.
<agd5f> anholt_, yes, device access to CPU memory
flibitijibibo has quit [Quit: Leaving]
<agd5f> I.e., pci devices should snoop the CPU caches by default
<alyssa> anholt_: i assume mpv has never seen a device that uses the GPU but not the VPU, gets WC persistent mappings from the GPU, and has a CPU slow enough to notice the problem
<alyssa> without all 4 of those, you don't get this
epoll has quit [Ping timeout: 480 seconds]
<anholt_> "video playback on linux just eats a ton of cpu" is unfortunately normal, it's true.
<alyssa> perf was fine before by accident, I guess ... wiring up ARB_buffer_storage to lima led to jenga
<airlied> yeah i question wiring up buffer storage on non-coherent platforms
<alyssa> brrrrr
<alyssa> maybe I should just stop writing code
<alyssa> that way I can't introduce any more regressions
<agd5f> in practice, it seems like x86 and PPC are the only platforms with PCIe that seem to get this right. IIRC ARM requires some optional IP that no one uses and most others just assume non-coherent
epoll has joined #dri-devel
<alyssa> yeah most Arm hw is PCIe-deficient
<agd5f> ARM is actually probably 50/50. Depends on the SoC
<alyssa> fair
<DemiMarie> To what extent will i915 VFs be able to communicate directly with the GuC?
<airlied> don't think that is the design yet
<DemiMarie> That is good. I was worried that VFs could submit commands via the GuC and the GuC would parse them.
<DemiMarie> Even if those commands are restricted to unprivileged ones, there is still the worry of memory unsafety.
<DemiMarie> On the other hand, if the VFs can only talk to hardware, I am less worried.
<alyssa> ew, GuC :p
<airlied> actually not sure with SRIOV what the design is
junaid has quit [Ping timeout: 480 seconds]
<DemiMarie> Why did Intel start requiring the GuC anyway?
<alyssa> yeah what the GuC is up with that
<airlied> because their execlist driver was vendored beyond maintainable
mbrost has joined #dri-devel
<airlied> and their windows driver was moving in that direction
<danvet> DemiMarie, vf submits to guc like pf
<danvet> like that's why this thing exists pretty much
<anarsoul> alyssa: perf was suboptimial even without ARB_buffer_storage
junaid has joined #dri-devel
<alyssa> anarsoul: apparently i wasn't even the one to hook up that CAP for lima, this one ain't on me :-D
<anarsoul> alyssa: yeah, that was me :)
<alyssa> :-D
<anarsoul> yet I'm not sure if it's better to disable it, driconf it for mpv or keep it as is
<alyssa> best is to patch mpv i suppose
<alyssa> driconf second best
<anarsoul> I'm hesitant to touch mpv code
<anarsoul> do we happen to have any mpv maintainers here? :)
<psykose> could open an issue on mpv they all read it
<DemiMarie> airlied: “vendored beyond maintainable”?
<DemiMarie> danvet: do you mean that VFs are why the GuC is present? Ouch.
<danvet> not entirely, but the vf design entirely relies on guc
<DemiMarie> marmarek: how worried are you about GuC firmware vulns?
<danvet> at least how you drive it, I have no idea how much of the isolation is hw assisted
<DemiMarie> Does the GuC firmware validate its inputs properly?
bgs has quit [Remote host closed the connection]
<airlied> DemiMarie: it's firmware, so yes it does, except when it doesn't and there's a CVE
<DemiMarie> airlied: that is obviously true, but not an answer 😆
<danvet> airlied, sometimes it's also a Schrödinger CVE
<DemiMarie> What is that?
<danvet> only stops being a superposition when someone looks
<danvet> then it either becomes correct code or a real cve :-)
<DemiMarie> I guess worst case is that a patched driver could force everything to be proxied through the host.
<DemiMarie> Which could validate the command stream
<danvet> nah, can't do that anymore
<DemiMarie> Why?
<danvet> we had to on gen7, and nowadays the cmd stream is just too flexible to parse without time-of-check vs time-of-use issues
<DemiMarie> What do you mean? Too many pointers?
<danvet> like for rtx the gpu actually creates some of the batches and chains them, and I think media does similar stuff
<DemiMarie> Userspace direct submit is a nightmare
<danvet> but also cmdparser checking would mean the mmu is busted
<danvet> at that point it's hopeless
<DemiMarie> What do you mean?
<danvet> if it's just guc, then you "just" virtualize the guc queues, those are fairly simple
<danvet> DemiMarie, the stuff I mentioned pretty much means you either have a working mmu to separate userspace cmd submission
<danvet> or you don't have a secure gpu
rmckeever has joined #dri-devel
<jenatali> Yep
<danvet> and so cmd parse would boil down to "implement the gpu in sw"
<danvet> yeah so hopefully it actually works :-)
<DemiMarie> danvet: I trust Intel to get the MMU right. It's buffer overflows in fw I am worried about.
<danvet> agd5f, question was more about VF/PF and fw stuff, which is another can of worms from userspace direct submit
<DemiMarie> Hence why I was specifically asking about the GuC.
<DemiMarie> For instance Apple GPU fw does not validate its inputs at all
<agd5f> Guc or similar is also a solution to the GPU scheduling
gouchi has joined #dri-devel
<danvet> agd5f, yeah, but at least with ours userspace never talks to guc directly
<danvet> it's only the kernel driver, and then guc just schedules the contexts which are (per the kernel driver) marked as runnable
<danvet> (we're not yet at hw doorbell actually working)
<danvet> so guc never (currently at least) sees anything created by userspace, only the kernel
<alyssa> danvet: oh this is going to be fun
<anarsoul> psykose: done
<alyssa> the new Malis are based around userspace submission with the doorbell piped directly to userspace
<psykose> anarsoul: saw :) nice
<alyssa> and firmware<->userspace sync with the kernel completely out of the loop
<alyssa> Arm has this working in their downstream non-DRM kernel driver and their blob GL/VK
flibitijibibo has joined #dri-devel
<alyssa> I get dizzy thinking about what this means for mainline
<danvet> alyssa, ask jekstrand
<airlied> yeah then weep in a corner
<jenatali> That's the eventual goal for Windows too IIRC
<alyssa> airlied: that's basically what jekstrand said
<alyssa> jenatali: fwiw macOS does NOT do this at least for M1
<alyssa> and i have yet to see compelling evidence that it actually matters.
<jenatali> Yeah, I said eventual. We're not there yet :)
warpme_____ has joined #dri-devel
Akari has joined #dri-devel
flto has joined #dri-devel
apinheiro has quit [Ping timeout: 480 seconds]
<DemiMarie> alyssa: can you continue to not do that?
<alyssa> DemiMarie: what's this replying to?
<alyssa> I made a lot of comments about things I should continue to not do :-p
<DemiMarie> Because in Qubes one of our goals is secure GPU virt, and ( marmarek feel free to correct me) direct guest-firmware interaction makes this very hard
<alyssa> Oh, that's a very interesting point
<alyssa> Yes, we can continue to use the existing "submit ioctl" model
<alyssa> In effect, the stuff that Arm designed to run in userspace will just run in kernel space instead
<alyssa> with a possible performance cost due to extra ctx switching but I find it hard to be too bothered by that right now
<alyssa> I think we will support that model for the hw's lifetime regardless of if we ever add direct user-fw submit as a fast path later
<DemiMarie> My worry is that future hardware will require the fast path.
<alyssa> yes, I hear you
<DemiMarie> If I trusted the GPU firmware to do proper input validation that would be a totally different story.
<alyssa> I don't and you shouldn't
<DemiMarie> Have you found cases where it didn't?
<DemiMarie> danvet: could i915 virtualize the GuC queues? Would this be good from a security perspective?
<alyssa> DemiMarie: it's a blob, ain't like we've audited that thing
mbrost has quit [Ping timeout: 480 seconds]
pixelcluster has quit [Remote host closed the connection]
pixelcluster_ has joined #dri-devel
pixelcluster_ has quit [Remote host closed the connection]
pixelcluster has joined #dri-devel
junaid has quit [Ping timeout: 480 seconds]
hays_ has joined #dri-devel
mbrost has joined #dri-devel
maxzor has joined #dri-devel
<DemiMarie> alyssa: anyone considered clean-room REing an open-source replacement?
<DemiMarie> alyssa: do you fuzz the kernel driver?
<alyssa> FOSS replacement for that blob is definitely a possibility, yeah
<alyssa> I started work on it but right now more interested in supporting the hardware in mainline at all
<agd5f> alyssa, likely hw won't load it unless it's signed
<alyssa> agd5f: it will, we have code execution on the thing
<hays_> Is the Mali-G610 MP4 3D GPU under the provenance/support of panfrost?
<alyssa> hays_: Yes, that's the hardware under discussion right now
<hays_> joining a bit late
<alyssa> It isn't supported upstream yet, but we're working (actively) on it
<alyssa> If you need GPU acceleration, I recommend against buying RK3588 boards until there's upstream support
<hays_> I know next to nothing about the details, but I have the hardware and should be in a position to test soon
<alyssa> (Even aside from GPU, the mainline support isn't there yet for everything else)
<hays_> yeah im very aware. haha :)
mvlad has quit [Remote host closed the connection]
* CounterPillow nervously glanced at the unimplemented clock gating stuff
<alyssa> woof
<alyssa> I'd really rather not support the downstream kernel in upstream Mesa, so that's also a complicating factor for Mesa support
<alyssa> and in an unusual twist, the hard part is going to be kernel support, not Mesa
<alyssa> the Mesa side is "basically" identical to Mali-G57
srslypascal has quit [Ping timeout: 480 seconds]
<alyssa> (and by "basically" I mean a 5KLOC branch but shhh a big chunk of that is copy/paste...)
<alyssa> all of this is coming soon to upstream but not there yet
<CounterPillow> anarsoul: psykose: thanks for the mpv bug report, we're currently discussing disabling DR by default until we have an implementation that uses staging buffers. Pls don't driconf us
<anarsoul> CounterPillow: thanks for looking into it!
<alyssa> CounterPillow: sorry for suggesting driconf, I might've been a bit too trigger happy there ;-p
<hays_> alyssa: is the idea to use the mali_csffw.bin firmware or to avoid it
<alyssa> hays_: right, that's what we were discussing a few minutes ago
<hays_> i must have just missed it
<alyssa> yeah
<hays_> sorry
<alyssa> no worries
<alyssa> at a technical level, it's possible to avoid
<alyssa> there's no sig checking, we have code exec on the MCU
<alyssa> having *some* firmware is mandatory but it doesn't have to be Arm's
<alyssa> That said -- we expect the mali_csffw.bin <--> kernel interface to remain relatively stable, but the physical firmware <--> hardware interface to change unpredictably for future Malis
<alyssa> so bringing up a FOSS firmware right now seems like it might be painting ourselves into a corner
<alyssa> So I think the current plan is to teach the mainline kernel about mali_csffw.bin
<alyssa> and teach Mesa to use the mainline kernel with Mali-G610
<alyssa> after there's mainline support shipping, then we can revisit if we maybe want to free the firmware too, and I should have a lot of code to drop if fd.o folks want to pursue that
<alyssa> but we'd probably keep both code paths in the kernel if only to ease bring up of future Malis
apinheiro has joined #dri-devel
<alyssa> no sense cutting off our nose to spite our face.
<hays_> yeah all seems very sensible
<alyssa> bbrezillon: ^^
<alyssa> hays_: let's be clear they're not *my* sensible decisions ;)
<anarsoul> alyssa: does ARM allow mali_csffw.bin redistribution?
<alyssa> anarsoul: don't remember
<alyssa> hays_: if it were only up to me, well, as soon as I got my hands on the hardware https://rosenzweig.io/TheCSFMCU.png happened :-p
Duke`` has quit [Ping timeout: 480 seconds]
<hays_> anarsoul: i think i downloaded it from the rockchip website... so that's not firm confirmation but at least some people can redistribute
<hays_> alyssa: heh that screenshot is disorienting but i think i get the picture :)
<hays_> what is the relationship between these drivers and a BSP which appears to include a driver blob with patches to various userland apps, most notably xorg
fab has quit [Quit: fab]
maxzor has quit [Ping timeout: 480 seconds]
<hays_> earlier mpv was under discussion: https://github.com/JeffyCN/mpv (these are i think rockchip forks/trees)
gouchi has quit [Remote host closed the connection]
<hays_> and how do you teach the kernel about mali_csffw.bin? is that straight up reverse engineering?
<mareko> alyssa: nobody had the energy to update all drivers when "start" was added
<mareko> if a driver can't handle sampler state start > 0, we just assume it's broken
ahajda_ has quit []
<alyssa> mareko: OK. The issue isn't start > 0, but not unbinding samplers [start + num, ...)
<alyssa> which matters for how ctx->num_samplers[stage] is calculated
<alyssa> hays_: We're trying to do the right thing. Mainline kernel driver, upstream Mesa driver, no vendor trees. That takes more time than shipping blobs.
<hays_> yeah im just trying to understand the landscape. it almost seems like the vendors have patched various OSS projects to work with the blob, and ship that as a Yocto layer or whatever, but unclear what happens next to all of those forks
<hays_> not limited to just gpu, but seem to be blobs for many other chips as well
<alyssa> they live a short life with an old kernel and then people ship to mainline when mainline is ready
<hays_> yeah and maybe you were saying this earlier, for the rk3588 the kernel seems to be crazy old so a heavy lift on that side
ybogdano has quit [Ping timeout: 480 seconds]
bskica has joined #dri-devel
<alyssa> I really wish people held off on buying rk3588 for a year or so
<alyssa> the software side really isn't ready
<alyssa> actually I wish people held off on all new hardware a year or so.
<psykose> whoa it's A76
<psykose> at last something not a72 /s
<alyssa> (people being "anyone not developing drivers for that hardware" I mean)
<danvet> alyssa, users, they're a menace :-P
<alyssa> danvet: am user, can confirm
dcz_ has quit [Ping timeout: 480 seconds]
apinheiro has quit [Quit: Leaving]
rasterman has quit [Quit: Gettin' stinky!]
danvet has quit [Ping timeout: 480 seconds]
<DemiMarie> alyssa: when you said you had code exec, my first thought was that there was a buffer overflow or similar allowing sig checks to be bypassed
<DemiMarie> if I may make a suggestion: have you considered using a safe language (such as Ada or Rust) for the rewritten firmware?
pcercuei has quit [Quit: dodo]
rsalvaterra has quit []
rsalvaterra has joined #dri-devel
<alyssa> it'd be in Rust, yes
<alyssa> less for safety and more because none of us want to write C code :p
<Kayden> alyssa: I think the docs for bind_sampler_state should drop that comment. I don't know why anyone would put a note in the docs saying "BTW, we have these parameters, but we don't use them, so feel free to ignore them and write bugs, haha!". if they're really not used, they should be eliminated. but they are actually used, so the docs are wrong
<Kayden> at least at a glance it looks like zink is handling [start_slot, start_slot + num_samplers). iris does too
<Kayden> samplers[] has at least count samplers, and samplers[i] maps to driver sampler [start_slot + i]
<Kayden> if samplers[i] is null it's an unbind
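A sketch of the semantics Kayden and robclark describe, not any particular driver's code: only slots [start, start + num) are touched, a NULL entry unbinds that slot, everything outside the range is left alone, and the driver recounts its bound samplers from the whole array rather than assuming start + num.

    #define EXAMPLE_MAX_SAMPLERS 32

    struct example_ctx {
       void *samplers[EXAMPLE_MAX_SAMPLERS];
       unsigned num_samplers;   /* highest bound slot + 1 */
    };

    static void
    example_bind_sampler_states(struct example_ctx *ctx, unsigned start,
                                unsigned num, void **states)
    {
       for (unsigned i = 0; i < num; i++)
          ctx->samplers[start + i] = states ? states[i] : NULL;

       /* Slots outside [start, start + num) are unchanged, so the count
        * must be recomputed from the full array. */
       ctx->num_samplers = 0;
       for (unsigned i = 0; i < EXAMPLE_MAX_SAMPLERS; i++) {
          if (ctx->samplers[i])
             ctx->num_samplers = i + 1;
       }
    }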
fxkamd has quit []
<alyssa> yeah, I think Iris is doing the right thing
<alyssa> the biggest bug with panfrost was incorrectly calculating sampler_count
<alyssa> but Iris just.. doesn't calculate that ;-p
<alyssa> thanks to using blorp it doesn't need to. u_blitter needs it.
<alyssa> if we all agree the docs are wrong, could someone send a doc patch? thanks