<HdkR>
-march=armv9.4-a will finally switch it to a single instruction op
<mareko>
HdkR: how to know whether __builtin_bitcount doesn't emulate it? __POPCNT__?
<mareko>
oh yeah
<HdkR>
That's only defined on x86. ARM land you don't get any indication
<HdkR>
Actually, now I'm surprised there isn't a CSSC extension define for this
<HdkR>
Looks like they forgot to add `__ARM_FEATURE_CSSC`
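A minimal sketch of the detection problem discussed above, assuming GCC or Clang. `__POPCNT__` is only defined on x86 when the popcount instruction is enabled; the participants knew of no equivalent macro for AArch64 at the time, so the sketch can only report "unknown" there. The names `EX_HAVE_NATIVE_POPCNT` and `ex_bitcount32` are hypothetical and not taken from Mesa.

```c
#include <stdint.h>

/* 1 when the compiler advertises a native x86 popcount instruction.
 * 0 means "unknown", not "emulated": on other architectures there may be
 * no macro to test (an assumption based on the discussion above). */
#if defined(__POPCNT__)
#define EX_HAVE_NATIVE_POPCNT 1
#else
#define EX_HAVE_NATIVE_POPCNT 0
#endif

/* Hypothetical helper: count the set bits in a 32-bit value. */
static inline unsigned ex_bitcount32(uint32_t v)
{
#if defined(__GNUC__) || defined(__clang__)
    /* The compiler picks a single instruction or a multi-instruction
     * sequence depending on the target flags. */
    return (unsigned)__builtin_popcount(v);
#else
    /* Portable bit-twiddling fallback for other compilers. */
    v = v - ((v >> 1) & 0x55555555u);
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);
    v = (v + (v >> 4)) & 0x0f0f0f0fu;
    return (unsigned)((v * 0x01010101u) >> 24);
#endif
}
```

The result is the same either way; only the instruction sequence the compiler emits differs, which is why the macro matters only when choosing between this and a hand-written alternative.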
co1umbarius has joined #dri-devel
columbarius has quit [Ping timeout: 480 seconds]
YuGiOhJCJ has joined #dri-devel
OftenTimeConsuming has quit [Remote host closed the connection]
OftenTimeConsuming has joined #dri-devel
<kisak>
wow, that's mean ... in the pursuit of trying to get my build environment compatible with llvm-spirv-15, I updated spirv-tools to 2022.3-1, but surprise, the newer version doesn't provide libSPIRV-Tools.so which is used by glslangValidator. Do I just rebuild glslang for it not to use the missing shared library?
<kisak>
(glslangValidator is called by the mesa build)
<airlied>
kisak: spirv tools doesn't always build shared by default, some distro packages have hacks
<airlied>
also some hacks stop working from time to time
<kisak>
I think the right answer here is to just rebuild and test
YuGiOhJCJ has quit [Remote host closed the connection]
YuGiOhJCJ has joined #dri-devel
Company has quit [Quit: Leaving]
jewins has quit [Ping timeout: 480 seconds]
camus has joined #dri-devel
camus has quit [Read error: Connection reset by peer]
camus has joined #dri-devel
Danct12 has joined #dri-devel
Danct12 has quit []
Danct12 has joined #dri-devel
<kisak>
A no change rebuild fixed the snafu.
egbert has quit [Remote host closed the connection]
<kisak>
a day later of banging my head against the wall, an intel-clc enabled bionic build completed. Too bad I don't have any hardware to test if it actually works.
<anarsoul>
mpv CPU usage jumps x4 if it uses GL_ARB_buffer_storage
<anarsoul>
perf says that most time is spent in decoding. Do I assume correctly that returning a pointer to write-combined BO from transfer_map() may not be a good idea?
<airlied>
anarsoul: yes generally not a great plan
<anarsoul>
how is it handled in other drivers?
egbert has joined #dri-devel
<airlied>
anarsoul: copy to a staging cached one?
<anarsoul>
how is it supposed to work for resources mapped with persistent | coherent?
Kayden has quit [Quit: Leaving]
Kayden has joined #dri-devel
<airlied>
anarsoul: you are meant to have coherent cpu/gpu access to the resource
<airlied>
on x86 that typically means the gpu is snooping on the cpu caches
<anarsoul>
I mean when do I synchronize the copy if it's mapped as persistent and coherent, if coherent doesn't require memory_barrier() to be called?
<airlied>
yeah you can't do a copy for coherent
<airlied>
coherent mappings might be one of the reasons GLES doesn't have GL_ARB_buffer_storage afaik
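The reasoning in this exchange can be sketched as a small policy function. This is an illustration under stated assumptions, not lima's or any other driver's actual code: the `EX_MAP_*` flags and the `ex_map_strategy` enum are hypothetical, and the middle case (persistent but not coherent) reflects my reading that the application must issue an explicit flush or barrier, which would give the driver a point at which to synchronize a staging copy.

```c
#include <stdbool.h>

/* Hypothetical map flags, loosely modelled on persistent/coherent mappings. */
enum {
    EX_MAP_PERSISTENT = 1 << 0,
    EX_MAP_COHERENT   = 1 << 1,
};

enum ex_map_strategy {
    EX_STRATEGY_STAGING_UNTIL_UNMAP,   /* copy back when the app unmaps */
    EX_STRATEGY_STAGING_UNTIL_BARRIER, /* copy back at an explicit flush/barrier */
    EX_STRATEGY_DIRECT,                /* must hand out the real buffer */
};

/* Decide whether a cached staging copy is an option for this mapping. */
static enum ex_map_strategy ex_pick_map_strategy(unsigned flags)
{
    /* Coherent: there is no sync point at which to copy, so no staging. */
    if (flags & EX_MAP_COHERENT)
        return EX_STRATEGY_DIRECT;

    /* Persistent only: the app promises explicit flushes (assumption). */
    if (flags & EX_MAP_PERSISTENT)
        return EX_STRATEGY_STAGING_UNTIL_BARRIER;

    /* Ordinary map: unmap is the sync point. */
    return EX_STRATEGY_STAGING_UNTIL_UNMAP;
}
```

With a coherent mapping the driver has no hook, which is why a write-combined buffer ends up exposed directly to the application.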
Danct12 has quit [Quit: Quitting]
Danct12 has joined #dri-devel
<anarsoul>
I guess the easiest workaround would be hiding PIPE_CAP_BUFFER_MAP_PERSISTENT_COHERENT on lima behind a debug flag
maxzor_ has quit [Ping timeout: 480 seconds]
YuGiOhJCJ has quit [Remote host closed the connection]
YuGiOhJCJ has joined #dri-devel
ran has joined #dri-devel
ran has quit [Remote host closed the connection]
lemonzest has joined #dri-devel
<Lynne>
airlied: ping, could you look at the ref frame problem? I'm still working on encoding
<airlied>
Lynne: been staring at it, but in the sense of void, not in a useful sense
<Lynne>
stare a bit into encoding if that clears your mind
<Lynne>
on a related note, the encoding api has some issues that have to be addressed before it's stable
<airlied>
the video I'm looking at doesn't even decode its first frame properly
<Lynne>
make sure B-frames are missing
<airlied>
that nvidia fix looks very wrong though
<Lynne>
no AFT = frames should be printed
<Lynne>
ideally, the first frame should not have any references at all
<airlied>
oh I thought we had B-frames :)
<Lynne>
most hardware encoders generate that
<DavidHeidelberg[m]>
<anarsoul> "I guess the easiest workaround..." <- Having different code paths in release and debug builds sounds bad. What about enabling it with a driver-related variable?
<Lynne>
nope, let's tackle those after we get P-frames working, though they should just work once we do that
<Lynne>
anyway, with encoding, the intention behind VK_VIDEO_ENCODE_RATE_CONTROL_MODE_NONE_BIT_KHR seems to be to let users implement their own rate control (as well they should), and letting them set quantizers
<Lynne>
but there's no way for users to override the frame type the encoder uses
<Lynne>
apart from doing a full RESET, but that's a nuclear option
<Lynne>
if VK_VIDEO_ENCODE_RATE_CONTROL_MODE_NONE_BIT_KHR, there ought to be an enum in the same struct that can override the frame type used
<airlied>
Lynne: still seems like slot index is wrong
<Lynne>
I copied what nvdec does, since it's the closest API-wise to vulkan
<Lynne>
if it's any consolation, av1 should be even simpler than h264, though it will require some invasive spec changes
<Lynne>
(av1 requires you to be able to feed packets into an encoder without being able to output a frame)
aravind has joined #dri-devel
rsalvaterra_ has joined #dri-devel
rsalvaterra is now known as Guest1853
rsalvaterra_ is now known as rsalvaterra
Guest1854 has quit [Ping timeout: 480 seconds]
YuGiOhJCJ has quit [Remote host closed the connection]
YuGiOhJCJ has joined #dri-devel
YuGiOhJCJ has quit [Remote host closed the connection]
YuGiOhJCJ has joined #dri-devel
heat has quit [Ping timeout: 480 seconds]
bmodem has joined #dri-devel
SanchayanMaity_ has quit []
SanchayanMaity has joined #dri-devel
srslypascal is now known as Guest1859
srslypascal has joined #dri-devel
Guest1859 has quit [Ping timeout: 480 seconds]
Duke`` has joined #dri-devel
Jeremy_Rand_Talos__ has quit [Remote host closed the connection]
Jeremy_Rand_Talos__ has joined #dri-devel
mhenning has quit [Quit: mhenning]
itoral has joined #dri-devel
fab has joined #dri-devel
Duke`` has quit [Ping timeout: 480 seconds]
bgs has joined #dri-devel
aravind has quit [Ping timeout: 480 seconds]
YuGiOhJCJ has quit [Remote host closed the connection]
YuGiOhJCJ has joined #dri-devel
fab has quit [Quit: fab]
gouchi has joined #dri-devel
gouchi has quit [Remote host closed the connection]
<daniels>
tintou: most of them are fine, but Windows is currently stuck behind some long-running GStreamer jobs
camus has quit [Ping timeout: 480 seconds]
maxzor_ has joined #dri-devel
maxzor__ has joined #dri-devel
maxzor_ has quit [Ping timeout: 480 seconds]
Haaninjo has joined #dri-devel
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
jkrzyszt has quit [Remote host closed the connection]
devilhorns has quit [Remote host closed the connection]
Company has joined #dri-devel
devilhorns has joined #dri-devel
heat has quit [Read error: No route to host]
heat has joined #dri-devel
jkrzyszt has joined #dri-devel
junaid has joined #dri-devel
junaid has quit [Remote host closed the connection]
heat has quit [Remote host closed the connection]
heat has joined #dri-devel
djbw has quit [Read error: Connection reset by peer]
jkrzyszt has quit [Remote host closed the connection]
Danct12 has quit [Remote host closed the connection]
YuGiOhJCJ has joined #dri-devel
aravind has joined #dri-devel
itoral has quit [Remote host closed the connection]
jkrzyszt has joined #dri-devel
Kayden has quit [Quit: Leaving]
Kayden has joined #dri-devel
Akari has joined #dri-devel
<jenatali>
daniels: I don't suppose you've come up with any way to un-prioritize those jobs?
fxkamd has quit []
fxkamd has joined #dri-devel
<daniels>
jenatali: GitLab doesn't give us a way :( we just need to keep leaning on them to try to optimise them somehow
<daniels>
or get another machine
<jenatali>
:(
YuGiOhJCJ has quit [Remote host closed the connection]
YuGiOhJCJ has joined #dri-devel
maxzor__ has quit [Ping timeout: 480 seconds]
jagan_ has joined #dri-devel
pekkari has joined #dri-devel
heat has quit [Remote host closed the connection]
heat has joined #dri-devel
jkrzyszt has quit [Remote host closed the connection]
jkrzyszt has joined #dri-devel
jewins has joined #dri-devel
fxkamd has quit []
fxkamd has joined #dri-devel
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
cwabbott_ has quit []
maxzor__ has joined #dri-devel
cwabbott has joined #dri-devel
dcz_ has quit [Ping timeout: 480 seconds]
<cwabbott>
do any WSI people know if I'd need to define a modifier for subsampled images on Qualcomm?
<cwabbott>
it's an already-existing concept in VK_EXT_fragment_density_map, which is a normal image combined with some metadata telling the user to sample at reduced resolution in some regions
Guest1875 has quit []
<cwabbott>
on qualcomm the implementation is pretty much entirely by the driver, so the format of the metadata is up to the driver
<cwabbott>
the usecase for sharing would mainly be passing a subsampled image to a VR compositor to do barrel distortion
aravind has quit [Ping timeout: 480 seconds]
<cwabbott>
it seems like things would already Just Work if producer and consumer both pretend it's a normal image but pass the subsampled bit when creating it, how we implement the metadata never changes, etc.
bgs has joined #dri-devel
kts has joined #dri-devel
kts has quit []
Leopold_ has quit [Remote host closed the connection]
kts has joined #dri-devel
Leopold_ has joined #dri-devel
sgruszka has quit [Ping timeout: 480 seconds]
JohnnyonFlame has joined #dri-devel
mbrost has joined #dri-devel
heat_ has joined #dri-devel
heat has quit [Read error: Connection reset by peer]
jkrzyszt has quit [Remote host closed the connection]
<jekstrand>
cwabbott: Uh... Maybe?
<jekstrand>
cwabbott: Depends on how much auto-magic you expect there to be.
<jekstrand>
cwabbott: Is it going to be re-imported as a fragment density image? Or are you expecting it to be magic metadata that gets used by the sampler?
psykose has quit [Remote host closed the connection]
psykose has joined #dri-devel
<cwabbott>
jekstrand: it would be re-imported and used as a subsampled image in the compositor
<jekstrand>
cwabbott: If it's just a matter of sharing an R8 image that's going to be used as a fragment density map on both sides, you don't need anything special, I don't think, unless it has some special tiling or something like that.
<jekstrand>
If it's being handled as a metadata plane attached to a shared color image, then you probably need a modifier for it.
<cwabbott>
a fragment density map isn't anything special
<cwabbott>
a subsampled image has a separate metadata plane (sort-of) with the density to sample at
<cwabbott>
that's derived from but not the same as the fragment density map
jkrzyszt has joined #dri-devel
Duke`` has joined #dri-devel
<cwabbott>
the FDM is just a normal 2-channel image that gets added as an extra attachment to the render pass
tzimmermann has quit [Quit: Leaving]
<jekstrand>
Yeah, that sounds like a metadata plane which needs a modifier
fab has joined #dri-devel
maxzor__ has quit [Ping timeout: 480 seconds]
<emersion>
how does it interact with other existing Qualcomm modifiers?
<emersion>
a buffer can only have a single modifier
<cwabbott>
that wouldn't be so bad I guess
<cwabbott>
emersion: there's only one, for UBWC, so I guess we'd have to duplicate them
<emersion>
a common strategy is to reserve bits for each use
<emersion>
but you can just duplicate and come up with a bit layout later
<emersion>
just don't paint yourself in a corner, it's good to think about future uses and extensibility
<jekstrand>
It's also really easy to over-think it like the NV modifiers which have bits reserved for mipmapping as if that'll ever happen.
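The "reserve bits for each use" idea can be sketched as follows. The vendor-in-the-top-8-bits layout and the Qualcomm vendor id follow drm_fourcc.h as far as I know; everything prefixed `EX_` or `ex_`, and in particular the subsampled bit, is hypothetical and is not an existing or proposed modifier.

```c
#include <stdbool.h>
#include <stdint.h>

/* DRM modifiers carry an 8-bit vendor id above a 56-bit vendor value. */
#define EX_MOD_VENDOR_SHIFT 56
#define EX_MOD_VENDOR_QCOM  0x05ull               /* believed to match drm_fourcc.h */
#define EX_MOD_VALUE_MASK   0x00ffffffffffffffull

/* Hypothetical bit layout inside the 56-bit vendor value. */
#define EX_MOD_LAYOUT_MASK  0xfull                /* bits 0-3: tiling/compression */
#define EX_MOD_LAYOUT_UBWC  0x1ull
#define EX_MOD_SUBSAMPLED   (1ull << 4)           /* bit 4: has subsampling metadata */

/* Build a modifier from a layout and an optional subsampled flag. */
static inline uint64_t ex_mod_make(uint64_t layout, bool subsampled)
{
    uint64_t value = layout & EX_MOD_LAYOUT_MASK;
    if (subsampled)
        value |= EX_MOD_SUBSAMPLED;
    return (EX_MOD_VENDOR_QCOM << EX_MOD_VENDOR_SHIFT) | (value & EX_MOD_VALUE_MASK);
}

static inline uint64_t ex_mod_layout(uint64_t mod)
{
    return mod & EX_MOD_LAYOUT_MASK;
}

static inline bool ex_mod_is_subsampled(uint64_t mod)
{
    return (mod & EX_MOD_SUBSAMPLED) != 0;
}
```

Keeping the existing layout value in the low bits means the non-subsampled UBWC modifier keeps its current numeric value, while a buffer still carries exactly one modifier.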
<flto>
cwabbott: comparing UUIDs and creating the image with identical parameters is enough, you don't need a modifier (unless your use-case actually involves passing the images through WSI, which doesn't seem like a good idea)
<cwabbott>
flto: I was asking the right way to do it, not whether it would work
<cwabbott>
of course that would work but comparing UUIDs etc. is not expected usage for the dma-buf extensions
<emersion>
the compositor and its clients might be running different versions of the drivers
ybogdano has joined #dri-devel
jkrzyszt has quit [Remote host closed the connection]
<emersion>
cwabbott: is this Qualcomm-specific or not?
<cwabbott>
emersion: iirc ARM also does subsampled images but I think their implementation is a bit different
<emersion>
VK_EXT_fragment_density_map makes it sound like it's not, but i know nothing about this stuff
<cwabbott>
ARM and Qualcomm are the two that implement VK_EXT_fragment_density_map
<emersion>
hm, and VK_EXT_fragment_density_map is only about the shader invocation i suppose?
<emersion>
not the buffer itself?
<cwabbott>
it defines how to create a subsampled image, too
<cwabbott>
I'm not sure if anyone thought about passing it between processes
<flto>
cwabbott: it is not a "wrong" way to use dma bufs either (VK_EXT_external_memory_dma_buf can exist without VK_EXT_image_drm_format_modifier)
<cwabbott>
flto: you can't bind an image without VK_EXT_image_drm_format_modifier
<cwabbott>
so no, it is a wrong way
<cwabbott>
on linux, you cannot get guarantees about image layout matching across processes without modifiers
camus1 has quit [Remote host closed the connection]
camus has joined #dri-devel
junaid has joined #dri-devel
<cwabbott>
on ARM, I think subsampled images are like mipmapped images and there's still a metadata but it tells which "mip" to use
<cwabbott>
not sure if it literally is mipmapped or not
jessica_248 is now known as jessica_24
<flto>
cwabbott: dma bufs are just a type of external memory.. I don't think there's an exception that you can't use dma bufs for images like other external memory types
camus has quit [Ping timeout: 480 seconds]
<cwabbott>
flto: I don't think it provides any guarantees that it'll work either, and in practice it won't be accepted upstream because no one does that
<cwabbott>
the intended way of handling this is modifiers
<flto>
cwabbott: there are guarantees that external memory can be used for images with matching UUIDs
lynxeye has quit [Quit: Leaving.]
cheako_ has quit []
cheako has joined #dri-devel
frieder has quit [Remote host closed the connection]
Leopold_ has quit [Ping timeout: 480 seconds]
alyssa has joined #dri-devel
<alyssa>
Is it legal to call bind_sampler_states multiple times to update subranges?
<alyssa>
Gallium docs say no:
Leopold_ has joined #dri-devel
jkrzyszt has joined #dri-devel
<alyssa>
sampler states are bound... with the ``bind_sampler_states`` function. The ``start`` and ``num_samplers`` parameters indicate a range of samplers to change. NOTE: at this time, start is always zero and the CSO module will always replace all samplers at once (no sub-ranges). This may change in the future.
<alyssa>
But Nine does it anyway.
<alyssa>
Are the docs out of date? or is nine buggy?
<alyssa>
softpipe/llvmpipe explicitly handle the subrange case
<alyssa>
freedreno implicitly does
<alyssa>
zink/v3d do not handle the case
<alyssa>
Either we need to update the docs and then update n-3 drivers to handle subranges
<alyssa>
not sure what the driver is supposed to do
<alyssa>
they asked for a direct map of a persistent coherent buffer, we gave them one
ybogdano has quit [Ping timeout: 480 seconds]
<anarsoul>
yeah, now they are decoding the video directly into the buffer
<alyssa>
again I don't see how this is Mesa's fault
<anarsoul>
the spec doesn't say anything on whether the buffer is expected to be cached or not
<alyssa>
i guess if we're at a stalemate with mpv, we can driconf it..
<alyssa>
frustrating though.
<anarsoul>
I wonder if mpv devs would consider adding a command line (and config) option for that
<anholt_>
one should expect the buffer to be uncached. it would be very rare for a persistent coherent mapped buffer to be cached. sounds like a busted app.
junaid has joined #dri-devel
<anarsoul>
anholt_: it's mpv
kts has quit [Quit: Leaving]
MajorBiscuit has quit [Ping timeout: 480 seconds]
ybogdano has joined #dri-devel
<anarsoul>
I believe if they used a staging buffer for decoder and memcpy-ed their decoded image into the buffer it would be faster than doing glTexSubImage()
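The application-side approach suggested here could look roughly like the sketch below: decode into ordinary cached memory, then copy each row into the mapped buffer with sequential writes, which is the access pattern write-combined memory handles well. The function name and parameters are hypothetical and this is not mpv's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a decoded frame from cached staging memory into a mapped buffer.
 * dst may be write-combined, so it is only written, never read. */
static void ex_upload_frame(uint8_t *dst, size_t dst_stride,
                            const uint8_t *src, size_t src_stride,
                            size_t row_bytes, size_t rows)
{
    for (size_t y = 0; y < rows; y++) {
        /* One linear write per row; padding beyond row_bytes is untouched. */
        memcpy(dst + y * dst_stride, src + y * src_stride, row_bytes);
    }
}
```

The decoder's reads of reference data then hit cached memory, and the uncached mapping only ever sees streaming writes.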
bmodem has quit [Ping timeout: 480 seconds]
Akari has quit [Quit: segmentation fault (core dumped)]
junaid has quit [Ping timeout: 480 seconds]
junaid has joined #dri-devel
lemonzest has quit [Quit: WeeChat 3.6]
dcz_ has joined #dri-devel
<airlied>
anholt_: huh? pretty sure they are cached on x86
<airlied>
esp with pcie gpus
<jekstrand>
Yeah, typically it'd be WC if you get an actual VRAM map and cached if it's in system RAM.
<jekstrand>
On Mali, I'd hope you at least can do WC for staging stuff but IDK how the memory works there.
<jekstrand>
On Intel, we had it really nice with shared LLC.
<alyssa>
jekstrand: panfrost and presumably lima map everything WC
<alyssa>
everything
<alyssa>
the downstream Arm driver has more caching/coherency flags available but that was never ported upstream
<alyssa>
and I'm not even sure what we would do with them in either GL or VK
<jekstrand>
Yeah, IDK either, TBH.
<jekstrand>
Maybe you can do something better than WC for CPU reads but if your app is CPU-read-of-GPU-mem-bound, you've got problems.
* jekstrand
chuckles in vkCmdDrawIndexed()
<airlied>
alyssa: there is no gpu snoops the cpu cache?
<HdkR>
You're lucky to have that in ARM land :P
<graphitemaster>
Is it a safe assumption that gl_LocalInvocationID / 32 (or 64) is the same as gl_SubgroupID? Like do most implementations that execute subgroups in lock-step allocate them in terms of full warps/waves?
<alyssa>
airlied: "full system coherency" -- Mali supports that architecturally but not all(?) mali SoCs do
<alyssa>
and even with full system coherency, I think WC slows down reads and helps streaming writes
<jekstrand>
graphitemaster: Safe? Not necessarily. But with the right Vulkan version/extensions it is.
<graphitemaster>
jekstrand, This is more for a fallback for when a GPU does not advertise the KHR extension and only the ARB ballot one.
<jekstrand>
Yeah, no, nothing's safe there.
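For reference, the assumption being debated can be written out on the CPU side. The flattening formula is how GLSL defines gl_LocalInvocationIndex; the division by the subgroup size is the guess under discussion and is not guaranteed by the ARB ballot extension. The type and function names are hypothetical.

```c
#include <stdint.h>

struct ex_uvec3 {
    uint32_t x, y, z;
};

/* Flattened invocation index, as GLSL defines gl_LocalInvocationIndex. */
static inline uint32_t ex_local_invocation_index(struct ex_uvec3 id,
                                                 struct ex_uvec3 wg_size)
{
    return id.z * wg_size.x * wg_size.y + id.y * wg_size.x + id.x;
}

/* ASSUMPTION, not a guarantee: invocations are packed into subgroups in
 * linear index order, with every subgroup fully populated. */
static inline uint32_t ex_guess_subgroup_id(uint32_t local_index,
                                            uint32_t subgroup_size)
{
    return local_index / subgroup_size;
}
```

An implementation that packs invocations differently, for example in 2D tiles, or that launches partially filled subgroups would break the second function while the first stays valid.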
<alyssa>
jekstrand: we don't joke about indexed draws on bifrost
pekkari has quit [Quit: Konversation terminated!]
<graphitemaster>
Okay, maybe not safe, but on a scale of like 0 it won't work to 10 it will always work, what are we looking at here?
<jekstrand>
alyssa: No, we just joke about all of bifrost. :P
<alyssa>
graphitemaster: 7?
<alyssa>
IDK
<graphitemaster>
Hey that's pretty good!
<jekstrand>
graphitemaster: I was going to go 9 but that 1 will bite you.
<graphitemaster>
So far you two have given me a range of 70 to 90% success rate, I can take those chances.
<alyssa>
jekstrand: uh oh
<alyssa>
jekstrand: also, I'm pretty sure Arm just kinda retconned Bifrost out of existence
<jekstrand>
graphitemaster: I'll give you a 98% chance someone will file a bug report, though. :)
<alyssa>
Utgard -> Midgard -> Valhall is the official progression (-:
<airlied>
alyssa: so you pick full coherent for coherent maps and wc for others id assume
<alyssa>
oh, that's fair
<alyssa>
still requires UAPI changes which i am not excited for
<alyssa>
although I think these are properties of the BO, not the transfer..
<graphitemaster>
jekstrand, The number of GPUs that don't support other vendor or the KHR extension already reduces the chances of even hitting an ARB_ballot only fallback, so Bayesian says 0% :P
<jekstrand>
graphitemaster: Or it makes the bug report that much harder to reproduce. :P
<jekstrand>
If there's that few of them, why have the fallback at all?
<agd5f>
airlied, anholt_ PCI spec says device access should be cache coherent with the CPU. Non-coherent access is an optional feature which the platform can provide at its discretion.
<anholt_>
agd5f: device access of main mem, right?
mbrost has quit [Ping timeout: 480 seconds]
<anholt_>
but not cpu accesses of vram
flto has quit [Ping timeout: 480 seconds]
<anholt_>
so, mapping some gpu buffer and assuming you'll get cached performance is super wrong.
<agd5f>
anholt_, yes, device access to CPU memory
flibitijibibo has quit [Quit: Leaving]
<agd5f>
I.e., pci devices should snoop the CPU caches by default
<alyssa>
anholt_: i assume mpv has never seen a device that uses the GPU but not the VPU, gets WC persistent mappings from the GPU, and has a CPU slow enough to notice the problem
<alyssa>
without all 4 of those, you don't get this
epoll has quit [Ping timeout: 480 seconds]
<anholt_>
"video playback on linux just eats a ton of cpu" is unfortunately normal, it's true.
<alyssa>
perf was fine before by accident, I guess ... wiring up ARB_buffer_storage to lima led to jenga
<airlied>
yeah i question wiring up buffer storage on non coherent platforms
<alyssa>
brrrrr
<alyssa>
maybe I should just stop writing code
<alyssa>
that way I can't introduce any more regressions
<agd5f>
in practice, it seems like x86 and PPC are the only platforms with PCIe that seem to get this right. IIRC ARM requires some optional IP that no one uses and most others just assume non-coherent
epoll has joined #dri-devel
<alyssa>
yeah most Arm hw is PCIe-deficient
<agd5f>
ARM is actually probably 50/50. Depends on the SoC
<alyssa>
fair
<DemiMarie>
To what extent will i915 VFs be able to communicate directly with the GuC?
<airlied>
don't think that is the design yet
<DemiMarie>
That is good. I was worried that VFs could submit commands via the GuC and the GuC would parse them.
<DemiMarie>
Even if those commands are restricted to unprivileged ones, there is still the worry of memory unsafety.
<DemiMarie>
On the other hand, if the VFs can only talk to hardware, I am less worried.
<alyssa>
ew, GuC :p
<airlied>
actually not sure with SRIOV what the design is
junaid has quit [Ping timeout: 480 seconds]
<DemiMarie>
Why did Intel start requiring the GuC anyway?
<alyssa>
yeah what the GuC is up with that
<airlied>
because their execlist driver was vendored beyond maintainable
mbrost has joined #dri-devel
<airlied>
and their windows driver was moving that direction
<danvet>
DemiMarie, vf submits to guc like pf
<danvet>
like that's why this thing exists pretty much
<anarsoul>
alyssa: perf was suboptimal even without ARB_buffer_storage
junaid has joined #dri-devel
<alyssa>
anarsoul: apparently i wasn't even the one to hook up that CAP for lima, this one ain't on me :-D
<anarsoul>
alyssa: yeah, that was me :)
<alyssa>
:-D
<anarsoul>
yet I'm not sure if it's better to disable it, driconf it for mpv or keep it as is
<alyssa>
best is to patch mpv i suppose
<alyssa>
driconf second best
<anarsoul>
I'm hesitant to touch mpv code
<anarsoul>
do we happen to have any mpv maintainers here? :)
<psykose>
could open an issue on mpv they all read it
<danvet>
and so cmd parse would boil down to "implement the gpu in sw"
<danvet>
yeah so hopefully it actually works :-)
<DemiMarie>
danvet: I trust Intel to get the MMU right. It's buffer overflows in fw I am worried about.
<danvet>
agd5f, question was more about VF/PF and fw stuff, which is another can of worms from userspace direct submit
<DemiMarie>
Hence why I was specifically asking about the GuC.
<DemiMarie>
For instance Apple GPU fw does not validate its inputs at all
<agd5f>
Guc or similar is also a solution to the GPU scheduling
gouchi has joined #dri-devel
<danvet>
agd5f, yeah, but at least with ours userspace never talks to guc directly
<danvet>
it's only the kernel driver, and then guc just schedules the contexts which are (per the kernel driver) marked as runnable
<danvet>
(we're not yet at hw doorbell actually working)
<danvet>
so guc never (currently at least) sees anything created by userspace, only the kernel
<alyssa>
danvet: oh this is going to be fun
<anarsoul>
psykose: done
<alyssa>
the new mali's are based around userspace submission with the doorbell piped directly to userspace
<psykose>
anarsoul: saw :) nice
<alyssa>
and firmware---userspace sync with the kernel completely out of the loop
<alyssa>
Arm has this working in their downstream non-DRM kernel driver and their blob GL/VK
flibitijibibo has joined #dri-devel
<alyssa>
I get dizzy thinking about what this means for mainline
<danvet>
alyssa, ask jekstrand
<airlied>
yeah then weep in a corner
<jenatali>
That's the eventual goal for Windows too IIRC
<alyssa>
airlied: that's basically what jekstrand said
<alyssa>
jenatali: fwiw macOS does NOT do this at least for M1
<alyssa>
and i have yet to see compelling evidence that it actually matters.
<jenatali>
Yeah, I said eventual. We're not there yet :)
warpme_____ has joined #dri-devel
Akari has joined #dri-devel
flto has joined #dri-devel
apinheiro has quit [Ping timeout: 480 seconds]
<DemiMarie>
alyssa: can you continue to not do that?
<alyssa>
DemiMarie: what's this replying to?
<alyssa>
I made a lot of comments about things I should continue to not do :-p
<DemiMarie>
Because in Qubes one of our goals is secure GPU virt, and ( marmarek feel free to correct me) direct guest-firmware interaction makes this very hard
<alyssa>
Oh, that's a very interesting point
<alyssa>
Yes, we can continue to use the existing "submit ioctl" model
<alyssa>
In effect, the stuff that Arm designed to run in userspace will just run in kernel space instead
<alyssa>
with a possible performance cost due to extra ctx switching but I find it hard to be too bothered by that right now
<alyssa>
I think we will support that model for the hw's lifetime regardless of if we ever add direct user-fw submit as a fast path later
<DemiMarie>
My worry is that future hardware will require the fast path.
<alyssa>
yes, I hear you
<DemiMarie>
If I trusted the GPU firmware to do proper input validation that would be a totally different story.
<alyssa>
I don't and you shouldn't
<DemiMarie>
Have you found cases where it didn't?
<DemiMarie>
danvet: could i915 virtualize the GuC queues? Would this be good from a security perspective?
<alyssa>
DemiMarie: it's a blob, ain't like we've audited that thing
mbrost has quit [Ping timeout: 480 seconds]
pixelcluster has quit [Remote host closed the connection]
pixelcluster_ has joined #dri-devel
pixelcluster_ has quit [Remote host closed the connection]
pixelcluster has joined #dri-devel
junaid has quit [Ping timeout: 480 seconds]
hays_ has joined #dri-devel
mbrost has joined #dri-devel
maxzor has joined #dri-devel
<DemiMarie>
alyssa: anyone considered clean-room REing an open-source replacement?
<DemiMarie>
alyssa: do you fuzz the kernel driver?
<alyssa>
FOSS replacement for that blob is definitely a possibility, yeah
<alyssa>
I started work on it but right now more interested in supporting the hardware in mainline at all
<agd5f>
alyssa, likely hw won't load it unless it's signed
<alyssa>
agd5f: it will, we have code execution on the thing
<hays_>
Is the Mali-G610 MP4 3D GPU under the provenance/support of panfrost?
<alyssa>
hays_: Yes, that's the hardware under discussion right now
<hays_>
joining a bit late
<alyssa>
It isn't supported upstream yet, but we're working (actively) on it
<alyssa>
If you need GPU acceleration, I recommend against buying RK3588 boards until there's upstream support
<hays_>
I know next to nothing about the details, but I have the hardware and should be in a position to test soon
<alyssa>
(Even aside from GPU, the mainline support isn't there yet for everything else)
<hays_>
yeah im very aware. haha :)
mvlad has quit [Remote host closed the connection]
* CounterPillow
nervously glanced at the unimplemented clock gating stuff
<alyssa>
woof
<alyssa>
I'd really rather not support the downstream kernel in upstream Mesa, so that's also a complicating factor for Mesa support
<alyssa>
and in an unusual twist, the hard part is going to be kernel support, not Mesa
<alyssa>
the Mesa side is "basically" identical to Mali-G57
srslypascal has quit [Ping timeout: 480 seconds]
<alyssa>
(and by "basically" I mean a 5KLOC branch but shhh a big chunk of that is copy/paste...)
<alyssa>
all of this is coming soon to upstream but not there yet
<CounterPillow>
anarsoul: psykose: thanks for the mpv bug report, we're currently discussing disabling DR by default until we have an implementation that uses staging buffers. Pls don't driconf us
<anarsoul>
CounterPillow: thanks for looking into it!
<alyssa>
CounterPillow: sorry for suggesting driconf, I might've been a bit too trigger happy there ;-p
<hays_>
alyssa: is the idea to use the mali_csffw.bin firmware or to avoid it
<alyssa>
hays_: right, that's what we were discussing a few minutes ago
<hays_>
i must have just missed it
<alyssa>
yeah
<hays_>
sorry
<alyssa>
no worries
<alyssa>
at a technical level, it's possible to avoid
<alyssa>
there's no sig checking, we have code exec on the MCU
<alyssa>
having *some* firmware is mandatory but it doesn't have to be Arm's
<alyssa>
That said -- we expect the mali_csffw.bin <--> kernel interface to remain relatively stable, but the physical firmware <--> hardware interface to change unpredictably for future Malis
<alyssa>
so bringing up a FOSS firmware right now seems like it might be painting ourselves into a corner
<alyssa>
So I think the current plan is to teach the mainline kernel about mali_csffw.bin
<alyssa>
and teach Mesa to use the mainline kernel with Mali-G610
<alyssa>
after there's mainline support shipping, then we can revisit if we maybe want to free the firmware too, and I should have a lot of code to drop if fd.o folks want to pursue that
<alyssa>
but we'd probably keep both code paths in the kernel if only to ease bring up of future Malis
apinheiro has joined #dri-devel
<alyssa>
no sense cutting off our nose to spite our face.
<hays_>
yeah all seems very sensible
<alyssa>
bbrezillon: ^^
<alyssa>
hays_: let's be clear they're not *my* sensible decisions ;)
<anarsoul>
alyssa: does ARM allow mali_csffw.bin redistribution?
<hays_>
anarsoul: i think i downloaded it from the rockchip website... so that's not firm confirmation but at least some people can redistribute
<hays_>
alyssa: heh that screenshot is disorienting but i think i get the picture :)
<hays_>
what is the relationship between these drivers and a BSP which appears to include a driver blob with patches to various userland apps most notably xorg
gouchi has quit [Remote host closed the connection]
<hays_>
and how do you teach the kernel about mali_csffw.bin? is that straight up reverse engineering?
<mareko>
alyssa: nobody had the energy to update all drivers when "start" was added
<mareko>
if a driver can't handle sampler state start > 0, we just assume it's broken
ahajda_ has quit []
<alyssa>
mareko: OK. The issue isn't start > 0, but not unbinding samplers [start + num, ...)
<alyssa>
which matters for how ctx->num_samplers[stage] is calculated
<alyssa>
hays_: We're trying to do the right thing. Mainline kernel driver, upstream Mesa driver, no vendor trees. That takes more time than shipping blobs.
<hays_>
yeah im just trying to understand the landscape. it almost seems like the vendors have patched various OSS projects to work with the blob, and ship that as a Yocto layer or whatever, but unclear what happens next to all of those forks
<hays_>
not limited to just gpu, but seem to be blobs for many other chips as well
<alyssa>
they live a short life with an old kernel and then people ship to mainline when mainline is ready
<hays_>
yeah and maybe you were saying this earlier, for the rk3588 the kernel seems to be crazy old so a heavy lift on that side
ybogdano has quit [Ping timeout: 480 seconds]
bskica has joined #dri-devel
<alyssa>
I really wish people held off on buying rk3588 for a year or so
<alyssa>
the software side really isn't ready
<alyssa>
actually I wish people held off on all new hardware a year or so.
<psykose>
whoa it's A76
<psykose>
at last something not a72 /s
<alyssa>
(people being "anyone not developing drivers for that hardware" I mean)
<danvet>
alyssa, users, they're a menace :-P
<alyssa>
danvet: am user, can confirm
dcz_ has quit [Ping timeout: 480 seconds]
apinheiro has quit [Quit: Leaving]
rasterman has quit [Quit: Gettin' stinky!]
danvet has quit [Ping timeout: 480 seconds]
<DemiMarie>
alyssa: when you said you had code exec, my first thought was that there was a buffer overflow or similar allowing sig checks to be bypassed
<DemiMarie>
if I may make a suggestion: have you considered using a safe language (such as Ada or Rust) for the rewritten firmware?
pcercuei has quit [Quit: dodo]
rsalvaterra has quit []
rsalvaterra has joined #dri-devel
<alyssa>
it'd be in Rust, yes
<alyssa>
less for safety and more because none of us want to write C code :p
<Kayden>
alyssa: I think the docs for bind_sampler_states should drop that comment. I don't know why anyone would put a note in the docs saying "BTW, we have these parameters, but we don't use them, so feel free to ignore them and write bugs, haha!". if they're really not used, they should be eliminated. but they are actually used, so the docs are wrong
<Kayden>
at least at a glance it looks like zink is handling [start_slot, start_slot + num_samplers). iris does too
<Kayden>
samplers[] has at least count samplers, and samplers[i] maps to driver sampler [start_slot + i]
<Kayden>
if samplers[i] is null it's an unbind
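The semantics described here can be sketched with a toy context. This is a simplified model, not code from any Mesa driver: `ex_context`, `EX_MAX_SAMPLERS` and `ex_bind_sampler_states` are hypothetical, and treating a NULL array as "unbind the whole range" is an assumption. The point is that only slots in [start, start + count) change, and the sampler count is recomputed from what remains bound rather than set to start + count.

```c
#include <stddef.h>

#define EX_MAX_SAMPLERS 16

/* Toy per-stage state: bound sampler objects and the derived count. */
struct ex_context {
    void *samplers[EX_MAX_SAMPLERS];
    unsigned num_samplers;
};

/* Update only slots [start, start + count); a NULL entry unbinds a slot. */
static void ex_bind_sampler_states(struct ex_context *ctx, unsigned start,
                                   unsigned count, void **states)
{
    for (unsigned i = 0; i < count && start + i < EX_MAX_SAMPLERS; i++)
        ctx->samplers[start + i] = states ? states[i] : NULL;

    /* Samplers outside the range stay bound, so the count is the index of
     * the highest non-NULL slot plus one. */
    unsigned n = EX_MAX_SAMPLERS;
    while (n > 0 && ctx->samplers[n - 1] == NULL)
        n--;
    ctx->num_samplers = n;
}
```

Setting the count to start + count instead would drop still-bound samplers after a subrange update, which is the kind of miscalculation mentioned in the discussion.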
fxkamd has quit []
<alyssa>
yeah, I think Iris is doing the right thing
<alyssa>
the biggest bug with panfrost was incorrectly calculating sampler_count
<alyssa>
but Iris just.. doesn't calculate that ;-p
<alyssa>
thanks to using blorp it doesn't need to. u_blitter needs it.
<alyssa>
if we all agree the docs are wrong, could someone send a doc patch? thanks