ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
<airlied>
no Red Hat used to ship that crap as our hypervisor :-P
<DemiMarie>
crap?
<airlied>
yeah at least from our paid customers pov, I'm sure it has use cases, but kvm just made life a lot simpler
<DemiMarie>
not surprised
<DemiMarie>
My understanding is that most uses of Xen nowadays are for things KVM just can’t do, at least not without a ton of additional work.
Haaninjo has quit [Quit: Ex-Chat]
<airlied>
or because someone still uses citrix?
<DemiMarie>
not my use-case
<DemiMarie>
In Qubes OS we use PCI passthrough so that the NICs and USB controllers are handled by less-privileged VMs, thus protecting the host.
<karolherbst>
but couldn't that be done with KVM as well?
<Ermine>
doesn't xen utilize kvm?
Leopold has joined #dri-devel
Thymo_ has joined #dri-devel
<DemiMarie>
karolherbst: not easily at least, because KVM doesn’t support one VM providing networking to another directly
<airlied>
nope
pcercuei has quit [Quit: dodo]
<karolherbst>
DemiMarie: that kinda sounds like a configuration problem?
<DemiMarie>
karolherbst: nope, it’s way more fundamental.
<DemiMarie>
In Xen, two VMs can communicate directly
<karolherbst>
mhh
<DemiMarie>
In KVM, you have to write a bunch of userspace stuff and try to make it work
<DemiMarie>
The closest you can get is virtio-vhost-user but that is still experimental and also assumes that the backend is trusted. In Qubes OS, the backend is not trusted.
<DemiMarie>
Xen is also working on being able to deprivilege dom0 and on safety certification, so that one can have e.g. a safety-critical QNX guest alongside Linux guests doing infotainment stuff.
<DemiMarie>
Trying to do that under KVM would be a nightmare, if it can even be done at all.
<DemiMarie>
Also Xen’s security process is vastly better than Linux’s, and therefore KVM’s.
Thymo has quit [Ping timeout: 480 seconds]
Leopold has quit [Remote host closed the connection]
Leopold_ has quit [Remote host closed the connection]
<zmike>
mareko: I've still had the tab open, but I've been too busy to get back to reviewing
<zmike>
I didn't intend for that to be a blocking comment if it's been stalling pepp's review
<zmike>
hoping to get to it in the next couple days if someone doesn't beat me to it
<HdkR>
`amdgpu 0005:03:00.0: [drm] *ERROR* Error waiting for INBOX0 HW Lock Ack` Anyone ever see this error spamming in dmesg, or should I try updating my kernel?
Leopold has joined #dri-devel
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
Leopold has quit [Remote host closed the connection]
<DemiMarie>
airlied: do `FOLL_LONGTERM` pins of VRAM work?
Leopold has joined #dri-devel
<airlied>
DemiMarie: don't think it makes any sense
<DemiMarie>
airlied: Ouch. Why?
<airlied>
VRAM isn't like RAM
Daanct12 has joined #dri-devel
<airlied>
the PCIE bar can act like a remapping table on some gpus
<airlied>
probably easier to not expose mappable VRAM to guests
<DemiMarie>
airlied: what will that break?
macromorgan has quit [Ping timeout: 480 seconds]
<DemiMarie>
Does it mean no Vulkan and no OpenGL4.6+?
davispuh has quit [Ping timeout: 480 seconds]
flynnjiang has joined #dri-devel
<DemiMarie>
If so, it’s probably better to make whatever Xen-side changes are needed to make it work.
<airlied>
probably hurts performance on those
<DemiMarie>
how much?
<airlied>
but I think you can get away with just doing everything in RAM instead of VRAM where you want mappings
<DemiMarie>
what about vkMapMemory?
<airlied>
though not sure if some apps always assume you can map vram
<airlied>
they shouldn't but who knows what ppl do
benjaminl has quit [Read error: Connection reset by peer]
benjaminl has joined #dri-devel
<DemiMarie>
yeah
<DemiMarie>
I suspect it will really hurt compute, though.
<DemiMarie>
Compute seems to care much more about shared mappings
Leopold has quit [Remote host closed the connection]
<jenatali>
D3D has gotten by without mapping VRAM until *very* recently
<DemiMarie>
How recently?
<jenatali>
Today
<DemiMarie>
Literally March 11 2024?
<jenatali>
Yes
<DemiMarie>
What was announced?
<DemiMarie>
If I can avoid mapping VRAM that makes things vastly simpler.
<DemiMarie>
jenatali: how long before applications will require upload heaps?
Leopold_ has joined #dri-devel
* DemiMarie
wishes that upload heaps had never been created
<jenatali>
Dunno. Probably a while
<karolherbst>
not that it changes much because drivers need it anyway
benjaminl has quit [Read error: Connection reset by peer]
<kode54>
Have teaching everyone how to turn on rebar, if they can
benjaminl has joined #dri-devel
guru_ has joined #dri-devel
<DemiMarie>
karolherbst: why is that?
<karolherbst>
because they wanna upload stuff to the GPU in a non painful way
<karolherbst>
like NVK relied on being able to map VRAM since forever
sudeepd_ has quit [Ping timeout: 480 seconds]
<DemiMarie>
what about Intel and AMD?
<kode54>
Arc already requires it outright or else it runs like crap
<karolherbst>
dunno, but probably the same
<DemiMarie>
kode54: crap?
<kode54>
Like worse performance than really old hardware
<DemiMarie>
because of faulting on each access?
anujp has joined #dri-devel
<DemiMarie>
Anyway, so this will need to be dealt with in Xen or we will need to switch to KVM or not support GPU acceleration.
<DemiMarie>
The old version that runs the userspace driver on the host is not even being considered.
<jenatali>
Out of curiosity, is anybody working on an implementation of the AMD work graph extension (for RADV or otherwise)?
oneforall2 has quit [Ping timeout: 480 seconds]
<kode54>
I don’t know if faulting is why the windows drivers are slow since I don’t have that information
<kode54>
I just know they outright tell people to have rebar support, and reviewers have found the card performance to be a stuttery mess without it
columbarius has joined #dri-devel
yyds has joined #dri-devel
co1umbarius has quit [Ping timeout: 480 seconds]
anujp has quit [Ping timeout: 480 seconds]
anujp has joined #dri-devel
<DemiMarie>
How common is rebar support nowadays?
<DemiMarie>
airlied: on which GPUs can the BAR act as a translation table?
<airlied>
amd and nvidia do it
<airlied>
though I don't think amdgpu takes too much advantage of it
<airlied>
but I haven't looked in a while
Marcand has joined #dri-devel
<agd5f>
we've supported it in amdgpu for maybe 8-10 years? Christian added the BAR resizing to the kernel PCI code.
<agd5f>
As long as you have enough MMIO space
<agd5f>
Simplifies the kernel side since you never have to worry about faulting BOs in and out of the BAR window
<airlied>
agd5f: I don't think you do translations in the BAR though
<airlied>
like you resize it
<airlied>
but you map BAR address 0x100 to VRAM address 0x100 always
<airlied>
nvidia has a page table between VRAM and the BAR
Dark-Show has quit [Quit: Leaving]
<airlied>
so even with a 256MB BAR you can map any part of the 8GB VRAM at page granularity, it's just a pain because you have to do evictions
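The two BAR behaviours airlied describes can be sketched as a toy model. This is only an illustration under stated assumptions: the class names, the 4 KiB page size, and the LRU eviction policy are all hypothetical and are not taken from amdgpu, nouveau, or any hardware documentation.

```python
from collections import OrderedDict

PAGE = 4096  # assumed page size for this sketch


class IdentityBar:
    """BAR offset N always refers to VRAM offset N (no translation)."""

    def __init__(self, bar_size):
        self.bar_size = bar_size

    def vram_to_bar(self, vram_addr):
        # Only the part of VRAM that fits inside the BAR is reachable.
        if vram_addr >= self.bar_size:
            raise ValueError("VRAM address not reachable through the BAR")
        return vram_addr


class PagedBar:
    """BAR with a page table: any VRAM page can occupy any BAR slot."""

    def __init__(self, bar_size):
        self.table = OrderedDict()  # VRAM page -> BAR slot, oldest first
        self.free = list(range(bar_size // PAGE))
        self.evictions = 0

    def vram_to_bar(self, vram_addr):
        page, offset = divmod(vram_addr, PAGE)
        if page in self.table:
            # Already mapped: just mark it as recently used.
            self.table.move_to_end(page)
        else:
            if self.free:
                slot = self.free.pop()
            else:
                # BAR is full: evict the least recently used mapping.
                _, slot = self.table.popitem(last=False)
                self.evictions += 1
            self.table[page] = slot
        return self.table[page] * PAGE + offset
```

With the identity model a small BAR simply cannot reach high VRAM addresses, while the paged model can reach any address at the cost of evictions once the window is full, which is the "pain" mentioned above.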
dogukan has quit [Quit: Konversation terminated!]
anujp has quit [Ping timeout: 480 seconds]
<DemiMarie>
agd5f: why does amdgpu use non-refcounted pages?
Marcand_ has joined #dri-devel
Marcand_ has quit []
guru_ has quit []
oneforall2 has joined #dri-devel
Mangix has quit [Ping timeout: 480 seconds]
JohnnyonFlame has joined #dri-devel
yyds_ has joined #dri-devel
yyds has quit [Ping timeout: 480 seconds]
Leopold_ has quit [Remote host closed the connection]
Marcand_ has joined #dri-devel
Marcand has quit [Read error: Connection reset by peer]
kzd has quit [Quit: kzd]
Leopold has joined #dri-devel
heat has quit [Remote host closed the connection]
heat has joined #dri-devel
macromorgan has joined #dri-devel
ghishadow_ has joined #dri-devel
ghishadow has quit [Ping timeout: 480 seconds]
Mangix has joined #dri-devel
kzd has joined #dri-devel
ghishadow_ has left #dri-devel [#dri-devel]
ghishadow_ has joined #dri-devel
Marcand_ has quit [Ping timeout: 480 seconds]
aravind has joined #dri-devel
anujp has joined #dri-devel
heat has quit [Ping timeout: 480 seconds]
halves has quit [Quit: Ping timeout (120 seconds)]
halves has joined #dri-devel
bmodem has joined #dri-devel
aravind has quit [Remote host closed the connection]
sudeepd has joined #dri-devel
<marex>
airlied: I wouldn't mind that, but is there something convenient like drivers/gpu/drm/drm_gem_dma_helper.c with iommu support ? Or how do I even ... what do I even grep for ?
<marex>
iommu_iova_to_phys maybe ?
eocwzdenn^ has joined #dri-devel
<marex>
nope
<airlied>
you enable the iommu and the dma layer should do the magic
<marex>
airlied: hmmm, so ... I would use the SHMEM allocator, then use -something- from probably include/linux/iommu.h to turn buffer I get from that shmem allocator into ... uh ... IOVA ? ... and pass that to the device ?
surajkandpal1 has quit [Ping timeout: 480 seconds]
itoral has quit [Ping timeout: 480 seconds]
glennk has quit [Ping timeout: 480 seconds]
Leopold_ has quit [Ping timeout: 480 seconds]
Leopold_ has joined #dri-devel
Company has quit [Quit: Leaving]
itoral__ has joined #dri-devel
<airlied>
bcheng, Lynne : okay I'm pretty unsure how this is meant to work :-P
<airlied>
tchar__: I don't trust that code in radv for filling out the ref frame map
<airlied>
but I'm unsure how the API is meant to be used here
itoral_ has quit [Ping timeout: 480 seconds]
sghuge has quit [Remote host closed the connection]
sghuge has joined #dri-devel
warpme has joined #dri-devel
tzimmermann has joined #dri-devel
itoral_ has joined #dri-devel
itoral__ has quit [Remote host closed the connection]
Daanct12 has quit [Quit: WeeChat 4.2.1]
sima has joined #dri-devel
Daanct12 has joined #dri-devel
itoral__ has joined #dri-devel
fab has quit [Quit: fab]
itoral_ has quit [Ping timeout: 480 seconds]
jsa has joined #dri-devel
enick_416 has quit []
warpme has quit []
surajkandpal has joined #dri-devel
fab has joined #dri-devel
macromorgan has quit [Ping timeout: 480 seconds]
itoral_ has joined #dri-devel
vliaskov has joined #dri-devel
mvlad has joined #dri-devel
frieder has joined #dri-devel
sgruszka has joined #dri-devel
itoral__ has quit [Ping timeout: 480 seconds]
warpme has joined #dri-devel
ghishadow_ has quit [Ping timeout: 480 seconds]
itoral__ has joined #dri-devel
glennk has joined #dri-devel
sarnex has quit [Read error: Connection reset by peer]
hansg has joined #dri-devel
frankbinns has joined #dri-devel
itoral_ has quit [Read error: No route to host]
itoral__ has quit [Remote host closed the connection]
itoral has joined #dri-devel
frankbinns1 has joined #dri-devel
frankbinns is now known as Guest2512
frankbinns1 is now known as frankbinns
jkrzyszt has joined #dri-devel
sarnex has joined #dri-devel
Guest2512 has quit [Ping timeout: 480 seconds]
imre has quit [Ping timeout: 480 seconds]
itoral_ has joined #dri-devel
itoral has quit [Ping timeout: 480 seconds]
dorcaslitunya has quit [Remote host closed the connection]
dorcaslitunya has joined #dri-devel
dorcaslitunya has quit [Remote host closed the connection]
<robmur01>
marex: you only need to bother with the IOMMU API if you care about managing the address space and exactly *where* buffers are mapped in the device view
<robmur01>
otherwise just use drm_gem_dma_helpers and it all simply happens by magic
<robmur01>
(the iommu-dma layer can't strictly *guarantee* to linearise any given scatterlist, but it does try its best)
warpme has quit []
itoral__ has joined #dri-devel
itoral_ has quit [Remote host closed the connection]
itoral__ has quit [Remote host closed the connection]
<pq>
SIGBUS would be bad, if you expect userspace to not crash on the spot.
warpme has joined #dri-devel
<HdkR>
Oh, it looks like Anv and Iris compiles on AArch64 now?
<HdkR>
Might need to get a Battlemage GPU for testing
<airlied>
Arc you mean
<HdkR>
Well, I'm in no rush so I can wait for Battlemage :P
<psykose>
still surprised they actually named it that because it's a cool name
lynxeye has joined #dri-devel
flynnjiang has quit [Remote host closed the connection]
itoral_ has joined #dri-devel
<HdkR>
Using DND class names sorted alphabetically is cute
itoral__ has quit [Ping timeout: 480 seconds]
<pepp>
DemiMarie: my coworkers are aware of how Xen works and figured out a way to expose GPU memory to the guest correctly
itoral__ has joined #dri-devel
<pepp>
DemiMarie: but I don't know much about the details of the implementation
tchar__ is now known as tchar
<tchar>
airlied: can you elaborate on the issue you are seeing? That code is working around some firmware bugs, so it's a bit cursed.
<tchar>
Oh, I see there's a new FFmpeg branch, I'll give it a spin
itoral_ has quit [Ping timeout: 480 seconds]
<airlied>
tchar: I'm not sure the new ffmpeg is right either
<airlied>
the main problem is around how many reference frames/slots we need to send
<airlied>
I'm not sure how to fill out ref_frame_map properly
<airlied>
like if we have 7 frame refs pointing at 2 references, I'm not sure what ref_frame_map needs to contain
<airlied>
if we only send two dpb references vs the old code which sends 8 slots with the same image in different indices
simondnnsn has quit [Read error: Connection reset by peer]
ninjaaaaa has quit [Write error: connection closed]
simondnnsn has joined #dri-devel
<airlied>
so the old code and my hacks that work send referenceSlotCount = 8 pretty much always
<airlied>
and in those 8 there are repeated slot indices
<airlied>
and that fills out ref_frame_map all properly
<airlied>
but if we don't send 8, but instead only send say 2, I'm having trouble working out the ref_frame_map contents that will work
itoral_ has joined #dri-devel
ninjaaaaa has joined #dri-devel
<airlied>
I'll try and attack it again tomorrow when I've had a coffee to see if I can at least work out a good description :-P
bolson has quit [Remote host closed the connection]
pcercuei has joined #dri-devel
itoral__ has quit [Read error: No route to host]
itoral__ has joined #dri-devel
surajkandpal has quit [Ping timeout: 480 seconds]
davispuh has joined #dri-devel
itoral__ has quit [Remote host closed the connection]
itoral has joined #dri-devel
itoral_ has quit [Ping timeout: 480 seconds]
itoral_ has joined #dri-devel
<tchar>
airlied: yeah, you're right that the intention is only to send 2 dpb references in pReferenceSlots in that case, and you use the referenceNameSlotIndices to convey the "duplicates"
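The "7 frame refs pointing at 2 references" case can be sketched as follows. This is a minimal model, assuming each of the seven AV1 reference names carries a DPB slot index and that a negative index means "unused"; the function name is hypothetical and this is neither FFmpeg nor RADV code.

```python
AV1_REFS_PER_FRAME = 7  # one entry per AV1 reference name


def dedup_reference_slots(name_to_slot):
    """Split per-name DPB slot indices into unique slots plus name indices.

    Returns (unique_slots, name_indices): unique_slots models what would go
    into pReferenceSlots, name_indices models referenceNameSlotIndices.
    """
    if len(name_to_slot) != AV1_REFS_PER_FRAME:
        raise ValueError("expected one DPB slot index per AV1 reference name")
    unique_slots = []
    for slot in name_to_slot:
        # Skip unused names (negative) and slots that were already listed.
        if slot >= 0 and slot not in unique_slots:
            unique_slots.append(slot)
    return unique_slots, list(name_to_slot)


slots, name_indices = dedup_reference_slots([3, 3, 3, 5, 5, 5, 3])
print(slots)         # [3, 5]
print(name_indices)  # [3, 3, 3, 5, 5, 5, 3]
```

Only two slots are listed, and the duplicates survive purely in the per-name index array.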
<tchar>
i'll see if I can spot any issue in the meantime
itoral has quit [Ping timeout: 480 seconds]
manospitsid has joined #dri-devel
jsa has quit [Ping timeout: 480 seconds]
itoral__ has joined #dri-devel
itoral_ has quit [Ping timeout: 480 seconds]
itoral__ has quit [Remote host closed the connection]
pepp has quit [Ping timeout: 480 seconds]
jsa has joined #dri-devel
pepp has joined #dri-devel
kts has joined #dri-devel
rasterman has joined #dri-devel
sgruszka_ has joined #dri-devel
sgruszka_ has quit []
kts has quit [Ping timeout: 480 seconds]
ninjaaaaa has quit [Read error: Connection reset by peer]
simondnnsn has quit [Read error: Connection reset by peer]
<Calandracas>
"Other graphics APIs (Vulkan, OpenCL) are not supported at this time."
randevouz has joined #dri-devel
<Calandracas>
Neat, i have hardware for all supported drivers
kts has quit [Quit: Konversation terminated!]
Daanct12 has quit [Quit: WeeChat 4.2.1]
yyds_ has quit [Remote host closed the connection]
simondnnsn has quit [Remote host closed the connection]
ninjaaaaa has quit [Read error: Connection reset by peer]
<Calandracas>
rusticl is super awesome and a complete game changer. Now I can do OpenCL development on my pinebookpro
ninjaaaaa has joined #dri-devel
simondnnsn has joined #dri-devel
Company has joined #dri-devel
Company has quit []
zxrom has joined #dri-devel
<randevouz>
Yeah the basic arithmetic with all the list/seq/predicate functionality is in the core of w3c libraries, that confused me , so disambiguation there, it has unlicensed github project, turtle is subset of n3 https://ruby-rdf.github.io/rdf-n3/RDF/N3/Algebra/Math/Negation.html, but i am lost as i got tired. Not sure if i have to write my own stack, needs testing. And they have no logarithms . Fell a sleep yesterday before inspecting the
<randevouz>
compression or doing any testing.
<aleasto>
is there an environment variable to disable the egl zink fallback if it breaks an app?
<zmike>
no
<aleasto>
sad
sgruszka has quit [Ping timeout: 480 seconds]
dogukan has joined #dri-devel
<agd5f>
airlied, we have a page table too, but we don't use it.
zxrom has quit []
<bcheng>
airlied: filling out referenceSlotCount = 8, with repeated slotIndex is illegal I believe.
<MrCooper>
aleasto: LIBGL_ALWAYS_SOFTWARE=1 ?
<bcheng>
airlied: the problem that ref_frame_map filling code is working around is that the vulkan api only provides references used by the frame, not all 8 codec defined slots. But the FW was designed for other APIs which are always given the state of the codec DPB, in which case if a frame stops being seen in ref_frame_map, the FW will drop the metadata for that slot
<bcheng>
the code is trying to fill into ref_frame_map, the real references first, then as a workaround, fill in the slots that were not specified by the app, in order for the FW to keep the metadata alive
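The workaround bcheng describes can be modelled roughly like this. It is a sketch of the description only, not the actual RADV code: the function name, the dict-based inputs, and the use of `None` for an empty entry are all assumptions made for illustration.

```python
NUM_CODEC_SLOTS = 8  # codec-defined reference slots mentioned above


def fill_ref_frame_map(app_refs, fw_known):
    """Model of the two-pass fill.

    app_refs: codec slot -> surface the app referenced for this frame.
    fw_known: codec slot -> surface the firmware last saw in that slot.
    """
    ref_frame_map = [None] * NUM_CODEC_SLOTS
    # Pass 1: the real references supplied through the API come first.
    for slot, surface in app_refs.items():
        ref_frame_map[slot] = surface
    # Pass 2: re-send slots the app did not mention, so the firmware
    # does not drop the metadata it holds for them.
    for slot in range(NUM_CODEC_SLOTS):
        if ref_frame_map[slot] is None and slot in fw_known:
            ref_frame_map[slot] = fw_known[slot]
    return ref_frame_map
```

Slots that neither the app nor the earlier state mention stay empty in this model; how the real code treats them is not stated in the discussion.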
dorcaslitunya has joined #dri-devel
<mripard>
we're moving the drm-misc repo to gitlab, expect disruption for some time
hansg has quit [Remote host closed the connection]
<Lynne>
on nvidia, it crashes in vkCreateVideoSessionParametersKHR, which is very weird
fab has joined #dri-devel
<Lynne>
right, fixed, I forgot that the spec forbids empty av1 session params, which we use for flushing the decoder on startup
<Lynne>
now both drivers crash during queue submission (though with radv, the kernel outright rejects the CS)
hansg has joined #dri-devel
yyds has joined #dri-devel
<MrCooper>
jfalempe: AFAIK a cache flush doesn't flush WC buffers
randevouz has quit [Remote host closed the connection]
randevouz has joined #dri-devel
heat has joined #dri-devel
warpme has quit []
<marex>
robmur01: ack, thanks for the input. I need to dig into this first, then I'll come back (or probably won't, because this seems like a perfect fit for my purposes)
<mareko>
DemiMarie: one of the Xen architects now works for AMD
<DemiMarie>
pepp: does it support dGPUs or only iGPUs?
frieder has quit [Read error: Connection reset by peer]
<Lynne>
airlied: with f3ab454f0 reverted in mesa, it sorta works, I get an image!
<Lynne>
it's not correct, but it's a start
<pepp>
DemiMarie: both
<DemiMarie>
pepp: does it handle paging of GPU memory and resizeable BAR?
<pepp>
DemiMarie: I don't *think* ReBAR is supported. If "paging" means "moving memory around", then yes
<DemiMarie>
No ReBAR could be a significant problem for broader use.
<pepp>
DemiMarie: I don't see why?
<DemiMarie>
pepp: because some GPU drivers simply require it
<DemiMarie>
Intel Arc and NVK for example
hansg has quit [Read error: No route to host]
warpme has joined #dri-devel
<mareko>
it's open source, people can change it
yyds has quit [Remote host closed the connection]
jsa has quit [Ping timeout: 480 seconds]
Duke`` has joined #dri-devel
<DemiMarie>
mareko: my concern is that Xen-specific code in Intel and Nvidia drivers will receive no testing in upstream CI, and users will be left with bugs that are nigh-impossible to track down, much less fix.
dogukan has quit [Remote host closed the connection]
kts has joined #dri-devel
<DemiMarie>
pepp: do AMD GPUs have ReBAR?
warpme has quit []
cmichael has quit [Quit: Leaving]
<pepp>
DemiMarie: yes.
cmichael has joined #dri-devel
<mareko>
DemiMarie: that's what SW development is for
<tchar>
I'm looking now at the rvp / rav mgmt, to try and better match what the spec ended up on
<Lynne>
tchar: yes, all are fixed
tzimmermann has quit [Quit: Leaving]
hansg has joined #dri-devel
sravn has joined #dri-devel
rasterman has joined #dri-devel
dorcaslitunya has quit [Remote host closed the connection]
dorcaslitunya has joined #dri-devel
kts has quit [Ping timeout: 480 seconds]
junaid has joined #dri-devel
ungeskriptet is now known as Guest2550
ungeskriptet has joined #dri-devel
ninjaaaaa has quit [Ping timeout: 480 seconds]
simondnnsn has quit [Ping timeout: 480 seconds]
Guest2550 has quit [Ping timeout: 480 seconds]
lynxeye has quit [Quit: Leaving.]
ninjaaaaa has joined #dri-devel
simondnnsn has joined #dri-devel
simondnnsn has quit [Read error: Connection reset by peer]
ninjaaaaa has quit [Read error: Connection reset by peer]
ninjaaaaa has joined #dri-devel
dorcaslitunya has quit [Remote host closed the connection]
dorcaslitunya has joined #dri-devel
simondnnsn has joined #dri-devel
<DemiMarie>
pepp: would it be reasonable to write an email to both xen-devel and dri-devel to try to get this all sorted out?
<airlied>
bcheng: i do wonder, shouldn't we, according to the spec, put the complete dpb state in begin video coding?
<DemiMarie>
mareko: Neither I nor anyone else at Invisible Things Lab is a full-time GPU driver developer, and even if I was, I don’t have the resources to do the kind of testing that Intel and AMD do. If the same APIs that work outside of Xen also work under Xen, then Xen users benefit from the non-Xen testing for free.
<DemiMarie>
mareko: Also, there are quite a few additional hypervisors, mostly in the embedded space. Having GPU drivers need to know about a hypervisor is a giant layering violation, IMO.
<DemiMarie>
The GPU driver should just call the appropriate kernel APIs, and it should be the responsibility of the hypervisor interface to ensure that everything just works.
cmichael has quit [Quit: Leaving]
alyssa has joined #dri-devel
<alyssa>
mareko: i'm planning to rereview opt_varyings today or tomorrow
<alyssa>
fwiw
fab has quit [Quit: fab]
<eric_engestrom>
karolherbst: marge's timeout starts when it finishes pushing the branch, and then gitlab starts processing, figures out it needs to create a pipeline and how, and then the pipeline is created. Depending on the load on gitlab, I've seen this take up to a couple of minutes, so I'm guessing this is where the extra 25+ seconds you are missing went
<eric_engestrom>
it's really unfortunate though, and we've been discussing to try to find a way to address these "the pipeline is almost finished, please give me X extra time instead of the normal timeout" but we don't have a solution yet
<eric_engestrom>
(also, there's the obvious problem of abuse of extra time, but I believe this is a social problem that will require a social solution)
<tchar>
airlied: isn't that currently the case? the complete dpb state being the current frame and all its unique dependent frames. The idea of putting the whole "VBI" in the API was rejected
<alyssa>
eric_engestrom: I mean... given the 60m timeout and the 25m expected worst time, if we're hitting timeouts stuff's already on fire, so engineering a "please give me extra time" mechanism doesn't seem too necessary?
<eric_engestrom>
agreed, but until those who take too long stop taking too long, the users are left with nothing but the "womp womp, try again later" reassign button
<jenatali>
A "please give me 5 more minutes" seems better than "oh well, guess I'll have to take another ~hour by starting from scratch later"
<tchar>
just in case it is of any use... I will take another look my tomorrow, I didn't figure out how to handle the frame_id wrapping yet
<alyssa>
jenatali: what I mean is, we're defaulting to "please give me 35 more minutes" on every pipeline
<alyssa>
and if that's not enough... things are really on fire!
<jenatali>
Oh I agree, but borrowing 5 minutes now can help get things back in shape sooner
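The numbers in this exchange work out as in the small sketch below. The function and its parameter names are hypothetical; the 60 and 25 minute figures come from the discussion, and the 2 minute creation delay is only the "up to a couple of minutes" case eric_engestrom mentions.

```python
def pipeline_budget(timeout_min, creation_delay_min, expected_worst_min):
    """Return (run_budget, slack) in minutes.

    The timeout is counted from the push, so time spent before the
    pipeline exists is lost from the budget the jobs actually get.
    """
    run_budget = timeout_min - creation_delay_min
    slack = run_budget - expected_worst_min
    return run_budget, slack


print(pipeline_budget(60, 0, 25))  # (60, 35)
print(pipeline_budget(60, 2, 25))  # (58, 33)
```

The first call is the "35 more minutes" alyssa refers to; the second shows how a slow pipeline creation eats into that margin.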
Leopold__ has quit [Ping timeout: 480 seconds]
dorcaslitunya has quit [Remote host closed the connection]
<DemiMarie>
airlied (and others): if a `FOLL_LONGTERM` pin of VRAM succeeds, does that mean there is a kernel bug?
<airlied>
DemiMarie: just means the bar pages are pinned, doesn't mean what is behind the pages is
<airlied>
agd5f: yeah that was my understanding, it would definitely relieve having to move BOs to use that page table, but also with rebar it's probably not much point
<DemiMarie>
airlied: what does that mean in practice? My understanding is that other subsystems (such as RDMA) also use `FOLL_LONGTERM` pins, and I don’t know if RDMA can handle faults.
<airlied>
DemiMarie: they don't usually have stuff in device memory space
<DemiMarie>
airlied: what happens if userspace tries?
<DemiMarie>
Suppose userspace maps some VRAM with vkMapMemory and passes the resulting address to an RDMA verb.
Leopold has joined #dri-devel
<airlied>
no idea
<airlied>
tchar: interesting, definitely looks bigger than I'd considered
<airlied>
Lynne: ^ you might want to start taking a look
<DemiMarie>
would this be considered a userspace bug?
<airlied>
probably, unless the kernel oops
<DemiMarie>
Ouch
<DemiMarie>
Does this mean that GPU acceleration under Xen will require either per-driver patches or recoverable page fault support in Xen?
agd5f has quit [Read error: No route to host]
<airlied>
don't know maybe ask Xen developers
<Lynne>
tchar: would you mind using my code
<Lynne>
there is a lot of incomplete stuff in the branches of both of you
<DemiMarie>
airlied: I am going to ask them, but first I need to know what the GPU drivers require.
<DemiMarie>
From the kernel perspective, not Xen's.
<zamundaaa[m]>
I've recently hit a GPU reset, which happened while a pageflip from the compositor was pending - because of the reset, that pageflip never happened and timed out.
<zamundaaa[m]>
When I tried to work around that timeout in KWin, the result of the next commit was EBUSY because a pageflip was still pending on the kernel side...
<zamundaaa[m]>
Afaict, there's no way for the kernel to signal to userspace that a commit failed. So when this happens, could / should the kernel signal the pageflip as completed, and allow commits to happen again?
<zamundaaa[m]>
Because the GPU reset itself seemed successful, after restarting the compositor everything worked fine again
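The stuck state zamundaaa describes can be illustrated with a toy state machine. This is not kernel or KWin code: the `Crtc` class and the `signal_pending_flip` switch are hypothetical, the latter standing in for the behaviour being asked about.

```python
import errno


class Crtc:
    """Toy model of a CRTC that allows one pending pageflip at a time."""

    def __init__(self):
        self.flip_pending = False

    def commit(self):
        # A new commit is refused while an earlier flip is still pending.
        if self.flip_pending:
            return -errno.EBUSY
        self.flip_pending = True
        return 0

    def flip_done(self):
        # Normal path: the flip completes and the event is delivered.
        self.flip_pending = False

    def gpu_reset(self, signal_pending_flip):
        # Without signalling, the pending flip is never cleared.
        if signal_pending_flip:
            self.flip_pending = False


stuck = Crtc()
stuck.commit()
stuck.gpu_reset(signal_pending_flip=False)
print(stuck.commit() == -errno.EBUSY)  # True: every later commit fails

recovered = Crtc()
recovered.commit()
recovered.gpu_reset(signal_pending_flip=True)
print(recovered.commit())  # 0: commits work again after the reset
```

In the first case the compositor has no way out short of a restart, which matches the report above.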
<DemiMarie>
airlied: I think one might be able to trigger an oops with vmsplice.
Haaninjo has joined #dri-devel
rasterman has quit [Quit: Gettin' stinky!]
hansg has quit [Quit: Leaving]
tobiasjakobi has joined #dri-devel
agd5f has joined #dri-devel
tobiasjakobi has quit []
<airlied>
Lynne: I've got it to decode properly by using the API illegally with my code
<airlied>
but I'll rebase on yours to see if I can hack it
rasterman has joined #dri-devel
<airlied>
Lynne: your branch has inconsistent loop restoration
<airlied>
at least with radv, but I'm not sure where the spec ended up
<airlied>
removing dedup fixes rendering for me on radv
<airlied>
if I comment out tchar's assert
urja has joined #dri-devel
agd5f_ has joined #dri-devel
agd5f_ has quit [Remote host closed the connection]
agd5f_ has joined #dri-devel
agd5f_ has quit []
<DemiMarie>
airlied: I see. Does that mean that P2PDMA between an RNIC and a GPU isn’t currently supported upstream?
agd5f has quit [Ping timeout: 480 seconds]
<airlied>
DemiMarie: probably depends on the gpu driver cooperating
pzanoni has quit [Ping timeout: 480 seconds]
<Lynne>
airlied: the loop restoration unit sizes?
<Lynne>
they're supposed to be log2, but the spec misnamed them, there's a spec fix that should've been merged already
heat is now known as Guest2558
Guest2558 has quit [Remote host closed the connection]
<DemiMarie>
airlied: Same cooperation I need, as it turns out.
<DavidHeidelberg>
details: in general it's wine tests adapted to run directly on Linux + gallium-nine with GTest suite (for deqp-runner integration). Very small, much fast, much wow. Ofc MIT/LGPL licensed.
mvlad has quit [Remote host closed the connection]
<Lynne>
tchar: do you think frame_id is still currently incorrect?
<alyssa>
DavidHeidelberg: been a while since i saw a doge meme. nice :3
greenjustin_ has joined #dri-devel
pzanoni has joined #dri-devel
greenjustin has quit [Ping timeout: 480 seconds]
greenjustin has joined #dri-devel
greenjustin_ has quit [Read error: Connection reset by peer]
Haaninjo has quit [Quit: Ex-Chat]
<tchar>
Lynne: it's possible I was mishandling them in my exploratory patch, it looks like the av1dec layer was taking care of the values not going over the maxDpbSlots value (9)
<tchar>
where's the latest working stuff in terms of branches atm? I will test here in the morning too
<Lynne>
passes superres, film grain (not on intel), weird invalid files we found crashes on with the old extension
<Lynne>
it should be on par with the old extension
<Lynne>
airlied: it still has the same 8/10bit flickering issues as before (and 10bit hevc), you should talk to jkqxz again
<DemiMarie>
robclark: are there any Chromebooks with discrete GPUs?
glennk has quit [Ping timeout: 480 seconds]
<airlied>
Lynne: cool, I really have to figure out how to reproduce that here
<airlied>
do you see it on both navi2x and navi3x?
<Lynne>
only 3x
<Lynne>
not swapchain related, I've seen it in ffmpeg
<Lynne>
latest status was that jkqxz thought that the 10-8 conversion was activated by uninitialized structs which radv wasn't using or filling in (but you said they are)
<Lynne>
we did try zeroing every single piece of memory manually, but that didn't work, only RADV_DEBUG=zerovram helps
rasterman has quit [Quit: Gettin' stinky!]
<bcheng>
Lynne: is there a sample clip?
<bcheng>
I can check on my end
<Lynne>
no sample clip because I've been able to replicate with everything with the same consistency
<bcheng>
just any random av1/hevc 10 bit clip?
<bcheng>
do you do -vf "format=nv12" or something to get ffmpeg to do a 10-8 conversion?
<Lynne>
no no, any bit depth for av1, but for hevc, only 10bit
<Lynne>
no conversion either, just output whatever the file's pixel format is
<Lynne>
happens randomly, more frequently than rarely, less frequently than often, and if a single process does RADV_DEBUG=zerovram, globally it stops happening for a random amount of time
<robclark>
DemiMarie: currently no
<DemiMarie>
robclark: so right now a major decision to be made is whether to pin all VRAM shared with guests.
<Lynne>
bcheng: I recommend using my branch of ffmpeg, patching mpv with https://0x0.st/HhD5.diff to use the new extension, and opening and closing the same clip until it happens
<DemiMarie>
With iGPUs this is a non-issue, with dGPUs it is a significant concern.
<bcheng>
Lynne: thanks, will see if I can replicate