crabbedhaloablut has quit [Ping timeout: 480 seconds]
camus has quit [Ping timeout: 480 seconds]
benjaminl has joined #dri-devel
pallavim has quit [Ping timeout: 480 seconds]
test has joined #dri-devel
<RAOF>
Hm. Poking around our multi-GPU output support code has me once again annoyed at the "import EGLImage -> render to a fullscreen rect" pipeline that is required to migrate an external dmabuf into scanoutable memory.
<RAOF>
It looks like gbm_bo_import almost does enough to render this unnecessary, but USE_SCANOUT doesn't actually change behaviour, just checks afterwards whether the bo is scanoutable.
<RAOF>
It looks like all the infrastructure is there to make gbm_bo_import do a blit into scanoutable memory if it needed to.
<RAOF>
But perhaps a better API would take an in_fence and associate an out_fence with the gbm_bo?
camus has joined #dri-devel
<RAOF>
Is that something that would be vaguely sane to implement? Something like an ALLOW_MIGRATION flag to gbm_bo_import, valid with GBM_BO_IMPORT_FD_MODIFIER (maybe _FENCE)?
camus has quit []
camus has joined #dri-devel
bbrezill1 has joined #dri-devel
bbrezillon has quit [Ping timeout: 480 seconds]
kts has quit [Quit: Konversation terminated!]
<DemiMarie>
Is Nouveau known to be buggy and crashy in general?
<DemiMarie>
airlied: is that because the performance was so horrible nobody considered it to be worth working on?
<DemiMarie>
and is this expected to improve in the future?
<airlied>
maybe, and maybe? :)
<airlied>
there's a lot of nvidia hw, and lots of quirks and lots of corner cases
smilessh has quit [Ping timeout: 480 seconds]
<airlied>
along with a lack of developer manpower
<psykose>
something something 'nvk+zink is the future' etc
<DemiMarie>
How much of that will be dealt with by the GSP firmware in the future?
<airlied>
depends on what is causing the crashes :)
<airlied>
but most of the lower level hw stuff should be, the GL driver not so much
<DemiMarie>
I ask because a Qubes user just reported a nouveau crash and my recommendation was to disable the nvidia card if they could get away with it.
<DemiMarie>
Is the plan to use NVK + Zink?
<psykose>
i don't think there is any "plan" you can count on
<DemiMarie>
These are kernel crashes
<airlied>
recent kernels did see a bunch of stabilisation work, so the kernel recovered better from userspace screwing up
<airlied>
but not sure what the crash is to classify it
Piraty has quit [Remote host closed the connection]
Piraty has joined #dri-devel
<daniels>
sgruszka: sure, done
<sgruszka>
daniels: thank you very much!
test has quit [Ping timeout: 480 seconds]
bbhtt has joined #dri-devel
<daniels>
np
bbhtt has quit []
Leopold_ has quit [Remote host closed the connection]
Leopold_ has joined #dri-devel
dtmrzgl has quit []
dtmrzgl has joined #dri-devel
kts has quit [Quit: Konversation terminated!]
jkrzyszt has joined #dri-devel
<karolherbst>
jenatali: ever looked at CL subgroups, specifically how one would implement the SubgroupsPerWorkgroupId part?
lynxeye has quit [Ping timeout: 480 seconds]
<jenatali>
No, I've done Vulkan subgroups but that's it
<karolherbst>
I'm inclined to just ignore SubgroupsPerWorkgroupId for now...
frieder has quit [Ping timeout: 480 seconds]
<karolherbst>
mhhh.. uhh...
<karolherbst>
I'm stupid... ignore me :D
<karolherbst>
we should parse those things in vtn and not clc
aravind has joined #dri-devel
lemonzest has quit [Quit: WeeChat 3.6]
macromorgan has quit [Remote host closed the connection]
macromorgan has joined #dri-devel
frieder has joined #dri-devel
lynxeye has joined #dri-devel
apinheiro has quit [Quit: Leaving]
lemonzest has joined #dri-devel
heat_ has quit [Remote host closed the connection]
heat_ has joined #dri-devel
frieder has quit [Ping timeout: 480 seconds]
agd5f has quit [Read error: Connection reset by peer]
<DemiMarie>
karolherbst: this sounds like something is corrupting kernel memory via DMA. I recommend forcing everything to use shadow buffers and seeing if anything that the device should not be writing to gets written to.
<karolherbst>
DemiMarie: I even boot with iommu enabled
<karolherbst>
and there have been iommu related bugs at least.. not sure if a GPU could just get around it or if we map memory we shouldn't?
frieder has joined #dri-devel
agd5f has joined #dri-devel
<robmur01>
karolherbst: if the IOMMU is actually being used for DMA and you're not seeing any faults, I'd be inclined to suspect a bug the other way round
<robmur01>
i.e. something in the driver on the CPU side assumes a DMA address ( == arbitrary IOMMU VA) is a physical address and ends up stomping the wrong memory that way
<karolherbst>
yeah..... dunno really
<karolherbst>
I just wished this would be easier to debug...
<robmur01>
assuming the GPU supports >32-bit DMA, I'd try booting with "iommu.forcedac=1" and seeing if that makes any obvious difference
oneforall2 has quit [Remote host closed the connection]
<robmur01>
(that will move DMA addresses up to larger IOVAs that are less likely to collide with physical memory and more likely to make bugs of that type blow up visibly)
oneforall2 has joined #dri-devel
jfalempe has quit [Read error: Connection reset by peer]
jfalempe_ has joined #dri-devel
<DemiMarie>
karolherbst: booting with IOMMU enabled isn’t enough.
<DemiMarie>
You need to ensure that IOTLB flushes are not deferred and that sub-page regions at the start or end of a mapping are bounced via shadow buffers.
flibitijibibo has joined #dri-devel
<DemiMarie>
This is not the default because it has a severe performance penalty, but it is necessary when dealing with untrusted devices. A device that may be corrupting memory certainly qualifies, and right now that could be any device in the system.
<karolherbst>
so what kernel parameters do I have to set to figure this out?
<robmur01>
"iommu.strict=1" does the former, I don't think we have a way to force the full-on external-facing port behaviour in the kernel
<robmur01>
you might need a small hack in set_pcie_untrusted() if you want to try that
pallavim has joined #dri-devel
bmodem has quit [Ping timeout: 480 seconds]
frieder has quit [Remote host closed the connection]
<robmur01>
(assuming that the idea of constructing entire ACPI table overrides in your initrd to do it "properly" is unappealing :D)
frieder has joined #dri-devel
<karolherbst>
well.. let's see what happens
smilessh has quit [Read error: Connection reset by peer]
smiles_1111 has joined #dri-devel
itoral has quit [Remote host closed the connection]
fab has quit [Quit: fab]
heat has joined #dri-devel
heat_ has quit [Read error: No route to host]
yuq825 has quit []
macromorgan has quit [Quit: Leaving]
agd5f has quit [Read error: Connection reset by peer]
agd5f has joined #dri-devel
iive has joined #dri-devel
i509vcb has joined #dri-devel
jkrzyszt_ has joined #dri-devel
jkrzyszt has quit [Ping timeout: 480 seconds]
fab has joined #dri-devel
frieder has quit [Ping timeout: 480 seconds]
kts has joined #dri-devel
<DemiMarie>
karolherbst: what about adding a command-line option to treat all devices as untrusted? TrenchBoot needs this anyway.
<karolherbst>
I just want to fix the bug I'm having
<DemiMarie>
This should help
<DemiMarie>
If the bug is a rogue DMA transfer, strict IOMMU invalidation will hopefully catch it.
<karolherbst>
yeah... no idea
<DemiMarie>
I suggest trying iommu.strict=1 and seeing if you get any faults.
<karolherbst>
not yet at least
<karolherbst>
but maybe I should remove iommu.forcedac=1 again
<karolherbst>
it's a pain that it might take like 2 hours for the bug to trigger
Duke`` has joined #dri-devel
<DemiMarie>
Is this on i915 or Xe?
<karolherbst>
nouveau
fxkamd has joined #dri-devel
anholt has joined #dri-devel
djbw_ has quit [Read error: Connection reset by peer]
tzimmermann has quit [Quit: Leaving]
heat_ has joined #dri-devel
heat has quit [Read error: No route to host]
benjaminl has joined #dri-devel
jewins has joined #dri-devel
pixelcluster_ has joined #dri-devel
pixelcluster has quit [Ping timeout: 480 seconds]
djbw_ has joined #dri-devel
pochu has quit [Ping timeout: 480 seconds]
sgruszka has quit [Remote host closed the connection]
dfip^ has joined #dri-devel
lynxeye has quit [Quit: Leaving.]
mbrost has joined #dri-devel
<karolherbst>
mhhhhh "[ 559.500131] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP NOPTI"
<HdkR>
That's a powerful address
<karolherbst>
#define POISON_FREE 0x6b /* for use-after-free poisoning */
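A small userspace sketch of why that address points at use-after-free poisoning: filling a pointer-sized slot with the 0x6b poison byte gives exactly 0x6b6b6b6b6b6b6b6b, and on x86_64 that value is non-canonical, so dereferencing it raises a general protection fault instead of an ordinary page fault. The helper names are made up for illustration, and the canonical check assumes 48-bit virtual addresses (4-level paging).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define POISON_FREE 0x6b /* same byte value as the kernel's use-after-free poison */

/* Hypothetical helper: what a pointer field looks like after the object
 * holding it was freed and filled with the poison byte. */
static uint64_t poisoned_pointer_value(void)
{
    uint64_t v;
    memset(&v, POISON_FREE, sizeof(v));
    return v;
}

/* Assumption: 48-bit virtual addresses. An address is canonical when
 * bits 63..47 are all zeros or all ones. */
static bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47; /* the upper 17 bits */
    return top == 0 || top == 0x1ffff;
}
```

Calling `is_canonical_48(poisoned_pointer_value())` returns false, which matches the "non-canonical address" wording in the oops.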
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
<karolherbst>
maybe it's just a core kernel bug...
<karolherbst>
maybe I'll try KASAN again, because now I have a different reproducer which might make it more reliable...
<mareko>
I'm considering making pipe_resource* a 32-bit pointer on x86_64, using fixed high 32 bits
<karolherbst>
what would be the expected perf increase? Kinda doesn't feel worth the effort
<mareko>
the question is how can we reserve/allocate a 4GB range of virtual address space and then make physical allocation within that range
<karolherbst>
mmap
Guest2235 has quit [Ping timeout: 480 seconds]
<karolherbst>
so you cut out a private hole and then manage that region yourself
<karolherbst>
e.g. with util_vma_heap
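A rough sketch of the "cut out a private hole" idea on Linux, in C. It reserves address space only (PROT_NONE, nothing committed) and commits pieces on demand. The function names are hypothetical, and aligning the hole to 4 GiB is an added assumption here, made so that every address inside it shares the same high 32 bits; a plain mmap gives no such guarantee. Suballocation inside the hole (for example with something like util_vma_heap, as suggested above) is left out.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS / MAP_NORESERVE on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

_Static_assert(sizeof(void *) >= 8, "sketch assumes a 64-bit address space");

#define HOLE_SIZE ((size_t)1 << 32) /* 4 GiB of virtual address space */

/* Reserve a 4 GiB-aligned, 4 GiB-sized range without committing memory.
 * Over-reserve 8 GiB, then unmap the unaligned head and the unused tail. */
static void *reserve_hole_4g(void)
{
    uint8_t *raw = mmap(NULL, 2 * HOLE_SIZE, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    uintptr_t start = (uintptr_t)raw;
    uintptr_t base = (start + HOLE_SIZE - 1) & ~((uintptr_t)HOLE_SIZE - 1);
    size_t head = base - start;     /* bytes before the aligned base */
    size_t tail = HOLE_SIZE - head; /* bytes after the aligned 4 GiB */

    if (head)
        munmap(raw, head);
    if (tail)
        munmap((uint8_t *)base + HOLE_SIZE, tail);
    return (void *)base;
}

/* Make part of the hole usable. offset and size must be page-aligned. */
static int commit_pages(void *hole, size_t offset, size_t size)
{
    assert(offset + size <= HOLE_SIZE);
    return mprotect((uint8_t *)hole + offset, size, PROT_READ | PROT_WRITE);
}
```

Physical pages are only faulted in once a committed range is actually touched, so the cost of the reservation itself is virtual address space and page-table bookkeeping.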
<karolherbst>
but it kinda doesn't feel worth the effort, as you will have to concat the address on the consumer side? And the memory savings don't feel worth the effort either?
<karolherbst>
maybe a hash table would be a more robust approach here
<karolherbst>
mhh but then comes with higher mem usages...
<karolherbst>
potentially
<karolherbst>
I think the main question here is, what's the problem which is solved by this
<mareko>
we are going to have buffer lists as array of pointers in TC which are going to be large, and then we get a lot of 64-bit pointers in the gallium interface that we push into TC command buffers
<mareko>
the goal is to save cache space
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
<karolherbst>
alternatively, all pipe_resources are 0x40 aligned, with that the pointer would fit into 5 bytes on most/all consumer platforms
<karolherbst>
ehh.. 6 I guess
<jenatali>
Please keep in mind that Linux isn't the only Gallium consumer here
<karolherbst>
mhh
pixelcluster_ has quit []
pixelcluster has joined #dri-devel
<HdkR>
128-bit pointer ISAs want a word with that assumption :P
<karolherbst>
maybe the solution here is to make all of our pointer hashing thing aware of the VM pointer size at runtime and use optimized caching/storage
<karolherbst>
so if the VM is 48 bit, just use that instead of full 64
<karolherbst>
alignment of the data being hashed _could_ be factored in as well
<karolherbst>
to drop another byte if things align well
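The byte counts being thrown around here can be checked with a little arithmetic: a pointer needs (VA bits minus log2 of the alignment) significant bits, rounded up to whole bytes. A minimal sketch in C, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* log2 of a power-of-two alignment. */
static unsigned log2_pow2(uint64_t align)
{
    unsigned shift = 0;
    assert(align != 0 && (align & (align - 1)) == 0);
    while ((align >> shift) != 1)
        shift++;
    return shift;
}

/* Bytes needed to store a pointer once the always-zero low alignment bits
 * and the unused bits above the virtual address width are dropped. */
static unsigned packed_pointer_bytes(unsigned va_bits, uint64_t align)
{
    unsigned significant = va_bits - log2_pow2(align);
    return (significant + 7) / 8;
}
```

With a 48-bit VA, 0x40 alignment leaves 42 significant bits, which still needs 6 bytes; it takes 0x100 alignment to get down to 5 bytes and 0x10000 alignment to reach 4.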
junaid has joined #dri-devel
<karolherbst>
but I don't think making them 32 bit pointers everywhere is worth the effort here, given what it does to the consumption of pipe_resources in general
<mareko>
only in the interface
<mareko>
you can call them 32-bit handles instead of pointers
<karolherbst>
I think my point is rather, it makes everything more complex with gains we don't know are even worth it, because everybody will have to translate from pointer to handle and back
<mareko>
that seems like the easy part
<DemiMarie>
karolherbst: ugh, _that_ driver. I would not be surprised if the long-term fix is to rewrite it from scratch, preferably in Rust.
<karolherbst>
mareko: I don't say it's hard, I say it's _annoying_
<mareko>
I think it would be _lovely_
<karolherbst>
I'm sure we have other places we can save more memory than that
<mareko>
I think that's the only improvement we can make for TC
<karolherbst>
I meant it more general
<robclark>
mareko: alternative is big table of prsc ptrs with each prsc having a unique idx into the table?
<robclark>
you should then be able to get away with less than 32b to id a prsc
<mareko>
each pipe_resource already has a unique index
<mareko>
a table adds cache usage
<mareko>
and an indirection
<karolherbst>
you already have that by cutting the high bits
<karolherbst>
because you have to store and read those from somewhere anyway
<robclark>
depends on when and where you deref it via the table
<mareko>
that's not an indirection
<karolherbst>
it kinda is
<mareko>
load(load()) is an indirection
<karolherbst>
okay.. so load(load() + offset) isn't?
<DemiMarie>
TC = ?
<karolherbst>
threaded context
<mareko>
more like load(low32 | high32)
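For readers following the load(low32 | high32) argument, a minimal sketch in C of what such a 32-bit handle could look like. All names are hypothetical and nothing here is existing Mesa API; it assumes every resource lives in one 4 GiB-aligned window, so the upper 32 bits are a single shared value, and that the first byte of the window is never handed out, so handle 0 can stand for NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

_Static_assert(sizeof(uintptr_t) == 8, "sketch assumes 64-bit pointers");

/* Hypothetical: the upper 32 bits shared by all resources, set once when
 * the 4 GiB window is reserved. Low 32 bits of this value are zero. */
static uintptr_t resource_high32;

/* Pointer to handle: keep only the low 32 bits. */
static inline uint32_t resource_to_handle(const void *res)
{
    uintptr_t addr = (uintptr_t)res;
    assert(res == NULL || (addr >> 32) == (resource_high32 >> 32));
    return (uint32_t)addr;
}

/* Handle to pointer: one OR with the shared value, no table lookup. */
static inline void *handle_to_resource(uint32_t handle)
{
    if (handle == 0)
        return NULL;
    return (void *)(resource_high32 | handle);
}
```

The disagreement in the channel is about the read of `resource_high32`: it is a load, but of one value common to all resources rather than a per-resource table entry.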
<karolherbst>
and where do you get the high32 from?
<karolherbst>
they are not magically there if you need them
<mareko>
L1 cache
<hch12907>
high32 is small enough to not need a load() I think
<DemiMarie>
How about using relative addresses?
<karolherbst>
so you'd pin that value into L1?
<mareko>
I don't follow
<DemiMarie>
Are these CPU or GPU addresses?
<karolherbst>
CPU
<karolherbst>
how do you make sure the high bits stay in L1?
<DemiMarie>
I think the idea is that high32 would be stored in one place and used a lot, so it should be in the L1 cache most of the time.
<karolherbst>
I dunno, this feels like premature optimization without even knowing the gains
<karolherbst>
what if it's <1%
<mareko>
there is a massive perf difference between an indirection table and a single value in memory that's equal for all resources
<karolherbst>
this kinda feels like not worth the effort honestly
<karolherbst>
oh sure
<karolherbst>
but you still have to construct the actual pointer via data in some memory location
<mareko>
no
<DemiMarie>
Why are these large?
<DemiMarie>
Lots of buffer objects?
<mareko>
the compiler will CSE loading the high bits
<karolherbst>
it still has to load the high bits...
<mareko>
there are probably 1000 other loads that have nothing to do with this
<mareko>
it's a win for cache usage
<mareko>
TC braches will be smaller and TC buffer lists will be smaller
<mareko>
*batches
<karolherbst>
if TC wants to do anything smart internally, that's fine, I just say it's not worth the effort making life harder for everybody else
<karolherbst>
I already suggested that high bits could be cut out generally in hash tables where pointers are keys or something
<karolherbst>
and then we don't talk about 64 vs 32 anymore, but more like 48 vs 32
<karolherbst>
could condense it even more if you try hard enough
<karolherbst>
could align allocations to 0x100 or even 0x10000
<karolherbst>
which is still better than cutting high bits
<mareko>
sounds the same to me
<karolherbst>
no, because you can do the magic transparent to everybody
<karolherbst>
if you'd have data showing "this saves us 10%+ memory usage" I'd reconsider, but without data I say it's not worth it
<mareko>
only perf data
<robclark>
mareko: if driver tells TC sizeof(struct drv_pipe_resource) then they could be suballocated out of a big array of drv_pipe_resource
<mareko>
robclark: it needs to be an accessible pointer
<robclark>
then you could easily map btwn idx and ptr.. and not even need special alignment of that block
<robclark>
but you can easily convert idx to ptr or vice versa via ptr math
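A sketch of the index/pointer math for such a pool, in C. The struct and function names are hypothetical; `stride` stands in for the sizeof(struct drv_pipe_resource) that the driver would report, and how the backing block is allocated is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pool: one contiguous block holding `capacity` driver
 * resource structs of `stride` bytes each. */
struct resource_pool {
    uint8_t *base;     /* start of the block */
    size_t stride;     /* size of one driver resource struct */
    uint32_t capacity; /* number of slots */
};

/* Index to pointer: base plus idx * stride. */
static void *pool_idx_to_ptr(const struct resource_pool *pool, uint32_t idx)
{
    assert(idx < pool->capacity);
    return pool->base + (size_t)idx * pool->stride;
}

/* Pointer to index: byte offset from base divided by the stride. */
static uint32_t pool_ptr_to_idx(const struct resource_pool *pool,
                                const void *ptr)
{
    size_t offset = (size_t)((const uint8_t *)ptr - pool->base);
    assert(offset % pool->stride == 0);
    assert(offset / pool->stride < pool->capacity);
    return (uint32_t)(offset / pool->stride);
}
```

Because the conversion only needs the base and the stride, the block needs no special alignment, and the index can be narrower than 32 bits if the pool is small enough.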
<mareko>
ok
<robclark>
and that only touches drivers that do use TC, so at least it isn't treewide
<jenatali>
You'd probably still want to reserve the array instead of committing it upfront and then commit bits of it as you allocate more resources
<robclark>
just need to convert TC drivers to use new allocator for their drv_resource structs
<robclark>
jenatali: yeah, allocate it as a big bunch and rely on unused parts not getting faulted in ?
<jenatali>
At least on Windows, there's a difference between commit and working set. Working set is what you touch, where malloc adds to commit. You can also reserve just VA without any commit backing it
<jenatali>
I don't know too much about Linux in this space
<karolherbst>
kinda the same
<robclark>
yeah, you might want to allocate it via mmap instead of malloc
<mareko>
32-bit pointers seem the least intrusive
<robclark>
with table, you could probably get away w/ 16b
<mareko>
oh we do have much more buffers than 2^16
<karolherbst>
could also split it in two 16 bit values and if we run out of space you just allocate a second table and use the higher bits to offset the table
<karolherbst>
but mmaping space is annoying because then you increase VM usage quite a bit
<mareko>
no tables
<karolherbst>
yeah...
<karolherbst>
I'd rather aligned_alloc all pipe_resources with a suitable alignment
<karolherbst>
so you only have 32 bits used
<karolherbst>
and then you can handle it all inside TC
<karolherbst>
but I guess that has other drawbacks...
<mareko>
yes, sparsely mapped page tables
jkrzyszt has joined #dri-devel
jkrzyszt_ has quit [Ping timeout: 480 seconds]
jkrzyszt has quit [Ping timeout: 480 seconds]
aravind has quit [Ping timeout: 480 seconds]
rauji___ has quit []
OftenTimeConsuming_ has joined #dri-devel
OftenTimeConsuming is now known as Guest2269
Leopold_ has quit [Remote host closed the connection]
OftenTimeConsuming_ is now known as OftenTimeConsuming
Guest2269 has quit [Remote host closed the connection]
<karolherbst>
this is like 5 bugs in one I think..
pochu has joined #dri-devel
pochu has quit []
idr has joined #dri-devel
mbrost has quit [Ping timeout: 480 seconds]
dliviu has quit [Remote host closed the connection]
dliviu has joined #dri-devel
junaid has quit [Remote host closed the connection]
<DavidHeidelberg[m]>
MrCooper: hey! The Fedora changes. You want to keep the custom builds? I mean, they can always be added when a new version is needed. What do you think?
Kayden has quit [Quit: -> JF]
ngcortes has joined #dri-devel
tursulin has quit [Ping timeout: 480 seconds]
smiles_1111 has quit [Read error: Connection reset by peer]
alyssa has joined #dri-devel
gouchi has joined #dri-devel
gouchi has quit [Remote host closed the connection]
benjaminl has quit [Ping timeout: 480 seconds]
jewins has quit [Ping timeout: 480 seconds]
vliaskov has quit []
benjaminl has joined #dri-devel
<airlied>
uggh is the ppc64le ci buildroot broken?
<airlied>
oh maybe it was temporary
benjamin1 has joined #dri-devel
JohnnyonFlame has joined #dri-devel
jljusten has quit [Quit: WeeChat 3.8]
<karolherbst>
wasn't there an option in the kernel to trace who was calling a kworker thing?
benjaminl has quit [Ping timeout: 480 seconds]
jewins has joined #dri-devel
jljusten has joined #dri-devel
sima has quit [Ping timeout: 480 seconds]
Kayden has joined #dri-devel
agd5f_ has joined #dri-devel
agd5f has quit [Read error: Connection reset by peer]
fab has quit [Quit: fab]
fab has joined #dri-devel
benjamin1 has quit [Quit: WeeChat 3.8]
peelz has joined #dri-devel
benjaminl has joined #dri-devel
peelz is now known as Guest2279
qyliss has quit [Quit: bye]
qyliss has joined #dri-devel
agd5f has joined #dri-devel
agd5f_ has quit [Read error: Connection reset by peer]
rasterman has quit [Quit: Gettin' stinky!]
junaid has joined #dri-devel
junaid has quit []
Duke`` has quit [Ping timeout: 480 seconds]
junaid has joined #dri-devel
junaid has quit []
junaid has joined #dri-devel
junaid has quit []
junaid has joined #dri-devel
<mareko>
"git log DEAD" in Mesa opens a time machine
<alyssa>
I don't think I realized Emma went that far back =D
Kayden has quit [Quit: coordinate change]
fab has quit [Quit: fab]
<FLHerne>
If you mean the name, isn't that just .mailmap ?
<FLHerne>
if you mean the person, then I didn't either, so my previous line is kind of pointless, sorry
<mareko>
that's because the commit sha1 begins with dead
<mattst88>
oh, hahaha
<idr>
SiS driver. Dang.
Kayden has joined #dri-devel
<gfxstrand>
dcbaker: Little ping on meson + crates. Are we getting to where I should be trying to use a magic meson branch again and drop my mesonified crates?
<gfxstrand>
dcbaker: I'm eyeing some other really nifty crates I'd like to use like enum_map
<dcbaker>
gfxstrand: The plan is to start landing pieces of it. I have a series that adds the core of the translation layer (lexers, parsers, some meson AST builder helpers, etc). I *think* that part will land this week, there's no serious review requests left on it
<gfxstrand>
dcbaker: Cool!
<dcbaker>
Then the plan is to start adding/fixing stuff piece by piece
<dcbaker>
There's a gstreamer/gnome dev who's been doing a ton of the followup work already
<dcbaker>
so I'm hopeful by 1.2 we'll have at least some experimental support
<karolherbst>
ohh nice..
<karolherbst>
I want to use syn :3
<gfxstrand>
dcbaker: When is 1.2?
<dcbaker>
there's no set date yet
<dcbaker>
and there's still a lot of stuff that devs want merged that's not even reviewed yet, so...
<gfxstrand>
So, if I'm planning to merge NVK and NAK into Mesa main yet this year, will the stuff I need be in a released Meson version?
<dcbaker>
Usually we have a release ~every two months
<dcbaker>
and the last release was in April...
<dcbaker>
I'd guess 1.2 will freeze by the end of the month
kts has quit [Quit: Konversation terminated!]
<gfxstrand>
Ok, that sounds reasonable.
iive has quit [Quit: They came for me...]
heat__ has joined #dri-devel
heat_ has quit [Read error: Connection reset by peer]
oneforall2 has quit [Remote host closed the connection]