ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard + Bifrost + Valhall - Logs https://oftc.irclog.whitequark.org/panfrost - I don't know anything about WSI. That's my story and I'm sticking to it.
kinkinkijkin has quit [Read error: Connection reset by peer]
kinkinkijkin has joined #panfrost
alyssa has joined #panfrost
<alyssa>
bbrezillon: stepri01: Do we have a way for userspace to allocate GPU memory that's cached on the CPU (rather than WC)?
<alyssa>
i.e. BASE_MEM_CACHED_CPU, I think
<alyssa>
seemingly panfrost_gem_create_object maps as WC if the device is not coherent
<alyssa>
if the device is coherent, then we get cached
<alyssa>
so then my question is, who is coherent, and was that a heuristic or a rule
<alyssa>
grepping the arm64 folder, no mediatek is coherent
<alyssa>
no rockchip
<alyssa>
meson-g12b *is* coherent
<alyssa>
(on the Mali!)
<alyssa>
so none of the devices I care about are coherent
<alyssa>
which is why I'm getting WC mappings
<alyssa>
I guess I should write the easy patch to let userspace override that
<alyssa>
because we *do* have enough info in userspace to make decent decisions about when to cache GPU memory
<alyssa>
and we *are* getting bit by getting uncached memory when we don't expect it
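[Editor's sketch] The decision alyssa describes above (WC when the device is not coherent, cached when it is, plus a userspace override) could look roughly like the following. This is a minimal illustration, not the panfrost code: `SKETCH_BO_CACHED_CPU`, `struct sketch_device`, and `sketch_map_wc` are hypothetical names invented here, and no such flag is claimed to exist in the real uAPI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical creation flag; NOT part of the real panfrost uAPI. */
#define SKETCH_BO_CACHED_CPU (1u << 0)

/* Hypothetical stand-in for the per-device state the driver consults. */
struct sketch_device {
    bool coherent; /* true when the GPU is I/O-coherent with the CPU */
};

/* Returns true when the CPU mapping should be write-combined (WC). */
static bool sketch_map_wc(const struct sketch_device *dev, uint32_t flags)
{
    assert(dev != NULL);

    /* Coherent devices always get cached mappings. */
    if (dev->coherent)
        return false;

    /* Non-coherent: stay WC unless userspace explicitly asked for cached. */
    return !(flags & SKETCH_BO_CACHED_CPU);
}
```

With this shape, existing userspace that passes no flags keeps today's behaviour, and only callers that opt in take on the cache maintenance burden.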
DVulgaris has quit [Ping timeout: 480 seconds]
DVulgaris has joined #panfrost
Dr_Who has quit []
Dr_Who has joined #panfrost
stipa is now known as Guest1551
stipa has joined #panfrost
Guest1551 has quit [Ping timeout: 480 seconds]
Leopold_ has quit [Remote host closed the connection]
Dr_Who has quit []
Leopold_ has joined #panfrost
rasterman has joined #panfrost
<bbrezillon>
alyssa: creating CPU-cached mappings is doable, even on non-coherent systems (etnaviv allows it), you just need to provide new ioctls to flush/invalidate caches (see ETNAVIV_GEM_CPU_{PREP,FINI})
<bbrezillon>
and then surround userspace accesses to such cached BOs with CPU_PREP+FINI calls
<bbrezillon>
it's probably safer to keep BOs you intend to export uncached, because the importer has to take care of flushing/invalidating the cache too. There's DMA_BUF_IOCTL_SYNC for that, and any BO you re-import to panfrost can have CPU_PREP/FINI hooked up for those, so, in theory, if all actors do the right thing it's safe, but I don't know if that's the case in practice.
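[Editor's sketch] The bracketing pattern bbrezillon describes, shown here with DMA_BUF_IOCTL_SYNC on an exported buffer, since that ioctl already exists in `<linux/dma-buf.h>`. A driver-private CPU_PREP/FINI pair would be used the same way around each CPU access. The helper names `sketch_dmabuf_sync` and `sketch_read_u32` are made up for illustration, and the sketch assumes `map` is an existing CPU mapping of the buffer behind `dmabuf_fd`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Issue one DMA_BUF_IOCTL_SYNC, retrying if the call was interrupted. */
static int sketch_dmabuf_sync(int dmabuf_fd, uint64_t flags)
{
    struct dma_buf_sync sync = { .flags = flags };
    int ret;

    do {
        ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));

    return ret;
}

/* Read one 32-bit word from a CPU-cached mapping of a dma-buf,
 * bracketing the access with SYNC_START / SYNC_END. */
static int sketch_read_u32(int dmabuf_fd, const volatile uint32_t *map,
                           size_t index, uint32_t *out)
{
    assert(map != NULL && out != NULL);

    /* Begin CPU read access: lets the exporter invalidate stale cache lines. */
    if (sketch_dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ))
        return -1;

    *out = map[index];

    /* End CPU access so the device can safely use the buffer again. */
    return sketch_dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ);
}
```

Every actor touching the buffer from the CPU has to do this, which is the "if all actors do the right thing" caveat above.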
paulk-ter has joined #panfrost
paulk-bis has quit [Ping timeout: 480 seconds]
Leopold_ has quit [Remote host closed the connection]
Leopold has joined #panfrost
<stepri01>
alyssa: bbrezillon's answers look right to me. In some cases it's actually better for user space to generate the data in a shadow buffer and then memcpy it into the WC buffer for the GPU (or similarly memcpy out for reading).
<stepri01>
The actual performance can also vary somewhat from platform to platform, which makes optimising tricky.
<stepri01>
Patches welcome ;)
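[Editor's sketch] The shadow-buffer approach stepri01 mentions: build the data in ordinary cached memory, then copy it into the WC mapping in one sequential pass. All names here (`sketch_fill_fn`, `sketch_fill_ramp`, `sketch_upload_via_shadow`) are hypothetical, and `wc_map` is assumed to be an already-mapped WC buffer of at least `size` bytes. In the test below a plain array stands in for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback that generates `size` bytes of data into `dst`. */
typedef void (*sketch_fill_fn)(uint8_t *dst, size_t size, void *user);

/* Example generator: writes a byte ramp 0, 1, 2, ... (wrapping at 256). */
static void sketch_fill_ramp(uint8_t *dst, size_t size, void *user)
{
    (void)user;
    for (size_t i = 0; i < size; i++)
        dst[i] = (uint8_t)i;
}

/* Generate data in a cached shadow buffer, then stream it into the
 * write-combined mapping with a single memcpy. Returns 0 on success. */
static int sketch_upload_via_shadow(void *wc_map, size_t size,
                                    sketch_fill_fn fill, void *user)
{
    assert(wc_map != NULL && fill != NULL);

    if (size == 0)
        return 0;

    uint8_t *shadow = malloc(size);
    if (!shadow)
        return -1;

    /* Any read-modify-write or random access happens in cached memory. */
    fill(shadow, size, user);

    /* The WC mapping only sees one sequential, write-only pass. */
    memcpy(wc_map, shadow, size);

    free(shadow);
    return 0;
}
```

The same idea works in reverse for reads: memcpy out of the WC mapping once, then do the scattered reads from the cached copy. Whether either direction is a win depends on the platform, as noted above.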
Daanct12 has quit [Remote host closed the connection]
Dr_Who has joined #panfrost
<italove>
is there a good way to debug `Access to unknown memory a01e140 in ../src/panfrost/lib/genxml/decode.c:1148` ?
<alyssa>
bbrezillon: oh, grumble.....
<alyssa>
on Apple it sufficed to just set `map_wc = false`
<alyssa>
but I guess that means Apple chips are system-coherent
<alyssa>
stepri01: The case I'm interested in is accelerating reads from GPU memory
<alyssa>
for a few predictable cases
kinkinkijkin has quit [Quit: Leaving]
rasterman has quit [Quit: Gettin' stinky!]
atler is now known as Guest1629
atler has joined #panfrost
guillaume_g has quit []
Guest1629 has quit [Ping timeout: 480 seconds]
stipa has quit [Ping timeout: 480 seconds]
alyssa has quit [Quit: leaving]
stipa has joined #panfrost
alyssa has joined #panfrost
<alyssa>
ok, yes, I see that mt8192 is very much not coherent
<alyssa>
maybe I don't want to read GPU memory so badly after all ...
<robclark>
if you can get away with only supporting cached on aarch64 you don't need uabi for cache ops (although maybe you need uabi when allocating the buffer in the first place)
<alyssa>
Interesting
<alyssa>
I see etnaviv has dma_sync_sgtable_* calls
<alyssa>
But obviously if I can avoid the extra ioctls I would like to
<alyssa>
what makes arm64 different from arm32?
<robclark>
cache ops via dma-api are a bit less great for msm because of how the iommu integration works.. I guess most drivers can just use dma_sync_*
<alyssa>
but I don't really have to care about arm32 as long as I don't regress it, so happy to keep the new fancy as arm64 only
<robclark>
That was my conclusion too ;-)
stipa has quit [Ping timeout: 480 seconds]
<alyssa>
:-D
avane has quit [Ping timeout: 480 seconds]
avane has joined #panfrost
stipa has joined #panfrost
<HdkR>
good news, FEX also only cares about arm64 ;P
karolherbst has quit [Remote host closed the connection]