ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard + Bifrost + Valhall - Logs - I don't know anything about WSI. That's my story and I'm sticking to it.
kinkinkijkin has quit [Read error: Connection reset by peer]
kinkinkijkin has joined #panfrost
alyssa has joined #panfrost
<alyssa> bbrezillon: stepri01: Do we have a way for userspace to allocate GPU memory that's cached on the CPU (rather than WC)?
<alyssa> i.e. BASE_MEM_CACHED_CPU, I think
<alyssa> seemingly panfrost_gem_create_object maps as WC if the device is not coherent
<alyssa> if the device is coherent, then we get cached
<alyssa> so then my question is, who is coherent, and was that a heuristic or a rule
<alyssa> grepping the arm64 folder, no mediatek is coherent
<alyssa> no rockchip
<alyssa> meson-g12b *is* coherent
<alyssa> (on the Mali!)
<alyssa> so none of the devices I care about are coherent
<alyssa> which is why I'm getting WC mappings
<alyssa> I guess I should write the easy patch to let userspace override that
<alyssa> because we *do* have enough info in userspace to make decent decisions about when to cache GPU memory
<alyssa> and we *are* getting bit by getting uncached memory when we don't expect it
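The mapping rule alyssa describes, plus the override she proposes, can be modelled as a small decision function. This is only a sketch: `EXAMPLE_BO_CPU_CACHED` and `example_bo_map_wc` are hypothetical names invented here, not existing panfrost uAPI or kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical BO creation flag, invented for this sketch. */
#define EXAMPLE_BO_CPU_CACHED (1u << 0)

/*
 * Model of the mapping decision discussed above.
 * Returns true if the CPU mapping should be write-combined (WC),
 * false if it should be CPU-cached.
 */
static bool example_bo_map_wc(bool device_coherent, uint32_t flags)
{
    /* Coherent device: the log says these already get cached mappings. */
    if (device_coherent)
        return false;

    /* Non-coherent device: WC by default, cached only on explicit request. */
    return !(flags & EXAMPLE_BO_CPU_CACHED);
}
```

With no flag set, the function reproduces the behaviour described in the log (WC exactly when the device is not coherent), so existing userspace would be unaffected.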
DVulgaris has quit [Ping timeout: 480 seconds]
DVulgaris has joined #panfrost
Dr_Who has quit []
Dr_Who has joined #panfrost
stipa is now known as Guest1551
stipa has joined #panfrost
Guest1551 has quit [Ping timeout: 480 seconds]
Leopold_ has quit [Remote host closed the connection]
Dr_Who has quit []
Leopold_ has joined #panfrost
rasterman has joined #panfrost
<bbrezillon> alyssa: creating CPU-cached mapping is doable, even on non-coherent systems (etnaviv allows it), you just need to provide new ioctls to flush/invalidate caches (see ETNAVIV_GEM_CPU_{PREP,FINI})
<bbrezillon> and then surround userspace accesses to such cached BOs with CPU_PREP+FINI calls
<bbrezillon> it's probably safer to keep BOs you intend to export uncached, because the importer has to take care of flushing/invalidating the cache too. There's DMA_BUF_IOCTL_SYNC for that, and any BO you re-import to panfrost can have CPU_PREP/FINI hooked up for those, so, in theory, if all actors do the right thing it's safe, but I don't know if that's the case in practice.
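For the dma-buf side, the bracketing bbrezillon mentions might look roughly like the sketch below. `DMA_BUF_IOCTL_SYNC`, `struct dma_buf_sync` and the `DMA_BUF_SYNC_*` flags come from `<linux/dma-buf.h>`. The helper names are made up for illustration, and the caller is assumed to already hold a dma-buf fd and a CPU mapping of it.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Issue one DMA_BUF_IOCTL_SYNC, retrying if interrupted.
 * Returns 0 on success, -1 on failure with errno set. */
static int example_dmabuf_sync(int dmabuf_fd, uint64_t flags)
{
    struct dma_buf_sync sync = { .flags = flags };
    int ret;

    do {
        ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));

    return ret;
}

/* Copy `len` bytes out of a CPU mapping of a dma-buf, bracketing the
 * access with SYNC_START / SYNC_END so the exporter can do whatever
 * cache maintenance it needs. */
static int example_read_dmabuf(int dmabuf_fd, const void *map,
                               void *dst, size_t len)
{
    if (example_dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ))
        return -1;

    memcpy(dst, map, len);

    return example_dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ);
}
```

A driver-private CPU_PREP/FINI pair like the etnaviv one would presumably be used in the same start/access/end pattern, just through the driver's own ioctls instead.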
paulk-ter has joined #panfrost
paulk-bis has quit [Ping timeout: 480 seconds]
Leopold_ has quit [Remote host closed the connection]
Leopold has joined #panfrost
<stepri01> alyssa: bbrezillon's answers look right to me. In some cases it's actually better for user space to generate the data in a shadow buffer and then memcpy it into the WC buffer for the GPU (or similarly memcpy out for reading).
<stepri01> The actual performance can also vary somewhat from platform to platform, which makes optimising tricky.
<stepri01> Patches welcome ;)
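The shadow-buffer approach stepri01 describes can be sketched as follows. The function names and the prefix-sum workload are illustrative assumptions. `wc_dst` and `wc_src` stand for a CPU mapping of a WC buffer object, but any writable or readable memory works for trying the code out.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Generate data in ordinary cached memory, then push it to the GPU
 * mapping with one sequential memcpy. The generation step reads back
 * what it just wrote, which is the access pattern that tends to be
 * slow on an uncached (WC) mapping. */
static int example_upload_via_shadow(void *wc_dst, size_t count)
{
    uint32_t *shadow;

    if (count == 0)
        return 0;

    shadow = malloc(count * sizeof(*shadow));
    if (!shadow)
        return -1;

    shadow[0] = 0;
    for (size_t i = 1; i < count; i++)
        shadow[i] = shadow[i - 1] + (uint32_t)i; /* read-modify-write */

    memcpy(wc_dst, shadow, count * sizeof(*shadow));
    free(shadow);
    return 0;
}

/* Reverse direction: copy out of the GPU mapping once, then do all
 * further reads on the cached copy. */
static int example_sum_via_shadow(const void *wc_src, size_t count,
                                  uint64_t *sum_out)
{
    uint32_t *shadow;
    uint64_t sum = 0;

    if (count > 0) {
        shadow = malloc(count * sizeof(*shadow));
        if (!shadow)
            return -1;

        memcpy(shadow, wc_src, count * sizeof(*shadow));
        for (size_t i = 0; i < count; i++)
            sum += shadow[i];
        free(shadow);
    }

    *sum_out = sum;
    return 0;
}
```

Whether the extra copy wins over direct access depends on the platform and access pattern, as noted above, so this is something to measure rather than assume.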
Daanct12 has quit [Remote host closed the connection]
Dr_Who has joined #panfrost
<italove> is there a good way to debug `Access to unknown memory a01e140 in ../src/panfrost/lib/genxml/decode.c:1148` ?
<alyssa> bbrezillon: oh, grumble.....
<alyssa> on Apple it sufficed to just set `map_wc = false`
<alyssa> but I guess that means Apple chips are system-coherent
<alyssa> stepri01: The case I'm interested in is accelerating reads from GPU memory
<alyssa> for a few predictable cases
kinkinkijkin has quit [Quit: Leaving]
rasterman has quit [Quit: Gettin' stinky!]
atler is now known as Guest1629
atler has joined #panfrost
guillaume_g has quit []
Guest1629 has quit [Ping timeout: 480 seconds]
stipa has quit [Ping timeout: 480 seconds]
alyssa has quit [Quit: leaving]
stipa has joined #panfrost
alyssa has joined #panfrost
<alyssa> ok, yes, I see that mt8192 is very much not coherent
<alyssa> robclark: oof.
<alyssa> maybe I don't want to read GPU memory so badly after all ...
<robclark> if you can get away with only supporting cached on aarch64 you don't need uabi for cache ops (although maybe you need uabi when allocating the buffer in the first place)
<alyssa> Interesting
<alyssa> I see etnaviv has dma_sync_sgtable_* calls
<alyssa> But obviously if I can avoid the extra ioctls I would like to
<alyssa> what makes arm64 different from arm32?
<robclark> cache ops via dma-api are a bit less great for msm because of how the iommu integration works.. I guess most drivers can just use dma_sync_*
<alyssa> but I don't really have to care about arm32 as long as I don't regress it, so happy to keep the new fancy as arm64 only
<robclark> That was my conclusion too ;-)
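The log does not answer what makes arm64 different. One likely reason is that on arm64 Linux, userspace can execute data cache maintenance by virtual address (for example `DC CIVAC`) directly, while on arm32 those operations are privileged. Under that assumption, userspace cache maintenance without new ioctls could be sketched as below. The function names are hypothetical, and on non-arm64 builds the cache instructions are compiled out so the sketch still builds.

```c
#include <stddef.h>
#include <stdint.h>

/* Smallest data cache line size in bytes, read from CTR_EL0 on arm64. */
static size_t example_dcache_line_size(void)
{
#if defined(__aarch64__)
    uint64_t ctr;

    __asm__ volatile("mrs %0, ctr_el0" : "=r"(ctr));
    /* DminLine, bits [19:16]: log2 of the line size in 4-byte words. */
    return (size_t)4 << ((ctr >> 16) & 0xf);
#else
    return 64; /* placeholder so the sketch builds on other architectures */
#endif
}

/* Clean and invalidate every cache line overlapping [start, start + len).
 * Returns the number of lines visited. */
static size_t example_cache_clean_invalidate(const void *start, size_t len)
{
    size_t line = example_dcache_line_size();
    uintptr_t p = (uintptr_t)start & ~(uintptr_t)(line - 1);
    uintptr_t end = (uintptr_t)start + len;
    size_t lines = 0;

    for (; p < end; p += line, lines++) {
#if defined(__aarch64__)
        __asm__ volatile("dc civac, %0" : : "r"(p) : "memory");
#endif
    }

#if defined(__aarch64__)
    /* Wait for the maintenance operations to complete. */
    __asm__ volatile("dsb sy" : : : "memory");
#endif
    return lines;
}
```

This only covers the CPU cache side. As robclark notes, the kernel may still need uAPI at allocation time so the buffer is mapped cached in the first place.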
stipa has quit [Ping timeout: 480 seconds]
<alyssa> :-D
avane has quit [Ping timeout: 480 seconds]
avane has joined #panfrost
stipa has joined #panfrost
<HdkR> good news, FEX also only cares about arm64 ;P
karolherbst has quit [Remote host closed the connection]
karolherbst has joined #panfrost
stipa has quit [Quit: leaving]
stipa has joined #panfrost
stipa has quit []
stipa has joined #panfrost