#panfrost on 2023-01-17 — irc logs at oftc.irclog.whitequark.org

2022-12-21 00:45 ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard + Bifrost + Valhall - Logs https://oftc.irclog.whitequark.org/panfrost - I don't know anything about WSI. That's my story and I'm sticking to it.

00:04 kinkinkijkin has quit [Read error: Connection reset by peer]

01:18 kinkinkijkin has joined #panfrost

02:05 alyssa has joined #panfrost

02:06 <alyssa> bbrezillon: stepri01: Do we have a way for userspace to allocate GPU memory that's cached on the CPU (rather than WC)?

02:06 <alyssa> i.e. BASE_MEM_CACHED_CPU, I think

02:11 <alyssa> seemingly panfrost_gem_create_object maps as WC if the device is not coherent

02:11 <alyssa> if the device is coherent, then we get cached

02:11 <alyssa> so then my question is, who is coherent, and was that a heuristic or a rule

02:13 <alyssa> grepping the arm64 folder, no mediatek is coherent

02:13 <alyssa> no rockchip

02:13 <alyssa> meson-g12b *is* coherent

02:13 <alyssa> (on the Mali!)

02:14 <alyssa> so none of the devices I care about are coherent

02:14 <alyssa> which is why I'm getting WC mappings

02:14 <alyssa> I guess I should write the easy patch to let userspace override that

02:15 <alyssa> because we *do* have enough info in userspace to make decent decisions about when to cache GPU memory

02:15 <alyssa> and we *are* getting bit by getting uncached memory when we don't expect it

02:32 DVulgaris has quit [Ping timeout: 480 seconds]

02:41 DVulgaris has joined #panfrost

04:22 Dr_Who has quit []

04:26 Dr_Who has joined #panfrost

04:35 stipa is now known as Guest1551

04:36 stipa has joined #panfrost

04:41 Guest1551 has quit [Ping timeout: 480 seconds]

05:01 Leopold_ has quit [Remote host closed the connection]

05:04 Dr_Who has quit []

05:07 Leopold_ has joined #panfrost

07:54 rasterman has joined #panfrost

08:19 <bbrezillon> alyssa: creating CPU-cached mapping is doable, even on non-coherent systems (etnaviv allows it), you just need to provide new ioctls to flush/invalidate caches (see ETNAVIV_GEM_CPU_{PREP,FINI})

08:25 <bbrezillon> and then surround userspace accesses to such cached BOs with CPU_PREP+FINI calls

08:34 <bbrezillon> it's probably safer to keep BOs you intend to export uncached, because the importer has to take care of flushing/invalidating the cache too. There's DMA_BUF_IOCTL_SYNC for that, and any BO you re-import to panfrost can have CPU_PREP/FINI hooked up for those, so, in theory, if all actors do the right thing it's safe, but I don't know if that's the case in practice.
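
For reference, the importer-side bracketing bbrezillon mentions can be sketched with the existing dma-buf uAPI. This is a minimal sketch, not panfrost code: the helper names `dmabuf_sync` and `read_dmabuf_bracketed` are hypothetical, and it assumes `map` is a CPU mapping of the buffer behind `dmabuf_fd`. Only `DMA_BUF_IOCTL_SYNC`, `struct dma_buf_sync` and the `DMA_BUF_SYNC_*` flags come from `<linux/dma-buf.h>`.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Hypothetical helper: issue DMA_BUF_IOCTL_SYNC on a dma-buf fd,
 * retrying if a signal interrupts the call. */
static int dmabuf_sync(int dmabuf_fd, uint64_t flags)
{
    struct dma_buf_sync sync = { .flags = flags };
    int ret;

    do {
        ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
    } while (ret == -1 && errno == EINTR);

    return ret;
}

/* Hypothetical helper: copy `size` bytes out of a CPU mapping of a dma-buf,
 * bracketed by SYNC_START/SYNC_END so the exporter can invalidate or flush
 * CPU caches as needed. Returns 0 on success, -1 with errno set on failure.
 * Nothing is copied if the START call fails. */
int read_dmabuf_bracketed(int dmabuf_fd, const void *map, void *dst, size_t size)
{
    if (dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ) == -1)
        return -1;

    memcpy(dst, map, size);

    return dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ);
}
```

A driver-private CPU_PREP/FINI pair would presumably be used the same way: one call before the CPU access, one call after it.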

09:28 paulk-ter has joined #panfrost

09:30 paulk-bis has quit [Ping timeout: 480 seconds]

11:32 Leopold_ has quit [Remote host closed the connection]

11:33 Leopold has joined #panfrost

12:24 <stepri01> alyssa: bbrezillon's answers look right to me. In some cases it's actually better for user space to generate the data in a shadow buffer and then memcpy it into the WC buffer for the GPU (or similarly memcpy out for reading).
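
The shadow-buffer approach stepri01 describes can be sketched as follows. This is an illustration under stated assumptions: `upload_prefix_sums` and `sum_via_shadow` are hypothetical names, and the `wc_dst`/`wc_src` pointers stand in for a write-combined GPU mapping (any plain buffer works for trying it out). The point is that all read-modify-write traffic stays in cached memory and the WC mapping only sees one sequential memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build prefix sums of 0..count-1 in a cached shadow
 * buffer (each step reads back the previous element, which would be slow on
 * a WC mapping), then copy the result into the WC mapping in one pass. */
void upload_prefix_sums(void *wc_dst, uint32_t *shadow, size_t count)
{
    assert(wc_dst != NULL && shadow != NULL);
    if (count == 0)
        return;

    shadow[0] = 0;
    for (size_t i = 1; i < count; i++)
        shadow[i] = shadow[i - 1] + (uint32_t)i;

    memcpy(wc_dst, shadow, count * sizeof(*shadow));
}

/* Hypothetical helper for the read direction: copy out of the WC mapping
 * once, then do all further reads on the cached copy. */
uint64_t sum_via_shadow(const void *wc_src, uint32_t *shadow, size_t count)
{
    assert(wc_src != NULL && shadow != NULL);

    memcpy(shadow, wc_src, count * sizeof(*shadow));

    uint64_t sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += shadow[i];
    return sum;
}
```

Whether this beats a cached mapping plus explicit cache maintenance depends on the platform and the access pattern, so it would need measuring per SoC.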

12:25 <stepri01> The actual performance can also vary somewhat from platform to platform, which makes optimising tricky.

12:25 <stepri01> Patches welcome ;)

13:28 Daanct12 has quit [Remote host closed the connection]

13:29 Dr_Who has joined #panfrost

15:22 <italove> is there a good way to debug `Access to unknown memory a01e140 in ../src/panfrost/lib/genxml/decode.c:1148` ?

15:48 <alyssa> bbrezillon: oh, grumble.....

15:49 <alyssa> on Apple it sufficed to just set `map_wc = false`

15:49 <alyssa> but I guess that means Apple chips are system-coherent

15:52 <alyssa> stepri01: The case I'm interested in is accelerating reads from GPU memory

15:52 <alyssa> for a few predictable cases

16:08 kinkinkijkin has quit [Quit: Leaving]

16:16 rasterman has quit [Quit: Gettin' stinky!]

16:47 atler is now known as Guest1629

16:47 atler has joined #panfrost

16:47 guillaume_g has quit []

16:48 Guest1629 has quit [Ping timeout: 480 seconds]

17:25 stipa has quit [Ping timeout: 480 seconds]

17:46 alyssa has quit [Quit: leaving]

18:30 stipa has joined #panfrost

18:35 alyssa has joined #panfrost

18:35 <alyssa> ok, yes, I see that mt8192 is very much not coherent

18:39 <robclark> alyssa: fwiw, https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20550/diffs?commit_id=b177842ec7e443154eed698dcf9e049faa1c8ab9

18:40 <alyssa> robclark: oof.

18:40 <alyssa> maybe I don't want to read GPU memory so badly after all ...

18:41 <robclark> if you can get away with only supporting cached on aarch64 you don't need uabi for cache ops (although maybe you need uabi when allocating the buffer in the first place)

18:42 <alyssa> Interesting

18:42 <alyssa> I see etnaviv has dma_sync_sgtable_* calls

18:42 <alyssa> But obviously if I can avoid the extra ioctls I would like to

18:42 <alyssa> what makes arm64 different from arm32?

18:43 <robclark> cache ops via dma-api are a bit less great for msm because of how the iommu integration works.. I guess most drivers can just use dma_sync_*

18:43 <alyssa> but I don't really have to care about arm32 as long as I don't regress it, so happy to keep the new fancy as arm64 only

18:44 <robclark> That was my conclusion too ;-)

18:46 stipa has quit [Ping timeout: 480 seconds]

18:46 <alyssa> :-D

19:11 avane has quit [Ping timeout: 480 seconds]

19:15 avane has joined #panfrost

20:25 stipa has joined #panfrost

21:08 <HdkR> good news, FEX also only cares about arm64 ;P

22:50 karolherbst has quit [Remote host closed the connection]

23:04 karolherbst has joined #panfrost

23:21 stipa has quit [Quit: leaving]

23:21 stipa has joined #panfrost

23:22 stipa has quit []

23:22 stipa has joined #panfrost