ChanServ changed the topic of #etnaviv to: #etnaviv - the home of the reverse-engineered Vivante GPU driver - Logs https://oftc.irclog.whitequark.org/etnaviv
mvlad has joined #etnaviv
JohnnyonFlame has quit [Read error: Connection reset by peer]
frieder has joined #etnaviv
pcercuei has joined #etnaviv
lynxeye has joined #etnaviv
dos1 has quit [Ping timeout: 480 seconds]
dos1 has joined #etnaviv
<tomeu> lynxeye: and btw, I had to do this so the device would poweroff on unbind: https://paste.debian.net/1286986/
<tomeu> that's on 5.17, hopefully it has been fixed in mainline properly
<lynxeye> tomeu: Uh, unbind is a path that's not really well tested. Not sure if we even make sure the GPU is cleanly shut down there...
<lynxeye> Thanks for bringing this to my attention.
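(The paste itself isn't reproduced in the log. Purely as a hedged illustration: a clean unbind would idle the GPU and drop its runtime-PM reference so the power domain can actually turn off. A minimal sketch against the kernel's runtime-PM API, with etnaviv_gpu_wait_idle() assumed from the driver; the actual fix in the paste may look different.)

```c
#include <linux/pm_runtime.h>

/*
 * Hypothetical sketch only -- not the contents of the paste above.
 * On component unbind, idle the hardware and release runtime PM so
 * the GPU's power domain can be shut off.
 */
static void example_gpu_unbind(struct device *dev, struct device *master,
			       void *data)
{
	struct etnaviv_gpu *gpu = dev_get_drvdata(dev);

	/* Wait (up to 100 ms) for in-flight work to drain. */
	etnaviv_gpu_wait_idle(gpu, 100);

	/* Drop our reference and force a runtime suspend now, instead of
	 * leaving the device powered until autosuspend would kick in. */
	pm_runtime_put_sync_suspend(dev);
	pm_runtime_disable(dev);
}
```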
<lynxeye> tomeu: regarding your caching issue: what cache mode are you using on the result BO?
<tomeu> lynxeye: all buffers are the default DRM_ETNA_GEM_CACHE_WC
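(For context, the cache mode is fixed when the buffer is allocated. A minimal sketch using libdrm's etnaviv wrappers; switching the result BO to DRM_ETNA_GEM_CACHE_CACHED would be one way to test whether the WC mapping is involved.)

```c
#include <stdint.h>
#include <stddef.h>
#include <etnaviv_drmif.h>

/* Allocate the result BO either write-combined (the default used here)
 * or CPU-cached, to compare behaviour. */
static struct etna_bo *alloc_result_bo(struct etna_device *dev,
				       size_t size, int cached)
{
	uint32_t flags = cached ? DRM_ETNA_GEM_CACHE_CACHED
				: DRM_ETNA_GEM_CACHE_WC;

	return etna_bo_new(dev, size, flags);
}
```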
<lynxeye> tomeu: is the NPU dma-coherent on this platform?
<lynxeye> We had some issues with the writecombined buffers and the PL310 L2 cache, which had a bad default setting, so bufferable reads could hit the cache. But I haven't seen such an issue on any arm64 platform.
<tomeu> don't really know, but the blob is calling flush and invalidate operations on that buffer quite a bit
<tomeu> well, on all buffers, for that matter
<lynxeye> if it's dma-coherent, it shouldn't need any manual cache maintenance, so flush/invalidate would be no-ops
<tomeu> hmm, then I guess I should make sure that whatever cache the NPU is using is flushed, and invalidate the CPU cache?
<lynxeye> WC reads bypass the CPU cache if the platform isn't broken in some way that would allow such reads to hit stale cache entries.
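(For completeness: on a non-coherent platform, any CPU-side maintenance is done by the kernel when userspace brackets its accesses with the cpu_prep/cpu_fini ioctls. A minimal sketch with libdrm's wrappers, reading back a result buffer.)

```c
#include <string.h>
#include <etnaviv_drmif.h>

/* Read back the NPU's result: cpu_prep waits for the hardware and lets
 * the kernel do any required CPU cache maintenance before we touch the
 * buffer; cpu_fini ends the CPU access window. */
static int read_result(struct etna_bo *bo, void *dst, size_t size)
{
	void *map = etna_bo_map(bo);

	if (!map)
		return -1;

	etna_bo_cpu_prep(bo, DRM_ETNA_PREP_READ);
	memcpy(dst, map, size);
	etna_bo_cpu_fini(bo);

	return 0;
}
```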
<tomeu> the buffer that the blob uses is userspace-allocated, btw
<lynxeye> Looking at the NPU caches seems like a good course of action; however, I would have expected the issue to also show up with small buffers if it's simply data stuck in an NPU write cache.
<tomeu> I'm not 100% sure it is a cache issue with the output buffer though, because when it happens, I get a GPU hang
<tomeu> so it could be a caching issue with any of the input buffers?
<lynxeye> does the GPU hang happen on exactly the job with the bad data? Or do you see bad data in a job before the hang as well?
<tomeu> there is no bad data if there is no hang
<lynxeye> okay. reason I'm asking is that with the write caches there is a common pattern: data gets stuck in the cache due to insufficient flushing in one job, then the next job pushes it out via cache replacement, with the writes then targeting an invalid address or corrupting data of the new job.
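(To make that pattern concrete: the usual mitigation is an explicit cache flush at the end of every job. A sketch using libdrm's cmdstream helper; the LOAD_STATE encoding and the GL.FLUSH_CACHE register and bit values are copied here from the etnaviv rnndb headers, so treat them as illustrative.)

```c
#include <etnaviv_drmif.h>

/* Values as found in the rnndb-generated headers used by etnaviv
 * (cmdstream.xml / state.xml); reproduced here for illustration. */
#define VIV_FE_LOAD_STATE_HEADER_OP_LOAD_STATE	0x08000000
#define VIVS_GL_FLUSH_CACHE			0x0000380c
#define VIVS_GL_FLUSH_CACHE_COLOR		0x00000002
#define VIVS_GL_FLUSH_CACHE_SHADER_L2		0x00000040

/* Emit a LOAD_STATE that flushes the write caches at the end of a job,
 * so no dirty data can leak into the next job via cache replacement. */
static void emit_cache_flush(struct etna_cmd_stream *stream)
{
	etna_cmd_stream_emit(stream, VIV_FE_LOAD_STATE_HEADER_OP_LOAD_STATE |
			     (1 << 16) |		  /* state count */
			     (VIVS_GL_FLUSH_CACHE >> 2)); /* state address */
	etna_cmd_stream_emit(stream, VIVS_GL_FLUSH_CACHE_COLOR |
			     VIVS_GL_FLUSH_CACHE_SHADER_L2);
}
```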
<tomeu> oh, this happens also on the first job
<tomeu> lynxeye: something that might be interesting: when buffers are big enough to get into trouble, the tile width no longer matches the tile height, as the latter is limited to 10
<tomeu> this HW has the SEPARATE_TILE_STATUS_WHEN_INTERLEAVED feature enabled
<tomeu> and the blob logs interleavemode: 1 when tile width != tile height
<tomeu> I don't see anywhere in the cmdstream how a separate address for the tile status buffer could be passed, though
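(A hypothetical reconstruction of the behaviour described above, for illustration only: tile height saturating at 10, with the blob's interleavemode flag following from width != height.)

```c
#include <stdint.h>

#define MAX_TILE_HEIGHT 10	/* observed limit, per the discussion above */

struct tile_cfg {
	uint32_t width;
	uint32_t height;
	int interleave_mode;
};

/* Hypothetical helper mirroring the observed blob behaviour: the tile
 * height is clamped to 10, and "interleavemode: 1" is logged whenever
 * the resulting tile is no longer square. */
static struct tile_cfg pick_tile(uint32_t wanted)
{
	struct tile_cfg cfg = {
		.width  = wanted,
		.height = wanted < MAX_TILE_HEIGHT ? wanted : MAX_TILE_HEIGHT,
	};

	cfg.interleave_mode = (cfg.width != cfg.height) ? 1 : 0;
	return cfg;
}
```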
<lynxeye> does the NPU even use tile status?
sravn has quit [Read error: Connection reset by peer]
frieder has quit [Ping timeout: 480 seconds]
frieder has joined #etnaviv
frieder has quit [Ping timeout: 480 seconds]
frieder has joined #etnaviv
sravn has joined #etnaviv
Leopold has joined #etnaviv
<tomeu> lynxeye: haven't seen any evidence of such, but there are features named gcvFEATURE_NN_INTERLEAVE8 and gcvFEATURE_NN_FULLCACHE_KERNEL_INTERLEAVE_FIX that point towards it
frieder has quit [Remote host closed the connection]
lynxeye has quit [Quit: Leaving.]
JohnnyonFlame has joined #etnaviv
JohnnyonFlame has quit [Ping timeout: 480 seconds]
mvlad has quit [Remote host closed the connection]
JohnnyonFlame has joined #etnaviv
Leopold has quit [Remote host closed the connection]
Leopold has joined #etnaviv
pcercuei has quit [Quit: dodo]