ChanServ changed the topic of #etnaviv to: #etnaviv - the home of the reverse-engineered Vivante GPU driver - Logs https://oftc.irclog.whitequark.org/etnaviv
chewitt has quit [Quit: Zzz..]
_whitelogger has joined #etnaviv
cengiz_io has quit [Quit: cengiz_io]
cengiz_io has joined #etnaviv
frieder has joined #etnaviv
DPA- has joined #etnaviv
DPA has quit [Read error: Connection reset by peer]
pcercuei has joined #etnaviv
JohnnyonFlame has quit [Read error: Connection reset by peer]
lynxeye has joined #etnaviv
<mwalle> lynxeye: welcome back :p
chewitt has joined #etnaviv
agx has quit [Read error: Connection reset by peer]
agx has joined #etnaviv
agx has quit [Read error: Connection reset by peer]
agx has joined #etnaviv
frieder has quit [Ping timeout: 480 seconds]
frieder has joined #etnaviv
<mwalle> lynxeye: I saw you changed the dma_coherent_bitmask from 32 to 40 bits in 1a866306e0fbf3ca. The LS1028A reference manual, as well as the IMX8MM and IMX8MQ ones, says the address bus is 32 bits wide. And in fact it doesn't work for the ls1028a, whereas 32-bit DMA addresses work just fine
<mwalle> lynxeye: do you know if you are getting DMA addresses >32bit from dma_alloc_wc() ?
<lynxeye> mwalle: The DMA addressing restriction on 8MQ and 8MM is done via a dma-ranges in the DT.
<lynxeye> mwalle: 40bit addressing should work fine for the MMU mapped pages, but you might need to restrict the initial command buffer allocation to 32bits, as the FE is started via a 32bit physical address before the MMU is set up.
<lynxeye> mwalle: Does it work if you add GFP_DMA32 to the dma_alloc_wc in etnaviv_cmdbuf.c ?
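(The suggestion above is to restrict the initial command buffer allocation to 32-bit addresses, since the FE fetches it through a 32-bit physical address before the MMU is up. A sketch of the idea: in the kernel this would be roughly `dma_alloc_wc(dev, size, &paddr, GFP_KERNEL | GFP_DMA32)` in etnaviv_cmdbuf.c; below is a userspace stand-in that just expresses the FE's constraint on the resulting handle. The helper name is made up for illustration.)

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a DMA handle is reachable through the FE's 32-bit fetch
 * address, i.e. the backing memory lives below 4 GiB. This is the
 * property GFP_DMA32 is supposed to guarantee for the allocation. */
bool fe_can_fetch(uint64_t dma_handle)
{
    return dma_handle <= UINT32_MAX;
}
```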
<mwalle> lynxeye: I'm just trying that
<lynxeye> mwalle: I mean I have no clue about the ls1028a bus architecture, it might as well have an addressing-restricted bus between SMMU and GPU. In that case you need to add dma-ranges to the DT rather than change the driver.
<mwalle> lynxeye: makes sense, btw that of_dma_configure() doesn't work at the point it is currently called, because the iommu isn't correctly set up yet
<mwalle> (i need to call it in _probe())
<lynxeye> mwalle: Oh, good to know. I don't have any system with Vivante GPU cores and an IOMMU yet, so everything in the driver is just written in a way that hopefully doesn't blow up spectacularly when used with an IOMMU, but it has never been tested before.
<mwalle> lynxeye: i'll post a patch, but I haven't come around to explaining what's going wrong yet
<mwalle> lynxeye: mh, I'm still getting DMA handles >32bit with GFP_DMA32. Now I'm wondering if this only affects the main memory addresses, e.g. after the iommu translation
sravn has quit [Remote host closed the connection]
sravn has joined #etnaviv
<lynxeye> marex: mwalle: reading the backlog: WC isn't cacheable, but bufferable, so there is no cache flush for the pagetable entries, just a barrier. The relevant barrier is the full memory barrier executed when changing the wait/link in the kernel command buffer to point to the user command stream. This ensures that the writes to the pagetables are done before the GPU sees the write that will instruct it to jump to the command stream using the new pagetable.
<mwalle> lynxeye: funny, I just saw that this morning (the mb() call). Although need_flush is true, I'm getting an iommu fault where the GPU is trying to access the old cmdbuf
<mwalle> (for the buffer which should flush the cache, if I'm not mistaken)
<marex> lynxeye: CPU mb() causes memory to be flushed from write buffers now ? I thought mb() prevents write reordering
<marex> lynxeye: so if you do mb(), how does that guarantee the GPU sees the write to the page tables ?
<lynxeye> marex: Sure, if you have a write after the mb the barrier will cause the write buffers to drain.
<lynxeye> The mb defines the order in which other observers will see the writes, as the PT writes are before the barrier those writes are visible to the GPU before the jump command becomes visible.
<lynxeye> According to the ARM architecture the mb and writes could be deferred indefinitely, so you would need to do a readback if you really want to flush the jump command write. But then no actual HW implementation does this, as withholding a write for too long isn't improving performance and will break assumptions of existing programs.
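(The ordering described above can be sketched in userspace with C11 atomics standing in for the kernel's mb(): the pagetable-style writes come first, then a full fence, then the store that publishes the jump target. The names and sizes below are illustrative stand-ins, not the driver's actual structures.)

```c
#include <stdatomic.h>
#include <stdint.h>

#define NUM_PTES 4

uint32_t pagetable[NUM_PTES];   /* stand-in for the WC pagetable */
_Atomic uint32_t wait_link;     /* stand-in for the wait/link word */

/* Publish new PTEs, then patch the wait/link so the GPU jumps to the
 * user command stream. The full fence orders the PT writes before the
 * wait_link write, so any observer that sees the new wait_link value
 * also sees the PTE writes. */
void publish_mapping(uint32_t pte_value, uint32_t jump_cmd)
{
    for (int i = 0; i < NUM_PTES; i++)
        pagetable[i] = pte_value;   /* PT writes, before the barrier */

    atomic_thread_fence(memory_order_seq_cst);  /* the mb() in the log */

    /* Only now make the jump command visible. */
    atomic_store_explicit(&wait_link, jump_cmd, memory_order_relaxed);
}
```

As noted above, the fence only orders the writes; nothing forces the last store out immediately, so a readback would be needed to truly "flush" it, though real hardware drains write buffers promptly anyway.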
<marex> lynxeye: so, maybe this needs to be re-evaluated ? It seems the issue is present at least since 2017 and it was always shot down with some such explanation, but never fixed
<mwalle> lynxeye: what I don't understand: the mb() doesn't influence the GPU caches, does it? are these coherent?
<marex> mwalle: the devcoredump dumps the CPU-side view of the MMUv2 btw, so if you see old entry there, it is the CPU that has the old data
<lynxeye> mwalle: No, the relevant caches on the GPU side for the pagetable is the TLB, which is flushed explicitly in one way or the other (different ways for different GPU gens).
<lynxeye> marex: Withholding the last write does not influence correctness, it's strictly a performance tradeoff.
JohnnyonFlame has joined #etnaviv
<lynxeye> marex: The GPU job execution might run behind the PT updates for quite some jobs. A bug in our pagetable/tlb management code is much more likely than a memory ordering issue.
<mwalle> lynxeye: mh, the MTLB has pointers to the STLB page, right? if i'm reading the code correctly, that is only 32bit
<lynxeye> mwalle: Uh, right. The upper 8bits of the 40bit address should be folded into the lower bits 4..12 of the address just like it's done for the stlb entries.
<lynxeye> At least as far as I remember, Vivante kernel driver might tell you more...
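(A hypothetical sketch of that folding, going by the description above rather than the actual etnaviv or Vivante code: STLB pages are 4 KiB aligned, so the low 12 bits of an MTLB entry are free, and the 8 upper bits of a 40-bit address fit at entry bits 4..11. The function name, exact bit positions, and the PRESENT bit value are assumptions.)

```c
#include <stdint.h>

#define MMUv2_PTE_PRESENT 0x1u  /* assumed valid-entry bit */

/* Hypothetical encoding of a 40-bit STLB page address into a 32-bit
 * MTLB entry: keep address bits [31:12] in place and fold address
 * bits [39:32] into entry bits [11:4]. */
uint32_t mtlb_entry_40bit(uint64_t stlb_phys)
{
    uint32_t low  = (uint32_t)(stlb_phys & 0xFFFFF000u);   /* bits [31:12] */
    uint32_t high = (uint32_t)((stlb_phys >> 32) & 0xFFu); /* bits [39:32] */

    return low | (high << 4) | MMUv2_PTE_PRESENT;
}
```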
<mwalle> lynxeye: that nice piece of code *g
<mwalle> lynxeye: ok, after which point is the mmu enabled?
<lynxeye> mwalle: Yea, before looking at it you might want to either a) remove all alcoholic beverages from your proximity to avoid starting drinking or b) prepare lots of alcoholic beverages so you don't run out.
<lynxeye> mwalle: etnaviv_iommuv2_restore_sec loads the MMU state. After that point the MMU is enabled and can not be disabled again until the GPU is fully reset.
<mwalle> lynxeye: too late, I've already had a peek last week and know what's ahead ;)
<lynxeye> marex: Also, please don't conflate all MMU fault issues you come across as having the same reason. There are a lot of reasons why you might encounter an MMU fault other than a misprogrammed pagetable. With MMUv2 we allow userspace to manage the address space and all other GPU state, so userspace is totally free to shoot itself in the foot, e.g. by doing a draw to a buffer programmed to be larger than the actual buffer allocation, which will lead to an MMU fault. As the fault is isolated to a single process, userspace is free to do so, much like you get a segfault on the CPU if you do an OOB access to malloc memory. It should not happen, but bugs in the Mesa driver can cause faults for a multitude of reasons.
<marex> oddly enough , the fault always happens after etnaviv_iommuv2_restore is called, see above
<lynxeye> marex: We don't do a full GPU reset in the runtime resume path. So if your GPU loses partial state over a runtime PM cycle (MMU setup gone, but MMU enable bit in the MMIO register still set) we will wrongly skip the MMU setup.
<marex> lynxeye: and that happens on mx6qp/mx8mq/stm32mp1, all of them ?
<marex> while the system is under extreme load no less ?
<marex> seems rather unlikely
<lynxeye> marex: I haven't seen such an issue on mx6 and mx8mq. Have you actually root caused this to be the same issue on all those chips? The i.MX6QP issue you linked to is definitely not happening in the MMU restore. This was caused by OOB accesses from the 2D GPU, as the armada driver wasn't clipping some operations properly.
<lynxeye> Also please note that MMUv1 is not even able to raise exceptions, so there's no surprise in all the MMU fault issues only happening on MMUv2 GPUs...
<marex> lynxeye: the mx6qp issue, based on that discussion, was never really solved
<marex> lynxeye: the mx8mq issue was never really solved either
frieder has quit [Remote host closed the connection]
<lynxeye> marex: Please don't conflate all those MMU fault issues implying they have the same reason. You wouldn't do this with segfaults on the CPU, would you?
<marex> lynxeye: if you read the whole discussion for the mx6qp, it doesn't seem like this really fully solved the MMU faults there though
<lynxeye> marex: Even if that's the case, you don't want to mix other ill-specified reports of MMU faults into your specific issue. If you can pinpoint that your issue always happens in the MMU restore, which might be caused by inconsistent state, the first thing to try would be to check whether it goes away if you reset the GPU in the runtime PM resume. We probably still don't want to do this generally, as it would add latency in the resume path, but it would enable us to actually work on a solution for your problem at hand, instead of discussing a multitude of different issues.
lynxeye has quit [Quit: Leaving.]
pcercuei has quit [Quit: dodo]