ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://oftc.irclog.whitequark.org/panfrost - <macc24> i have been here before it was popular
Level has quit [Remote host closed the connection]
rkanwal has quit [Ping timeout: 480 seconds]
erle has joined #panfrost
danct12_ has joined #panfrost
danct12_ has quit []
danct12_ has joined #panfrost
danct12_ has quit []
Daanct12 has joined #panfrost
Daanct12 has quit [Remote host closed the connection]
Danct12 has quit [Remote host closed the connection]
Net147 has quit [Quit: Quit]
Net147 has joined #panfrost
Danct12 has joined #panfrost
davidlt has joined #panfrost
Daanct12 has joined #panfrost
Daanct12 has quit [Remote host closed the connection]
Daanct12 has joined #panfrost
Daanct12 has quit []
davidlt has quit [Ping timeout: 480 seconds]
CME has quit [Ping timeout: 480 seconds]
CME has joined #panfrost
guillaume_g has joined #panfrost
q4a has quit [Quit: Page closed]
davidlt has joined #panfrost
MajorBiscuit has joined #panfrost
Major_Biscuit has joined #panfrost
MajorBiscuit has quit [Ping timeout: 480 seconds]
rasterman has joined #panfrost
erle has quit [Read error: Connection reset by peer]
rkanwal has joined #panfrost
davidlt has quit [Remote host closed the connection]
davidlt has joined #panfrost
davidlt has quit [Ping timeout: 480 seconds]
erle has joined #panfrost
davidlt has joined #panfrost
MajorBiscuit has joined #panfrost
Major_Biscuit has quit [Ping timeout: 480 seconds]
nlhowell has joined #panfrost
Major_Biscuit has joined #panfrost
MajorBiscuit has quit [Ping timeout: 480 seconds]
Danct12 has quit [Remote host closed the connection]
Danct12 has joined #panfrost
Major_Biscuit has quit [Ping timeout: 480 seconds]
Net147 has quit [Ping timeout: 480 seconds]
Net147 has joined #panfrost
nlhowell has quit [Ping timeout: 480 seconds]
Major_Biscuit has joined #panfrost
Major_Biscuit has quit [Ping timeout: 480 seconds]
atler is now known as Guest1777
atler has joined #panfrost
Guest1777 has quit [Ping timeout: 480 seconds]
davidlt has quit [Ping timeout: 480 seconds]
Net147 has quit [Ping timeout: 480 seconds]
Net147 has joined #panfrost
jambalaya has quit [Remote host closed the connection]
jambalaya has joined #panfrost
Rathann has joined #panfrost
davidlt has joined #panfrost
Pu244 has joined #panfrost
<Pu244> I'm doing some analysis with an ODroid N2+ that I'm using as a desktop, with the 5.16 kernel and Panfrost (of the built in variety, I assume), and I'm seeing what appear to be some fairly massive memory leaks, or at least a massive (and non-returning) reduction in the CmaFree value in /proc/meminfo after doing various things in the GUI - browsers, etc. Killing gdm3 and allowing it to restart also leads to a ~50MB drop in CmaFree, even once it restarts.
<Pu244> I don't quite understand Cma well enough to know if this is expected behavior that will resolve when it "hits zero," but when the available Cma is low, I start getting failures to launch new windows - they complain of DRM_IOCTL_MODE_CREATE_DUMB failed: Cannot allocate memory.
<Pu244> Which implies that if there is some cleanup that ought to be happening, it's not happening.
<Pu244> Unfortunately, my familiarity with the Linux kernel doesn't extend to the GPU driver side of things on ARM, so I'm not quite sure what the right path forward is. The "Applications not able to open windows because there's nothing left in the CMA pool" does seem to imply some variety of bug, but my attempts to search haven't found much of use.
<Pu244> (and, yes, I'll hang around for replies, IRC is quite familiar to me)
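For anyone following along, the CmaFree drop described above can be tracked by reading the two CMA counters from /proc/meminfo. This is a minimal sketch, not a tool anyone in the channel used: the helper name `parse_cma_meminfo` is hypothetical, and the sample text is illustrative (CmaTotal matches the 800MB reservation mentioned later in the log, CmaFree matches a value Pu244 quotes, MemTotal is made up).

```python
def parse_cma_meminfo(text):
    """Return CmaTotal and CmaFree (in kB) from /proc/meminfo-style text."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in ("CmaTotal", "CmaFree"):
            # Lines look like "CmaFree:   42992 kB"; keep the number only.
            values[key] = int(rest.split()[0])
    return values


# Illustrative sample; on a live system use open("/proc/meminfo").read().
sample = "MemTotal: 3800000 kB\nCmaTotal: 819200 kB\nCmaFree: 42992 kB\n"
cma = parse_cma_meminfo(sample)
print(cma)
```

Sampling this before and after restarting gdm3 would show whether the free count really fails to recover.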
<macc24> Pu244: what config do you use?
<Pu244> Ubuntu 22.04 dev, kernel 5.16.0-odroid, stock Ubuntu desktop (GDM).
<Pu244> What details do you need?
<macc24> the kernel config
<macc24> what mesa version do you use?
<Pu244> dpkg --list | grep mesa output?
<macc24> glxinfo | grep OpenGL
<Pu244> OpenGL vendor string: Panfrost
<Pu244> OpenGL core profile version string: 3.1 Mesa 22.0.1
<Pu244> OpenGL renderer string: Mali-G52 (Panfrost)
<Pu244> OpenGL core profile shading language version string: 1.40
<Pu244> OpenGL core profile context flags: (none)
<Pu244> OpenGL core profile extensions:
<Pu244> OpenGL version string: 3.1 Mesa 22.0.1
<Pu244> OpenGL shading language version string: 1.40
<Pu244> OpenGL context flags: (none)
<Pu244> OpenGL extensions:
<Pu244> OpenGL ES profile version string: OpenGL ES 3.1 Mesa 22.0.1
<Pu244> OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10
<Pu244> OpenGL ES profile extensions:
* macc24 starts doubting the "IRC is quite familiar to me" quote
<Pu244> Sorry, quiet room. :p
<Pu244> It felt absurd to pastebin when it was an idle room.
<Pu244> In any case, the answer appears to be 22.0.1 for Mesa version.
<macc24> ...
<Pu244> Sorry, I'm not familiar with the open source Linux kernel graphics drivers in the slightest, I'm not sure which parts matter.
<Pu244> I normally lurk in other corners of the kernel.
<Pu244> Again, I'm not even 100% sure where or if this is a bug, but "Running out of memory and then unable to open new windows" seems to qualify for that category, I'm just not sure how to go about troubleshooting further.
<daniels> Pu244: so the dumb buffers are managed by the AmLogic DRM driver, not by Panfrost
<Pu244> Ok. I've seen some buffer use in /sys/kernel/debug/dma_buf/bufinfo, but it doesn't seem to account for all the memory (and that, too, seems to just grow with time).
<daniels> that shouldn't be growing over time if you're killing your clients to release memory back
<daniels> (but they also shouldn't be allocating dumb buffers)
<daniels> are you using Xorg by any chance?
<Pu244> 22.04 on the ODroid N2+ uses Wayland by default, but there's an X11 to Wayland bridge in use for some stuff.
<Pu244> Hm. There seems to be a huge patchset that very frequently references leaks: https://lore.kernel.org/lkml/20220412062944.117853472@linuxfoundation.org/
<Pu244> Though a lot of those are for other sections of the kernel.
<daniels> that's for external DisplayPort connections on AMD GPUs, and only when you're reading AMD-specific files through debugfs
<Pu244> Yeah, I noticed. :/
<Pu244> Google is exceedingly useless for me lately.
<daniels> the first step would be to grub through debugfs (including /sys/kernel/dri/) and see if clients have the allocations still active (which would be a client issue) or if the allocations should've been destroyed but are in fact being leaked somewhere
<daniels> I'm assuming from the patchset you posted that you're running upstream 5.16?
<Pu244> Ok.
<Pu244> I have no /sys/kernel/dri.
<Pu244> 5.16.0-odroid-arm64 #1 SMP PREEMPT Ubuntu 5.16.14-202203190049~jammy (2022-03-18) aarch64 aarch64 aarch64 GNU/Linux
<Pu244> It's an odroid fork of the kernel, not pure mainline.
<Pu244> I'm not sure how many changes there are between them anymore, though.
<Pu244> Perhaps /sys/debug/dri?
<Pu244> er, /sys/kernel/debug/dri
<daniels> right
<daniels> clients shouldn't be using dumb buffers though, they should just be using Panfrost's own allocations
<daniels> which clients do you see that with?
<Pu244> dri/0 has systemd-logind, Xwayland, Xwayland. dri/1 has gnome-shell, Compositor, XWayland, XWayland, gsd-xsettings (which is *not* running that I know of). dri/128 has gnome-shell, Compositor, XWayland, XWayland, and gsd-xsettings.
<Pu244> dri/1 and dri/128 only have clients, gem_names, name files, dri/0 has a bunch more.
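The per-device client lists described above can be tallied with a few lines of script. This is a rough sketch under assumptions: it assumes each row of a debugfs `clients` file ends in six whitespace-separated columns (pid, dev, master, auth, uid, magic) after the command name, which may differ between kernel versions. The helper name `count_clients` and the sample rows, including the Xwayland pid, are hypothetical.

```python
from collections import Counter


def count_clients(text):
    """Count DRM clients per command name from a debugfs 'clients' file."""
    counts = Counter()
    for line in text.splitlines()[1:]:  # skip the header row
        # Assumed layout: command pid dev master auth uid magic.
        # Split from the right so command names containing spaces survive.
        fields = line.rsplit(None, 6)
        if len(fields) == 7:
            counts[fields[0].strip()] += 1
    return counts


# Hypothetical sample; on a live system read /sys/kernel/debug/dri/1/clients.
sample = (
    "             command   pid dev master a   uid      magic\n"
    "         gnome-shell  4940   1   y    y  1000          0\n"
    "            Xwayland  5012   1   n    y  1000          0\n"
    "            Xwayland  5012   1   n    y  1000          0\n"
)
counts = count_clients(sample)
print(dict(counts))
```

Comparing these counts before and after killing a client is one way to check whether its handles are actually being released.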
<cphealy> daniels: With Amlogic SoC, IIRC, the display OSD planes do not have MMU and require contiguous buffers. Would Panfrost's own allocations still be used?
<daniels> they'd still be used for GBM, yeah
<daniels> but there's no need for every normal winsys client to try to allocate dumb buffers in that case
<Pu244> The initial cma size is set to 800MB on this board, which is quite large for a system with 4GB of RAM...
<Pu244> I guess gsd-xsettings is running, doing something... it's not onscreen, maybe it's related to the configuration/power menu or something.
Rathann has quit [Ping timeout: 480 seconds]
<Pu244> Would instrumenting CMA allocations and frees be a way to go about running this down? See what's allocating? My *expectation* would be that the GPU would release everything when the GUI was shut down.
indy has quit [Ping timeout: 480 seconds]
pendingchaos_ has joined #panfrost
<Pu244> Hrm, there's CMA_DEBUG and CMA_DEBUGFS. I'll enable those, see what I find.
pendingchaos has quit [Ping timeout: 480 seconds]
kenzie has quit [Quit: The Lounge - https://thelounge.chat]
pendingchaos_ is now known as pendingchaos
kenzie has joined #panfrost
<robmur01> CMA_DEBUGFS will give you stats about the general state of the allocator; CMA_DEBUG is basically just equivalent to "dyndbg='file cma.c +p'" (if you have dynamic debug enabled), with the net result of spamming up your log with every individual alloc/free
<Pu244> Well, if I'm trying to figure out what's going on with it...
<Pu244> If they're large allocs/frees not happening, that's useful to know about.
<Pu244> I'm losing 800MB in 10 minutes, so it can't be too subtle.
<Pu244> Am I wrong in my assumption that I shouldn't have vast swaths of CMA disappearing?
<robmur01> heh, "log spam" isn't necessarily meant to connote a *bad* thing - indeed it's invaluable information in the context - just the general level of activity you may observe ;)
<Pu244> If it's designed to fragment and get reallocated, and this is "working as intended," OK, but it doesn't seem to be.
<Pu244> ... but, yes, that's some impressive logspam while being useless about who's actually requesting it.
<robmur01> IIRC, fragmentation can kill it pretty badly if you end up not being able to migrate pages out of the CMA area due to general memory pressure
<Pu244> Hm.
<Pu244> The bitmap indicates a lot of it is used for something, at least.
<robmur01> the "retrying" or "alloc failed" messages are probably the most interesting to watch for
<robmur01> ah, in fact the one unique thing CONFIG_CMA_DEBUG will do is give you additional dumps upon total failure, I'd forgotten about that
<Pu244> So, viewing the bitmap, about half of it is used - 2^32 - 1 in int form. The other half is 0s, so a lot more of those per line.
<Pu244> Despite that, CmaFree is only showing ~100MB of 800MB.
<Pu244> That... seems at odds with at least my first guess as to what the bitmap means.
<robmur01> but whether it's failing "early" due to fragmentation or just genuinely filling up, it can't tell you who's leaking or why
<Pu244> Ok, I've got some "memory range at... is busy, retrying" errors out of cma_alloc() now.
<robmur01> kmemleak might be worth a shot if you really want to dig in
<Pu244> And a lot of the "Purging xxxx bytes" output from... something GPU related, I was of the impression it was panfrost.
<robmur01> yeah, that's just panfrost reclaiming its VA space
<Pu244> I've used Cma up entirely, but the second half of the bitmap from /sys/kernel/debug/cma/cma-reserved/bitmap is still showing 0s.
<Pu244> So... either something's weird, or that bitmap isn't at all what I think it is, which is bits for allocated pages in the CMA region.
<robmur01> I'm not 100% on how all the various memory stats work, but the general mechanism is that any part of the CMA zone which isn't explicitly allocated for a CMA buffer may still be used by other things, which are *supposed* to be kicked out when CMA does actually want the space
<robmur01> so depending on how you look at it the amount of "used" CMA area may well be more than has actually been allocated through CMA itself
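To make the bitmap reading concrete, here is a small sketch that counts set bits in the debugfs bitmap. It assumes the file prints whitespace-separated unsigned 32-bit words in decimal, which matches Pu244's description of all-ones words, and that a set bit marks space handed out through cma_alloc(). As robmur01 explains, movable pages that merely borrow the region do not show up here, so this count can be far below what CmaFree implies. The helper name and sample words are hypothetical.

```python
def count_allocated_bits(bitmap_text):
    """Return (set bits, total bits) for decimal 32-bit words in the text."""
    words = [int(w) for w in bitmap_text.split()]
    used = sum(bin(w).count("1") for w in words)
    return used, 32 * len(words)


# Hypothetical four-word bitmap: first half fully allocated, second half empty.
sample = "4294967295 4294967295 0 0"
used, total = count_allocated_bits(sample)
print(used, total)
```

Each bit may cover more than one page depending on the region's allocation granularity, so treat the result as a proportion rather than a page count.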
<Pu244> Hm, ok.
<Pu244> maxchunk is 173093, which works out, with 4k pages, to about 700MB.
<Pu244> But CmaFree: 42992 kB
<Pu244> used:25830 <-- ~105MB used.
<Pu244> count = 204800, used = 25830, maxchunk = 173093, so used + maxchunk = 198923, which would imply that there's a tiny bit of fragmentation, but... that's 100MB used, 700MB free, vs CmaFree reporting only 40MB free.
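The page arithmetic here is easy to check by hand or in a few lines. This sketch takes the figures quoted in the log (count 204800, used 25830, maxchunk 173093, CmaFree 42992 kB) and the 4k page size Pu244 states; the variable and function names are just for illustration.

```python
PAGE_KB = 4  # 4 KiB pages, as stated in the log


def pages_to_mib(pages):
    """Convert a page count to MiB."""
    return pages * PAGE_KB / 1024


count, used, maxchunk = 204800, 25830, 173093  # from the debugfs cma files
cma_free_kb = 42992                            # CmaFree from /proc/meminfo

total_mib = pages_to_mib(count)
used_mib = pages_to_mib(used)
maxchunk_mib = pages_to_mib(maxchunk)
unallocated_mib = pages_to_mib(count - used)
# Space that CMA has not handed out, yet CmaFree does not report as free.
borrowed_mib = unallocated_mib - cma_free_kb / 1024

print(total_mib, round(used_mib, 1), round(maxchunk_mib, 1))
print(round(unallocated_mib, 1), round(borrowed_mib, 1))
```

The last figure is the gap under discussion: space CMA itself has not allocated but which is not counted as free either, consistent with other movable pages occupying the region.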
<Pu244> The system is certainly under some memory pressure, though.
<Pu244> So maybe other stuff is just using that region and can't be evicted?
<robmur01> Yeah, the CMA zone is filled up with supposedly-moveable pages, which CMA then finds itself unable to actually move when it needs to - that's the "retrying" message
<Pu244> Hm, ok.
<Pu244> Yeah, allocations are failing now.
<Pu244> reserved/alloc_pages_success:314212
<Pu244> reserved/alloc_pages_fail:5265
davidlt has quit [Ping timeout: 480 seconds]
<robmur01> I'd be inclined to try a *smaller* CMA reservation, so that there's more space to kick the other stuff out to, plus less chance of it being put in the CMA zone to begin with
<Pu244> cma_alloc(): returned 0000000000000000
<Pu244> That does *not* seem valid. :/
<robmur01> if that works better, it points to the problem being general memory pressure; if it gets proportionally worse, it might indicate another root cause like someone pinning loads of pages such that they get in the way (unfortunately I don't know any good stats for diagnosing compaction/migration behaviour)
<robmur01> returning NULL is how allocation fails; nothing surprising there
<rasterman> lame. that's not surprising enough.
<rasterman> it should return like the current timestamp or something on failure
<rasterman> more exciting
<rasterman> :)
<Pu244> I suppose that's true.
<Pu244> Ok, yeah.
<Pu244> Apr 12 13:25:27 office-n2-2204 kernel: [ 2801.329096] cma: cma_alloc(): returned 0000000000000000
<Pu244> Apr 12 13:25:27 office-n2-2204 gnome-shell[4940]: DRM_IOCTL_MODE_CREATE_DUMB failed: Cannot allocate memory
<Pu244> So that is a legitimate response to the CMA failing to return anything.
<rasterman> was just scrolling up... why is CMA being used?
<robmur01> rasterman: IIRC there have been some kernel functions that could return either NULL or an ERR_PTR for failure; those are "fun"
<Pu244> Panfrost, ODroid N2+, GPU memory allocation, it seems?
<rasterman> i know the earlier s3c6410's only allowed CMA for the fb - like the first 16m of ram
<Pu244> "Running out of CMA" ~= "Graphics start glitching hard."
<rasterman> robmur01: hahaha! much more fun! :)
<Pu244> And when the display freezes for a bit, that's when it's in the retry loops.
<robmur01> yup, no display MMU, so everything has to be physically contiguous
<Pu244> Ok.
<rasterman> oh wait. thats not an exynos. it's a amlogic. so amlogic needs cma for dpu mappable/displayable buffers?
<robmur01> yes
<rasterman> aaaaarrrrr. ugh. i hate that hardware.
<rasterman> it's just asking for fragmentation pain.
<Pu244> Amlogic S922X Processor (12nm fab)
<Pu244> Mali-G52 GPU with 6 x Execution Engines (800MHz)
<Pu244> Ok.
<rasterman> yeah - just looked it up. sorry. was thinking exynos
<Pu244> So, a more capable unit can use the IOMMU to merge physical regions from all over?
<rasterman> yeah
<rasterman> it's certainly possible to run out of CMA mem but still have available memory due to fragmentation. it becomes a problem for long-lived systems
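A toy model shows how a contiguous allocator can fail with plenty of memory free. This sketch is not how the kernel tracks CMA; it uses a hypothetical 16-page region, represented as a list of booleans (True means the page is pinned and cannot be moved), to contrast total free pages with the longest contiguous free run.

```python
def largest_free_run(pages):
    """Return (total free pages, longest contiguous free run)."""
    best = run = 0
    for used in pages:
        # A pinned page resets the current run; a free page extends it.
        run = 0 if used else run + 1
        best = max(best, run)
    return pages.count(False), best


# Hypothetical 16-page region where every fourth page is pinned.
region = [i % 4 == 0 for i in range(16)]
free, longest = largest_free_run(region)
print(free, longest)
```

In this layout three quarters of the region is free, yet any request for four contiguous pages fails, which is the long-lived-system problem described above.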
<Pu244> *nods*
<Pu244> And trying to use these as a desktop, that would put a lot of stress on those things.
<robmur01> I have a feeling it's the video decoder that's really super-hungry, and the main reason for the ridiculous CMA size commonly used on Amlogic - display wanting ~128MB as here seems fairly typical AFAIK
<Pu244> Maybe. Should I try a smaller size?
<Pu244> See if general allocations will stay out of it for a while longer?
<robmur01> as I say it's certainly worth a go; if the problem is mostly one of non-CMA memory being exhausted such that the allocator just ends up playing whack-a-mole moving things around under its own feet then it should get better
<Pu244> Easy enough to drop it to 200MB and see what happens.
<Pu244> Browsers are certainly memory hungry pigs on all sides.
<Pu244> Bleh, set it to 200M, still easy to drive into failures.
<Pu244> ... no worse than it was, though.
<Pu244> Hm.
<Pu244> There's the GCMA patch set, Guaranteed Contiguous Memory Allocator, a good bit faster than CMA...
<Pu244> Alternately, is there a way to just carve off CMA regions, "Nobody else touch this" style?
<Pu244> So they're *not* usable by other code?
<robmur01> in principle, yes - you could try hacking in CONFIG_DMA_GLOBAL_POOL and replacing "linux,cma-default" with "linux,dma-default" to reserve the region purely for DMA buffers
<robmur01> you can also set up device-specific DMA pools, but those require the driver to be in on the game by calling of_reserved_mem_device_init() - quite a few DRM drivers support this, but apparently not meson, so that would need more hacking
<robmur01> (oh, and in the first instance also remove the "reusable" property - that's the defining difference between CMA and regular DMA carveouts)
<Pu244> In the dtbs?
<Pu244> I'll take a stab at that tomorrow - thanks!
<Pu244> I don't mind dedicating 200MB to the GPU *if I can actually get it when I need it.*
<Pu244> Not being too familiar with device memory allocation, CONFIG_DMA_GLOBAL_POOL enables some stuff in kernel/dma, and if I change that property in the dtb, it would then use the DMA pool for allocations, vs the CMA shared pool?
<Pu244> I'll have to find the dtbs to mess with.
<robmur01> yes, DMA_GLOBAL_POOL was really meant for obscure no-MMU setups with caches, but I don't see why having a non-specific carveout shared by all devices shouldn't work as a slightly lower-effort alternative to the "normal" approach of making the display driver understand the notion of having its own carveout
<Pu244> Ok, I'll take a look. Thanks, at least I've got a better understanding of the problem space now.
<robmur01> although by pure coincidence (while looking at something else entirely) there are potentially other stupid hacks too :D - https://github.com/raspberrypi/linux/commit/5d7ff1eb9325d91ce0b1036d2a24ba88b5d09352
icecream95 has joined #panfrost
<robmur01> "half the entire system memory" - yeah don't do that
<Pu244> Hah, yeah.
<anholt> rpi video accel can require like 1/4 of the system memory from cma without even any fragmentation, and cma is the worst, so...
<Pu244> I'd rather have something like 256MB or 384MB "dedicated GPU memory" to throw at the problem...
<Pu244> Though I guess even that will fragment over time.
<Pu244> This is all related to my perpetual insanity of using ARM SBCs as desktops...
<robmur01> As an Arm SMMU guy I will happily encourage silicon vendors towards one particular solution to the CMA problem... :D
<icecream95> Pu244: I wonder if doing this to patch Mesa would help:
<icecream95> sed -i 's/ | PIPE_BIND_SHARED//' src/gallium/drivers/panfrost/pan_resource.c
<icecream95> That should stop using CMA for window surfaces
<Pu244> robmur01, SMMU == IOMMU?
<Pu244> icecream95, I can try it!
<daniels> icecream95: _SHARED should absolutely not be invoking kmsro
<Pu244> Oh, hm, I have the kernel source checked out, not Mesa.
<Pu244> (that appears to be Mesa code, not kernel)
<Pu244> If it's not using CMA for window surfaces, won't that irritate the GPU, though? If it's non-contiguous?
<daniels> no, that's fine
<Pu244> "should not be invoking kmsro" ?
<daniels> kmsro is the thing which allocates from KMS/display devices instead of GPU devices
<daniels> there is absolutely no reason that PIPE_BIND_SHARED should cause that codepath to be taken instead of regular GPU allocations
<daniels> kmsro is only required when you need the display controller to be able to source from it directly
<Pu244> Ok, so the suggestion of removing PIPE_BIND_SHARED won't actually change anything?
<daniels> if it does, I'll be annoyed
<Pu244> Ok. :)
<Pu244> Well, the DMA approach seems worth pulling strings on, and tbh I'm more comfortable with kernel and DTB hacking right now.
guillaume_g has quit [Remote host closed the connection]
guillaume_g has joined #panfrost
<daniels> icecream95: also, don't use the dma_buf_export extension
<icecream95> daniels: What's the problem with it?
<daniels> icecream95: there are bugs with the implementation, but mostly apart from that it's just a spectacularly flawed concept
<daniels> apart from that ... no issues
<daniels> less glibly, GL as an API gives you no useful upfront specification of the thing it is you're trying to export, so you already have little hope of being able to usefully export and instead have to reallocate
guillaume_g has quit [Ping timeout: 480 seconds]
<daniels> even when you've done that, do you expect it to be a one-shot thing or at what point do you expect the backing storage for your texture to be detached from the dmabuf you exported?
<daniels> if you want to have externally-addressed storage for an FBO, just use (as much as I don't love this being the answer) GBM
<icecream95> This is an application that will only ever work with Panfrost anyway, so as long as it works there I don't care how reliable it is
<daniels> GBM works with Panfrost too :)
<icecream95> Allocating from GBM wouldn't work so well because I'm reading from the CRC data, and Panfrost currently uses out-of-band CRC for imported resources
<daniels> yeeeeaaaahhhhhhh, I mean, exporting a dmabuf should really either fail or kill the attached CRC
<daniels> since it does invalidate all assumptions you can otherwise make about what the CRC should be :P
<icecream95> OR.. the importing side should use in-band CRC and try to keep the checksums consistent
<icecream95> I have some patches which help a bit with that
<daniels> only if you report that as a separate modifier
<daniels> otherwise it has no way to know what format the CRC takes, where it is (which should be passed as a secondary plane), or that it even exists
<icecream95> My patches just assume that if there is enough leftover space in the BO, then there is CRC data...
<daniels> ...
<daniels> this is literally the example of a) what _SHARED means (that you can't do these things), and b) why dma_buf_export should not exist (because it retroactively applies _SHARED which may very well invalidate the existing resource, and even if you can swallow that, that it doesn't allow Mesa as the allocator and the client as the importer to negotiate a mutually-acceptable layout)
rkanwal has quit [Ping timeout: 480 seconds]
guillaume_g has joined #panfrost
rasterman has quit [Quit: Gettin' stinky!]