ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs - <macc24> i have been here before it was popular
derzahl has joined #panfrost
Daanct12 has joined #panfrost
davidlt has joined #panfrost
davidlt has quit [Ping timeout: 480 seconds]
Daanct12 has quit [Remote host closed the connection]
Daanct12 has joined #panfrost
derzahl has quit [Read error: Connection reset by peer]
pch has quit []
hanetzer has quit []
jambalaya has quit []
DVulgaris has quit []
Lyude has quit []
soreau has quit []
floof58 has quit []
jstultz has quit []
bluebugs has quit []
rcf has quit []
cphealy has quit []
tlwoerner has quit []
narmstrong_ has quit []
erle has quit []
FLHerne has quit []
indy has quit []
mav has quit []
Consolatis has quit []
xdarklight has quit []
strongtz[m] has quit []
AreaScout_ has quit []
stebler[m] has quit []
sigmaris has quit []
enunes has quit []
go4godvin has quit []
hl` has quit []
karolherbst has quit []
atler has quit []
tomeu has quit []
ndufresne has quit []
mriesch has quit []
pjakobsson has quit []
jernej has quit []
alpernebbi has quit []
jelly has quit []
DPA- has quit []
pendingchaos has quit []
unevenrhombus[m] has quit []
urja has quit []
cyrozap has quit []
simon-perretta-img has quit []
falk689 has quit []
italove has quit []
jschwart has quit []
robmur01 has quit []
tanty has quit []
kenzie has quit []
CME has quit []
alarumbe has quit []
dhewg has quit []
bbrezillon has quit []
digetx has quit []
erle has joined #panfrost
spawacz has joined #panfrost
Major_Biscuit has joined #panfrost
davidlt has joined #panfrost
Daaanct12 has joined #panfrost
Daaanct12 has quit [Remote host closed the connection]
Daanct12 has quit [Ping timeout: 480 seconds]
rasterman has joined #panfrost
Major_Biscuit has quit [Ping timeout: 480 seconds]
<icecream95> Hmm... should I rely on userfaultfd or mprotect/SIGSEGV handlers for tracking writes to the doorbell page for v10 panwrap?
<icecream95> I guess I could even point the blob to a completely different set of pages, then proxy writes over in a different thread
<HdkR> any reason to use userfaultfd if you're monitoring in-process? That's more useful for out of process fault handling isn't it?
<icecream95> HdkR: Someone might want to override the SEGV handler, but it's less likely for userfaultfds to be messed with by the application being traced
<HdkR> Oh, you're tracing arbitrary applications? Then yea, userfaultfd is the way to go. I've never really seen anything use it
<icecream95> But does userfaultfd allow reprotecting a page as soon as an access completes?
<HdkR> faulting thread sleeps until a response is given in the userfaultfd handling, so it should?
<HdkR> Can kind of do whatever you want
Major_Biscuit has joined #panfrost
<icecream95> Hmm.. the other problem is how to copy writes from the MCU back to the userfaultfd-protected pages
<icecream95> But with how the blob works, I don't think it's super important for that to always be updated.. mostly it's to prevent overflowing the ring buffer, I think
<icecream95> And it'd take about a thousand batches for that to overflow, so it doesn't matter if we are a *little* behind
Daanct12 has joined #panfrost
guillaume_g has joined #panfrost
<icecream95> But then how do I make a page become 'missing' again? Otherwise I can only catch writes...
nlhowell has joined #panfrost
Major_Biscuit has quit [Ping timeout: 480 seconds]
nlhowell has quit [Ping timeout: 480 seconds]
<icecream95> I guess I could just mmap(MAP_FIXED) in a new page which has not been faulted in yet..
camus has quit [Read error: Connection reset by peer]
Major_Biscuit has joined #panfrost
camus has joined #panfrost
<icecream95> I wonder if it would be better to just poll memory in another thread... but that wouldn't be as fun
Daanct12 has quit [Ping timeout: 480 seconds]
<icecream95> Meh, I'll go with the wait. It's not as if the blob ever renders at more than 100 fps anyway
Major_Biscuit has quit [Ping timeout: 480 seconds]
pch has joined #panfrost
<icecream95> I have to say that Arm are very forward thinking... I'm sure people will eventually need > 4 GB ring buffers for submitting GPU command lists /s
camus has quit [Remote host closed the connection]
camus has joined #panfrost
<icecream95> Merge request for v10 support created! (against panloader, Mesa will hopefully be soon)
rkanwal has joined #panfrost
<icecream95> For those curious about what a command stream dump looks like, here is one for an IVDS job:
icecream95 has quit [Ping timeout: 480 seconds]
pch has quit [Read error: Connection reset by peer]
erle has quit [Ping timeout: 480 seconds]
MoeIcenowy has quit [Read error: Connection reset by peer]
MoeIcenowy has joined #panfrost
erle has joined #panfrost
camus has quit []
camus has joined #panfrost
rasterman has quit [Quit: Gettin' stinky!]
guillaume_g has quit []
derzahl has joined #panfrost
erle has quit [Ping timeout: 480 seconds]
davidlt has quit [Ping timeout: 480 seconds]
derzahl has quit [Remote host closed the connection]
camus has quit [Ping timeout: 480 seconds]
davidlt has joined #panfrost
CME_ has joined #panfrost
alyssa has joined #panfrost
<alyssa> I'm sure good at userspace
<alyssa> what gets me is that it's a different oops each time
<alyssa> always about the same time in the OpenCL CTS
<alyssa> those ones don't even implicate panfrost ...
<alyssa> (this is with Dmitry's fixes)
<alyssa> I don't even know where to begin with that
* alyssa enables CONFIG_DEBUG_PREEMPT
<alyssa> and lockdep, why is lockdep not enabled?
<alyssa> OK. Just enabled a big pile of debug options. Now we wait and see if I get more useless information out of this splat
<alyssa> (Unfortunately my current reproducer takes like 15 minutes)
davidlt has quit [Ping timeout: 480 seconds]
<alyssa> I must say, on the list of things I wanted to do today, I don't think "debug lock splat" made the cut .....
<alyssa> but I suppose the relevant bugs affect much more than just OpenCL
<jekstrand> Yeah, the OpenCL CTS likes to torment your threading
<alyssa> OK, I have splat!
<alyssa> jekstrand: OpenCL CTS + Karol's runner for squaring the torment
<alyssa> this is really deep in kernel guts, but at least I have helpful debug info
<alyssa> drm_gem_get_pages called shmem_read_mapping_page
<alyssa> which uses the GFP from the mapping passed in
<alyssa> this mapping is seemingly GFP_KERNEL
<alyssa> so that's part 1
<alyssa> part 2 is investigating the preemption disable
<alyssa> logs say it was disabled in get_page_from_freelist
<alyssa> that's the shrinker, I guess
<robclark> alyssa: I've seen some folio splats like that on -rc2.. hmm, but also w/ some of my own patches that do more eviction/shrink.. that said other than the shrinker connection it seems like unrelated bug?
<alyssa> robclark: IDK, I'm way over my head here
<alyssa> I don't understand where get_page_from_freelist is called from, and why it disables preemption
<robclark> it's called in page allocation path.. which can be basically anything that can allocate memory.. but things like GFP_ATOMIC should be used in allocation paths when you hold spin locks and things like that
<alyssa> right, what I mean is:
<alyssa> [ 675.102007] Preemption disabled at:
<alyssa> [ 675.102008] [<ffffb636d9c73460>] get_page_from_freelist+0x230/0x1460
<robclark> so scripts/ is a useful thing to know about.. Ie. cat your splat into that
<robclark> that'll give you line #'s
<alyssa> oh!
<alyssa> that's awesome, thanks!
<alyssa> in the past I had disassembled the kernel for that >_>
<alyssa> ok, that answers that nonsense
<alyssa> get_page_from_freelist -> rmqueue -> rmqueue_pcplist -> pcp_spin_trylock_irqsave
<alyssa> so then we're holding a spinlock when we call __rmqueue_pcplist, I guess
<robclark> the script does for lols try and decode the instructions near a crash as x86.. maybe I need to set $CROSS_COMPILE
<alyssa> You know how I solve that one ;)
<robmur01> yeah, several scripts want ARCH and/or CROSS_COMPILE
<alyssa> my next question is how we end up in this completely unrelated looking call trace while (seemingly) within this spinlock
<alyssa> mm/page_alloc.c guts are way out of my depth
<alyssa> ( for anyone following along at home)
<robmur01> FWIW first thing I'd do is try something newer than rc2. There have definitely been... issues... this cycle - rc3 didn't even boot for some of us
<alyssa> OK
<alyssa> sounds like an "interesting" rebase, currently on some downstream hell because mainlining for this SoC is stalled...
<alyssa> what would you recommend I rebase against?
<robclark> that doesn't look like something that should be atomic.. and yeah, only reason I'm on -rc2 is because that is what drm-next is on and msm-next can't be ahead of drm-next
<alyssa> robclark: hm? (the first sentence)
<robmur01> I'd expect rc5 to be a bit more solid
<robclark> you can sprinkle might_sleep() around the call-stack
<alyssa> (this branch is linux-next 20220614 plus ~200 patches, mostly SoC specific, really delightful actually ....)
<alyssa> (admittedly a lot of this seems specific to mt8195 and is maybe not needed on mt8192)
<robclark> try git-rebase first and see how badly it goes.. usually there isn't as much churn btwn -rc's compared to trying to rebase across a merge window
<alyssa> rebase on..?
<alyssa> oh, rc5, er ok
<robclark> oh, that said, the splat actually tells you:
<robclark> [ 675.102007] Preemption disabled at:
<robclark> [ 675.102008] get_page_from_freelist (mm/page_alloc.c:3813 mm/page_alloc.c:3858 mm/page_alloc.c:4293)
<alyssa> right
<robclark> that said, my line no's don't seem to match yours
<robclark> hmm, do you have CONFIG_FAIL_PAGE_ALLOC enabled?
<alyssa> no, should I?
<robclark> hmm, no.. was just trying to make sense of your line #s
<alyssa> it's linux-next 20220614, might've been changes since then
<alyssa> this might be a new regression then?
* alyssa chanced it with the rc5 rebase
<alyssa> we'll see how rc5 fares instead of linux-next, assuming no SoC support slipped through the cracks of commits in next \ rc5
<robclark> oh, linux-next .. is a great way to beta test everyone else's bugs ;-)
<alyssa> truth.
<robmur01> sounds like that one started in next-20220614 and lived for maybe a day or two - such great luck you have there!
<alyssa> robmur01: truthfully.
<robmur01> yup, my general rule of thumb would be run -next if you want to find bugs in -next, run mainline before about rc4 to check for critical bugs, run late RCs or release tags to do any actual development work
<alyssa> that sounds sane.
<robclark> yeah, same.. I try to stick to mainline when developing my own bugs and regressions :-P
<robmur01> developing on -next might count as some form of self-flagellation
<robclark> yeah
alyssa has quit [Quit: leaving]
alyssa has joined #panfrost
<alyssa> `Purging 1275068416 bytes`
<alyssa> That's a lot of bytes :|
<alyssa> Ok, so it gets a little further after the uprev, more splat though coming right up
<alyssa> that part is very clearly in panfrost, though
<robclark> hmm, 0x4c000000 bytes.. is a fairly roundish #
<alyssa> might be able to get it myself
<alyssa> robclark: TBF might just be the CL CTS being dumb
<alyssa> although refusing to cache BOs above a certain size might be wise.
<robclark> oh, yeah, I think we cap it at 64MB
Major_Biscuit has joined #panfrost
<alyssa> robclark: btw, any plans to do conformant cl 3.0 on freedreno?
<robclark> hmm, doesn't cl3 want you to have annoying things like generic pointers?
<alyssa> Seemingly not
<alyssa> cl3 made optional a pile of stuff that was mandatory in cl2
<alyssa> because that's not confusing or anything
rkanwal has quit [Ping timeout: 480 seconds]
<robclark> at any rate.. cl is firmly in the category of "I poke at it from time to time on weekends, and not a thing $day_job cares about at all"
<alyssa> got it
<alyssa> that's m1 for me, so. :p
<robclark> there is some work for clvk but (which IMO.. cl on vk still has some, umm, gaps).. but we apparently don't want to ship any native cl drivers
<alyssa> no?
<robclark> we apparently don't like things that aren't vk ;-)
<alyssa> right.
<robclark> idk, situation might be different if amd and intel had production quality mesa based cl stacks
<robclark> but I can't argue against not having more vendor gpu stacks.. intel's non-mesa video stack is bad enough ;-)
<alyssa> yeah
<anarsoul> well, it works
<anarsoul> I assume you're talking about video-decoding stack
<alyssa> mm, tasty tasty circular locking
<robclark> anarsoul: vaapi? I think we have at least three different versions of it depending on which intel chip you are talking about.. it's a mess
<anarsoul> yet it works (at least in firefox)
<anarsoul> but yeah, I agree that overall videodecoding stack in linux is a mess
<alyssa> Pass 2290 Fails 16 Crashes 6 Timeouts 0
<alyssa> so >99% by a hair. I'll take it.
<alyssa> most of the fails are math_brute_force .... delightful ....
<alyssa> crashes seemingly are more kernel bugs
<anarsoul> cl kernel or linux kernel? :)
<alyssa> linux for the crashes, cl for the fails
Major_Biscuit has quit [Ping timeout: 480 seconds]
<alyssa> officially a tomorrow problem
<alyssa> pop pop and away!
alyssa has quit [Quit: leaving]
icecream95 has joined #panfrost