#panfrost on 2022-07-05 — irc logs at oftc.irclog.whitequark.org

2022-03-22 11:57 ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://oftc.irclog.whitequark.org/panfrost - <macc24> i have been here before it was popular

00:38 derzahl has joined #panfrost

01:20 Daanct12 has joined #panfrost

03:18 davidlt has joined #panfrost

04:27 davidlt has quit [Ping timeout: 480 seconds]

05:11 Daanct12 has quit [Remote host closed the connection]

05:23 Daanct12 has joined #panfrost

05:30 derzahl has quit [Read error: Connection reset by peer]

05:45 pch has quit [singleton.oftc.net synthon.oftc.net]

05:45 hanetzer has quit [singleton.oftc.net synthon.oftc.net]

05:45 jambalaya has quit [singleton.oftc.net synthon.oftc.net]

05:45 DVulgaris has quit [singleton.oftc.net synthon.oftc.net]

05:45 Lyude has quit [singleton.oftc.net synthon.oftc.net]

05:45 soreau has quit [singleton.oftc.net synthon.oftc.net]

05:45 floof58 has quit [singleton.oftc.net synthon.oftc.net]

05:45 jstultz has quit [singleton.oftc.net synthon.oftc.net]

05:45 bluebugs has quit [singleton.oftc.net synthon.oftc.net]

05:45 rcf has quit [singleton.oftc.net synthon.oftc.net]

05:45 cphealy has quit [singleton.oftc.net synthon.oftc.net]

05:45 tlwoerner has quit [singleton.oftc.net synthon.oftc.net]

05:45 narmstrong_ has quit [singleton.oftc.net synthon.oftc.net]

05:45 erle has quit [singleton.oftc.net synthon.oftc.net]

05:45 FLHerne has quit [singleton.oftc.net synthon.oftc.net]

05:45 indy has quit [singleton.oftc.net synthon.oftc.net]

05:45 mav has quit [singleton.oftc.net synthon.oftc.net]

05:45 Consolatis has quit [singleton.oftc.net synthon.oftc.net]

05:45 xdarklight has quit [singleton.oftc.net synthon.oftc.net]

05:45 strongtz[m] has quit [singleton.oftc.net synthon.oftc.net]

05:45 AreaScout_ has quit [singleton.oftc.net synthon.oftc.net]

05:45 stebler[m] has quit [singleton.oftc.net synthon.oftc.net]

05:45 sigmaris has quit [singleton.oftc.net synthon.oftc.net]

05:45 enunes has quit [singleton.oftc.net synthon.oftc.net]

05:45 go4godvin has quit [singleton.oftc.net synthon.oftc.net]

05:45 hl` has quit [singleton.oftc.net synthon.oftc.net]

05:45 karolherbst has quit [singleton.oftc.net synthon.oftc.net]

05:45 atler has quit [singleton.oftc.net synthon.oftc.net]

05:45 tomeu has quit [singleton.oftc.net synthon.oftc.net]

05:45 ndufresne has quit [singleton.oftc.net synthon.oftc.net]

05:45 mriesch has quit [singleton.oftc.net synthon.oftc.net]

05:45 pjakobsson has quit [singleton.oftc.net synthon.oftc.net]

05:45 jernej has quit [singleton.oftc.net synthon.oftc.net]

05:45 alpernebbi has quit [singleton.oftc.net synthon.oftc.net]

05:45 jelly has quit [singleton.oftc.net synthon.oftc.net]

05:45 DPA- has quit [singleton.oftc.net synthon.oftc.net]

05:45 pendingchaos has quit [singleton.oftc.net synthon.oftc.net]

05:45 unevenrhombus[m] has quit [singleton.oftc.net synthon.oftc.net]

05:45 urja has quit [singleton.oftc.net synthon.oftc.net]

05:45 cyrozap has quit [singleton.oftc.net synthon.oftc.net]

05:45 simon-perretta-img has quit [singleton.oftc.net synthon.oftc.net]

05:45 falk689 has quit [singleton.oftc.net synthon.oftc.net]

05:45 italove has quit [singleton.oftc.net synthon.oftc.net]

05:45 jschwart has quit [singleton.oftc.net synthon.oftc.net]

05:45 robmur01 has quit [singleton.oftc.net synthon.oftc.net]

05:45 tanty has quit [singleton.oftc.net synthon.oftc.net]

05:45 kenzie has quit [singleton.oftc.net synthon.oftc.net]

05:45 CME has quit [singleton.oftc.net synthon.oftc.net]

05:45 alarumbe has quit [singleton.oftc.net synthon.oftc.net]

05:45 dhewg has quit [singleton.oftc.net synthon.oftc.net]

05:45 bbrezillon has quit [singleton.oftc.net synthon.oftc.net]

05:45 digetx has quit [singleton.oftc.net synthon.oftc.net]

05:45 erle has joined #panfrost

05:45 spawacz has joined #panfrost

06:12 Major_Biscuit has joined #panfrost

06:26 davidlt has joined #panfrost

07:01 Daaanct12 has joined #panfrost

07:06 Daaanct12 has quit [Remote host closed the connection]

07:07 Daanct12 has quit [Ping timeout: 480 seconds]

07:16 rasterman has joined #panfrost

07:17 Major_Biscuit has quit [Ping timeout: 480 seconds]

07:26 <icecream95> Hmm... should I rely on userfaultfd or mprotect/SIGSEGV handlers for tracking writes to the doorbell page for v10 panwrap?

07:27 <icecream95> I guess I could even point the blob to a completely different set of pages, then proxy writes over in a different thread

07:28 <HdkR> any reason to use userfaultfd if you're monitoring in-process? That's more useful for out of process fault handling isn't it?

07:30 <icecream95> HdkR: Someone might want to override the SEGV handler, but it's less likely for userfaultfds to be messed with by the application being traced

07:30 <HdkR> Oh, you're tracing arbitrary applications? Then yea, userfaultfd is the way to go. I've never really seen anything use it

07:33 <icecream95> But does userfaultfd allow reprotecting a page as soon as an access completes?

07:34 <HdkR> faulting thread sleeps until a response is given in the userfaultfd handling, so it should?

07:35 <HdkR> Can kind of do whatever you want

07:35 Major_Biscuit has joined #panfrost

07:37 <icecream95> Hmm.. the other problem is how to copy writes from the MCU back to the userfaultfd-protected pages

07:40 <icecream95> But with how the blob works, I don't think it's super important for that to always be updated.. mostly it's to prevent overflowing the ring buffer, I think

07:41 <icecream95> And it'd take about a thousand batches for that to overflow, so it doesn't matter if we are a *little* behind
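
The tolerance for a slightly stale read offset can be sketched with free-running offsets. This is a minimal illustration, not the blob's or the firmware's actual layout: the ring size, the per-batch size, and the names `free_bytes`, `RING_SIZE` and `BATCH_BYTES` are all assumptions, chosen so the ring holds roughly a thousand batches.

```python
# Hypothetical ring-buffer bookkeeping; names and sizes are illustrative only.
MASK64 = (1 << 64) - 1  # offsets are treated as free-running 64-bit counters


def free_bytes(size, insert, extract):
    """Bytes the producer may still write, given free-running offsets."""
    used = (insert - extract) & MASK64
    return size - used


RING_SIZE = 1 << 16   # assumed ring size in bytes
BATCH_BYTES = 64      # assumed bytes written per batch (1024 batches fill the ring)

# The producer has written 1000 batches so far.
insert = 1000 * BATCH_BYTES

# With an up-to-date extract offset, only one batch is outstanding.
fresh = free_bytes(RING_SIZE, insert, extract=insert - BATCH_BYTES)

# A copy of the extract offset that lags ten batches behind only
# under-reports the free space; it never claims more room than exists.
stale = free_bytes(RING_SIZE, insert, extract=insert - 11 * BATCH_BYTES)

print(fresh, stale)
```

Under these assumptions a lagging copy makes the producer more conservative rather than less, which is why being a little behind is harmless until the lag approaches the full ring size.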

07:44 Daanct12 has joined #panfrost

07:46 guillaume_g has joined #panfrost

07:57 <icecream95> But then how do I make a page become 'missing' again? Otherwise I can only catch writes...

08:01 nlhowell has joined #panfrost

08:04 Major_Biscuit has quit [Ping timeout: 480 seconds]

08:09 nlhowell has quit [Ping timeout: 480 seconds]

08:09 <icecream95> I guess I could just mmap(MAP_FIXED) in a new page which has not been faulted in yet..

08:12 camus has quit [Read error: Connection reset by peer]

08:14 Major_Biscuit has joined #panfrost

08:14 camus has joined #panfrost

08:16 <icecream95> I wonder if it would be better to just poll memory in another thread... but that wouldn't be as fun

08:33 Daanct12 has quit [Ping timeout: 480 seconds]

08:36 <icecream95> Meh, I'll go with the wait. It's not as if the blob ever renders at more than 100 fps anyway

08:45 Major_Biscuit has quit [Ping timeout: 480 seconds]

09:08 pch has joined #panfrost

09:12 <icecream95> I have to say that Arm are very forward thinking... I'm sure people will eventually need > 4 GB ring buffers for submitting GPU command lists /s

09:23 camus has quit [Remote host closed the connection]

09:23 camus has joined #panfrost

09:48 <icecream95> Merge request for v10 support created! (against panloader, Mesa will hopefully be soon)

09:51 rkanwal has joined #panfrost

10:30 <icecream95> For those curious about what a command stream dump looks like, here is one for an IVDS job: https://gitlab.freedesktop.org/icecream95/panloader/-/snippets/6771/raw/main/snippetfile1.txt

10:38 icecream95 has quit [Ping timeout: 480 seconds]

11:14 pch has quit [Read error: Connection reset by peer]

12:08 erle has quit [Ping timeout: 480 seconds]

12:44 MoeIcenowy has quit [Read error: Connection reset by peer]

12:46 MoeIcenowy has joined #panfrost

12:55 erle has joined #panfrost

13:17 camus has quit []

15:21 camus has joined #panfrost

16:10 rasterman has quit [Quit: Gettin' stinky!]

16:13 guillaume_g has quit []

18:01 derzahl has joined #panfrost

18:02 erle has quit [Ping timeout: 480 seconds]

18:23 davidlt has quit [Ping timeout: 480 seconds]

18:55 derzahl has quit [Remote host closed the connection]

19:06 camus has quit [Ping timeout: 480 seconds]

19:07 davidlt has joined #panfrost

19:20 CME_ has joined #panfrost

19:46 alyssa has joined #panfrost

19:46 <alyssa> robmur01: https://rosenzweig.io/nextup.txt

19:46 <alyssa> I'm sure good at userspace

20:01 <alyssa> what gets me is that it's a different oops each time

20:01 <alyssa> https://rosenzweig.io/blink.txt

20:01 <alyssa> always about the same time in the OpenCL CTS

20:02 <alyssa> those ones don't even implicate panfrost ...

20:02 <alyssa> (this is with Dmitry's fixes)

20:07 <alyssa> I don't even know where to begin with that

20:10 * alyssa enables CONFIG_DEBUG_PREEMPT

20:11 <alyssa> and lockdep, why is lockdep not enabled?

20:29 <alyssa> OK. Just enabled a big pile of debug options. Now we wait and see if I get more useless information out of this splat

20:29 <alyssa> (Unfortunately my current reproducer takes like 15 minutes)

20:29 davidlt has quit [Ping timeout: 480 seconds]

20:33 <alyssa> I must say, on the list of things I wanted to do today, I don't think "debug lock splat" made the cut .....

20:34 <alyssa> but I suppose the relevant bugs affect much more than just OpenCL

20:35 <jekstrand> Yeah, the OpenCL CTS likes to torment your threading

20:39 <alyssa> OK, I have splat!

20:39 <alyssa> jekstrand: OpenCL CTS + Karol's runner for squaring the torment

20:41 <alyssa> this is really deep in kernel guts, but at least I have helpful debug info

20:44 <alyssa> drm_gem_get_pages called shmem_read_mapping_page

20:45 <alyssa> which uses the GFP from the mapping passed

20:45 <alyssa> this mapping is seemingly GFP_KERNEL

20:45 <alyssa> so that's part 1

20:46 <alyssa> part 2 is investigating the preemption disable

20:47 <alyssa> logs say it was disabled in get_page_from_freelist

20:47 <alyssa> that's the shrinker, I guess

20:48 <robclark> alyssa: I've seen some folio splats like that on -rc2.. hmm, but also w/ some of my own patches that do more eviction/shrink.. that said other than the shrinker connection it seems like unrelated bug?

20:48 <alyssa> robclark: IDK, I'm way over my head here

20:49 <alyssa> I don't understand where get_page_from_freelist is called from, and why it disables preemption

20:52 <robclark> it's called in page allocation path.. which can be basically anything that can allocate memory.. but things like GFP_ATOMIC should be used in allocation paths when you hold spin locks and things like that

20:52 <alyssa> right, what I mean is:

20:52 <alyssa> [ 675.102007] Preemption disabled at:

20:52 <alyssa> [ 675.102008] [<ffffb636d9c73460>] get_page_from_freelist+0x230/0x1460
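
The raw frame above follows the usual `symbol+offset/size` form: the return address lies 0x230 bytes into `get_page_from_freelist`, a function 0x1460 bytes long. As a rough sketch, a frame like this can be split into its parts as follows; the regular expression and the `parse_frame` helper are illustrative assumptions, not part of any kernel tooling.

```python
import re

# Matches frames such as "[<ffffb636d9c73460>] get_page_from_freelist+0x230/0x1460"
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>[\w.]+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
)


def parse_frame(line):
    """Return (address, symbol, offset, size) for a raw frame, or None."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return (
        int(m["addr"], 16),
        m["symbol"],
        int(m["offset"], 16),
        int(m["size"], 16),
    )


frame = parse_frame(
    "[ 675.102008] [<ffffb636d9c73460>] get_page_from_freelist+0x230/0x1460"
)
print(frame)
```

This only recovers the offset within the function; mapping that offset to a source line still needs the matching debug information for the running kernel.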

20:54 <robclark> so scripts/decode_stacktrace.sh is a useful thing to know about.. Ie. cat your splat into that

20:55 <robclark> that'll give you line #'s

20:55 <alyssa> oh!

20:56 <alyssa> that's awesome, thanks!

20:56 <alyssa> in the past I had disassembled the kernel for that >_>

20:57 <alyssa> ok, that answers that nonsense

20:57 <alyssa> get_page_from_freelist -> rmqueue -> rmqueue_pcplist -> pcp_spin_trylock_irqsave

20:58 <alyssa> so then we're holding a spinlock when we call __rmqueue_pcplist, I guess

20:58 <robclark> the script does for lols try and decode the instructions near a crash as x86.. maybe I need to set $CROSS_COMPILE

20:58 <alyssa> You know how I solve that one ;)

21:00 <robmur01> yeah, several scripts want ARCH and/or CROSS_COMPILE

21:00 <alyssa> my next question is how we end up in this completely unrelated looking call trace while (seemingly) within this spinlock

21:00 <alyssa> mm/page_alloc.c guts are way out of my depth

21:01 <alyssa> (https://rosenzweig.io/splat-lines.txt for anyone following along at home)

21:02 <robmur01> FWIW first thing I'd do is try something newer than rc2. There have definitely been... issues... this cycle - rc3 didn't even boot for some of us

21:03 <alyssa> OK

21:03 <alyssa> sounds like an "interesting" rebase, currently on some downstream hell because mainlining for this SoC is stalled...

21:04 <alyssa> what would you recommend I rebase against?

21:04 <robclark> that doesn't look like something that should be atomic.. and yeah, only reason I'm on -rc2 is because that is what drm-next is on and msm-next can't be ahead of drm-next

21:04 <alyssa> robclark: hm? (the first sentence)

21:05 <robmur01> I'd expect rc5 to be a bit more solid

21:05 <robclark> you can sprinkle might_sleep() around the call-stack

21:05 <alyssa> (this branch is linux-next 20220614 plus ~200 patches, mostly SoC specific, really delightful actually ....)

21:06 <alyssa> (admittedly a lot of this seems specific to mt8195 and is maybe not needed on mt8192)

21:07 <robclark> try git-rebase first and see how badly it goes.. usually there isn't as much churn btwn -rc's compared to trying to rebase across a merge window

21:07 <alyssa> rebase on..?

21:07 <alyssa> oh, rc5, er ok

21:08 <robclark> oh, that said, the splat actually tells you:

21:08 <robclark> [ 675.102007] Preemption disabled at:

21:08 <robclark> [ 675.102008] get_page_from_freelist (mm/page_alloc.c:3813 mm/page_alloc.c:3858 mm/page_alloc.c:4293)

21:09 <alyssa> right

21:09 <robclark> that said, my line no's don't seem to match yours

21:10 <robclark> hmm, do you have CONFIG_FAIL_PAGE_ALLOC enabled?

21:11 <alyssa> no, should I?

21:11 <robclark> hmm, no.. was just trying to make sense of your line #s

21:11 <alyssa> it's linux-next 20220614, might've been changes since then

21:12 <robmur01> see here, probably: https://lore.kernel.org/linux-mm/20220613125622.18628-8-mgorman@techsingularity.net/

21:13 <alyssa> this might be a new regression then?

21:14 * alyssa chanced it with the rc5 rebase

21:15 <alyssa> we'll see how rc5 fares instead of linux-next, assuming no SoC support slipped through the cracks of commits in next \ rc5

21:16 <robclark> oh, linux-next .. is a great way to beta test everyone else's bugs ;-)

21:16 <alyssa> truth.

21:17 <robmur01> sounds like that one started in next-20220614 and lived for maybe a day or two - such great luck you have there!

21:18 <alyssa> robmur01: truthfully.

21:25 <robmur01> yup, my general rule of thumb would be run -next if you want to find bugs in -next, run mainline before about rc4 to check for critical bugs, run late RCs or release tags to do any actual development work

21:25 <alyssa> that sounds sane.

21:26 <robclark> yeah, same.. I try to stick to mainline when developing my own bugs and regressions :-P

21:26 <robmur01> developing on -next might count as some form of self-flagellation

21:26 <robclark> yeah

21:49 alyssa has quit [Quit: leaving]

22:04 alyssa has joined #panfrost

22:04 <alyssa> `Purging 1275068416 bytes`

22:04 <alyssa> That's a lot of bytes :|

22:07 <alyssa> Ok, so it gets a little further after the uprev, more splat though coming right up

22:11 <alyssa> that part is very clearly in panfrost, though

22:11 <robclark> hmm, 0x4c000000 bytes.. is a fairly roundish #
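
The conversion is easy to check. This small sketch only redoes the arithmetic on the byte count from the purge message; the variable name `purged` is arbitrary.

```python
purged = 1275068416  # byte count from the "Purging ... bytes" message

as_hex = hex(purged)               # hexadecimal form of the byte count
in_mib = purged // (1 << 20)       # size in MiB
remainder = purged % (1 << 26)     # remainder modulo 64 MiB

print(as_hex, in_mib, remainder)   # 0x4c000000 1216 0
```

So the purge covers 1216 MiB, an exact multiple of 64 MiB, which is what makes the number look round in hex.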

22:11 <alyssa> might be able to get it myself

22:11 <alyssa> robclark: TBF might just be the CL CTS being dumb

22:11 <alyssa> although refusing to cache BOs above a certain size might be wise.

22:12 <robclark> oh, yeah, I think we cap it at 64MB
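
A size cap on a BO cache could look roughly like the sketch below. It is a toy model, not the Mesa or freedreno implementation: the `BOCache` class, its methods, and the string handles are hypothetical, and the 64 MiB limit simply follows the figure mentioned above.

```python
MAX_CACHED_BO_SIZE = 64 * 1024 * 1024  # assumed cap, following the 64MB figure above


class BOCache:
    """Toy buffer-object cache keyed by size; not the Mesa implementation."""

    def __init__(self, max_size=MAX_CACHED_BO_SIZE):
        self.max_size = max_size
        self.free_lists = {}  # size in bytes -> list of idle BO handles

    def release(self, handle, size):
        """Return True if the BO was kept for reuse, False if it should be freed."""
        if size > self.max_size:
            return False  # too large to be worth holding on to
        self.free_lists.setdefault(size, []).append(handle)
        return True

    def acquire(self, size):
        """Pop an idle BO of exactly this size, or None on a cache miss."""
        bucket = self.free_lists.get(size)
        return bucket.pop() if bucket else None

    def cached_bytes(self):
        """Total bytes currently held idle in the cache."""
        return sum(size * len(handles) for size, handles in self.free_lists.items())


cache = BOCache()
kept_small = cache.release("small-bo", 4096)
kept_huge = cache.release("huge-bo", 1275068416)  # the size from the purge message
print(kept_small, kept_huge, cache.cached_bytes())
```

With a cap like this, one oversized allocation from a test suite is freed immediately instead of sitting in the cache until something purges it.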

22:13 Major_Biscuit has joined #panfrost

22:13 <alyssa> robclark: btw, any plans to do conformant cl 3.0 on freedreno?

22:14 <robclark> hmm, doesn't cl3 want you to have annoying things like generic pointers?

22:14 <alyssa> Seemingly not

22:15 <alyssa> cl3 made optional a pile of stuff that was mandatory in cl2

22:15 <alyssa> because that's not confusing or anything

22:15 rkanwal has quit [Ping timeout: 480 seconds]

22:15 <robclark> at any rate.. cl is firmly in the category of "I poke at it from time to time on weekends, and not a thing $day_job cares about at all"

22:15 <alyssa> got it

22:16 <alyssa> that's m1 for me, so. :p

22:16 <robclark> there is some work for clvk (though IMO.. cl on vk still has some, umm, gaps).. but we apparently don't want to ship any native cl drivers

22:16 <alyssa> no?

22:17 <robclark> we apparently don't like things that aren't vk ;-)

22:17 <alyssa> right.

22:18 <robclark> idk, situation might be different if amd and intel had production quality mesa based cl stacks

22:19 <robclark> but I can't argue against not having more vendor gpu stacks.. intel's non-mesa video stack is bad enough ;-)

22:19 <alyssa> yeah

22:20 <anarsoul> well, it works

22:20 <anarsoul> I assume you're talking about video-decoding stack

22:23 <alyssa> mm, tasty tasty circular locking

22:26 <robclark> anarsoul: vaapi? I think we have at least three different versions of it depending on which intel chip you are talking about.. it's a mess

22:26 <anarsoul> yet it works (at least in firefox)

22:27 <anarsoul> but yeah, I agree that overall videodecoding stack in linux is a mess

22:34 <alyssa> Pass 2290 Fails 16 Crashes 6 Timeouts 0

22:34 <alyssa> so >99% by a hair. I'll take it.
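
The "by a hair" margin can be checked directly from the reported counts; this sketch only redoes that arithmetic.

```python
passed, failed, crashed, timeouts = 2290, 16, 6, 0

total = passed + failed + crashed + timeouts
rate = passed / total

print(total, f"{rate:.2%}")  # 2312 99.05%
```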

22:34 <alyssa> most of the fails are math_brute_force .... delightful ....

22:35 <alyssa> crashes seemingly are more kernel bugs

22:35 <anarsoul> cl kernel or linux kernel? :)

22:36 <alyssa> linux for the crashes, cl for the fails

22:37 Major_Biscuit has quit [Ping timeout: 480 seconds]

22:42 <alyssa> officially a tomorrow problem

22:43 <alyssa> pop pop and away!

22:43 alyssa has quit [Quit: leaving]

23:48 icecream95 has joined #panfrost