ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://oftc.irclog.whitequark.org/panfrost - <macc24> i have been here before it was popular
jambalaya has quit [Remote host closed the connection]
<Rathann> what's "speedy"?
jambalaya has joined #panfrost
<alyssa> STORE.i128 needs its staging register aligned to a pair. this is... different than bifrost...
<alyssa> Same with LOAD.i127
<alyssa> 8
<alyssa> and LOAD.i64
<alyssa> and i48 but not i32 or smaller
<alyssa> I guess that makes sense
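The alignment rule alyssa describes above can be sketched roughly as follows. This is only an illustration of the rule as stated in the log (i48, i64 and i128 need a pair-aligned staging register, i32 and smaller do not). The function names, the treatment of unlisted widths, and the reading of "aligned to a pair" as "even register index" are assumptions, not taken from the Mesa compiler.

```python
def needs_pair_aligned_staging(bits: int) -> bool:
    """Return True if a LOAD/STORE of this width needs a pair-aligned staging register.

    Assumption: every width above 32 bits behaves like the i48/i64/i128
    cases named in the log; widths not mentioned there are a guess.
    """
    return bits > 32


def is_valid_staging_register(reg: int, bits: int) -> bool:
    """Check a hypothetical register index against the alignment rule."""
    # Assumption: "aligned to a pair" means an even register index.
    if needs_pair_aligned_staging(bits):
        return reg % 2 == 0
    return True
```

A register allocator could call a check like `is_valid_staging_register(3, 64)` to reject an odd register for a 64-bit access while still accepting it for a 32-bit one.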
Rathann has quit [Quit: Leaving]
WoC has quit [Remote host closed the connection]
WoC has joined #panfrost
macc24 has quit [Ping timeout: 480 seconds]
<icecream95> :w
<alyssa> :wq!
<icecream95> :x
<HdkR> :wqa!
<icecream95> Rathann: If you bother reading the online IRC logs, "veyron speedy" is a 2015 chromebook with an RK3288 / Mali t760, and every time I compile on it I am reminded that it is *not* speedy
<icecream95> I believe it is called the "ASUS c201 chromebook" by some, but that's just boring
<alyssa> spent most of today hooking up / debugging branches on valhall
<alyssa> ^ in the valhall compiler
<alyssa> so I now have loops working :]
<alyssa> also fixed a silly bug in a valhall-specific pass that was breaking sin() and cos()
<alyssa> (and added a unit test for that case so it stays working)
<alyssa> oh, also added more SSBO stuff so now as far as pure compute goes I can test whatever
<alyssa> still need to r/e some details about atomics and images but those can probably wait, at some point I should switch gears to graphics stuff since I still can't compile any vertex or fragment shaders ;P
<HdkR> I hear if you only implement OpenCL then you only need compute ;)
cphealy has joined #panfrost
atler is now known as Guest2390
atler has joined #panfrost
Guest2390 has quit [Ping timeout: 480 seconds]
tchebb has quit [Quit: ZNC - http://znc.in]
tchebb has joined #panfrost
warpme_ has joined #panfrost
<urja> it was kinda speedy, for 2015 :P (and certainly compared to anything other ARM32 that i've had around lol, which is not much tbh)
<tomeu> bbrezillon: back to panvk, so what needs to happen before we merge the clearattachments patch?
rasterman has joined #panfrost
<bbrezillon> tomeu: did you try dEQP-VK.renderpass.suballocation.unused_clear_attachments ?
<bbrezillon> tomeu: but I guess we can merge this version and figure it out later
macc24 has joined #panfrost
<macc24> urja: rk3288 outperformed my every other arm machine and impressed me with performance xD
wwilly_ has joined #panfrost
<tomeu> bbrezillon: those ones hit the CmdClearColorImage stub, then:
<tomeu> deqp-vk: ../mesa/src/panfrost/vulkan/panvk_cmd_buffer.c:707: panvk_CmdBeginRenderPass2: Assertion `pRenderPassBegin->clearValueCount == pass->attachment_count' failed.
wwilly_ has quit []
<tomeu> bbrezillon: btw, what was alyssa's concern about how you do blits in your branch?
wwilly has quit [Ping timeout: 480 seconds]
<bbrezillon> "In general I skipped the copyimage/buf code. I won't pretend to understand it but it seems extremely complicated and I don't understand where the complications come from. Is the complexity warranted? How do other VK drivers handle this?"
<bbrezillon> but I remember we discussed an approach involving a copy to a shadow resource to get rid of some special cases when copying AFBC <-> tiled/linear
<tomeu> what I have seen of the other drivers was quite similar, other than they used the vulkan api to submit the draws
<tomeu> instead of cooking descriptors directly
<bbrezillon> we can't do that if we want to get rid of the vertex job
<tomeu> except when they did have stuff in the cmdstream to do just what is being asked, as freedreno does
<tomeu> bbrezillon: maybe we could do something in the middle and add common functions that implement just what is needed, but is general enough to be reused across all meta commands?
<tomeu> something like what you did for panvk_meta_blit_close_batch
<bbrezillon> tomeu: that's what I tried to do
<bbrezillon> exactly, but I didn't find much other things that could be shared
<tomeu> but I don't know, it took me some work to get it, but I think the approach makes sense
<bbrezillon> Re: CmdClearColorImage, isn't it just a wrapper around ClearAttachments?
<tomeu> it's just a lot of code that does similar things in the same file
<tomeu> oh right, you raised that up the other day
<bbrezillon> similar, but not quite identical :)
<tomeu> guess I could do that next
<tomeu> right, otherwise it would be done just once :)
<tomeu> I do find it cumbersome to navigate panvk_meta.c though, maybe we could split some code out to panvk_meta_shaders.c or panvk_meta_cs.c
<tomeu> radv has a bunch of src/amd/vulkan/radv_meta_*.c files
<bbrezillon> sounds good to me. The reason I put everything in panvk_meta is that I was hoping we could share more code, other drivers tend to split it per functionality (copy, blit, ...)
<bbrezillon> but then we have some functions that are used by both the blit and copy logic
<tomeu> guess they can remain in panvk_meta.c
<macc24> icecream95: if ye want me to get archlinuxarm working on cadmium ye can pm me
<tomeu> bbrezillon: I would split it similarly as radv, but not as fine grained
<tomeu> so copy, blit and clear
<tomeu> and leave the vulkan entry points and common functions in panvk_meta.c
<bbrezillon> works for me
<tomeu> ok, I thought alyssa wanted a completely different approach for meta
<bbrezillon> well, using a shadow res for AFBC <-> tiled/linear copies requires quite a few changes
<icecream95> macc24: I used ArchLinuxARM for a while and.. do not have much of a desire to use it again
<icecream95> Passed: 18/23 (78.3%)
<icecream95> Not supported: 5/23 (21.7%)
<icecream95> (This is for dEQP-VK.pipeline.push_constant.*)
<icecream95> Still many hacks of course...
<icecream95> I currently use a single set of push constants for all stages, but especially on Midgard where "RMUs" eat register space, this could be improved by finding the lowest used address for each stage and subtracting that from the offsets.
<icecream95> The push constants would still be uploaded once, but different offsets to the uploaded data would be passed to the GPU
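The per-stage rebasing idea icecream95 outlines could look something like the sketch below. It assumes each stage's used push-constant byte offsets are already known. The function name and the data layout are hypothetical and are not taken from panvk.

```python
def rebase_push_constants(stage_offsets):
    """Map each stage to (base, rebased_offsets).

    stage_offsets: dict of stage name -> list of byte offsets that stage reads.
    The base is the lowest offset the stage uses. Subtracting it lets the
    stage address its constants starting from zero, while the data itself
    is still uploaded only once.
    """
    rebased = {}
    for stage, offsets in stage_offsets.items():
        if not offsets:
            # A stage that reads no push constants needs no base at all.
            rebased[stage] = (0, [])
            continue
        base = min(offsets)
        rebased[stage] = (base, [off - base for off in offsets])
    return rebased
```

With this layout, the driver would hand each stage a pointer to `upload_address + base` rather than a separate copy of the data.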
<bbrezillon> icecream95: nice!
wwilly has joined #panfrost
wwilly_ has joined #panfrost
wwilly has quit [Ping timeout: 480 seconds]
camus has joined #panfrost
camus1 has joined #panfrost
camus has quit [Ping timeout: 480 seconds]
<tomeu> bbrezillon: ok, so what if we merge the clearattachments patch as it is now, then I start adding the meta stuff in its own files? first without AFBC support, then we do the shadow bo trick for AFBC
camus1 has quit [Ping timeout: 480 seconds]
<tomeu> the meta stuff is really important for using deqp for testing
camus has joined #panfrost
<bbrezillon> tomeu: sure, let's do that
nlhowell has quit [Ping timeout: 480 seconds]
<tomeu> bbrezillon: ok, do I have your r-b then?
<alyssa> icecream95: nicely done! =)
<alyssa> midgard RMUs were such a mistake, really happy that got fixed in bifrost
<alyssa> then again bifrost went and broke a bunch of other things midgard got right so... progress..
<alyssa> bbrezillon: To be clear - you maintain panvk, not me. If you think $foo is the right way to go (and it doesn't touch common code to do so), you don't need my permission to do $foo.
<alyssa> However speaking from experience with similar types of code on the GL side -- just visually, I can guarantee the copy image code in the original MR will become a maintenance burden and is likely subtly broken
<macc24> uuuuuuh "kompanio 1300t" will have mali g77 mc9
<macc24> chonky gpu for a chromebook chip
camus has quit [Remote host closed the connection]
camus has joined #panfrost
<alyssa> macc24: ..chonky?
<HdkR> Only MC9? It's a chromebook, it should have full 16 cores :)
<macc24> alyssa: for a chromebook
<macc24> I WISH IT WAS MORE
<alyssa> > The first tablets with the Kompanio 1300T will be unveiled in the July-September quarter, so any moment now. The company didn’t confirm this, but a well-known leakster suggests that the Honor V7 Pro tablet (which may be unveiled in mid-August) will be powered by the 1300T.
<alyssa> Oh shoot.
<alyssa> daniels: can I submit a pre-purchasing request for hw that doesn't exist yet ;-P
<HdkR> Considering there is a company shipping a max config G78MP24 in a cell phone. We need bigger designs in chromebooks :)
<alyssa> HdkR: ...does that company start with an S and is that config sitting on my desk?
<HdkR> nope
<HdkR> That one is only an MP14 that you have there
<macc24> HdkR: hisilicon?
<HdkR> Correct
<macc24> i vaguely remember seeing that
<alyssa> HdkR: Drat. :-p
<alyssa> and I can't even talk to the GPU!
camus has quit [Remote host closed the connection]
<HdkR> poor GPU, not getting the social interaction it craves
camus has joined #panfrost
<alyssa> HdkR: I mean I talk to it it just doesn't listen
<alyssa> Petulant child
<macc24> alyssa: maybe that gpu is ghosting you
<macc24> start talking to another gpu and it will get jealous
<alyssa> macc24: literally bringing up the apple gpu on the same machine
<HdkR> GPU left you on read? Rude
camus has quit [Ping timeout: 480 seconds]
<daniels> alyssa: heheh
<robclark> HdkR: tbf nothing in the price bracket that the arm chromebooks live in has had a huge GPU.. a618 is like 2x faster than other arm and intel chromebooks in that category, and it is the smallest that qcom can come up with ;-)
<robclark> certainly there are bigger arm SoCs from multiple vendors.. but not in that price range
<robclark> otoh, the bigger SoCs probably spend most of their life thermal throttling, so meh
<HdkR> I'm surprised there still isn't an 8cx(g2) Chromebook
<HdkR> TFW nobody caters to the premium ARM market :(
camus has joined #panfrost
<macc24> robclark: ?
<macc24> a618 is 2x faster than mali g72?
<macc24> it sure doesn't feel this way
<robclark> is g72 what is in duet? Then based on things like gfxbench, etc, yeah..
<macc24> icecream95: too late, some cadmium user got archlinuxarm working xD https://cdn.discordapp.com/attachments/869407878807695390/869505042833604648/image0.jpg
<robclark> iirc the g<whatever> that mtk puts in things tends to be super low core counts
<macc24> D:
<HdkR> That's an MP3, so very tiny
<macc24> very tiny and still quite fast imo
<macc24> i could probably game on it
<robclark> IME a lot of the "heavier" games seem to be throttling themselves to 30fps (which is causing some weird/bad gpu devfreq interactions for me.. more of the perf things I'm working on these days aren't actually the GPU themselves)
<HdkR> I've been getting CPU scheduling problems for games I'm testing
<HdkR> Game's primary thread which is nearly maxing out a CPU core gets dropped to the lower end cores instead of the prime. This then drops the perf from 50fps to 40fps
<robclark> yeah, getting cpufreq to cooperate can be a challenge
<macc24> uh
<macc24> i should move away from conservative governor xD
<robclark> heh, yeah.. doesn't conservative just run at min freq? That is going to be pretty slow on anything with a wide spread of opp's
<macc24> nah, powersave runs on lowest freq
<macc24> i'll benchmark governors and choose something balanced for battery life and performance
<macc24> i do too little to have good battery life...
nlhowell has joined #panfrost
<macc24> alyssa: HMM https://bpa.st/BFHQ
<macc24> hmm*
<macc24> it froze again, this time on discord opened in firefox with krita in background
<alyssa> macc24: again, these are bugs but I can't debug anything from a dmesg alone
<alyssa> I mean
<alyssa> I could if Arm could release the source ID mappings
<alyssa> Hint hint
<alyssa> robmur01: you wouldn't happen to know off hand what source ID 0x240 is on Mali G72 🙃
<robmur01> Off-hand? Ha! I can't even remember which GPUs I've managed to find the documents for or where I've put said documents :P
<robmur01> GPU fault or MMU fault? (because of course there isn't just one "source ID" namespace...)
<alyssa> uhhh
<alyssa> 15:28 < macc24> alyssa: HMM https://bpa.st/BFHQ
<alyssa> MMU fault I think
<robclark> alyssa, robmur01: fwiw gpu devcoredump has been pretty useful for debugging gpu crashes on msm side of things for the "user reports crash that I have no way to repro" scenarios.. (ie. capture as much gpu state, plus cmdstream that triggered crash)
<alyssa> robclark: I'm not sure how to do those for mali
<alyssa> maybe if I extended the UABI.
<robclark> hmm, do you have a way to tell *what* was running at the time of the crash, such as via last signaled fence? And can you keep a reference around to the cmdstream associated with the next unsignaled fence?
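The bookkeeping robclark is asking about can be sketched as below: given the last signaled fence, pick the oldest job that has not signaled yet and keep its command stream for the dump. This is a simplified model, not the msm or panfrost kernel code. The `Job` type, its fields, and the monotonically increasing sequence numbers are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    seqno: int        # hypothetical fence sequence number, assumed monotonic
    cmdstream: bytes  # reference to the submitted command stream


def suspect_job(in_flight: List[Job], last_signaled: int) -> Optional[Job]:
    """Return the oldest job whose fence has not signaled, or None."""
    pending = [job for job in in_flight if job.seqno > last_signaled]
    if not pending:
        return None
    # The next unsignaled fence is the most likely culprit for the hang.
    return min(pending, key=lambda job: job.seqno)
```

Sequence-number wraparound is ignored here; a real implementation would need to handle it.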
<alyssa> bbrezillon: ^^
* macc24 stands awkwardly
<macc24> so what do i do
<robclark> I'm not super familiar w/ panfrost UABI, or what debug state you can get mali to spit out.. we didn't have to add anything to msm UABI for devcoredump, and I guess blob driver is crashy enough that they built some nice debugability into the hw
* robclark notices that kbase in CrOS boosts to max freq when GPU is more than 35% busy! (default is 90% busy)
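To make the effect of that threshold concrete, here is a toy model of a "boost to max above a busy threshold" policy. It is not the kbase or devfreq implementation. The function name, the frequency values in the usage note, and the "otherwise keep the current frequency" fallback are assumptions. Only the 35% and 90% figures come from the log.

```python
def pick_frequency(busy_percent, current_hz, available_hz, up_threshold=90):
    """Jump to the highest frequency once load exceeds up_threshold.

    Below the threshold this toy model simply keeps the current frequency.
    Real governors also scale down, which is omitted here.
    """
    if busy_percent > up_threshold:
        return max(available_hz)
    return current_hz
```

For a GPU that is 50% busy at a hypothetical 300 MHz out of `[300, 600, 800]` MHz, `up_threshold=35` selects 800 MHz while the default of 90 leaves it at 300 MHz.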
* macc24 smells mtk hacks
<alyssa> macc24: Are these issues reproducible?
<macc24> alyssa: yesterday it froze when i was watching youtube so i guess
<alyssa> macc24: If you could run with the environment variable `PAN_MESA_DEBUG=trace,sync` it should spew out a huge amount of debug info and then when the bug is hit, crash the app.
<alyssa> And then the files it spits out should give us a clue what went wrong.
<macc24> alyssa: is it supposed to make everything slow as fuck?
<HdkR> "huge amount of debug info" Yes.
<alyssa> Yes.
<macc24> ugh
<macc24> and now it doesn't want to freeze
<macc24> it certainly does make youtube videos look super trippy
<macc24> o! it crashed
<macc24> alyssa: what files do you want? all 4gb of them?
<alyssa> macc24: first, what was the last line when it crashed
<alyssa> "Incomplete job or timeout"?
<macc24> yep
<macc24> "Incomplete job or timeoutRedirecting call to abort() to mozalloc_abort"
<alyssa> Ok. Send me the _last_ file (so the highest pandecode number)
<alyssa> Keep the rest for now in case I need earlier ones for context
<macc24> ok
<macc24> do you want dmesg log?
<alyssa> yes please
<macc24> alyssa: http://macc24.bieda.it/pandecode.dump.4114 with sha1 of 182c78b42fdd8d4286dfdf9ae38c937481fd4280
nlhowell has quit [Ping timeout: 480 seconds]
<macc24> and this is dmesg https://bpa.st/EK2Q
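Picking out the last dump and its checksum, as done by hand above, could be scripted along these lines. The sketch assumes the dumps are named `pandecode.dump.<N>` as in the file macc24 shared. The helper names and the directory argument are hypothetical.

```python
import hashlib
import re
from pathlib import Path

# File name pattern as seen in the log: pandecode.dump.4114
DUMP_RE = re.compile(r"^pandecode\.dump\.(\d+)$")


def last_dump(directory):
    """Return the dump file with the highest number, or None if there is none."""
    best, best_n = None, -1
    for path in Path(directory).iterdir():
        match = DUMP_RE.match(path.name)
        # Compare numerically: a plain string sort would put dump.10 before dump.9.
        if match and int(match.group(1)) > best_n:
            best, best_n = path, int(match.group(1))
    return best


def sha1_of(path):
    """Hash the file in chunks so multi-gigabyte dumps need not fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `sha1_of(last_dump("."))` in the directory holding the dumps would give the checksum to post alongside the upload.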
<robmur01> hmm, different again - that one seems to have simply branched to NULL
<alyssa> but.. there are no branch instructions.
<macc24> alyssa: are you going to need anything more from me? friend came over
<robmur01> s/branched to/somehow got an instruction address of/, then. Am I supposed to know how GPUs work? :P
<alyssa> macc24: I'm not sure what to make of this
<macc24> alyssa: we probably going outside and won't be reachable
<macc24> so i'm asking if you are going to need me
<alyssa> macc24: Could you try with `PAN_MESA_DEBUG=noafbc,nofp16 BIFROST_MESA_DEBUG=nosched,inorder`
<macc24> alyssa: ill try later
<macc24> bbl
<alyssa> \o
wwilly_ has quit []
camus1 has joined #panfrost
camus has quit [Ping timeout: 480 seconds]
camus has joined #panfrost
wwilly has joined #panfrost
camus1 has quit [Ping timeout: 480 seconds]
<cphealy> robclark: alyssa: etnaviv has devcore dump support too. Might be worth looking at: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/etnaviv/etnaviv_dump.c?h=v5.14-rc3
<alyssa> bbrezillon: ^
atler is now known as Guest2470
atler has joined #panfrost
Guest2470 has quit [Ping timeout: 480 seconds]
<alyssa> bbrezillon: Just opened the valhall compiler MR. RFC
<HdkR> yay
<alyssa> HdkR: RFC from you too if you want :p
<HdkR> "Yep. All this is perfect. No questions asked."
<alyssa> HdkR: 🌈
<alyssa> HdkR: The most awful thing is that it's using the Bifrost IR
<alyssa> but the Bifrost IR is defined by the Bifrost encoding
<alyssa> and Valhall doesn't remotely use Bifrost encoding
<alyssa> That said, I think decoupling the bifrost IR from bifrost packing is doable without rewriting half the compiler
<macc24> \o/
leah has quit [Quit: WeeChat 2.8]
<icecream95> Remember my stupid idea to send ioctls/BO data over a websocket?
<icecream95> I *may* have started implementing that..
<icecream95> Use case: Fix Midgard bugs while running everything on duet
<icecream95> alyssa: You want a kbase backend for Valhall RE?
<alyssa> Errrr
<macc24> icecream95: um, please don't tell me you are doing it in js
<icecream95> macc24: WASM will be optional
<macc24> icecream95: that's it, i'll give making it in C a shot
<icecream95> (otherwise it runs natively)
<macc24> i will not allow ANYONE to make anything fast in js
<macc24> without one upping them
<alyssa> Speaking of
<alyssa> at what point do I get frustrated enough with Chromebook slowness and macOS weirdness and use m1 linux for daily driver :p
<macc24> when i get cadmium running on m1 xD
* alyssa looks at clock
Joe has joined #panfrost
<urja> /-\_/-\_ ...
<macc24> xnopyt
* Lyude pixelates out of the universe
Joe has quit []
<macc24> Lyude: are u always watching
<Lyude> no, but I know a good meme when I see it 😎
<macc24> that's a bit sussy
<Lyude> amogu
<icecream95> macc24: Do you still get those faults you were complaining about earlier if you reduce the GPU frequency?
<macc24> icecream95: hm
<macc24> pleasedontbedevfreq
leah has joined #panfrost
warpme_ has quit [Quit: Connection closed for inactivity]
<macc24> icecream95: ughhhhhhhhhhhhhhhhh nothing now
* macc24 runs away to sleep
vstehle2 has joined #panfrost
vstehle1 has quit [Ping timeout: 480 seconds]
atler is now known as Guest2504
atler has joined #panfrost
Guest2504 has quit [Ping timeout: 480 seconds]
rasterman has quit [Ping timeout: 480 seconds]