ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://oftc.irclog.whitequark.org/panfrost - <macc24> i have been here before it was popular
pendingchaos has quit [Ping timeout: 480 seconds]
pendingchaos has joined #panfrost
rasterman has quit [Quit: Gettin' stinky!]
pendingchaos has quit [Ping timeout: 480 seconds]
cphealy has joined #panfrost
camus has joined #panfrost
pendingchaos has joined #panfrost
camus1 has joined #panfrost
alpernebbi has quit [Ping timeout: 480 seconds]
alpernebbi has joined #panfrost
camus has quit [Ping timeout: 480 seconds]
camus has joined #panfrost
camus1 has quit [Read error: Connection reset by peer]
Lyude has quit [Quit: WeeChat 3.4]
Lyude has joined #panfrost
JulianGro has joined #panfrost
samuelig has quit [Server closed connection]
samuelig has joined #panfrost
FLHerne has joined #panfrost
FLHerne is now known as Guest2059
minicom has joined #panfrost
wilkom has joined #panfrost
enick_689 has joined #panfrost
erlehmann has quit [Ping timeout: 480 seconds]
Major_Biscuit has joined #panfrost
erlehmann has joined #panfrost
MajorBiscuit has joined #panfrost
Major_Biscuit has quit [Ping timeout: 480 seconds]
tolszak has joined #panfrost
rasterman has joined #panfrost
Guest2059 has quit []
FLHerne has joined #panfrost
FLHerne is now known as Guest2079
Guest2079 has quit [Remote host closed the connection]
FLHerne_ has joined #panfrost
FLHerne_ is now known as FLHerne
nlhowell has joined #panfrost
rasterman has quit [Quit: Gettin' stinky!]
rasterman has joined #panfrost
JulianGro has quit [Ping timeout: 480 seconds]
nlhowell is now known as Guest2089
nlhowell has joined #panfrost
Guest2089 has quit [Ping timeout: 480 seconds]
oftcpass has joined #panfrost
tolszak has quit [Ping timeout: 480 seconds]
nlhowell has quit [Ping timeout: 480 seconds]
nlhowell has joined #panfrost
tolszak has joined #panfrost
oftcpass has quit [Ping timeout: 480 seconds]
tjcorley has quit [Ping timeout: 480 seconds]
tjcorley has joined #panfrost
<alyssa> bbrezillon: how do blits work in (pan)vk?
<alyssa> are they like draws (special "draw a quad with the contents of this image" added to a render pass?)
<alyssa> or are they self-contained batches? (1 fragment job per blit)
jekstrand has joined #panfrost
<anholt> bbrezillon: I'm trying to uprev to cts 1.3.1 and the custom caselist for panvk is totally breaking
<anholt> I think I need to switch you over to just using include filters if you really want to be doing caselist subsetting.
<alyssa> anholt: caselist subsetting is non-optional for panvk right now, too much unimplemented
<alyssa> jekstrand: welcome ^^
JulianGro has joined #panfrost
<alyssa> bbrezillon: If it's a self-contained batch, then on Bifrost+ we can always use pre-frame shaders
<jekstrand> alyssa: I even added #panfrost to my auto-join list so I should be here from now on.
<alyssa> bbrezillon: I.e. to blit from image 1 (a, b, c, d) to image 2 (x, y, z, w), we make a fragment job rendering to image 2 with bounding box (x, y, z, w) with an empty polygon list but a preframe shader sampling from image 1
<alyssa> bypassing the tiler entirely
<macc24> jekstrand: i recognize your nick from somewhere
<alyssa> macc24: Jason Ekstrand, gfx ninja
<alyssa> just joined Collabora
<alyssa> he used to work for AMD or Intel or Valve or something like that
<macc24> hmm
<alyssa> bbrezillon: If I'm not mistaken, GENX(pan_blit) is only used in the inner loop of meta_blit
<alyssa> and that loop has the structure of "open new batch, pan_blit, close batch"
<jekstrand> alyssa: Yeah, I'm sure it was one of those.
<alyssa> where closing a batch creates a fragment job
<alyssa> Meaning panvk wants the "blit as its own dedicated batch" too, the API to pan_blit just doesn't make that clear
MajorBiscuit has quit [Quit: WeeChat 3.4]
<jekstrand> bbrezillon: I think my biggest question about !14406 is, "why?" Is it really not possible to implement secondaries any better way than a full capture+replay?
<jekstrand> I guess that's sort-of explained in the header.
<bbrezillon> alyssa: oh, ok
<bbrezillon> (Re: no need for a tiler job for blits)
<bbrezillon> jekstrand: we're currently evaluating this option
<bbrezillon> Manas (AKA sin3point14) is working on a native secondary cmdbuf implementation for panvk
<jekstrand> dozen's going to need this, isn't it?
<jekstrand> I don't think D3D12 has them
<bbrezillon> but the preliminary perf results are a bit disappointing (the memcpy + address relocation logic is worse than the SW-replay approach, at least for simple draw replays)
<jekstrand> !
<jekstrand> That's... surprising.
<jekstrand> But I guess it could be
<bbrezillon> jekstrand: yeah, I wouldn't exclude a silly mistake in our code
<bbrezillon> so I wouldn't trust those preliminary results just yet
<jekstrand> k
<alyssa> bbrezillon: alright, wasn't sure if this was an oversight (understandable one, we have 5 major archs and 2000 pages of API specs to worry about..), or a deliberate design
<bbrezillon> and yes, we need it for dozen, at least until d3d12 provides us with a way to duplicate a cmdlist
<alyssa> if oversight, will see if I can excise that code path
<bbrezillon> alyssa: cool
<jekstrand> bbrezillon: Dozen makes total sense to me.
<alyssa> bbrezillon: re address relocation, there's a lot less of it on Valhall
<jekstrand> bbrezillon: For panvk, this is one of those things that makes me suddenly put my maintainer hat on and start asking questions. If there are architectural decisions that are required to support secondaries efficiently natively, it's better to make those early than a giant refactor later.
<alyssa> hardware-allocated varying buffers <3
<jekstrand> But, knowing little to nothing about Mali hardware, I can totally believe that it's intractable for some reason I'm not aware of.
<bbrezillon> alyssa: ah, that's awesome news!
<alyssa> bbrezillon: not 100% sure how to do XFB on valhall though ;-p
<bbrezillon> jekstrand: it's doable, but doesn't seem to provide the expected perf improvements
<alyssa> what's the hot path?
<bbrezillon> memcpy() is pretty hot
<bbrezillon> and the other one in BO allocation
<alyssa> don't you have both of those (effectively) for replay though?
<alyssa> what are you memcpy'ing *from*?
<bbrezillon> but the SW-implem has pretty much the same BO alloc overhead
<bbrezillon> CPU-only buffer (basically an mmap(private)) to a BO that's big enough to contain all the descs contained in the cmdbuf
<alyssa> mmap(private)? not just malloc?
<bbrezillon> the advantage of mmap() is that you can make the buffer grow without copying with mremap()
<alyssa> right..
<bbrezillon> and our test case (a modified version of dEQP-VK.api.command_buffers.record_many_draws_secondary_1 where we execute a secondary cmdbuf containing thousands of draws 50 times instead of once) generates around 18MB of descs
<bbrezillon> the relocation logic is pretty cheap (just like the SW-cmd-queue replay) compared to the allocation/write instructions
<bbrezillon> the thing with the current relocation step is that we
<bbrezillon> 1/ copy holes (which represent around 6% of the total amount of memory)
<bbrezillon> 2/ write twice to the address slots (once during the BO copy, and once when relocating)
<bbrezillon> I guess we could have similar perfs if we weren't copying the whole BO, and instead relocated/wrote in one step
<bbrezillon> but I'm doubtful we'd end up with significantly better perfs, at least not with simple draws (might be a bit different with blits, or other more complex operations though)
<jekstrand> bbrezillon: When you're evaluating perf, the perf of vkCmdExecuteCommands() is what matters. If the over-all cost is the same but weighted such that vkCmdExecuteCommands() is cheaper, that's still a win.
<jekstrand> (Sorry, I can't tell exactly what bits are being referred to when you say it's about the same.)
<bbrezillon> jekstrand: yep, I'm filtering on CmdExecuteCommands, of course
<jekstrand> Ok, cool
<jekstrand> So why is that doing any BO allocation?
<jekstrand> It should just be memcpy and reloc
<bbrezillon> it's allocating a BO in the primary cmdbuf
<jekstrand> Right, to memcpy into, I suppose
<bbrezillon> yep
<jekstrand> Is that just part of the natural BO growing or do you always have to allocate for CmdExecuteCommands()?
<bbrezillon> I even tried to pass MAP_POPULATE to pre-populate the CPU MMU and avoid faults during the copy, but it didn't help much
<bbrezillon> CPU-visible BOs are currently not growable
<bbrezillon> so we just allocate a 64kB chunk at a time (or more if we know we'll need more, which is the case here, since the src buffer is 18MB large)
<bbrezillon> jekstrand: BTW, I don't know if you noticed, but I have a slightly reworked implem of the secondary-cmdbuf here https://gitlab.freedesktop.org/bbrezillon/mesa/-/commits/panvk-exec-cmd-alt
tomeu829 has joined #panfrost
<bbrezillon> anholt: can we filter out tests with `dEQP-VK.<tests>.*` rules? if yes, then we can probably go for a skip list
<anholt> bbrezillon: skips and includes are both regexes.
CounterPillow_ has quit [charon.oftc.net helix.oftc.net]
megi1 has quit [charon.oftc.net helix.oftc.net]
mriesch has quit [charon.oftc.net helix.oftc.net]
br has quit [charon.oftc.net helix.oftc.net]
vstehle has quit [charon.oftc.net helix.oftc.net]
Stary has quit [charon.oftc.net helix.oftc.net]
jernej has quit [charon.oftc.net helix.oftc.net]
<bbrezillon> anholt: ok, I'll take a look tomorrow then, unless you want to give it a try
<anholt> bbrezillon: if you have any hints of how you generated your list, that would help, since I'm putting the MR together currently
<bbrezillon> we just test dEQP-VK.api.copy_and_blit.* and dEQP-VK.pipeline.blend.*
<bbrezillon> so anything else should be skipped
CounterPillow_ has joined #panfrost
megi1 has joined #panfrost
br has joined #panfrost
mriesch has joined #panfrost
vstehle has joined #panfrost
Stary has joined #panfrost
sigmaris has joined #panfrost
jernej has joined #panfrost
stepri01 has joined #panfrost
stebler[m] has joined #panfrost
mmind00 has joined #panfrost
<anholt> bbrezillon: great!
<alyssa> bbrezillon: Is implementing nir_load_push_constant todo..?
<bbrezillon> alyssa: IIRC, I had something in my WIP branch
<alyssa> ack
<bbrezillon> right now it's just a dedicated UBO
<alyssa> might snarf it, having meta shaders do load_push_constant is a lot easier to verify than "load_ubo and pray it gets pushed"
<alyssa> er wait, that doesn't implement push constants, that lowers them
<bbrezillon> yeah, I considered implementing it as real push constants
<bbrezillon> but then I realized it might take slots that could be used by sysvals
<alyssa> right..
<bbrezillon> so I decided to leave them as UBOs and let the UBO -> push_constant opt pass do its job
<bbrezillon> (this being said, this opt pass is disabled in panvk, 'cause I wanted to keep things simple at first :p)
<alyssa> heh :)
<alyssa> once we support pilot shaders the push code will be reworked I imagine
<alyssa> freedreno/vulkan/tu_shader.c
<alyssa> um
<alyssa> (In the mean time I'd rather not break the downstream push optimizations)
<bbrezillon> ah, nice
<bbrezillon> jekstrand: FYI, here are the preliminary perf results we got https://gitlab.freedesktop.org/-/snippets/4417
<bbrezillon> panvk_v7_native_CmdExecuteCommands() is the native implementation, but for some reason, the memcpy that's inside this function doesn't appear under it in perf report
<bbrezillon> panvk_v7_sw_CmdExecuteCommands() is the SW-queue implementation
<bbrezillon> they both execute the same secondary cmdbuf (that's recorded in both forms in the vkCmdXxx() functions)
<jekstrand> bbrezillon: Well, that is disappointing...
<bbrezillon> couldn't agree more
<bbrezillon> as I said, I'm pretty confident we can reach the same level of perfs with some optimizations, but I'm not sure things will be significantly better with the native implem
<anholt> is native doing some WC reads or something?
<bbrezillon> nope, we're using a CPU-only buffer (allocated with mmap(MAP_PRIVATE))
<bbrezillon> as the source
<bbrezillon> the destination is a WC buffer, but we don't read from it (at least, we shouldn't)
<bbrezillon> I didn't figure out why we have this page fault in the memcpy path though
<jekstrand> anholt: I was going to ask that too but I figured I'd already asked enough obvoius questions. 😂
<anholt> ha
<bbrezillon> I did an mlock() to make things resident on the CPU-buffer (just as a test)
<bbrezillon> and I keep having those faults
<bbrezillon> also added MAP_POPULATE to the BO mmap() to pre-populate the CPU pagetable
<anholt> oh, interesting. I was wondering if this was something like your cmd pool didn't have bos in it.
<bbrezillon> but if I trust perf, that only accounts for 1% of the 2.5% diff we have between the 2 implems
<bbrezillon> anholt: yep, the cmd pool is definitely empty (no directly available bufs) when we start executing the secondary cmd buf
<bbrezillon> but the same goes for the SW implementation
rasterman has quit [Quit: Gettin' stinky!]
evx256 has joined #panfrost
nlhowell has quit [Ping timeout: 480 seconds]
<jekstrand> bbrezillon: Is it just me or has no one implemented DestroyCommandPool for panvk?
<jekstrand> Ok, didn't expect it to be in a different file....
<bbrezillon> yeah, having panvk_cmd_buffer.c doesn't make much sense
<bbrezillon> we should probably merge them
<jekstrand> Yeah, ANV's split is awkward too
<bbrezillon> even if that means duplicating some code
<bbrezillon> I mean, duplicating code in the .o
<jekstrand> yeah
<bbrezillon> all the files prefixed panvk_vX are per-arch
<jekstrand> Yup. Figured that one out already. :)
<alyssa> jekstrand: not going to complain about panvk_per_arch? :-p
<jekstrand> alyssa: Yeah, it's a bit long.
<bbrezillon> alyssa: VKGENX()? can't use GENX() because it's suffixing stuff with vX, and we need a prefix for entrypoints generation
q4a has joined #panfrost
<jekstrand> vX()?
* jekstrand isn't going to be opinionated here. -ENOTMYPROJECT
<bbrezillon> wfm. s/panvk_per_arch(/vX(/ should be pretty easy to review compared to a new feature addition ;-)
stebler[m] has quit [Server closed connection]
stebler[m] has joined #panfrost
q4a has left #panfrost [#panfrost]
tolszak has quit [Ping timeout: 480 seconds]