#panfrost on 2021-07-20 — irc logs at oftc.irclog.whitequark.org

2021-06-22 12:29 ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://oftc.irclog.whitequark.org/panfrost - <macc24> i have been here before it was popular

01:00 atler is now known as Guest1449

01:00 atler has joined #panfrost

01:07 Guest1449 has quit [Ping timeout: 480 seconds]

01:30 stano_ has joined #panfrost

01:34 stano has quit [Ping timeout: 480 seconds]

01:59 camus has joined #panfrost

02:04 camus1 has quit [Ping timeout: 480 seconds]

02:26 rando25892 has quit [Ping timeout: 480 seconds]

02:53 camus1 has joined #panfrost

02:57 camus has quit [Remote host closed the connection]

03:25 camus has joined #panfrost

03:25 camus1 has quit [Read error: Connection reset by peer]

03:40 vstehle1 has quit [Ping timeout: 480 seconds]

03:51 camus1 has joined #panfrost

03:51 camus has quit [Remote host closed the connection]

05:00 vstehle1 has joined #panfrost

06:20 rando25892 has joined #panfrost

06:53 rando25892 has quit [Remote host closed the connection]

06:53 rando25892 has joined #panfrost

07:01 <tomeu> bbrezillon: when executing, you mean on batch_close?

07:07 <bbrezillon> tomeu: I mean when vkCmdExecuteCommands() is called

07:08 <bbrezillon> you should have a valid state (FB, pipeline, ...) when that happens

07:10 <bbrezillon> tomeu: BTW, you shouldn't really 'close' the batch when recording secondary commands (at least not the sort of close we do for primary cmd buffers)

07:19 <bbrezillon> correction: you only need to special case the close for secondary cmdbufs that are supposed to be called inside a render pass initiated by the primary cmdbuf

07:22 <tomeu> ok, ok, I think I'm starting to see how this is supposed to work

07:23 rasterman has joined #panfrost

07:23 <tomeu> bbrezillon: we indeed need to do some closing, as for example thousands of draw commands could be recorded in the same secondary buffer, causing an overflow of the job index

07:24 <tomeu> bbrezillon: what are you referring to when you say special casing the closes of "incomplete" secondary buffers?

07:25 <bbrezillon> tomeu: depends how you handle that I guess. I mean, if those draws are part of a render pass initiated by the primary command buf, I'd let the ExecuteCommands() do the split

07:26 <bbrezillon> tomeu: I mean you shouldn't emit the TLS, FBD, ... when closing the batch (I'm not even sure close_batch() should be called actually)

07:29 wwilly has joined #panfrost

07:29 <bbrezillon> when recording a secondary cmdbuf, store tiler/vertex jobs in CPU memory and keep a reference to each job issued. Then, when ExecuteCommands is called, you patch all those jobs and queue them to the primary batch
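
A minimal sketch of the record-and-patch idea described above, as a toy Python model rather than driver code. The names `RecordedJob`, `SecondaryCmdBuf`, `PrimaryBatch`, and the two-word descriptor layout are all hypothetical; the only thing taken from the discussion is the flow: keep jobs in CPU memory with a note of what needs patching, then patch and queue them when ExecuteCommands is called.

```python
from dataclasses import dataclass, field


@dataclass
class RecordedJob:
    """A job kept in CPU memory while recording (hypothetical layout)."""
    kind: str                                   # "vertex" or "tiler"
    words: list                                 # descriptor words, pointer slots are placeholders
    relocs: dict = field(default_factory=dict)  # word index -> name of the state to patch in


@dataclass
class SecondaryCmdBuf:
    jobs: list = field(default_factory=list)

    def record_draw(self):
        # FB/TLS addresses are unknown at record time: store 0 and remember
        # which words must be patched later.
        for kind in ("vertex", "tiler"):
            self.jobs.append(RecordedJob(kind, [0, 0], {0: "fbd", 1: "tls"}))


@dataclass
class PrimaryBatch:
    state: dict                                 # addresses known once the render pass is live
    queued: list = field(default_factory=list)

    def execute_commands(self, secondary):
        for job in secondary.jobs:
            words = list(job.words)             # copy, so the secondary stays reusable
            for index, name in job.relocs.items():
                words[index] = self.state[name]
            self.queued.append((job.kind, words))


secondary = SecondaryCmdBuf()
secondary.record_draw()
batch = PrimaryBatch({"fbd": 0x1000, "tls": 0x2000})
batch.execute_commands(secondary)
```

Copying the words before patching keeps the recorded jobs untouched, which is what lets the same secondary buffer be executed again against a different primary state.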

07:33 <tomeu> ok, so we would not be using any of the batch code

07:33 <bbrezillon> of course, if you're not in a primary render pass, or have all the information you need (FB and render pass passed through VkCommandBufferInheritanceInfo), you can issue the draws normally

07:34 <bbrezillon> tomeu: you might be able to re-use some bits, but most of it would be different, yes

07:35 <tomeu> ok, we will see how we can reshuffle things so we don't end up with 2 different drivers :)

07:37 <bbrezillon> actually, even if you're not supposed to be called in a primary render pass, if VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT is set, you have to keep the various descs in CPU memory, and copy them to GPU mem when ExecuteCommands is called

07:41 <bbrezillon> tomeu: yep, those changes are quite invasive, anything containing pointers has to be patched when vkCmdExecuteCommands()

07:41 <bbrezillon> is called

07:44 <bbrezillon> tomeu: there's still the option of recording vkCmdXxx() calls in a high-level representation (basically keeping the VkXxx structs around for each of these calls) and replaying them when ExecuteCommands() is called

07:44 <tomeu> yeah, was wondering about that, and if it's something that could be reused among vulkan drivers

07:44 <tomeu> was thinking of the perf implications of that

07:45 <bbrezillon> well, you get the Vk -> HW-desc conversion overhead each time the secondary command buf is executed, so it's not ideal

07:45 <bbrezillon> but it's way simpler to implement :)
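
The record-and-replay alternative and its cost can be sketched as a toy Python model. Everything here is hypothetical (`ReplaySecondaryCmdBuf`, `ReplayPrimary`, the `conversions` counter); it only illustrates the trade-off mentioned above: recording is trivial, but the Vk-to-hardware-descriptor conversion runs again on every execution.

```python
class ReplaySecondaryCmdBuf:
    """Stores the high-level calls instead of hardware descriptors."""

    def __init__(self):
        self.calls = []

    def cmd_draw(self, vertex_count, first_vertex=0):
        # Nothing is translated here: just keep the arguments around.
        self.calls.append(("draw", {"vertex_count": vertex_count,
                                    "first_vertex": first_vertex}))


class ReplayPrimary:
    def __init__(self):
        self.jobs = []
        self.conversions = 0   # counts how often the translation runs

    def draw(self, vertex_count, first_vertex):
        self.conversions += 1  # stand-in for the Vk -> HW-desc conversion
        self.jobs.append(("tiler", first_vertex, vertex_count))

    def execute_commands(self, secondary):
        # Replay each recorded call as if it had been issued on the primary.
        for name, args in secondary.calls:
            getattr(self, name)(**args)


secondary = ReplaySecondaryCmdBuf()
secondary.cmd_draw(3)
secondary.cmd_draw(6, first_vertex=3)

primary = ReplayPrimary()
primary.execute_commands(secondary)
primary.execute_commands(secondary)  # re-use pays the conversion cost again
```

After two executions of a two-draw secondary buffer, `primary.conversions` is 4, which is the overhead that a patch-based design would try to avoid.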

07:46 <tomeu> guess the idea behind secondary cmdbufs is that translating vulkan state to GPU state is done once and reused multiple times, but if due to the GPU design we anyway need to patch it so heavily, it just might not be a win on mali

07:53 <bbrezillon> I guess it depends how often the secondary buf is re-used, and it's hard to tell how much CPU time can be saved without implementing both solutions, but I'm all for implementing the simplest approach first and leaving this potential optimization for later

07:59 <tomeu> yeah, me too

07:59 <tomeu> especially, as "Specifying the exact framebuffer that the secondary command buffer will be executed with may result in better performance at command buffer execution time."

07:59 <tomeu> the spec warns about it

08:02 wwilly_ has joined #panfrost

08:09 wwilly has quit [Ping timeout: 480 seconds]

08:12 camus has joined #panfrost

08:12 <bbrezillon> tomeu: sure, but that's not enough. I mean, emitting draws outside of an existing batch (which you can do if you have a FB) when we could have merged those with the batch coming from the primary cmdbuf has a cost (extra FB reloads)

08:14 camus1 has quit [Read error: Connection reset by peer]

08:32 warpme_ has joined #panfrost

09:57 <daniels> icecream95: there's nothing stopping you from getting involved and working if you want to - you can buy the same phone, download the same images, work on the same hardware

10:09 <tomeu> bbrezillon: wonder if by looking at this cmdstream diff you may have any ideas: https://paste.debian.net/1204984/

10:10 <tomeu> in the RAW32 case, the result is all zeroes

10:11 <tomeu> I'm not sure why the stride is different

10:11 <bbrezillon> the clear color is different

10:12 <bbrezillon> but it's probably normal (different clear value for uint and unorm)
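
Why the same logical clear color can end up as a different raw value for UNORM and UINT formats can be shown with a small sketch. It assumes an 8-bit-per-channel RGBA layout purely for illustration; the formats in the pasted cmdstream and the way panvk actually packs clear values are not shown in this log, and both function names are hypothetical.

```python
def pack_rgba8_unorm(color):
    """Pack four floats in [0, 1] as normalized 8-bit channels."""
    packed = 0
    for i, channel in enumerate(color):
        channel = min(max(channel, 0.0), 1.0)      # clamp to the UNORM range
        packed |= int(channel * 255.0 + 0.5) << (8 * i)
    return packed


def pack_rgba8_uint(color):
    """Pack four integers as raw 8-bit channels, with no scaling."""
    packed = 0
    for i, channel in enumerate(color):
        packed |= min(max(int(channel), 0), 255) << (8 * i)
    return packed


unorm_clear = pack_rgba8_unorm((1.0, 0.0, 0.0, 1.0))
uint_clear = pack_rgba8_uint((1, 0, 0, 1))
```

Here `unorm_clear` is `0xFF0000FF` while `uint_clear` is `0x01000001`, so a differing clear color between the two dumps is not by itself a sign of a bug.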

10:13 <bbrezillon> looks like the FB is not cleared

10:13 <bbrezillon> Color:(0, 0, 0, 0)

10:14 <bbrezillon> are you sure you pass the right value through the uniform?

10:15 <bbrezillon> also need to check the shader (should be an uint color in one case, and a float in the other)

10:15 <bbrezillon> blend descriptors are also worth a check (they should not be the same for UNORM and UINT)

10:16 <bbrezillon> tomeu: did you check the TestResults.qpa output?

10:16 <tomeu> yeah, there's no dump, but mentions that the first pixel in the failing case is all zeroes

10:18 <tomeu> by blend descriptor, you mean the internal conversion descriptor?

10:19 <tomeu> that's the same because it's using the same shader

10:19 <tomeu> (that's how it's done in buf2img, which I used as a template)

10:22 camus1 has joined #panfrost

10:23 camus has quit [Remote host closed the connection]

10:42 camus has joined #panfrost

10:43 camus1 has quit [Read error: Connection reset by peer]

10:59 nlhowell has quit [Ping timeout: 480 seconds]

11:08 camus1 has joined #panfrost

11:12 camus has quit [Ping timeout: 480 seconds]

11:49 wwilly has joined #panfrost

11:50 wwilly__ has joined #panfrost

11:52 wwilly_ has quit [Ping timeout: 480 seconds]

11:58 wwilly has quit [Ping timeout: 480 seconds]

12:16 <bbrezillon> tomeu: buf2img is a bit different though, it's expected to do a raw copy, but what you want here is draw a rect with a fixed color, and the clear value will depend on the format

12:18 <bbrezillon> tomeu: do you have this code pushed to a public repo?

12:21 <tomeu> bbrezillon: yep: https://gitlab.freedesktop.org/tomeu/mesa/-/commits/panvk-vkCmdClearAttachments/

12:22 <tomeu> btw, what I'm looking at here is at how I'm reading the clear values from the UBO

12:22 <tomeu> I think that's what the problem is in this case

12:57 camus has joined #panfrost

13:02 camus1 has quit [Ping timeout: 480 seconds]

15:50 camus1 has joined #panfrost

15:51 wwilly_ has joined #panfrost

15:52 megi has quit [Quit: WeeChat 3.2]

15:52 megi has joined #panfrost

15:54 camus has quit [Ping timeout: 480 seconds]

15:58 wwilly__ has quit [Ping timeout: 480 seconds]

16:41 camus has joined #panfrost

16:42 camus1 has quit [Ping timeout: 480 seconds]

18:14 macc241 has joined #panfrost

18:14 macc24 has quit [Read error: Connection reset by peer]

18:14 macc241 has quit []

18:15 macc24 has joined #panfrost

18:47 macc24 has quit [Quit: WeeChat 3.2]

18:48 macc24 has joined #panfrost

19:09 <macc24> woah

19:09 <macc24> minecraft bedrock edition still isn't broken

19:16 <macc24> https://i.imgur.com/myHxzgv.png

19:16 <macc24> and runs pretty well actually

19:17 <macc24> (yes i know that my desk is a mess)

19:30 <alyssa> After a month of reverse-engineering, I present documentation on the Arm® Mali™-G78 instruction set. Get it while it's hot 🔥

19:31 <alyssa> https://www.collabora.com/news-and-blog/news-and-events/reverse-engineering-the-mali-g78.html

19:32 <macc24> :eyes:

19:50 <urja> wow, this pdf looks nice

19:50 <alyssa> urja: thank you :)

19:50 <alyssa> will be moved to collabora.com shortly, am having technical difficulties oops

19:51 <alyssa> No GPUs were harmed in the making of this document.

19:53 <macc24> fun fact: sku0 duets are shipping lol got cadmium user with one

19:57 <HdkR> alyssa: Good job!

19:57 <alyssa> HdkR: Thank you <3

20:01 <alyssa> HdkR: My Twitter is being very loud :-p

20:01 <HdkR> So many birds

20:03 <anarsoul> alyssa: so ARM didn't provide ISA documentation for Valhall?

20:03 <alyssa> anarsoul: Collabora is providing ISA documentation for Valhall 😉

20:04 <alyssa> That is my story and I'm sticking to it.

20:04 <alyssa> daniels: ^ 🙃

20:07 <urja> alyssa: on page 6 the table for blend shaders ABI is not being an actual table, i think

20:10 <urja> or page 5 as the page claims itself to be, 6 as by the pdf reader :P

20:17 <alyssa> uhhh

20:18 <robclark> alyssa: heh, arm still uses the same program-binary format? I think qcom is on their ~3rd iteration (for ir3)

20:18 <alyssa> urja: Oh, oof, thanks. Fixing

20:19 <alyssa> robclark: I mean, I think they keep shuffling things around, but the "scan for magic word and then start disassembling" hack still works :-p

20:19 <robclark> oh, qcom's first iteration was "scan for magic word".. they later went to a more structured thing with headers and offsets to different sections (which have offsets to different sections, and so on)

20:20 <alyssa> Nod, that sounds like MBS

20:20 <alyssa> again I didn't actually pay attention to the MBS, just knew enough to look for OBC :p

20:20 <alyssa> *OBJC

20:20 <alyssa> urja: Fixed now, thanks for spotting it!

20:22 * urja refreshes

20:22 <urja> yup, fixed :)

20:22 <alyssa> urja: Didn't realize that part of pandoc markdown was case sensitive

20:22 <alyssa> Need to find a home for the docs

20:23 <alyssa> most of it is generated from ISA.xml which will be upstreamed to Mesa when I can get around to git rebase'ing harder

20:23 <alyssa> the intro/appendix is extra prose (CC BY-SA) which I guess I can stick on gitlab.freedesktop.org

20:24 <urja> I was sorta amused by "Putting it together gives a code sequence for sin" :D

20:25 <urja> sinful code :P

20:25 <alyssa> 👿

20:27 <urja> not 😈?

20:28 <alyssa> sure

20:29 <macc24> hmm? https://i.imgur.com/lhKnqs9.jpeg

20:43 camus1 has joined #panfrost

20:44 Daanct12 has joined #panfrost

20:48 camus has quit [Ping timeout: 480 seconds]

20:49 Danct12 has quit [Ping timeout: 480 seconds]

20:50 Daanct12 is now known as Danct12

20:58 <cphealy> macc24: is that a Nintendo Switch? ;-)

20:59 <macc24> cphealy: nintendo switch isn't worth my time ;-)

21:04 <icecream95> alyssa: Typo: "kerenl interface". At least you didn't follow Commode by calling it "kernal"...

21:14 <alyssa> icecream95: Fixed, thanks :)

21:30 <icecream95> alyssa: "Level-of-detail bias (as a 16-bit fixed-point". Are you sure you don't mean "a signed two's-complement 16-bit fixed-point"?

21:30 <icecream95> That would mean my frist patch for all of Midgard, Bifrost and Valhall would be for negative LOD bias :)

21:31 <alyssa> icecream95: I assume so, not sure I checked negative values correctly but yes.

21:31 <alyssa> s/correctly/at all/

21:31 <alyssa> I don't think the text as written is wrong

21:35 <alyssa> Valhall really is "Bifrost, the good parts"

21:36 <icecream95> Because unsigned addition is the same as signed addition for twos-complement, or?

21:37 <alyssa> I mean I didn't specify a sign?
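
The point about signed and unsigned addition can be checked with a short sketch of a 16-bit two's-complement fixed-point value. The split into 8 integer and 8 fractional bits is an assumption made for the example (the log only says "16-bit fixed-point"), and the helper names are hypothetical.

```python
FRACTION_BITS = 8  # assumption: the log does not state the fixed-point split


def encode_lod_bias(bias):
    """Encode a float as signed two's-complement 16-bit fixed point."""
    fixed = round(bias * (1 << FRACTION_BITS))
    if not -0x8000 <= fixed <= 0x7FFF:
        raise ValueError("bias out of range for 16-bit fixed point")
    return fixed & 0xFFFF          # two's-complement bit pattern


def decode_lod_bias(raw):
    """Decode the 16-bit pattern back to a float."""
    signed = raw - 0x10000 if raw & 0x8000 else raw
    return signed / (1 << FRACTION_BITS)


def add_u16(a, b):
    """Plain unsigned 16-bit addition with wraparound."""
    return (a + b) & 0xFFFF


negative = encode_lod_bias(-1.5)
total = add_u16(encode_lod_bias(2.0), negative)
```

`negative` is `0xFE80`, and `decode_lod_bias(total)` gives `0.5`: the unsigned adder produces the right bit pattern without knowing the operands are signed, so the sign only matters when the value is encoded or interpreted.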

21:37 <alyssa> https://i.redd.it/h7nt4keyd7oy.jpg

21:38 <alyssa> except Bifrost vs Valhall

21:39 <alyssa> Clauses? Gone. Irregular encoding? Gone. Monolithic data structures? Split up.

21:39 <alyssa> TBF that last one dates back to utgard

21:49 <icecream95> alyssa: "number of instructions to the next BLEND intsruction minus". Missing "one"? So an off-by-one error on the off-by-one?

21:50 <alyssa> yes, and it should be fixed now

21:50 <alyssa> well already was fixed but uh

21:53 <icecream95> "instructions et giving the same building blocks". ?

21:54 camus has joined #panfrost

21:55 camus1 has quit [Ping timeout: 480 seconds]

21:57 <icecream95> "FMA_RSCALE.f32 scaled, x, #24" seems to be missing a source

22:12 <alyssa> uhhh indeed it is

22:13 <alyssa> Corrected, thanks

22:19 rasterman has quit [Quit: Gettin' stinky!]

23:05 camus1 has joined #panfrost

23:07 camus has quit [Ping timeout: 480 seconds]

23:11 <icecream95> I do wonder why 0.75 << 20 (intBitsToFloat(0x49400000)) was chosen as the "SINCOS_BIAS", it should only differ from 0.5 << 20 for values where sin wouldn't give a sensible answer anyway

23:11 * icecream95 tries some negative numbers and finds out why

23:12 <icecream95> (Otherwise for negative numbers the exponent would decrease so the answer would be shifted left by one place)
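
The negative-number behaviour can be reproduced on the CPU with float32 arithmetic. This is a sketch, not the hardware sequence: the helper names are made up, and the rounding to float32 is done with `struct`, which is assumed to match what the GPU add does for these exactly representable values.

```python
import math
import struct


def f32_bits(value):
    """Round a Python float to float32 and return its raw 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


BIAS_075 = 0.75 * 2**20   # 0x49400000, the SINCOS_BIAS quoted above
BIAS_050 = 0.50 * 2**20   # the alternative being wondered about


def low6(x, bias):
    """Low six bits of float32(x * 2/pi + bias)."""
    return f32_bits(x * (2.0 / math.pi) + bias) & 0x3F


positive = (low6(math.pi / 2, BIAS_075), low6(math.pi / 2, BIAS_050))
negative = (low6(-math.pi / 2, BIAS_075), low6(-math.pi / 2, BIAS_050))
```

For `pi/2` both biases give 16. For `-pi/2` the 0.75 bias gives 48 (that is, -16 mod 64), while the 0.5 bias gives 32: the sum drops below 2^19, the exponent decreases, and the bits shift left by one place.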

23:13 <alyssa> Not sure what the story with the magic # is but it's the same for bifrost/valhall

23:33 <icecream95> alyssa: Floats have 24 mantissa^W significand bits.

23:33 <icecream95> After multiplying by 2/pi, the range of values is [0, 4) (because 2*pi * 2/pi = 4), so we want to put (value % 4) in the lower 6 bits of the float

23:33 <icecream95> 4 is represented in floating-point with a significand of 0x800000 >> 24 (0.5) and an exponent of 3, as 0.5 << 3 == 4

23:33 <icecream95> We want to find a number that, when added to 4, shifts the 1 bit in the significand to bit 7 ^H 6

23:33 <icecream95> 0x800000 >> x = 0x40. log2(0x800000) - x = log2(0x40). x = log2(0x800000) - log2(0x40) = 23 - 6 = 17

23:33 <icecream95> But 4 already has an exponent of three, so the magic value must have an exponent of 17 + 3 = 20

23:36 <icecream95> (The significand is fixed point 0:24 in the range [0.5, 1), so the first bit is always 1 and not stored)
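
The derivation above can be sanity-checked in a few lines. Python's `math.frexp` happens to use the same convention (significand in [0.5, 1)), so it reports the exponent of 20 directly; the variable names are only for illustration.

```python
import math
import struct


def bits_to_f32(bits):
    """Interpret a 32-bit pattern as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


bias = bits_to_f32(0x49400000)

# frexp returns (significand, exponent) with the significand in [0.5, 1).
significand, exponent = math.frexp(bias)

# Spacing between adjacent float32 values near the bias: one step of the
# lowest significand bit.
step = bits_to_f32(0x49400001) - bias

# The scaled input covers [0, 4), so this many steps fit in one period.
steps_per_period = 4.0 / step
```

This gives `significand == 0.75`, `exponent == 20`, `step == 0.0625`, and `steps_per_period == 64.0`, so one period of the scaled input maps onto exactly six low bits of the significand.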

23:36 <alyssa> Ah, clever. Nice find!

23:43 <icecream95> The Valhall documentation could probably make it clearer that it is only the lower six bits of the float/significand that equal 32/pi*(x mod 2pi), and add parentheses to show that it is 32/pi*(x mod 2pi) and not (32/pi*x) mod 2pi

23:47 <alyssa> Clarified, thanks.

23:51 <anholt_> https://gitlab.freedesktop.org/anholt/deqp-runner/-/merge_requests/14

23:53 <alyssa> Shiny!

23:53 <anholt_> oh, this isn't the channel I meant to drop that in.

23:53 <anholt_> but enjoy

23:59 * icecream95 learnt about the floating point format from the ZX Spectrum BASIC manual, which had far more of this sort of low-level stuff than any of the C++ books he later read