#panfrost on 2022-05-03 — irc logs at oftc.irclog.whitequark.org

2022-03-22 11:57 ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://oftc.irclog.whitequark.org/panfrost - <macc24> i have been here before it was popular

00:00 rasterman has quit [Quit: Gettin' stinky!]

00:04 rkanwal has quit [Quit: rkanwal]

00:04 rkanwal has joined #panfrost

00:37 * icecream95 wonders if a G52r1 blob can be used with G52r0 after messing with the GPU properties a little

00:38 <icecream95> First I have to work out why eglGetDisplay is returning EGL_NO_DISPLAY

00:41 <HdkR> `EGL_DEFAULT_DISPLAY`? :)

00:42 jambalaya has quit [Ping timeout: 480 seconds]

00:43 <icecream95> Maybe debugging would be easier if I used EGL directly rather than via Waffle

00:51 jambalaya has joined #panfrost

00:58 <alyssa> icecream95: I assume so

00:59 <alyssa> Probably G76 too

01:07 rkanwal has quit [Quit: rkanwal]

01:11 <icecream95> Oops, Waffle is loading GL symbols from GLVND libGLESv2 rather than libmali

01:12 <alyssa> Yum

01:12 <alyssa> I wonder if that works on old versions of fedora

01:12 <alyssa> yum install waffle

01:12 <alyssa> nom nom nom

01:13 <icecream95> And of course pandecode is crashing: "Access to unknown memory 20900000001"

01:14 <alyssa> pandecode: not the best software out there

01:14 <alyssa> pandecode: at least it uses genxml, now.

01:15 <icecream95> Maybe my Valhall branch of pandecode can still work on Bifrost? That had a few changes to make it work better

01:15 <alyssa> eh?

01:15 <alyssa> something I keep wanting is teaching GenXML about state dependence

01:16 <alyssa> e.g. only print the secondary preload section if secondary_shader is set

01:16 <alyssa> (and ideally, complain if it's nondefault.)

01:17 <alyssa> unfortunately, hard to do so in a way that isn't horrible, so i've Just Dealt with the verbosity...

01:17 <icecream95> panwrap will probably become even more of a mess when the next Mali architecture is released...

01:17 <alyssa> ugh.. probably..

01:18 <alyssa> v11, you mean?

01:18 <icecream95> Even v10 is a big mess, because I don't know all the command-stream commands

01:19 erle has quit [Ping timeout: 480 seconds]

01:20 <alyssa> *nod*

01:20 <alyssa> I guess they announce these things in May, don't they.. crap! :-p

01:20 <icecream95> Like: how does calling other buffers work? Does it keep using the same state, or start clean? There are four command-stream buffers, why is that?

01:20 * alyssa is still secretly hoping for FOSS csf firmware some day

01:21 <icecream95> alyssa: Say you've hacked into Arm and will release the entire kbase source unless they make free CSF firmware

01:21 <alyssa> oh no, public kbase source code! :p

01:22 <icecream95> Threaten to release the CPU and GPU designs as well if you need to

01:22 <icecream95> "Everyone is going to know how to make an ARM1 and you'll go bankrupt!"

01:24 <icecream95> http://visual6502.org/sim/varm/armgl.html

01:27 <icecream95> Oops: JS: Job Hard-Stopped (took more than 50 ticks at 100 ms/tick)

01:28 <icecream95> alyssa: Weren't you getting GPU timeouts on G52 with panfrost before fixing some bug? I'm just using a stock Arm kbase, maybe it is broken in the same way

01:30 <icecream95> "brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout". This is the third bit of hardware I've used where there seems to be a mysterious connection between the GPU and wifi!

03:04 erle has joined #panfrost

04:27 JulianGro has quit [Remote host closed the connection]

05:34 guillaume_g has joined #panfrost

06:42 digetx has quit [Ping timeout: 480 seconds]

06:43 digetx has joined #panfrost

06:48 MajorBiscuit has joined #panfrost

07:58 Lyude has quit [Ping timeout: 480 seconds]

07:58 Lyude has joined #panfrost

08:48 rasterman has joined #panfrost

08:57 <icecream95> Fun.. the new base_jd_atom struct is 16 bytes larger, but panwrap ignores the stride in the ioctl

09:12 robmur01__ has quit []

09:13 robmur01 has joined #panfrost

09:51 <icecream95> The mali blob I'm using works better under panfrost.ko than mali_kbase.ko :D

10:00 digetx has quit [Ping timeout: 480 seconds]

10:15 digetx has joined #panfrost

10:24 <macc24> icecream95: h-how did you make it run under panfrost.ko?

10:34 <icecream95> macc24: A few hacks to panloader to make it call functions in Mesa to create BOs and submit jobs

10:35 <macc24> icecream95: i assume that by using it i'm throwing away all chances of getting support when it doesn't work, right?

10:38 <icecream95> macc24: There's no window-system integration, so have fun trying to use it for anything but Piglit, dEQP or apitrace

10:39 <macc24> ok.

10:40 <icecream95> I don't know that adding it would be all that hard, but then people might actually try to use it...

11:00 MajorBiscuit has quit [Ping timeout: 480 seconds]

11:01 rkanwal has joined #panfrost

11:11 <q4a> Does panvk have some milestones like https://github.com/Yours3lf/rpi-vk-driver/wiki/Development-milestones ? Or any roadmap (without ETA)?

11:12 <q4a> And may be it's time to add panvk to https://docs.mesa3d.org/drivers/panfrost.html#building ? Now I see "-Dvulkan-drivers="

11:13 <macc24> q4a: idk but adding a driver that fails without env var that's basically "i want a broken vulkan driver" might imply it's working and that would bring many bogus complaints^W bug reports

11:15 rasterman has quit [Quit: Gettin' stinky!]

11:16 rasterman has joined #panfrost

11:51 MajorBiscuit has joined #panfrost

11:53 anarsoul has quit [Quit: ZNC 1.8.2 - https://znc.in]

11:53 anarsoul has joined #panfrost

12:18 WoC has quit [Remote host closed the connection]

12:44 icecream95 has quit [Ping timeout: 480 seconds]

12:50 <alyssa> icecream95: Yeah, a lot of bugs that are logically faults (and show up as faults on Midgard) end up as hangs/timeouts on G52. Guessing it's a hardware quirk. PAN_MESA_DEBUG=sync catches both.

13:12 MajorBiscuit has quit [Ping timeout: 480 seconds]

13:16 MajorBiscuit has joined #panfrost

14:45 MajorBiscuit has quit [Ping timeout: 480 seconds]

14:51 MajorBiscuit has joined #panfrost

14:55 JulianGro has joined #panfrost

15:25 JulianGro has quit [Remote host closed the connection]

15:40 <alyssa> Re separable shader woes on Valhall, here's a proposal:

15:41 <alyssa> The spec guarantees that slots are compatible across shader stages. The implication is that linking happens per slot.

15:41 <alyssa> It also implicitly assumes slots are vec4s.

15:42 <alyssa> Assume we do all fp32/flat32 varyings for the separable shader case. (This could be optimized but that's an orthogonal change.)

15:42 <alyssa> If the shader uses only VARYING_SLOT_VARn, then each varying is at offset ((slot - VARYING_SLOT_VAR0) * 16).

15:43 <alyssa> And the size of the varyings needed for a shader is given by util_last_bit(varyings >> VARYING_SLOT_VAR0)

15:43 <alyssa> * 16

15:43 <alyssa> One shader might have extra varyings; to prevent them from stomping over other things, have the hardware allocate MAX2(vs varying size, fs varying size)

15:48 <alyssa> So this suffices for GLES. What about big GL?

15:48 <alyssa> For big GL, we also need to allocate space for things like COL0 and TEX0

15:49 <alyssa> Allocating space for all of them unconditionally won't work (efficiently)

15:50 <alyssa> We have a few options (on one extreme -- dynamically link with more sysvals, no variants but complex with a runtime cost; on the other extreme, key entire shaders together to defeat the point of separable shaders)

15:50 <alyssa> I think the right compromise is in the middle: Key to the mask of special varyings written by the VS.

15:55 <alyssa> How would that work?

15:55 <alyssa> Practically, it just becomes an extra item in the FS key. Nothing tricky there.

15:56 <alyssa> We introduce some new ABI, deciding how varyings should be laid out.

15:56 <alyssa> The natural ABI is "special varyings come before general varyings, and are tightly packed, sorted by locations, 16 byte vec4 each"

15:57 <alyssa> With this ABI, separable vertex shaders do not need any keying. To store varying at slot S, store at offset:

15:57 <alyssa> if (slot >= VARYING_SLOT_VAR0) {

15:58 <alyssa> return (util_bitcount(special_outputs_mask) + (slot - VARYING_SLOT_VAR0)) * 16

15:58 <alyssa> } else {

16:00 <alyssa> return select(special_outputs_mask, slot) * 16

16:01 <alyssa> }

16:01 <alyssa> Note that special_outputs_mask is just info.outputs_written without VARYING_SLOT_VAR*

16:01 <alyssa> Fragment shaders need to be keyed to the VS's special_outputs_mask; from that snippet it's obvious why.

16:02 <alyssa> Then fragment shaders load varyings at offsets given by that same code.

16:03 <alyssa> Some details need extra attention:

16:04 <alyssa> In the VS, that routine is only called for slots being written. Therefore, if slot < VARYING_SLOT_VAR0, then (1 << slot) & special_outputs_mask. So the select is well-defined.

16:04 <alyssa> In the FS, the same holds, provided the FS doesn't read any uninitialized varyings.

16:06 <alyssa> The ARB_separate_shader_objects spec says on the matter:

16:06 <alyssa> Separate linking creates the possibility that certain output varyings

16:06 <alyssa> of a shader may go unread by the subsequent shader inputting varyings.

16:06 <alyssa> In this case, the output varyings are simply ignored. It is also

16:06 <alyssa> possible input varyings from a shader may not be written as output

16:06 <alyssa> varyings of a preceding shader. In this case, the unwritten input

16:06 <alyssa> varying values are undefined.

16:06 <alyssa> So we may assume the select is well-defined in the FS too; if it's not, the app has hit undefined behaviour.

16:08 <alyssa> I also didn't formally define select.

16:09 <alyssa> Given a bit b set in a mask m, select(m, b) returns the index of b in the list of *set* bits of m.

16:09 <alyssa> (implementing the compaction semantics)

16:09 <alyssa> Equivalently, select(m, b) returns the number of set bits in m[0:b]

16:10 <alyssa> Easily implemented as util_bitcount(m & BITFIELD_MASK(b)), nice and constant time.

16:11 <alyssa> For fun: we already have that helper for linking varyings on Midgard ;)

16:11 <alyssa> slightly different use, same idea though

16:13 <alyssa> The varying size calculation needs to be updated: we need to add util_bitcount(special_outputs_mask) * 16 to both the VS and FS sizes

16:13 <alyssa> But other than that, everything is the same.

16:13 <alyssa> Now, let's talk why this is a good idea.

16:15 <alyssa> + For GLES, no special varyings exist. So the special varying mask is always 0, so there are 0 additional shader variants. Additionally, the offset util_bitcount(special_outputs_mask) * 16 = 0, so we degrade gracefully to the optimal GLES case.

16:15 <alyssa> (You only pay for what you use.)

16:19 <alyssa> + For GL, when separable shaders are *not* in use, the VS's special_outputs_mask equals the FS's special_inputs_mask. So there's no hit for !separable shaders, and this is a reasonable ABI for that case too.

16:19 <alyssa> + So we don't special case separable vs not-separable anywhere, we just get correct behaviour for both by default.

16:20 MajorBiscuit has quit [Ping timeout: 480 seconds]

16:21 <alyssa> + Mixing and matching the same FS with multiple VSes that write different special GL varyings seems... not unlikely but limited. And big GL on new hardware is all about shader variants anyway :p So this is a pretty cheap solution. Not sure if anyone even uses separable + compat GL varyings, although IIRC it's in the spec (Kayden and I checked :( )

16:21 <alyssa> + Variants aside, all of this is pretty cheap at draw time and the code is dead simple (especially compared to the dynamic linking we do on Midgard)

17:53 guillaume_g has quit []

19:36 rasterman has quit [Quit: Gettin' stinky!]

19:45 <alyssa> Mhhhhhh

19:45 <alyssa> multiple draws with vertex shaders that spill is probably broken

19:45 <alyssa> no dep

19:45 <alyssa> although I guess job_barrier band-aids over that

19:50 pch has quit [Ping timeout: 480 seconds]

21:15 rasterman has joined #panfrost

21:21 anarsoul has quit [Ping timeout: 480 seconds]

21:23 JulianGro has joined #panfrost

21:35 rasterman has quit [Quit: Gettin' stinky!]

21:37 rasterman has joined #panfrost

21:44 rasterman has quit [Quit: Gettin' stinky!]

21:46 rasterman has joined #panfrost

21:51 <alyssa> 32 files changed, 780 insertions(+), 322 deletions(-)

21:51 <alyssa> ugh, how does this diffstat keep growing? D:

21:57 kenzie has quit [Quit: The Lounge - https://thelounge.chat]

22:08 rasterman has quit [Quit: Gettin' stinky!]

22:22 kenzie has joined #panfrost

23:07 icecream95 has joined #panfrost

23:07 icecream95 has quit []

23:08 icecream95 has joined #panfrost

23:12 kenzie has quit [Quit: The Lounge - https://thelounge.chat]

23:20 anarsoul has joined #panfrost

23:25 kenzie has joined #panfrost

23:43 kenzie has quit [Quit: The Lounge - https://thelounge.chat]

23:47 kenzie has joined #panfrost

23:50 rkanwal has quit [Ping timeout: 480 seconds]