#panfrost on 2022-05-12 — irc logs at oftc.irclog.whitequark.org

2022-03-22 11:57 ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://oftc.irclog.whitequark.org/panfrost - <macc24> i have been here before it was popular

00:08 erle has quit [Ping timeout: 480 seconds]

00:13 erle has joined #panfrost

00:43 psydroid[m] has quit [Ping timeout: 480 seconds]

00:43 JulianGroOld[m] has quit [Ping timeout: 480 seconds]

00:43 jenneron[m] has quit [Ping timeout: 480 seconds]

00:43 toggleton[m] has quit [Ping timeout: 480 seconds]

00:43 go4godvin has quit [Ping timeout: 480 seconds]

00:43 unevenrhombus[m] has quit [Ping timeout: 480 seconds]

00:43 Dylanger has quit [Ping timeout: 480 seconds]

00:43 strongtz[m] has quit [Ping timeout: 480 seconds]

00:43 stebler[m] has quit [Ping timeout: 480 seconds]

00:43 CalebFontenotHaileysCuteNerdyB has quit [Ping timeout: 480 seconds]

01:38 rasterman has quit [Quit: Gettin' stinky!]

02:42 erle has quit [Ping timeout: 480 seconds]

05:01 erle has joined #panfrost

05:24 jenneron[m] has joined #panfrost

05:41 JulianGroOld[m] has joined #panfrost

06:06 toggleton[m] has joined #panfrost

06:07 anarsoul is now known as anarsoul|2

06:09 anarsoul|2 is now known as anarsoul

06:28 pjakobsson has quit []

06:28 guillaume_g has joined #panfrost

07:31 pendingchaos has quit [Remote host closed the connection]

07:32 pendingchaos has joined #panfrost

08:25 greenjustin has quit [Ping timeout: 480 seconds]

08:40 hexdump01 has joined #panfrost

08:42 hexdump01 has quit []

08:42 hexdump0815 has joined #panfrost

08:43 <hexdump0815> greenjustin: (in case you read this offline in the irc logs) i have tested your reworked gpu freq scaling patch on mainline and it seems to work well so far

08:44 <hexdump0815> i did not do too much testing yet, but so far i did not see any gpu page faults etc. and it seems to scale the gpu freq properly

08:44 <hexdump0815> this is on kukui-jacuzzi-kappa with v5.18-rc6

08:49 <hexdump0815> i'm using it on xorg xfce on debian bullseye still with some old mesa version right now (21.3.2)

09:37 camus has quit [Remote host closed the connection]

09:41 rasterman has joined #panfrost

09:43 camus has joined #panfrost

10:39 CounterPillow has quit [Quit: Bye.]

10:48 CounterPillow has joined #panfrost

10:48 CounterPillow has quit []

10:49 CounterPillow has joined #panfrost

10:50 CounterPillow has quit []

10:53 CounterPillow has joined #panfrost

10:54 CounterPillow has quit []

10:54 CounterPillow has joined #panfrost

11:03 rkanwal has joined #panfrost

11:07 MajorBiscuit has joined #panfrost

11:33 MajorBiscuit has quit [Quit: WeeChat 3.4]

12:50 megi has quit [Quit: WeeChat 3.5]

12:51 megi has joined #panfrost

13:03 pjakobsson has joined #panfrost

14:37 <jekstrand> alyssa: I think the time has come to split up bi_finalize_nir/bi_optimize_nir a bit.

14:37 <alyssa> jekstrand: Uh oh

14:37 <jekstrand> alyssa: For panvk, we want the same preprocess/finalize pattern the Intel drivers have.

14:38 <jekstrand> Basic idea is preprocess does up-front stuff that you know you want for all shaders like lower_tex, etc. and runs the optimization loop

14:38 <jekstrand> finalize does everything you need to do at the last minute before passing to the back-end.

14:38 * alyssa is listening

14:38 <jekstrand> Then panvk will do:

14:39 <jekstrand> 1. SPIR-V -> NIR and maybe a tiny bit to inline functions, etc.

14:39 <jekstrand> 2. bi_preprocess_nir

14:39 <jekstrand> 3. panvk-specific lowering like handling descriptor sets

14:39 <jekstrand> 4. compile_from_nir which calls bi_finalize_nir

14:39 <alyssa> ok...

14:39 <alyssa> Why is that better than what panvk does now?

14:40 <jekstrand> Because we're having to duplicate a bunch of stuff that's currently in bi_finalize_nir or bi_optimize_nir so it can happen before descriptor set lowering.

14:40 <alyssa> also can we kill the nir_lower_flrp nonsense across the tree while we're at it? I moved it from opt loop to "preprocess" on AGX and it hasn't blown up yet

14:40 <alyssa> Hmmmm, okay.

14:41 <jekstrand> Also, blend lowering should go between preprocess and finalize

14:41 <alyssa> Aha, sure

14:41 <alyssa> wait, but then the blend code doesn't get optimized

14:41 <alyssa> is that ok?

14:42 <alyssa> I guess for Bifrost it should be fine... Midgard relies heavily on opt_vectorize running after lower_blend

14:42 <jekstrand> The optimization loop gets run twice, once in preprocess and once in finalize

14:42 <alyssa> oh?

14:42 <jekstrand> The one in finalize shouldn't do much

14:42 <alyssa> doesn't that hurt compile time (at least for GLES)?

14:42 <jekstrand> We'll have to see. A single noop run shouldn't be bad

14:42 <alyssa> Hope so

14:43 <alyssa> In general I'm a little suspicious of our optimization loops in Mesa

14:43 <jekstrand> Me too

14:43 <alyssa> I realize there are some opts (esp. algebraic) that really need many runs to converge

14:44 <alyssa> but that's not true of every opt. and I think we just throw everything into the big loop and don't think about convergence issues (unless passes fight) and that's it..

14:44 <jekstrand> Yeah

14:44 greenjustin has joined #panfrost

14:45 <jekstrand> ugh... I'd forgotten that flrp was that ugly. I think I dug into why a while ago but gave up

14:47 <jekstrand> Of course, the commit message says nothing about why it needs to be part of the opt loop and why it needs this weird lower-once business.

14:47 <jekstrand> *sigh*

14:48 <alyssa> I guess if you're Intel with funny flrp lowering it might make sense

14:49 <alyssa> for any driver that wants the straightforward lowering... just call it once with eg lower_tex?

14:49 <jekstrand> maybe?

14:49 <alyssa> I should shader-db that I guess

15:19 <jekstrand> alyssa: Does GLES not have support for struct varyings?

15:20 <alyssa> jekstrand: It gets lowered

15:20 <alyssa> not sure if by NIR or by GLSL

15:21 <alyssa> Ideally panvk would lower and the backend remains unaware

15:30 psydroid[m] has joined #panfrost

15:38 go4godvin has joined #panfrost

15:38 go4godvin is now known as Guest513

15:58 guillaume_g has quit []

15:58 <jekstrand> alyssa: Agreed. I just don't know what lowering to call. :-/

15:59 <alyssa> Woof.

16:00 <alyssa> jekstrand: split_struct_vars doesn't want to do it

16:00 <jekstrand> I think the reason Intel drivers don't care is because they never look at varying variables

16:01 <alyssa> and look at what instead? sem.location?

16:01 <jekstrand> Just location

16:01 <jekstrand> Which we map to sensible things

16:02 <alyssa> Hm, might be doable for us too

16:02 <jekstrand> Iris might be doing some remapping somewhere. I don't remember

16:02 <alyssa> I can give that a look when I'm done staring at dEQP-GLES2.functional.shaders.loops.do_while_dynamic_iterations.nested_tricky_dataflow_1_fragment

16:02 <jekstrand> It's all a bit of a mess. There are about 4 different ways the information can get passed around and everyone who touches it seems to feel the need to add another. :angry:

16:03 <alyssa> woof

16:03 <jekstrand> Like bi_emit_fragment_out uses driver_location to look up the variable so it can get the regular location. Why don't we just use the regular location?!?

16:03 <jekstrand> Because gallium...

16:04 <alyssa> Uhhhh

16:04 <alyssa> that might be unnecessary?

16:04 <alyssa> sem.location would probably be better?

16:05 <jekstrand> idk. sem is also a gallium thing

16:06 <alyssa> it is?

16:06 <jekstrand> Oh, that's right... The intel drivers do their own remapping using the VUE map thing

16:06 <jekstrand> woof

16:24 <cwabbott> looking up the variable sounds like a leftover from before sem.location was added

16:24 <cwabbott> and sem.location is definitely not just a gallium thing, nir_lower_io sets it

16:25 <cwabbott> the original point of driver_location is that it's the index into the table o' inputs/outputs set up by the backend

16:26 <cwabbott> or if your driver can do some fancy compression, it would facilitate that

16:26 <cwabbott> but then intel bypasses that and does their own compression

16:26 jekstrand has quit [Remote host closed the connection]

16:30 jekstrand has joined #panfrost

16:32 soreau has quit [Ping timeout: 480 seconds]

17:09 pch has quit [Remote host closed the connection]

17:14 pch has joined #panfrost

17:40 jekstrand has quit [Ping timeout: 480 seconds]

17:51 soreau has joined #panfrost

18:06 robmur01 has quit [Quit: Leaving]

18:35 rasterman has quit [Remote host closed the connection]

18:40 rasterman has joined #panfrost

19:18 jekstrand has joined #panfrost

19:19 jekstrand is now known as Guest534

19:19 jekstrand has joined #panfrost

19:19 Guest534 has quit []

19:19 jekstrand has quit []

19:20 jekstrand has joined #panfrost

19:21 <jekstrand> alyssa: Does panfrost support FB fetch from depth/stencil?

19:22 <jekstrand> I expect no

19:22 <alyssa> jekstrand: No, but the hardware does, and we could support it if we cared

19:22 <alyssa> No use case yet.

19:23 <jekstrand> kk

19:23 <jekstrand> Nifty

19:25 <alyssa> jekstrand: Also, future Malis might remove it. not sure yet.

19:36 <jekstrand> alyssa: Vulkan requires support for depth input attachments. You can go through textures if you want to split the render pass after every draw.

19:36 <alyssa> jekstrand: Ahhh

19:37 <jekstrand> For an immediate renderer, it's a flush. It sucks but oh, well. On a tiler, it's death.

19:38 <anholt_> is mali like qcm where you can just texture your tile buffer?

19:39 <alyssa> anholt_: depends what you mean by texture

19:40 <alyssa> It goes through the load/store pipe, not the texture pipe

19:40 <alyssa> (so it's samplerless)

19:40 <alyssa> but you can definitely read from the tilebuffer cheaply, which the driver does to implement blending in a bunch of cases

19:41 <anholt_> oh, I just misunderstood jekstrand's question, it was about fbfetch specifically, not just input attachments.

19:43 <jekstrand> I'm reworking input attachment lowering so it can optionally re-route to FB fetch

19:44 <jekstrand> For stuff in previous subpasses, you'll still get texturing. For self-dependencies, though, you'll get FB fetch.

19:44 <alyssa> do real apps (not benchmarks) make use of multiple subpasses?

19:45 <alyssa> (Android Vulkan deferred renderers I guess?)

19:51 <jekstrand> aztec ruins. :)

19:51 <jekstrand> Yes, there are some real apps that do

19:51 <jekstrand> I know someone who's working on writing a brand new render engine right now that's very much designed to take advantage of subpasses.

19:52 <jekstrand> In the mobile world, people do actually target subpasses.

19:52 <jekstrand> In desktop, not so much.

19:52 <alyssa> right, ok

19:52 <alyssa> SuperTuxKart on mobile would benefit a lot from subpasses

19:53 <alyssa> icecream95: You played with that, right?

20:05 rasterman has quit [Quit: Gettin' stinky!]

20:16 * jekstrand hates FB fetch. :-/

20:17 Danct12 has quit [Quit: Quitting]

20:17 <alyssa> jekstrand: Why?

20:18 <jekstrand> I get to do even more FS output var vec size mangling. :-(

20:18 <alyssa> Oh..

20:18 <alyssa> maybe load_output is a bad idea and we'd rather expose a load_tile intrinsic independent of the var...

20:18 <jekstrand> We could, maybe

21:04 rasterman has joined #panfrost

21:25 icecream95 has joined #panfrost

21:30 strongtz[m] has joined #panfrost

21:31 stebler[m] has joined #panfrost

21:32 <icecream95> alyssa: I tried making STK use framebuffer fetch rather than switching framebuffers all the time, but I think I gave up because recompiling after each change took too long

21:48 Danct12 has joined #panfrost

22:41 anarsoul|2 has joined #panfrost

22:42 anarsoul has quit [Read error: Connection reset by peer]

22:52 CalebFontenotHaileysCuteNerdyB has joined #panfrost

23:02 rasterman has quit [Quit: Gettin' stinky!]

23:05 rkanwal has quit [Ping timeout: 480 seconds]

23:08 anarsoul|2 is now known as anarsoul

23:14 <alyssa> icecream95: Ah, got it

23:29 Dylanger has joined #panfrost