#lima on 2021-11-26 — irc logs at oftc.irclog.whitequark.org

2021-07-26 22:56 ChanServ changed the topic of #lima to: Development channel for open source lima driver for ARM Mali4** GPUs - Kernel driver has landed in mainline, userspace driver is part of mesa - Logs at https://oftc.irclog.whitequark.org/lima/

01:30 <anarsoul> rellla: can you please redo the dumps of http://imkreisrum.de/deqp/deqp-complete-dumps_mali400-r7p0_on-allwinner-a20/results/dEQP-GLES2.functional/multisample/ with --deqp-gl-config-name=rgba8888d24s8ms4 ?

04:03 chewitt has quit [Ping timeout: 480 seconds]

04:27 jernej has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]

04:28 jernej has joined #lima

04:44 camus has joined #lima

04:45 Danct12 has quit [Remote host closed the connection]

04:46 Danct12 has joined #lima

04:48 Danct12 has quit [Remote host closed the connection]

04:48 Danct12 has joined #lima

05:07 chewitt has joined #lima

06:40 camus1 has joined #lima

06:40 chewitt has quit [Read error: Connection reset by peer]

06:41 camus has quit [Ping timeout: 480 seconds]

06:43 Danct12 has quit [Quit: Quitting]

06:50 Danct12 has joined #lima

06:54 <rellla> anarsoul: uff, but sure. i will check if mali is still set up on the soc. if i find the soc :)

07:18 <anarsoul> rellla: I just need multisample tests, but if you can't do it that's fine

07:33 <rellla> anarsoul: if all is set up, it's just a one-liner. i guess i will also update the syscall-tracker before

07:34 <anarsoul> I almost got msaa working

07:34 <anarsoul> only alpha_to_coverage fails

07:34 <anarsoul> oh, and stencil fails

07:35 <anarsoul> dEQP-GLES2.functional.multisample.stencil

07:40 <anarsoul> oh, looks like we need to disable early_z for alpha_to_coverage

07:41 <anarsoul> OK, now only stencil fails

08:23 <anarsoul> rellla: btw, any further comments on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13873 ?

08:44 <rellla> anarsoul: will look at it later. will upload the dumps in a few mins...

08:52 <rellla> anarsoul: http://imkreisrum.de/deqp/dEQP-GLES2.functional/

09:17 <rellla> btw, i don't have dEQP-GLES2.functional.multisampled_render_to_texture.readpixels in my caselist...

09:21 <rellla> ah, probably an old deqp version ;)

09:31 chewitt has joined #lima

13:32 camus1 has quit [Ping timeout: 480 seconds]

13:39 camus has joined #lima

13:54 camus has quit []

14:11 misdirections has joined #lima

14:24 misdirections has quit [Remote host closed the connection]

15:18 drod has joined #lima

15:25 camus has joined #lima

17:17 <anarsoul> rellla: thanks!

17:17 <anarsoul> .stencil actually passes on blob

17:23 <anarsoul> it sets wb1.zero = 0x40000

17:24 <anarsoul> and also wb1.mrt_pitch = 0xf

17:55 camus has quit []

18:02 <anarsoul> marex: I'm comparing mali regs from https://www.xilinx.com/html_docs/registers/ug1087/ug1087-zynq-ultrascale-registers.html with what we have in lima and it looks like there's 2 missing regs in pp_frame

18:03 <anarsoul> we call PP0_SUBPIXEL_SPECIFIER dubya in lima, its value is always 0x77

18:04 <anarsoul> and PP0_ORIGIN_OFFSET_Y seems to be unused_2 in lima

18:04 <anarsoul> so we don't have 2 regs that are called "one" and "supersampled_height"

18:06 <anarsoul> so if documentation is correct I've no idea how lima may work on zynqmp

18:53 <anarsoul> OK, I think dEQP-GLES2.functional.multisample.stencil fails because we have reload in the middle :)

18:53 <anarsoul> so msaa state is lost

19:04 <enunes> anarsoul: I remember hitting the GL_MAX_SAMPLES GL_INVALID_ENUM thing when looking at the multisample implementation

19:04 <enunes> anarsoul: I came up with this patch for it back then https://paste.centos.org/view/raw/d39472c3 , should I send a separate MR for it or should we integrate it with yours?

19:05 <anarsoul> you can send it as a separate MR

19:07 <enunes> if you just force 3.0 with MESA_GLES_VERSION_OVERRIDE it also works, for debugging purposes

19:09 <anarsoul> let me try that...

19:10 <anarsoul> yeah, it works with MESA_GLES_VERSION_OVERRIDE=3.0

19:31 <marex> anarsoul: uh ... it works rather well on zynqmp

19:31 <anarsoul> marex: then doc is incorrect

19:32 <marex> anarsoul: I need one clock patch for the kernel which was rejected because xilinx didn't provide any helpful feedback on the clock topology, so I carry it downstream

19:32 <marex> anarsoul: I can report that to xilinx if you want

19:32 <anarsoul> marex: up to you :)

19:38 <anarsoul> oh, looks like reload for multisample is more complex for stencil

19:38 <anarsoul> blob does 4 draws instead of 1, one for each sample

19:39 <anarsoul> and it iterates over sample

19:39 <anarsoul> s/sample/sample_mask

19:41 <anarsoul> enunes: rellla: I just noticed that vertex selector for reload job is points

19:41 <anarsoul> so looks like it uses point sprites for reload

19:41 <anarsoul> (that's in case anyone is looking into implementing point sprites)

19:58 <marex> anarsoul: give me a minute, NMI, I will be back in say 30 minutes

20:04 <anarsoul> oh, and it uses 4 different textures for reload :\

20:12 <anarsoul> hehe

20:12 <anarsoul> turns out MSAA 4x is not that free if you need to preserve depth/stencil buffer

20:13 <anarsoul> it needs 4x size of depth/stencil buffer

20:25 <anarsoul> OK, I think our lima_pp_wb_reg definition is incorrect

20:25 <anarsoul> zero should be named flags and should go before mrt_bits

20:26 <anarsoul> in this case wb_reg definition matches zynqmp docs, however pp_reg still lacks 2 regs

20:27 <anarsoul> so for MSAA the blob enables 4 MRTs for depth/stencil buffer and allocates 4x buffer size

20:27 <anarsoul> then for reload it reloads each MRT individually
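[editor's note] The 4x cost mentioned above can be put into numbers. The following is a minimal sketch, not code from lima or the blob: the helper names are hypothetical, 4 bytes per pixel is assumed for a D24S8 buffer, and a tightly packed layout with one plane per sample laid out back to back is an assumption made only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes needed for a depth/stencil buffer.
 * samples == 1 means no MSAA; samples == 4 models the "4x size" case. */
static size_t demo_zs_buffer_size(uint32_t width, uint32_t height,
                                  uint32_t bytes_per_pixel, uint32_t samples)
{
    assert(samples >= 1);
    return (size_t)width * height * bytes_per_pixel * samples;
}

/* Hypothetical helper: byte offset of the plane that would be reloaded
 * individually for a given sample index, ASSUMING the per-sample planes
 * are stored contiguously (the log does not state the real layout). */
static size_t demo_zs_plane_offset(uint32_t width, uint32_t height,
                                   uint32_t bytes_per_pixel, uint32_t sample)
{
    return (size_t)sample * demo_zs_buffer_size(width, height,
                                                bytes_per_pixel, 1);
}
```

For an 800x480 D24S8 target, demo_zs_buffer_size(800, 480, 4, 1) gives 1536000 bytes and the 4-sample variant gives 6144000 bytes, which is why preserving depth/stencil makes MSAA 4x less "free".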

20:33 <enunes> anarsoul: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13967

20:36 <anarsoul> enunes: LGTM, but you'll need someone familiar with mesa to review it

20:36 <enunes> yes, just fyi

20:40 <anarsoul> well, the need to reload depth/stencil 4 times for MSAA somewhat explains why we have 4 registers for gl_FragColor

20:41 <anarsoul> i.e. why we need to specify gl_FragColor register 4 times

20:41 <anarsoul> I think dual source blending may also be broken for MSAA

20:42 <anarsoul> and that probably explains why ARM didn't expose it in their driver

21:45 <anarsoul> yay, dEQP-GLES2.functional.multisample.stencil is now fixed :)

21:45 <anarsoul> rellla: thanks a lot for the dumps :)

21:51 <anarsoul> I suspect that dual source blending may be broken with MSAA 16x, but it should be fine with 4x

21:52 <anarsoul> we don't expose MSAA 16x, so it probably doesn't matter

21:53 <marex> I am back (finally)

21:56 <marex> anarsoul: do you need me to test anything on zynqmp or check anything ?

21:56 <anarsoul> marex: not really, I just pointed out that gpu reg documentation from xilinx doesn't actually correspond to what we have in the driver

21:57 <anarsoul> but it doesn't matter if it works fine for you

21:58 <marex> anarsoul: are you sure the xilinx docs are wrong ? maybe there are different variants of the mali core ?

21:58 <marex> (wrong with xilinx isn't really surprising though ... sigh)

22:04 <anarsoul> marex: I briefly checked what mesa does and what kernel driver does

22:04 <anarsoul> basically mesa just sends struct lima_pp_frame_reg {} to the driver

22:05 <anarsoul> and driver re-interprets it as array of uint32_t and sends to the hardware

22:05 <anarsoul> so it's either we exclude unused_1 and unused_2 from struct lima_pp_frame_reg {} somewhere

22:05 <anarsoul> or xilinx docs just omit them
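[editor's note] The point about re-interpreting the struct as an array of uint32_t can be illustrated with a small sketch. The struct below is hypothetical and shortened; it is not the real struct lima_pp_frame_reg, and only the names unused_1, unused_2 and dubya are taken from the discussion. It shows why field position, not field name, decides which hardware register a value lands in, so a placeholder field still occupies a register slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, shortened frame layout for illustration only. */
struct demo_frame_reg {
    uint32_t demo_addr;  /* made-up first register */
    uint32_t unused_1;   /* placeholder slot, still takes one register */
    uint32_t dubya;      /* 0x77 in the discussion above */
    uint32_t unused_2;   /* placeholder slot, still takes one register */
};

#define DEMO_NUM_REGS (sizeof(struct demo_frame_reg) / sizeof(uint32_t))

/* The sketch relies on the struct having no padding between fields. */
static_assert(sizeof(struct demo_frame_reg) == 4 * sizeof(uint32_t),
              "demo_frame_reg must be exactly four 32-bit words");

/* Copy the struct into a flat word array, the way a driver might before
 * writing word i to the i-th consecutive register. memcpy is used instead
 * of a pointer cast to stay clear of aliasing problems. */
static void demo_frame_to_words(const struct demo_frame_reg *frame,
                                uint32_t out[DEMO_NUM_REGS])
{
    memcpy(out, frame, sizeof(*frame));
}
```

With this layout dubya ends up in word 2. Dropping unused_1 from the struct would shift it to word 1, which is the kind of mismatch a register list with fewer or more entries would imply.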

22:09 <marex> anarsoul: aren't those registers default 0 and you program 0 into them ?

22:10 <marex> (I'm still multiplexing between other things here, sorry)

23:17 drod has quit [Remote host closed the connection]