ChanServ changed the topic of #lima to: Development channel for open source lima driver for ARM Mali4** GPUs - Kernel driver has landed in mainline, userspace driver is part of mesa - Logs at https://oftc.irclog.whitequark.org/lima/
<anarsoul> rellla: can you please redo the dumps of http://imkreisrum.de/deqp/deqp-complete-dumps_mali400-r7p0_on-allwinner-a20/results/dEQP-GLES2.functional/multisample/ with --deqp-gl-config-name=rgba8888d24s8ms4 ?
chewitt has quit [Ping timeout: 480 seconds]
jernej has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]
jernej has joined #lima
camus has joined #lima
Danct12 has quit [Remote host closed the connection]
Danct12 has joined #lima
Danct12 has quit [Remote host closed the connection]
Danct12 has joined #lima
chewitt has joined #lima
camus1 has joined #lima
chewitt has quit [Read error: Connection reset by peer]
camus has quit [Ping timeout: 480 seconds]
Danct12 has quit [Quit: Quitting]
Danct12 has joined #lima
<rellla> anarsoul: uff, but sure. i will check if mali is still set up on the soc. if i find the soc :)
<anarsoul> rellla: I just need multisample tests, but if you can't do it that's fine
<rellla> anarsoul: if all is set up, it's just a one-liner. i guess i will also update the syscall-tracker before
<anarsoul> I almost got msaa working
<anarsoul> only alpha_to_coverage fails
<anarsoul> oh, and stencil fails
<anarsoul> dEQP-GLES2.functional.multisample.stencil
<anarsoul> oh, looks like we need to disable early_z for alpha_to_coverage
<anarsoul> OK, now only stencil fails
<anarsoul> rellla: btw, any further comments on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13873 ?
<rellla> anarsoul: will look at it later. will upload the dumps in a few mins...
<rellla> btw, i don't have dEQP-GLES2.functional.multisampled_render_to_texture.readpixels in my caselist...
<rellla> ah, probably an old deqp version ;)
chewitt has joined #lima
camus1 has quit [Ping timeout: 480 seconds]
camus has joined #lima
camus has quit []
misdirections has joined #lima
misdirections has quit [Remote host closed the connection]
drod has joined #lima
camus has joined #lima
<anarsoul> rellla: thanks!
<anarsoul> .stencil actually passes on blob
<anarsoul> it sets wb1.zero = 0x40000
<anarsoul> and also wb1.mrt_pitch = 0xf
camus has quit []
<anarsoul> marex: I'm comparing mali regs from https://www.xilinx.com/html_docs/registers/ug1087/ug1087-zynq-ultrascale-registers.html with what we have in lima and it looks like there are 2 missing regs in pp_frame
<anarsoul> we call PP0_SUBPIXEL_SPECIFIER dubya in lima, its value is always 0x77
<anarsoul> and PP0_ORIGIN_OFFSET_Y seems to be unused_2 in lima
<anarsoul> so we don't have 2 regs that are called "one" and "supersampled_height"
<anarsoul> so if documentation is correct I've no idea how lima may work on zynqmp
<anarsoul> OK, I think dEQP-GLES2.functional.multisample.stencil fails because we have reload in the middle :)
<anarsoul> so msaa state is lost
<enunes> anarsoul: I remember hitting the GL_MAX_SAMPLES GL_INVALID_ENUM thing when looking at the multisample implementation
<enunes> anarsoul: I came up with this patch for it back then https://paste.centos.org/view/raw/d39472c3 , should I send a separate MR for it or should we integrate it with yours?
<anarsoul> you can send it as a separate MR
<enunes> if you just force 3.0 with MESA_GLES_VERSION_OVERRIDE it also works, for debugging purposes
<anarsoul> let me try that...
<anarsoul> yeah, it works with MESA_GLES_VERSION_OVERRIDE=3.0
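The override discussed above can be scripted for repeated debugging runs. This is a minimal sketch, not a recommended setup: the binary path `./deqp-gles2` and the helper name `deqp_multisample_invocation` are assumptions, while the environment variable and the `--deqp-gl-config-name` value are the ones mentioned in the channel. It only builds the command line and environment; it does not launch anything.

```python
import os


def deqp_multisample_invocation(deqp_binary="./deqp-gles2"):
    """Return (argv, env) for a multisample dEQP run with GLES forced to 3.0.

    deqp_binary is a hypothetical path; adjust it to the local dEQP build.
    """
    # Copy the current environment and add the debugging override from the log.
    env = dict(os.environ)
    env["MESA_GLES_VERSION_OVERRIDE"] = "3.0"
    argv = [
        deqp_binary,
        # Restrict the run to the multisample group discussed in the channel.
        "--deqp-case=dEQP-GLES2.functional.multisample.*",
        # Config name requested at the start of the log.
        "--deqp-gl-config-name=rgba8888d24s8ms4",
    ]
    return argv, env


if __name__ == "__main__":
    argv, env = deqp_multisample_invocation()
    print("MESA_GLES_VERSION_OVERRIDE=" + env["MESA_GLES_VERSION_OVERRIDE"],
          " ".join(argv))
```

The returned pair can be passed to `subprocess.run(argv, env=env)` once the binary path is known to be correct.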
<marex> anarsoul: uh ... it works rather well on zynqmp
<anarsoul> marex: then doc is incorrect
<marex> anarsoul: I need one clock patch for the kernel which was rejected because xilinx didn't provide any helpful feedback on the clock topology, so I carry it downstream
<marex> anarsoul: I can report that to xilinx if you want
<anarsoul> marex: up to you :)
<anarsoul> oh, looks like reload for multisample is more complex for stencil
<anarsoul> blob does 4 draws instead of 1, one for each sample
<anarsoul> and it iterates over sample
<anarsoul> s/sample/sample_mask
<anarsoul> enunes: rellla: I just noticed that vertex selector for reload job is points
<anarsoul> so looks like it uses point sprites for reload
<anarsoul> (that's in case anyone is looking into implementing point sprites)
<marex> anarsoul: give me a minute, NMI, I will be back in say 30 minutes
<anarsoul> oh, and it uses 4 different textures for reload :\
<anarsoul> hehe
<anarsoul> turns out MSAA 4x is not that free if you need to preserve depth/stencil buffer
<anarsoul> it needs 4x size of depth/stencil buffer
<anarsoul> OK, I think our lima_pp_wb_reg definition is incorrect
<anarsoul> zero should be named flags and should go before mrt_bits
<anarsoul> in this case wb_reg definition matches zynqmp docs, however pp_reg still lacks 2 regs
<anarsoul> so for MSAA the blob enables 4 MRTs for depth/stencil buffer and allocates 4x buffer size
<anarsoul> then for reload it reloads each MRT individually
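The cost described above (4x depth/stencil storage and one reload draw per sample) can be put into numbers. This is a rough sketch under stated assumptions: a d24s8 pixel takes 4 bytes, alignment and tiling padding are ignored, and each reload draw is assumed to select exactly one sample with a one-bit mask. The log only says the blob iterates over sample_mask; the exact mask values and the function names here are hypothetical.

```python
SAMPLES = 4  # MSAA 4x, the only mode discussed here


def depth_stencil_size(width, height, samples=1, bytes_per_pixel=4):
    """Bytes needed for a depth/stencil buffer.

    bytes_per_pixel=4 assumes d24s8 (24-bit depth + 8-bit stencil).
    Alignment and tiling padding are deliberately ignored.
    """
    return width * height * bytes_per_pixel * samples


def reload_sample_masks(samples=SAMPLES):
    """One mask per reload draw; one bit per sample is an assumption."""
    return [1 << i for i in range(samples)]


if __name__ == "__main__":
    single = depth_stencil_size(64, 64)
    msaa = depth_stencil_size(64, 64, samples=SAMPLES)
    print(single, msaa)                # 16384 65536
    for mask in reload_sample_masks():
        print("reload draw with sample_mask", hex(mask))
```

For a 64x64 target the depth/stencil storage grows from 16 KiB to 64 KiB, and the single reload draw becomes four.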
<anarsoul> enunes: LGTM, but you'll need someone familiar with mesa to review it
<enunes> yes, just fyi
<anarsoul> well, the need to reload depth/stencil 4 times for MSAA somewhat explains why we have 4 registers for gl_FragColor
<anarsoul> i.e. why we need to specify gl_FragColor register 4 times
<anarsoul> I think dual source blending may also be broken for MSAA
<anarsoul> and that probably explains why ARM didn't expose it in their driver
<anarsoul> yay, dEQP-GLES2.functional.multisample.stencil is now fixed :)
<anarsoul> rellla: thanks a lot for the dumps :)
<anarsoul> I suspect that dual source blending may be broken with MSAA 16x, but it should be fine with 4x
<anarsoul> we don't expose MSAA 16x, so it probably doesn't matter
<marex> I am back (finally)
<marex> anarsoul: do you need me to test anything on zynqmp or check anything ?
<anarsoul> marex: not really, I just pointed out that gpu reg documentation from xilinx doesn't actually correspond to what we have in the driver
<anarsoul> but it doesn't matter if it works fine for you
<marex> anarsoul: are you sure the xilinx docs are wrong ? maybe there are different variants of the mali core ?
<marex> (wrong with xilinx isn't really surprising though ... sigh)
<anarsoul> marex: I briefly checked what mesa does and what kernel driver does
<anarsoul> basically mesa just sends struct lima_pp_frame_reg {} to the driver
<anarsoul> and driver re-interprets it as array of uint32_t and sends to the hardware
<anarsoul> so it's either we exclude unused_1 and unused_2 from struct lima_pp_frame_reg {} somewhere
<anarsoul> or xilinx docs just omit them
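The point about the struct being re-interpreted as an array of uint32_t can be illustrated with a toy layout. This is only a sketch: the field names `reg_a`, `reg_b`, `reg_c` and the five-word layout are made up and do not reflect the real struct lima_pp_frame_reg; only `unused_1` and `unused_2` are names taken from the discussion. It shows that every field, including unused ones, occupies a word slot, so a document that omits such fields would number the remaining registers differently.

```python
import ctypes


class FrameRegsWithUnused(ctypes.Structure):
    # Hypothetical layout; NOT the real lima_pp_frame_reg.
    _fields_ = [
        ("reg_a", ctypes.c_uint32),
        ("unused_1", ctypes.c_uint32),
        ("reg_b", ctypes.c_uint32),
        ("unused_2", ctypes.c_uint32),
        ("reg_c", ctypes.c_uint32),
    ]


def as_words(regs):
    """View a register struct as a flat list of 32-bit words."""
    n = ctypes.sizeof(regs) // ctypes.sizeof(ctypes.c_uint32)
    return list((ctypes.c_uint32 * n).from_buffer_copy(bytes(regs)))


def word_index(struct_type, field_name):
    """Index of a field in the flat word array."""
    return getattr(struct_type, field_name).offset // 4


if __name__ == "__main__":
    regs = FrameRegsWithUnused(reg_a=1, reg_b=2, reg_c=3)
    print(as_words(regs))                            # [1, 0, 2, 0, 3]
    print(word_index(FrameRegsWithUnused, "reg_c"))  # 4
```

In this toy layout `reg_c` lands in word 4; if the unused fields were dropped it would land in word 2, which is the kind of mismatch being discussed.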
<marex> anarsoul: aren't those registers default 0 and you program 0 into them ?
<marex> (I'm still multiplexing between other things here, sorry)
drod has quit [Remote host closed the connection]