ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://oftc.irclog.whitequark.org/panfrost - <macc24> i have been here before it was popular
rasterman has quit [Quit: Gettin' stinky!]
rkanwal has quit [Quit: rkanwal]
rkanwal has joined #panfrost
* icecream95 wonders if a G52r1 blob can be used with G52r0 after messing with the GPU properties a little
<icecream95> First I have to work out why eglGetDisplay is returning EGL_NO_DISPLAY
<HdkR> `EGL_DEFAULT_DISPLAY`? :)
jambalaya has quit [Ping timeout: 480 seconds]
<icecream95> Maybe debugging would be easier if I used EGL directly rather than via Waffle
jambalaya has joined #panfrost
<alyssa> icecream95: I assume so
<alyssa> Probably G76 too
rkanwal has quit [Quit: rkanwal]
<icecream95> Oops, Waffle is loading GL symbols from GLVND libGLESv2 rather than libmali
<alyssa> Yum
<alyssa> I wonder if that works on old versions of fedora
<alyssa> yum install waffle
<alyssa> nom nom nom
<icecream95> And of course pandecode is crashing: "Access to unknown memory 20900000001"
<alyssa> pandecode: not the best software out there
<alyssa> pandecode: at least it uses genxml, now.
<icecream95> Maybe my Valhall branch of pandecode can still work on Bifrost? That had a few changes to make it work better
<alyssa> eh?
<alyssa> something I keep wanting is teaching GenXML about state dependence
<alyssa> e.g. only print the secondary preload section if secondary_shader is set
<alyssa> (and ideally, complain if it's nondefault.)
<alyssa> unfortunately, hard to do so in a way that isn't horrible, so i've Just Dealt with the verbosity...
<icecream95> panwrap will probably become even more of a mess when the next Mali architecture is released...
<alyssa> ugh.. probably..
<alyssa> v11, you mean?
<icecream95> Even v10 is a big mess, because I don't know all the command-stream commands
erle has quit [Ping timeout: 480 seconds]
<alyssa> *nod*
<alyssa> I guess they announce these things in May, don't they.. crap! :-p
<icecream95> Like: how does calling other buffers work? Does it keep using the same state, or start clean? There are four command-stream buffers, why is that?
* alyssa is still secretly hoping for FOSS csf firmware some day
<icecream95> alyssa: Say you've hacked into Arm and will release the entire kbase source unless they make free CSF firmware
<alyssa> oh no, public kbase source code! :p
<icecream95> Threaten to release the CPU and GPU designs as well if you need to
<icecream95> "Everyone is going to know how to make an ARM1 and you'll go bankrupt!"
<icecream95> Oops: JS: Job Hard-Stopped (took more than 50 ticks at 100 ms/tick)
<icecream95> alyssa: Weren't you getting GPU timeouts on G52 with panfrost before fixing some bug? I'm just using a stock Arm kbase, maybe it is broken in the same way
<icecream95> "brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout". This is the third bit of hardware I've used where there seems to be a mysterious connection between the GPU and wifi!
erle has joined #panfrost
JulianGro has quit [Remote host closed the connection]
guillaume_g has joined #panfrost
digetx has quit [Ping timeout: 480 seconds]
digetx has joined #panfrost
MajorBiscuit has joined #panfrost
Lyude has quit [Ping timeout: 480 seconds]
Lyude has joined #panfrost
rasterman has joined #panfrost
<icecream95> Fun.. the new base_jd_atom struct is 16 bytes larger, but panwrap ignores the stride in the ioctl
robmur01__ has quit []
robmur01 has joined #panfrost
<icecream95> The mali blob I'm using works better under panfrost.ko than mali_kbase.ko :D
digetx has quit [Ping timeout: 480 seconds]
digetx has joined #panfrost
<macc24> icecream95: h-how did you make it run under panfrost.ko?
<icecream95> macc24: A few hacks to panloader to make it call functions in Mesa to create BOs and submit jobs
<macc24> icecream95: i assume that by using it i'm throwing away all chances of getting support when it doesn't work, right?
<icecream95> macc24: There's no window-system integration, so have fun trying to use it for anything but Piglit, dEQP or apitrace
<macc24> ok.
<icecream95> I don't know that adding it would be all that hard, but then people might actually try to use it...
MajorBiscuit has quit [Ping timeout: 480 seconds]
rkanwal has joined #panfrost
<q4a> Does panvk have some milestones like https://github.com/Yours3lf/rpi-vk-driver/wiki/Development-milestones ? Or any roadmap (without ETA)?
<q4a> And maybe it's time to add panvk to https://docs.mesa3d.org/drivers/panfrost.html#building ? Now I see "-Dvulkan-drivers="
<macc24> q4a: idk but adding a driver that fails without env var that's basically "i want a broken vulkan driver" might imply it's working and that would bring many bogus complaints^W bug reports
rasterman has quit [Quit: Gettin' stinky!]
rasterman has joined #panfrost
MajorBiscuit has joined #panfrost
anarsoul has quit [Quit: ZNC 1.8.2 - https://znc.in]
anarsoul has joined #panfrost
WoC has quit [Remote host closed the connection]
icecream95 has quit [Ping timeout: 480 seconds]
<alyssa> icecream95: Yeah, a lot of bugs that are logically faults (and show up as faults on Midgard) end up as hangs/timeouts on G52. Guessing it's a hardware quirk. PAN_MESA_DEBUG=sync catches both.
MajorBiscuit has quit [Ping timeout: 480 seconds]
MajorBiscuit has joined #panfrost
MajorBiscuit has quit [Ping timeout: 480 seconds]
MajorBiscuit has joined #panfrost
JulianGro has joined #panfrost
JulianGro has quit [Remote host closed the connection]
<alyssa> Re separable shader woes on Valhall, here's a proposal:
<alyssa> The spec guarantees that slots are compatible across shader stages. The implication is that linking happens per slot.
<alyssa> It also implicitly assumes slots are vec4s.
<alyssa> Assume we do all fp32/flat32 varyings for the separable shader case. (This could be optimized but that's an orthogonal change.)
<alyssa> If the shader uses only VARYING_SLOT_VARn, then each varying is at offset ((slot - VARYING_SLOT_VAR0) * 16).
<alyssa> And the size of the varyings needed for a shader is given by util_last_bit(varyings >> VARYING_SLOT_VAR0)
<alyssa> * 16
<alyssa> One shader might have extra varyings. To prevent them from stomping over other things, have the hardware allocate MAX2(vs varying size, fs varying size)
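
The generic-varying layout described so far can be sketched in C roughly as follows. This is only an illustration of the arithmetic in the messages above, not code from Mesa: SLOT_VAR0 is a stand-in for VARYING_SLOT_VAR0 (its value here is an assumption), last_bit mimics the contract of util_last_bit, and the function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: stand-in for Mesa's VARYING_SLOT_VAR0 (first generic slot). */
#define SLOT_VAR0 32u
/* Every slot is treated as one fp32 vec4, i.e. 16 bytes. */
#define SLOT_SIZE 16u

/* Same contract as util_last_bit: index of the highest set bit plus one,
 * or 0 for an empty mask. */
static unsigned last_bit(uint32_t v)
{
    unsigned n = 0;
    while (v) {
        n++;
        v >>= 1;
    }
    return n;
}

/* Byte offset of a generic varying: (slot - VAR0) * 16. */
static unsigned generic_offset(unsigned slot)
{
    return (slot - SLOT_VAR0) * SLOT_SIZE;
}

/* Bytes needed by one shader, given its mask of varying slots. */
static unsigned generic_size(uint64_t varyings)
{
    return last_bit((uint32_t)(varyings >> SLOT_VAR0)) * SLOT_SIZE;
}

/* Allocation for a VS/FS pair: the larger of the two sizes (MAX2). */
static unsigned linked_size(uint64_t vs_written, uint64_t fs_read)
{
    unsigned vs = generic_size(vs_written);
    unsigned fs = generic_size(fs_read);
    return vs > fs ? vs : fs;
}
```

Because the offset depends only on the slot number, neither shader needs to know what the other one declares; only the allocation size looks at both.
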
<alyssa> So this suffices for GLES. What about big GL?
<alyssa> For big GL, we also need to allocate space for things like COL0 and TEX0
<alyssa> Allocating space for all of them unconditionally won't work (efficiently)
<alyssa> We have a few options (at one extreme -- dynamically link with more sysvals, no variants but complex with a runtime cost; at the other extreme, key entire shaders together to defeat the point of separable shaders)
<alyssa> I think the right compromise is in the middle: Key to the mask of special varyings written by the VS.
<alyssa> How would that work?
<alyssa> Practically, it just becomes an extra item in the FS key. Nothing tricky there.
<alyssa> We introduce some new ABI, deciding how varyings should be laid out.
<alyssa> The natural ABI is "special varyings come before general varyings, and are tightly packed, sorted by locations, 16 byte vec4 each"
<alyssa> With this ABI, separable vertex shaders do not need any keying. To store varying at slot S, store at offset:
<alyssa> if (slot >= VARYING_SLOT_VAR0) {
<alyssa> return (util_bitcount(special_outputs_mask) + (slot - VARYING_SLOT_VAR0)) * 16;
<alyssa> } else {
<alyssa> return select(special_outputs_mask, slot) * 16;
<alyssa> }
<alyssa> Note that special_outputs_mask is just info.outputs_written without VARYING_SLOT_VAR*
<alyssa> Fragment shaders need to be keyed to the VS's special_outputs_mask, in the obvious way given that snippet.
<alyssa> Then fragment shaders load varyings at offsets given by that same code.
<alyssa> Some details need extra attention:
<alyssa> In the VS, that routine is only called for slots being written. Therefore, if slot < VARYING_SLOT_VAR0, then (1 << slot) & special_outputs_mask. So the select is well-defined.
<alyssa> In the FS, the same holds, provided the FS doesn't read any uninitialized varyings.
<alyssa> The ARB_separate_shader_objects spec says on the matter:
<alyssa> Separate linking creates the possibility that certain output varyings
<alyssa> of a shader may go unread by the subsequent shader inputting varyings.
<alyssa> In this case, the output varyings are simply ignored. It is also
<alyssa> possible input varyings from a shader may not be written as output
<alyssa> varyings of a preceding shader. In this case, the unwritten input
<alyssa> varying values are undefined.
<alyssa> So we may assume the select is well-defined for the same reason; if it's not, the app has hit undefined behaviour.
<alyssa> I also didn't formally define select.
<alyssa> Given a bit b set in a mask m, select(m, b) returns the index of b in the list of *set* bits of m.
<alyssa> (implementing the compaction semantics)
<alyssa> Equivalently, select(m, b) returns the number of set bits in m[0:b]
<alyssa> Easily implemented as util_bitcount(m & BITFIELD_MASK(b)), nice and constant time.
<alyssa> For fun: we already have that helper for linking varyings on Midgard ;)
<alyssa> slightly different use, same idea though
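
As a rough sketch of the select operation just defined, the following C version counts the set bits of the mask below position b. The helper names are hypothetical, and the plain loop and shift are stand-ins for Mesa's util_bitcount and BITFIELD_MASK; it assumes b < 32, which holds if every special slot sits below the first generic slot.

```c
#include <stdint.h>

/* Portable population count, standing in for util_bitcount. */
static unsigned bitcount32(uint32_t v)
{
    unsigned n = 0;
    while (v) {
        n += v & 1u;
        v >>= 1;
    }
    return n;
}

/* select(m, b): number of set bits of m strictly below bit b.
 * When bit b is itself set in m, this is its index in the compacted
 * list of set bits. Assumes b < 32. */
static unsigned select_bit(uint32_t m, unsigned b)
{
    uint32_t below = (1u << b) - 1u; /* bits 0 .. b-1 */
    return bitcount32(m & below);
}
```

For example, with m = 0x2C (bits 2, 3 and 5 set), bit 2 maps to index 0, bit 3 to index 1 and bit 5 to index 2.
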
<alyssa> The varying size calculation needs to be updated: we need to add (to both the VS and FS sizes)-- util_bitcount(special_outputs_mask) * 16
<alyssa> But other than that, everything is the same.
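
Putting the pieces of the proposal together, a minimal C sketch of the whole layout might look like this. It follows the chat at face value: the special mask is simply the written outputs below the first generic slot, special varyings are packed first, and the size gains one vec4 per special varying. SLOT_VAR0 and all function names are assumptions made for illustration, not Mesa's actual identifiers.

```c
#include <stdint.h>

/* Assumption: stand-in for Mesa's VARYING_SLOT_VAR0 (first generic slot). */
#define SLOT_VAR0 32u
#define SLOT_SIZE 16u /* one fp32 vec4 per slot */

static unsigned bitcount32(uint32_t v)
{
    unsigned n = 0;
    for (; v; v >>= 1)
        n += v & 1u;
    return n;
}

/* Index of the highest set bit plus one, 0 for an empty mask. */
static unsigned last_bit(uint32_t v)
{
    unsigned n = 0;
    for (; v; v >>= 1)
        n++;
    return n;
}

/* outputs_written with the generic VARn slots masked off. */
static uint32_t special_mask(uint64_t outputs_written)
{
    return (uint32_t)(outputs_written & ((1ull << SLOT_VAR0) - 1ull));
}

/* Byte offset of a varying under the proposed ABI. For special slots the
 * caller must only pass slots that are set in the mask. */
static unsigned varying_offset(uint32_t special, unsigned slot)
{
    if (slot >= SLOT_VAR0)
        return (bitcount32(special) + (slot - SLOT_VAR0)) * SLOT_SIZE;

    /* Compacted index among the special varyings (the "select"). */
    return bitcount32(special & ((1u << slot) - 1u)) * SLOT_SIZE;
}

/* Bytes needed by one shader: special varyings plus generic ones. The
 * same VS special mask is used for both the VS and the FS size. */
static unsigned varying_size(uint64_t varyings, uint32_t vs_special)
{
    unsigned generic = last_bit((uint32_t)(varyings >> SLOT_VAR0));
    return (bitcount32(vs_special) + generic) * SLOT_SIZE;
}
```

With an empty special mask, varying_offset reduces to (slot - SLOT_VAR0) * 16, which is the GLES-only layout from earlier in the discussion.
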
<alyssa> Now, let's talk why this is a good idea.
<alyssa> + For GLES, no special varyings exist. So the special varying mask is always 0, so there are 0 additional shader variants. Additionally, the offset util_bitcount(special_outputs_mask) * 16 = 0, so we degrade to the optimal GLES case softly.
<alyssa> (You only pay for what you use.)
<alyssa> + For GL, when separable shaders are *not* in use, the VS's special_outputs_mask equals the FS's special_inputs_mask. So there's no hit for !separable shaders, and this is a reasonable ABI for that case too.
<alyssa> + So we don't special case separable vs not-separable anywhere, we just get correct behaviour for both by default.
MajorBiscuit has quit [Ping timeout: 480 seconds]
<alyssa> + Mixing and matching the same FS with multiple VSes that write different special GL varyings seems... not unlikely but limited. And big GL on new hardware is all about shader variants anyway :p So this is a pretty cheap solution. Not sure if anyone even uses separable + compat GL varyings, although IIRC it's in the spec (Kayden and I checked :( )
<alyssa> + Variants aside, all of this is pretty cheap at draw time and the code is dead simple (especially compared to the dynamic linking we do on Midgard)
guillaume_g has quit []
rasterman has quit [Quit: Gettin' stinky!]
<alyssa> Mhhhhhh
<alyssa> multiple draws with vertex shaders that spill is probably broken
<alyssa> no dep
<alyssa> although I guess job_barrier band-aids over that
pch has quit [Ping timeout: 480 seconds]
rasterman has joined #panfrost
anarsoul has quit [Ping timeout: 480 seconds]
JulianGro has joined #panfrost
rasterman has quit [Quit: Gettin' stinky!]
rasterman has joined #panfrost
rasterman has quit [Quit: Gettin' stinky!]
rasterman has joined #panfrost
<alyssa> 32 files changed, 780 insertions(+), 322 deletions(-)
<alyssa> ugh, how does this diffstat keep growing? D:
kenzie has quit [Quit: The Lounge - https://thelounge.chat]
rasterman has quit [Quit: Gettin' stinky!]
kenzie has joined #panfrost
icecream95 has joined #panfrost
icecream95 has quit []
icecream95 has joined #panfrost
kenzie has quit [Quit: The Lounge - https://thelounge.chat]
anarsoul has joined #panfrost
kenzie has joined #panfrost
kenzie has quit [Quit: The Lounge - https://thelounge.chat]
kenzie has joined #panfrost
rkanwal has quit [Ping timeout: 480 seconds]