ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://oftc.irclog.whitequark.org/panfrost - <macc24> i have been here before it was popular
alarumbe has quit [Quit: ZNC 1.7.2+deb3 - https://znc.in]
Rathann has quit [Quit: Leaving]
alarumbe has joined #panfrost
soreau has quit [Read error: Connection reset by peer]
soreau has joined #panfrost
vstehle has quit [Ping timeout: 480 seconds]
camus has quit [Ping timeout: 480 seconds]
camus has joined #panfrost
camus1 has joined #panfrost
camus has quit [Ping timeout: 480 seconds]
alyssa has joined #panfrost
<alyssa> Now that things are settling, I'm spending some time poking at Valhall again
<alyssa> Now focusing on understanding the data structures (~command stream)
<alyssa> The OpenGL blob does a lot of crazy stuff, which makes it hard to understand what the hardware is actually capable of..
<alyssa> ..but the OpenCL blob is straightforward, like on Midgard/Bifrost.
<alyssa> so I've been digging into OpenCL samples and trying to piece together the hw that way
<alyssa> Tonight, focusing on OpenCL image stores
<alyssa> this is an area of considerable change versus Midgard/Bifrost
<alyssa> on the older Malis, images were written with special 3D attribute descriptors
<alyssa> which was... kinda weird for the driver, and is super awkward for Vulkan where textures/images aren't distinguished like they are in OpenGL
<alyssa> on Valhall, it turns out the descriptor used for image stores is just the texture descriptor
<alyssa> "wait, what texture descriptor? you reversed the valhall texture descriptor?"
<alyssa> didn't have to-- it's ~unchanged from Bifrost
<alyssa> Nicely, pixel formats are compatible with Bifrost (v7), including all the v7 swizzling craziness
<alyssa> What did change is the surface descriptor
<alyssa> On bifrost, the surface descriptor was at most the address + line stride + surface stride
<alyssa> on valhall, that all seems to be there but there are... more things? maybe? not sure.
<alyssa> I typed out the XML for the basic 2D/no-mipmapping/no-array/no-multisampling case
<alyssa> will be curious how the heavier cases go
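As a rough illustration of the Bifrost-style surface descriptor described above (address + line stride + surface stride), the sketch below computes a texel address from those three values. The field names and the plain linear layout are assumptions made for this example; they are not the Panfrost XML definitions or the actual hardware encoding, and the extra Valhall fields mentioned above are not modelled.

```python
from dataclasses import dataclass


@dataclass
class SurfaceDescriptor:
    # Hypothetical field names, chosen for illustration only.
    address: int         # GPU address of the first texel
    line_stride: int     # bytes between consecutive rows
    surface_stride: int  # bytes between consecutive layers/slices


def texel_address(desc, x, y, layer, bytes_per_texel):
    """Address of texel (x, y) in the given layer, assuming a linear layout."""
    return (desc.address
            + layer * desc.surface_stride
            + y * desc.line_stride
            + x * bytes_per_texel)


desc = SurfaceDescriptor(address=0x1000, line_stride=256, surface_stride=256 * 64)
print(hex(texel_address(desc, x=3, y=2, layer=0, bytes_per_texel=4)))
```

In the basic 2D/no-array case only the address and line stride matter; the surface stride comes into play once there is more than one layer.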
<HdkR> I love when data structures get shared across generations
<alyssa> =)
camus1 has quit []
camus has joined #panfrost
rando25892 has joined #panfrost
vstehle has joined #panfrost
tolszak has joined #panfrost
rando25892 has quit [Quit: No windows for this server]
rando25892 has joined #panfrost
rando25892 has quit []
rando25892 has joined #panfrost
camus has quit [Remote host closed the connection]
camus has joined #panfrost
camus1 has joined #panfrost
camus has quit [Remote host closed the connection]
camus1 has quit [Remote host closed the connection]
camus has joined #panfrost
<dschuermann> alyssa: does mali have per-component fneg modifiers? i.e. fneg_lo/hi?
<dschuermann> same question for agx
<HdkR> If it is the scalar architectures I guess per-component fneg would only matter for fp16 packed?
<dschuermann> yes, sorry should have clarified
<dschuermann> question is if you can have things like e.g. fadd a.xy, b.x(-y)
<dschuermann> or even a.x(-x) with swizzle and per-component modifier
camus has quit [Remote host closed the connection]
camus has joined #panfrost
camus1 has joined #panfrost
camus has quit [Read error: Connection reset by peer]
rasterman has joined #panfrost
Rathann has joined #panfrost
<alyssa> dschuermann: mali does not, no. the whole vec2 needs to be neg/abs or not.
<alyssa> agx is purely scalar (no packed fp16) so the question is vacuous
<alyssa> does amd?
<dschuermann> ah, right
<dschuermann> yes, amd can do that, but only for fneg. there are no fabs modifiers on packed instructions
<alyssa> fun
<dschuermann> alyssa: what do you think about (('vec2', ('fneg@16', a), b), ('fmul', ('vec2', a, b), ('vec2', -1.0, 1.0))) ?
<alyssa> uhhhh
<alyssa> i'm going to need more context, is this a thing that happens a lot?
<dschuermann> I found it in the FSR upscaling shader
<dschuermann> the alternative would be to introduce some fneg_lo/hi opcodes, but if nothing else can use it, it might be a bit unnecessary
<dschuermann> we'd emit these opcodes like that anyways and propagate the partial fneg
<alyssa> mmh, right
<alyssa> dschuermann: the lowering seems inefficient on bifrost, but I can't figure out a better one so I guess that's fine
<alyssa> fmul requires an ffma on bifrost, which seems... extra
<alyssa> but an ffma is the same perf as eg. an XOR
<dschuermann> ok, I'll ping you with a MR to try... probably won't have any effect until we have the lower_alu_to_scalar change, and even then you'll have trouble finding an application which does that ;)
<alyssa> fair enough :]
<alyssa> I think the real answer is "Bifrost's bottlenecks are elsewhere right now" ;)
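The rewrite proposed above, turning a vec2 with one negated component into a packed multiply by (-1.0, 1.0), can be checked numerically. The sketch below is a stand-alone Python model, not NIR or compiler code: the helper names are hypothetical, fp16 values are modelled as 16-bit patterns via `struct`'s half-precision format, and NaN payloads and denormal flushing are deliberately ignored.

```python
import struct


def half_bits(x):
    """Round x to IEEE 754 binary16 and return its 16-bit pattern."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def fneg16(bits):
    """fneg on a binary16 value flips only the sign bit."""
    return bits ^ 0x8000


def fmul16(a_bits, b):
    """Multiply a binary16 value by a Python float and round back to binary16."""
    a = struct.unpack("<e", struct.pack("<H", a_bits))[0]
    return half_bits(a * b)


def vec2_fneg_lo(a_bits, b_bits):
    # Left-hand side: ('vec2', ('fneg@16', a), b)
    return (fneg16(a_bits), b_bits)


def vec2_fmul(a_bits, b_bits):
    # Right-hand side: ('fmul', ('vec2', a, b), ('vec2', -1.0, 1.0))
    return (fmul16(a_bits, -1.0), fmul16(b_bits, 1.0))


# Signed zeros, a subnormal, the largest finite half and infinity are included
# because those are the cases where a multiply could plausibly differ from fneg.
samples = [0.0, -0.0, 1.0, -2.5, 65504.0, 6.0e-8, float("inf")]
for a in samples:
    for b in samples:
        pair = (half_bits(a), half_bits(b))
        assert vec2_fneg_lo(*pair) == vec2_fmul(*pair)
```

Multiplying by exactly ±1.0 never rounds, which is why the two sides agree bit for bit on these inputs.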
<tolszak> Hello there, I'm back. Last time I reported issues with scaling/resizing a texture-based item in Qt Quick. The result of the last research suggested by macc24 was that it does happen with LIBGL_ALWAYS_SOFTWARE=1, so the issue is in the DRI driver
<tolszak> Unfortunately, for some reason, even if I set LIBGL_ALWAYS_SOFTWARE=1 panfrost is still used as the rendering engine
<tolszak> When I renamed meson_dri.so I forced it to use softpipe, and the issue doesn't happen then
<tolszak> Is it possible to eglretrace on desktop somehow? I mean emulate EGL in a window?
<tolszak> I have trace from apitrace but I can't get the attribute values that are passed from Qt there
<tolszak> this seems like an issue. From my perspective everything looks like a precision issue, e.g. when the screen is full HD and the image I show is full HD, the issues appear around scale = 15/16 and disappear around 32
<alyssa> "GPUs from Mali-G77 onwards support formats up to and including 32 bits per pixel regardless of color channel arrangement or sRBG [sic]."
<alyssa> very interesting
<alyssa> worth noting -- Valhall architecturally inherits the limitation "no AFBC for imageStores"
<alyssa> Unlike midgard/bifrost, there is a way you "could" encode such a data structure.
<alyssa> But it'll be invalid --
<alyssa> Image stores on Valhall, like on Bifrost, consist of a pair of instructions. The first loads the effective address of a single texel of a single image and fetches the format, and the second stores to a given address with a data conversion to a given format.
camus has joined #panfrost
<alyssa> That split is compatible with any interleaving (tiling), since the first instruction can do the interleaving and give an address to a single texel
<alyssa> But it is *not* compatible with any compression whatsoever.
<alyssa> as there is no 1:1 correspondence between texel locations and uncompressed texel addresses in a compressed texture
<alyssa> Then again - even if a single "store image" instruction were provided, the hardware could still not support compression efficiently.
<alyssa> The cost of (de)compressing AFBC is substantial; the reason it's practical is that the cost is amortized across an entire block
<alyssa> Framebuffer writes (via a fragment shader) are efficient because they first write into the tilebuffer, and so the hardware can compress entire blocks at a time.
<alyssa> Compute shaders lift the restriction that a single invocation writes to a single known contiguous location. Which means the hardware can no longer make the compression scheme work.
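To make the point about the two-instruction split concrete: with tiling, every texel still maps to exactly one address, so "compute the address, then store" works. The sketch below uses a made-up 4x4 block-linear layout purely for illustration; it is not Mali's actual interleaving, the function names are hypothetical, and the width is assumed to be a multiple of the tile size.

```python
TILE = 4  # hypothetical tile edge in texels, chosen only for illustration


def tiled_offset(x, y, width, bytes_per_texel):
    """Byte offset of texel (x, y) in a simple TILE x TILE block-linear layout."""
    tiles_per_row = width // TILE
    tile_index = (y // TILE) * tiles_per_row + (x // TILE)
    within_tile = (y % TILE) * TILE + (x % TILE)
    return (tile_index * TILE * TILE + within_tile) * bytes_per_texel


def store_texel(memory, x, y, width, texel):
    # Step 1 (address computation): resolve one texel to one byte offset.
    offset = tiled_offset(x, y, width, len(texel))
    # Step 2 (store): write that texel's bytes at that offset.
    memory[offset:offset + len(texel)] = texel


width, height, bpp = 8, 8, 4
offsets = {tiled_offset(x, y, width, bpp)
           for y in range(height) for x in range(width)}
# Every texel gets its own distinct slot, so the two-step store is well defined.
assert len(offsets) == width * height

memory = bytearray(width * height * bpp)
store_texel(memory, 5, 2, width, b"\x11\x22\x33\x44")
```

In a block-compressed layout the bytes of a block encode all of its texels jointly, so no per-texel `tiled_offset` equivalent exists and step 1 has nothing meaningful to return.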
camus1 has quit [Ping timeout: 480 seconds]
soreau has quit [Remote host closed the connection]
soreau has joined #panfrost
macc242 has joined #panfrost
<tolszak> I was finally able to reproduce it with a plain OpenGL app
<tolszak> the issue appears when the vertex coord has a value smaller than or equal to 1/16
<tolszak> gl_Position is a direct copy of this vertex attrib
<tolszak> It means that my rectangle vertices are {-16, -16} .. {16, 16}
<tolszak> That's when the issue happens
<robmur01> tolszak: did PAN_MESA_DEBUG=nofp16 make any difference?
<tolszak> robmur01: Nope
<tolszak> the image still disappears when the scaled vertices are {-16, -16} .. {16, 16}
<tolszak> "disappears" means some other part of the image is displayed
camus1 has joined #panfrost
<tolszak> Which is actually white
<tolszak> perhaps I should use image with many different details
<tolszak> So I know which part of it is shown
camus has quit [Ping timeout: 480 seconds]
<tolszak> So when vertex coords span from -16..16 up to -28..28 it shows an incorrect part of the texture; 29..33 is also incorrect but shows a different part of the texture; from 34 on it shows the correctly magnified texture
<tolszak> I've used a gbm and egl example available on GitHub and adjusted it to show the issue: https://github.com/tolszak/gbm_es2_demo/tree/odroid_issue
<tolszak> Is there any possibility that someone with expertise can look at it? Or perhaps I should submit a bug on mesa and add this code?
<tolszak> I can also prepare yocto image that has most recent mesa, gdb, apitrace and this app preinstalled. Would it help?
<cwabbott> alyssa: the way it works with other hardware, the L1 cache holds decompressed image data, so if accesses are relatively coherent then the overhead of compressing/decompressing won't be that bad
<cwabbott> that's also how texture compression like ASTC works
<robclark> robmur01: a bit off topic, but had a chance to look at https://patchwork.freedesktop.org/patch/457538/?series=94968&rev=2 yet?
karolherbst has quit [Read error: Connection reset by peer]
karolherbst has joined #panfrost
<robmur01> robclark: argh, sorry, I've been off in PMU land with frequent detours into ACPI and internal stuff, still not really found any time to think about SMMU lately
JulianGro has quit [Remote host closed the connection]
<robclark> ok.. I suppose not urgent, just want to make sure it is not forgotten.. (it does pass the selftests fwiw, those turned out to be quite useful to debug it)
<robmur01> the fact that it's proven entirely unhelpful in debugging the original issue hasn't done the patch too many favours, though :)
<robclark> I wouldn't say it has been unhelpful.. it at least ruled out some theories
<robmur01> perhaps, but the L0 fault already implied that the PTE of the faulting address was irrelevant
<robclark> I won't rule out the possibility of more than one bug.. I think a lot of the "random weird faults" seem to have been fixed by `drm/msm/a6xx: Track current ctx by seqno`, but a few of the devcores looked different..
Rathann has quit [Remote host closed the connection]
<alyssa> tolszak: aren't coordinates snapped to a 1/16 grid for tiling?
<alyssa> 1/16 of a pixel, I mean
<alyssa> are you doing something funny with the viewport?
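For reference, snapping to a 1/16-pixel grid as suggested above would amount to something like the sketch below. This is only an illustration of the idea: the function name is hypothetical, and round-to-nearest is an assumption, since the rounding the hardware actually applies is not stated in this discussion.

```python
import math

SUBPIXEL_BITS = 4  # 2**4 = 16 subpixel steps per pixel (assumed from the 1/16 remark)


def snap_to_subpixel(coord):
    """Round a window-space coordinate to the nearest 1/16 of a pixel."""
    scale = 1 << SUBPIXEL_BITS
    return math.floor(coord * scale + 0.5) / scale


for value in (10.03, 10.04, 0.0625, 0.09):
    print(value, "->", snap_to_subpixel(value))
```

Any two positions closer together than half a grid step can land on the same snapped value, which is the kind of precision effect the question is probing for.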
<alyssa> cwabbott: makes sense. evidently Arm does not do that ;-)
JulianGro has joined #panfrost
rasterman has quit [Quit: Gettin' stinky!]
WoC has quit [Remote host closed the connection]
WoC has joined #panfrost
bluebugs has joined #panfrost
moa has quit [Ping timeout: 480 seconds]
rasterman has joined #panfrost
camus has joined #panfrost
camus1 has quit [Read error: Connection reset by peer]
camus1 has joined #panfrost
camus has quit [Ping timeout: 480 seconds]
macc242 has quit [Ping timeout: 480 seconds]
anarsoul has quit [Quit: ZNC 1.8.2 - https://znc.in]
anarsoul has joined #panfrost
<tolszak> alyssa: I have only a basic understanding of OpenGL. My Qt application works on many platforms, x86, arm etc., but has a problem on the odroid n2+. I have spent some time and prepared an example that shows this without Qt.
<tolszak> alyssa: That's how opengl is initialized including shader: https://github.com/tolszak/gbm_es2_demo/blob/odroid_issue/demo/gbm_es2_demo.cpp#L213
<tolszak> super simple
<tolszak> that only resizes the textured rectangle
<tolszak> that's it
<tolszak> when scale is 16 (rectangle top left is (-16, 16) and width and height are 32) then it shows something on screen other than what it should show
<tolszak> rendering is ok again from scale 34
<tolszak> this behavior depends on scale and texture height
<tolszak> texture size I mean
<tolszak> if texture size is bigger then the issue happens when scale is smaller
<tolszak> alyssa: the viewport equals the size of the screen
bbrezillon has quit [Ping timeout: 480 seconds]
camus has joined #panfrost
tolszak has quit [Ping timeout: 480 seconds]
tolszak has joined #panfrost
camus1 has quit [Ping timeout: 480 seconds]
bbrezillon has joined #panfrost
<alyssa> code looks innocuous so far
robmur01_ has joined #panfrost
camus1 has joined #panfrost
camus has quit [Remote host closed the connection]
robmur01 has quit [Ping timeout: 480 seconds]
camus has joined #panfrost
camus1 has quit [Remote host closed the connection]
<alyssa> Sampler descriptor is identical between Bifrost/Valhall =)
<alyssa> Still some big mysteries about how resources are accessed on Valhall, though
camus1 has joined #panfrost
camus has quit [Read error: Connection reset by peer]
richbridger has joined #panfrost