ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://oftc.irclog.whitequark.org/panfrost - <macc24> i have been here before it was popular
nlhowell is now known as Guest1322
nlhowell has joined #panfrost
Guest1322 has quit [Ping timeout: 480 seconds]
icecream95 has joined #panfrost
<icecream95>
Argh, developing v3d is too much work with 250 ms latency to "your" board
<HdkR>
There's only so much that mosh can do :(
<alyssa>
that's a shame for v3d..
<icecream95>
Nah, I don't even have UDP for mosh
<alyssa>
icecream95: My colleagues were quite impressed by your CSF work, by the way.
nlhowell has quit [Ping timeout: 480 seconds]
<icecream95>
alyssa: Can I have actual hardware now?
<icecream95>
There's only so much you can do looking through 2GB of traces from dEQP
<alyssa>
which gen?
<alyssa>
v9 is commercially available, v10 I don't have.
<icecream95>
The manufacturer of a v10 board is offering me hardware when it comes out, but I still have no idea what happened to the v7 board they promised
* alyssa
chuckles
<icecream95>
Currently I'm implementing shader printf for v3d to make debugging the problem there easier
<alyssa>
sounds fun
<alyssa>
for you, not for the v3d hardware ;)
<alyssa>
oh, um, I'm sorry for probably breaking your push optimizations..
<alyssa>
[with 6b2eda6b729 ("pan/bi: Reorder pushed uniforms to avoid moves")]
<icecream95>
What, the reordering patch? I was planning to revert that in my fork
<alyssa>
Fair enough
<icecream95>
Hmm.. I've written tools for replaying ELF core dumps of GPU memory, but what I really need is for the kernel to make a unified CPU and GPU core dump
<icecream95>
(I should look at the devcoredump patches to see if that's a better format.. but then I'd have to work out how to import that into Ghidra)
* alyssa
hasn't looked at either in depth so can't comment there
<icecream95>
Also in the long list of half-finished things is a new scheduler for Bifrost with inspiration from GCC
<alyssa>
oh?
* alyssa
always wanted to do more with the Bifrost compiler, but is now refocused on Valhall :|
<alyssa>
...Same went for the Midgard compiler, to be fair.
<icecream95>
One problem I note with kernel-space BO dumping is that you can't include annotations such as source code for the shaders
<alyssa>
Right..
<HdkR>
Did the ChromeOS BO object naming never make it upstream to Linux?
<HdkR>
ChromeOS/Android
<HdkR>
BO object. redundancy department
<icecream95>
HdkR: Can it include MBs worth of text data with each BO?
<HdkR>
I think it is an arbitrary length "name"
<HdkR>
Just does a memcpy in the kernel implementation afaik
<alyssa>
icecream95: Unrelated, not sure if you're still interested in Midgard/OpenCL, but the "AVG" instruction is hadd
<alyssa>
(and "AVG.round" is "rhadd")
<alyssa>
Bifrost instruction naming tends to be more sensible than Midgard's..
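For reference, a minimal C sketch of what the OpenCL built-ins `hadd` and `rhadd` compute for unsigned 32-bit inputs: `(a + b) >> 1` and `(a + b + 1) >> 1`, evaluated without overflowing the intermediate sum. The function names `hadd_u32` and `rhadd_u32` are made up for illustration, and the sketch says nothing about how the Midgard hardware implements "AVG".

```c
#include <assert.h>
#include <stdint.h>

/* Halving add: floor((a + b) / 2) without a 33-bit intermediate.
 * The shared bits (a & b) count once in full; the differing bits
 * (a ^ b) each contribute half. */
static inline uint32_t hadd_u32(uint32_t a, uint32_t b)
{
    return (a & b) + ((a ^ b) >> 1);
}

/* Rounding halving add: floor((a + b + 1) / 2), i.e. rounds up on ties. */
static inline uint32_t rhadd_u32(uint32_t a, uint32_t b)
{
    return (a | b) - ((a ^ b) >> 1);
}
```

With inputs 3 and 4, `hadd_u32` gives 3 and `rhadd_u32` gives 4; both stay correct when the plain sum would overflow 32 bits.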
<icecream95>
That reminds me, both Midgard and Bifrost seem to generate inefficient instruction sequences for loops..
<alyssa>
Yep.
<alyssa>
NIR makes it a bit annoying to do otherwise, and we've all shrugged and said "loops are going to be slow anyway.."
<alyssa>
jekstrand and I have talked about this a bit.
<icecream95>
I did try to make a NIR pass which rearranges the blocks, but I didn't manage to get it to work properly
<alyssa>
CF is tricky.
<alyssa>
Adding do...while to NIR sounds good on paper, until you realize it means auditing every NIR pass that does nontrivial CF manipulation
<icecream95>
Or just run that pass after all the other ones doing CF manipulation? :)
<alyssa>
heh, well, yes
<alyssa>
I'm sure jekstrand would love that ;)
<alyssa>
A helper to detect do...while from idiomatic NIR, and using that information when translating NIR->BIR, is a possible compromise
<alyssa>
Not invasive to the core NIR data structures, still gets the code gen we need
<alyssa>
and then a "regular" NIR pass can rearrange the blocks, as you put it, to go from idiomatic while loops to `if { do...while }` etc
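The "idiomatic while loop" to `if { do...while }` rearrangement can be illustrated at the source level. This is only a sketch in plain C of the shape of the transformation, not NIR or BIR code; `sum_while` and `sum_inverted` are hypothetical names.

```c
#include <assert.h>

/* Idiomatic form: the condition is tested at the top, so a straightforward
 * translation needs a conditional exit branch plus an unconditional
 * back-edge on every iteration. */
int sum_while(const int *v, int n)
{
    int sum = 0, i = 0;
    while (i < n) {
        sum += v[i];
        i++;
    }
    return sum;
}

/* Inverted form: one guard test up front, then a do...while whose single
 * conditional branch serves as both the exit test and the back-edge. */
int sum_inverted(const int *v, int n)
{
    int sum = 0, i = 0;
    if (i < n) {
        do {
            sum += v[i];
            i++;
        } while (i < n);
    }
    return sum;
}
```

Both functions return the same result for every input, including `n == 0`; only the branch structure differs.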
<icecream95>
At least on Midgard, there are a lot of blocks which only jump to another block which could be removed without touching NIR
<alyssa>
*nod*
<alyssa>
Not sure if there'd be problem interactions with nir_from_ssa, I guess if the pass is careful it's fine
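Removing blocks that only jump to another block is essentially jump threading. A toy C sketch, assuming a simplified CFG where each block has at most one unconditional successor; `struct toy_block` and `thread_jump` are hypothetical and are not the Midgard compiler's data structures.

```c
#include <assert.h>

/* Toy basic block: number of non-branch instructions, and the index of the
 * block it unconditionally jumps to (-1 if it has no such successor). */
struct toy_block {
    int num_instrs;
    int succ;
};

/* Follow a chain of empty jump-only blocks starting at `target` and return
 * the first block that does real work. The step bound keeps the walk finite
 * even if empty blocks form a cycle. */
int thread_jump(const struct toy_block *blocks, int count, int target)
{
    for (int steps = 0; steps < count; steps++) {
        assert(target >= 0 && target < count);
        const struct toy_block *b = &blocks[target];
        if (b->num_instrs != 0 || b->succ < 0)
            break;
        target = b->succ;
    }
    return target;
}
```

A branch aimed at an empty block can then be retargeted to `thread_jump(...)` of its old destination, after which the skipped blocks become unreachable and can be dropped.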
rasterman has joined #panfrost
<icecream95>
..At one point I tried enabling compression (via OpenCL kernels) for every texture upload in my fork, but I had to disable that after switching tabs in Firefox took more than a second
<alyssa>
Oof
<alyssa>
~~Does switching tabs in Firefox usually take less than that?~~
<icecream95>
I know, it's far too slow even with software rendering
<icecream95>
(The problem was that it was decompressing and recompressing the entire texture after every partial upload, many of which were on the order of 64x64 pixels)
<alyssa>
Ouch
<HdkR>
That's a good candidate for a small texture heuristic :D
<icecream95>
It should be possible to only mess with the blocks that actually got changed.. but you still have to move the rest of the data to make space
<icecream95>
I'm thinking of having a single massive BO for all AFBC textures, so that the data blocks can be shared between them
<alyssa>
why is mali like this
<icecream95>
..It doesn't even need to be a single BO, really, as long as they are all within a 4GB region
vstehle has quit [Ping timeout: 480 seconds]
<icecream95>
Unfortunately, it seems that AFRC is only supported on the lower-end v10 GPUs.. but it probably won't be capable of many of the fun tricks AFBC can do
* alyssa
forgot AFRC was a thing
<icecream95>
AFBC took me a week to RE, once I was scared that ARM would release its specification. How long will AFRC take?
* alyssa
laughs
<alyssa>
I don't know anything about AFRC except a few minutes of reading kernel patches and the Arm press release, so...
<icecream95>
Was all the stuff with Arm just for Morello, and they don't care about Panfrost otherwise?
<alyssa>
a weekend? :)
<alyssa>
daniels: ^^
<cphealy>
Is anyone else using the G52 experiencing issues with AFBC where only DRM_FORMAT_MOD_ARM_16X16_BLOCK_U_INTERLEAVED gets used? This is what I'm experiencing on Mesa 22.1-devel
rasterman has quit [Quit: Gettin' stinky!]
<daniels>
icecream95: I really can’t speak for them, sorry (but wb!)
atler is now known as Guest1324
atler has joined #panfrost
Guest1324 has quit [Ping timeout: 480 seconds]
* alyssa
is going to need to fix a bunch of broken Bifrost passes before adding proper 64-bit support, wee...
<alyssa>
hm. v9 has more in common with v6 than v7
<alyssa>
maybe v7 really is the oddball here :'p
* alyssa
supposes message preloads aren't coming back
<alyssa>
..some of those are regressions though >.>
<icecream95>
"Duration: 6:00". Have you tried mimalloc? That at least speeds up shader compilation a lot
<alyssa>
haven't looked at run time yet, the same test suite is under 2 minutes on the n2..
<HdkR>
If only all performance issues could be solved with a drop in memory allocator
<icecream95>
On the topic of tools to make things faster, mold is a really fast linker, especially after I reported a bug that broke linking Mesa
<HdkR>
ooo, I'm super interested in trying out mold. I keep forgetting to try it
<alyssa>
icecream95: fixed 200+ piglits, so I had to go fix another 200+, so...
<alyssa>
icecream95: tag you're it again :-p
<alyssa>
actually i'm still it, cube array fix incoming
<alyssa>
UnexpectedPass: 168
<alyssa>
ok i'll take it
<icecream95>
alyssa: Is there anything non-Piglit that actually needs GL_CLAMP to work properly?
<alyssa>
probably not
<icecream95>
But my OpenCL patches would probably fix more CL Piglits than that…
<alyssa>
I don't mind OpenCL patches :-)
<icecream95>
I think that Arm using CRC for transaction elimination was maybe not the best idea, because it makes collision attacks ~unfixable
<icecream95>
Probably the only real way of avoiding attacks is to randomly (per application start) change the order of pixels to be fed into the CRC
<alyssa>
I don't disagree.
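To make the concern concrete, here is a rough C sketch of CRC-based write skipping: a tile is written out only when its checksum differs from the one stored for the previous frame. The standard CRC-32 polynomial used here is an assumption for illustration, as are the names `crc32_bytes` and `tile_needs_write`; the chat does not say which CRC or tile layout the hardware uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). Assumed for
 * illustration only; not necessarily what the GPU computes. */
uint32_t crc32_bytes(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Returns 1 if the tile must be written back, 0 if the write is skipped.
 * A skipped write means the old tile contents stay in memory, so two
 * different tiles with equal CRCs leave a stale tile on screen. */
int tile_needs_write(uint32_t prev_crc, const uint8_t *tile, size_t len)
{
    return crc32_bytes(tile, len) != prev_crc;
}
```

Since a CRC is not collision resistant, content that is crafted to match the previous tile's checksum would be treated as unchanged, which is the attack being discussed.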
<DPA>
GL_CLAMP? I would expect programs to use that to render text & icons and similar things, to avoid artifacts at the edge of them from the next glyph / icon inside the same texture.
<alyssa>
DPA: You're looking for GL_CLAMP_TO_EDGE
<alyssa>
GL_CLAMP is a nonsense legacy thing
<DPA>
Oh my god, I had no idea!
<alyssa>
confusing, eh?
<DPA>
Yes, very much.
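The difference between the two wrap modes comes down to the range the texture coordinate is clamped to. A small C sketch following the OpenGL definitions: GL_CLAMP clamps to [0, 1], while GL_CLAMP_TO_EDGE clamps to [1/(2N), 1 - 1/(2N)] for a texture N texels wide. The function names are made up for illustration.

```c
#include <assert.h>

static inline float clampf(float x, float lo, float hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* GL_CLAMP: coordinate clamped to [0, 1]. With linear filtering, a sample
 * at exactly 0 or 1 blends the edge texel with the border color. */
float wrap_clamp(float s)
{
    return clampf(s, 0.0f, 1.0f);
}

/* GL_CLAMP_TO_EDGE: coordinate clamped to the centers of the outermost
 * texels, so the border color is never sampled. */
float wrap_clamp_to_edge(float s, int size)
{
    assert(size > 0);
    float half_texel = 0.5f / (float)size;
    return clampf(s, half_texel, 1.0f - half_texel);
}
```

For a 4-texel-wide texture, a coordinate of 1.5 becomes 1.0 under GL_CLAMP but 0.875 under GL_CLAMP_TO_EDGE, which is why the latter is the one that keeps glyph and icon edges free of border bleed.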
<icecream95>
"CLAMP .. is broken for nearest filtering". Well, it isn't broken, I think it just handles it exactly the same as it does for linear filtering, which turns out to be different from what APIs expect
<icecream95>
Fixing Piglits seems to be a good way to get negative LOC added from your patch..