ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs - <macc24> i have been here before it was popular
<alyssa> robmur01: Bifrost does and Valhall does not
<alyssa> and it's probably saner to switch the whole compiler to Valhall-style
<alyssa> Not forking the compiler for Valhall was *probably* still the right call but decisions like this make me less sure I admit
<robmur01> does Valhall lose the weird restrictions on crossing 4GB boundaries and suchlike?
<alyssa> which restrictions
<alyssa> I'm just talking about pointers for LOAD and STORE instructions and such
<icecream95> malloc: variables.c:3234: assertion botched. Fun.. panfrost.ko is causing memory corruption in unrelated processes again
<alyssa> Woof
<alyssa> Pass: 4796, Fail: 54, Crash: 1167, Warn: 13, Skip: 7457, Flake: 13, Duration: 3:39, Remaining: 6:35
<icecream95> Only a couple of crashes, can't be a serious problem
<alyssa> :-D
<robmur01> I mean the reasons why we only actually use a 4GB address space in practice, and still have to chop the ends off
<robmur01> by virtue of which we're sort-of-segmented anyway, if you squint a bit
<alyssa> right
<alyssa> I just mean the register allocation details
<alyssa> will I wire up atomics tonight?
<alyssa> Hm. Sure.
<alyssa> Pass: 11259, Fail: 151, Crash: 2799, Warn: 27, Skip: 17234, Flake: 30, Duration: 8:08, Remaining: 1:37
<alyssa> would be nice to turn all those crashes into passes, I'm just sayin
<icecream95> alyssa: Have you worked out exactly how the weird counter-like field in atomic instructions works yet?
<alyssa> Hmm?
<alyssa> I think that was v10 only
* icecream95 does some testing with the v9 blob
<icecream95> Nope, it's there on v9 as well
<icecream95> alyssa: With my version of ISA.xml, for a series of imageAtomicExchange on v9 I get meta:0x36, meta:0x37, meta:0x38, meta:0x39, ...
vstehle has quit [Ping timeout: 480 seconds]
jambalaya has quit [Remote host closed the connection]
<icecream95> alyssa: Here is the shader_test I used:
<icecream95> Note the v9 shader disassembly hidden behind <details>
jambalaya has joined #panfrost
<alyssa> Okay let's see
* alyssa digs up icecream95's ISA.xml
<alyssa> <imm name="meta" start="16" size="6"/> on atomics, ummm
<icecream95> Yup, that's the unknown field
<alyssa> that's the second staging register
<icecream95> ?
<alyssa> 32 00 f6 1b 02 cc 20 09 ATOM_C.axchg.slot0.wait0 @r12, r50, offset:0x0, meta:0x36
<alyssa> that reads from @r12
<alyssa> and writes to @r54 (=0x36)
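The byte dump on the ATOM_C line can be checked against the field that icecream95's ISA.xml calls `meta` (start=16, size=6). This is a minimal sketch. It assumes the disassembler prints the instruction bytes least-significant first. The helper names `va_word_from_bytes` and `va_field` are hypothetical and are not Mesa functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assemble the 8 instruction bytes, in the order the disassembler
 * prints them, into a 64-bit word (assumed: byte 0 is least significant). */
static uint64_t va_word_from_bytes(const uint8_t b[8])
{
    uint64_t w = 0;
    for (int i = 7; i >= 0; --i)
        w = (w << 8) | b[i];
    return w;
}

/* Extract an unsigned field of `size` bits starting at bit `start`,
 * using the same start/size convention as the ISA.xml attributes. */
static unsigned va_field(uint64_t word, unsigned start, unsigned size)
{
    assert(size > 0 && size < 64 && start + size <= 64);
    return (unsigned)((word >> start) & ((UINT64_C(1) << size) - 1));
}
```

For `32 00 f6 1b 02 cc 20 09`, bits 16..21 are the low 6 bits of `0xf6`, which gives `0x36` = 54. That is consistent with the field naming r54 as the second staging register. It would also explain why a run of imageAtomicExchange produces meta:0x36, 0x37, 0x38, and so on.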
<icecream95> Ohhhhhhhhhh.....
<icecream95> But it's only used for some atomic instructions, for others it seems to encode whether it's on an image or an SSBO.. maybe?
<alyssa> There are different atom instructions
<alyssa> I'll have XML typed out in a few hours if I don't keep getting distracted by IRC and sketchy ways of cooking fish
<HdkR> Sketchy ways of cooking fish. In the dishwasher on high-heat loads
<alyssa> HdkR: that's the one!
<icecream95> alyssa: "I think that was v10 only". From comparing shaders between v9 and v10, I still think that it is possible that there are no ISA changes in v10
<icecream95> In fact, it might still be possible (if unlikely) that the *only* change between v9 and v10 is the CSF
<alyssa> v10 definitely has cmdstream changes (beyond CSF itself)
<alyssa> CSF kernel support indicates tiler changes
* icecream95 decides that it would be pointless to argue about what exactly constitutes a "cmdstream change"
<alyssa> ATOM1.i32 and ATOM1_RETURN.i32 have the same opcode
<alyssa> Grumble.
<alyssa> Distinguished only by sr_count
<icecream95> alyssa: You mean the incr/decr instruction?
<alyssa> Yeah
<alyssa> ATOM_C1 on bifrost
<alyssa> Guess I can model as ATOM_C1_RETURN always, and just allow omitting the staging reg
<alyssa> Presumably that's what the hardware is actually doing
<alyssa> and separating out the opcodes is just an assembler syntax detail to make Valhall feel more like Bifrost
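Modelling ATOM_C1 as a single "return" form with an optional staging register might look like the sketch below. The struct and function names are hypothetical and do not come from the Mesa IR. The sketch covers the `.i32` case only; it assumes a returning 32-bit op uses one staging register.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical IR node: one opcode for ATOM_C1, with the destination
 * (staging) register made optional. */
struct atom_c1 {
    bool has_dest;     /* false: "increment and discard the result" */
    unsigned dest_reg; /* only meaningful when has_dest is true */
    unsigned addr_reg; /* register holding the address */
};

/* Under this model, ATOM1.i32 and ATOM1_RETURN.i32 share an opcode and
 * differ only in the staging register count written to the encoding. */
static unsigned atom_c1_sr_count(const struct atom_c1 *I)
{
    return I->has_dest ? 1 : 0;
}
```

The design choice here is to let the packer derive `sr_count` from whether a destination exists. The IR then never needs two opcodes for what the hardware appears to treat as one instruction.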
MTCoster has quit [Server closed connection]
MTCoster has joined #panfrost
<alyssa> 32 00 00 18 00 80 69 08 ATOM_C1_RETURN.i32.slot0.ainc.wait0 @, r50, offset:0x0
<alyssa> That syntax is a little clunky but it gets the point across
<alyssa> (it "should" be "ATOM_C1.i32.slot0.ainc.wait0 r50, offset:0x0")
<alyssa> For the case of "increment and discard the result"
<icecream95> At least it should make it more clear that there is an empty destination.. the order of registers for atomic operations always confused me
<alyssa> is that a question?
<icecream95> no
camus has joined #panfrost
philpax_ has quit [Server closed connection]
philpax_ has joined #panfrost
erlehmann has joined #panfrost
Daanct12 has joined #panfrost
jstultz has quit [Server closed connection]
jstultz has joined #panfrost
cwabbott has quit [Server closed connection]
cwabbott has joined #panfrost
* alyssa wonders why Valhall phone won't turn on
<alyssa> Dead battery maybe? boring
<alyssa> I have atomics, I have shared mem, I do not yet have the interaction
<anarsoul> alyssa: compute shaders?
<alyssa> yes
<alyssa> Something you don't have to worry about over at #lima ;)
<anarsoul> :P
<icecream95> alyssa: At least ATOM1* using the same opcode is better than FATAN_ASSIST, which does two different operations returning f32 or v2f16 depending on a modifier bit
<alyssa> oof
<icecream95> alyssa: On the topic of random instructions, FSIN_TABLE.u6 is actually floating point--it will return -0 if the source is not in a specific range of values
<alyssa> Hmm?
robclark has quit [Server closed connection]
<alyssa> icecream95: also, just posted the MR adding atomics to ISA.xml
robclark has joined #panfrost
<alyssa> and have been typing away at the mesa side
<alyssa> 32-bit should be correct given the tests are passing :-p
<alyssa> 64-bit is untested so YMMV, it's not used by gles31 at least
<alyssa> Pass: 15901, Fail: 169, Crash: 949, Warn: 36, Skip: 20703, Flake: 33, Duration: 8:39, Remaining: 0
<alyssa> that looks somewhat better
<alyssa> Gets >90% passing on dEQP-GLES31 so that's nice
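The ">90%" figure can be reproduced from the summary line above, provided skipped tests are left out of the denominator. This sketch assumes flakes count as executed but not passed. The struct and function names are hypothetical.

```c
#include <assert.h>

/* Counts as printed on the run summary line. */
struct deqp_summary {
    unsigned pass, fail, crash, warn, skip, flake;
};

/* Fraction of executed (non-skipped) tests that passed. */
static double deqp_pass_rate(const struct deqp_summary *s)
{
    unsigned run = s->pass + s->fail + s->crash + s->warn + s->flake;
    return run ? (double)s->pass / run : 0.0;
}
```

With Pass 15901 out of 17088 executed tests, this gives about 93%. If the 20703 skips are counted in the denominator, the rate drops to about 42%.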
<HdkR> oooo
<alyssa> so at >90% for the whole gles31 cts, I think
<HdkR> GJ! :D
<alyssa> looks like this stuff will unfortunately miss the branch point because there are too many hacks in my branch and I'm part time but hey
<icecream95> alyssa: The domain of FSIN_TABLE is 524288.0 to 1048575.9375
<alyssa> it's supposed to just look at the bottom 6 bits, hence the name..
<icecream95> But if it did that, then out of bounds values would return a wrong value.. so it returns -0 instead
<alyssa> Interesting
<alyssa> so this is the hardware being "clever"?
bbrezill1 has joined #panfrost
narmstrong has quit [Server closed connection]
<icecream95> Note that the domain is every value of a certain exponent.. the exponent where the seventh mantissa bit has a value of 4
narmstrong has joined #panfrost
bbrezillon has quit [Ping timeout: 480 seconds]
<icecream95> alyssa: I'm confused as to how I managed to apparently get ATOM and ATOM_RETURN mixed up, thinking that the latter was 0x68 rather than 0x120
<icecream95> I think that it was just a mistake on my part, and there are no weird things like swapping the operations on v10
spawacz has quit [Server closed connection]
spawacz has joined #panfrost
robink has quit [Server closed connection]
robink has joined #panfrost
dschuermann has quit [Server closed connection]
dschuermann has joined #panfrost
daniels has quit [Server closed connection]
daniels has joined #panfrost
steev has quit [Server closed connection]
steev has joined #panfrost
Daanct12 has quit [Read error: Connection reset by peer]
Daanct12 has joined #panfrost
pch_ has joined #panfrost
pch has quit [Ping timeout: 480 seconds]
camus has quit [Ping timeout: 480 seconds]
jolan has quit [Server closed connection]
jolan has joined #panfrost
pch_ is now known as kinkinkijkin
camus has joined #panfrost
orkid has quit [Server closed connection]
orkid has joined #panfrost
austriancoder has quit [Server closed connection]
austriancoder has joined #panfrost
taowa has quit [Server closed connection]
taowa has joined #panfrost
vstehle has joined #panfrost
tolszak has joined #panfrost
ente` has quit [Ping timeout: 481 seconds]
cyrozap has quit [Server closed connection]
cyrozap has joined #panfrost
MoeIcenowy has quit [Server closed connection]
MoeIcenowy has joined #panfrost
camus1 has joined #panfrost
camus has quit [Ping timeout: 480 seconds]
remexre has quit [Server closed connection]
remexre has joined #panfrost
ezequielg has quit [Server closed connection]
ezequielg has joined #panfrost
ente` has joined #panfrost
anholt has quit [Server closed connection]
anholt has joined #panfrost
MajorBiscuit has joined #panfrost
MajorBiscuit has quit []
anarsoul has quit [Server closed connection]
anarsoul has joined #panfrost
rkanwal has joined #panfrost
MajorBiscuit has joined #panfrost
Daaanct12 has joined #panfrost
Daanct12 has quit [Ping timeout: 480 seconds]
`join_subline has quit [Server closed connection]
`join_subline has joined #panfrost
Daanct12 has joined #panfrost
Daaanct12 has quit [Ping timeout: 480 seconds]
Daaanct12 has joined #panfrost
Daanct12 has quit [Ping timeout: 480 seconds]
rasterman has joined #panfrost
<alyssa> sure
<icecream95> alyssa: Implementing adjacent nodes for >vec4 support was pretty easy after all of my RA optimisations: 1 file changed, 68 insertions(+), 9 deletions(-)
<alyssa> :)
<icecream95> (nodearray branch on my fdo repo)
<icecream95> Only build-tested, of course
<icecream95> Implementing the actual splitting of nodes is an exercise for the reader
<icecream95> Next step: Splitting nodes even more until everything is vec1 :P
rkanwal has quit [Read error: Connection reset by peer]
rkanwal has joined #panfrost
<alyssa> I mean, if splitting works "that" well it seems like the natural thing to do :-p
<alyssa> you should know that Valhall has real write masks that I'd like to wire up at some point..
<icecream95> Well yeah, I spent quite a bit of time trying to get Ghidra to handle that correctly
<icecream95> About splitting.. vec4 is a good size for NEON to deal with because the constraints are 7 bits, perfect for signed 8-bit operations
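One way to read the "7 bits" remark uses a simplified interference model; this is not Mesa's actual LCRA data layout. Take two nodes of widths up to 4 components. They can only collide at relative offsets -3..+3, so the set of forbidden offsets fits in a 7-bit mask. `forbidden_offsets` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Node A (wa components) at offset ra and node B (wb components) at rb
 * overlap iff d = rb - ra satisfies -wb < d < wa. For widths <= 4 every
 * such d lies in [-3, 3]; bit (d + 3) of the result marks d as forbidden. */
static uint8_t forbidden_offsets(unsigned wa, unsigned wb)
{
    assert(wa >= 1 && wa <= 4 && wb >= 1 && wb <= 4);
    uint8_t mask = 0;
    for (int d = -3; d <= 3; ++d) {
        if (d < (int)wa && -d < (int)wb)
            mask |= (uint8_t)(1u << (d + 3));
    }
    return mask;
}
```

Two vec4 nodes forbid all 7 offsets (0x7f), while two scalars forbid only d = 0 (0x08). Wider nodes would push the mask past 7 bits, which fits the point that vec4 is a convenient cap.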
* alyssa regretting her life choices intensifies
<alyssa> should've gone with real SSA RA when I had the chance... should've should've should've...
<icecream95> alyssa: But trying to optimise LCRA is just so much fun!
<alyssa> I see that yes
alarumbe has quit [Ping timeout: 480 seconds]
<macc24> fun fact: mt8186 has mali g52
<alyssa> 8186?
<alyssa> is this a thing I have to deal with now?
<macc24> some upcoming mtk chromebook chip
<macc24> they put a gpu that is years old into an soc that's still not released in 2022 xD
Daanct12 has joined #panfrost
Daaanct12 has quit [Read error: Connection reset by peer]
alarumbe has joined #panfrost
* alyssa wires up images
<alyssa> ...or maybe I should just be upstreaming harder
<alyssa> Get some Valhall in
<alyssa> Sigh I guess I can do that
<alyssa> 52 files changed, 5570 insertions(+), 639 deletions(-)
<alyssa> Ughhh I guess it really is approaching upstream o'clock
<macc24> upstream upstream upstream!
<alyssa> Pass: 16029, Fail: 136, Crash: 893, Skip: 20703, Flake: 30, Duration: 8:42, Remaining: 0
<alyssa> does that look upstream quality to you? :V
<alyssa> [valhall-42 ce05e58ed69] fixup! pan/va: Pack instructions 1 file changed, 12 insertions(+), 61 deletions(-)
<alyssa> [valhall-42 7a72468a846] fixup! pan/va: Build opcode info structures 2 files changed, 6 insertions(+)
* alyssa applies Delete The Code
<alyssa> 1 file changed, 8 insertions(+), 49 deletions(-)
nlhowell has joined #panfrost
nlhowell is now known as Guest147
nlhowell has joined #panfrost
Guest147 has quit [Ping timeout: 480 seconds]
erlehmann has quit [Ping timeout: 480 seconds]
<alyssa> I think this can land without harming existing code
<alyssa> I *think*
<alyssa> Would appreciate a sanity check, though
nlhowell has quit [Ping timeout: 480 seconds]
<alyssa> once that's in, will be down to a somewhat more manageable
<alyssa> 32 files changed, 2457 insertions(+), 597 deletions(-)
camus1 has quit [Ping timeout: 480 seconds]
tolszak has quit [Ping timeout: 480 seconds]
* alyssa wonders what to do about varying linking on Valhall
<alyssa> It's all in terms of *bytes*, no more vec4 slots
<alyssa> This is more flexible but makes fp16 varyings etc a bit more complicated
nlhowell has joined #panfrost
<alyssa> Indirect varyings wired up, down 1 hack
<alyssa> Down to 8 HACK, 3 WIP, and 1 RFC
<alyssa> next up, deleting the mediump hack
<alyssa> and generally fixing fp16
* alyssa flips on fp16 in her run-deqp script, let's see how bad the fallout is
rkanwal has quit [Ping timeout: 480 seconds]
erle has joined #panfrost
<alyssa> 2 fp16 bugs down
guillaume_g has quit []
Daaanct12 has joined #panfrost
rkanwal has joined #panfrost
Daanct12 has quit [Ping timeout: 480 seconds]
MajorBiscuit has quit [Ping timeout: 480 seconds]
<jekstrand> alyssa: \o/
<jekstrand> alyssa: Why does bytes make varying linking hard?
<HdkR> It's different :P
<alyssa> jekstrand: because we get slots from the API and have to convert to bytes in the backend
<alyssa> For the initial driver we can get away with (slots * 16)
<alyssa> but in general it has to be sum_{i = 0}^{slots - 1} [(# of components in slot i) * (# of bytes per component in slot i)]
<alyssa> and there are perverse interactions with separable shaders and the like
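The slot-to-byte conversion alyssa describes can be sketched as a running sum over slots, next to the simpler `slots * 16` scheme. The struct and function names are hypothetical. The sketch packs tightly and ignores any alignment the hardware may require.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-slot description handed to the backend. */
struct varying_slot {
    unsigned components;          /* 1..4 */
    unsigned bytes_per_component; /* 4 for fp32, 2 for fp16 */
};

/* Tightly packed byte offset of each slot; returns the total size,
 * i.e. the sum over i of components[i] * bytes_per_component[i]. */
static unsigned varying_byte_offsets(const struct varying_slot *slots,
                                     size_t count, unsigned *offsets)
{
    unsigned offset = 0;
    for (size_t i = 0; i < count; ++i) {
        offsets[i] = offset;
        offset += slots[i].components * slots[i].bytes_per_component;
    }
    return offset;
}

/* Simple scheme for the initial driver: every slot is a 16-byte vec4. */
static unsigned varying_bytes_simple(size_t count)
{
    return (unsigned)(count * 16);
}
```

For a vec4 fp32, a vec2 fp16 and a vec3 fp32, tight packing gives offsets 0, 16 and 20 (32 bytes total), compared with 48 bytes at 16 bytes per slot. The `slots * 16` scheme wastes space but gives each slot an offset that does not depend on what the other stage declares.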
<icecream95> alyssa: "separable shaders". You mean IDVS?
<icecream95> Reminds me of how I split compilation in two parts in my fork so that UBO pushing can be done on both IDVS shaders at once
<alyssa> icecream95: no, the GL thing where VS and FS are independent and not linked at an API level
<icecream95> Oh.. that.
<HdkR> Really good feature when used correctly :P
Rathann|Mobile has joined #panfrost
macc24 has quit [Quit: ZNC 1.7.5+deb4 -]
macc24 has joined #panfrost
anarsoul|2 has joined #panfrost
anarsoul has quit [Read error: Connection reset by peer]
ente` has quit [Ping timeout: 480 seconds]
Danct12 has quit [Ping timeout: 480 seconds]
Daaanct12 has quit [Ping timeout: 480 seconds]
ente` has joined #panfrost
<alyssa> oof
<alyssa> Guess who gets to revisit plane descriptors, again? C:
<alyssa> we have something equal to `layout->array_stride / layout->nr_samples`
Danct12 has joined #panfrost
<alyssa> I guess that's the surf stride
nlhowell has quit [Ping timeout: 480 seconds]
<alyssa> dEQP-GLES3.functional.fbo.msaa.4_samples.rgba32f illustrates what I don't like about blend shaders
rcf has quit [Quit: WeeChat 3.2.1]
rcf has joined #panfrost
rcf has quit [Quit: WeeChat 3.2.1]
rcf has joined #panfrost
Rathann|Mobile has quit [Quit: Leaving]
rkanwal has quit [Ping timeout: 480 seconds]
<icecream95> alyssa: "pan/bi: Check return addresses in blend shaders" will break if a blend shader ever gets a 4 GB-aligned address
<icecream95> Uh.. I forgot that shader addresses are only 32-bit maybe, ignore?
<alyssa> ^^
<alyssa> if 0x0 is not a reserved address it should be....
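The failure mode icecream95 points out can be illustrated with a hypothetical check. It treats a return address of 0 as "none" and keeps only the low 32 bits. This does not reproduce the actual commit; `has_return_address` is an illustrative name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: a zero return address means "no return address".
 * Because only the low 32 bits are kept, any 4 GB-aligned 64-bit address
 * also reads as 0 and is misclassified, unless 0 is reserved. */
static bool has_return_address(uint64_t addr)
{
    uint32_t lo = (uint32_t)addr; /* only the low word is kept */
    return lo != 0;
}
```

If shader addresses really are confined to 32 bits and 0x0 is never handed out, the ambiguous case cannot arise.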
<icecream95> alyssa: What is a Trym? (context: pan/va: Stub packing routine)
<alyssa> icecream95: G77
<alyssa> tTRx
<alyssa> I mean uhh
<icecream95> Ah, him
<icecream95> alyssa: What happens if someone tries to use 32-bit instructions to extract the top half of e.g. the program counter?
<alyssa> elaborate?
<icecream95> va_pack_src won't pack e.g. BIR_FAU_PROGRAM_COUNTER correctly if the index has a nonzero offset
* alyssa grumbles
<alyssa> you are of course correct, fixing
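At the semantic level, the bug looks like this. A 64-bit special value such as the program counter is read through a 32-bit source, and an offset selects which half is read. A packer that drops the offset always reads the low half. This is a hypothetical model; `read_word` is not a Mesa function, and the actual FAU encoding is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: offset 0 selects the low 32 bits, offset 1 the high 32. */
static uint32_t read_word(uint64_t value, unsigned offset)
{
    assert(offset <= 1);
    return (uint32_t)(value >> (32 * offset));
}
```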
<icecream95> alyssa: e.g. va_pack_atom_opc.. so we have this nice XML with all of the enums, and then you decide to use a bunch of magic numbers?
<alyssa> ...Would you rather I generate piles of C code?
<alyssa> because I can do that C:
<icecream95> alyssa: "va_optimizer". s/r//? Optionally also s/z/s/
<alyssa> done
<alyssa> icecream95: re magic numbers, maybe the right way forward is generating VA_ATOM_OPC_AADD etc enums from the XML, but open coding the Bifrost->Valhall enum translation?
<icecream95> alyssa: "Offending code:". I guess I would find it offensive too if someone printed me to fp when the fprintf was to stderr
<alyssa> Yes, very offensive. Fixed.
<alyssa> (Trying to generate the enum translation would require some degree of coupling between the IR and the Valhall ISA definitions... That coupling is what got us into the current Bifrost ISA.xml mess that you complained about a few hours ago.)
<icecream95> alyssa: on magic numbers.. I guess you could do that. Otherwise you'd have to do something like reference Bifrost names from the Valhall XML and make everything more of a mess
<alyssa> right
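The split being discussed can be sketched as follows. The Valhall enum is generated from the XML, and the Bifrost-IR-to-Valhall translation is written by hand. Everything below is hypothetical: the enum values are placeholders, not hardware encodings, and the IR enum is illustrative.

```c
#include <assert.h>

/* Stand-in for an enum that would be generated from ISA.xml; a real
 * generator would emit the XML's values instead of these placeholders. */
enum va_atom_opc {
    VA_ATOM_OPC_AADD,
    VA_ATOM_OPC_AXCHG,
};

/* Hypothetical IR-side enum, with no dependency on the Valhall XML. */
enum ir_atom_op {
    IR_ATOM_ADD,
    IR_ATOM_XCHG,
};

/* Open-coded translation: the only place the IR and the Valhall ISA
 * definitions meet, so neither XML has to reference the other. */
static enum va_atom_opc va_translate_atom_op(enum ir_atom_op op)
{
    switch (op) {
    case IR_ATOM_ADD:  return VA_ATOM_OPC_AADD;
    case IR_ATOM_XCHG: return VA_ATOM_OPC_AXCHG;
    }
    assert(!"unknown atomic op");
    return VA_ATOM_OPC_AADD;
}
```

Keeping the switch handwritten means a new IR op only requires a new case here. This avoids the Bifrost-style coupling between the IR and the ISA description that alyssa mentions.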