ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://oftc.irclog.whitequark.org/panfrost - <macc24> i have been here before it was popular
<cwabbott> there's no vertex cache and the index buffer doesn't even get read until rasterization
<cwabbott> it's... bad
<jekstrand> Yeah
<jekstrand> But we can't do a CPU crawl in Vulkan. Not an option.
<jekstrand> So we have to eat the suck
<cwabbott> iiuc the vertex_range is just the min/max index, so you don't have to traverse the whole thing
<cwabbott> I think vulkan does a compute shader to set it
<jekstrand> I mean, we could use the size of the whole vertex buffer at least for the allocation
<jekstrand> But then we really want to run a compute shader to figure out size, then run another to actually process the vertex data.
<cwabbott> it crawls the index buffer with a compute shader that determines the min/max and writes the value to the descriptor
<cwabbott> or at least that's what the blob does
<jekstrand> yup
<jekstrand> That's what we need to be doing in panvk
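The index crawl described above boils down to a min/max reduction over the index buffer. What follows is only a CPU-side reference sketch in C of that reduction, not the panvk or blob implementation; `struct index_range` and `crawl_index_buffer` are hypothetical names, and a real driver would do the equivalent in a compute shader and write the result into the descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type, not Mesa API: the vertex range a draw touches. */
struct index_range {
    uint32_t min;
    uint32_t max;
};

/* Scan `count` indices of `index_size` bytes (1, 2 or 4) and return the
 * smallest and largest index seen. */
static struct index_range
crawl_index_buffer(const void *indices, size_t count, unsigned index_size)
{
    assert(count > 0);
    assert(index_size == 1 || index_size == 2 || index_size == 4);

    struct index_range r = { UINT32_MAX, 0 };

    for (size_t i = 0; i < count; i++) {
        uint32_t idx;

        switch (index_size) {
        case 1:  idx = ((const uint8_t *)indices)[i];  break;
        case 2:  idx = ((const uint16_t *)indices)[i]; break;
        default: idx = ((const uint32_t *)indices)[i]; break;
        }

        if (idx < r.min)
            r.min = idx;
        if (idx > r.max)
            r.max = idx;
    }

    return r;
}
```

This sketch ignores primitive restart indices, which a real crawl would presumably need to skip so they do not inflate the maximum.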
rasterman has joined #panfrost
<icecream95> cwabbott: "until rasterization".. You mean until tiling? Rasterization works on a stream of polygons and does not read index buffers
vstehle has quit [Ping timeout: 480 seconds]
rasterman has quit [Quit: Gettin' stinky!]
* jekstrand kicks off a new panvk run and calls it a day
<alyssa> cwabbott: What's really stupid, I think we still need that index crawl with IDVS on Bifrost
<alyssa> We don't with IDVS on Valhall, finally.
<HdkR> Valhall is so much better \o/
<alyssa> It really is >.>
<alyssa> Ranking of Malis:
<alyssa> 1. Valhall
<alyssa> 2. Midgard
<alyssa> 3. Bifrost
<alyssa> 4. Utgard
<alyssa> Utgard would beat Bifrost if not for the GP
<alyssa> Bifrost loses hard with clauses/exposed pipelines
<alyssa> Midgard loses out on VLIW
<HdkR> Only took ARM four tries to make Valhall into the perfect product
<alyssa> The ML-focused bits are still pretty wacky
<alyssa> i8vec4 on a scalar arch is.. unusual..
<HdkR> If it results in more people writing ML upscalers then I'm happy :D
<HdkR> I love a bit of SWAR though. Since it means if you write bad code, then it will be terrible
Moe_Icenowy has quit []
MoeIcenowy has joined #panfrost
<MoeIcenowy> alyssa: I think Utgard PP evolves to Midgard?
JulianGro has quit [Remote host closed the connection]
Daanct12 has joined #panfrost
davidlt has joined #panfrost
vstehle has joined #panfrost
<bbrezillon> jekstrand, cwabbott: we already have this min/max-calculation-in-compute logic in the indirect draw path (src/panfrost/lib/pan_indirect_draw.c), we can probably re-use some of that
davidlt has quit [Ping timeout: 480 seconds]
MajorBiscuit has joined #panfrost
Major_Biscuit has joined #panfrost
MajorBiscuit has quit [Ping timeout: 480 seconds]
kenzie35 has quit []
kenzie35 has joined #panfrost
kenzie35 has quit []
kenzie35 has joined #panfrost
kenzie35 has quit []
kenzie35 has joined #panfrost
erlehmann has quit [Ping timeout: 480 seconds]
pendingchaos_ has joined #panfrost
pendingchaos has quit [Ping timeout: 480 seconds]
simon-perretta-img has quit [Read error: No route to host]
simon-perretta-img has joined #panfrost
simon-perretta-img_ has joined #panfrost
Daanct12 has quit [Quit: Quit]
simon-perretta-img has quit [Ping timeout: 480 seconds]
simon-perretta-img_ has quit []
simon-perretta-img has joined #panfrost
rasterman has joined #panfrost
davidlt has joined #panfrost
erlehmann has joined #panfrost
erlehmann has quit []
pendingchaos_ is now known as pendingchaos
JulianGro has joined #panfrost
davidlt has quit [Ping timeout: 480 seconds]
<alyssa> MoeIcenowy: Yes
jambalaya has quit [Quit: Off to see the wizard.]
Anonym3310 has joined #panfrost
Anonym3310 has quit []
Anonym3310 has joined #panfrost
Anonym3310 has quit [Remote host closed the connection]
Anonym3310 has joined #panfrost
Anonym3345 has joined #panfrost
Anonym3345 has quit [Remote host closed the connection]
davidlt has joined #panfrost
Major_Biscuit has quit []
Anonym3310 has quit []
Anonym3310 has joined #panfrost
Anonym3310 has quit [Remote host closed the connection]
camus has quit []
nlhowell has joined #panfrost
<jekstrand> bbrezillon: Yeah. I wasn't too worried about it. The only real question is if we should leave the CPU hack for now and come back to it or start doing compute shader stuff now.
<alyssa> jekstrand: relatedly, we have 5 different ways of loading uniform data into shaders
<alyssa> 1. ld_uniform
<alyssa> 1. ld_ubo
<alyssa> er
<alyssa> 1. ld_uniform
<alyssa> 2. ld_ubo
<alyssa> 3. ld_push_constant
<alyssa> 4. ld_{sysval}, and that "magically" becomes a UBO read in the backend compiler
<alyssa> --Er wait just those
<alyssa> and Vulkan push constants are lowered to UBOs
<alyssa> and UBO reads "magically" become real push constants
<alyssa> and the driver internal shaders create fake UBOs and rely on them getting pushed but maybe reordered
<alyssa> and ld_uniform is lowered to UBO but then pushed again after, poorly
<alyssa> oh, soon:
<alyssa> 5. ld_preamble once that is landed
<alyssa> Not going to lie I kinda hate the complexity we're growing here to support "GL uniforms + UBOs + VK push constants + GL sysvals + VK sysvals + preambles + driver-internal shaders" with a shared compiler, efficiently
<alyssa> part of the problem is the primitives NIR gives us, part is what we do with them
<alyssa> I see Iris converts ld_{sysval} to something else before passing it to the intel compiler
<alyssa> To top it off, there is very little Mali-specific about our crappy UBO push pass
<jekstrand> alyssa: Intel is almost as bad. :(
<jekstrand> And it adds patch constants to the mix. :D
<alyssa> ?
<alyssa> like, binary patching?
<jekstrand> yup
<alyssa> joy
<alyssa> we would benefit from that for blend constants
<jekstrand> We use them for a few things like patching in the address of the magic constant space used by nir_load_constant
<alyssa> we used to do that on midgard
<alyssa> but the encoding is too screwed up to be practical on Bifrost
<jekstrand> And we use them for ray-tracing to patch in shader descriptors with the actual addresses of the uploaded shader.
<alyssa> anyway, if you have any ideas to make it suck less.. all ears
<alyssa> see pan_indirect_dispatch.c for the mess we have now
<alyssa> (Most of the file is magic to make the sysvals work)
<alyssa> ooh, 6. load_kernel_input
Danct12 has quit [Quit: Quitting]
<alyssa> It's like every API wants its own intrinsic
Danct12 has joined #panfrost
<alyssa> load_kernel_input for driver-internal kernels saves a few lines
<alyssa> (..Remind me why load_kernel_input and load_ubo are different?)
<HdkR> Generic intrinsic so the backend can choose between Constant and UBOs depending on whatever they want at the time? :D
<jekstrand> alyssa: We lower stuff to "standard" NIR stuff in the driver on Intel.
<jekstrand> alyssa: That keeps it out of the back-end compiler for the most part.
megi has quit [Quit: WeeChat 3.4]
megi has joined #panfrost
Danct12 has quit [Quit: Quitting]
<jekstrand> It's an annoying dance, for sure.
Danct12 has joined #panfrost
<jekstrand> alyssa: What's up with bifrost_nir_lower_store_component?
<jekstrand> bbrezillon: If you're around, could you provide an opinion or ack the first patch in my bufferview MR? It drops the advertised Vulkan version to 1.0.
<alyssa> jekstrand: Is lowering to "standard" NIR in driver... better?
<alyssa> jekstrand: wdym What's up with bifrost_nir_lower_store_component?
<jekstrand> alyssa: It can be. It can reduce the amount of back-end code.
<jekstrand> alyssa: It's crashing because of indirect stores
<alyssa> Panfrost doesn't support indirect stores
<alyssa> that code path doesn't exist on Gallium
<alyssa> ^indirect varying stores, indirect SSBO stores are fine
<alyssa> we /could/ support them
<alyssa> interactions with XFB suck under the mali model
<jekstrand> hrm... ok
<alyssa> does VK need them?
* jekstrand is confused. nir_lower_io_to_temporaries should get rid of indirect stores
<alyssa> does panvk call that at the right time
<alyssa> before/after i/o?
<jekstrand> working on that
<alyssa> PIPE_SHADER_CAP_INDIRECT_OUTPUT_ADDR gates on Mali
<alyssa> *Gallium
<alyssa> though that's for the GLSL compiler
<alyssa> jekstrand: I do wonder what we would lose/gain by lowering XFB to store_global and then ignoring XFB in the hw varying path..
<alyssa> It'd be slower for XFB but ... maybe that doesn't matter ...
* jekstrand adds some constant folding
<alyssa> Incidentally I should really add txd support on bifrost/valhall, it isn't that hard....
<alyssa> instead of making lower_tex do it
<jekstrand> alyssa: Just needed a bit of constant folding: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15349
<jekstrand> There's another 1k CTS tests. :D
<alyssa> Woo!
<jekstrand> Fixes a giant pile of dEQP-VK.glsl.matrix.* as well. :D
* jekstrand likes 1-line fixes that fix thousands of tests. :D
<alyssa> jekstrand: I do feel nervous about this fix though
<alyssa> relying on opt passes for conformance, I mean
<alyssa> does VK allow indirect varying stores?
<jekstrand> alyssa: VK does but we lower it in panvk_vX_shader
<jekstrand> alyssa: The problem here is that nir_lower_io is lazy and generates mul+add chains that it knows will constant fold instead of checking for constants and doing the math on the CPU.
<alyssa> Alright...
<alyssa> I guess that's OK, just makes me nervous
<jekstrand> As the author of nir_lower[_explicit]_io, it was intentional. :)
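To illustrate the point about mul+add chains: a lowering pass can emit `base + index * stride` unconditionally and leave it to a later constant-folding pass to collapse the chain when every operand is constant. The sketch below shows that folding step on a toy expression tree in C. It does not use the NIR API; `struct expr`, `enum op` and `fold` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum op { OP_CONST, OP_VAR, OP_ADD, OP_MUL };

struct expr {
    enum op op;
    uint32_t value;          /* meaningful only for OP_CONST */
    struct expr *lhs, *rhs;  /* meaningful only for OP_ADD / OP_MUL */
};

/* Fold constants bottom-up, in place. Returns true if `e` is (now) a
 * constant, false if it still depends on a non-constant value. */
static bool
fold(struct expr *e)
{
    assert(e != NULL);

    if (e->op == OP_CONST)
        return true;
    if (e->op == OP_VAR)
        return false;

    /* Visit both children unconditionally so partial folding still happens. */
    bool lhs_const = fold(e->lhs);
    bool rhs_const = fold(e->rhs);

    if (!(lhs_const && rhs_const))
        return false;

    uint32_t a = e->lhs->value;
    uint32_t b = e->rhs->value;

    e->value = (e->op == OP_ADD) ? a + b : a * b;
    e->op = OP_CONST;
    e->lhs = e->rhs = NULL;
    return true;
}
```

With constant operands the whole chain becomes a single constant, so an access that looked indirect turns out to be direct. If any operand is a variable, the chain is left in place.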
nlhowell has quit [Read error: Connection reset by peer]
nlhowell has joined #panfrost
<alyssa> Alright, I trust you
jambalaya has joined #panfrost
<jekstrand> You probably shouldn't in general (question everything!) but in this case I think you can. :D
<alyssa> :)
* alyssa is procrastinating on another paper due tonight
<alyssa> i should maybe start it.
* jekstrand is trying to avoid the temptation to play with his new toy...
<alyssa> jekstrand: OOI, what % passing are we for VK 1.0?
<alyssa> (has panvk survived a full run?)
<jekstrand> alyssa: We crash a lot still but we're getting there. Currently at 42k fails
<jekstrand> 12k crash
<alyssa> of?
<jekstrand> idk off-hand
<alyssa> alright
<alyssa> point is we're very far? :(
<jekstrand> Yeah
<jekstrand> And stuff like indirect rendering and indexed rendering are hard requirements and we can't pull "look at it from the CPU" tricks.
<alyssa> nod
<jekstrand> Every crash I get rid of makes the CTS faster, though. Runs are now around 10 hours.
<jekstrand> Down from 36 when I started hacking.
<jekstrand> That'll help A LOT
<alyssa> wow, nice
<alyssa> crash recovery sucks hard then lol
<jekstrand> Yeah. deqp-vk is a pig to start up.
<alyssa> anything I can do to help?
<alyssa> other than er what i already do
<jekstrand> For now, just answer questions and review panfrost compiler patches. :)
<alyssa> excellent :-D
<alyssa> jekstrand: Oh, pro-tip: PAN_MESA_DEBUG=sync (or panvk equivalent) makes us crash on GPU faults/hangs.
<alyssa> as opposed to them happening silently
<jekstrand> And, snark. Feel free to snark. :D
Anonym3310 has joined #panfrost
<alyssa> I realize that's not helpful yet, but once you get through the CPU assert / UB / etc crashes, those will be good
<alyssa> =sync hurts perf but helps catch flakes early, so we run CI with it
<jekstrand> yup
<jekstrand> ANV has a ANV_ABORT_ON_DEVICE_LOSS env var which does something similar.
<alyssa> PANVK_DEBUG=sync i think
<jekstrand> Actually, it's MESA_VK_ABORT_ON_DEVICE_LOSS now.
Anonym3310 has quit []
<jekstrand> And panvk will pick that one up once I port it to the common device lost code
<alyssa> we don't seem to run panvk CI with it..
<alyssa> common device lost code?
<alyssa> i think we need new UABI for that
<jekstrand> Oh, it's based on the driver's hang detection
nlhowell has quit [Ping timeout: 480 seconds]
<jekstrand> If the driver doesn't detect hangs, it won't work. :)
<alyssa> yeah we don't have proper hang detection
<jekstrand> It just detects VK_ERROR_DEVICE_LOST and aborts
<jekstrand> If you don't throw that properly, it won't work.
<alyssa> need it for GL too though..
<jekstrand> Yeah
Anonym3310 has joined #panfrost
<alyssa> jekstrand: re index_type, see panfrost_translate_index_size for a hand optimized version
Fgfdyh has joined #panfrost
<alyssa> that would work for vk too with an extra >> 3 thrown in
<jekstrand> And... fixing UBO alignments gets rid of a bunch more crashes. :D
<alyssa> :tada:
<alyssa> re setting up attribute buffers to support instancing, wonder if we want that as a common helper
<alyssa> it's a pile of code that should be identical across drivers and is decidedly different on valhall
<Anonym3310> Hi
<jekstrand> alyssa: Yeah, that might not be a bad idea.
<alyssa> jekstrand: let me just.. start this essay first.. umm
* alyssa closes gitlab begrudgingly
<jekstrand> alyssa: I've been thinking about some of the attribute setup stuff in light of indirect. I'm wondering if we don't want some very carefully crafted helpers that we can compile as OpenCL kernels.
<jekstrand> It adds LLVM to the build pipeline which is unfortunate.
<jekstrand> But it might let us write it once in C and get indirect for freeish
<jekstrand> alyssa: I've not gone too far down that road yet. There's a lot of details to make it viable and it may be more infrastructure than it's worth in the end.
<jekstrand> Then again, the indirect code in panfrost has been buggy and it's really hard to debug giant piles of nir_builder
<Anonym3310> Hello, who can help solve the problem with the panfrost driver?
<jekstrand> alyssa: Did you want a "panfrost" or "bifrost" prefix on that compiler patch?
* jekstrand goes with bifrost for now
<Anonym3310> In general, I build mesa with panfrost, but for some reason glxinfo shows softpipe, not panfrost
<alyssa> jekstrand: Either bifrost or pan/bi
<Anonym3310> translated using Google translate from Russian
<alyssa> panfrost i mostly reserve for gallium + src/panfrost/lib
<anarsoul> Anonym3310: likely your display driver (not GPU driver, these are separate hw) isn't supported. What's your hardware and distro?
nlhowell has joined #panfrost
<icecream95> jekstrand: I've already used a couple of OpenCL kernels for driver-internal purposes in my fork of Mesa
<icecream95> (Based off the intel_clc code)
<Anonym3310> anarsoul Mali-G76 MC-4, android, chroot debian 11
<jekstrand> icecream95: Good to know. I've been thinking a bit about how to best do that.
<alyssa> ^^
<alyssa> icecream95's AFBC kernels scare me but the infrastructure is reasonable :-)
<icecream95> Currently I'm just embedding SPIR-V in a header (or alternatively loading from a file at runtime) but maybe it would be better to precompile it
<jekstrand> icecream95: If there are few enough HW differences or if it takes a long time to compile, maybe.
Fgfdyh has quit [Remote host closed the connection]
<jekstrand> icecream95: But if the compile time from SPIR-V is short enough, meh.
<alyssa> roughly 1 kernel per major arch
<icecream95> alyssa: ..And then there are all the Midgard quirks
<alyssa> occasionally need to split per-model due to errata or implementation details
<alyssa> yeah
<anarsoul> Anonym3310: in short: it won't work that way. 1) You don't have panfrost kernel driver 2) Likely you don't have a proper display driver either
<icecream95> Serialized NIR is also an option, but I guess the binary format isn't stable, with SPIR-V we can just include the file in git for people who don't have the bits required for compiling OpenCL
<alyssa> I don't know how I feel about checking in SPIR-V to git
<icecream95> SPIR-V also has the advantage that you can decide whether you want printf at runtime :)
<alyssa> I suspect Debian will complain about it
<icecream95> alyssa: Have an option to ignore precompiled shaders and always compile from source?
<alyssa> I suspect Debian will still complain about it
<icecream95> Users will complain if we force them to compile SPIRV-LLVM-Translator and libclc and all the other stuff
<alyssa> yeah...
<alyssa> now you see why I've dragged my feet on this :)
<jekstrand> icecream95: Serialized NIR may be problematic if you cross-compile.
<jekstrand> I don't think we need to check SPIR-V into git. Just depend on LLVM at build time.
nlhowell has quit [Ping timeout: 480 seconds]
<jekstrand> But maybe someone will complain about that too
<alyssa> That hits the "users will complain" point
<icecream95> jekstrand: Cross-compiling is problematic anyway.. you have to compile at least the clc code for two architectures
<jekstrand> They don't need to compile LLVM stuff unless they're on Gentoo.
<Anonym3310> anarsoul: Hope the translation doesn't fail. 1 person from w3bsit3-dns.com forum was able to build mesa with panfrost and run it in termux in proot. I want to do the same but don't understand the reason for the failure. And it's definitely not the kernel, and not an incompatibility. LINK: https://4pda.to/forum/index.php?showtopic=741456&view=findpost&p=109727534
<icecream95> A bunch of distros still don't have some of the other stuff needed, such as SPIRV-LLVM-Translator that I already mentioned
<jekstrand> It won't be like radeon where they need a new LLVM version. 12+ should work more-or-less forever.
<Anonym3310> 4pda forum*
<icecream95> But I guess they'll add it pretty quick if it's required for building Mesa
<jekstrand> Yup
<icecream95> ..Part of your secret plan to get Mesa OpenCL everywhere?
<jekstrand> And SPIRV-LLVM-Translator is super easy to build. You don't need to build LLVM yourself. Just grab the branch corresponding to your LLVM version, cmake, and make
<jekstrand> icecream95: hehe. :)
<icecream95> ..but it's C++, and so takes far longer to compile than Mesa itself
<jekstrand> heh
<jekstrand> Nah, that one's not bad. It's only a few files
<alyssa> does Debian have SPIRV-LLVM-Translator yet
<icecream95> The other question is.. OpenCL Rust when?
<icecream95> (Instead of OpenCL C)
<alyssa> llvm-spirv, yes, cool
<alyssa> in stable, even
<anarsoul> Anonym3310: well, check if you have panfrost kernel driver (lsmod | grep panfrost), then check if you have a display driver (that will depend on your platform), and it has to be supported by kmsro
<Anonym3310> anarsoul: I have a display driver. Otherwise, the display itself would simply not work. Also, in the Android kernel the drivers are not in the form of modules
<Anonym3310> By the way, does this chat have chats in matrix or telegram?
<anarsoul> Anonym3310: it needs to be a proper drm driver, not a usual android mess
<anarsoul> android usually uses kernel driver from ARM, it's not compatible with mesa
<anarsoul> anyway, it goes nowhere, I don't have any magical solution for you
<anarsoul> have fun with debugging it :)
<Anonym3310> anarsoul: How can I do debugging?
<icecream95> alyssa: I note that "panfrost: Process scissor state earlier" (the top commit on the "frozen-dairy" branch) was never merged..
<icecream95> But the way that commit is now, it will just keep emitting viewports if there are a lot of draws which get skipped..
<icecream95> Oh wait, that branch got deleted, where is the commit now?
<anarsoul> Anonym3310: gdb or add printf-s
<icecream95> Anyway, that commit is not upstream and the bug still exists
nlhowell has joined #panfrost
Danct12 has quit [Ping timeout: 480 seconds]
<Anonym3310> anarsoul: Ok thanks, I'll look for a solution
Anonym3310 has quit [Quit: CoreIRC for Android - www.coreirc.com]
Anonym3310 has joined #panfrost
Anonym3310 has quit []
Anonym3310 has joined #panfrost
Anonym3310 has quit []
Danct12 has joined #panfrost
nlhowell has quit [Ping timeout: 480 seconds]
erlehmann has joined #panfrost
<alyssa> icecream95: Uhhh
<alyssa> It was causing some problem (CI flake or something) and I put off debugging to merge the rest (causing the branch to get deleted by gitlab) and then I forgot about it
<alyssa> repushed frozen-dairy
FLHerne has quit [Quit: There's a real world out here!]
<alyssa> it's of course possible the flake was unrelated, I should revisit
<alyssa> and/or if your fix doesn't have the issue I can merge that?
<alyssa> IIRC yours was a lot more sophisticated
FLHerne has joined #panfrost
<alyssa> mine seems a bit simpler, might be perf problems
<icecream95> Well, doing a lot of draws that get skipped is arguably an application bug..
<alyssa> Yes ;D
<icecream95> In terms of memory allocations, it's only 32 bytes per draw, which is less than a normal draw that isn't skipped
<icecream95> (Well at least for !CSF)
<alyssa> CSF is less mem/draw I hope?
<icecream95> Yup, if no state changes you could maybe get away with only eight bytes per draw
* jekstrand is implementing coarse derivatives
<alyssa> icecream95: nice!
<alyssa> jekstrand: also, nice!
<alyssa> you're both nice!
<alyssa> :-p
<alyssa> <icecream95 and jekstrand, simultaneously> Blasphemy
<icecream95> With CSF, viewports are uploaded a field at a time, but each instruction writes six bytes, and I don't know what happens to the upper two bytes of the second word..
<icecream95> (Are they reset? Are they left alone? Do they copy the upper two bytes of the instruction?)
<alyssa> jekstrand: ^
<jekstrand> What can I say? It's Friday. It's not like I have anything else to do. It's not like I've got a new game I'm dying to play.
<alyssa> jekstrand: Re lowering ldexp, uadd_carry, usub_borrow, frexp,
<alyssa> we should have native ops for most of that?
<jekstrand> alyssa: I can look, I guess.
<alyssa> LDEXP.f32, LDEXP.v2f16, IADDC.i32, ISUBB.i32, +FREXPE.f32, +FREXPM.f32, +FREXPE.v2f16, +FREXPM.v2f16
<alyssa> uadd_carry/usub_borrow need to be lowered to 32-bits I guess but they don't make much sense as 8/16-bit anyway
<icecream95> That reminds me of how making 8-bit ops actually 16-bit on Midgard breaks some of them, such as ADDSAT
<alyssa> Weee
<alyssa> TBF have you seen midgard's 8-bit support
<alyssa> ("Yes.")
<jekstrand> alyssa: Yup
<jekstrand> alyssa: I'm not going to care about 16-bit for now
<alyssa> Yeah but I am..
<alyssa> jekstrand: more importantly uadd_carry is already lowered :-p
<alyssa> jekstrand: wait, uadd_carry/usub_borrow are not native ops for us nvm, lower away
<alyssa> IADDC.i32 is a ternary add, (a + b + (c & 1))
<jekstrand> alyssa: Yeah... :-/
<alyssa> still one op though
<alyssa> ICMP.u32.lt.i1 d, a, b
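For reference, the semantics under discussion can be written out in plain C. This is a sketch of the usual unsigned-comparison lowering for carry and borrow plus the `a + b + (c & 1)` form quoted for IADDC.i32. The function names are hypothetical, and how these map onto actual Bifrost instructions is not something this sketch claims to show.

```c
#include <assert.h>
#include <stdint.h>

/* Ternary add with the semantics quoted above: a + b + (c & 1). */
static uint32_t
iaddc_u32(uint32_t a, uint32_t b, uint32_t c)
{
    return a + b + (c & 1u);
}

/* uadd_carry: 1 if a + b overflows 32 bits, else 0. The sum wraps around,
 * so it ends up smaller than `a` exactly when a carry occurred. */
static uint32_t
uadd_carry_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint32_t)(a + b) < a);
}

/* usub_borrow: 1 if a - b underflows, i.e. if a < b, else 0. */
static uint32_t
usub_borrow_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)(a < b);
}

/* Example use: a 64-bit add built from 32-bit halves, feeding the carry of
 * the low half into the ternary add of the high half. */
static uint64_t
add64_via_32(uint64_t x, uint64_t y)
{
    uint32_t xl = (uint32_t)x, xh = (uint32_t)(x >> 32);
    uint32_t yl = (uint32_t)y, yh = (uint32_t)(y >> 32);

    uint32_t lo = xl + yl;
    uint32_t hi = iaddc_u32(xh, yh, uadd_carry_u32(xl, yl));

    return ((uint64_t)hi << 32) | lo;
}
```

Each of the carry and borrow helpers is a single unsigned compare once the sum is available, which is why lowering them costs little.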
<jekstrand> alyssa: Yeah, but seriously no one uses it
<alyssa> K
<jekstrand> fr/ldexp, on the other hand, those being native is good
<alyssa> bifrost has opcodes for everything
<alyssa> it's kind of a problem
<jekstrand> lol
davidlt has quit [Ping timeout: 480 seconds]
<icecream95> alyssa: It was maybe a mistake to use the same datastructure for liveness and constraints.. I want to remove the binary search on orr for constraints, but that would break liveness..
<icecream95> Or should I just give up and create an MR with the code I have now?
<icecream95> I do wonder how I could port the improvements to Midgard.. Eight bits is not enough for encoding constraints there
<alyssa> dschuermann: ^, questions for you, my unilaterally appointed RA maintainer ;-p
<icecream95> That reminds me that I have two or three blog posts about that which I need to finish writing
<dschuermann> alyssa: uhm, guess I should read your paper about that constraints-solver RA thing you wrote?
<dschuermann> :P
<icecream95> ..Is there even a point to SSA-based RA with all of my optimisations for LCRA?
<alyssa> dschuermann, SSA-based RA; icecream95, hand optimized LCRA
<alyssa> go go go go go go! :-p
* alyssa grabs popcorn
<dschuermann> if you want to go from 600loc to 3000? sure :D
<icecream95> I can write a bunch of NEON intrinsics for LCRA if I want to do that
* jekstrand should figure out why fairly basic texturing stuff doesn't work. :-/
<jekstrand> alyssa: Is this your new troll maintainership strategy? :D
<jekstrand> alyssa: Can Mali's f32->f16 conversion flush denorms on command?
<icecream95> alyssa: What about if I merge my RA optimisations with my scheduler rewrite, so that codegen improves while compile time stays constant?
<icecream95> (The scheduler rewrite currently brute-forces every possible scheduling for each clause to find the best one)
<dschuermann> o_O and you tested that on something other than the triangle demo?
<icecream95> I have not tried it on large shaders yet
<icecream95> At least it doesn't try every possible scheduling for the whole program..
<alyssa> technically polynomial time is the best polynomial time
<alyssa> your algorithm is, errr... O(n^16) ?
<icecream95> I'm sure with a little NEON magic it will all work out fine...
* jekstrand wonders how many MRs he's going to post today
<alyssa> NEON only improves perf by a constant factor
<HdkR> That's fine, just do some loop unrolling and throw six vector pipelines at it
<HdkR> :>
<jekstrand> Run RA on the GPU!
<alyssa> Also, the probably of my merging NEON into the compiler is small and conditional on my mood
<alyssa> probability
<dschuermann> jekstrand: :D
<icecream95> alyssa: But it could be quite a large constant factor if I can make things so that e.g. scheduling an instruction is as simple as an addition operation
<dschuermann> alyssa: having a quick look at your paper and the problem you had to solve, I don't think SSA-based RA is a good fit for vector archs at all
<dschuermann> icecream95: is that postRA or preRA scheduling?
<alyssa> dschuermann: Yeah... but Bifrost is not vector (unlike Midgard)
<alyssa> (LCRA was for Midgard and got copy pasted into Bifrost and has somehow survived )
<icecream95> dschuermann: Scheduling for Bifrost is done after RA
<dschuermann> well... then there is probably still some room
<dschuermann> ACO doesn't have a postRA scheduler at all ;)
<icecream95> Midgard does scheduling before RA, which is partly why the spilling code has so many bugs
<icecream95> alyssa: After rewriting scheduling, maybe some improvements could be made to RA.. such as not reusing a register too soon after it has been made 'free'
<dschuermann> that sounds strange... shouldn't it be more tested because more stressed?
<icecream95> dschuermann: Midgard has some constraints such as IIRC not being able to read from a register used for memory stores, and the spill can't be inserted until the end of the bundle, so if a spilt register is used multiple times inside the bundle, there are problems
<dschuermann> that sounds awful to deal with :|
<dschuermann> have to temporarily assign registers for the bundle until it succeeded and only then commit?
<alyssa> doing RA post-bundling on midgard was another one of those regrets I learned from for Bifrost....
<icecream95> But given that Valhall doesn't have bundling, how should any scheduling fit in there?
<dschuermann> you mean, the first time you have a sane architecture, and you don't know what to do now? :D
<alyssa> :-D
* jekstrand kicks off another panvk run and calls it a day
<alyssa> a week even
<HdkR> panvk run day. The day between Thursday and Friday.
<jekstrand> HdkR: I've got a full run down to about 10 hours
<HdkR> That's a heck of an improvement!
rasterman has quit [Quit: Gettin' stinky!]
<jekstrand> Time to head off into the forbidden west!
* icecream95 looks west
<icecream95> Into the sea?!
<HdkR> Swimming in the ocean must be illegal ever since COVID
<HdkR> forbidden
<alyssa> HdkR: you sure that's not just california
<HdkR> mmm, Forbidden avocado toast
<HdkR> Really panvk just needs to get good enough to run the game's prequel that Jason is talking about :>
<HdkR> I like the image being, "You cannot go east"
<HdkR> I'm making the assumption that the forbidden west is in reference to the Horizon game
<HdkR> Unless your reference was from The Hobbit, which I don't really know :D
<icecream95> HdkR: That the west is forbidden is not even mentioned until LOTR
<HdkR> Neat. I've never read the books or watched the movies
* icecream95 can see several filming locations for the movies from where he sits
<HdkR> https://twitter.com/FEX_Emu/status/1487597165312495619 This game engine already runs under emulation. Looking forward to when the sequel launches on PC in three years to give it a whirl :D
<icecream95> Advantages of having a status bar that updates every second: Your computer makes a nice ticking noise when plugged in