#panfrost on 2022-03-11 — irc logs at oftc.irclog.whitequark.org

2021-07-26 22:56 ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://oftc.irclog.whitequark.org/panfrost - <macc24> i have been here before it was popular

00:00 <cwabbott> there's no vertex cache and the index buffer doesn't even get read until rasterization

00:00 <cwabbott> it's... bad

00:00 <jekstrand> Yeah

00:00 <jekstrand> But we can't do a CPU crawl in Vulkan. Not an option.

00:00 <jekstrand> So we have to eat the suck

00:00 <cwabbott> iiuc the vertex_range is just the min/max index, so you don't have to traverse the whole thing

00:00 <cwabbott> I think vulkan does a compute shader to set it

00:01 <jekstrand> I mean, we could use the size of the whole vertex buffer at least for the allocation

00:01 <jekstrand> But then we really want to run a compute shader to figure out size, then run another to actually process the vertex data.

00:01 <cwabbott> it crawls the index buffer with a compute shader that determines the min/max and writes the value to the descriptor

00:02 <cwabbott> or at least that's what the blob does

00:02 <jekstrand> yup

00:02 <jekstrand> That's what we need to be doing in panvk
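
The exchange above says the blob crawls the index buffer with a compute shader, finds the minimum and maximum index, and writes them into the draw's descriptor as the vertex range. Below is a minimal serial C model of what such a shader computes. It assumes 1-, 2- or 4-byte indices and treats the all-ones value as the primitive-restart index. The struct and function names are hypothetical and are not taken from panfrost or panvk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Result of the index crawl: smallest and largest index referenced.
 * For an empty (or all-restart) draw, min > max. */
struct index_range {
   uint32_t min;
   uint32_t max;
};

/* Reads element i of an index buffer whose elements are 1, 2 or 4 bytes. */
static uint32_t
read_index(const void *indices, unsigned index_size, size_t i)
{
   switch (index_size) {
   case 1:
      return ((const uint8_t *)indices)[i];
   case 2:
      return ((const uint16_t *)indices)[i];
   default:
      assert(index_size == 4);
      return ((const uint32_t *)indices)[i];
   }
}

/* Serial min/max scan. A compute-shader version could split the buffer
 * across invocations and combine partial results with atomic min/max. */
static struct index_range
compute_index_range(const void *indices, unsigned index_size, size_t count,
                    bool primitive_restart)
{
   /* The restart index is the all-ones value for the index type. */
   uint32_t restart = (index_size == 4) ? UINT32_MAX
                                        : (1u << (8 * index_size)) - 1;
   struct index_range r = { UINT32_MAX, 0 };

   for (size_t i = 0; i < count; ++i) {
      uint32_t idx = read_index(indices, index_size, i);
      if (primitive_restart && idx == restart)
         continue; /* restart markers are not real vertices */
      if (idx < r.min)
         r.min = idx;
      if (idx > r.max)
         r.max = idx;
   }
   return r;
}
```

With the range known, the driver only needs to shade and allocate varyings for `max - min + 1` vertices instead of sizing everything from the whole vertex buffer.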

00:08 rasterman has joined #panfrost

00:08 <icecream95> cwabbott: "until rasterization".. You mean until tiling? Rasterization works on a stream of polygons and does not read index buffers

00:29 vstehle has quit [Ping timeout: 480 seconds]

00:29 rasterman has quit [Quit: Gettin' stinky!]

00:30 * jekstrand kicks off a new panvk run and calls it a day

00:44 <alyssa> cwabbott: What's really stupid, I think we still need that index crawl with IDVS on Bifrost

00:46 <alyssa> We don't with IDVS on Valhall, finally.

00:47 <HdkR> Valhall is so much better \o/

00:56 <alyssa> It really is >.>

00:57 <alyssa> Ranking of Malis:

00:57 <alyssa> 1. Valhall

00:57 <alyssa> 2. Midgard

00:57 <alyssa> 3. Bifrost

00:57 <alyssa> 4. Utgard

00:58 <alyssa> Utgard would beat Bifrost if not for the GP

00:58 <alyssa> Bifrost loses hard with clauses/exposed pipelines

00:58 <alyssa> Midgard loses out on VLIW

00:59 <HdkR> Only took ARM four tries to make a Valhall into the perfect product

01:00 <alyssa> The ML-focused bits are still pretty wacky

01:00 <alyssa> i8vec4 on a scalar arch is.. unusual..

01:00 <HdkR> If it results in more people writing ML upscalers then I'm happy :D

01:08 <HdkR> I love a bit of SWAR though. Since it means if you write bad code, then it will be terrible

02:44 Moe_Icenowy has quit []

02:44 MoeIcenowy has joined #panfrost

02:47 <MoeIcenowy> alyssa: I think Utgard PP evolved into Midgard?

02:59 JulianGro has quit [Remote host closed the connection]

02:59 Daanct12 has joined #panfrost

05:31 davidlt has joined #panfrost

06:00 vstehle has joined #panfrost

07:24 <bbrezillon> jekstrand, cwabbott: we already have this min/max-calculation-in-compute logic in the indirect draw path (src/panfrost/lib/pan_indirect_draw.c), we can probably re-use some of that

07:25 <bbrezillon> https://gitlab.freedesktop.org/mesa/mesa/-/blob/master/src/panfrost/lib/pan_indirect_draw.c#L896

07:47 davidlt has quit [Ping timeout: 480 seconds]

07:57 MajorBiscuit has joined #panfrost

08:00 Major_Biscuit has joined #panfrost

08:07 MajorBiscuit has quit [Ping timeout: 480 seconds]

08:28 kenzie35 has quit []

08:29 kenzie35 has joined #panfrost

08:31 kenzie35 has quit []

08:33 kenzie35 has joined #panfrost

08:33 kenzie35 has quit []

08:36 kenzie35 has joined #panfrost

08:37 erlehmann has quit [Ping timeout: 480 seconds]

08:43 pendingchaos_ has joined #panfrost

08:44 pendingchaos has quit [Ping timeout: 480 seconds]

09:49 simon-perretta-img has quit [Read error: No route to host]

09:53 simon-perretta-img has joined #panfrost

09:59 simon-perretta-img_ has joined #panfrost

10:01 Daanct12 has quit [Quit: Quit]

10:04 simon-perretta-img has quit [Ping timeout: 480 seconds]

10:17 simon-perretta-img_ has quit []

10:18 simon-perretta-img has joined #panfrost

10:34 rasterman has joined #panfrost

10:53 davidlt has joined #panfrost

11:09 erlehmann has joined #panfrost

11:17 erlehmann has quit []

11:52 pendingchaos_ is now known as pendingchaos

13:10 JulianGro has joined #panfrost

13:47 davidlt has quit [Ping timeout: 480 seconds]

14:03 <alyssa> MoeIcenowy: Yes

15:15 jambalaya has quit [Quit: Off to see the wizard.]

15:26 Anonym3310 has joined #panfrost

15:27 Anonym3310 has quit []

15:34 Anonym3310 has joined #panfrost

15:40 Anonym3310 has quit [Remote host closed the connection]

15:43 Anonym3310 has joined #panfrost

15:46 Anonym3345 has joined #panfrost

15:51 Anonym3345 has quit [Remote host closed the connection]

15:53 davidlt has joined #panfrost

15:59 Major_Biscuit has quit []

16:02 Anonym3310 has quit []

16:41 Anonym3310 has joined #panfrost

16:45 Anonym3310 has quit [Remote host closed the connection]

17:27 camus has quit []

17:28 nlhowell has joined #panfrost

17:33 <jekstrand> bbrezillon: Yeah. I wasn't too worried about it. The only real question is if we should leave the CPU hack for now and come back to it or start doing compute shader stuff now.

17:43 <alyssa> jekstrand: relatedly, we have 5 different ways of loading uniform data into shaders

17:43 <alyssa> 1. ld_uniform

17:43 <alyssa> 1. ld_ubo

17:43 <alyssa> er

17:43 <alyssa> 1. ld_uniform

17:43 <alyssa> 2. ld_ubo

17:43 <alyssa> 3. ld_push_constant

17:43 <alyssa> 4. ld_{sysval}, and that "magically" becomes a UBO read in the backend compiler

17:44 <alyssa> --Er wait just those

17:44 <alyssa> and Vulkan push constants are lowered to UBOs

17:44 <alyssa> and UBO reads "magically" become real push constants

17:44 <alyssa> and the driver internal shaders create fake UBOs and rely on them getting pushed but maybe reordered

17:44 <alyssa> and ld_uniform is lowered to UBO but then pushed again after, poorly

17:44 <alyssa> oh, soon:

17:45 <alyssa> 5. ld_preamble once that is landed

17:46 <alyssa> Not going to lie I kinda hate the complexity we're growing here to support "GL uniforms + UBOs + VK push constants + GL sysvals + VK sysvals + preambles + driver-internal shaders" with a shared compiler, efficiently

17:46 <alyssa> part of the problem is the primitives NIR gives us, part is what we do with them

17:47 <alyssa> I see Iris converts ld_{sysval} to something else before passing it to the intel compiler

17:49 <alyssa> To top it off, there is very little Mali-specific about our crappy UBO push pass

17:50 <jekstrand> alyssa: Intel is almost as bad. :(

17:51 <jekstrand> And it adds patch constants to the mix. :D

17:51 <alyssa> ?

17:51 <alyssa> like, binary patching?

17:51 <jekstrand> yup

17:51 <alyssa> joy

17:51 <alyssa> we would benefit from that for blend constants

17:52 <jekstrand> We use them for a few things like patching in the address of the magic constant space used by nir_load_constant

17:52 <alyssa> we used to do that on midgard

17:52 <alyssa> but the encoding is too screwed up to be practical on Bifrost

17:52 <jekstrand> And we use them for ray-tracing to patch in shader descriptors with the actual addresses of the uploaded shader.

17:58 <alyssa> anyway, if you have any ideas to make it suck less.. all ears

17:59 <alyssa> see pan_indirect_dispatch.c for the mess we have now

17:59 <alyssa> (Most of the file is magic to make the sysvals work)

18:00 <alyssa> ooh, 6. load_kernel_input

18:02 Danct12 has quit [Quit: Quitting]

18:02 <alyssa> It's like every API wants its own intrinsic

18:05 Danct12 has joined #panfrost

18:13 <alyssa> load_kernel_input for driver-internal kernels saves a few lines

18:15 <alyssa> (..Remind me why load_kernel_input and load_ubo are different?)

18:16 <HdkR> Generic intrinsic so the backend can choose between Constant and UBOs depending on whatever they want at the time? :D

18:20 <jekstrand> alyssa: We lower stuff to "standard" NIR stuff in the driver on Intel.

18:20 <jekstrand> alyssa: That keeps it out of the back-end compiler for the most part.

18:22 megi has quit [Quit: WeeChat 3.4]

18:22 megi has joined #panfrost

18:23 Danct12 has quit [Quit: Quitting]

18:24 <jekstrand> It's an annoying dance, for sure.

18:26 Danct12 has joined #panfrost

18:30 <jekstrand> alyssa: What's up with bifrost_nir_lower_store_component?

18:31 <jekstrand> bbrezillon: If you're around, could you provide an opinion or ack the first patch in my bufferview MR? It drops the advertised Vulkan version to 1.0.

18:31 <alyssa> jekstrand: Is lowering to "standard" NIR in driver... better?

18:31 <alyssa> jekstrand: wdym What's up with bifrost_nir_lower_store_component?

18:31 <jekstrand> alyssa: It can be. It can reduce the amount of back-end code.

18:31 <jekstrand> alyssa: It's crashing because of indirect stores

18:32 <alyssa> Panfrost doesn't support indirect stores

18:32 <alyssa> that code path doesn't exist on Gallium

18:32 <alyssa> ^indirect varying stores, indirect SSBO stores are fine

18:33 <alyssa> we /could/ support them

18:34 <alyssa> interactions with XFB suck under the mali model

18:34 <jekstrand> hrm... ok

18:34 <alyssa> does VK need them?

18:35 * jekstrand is confused. nir_lower_io_to_temporaries should get rid of indirect stores

18:35 <alyssa> does panvk call that at the right time

18:35 <alyssa> before/after i/o?

18:36 <jekstrand> working on that

18:37 <alyssa> PIPE_SHADER_CAP_INDIRECT_OUTPUT_ADDR gates on Mali

18:37 <alyssa> *Gallium

18:38 <alyssa> though that's for the GLSL compiler

18:48 <alyssa> jekstrand: I do wonder what we would lose/gain by lowering XFB to store_global and then ignoring XFB in the hw varying path..

18:48 <alyssa> It'd be slower for XFB but ... maybe that doesn't matter ...

18:49 * jekstrand adds some constant folding

18:49 <alyssa> Incidentally I should really add txd support on bifrost/valhall, it isn't that hard....

18:50 <alyssa> instead of making lower_tex do it

18:54 <jekstrand> alyssa: Just needed a bit of constant folding: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15349

18:55 <jekstrand> There's another 1k CTS tests. :D

18:57 <alyssa> Woo!

18:57 <jekstrand> Fixes a giant pile of dEQP-VK.glsl.matrix.* as well. :D

18:57 * jekstrand likes 1-line fixes that fix thousands of tests. :D

18:57 <alyssa> jekstrand: I do feel nervous about this fix though

18:58 <alyssa> relying on opt passes for conformance, I mean

18:58 <alyssa> does VK allow indirect varying stores?

18:58 <jekstrand> alyssa: VK does but we lower it in panvk_vX_shader

18:59 <jekstrand> alyssa: The problem here is that nir_lower_io is lazy and generates mul+add chains that it knows will constant fold instead of checking for constants and doing the math on the CPU.

19:00 <alyssa> Alright...

19:00 <alyssa> I guess that's OK, just makes me nervous

19:00 <jekstrand> As the author of nir_lower[_explicit]_io, it was intentional. :)
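
jekstrand's point is that nir_lower_io emits offset math as ordinary mul/add instructions and leaves it to a later constant-folding pass to collapse them. If that pass has not run, the offset stays non-constant and, presumably, the store then looks indirect to the backend. The toy folder below is NOT NIR's data structures or nir_opt_constant_folding. It is a hypothetical minimal expression tree that shows how `base + index * stride` with all-constant operands collapses into a single constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy expression IR (not NIR): constants, opaque inputs, add, mul. */
enum op { OP_CONST, OP_INPUT, OP_ADD, OP_MUL };

struct expr {
   enum op op;
   uint32_t value;      /* meaningful when op == OP_CONST */
   struct expr *src[2]; /* meaningful for OP_ADD / OP_MUL */
};

/* Folds bottom-up in place; returns true if e is now a constant. */
static bool
fold(struct expr *e)
{
   if (e->op == OP_CONST)
      return true;
   if (e->op == OP_INPUT)
      return false;

   assert(e->op == OP_ADD || e->op == OP_MUL);
   /* Fold both sources (no short-circuit) so constant subtrees of a
    * partially dynamic expression still get simplified. */
   bool lhs_const = fold(e->src[0]);
   bool rhs_const = fold(e->src[1]);
   if (!lhs_const || !rhs_const)
      return false;

   uint32_t x = e->src[0]->value, y = e->src[1]->value;
   e->value = (e->op == OP_ADD) ? x + y : x * y;
   e->op = OP_CONST; /* the node itself becomes a constant */
   return true;
}
```

For `16 + 3 * 4` the whole tree folds to the constant 28. If the index is a runtime input, only the constant subtrees fold and the offset remains dynamic.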

19:02 nlhowell has quit [Read error: Connection reset by peer]

19:02 nlhowell has joined #panfrost

19:03 <alyssa> Alright, I trust you

19:03 jambalaya has joined #panfrost

19:03 <jekstrand> You probably shouldn't in general (question everything!) but in this case I think you can. :D

19:03 <alyssa> :)

19:04 * alyssa is procrastinating on another paper due tonight

19:04 <alyssa> i should maybe start it.

19:04 * jekstrand is trying to avoid the temptation to play with his new toy...

19:04 <alyssa> jekstrand: OOI, what % passing are we for VK 1.0?

19:05 <alyssa> (has panvk survived a full run?)

19:05 <jekstrand> alyssa: We crash a lot still but we're getting there. Currently at 42k fails

19:05 <jekstrand> 12k crash

19:06 <alyssa> of?

19:06 <jekstrand> idk off-hand

19:06 <alyssa> alright

19:06 <alyssa> point is we're very far? :(

19:06 <jekstrand> Yeah

19:07 <jekstrand> And stuff like indirect rendering and indexed rendering are hard requirements and we can't pull "look at it from the CPU" tricks.

19:07 <alyssa> nod

19:11 <jekstrand> Every crash I get rid of makes the CTS faster, though. Runs are now around 10 hours.

19:11 <jekstrand> Down from 36 when I started hacking.

19:11 <jekstrand> That'll help A LOT

19:11 <alyssa> wow, nice

19:12 <alyssa> crash recovery sucks hard then lol

19:12 <jekstrand> Yeah. deqp-vk is a pig to start up.

19:12 <alyssa> anything I can do to help?

19:13 <alyssa> other than er what i already do

19:13 <jekstrand> For now, just answer questions and review panfrost compiler patches. :)

19:13 <alyssa> excellent :-D

19:13 <alyssa> jekstrand: Oh, pro-tip: PAN_MESA_DEBUG=sync (or panvk equivalent) makes us crash on GPU faults/hangs.

19:14 <alyssa> as opposed to them happening silently

19:14 <jekstrand> And, snark. Feel free to snark. :D

19:14 Anonym3310 has joined #panfrost

19:14 <alyssa> I realize that's not helpful yet, but once you get through the CPU assert / UB / etc crashes, those will be good

19:14 <alyssa> =sync hurts perf but helps catch flakes early, so we run CI with it

19:15 <jekstrand> yup

19:15 <jekstrand> ANV has an ANV_ABORT_ON_DEVICE_LOSS env var which does something similar.

19:15 <alyssa> PANVK_DEBUG=sync i think

19:15 <jekstrand> Actually, it's MESA_VK_ABORT_ON_DEVICE_LOSS now.

19:15 Anonym3310 has quit []

19:16 <jekstrand> And panvk will pick that one up once I port it to the common device lost code

19:16 <alyssa> we don't seem to run panvk CI with it..

19:16 <alyssa> common device lost code?

19:16 <alyssa> i think we need new UABI for that

19:16 <jekstrand> Oh, it's based on the driver's hang detection

19:16 nlhowell has quit [Ping timeout: 480 seconds]

19:16 <jekstrand> If the driver doesn't detect hangs, it won't work. :)

19:17 <alyssa> yeah we don't have proper hang detection

19:17 <jekstrand> It just detects VK_ERROR_DEVICE_LOST and aborts

19:17 <jekstrand> If you don't throw that properly, it won't work.

19:17 <alyssa> need it for GL too though..

19:17 <jekstrand> Yeah

19:20 Anonym3310 has joined #panfrost

19:23 <alyssa> jekstrand: re index_type, see panfrost_translate_index_size for a hand optimized version

19:23 Fgfdyh has joined #panfrost

19:23 <alyssa> that would work for vk too with an extra >> 3 thrown in
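
alyssa points to panfrost_translate_index_size as a hand-optimized mapping. Below is a sketch of that kind of trick. It assumes the hardware index-type enum is numbered NONE = 0, UINT8 = 1, UINT16 = 2, UINT32 = 3, so byte sizes 0, 1 and 2 map to themselves and only 4 needs a special case. It also assumes the extra ">> 3" converts an index size stored in bits into bytes. The enum and function names are hypothetical stand-ins and are not the Mesa definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed hardware encoding (hypothetical names). */
enum sketch_index_type {
   SKETCH_INDEX_TYPE_NONE = 0,
   SKETCH_INDEX_TYPE_UINT8 = 1,
   SKETCH_INDEX_TYPE_UINT16 = 2,
   SKETCH_INDEX_TYPE_UINT32 = 3,
};

/* Index size in bytes (0 = non-indexed) -> hardware index type.
 * Sizes 0, 1, 2 equal their encodings; only 4 bytes needs remapping. */
static enum sketch_index_type
translate_index_size(unsigned size_bytes)
{
   assert(size_bytes == 0 || size_bytes == 1 || size_bytes == 2 ||
          size_bytes == 4);
   return (size_bytes == 4) ? SKETCH_INDEX_TYPE_UINT32
                            : (enum sketch_index_type)size_bytes;
}

/* Same helper for callers that track the index size in bits
 * (assumed here for the Vulkan side): ">> 3" turns bits into bytes. */
static enum sketch_index_type
translate_index_size_bits(unsigned size_bits)
{
   return translate_index_size(size_bits >> 3);
}
```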

19:24 <jekstrand> And... fixing UBO alignments gets rid of a bunch more crashes. :D

19:25 <alyssa> :tada:

19:25 <alyssa> re setting up attribute buffers to support instancing, wonder if we want that as a common helper

19:26 <alyssa> it's a pile of code that should be identical across drivers and is decidedly different on valhall

19:26 <Anonym3310> Hi

19:27 <jekstrand> alyssa: Yeah, that might not be a bad idea.

19:27 <alyssa> jekstrand: let me just.. start this essay first.. umm

19:27 * alyssa closes gitlab begrudgingly

19:27 <jekstrand> alyssa: I've been thinking about some of the attribute setup stuff in light of indirect. I'm wondering if we don't want some very carefully crafted helpers that we can compile as OpenCL kernels.

19:28 <jekstrand> It adds LLVM to the build pipeline which is unfortunate.

19:28 <jekstrand> But it might let us write it once in C and get indirect for freeish

19:29 <jekstrand> alyssa: I've not gone too far down that road yet. There's a lot of details to make it viable and it may be more infrastructure than it's worth in the end.

19:30 <jekstrand> Then again, the indirect code in panfrost has been buggy and it's really hard to debug giant piles of nir_builder

19:30 <Anonym3310> Hello, who can help solve the problem with the panfrost driver?

19:31 <jekstrand> alyssa: Did you want a "panfrost" or "bifrost" prefix on that compiler patch?

19:32 * jekstrand goes with bifrost for now

19:32 <Anonym3310> Basically, I build Mesa with panfrost enabled, but for some reason glxinfo reports softpipe, not panfrost

19:33 <alyssa> jekstrand: Either bifrost or pan/bi

19:33 <Anonym3310> translated using Google translate from Russian

19:33 <alyssa> panfrost i mostly reserve for gallium + src/panfrost/lib

19:35 <anarsoul> Anonym3310: likely your display driver (not GPU driver, these are separate hw) isn't supported. What's your hardware and distro?

19:35 nlhowell has joined #panfrost

19:38 <icecream95> jekstrand: I've already used a couple of OpenCL kernels for driver-internal purposes in my fork of Mesa

19:38 <icecream95> (Based off the intel_clc code)

19:39 <Anonym3310> anarsoul Mali-G76 MC-4, android, chroot debian 11

19:39 <jekstrand> icecream95: Good to know. I've been thinking a bit about how to best do that.

19:39 <alyssa> ^^

19:40 <alyssa> icecream95's AFBC kernels scare me but the infrastructure is reasonable :-)

19:40 <icecream95> Currently I'm just embedding SPIR-V in a header (or alternatively loading from a file at runtime) but maybe it would be better to precompile it
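
icecream95 mentions embedding SPIR-V in a header. One way such a header might look, plus a sanity check before the words are handed to a SPIR-V-to-NIR step, is sketched below. The array and function names are hypothetical. The header layout follows the SPIR-V specification: five leading words, with magic number 0x07230203 first and the version encoded as 0x00MMmm00 in the second word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SPIRV_MAGIC 0x07230203u

/* Hypothetical generated header contents: only the module header shown. */
static const uint32_t example_kernel_spv[] = {
   SPIRV_MAGIC,
   0x00010000u, /* version 1.0 */
   0u,          /* generator magic */
   1u,          /* id bound */
   0u,          /* schema (reserved, must be 0) */
   /* ... instructions would follow in a real module ... */
};

/* Checks the magic number and extracts the SPIR-V version. */
static bool
spirv_header_ok(const uint32_t *words, size_t num_words,
                unsigned *major, unsigned *minor)
{
   assert(major != NULL && minor != NULL);
   if (words == NULL || num_words < 5 || words[0] != SPIRV_MAGIC)
      return false;
   *major = (words[1] >> 16) & 0xffu;
   *minor = (words[1] >> 8) & 0xffu;
   return true;
}
```

Embedding the words directly avoids a runtime file lookup. The trade-off raised later in the log is whether generated binaries should be checked into git at all.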

19:41 <jekstrand> icecream95: If there are few enough HW differences or if it takes a long time to compile, maybe.

19:41 Fgfdyh has quit [Remote host closed the connection]

19:41 <jekstrand> icecream95: But if the compile time from SPIR-V is short enough, meh.

19:42 <alyssa> roughly 1 kernel per major arch

19:42 <icecream95> alyssa: ..And then there are all the Midgard quirks

19:42 <alyssa> occasionally need to split per-model due to errata or implementation details

19:42 <alyssa> yeah

19:42 <anarsoul> Anonym3310: in short: it won't work that way. 1) You don't have panfrost kernel driver 2) Likely you don't have a proper display driver either

19:44 <icecream95> Serialized NIR is also an option, but I guess the binary format isn't stable, with SPIR-V we can just include the file in git for people who don't have the bits required for compiling OpenCL

19:44 <alyssa> I don't know how I feel about checking in SPIR-V to git

19:44 <icecream95> SPIR-V also has the advantage that you can decide whether you want printf at runtime :)

19:44 <alyssa> I suspect Debian will complain about it

19:45 <icecream95> alyssa: Have an option to ignore precompiled shaders and always compile from source?

19:46 <alyssa> I suspect Debian will still complain about it

19:46 <icecream95> Users will complain if we force them to compile SPIRV-LLVM-Translator and libclc and all the other stuff

19:47 <alyssa> yeah...

19:47 <alyssa> now you see why I've dragged my feet on this :)

19:50 <jekstrand> icecream95: Serialized NIR may be problematic if you cross-compile.

19:50 <jekstrand> I don't think we need to check SPIR-V into git. Just depend on LLVM at build time.

19:50 nlhowell has quit [Ping timeout: 480 seconds]

19:50 <jekstrand> But maybe someone will complain about that too

19:50 <alyssa> That hits the "users will complain" point

19:51 <icecream95> jekstrand: Cross-compiling is problematic anyway.. you have to compile at least the clc code for two architectures

19:51 <jekstrand> They don't need to compile LLVM stuff unless they're on Gentoo.

19:51 <Anonym3310> anarsoul: I hope the translation holds up. One person from the w3bsit3-dns.com forum was able to build Mesa with panfrost and run it in Termux under proot. I want to do the same but don't understand why it fails for me. And it's definitely not the kernel, and not an incompatibility. LINK: https://4pda.to/forum/index.php?showtopic=741456&view=findpost&p=109727534

19:52 <icecream95> A bunch of distros still don't have some of the other stuff needed, such as SPIRV-LLVM-Translator that I already mentioned

19:52 <jekstrand> It won't be like radeon where they need a new LLVM version. 12+ should work more-or-less forever.

19:52 <Anonym3310> 4pda forum*

19:52 <icecream95> But I guess they'll add it pretty quick if it's required for building Mesa

19:52 <jekstrand> Yup

19:52 <icecream95> ..Part of your secret plan to get Mesa OpenCL everywhere?

19:53 <jekstrand> And SPIRV-LLVM-Translator is super easy to build. You don't need to build LLVM yourself. Just grab the branch corresponding to your LLVM version, cmake, and make

19:53 <jekstrand> icecream95: hehe. :)

19:53 <icecream95> ..but it's C++, and so takes far longer to compile than Mesa itself

19:54 <jekstrand> heh

19:54 <jekstrand> Nah, that one's not bad. It's only a few files

19:55 <alyssa> does Debian have SPIRV-LLVM-Translator yet

19:55 <icecream95> The other question is.. OpenCL Rust when?

19:55 <icecream95> (Instead of OpenCL C)

19:55 <alyssa> llvm-spirv, yes, cool

19:55 <alyssa> in stable, even

19:57 <anarsoul> Anonym3310: well, check if you have panfrost kernel driver (lsmod | grep panfrost), then check if you have a display driver (that will depend on your platform), and it has to be supported by kmsro

20:00 <Anonym3310> anarsoul: I have a display driver; otherwise the display itself would simply not work. Also, in the Android kernel the drivers are built in rather than loaded as modules

20:01 <Anonym3310> By the way, is this channel also available on Matrix or Telegram?

20:01 <anarsoul> Anonym3310: it needs to be a proper drm driver, not a usual android mess

20:03 <anarsoul> android usually uses kernel driver from ARM, it's not compatible with mesa

20:03 <anarsoul> anyway, it goes nowhere, I don't have any magical solution for you

20:04 <anarsoul> have fun with debugging it :)

20:09 <Anonym3310> anarsoul: How can I do debugging?

20:13 <icecream95> alyssa: I note that "panfrost: Process scissor state earlier" (the top commit on the "frozen-dairy" branch) was never merged..

20:15 <icecream95> But the way that commit is now, it will just keep emitting viewports if there are a lot of draws which get skipped..

20:15 <icecream95> Oh wait, that branch got deleted, where is the commit now?

20:18 <anarsoul> Anonym3310: gdb or add printf-s

20:19 <icecream95> Anyway, that commit is not upstream and the bug still exists

20:20 nlhowell has joined #panfrost

20:20 Danct12 has quit [Ping timeout: 480 seconds]

20:21 <Anonym3310> anarsoul: Ok thanks, I'll look for a solution

20:21 Anonym3310 has quit [Quit: CoreIRC for Android - www.coreirc.com]

20:22 Anonym3310 has joined #panfrost

20:22 Anonym3310 has quit []

20:23 Anonym3310 has joined #panfrost

20:23 Anonym3310 has quit []

20:31 Danct12 has joined #panfrost

20:54 nlhowell has quit [Ping timeout: 480 seconds]

20:57 erlehmann has joined #panfrost

20:58 <alyssa> icecream95: Uhhh

20:59 <alyssa> It was causing some problem (CI flake or something) and I put off debugging to merge the rest (causing the branch to get deleted by gitlab) and then I forgot about it

21:00 <alyssa> repushed frozen-dairy

21:00 FLHerne has quit [Quit: There's a real world out here!]

21:00 <alyssa> it's of course possible the flake was unrelated, I should revisit

21:00 <alyssa> and/or if your fix doesn't have the issue I can merge that?

21:01 <alyssa> IIRC yours was a lot more sophisticated

21:01 FLHerne has joined #panfrost

21:11 <alyssa> mine seems a bit simpler, might be perf problems

21:14 <icecream95> Well, doing a lot of draws that get skipped is arguably an application bug..

21:15 <alyssa> Yes ;D

21:18 <icecream95> In terms of memory allocations, it's only 32 bytes per draw, which is less than a normal draw that isn't skipped

21:18 <icecream95> (Well at least for !CSF)

21:18 <alyssa> CSF is less mem/draw I hope?

21:19 <icecream95> Yup, if no state changes you could maybe get away with only eight bytes per draw

21:19 * jekstrand is implementing coarse derivatives

21:19 <alyssa> icecream95: nice!

21:19 <alyssa> jekstrand: also, nice!

21:19 <alyssa> you're both nice!

21:19 <alyssa> :-p

21:20 <alyssa> <icecream95 and jekstrand, simultaneously> Blasphemy

21:20 <icecream95> With CSF, viewports are uploaded a field at a time, but each instruction writes six bytes, and I don't know what happens to the upper two bytes of the second word..

21:21 <icecream95> (Are they reset? Are they left alone? Do they copy the upper two bytes of the instruction?)

21:21 <alyssa> jekstrand: ^

21:21 <jekstrand> What can I say? It's Friday. It's not like I have anything else to do. It's not like I've got a new game I'm dying to play.

21:21 <alyssa> jekstrand: Re lowering ldexp, uadd_carry, usub_borrow, frexp,

21:21 <alyssa> we should have native ops for most of that?

21:23 <jekstrand> alyssa: I can look, I guess.

21:23 <alyssa> LDEXP.f32, LDEXP.v2f16, IADDC.i32, ISUBB.i32, +FREXPE.f32, +FREXPM.f32, +FREXPE.v2f16, +FREXPM.v2f16

21:23 <alyssa> uadd_carry/usub_borrow need to be lowered to 32-bits I guess but they don't make much sense as 8/16-bit anyway
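
For reference, the semantics of the frexp/ldexp operations being discussed can be modelled with C's frexpf/ldexpf. frexp splits a float into a significand and an integer exponent (NIR exposes these as separate frexp_sig and frexp_exp operations), and ldexp multiplies by a power of two. Mapping these onto FREXPM, FREXPE and LDEXP is an assumption. In particular, the log does not say whether Mali's FREXPM uses the same [0.5, 1) significand convention as C.

```c
#include <assert.h>
#include <math.h>

/* Splits x into significand and exponent, then recombines them.
 * frexpf: x == sig * 2^e with 0.5 <= |sig| < 1 for finite, nonzero x.
 * ldexpf: sig * 2^e, exact as long as the result is representable. */
static float
frexp_ldexp_roundtrip(float x, float *sig_out, int *exp_out)
{
   int e;
   float sig = frexpf(x, &e);
   assert(x == 0.0f || !isfinite(x) ||
          (fabsf(sig) >= 0.5f && fabsf(sig) < 1.0f));
   *sig_out = sig;
   *exp_out = e;
   return ldexpf(sig, e);
}
```

For example, 3.75 splits into 0.9375 and 2, because 0.9375 * 2^2 = 3.75.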

21:25 <icecream95> That reminds me of how making 8-bit ops actually 16-bit on Midgard breaks some of them, such as ADDSAT

21:25 <alyssa> Weee

21:25 <alyssa> TBF have you seen midgard's 8-bit support

21:25 <alyssa> ("Yes.")

21:26 <jekstrand> alyssa: Yup

21:26 <jekstrand> alyssa: I'm not going to care about 16-bit for now

21:26 <alyssa> Yeah but I am..

21:27 <alyssa> jekstrand: more importantly uadd_carry is already lowered :-p

21:28 <alyssa> jekstrand: wait, uadd_carry/usub_borrow are not native ops for us nvm, lower away

21:28 <alyssa> IADDC.i32 is a ternary add, (a + b + (c & 1))

21:28 <jekstrand> alyssa: Yeah... :-/

21:29 <alyssa> still one op though

21:29 <alyssa> ICMP.u32.lt.i1 d, a, b
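
The lowering is cheap because of how NIR defines these operations. uadd_carry is the carry-out of a 32-bit add, and usub_borrow is simply a < b. Both therefore reduce to an unsigned compare, plus an add for the carry case, which is what the ICMP.u32.lt line expresses. The sketch below also models IADDC.i32 as alyssa describes it, a + b + (c & 1), and shows its obvious use: chaining the carry into the high word of a 64-bit add. The function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* uadd_carry: the wrapped sum is below an operand iff the add overflowed. */
static uint32_t
uadd_carry(uint32_t a, uint32_t b)
{
   uint32_t sum = a + b; /* wraps modulo 2^32 */
   return sum < a ? 1u : 0u;
}

/* usub_borrow: a - b borrows iff a < b, i.e. a single unsigned compare. */
static uint32_t
usub_borrow(uint32_t a, uint32_t b)
{
   return a < b ? 1u : 0u;
}

/* IADDC.i32 as described in the log: ternary add a + b + (c & 1). */
static uint32_t
iaddc_i32(uint32_t a, uint32_t b, uint32_t c)
{
   return a + b + (c & 1u);
}

/* 64-bit add from 32-bit halves: the low word's carry-out becomes the
 * third source of the high-word IADDC. */
static void
add64(uint32_t a_lo, uint32_t a_hi, uint32_t b_lo, uint32_t b_hi,
      uint32_t *lo, uint32_t *hi)
{
   assert(lo != NULL && hi != NULL);
   *lo = a_lo + b_lo;
   *hi = iaddc_i32(a_hi, b_hi, uadd_carry(a_lo, b_lo));
}
```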

21:32 <jekstrand> alyssa: Yeah, but seriously no one uses it

21:32 <alyssa> K

21:32 <jekstrand> fr/ldexp, on the other hand, those being native is good

21:32 <alyssa> bifrost has opcodes for everything

21:32 <alyssa> it's kind of a problem

21:32 <jekstrand> lol

21:34 davidlt has quit [Ping timeout: 480 seconds]

21:35 <icecream95> alyssa: It was maybe a mistake to use the same datastructure for liveness and constraints.. I want to remove the binary search on orr for constraints, but that would break liveness..

21:35 <icecream95> Or should I just give up and create an MR with the code I have now?

21:35 <icecream95> I do wonder how I could port the improvements to Midgard.. Eight bits is not enough for encoding constraints there

21:36 <alyssa> dschuermann: ^, questions for you, my unilaterally appointed RA maintainer ;-p

21:37 <icecream95> That reminds me that I have two or three blog posts about that which I need to finish writing

21:40 <dschuermann> alyssa: uhm, guess I should read your paper about that constraints-solver RA thing you wrote?

21:40 <dschuermann> :P

21:41 <icecream95> ..Is there even a point to SSA-based RA with all of my optimisations for LCRA?

21:41 <alyssa> dschuermann, SSA-based RA; icecream95, hand optimized LCRA

21:41 <alyssa> go go go go go go! :-p

21:41 * alyssa grabs popcorn

21:41 <dschuermann> if you want to go from 600loc to 3000? sure :D

21:42 <icecream95> I can write a bunch of NEON intrinsics for LCRA if I want to do that

21:42 * jekstrand should figure out why fairly basic texturing stuff doesn't work. :-/

21:43 <jekstrand> alyssa: Is this your new troll maintainership strategy? :D

21:44 <jekstrand> alyssa: Can Mali's f32->f16 conversion flush denorms on command?

21:44 <icecream95> alyssa: What about if I merge my RA optimisations with my scheduler rewrite, so that codegen improves while compile time stays constant?

21:45 <icecream95> (The scheduler rewrite currently brute-forces every possible scheduling for each clause to find the best one)

21:48 <dschuermann> o_O and you tested that on something other than the triangle demo?

21:49 <icecream95> I have not tried it on large shaders yet

21:49 <icecream95> At least it doesn't try every possible scheduling for the whole program..

21:49 <alyssa> technically polynomial time is the best polynomial time

21:50 <alyssa> your algorithm is, errr... O(n^16) ?

21:50 <icecream95> I'm sure with a little NEON magic it will all work out fine...

21:53 * jekstrand wonders how many MRs he's going to post today

21:53 <alyssa> NEON only improves perf by a constant factor

21:53 <HdkR> That's fine, just do some loop unrolling and throw six vector pipelines at it

21:53 <HdkR> :>

21:54 <jekstrand> Run RA on the GPU!

21:54 <alyssa> Also, the probably of my merging NEON into the compiler is small and conditional on my mood

21:54 <alyssa> probability

21:54 <dschuermann> jekstrand: :D

21:55 <icecream95> alyssa: But it could be quite a large constant factor if I can make things so that e.g. scheduling an instruction is as simple as an addition operation

21:55 <dschuermann> alyssa: having a quick look at your paper and the problem you had to solve, I don't think SSA-based RA is a good fit for vector archs at all

21:57 <dschuermann> icecream95: is that postRA or preRA scheduling?

21:57 <alyssa> dschuermann: Yeah... but Bifrost is not vector (unlike Midgard)

21:57 <alyssa> (LCRA was for Midgard and got copy pasted into Bifrost and has somehow survived )

21:58 <icecream95> dschuermann: Scheduling for Bifrost is done after RA

21:59 <dschuermann> well... then there is probably still some room

21:59 <dschuermann> ACO doesn't have a postRA scheduler at all ;)

22:00 <icecream95> Midgard does scheduling before RA, which is partly why the spilling code has so many bugs

22:02 <icecream95> alyssa: After rewriting scheduling, maybe some improvements could be made to RA.. such as not reusing a register too soon after it has been made 'free'

22:02 <dschuermann> that sounds strange... shouldn't it be more tested because more stressed?

22:06 <icecream95> dschuermann: Midgard has some constraints such as IIRC not being able to read from a register used for memory stores, and the spill can't be inserted until the end of the bundle, so if a spilt register is used multiple times inside the bundle, there are problems

22:08 <dschuermann> that sounds awful to deal with :|

22:08 <dschuermann> have to temporarily assign registers for the bundle until it succeeded and only then commit?

22:08 <alyssa> doing RA post-bundling on midgard was another one of those regrets I learned from for Bifrost....

22:10 <icecream95> But given that Valhall doesn't have bundling, how should any scheduling fit in there?

22:11 <dschuermann> you mean, the first time you have a sane architecture, and you don't know what to do now? :D

22:20 <alyssa> :-D

22:35 * jekstrand kicks off another panvk run and calls it a day

22:38 <alyssa> a week even

22:39 <HdkR> panvk run day. The day between Thursday and Friday.

22:44 <jekstrand> HdkR: I've got a full run down to about 10 hours

22:45 <HdkR> That's a heck of an improvement!

22:45 rasterman has quit [Quit: Gettin' stinky!]

22:48 <jekstrand> Time to head off into the forbidden west!

22:50 * icecream95 looks west

22:50 <icecream95> Into the sea?!

23:02 <HdkR> Swimming in the ocean must be illegal ever since COVID

23:02 <HdkR> forbidden

23:03 <alyssa> HdkR: you sure that's not just california

23:04 <HdkR> mmm, Forbidden avocado toast

23:11 <HdkR> Really panvk just needs to get good enough to run the game's prequel that Jason is talking about :>

23:14 <icecream95> HdkR: uh, https://spectrumcomputing.co.uk/entry/6440 ?

23:15 <HdkR> I like the image being, "You cannot go east"

23:17 <HdkR> I'm making the assumption that the forbidden west is in reference to the Horizon game

23:18 <HdkR> Unless your reference was from The Hobbit, which I don't really know :D

23:19 <icecream95> HdkR: That the west is forbidden is not even mentioned until LOTR

23:21 <HdkR> Neat. I've never read the books or watched the movies

23:21 * icecream95 can see several filming locations for the movies from where he sits

23:24 <HdkR> https://twitter.com/FEX_Emu/status/1487597165312495619 This game engine already runs under emulation. Looking forward to when the sequel launches on PC in three years to give it a whirl :D

23:30 <icecream95> Advantages of having a status bar that updates every second: Your computer makes a nice ticking noise when plugged in