marcan changed the topic of #asahi-gpu to: Asahi Linux: porting Linux to Apple Silicon macs | GPU / 3D graphics stack black-box RE and development (NO binary reversing) | Keep things on topic | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Logs: https://alx.sh/l/asahi-gpu
macc24 has quit [Ping timeout: 260 seconds]
artemist has quit [Ping timeout: 260 seconds]
artemist has joined #asahi-gpu
<bloom>
Taking a peek at fsin/fcos
<bloom>
Observation 1: the asm of fcos is identical to fsin, except for the first instruction
<bloom>
So the first thing we do in either sin or cos is divide by tau to change from radians to "times around the circle"
<bloom>
and for cos we just push it forward by 90 degrees (a quarter turn), since cos(x) = sin(x + 90°).
<bloom>
Observation 2: The next few instructions are the sequence floor/fsub. We immediately recognize this as the fract(...) function. This is again what we'd expect -- fsin/fcos are periodic, with period 2pi, which is now period 1 after the above change of units.
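A minimal Python sketch of the range reduction described so far: divide by tau, add a quarter turn for cos, then take fract via floor/subtract. It works in double precision and uses `math.sin` only to check the result. The name `reduce_to_turns` is hypothetical and is not taken from the actual AGX lowering.

```python
import math

TAU = 2 * math.pi

def reduce_to_turns(theta: float, is_cos: bool = False) -> float:
    """Range-reduce an angle in radians to a fraction of a turn, nominally in [0, 1)."""
    x = theta / TAU           # radians -> turns ("times around the circle")
    if is_cos:
        x += 0.25             # cos(t) = sin(t + 90 degrees), i.e. a quarter turn ahead
    return x - math.floor(x)  # fract(x): the floor/fsub pair

# The reduced value can then feed a sine evaluated in turns:
print(math.sin(TAU * reduce_to_turns(10.0)), math.sin(10.0))
```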
<bloom>
The last four instructions are where things get mysterious.
<bloom>
There are special one-op instructions parametrized as 0b1010 and 0b1110... call them F and G
<bloom>
and reading it off, we see it takes the angle 0 <= x < 1 and calculates... G(F(4 * x)) * F(4 * x)
<bloom>
and somehow that equals sin?
<bloom>
The multiplication by four is just another change of units. Now the fraction is angle within a quadrant, and the integer part is the quadrant. This is natural enough given the quadrant (anti)symmetries we have with sin.
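To make the quadrant split concrete, here is a generic Python sketch of how sin over a full turn reduces to a single quarter-wave "core". The integer part of 4x selects the quadrant, and the fractional part gives the position inside it. This does not model what F and G actually compute, since that is still unknown. `quarter_sine` is a hypothetical stand-in built on `math.sin`.

```python
import math

def quarter_sine(f: float) -> float:
    """Stand-in for a quarter-wave core: sin over one quadrant, f in [0, 1]."""
    return math.sin(f * math.pi / 2)

def sin_from_turns(x: float) -> float:
    """sin(2*pi*x) for x in [0, 1), using only the quarter-wave core."""
    y = 4.0 * x
    q = int(math.floor(y))      # integer part: which quadrant (0..3)
    f = y - q                   # fractional part: position within the quadrant
    if q % 2 == 1:
        f = 1.0 - f             # quadrants 1 and 3 run the quarter wave backwards
    s = quarter_sine(f)
    return -s if q >= 2 else s  # quadrants 2 and 3 are the negated half of the wave
```

Only `quarter_sine` ever has to be evaluated on [0, 1]. Everything else is reflection and sign flips, which is why a small core (or table) is enough.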
<bloom>
All we're doing is a series of reductions to make G and F as simple as possible. Or rather, as small as possible -- they're probably lookup tables.
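Since F and G are only *probably* lookup tables, the following Python sketch illustrates one generic table-based approach. It approximates a quarter-wave sine by linear interpolation between precomputed samples. The table size (64), the use of linear interpolation, and the name `quarter_sine_lut` are all assumptions for illustration. None of this reproduces the G(F(4 * x)) * F(4 * x) structure observed in the hardware.

```python
import math

TABLE_SIZE = 64  # hypothetical; the real table size (if any) is unknown
# Samples of sin over one quadrant, both endpoints included.
QUARTER_TABLE = [math.sin(i / TABLE_SIZE * math.pi / 2) for i in range(TABLE_SIZE + 1)]

def quarter_sine_lut(f: float) -> float:
    """Approximate sin(f * pi/2) for f in [0, 1] by linear interpolation in a table."""
    pos = f * TABLE_SIZE
    i = min(int(pos), TABLE_SIZE - 1)  # clamp so i + 1 stays in range when f == 1
    t = pos - i                        # interpolation weight within the cell
    return QUARTER_TABLE[i] + t * (QUARTER_TABLE[i + 1] - QUARTER_TABLE[i])
```

With 64 cells, the interpolation error is bounded by h²/8 · (π/2)² with h = 1/64, which works out to roughly 7.5e-5. A real implementation would tune table size against precision and storage.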
<bloom>
At this point we have two options:
<bloom>
1. Just call F and G sin_pt_1 and sin_pt_2 and call it a day. Who cares?
<bloom>
2. Using a compute shader test (I believe dougallj has a script for this), run each op in isolation and dump the results. Then plot them and see what happens.
<bloom>
(There's no magic for tan. It's just computing fsin and fcos and dividing them. Just looks funny due to aggressive scheduling.)
<dougall>
yeah, maybe 1 is the way to go for now? (i do have a script that i have technically done something like this with, but it'll be a lot easier once we've made more progress on everything else)
<bloom>
dougall: sure
<bloom>
I was secretly hoping there would be Taylor series involved
<bloom>
The lowering for Bifrost is super cute
<dougall>
but yeah, it's a nice self-contained challenge for anyone interested in numerics if people want to try :)
odmir has quit [Remote host closed the connection]
odmir has joined #asahi-gpu
odmir has quit [Ping timeout: 246 seconds]
<bloom>
patches incoming
<bloom>
Patches sent
odmir has joined #asahi-gpu
<dougall>
looks great!
<bloom>
thanks!
<bloom>
no clue on the extended fields, although IIRC you have a clever way to determine those that I don't :p
<bloom>
also, I still can't get over the fact the shaders are keyed to formats
<bloom>
I am not ok with this
<dougall>
what does 'keyed to formats' mean?
<bloom>
The code of the vertex shader depends on the format of the vertex attributes,
<bloom>
likewise the code of the fragment shader depends on the format of the framebuffer
<bloom>
That means you *can't* compile shaders up front,
<bloom>
rather you have to wait until the app is actually drawing things and potentially have to recompile many times with different shader "keys" - combinations of "leaky" state
<bloom>
("shader variants")
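A rough Python sketch of what "shader variants" means for a driver. Compiled code is cached under a key that includes the leaky state (vertex attribute formats, framebuffer format). A new combination of state at draw time forces another compile. `ShaderKey`, `VariantCache`, and the format strings are hypothetical names, not Mesa's actual data structures.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ShaderKey:
    """Hypothetical variant key: the 'leaky' state baked into the compiled code."""
    shader_id: int
    vertex_formats: tuple  # one format name per vertex attribute
    framebuffer_format: str

class VariantCache:
    """Compile lazily at draw time; reuse a variant when the same state recurs."""
    def __init__(self, compile_fn):
        self._compile = compile_fn
        self._variants = {}
        self.compiles = 0

    def get(self, key: ShaderKey):
        if key not in self._variants:
            # Only reachable once the app draws, because the key needs draw-time state.
            self._variants[key] = self._compile(key)
            self.compiles += 1
        return self._variants[key]

# Usage with a dummy "compiler": changing only the framebuffer format costs a recompile.
cache = VariantCache(lambda key: ("binary", key))
cache.get(ShaderKey(1, ("rgba8unorm",), "bgra8unorm"))
cache.get(ShaderKey(1, ("rgba8unorm",), "rgba16float"))
```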
<bloom>
a number of architectures are specifically designed to eliminate all shader variants in e.g. core OpenGL ES
<bloom>
Apple has gone the other end... why? well, because they can - Metal requires all this info at pipeline create time anyway, so why not use it? /s
<dougall>
ah, that makes sense, thanks
odmir has quit [Ping timeout: 240 seconds]
<chrisf>
only trouble for GL. we know this upfront for vulkan too
<bloom>
chrisf: still means a lot more recompiles if the app swaps things in the pipeline
<chrisf>
may be feasible to patch it?
<chrisf>
is just the shader epilog that's affected?
<bloom>
tbd
<bloom>
I've just compiled my first shader (from GLSL down to AGX machine code) with Mesa :)
<dougall>
:o
<bloom>
it's gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0), don't get too excited :p
<bloom>
doing all that dev on linux, let's see if it works on macos too :-p
<bloom>
It does not work. Down the debug hole.
<bloom>
Glaring problems include missing a stop instruction :-p
phiologe has quit [Ping timeout: 250 seconds]
phiologe has joined #asahi-gpu
<bloom>
Also, while obviously I can't enforce this, I kindly ask y'all not to make a splash about that before I can get around to blogging ;-P
<bloom>
ok, stop/trap implemented
<bloom>
ok, now it doesn't fault but there are crazy colours; guessing I screwed up the register alloc
<bloom>
ofc I don't know the regalloc field for FS...
<bloom>
Yep, there it is.
<bloom>
First shader compiled from GLSL with mesa running successfully on the hardware
<bloom>
not bad seeing as I started writing the compiler yesterday :p
mxw39 has joined #asahi-gpu
<dougall>
haha awesome!
<bloom>
ld_vary is up next at which point I can start writing real shaders and implementing the heaps of ALU needed for anything interesting
<bloom>
thanks :)
<bloom>
have ld_var emitted but some cmdbuf work remains to use it in the demo..