marcan changed the topic of #asahi-gpu to: Asahi Linux: porting Linux to Apple Silicon macs | GPU / 3D graphics stack black-box RE and development (NO binary reversing) | Keep things on topic | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Logs: https://alx.sh/l/asahi-gpu
gruetzko- is now known as gruetzkopf
<bloom>
actually the dummy while_fcmp isn't needed but the point still stands
<bloom>
for at least 2 iterations you come out ahead with the Apple idiom
PhilippvK has joined #asahi-gpu
* bloom
studies dougall's notes
phiologe has quit [Ping timeout: 480 seconds]
<dougall>
ooh, conditional execution time - exciting!
<bloom>
heh
_whitelogger has joined #asahi-gpu
aquijoule_ has joined #asahi-gpu
richbridger has quit [Ping timeout: 480 seconds]
manawyrm has joined #asahi-gpu
_whitelogger has joined #asahi-gpu
<yrlf>
bloom: yeah, checking the condition on the tail end of the loop saves you one branch in most instruction sets
<yrlf>
then you can just do a conditional branch to the top instead of doing an unconditional one to the top and having the conditional branch there
<yrlf>
are there any situations where a `while (x) { ... }` loop would be beneficial in asm? I currently don't see a reason to generate one, ever, and I have the feeling I'm overlooking something here
<yrlf>
(then again: I have no GPU experience, that is all transferred knowledge from regular CPU asm stuff)
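A minimal sketch of the branch-count argument yrlf makes above, written as hand-lowered C with `goto` standing in for branch instructions. The function names and the `branches` counter are hypothetical, added only for illustration; the counter tallies every executed branch, taken or not, and the loop body is reduced to `i++`.

```c
#include <assert.h>

/* Top-tested lowering of `while (i < n) { body }`:
 *   top:  conditional branch out of the loop
 *         body
 *         unconditional branch back to top
 * Hypothetical helper: returns the number of branches executed. */
int branches_top_tested(int n) {
    assert(n >= 0);
    int branches = 0, i = 0;
top:
    branches++;                 /* conditional exit branch at the top */
    if (!(i < n)) goto done;
    i++;                        /* loop body */
    branches++;                 /* unconditional back-branch */
    goto top;
done:
    return branches;
}

/* Bottom-tested lowering: one guard before the loop, then a single
 * conditional back-branch per iteration.
 * Hypothetical helper: returns the number of branches executed. */
int branches_bottom_tested(int n) {
    assert(n >= 0);
    int branches = 0, i = 0;
    branches++;                 /* guard: skip the loop on zero iterations */
    if (!(i < n)) goto done;
body:
    i++;                        /* loop body */
    branches++;                 /* conditional back-branch at the tail */
    if (i < n) goto body;
done:
    return branches;
}
```

Under these assumptions the top-tested form executes 2n + 1 branches for n iterations and the bottom-tested form executes n + 1 (for n >= 1), which is the one-branch-per-iteration saving described above. Real hardware cost depends on branch prediction and divergence handling, which this count ignores.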
minimaul is now known as mini
<jix>
yrlf: I think if the loop condition is complex, the loop usually has a low iteration count, and the whole loop is executed frequently, then not wasting icache on the duplicated condition might make things faster
<jix>
now whether you can reliably know and use that within a compiler? no idea
<bloom>
yrlf: nod
<bloom>
AGX is really, really weird in that its GPU asm is based on structured CF
<bloom>
with do...while as the loop primitives
<yrlf>
oh, that is... very interesting
<yrlf>
on non-structured architectures you can avoid the duplicated condition by just jumping forward into the condition part of the loop (at the cost of an additional branch)
<yrlf>
on something with structured control-flow just jumping into the middle of something won't work I presume
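The two lowerings yrlf contrasts can be sketched in C as follows. This assumes, as yrlf presumes, that a structured control-flow target cannot jump into the middle of a loop; it says nothing about how AGX actually encodes either form. The function names are hypothetical and the loop simply sums 0..n-1.

```c
#include <assert.h>

/* Unstructured lowering: a single copy of the condition, reached on
 * entry by one extra forward jump. */
int sum_jump_to_cond(int n) {
    int i = 0, sum = 0;
    goto cond;                  /* the additional branch, executed once */
body:
    sum += i;
    i++;
cond:
    if (i < n) goto body;       /* the only copy of the condition */
    return sum;
}

/* Structured lowering: no jump into the loop, so the condition is
 * written twice, once as a guard and once at the tail of do...while. */
int sum_guarded_do_while(int n) {
    int i = 0, sum = 0;
    if (i < n) {                /* copy 1: guard */
        do {
            sum += i;
            i++;
        } while (i < n);        /* copy 2: loop test */
    }
    return sum;
}

/* Hypothetical self-check: both lowerings compute the same result. */
void check_equivalent(int n) {
    assert(sum_jump_to_cond(n) == sum_guarded_do_while(n));
}
```

The duplicated guard in the second function is the code-size cost jix mentions: it only matters when the condition itself is expensive to encode.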