ChanServ changed the topic of #asahi-gpu to: Asahi Linux GPU development (no user support, NO binary reversing) | Keep things on topic | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Logs: https://alx.sh/l/asahi-gpu
rlittl01 has quit [Read error: Network is unreachable]
anuragrao has quit [Ping timeout: 480 seconds]
anuragrao has joined #asahi-gpu
anuragrao has quit [Ping timeout: 480 seconds]
anuragrao has joined #asahi-gpu
anuragrao has quit [Ping timeout: 480 seconds]
KxCORP58940 has quit [Quit: Bye!]
KxCORP58940 has joined #asahi-gpu
pb17 has quit [Ping timeout: 480 seconds]
anuragrao has joined #asahi-gpu
pb17 has joined #asahi-gpu
anuragrao has quit [Ping timeout: 480 seconds]
ddxtanx has quit [Quit: Konversation terminated!]
ddxtanx has joined #asahi-gpu
jeisom has quit [Quit: Leaving]
ddxtanx_ has joined #asahi-gpu
ddxtanx_ has quit [Remote host closed the connection]
ddxtanx_ has joined #asahi-gpu
ddxtanx has quit [Ping timeout: 480 seconds]
karolherbst_ has joined #asahi-gpu
jn has quit [Remote host closed the connection]
jn has joined #asahi-gpu
karolherbst has quit [Ping timeout: 480 seconds]
pb17 has quit [Ping timeout: 480 seconds]
ddxtanx_ has quit [Remote host closed the connection]
ddxtanx has joined #asahi-gpu
pb17 has joined #asahi-gpu
anuragrao has joined #asahi-gpu
Stary has quit [Quit: ZNC - http://znc.in]
anuragrao has quit [Remote host closed the connection]
anuragrao has joined #asahi-gpu
Stary has joined #asahi-gpu
cylm has quit [Ping timeout: 480 seconds]
anuragrao has quit [Ping timeout: 480 seconds]
chadmed has quit [Quit: Konversation terminated!]
chadmed has joined #asahi-gpu
ddxtanx has quit [Quit: Konversation terminated!]
ddxtanx has joined #asahi-gpu
ddxtanx has quit []
ddxtanx has joined #asahi-gpu
anuragrao has joined #asahi-gpu
anuragrao has quit [Ping timeout: 480 seconds]
pjakobsson has joined #asahi-gpu
john-cabaj has joined #asahi-gpu
karolherbst_ is now known as karolherbst
john-cabaj has quit [Ping timeout: 480 seconds]
cr1901 has quit [Read error: Connection reset by peer]
cr1901 has joined #asahi-gpu
pb17 has quit [Ping timeout: 480 seconds]
pb17 has joined #asahi-gpu
anuragrao has joined #asahi-gpu
yuyichao_ has quit [Ping timeout: 480 seconds]
anuragrao has quit [Ping timeout: 480 seconds]
anuragrao has joined #asahi-gpu
abbe has quit [Remote host closed the connection]
abbe has joined #asahi-gpu
anuragrao has quit [Ping timeout: 480 seconds]
pb17 has quit [Ping timeout: 480 seconds]
chadmed has quit [Quit: Konversation terminated!]
chadmed has joined #asahi-gpu
pb17 has joined #asahi-gpu
anuragrao has joined #asahi-gpu
yuyichao_ has joined #asahi-gpu
anuragrao has quit [Ping timeout: 480 seconds]
anuragrao has joined #asahi-gpu
yuyichao has joined #asahi-gpu
yuyichao_ has quit [Ping timeout: 480 seconds]
yuyichao_ has joined #asahi-gpu
anuragrao has quit [Remote host closed the connection]
anuragrao has joined #asahi-gpu
yuyichao has quit [Ping timeout: 480 seconds]
anuragrao has quit [Ping timeout: 480 seconds]
<karolherbst> "Testing "/" cl_int Verification failed at element 1749980: 0x2ba97a73 / 0xfa8d9df5 = 0xfffffff8, got 0xffffffef" noooo
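For reference, the expected value in the failure line above can be checked on the CPU. This is a minimal sketch, assuming the operands are two's-complement 32-bit integers and that cl_int division truncates toward zero; the helper names are made up for illustration, and division by zero and INT_MIN / -1 are not handled.

```python
def to_signed32(bits):
    """Reinterpret a 32-bit pattern as a two's-complement signed integer."""
    return bits - (1 << 32) if bits & 0x80000000 else bits


def cl_int_div(a_bits, b_bits):
    """Divide two 32-bit patterns as signed ints, truncating toward zero."""
    a, b = to_signed32(a_bits), to_signed32(b_bits)
    quotient = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        quotient = -quotient
    # Return the result as a 32-bit pattern again.
    return quotient & 0xFFFFFFFF


print(hex(cl_int_div(0x2BA97A73, 0xFA8D9DF5)))  # 0xfffffff8, i.e. -8
print(to_signed32(0xFFFFFFEF))                  # -17, the value the GPU returned
```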
anuragrao has joined #asahi-gpu
pb17 has quit [Ping timeout: 480 seconds]
pb17 has joined #asahi-gpu
anuragrao has quit [Ping timeout: 480 seconds]
anuragrao has joined #asahi-gpu
anuragrao has quit [Ping timeout: 480 seconds]
ddxtanx_ has joined #asahi-gpu
ddxtanx has quit [Ping timeout: 480 seconds]
<karolherbst> alyssa: okay, your block/blit stuff works for fp32, but not for fp16
<karolherbst> image format that means
<karolherbst> but mhh.. only fails for 1D and 2D
<karolherbst> ahh.. and 1Darray
anuragrao has joined #asahi-gpu
flokli has quit [Ping timeout: 480 seconds]
flokli has joined #asahi-gpu
chadmed has quit [Read error: Connection reset by peer]
chadmed has joined #asahi-gpu
anuragrao has quit [Ping timeout: 480 seconds]
pb17 has quit [Ping timeout: 480 seconds]
anuragrao has joined #asahi-gpu
anuragrao has quit [Ping timeout: 480 seconds]
pb17 has joined #asahi-gpu
<karolherbst> okay.. I think the int bug above is a rusticl one
<karolherbst> something with block/grid sizes
jeisom has joined #asahi-gpu
<karolherbst> oh no.. it's a real bug..
<karolherbst> probably something wrong in nir_lower_idiv
<alyssa> uhoh
<karolherbst> I've asked austriancoder if it also happens on etnaviv...
<karolherbst> that's gonna be pain to debug
pb17 has quit [Ping timeout: 480 seconds]
<karolherbst> at least reverting 65e431e61a3bace7e50c11d699880ae860f76133 doesn't help.. but maybe the other changes might? But I'm not in the mood of doing reverts across the big nir rename...
<alyssa> what test?
<alyssa> and how do I reproduce it?
<karolherbst> in the CL CTS it's integer_ops/test_integer_ops uint_math int_math
<alyssa> uhhh ok
<karolherbst> the piglit one helps, because you won't have to run it for a minute until hitting it :D
<alyssa> .. where is that in piglit?
<karolherbst> needs -DPIGLIT_BUILD_CL_TESTS=ON
<karolherbst> and then just bin/cl-program-tester
<alyssa> ah
<alyssa> Could NOT find PythonNumpy (missing: PythonNumpy_STATUS) (Required is at
<alyssa> excuse me
<karolherbst> how rude
pb17 has joined #asahi-gpu
* alyssa rebuilds piglit
anuragrao has joined #asahi-gpu
* alyssa builds rusticl
<karolherbst> huh.. this blit bug is weird...
john-cabaj has joined #asahi-gpu
* alyssa poking at idiv
<alyssa> karolherbst: ok, this is interesting
<alyssa> the problem is the value of `rcp` at the start
<alyssa> int((1.0 / float(b)) * (4294966784.0))
<alyssa> agx has an off-by-1 compared to my cpu
<alyssa> and that off-by-1 propagates to the totally wrong final result
<karolherbst> funky
<alyssa> IDK if this is an AGX backend bug or a nir_lower_idiv bug, strictly
<karolherbst> maybe something with rounding?
<alyssa> plausible
<alyssa> I mean I'm just plumbing through the hardware frcp here
<karolherbst> or just the hw not being precise enough
<alyssa> although Metal's docs claim that 1.0 / x is correctly rounded
<karolherbst> and we can safely assume they emit it directly without any funky lowering?
<alyssa> yes
<alyssa> er wait
<alyssa> hm no that should be ok
<alyssa> yeah..
<alyssa> frcp(0x4C4565C8)
<alyssa> should be 32A60000
<alyssa> but AGX gives 32A60001
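To see why a single ulp matters here, the two reciprocals above can be fed through the expression quoted earlier, int(rcp * 4294966784.0). This is a sketch under assumptions: it assumes the multiply is a plain float32 multiply with round-to-nearest-even and that the int conversion truncates, and that the frcp example is the value feeding that expression. The helper names are illustrative and are not taken from nir_lower_idiv.

```python
import struct


def bits_to_f32(bits):
    """Reinterpret a 32-bit pattern as a float32 value (held in a Python float)."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def round_to_f32(value):
    """Round a Python float (double) to the nearest float32."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


SCALE = 4294966784.0  # 2**32 - 512, exactly representable in float32


def scaled_rcp(rcp):
    # The product of two float32 values is exact in double precision,
    # so rounding it once to float32 matches a float32 multiply.
    return int(round_to_f32(rcp * SCALE))


print(scaled_rcp(bits_to_f32(0x32A60000)))  # 82 (expected reciprocal)
print(scaled_rcp(bits_to_f32(0x32A60001)))  # 83 (reciprocal reported for AGX)
```

With these inputs the one-ulp difference in the reciprocal moves the integer estimate from 82 to 83.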
<alyssa> oh, jeez, ugh!
<alyssa> karolherbst: frcp with -no-fast-math has lowering indeed. ughhh.
<alyssa> cute.
<karolherbst> :3
<karolherbst> just some scaling like nir_scale_fdiv or something more complex?
<alyssa> more complex, gimme a few mins
mrosorensen1 has joined #asahi-gpu
mrosorensen has quit [Ping timeout: 480 seconds]
<alyssa> this is... subtle
<alyssa> there are two parts to the lowering
<alyssa> one part is a scaling like scale_fdiv, to reduce the domain for large denominators
<alyssa> the other part though I don't quite understand yet
<alyssa> under a certain condition it'll adjust the output as
<alyssa> rcp := fma(fma(-input, rcp, 1.0), rcp, rcp)
<karolherbst> ohh wait
<karolherbst> I've seen that one
<karolherbst> that's uhhh
<karolherbst> how is that called
<alyssa> ----wait is that a newton raphson step
<karolherbst> yep
<karolherbst> that one
<alyssa> lolz
<alyssa> ok
<alyssa> got it in 2
<alyssa> :P
<alyssa> my aborted math degree came in handy!
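The refinement quoted above, rcp := fma(fma(-input, rcp, 1.0), rcp, rcp), can be tried on the frcp example from earlier in the log. This is a minimal sketch and not the Metal lowering itself: it emulates a float32 fma with exact rational arithmetic and a single round-to-nearest-even, ignores overflow, infinities and NaNs, and all function names are made up for illustration.

```python
import struct
from fractions import Fraction


def bits_to_f32(bits):
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def f32_to_bits(value):
    return struct.unpack("<I", struct.pack("<f", value))[0]


def round_f32(x):
    """Round an exact Fraction to the nearest float32 (ties to even)."""
    if x == 0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    x = abs(x)
    # Find e with 2**e <= x < 2**(e + 1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    e = max(e, -126)                 # subnormals share the smallest exponent
    ulp = Fraction(2) ** (e - 23)    # spacing of float32 values in this binade
    return sign * float(round(x / ulp) * ulp)


def fma_f32(a, b, c):
    """Fused multiply-add: a * b + c computed exactly, rounded once."""
    return round_f32(Fraction(a) * Fraction(b) + Fraction(c))


def newton_step(x, rcp):
    """One Newton-Raphson refinement of rcp as an approximation of 1 / x."""
    err = fma_f32(-x, rcp, 1.0)      # 1 - x * rcp
    return fma_f32(err, rcp, rcp)    # rcp + rcp * err


x = bits_to_f32(0x4C4565C8)
print(hex(f32_to_bits(round_f32(1 / Fraction(x)))))              # 0x32a60000
print(hex(f32_to_bits(newton_step(x, bits_to_f32(0x32A60001)))))  # 0x32a60000
```

For this input, one step applied to the off-by-one estimate lands on the correctly rounded reciprocal.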
<karolherbst> there is a good thing about this.. I think libclc already has the code for the lowering... but...
<karolherbst> somewhere
<karolherbst> maybe not
<alyssa> needs to be in-tree for idiv lowering anyhow
<karolherbst> yeah...
<alyssa> I'm also wondering if there's a way to make lower idiv tolerate the error, for faster overall
<alyssa> maybe not
mrosorensen14 has joined #asahi-gpu
mrosorensen1 has quit [Ping timeout: 480 seconds]
<karolherbst> (which gets real funny real quick when you also need fma lowering)
cylm has joined #asahi-gpu
<alyssa> karolherbst: ook. so what I realized is real funny is that CL CTS is going to hit libclc for frcp
<alyssa> but you need exact frcp! in the backend *anyway* for correct idiv
<alyssa> so that seems.. bad
<alyssa> although i dont actually see where we call libclc frcp
<alyssa> I guess libclc doesn't do divide?
<karolherbst> libclc doesn't have a frcp opcode
<karolherbst> ehh clc
<karolherbst> the spirv env just defines precision for "1.0 / x" explicitly
<alyssa> ah. but does the CL CTS test it.
<alyssa> reciprocal is suspiciously absent from math_bruteforce
<karolherbst> yeah.. I don't think it's tested besides through divide
<alyssa> :=/
<alyssa> I would like math_bruteforce to test the full 2^32 input space because otherwise I have no confidence in writing the lowering correctly
<karolherbst> reference_recip
<karolherbst> wait..
<karolherbst> is that even used
<karolherbst> ohhh
<karolherbst> reciprocal
<karolherbst> //ENTRY(reciprocal, 1.0f, 1.0f, FTZ_OFF, unaryF),
<karolherbst> 🙃
<alyssa> real
<karolherbst> only disabled test besides tgamma
<karolherbst> maybe enable it and see if it works?
<alyssa> was slightly more complicated than that but yeah, added the test and indeed we're failing for large inputs
<alyssa> i can work with this :+1:
<alyssa> though not right now I have a big todo list this week
<karolherbst> yeah, fair
<alyssa> i'll get to it soon
<karolherbst> I'll just be busy figuring out what's wrong with the blit, because it's not the actual copy being all wrong, so it's a bit confusing...
<alyssa> yeehaw
<karolherbst> alyssa: mhhh.. okay, maybe it's the blit.. but what seems to happen is that the image gets initialized with 0xffff over its entire size, then an API copy happens at srcOffset = 79, destOffset = 121, and then the destination becomes 0x7e00 starting at offset 96.... but 121+ is copied from the source as expected
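Decoding the two 16-bit patterns from the message above as IEEE half floats shows that both are NaNs: 0xffff has the sign bit and every mantissa bit set, while 0x7e00 is a quiet NaN with only the top mantissa bit set. That might point at the data passing through a float conversion somewhere, but that is only a guess; the log does not confirm it. The helper names below are illustrative.

```python
import math
import struct


def half_bits_to_float(bits):
    """Reinterpret a 16-bit pattern as an IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def half_fields(bits):
    """Split a half-float pattern into (sign, exponent, mantissa) fields."""
    return bits >> 15, (bits >> 10) & 0x1F, bits & 0x3FF


for pattern in (0xFFFF, 0x7E00):
    value = half_bits_to_float(pattern)
    print(hex(pattern), half_fields(pattern), math.isnan(value))
```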
<alyssa> ok
<alyssa> which test case is that?
<alyssa> I think I know what's happening but I'd like to confirm
<karolherbst> images/clCopyImage/test_cl_copy_images 1D CL_HALF_FLOAT
<karolherbst> you need to skip the first blit
<karolherbst> the second hits the bug
<karolherbst> (the first is a full copy)
<alyssa> or I can just give a patch in a bit
<alyssa> i was busy reading french wikipedia's page on newton's method :P
<karolherbst> :D
<karolherbst> fair
<alyssa> yeah i need a break too many things happening right now
<karolherbst> have a good rest
anuragrao has quit [Ping timeout: 480 seconds]
anuragrao has joined #asahi-gpu
anuragrao has quit [Ping timeout: 480 seconds]
pb17 has quit [Ping timeout: 480 seconds]
anuragrao has joined #asahi-gpu
pb17 has joined #asahi-gpu
chadmed has quit [Quit: Konversation terminated!]
chadmed has joined #asahi-gpu
DarkShadow4444 has joined #asahi-gpu
DarkShadow44 has quit [Ping timeout: 480 seconds]
anuragrao has quit [Ping timeout: 480 seconds]
anuragrao has joined #asahi-gpu
anuragrao has quit [Ping timeout: 480 seconds]
cyrinux has quit []
cyrinux has joined #asahi-gpu
alih has quit []
flokli has quit [Ping timeout: 480 seconds]
john-cabaj has quit [Remote host closed the connection]
john-cabaj has joined #asahi-gpu
flokli has joined #asahi-gpu
alyssa has quit [Quit: alyssa]
john-cabaj has quit [Ping timeout: 480 seconds]
anuragrao has joined #asahi-gpu
pb17 has quit [Ping timeout: 480 seconds]
anuragrao has quit [Ping timeout: 480 seconds]
ddxtanx_ has quit [Quit: Konversation terminated!]
ddxtanx_ has joined #asahi-gpu
pb17 has joined #asahi-gpu
pb17 has quit [Ping timeout: 480 seconds]
pb17 has joined #asahi-gpu