<marcan>
hm, since someone mentioned ANE again... I'm wondering *how* that would even be implemented?
<marcan>
my hunch is it should be a DRI driver, and I see there's been drama over this already and people aren't doing that because DRI requires open source userspace
<marcan>
(cc arnd)
<arnd>
what is ANE?
<marcan>
the neural engine thing
<arnd>
ah, right. What do we know about the hardware side, and about how applications access it in MacOS?
<marcan>
I haven't looked at it at all, but AIUI it's an ASC with command submission and apparently a DART, so probably less fancy address translation than the GPU
<marcan>
not sure how it does context switching
<marcan>
but it smells like something GPU-shaped to me
<marcan>
oddly enough though, it isn't described as an RTKit ASC in the ADT, so maybe it isn't?
<arnd>
the main question to me is whether it implements a particular set of high-level matrix operations (GEMM, convolution, ...) that can be abstracted using a kernel interface, or if this is a fully programmable unit that relies on JIT-compiling your ML model into a custom ISA and running it autonomously
<marcan>
not sure if that's the AP interface but it sounds like it could be
<arnd>
right, this does sound like an ioctl-type interface to send compiled code to the engine and run that, which is indeed similar to what GPUs would do, and also a bit like what we did a long time ago with spufs
<arnd>
in this case, I don't think the question of open source user space is the main issue. As we won't be running MacOS user space on Linux for this, someone has to complete the reverse-engineering anyway in order to create a new compiler
<alyssa>
arnd: right, if we have any linux support, it'd be an open user space
<alyssa>
marcan: my 2c is to stick it in drivers/gpu/drm and use the standard DRM/GEM interface just like AGX
<alyssa>
but I'm also biased as hell
<alyssa>
"gpu" stands for "generic programming unit" ;-)
<alyssa>
marcan: but do AGX first :>
<alyssa>
and AVD and AVE
<alyssa>
those coprocs actually matter
<alyssa>
I am genuinely unsure why ANE would matter on Linux outside of some niche spaces
<arnd>
The GPU programming model always feels like a layering violation to me, the same way that using DPDK to send network data does, or using a custom interface to a DSP doing video encoding instead of using v4l2. OTOH I have no idea what a good kernel abstraction for machine learning hardware would actually look like, so it's probably the best we can do here
<TheLink>
emulate the cuda api with ane :B
<alyssa>
arnd: nod.
<sven>
just create a new machine learning subsystem!
<alyssa>
arnd: I would also point out the GPU ioctl interfaces are extremely NIH
<alyssa>
but every mature driver by definition is committed to an existing uAPI so there is a strong incentive not to change it.
<alyssa>
krh did a neat PoC of what a "common" interface could look like but. there's no turning the clock back on anything mainline
<jn>
the other AI accelerator driver that i'm aware of lives in drivers/misc/habanalabs, probably with a very custom interface
<arnd>
TheLink: cuda is a user space interface, not kernel level, and their kernel interface is not an abstraction but hardware specific
<jn>
(userspace interface)
<arnd>
one could do something that looks like cudnn or cublas, and build on top of that
<arnd>
which in turn is what apple's Accelerate framework or Intel's OneAPI do as well, but I have not seen anyone do such an abstraction on the kernel to user boundary
<arnd>
alyssa: on a related note, do you think it would be possible to have an OpenCL based BLAS implementation on top of your gallium driver, and use that for machine learning on the GPU instead of the ANE?
<arnd>
I'm thinking of applications that use cublas on nvidia GPUs today, not their higher level interfaces or the tensor cores
<alyssa>
sure, clover provides OpenCL on top of Gallium drivers
<alyssa>
it's not ready for production yet, but it's closer to it than the AGX stack ;-)
<alyssa>
I had a dream about Apple sending me an M1 MBA and M1X MBP respectively. It wasn't a very interesting dream.
<TheLink>
that's the way it should be ... apple sending you devices and the process being totally uninteresting :)
<chadmed>
just reading geohot's description of its data handling, the ANE seems very GPU-like