marcan changed the topic of #asahi to: Asahi Linux: porting Linux to Apple Silicon macs | General project discussion | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Topics: #asahi-dev #asahi-re #asahi-gpu #asahi-stream #asahi-offtopic | Keep things on topic | Logs: https://alx.sh/l/asahi
FireFox317 has joined #asahi
yuyichao has joined #asahi
riker77_ has joined #asahi
riker77 has quit [Ping timeout: 480 seconds]
riker77_ is now known as riker77
rohin2 has joined #asahi
bgb_ has quit [Ping timeout: 480 seconds]
bgb_ has joined #asahi
bgb_ has quit [Ping timeout: 480 seconds]
PhilippvK has joined #asahi
bgb_ has joined #asahi
phiologe has quit [Ping timeout: 480 seconds]
<chadmed>
citizen1[m]: its not really quite as simple as that. the graphics portion of the M1 Max is like ~30-40 billion transistors, which is significantly more than the GA102 die used in the RTX3090 even. Apples advantage with these chips isnt really perf/mm^2 or even raw perf in general, but specifically in perf/W
<chadmed>
consider that this enormous graphics SIP pulls down something like 50W at full tilt, vs a GA102 card (RTX 3090, 3080Ti) which can suck down upwards of 300W. thats the impressive part, not so much the performance by itself. the M1 Max chip is enormous, almost from the nvidia and intel school of thought where you just throw transistors at a problem until it goes away
<chadmed>
additionally, RISC is not magic. AArch64 starting with ARMv8 is just a very very well designed series of RISC ISAs, and at this point theres easily been trillions of dollars of combined R&D effort over the last ~25 years into making ARM the most efficient architecture in the industry.
bgb_ has quit [Ping timeout: 480 seconds]
<chadmed>
as it currently stands, the major advantage these well designed ARMv8 chips have over, say, an AMD64 chip, is that the fixed instruction length makes it very easy to fetch and decode multiple instructions at once. even with modern AMD64 implementations (which are RISC-like at the hardware level) the software-facing ISA makes doing this extremely difficult
maennich_ has joined #asahi
maennich_ is now known as maennich1
steev_ has joined #asahi
steev has quit [Ping timeout: 480 seconds]
steev_ is now known as steev
maennich has quit [Ping timeout: 480 seconds]
maennich1 is now known as maennich
eichin_ has joined #asahi
eichin has quit [Ping timeout: 480 seconds]
bgb_ has joined #asahi
<phire>
I personally don't think ARMv8 counts as RISC. It's just a well-designed modern ISA
<phire>
that happens to have both a load-store architecture and fixed-width instructions
<chadmed>
phire: well yeah i think the meaning of "RISC" in 2021 has kinda shifted to just "load/store distant descendant of berkeley risc" rather than being any indication of how many instructions the set has
<chadmed>
the latest version of the power isa manual is like 3000 pages lmao, they still call it a risc afaik
<chadmed>
i guess if we consider the traditional meaning of the term, the last "real" RISC before RISC-V was probably SPARCv9, for which i have the manual
<phire>
I disagree. I just think people have kept using the terms RISC/CISC out of habit and applying them to processors that might have started off as RISC/CISC but aren't anymore
<phire>
ignoring the ISAs, modern x86 cores and modern aarch64 cores look very similar to each other
<chadmed>
well in that case given the convergence of all designs on load/store and the increasing complexity of all instruction sets, do the terms really mean anything in everyday parlance anymore?
<phire>
they are all very wide, out-of-order designs
<phire>
no, I think the terms RISC/CISC should mostly stay in the 90s where it belongs.
<chadmed>
like im not even sure if ibm z/architecture is a "real" CISC anymore or if they went the uop route like amd and intel
<i509vcb[m]>
No one can agree what number at which an architecture moves from risc to cisc
<i509vcb[m]>
Is the original 8086 ISA with 81 ish instructions CISC, while modern armv8, which can get to 1300 instructions in a fulled fitted out chip, is RISC?
<phire>
much of the RISC vs CISC stuff is a marketing debate that happened in the 90s
bgb_ has quit [Ping timeout: 480 seconds]
<i509vcb[m]>
s/fulled/fully/, s/fitted/kitted/
<chadmed>
well yeah im actually with phire on this one, the terms are utterly meaningless in 2021 and people only really use them because we havent invented new ones to distinguish designs (because there really isnt much to distinguish them anymore)
<phire>
and it's very simple really. Any ISA designed before 1985 is CISC and every ISA designed after 1985 claims to be RISC
<phire>
because RISC was the trendy buzzword and all the CPU designers went "of course my ISA is RISC"
<chadmed>
i think people just use CISC as a pejorative to express their dislike of amd64, but very few people can actually explain *why* amd64 is crap
<chadmed>
and to be fair to them, its not an easy thing to explain. the problems with it are multifaceted and extremely complex and so its easier for people on the left hand side of the dunning-kruger graph to just say "oh yeah its cisc so its bad lol"
<phire>
It doesn't help that the term RISC got closely associated with modern cpu ideas like "Pipelined" and "Superscalar"
<phire>
sure, most of the early pipelined and superscalar cpus were RISC. and that was the point. But mostly because simplifying the ISA and getting the instruction count down saved transistors and made pipelined/superscalar designs viable a few years before they would be viable for older ISAs
<phire>
personally, I think the original ARMv1 itself has a dubious claim to being RISC. I think the ISA design is closer to VLIW
<phire>
with the ability to pack a conditional, a barrel shift and an arithmetic/logic operation into a single instruction
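To make that packing concrete, here is a minimal Python sketch that pulls the condition, ALU opcode and barrel-shifter fields out of one classic 32-bit ARM data-processing word. It only handles the register operand with an immediate shift amount; the function name and the returned dict are made up for illustration, and the example word was assembled by hand.

```python
# Field tables for the classic 32-bit ARM data-processing encoding.
COND = ["EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
        "HI", "LS", "GE", "LT", "GT", "LE", "AL", "NV"]
OPCODE = ["AND", "EOR", "SUB", "RSB", "ADD", "ADC", "SBC", "RSC",
          "TST", "TEQ", "CMP", "CMN", "ORR", "MOV", "BIC", "MVN"]
SHIFT = ["LSL", "LSR", "ASR", "ROR"]


def decode_data_processing(word):
    """Split one instruction word into its condition, ALU and shifter fields.

    Hypothetical helper: only the "register operand, shift by immediate"
    form is handled (bits 27:25 == 000 and bit 4 == 0).
    """
    if (word >> 25) & 0b111 != 0 or (word >> 4) & 1 != 0:
        raise ValueError("not a register/immediate-shift data-processing word")
    return {
        "cond": COND[(word >> 28) & 0xF],         # predicate on the flags
        "op": OPCODE[(word >> 21) & 0xF],         # arithmetic/logic operation
        "set_flags": bool((word >> 20) & 1),      # S bit
        "rn": (word >> 16) & 0xF,                 # first source register
        "rd": (word >> 12) & 0xF,                 # destination register
        "shift": SHIFT[(word >> 5) & 0b11],       # barrel shifter mode
        "shift_amount": (word >> 7) & 0x1F,       # barrel shifter distance
        "rm": word & 0xF,                         # register fed to the shifter
    }


fields = decode_data_processing(0x00810102)
print(fields)
```

The word 0x00810102 is ADDEQ r0, r1, r2, LSL #2: a predicate, a shift and an add, all carried by a single 32-bit instruction.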
bgb_ has joined #asahi
<chadmed>
ive tried multiple times to study VLIW and every time it kinda just makes my brain shut off
<phire>
It's like out-of-order, with multiple functional units running each cycle. Except instead of the CPU's scheduler working out the dependencies dynamically, the ISA has some way of packing multiple sub-instructions into a single combined instruction
<chadmed>
yeah thats about as far as my understanding goes lmao, i cant wrap my head around _how_ it does that effectively
<phire>
The designs vary. At the simpler end you have designs like the raspberry pi's QPUs (shader cores). 64 bit wide instructions (hence Very Long Instruction Word) and there are fixed fields in the instruction.
<chadmed>
ohh right so it is just fixed fields in the instruction word ok
<phire>
there is a 4 bit flag field (this instruction triggers an immediate load, or this instruction ends), there are fields for selecting two registers from the big register files
<phire>
and fields for controlling the two ALUs
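A rough Python sketch of the "fixed fields in one long word" idea described above. The field names and widths here are invented for illustration and are NOT the real QPU encoding; the only things taken from the discussion are the 64-bit width, the 4-bit flag field, two register selects and control fields for two ALUs.

```python
# Hypothetical 64-bit bundle layout, most significant field first.
# Widths are assumptions for this sketch and sum to exactly 64 bits.
FIELDS = [
    ("sig", 4),        # flag field: e.g. "load immediate", "program ends"
    ("alu0_op", 5),    # operation for the first ALU
    ("alu1_op", 5),    # operation for the second ALU
    ("alu0_dst", 6),   # where the first ALU writes its result
    ("alu1_dst", 6),   # where the second ALU writes its result
    ("reg_a", 6),      # read select for register file A
    ("reg_b", 6),      # read select for register file B
    ("unused", 26),    # padding up to 64 bits
]
assert sum(width for _, width in FIELDS) == 64


def pack(**values):
    """Build one 64-bit bundle; fields that are not given default to 0."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word):
    """Recover every field from a 64-bit bundle."""
    out = {}
    shift = 64
    for name, width in FIELDS:
        shift -= width
        out[name] = (word >> shift) & ((1 << width) - 1)
    return out


bundle = pack(sig=0b1110, alu0_op=1, alu1_op=2, reg_a=5, reg_b=9)
print(f"{bundle:016x}", unpack(bundle))
```

Both ALU operations travel in the same word, so whatever produced the bundle (normally the compiler) has already decided they can run in the same cycle; the hardware just slices the word at fixed bit positions.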
<phire>
At the narrower end, it's basically fixed fields again, except you have multiple instruction formats to select between
riker77 has quit [Quit: Quitting IRC - gone for good...]
riker77 has joined #asahi
<phire>
um, ARM32 is a good example of that
<chadmed>
i think power isa does that for some instructions too
<chadmed>
its been a while since i looked at it
<phire>
the "real" VLIW ISAs tend to not get documented so well
<phire>
they aren't normally public ISAs, because they only really work when paired with a good scheduling compiler
<chadmed>
yeah the last good one was terascale i think, and all radeon cards have hardware scheduling
<phire>
it turns out that hardware scheduling is just unreasonably effective at producing good performance
<chadmed>
crazy how windows has only just recently received support for addressing hardware schedulers in GPUs, and only through dx12u
<phire>
different scheduler
<chadmed>
right
marvin24 has joined #asahi
marvin24_ has quit [Ping timeout: 480 seconds]
bgb has joined #asahi
Gue___________________________ has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
bgb_ has quit [Ping timeout: 480 seconds]
kov has quit [Quit: Coyote finally caught me]
eichin_ has quit []
eichin has joined #asahi
Gue___________________________ has joined #asahi
rohin2 has quit [Ping timeout: 480 seconds]
<sorear>
"no, I think the terms RISC/CISC should mostly stay in the 90s where it belongs." very this
dlss^ has joined #asahi
<chadmed>
i think i may have already mentioned this here a while ago but the progress of hardware design shares many similarities with convergent evolution in biology
<chadmed>
such that there is no longer any meaning to terms like risc and cisc because the canonical way to design a cpu core has converged so much that architectures that we once called risc and cisc now look virtually identical at the hardware level
rohin has quit [Quit: Konversation terminated!]
<phire>
It's not like cpu designs ever diverged that much (along the RISC/CISC split). It was more about the ISA you would design from scratch to go with that cpu design
<phire>
The cpu designs with legacy ISAs just got stuck with extra hardware to make it work
<phire>
And over time, the size of that extra hardware relative to the rest of the core has shrunk
<phire>
I think you can argue that the legacy restrictions actually helped x86 core designs in the long term.
<phire>
It encouraged Intel and AMD to go for Out-of-order designs very early compared to most of their RISC competitors
jeffmiw has quit [Ping timeout: 480 seconds]
<chadmed>
yeah intel almost screwed the pooch there with netburst's marianas trench of a pipeline
<phire>
Was still a better design than the Itanic that was Itanium
<chadmed>
if itanium came out when it was meant to maybe it would have been competitive. dont get me wrong, im thankful it didnt but a part of me wonders what wouldve happened to the PC had it succeeded as intel and hp planned. would people have held themselves captive to intel and hp and continued to use the platform?
<chadmed>
they hold themselves captive to intel and amd now, but thats only because until recently There [Was] No Alternative(tm)
maor26 has joined #asahi
<phire>
I'm really not sure how serious the plan for itanium on the desktop ever was. It's not like Intel paused or slowed down development on x86
<phire>
The P5, P6 and netburst were all developed in parallel
nobodynada has quit [Quit: leaving]
Hotswap has quit [Ping timeout: 480 seconds]
<marcan>
I think the main problem x86 has is the variable length instructions
<marcan>
that makes parallel decode an exponential problem
bgb_ has joined #asahi
<marcan>
sure, you can µop cache your way around that for tight loops, but it doesn't work so well with general code
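A toy Python sketch of why instruction length matters for wide decode. The variable-length "ISA" here is invented (the low two bits of an instruction's first byte give its length, 1 to 4 bytes) and is nothing like real x86 encoding; it only shows where the serial dependency comes from.

```python
def fixed_width_starts(code, width=4):
    """Fixed-width ISA: instruction i starts at i * width.

    Every start offset is known up front, so N decoders can each grab
    their own instruction without waiting on each other.
    """
    return list(range(0, len(code), width))


def toy_length(first_byte):
    """Hypothetical length rule for this sketch only: 1 to 4 bytes."""
    return 1 + (first_byte & 0b11)


def variable_length_starts(code):
    """Variable-length ISA: each start depends on the previous length.

    The loop cannot begin looking at instruction i+1 until the length of
    instruction i is known, which is the serial chain a wide decoder has
    to break somehow.
    """
    starts = []
    offset = 0
    while offset < len(code):
        starts.append(offset)
        offset += toy_length(code[offset])
    return starts


def speculative_starts(code):
    """One common workaround, sketched: guess a length at EVERY byte.

    The per-byte length computation is independent (parallel-friendly),
    but most of that work is thrown away, and picking the real starts is
    still a chained walk over the guesses.
    """
    lengths = [toy_length(b) for b in code]
    starts = []
    offset = 0
    while offset < len(code):
        starts.append(offset)
        offset += lengths[offset]
    return starts


code = bytes([0x01, 0xAA, 0x00, 0x03, 1, 2, 3, 0x02, 9, 9])
print(fixed_width_starts(bytes(16)))
print(variable_length_starts(code))
print(speculative_starts(code))
```

Caching the per-byte length guesses alongside the instruction bytes turns the second function's chain into a lookup on later passes over the same code.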
<marcan>
but once you get past decode, yeah, all modern CPUs look similar to some extent
bgb has quit [Ping timeout: 480 seconds]
bgb has joined #asahi
bgb_ has quit [Ping timeout: 480 seconds]
<chadmed>
yeah we were saying this before. dunno what sort of deal with satan intel did to get to 5-way decode for alder lake
<phire>
it's actually 6-way
Hotswap has joined #asahi
<chadmed>
thats genuinely impressive wow
<phire>
Gracemont (the little core) is more interesting. Instead of trying to do a big massive 6-way decode, it has two 3-way decoders.
<phire>
then they cache Instruction-length-decode information in the L1i cache, and the two decoders can actually operate in an interleaved mode
<phire>
For fresh code after a cache-miss, you only get up to 3 instructions per cycle, but the second time around you get up to 6 instructions per cycle
<phire>
and presumably two complex instructions per cycle?
<phire>
Apparently the two decoders can even operate out of order, whatever that means
aleasto has joined #asahi
aleasto has quit [Remote host closed the connection]
aleasto has joined #asahi
aleasto has quit [Remote host closed the connection]
aleasto has joined #asahi
dlss^ has quit [Ping timeout: 480 seconds]
Gue___________________________ has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
bps has joined #asahi
off^ has joined #asahi
riker77 has quit [Quit: Quitting IRC - gone for good...]
riker77 has joined #asahi
X-Scale` has joined #asahi
X-Scale has quit [Ping timeout: 480 seconds]
aleasto has quit [Remote host closed the connection]
aleasto has joined #asahi
hspak8 has joined #asahi
hspak has quit [Read error: Connection reset by peer]
hspak8 is now known as hspak
m42uko_ is now known as m42uko
X-Scale has joined #asahi
gabuscus_ has quit [Read error: Connection reset by peer]
X-Scale` has quit [Ping timeout: 480 seconds]
gabuscus has joined #asahi
jbowen has joined #asahi
darkapex1 has quit [Ping timeout: 480 seconds]
nsklaus has joined #asahi
nsklaus_ has joined #asahi
nskl has quit [Ping timeout: 480 seconds]
aleasto has quit [Remote host closed the connection]
nsklaus has quit [Ping timeout: 480 seconds]
chadmed has quit [Quit: Konversation terminated!]
chadmed has joined #asahi
aleasto has joined #asahi
aleasto has quit [Remote host closed the connection]
aleasto has joined #asahi
yuyichao has quit [Ping timeout: 480 seconds]
yuyichao has joined #asahi
kenjigashu has joined #asahi
kenjigashu has quit []
kenjigashu has joined #asahi
kenjigashu has quit [Remote host closed the connection]
bps has quit [Remote host closed the connection]
bps has joined #asahi
loop0 has joined #asahi
bps has quit [Remote host closed the connection]
aleasto has quit [Quit: Konversation terminated!]
aleasto has joined #asahi
aleasto has quit [Remote host closed the connection]
aleasto has joined #asahi
jbowen_ has joined #asahi
jbowen has quit [Read error: Connection reset by peer]
Gues__________________________ has joined #asahi
___nick___ has joined #asahi
___nick___ has quit []
___nick___ has joined #asahi
loop0 has quit [Quit: Leaving.]
dff has quit [Quit: WeeChat 3.1]
<citizen1[m]>
<chadmed> "additionally, RISC is not magic...." <- whatever it is, it's a flat single chip that is doing both cpu+gpu work silently with major power savings. Whatever apple is doing everyone else should do unless there is a point where x86 intel chips can outperform them. i imagine what happens if you have a dedicated gpu running m1 like processors
jmr2 has joined #asahi
<jmr2>
Question... I'm trying to add j_ey's version of Corellium's keyboard driver to alyssa's tree. I believe that I have the code right but the DT wrong.
<jmr2>
Driver is loaded, visible in /sys/bus/spi/drivers/applespi, device is visible at /sys/bus/platform/devices/23510c000.spi. No symlinks between those.
<jmr2>
The compatible strings match between the dtsi and spi-apple-mc.c - but that driver's probe function isn't called.
<jmr2>
Only reference to SPI in dmesg is "SPI driver applespi has no spi_device_id for input,applespi-kbd-v1"
<jmr2>
Any suggestions on what I should check next?