#linux-sunxi on 2022-01-03 — irc logs at oftc.irclog.whitequark.org

2021-07-26 22:56 ChanServ changed the topic of #linux-sunxi to: Allwinner/sunxi development - Did you try looking at our wiki? https://linux-sunxi.org - Don't ask to ask. Just ask and wait for an answer! - This channel is logged at https://oftc.irclog.whitequark.org/linux-sunxi

00:19 bauen1 has quit []

00:49 ZenWalker has quit [Read error: Connection reset by peer]

01:54 apritzel has quit [Ping timeout: 480 seconds]

03:22 cnxsoft has joined #linux-sunxi

04:48 sunshavi has quit [Remote host closed the connection]

05:28 gamiee2 has joined #linux-sunxi

05:57 JohnDoe_71Rus has joined #linux-sunxi

08:02 apritzel has joined #linux-sunxi

08:29 apritzel has quit [Ping timeout: 480 seconds]

09:04 <montjoie> does anyone know where I can find the full datasheet for the EMAC? I tried A10 and A20 and I see nothing.

09:24 pcBob has joined #linux-sunxi

09:30 gsz has joined #linux-sunxi

10:38 apritzel has joined #linux-sunxi

10:45 cnxsoft has quit [Ping timeout: 480 seconds]

10:51 <apritzel> montjoie: does the R40 manual help? AFAICS this describes both, and the "EMAC" in there looks like the EMAC from the A20

11:01 <apritzel> aggi: I smeared up the decompressor's head.S with debug instructions, and got one step further: I now see: "Uncompressing Linux..."

11:02 <apritzel> this was not showing up before because U-Boot's SCTLR_EL1 setup is not 100% compatible with AArch32's SCTLR version

11:12 sunshavi has joined #linux-sunxi

11:20 <montjoie> apritzel: thanks, it has everything

11:22 <apritzel> is there any *board* without an A10 using that IP? I see that the A20 and R40 have the EMAC as well, but do any of the boards with those SoCs allow using the EMAC (vs. the GMAC)?

11:23 <montjoie> according to dtb, very few boards use sun4i-emac

12:14 <NekoMay> gamiee: Last I checked, Cubieboard was using Actions Semiconductor SoCs now

12:15 <NekoMay> ARM ones, not the MIPS ones they were (in)famous for

12:15 <NekoMay> Lots of those Actions MIPS SoCs in cheap emulator handhelds

12:16 <NekoMay> Same place the F1Cx00s is super popular

12:16 <NekoMay> Though Miyoo has moved on to SigmaStar now

12:17 <NekoMay> Oh, and Mediatek

12:17 <NekoMay> Though Mediatek is for their new fancy offering

12:17 <NekoMay> SigmaStar is for their low-cost/tiny offering

12:18 <NekoMay> You mentioned Rockchip invasion, well, they managed to muscle into there, too; lots of Quad A7 and Quad A35 handhelds these days

12:39 cnxsoft has joined #linux-sunxi

13:12 gamiee2 has quit [Ping timeout: 480 seconds]

13:19 <libv> is actsem even adhering to the gpl?

13:19 <libv> or is it like realtek?

13:21 <libv> last i checked, no communities exist for those

13:21 <libv> exynos also was dead in the water

13:21 <libv> amlogic is probably not too much better

13:27 <plaes> there's a ton of upstreaming happening for amlogic

13:27 gsz has quit [Read error: Connection reset by peer]

13:28 gsz has joined #linux-sunxi

13:44 <apritzel> isn't BayLibre taking care of Amlogic?

13:50 <montjoie> apritzel: yes

13:58 <libv> oh, ok

14:13 <pcBob> Is there a repository with USB working for F1C100s?

14:15 <apritzel> pcBob: can't you just download the mbox from patchwork, and then "git am" it?

14:16 <pcBob> How do I know what kernel version this applies to?

14:16 <apritzel> latest, I guess?

14:17 <apritzel> but I guess you could apply it to anything fairly recent

14:19 <pcBob> It fails for a modified 5.11 kernel which added some other hardware. I took a look into the current linux master branch and the dis-file for F1C100s didn't even include SPI. So, even though I might be able to apply it I would throw away SPI etc.

14:20 <pcBob> *dts file

14:20 <pcBob> To me it looks like bits and pieces everywhere

14:23 <apritzel> pcBob: yes, but the patches are fairly easy, actually

14:23 <apritzel> not sure why you play around with 5.11, though

14:23 <apritzel> that's not even LTS

14:24 <apritzel> you should be able to apply the patches manually

14:24 <pcBob> the modified kernel is 5.11 (it is not from me, I am just using it) and as far as I can tell it has a lot more hardware support than linux main

14:25 <pcBob> looks to me like a merge conflict

14:25 <apritzel> a lot more unreviewed and hackish code, you mean?

14:25 <pcBob> Maybe, I don't know. But it seems like no changes have been happening to F1C100s in mainline for 3 years or so

14:26 <pcBob> and no recent changes are supported officially

14:27 <apritzel> well, if nobody submits, reviews and tests, then not much will happen

14:28 <apritzel> if you look at the patches, the changes are fairly self contained, basically adding lines or structs to existing data

14:28 <apritzel> you should be able to find the respective places in your kernel

14:29 <apritzel> and then just hope that no significant changes happened to the MUSB or PHY driver in the last year

14:29 <apritzel> or you go real and just use 5.15 or 5.16-rc8

14:31 <pcBob> sounds like pain to me

14:32 <apritzel> pcBob: what? using a newer kernel or manually applying the patches?

14:39 <pcBob> manually applying the patches of USB and then also the changes the 5.11 custom kernel did

14:41 <apritzel> well, you are now seeing one of the problems of those non-mainline trees: adding stuff or applying patches becomes more and more painful

14:42 <pcBob> didn't Icenowy Zheng do a lot for F1C100s? I don't see these changes being mainlined

14:43 <apritzel> pcBob: those are mostly her patches, just sent by someone else

14:43 <pcBob> there is a 4.14 kernel with USB but it fails building with my buildroot

14:49 <apritzel> so the whole series applied cleanly on top of v5.11.22 for me

14:50 <apritzel> just downloaded the file given to me by the "series" link in the top right corner of patchwork, and "git am"'ed it

14:51 <apritzel> chances are the actual USB/PHY driver changes apply cleanly on your kernel as well, it's probably just the DT nodes that clash?

14:56 <apritzel> pcBob: for the DT changes: you can literally just copy out the diff from patch 4/5, and add it to the end of your version of suniv-f1c100s.dtsi

14:57 <apritzel> same with patch 5/5

15:00 <pcBob> yes the dts files are clashing. Sorry to ask this but is this enough to get USB devices like mass storage or WLAN dongles working?

15:01 <pcBob> I see some USB drivers being loaded but neither my USB stick nor my WLAN dongle are recognized

15:03 <apritzel> do you have the MUSB and the PHY drivers compiled in or loaded?

15:05 <apritzel> there is only the OTG controller on this SoC, AFAICS

15:08 cnxsoft has quit []

15:33 <aggi> apritzel: i can test the changes with a kernel-version/config i am certain is functional with my pine64; if the loader jumps to "decompressing linux" this would help me a lot

15:33 <apritzel> aggi: actually it just booted for me, after I removed all the debug cruft from head.S

15:33 rajkosto has joined #linux-sunxi

15:34 <apritzel> with one tiny U-Boot patch to setup SCTLR_EL1 differently for AArch32

15:35 <apritzel> no kernel changes, just enabling COMPILE_TEST and selecting the pinctrl and clock drivers

15:36 <apritzel> (minus SMP, as expected)

15:39 <aggi> i see, PSCI/SMP aren't available then?

15:40 <apritzel> no, and I think that's a hard problem, because the spec is nasty in this respect

15:41 <aggi> can't find my way through u-boot sources either, to see aht needs changes for SCTLR_EL1 with aarch32

15:42 <aggi> *what

15:42 <aggi> apritzel: nonetheless, i would like to test your diff if it is available.

15:46 <apritzel> aggi: https://gist.github.com/apritzel/ba519c7a58e064b926d161f3afd1eaab

15:48 rajkosto has quit [Read error: Connection reset by peer]

15:49 <apritzel> aggi: but please: this is just a hack, because I was curious: you should seriously fix your build environment and bring it into the 21st century

15:50 <apritzel> gcc 4.7.4 is beyond evil, even my old Slackware is on 5.5.0

15:50 <apritzel> but for cross-compiling the version of your native compiler shouldn't matter much anyway

15:52 <aggi> apritzel: i was thinking to downgrade gcc even further, because gcc-4.7 contains C11 features already; and i am aiming to replace both gcc and clang with another C99 compiler

15:53 <apritzel> why?

15:53 <aggi> otherwise, i already succeeded compiling kernel v5.10 with gcc-4.7 and an entire userspace required to proceed with gentoo

15:53 <apritzel> I mean if you don't want newer standards, just say so on the command line

15:54 <aggi> apritzel: why? 1) for the sake of C90/99 compliance itself and 2) the desired compilers are known to be up to 10x faster, which is practical, and 3) any GCC later than 4.7 is written in c++

15:54 <apritzel> or bake your own compiler, and set the default standard to 1969, if you like

15:55 <aggi> surprisingly, even some basic interpreters from 1970s and legacy 8/16bit systems had some appeal

15:56 <aggi> apritzel: i see a constant is loaded into a "model specific register", what does this constant encode?

15:56 <apritzel> this is way too much retro for my taste, I consider even kernels from last year "old"

15:56 <apritzel> or from 2020, I guess

15:57 <apritzel> and there was no glorious past, just bad keyboards and next-to-useless computers ;-)

15:57 <aggi> i am not opposing any kernel or compiler upgrade, i reject the fact a c++ compiler (meaning a compiler written in c++ itself) is required since recently, to compile GNU/Linux

15:58 <apritzel> honestly I don't see how the implementation language is relevant, and C++ is not really niche

15:58 <aggi> if this is too much off-topic here then simply say, otherwise i explain

15:59 <apritzel> I don't like C++ much, but mostly because my time is limited, and I stopped caring about GCC compilation requiring g++ decades ago

15:59 <aggi> i am not rejecting c++; i reject c++ in critical base system components (kernel, compiler itself)

15:59 <aggi> furthermore, g++ is *slow*, and kicked my pine64 board into OOM regularly, at least gcc didn't

15:59 <apritzel> well, then you have to bite the bullet and design your own eco system

16:00 <apritzel> but this won't be a walk in the park

16:00 <aggi> what caught my attention was the claim legacy c90/c99 compilers are 10x faster with compilation times, and this alone would be a *huge* benefit

16:00 <apritzel> claim, you said it

16:01 <aggi> i know, it's a lot of dirty hackjobs required, to revive gcc-4.7.4 with kernel v5.10LTS required some time for testing etc.

16:01 <apritzel> I still don't understand how this is relevant: the kernel isn't written in ANSI C, you need GCC or a sufficiently compatible compiler to build it

16:02 <apritzel> you can't just expect to take a random C compiler and compile the kernel

16:02 <aggi> sure, officially kernel v5.10 required gcc-4.9 (the first one written in c++ and supporting aarch64 btw.); and with only some minor changes in header files kernel v5.10LTS compiled nicely with gcc-4.7

16:03 <aggi> furthermore, C90/99 compliance isn't just any "random compiler", it's a standard

16:03 <apritzel> yes, but the kernel is written in "kernel C", not standard C

16:04 <aggi> linux kernel was rather conservative with compiler requirements (gcc-3.2 for a very long time by today's standards): up until kernel v4.19LTS/v5.10LTS and recently again v5.15 bumped compiler version

16:04 <apritzel> and for very good reasons

16:05 <apritzel> there are so many many things that require recent compilers, alone "asm goto" for instance is a killer feature

16:05 <aggi> i'm no asm hacker tbh.; and even if i would practice asm hacking on Z80 era systems, certainly not ARM

16:06 <aggi> anyway

16:06 <apritzel> I wouldn't recommend that, ARM is considered one of the best assembly languages / architectures to be programmed in assembly

16:06 <aggi> apritzel: what does this constant encode 0x00c00878 ?

16:07 <apritzel> Z80 is just some crufty stuff because technology was limited back then ;-)

16:07 <apritzel> that's some version of the ARMv7 reset value to SCTLR

16:08 <apritzel> I need to pick some brains in the office later this week on the exact details

16:11 <apritzel> aggi: the current U-Boot code is just setting up the AArch64 version of SCTLR_EL1, which has some subtle differences to the AArch32 version

16:11 <apritzel> and the Linux decompressor code does a read-modify-write of that register, so any nasty bits already set stick
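
A rough sketch of why bits left set by the boot loader survive a read-modify-write like the one the decompressor does. Only the constant 0x00c00878 comes from the discussion above; the masks and the "stale" bit 25 are hypothetical placeholders and not what head.S or U-Boot actually use.

```python
def set_bits(value):
    """Return the positions of all bits set in value, lowest first."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def read_modify_write(old, clear_mask, set_mask):
    """Model a register update that only touches the bits named in the masks."""
    return (old & ~clear_mask) | set_mask


# Constant quoted in the channel for the AArch32 SCTLR setup.
SCTLR_CONSTANT = 0x00C00878
print(set_bits(SCTLR_CONSTANT))  # [3, 4, 5, 6, 11, 22, 23]

# Hypothetical case: bit 25 and bit 0 were left set by an earlier stage,
# and the update only clears bit 0 and sets bit 2.
stale = (1 << 25) | (1 << 0)
result = read_modify_write(stale, clear_mask=1 << 0, set_mask=1 << 2)
print(hex(result))  # 0x2000004, bit 25 is still set
```

Any bit that is in neither mask passes through unchanged, which is why the initial value written by the boot loader matters.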

16:13 <aggi> apritzel: currently compiling some things again (reminder why i need a fast compiler on my pine64 boards), i'll notify about any progress

16:14 <apritzel> aggi: you just need a faster board, really ;-) You could try an RPI4, or even QEMU on some decent PC

16:14 <aggi> lots of other bugs and quirks, with the compiler downgrade; otherwise, i can confirm aarch32 userspace binaries can be executed on aarch64 kernel with CONFIG_COMPAT

16:14 <apritzel> aggi: and you brought those problems on yourself, by choosing Gentoo: now you pay the price ...

16:15 <aggi> the pine64 board i got is among the fastest SBC ever built (that's none of my complaints)

16:15 <aggi> it's not a gentoo-specific issue either

16:15 <apritzel> aggi: which world are you living in? A quad A53 is probably one of the most annoying build platforms

16:16 <apritzel> my ARM64 desktop at work is doing a recent kernel in around 1min40sec

16:16 <aggi> cannot complain about A53 and A72; also, i fiddled together various helper scripts and configured distcc with gentoo tooling

16:17 <aggi> the limitation mainly was available RAM and g++ hitting OOMs regularly (another reason i downgrade to gcc-4.7 because no c++ required to bootstrap this)

16:18 <apritzel> so why not just get an RPi4 or some RK3399 board with 4GB or more of DRAM?

16:18 <apritzel> or even use QEMU ;-)

16:19 <aggi> RPI contain some nasty firmware on the videocore which i reject, i got a pine64 board with rk3399 which is fast (that's not the issue and even with 4GiB i encountered OOMs with g++)

16:19 <aggi> and i react allergic to "virtualization", i try to avoid it whenever possible

16:21 <apritzel> I figured OOMs during compilation are mitigated by reducing parallelism (less -j), or by using a real machine

16:23 <aggi> this is the plan here: i try to keep the base system clean, nothing written in or depending on c++; and with gentoo i can emerge anything else into a chroot with a cross-compiler elsewhere, if desired

16:24 <aggi> ideally, the base-system contains everything required to work and hack, and anything else is dumped elsewhere flatpak style, i'll not care about anymore

16:24 <aggi> if the system compiler, such as tinycc or pcc is 10x faster even, any ARM SBC can outcompete almost anything else, with a power budget of 5W instead of 500W

16:25 <aggi> didn't expect C11 poisoned kernel already, and who knows what else

16:25 <apritzel> wow, so you reject recent GCCs and virtualisation, but embrace flatpak. That's ... interesting

16:25 <aggi> i said flatpak style, not flatpak

16:26 <apritzel> still, this whole concept raises eyebrows here

16:26 <aggi> gentoo can emerge into any desired system root (with cross compilers even), which is what i'll do to keep my system fast and clean

16:27 <aggi> and, i got a kernel v5.10 and recent userspace compiled already, with gcc-4.7; lots of cleanup todo still, before i'll downgrade further

16:30 <apritzel> sure, do whatever you want, I just figure that there are enough real problems to solve, no need to invent new ones ;-)

16:32 <jernej> speaking of speedy kernel compilation, did anyone notice the 2000+ patches for big reduction of compile time?

16:32 <jernej> I wonder how this will be reviewed :)

16:32 <apritzel> yeah, read about it, sounds quite scary

16:33 <apritzel> [PATCH 0000/2297]

16:34 <aggi> i can post the patchset here to compile kernel v5.10LTS with a legacy compiler; then see for yourself who invented problems, with C11 _Generic for example

16:40 <aggi> this is what GNU/Linux did: they dropped C90/C99 compliant compiler support (gcc-3.2 until kernel v4.18), and instead favored llvm/clang (who introduced _Generic for example, and who knows what else)

16:45 <aggi> this one compiles with both gcc-4.7(aarch32) and gcc-8.5(aarch64,aarch32), and boots with aarch64 already (couldn't test aarch32 yet because lack of hardware): https://dpaste.com/DB75YSA57

16:46 ftg has quit [Ping timeout: 480 seconds]

16:47 chewitt has joined #linux-sunxi

17:43 macromorgan has joined #linux-sunxi

18:05 gamiee2 has joined #linux-sunxi

18:47 <juri_> aggi: fwiw, the pi's firmware problem is a non-problem, if you don't want HDMI. in practice, the VC4 is more capable than the CPU, it's just that everyone likes their blobs...

18:47 <juri_> Free Software raspberry pi configs are a thing.

18:53 gamiee2 has quit [Remote host closed the connection]

19:01 gsz has quit [Ping timeout: 480 seconds]

19:02 JohnDoe_71Rus has quit []

19:09 gsz has joined #linux-sunxi

19:10 <libv> juri_: can the rpi now boot without the RTOS?

19:11 <juri_> yep.

19:11 <libv> url?

19:11 <juri_> even better, you can sub in your own, and use it to harness the spare CPU cycles.

19:11 <juri_> https://github.com/librerpi/rpi-open-firmware

19:12 <libv> oh, this is news to me

19:12 <libv> was this at all supported by either the rpi foundation or broadcom?

19:12 <libv> or was this purely community?

19:12 <juri_> absolutely not. more of anti-supported.

19:13 <libv> (i am the guy who pointed out this binary mess back in 2012 btw, when they announced their "open source" driver)

19:13 <juri_> it's purely community, over in #raspberrypi-internals on libera

19:13 vagrantc has joined #linux-sunxi

19:13 <libv> right

19:13 <libv> so there is a good reason to still dislike the rpi

19:13 <libv> but good job

19:14 <juri_> tell him. i just cheerlead. :)

19:14 <juri_> (the main dev nowadays is 'clever'.)

19:15 <libv> the thing is, such work is never rewarded

19:15 <juri_> he does some cool stuff with it, and will talk your ear off if you let him. :)

19:15 <libv> he will be blackballed most places as well

19:15 * juri_ nods.

19:15 <juri_> he's happy where he is.. and i'd hire him. :P

19:17 <libv> it's a bit late for that now, but that guy should definitely talk at fosdem

19:17 <juri_> I think he's in the wrong timezone.

19:18 <juri_> i think he's over at kowainik.. ?

19:19 <libv> clever: well done :)

19:20 <juri_> oh. he's here. :P

19:20 <libv> build in some printing which states just how averse rpi foundation and broadcom are to this work

19:20 <libv> it's not as if they will ever support you in their or your lifetime

19:21 rajkosto has joined #linux-sunxi

19:27 <clever> libv: https://invidious.snopyta.org/watch?v=BQyyVtmmVg8

19:27 tuxd3v has quit [Read error: Connection reset by peer]

19:27 <clever> that would be nixos (a linux distro) booting on an rpi, with the open firmware on the VPU

19:27 tuxd3v has joined #linux-sunxi

19:29 chewitt has quit [Quit: Zzz..]

19:30 tuxd3v_ has joined #linux-sunxi

19:31 <libv> so the animation is the gpu and an overlay/sprite being moved around, in the background

19:31 tuxd3v has quit [Read error: Connection reset by peer]

19:31 <clever> yep

19:31 <clever> the 2d subsystem is basically a sprite-only gpu

19:31 <clever> if you want anything visible, it must be a sprite

19:31 <clever> the text console on the top is a stationary sprite, with the VPU debug logs

19:32 <clever> the bottom text console is mapped to /dev/fb0, and linux does text+gfx on it

19:32 <libv> so this is not a display engine thing

19:32 <libv> it's a blit

19:32 <clever> you configure a list of sprites in a special MMIO region, and the hardware will automatically fetch image data and alpha-blend them together

19:32 <clever> generating a stream of pixels in raster order

19:33 <clever> but that stream is very bursty, so it goes thru a FIFO to regulate it down to the pixel clock, and that then feeds the output PHY

19:33 <clever> https://invidious.snopyta.org/watch?v=suswjbpR1HU

19:33 <libv> so closer to an advanced display engine thing than a classical 2d engine/blitter

19:33 <clever> it also does scaling on the fly

19:33 <clever> and it supports planar and yuv formats

19:34 <clever> so you can just throw a yuv420 sprite at the 2d subsystem, and it will render, with zero cpu cost

19:34 <libv> right, like with what allwinner calls the mixer for their DE engine

19:34 <clever> the display list is 4096 x 32bit long on the VC4 lineup (pi0-pi3)

19:35 <libv> 4k entries?

19:35 <clever> a non-planar sprite at 1:1 scale takes up 7 x 32bit slots

19:35 <clever> so while you have 4096 slots, you need 7 slots for each sprite, giving a limit of 585 initially

19:35 <clever> but the hardware malfunctions if you modify an active display list

19:35 <libv> ok

19:35 <clever> so you need 2 lists, and do page-flips between them, so halve that

19:36 <libv> still insane

19:36 <clever> which reduces you down to 292 sprites
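
The sprite budget above is plain integer division, sketched here. The figures 4096 and 7 are from the discussion; the sketch assumes the two page-flipped lists are equal halves and, like the numbers quoted, ignores any words needed to terminate a list.

```python
DLIST_WORDS = 4096      # 32-bit slots in the VC4 display list
WORDS_PER_SPRITE = 7    # non-planar sprite at 1:1 scale


def max_sprites(dlist_words, words_per_sprite, lists=1):
    """Sprites that fit per list when the slots are split into `lists` equal parts."""
    return (dlist_words // lists) // words_per_sprite


print(max_sprites(DLIST_WORDS, WORDS_PER_SPRITE))           # 585
print(max_sprites(DLIST_WORDS, WORDS_PER_SPRITE, lists=2))  # 292
```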

19:36 <clever> but, you then have memory bandwidth issues to deal with

19:36 <libv> i thought allwinners DE was insane with (1)+4+32 per pipe

19:36 <clever> https://invidious.snopyta.org/watch?v=u7DzPvkzEGA

19:36 <clever> in this demo, i have 20 sprites active

19:36 <clever> but if too many sprites are on the same scanline, the rasterization falls behind, and the FIFO runs dry

19:37 <libv> clever: btw: drm planes only allows 32 planes

19:37 <clever> causing the glitching you can see

19:37 <clever> the smaller the sprites, the more you can have

19:37 <clever> avoiding overlap and alpha helps a lot

19:37 <libv> jesse barnes was told to solve the power issue of wayland vs hwcomposer, and he threw planes together as a result, on intel hw that had just lost almost all of its overlays

19:38 <libv> so he chose a 32bit mask for the plane identifier.

19:38 <clever> its basically just a hw accelerated memcpy, with pixel format conversion and alpha-blending on the dest

19:38 <clever> so if you have 2 sprites on a scanline, it has to copy 2 ranges

19:38 <libv> clever: does it go to memory, or does it scan out?

19:38 <clever> and if you have overlap, it will waste cycles drawing something you cant see

19:39 <clever> there is a dedicated chunk of fifo ram

19:39 <libv> if it goes to memory: 2d engine/blitter

19:39 <clever> and for each video output, you define the start/end range in that ram, to create a fifo

19:39 <libv> if it scans out, then it's part of the display engine

19:39 <clever> the scan-out then reads from that fifo

19:39 <clever> the hardware supports up to 3 video outputs at once, each with its own private fifo

19:39 <clever> but there is only 1 composition engine

19:40 <clever> which gets time-shared over the 3 displays

19:40 <libv> explain time-shared

19:40 <clever> it will just round-robin between generating scanlines for each display

19:40 <libv> from the same list

19:40 <clever> different lists

19:41 <libv> right, but one engine, so bandwidth goes x3

19:41 <libv> ok

19:41 <clever> yep

19:42 <clever> also, if you dont cover a region of the screen with a sprite, it just doesnt write to that pixel in the fifo

19:42 <clever> so you get whatever was in that fifo slot on the last pass

19:42 <clever> and depending on the ratio between fifo length and scanline length, it gives different repeating patterns

19:42 <libv> but then, if the display pipelines are at different frequencies or offset, then the blanking space of one or more displays is not a deadzone

19:43 <clever> yeah, you need some headroom for composition to not stall out

19:43 <clever> the FIFO length can hide short stalls

19:43 <clever> but a lot of these details are in docs broadcom wont share

19:43 <libv> of course

19:44 <clever> so i'm having to mis-configure the hw, and then guess from how it malfunctions

19:44 <clever> there is an optional background fill you can set

19:44 <libv> but don't be surprised if $vendor has not fully tested the limits of such bandwidth issues

19:44 <clever> but that also costs bandwidth

19:44 <clever> and there is also a dedicated transposer for offline composition

19:44 <libv> not on the fetching side, and not with scaling

19:45 <libv> ok, so that then blits to memory

19:45 <clever> the transposer can do a 90 degree rotation, but consumes 1 video output channel

19:45 <clever> yeah, it writes the rotated image back to ram

19:45 <clever> and the rotation step is optional

19:45 <libv> are the pipelines freely switchable?

19:45 <clever> so you can just compose 400 sprites together, and then have a 2nd channel just render the result

19:45 <libv> between display "outputs"

19:46 <clever> entirely bypassing the bandwidth problems

19:46 <clever> for the VC4 lineup, each channel is hard-wired to 2 potential video outputs

19:46 <clever> and only 1 of those outputs can be on at once

19:46 <clever> https://github.com/librerpi/rpi-open-firmware/blob/master/docs/pixelvalve.txt#L18-L24

19:46 <libv> heh

19:46 <clever> PV0 can only drive dsi0 or dpi

19:47 <clever> PV1 can only drive DSI1 or SMI

19:47 <clever> PV2 can only drive HDMI or VEC (composite, ntsc, pal)

19:47 <libv> yeah

19:47 <clever> pick one from each

19:47 <libv> so rpi basically does not do dual monitor?

19:47 <clever> but on the bcm2711, there is a mux (lines 29-33), that basically lets you remap things freely

19:47 <clever> the rpi can do dual and even triple monitor

19:48 <clever> DPI + DSI1 + HDMI is possible under linux with the closed firmware

19:48 <clever> ive gotten DPI + NTSC working on the open firmware

19:48 <clever> DSI and HDMI init is something i still need to figure out

19:49 <libv> oh, each channel can toggle between 2 potential video outputs

19:49 <libv> right

19:49 <clever> yeah

19:49 <clever> the pixel valve defines all of the video timing parameters

19:49 <clever> https://github.com/librerpi/rpi-open-firmware/blob/master/docs/pixelvalve.md#pv-horza

19:50 <clever> hsync, h backporch, h frontporch, hactive, vsync, vfront, vback, vactive, and an optional duplicate v set for the odd field

19:50 <clever> when interlacing is active
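
A small sketch of how the timing parameters listed above relate to each other: the totals are the sum of active, front porch, sync and back porch, and the pixel clock divided by the totals gives the refresh rate. The field names are made up for illustration and say nothing about how the pixel valve registers are laid out; the numbers are the familiar 640x480 mode.

```python
from dataclasses import dataclass


@dataclass
class Timing:
    """One axis of a video mode, in pixels (horizontal) or lines (vertical)."""
    active: int
    front_porch: int
    sync: int
    back_porch: int

    def total(self):
        return self.active + self.front_porch + self.sync + self.back_porch


def refresh_hz(pixel_clock_hz, horizontal, vertical):
    """Frames per second for a progressive (non-interlaced) mode."""
    return pixel_clock_hz / (horizontal.total() * vertical.total())


h = Timing(active=640, front_porch=16, sync=96, back_porch=48)
v = Timing(active=480, front_porch=10, sync=2, back_porch=33)

print(h.total(), v.total())                    # 800 525
print(round(refresh_hz(25_175_000, h, v), 2))  # 59.94
```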

19:50 <libv> i have not followed the drm driver of videocore

19:50 <clever> each PV also has a mux on it, to select the pixel clock

19:50 <libv> does that talk to the blob, or is that native code?

19:51 <clever> the fkms drivers in linux talk directly to the hardware, but rely on the blob to enable power

19:51 <clever> oops, no backwards!

19:51 <clever> fkms is the firmware kms, it talks exclusively to the blob

19:51 <clever> the kms (no f) drivers talk directly to the hw

19:52 <clever> ive created most of these demos and docs by reading the kms source in linux

19:52 <libv> but they probably depend on some setup having been done by the firmware

19:52 <clever> yeah

19:52 <clever> ive not been able to get the kms drivers to run blob-free yet, but have discovered what may be the key to unlocking that

19:53 <libv> clever: https://libv.livejournal.com/19432.html

19:53 <clever> that linux demo i linked above, is just using the dumb simple-framebuffer api, and having my custom firmware configure it as a sprite

19:53 <libv> right

19:53 <clever> let me get an example of the 2d api i designed from scratch

19:53 <clever> https://github.com/librerpi/lk-overlay/blob/master/app/grid/grid.c

19:53 <clever> that is the code behind https://invidious.snopyta.org/watch?v=suswjbpR1HU

19:54 <clever> line 15 creates a bitmap image for the grid, 18-30 draws the grid on that image

19:54 <clever> 31-35 creates a sprite with that image, and sets the xywh

19:54 <clever> 38 makes the sprite visible, and 39 updates the screen on the next vsync

19:55 <clever> 53 will wait for vsync, 55-72 changes the w/h, and 75 updates the display list

19:55 <clever> and 85 runs grid_entry() as a new thread on bootup

19:55 <clever> you could then use that code to create a sprite based game

19:57 <libv> right, with this many sprites that probably could speed up many an emulator

19:57 <clever> the PV can also do interrupts on vsync, hsync, and even the porches

19:58 <libv> clever: is the pv implemented in any kernel code already upstream or soon to be reaching upstream at this point?

19:58 <clever> its already in mainline linux i believe

19:58 <libv> as anything clever with display or with 2d engines is something that is mostly ignored

19:58 <libv> ok

19:58 <clever> https://github.com/raspberrypi/linux/blob/rpi-5.10.y/drivers/gpu/drm/vc4/vc4_crtc.c

19:58 <clever> thats the file youll find it in

19:59 <libv> ok

20:00 <clever> https://github.com/librerpi/lk-overlay/blob/master/platform/bcm28xx/v3d/v3d.c

20:00 <clever> i also took the original 3d demos from the pi1 days (it ran via /dev/mem) and ported them over

20:01 <clever> that spinning triangle with 3 colors, was done using shaders on the 3d core

20:01 <libv> no planes implemented it seems

20:01 <libv> just the base

20:01 <clever> ?

20:03 <libv> ah, they are added separately

20:03 <clever> oh, and the 3d core is capable of emitting images with alpha present

20:03 <clever> so that spinning triangle, is on a transparent background

20:04 <clever> which lets the 2d core compose it together with other sprites

20:04 <libv> https://github.com/raspberrypi/linux/blob/1d6957db7f835469ceab856ef255bdbdf16ddb5e/drivers/gpu/drm/vc4/vc4_plane.c#L1506

20:04 <libv> 16 planes :)

20:04 <libv> uint32_t for all the planes per kms instance

20:04 <clever> i believe that 16k is for bcm2711, but i'm not entirely sure

20:05 <libv> 16, not 16k

20:05 <clever> line 1499

20:06 <libv> a uint32_t is a mask for identifying planes, each plane owning 1 bit of that

20:06 <clever> ahhh

20:06 <clever> that explains why linux has a limit of 32

20:06 <libv> so the kms plane infrastructure allows for max 32 planes
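
To illustrate the point about the mask: if every plane owns one bit of a 32-bit value, plane indices 0 to 31 are all that can be represented. This is a generic bitmask sketch with made-up helper names, not the kernel's actual code.

```python
MAX_PLANES = 32  # one bit per plane in a 32-bit mask


def plane_mask(plane_indices):
    """Build a 32-bit mask with one bit set per plane index."""
    mask = 0
    for index in plane_indices:
        if not 0 <= index < MAX_PLANES:
            raise ValueError(f"plane index {index} does not fit in a 32-bit mask")
        mask |= 1 << index
    return mask


print(hex(plane_mask([0, 3, 31])))  # 0x80000009

try:
    plane_mask([32])
except ValueError as err:
    print(err)  # plane index 32 does not fit in a 32-bit mask
```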

20:06 <clever> in LK, each display just has a linked list of visible sprites

20:06 <libv> again, 2011, jesse, who could not give a shit about display drivers, probably got told to fix the power discrepancy with android

20:06 <clever> and the code will just iterate over the list, and convert it into a hw display list

20:07 <libv> and intel had just dumped most of the overlays from its hw

20:07 <clever> yeah, ive seen that people just use opengl for this now

20:08 <libv> the second i held my lima talk in 2012, codethink rented me out to my former nokia manager, who had then joined intel (i was not allowed to join for the usual political reasons)

20:08 <libv> i got tasked with writing hwcomposer on top of kms planes before ville worked on atomic

20:09 <libv> and i actually had to get a netbook as that had an intel display engine with more than 1 extra overlay

20:09 <libv> this was the mantra of the 00s: we can do it all on the 3d engine

20:10 <libv> before 2011, i was the last person in the xorg sphere to touch overlays properly

20:10 <clever> for the rpi, the 3d engine is in its own power domain, and can be shut off when not needed

20:10 <libv> when i re-implemented xvideo for the unichrome

20:10 <libv> clever: as is true for almost anything

20:10 <clever> that reminds me, i think xorg lacks proper support for yuv420 output

20:10 <libv> the gpu can do it, but it takes a lot of work, and a lot of ramp-up

20:11 <clever> so any time you're doing video under xorg, you often wind up converting to rgb in software or opengl

20:11 <libv> and you incur a massive bandwidth overhead

20:11 <clever> having the xvideo extension over a kms plane, would greatly reduce the cpu usage on an rpi

20:11 <clever> yeah, same for the rpi 3d

20:11 <clever> first, textures on the 3d core are not in a linear raster order

20:11 <clever> they are in a wonky tile format

20:11 <libv> as you wait for the gpu to stop rendering to memory, and then flip the buffer so the display can scan it out

20:11 <clever> so you must convert from linear to tile format

20:12 <clever> also, the 3d core cant snoop on the arm cache, so you must flush the arm cache

20:12 <libv> if you do it in the display engine, with dma and fifos in between, you consume bandwidth as you go

20:12 <anarsoul> libv: it's OKish for GL

20:12 <clever> then the 3d core can render the scene, ram->ram overall

20:12 <clever> then the 2d pipeline has to read from ram, and do its fifo and output stuff

20:12 <clever> but if you just tossed a yuv420 image at the 2d pipeline, it would just read it once, and you're done

20:12 <libv> anarsoul: but if the display engine can compose on the fly, it's a lot better

20:13 <libv> same thing for a blitter

20:13 <anarsoul> libv: true

20:13 <libv> and the setup of such engines is trivial

20:13 <anarsoul> but userspace isn't ready :P

20:14 <clever> https://github.com/librerpi/lk-overlay/blob/master/platform/bcm28xx/hvs/hvs.c#L96-L110

20:14 <clever> for 1:1 scale non-planar sprites on a vc4 era chip, you just write 7 x 32bit to the dlist like this, and you're basically done

20:14 <anarsoul> sunxi de supports yuv planes, but AFAIK it's used only by kodi

20:14 <libv> because userspace is full of people who would only state "we can do all of that on the 3d engine" (which they usually label gpu, which has become a very ethereal term)

20:14 <libv> anarsoul: the fosdem video project uses it extensively

20:15 <libv> it's why i am so amused by the 32 plane limit :)

20:15 <anarsoul> libv: because on x86 you've got enough memory bandwidth to waste

20:15 <libv> anarsoul: you never have memory bandwidth to waste

20:15 <anarsoul> and no company wants to spend engineering resources to fix it for SBCs

20:15 <libv> but that's an unpopular viewpoint

20:16 <libv> anarsoul: there was a reason why i focussed on the command stream for lima

20:16 <anarsoul> moreover even A64 has enough CPU power and memory bandwidth to smoothly decode and display (via GL YUV->RGB conversion) 1080p H264

20:16 <libv> i knew someone else would be mad keen on playing with shaders and the isa of them

20:16 <libv> poking at display engines is comparatively boring

20:16 <clever> anarsoul: but might battery usage be lower if you're doing that all in hw?

20:17 <anarsoul> clever: of course :)

20:17 <libv> playing with shaders or back when i started doing display stuff, fixed function 3d engines, everyone loves that!

20:17 <libv> it's why my work on modesetting was treated as it was

20:17 <libv> people just did not care

20:17 <libv> until they did, and then they handily forgot where it came from

20:18 <anarsoul> meh, modern devices waste *a lot* of CPU cycles and memory bandwidth for the work that could be done in hw

20:19 <libv> anarsoul: back in late 2010, my team at nokia went to visit imagination

20:19 <libv> nokia was using the sgx 530

20:19 <libv> and there was no arm gpu competition at the time

20:19 <libv> that was all img

20:20 <libv> and the sgx 530 had a 2d render library, which of course did the full setup of the 3d engine to emulate the brilliance of a blitter

20:20 <libv> and we got the usual presentation of the upcoming hw, the rogue

20:20 <libv> guess what suddenly had appeared again on the hw diagram

20:20 <libv> a blitter :)

20:21 <libv> turns out, having a simple bit of hw to copy some blocks of memory with a specific layout from one point, to another point with possibly another layout, is really really handy

20:22 <clever> one major feature that broadcom is still keeping a tight grip on, is the ISP

20:22 <clever> it does exactly what you said, and more

20:22 <libv> again, everyone had just thrown away such hw around the mid 2000s

20:22 <clever> the ISP deals with bayer->rgb conversion, computing stats, and doing image correction

20:22 <libv> there's a reason why android had hwcomposer this early on

20:22 <clever> but the ISP can also do rgb->yuv acceleration

20:22 <libv> clever: so that's mainly aimed at camera

20:22 <libv> clever: right

20:22 <clever> mostly camera, but it also ties into the h264 encoder

20:23 <clever> so the ISP can do rgb->yuv for screen capture, for example

20:23 <libv> https://github.com/libv/sun4i-demp

20:23 <libv> our hdmi capture solution for fosdem grabs 24bit rgb, but outputs to 3 separate planes

20:24 <libv> so it's like yuv444, but without conversion
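
The capture layout described here, 24bit rgb split over three planes, can be interleaved back into packed rgb on the cpu when no hardware path is available. A minimal C sketch, assuming 8 bits per sample and the same byte stride for all three planes; the name `planar_rgb_to_packed` and its parameters are hypothetical and not taken from sun4i-demp:

```c
#include <stddef.h>
#include <stdint.h>

/* Interleave three 8-bit planes (R, G, B) into packed RGB888.
 * plane_stride and packed_stride are bytes per row (assumed layout). */
void planar_rgb_to_packed(const uint8_t *r, const uint8_t *g, const uint8_t *b,
                          size_t plane_stride, uint8_t *packed,
                          size_t packed_stride, size_t width, size_t height)
{
    for (size_t y = 0; y < height; y++) {
        const uint8_t *rr = r + y * plane_stride;
        const uint8_t *gg = g + y * plane_stride;
        const uint8_t *bb = b + y * plane_stride;
        uint8_t *out = packed + y * packed_stride;

        for (size_t x = 0; x < width; x++) {
            out[3 * x + 0] = rr[x];
            out[3 * x + 1] = gg[x];
            out[3 * x + 2] = bb[x];
        }
    }
}
```

This is only the software equivalent of a planar to sequential conversion; it reads and writes every byte once, which is exactly the bandwidth cost a display engine or 2d engine would avoid.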

20:24 <libv> and nothing, absolutely nothing seems to use planar rgb

20:24 <libv> not drm, not v4l

20:24 <libv> our display engine can do it

20:25 <libv> well, the original allwinner display engine can

20:25 <clever> https://github.com/librerpi/lk-overlay/blob/master/platform/bcm28xx/hvs/include/platform/bcm28xx/hvs.h#L47-L71

20:25 <libv> and the mixer processor/2d engine of the original allwinner can also do the planar to sequential conversion...

20:25 <clever> this is every pixel format the rpi's 2d subsystem can support

20:25 <libv> if you feed it a unity matrix

20:25 <clever> the planar ones just accept 2 or 3 separate addr+stride pairs

20:26 <clever> so each plane can be in an entirely different region of memory

20:26 <libv> oh, cool

20:26 <clever> each plane also has its own scale factors

20:26 <libv> we have a second user :)

20:26 <clever> so the color planes dont have to be at the usual ratio to luma

20:26 <libv> right, feed it a unity matrix and bob is your uncle

20:27 <clever> https://github.com/librerpi/lk-overlay/blob/master/platform/bcm28xx/hvs/hvs.c#L232-L239

20:27 <clever> when a certain scaling mode is enabled (upscaling i think it was), you must specify the addr for 4 scaling kernels

20:27 <clever> one pair is the h/v scaling for the luma, the other pair is the h/v scaling for the chroma

20:28 <clever> https://github.com/librerpi/lk-overlay/blob/master/platform/bcm28xx/hvs/hvs.c#L276-L288

20:28 <clever> and this is the raw scaling kernel, which must live in the same memory as the display list

20:28 <clever> i have zero docs, and linux only ever uses 1 global kernel, so i dont know what exactly its doing

20:29 <libv> a kernel here being the block of commands needed to get operations done on the pixel valve

20:29 <clever> i think its less of a set of commands, you need to manually mirror the numbers

20:29 <libv> yeah, that's going to be painful, but at least you can pipe to memory

20:29 <clever> i think its just a raw set of coeffs to mult each pixel by

20:30 <libv> that's the pain of display engines, with 3d engines you can always exactly measure the end result

20:31 <clever> there are 3 ways to measure the output of the 2d engine

20:31 <clever> DPI is just a raw 24bit digital output bus, with a pixel clock to latch off of

20:31 <libv> true

20:31 <clever> some HDMI capture cards can do a bit-perfect capture

20:31 <clever> the transposer can just write the result back to system ram

20:32 <libv> the last is the easiest here

20:32 <libv> and you do not depend on external hw or a redesigned board

20:32 <clever> yep

20:35 <clever> another fun problem to deal with is clock ratios

20:35 <clever> the ntsc/pal generator needs a pure 108mhz clock to function correctly

20:35 <clever> you cant use fractional division

20:35 <clever> so the input you're dividing down from the PLL must be a multiple of 108

20:37 <libv> depends on how the pll is built

20:38 <libv> on radeon r500 it was a simple integer divider

20:38 <clever> https://elinux.org/The_Undocumented_Pi#Clocks

20:38 <libv> if memory serves

20:38 <clever> for the rpi, it is fractional capable divisors nearly everywhere

20:38 <libv> and on r700 it allowed fractions there too

20:38 <clever> so you can make a clean 1ghz from 19.2mhz for example

20:38 <clever> but you cant divide 1ghz down to a clean 108mhz

20:38 <libv> so you could divide even finer

20:39 <libv> so it all depends on how the clock generator is built

20:39 <clever> its far simpler to divide 1.08ghz down to 108mhz
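
The integer-only constraint can be checked with plain arithmetic: a parent clock works only if the target divides it without remainder. A small C sketch, where `integer_divider` is a hypothetical helper and not part of any rpi firmware or linux clock API:

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when parent_hz can be divided down to target_hz with an
 * integer divider only; the divider is stored in *div_out. */
bool integer_divider(uint64_t parent_hz, uint64_t target_hz, uint32_t *div_out)
{
    if (target_hz == 0 || parent_hz < target_hz || parent_hz % target_hz != 0)
        return false;

    *div_out = (uint32_t)(parent_hz / target_hz);
    return true;
}
```

With these inputs, 1.08ghz gives a divider of 10 for 108mhz, while 1ghz is rejected because 1000 / 108 leaves a remainder.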

20:39 <libv> clever: then there's another bit of fun that comes in

20:39 <clever> the rpi has 5 PLL's in it

20:39 <libv> pll internal loop max frequency

20:39 <clever> each multiplies 19.2mhz by a different factor

20:39 <libv> on some processes, it's not a measurable issue

20:39 <libv> on other processes, it becomes an issue

20:40 <clever> https://github.com/librerpi/lk-overlay/blob/master/platform/bcm28xx/pll/pll_control.c#L496-L498

20:40 <libv> so then you need to go map out the limit

20:40 <clever> yeah

20:40 <clever> linux defines PLLC as having a range of 600mhz to 3ghz

20:40 <libv> which is how i have killed several CRTs

20:40 <libv> as they are better than an oscilloscope if you still have a vga output

20:40 <clever> ive pushed PLLC down to ~150mhz i think, but get it any lower and it just wont lock, and it bottoms out and just ignores the settings

20:41 <clever> also, above 1.75ghz, the counter for the divider cant function

20:41 <clever> so you have to enable a dedicated /2 stage

20:41 <libv> clever: ati ran into such issues with the display port enabled r600 asics

20:42 <clever> each PLL has between 1 and 4 taps on it

20:42 <clever> which divide it back down to something

20:42 <libv> so some display modes would wobble or not sync at all with their pll factor calculation routine

20:42 <clever> for example, PLLC_CORE0, PLLC_CORE1, PLLC_CORE2 and PLLC_PER (peripherals)

20:43 <libv> the solution: keep a table with asic, display mode (not frequency mind), and then one of the factors would be counted down instead of up during the calculation

20:43 <clever> each peripheral then has a mux, that can select from all of the PLL?_PER taps, and a few other special sources

20:43 <libv> so basically they were approaching the same wall from the other side

20:43 <libv> that's how fglrx did it, and the forked driver actually went with that as well

20:43 <libv> s/the other side/another angle/

20:44 <libv> i think that code is in drm kms as well

20:46 <clever> the secondary problem that comes from that 108mhz thing...

20:46 <libv> plls are fun, but at least for display, you can measure them

20:46 <clever> the VPU normally runs at 500mhz

20:46 <clever> but the only nearby multiples of 108 are 432 and 540

20:46 <libv> but depends on the same parent clock?

20:46 <clever> yeah

20:46 <clever> 540 would be an overclock, so the official firmware ignores that
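
The two candidate frequencies follow directly from rounding 500 to the neighbouring multiples of 108. A short C sketch of that arithmetic; both helper names are hypothetical:

```c
#include <stdint.h>

/* Largest multiple of step_mhz that does not exceed limit_mhz. */
uint32_t multiple_at_or_below(uint32_t limit_mhz, uint32_t step_mhz)
{
    return (limit_mhz / step_mhz) * step_mhz;
}

/* Smallest multiple of step_mhz that is strictly above limit_mhz. */
uint32_t multiple_above(uint32_t limit_mhz, uint32_t step_mhz)
{
    return (limit_mhz / step_mhz + 1) * step_mhz;
}
```

For a 500mhz limit and a 108mhz step these return 432 and 540, matching the numbers quoted above.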

20:46 <libv> sunxi is fun that way too

20:46 <clever> and there have been bugs, where 432mhz was too slow for the audio driver to function

20:47 <clever> so composite broke the pwm audio

20:47 <libv> hehe

20:47 <libv> have you managed to test 540MHz?

20:47 <clever> ive not gotten things to work reliably above 500mhz

20:48 <libv> ok

20:48 <clever> but ive not touched overvolting at all

20:48 <clever> all voltages are still at the reset default

20:48 <libv> which then runs into the territory of "how good is my chip, and for how long"

20:48 <clever> yep

20:49 <libv> for the fosdem project, we are running a 6.something MHz 320x240 panel

20:49 <libv> the sunxi display driver is pretty messed up with respect to clocks

20:50 <libv> so both the pixel clocks of the 2 pipelines would depend on the same parent

20:50 <clever> seen the gertvga board?

20:50 <clever> https://uk.pi-supply.com/products/gert-vga-666-hardware-vga-raspberry-pi

20:50 <libv> looks like a resistor based dac

20:50 <clever> exactly

20:51 <clever> https://github.com/librerpi/lk-overlay/blob/master/platform/bcm28xx/dpi/dpi.c

20:51 <clever> lines 26-50 define the timing parameters for 2 vga modes

20:51 <clever> 89-100 then computes the divider needed to hit the desired vsync rate

20:51 <clever> and boom, i have VGA output on the rpi
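
The linked dpi.c is not reproduced here, so the following is only a sketch of the usual relationship it would rely on: the pixel clock is total pixels per line times total lines times the refresh rate, and the divider is the parent clock over that. The function names and the round-to-nearest policy are assumptions, not taken from lk-overlay:

```c
#include <stdint.h>

/* Pixel clock needed for a mode, using totals that include blanking. */
uint64_t pixel_clock_hz(uint32_t htotal, uint32_t vtotal, uint32_t refresh_hz)
{
    return (uint64_t)htotal * vtotal * refresh_hz;
}

/* Nearest integer divider from parent_hz to pixel_hz.
 * Returns 0 for a zero pixel clock, and never less than 1 otherwise. */
uint32_t nearest_divider(uint64_t parent_hz, uint64_t pixel_hz)
{
    if (pixel_hz == 0)
        return 0;

    uint64_t div = (parent_hz + pixel_hz / 2) / pixel_hz;
    return div ? (uint32_t)div : 1;
}
```

For the common 640x480 timing with 800x525 totals at 60hz this asks for 25.2mhz; from a 500mhz parent the nearest integer divider is 20, which lands at 25mhz, slightly under the requested rate.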

20:52 <libv> anyway, once i had my capture stuff being displayed correctly, for some reason it ran a lot faster on the pipeline with lcd

20:52 <clever> ive confirmed it can do 120hz on a real crt

20:52 <clever> but my vga lcd wont accept 120hz

20:52 <libv> turned out, i had been running that poor lcd at like 25MHz for 6 months or so

20:52 <clever> lol

20:52 <libv> it didn't care

20:52 <libv> but still

20:52 <libv> that's a display capital office

20:52 <libv> offence

20:55 <libv> clever: so do you have to buy a whole range of different smd resistors, or is it all the same one?

20:55 <clever> an array of different ones

20:55 <libv> or does the board actually come fully populated, unlike what is seen in the pictures

20:56 <clever> it comes as a kit

20:56 <clever> for EMI reasons, they cant sell it assembled, because it entirely lacks shielding

20:56 <libv> ah, right, clicked through more pictures

20:56 <libv> we have switching power supplies

20:56 <clever> but if they just sell you a pile of resistors and connectors, they arent liable :P

20:56 <libv> no-one actually still cares about emi

20:57 <libv> oh, through hole

20:57 <libv> but smd is also possible it seems

20:57 <clever> yeah, the pcb supports both

20:57 <clever> the kit i bought was thru-hole

20:57 <clever> the schematics are on github

20:58 <libv> clever: for fosdem video hw we were talking about using the ti tfp401 to drive the LCD in the fosdem video box from a future sbc which has only hdmi out

20:58 <libv> during the 2018 video weekend get-together

20:58 <libv> then my brain got triggered to take it one step further

20:59 <clever> one of the things i need to work on more, is the display list management code

20:59 <libv> to use this dvi to parallel chip to fool the parallel camera interface

20:59 <libv> and then looked into it

20:59 <libv> and so that's what we did

20:59 <clever> ah, ive seen some people abusing the camera interface on the rpi in a few ways

20:59 <clever> there are ntsc and hdmi capture chips, that output CSI

21:00 <clever> one group made a parallel to csi converter with an fpga

21:00 <libv> turns out that some clever people had already tried that with sunxi hw

21:00 <libv> but they had not figured out the quirks of the sunxi capture engine, so they could not get it to sync up right

21:00 <clever> and the csi interface on the rpi itself, just takes the raw bayer bytes, and shoves them into a configured ringbuffer

21:00 <clever> and its up to the irq handler to move that buffer and not overwrite the frame

21:01 <libv> but with mipi csi you have tons of good options

21:01 <clever> https://github.com/librerpi/lk-overlay/blob/master/platform/bcm28xx/hvs/hvs.c#L684-L740

21:01 <libv> like displayport

21:01 <clever> line 691 will note the start of this new display list

21:01 <libv> hdmi -> parallel is doable, mipi-csi is plenty

21:01 <clever> line 693-699 will generate each sprite in the list

21:01 <clever> 701 inserts an end of list marker

21:01 <libv> for displayport, you only get mipi-csi

21:01 <clever> 733 detects when you're nearing the end of the list, and wraps back to the start
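
The wrap-back behaviour described for the list builder amounts to a bump allocator over a fixed region. A minimal C sketch, assuming 4kb of dlist memory addressed as 32bit words; `DLIST_WORDS`, `struct dlist_cursor` and `dlist_reserve` are hypothetical names, not the ones used in hvs.c:

```c
#include <stdint.h>

#define DLIST_WORDS 1024u /* assumption: 4kb of dlist memory in 32bit words */

struct dlist_cursor {
    uint32_t next; /* index of the next free word */
};

/* Reserve `words` consecutive words, wrapping to the start when the request
 * would run past the end. Returns the start index, or UINT32_MAX when the
 * request can never fit. */
uint32_t dlist_reserve(struct dlist_cursor *c, uint32_t words)
{
    if (words == 0 || words > DLIST_WORDS)
        return UINT32_MAX;

    if (c->next + words > DLIST_WORDS)
        c->next = 0;

    uint32_t start = c->next;
    c->next += words;
    return start;
}
```

Nothing in this sketch tracks whether the region being wrapped over is still being scanned out, so it only works while a single list owns the whole space.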

21:02 <clever> the bug, is that this has no support for 2 lists co-existing in the same space

21:02 <clever> so dual-monitor support breaks very quickly

21:02 <clever> and there is currently no way to know how big a list will be ahead of time

21:03 <clever> i can either partition it up into 2 smaller sub-sections, or pre-create the dlist in system ram, then find a free space and copy it over

21:03 <libv> until you figure out the kernels better?

21:03 <clever> kernels are unrelated

21:03 <clever> its more of an malloc type problem

21:03 <libv> ok

21:03 <clever> i have 4kb of ram, and i want to add a 300 byte object to it

21:04 <clever> on every frame, i need to allocate 300 bytes, and free 300 bytes

21:04 <clever> but sometimes the size will change

21:04 <clever> and there are 2 separate displays, with vsync's that arent synced

21:05 <libv> and you need at least 1 for any display pipeline

21:05 <libv> +n for extra sprites

21:05 <clever> one object thats actively being visible, one thats being created, one that just stopped being visible

21:05 <clever> maybe 1 more that is waiting for a vsync pageflip

21:05 <clever> so 4 copies per display potentially

21:06 <libv> oh, right

21:06 <clever> having a pending copy already in the dlist memory, means a pageflip on irq is just 1 mmio write

21:06 <clever> changing the start pointer for the hw

21:06 <libv> so your 585? quickly boils down to ideally 120

21:06 <libv> on the other hand, this is display stuff

21:07 <libv> you have 16.666ms

21:07 <clever> the length of each object will be ((sprites * 7) + 1) x 32bit words
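
As a worked version of that size estimate: 7 words per sprite (the figure given earlier for 1:1 scale non-planar sprites) plus 1 word for the end-of-list marker, each word being 32 bits. The macro and function names in this C sketch are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

#define WORDS_PER_SPRITE 7u /* 1:1 scale, non-planar sprite */
#define END_MARKER_WORDS 1u /* end-of-list marker */

/* Display list length in 32bit words for a given number of sprites. */
size_t dlist_words(size_t sprites)
{
    return sprites * WORDS_PER_SPRITE + END_MARKER_WORDS;
}

/* Same length expressed in bytes. */
size_t dlist_bytes(size_t sprites)
{
    return dlist_words(sprites) * sizeof(uint32_t);
}
```

Ten such sprites come to 71 words, or 284 bytes. Scaled or planar sprites need more words each, so this is a lower bound rather than a general formula.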

21:07 <libv> you can spend some time keeping lists in memory and copying/dmaing it in

21:08 <clever> yeah

21:08 <libv> so you need at most 2 per pipeline in there

21:08 <libv> 2 lists

21:08 <libv> still tough, but not as insane as 3 or 4

21:09 <libv> and how much time does a full page copy take to that mmio region?

21:09 <clever> ive not measured it

21:09 <libv> would you be able to do it in the vblank after sync?

21:09 <libv> vsync

21:09 <clever> i was doing the entire dlist write, and the pageflip, in vsync

21:09 <clever> and it was finishing maybe 15 scanlines late

21:10 <clever> so there was a stable tear at the top of the screen

21:10 <clever> but my new design pre-writes the dlist outside of vsync, and queues up a pageflip on the next vsync

21:10 <libv> start creating the list on vblank, you said you get an interrupt on that

21:10 <clever> so it only has to write 1 register upon the irq, and then its free for the entire frame

21:10 <libv> even better

21:11 <clever> after it does that flip, it also pokes the scheduler, to unblock every thread waiting for vsync

21:11 <clever> the scheduler then runs those threads, which mutate state, and queue up the next dlist for the next vsync
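
The deferred flip can be sketched as a pending pointer that the vsync irq commits with a single register write. In this C sketch the register is modelled as a plain pointer so it can run anywhere; `struct display`, `queue_flip` and `vsync_irq` are hypothetical names, and waking the waiting threads is left to the caller:

```c
#include <stdbool.h>
#include <stdint.h>

struct display {
    volatile uint32_t *dlist_start_reg; /* stand-in for the hw start pointer */
    uint32_t pending_start;             /* dlist offset written ahead of time */
    bool flip_pending;
};

/* Called outside vsync, after the new dlist is fully written. */
void queue_flip(struct display *d, uint32_t dlist_start)
{
    d->pending_start = dlist_start;
    d->flip_pending = true;
}

/* Called from the vsync irq: one register write, then done.
 * Returns true when a flip was committed. */
bool vsync_irq(struct display *d)
{
    if (!d->flip_pending)
        return false;

    *d->dlist_start_reg = d->pending_start;
    d->flip_pending = false;
    return true;
}
```

The cost of this split is the one noted in the discussion: whatever gets queued is based on state from the previous vsync, so the pipeline carries up to one frame of latency.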

21:11 <libv> right

21:11 <clever> but now you have up to 1 frame of latency in the whole pipeline

21:12 <libv> indeed

21:12 <clever> because you're based on state from the previous vsync, not the current vsync

21:12 <libv> but you know when the next vsync should come

21:12 <clever> yep

21:12 <libv> if that changes, your dlist has become invalid anyway

21:12 <clever> https://github.com/librerpi/lk-overlay/blob/master/platform/bcm28xx/hvs/hvs.c#L373-L405

21:13 <clever> this code was saving the timestamp of every video related irq

21:13 <clever> so i could measure things

21:13 <libv> right

21:13 <libv> is vblank start good enough?

21:13 <clever> if you only want pageflip, yeah

21:13 <libv> for creating the dlist?

21:13 <libv> ok

21:13 <clever> but with hsync, you can start doing fancy effects

21:14 <clever> if you slide something left/right on each hsync, you can create certain types of shearing effects

21:14 <libv> vga sends its regards

21:14 <libv> iirc, palette was flipped on hsync to allow more colours

21:14 <juri_> my C64 sends its regards. :P

21:14 <libv> i'm sure 80s consoles and such did the same

21:14 <libv> yes

21:15 <clever> systems without a v-scaler also move a sprite up/down on hsync

21:15 <libv> i am too young for c64 :)

21:15 <clever> to stretch/squish it

21:15 <libv> i never get to say those words anymore

21:15 <libv> except for perhaps things like hip replacements

21:15 <juri_> libv: I did my tour of duty on video hardware from the VIC to the ATI rage ][. :)

21:16 gsz has left #linux-sunxi [#linux-sunxi]

21:16 <libv> all i ever had as a kid was an i386 :(

21:16 <juri_> now i just sling plastic instead of pixels. :)

21:17 <libv> juri_: rage 2 was the generation before the 128, right? or was there something in between

21:18 <libv> rumour has it that the 128 was heavily influenced by tseng, as ati had bought up tseng in like 99 or 2000

21:18 <juri_> libv: it was. I implemented SVGAlib support for it.

21:19 <libv> and iirc, the rage-128 was significantly different from the rages before, and the leap to radeon was not too huge from there

21:19 <libv> heh, we are definitely in old far territory now :)

21:19 <libv> fart even

21:19 <juri_> yeah, i was pulled into the dotcombomb after that. :)

21:23 <libv> i actually debated in 2018 whether i would add flashrom support for pci based ati devices

21:23 <juri_> a lot of them were socketed. could have come in handy 20 years back. :)

21:23 <libv> but decided to not go there, especially as i did not want to mess with agp either (i have working x86-64 hw with agp still), and to stick to pci-e devices only

21:25 <libv> iirc, ethernet and sata controllers were used for that then

21:27 <libv> there is actually someone talking about directfb at fosdem

21:27 <libv> working on reviving it

21:28 <libv> hah, typical

21:28 <clever> libv: as for the problems with the open kms drivers on the open firmware....

21:28 <juri_> I'm working on something that'll make a good talk, once covid is gone. I'm writing a 3d printing slicer in haskell.

21:28 <clever> any attempt to write to any register in the 2d hw, results in an async external abort exception on the arm core

21:29 <libv> recent github svgalib repo: https://github.com/sauparna/svgalib

21:29 <libv> no history of course.

21:29 <juri_> Wow. that brings you back.

21:29 <libv> why do people keep doing that?

21:29 <clever> libv: i had talked to engineers repeatedly, and every single time they claim its a power domain problem, but my demo videos above claim otherwise

21:29 <libv> clever: indeed

21:29 <libv> clever: a mapping issue perhaps

21:30 <clever> but while i was helping some amiga emulator guys with scaling, i stumbled upon a hint

21:30 <libv> find out how that range is mapped

21:30 <clever> SCALER_DISPECTRL_SECURE_MODE

21:30 <libv> ah

21:30 <clever> that sounds like the answer

21:30 <libv> fun.

21:30 <clever> but i havent gotten around to testing it

21:33 <libv> clever: but the kms driver has access to it...

21:33 <libv> so when kms is loaded, you cannot access it? or...

21:33 <clever> the firmware likely sets that bit before linux boots

21:33 <clever> to unlock access from the arm

21:34 <libv> yeah, broadcom is all about openness

21:34 <libv> remember when the rpi was hailed as the project to get kids coding on hw again?

21:34 <clever> i suspect its a legacy security feature

21:34 <clever> they are still claiming that is their goal

21:34 <clever> and i suspect thats also partially behind their choices on what gets opened

21:34 <libv> one of the biggest loads of bs ever

21:34 <clever> how many kids are going to do ddr4 init?

21:35 <libv> some.

21:35 <libv> clever: when i did lima, my plan was to focus on everything but the shader isa, as i knew that enough people would be way keen to work on that so some useful people would turn up

21:36 <clever> oh, and do you know about how the VPU also has vector extensions?

21:36 <libv> and someone did, and when i wanted to send him some amlogic based tablet, he said "i will have to ask my parents first"

21:36 <libv> "some" is a success

21:37 <libv> and for rpi, you basically have none.

21:38 <libv> it's pretty much been boiled down to a more powerful arduino which can double up as a settopbox

21:38 <clever> the vector extensions are pretty powerful

21:38 <clever> for (int i=0; i<16; i++) { int temp = a[i] * b[i]; if (store) c[i] = temp; if (accumulate) accumulator[i] += temp; }

21:38 <clever> basically, i can run this entire thing in 2 clock cycles, at 500mhz
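
For reference, the one-line loop above written out as a compilable scalar function. This is only a model of the behaviour described, not VPU code: the 32bit element width, the `LANES` constant and the name `vec_mul` are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define LANES 16

/* Scalar model of a 16-lane multiply with optional store and accumulate. */
void vec_mul(const int32_t a[LANES], const int32_t b[LANES], int32_t c[LANES],
             int32_t accumulator[LANES], bool store, bool accumulate)
{
    for (int i = 0; i < LANES; i++) {
        int32_t temp = a[i] * b[i];

        if (store)
            c[i] = temp;
        if (accumulate)
            accumulator[i] += temp;
    }
}
```

On an ordinary cpu this loop costs at least 16 multiplies plus the loads and stores, which is what makes the quoted 2 clock cycles for the whole thing notable.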

21:44 <juri_> I took over a project, and instead of fixing it, i wrote documentation, assuming others would read it, and use it to fix the software. big mistake.

21:44 <clever> ive been doing a mix of docs, example code, and making it functional

23:32 pcBob has quit [Remote host closed the connection]