ChanServ changed the topic of #linux-sunxi to: Allwinner/sunxi development - Did you try looking at our wiki? https://linux-sunxi.org - Don't ask to ask. Just ask and wait for an answer! - This channel is logged at https://oftc.irclog.whitequark.org/linux-sunxi
bauen1 has quit []
ZenWalker has quit [Read error: Connection reset by peer]
apritzel has quit [Ping timeout: 480 seconds]
cnxsoft has joined #linux-sunxi
sunshavi has quit [Remote host closed the connection]
gamiee2 has joined #linux-sunxi
JohnDoe_71Rus has joined #linux-sunxi
apritzel has joined #linux-sunxi
apritzel has quit [Ping timeout: 480 seconds]
<montjoie> does anyone know where can I find full datasheet of EMAC ? I tried A10 and A20 and I see nothing.
pcBob has joined #linux-sunxi
gsz has joined #linux-sunxi
apritzel has joined #linux-sunxi
cnxsoft has quit [Ping timeout: 480 seconds]
<apritzel> montjoie: does the R40 manual help? AFAICS this describes both, and the "EMAC" in there looks like the EMAC from the A20
<apritzel> aggi: I smeared up the decompressor's head.S with debug instructions, and got one step further: I now see: "Uncompressing Linux..."
<apritzel> this was not showing up before because U-Boot's SCTLR_EL1 setup is not 100% compatible with AArch32's SCTLR version
sunshavi has joined #linux-sunxi
<montjoie> apritzel: thanks, it has everything
<apritzel> is there any *board* without an A10 using that IP? I see that the A20 and R40 have the EMAC as well, but do any of the boards with those SoCs allow using the EMAC (vs. the GMAC)?
<montjoie> according to dtb, very few boards use sun4i-emac
<NekoMay> gamiee: Last I checked, Cubieboard was using Actions Semiconductor SoCs now
<NekoMay> ARM ones, not the MIPS ones they were (in)famous for
<NekoMay> Lots of those Actions MIPS SoCs in cheap emulator handhelds
<NekoMay> Same place the F1Cx00s is super popular
<NekoMay> Though Miyoo has moved on to SigmaStar now
<NekoMay> Oh, and Mediatek
<NekoMay> Though Mediatek is for their new fancy offering
<NekoMay> SigmaStar is for their low-cost/tiny offering
<NekoMay> You mentioned Rockchip invasion, well, they managed to muscle into there, too; lots of Quad A7 and Quad A35 handhelds these days
cnxsoft has joined #linux-sunxi
gamiee2 has quit [Ping timeout: 480 seconds]
<libv> is actsem even adhering to the gpl?
<libv> or is it like realtek?
<libv> last i checked, no communities exist for those
<libv> exynos also was dead in the water
<libv> amlogic is probably not too much better
<plaes> there's ton of upstreaming happening for amlogic
gsz has quit [Read error: Connection reset by peer]
gsz has joined #linux-sunxi
<apritzel> isn't BayLibre taking care of Amlogic?
<montjoie> apritzel: yes
<libv> oh, ok
<pcBob> Is there a repository with USB working for F1C100s?
<apritzel> pcBob: can't you just download the mbox from patchwork, and then "git am" it?
<pcBob> How do I know what kernel version this applies to?
<apritzel> latest, I guess?
<apritzel> but I guess you could apply it to anything fairly recent
<pcBob> It fails for a modified 5.11 kernel which added some other hardware. I took a look into the current linux master branch and the dts file for F1C100s didn't even include SPI. So, even though I might be able to apply it I would throw away SPI etc.
<pcBob> To me it looks like bits and pieces everywhere
<apritzel> pcBob: yes, but the patches are fairly easy, actually
<apritzel> not sure why you play around with 5.11, though
<apritzel> that's not even LTS
<apritzel> you should be able to apply the patches manually
<pcBob> the modified kernel is 5.11 (it is not from me, I am just using it) and as far as I can tell it has a lot more hardware support than linux main
<pcBob> looks to me like a merge conflict
<apritzel> a lot more unreviewed and hackish code, you mean?
<pcBob> Maybe, I don't know. But it seems like no changes have happened to F1C100s in mainline for 3 years or so
<pcBob> and no recent changes are supported officially
<apritzel> well, if nobody submits, reviews and tests, then not much will happen
<apritzel> if you look at the patches, the changes are fairly self contained, basically adding lines or structs to existing data
<apritzel> you should be able to find the respective places in your kernel
<apritzel> and then just hope that no significant changes happened to the MUSB or PHY driver in the last year
<apritzel> or you go real and just use 5.15 or 5.16-rc8
<pcBob> sounds like pain to me
<apritzel> pcBob: what? using a newer kernel or manually applying the patches?
<pcBob> manually applying the patches of USB and then also the changes the 5.11 custom kernel did
<apritzel> well, you are now seeing one of the problems of those non-mainline trees: adding stuff or applying patches becomes more and more painful
<pcBob> didn't Icenowy Zheng do a lot for F1C100s? I don't see these changes being mainlined
<apritzel> pcBob: those are mostly her patches, just sent by someone else
<pcBob> there is a 4.14 kernel with USB but it fails building with my buildroot
<apritzel> so the whole series applied cleanly on top of v5.11.22 for me
<apritzel> just downloaded the file given to me by the "series" link in the top right corner of patchwork, and "git am"'ed it
<apritzel> chances are the actual USB/PHY driver changes apply cleanly on your kernel as well, it's probably just the DT nodes that clash?
<apritzel> pcBob: for the DT changes: you can literally just copy out the diff from patch 4/5, and add it to the end of your version of suniv-f1c100s.dtsi
<apritzel> same with patch 5/5
<pcBob> yes the dts files are clashing. Sorry to ask this but is this enough to get USB devices like mass storage or WLAN dongles working?
<pcBob> I see some USB drivers being loaded but neither my USB stick nor my WLAN dongle are recognized
<apritzel> do you have the MUSB and the PHY drivers compiled in or loaded?
<apritzel> there is only the OTG controller on this SoC, AFAICS
cnxsoft has quit []
<aggi> apritzel: i can test the changes with a kernel-version/config i am certain is functional with my pine64; if the loader jumps to "decompressing linux" this would help me a lot
<apritzel> aggi: actually it just booted for me, after I removed all the debug cruft from head.S
rajkosto has joined #linux-sunxi
<apritzel> with one tiny U-Boot patch to setup SCTLR_EL1 differently for AArch32
<apritzel> no kernel changes, just enabling COMPILE_TEST and selecting the pinctrl and clock drivers
<apritzel> (minus SMP, as expected)
<aggi> i see, PSCI/SMP aren't available then?
<apritzel> no, and I think that's a hard problem, because the spec is nasty in this respect
<aggi> can't find my way through u-boot sources either, to see what needs changes for SCTLR_EL1 with aarch32
<aggi> apritzel: nonetheless, i would like to test your diff if it is available.
rajkosto has quit [Read error: Connection reset by peer]
<apritzel> aggi: but please: this is just a hack, because I was curious: you should seriously fix your build environment and bring it into the 21st century
<apritzel> gcc 4.7.4 is beyond evil, even my old Slackware is on 5.5.0
<apritzel> but for cross-compiling the version of your native compiler shouldn't matter much anyway
<aggi> apritzel: i was thinking to downgrade gcc even further, because gcc-4.7 contains C11 features already; and i am aiming to replace both gcc and clang with another C99 compiler
<apritzel> why?
<aggi> otherwise, i already succeeded compiling kernel v5.10 with gcc-4.7 and an entire userspace required to proceed with gentoo
<apritzel> I mean if you don't want newer standards, just say so on the command line
<aggi> apritzel: why? 1) for the sake of C90/99 compliance itself, 2) the desired compilers are known to be up to 10x faster, which is practical, and 3) GCC versions later than 4.7 are written in c++
<apritzel> or bake your own compiler, and set the default standard to 1969, if you like
<aggi> surprisingly, even some basic interpreters from 1970s and legacy 8/16bit systems had some appeal
<aggi> apritzel: i see a constant is loaded into a "model specific register", what does this constant encode?
<apritzel> this is way too much retro for my taste, I consider even kernels from last year "old"
<apritzel> or from 2020, I guess
<apritzel> and there was no glorious past, just bad keyboards and next-to-useless computers ;-)
<aggi> i am not opposing any kernel or compiler upgrade, i reject the fact a c++ compiler (meaning a compiler written in c++ itself) is required since recently, to compile GNU/Linux
<apritzel> honestly I don't see how the implementation language is relevant, and C++ is not really niche
<aggi> if this is too much off-topic here then simply say, otherwise i explain
<apritzel> I don't like C++ much, but mostly because my time is limited, and I stopped caring about GCC compilation requiring g++ decades ago
<aggi> i am not rejecting c++; i reject c++ in critical base system components (kernel, compiler itself)
<aggi> furthermore, g++ is *slow*, and kicked my pine64 board into OOM regularly, at least gcc didn't
<apritzel> well, then you have to bite the bullet and design your own eco system
<apritzel> but this won't be a walk in the park
<aggi> what caught my attention was the claim that legacy c90/c99 compilers are 10x faster in compilation times, and this alone would be a *huge* benefit
<apritzel> claim, you said it
<aggi> i know, a lot of dirty hackjobs are required; reviving gcc-4.7.4 with kernel v5.10LTS required some time for testing etc.
<apritzel> I still don't understand how this is relevant: the kernel isn't written in ANSI C, you need a GCC or enough compatible compiler to build it
<apritzel> you can't just expect to take a random C compiler and compile the kernel
<aggi> sure, officially kernel v5.10 required gcc-4.9 (the first one written in c++ and supporting aarch64 btw.); and with only some minor changes in header files kernel v5.10LTS compiled nicely with gcc-4.7
<aggi> furthermore, C90/99 compliance isn't just any "random compiler", it's a standard
<apritzel> yes, but the kernel is written in "kernel C", not standard C
<aggi> linux kernel was rather conservative with compiler requirements (gcc-3.2 for a very long time by today's standards): up until kernel v4.19LTS/v5.10LTS and recently again v5.15 bumped compiler version
<apritzel> and for very good reasons
<apritzel> there are so many many things that require recent compilers, alone "asm goto" for instance is a killer feature
<aggi> i'm no asm hacker tbh.; and even if i would practice asm hacking on Z80 era systems, certainly not ARM
<aggi> anyway
<apritzel> I wouldn't recommend that, ARM is considered one of the best assembly languages / architectures to be programmed in assembly
<aggi> apritzel: what does this constant encode 0x00c00878 ?
<apritzel> Z80 is just some crufty stuff because technology was limited back then ;-)
<apritzel> that's some version of the ARMv7 reset value to SCTLR
<apritzel> I need to pick some brains in the office later this week on the exact details
<apritzel> aggi: the current U-Boot code is just setting up the AArch64 version of SCTLR_EL1, which has some subtle differences to the AArch32 version
<apritzel> and the Linux decompressor code does a read-modify-write of that register, so any nasty bits already set stick
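A minimal sketch of why a read-modify-write lets stale bits "stick", as described above. The M, C and I bit positions follow the AArch32 SCTLR layout; the stale bit and the helper names are hypothetical and only illustrate the effect, not what U-Boot or the Linux decompressor actually write.

```python
# Illustrative model only, not U-Boot or kernel code.
ARMV7_SCTLR_RESET = 0x00C00878  # value quoted in the discussion above

SCTLR_M = 1 << 0   # MMU enable
SCTLR_C = 1 << 2   # data cache enable
SCTLR_I = 1 << 12  # instruction cache enable


def read_modify_write(current: int, set_bits: int, clear_bits: int = 0) -> int:
    """Return the register value after clearing and then setting bits."""
    return (current & ~clear_bits & 0xFFFFFFFF) | set_bits


def enable_mmu_and_caches(current: int) -> int:
    """Only touch M, C and I; every other bit keeps its inherited value."""
    return read_modify_write(current, SCTLR_M | SCTLR_C | SCTLR_I)


# Hypothetical bit left set by an earlier boot stage.
STALE_BIT = 1 << 25
inherited = ARMV7_SCTLR_RESET | STALE_BIT

after_rmw = enable_mmu_and_caches(inherited)             # stale bit survives
after_clean = enable_mmu_and_caches(ARMV7_SCTLR_RESET)   # known starting point

print(hex(after_rmw), hex(after_clean))
```

The design point is that the result of a read-modify-write depends on whatever the previous stage left behind, so the earlier stage has to hand over a value that is sane for the mode the next stage runs in.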
<aggi> apritzel: currently compiling some things again (reminder why i need a fast compiler on my pine64 boards), i'll notify about any progress
<apritzel> aggi: you just need a faster board, really ;-) You could try an RPI4, or even QEMU on some decent PC
<aggi> lots of other bugs and quirks, with the compiler downgrade; otherwise, i can confirm aarch32 userspace binaries can be executed on aarch64 kernel with CONFIG_COMPAT
<apritzel> aggi: and you brought those problems on yourself, by choosing Gentoo: now you pay the price ...
<aggi> the pine64 board i got is among the fastest SBC ever built (that's none of my complaints)
<aggi> it's not a gentoo-specific issue either
<apritzel> aggi: which world are you living in? A quad A53 is probably one of the most annoying build platforms
<apritzel> my ARM64 desktop at work is doing a recent kernel in around 1min40sec
<aggi> cannot complain about A53 and A72; also, i fiddled together various helper scripts and configured distcc with gentoo tooling
<aggi> the limitation mainly was available RAM and g++ hitting OOMs regularly (another reason i downgrade to gcc-4.7 because no c++ required to bootstrap this)
<apritzel> so why not just get an RPi4 or some RK3399 board with 4GB or more of DRAM?
<apritzel> or even use QEMU ;-)
<aggi> RPI contain some nasty firmware on the videocore which i reject, i got a pine64 board with rk3399 which is fast (that's not the issue and even with 4GiB i encountered OOMs with g++)
<aggi> and i react allergic to "virtualization", i try to avoid it whenever possible
<apritzel> I figured OOMs during compilation are mitigated by reducing parallelism (less -j), or by using a real machine
<aggi> this is the plan here: i try to keep the base system clean, nothing written in or depending on c++; and with gentoo i can emerge anything else into a chroot with a cross-compiler elsewhere, if desired
<aggi> ideally, the base-system contains everything required to work and hack, and anything else is dumped elsewhere flatpak style, i'll not care about anymore
<aggi> if the system compiler, such as tinycc or pcc is 10x faster even, any ARM SBC can outcompete almost anything else, with a power budget of 5W instead of 500W
<aggi> didn't expect C11 poisoned kernel already, and who knows what else
<apritzel> wow, so you reject recent GCCs and virtualisation, but embrace flatpak. That's ... interesting
<aggi> i said flatpak style, not flatpak
<apritzel> still, this whole concept raises eyebrows here
<aggi> gentoo can emerge into any desired system root (with cross compilers even), which is what i'll do to keep my system fast and clean
<aggi> and, i got a kernel v5.10 and recent userspace compiled already, with gcc-4.7; lots of cleanup todo still, before i'll downgrade further
<apritzel> sure, do whatever you want, I just figure that there are enough real problems to solve, no need to invent new ones ;-)
<jernej> speaking of speedy kernel compilation, did anyone notice the 2000+ patches for a big reduction of compile time?
<jernej> I wonder how this will be reviewed :)
<apritzel> yeah, read about it, sounds quite scary
<apritzel> [PATCH 0000/2297]
<aggi> i can post the patchset here to compile kernel v5.10LTS with a legacy compiler; then see for yourself who invented problems, with C11 _Generic for example
<aggi> this is what GNU/Linux did: they dropped C90/C99 compliant compiler support (gcc-3.2 until kernel v4.18), and instead favored llvm/clang (who introduced _Generic for example, and who knows what else)
<aggi> this one compiles with both gcc-4.7(aarch32) and gcc-8.5(aarch64,aarch32), and boots with aarch64 already (couldn't test aarch32 yet because lack of hardware): https://dpaste.com/DB75YSA57
ftg has quit [Ping timeout: 480 seconds]
chewitt has joined #linux-sunxi
macromorgan has joined #linux-sunxi
gamiee2 has joined #linux-sunxi
<juri_> aggi: fwiw, the pi's firmware problem is a non-problem, if you don't want HDMI. in practice, the VC4 is more capable than the CPU, it's just that everyone likes their blobs...
<juri_> Free Software raspberry pi configs are a thing.
gamiee2 has quit [Remote host closed the connection]
gsz has quit [Ping timeout: 480 seconds]
JohnDoe_71Rus has quit []
gsz has joined #linux-sunxi
<libv> juri_: can the rpi now boot without the RTOS?
<juri_> yep.
<libv> url?
<juri_> even better, you can sub in your own, and use it to harness the spare CPU cycles.
<libv> oh, this is news to me
<libv> was this at all supported by either the rpi foundation or broadcom?
<libv> or was this purely community?
<juri_> absolutely not. more of anti-supported.
<libv> (i am the guy who pointed out this binary mess back in 2012 btw, when they announced their "open source" driver)
<juri_> it's purely community, over in #raspberrypi-internals on libera
vagrantc has joined #linux-sunxi
<libv> right
<libv> so there is a good reason to still dislike the rpi
<libv> but good job
<juri_> tell him. i just cheerlead. :)
<juri_> (the main dev nowadays is 'clever'.)
<libv> the thing is, such work is never rewarded
<juri_> he does some cool stuff with it, and will talk your ear off if you let him. :)
<libv> he will be blackballed most places as well
* juri_ nods.
<juri_> he's happy where he is.. and i'd hire him. :P
<libv> it's a bit late for that now, but that guy should definitely talk at fosdem
<juri_> I think he's in the wrong timezone.
<juri_> i think he's over at kowainik.. ?
<libv> clever: well done :)
<juri_> oh. he's here. :P
<libv> build in some printing which states just how averse rpi foundation and broadcom are to this work
<libv> it's not as if they will ever support you in their or your lifetime
rajkosto has joined #linux-sunxi
tuxd3v has quit [Read error: Connection reset by peer]
<clever> that would be nixos (a linux distro) booting on an rpi, with the open firmware on the VPU
tuxd3v has joined #linux-sunxi
chewitt has quit [Quit: Zzz..]
tuxd3v_ has joined #linux-sunxi
<libv> so the animation is the gpu and an overlay/sprite being moved around, in the background
tuxd3v has quit [Read error: Connection reset by peer]
<clever> yep
<clever> the 2d subsystem is basically a sprite-only gpu
<clever> if you want anything visible, it must be a sprite
<clever> the text console on the top is a stationary sprite, with the VPU debug logs
<clever> the bottom text console is mapped to /dev/fb0, and linux does text+gfx on it
<libv> so this is not a display engine thing
<libv> it's a blit
<clever> you configure a list of sprites in a special MMIO region, and the hardware will automatically fetch image data and alpha-blend them together
<clever> generating a stream of pixels in raster order
<clever> but that stream is very bursty, so it goes thru a FIFO to regulate it down to the pixel clock, and that then feeds the output PHY
<libv> so closer to an advanced display engine thing than a classical 2d engine/blitter
<clever> it also does scaling on the fly
<clever> and it supports planar and yuv formats
<clever> so you can just throw a yuv420 sprite at the 2d subsystem, and it will render, with zero cpu cost
<libv> right, like with what allwinner calls the mixer for their DE engine
<clever> the display list is 4096 x 32bit long on the VC4 lineup (pi0-pi3)
<libv> 4k entries?
<clever> a non-planar sprite at 1:1 scale takes up 7 x 32bit slots
<clever> so while you have 4096 slots, you need 7 slots for each sprite, giving a limit of 585 initially
<clever> but the hardware malfunctions if you modify an active display list
<libv> ok
<clever> so you need 2 lists, and do page-flips between them, so halve that
<libv> still insane
<clever> which reduces you down to 292 sprites
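The sprite budget worked out above can be reproduced with a few lines of arithmetic. This is only a sketch of the numbers quoted in the conversation (4096 slots, 7 slots per 1:1 non-planar sprite, two lists for page flipping); the function name is made up for illustration.

```python
DLIST_WORDS = 4096       # 32-bit slots in the display list (VC4, per the chat)
WORDS_PER_SPRITE = 7     # slots for a non-planar sprite at 1:1 scale


def max_sprites(dlist_words: int, words_per_sprite: int, double_buffered: bool) -> int:
    """Upper bound on sprites that fit, ignoring memory bandwidth limits."""
    usable = dlist_words // 2 if double_buffered else dlist_words
    return usable // words_per_sprite


print(max_sprites(DLIST_WORDS, WORDS_PER_SPRITE, double_buffered=False))  # 585
print(max_sprites(DLIST_WORDS, WORDS_PER_SPRITE, double_buffered=True))   # 292
```

Scaled or planar sprites need more slots each, so the real ceiling would be lower than this bound.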
<clever> but, you then have memory bandwidth issues to deal with
<libv> i thought allwinners DE was insane with (1)+4+32 per pipe
<clever> in this demo, i have 20 sprites active
<clever> but if too many sprites are on the same scanline, the rasterization falls behind, and the FIFO runs dry
<libv> clever: btw: drm planes only allows 32 planes
<clever> causing the glitching you can see
<clever> the smaller the sprites, the more you can have
<clever> avoiding overlap and alpha helps a lot
<libv> jesse barnes was told to solve the power issue of wayland vs hwcomposer, and he threw planes together as a result, on intel hw that had just lost almost all of its overlays
<libv> so he chose a 32bit mask for the plane identifier.
<clever> its basically just a hw accelerated memcpy, with pixel format conversion and alpha-blending on the dest
<clever> so if you have 2 sprites on a scanline, it has to copy 2 ranges
<libv> clever: does it go to memory, or does it scan out?
<clever> and if you have overlap, it will waste cycles drawing something you cant see
<clever> there is a dedicated chunk of fifo ram
<libv> if it goes to memory: 2d engine/blitter
<clever> and for each video output, you define the start/end range in that ram, to create a fifo
<libv> if it scans out, then it's part of the display engine
<clever> the scan-out then reads from that fifo
<clever> the hardware supports up to 3 video outputs at once, each with its own private fifo
<clever> but there is only 1 composition engine
<clever> which gets time-shared over the 3 displays
<libv> explain time-shared
<clever> it will just round-robin between generating scanlines for each display
<libv> from the same list
<clever> different lists
<libv> right, but one engine, so bandwidth goes x3
<libv> ok
<clever> yep
<clever> also, if you dont cover a region of the screen with a sprite, it just doesnt write to that pixel in the fifo
<clever> so you get whatever was in that fifo slot on the last pass
<clever> and depending on the ratio between fifo length and scanline length, it gives different repeating patterns
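A toy model of the stale-FIFO effect described above, assuming the FIFO behaves like a simple ring buffer in which uncovered pixels advance the position without writing. That assumption, the function name and the tiny sizes are all illustrative; the real hardware details are undocumented.

```python
def scan_out(fifo_len: int, line_len: int, lines: int, sprite_cols: set) -> list:
    """Simulate scanlines passing through a ring FIFO.

    Columns in sprite_cols are written with a per-line tag (line number + 1);
    all other columns leave the slot untouched, so scan-out sees old data.
    """
    fifo = [0] * fifo_len
    pos = 0
    frame = []
    for y in range(lines):
        row = []
        for x in range(line_len):
            slot = pos % fifo_len
            if x in sprite_cols:
                fifo[slot] = y + 1
            row.append(fifo[slot])
            pos += 1
        frame.append(row)
    return frame


# FIFO longer than a scanline: uncovered columns show the previous line's sprite.
print(scan_out(fifo_len=6, line_len=4, lines=3, sprite_cols={0, 1}))
# FIFO length equal to the scanline: uncovered columns never get written at all.
print(scan_out(fifo_len=4, line_len=4, lines=3, sprite_cols={0, 1}))
```

In this model the visible pattern depends only on how the scanline length wraps around the FIFO length, which matches the observation that different ratios give different repeating patterns.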
<libv> but then, if the display pipelines are at different frequencies or offset, then the blanking space of one or more displays is not a deadzone
<clever> yeah, you need some headroom for composition to not stall out
<clever> the FIFO length can hide short stalls
<clever> but a lot of these details are in docs broadcom wont share
<libv> of course
<clever> so i'm having to mis-configure the hw, and then guess from how it malfunctions
<clever> there is an optional background fill you can set
<libv> but don't be surprised if $vendor has not fully tested the limits of such bandwidth issues
<clever> but that also costs bandwidth
<clever> and there is also a dedicated transposer for offline composition
<libv> not on the fetching side, and not with scaling
<libv> ok, so that then blits to memory
<clever> the transposer can do a 90 degree rotation, but consumes 1 video output channel
<clever> yeah, it writes the rotated image back to ram
<clever> and the rotation step is optional
<libv> are the pipelines freely switchable?
<clever> so you can just compose 400 sprites together, and then have a 2nd channel just render the result
<libv> between display "outputs"
<clever> entirely bypassing the bandwidth problems
<clever> for the VC4 lineup, each channel is hard-wired to 2 potential video outputs
<clever> and only 1 of those outputs can be on at once
<libv> heh
<clever> PV0 can only drive dsi0 or dpi
<clever> PV1 can only drive DSI1 or SMI
<clever> PV2 can only drive HDMI or VEC (composite, ntsc, pal)
<libv> yeah
<clever> pick one from each
<libv> so rpi basically does not do dual monitor?
<clever> but on the bcm2711, there is a mux (lines 29-33), that basically lets you remap things freely
<clever> the rpi can do dual and even triple monitor
<clever> DPI + DSI1 + HDMI is possible under linux with the closed firmware
<clever> ive gotten DPI + NTSC working on the open firmware
<clever> DSI and HDMI init is something i still need to figure out
<libv> oh, each channel can toggle between 2 potential video outputs
<libv> right
<clever> yeah
<clever> the pixel valve defines all of the video timing parameters
<clever> hsync, h backporch, h frontporch, hactive, vsync, vfront, vback, vactive, and an optional duplicate v set for the odd field
<clever> when interlacing is active
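The timing parameters listed above combine into line and frame totals in the usual way. The sketch below uses made-up field names (not the pixel valve's register names) and the common 640x480 at 60 Hz mode as sample numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timing:
    """One axis of a video mode, in pixels (horizontal) or lines (vertical)."""
    active: int
    front_porch: int
    sync: int
    back_porch: int

    @property
    def total(self) -> int:
        return self.active + self.front_porch + self.sync + self.back_porch


def refresh_hz(pixel_clock_hz: float, h: Timing, v: Timing) -> float:
    """Frames per second for a progressive mode."""
    return pixel_clock_hz / (h.total * v.total)


# Sample values: the widely used 640x480 mode with a 25.175 MHz pixel clock.
h = Timing(active=640, front_porch=16, sync=96, back_porch=48)
v = Timing(active=480, front_porch=10, sync=2, back_porch=33)

print(h.total, v.total, round(refresh_hz(25_175_000, h, v), 2))  # 800 525 59.94
```

An interlaced mode would carry a second vertical set for the odd field, as mentioned above; that case is left out here.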
<libv> i have not followed the drm driver of videocore
<clever> each PV also has a mux on it, to select the pixel clock
<libv> does that talk to the blob, or is that native code?
<clever> the fkms drivers in linux talk directly to the hardware, but rely on the blob to enable power
<clever> oops, no backwards!
<clever> fkms is the firmware kms, it talks exclusively to the blob
<clever> the kms (no f) drivers talk directly to the hw
<clever> ive created most of these demos and docs by reading the kms source in linux
<libv> but they probably depend on some setup having been done by the firmware
<clever> yeah
<clever> ive not been able to get the kms drivers to run blob-free yet, but have discovered what may be the key to unlocking that
<clever> that linux demo i linked above, is just using the dumb simple-framebuffer api, and having my custom firmware configure it as a sprite
<libv> right
<clever> let me get an example of the 2d api i designed from scratch
<clever> that is the code behind https://invidious.snopyta.org/watch?v=suswjbpR1HU
<clever> line 15 creates a bitmap image for the grid, 18-30 draws the grid on that image
<clever> 31-35 creates a sprite with that image, and sets the xywh
<clever> 38 makes the sprite visible, and 39 updates the screen on the next vsync
<clever> 53 will wait for vsync, 55-72 changes the w/h, and 75 updates the display list
<clever> and 85 runs grid_entry() as a new thread on bootup
<clever> you could then use that code to create a sprite based game
<libv> right, with this many sprites that probably could speed up many an emulator
<clever> the PV can also do interrupts on vsync, hsync, and even the porches
<libv> clever: is the pv implemented in any kernel code already upstream or soon to be reaching upstream at this point?
<clever> its already in mainline linux i believe
<libv> as anything clever with display or with 2d engines is something that is mostly ignored
<libv> ok
<clever> thats the file youll find it in
<libv> ok
<clever> i also took the original 3d demos from the pi1 days (they ran via /dev/mem) and ported them over
<clever> that spinning triangle with 3 colors, was done using shaders on the 3d core
<libv> no planes implemented it seems
<libv> just the base
<clever> ?
<libv> ah, they are added separately
<clever> oh, and the 3d core is capable of emitting images with alpha present
<clever> so that spinning triangle, is on a transparent background
<clever> which lets the 2d core compose it together with other sprites
<libv> 16 planes :)
<libv> uint32_t for all the planes per kms instance
<clever> i believe that 16k is for bcm2711, but i'm not entirely sure
<libv> 16, not 16k
<clever> line 1499
<libv> a uint32_t is a mask for identifying planes, each plane owning 1 bit of that
<clever> ahhh
<clever> that explains why linux has a limit of 32
<libv> so the kms plane infrastructure allows for max 32 planes
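The 32-plane ceiling follows directly from giving each plane one bit of a 32-bit mask. A small sketch of that idea, with hypothetical helper names rather than the kernel's actual API:

```python
MAX_PLANES = 32  # one bit per plane in a 32-bit mask


def plane_mask(index: int) -> int:
    """Bit that identifies plane number `index` in the mask."""
    if not 0 <= index < MAX_PLANES:
        raise ValueError(f"plane index {index} does not fit in a 32-bit mask")
    return 1 << index


def planes_in(mask: int) -> list:
    """Indices of all planes whose bit is set in `mask`."""
    return [i for i in range(MAX_PLANES) if mask & (1 << i)]


active = plane_mask(0) | plane_mask(5) | plane_mask(31)
print(hex(active), planes_in(active))  # 0x80000021 [0, 5, 31]
```

A 33rd plane has no bit to claim, which is why hardware with hundreds of possible sprites cannot expose them all as planes under this scheme.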
<clever> in LK, each display just has a linked list of visible sprites
<libv> again, 2011, jesse, who could not give a shit about display drivers, probably got told to fix the power discrepancy with android
<clever> and the code will just iterate over the list, and convert it into a hw display list
<libv> and intel had just dumped most of the overlays from its hw
<clever> yeah, ive seen that people just use opengl for this now
<libv> the second i held my lima talk in 2012, codethink rented me out to my former nokia manager, who had then joined intel (i was not allowed to join for the usual political reasons)
<libv> i got tasked with writing hwcomposer on top of kms planes before ville worked on atomic
<libv> and i actually had to get a netbook as that had an intel display engine with more than 1 extra overlay
<libv> this was the mantra of the 00s: we can do it all on the 3d engine
<libv> before 2011, i was the last person in the xorg sphere to touch overlays properly
<clever> for the rpi, the 3d engine is in its own power domain, and can be shut off when not needed
<libv> when i re-implemented xvideo for the unichrome
<libv> clever: as is true for almost anything
<clever> that reminds me, i think xorg lacks proper support for yuv420 output
<libv> the gpu can do it, but it takes a lot of work, and a lot of ramp-up
<clever> so any time you're doing video under xorg, you often wind up converting to rgb in software or opengl
<libv> and you incur a massive bandwidth overhead
<clever> having the xvideo extension over a kms plane, would greatly reduce the cpu usage on an rpi
<clever> yeah, same for the rpi 3d
<clever> first, textures on the 3d core are not in a linear raster order
<clever> they are in a wonky tile format
<libv> as you wait for the gpu to stop rendering to memory, and then flip the buffer so the display can scan it out
<clever> so you must convert from linear to tile format
<clever> also, the 3d core cant snoop on the arm cache, so you must flush the arm cache
<libv> if you do it in the display engine, with dma and fifos in between, you consume bandwidth as you go
<anarsoul> libv: it's OKish for GL
<clever> then the 3d core can render the scene, ram->ram overall
<clever> then the 2d pipeline has to read from ram, and do its fifo and output stuff
<clever> but if you just tossed a yuv420 image at the 2d pipeline, it would just read it once, and you're done
<libv> anarsoul: but if the display engine can compose on the fly, it's a lot better
<libv> same thing for a blitter
<anarsoul> libv: true
<libv> and the setup of such engines is trivial
<anarsoul> but userspace isn't ready :P
<clever> for 1:1 scale non-planar sprites on a vc4 era chip, you just write 7 x 32bit to the dlist like this, and you're basically done
<anarsoul> sunxi de supports yuv planes, but AFAIK it's used only by kodi
<libv> because userspace is full of people who would only state "we can do all of that on the 3d engine" (which they usually label gpu, which has become a very ethereal term)
<libv> anarsoul: the fosdem video project uses it extensively
<libv> it's why i am so amused by the 32 plane limit :)
<anarsoul> libv: because on x86 you've got enough memory bandwidth to waste
<libv> anarsoul: you never have memory bandwidth to waste
<anarsoul> and no company wants to spend engineering resources to fix it for SBCs
<libv> but that's an unpopular viewpoint
<libv> anarsoul: there was a reason why i focussed on the command stream for lima
<anarsoul> moreover even A64 has enough CPU power and memory bandwidth to smoothly decode and display (via GL YUV->RGB conversion) 1080p H264
<libv> i knew someone else would be mad keen on playing with shaders and the isa of them
<libv> poking at display engines is comparatively boring
<clever> anarsoul: but might battery usage be lower if you're doing that all in hw?
<anarsoul> clever: of course :)
<libv> playing with shaders or back when i started doing display stuff, fixed function 3d engines, everyone loves that!
<libv> it's why my work on modesetting was treated as it was
<libv> people just did not care
<libv> until they did, and then they handily forgot where it came from
<anarsoul> meh, modern devices waste *a lot* of CPU cycles and memory bandwidth for the work that could be done in hw
<libv> anarsoul: back in late 2010, my team at nokia went to visit imagination
<libv> nokia was using the sgx 530
<libv> and there was no arm gpu competition at the time
<libv> that was all img
<libv> and the sgx 530 had a 2d render library, which of course did the full setup of the 3d engine to emulate the brilliance of a blitter
<libv> and we got the usual presentation of the upcoming hw, the rogue
<libv> guess what suddenly had appeared again on the hw diagram
<libv> a blitter :)
<libv> turns out, having a simple bit of hw to copy some blocks of memory with a specific layout from one point, to another point with possibly another layout, is really really handy
<clever> one major feature that broadcom is still keeping a tight grip on, is the ISP
<clever> it does exactly what you said, and more
<libv> again, everyone had just thrown away such hw around mid 2000
<clever> the ISP deals with bayer->rgb conversion, computing stats, and doing image correction
<libv> there's a reason why android had hwcomposer this early on
<clever> but the ISP can also do rgb->yuv acceleration
<libv> clever: so that's mainly aimed at camera
<libv> clever: right
<clever> mostly camera, but it also ties into the h264 encoder
<clever> so the ISP can do rgb->yuv for screen capture, for example
<libv> our hdmi capture solution for fosdem grabs 24bit rgb, but outputs to 3 separate planes
<libv> so it's like yuv444, but without conversion
<libv> and nothing, absolutely nothing seems to use planar rgb
<libv> not drm, not v4l
<libv> our display engine can do it
<libv> well, the original allwinner display engine can
<libv> and the mixer processor/2d engine of the original allwinner can also do the planar to sequential conversion...
<clever> this is every pixel format the rpi's 2d subsystem can support
<libv> if you feed it a unity matrix
<clever> the planar ones just accept 2 or 3 separate addr+stride pairs
<clever> so each plane can be in an entirely different region of memory
<libv> oh, cool
<clever> each plane also has its own scale factors
<libv> we have a second user :)
<clever> so the color planes dont have to be at the usual ratio to luma
<libv> right, feed it a unity matrix and bob is your uncle
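A minimal sketch of what "planar to sequential" means here, assuming three 8-bit planes that each have their own base offset and stride inside one flat buffer. The function name and the memory layout are hypothetical and are not taken from the rpi or sunxi hardware.

```python
def planar_to_packed(mem, planes, width, height):
    """Interleave three 8-bit planes into packed RGB bytes.

    mem    -- bytes-like object holding all planes (hypothetical layout)
    planes -- three (base_offset, stride) pairs, one per colour plane
    """
    out = bytearray()
    for y in range(height):
        for x in range(width):
            # Each plane is addressed independently: base + y * stride + x.
            for base, stride in planes:
                out.append(mem[base + y * stride + x])
    return bytes(out)


# 2x2 image; the planes live in different regions and use different strides.
mem = bytearray(64)
mem[0:2], mem[4:6] = b"\x01\x02", b"\x03\x04"        # R plane, base 0, stride 4
mem[16:18], mem[24:26] = b"\x11\x12", b"\x13\x14"    # G plane, base 16, stride 8
mem[40:42], mem[42:44] = b"\x21\x22", b"\x23\x24"    # B plane, base 40, stride 2
packed = planar_to_packed(mem, [(0, 4), (16, 8), (40, 2)], 2, 2)
```

Because every plane carries its own base and stride, nothing forces the planes to be adjacent or equally padded, which is the property clever describes.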
<clever> when a certain scaling mode is enabled (upscaling i think it was), you must specify the addr for 4 scaling kernels
<clever> one pair is the h/v scaling for the luma, the other pair is the h/v scaling for the chroma
<clever> and this is the raw scaling kernel, which must live in the same memory as the display list
<clever> i have zero docs, and linux only ever uses 1 global kernel, so i dont know what exactly its doing
<libv> a kernel here being the block of commands needed to get operations done on the pixel valve
<clever> i think its less of a set of commands, you need to manually mirror the numbers
<libv> yeah, that's going to be painful, but at least you can pipe to memory
<clever> i think its just a raw set of coeffs to mult each pixel by
<libv> that's the pain of display engines, with 3d engines you can always exactly measure the end result
<clever> there are 3 ways to measure the output of the 2d engine
<clever> DPI is just a raw 24bit digital output bus, with a pixel clock to latch off of
<libv> true
<clever> some HDMI capture cards can do a bit-perfect capture
<clever> the transposer can just write the result back to system ram
<libv> the last is the easiest here
<libv> and you do not depend on external hw or a redesigned board
<clever> yep
<clever> another fun problem to deal with is clock ratios
<clever> the ntsc/pal generator needs a pure 108mhz clock to function correctly
<clever> you cant use fractional division
<clever> so the input you're dividing down from the PLL must be a multiple of 108
<libv> depends on how the pll is built
<libv> on radeon r500 it was a simple integer divider
<libv> if memory serves
<clever> for the rpi, it is fractional capable divisors nearly everywhere
<libv> and on r700 it allowed fractions there too
<clever> so you can make a clean 1ghz from 19.2mhz for example
<clever> but you cant divide 1ghz down to a clean 108mhz
<libv> so you could divide even finer
<libv> so it all depends on how the clock generator is built
<clever> its far simpler to divide 1.08ghz down to 108mhz
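The arithmetic behind the 1ghz versus 1.08ghz point can be checked with a short sketch. The function name is made up for illustration; the frequencies are the ones quoted in the chat.

```python
from fractions import Fraction


def integer_divider(parent_hz, target_hz):
    """Return the integer divisor that yields target_hz exactly, or None."""
    ratio = Fraction(parent_hz, target_hz)
    return ratio.numerator if ratio.denominator == 1 else None


# 1 GHz / 108 MHz = 250/27, so no integer divider exists.
from_1ghz = integer_divider(1_000_000_000, 108_000_000)
# 1.08 GHz / 108 MHz = 10, a clean integer divide.
from_1080mhz = integer_divider(1_080_000_000, 108_000_000)
# The PLL itself may multiply by a fraction: 1.08 GHz / 19.2 MHz = 225/4 = 56.25.
pll_multiplier = Fraction(1_080_000_000, 19_200_000)
```

The fractional part ends up in the PLL multiplier, where it is allowed, instead of in the divider feeding the ntsc/pal generator.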
<libv> clever: then there's another bit of fun that comes in
<clever> the rpi has 5 PLL's in it
<libv> pll internal loop max frequency
<clever> each multiplies 19.2mhz by a different factor
<libv> on some processes, it's not a measurable issue
<libv> on other processes, it becomes an issue
<libv> so then you need to go map out the limit
<clever> yeah
<clever> linux defines PLLC as having a range of 600mhz to 3ghz
<libv> which is how i have killed many a CRT
<libv> as they are better than an oscilloscope if you still have a vga output
<clever> ive pushed PLLC down to ~150mhz i think, but get it any lower and it just wont lock, and it bottoms out and just ignores the settings
<clever> also, above 1.75ghz, the counter for the divider cant function
<clever> so you have to enable a dedicated /2 stage
<libv> clever: ati ran into such issues with the display port enabled r600 asics
<clever> each PLL has between 1 and 4 taps on it
<clever> which divide it back down to something
<libv> so some display modes would wobble or not sync at all with their pll factor calculation routine
<clever> for example, PLLC_CORE0, PLLC_CORE1, PLLC_CORE2 and PLLC_PER (peripherals)
<libv> the solution: keep a table with asic, display mode (not frequency mind), and then one of the factors would be counted down instead of up during the calculation
<clever> each peripheral then has a mux, that can select from all of the PLL?_PER taps, and a few other special sources
<libv> so basically they were approaching the same wall from the other side
<libv> that's how fglrx did it, and the forked driver actually went with that as well
<libv> s/the other side/another angle/
<libv> i think that code is in drm kms as well
<clever> the secondary problem that comes from that 108mhz thing...
<libv> plls are fun, but at least for display, you can measure them
<clever> the VPU normally runs at 500mhz
<clever> but the only nearby multiples of 108 are 432 and 540
<libv> but depends on the same parent clock?
<clever> yeah
<clever> 540 would be an overclock, so the official firmware ignores that
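A small sketch of how the 432 and 540 figures fall out, assuming the VPU clock must be an integer multiple of 108mhz. The helper name is hypothetical.

```python
def neighbouring_multiples(target_mhz, step_mhz=108):
    """Return the closest multiples of step_mhz at or below and at or above target_mhz."""
    below = (target_mhz // step_mhz) * step_mhz
    above = below if below == target_mhz else below + step_mhz
    return below, above


vpu_options = neighbouring_multiples(500)
```

For a 500mhz target this gives 432 (an underclock) and 540 (an overclock), which matches the trade-off described in the chat.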
<libv> sunxi is fun that way too
<clever> and there have been bugs, where 432mhz was too slow for the audio driver to function
<clever> so composite broke the pwm audio
<libv> hehe
<libv> have you managed to test 540MHz?
<clever> ive not gotten things to work reliably above 500mhz
<libv> ok
<clever> but ive not touched overvolting at all
<clever> all voltages are still at the reset default
<libv> which then runs into the territory of "how good is my chip, and for how long"
<clever> yep
<libv> for the fosdem project, we are running a 6.something MHz 320x240 panel
<libv> the sunxi display driver is pretty messed up with respect to clocks
<libv> so both the pixel clocks of the 2 pipelines would depend on the same parent
<clever> seen the gertvga board?
<libv> looks like a resistor based dac
<clever> exactly
<clever> lines 26-50 define the timing parameters for 2 vga modes
<clever> 89-100 then computes the divider needed to hit the desired vsync rate
<clever> and boom, i have VGA output on the rpi
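The code clever refers to is not quoted in the log, so the following is only a sketch of the usual calculation, not his implementation. It assumes the common 640x480 VGA totals (800 clocks per line, 525 lines per frame) and a purely hypothetical 108 MHz parent clock.

```python
def pixel_clock_hz(h_total, v_total, refresh_hz):
    """Pixel clock needed to scan h_total x v_total clocks refresh_hz times a second."""
    return h_total * v_total * refresh_hz


def divider_for(parent_hz, h_total, v_total, refresh_hz):
    """Divider (possibly fractional) from parent_hz down to the needed pixel clock."""
    return parent_hz / pixel_clock_hz(h_total, v_total, refresh_hz)


# 640x480 visible, 800x525 total, 60 Hz vsync target.
clock = pixel_clock_hz(800, 525, 60)
# Hypothetical 108 MHz parent; the real source clock is not stated in the chat.
divider = divider_for(108_000_000, 800, 525, 60)
```

Doubling the refresh rate doubles the pixel clock for the same totals, so a 120hz mode needs half the divider.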
<libv> anyway, once i had my capture stuff being displayed correctly, for some reason it ran a lot faster on the pipeline with lcd
<clever> ive confirmed it can do 120hz on a real crt
<clever> but my vga lcd wont accept 120hz
<libv> turned out, i had been running that poor lcd at like 25MHz for 6 months or so
<clever> lol
<libv> it didn't care
<libv> but still
<libv> that's a display capital office
<libv> offence
<libv> clever: so do you have to buy a whole range of different smd resistors, or is it all the same one?
<clever> an array of different ones
<libv> or does the board actually come fully populated, unlike what is seen in the pictures
<clever> it comes as a kit
<clever> for EMI reasons, they cant sell it assembled, because it entirely lacks shielding
<libv> ah, right, clicked through more pictures
<libv> we have switching power supplies
<clever> but if they just sell you a pile of resistors and connectors, they arent liable :P
<libv> no-one actually still cares about emi
<libv> oh, through hole
<libv> but smd is also possible it seems
<clever> yeah, the pcb supports both
<clever> the kit i bought was thru-hole
<clever> the schematics are on github
<libv> clever: for fosdem video hw we were talking about using the ti tfp401 to drive the LCD in the fosdem video box from a future sbc which has only hdmi out
<libv> during the 2018 video weekend get-together
<libv> then my brain got triggered to take it one step further
<clever> one of the things i need to work on more, is the display list management code
<libv> to use this dvi to parallel chip to fool the parallel camera interface
<libv> and then looked into it
<libv> and so that's what we did
<clever> ah, ive seen some people abusing the camera interface on the rpi in a few ways
<clever> there are ntsc and hdmi capture chips, that output CSI
<clever> one group made a parallel to csi converter with an fpga
<libv> turns out that some clever people had already tried that with sunxi hw
<libv> but they had not figured out the quirks of the sunxi capture engine, so they could not get it to sync up right
<clever> and the csi interface on the rpi itself, just takes the raw bayer bytes, and shoves them into a configured ringbuffer
<clever> and its up to the irq handler to move that buffer and not overwrite the frame
<libv> but with mipi csi you have tons of good options
<libv> like displayport
<clever> line 691 will note the start of this new display list
<libv> hdmi -> parallel is doable, mipi-csi is plenty
<clever> line 693-699 will generate each sprite in the list
<clever> 701 inserts an end of list marker
<libv> for displayport, you only get mipi-csi
<clever> 733 detects when you're nearing the end of the list, and wraps back to the start
<clever> the bug, is that this has no support for 2 lists co-existing in the same space
<clever> so dual-monitor support breaks very quickly
<clever> and there is currently no way to know how big a list will be ahead of time
<clever> i can either partition it up into 2 smaller sub-sections, or pre-create the dlist in system ram, then find a free space and copy it over
<libv> until you figure out the kernels better?
<clever> kernels are unrelated
<clever> its more of a malloc type problem
<libv> ok
<libv> ok
<clever> i have 4kb of ram, and i want to add a 300 byte object to it
<clever> on every frame, i need to allocate 300 bytes, and free 300 bytes
<clever> but sometimes the size will change
<clever> and there are 2 separate displays, with vsync's that arent synced
<libv> and you need at least 1 for any display pipeline
<libv> +n for extra sprites
<clever> one object thats actively being visible, one thats being created, one that just stopped being visible
<clever> maybe 1 more that is waiting for a vsync pageflip
<clever> so 4 copies per display potentially
<libv> oh, right
<clever> having a pending copy already in the dlist memory, means a pageflip on irq is just 1 mmio write
<clever> changing the start pointer for the hw
<libv> so your 585? quickly boils down to ideally 120
<libv> on the other hand, this is display stuff
<libv> you have 16.666ms
<clever> the length of each object will be ((sprites * 7) + 1) 32bit words
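One of the two options clever mentions, partitioning the dlist memory per display, could look roughly like this. It is a sketch under stated assumptions: the class and function names are invented, the list size is read as 7 words per sprite plus one end marker, and nothing tracks whether the list being overwritten after a wrap has actually retired.

```python
DLIST_WORDS = 4096 // 4          # 4 KB of dlist memory, in 32-bit words


def dlist_words(sprites):
    # 7 words per sprite plus one end-of-list marker (as read from the chat).
    return sprites * 7 + 1


class RingRegion:
    """Bump allocator over [start, end) that wraps instead of straddling the end."""

    def __init__(self, start, end):
        self.start, self.end, self.next = start, end, start

    def alloc(self, words):
        if words > self.end - self.start:
            raise ValueError("display list larger than its region")
        if self.next + words > self.end:
            self.next = self.start      # wrap back to the start of the region
        offset = self.next
        self.next += words
        return offset


# One half of the memory per display, so the two vsyncs never contend.
half = DLIST_WORDS // 2
displays = [RingRegion(0, half), RingRegion(half, DLIST_WORDS)]
```

With 10 sprites a list is 71 words, so each 512-word half holds 7 lists before wrapping, which leaves room for the 4 live copies per display discussed above.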
<libv> you can spend some time keeping lists in memory and copying/dmaing it in
<clever> yeah
<libv> so you need at most 2 per pipeline in there
<libv> 2 lists
<libv> still tough, but not as insane as 3 or 4
<libv> and how much time does a full page copy take to that mmio region?
<clever> ive not measured it
<libv> would you be able to do it in the vblank after sync?
<libv> vsync
<clever> i was doing the entire dlist write, and the pageflip, in vsync
<clever> and it was finishing maybe 15 scanlines late
<clever> so there was a stable tear at the top of the screen
<clever> but my new design pre-writes the dlist outside of vsync, and queues up a pageflip on the next vsync
<libv> start creating the list on vblank, you said you get an interrupt on that
<clever> so it only has to write 1 register upon the irq, and then its free for the entire frame
<libv> even better
<clever> after it does that flip, it also pokes the scheduler, to unblock every thread waiting for vsync
<clever> the scheduler then runs those threads, which mutate state, and queue up the next dlist for the next vsync
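The deferred pageflip design can be modelled in a few lines. This is a simulation sketch only: the class, its attributes, and the callback style are hypothetical, and the "register write" is just an attribute assignment.

```python
class Display:
    def __init__(self):
        self.active_dlist = None     # stands in for the hw start-pointer register
        self.pending_dlist = None    # already written into dlist memory
        self.waiters = []            # callbacks blocked until the next vsync

    def queue_flip(self, dlist_offset):
        # Called outside vsync, after the list has been fully written.
        self.pending_dlist = dlist_offset

    def wait_vsync(self, callback):
        self.waiters.append(callback)

    def on_vsync_irq(self):
        # The only display work in the irq is one "register write" ...
        if self.pending_dlist is not None:
            self.active_dlist = self.pending_dlist
            self.pending_dlist = None
        # ... then wake everything that was waiting for this vsync.
        waiters, self.waiters = self.waiters, []
        for wake in waiters:
            wake(self)


d = Display()
d.queue_flip(71)
d.wait_vsync(lambda disp: disp.queue_flip(142))
d.on_vsync_irq()
```

After the first irq the list at offset 71 is live and the one at 142 is merely queued, so whatever the woken thread computed only becomes visible one vsync later.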
<libv> right
<clever> but now you have up to 1 frame of latency in the whole pipeline
<libv> indeed
<clever> because you're based on state from the previous vsync, not the current vsync
<libv> but you know when the next vsync should come
<clever> yep
<libv> if that changes, your dlist has become invalid anyway
<clever> this code was saving the timestamp of every video related irq
<clever> so i could measure things
<libv> right
<libv> is vblank start good enough?
<clever> if you only want pageflip, yeah
<libv> for creating the dlist?
<libv> ok
<clever> but with hsync, you can start doing fancy effects
<clever> if you slide something left/right on each hsync, you can create certain types of shearing effects
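A toy model of the per-hsync slide, assuming each scanline is a string and the offset applied on each hsync is a non-negative number of pixels. The function is illustrative and has nothing to do with the actual rpi registers.

```python
def shear_rows(rows, offsets, fill=" "):
    """Shift each scanline right by its own offset, padding with fill."""
    width = len(rows[0])
    out = []
    for row, off in zip(rows, offsets):
        # Pad on the left, then crop back to the original line width.
        out.append((fill * off + row)[:width])
    return out


sheared = shear_rows(["##..", "##..", "##.."], [0, 1, 2])
```

Increasing the offset by a fixed amount per line turns a vertical edge into a diagonal one, which is the shearing effect described.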
<libv> vga sends its regards
<libv> iirc, palette was flipped on hsync to allow more colours
<juri_> my C64 sends its regards. :P
<libv> i'm sure 80s console and such did the same
<libv> yes
<clever> systems without a v-scaler also move a sprite up/down on hsync
<libv> i am too young for c64 :)
<clever> to stretch/squish it
<libv> i never get to say those words anymore
<libv> except for perhaps things like hip replacements
<juri_> libv: I did my tour of duty on video hardware from the VIC to the ATI rage ][. :)
gsz has left #linux-sunxi [#linux-sunxi]
<libv> all i ever had as a kid was an i386 :(
<juri_> now i just sling plastic instead of pixels. :)
<libv> juri_: rage 2 was the generation before the 128, right? or was there something in between
<libv> rumour has it that the 128 was heavily influenced by tseng, as ati had bought up tseng in like 99 or 2000
<juri_> libv: it was. I implemented SVGAlib support for it.
<libv> and iirc, the rage-128 was significantly different from the rages before, and the leap to radeon was not too huge from there
<libv> heh, we are definitely in old far territory now :)
<libv> fart even
<juri_> yeah, i was pulled into the dotcombomb after that. :)
<libv> i actually debated in 2018 whether i would add flashrom support for pci based ati devices
<juri_> a lot of them were socketed. could have come in handy 20 years back. :)
<libv> but decided to not go there, especially as i did not want to mess with agp either (i have working x86-64 hw with agp still), and to stick to pci-e devices only
<libv> iirc, ethernet and sata controllers were used for that then
<libv> there is actually someone talking about directfb at fosdem
<libv> working on reviving it
<libv> hah, typical
<clever> libv: as for the problems with the open kms drivers on the open firmware....
<juri_> I'm working on something that'll make a good talk, once covid is gone. I'm writing a 3d printing slicer in haskell.
<clever> any attempt to write to any register in the 2d hw, results in an async external abort exception on the arm core
<libv> recent github svgalib repo: https://github.com/sauparna/svgalib
<libv> no history of course.
<juri_> Wow. that brings you back.
<libv> why do people keep doing that?
<clever> libv: i had talked to engineers repeatedly, and every single time they claim its a power domain problem, but my demo videos above claim otherwise
<libv> clever: indeed
<libv> clever: a mapping issue perhaps
<clever> but while i was helping some amiga emulator guys with scaling, i stumbled upon a hint
<libv> find out how that range is mapped
<clever> SCALER_DISPECTRL_SECURE_MODE
<libv> ah
<clever> that sounds like the answer
<libv> fun.
<clever> but i havent gotten around to testing it
<libv> clever: but the kms driver has access to it...
<libv> so when kms is loaded, you cannot access it? or...
<clever> the firmware likely sets that bit before linux boots
<clever> to unlock access from the arm
<libv> yeah, broadcom is all about openness
<libv> remember when the rpi was hailed as the project to get kids coding on hw again?
<clever> i suspect its a legacy security feature
<clever> they are still claiming that is their goal
<clever> and i suspect thats also partially behind their choices on what gets opened
<libv> one of the biggest loads of bs ever
<clever> how many kids are going to do ddr4 init?
<libv> some.
<libv> clever: when i did lima, my plan was to focus on everything but the shader isa, as i knew that enough people would be way keen to work on that so some useful people would turn up
<clever> oh, and do you know about how the VPU also has vector extensions?
<libv> and someone did, and when i wanted to send him some amlogic based tablet, he said "i will have to ask my parents first"
<libv> "some" is a success
<libv> and for rpi, you basically have none.
<libv> it's pretty much been boiled down to a more powerful arduino which can double up as a settopbox
<clever> the vector extensions are pretty powerful
<clever> for (int i=0; i<16; i++) { int temp = a[i] * b[i]; if (store) c[i] = temp; if (accumulate) accumulator[i] += temp; }
<clever> basically, i can run this entire thing in 2 clock cycles, at 500mhz
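A runnable restatement of the loop above, plus the throughput implied by "2 clock cycles, at 500mhz". Python integers do not model the lane width or overflow behaviour of the real VPU, and the function name is invented for this sketch.

```python
def vector_mul(a, b, c, accumulator, store=True, accumulate=False):
    """16-lane multiply with optional store and optional accumulate."""
    assert len(a) == len(b) == 16
    for i in range(16):
        temp = a[i] * b[i]
        if store:
            c[i] = temp
        if accumulate:
            accumulator[i] += temp


a, b = list(range(16)), [2] * 16
c, acc = [0] * 16, [1] * 16
vector_mul(a, b, c, acc, store=True, accumulate=True)

# 16 multiplies every 2 cycles at 500 MHz.
multiplies_per_second = 16 * 500_000_000 // 2
```

That works out to 4 billion multiplies per second from a single vector instruction stream, under the cycle count quoted in the chat.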
<juri_> I took over a project, and instead of fixing it, i wrote documentation, assuming others would read it, and use it to fix the software. big mistake.
<clever> ive been doing a mix of docs, example code, and making it functional
pcBob has quit [Remote host closed the connection]