<mangix>
__SANE_USERSPACE_TYPES__ must be defined before including
<rsalvaterra>
robimarko: I don't know… Ingo is one of the most respected/trusted (by Linus) developers, and people would kill for a 10 % improvement in build times. This series improves them from 50 to 80 %.
<rsalvaterra>
[PATCH 0000/2297], however…
<robimarko>
I am not doubting his competence and the great achievement
<robimarko>
But getting 2300 patches just for 4 arch-s
<robimarko>
Try reaching a consensus for how to move on with that
<rsalvaterra>
Pretty sure nobody is going to review each and every patch… :)
<rsalvaterra>
… but since bugs would probably manifest themselves as build failures, they should be quickly spotted.
<robimarko>
Yeah, it's basically impossible to review them
<mangix>
robimarko: the way forward is for linus to assume direct control and merge.
<robimarko>
mangix: Yeah, that is like the only way
<robimarko>
Otherwise you are gonna be hit with hundreds of opinions
<Borromini>
stintel: ping
<stintel>
pang
<rsalvaterra>
mangix: Oh, definitely. This is one of those last pull requests to be merged just before the tagging of a new release.
<Borromini>
stintel: hey. i just moved my erlite to 5.10. is there a way to reproduce the issues you were seeing with 5.10 on yours?
<stintel>
Borromini: just leave it on for a while
<Borromini>
stintel: ok. just the LAN connected is OK? or do i need to configure a wan/lan setup?
<stintel>
one of my SNIC10E shows increasing RAM usage, like few MB per day, without doing any routing
<stintel>
I was about to ask here, if anyone knows a way to find kernel memory leaks when kmemleak is not reliable
<Grommish>
stintel: That the octeon NIC?
<stintel>
yes
<Grommish>
5.10?
<stintel>
yes
<nitroshift>
brb, reboot
<Grommish>
Same issue I've been trying to trace
<stintel>
I know :)
nitroshift has quit [Remote host closed the connection]
<Grommish>
Bisect and Good Luck is the answer I got
nitroshift has joined #openwrt-devel
<stintel>
for that we need a better reproducer than few MB per day
<Grommish>
Route something
<Grommish>
I can crash my box in about an hour
<stintel>
only with routing ?
srslypascal_ has joined #openwrt-devel
srslypascal is now known as Guest10125
srslypascal_ is now known as srslypascal
<Grommish>
I can watch it climb on a simple free refresh hehe
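Watching usage climb can be made a little more systematic by sampling /proc/meminfo and computing a growth rate. The following is only a sketch: the choice of the SUnreclaim field (unreclaimable slab) as the thing to track is an assumption, and the helper names are made up for illustration.

```python
import time

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value} dict (values in kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values

def growth_kb_per_hour(samples, field="SUnreclaim"):
    """samples: list of (timestamp_seconds, meminfo_dict), oldest first."""
    (t0, first), (t1, last) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need two samples taken at different times")
    return (last[field] - first[field]) * 3600.0 / (t1 - t0)

def watch(interval=60, field="SUnreclaim"):
    """Sample /proc/meminfo forever and print the trend since the first sample."""
    samples = []
    while True:
        with open("/proc/meminfo") as f:
            samples.append((time.time(), parse_meminfo(f.read())))
        if len(samples) > 1:
            rate = growth_kb_per_hour(samples, field)
            print(f"{field}: {samples[-1][1][field]} kB, {rate:+.1f} kB/h")
        time.sleep(interval)
```

Calling watch() on the device while routing traffic would show whether the chosen field keeps growing; a steady positive rate on an otherwise idle box is the kind of signal discussed here.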
<Grommish>
When I turned the octeon_ethernet.ko off (as in, I didn't include it), no issue
<nitroshift>
dhewg, mangix same error in dmesg
<Grommish>
I removed ethernet and usb
<Grommish>
No leak under 5.10 or 5.15 with them off
<Grommish>
I was chasing a dnsmasq issue at the time
<mangix>
what error?
srslypascal has quit [Remote host closed the connection]
srslypascal has joined #openwrt-devel
<Grommish>
You were CC'd on the email to the maintainer and the response, so.. I never got any farther than that because I turned back to rust in frustration
Guest10125 has quit [Ping timeout: 480 seconds]
<stintel>
Grommish: yeah that response was quite useless
<Grommish>
but yeah, I yanked usb and ethernet out from staging drivers and no leak..
<Grommish>
Next step was to make them modular so I could load and unload and see which was doing it
<stintel>
we were pretty sure already the leak is network related
<stintel>
PaulFertser: would that release leaked memory you think?
<PaulFertser>
Grommish: and you can go to a specific driver subdirectory and you see a symlink there if any devices are bound to the driver. Then you can do "echo ... > unbind" and it should be about the same as unloading a module.
<PaulFertser>
stintel: depends on what the driver does on unbind.
<mangix>
rsalvaterra: don't really know what to say
<PaulFertser>
stintel: my point was that it's more about (un)binding than being modular.
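The bind/unbind approach can be sketched as follows. This is a minimal illustration, assuming the usual sysfs layout where a driver directory (for example somewhere under /sys/bus/<bus>/drivers/) holds one symlink per bound device plus an "unbind" file; the function names are hypothetical, and whether unbinding actually releases leaked memory depends on the driver, as noted above.

```python
import os

def bound_devices(driver_dir):
    """Return the names of devices currently bound to the driver at driver_dir."""
    names = []
    for entry in sorted(os.listdir(driver_dir)):
        path = os.path.join(driver_dir, entry)
        # Bound devices appear as symlinks. "module" is also a symlink,
        # but it points at the owning module rather than at a device.
        if os.path.islink(path) and entry != "module":
            names.append(entry)
    return names

def unbind(driver_dir, device):
    """Detach a device from the driver by writing its name to 'unbind'."""
    with open(os.path.join(driver_dir, "unbind"), "w") as f:
        f.write(device)
```

Unbinding one device at a time while watching memory usage would help tell which driver is responsible, without rebuilding the drivers as modules.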
<rsalvaterra>
mangix: Oh.
<mangix>
anyway, lgtm from former maintainer
<mangix>
who got irritated that i merged some patch without him knowing
<nitroshift>
mangix, just checked my build tree, it did pull your changes, error still exists
<stintel>
mangix: yes
<mangix>
nitroshift: doesn't sound right. I remember I tested this along with dango. The actual error is because of --enable-fastmath: DEPENDS:=@LINUX_5_10
<neggles>
Grommish / stintel: didn't I narrow down the problem with the octeon-ethernet driver to a spot where it may fail to free an skbuff
<Grommish>
neggles: Dunno, the one I linked is one that was in 5.4 but not in 5.10/5.10, but I've not tried it yet
<neggles>
yeah that gregkh commit is relevant iirc
<neggles>
gregkh tree commit*
<Borromini>
neggles: you got octeon stuff as well?
<neggles>
oh boy do i ever have octeon stuff.
<Borromini>
:P
<neggles>
I can't escape the wretched things
<nitroshift>
mangix, 5.10.89 even
<nitroshift>
just git pulled
<neggles>
it started with a USG-XG-8, then i found stintel and hurricos' shenanigans with the SNIC10E and figured hey, what's 50 bucks for a couple of NICs? how hard can it be?
<neggles>
*how hard can it be*
<Borromini>
:D :D
* neggles
sobs
<Borromini>
well i got an ER4... (new) and an ER Lite (second hand)
<neggles>
take the ER-Lite-3, and throw it the fsck away
<Borromini>
i keep it for testing purposes
<Borromini>
like now
* neggles
has... feelings... about the ERLite-3 and USG-3
<Borromini>
hehe
<neggles>
namely I feel that they are a waste of sand
<neggles>
the most frustrating part is, there is definitely an octeon SDK with kernel 5.4 in it in existence, but marvell won't cough up
<neggles>
you could theoretically do FCoIP acceleration with them, sure, but... there's a reason this exact variant never made it to market
<neggles>
they were built for AWS, using a quasiPHY with drivers that barely exist
<neggles>
the SNIC10E is probably better known as "The First-Generation AWS Nitro Card"
<Borromini>
well Marvell and FOSS... we all know how that went
<neggles>
they've been doing quite well with all their newer stuff, the Armadas and Presteras and the ARM octeons
<nitroshift>
marvell != foss
<nitroshift>
hint mwlwifi
<neggles>
the problem is that the MIPS Octeon SDK is an absolute fscking trainwreck
<Grommish>
Because it came from Cavium
<neggles>
because it came from a very, very long time ago
<neggles>
before Octeons were Octeons
<neggles>
back when they were NitroX cryptographic accelerators that just happened to have a MIPS core in there for management and coordination
<neggles>
cavium managed to scope creep themselves into an entire SoC
<neggles>
the hardware acceleration units? really good. APIs/ABIs are well documented, interfaces are a bit weird but the quirks are documented, and they're *fast*. but the MIPS processor and all its supporting bits and pieces? not so much
srslypascal has quit [Quit: Leaving]
<neggles>
even before marvell bought them, cavium were in the process of throwing all the MIPS out the window and replacing it with their ThunderX/ThunderX2 ARM cores
<Grommish>
MIPS64 is a pretty dead platform though, isn't it?
<neggles>
tell that to china
<Borromini>
yes seems the Cavium Octeon stuff is badly upstreamed if at all :(
<neggles>
Borromini: the problem is the entire architecture of the mips-octeon SDK; because these chips started out as intelligent accelerators, all the peripheral code/libraries/etc were built to run on baremetal in the form of the Cavium Simple Executive
<neggles>
huge chunks of CSE code ended up in drivers/staging and arch/mips/cavium-octeon just to make basic things work
<Borromini>
ok
<neggles>
disentangling all of that is a nightmare, the best way to do it would be to throw out all the code that relies on bits of CSE and rewrite/adapt it to use existing in-kernel interfaces, essentially rewriting all the drivers from scratch
<neggles>
in other words, the problem is that they didn't start clean when writing the linux driver support, they started with "running a linux kernel as one of multiple applications sharing the octeon SoC"
<neggles>
but yeah... MIPS octeon is effectively dead. it won't actually be dead for a while, there are still plenty of in-production and in-support devices running Octeon IIs and Octeon IIIs that will need to keep getting patches and updates
srslypascal has joined #openwrt-devel
<Borromini>
should have seen the writing on the wall with all the drivers in staging/
<neggles>
and supposedly marvell have hired another organization to manage that
<neggles>
hence why the kernel 5.4 SDK exists
<neggles>
even if they won't give it to me...
<neggles>
but I wouldn't be surprised if 5.4 is the last major kernel upgrade octeon mips ever sees
<neggles>
i'd settle for a functional NAND driver on *any* vaguely recent kernel, mind you :P
<Borromini>
:P
<stintel>
routing ~1.45Gbps over the SNIC10E does not seem to increase the memory usage
<neggles>
IIRC, Grommish ...didn't?... have CONFIG_NETFILTER turned on
<nbd>
rmilecki: will take a look when i find the time
<nbd>
hitech95: pong
fda- has joined #openwrt-devel
<rmilecki>
nbd: thanks, I wouldn't like to waste time on solution that will get rejected later :)
<nbd>
i have an alternative idea that is simpler
<nbd>
when creating rootfs, generate a file that contains a hash of the fs contents
<nbd>
embed that into the rootfs
<nbd>
when checking overlay fs state, check for the hash in a file on rootfs_data
<nbd>
if it's not present or does not match, assume that the fs needs to be wiped
<nbd>
no format changes needed
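The hash idea above can be sketched roughly like this. It is an illustration only and not fstools code: SHA-256, the file name rootfs.hash and all function names are assumptions. The hash is computed over the rootfs tree at build time, before the hash file itself is added to it.

```python
import hashlib
import os

def tree_hash(root):
    """Build side: hash relative paths and contents of all regular files under root."""
    h = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # fixed traversal order so the hash is reproducible
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            h.update(os.path.relpath(path, root).encode() + b"\0")
            with open(path, "rb") as f:
                h.update(f.read())
    return h.hexdigest()

def overlay_needs_wipe(rootfs_hash_file, overlay_hash_file):
    """Boot side: wipe if the overlay has no recorded hash or a different one."""
    with open(rootfs_hash_file) as f:
        expected = f.read().strip()
    try:
        with open(overlay_hash_file) as f:
            recorded = f.read().strip()
    except FileNotFoundError:
        return True
    return recorded != expected

def record_hash(rootfs_hash_file, overlay_hash_file):
    """After wiping, remember which rootfs this overlay belongs to."""
    with open(rootfs_hash_file) as f:
        digest = f.read().strip()
    with open(overlay_hash_file, "w") as f:
        f.write(digest + "\n")
```

Because the check only compares hashes, flashing an identical image a second time leaves the recorded hash matching, so the overlay would not be wiped in that case.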
<hitech95>
nbd: can I PM you? I have a couple of questions on how to properly expand netifd.
<nbd>
hitech95: sure
fda has quit [Ping timeout: 480 seconds]
<rmilecki>
nbd: oops, sorry, I didn't see your reply, reading now
<rmilecki>
nbd: that won't work for flashing the same image twice using bootloader
<rmilecki>
and I think that users expect newly-flashed firmware to always come in a "clean" (factory) state
<nbd>
what kind of boot loader upgrade mechanism are you using?
<nbd>
command line, or something web/tftp based without serial console access?
jlsalvador has quit [Quit: jlsalvador]
<rmilecki>
U-Boot
jlsalvador has joined #openwrt-devel
<rmilecki>
nbd: on Broadcom's U-Boot it's the easiest to send firmware image over http
<rmilecki>
nbd: http://192.168.1.1/ with a simple form: 1 file input and 1 submit input
<nbd>
so broadcom u-boot uses ubi?
<rmilecki>
it does
<nbd>
is it limited to overwriting kernel + rootfs, or can it overwrite any partitions included in an image
<rmilecki>
firmware format allows two entries only: https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/bcm4908/image/pkgtb-bcm4908.its
<rmilecki>
nbd: "nand_squashfs" gets written to the UBI volume
<rmilecki>
(bootfs gets written to the UBI volume)
<rmilecki>
other entries are ignored, you can't ask U-Boot to write another UBI volume (that would be too nice for us)
<rmilecki>
the same applies to flashing firmware using vendor's UI I believe
<rmilecki>
(in case you ask: bootfs is a container for ATF, kernel & DTB - if you care see https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/bcm4908/image/bootfs-generic.its )
<Borromini>
Grommish: i probably misunderstood, seems 673b41e04a035d760bc0aff83fa9ee24fd9c2779 is already in 5.10?
<nbd>
rmilecki: i guess another approach would be to reserve an extra byte at the end of rootfs
<nbd>
and use that as a marker for initialized fs
<rmilecki>
nbd: hm, should be OK... we don't even need any unique number of "rootfs_data" UBI volume
<nbd>
right
<rmilecki>
nbd: ok, I can't think of any issues with that, except maybe a bit hacky implementation
<rmilecki>
nbd: so basic question: is it worth it?
<rmilecki>
nbd: is it better than finally switching to initramfs & putting more complex booting logic there?
<rmilecki>
nbd: i see some implementation challenges: 1. where to put it 2. how to find it 3. how to modify it
<rmilecki>
append to kernel? append to squashfs? what about non-squashfs based targets if there are any
<nbd>
append to squashfs
<rmilecki>
how to find end of kernel or end of squashfs? add more hacks to mtdsplit?
<nbd>
non-squashfs targets are unaffected
<nbd>
because they don't use overlayfs
<rmilecki>
ok
<nbd>
no need for mtdsplit hacks
<nbd>
this can be done in fstools
<rmilecki>
ah, user space
<nbd>
fstools
<nbd>
it already keeps track of the state of the overlayfs
<nbd>
so here's how i think it could work:
<rmilecki>
we may need to change partitioning to make sure "rootfs" partition is writable though
<nbd>
append a specific marker after the squashfs image (only if the target needs it)
<nbd>
fstools can check squashfs size in the partition based on the header
<nbd>
if it detects a magic marker after the image, it nukes rootfs_data and erases the marker
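The marker check described in these steps can be sketched as below. It relies on the squashfs 4.0 superblock layout, where a little-endian image starts with the magic 0x73717368 and stores bytes_used as a 64-bit value at offset 40. The marker bytes, the alignment parameter and the function names are all hypothetical, and this is not how fstools implements anything today.

```python
import struct

SQUASHFS_MAGIC = 0x73717368  # reads "hsqs" when stored little-endian
MARKER = b"WIPE"             # hypothetical marker, chosen only for this sketch

def squashfs_bytes_used(partition):
    """Read the image size from the squashfs superblock at the partition start."""
    magic, = struct.unpack_from("<I", partition, 0)
    if magic != SQUASHFS_MAGIC:
        raise ValueError("not a little-endian squashfs image")
    bytes_used, = struct.unpack_from("<Q", partition, 40)
    return bytes_used

def marker_offset(partition, align=1):
    """Offset right after the image, rounded up to the assumed alignment."""
    used = squashfs_bytes_used(partition)
    return (used + align - 1) // align * align

def has_marker(partition, align=1):
    """True if the marker directly follows the (aligned) end of the image."""
    off = marker_offset(partition, align)
    return partition[off:off + len(MARKER)] == MARKER
```

When has_marker() returns true, the boot code would wipe rootfs_data and then clear the marker. Clearing it is the hard part on UBI, which this sketch does not attempt.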
<nbd>
doesn't rootfs need to be writable in order to be able to write firmware?
<rmilecki>
nbd: on old Broadcom devices we have writable "firmware" MTD partition that contains *subpartitions*: linux, rootfs, rootfs_data
<rmilecki>
or maybe rootfs_data gets created later, which means rootfs is writable...
<rmilecki>
i have to verify that
<nbd>
well, on older devices you'd never put in the marker
<rmilecki>
ok
<nbd>
no need for writable rootfs in that case
onemarcfifty has quit [Remote host closed the connection]
<rmilecki>
what about UBI volume located "rootfs"? do we have sth like ubi_write in fstools? let me check that
<rmilecki>
libubi-tiny.c internally uses ubi_write but it only exposes ubiupdatevol()
rua has quit [Ping timeout: 480 seconds]
<rmilecki>
nbd: ok, so all of that may work but will require a bit of hacking
<rmilecki>
nbd: that makes me wonder if some target specific solution / hack wouldn't be acceptable
<rmilecki>
nbd: what if I check boot counter and write it to rootfs_data (UBI ubifs volume)
<nbd>
what kind of boot counter?
<rmilecki>
s/boot counter/flash counter/
<nbd>
does the boot loader increase a counter every time it flashes an image?
<rmilecki>
nbd: bcm4908 increments value in U-Boot env variable on every firmware flash
<nbd>
that would work too and would be even simpler
<nbd>
i'd recommend making fstools call a script to check overlay fs state
<nbd>
if it doesn't have something like that already
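A check script along these lines could compare the bootloader's flash counter with a copy kept on rootfs_data. The sketch below assumes the counter has already been read from the U-Boot environment and is passed in as a number; the state file location and the function names are hypothetical.

```python
import os

def overlay_is_stale(current_count, state_file):
    """True if the overlay was created under a different flash count (or never tagged)."""
    try:
        with open(state_file) as f:
            return f.read().strip() != str(current_count)
    except FileNotFoundError:
        return True

def remember_count(current_count, state_file):
    """Store the flash count on the freshly wiped overlay, replacing any old value."""
    tmp = state_file + ".tmp"
    with open(tmp, "w") as f:
        f.write(f"{current_count}\n")
    os.replace(tmp, state_file)
```

Unlike a content hash, the counter changes even when the same image is flashed twice, so that case would also trigger a wipe.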
<rmilecki>
nbd: what about adding a target custom /etc/init.d/01-check-fs-state ?