<digitalcircuit>
If there's a simpler approach (e.g. can I simply compile a new kernel image like on LineageOS and squash it into the 21.02.0rc2 official build?), or if I'm overlooking anything, feel free to call me out!
<digitalcircuit>
I'm looking to try to help find the relevant difference between OpenWRT 21.02.0rc2 and the current snapshots given 21.02.0rc2 is broken for a specific use case on the NBG6817 ( https://bugs.openwrt.org/index.php?do=details&task_id=3099#comment9712 - that's me), yet the snapshot works.
<digitalcircuit>
(I also have a hardware serial console via USB TTL adapter, so if I soft-brick the device I can access the bootloader. Hopefully I won't do that :)
<slh>
serial console access can help you a lot for debugging, catching potential error messages/ kernel panics, but it's fortunately (usually) not needed for recovering this device, push-button tftp recovery works reliably there
<slh>
in general, larger work on the kernel is not easy - the wiki helps a bit on the syntactical mechanics, but not for dealing with the semantics, actual rebasing, etc.
<digitalcircuit>
slh: Glad to hear I likely won't need to break out the serial console to recover, though having a failsafe is good anyways. And, ah, understood! I figured if this was gonna be easy, someone more skilled than I would've backported the CPU governor patchset by now :)
<digitalcircuit>
It's very likely I won't succeed, but given my frustration with this issue, I feel I should try to help out beyond simply filling the bug tracker issue 3099 with rambling commentary and bizarre serial console logs. The other notion that came to mind was git bisecting between the 21.02 branch off from the main branch and snapshot current, though that sounds even messier thanks to the multiple repos.
<plntyk2>
digitalcircuit, the cpu governor patch might really be a good starting point - also if there are frequency / voltage tables like on many ARM platforms those might be worth looking at
<plntyk2>
for those errors you probably should use a custom compile (no opkg install) - and you might want to modify the kernel cmdline (for debug / verbose outputs) and maybe enable some debug config symbols to check
<plntyk2>
digitalcircuit, also maybe run a memtester or something that does checksumming / high cpu & mem load tests (stress-ng package ?)
<digitalcircuit>
On stress-ng - thanks, too! I had tried iperf3 + openssl benchmark in loops, which didn't seem to trigger the issue, but I hadn't considered memory pressure (SFTP backup uploads also heavily exercise the RAM cache, and it's a bursty workload - local encrypt+compress, SFTP upload to router). Putting the time into figuring out stress-ng makes sense.
<digitalcircuit>
plntyk2: Thank you! I'm glad that trying to backport the CPU frequency driver changes makes sense, and checking for frequency/voltage tables is also a good idea if they exist.
<digitalcircuit>
slh: Thanks as well! I tried the /etc/init.d/cpufreq script on 21.02.0rc1 (verified applied with "cat /sys/devices/[...]"), and I also upgraded to 21.02.0rc2 with that script built in, and neither fixed the issue, unfortunately. It might have bought me a bit more time before crashes, but it's hard to tell if it made any difference given the non-deterministic nature of this.
gch981213 has joined #openwrt-devel
rmilecki has joined #openwrt-devel
<digitalcircuit>
I've also noticed load averages during SFTP bursty uploads go higher (above 1.0) at times with the snapshot, whereas 21.02.0rc2 (and rc1) usually stayed around/under 0.7, hence another vague suspicion for something with the CPU frequency driver. That's uncertain, though. I think my next goal (after making a full, safe backup of my desktop) is to play with stress-ng and try to recreate the crash without risking my backup data :)
nitroshift has joined #openwrt-devel
<digitalcircuit>
(Crashes happen within 1-7 hours with the Deja Dup SFTP-to-router backup session; a full backup takes 7-ish hours)
decke has joined #openwrt-devel
<digitalcircuit>
I missed the remarks on custom compile (no opkg), modified kernel cmdline, and debug config symbols too - those are good points. I'll focus on learning how to do all that once I recreate the issue with stress-ng (unless that takes too long).
silver has quit [Quit: One for all, all for One (2 Corinthians 5)]
<jow>
Package ip-tiny is missing dependencies for the following libraries:
<jow>
libselinux.so.1
<jow>
that reads like ... a contradiction
<jow>
since when do we require selinux in ip-tiny? and since when is it enabled by default? I don't recall enabling anything selinux
<jow>
just trying to rebuild my tree after 7 weeks or so
<jow>
CMake Error: CMake was unable to find a build program corresponding to "Ninja". CMAKE_MAKE_PROGRAM is not set. You probably need to select a different build tool.
<jow>
o_O
<jow>
where do I get "ninja" ?
<jow>
ah, it needs to be installed from feeds
<jow>
weird stuff
<nbd>
which package needs ninja?
<jow>
cgi-io, apparently
<jow>
my feed mirror was too old it seems, after pulling latest I had to manually feeds install a bunch of things due to changed deps
<jow>
among them ninja
<jow>
it's likely no issue for someone starting from scratch
<blocktrron>
russell--: this is vastly different hardware
<nbd>
jow: can you try changing cmake.mk to set CMAKE_GENERATOR to "Unix Makefiles"?
<nbd>
that should hopefully get rid of the ninja stuff
<blocktrron>
The speed in dmesg is the one reported from the PHY device, in your case the switch IC
<jow>
nbd: it won't work. The package has been patched in the feed to include an additional cmake-ninja.mk after cmake.mk which does various overrides
<jow>
it also introduces PKG_BUILD_DEPENDS += ninja/host and HOST_BUILD_DEPENDS += ninja/host so I suppose a selective feeds install cgi-io (or whatever else uses ninja) will pull in the ninja package as well
<nbd>
o_O
<mangix>
the ninja stuff is courtesy of me
<jow>
hmm, still failing at Package ip-tiny is missing dependencies for the following libraries:
<russell-->
yeah, but the regression appeared with 5.4
<russell-->
was working on 4.19
Pepes has quit [Quit: WeeChat 2.3]
<blocktrron>
This would be odd
* russell--
not denying it's odd
<blocktrron>
for 5.10, the PHY driver would need refactoring, as the driver overwrites the supported link modes in its probe function, while the phy subsys overwrites it again directly after.
<blocktrron>
This change broke it, which was introduced in 5.1
<blocktrron>
I'll prepare a patch for that, can you test it on your device?
<russell-->
cool, thanks!
<blocktrron>
Will probably be around this evening (EU)
<russell-->
okay, np
<russell-->
i only noticed this after 1+ year, so it's not super urgent ;-)
pmelange1 has joined #openwrt-devel
<blocktrron>
100 Mbit/s > 0 Mbit/s
<stintel>
:P
<stintel>
what degree do you have that you come up with these complex facts :P
<blocktrron>
shall i paint you a conspiracy based on my complex logic?
<stintel>
oooh
<stintel>
fits the time we live in
<stintel>
I like it ;)
<blocktrron>
It's all a plot from the wireless industry to break our hardwired connections so we are forced to buy their 5G products
<russell-->
i've been vaccinated, so i'm 5G ready
<russell-->
one shot in each arm for spatial diversity
<blocktrron>
I was enjoying the tinfoil head proclaiming we all need fiber in my village so we can skip 5g. Can't disagree with him completely.
<stintel>
russell--: haha me too. one shot in my left arm and one in my right arm, so I have now 2 5G antennas
<mangix>
hmm github is acting up
pmelange1 has quit [Quit: Leaving.]
<stintel>
russell--: I have now 6 out of 5 bars signal strength!
<russell-->
rock on!
root4 is now known as Pepes
<russell-->
fwiw, someone was testifying for an anti-wifi bill in my state's legislature a couple years ago and was citing research that i actually worked on (i did the data management and data reduction, another group did the association), involving 60Hz magnetic fields (not GHz, Hz) and that purported to show a link between exposure and pregnancy miscarriage (which was probably erroneous anyway) *facepalm*
<russell-->
... they were using the brainmelting baby poster too.
fenrig has quit [Quit: Page closed]
<rmilecki>
Broadcom's Northstar failed me on USB controller / driver stability, I need to buy some other platform device for daily office use
<rmilecki>
very low requirements: any speed WiFi, USB port for modem, CPU fast enough for wireguard (so basically any)
<rmilecki>
i guess ramips + mt7621 is the best option these days?
<rmilecki>
any recommendation for a cheap device I can actually buy? i checked a few models I found by DTS files and they are not available in shops anymore or second hand
<rmilecki>
any other recommendations still welcome, maybe I can find something else cheaper / more easily accessible in Poland
<rmilecki>
blocktrron: 802.11n would be sufficient for my needs, i don't care ;)
<blocktrron>
I mean in regard to "maybe they've updated the board"
<blocktrron>
anyways, you can order from any country's amazon to your own country, you only have to pay shipping
<karlp>
plus whatever amazon declares is "estimated customs and handling charges"....
<karlp>
which is not always reasonable or sane...
<karlp>
but sure, yes, it's "possible"
<karlp>
(oh, and lots of sellers won't actually sell to other countries anyway...)
<blocktrron>
karlp: this shouldn't matter within EU
<blocktrron>
At least I've ordered multiple times from ES, FR and in the past UK to germany and it was always only the ~5 EUR shipping that had to be paid.
Tapper has joined #openwrt-devel
<johnf>
if anyone can offer advice as to why my fresh port is getting invalid tar magic when I attempt to sysupgrade I'd really appreciate it
<jow>
rmilecki: yes it is possible that for a brief period of time the release image contain newer builtin packages that what is available in the online package repos
<jow>
*contain builtin packages that are newer than the ones available in the package repos
<jow>
until the phase2 builders have caught up
<jow>
stuff like this happens because people start using releases pre-announced
pmelange has quit [Read error: Connection reset by peer]
pmelange has joined #openwrt-devel
<rmilecki>
jow: luci-mod-network was built *today*
<jow>
rmilecki: yes, packages are continuously rebuilt
<rmilecki>
jow: ok, i can't be sure what has happened then
<rmilecki>
jow: user on the forum has claimed that "I built the image between 1am and 2am on June 1st (CEST)."
<jow>
nbd: it seems the abi version propagation to package dependencies does not work anymore
<zulukilobravo>
if phase2 is derived from phase1 how can packages exist there at all?
<rmilecki>
zulukilobravo: what packages?
<rmilecki>
zulukilobravo: i guess phase1 builds only required (default) packages
<rmilecki>
zulukilobravo: not all of them
<zulukilobravo>
the ones needing to be 'caught up'
<jow>
nbd: I have a package here which depends on a lib that specifies an ABI version, yet the ipk dependencies just state "Depends: libfoo" instead of the expected "Depends: libfoo20210526"
jlsalvador has quit [Ping timeout: 480 seconds]
<rmilecki>
can someone remind me why master snapshots don't include LuCI?
<zulukilobravo>
the stale packages precede the phase1 package dates
<jow>
nbd: was the behaviour changed here?
<jow>
zulukilobravo: phase2 uses SDKs from phase1 branch builders, they don't depend on tagged builds
<zulukilobravo>
pong rmilecki im new to this irc jazz
<rmilecki>
zulukilobravo: you may also post info that packages are being rebuilt constantly
<rmilecki>
zulukilobravo: so imagebuilder is actually NOT expected to provide the same images as prebuilt ones
jlsalvador has quit [Ping timeout: 480 seconds]
jlsalvador2 is now known as jlsalvador
<zulukilobravo>
on it
<rmilecki>
thank you
<jow>
nbd: you also reworked the GetABISuffix macro and changed its logic
<jow>
the current implementation fails to add a leading dash if the package basename ends with a digit
<jow>
libfoo2$(ABI_VERSION) is wrong, it must be libfoo2-$(ABI_VERSION)
olmari has quit [Quit: authenticating]
olmari has joined #openwrt-devel
<JohnA_>
jow, Sorry for the delay. pastebin id zzz2002
<jow>
JohnA_: can you give me the full url?
<JohnA_>
jow, that was a little cryptic, sorry! i posted yesterday about a problem with PPPoE on a WRT3200ACM. You asked for f4 files on pastebin. https://pastebin.com/u/zzz2002
<JohnA_>
jow, f4 = 4
<blogic>
network.@device[1].name='wan'
<blogic>
that is missing the ports field
<jow>
no its fine
<blogic>
is it ?
<jow>
wan is no bridge
<blogic>
but the name=wan does not tell it the ifname/port ?!
<jow>
I believe it has something to do with the fact that both the logical interface and the netdev are called wan
<jow>
this trips up the pppoe proto handler somewhere, this has been reported here a few weeks ago already
<jow>
unless it has a type, then name is the name of the device to be created and ifname is the basename
<jow>
or if it's a bridge, then it's list ports
<jow>
or something like that
<jow>
I lost track of it
<blogic>
oh of course totally intuitive
<rmilecki>
blogic: that device has "name" set to "wan", so it refers to Linux's "wan" interface
<rmilecki>
blogic: AKA switch port called "wan"
<nbd>
jow: can you show me the package for which dependency propagation doesn't work?
<rmilecki>
blogic: it has no "ports" list as it isn't a bridge
<rmilecki>
blogic: you usually don't need bridge for a WAN
<blogic>
rmilecki: ifname or netdev would be more intuitive
<blogic>
anyhow, dont really care
<blogic>
;)
<rmilecki>
blogic: "ifname" was already used in the past for god knows what
<rmilecki>
blogic: i don't care enough to rename it to "netdev"
<blogic>
yeah, all good I was making totally useless and unqualified comments
<rmilecki>
;)
<nbd>
jow: btw. why did we add the - only for packages ending with a digit? wouldn't it be better to add it unconditionally?
<jow>
only for packages ending with a digit, to follow debian naming policies
<jow>
libfoo + abi version 123 -> libfoo123
<jow>
libfoobar2000 + abi version 123 -> libfoobar2000-123
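A minimal sketch of the naming rule jow describes above, written in Python rather than the build system's own Makefile macros. The function name `abi_package_name` is hypothetical and this is not the actual GetABISuffix implementation; it only mirrors the stated rule (insert a dash only when the package basename ends with a digit).

```python
def abi_package_name(basename: str, abi_version: str) -> str:
    """Join a package basename and its ABI version.

    Assumed rule, taken from the discussion above: a dash is inserted
    only when the basename ends with a digit, so that the version
    cannot run into the name.
    """
    if not abi_version:
        # No ABI version declared: the package name is left untouched.
        return basename
    ends_with_digit = bool(basename) and basename[-1] in "0123456789"
    separator = "-" if ends_with_digit else ""
    return f"{basename}{separator}{abi_version}"


if __name__ == "__main__":
    # The two examples quoted in the chat, plus the failing case.
    for name, abi in [("libfoo", "123"),
                      ("libfoobar2000", "123"),
                      ("libfoo2", "20210526")]:
        print(name, "+", abi, "->", abi_package_name(name, abi))
```

The last case corresponds to the bug report earlier in the log: without the dash, a basename such as libfoo2 and its ABI version would be concatenated into one ambiguous number.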
Tapper has joined #openwrt-devel
<nbd>
i'll add back the dash later
Dracos-Carazza has quit [Ping timeout: 480 seconds]
<jow>
should be just a matter of adding the filter %0 %1 .. %9 thing
<jow>
*readding
<nbd>
right
<nbd>
just need to check if it should be added in just one place or in multiple places
<nbd>
i don't remember right now
<jow>
it needs to be added to multiple places since some former GetABISuffix calls now use ABIV_* directly
<JohnA_>
jow, do you need me to stay online. I will have to switch back to 19.07, or the family will lynch me.
<jow>
JohnA_: no its fine, thank you
<JohnA_>
i will switch and come back.
<jow>
JohnA_: this gives me enough info to reproduce it (if I can finally find my mir3g)
<jow>
nbd: intra-package abi-version propagation appears to work now, maybe my tree was unclean
<JohnA_>
if you need anything else, post here or I could give you my "I never want to hear from you again email" and I will selectively forward you to me.
<jow>
I was introducing an ABI_VERSION:=... in a preexisting package, maybe metadata was not properly refreshed
<jow>
JohnA_: I'll post here. My plan is to set up a pppoe-server locally here later and hook a mir3g to it (which uses the same netdev naming convention) to see if it is a generic issue
<jow>
JohnA_: if that works we at least now that your issue is specific to your device. The switch/dsa driver might be something to look into then
<jow>
s/now/know/
<jow>
"daemon.warn pppd[9409]: Timeout waiting for PADO packets"
<jow>
that basically means "nothing received on wan"
<jow>
or at least nothing PPPoE related
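For context on the pppd message quoted above, here is a hedged sketch of the PPPoE discovery exchange as laid out in RFC 2516: the client broadcasts a PADI frame and then waits for a PADO offer from an access concentrator. The helper names `build_padi` and `is_pado` are hypothetical, and this is not pppd's code; it only builds and classifies raw frames in memory and sends nothing on the wire.

```python
import struct

ETH_P_PPPOE_DISC = 0x8863   # EtherType for the PPPoE discovery stage
PPPOE_VER_TYPE = 0x11       # version 1, type 1
CODE_PADI = 0x09            # PPPoE Active Discovery Initiation
CODE_PADO = 0x07            # PPPoE Active Discovery Offer
TAG_SERVICE_NAME = 0x0101   # Service-Name tag, required in a PADI


def build_padi(src_mac: bytes) -> bytes:
    """Build a broadcast PADI frame with an empty Service-Name tag."""
    if len(src_mac) != 6:
        raise ValueError("src_mac must be 6 bytes")
    tags = struct.pack("!HH", TAG_SERVICE_NAME, 0)  # tag type, tag length 0
    # ver/type, code, session id (0 during discovery), payload length
    pppoe = struct.pack("!BBHH", PPPOE_VER_TYPE, CODE_PADI, 0, len(tags)) + tags
    ethernet = b"\xff" * 6 + src_mac + struct.pack("!H", ETH_P_PPPOE_DISC)
    return ethernet + pppoe


def is_pado(frame: bytes) -> bool:
    """Return True if a raw Ethernet frame looks like a PADO offer."""
    if len(frame) < 20:  # 14 bytes Ethernet header + 6 bytes PPPoE header
        return False
    (ethertype,) = struct.unpack("!H", frame[12:14])
    return (ethertype == ETH_P_PPPOE_DISC
            and frame[14] == PPPOE_VER_TYPE
            and frame[15] == CODE_PADO)
```

If no frame that passes a check like `is_pado` ever arrives on the wan netdev, the client has nothing to answer, which matches jow's reading of the timeout.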
Dracos-Carazza has joined #openwrt-devel
JohnA_ has quit [Ping timeout: 480 seconds]
rejoicetreat has quit [Ping timeout: 480 seconds]
JohnA has joined #openwrt-devel
<JohnA>
jow, could this have anything to do with the device having 2 cpu/eth - setup as vlans 1 & 2? under 19.07 the lan is vlan eth0.1, while the wan is on eth1.2.
<JohnA>
jow, if I am talking drivel don't hesitate to say so!
<jow>
JohnA: maybe, maybe not. Would be interesting to know if the wan port works at all, e.g. by setting wan to DHCP and connect it to another router. Or by giving it a static IP and trying to ping it from a laptop connected to it or so
Tapper has quit [Remote host closed the connection]
Tapper has joined #openwrt-devel
<aparcar[m]>
goliath: side quest, do you see a possibility to give the squashfs labels?
<aparcar[m]>
I'm wondering if it's possible to add a post generation label which can be later read by the running system
<goliath>
something like a free text field stored in the image?
<aparcar[m]>
I think some other filesystems support labels of some kind
<aparcar[m]>
no real experience with that, I'm just thinking of how to store the openwrt "build profile" within the image, but since the same squashfs file is used for multiple devices, it has to happen post generation
<goliath>
In theory, there are many places in a SquashFS image where additional information could be stored
<goliath>
most easily e.g. directly after super block in the data area as long as no inode references it
<aparcar[m]>
is there tooling to write this during generation and read this on a running device?
<goliath>
The format itself has no direct way to store something like that, in a clean and portable way, but a patch against squashfs-tools[-ng] could be added and a small custom tool could be written that locates the data again.
<goliath>
If the SquashFS format itself should ever be extended, a "comment" field would be a good idea (in addition to e.g. ACL support).
<aparcar[m]>
ok I hoped for something existing...
<aparcar[m]>
I'll check on ext4, in case it supports labels and you have the time maybe we can look into it
<aparcar[m]>
goliath: works!
<aparcar[m]>
I'll update the PR, thanks for working on this!
angelsl has joined #openwrt-devel
<aparcar[m]>
zorun: ping, I'm still confused with the JSON patch you mentioned some time ago
<zorun>
aparcar[m]: pong, sorry, I'm swamped with work this week, not much I can do on the openwrt side
<zorun>
aparcar[m]: I'm also still confused because I don't understand which difference between master and 21.02 explains the difference
<zorun>
(that is, that the json generation works on master but generates incorrect data in 21.02)
<aparcar[m]>
what values are wrong?
<aparcar[m]>
to me it looks like IMG_PREFIX is partly wrong but that variable is already in the defaults