fda- has joined #openwrt-devel
fda has quit [Ping timeout: 480 seconds]
floof58 has quit [Ping timeout: 480 seconds]
floof58 has joined #openwrt-devel
<aparcar[m]> mangix: ping
<mangix> aparcar[m]: pong
clayface_ has joined #openwrt-devel
clayface has quit [Ping timeout: 480 seconds]
<digitalcircuit> History so far: r17390-9baca41064-bisect-good, r17491-c98ddf0f01-bisect-good, r17457-25cb37bc00-bisect-good, r17508-a88b32bf6e-bisect-bad
<digitalcircuit> slh: NBG6817 eMMC initialization error update - I'm at "Bisecting: 8 revisions left to test after this (roughly 3 steps)", but if I had to guess, it feels like it's closing in on the Linux 5.4 -> 5.10 kernel change. I'll do the full bisect though because it might be something else.
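git bisect is a binary search over the revision range. Each good/bad verdict roughly halves the untested revisions, which is why "8 revisions left" comes out to "roughly 3 steps". Below is a minimal Python sketch of the idea. It assumes a linear history of r-numbers (real git bisect also handles merge commits). The `is_bad` predicate is hypothetical and stands in for "build, flash, and check whether the eMMC initializes".

```python
from typing import Callable, Sequence, Tuple


def first_bad(revs: Sequence[int], is_bad: Callable[[int], bool]) -> Tuple[int, int]:
    """Find the first bad revision by binary search.

    Assumes revs[0] is known good, revs[-1] is known bad, and every
    revision after the first bad one is also bad (linear history).
    Returns (first_bad_revision, number_of_revisions_tested).
    """
    lo, hi = 0, len(revs) - 1  # invariant: revs[lo] is good, revs[hi] is bad
    tested = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tested += 1
        if is_bad(revs[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return revs[hi], tested
```

For example, `first_bad(range(17491, 17509), lambda r: r >= 17500)` uses the latest good (r17491) and bad (r17508) revisions from the history above. It pretends, purely hypothetically, that r17500 introduced the regression. With 16 untested revisions in between, the search returns `(17500, 4)`: four builds, matching log2(16).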
<shibboleth> PaulFertser, mangix: aaaand it turns out the emmc isn't "bad" or write-protected at all, it's more like a logic-bomb of the embedded environment. the embedded squashfs only has drivers for ro, it depends on stuff in the jffs and ext partitions for rw
<shibboleth> and if these parts happen to have fs errors? well, what could go wrong, eh?
<shibboleth> can't mount due to fs errors, can't fsck because the squashfs can only do ro
<shibboleth> good job there, dell
<slh> digitalcircuit: I really hope for the best, at least that issue should be easier to identify and ultimately fix
<slh> I just hope it's not gcc-10 --> gcc-11
<digitalcircuit> slh: Oof, that's a good point too. Though the 5.4 -> 5.10 migration is pretty huge, and it seems like nobody else has encountered this issue so I don't know what's wrong with my setup in particular. I've effectively factory reset everything handled by dualboot.
<digitalcircuit> slh: No toolchain changes appear to be between 25cb37bc00 (good) and a88b32bf6e (bad). I'm not sure how to show the differences as a list of commits, but https://github.com/openwrt/openwrt/compare/25cb37bc00..a88b32bf6e appears to show all impacted files.
<digitalcircuit> (Locally I'm using gitk which shows the current git bisect good/bad markers)
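The range can be listed as commits, rather than as a file diff, with `git log --oneline 25cb37bc00..a88b32bf6e`. Adding `-- toolchain/` limits the list to commits that touch the toolchain directory. The small Python sketch below builds the same command. The helper names are illustrative only.

```python
import subprocess
from typing import List, Optional


def commits_between_cmd(good: str, bad: str, path: Optional[str] = None) -> List[str]:
    """Build a `git log` command listing commits in `bad` but not in `good`.

    `good..bad` selects commits reachable from `bad` but not from `good`,
    i.e. the commits a bisect between the two would consider.
    """
    cmd = ["git", "log", "--oneline", f"{good}..{bad}"]
    if path is not None:
        # "--" separates revisions from paths; only commits touching `path` are listed.
        cmd += ["--", path]
    return cmd


def run_in_checkout(cmd: List[str], checkout_dir: str) -> str:
    """Run the command inside a git checkout and return its standard output."""
    result = subprocess.run(cmd, cwd=checkout_dir, capture_output=True, text=True, check=True)
    return result.stdout
```

Calling `run_in_checkout(commits_between_cmd("25cb37bc00", "a88b32bf6e", "toolchain/"), ".")` from an OpenWrt checkout should return an empty string if no toolchain commits fall in the range.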
floof58 has quit [Ping timeout: 480 seconds]
floof58 has joined #openwrt-devel
goliath has quit [Quit: SIGSEGV]
victhor has quit [Ping timeout: 480 seconds]
<slh> I've been running kernel v5.10 for roughly half a year on my nbg6817 and later the ASRock g10, so that 'shouldn't' be the issue - but at the same time, my binaries 'should' have worked on yours
shibboleth has quit [Quit: shibboleth]
<digitalcircuit> slh: That makes sense. The only thing that comes to mind so far is whether there are any persistent changes to the eMMC that I've missed (e.g. do I need to update the ZyXEL u-boot bootloader?) or such. Regardless of 5.10 (after git bisect, I could test that by enabling 5.10 on an older commit), your build should have worked given sysupgrade did not persist config.
<digitalcircuit> (Inversely, I only got my NBG6817 in 2019, so maybe I have a slightly newer/changed firmware/hardware revision of something.)
<digitalcircuit> "U-Boot 2012.07 [Standard IPQ806X.LN,unknown] (Oct 03 2018 - 18:59:17)"
<digitalcircuit> (Pie-in-the-sky theory, maybe that's why I'm experiencing crashes too. Though if others can recreate the crash with https://github.com/digitalcircuit/openwrt-ipq806x-qa-cpu-reset#verify-crash-still-happens-with-unchanging-cpu-frequency then it's probably unrelated.)
Tapper has quit [Ping timeout: 480 seconds]
* digitalcircuit will continue the git bisect either way; this is all speculating until the exact commit is found.
<slh> possibly, but I hope not. I don't see how u-boot would be updated (aside from factory updates). yes, there are provisions for that, but I don't think any OEM update actually touched the 4 MB spi-nor flash
<digitalcircuit> Unrelated good news, I've found the cause of my earlier build failures, which has been fixed by: https://github.com/openwrt/openwrt/commit/d27f6e2c5d2a2315cc8fe684d117c80aa9984ca8 For now as I'm bisecting, I'm just building with make [...] V=s to answer the kernel config prompt.
<digitalcircuit> slh: Yeah. I'm starting to wonder if I have a haunted router or something :)
danitool has quit [Quit: Cubum autem in duos cubos, aut quadratoquadratum in duos quadratoquadratos]
<digitalcircuit> Err, wrong commit - https://github.com/openwrt/openwrt/commit/1a3b3dc7974c98843baeb22251dc4c580dc771d6 "kernel: Add missing kernel config options" possibly with previous is what addresses the prior build failures (not the eMMC issues or CPU crash)
<mrkiko> digitalcircuit: hi! Did you fix your reboot issue?
<digitalcircuit> mrkiko: I've successfully found a new way to make the reboot happen that's closer to my test case :D https://github.com/digitalcircuit/openwrt-ipq806x-qa-cpu-reset#verify-crash-still-happens-with-unchanging-cpu-frequency But.. no, I haven't found a fix. I've unfortunately run into a NEW issue with the MMC storage failing to initialize, which I'm git bisecting to figure out when that broke. It's not broken for others, oddly.
<digitalcircuit> (I wanted to make sure I could recreate the issue using an up-to-date snapshot before I emailed Ansuel on the mailing list. Un/fortunately, I found a new regression in the snapshot.. kind of. Still hunting down what went wrong.)
<digitalcircuit> (It's looking like the switch to Linux kernel 5.10 may have broken boot for my NBG6817. Still 2 revisions left to compile and test though.)
mig has joined #openwrt-devel
mig has quit []
zadr has joined #openwrt-devel
<zadr> Where can I find a table of rates for the fixed_rate_idx knob?
awgh has quit [Ping timeout: 480 seconds]
Tapper has joined #openwrt-devel
pmelange has joined #openwrt-devel
pmelange has left #openwrt-devel [#openwrt-devel]
danitool has joined #openwrt-devel
dedeckeh has joined #openwrt-devel
zadr has quit [Remote host closed the connection]
Tapper has quit [Ping timeout: 480 seconds]
goliath has joined #openwrt-devel
victhor has joined #openwrt-devel
bookworm_ has joined #openwrt-devel
bookworm has quit [Read error: Connection reset by peer]
pmelange has joined #openwrt-devel
Tapper has joined #openwrt-devel
<russell--> slh: fwiw, ubiquiti updates often replace u-boot
<stintel> hauke: ideally you would have explained why you disabled the errata in 57b323ce38f327557d1b016dddd712bb4a8e0854
<stintel> for cavium and fujitsu it's most likely because we don't have that at all
<stintel> but for the more generic ones it's not clear to me at all
goliath has quit [Quit: SIGSEGV]
<pmelange> I'm having trouble building a local feed with docker. It seems like the Let's Encrypt certificate problem is affecting that too.
<pmelange> That is, I'm using the docker-sdk to make my local feed.
clayface has joined #openwrt-devel
clayface_ has quit [Ping timeout: 480 seconds]
goliath has joined #openwrt-devel
danitool has quit [Quit: Cubum autem in duos cubos, aut quadratoquadratum in duos quadratoquadratos]
Tapper1 has joined #openwrt-devel
Tapper has quit [Read error: No route to host]
<swalker> updated openwrt/upstream, https://sdwalker.github.io/uscan/index.html
<rsalvaterra> stintel: Probably CPU revisions that didn't make it into production, or haven't been used in the systems we support.
tohojo has quit [Ping timeout: 480 seconds]
<hauke> stintel: I thought that this is clear when I list the CPU cores they are for
<hauke> The ARM errata are for Cortex A76 and N1
<hauke> I do not think we have any target using them
<hauke> armvirt could use them, so they are still active there
dangole has joined #openwrt-devel
hurricos has quit [Quit: WeeChat 2.8]
pmelange has left #openwrt-devel [#openwrt-devel]
danitool has joined #openwrt-devel
tohojo has joined #openwrt-devel
<slh> russell--: I was specifically referring to the firmware images (and their internal structure) of the ZyXEL NBG6817, I know that many other vendors/ devices do occasionally update u-boot (e.g. ath79/ TP-Link). but from what I've seen with the nbg6817, I don't think it has been done (as part of a regular vendor update) there, yet
dedeckeh has quit [Quit: Page closed]
dangole has quit [Quit: Leaving]
<owrt-1907-builds> Build [#27](https://buildbot.openwrt.org/openwrt-19.07/images/#builders/11/builds/27) of `armvirt/64` failed.
<owrt-1907-builds> Build [#24](https://buildbot.openwrt.org/openwrt-19.07/images/#builders/20/builds/24) of `mediatek/mt7622` failed.
<aparcar[m]> mangix: so the macOS CI doesn't like the toolchain even with your suggested changes https://github.com/aparcar/openwrt/runs/3781476951?check_suite_focus=true
<aparcar[m]> any ideas?
rmilecki has quit [Ping timeout: 480 seconds]
<mangix> Nope. I setup a macOS VM yesterday. Works fine.
<mangix> The CI seems to timeout. Strange.
<aparcar[m]> neoraider: could you please backport https://git.openwrt.org/?p=project/opkg-lede.git;a=commit;h=5936c4f9660248284e8a9b040ea3153d3ea888de to 19.07?
<aparcar[m]> mangix: not even a macOS CI needs 6 hours to do the magic
<aparcar[m]> i'll do another run with V=s
<mangix> On my installation, I have to compile with gmake instead of make. No idea if related.
Tapper1 has quit [Ping timeout: 480 seconds]
<digitalcircuit> slh: Bad news (?), NBG6817 MMC initialization failures have been tracked down to... 0470159552641c2b11ccc1b0fcfcb4ea08f2c6ab is the first bad commit ("ipq806x: switch to kernel 5.10"). I'm not sure how to "git bisect" from Linux kernel 5.4 to Linux kernel 5.10, especially considering all the patches on top of stock Linux kernel...
<digitalcircuit> slh: I'm guessing this is when I'd file a new bug report and/or mailing list post?
* digitalcircuit still doesn't understand why his router has trouble, but others' NBG6817 (and IPQ8065) devices seem to handle 5.10 just fine.
<slh> mine is totally fine with kernel 5.10, still on r17525-a46fa5c3a7 with v5.10.66
<owrt-1907-builds> Build [#23](https://buildbot.openwrt.org/openwrt-19.07/images/#builders/70/builds/23) of `ipq806x/generic` completed successfully.
<owrt-1907-builds> Build [#25](https://buildbot.openwrt.org/openwrt-19.07/images/#builders/46/builds/25) of `archs38/generic` completed successfully.
<digitalcircuit> slh: Noted! Since that PR is already merged, should I file a new bug report, or is commenting on that PR acceptable?
<slh> digitalcircuit: I'd start by adding a comment to the closed PR
<slh> that should notify Ansuel, but maybe address @Ansuel as well
<digitalcircuit> slh: Makes sense! Might be simple enough of a fix for Ansuel to be able to skip some formalities. I'll offer to file a bug report in my PR comment though. @mention is also a good idea!
<slh> very, very weird, as it works for me (and has been, for over half a year now)
jlsalvador has quit [Quit: jlsalvador]
<digitalcircuit> Looking at the log, I just now noticed: "[ 2.902209] mmci-pl18x 12400000.sdcc: card claims to support voltages below defined range"
<digitalcircuit> So I wonder if ZyXEL changed the MMC part..?
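As far as I can tell from the kernel source, this warning comes from the Linux MMC core's voltage selection (`mmc_select_voltage()` in drivers/mmc/core/core.c). If the card's OCR register has any of bits 0-6 set, which lie below the lowest defined voltage window, the kernel prints the warning and masks those bits off before matching against the host's voltages. On its own, the message only says the card reports such bits. Whether that is related to the initialization failure is the open question here. Below is a Python sketch of that check. The OCR values in the usage note are made up and were not read from either router.

```python
from typing import Tuple

# OCR bits 0-6 lie below the lowest defined voltage window; the kernel
# warns about them and clears them (as described in the lead-in above).
LOW_RESERVED_MASK = 0x7F


def sanitize_ocr(ocr: int) -> Tuple[int, bool]:
    """Return (ocr with bits 0-6 cleared, whether the kernel would warn)."""
    claims_low_voltages = bool(ocr & LOW_RESERVED_MASK)
    return ocr & ~LOW_RESERVED_MASK, claims_low_voltages
```

With a hypothetical OCR of `0x40FF8081`, `sanitize_ocr` returns `(0x40FF8080, True)`, so the warning would fire. With `0x40FF8080`, it returns the value unchanged with `False`.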
<slh> just for comparison, http://paste.debian.net/hidden/695c70ba/
<digitalcircuit> Which appears identical. Huh.
<slh> but different eMMCs in a new production batch sounds pretty reasonable
<digitalcircuit> slh: Yeah... Your log doesn't have the "support voltages below defined range", so that sounds like it. May I mention my comparison with your log in my comment?
<slh> sure, but it will expire in 1h
<slh> so better copy'n'paste
<digitalcircuit> slh: Noted!
<slh> what do cat /sys/block/mmcblk0/device/cid and cat /sys/block/mmcblk0/device/date say?
<digitalcircuit> slh: root@TARS:~# cat /sys/block/mmcblk0/device/cid && cat /sys/block/mmcblk0/device/date
<digitalcircuit> 7001004d36323730340109d1c809c52a \n 12/2018
<slh> d04f01320f5903ffffffffe78a400050 12/2015
<digitalcircuit> Ooh, so it IS different - noted!
<slh> err, sorry: 700100533130303034081e31aedac21e 12/2015
<slh> messed up with csd first
<slh> /sys/block/mmcblk0/device/name --> S10004
<slh> there's a lot more interesting stuff in that directory
<digitalcircuit> "M62704" here, I'll add that to my comment too.
<slh> manfid: 0x000070 fwrev: 0x0800000000000000 hwrev: 0x0 oemid: 0x0100 rev: 0x7
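The two `cid` strings can be decoded by hand to see exactly which fields differ. Below is a Python sketch that assumes the standard eMMC CID layout: manufacturer ID, CBX/OEM ID, 6-character product name, product revision, serial number, and manufacturing date. It also assumes Linux's reporting conventions as I understand them. `oemid` is read as 16 bits, which matches the `0x0100` slh's card reports. The year code gets +16 when the EXT_CSD revision is 5 or later (eMMC 4.41+). The kernel evidently applied that adjustment to both cards, since it printed 12/2018 and 12/2015 rather than 12/2002 and 12/1999. The final CRC byte is ignored.

```python
from typing import Any, Dict


def decode_mmc_cid(cid_hex: str, ext_csd_rev: int) -> Dict[str, Any]:
    """Decode a 128-bit eMMC CID as printed by /sys/block/mmcblk0/device/cid."""
    b = bytes.fromhex(cid_hex)
    if len(b) != 16:
        raise ValueError("CID must be 16 bytes (32 hex digits)")
    month = b[14] >> 4
    year = (b[14] & 0x0F) + 1997
    if ext_csd_rev >= 5 and year < 2010:
        year += 16  # eMMC 4.41+ restarts the year codes at 2013
    return {
        "manfid": b[0],
        "oemid": int.from_bytes(b[1:3], "big"),  # 16 bits, includes the CBX bits
        "name": b[3:9].decode("ascii"),  # product name, e.g. what `name` shows
        "prv": b[9],  # product revision
        "serial": int.from_bytes(b[10:14], "big"),
        "date": f"{month:02d}/{year}",
    }
```

Decoding `7001004d36323730340109d1c809c52a` and `700100533130303034081e31aedac21e` with `ext_csd_rev=7` (the `rev` slh reports) gives the same manufacturer `0x70` and OEM ID `0x0100` for both. The product name differs (`M62704` vs `S10004`), as do the product revision (`0x01` vs `0x08`) and date (`12/2018` vs `12/2015`).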
<digitalcircuit> slh: So "cd /sys/block/mmcblk0/device/ && tail -v cid date name manfid fwrev hwrev oemid rev" should print all your details, right? (Just finding an easier output for the GitHub comment, I can reformat what you've provided me.)
<slh> don't know if there's more, but http://paste.debian.net/hidden/dfe91e0d/
<slh> I have found http://www1.futureelectronics.com/doc/Kingston/EMMC04G-M627-X03U.pdf, but no technical specs for my s10004 so far
<digitalcircuit> slh: Sounds good! I'll post my comment now, unless there's anything else you'd like me to look for first.
<slh> nope, just fishing in the dark and looking for potential differences
<slh> iirc you have two nbg6817? if you can, please check the other one as well
<slh> if you're 'lucky', one works, one doesn't ;)
<digitalcircuit> slh: Good point! I'd check the other one.. but unfortunately, it's at someone else's house and OpenVPN appears to be broken at the moment (alongside other issues). It's still running 19.07.6 and next time I get over there I think I'll just flash it to 21.02.0 and reduce max CPU clock to 1.0 GHz to work around the crash bug.
<digitalcircuit> (I'm working on a simple, robust init.d service to apply that which will persist across flashes so I can just have it stable while I'm tinkering at this place.)
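A clock cap like this would normally go through the standard cpufreq sysfs interface. There, `scaling_max_freq` takes a value in kHz, so 1.0 GHz is `1000000`. Below is a minimal Python sketch of that write. It assumes one `policyN` directory per CPU under /sys/devices/system/cpu/cpufreq, and the function names are hypothetical. digitalcircuit's actual workaround script (in the openwrt-ipq806x-qa-cpu-reset repository) may work differently. On the router itself, an init.d service would do this in shell, since Python is usually not installed there.

```python
from pathlib import Path
from typing import List, Tuple

CPUFREQ_ROOT = "/sys/devices/system/cpu/cpufreq"


def max_freq_writes(policy_names: List[str], max_mhz: int,
                    root: str = CPUFREQ_ROOT) -> List[Tuple[str, str]]:
    """Return (path, value) pairs that cap scaling_max_freq; cpufreq uses kHz."""
    khz = max_mhz * 1000
    return [(f"{root}/{name}/scaling_max_freq", str(khz)) for name in sorted(policy_names)]


def apply_cap(max_mhz: int, root: str = CPUFREQ_ROOT) -> None:
    """Write the cap to every policy directory found under `root` (needs root)."""
    policies = [p.name for p in Path(root).glob("policy*")]
    for path, value in max_freq_writes(policies, max_mhz, root):
        Path(path).write_text(value)
```

The cap is not persistent across reboots, which is why it would be reapplied at boot, for example by `apply_cap(1000)` or the shell equivalent in a startup script.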
<slh> I'm very happy with wireguard on mine
<digitalcircuit> I don't know (other person isn't technical), but I think the router itself is partially locked up - I ran into this issue at this place as well, the WiFi driver crashes/stops responding to LAN/etc. That was fixed for me with 21.02 (but that introduced CPU crashes instead).
<slh> make sure to write the 'good' image to /dev/mmcblk0p8, as push-button tftp recovery always overwrites /dev/mmcblk0p5, so /dev/mmcblk0p8 remains safe and untouched
<digitalcircuit> (I asked them to reboot the router, but they're in a lot of physical pain/etc so it's not feasible for now. I'm heading over this upcoming Friday.)
<digitalcircuit> slh: Good to know, thanks!
jlsalvador has joined #openwrt-devel
<slh> the problem is just that kernel 5.4 for ipq806x was removed from master only yesterday...
<slh> but I expect it to be fixable relatively easily...
<slh> /famous last words
<digitalcircuit> slh: Good point, I didn't realize that, added a minor note of that in my reply comment: https://github.com/openwrt/openwrt/pull/3954#issuecomment-933035532
<slh> btw., I'm pkgadd on github (well, and also in here, from a remote system)
<digitalcircuit> Noted!
<aparcar[m]> does something like make -C tools/zstd val.PKG_VERSION also work with `call` of a define?
<slh> digitalcircuit: btw. # strings /dev/mtd8 | grep U-Boot --> U-Boot 2012.07 [Standard IPQ806X.LN,unknown] (Jul 25 2016 - 16:16:46)
<slh> not that it matters (I think) for this issue
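slh's one-liner scans the raw bootloader partition for printable text and keeps lines containing "U-Boot". Below is a rough Python equivalent of `strings | grep U-Boot`, which can be handy for comparing banners from two dumps. The 4-character minimum mirrors the default of `strings`. Otherwise, this only approximates what GNU strings treats as printable.

```python
import re
from typing import List


def printable_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Roughly what `strings` prints: runs of >= min_len printable ASCII bytes."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, data)]


def uboot_banners(data: bytes) -> List[str]:
    """Keep only the strings that mention U-Boot, like `| grep U-Boot`."""
    return [s for s in printable_strings(data) if "U-Boot" in s]
```

On the router this would be fed the partition contents, e.g. `uboot_banners(open("/dev/mtd8", "rb").read())` as root. The mtd index is the one slh used on his NBG6817.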
pmelange has joined #openwrt-devel
pmelange has left #openwrt-devel [#openwrt-devel]
<Slimey> hmm
<digitalcircuit> slh: Noted as well. It's at least a sign that ZyXEL did change U-Boot over time, though even my U-Boot date is older than the MMC date.
<digitalcircuit> mrkiko: I've created a workaround script for the CPU crash bug: https://github.com/digitalcircuit/openwrt-ipq806x-qa-cpu-reset#how-to-workaround-this-issue If you haven't bought an NBG6817 though, still wait - this reduces performance! I only created it as I'm managing an NBG6817 at a remote location too, and I wanted a way to help ensure it stays stable when I'm not testing things.
<stintel> hauke: maybe it was my wishful thinking but I thought we had cortex a76
<digitalcircuit> I'm not sure who to ping regarding the Let's Encrypt certificate troubles - OpenWrt 21.02.0 has trouble communicating with https://sysupgrade.openwrt.org/ (the attended-sysupgrade client "auc" says "Connection error: Invalid SSL certificate")
<digitalcircuit> (The download repos had a workaround applied, which probably needs to be applied to sysupgrade as well)
cp- has quit [Quit: Disappeared in a puff of smoke]
cp- has joined #openwrt-devel
<digitalcircuit> jow: ^ I think you had talked about the Let's Encrypt cert workaround for OpenWrt 21.02 by removing the cross-signed ISRG Root X1 from the chain? This probably should be done for sysupgrade.openwrt.org as well (for auc & luci-app-attended-sysupgrade).
Grommish has joined #openwrt-devel
<digitalcircuit> Ack, I was mistaken - auc is affected, luci-app-attended-sysupgrade appears to not be affected (perhaps due to using browser's HTTPS stack?).