<f00b4r0>
ynezz: i wonder if the osuosl builders wouldn't benefit from the netlocks system. Seems like they have a flaky pipe shared between them
Slimey has quit [Ping timeout: 480 seconds]
Slimey_ is now known as Slimey
<ynezz>
f00b4r0: isn't that a problem on the server side?
<ynezz>
error: RPC failed; curl 56 GnuTLS recv error (-9): Error decoding the received TLS packet.
<f00b4r0>
ynezz: considering that the azure builders don't seem to experience the same issue, I'd guess not?
<f00b4r0>
command timed out: 1200 seconds without output running [b'./scripts/feeds', b'update'], attempting to kill
<f00b4r0>
if it's a load issue on our side, netlocks will help too: you can reduce the number of concurrent connections
<ynezz>
ok, I thought it was related to those 504 issues
<ynezz>
it might be that the osuosl network is broken again
<f00b4r0>
it certainly won't hurt to test that the feature is working :)
<f00b4r0>
alternatively if you can send me a playbook I'd like to set up my machines as well for a test run so they are ready for release builds. We can test that there
<ynezz>
current idea is to use ephemeral workers for releases
<f00b4r0>
ah well. Then I've upgraded cpus and ram for naught ;P
<ynezz>
BTW I've never used those locks, so I'm going to test it; just wondering how it works
<f00b4r0>
sounds good
<ynezz>
say we've got 30 workers on a fat pipe and we don't want them to hammer git.openwrt.org during git clone/fetching, and likewise downloads.openwrt.org during rsync
<ynezz>
should I set the lock names to the same value?
<ynezz>
dl_lock = hetzner; ul_lock = hetzner to all 30 workers?
<ynezz>
err, dl_lock = hetzner_dl; ul_lock = hetzner_ul to all 30 workers?
<f00b4r0>
that would serialize all network operations from/to all workers
<f00b4r0>
dl_lock controls download, ul upload
<ynezz>
so when the master is scheduling the build steps, it's going to ensure that only one up/download would be possible across those 30 workers, right?
<f00b4r0>
if you want to allow simultaneous ul and dl, you'd use a different name for both
<f00b4r0>
correct
<ynezz>
ok, clear, thanks
<f00b4r0>
for the steps that are locked
<f00b4r0>
which is pretty much all of them iirc
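A minimal model of what a shared named lock buys here, assuming semantics like buildbot's master locks with a maxCount (the names follow the hetzner_dl/hetzner_ul examples above; the count of 2 and all helper code are made up for illustration, the real config declares the lock once on the master and attaches it to the locked steps):

```python
import threading
import time

MAX_CONCURRENT = 2  # stand-in for buildbot's maxCount

# One named lock per resource; all 30 workers share these.
locks = {
    "hetzner_dl": threading.BoundedSemaphore(MAX_CONCURRENT),
    "hetzner_ul": threading.BoundedSemaphore(MAX_CONCURRENT),
}

peak = {"value": 0}
active = {"value": 0}
guard = threading.Lock()

def transfer(lock_name):
    """Simulate a locked build step (download or upload)."""
    with locks[lock_name]:
        with guard:
            active["value"] += 1
            peak["value"] = max(peak["value"], active["value"])
        time.sleep(0.01)  # stand-in for the actual network transfer
        with guard:
            active["value"] -= 1

# 30 "workers" all trying to download at once; the semaphore guarantees
# that at most MAX_CONCURRENT transfers are in flight at any moment.
threads = [threading.Thread(target=transfer, args=("hetzner_dl",))
           for _ in range(30)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(peak["value"])  # never exceeds MAX_CONCURRENT
```

Using distinct names for dl_lock and ul_lock is what allows one upload and one download to run simultaneously, as noted above; the same name for both would serialize them against each other.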
<ynezz>
probably groups of 10 would be fine
<ynezz>
so lets try that
<f00b4r0>
the idea wasn't so much about alleviating load on *our* side as alleviating load on the builders' side, for builders that share e.g. an asymmetric pipe
<f00b4r0>
for instance when I first came up with this stuff my builders were running off of VDSL, and as soon as two big network transfers started they were killing each other and delaying builds. That's how I came up with the idea
<ynezz>
you see failures on those truecz workers during the rsync phase; that's likely a problem on our side
<f00b4r0>
that's not what the log suggests tho
<f00b4r0>
rsync: rename "/packages/arm_cortex-a9_vfpv3-d16/base/.thc-ipv6-fuzz-dhcpc6_3.8-1_arm_cortex-a9_vfpv3-d16.ipk.Kzzxr6" (in snapshot-uploads) -> "base/.~tmp~arm_cortex-a9_vfpv3-d16/thc-ipv6-fuzz-dhcpc6_3.8-1_arm_cortex-a9_vfpv3-d16.ipk": No such file or directory (2)
<f00b4r0>
looks like some storage issue on the worker
<ynezz>
ok, I thought that was a problem on the receiving side with some I/O timeout
<f00b4r0>
hmm
<f00b4r0>
well do we have some monitoring of the receiving server? ;)
robimarko has joined #openwrt-devel
<f00b4r0>
ynezz: rereading rsync manpage I think you might be right
<f00b4r0>
the "rename" part confused me
<f00b4r0>
also, are you sure you don't want to have spare builders for e.g. testing purposes or as a "just in case" backup? :)
<ynezz>
mirror-01 has a single 8TB spinning disk, so "INFO: task rsync:28640 blocked for more than 120 seconds." explains it all
<ynezz>
those workers are basically on LAN so they can saturate the 1G link by themselves, and that poor disk won't be able to handle such concurrent uploads I would guess
<ynezz>
there is no monitoring, but hopefully soon (tm)
<f00b4r0>
yeah single disk that's never going to go well
<ynezz>
we should upgrade indeed, the boxes are dated
<f00b4r0>
well you don't need anything super fast, but you need good I/O throughput.
<f00b4r0>
if this is a consumer grade sata disk and running off bog standard sata controller, that's a non-starter
<ynezz>
everything evolves :)
<ynezz>
so if we want to self-host we need to evolve as well
<f00b4r0>
my builders (well, ex-) are "old" machines (they might be 8-9yo now), yet they still perform remarkably well :)
<f00b4r0>
they use rotational media, but sas disks, cache-backed sas controller etc.
<f00b4r0>
reminds me i need to delete the containers and your account ;P
<ynezz>
BTW I'm not rejecting your build resources offer, but it's quite hard to keep that in sync
<f00b4r0>
i'm curious: how is it harder than other builders?
<ynezz>
everyone has full admin access to every builder; they're always online for a start, so even a simple base image update means some overhead
<ynezz>
sure, we could use the latest tag for the containers, but that screams issues
<robimarko>
Dont want to intrude, but what is the solution to add more resources while not adding more resources?
<ynezz>
full build of one release can be done on hetzner for $20, so the current idea is to do it that way
<f00b4r0>
ynezz: so just to make sure I get this right: the issue is the maintenance of the container on the metal? If so, wouldn't turning all workers into latent workers that deploy their container when they start "fix" this issue?
<ynezz>
yes, latent workers are the solution, that's why we've switched to monomaster, right
<f00b4r0>
ok so I guess I know what my next task is :)
<ynezz>
and due to lack of time I'm optimizing where I can, so instead of spending 30 min fiddling with some buildworker, I just do `terraform apply` and call it a day
<f00b4r0>
i'm thinking first move phase2 to monomaster, then merge phase1 and phase2 masters, then switch everything to latent workers. Sounds right?
<ynezz>
I would suggest to reorder it: phase2 monomaster, latent workers, merging phase1/phase2 masters
<f00b4r0>
sure, that should work too
<f00b4r0>
i'll get started in a couple weeks
<ynezz>
let's first see how phase1 monomaster looks with snapshot/23.05/22.03
<f00b4r0>
*nod*
<aparcar[m]>
are there already 22.03 test builds on the monomaster?
<aparcar[m]>
I lost track of the conversation
<f00b4r0>
btw the "tag_only" setting should enable you to provision workers for release builds without using them
schwicht has joined #openwrt-devel
Lynx- has joined #openwrt-devel
<f00b4r0>
workers with tag_only will only accept builds for release tags
<ynezz>
aparcar[m]: I don't want to experiment on 22.03, so we would move it over once 23.05 is confirmed working fine
<f00b4r0>
(that should hopefully pan out nicely with latent workers)
<aparcar[m]>
ynezz: and the last bits are musl updates for now?
<f00b4r0>
ynezz: you could arguably test 22.03 snapshots though
<ynezz>
aparcar[m]: that musl fallout is quite huge on the package feeds indeed, and there is also some issue with u-boot compilation on buildbot for mt7620
<aparcar[m]>
heh let's do a branch but also update core software 😉
<ynezz>
:)
<aparcar[m]>
well, unfortunately CI didn't catch the fallout?
<aparcar[m]>
wonder if i.e. musl should be explicitly marked as a "please rebuild everything in CI" thing
Lynx- has quit [Quit: Going offline, see ya! (www.adiirc.com)]
<aparcar[m]>
I just got pointed towards https://openwrt.org/releases/start and people think 21.02 is still active. I think we could branch 23.05 now and mark 21.02 as EOL
<TianlingShen[m]>
There's a PR for rust, maybe we can try some luck on it
<f00b4r0>
ynezz: re i/o contention on mirror-01, I'm guessing the disk is formatted ext4?
<Ansuel>
btw f00b4r0 no idea why, but the grid view is broken 3/4 of the time
<Ansuel>
also what is that fake revision?
<f00b4r0>
Ansuel: that would be a buildbot bug I'm afraid
cbeznea has quit [Ping timeout: 480 seconds]
<f00b4r0>
the config doesn't touch the default views, beyond enabling them
<f00b4r0>
Ansuel: the fake revision is the janitor task run; you can ignore it
<ynezz>
f00b4r0: yes, ext4
<f00b4r0>
ok. not much we can do there then
<ynezz>
well, we can impose the locks
<f00b4r0>
ah yeah sure. I was thinking about fs tuning to reduce contention :)
<nick[m]1234>
Tianling Shen: can you test?
<TianlingShen[m]>
nick[m]1234: I'm not the rust dev, need help from @lu-zero
<f00b4r0>
ynezz: I just remembered that phase2 still uses rsync --checksum. That is an absolute killer for both CPU and I/O. This may likely explain some of the issues
<f00b4r0>
getting rid of that was among the first changes i did for phase1, years ago.
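The reason `--checksum` is so punishing: rsync's default quick check is one stat() per file (size plus mtime), while `--checksum` forces both sides to read and hash every byte of every file in the tree. A rough model of the two strategies (the file names and sizes are made up; rsync's real implementation also rolls block checksums during transfer):

```python
import hashlib
import os
import tempfile

def same_quick(a, b):
    """rsync's default check: size and mtime only, no file reads."""
    sa, sb = os.stat(a), os.stat(b)
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)

def same_checksum(a, b):
    """--checksum behaviour: hash the full contents of both files."""
    def digest(path):
        h = hashlib.md5()  # rsync also uses MD5 for this check
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 16), b""):
                h.update(chunk)
        return h.digest()
    return digest(a) == digest(b)

tmp = tempfile.mkdtemp()
a, b = os.path.join(tmp, "a.ipk"), os.path.join(tmp, "b.ipk")
data = os.urandom(1 << 20)  # 1 MiB payload
for p in (a, b):
    with open(p, "wb") as f:
        f.write(data)
    os.utime(p, (1_000_000_000, 1_000_000_000))  # identical mtimes

quick_equal = same_quick(a, b)        # cheap: two stat() calls
checksum_equal = same_checksum(a, b)  # expensive: reads 2 MiB
print(quick_equal, checksum_equal)
```

Scale that 1 MiB pair up to a multi-terabyte package mirror on a single spinning disk and the "blocked for more than 120 seconds" hangs above are unsurprising.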
<ynezz>
f00b4r0: PRs welcome, thanks!
<f00b4r0>
ynezz: i'll get there, not before next month though. It's unfortunately not as simple as removing the option. That's why I wrote sha2rsync.pl :)
<robimarko>
\x: mac80211 support should be sent upstream, I am sure they will be happy about it
<ynezz>
f00b4r0: it's likely more about upgrading the complete solution, from hand-crafted rsync-based homemade mirroring to something like minio or such
<f00b4r0>
i don't see how that would apply to the buildbots though
<f00b4r0>
quite a bit more overhead than good ol' trusted rsync though :)
<f00b4r0>
setup overhead, that is
<f00b4r0>
anyway, I'm off, ttyl
<aparcar[m]>
ynezz: I'd be in favour of minio, too
djfe_ has joined #openwrt-devel
<schmars[m]>
while you're touching rsync, could we have sources.openwrt.org syncable via rsync too?
djfe has quit [Ping timeout: 480 seconds]
cbeznea1 has joined #openwrt-devel
cbeznea has quit [Ping timeout: 480 seconds]
<ynezz>
schmars[m]: should be done, can you check it?
<schmars[m]>
ynezz: works <3 even syncs fine on top of my wget --mirror :-) thanks!
<schmars[m]>
i wanna spread little boards in our community network that have everything needed for openwrt-based work :)
<ynezz>
well it's 133GB of sources, not that small :p
<f00b4r0>
some have fairly old timestamps, is this expected?
<f00b4r0>
I see 2014
<schmars[m]>
ssd storage prices are collapsing though, we're already at 50 euro/TB, so a 4 TB ssd is actually affordable. the downloads mirror is slightly more than 2 TB
<f00b4r0>
make that 2013
<schmars[m]>
timestamps look reasonable to me on a quick glance
<ynezz>
wow that git server is on its knees
<f00b4r0>
Ansuel: so quick glance at Safari dev mode suggests that the grid view may be some js snafu, some event not firing. The data exchanged for "works" and "doesn't work" is similar
<f00b4r0>
needless to say, I'm not touching that ;P
<aparcar[m]>
ynezz: switch to GH mirrors for builders?
<f00b4r0>
aparcar[m]: wouldn't that risk triggering GH's excessive requests block?
<aparcar[m]>
not sure
<schmars[m]>
i've never seen them rate-limit git access itself
<aparcar[m]>
gitlab does that from my experience
<ynezz>
github too
<schmars[m]>
ok good to know
<f00b4r0>
otherwise we could conceivably store a local copy of feeds and pull from that
<aparcar[m]>
maybe we should switch away from feeds.conf src-git-full
<ynezz>
yes
<aparcar[m]>
we're dropping AUTORELEASE anyway 💔
<f00b4r0>
;)
<ynezz>
and maybe use github.com in there
<aparcar[m]>
ynezz: github in where?
<ynezz>
feeds.conf.default
<f00b4r0>
feeds.conf
<ynezz>
more folks are doing a CI and using those defaults
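For context, a hedged sketch of what switching the defaults to the GitHub mirrors might look like. In scripts/feeds, `src-git` entries are cloned shallowly while `src-git-full` keeps full history, which is why moving away from src-git-full was suggested above (the exact default file contents vary per branch; this is illustrative, not the production config):

```
# feeds.conf.default sketch: src-git -> shallow clone, src-git-full -> full history.
# Pointing at the github.com mirrors would offload git.openwrt.org.
src-git packages https://github.com/openwrt/packages.git
src-git luci https://github.com/openwrt/luci.git
src-git routing https://github.com/openwrt/routing.git
```

Since many third-party CIs copy these defaults verbatim, changing them shifts a lot of clone traffic off the project's own git server.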
<ynezz>
or we should impose some rate limits
<ynezz>
or move to a beefier machine, it's some poor DO VPS
<f00b4r0>
or get hosted on kernel.org cdn ;}
<ynezz>
that would be great indeed
<aparcar[m]>
they have a CDN for git?
<f00b4r0>
dig git.kernel.org suggests they do
<robimarko>
AFAIK they do have CDN
<Ansuel>
f00b4r0: yep, it's totally a buildbot bug; just noticing it was working correctly before and now it doesn't. Probably it doesn't like master that much?
<f00b4r0>
Ansuel: no idea. Refresh a couple times and it falls into place.
<ynezz>
:D
<f00b4r0>
nothing in the browser console. Some event is most likely not firing up the final rendering
<ynezz>
works here
<aparcar[m]>
I don't see openwrt being hosted in that list?
<f00b4r0>
ynezz: it's hit and miss apparently
<aparcar[m]>
I mean, I don't see the perspective of it being hosted there, or is it open to anyone?
<f00b4r0>
aparcar[m]: i was merely jesting, but in reality you will never know until you ask. Sometimes you can get nice surprises :)
<robimarko>
Well, they do host standalone tooling, libraries and docs there
<aparcar[m]>
I'm happy to write them an email
schwicht has joined #openwrt-devel
<aparcar[m]>
however then we might as well move most of our stuff to use github.com as a mirror and can host our own stuff
george_p56 has joined #openwrt-devel
schwicht has quit []
<f00b4r0>
that's probably the easiest thing to do with immediate impact
<f00b4r0>
but for the builders we want to make sure they won't be throttled; I'm a bit worried about that
<aparcar[m]>
ynezz: can't we make the builders always do a regular git pull to only download the diff? if so they should never be rate-limited
<f00b4r0>
aparcar[m]: feeds are entirely removed after each build
<f00b4r0>
a shallow clone would already be a step forward. I understand that's an option now that AUTORELEASE is gone
<aparcar[m]>
f00b4r0: why so?
<f00b4r0>
why so what?
<aparcar[m]>
feeds are for each openwrt.git branch the same, right?
<robimarko>
Why do buildbots even need full git history?
<f00b4r0>
they needed that for AUTORELEASE, AIUI
schwicht has joined #openwrt-devel
<f00b4r0>
which is exactly why I'm glad we killed it with fire :)
<robimarko>
Then it's the perfect time for shallow clones, that will save a lot of bandwidth
<aparcar[m]>
-.-
<aparcar[m]>
but why are all feeds always removed?
<f00b4r0>
because they're not part of the main git and so are purged with git clean?
<f00b4r0>
and since they aren't a submodule either, there's not much we can do about that
george_p56 has quit [Ping timeout: 480 seconds]
<f00b4r0>
else opening the door to not fully cleaning between builds, and living through the consequences ;P
<robimarko>
I swear that QCA is trying to piss me off with reset naming like this: reset-names = "phy", "phy_phy";
<robimarko>
Like WTF
<Ansuel>
better than
<Ansuel>
reset-names = "reset"
<f00b4r0>
Ansuel: lol
<robimarko>
Well, they had 2 resets, so it could have been "reset2"
<robimarko>
Also, they keep reinventing USB and PCIe PHY-s
<aparcar[m]>
we have dl/ in our own folder, can't we have feeds/openwrt-{22.03,23.05}
<robimarko>
They just settled on QMP and now we have UNIPHY
<f00b4r0>
aparcar[m]: that's what I suggested above
<aparcar[m]>
f00b4r0: I didn't read that sorry
<aparcar[m]>
in each feed you could also run git clean I guess
<robimarko>
Well, the CI kernel patch refresh check doesn't work anymore
<f00b4r0>
aparcar[m]: note that this will do nothing for latent workers which by nature will download everything as they start
<f00b4r0>
so the correct approach would be to reduce the amount of data needed in the first place. Hence, shallow clone :)
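The saving from a shallow clone is easy to demonstrate against a throwaway local repository (this sketch shells out to git; the `file://` URL matters because `--depth` is silently ignored for plain local-path clones):

```python
import os
import subprocess
import tempfile

def git(*args, cwd=None):
    """Run a git command, raising on failure."""
    subprocess.run(["git", *args], cwd=cwd, check=True,
                   capture_output=True, text=True)

with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "src")
    os.mkdir(src)
    git("init", "-q", cwd=src)
    git("config", "user.email", "ci@example.org", cwd=src)
    git("config", "user.name", "ci", cwd=src)
    for i in range(50):  # build up 50 commits of history
        with open(os.path.join(src, "f"), "w") as f:
            f.write(str(i))
        git("add", "f", cwd=src)
        git("commit", "-q", "-m", f"commit {i}", cwd=src)

    dst = os.path.join(d, "dst")
    git("clone", "-q", "--depth", "1", "file://" + src, dst)
    out = subprocess.run(["git", "rev-list", "--count", "HEAD"],
                         cwd=dst, check=True, capture_output=True, text=True)
    shallow_commits = int(out.stdout.strip())
    print(shallow_commits)  # only the tip commit, not all 50
```

For a latent worker that fetches everything fresh on each start, dropping the history of openwrt.git and every feed is where the real bandwidth goes.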
<aparcar[m]>
sure but they don't start and stop all the time right? we'll have some running for a while right?
<aparcar[m]>
shallow clone is bad-ish anyway for the reproducibility of packages
<f00b4r0>
that's something I do not understand
<f00b4r0>
it works for phase1; surely it could be made to work for phase2?
<f00b4r0>
ynezz: btw I'm just thinking: even if you don't want to try actual 22.03 snapshots, you could still enable the branch and *not* set the rsync URLs. This will run the builds without uploading anything, which would give us good info on scheduling and branch switches on builders: if that works, you can be fairly confident that everything will work with uploads enabled.
<f00b4r0>
i tried to make the system very flexible :)
<PaulFertser>
aparcar[m]: packages AUTORELEASE I removed, and described the full procedure in the commit message iirc.
<Lynx->
can anyone advise about procd - how to have script launched as command and terminated show up in 'service X status' as 'inactive' rather than 'running'?
<apritcha>
Trying to add device support as a first-time contributor - my pipeline failed due to kernel patches requiring a refresh and said to run "make target/linux/refresh" and force push, but this doesn't seem to do anything that git recognizes as a change that can be pushed. Any idea what I'm missing?
djfe_ is now known as djfe
tlj has joined #openwrt-devel
<aparcar[m]>
apritcha: link
tlj_ has quit [Ping timeout: 480 seconds]
<robimarko>
apritcha: kernel refresh in CI seems to be broken currently
<robimarko>
ynezz: Awesome, will see if boss wants to pay for them
<robimarko>
I have a bunch of FT232R clones that have 1.8V jumpers from China
<robimarko>
On order, but need to test those on junk gear first
<ynezz>
there are a lot of fake ft232r chips, so if you've still got enough hair left, go for it :P
<robimarko>
These are fake 100%
<ynezz>
anyway if it's for manual fiddling then it's probably good, you won't notice the difference
<robimarko>
But I have been using fake ones for years and they have been rock solid, though 3.3 or 5V only
<blocktrron>
TianlingShen[m]: this single patch file causes conflicts when adding device support.
<TianlingShen[m]>
blocktrron: that can simply be rebased imo, and it's unlikely to happen often, as this isn't like the ath79 target with its large number of devices to add
cbeznea has quit [Quit: Leaving.]
<blocktrron>
We've had the same story with the ipq40xx dts source patch which was cumbersome.
<blocktrron>
Please split the patch.
<blocktrron>
aparcar[m]: this ID does not exist?
<blocktrron>
nvm my client is shit
<TianlingShen[m]>
though i would argue ipq40xx is somehow different
<TianlingShen[m]>
i will keep that patch separated
<TianlingShen[m]>
okay updated
Slimey has quit [Remote host closed the connection]
Slimey has joined #openwrt-devel
goliath has joined #openwrt-devel
minimal has joined #openwrt-devel
Sammydadsasdas has joined #openwrt-devel
Sammydadsasdas is now known as Sammy
Sammy is now known as Smy
<Smy>
hello there
<Smy>
I want to edit the source file tplink-safeloader.c, but I cannot find it in the latest releases
<Smy>
I found a commit saying "additionally moves the source files to another project", but it doesn't tell which one
<hauke>
Ansuel: I hope you are feeling better again. I also wanted musl 1.2.4 in the next release. We should probably improve the GitHub CI to detect such problems
<Ansuel>
hauke: recovering. The main problem with detecting such problems is that the test time will get much longer... but I may think of a specific run triggered by a tag that will just build everything on every target
<Ansuel>
also I noticed our get-target script skips some targets
<Ansuel>
for example lantiq xrx200 for some reason wasn't there, so I didn't notice the vr9 packages were broken
<Ansuel>
(and i had to fix them with the help of buildbot logs)
<hauke>
Ansuel: for toolchain changes we should not use the prebuilt toolchain in the build-all-packages CI task
<Ansuel>
hauke: yep, the "don't use prebuilt toolchain" approach is correct and is not that hard to implement (will be based on changed files and will force), but the main problem is those 20 hours
<Ansuel>
but needs to be done carefully or big mem leak
tlj has joined #openwrt-devel
<robimarko>
hauke: Well, the glibc issue is reproducible
<robimarko>
The autoconf version "error" shouldn't terminate the build, however it's claiming that m4 fails?
<hauke>
robimarko: I see that problem in a local build too
<robimarko>
hauke: Yes, I am hitting the same one locally
<robimarko>
I get the autoconf warning as configure was generated by autoconf 2.69
<nick[m]1234>
Ansuel: but the dynamic alloc you make is 2048 and gcc complains that it needs 2088. Is there any reason for not choosing something above 2088?
<Ansuel>
nick: things dynamically allocated are not part of the stack
<Ansuel>
gcc complains that the stack is too big for that function
<hauke>
nick[m]1234: maybe this is already fixed upstream
<nick[m]1234>
whups, I misinterpreted the warning. I thought it would be some new fancy warning telling me somehow that I have a buffer overflow.