<ynezz>
f00b4r0: I mean, if I create 8 workers at the same time, there is a high probability that buildbot async foo results in 2 concurrent builds for the same builder
torv has joined #openwrt-devel
<ynezz>
f00b4r0: which then results in these rsync errors later on, so probably would need to use the same ul_lock on all those workers started at the same time?
<ynezz>
f00b4r0: OR I could add some random delay at the worker startup phase, so buildbot has enough time to process the async event queue
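A minimal sketch of the random-delay idea, assuming a hypothetical wrapper that runs just before each worker process starts. The names `startup_jitter`, `delayed_start` and `MAX_JITTER_SECONDS` are illustrative only; none of them come from buildbot or from the playbook discussed here, and the 60-second bound is a guess.

```python
import random
import time

# Hypothetical upper bound; the right value depends on how long the
# master needs to drain its async event queue, which is not stated here.
MAX_JITTER_SECONDS = 60.0


def startup_jitter(max_seconds=MAX_JITTER_SECONDS, rng=random):
    """Return a random delay, in seconds, between 0 and max_seconds."""
    return rng.uniform(0.0, max_seconds)


def delayed_start(start_worker, max_seconds=MAX_JITTER_SECONDS, sleep=time.sleep):
    """Sleep for a random interval, then call start_worker().

    start_worker is any zero-argument callable that launches the worker;
    sleep is injectable so the function can be exercised without waiting.
    """
    delay = startup_jitter(max_seconds)
    sleep(delay)
    start_worker()
    return delay
```

Jitter like this only makes simultaneous registration less likely; it does not rule out two builds overlapping, so a shared lock would still be the stricter fix.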
<f00b4r0>
i'm confused
<f00b4r0>
I assume that for your speedup workers, adding the property is simply a matter of editing the playbook? This would ensure that your workers only process the tag builds, which AIUI is exactly what you intend for them.
<f00b4r0>
for the race I'm even more confused: I don't understand what's happening?
<ynezz>
as I said, I'm using those workers solely for tagged builds
<ynezz>
I'm NOT using :)
<f00b4r0>
buildbot will not trigger two concurrent builds for the same "target" (as in buildbot build artifact), otherwise I suppose it's a bug in buildbot
<f00b4r0>
?
<f00b4r0>
ah ok
<ynezz>
concurrent failure happened due to arm_cortex-a9_neon/14 and arm_cortex-a9_neon/13 being built almost at the same time
<ynezz>
I've started 8 workers for phase2 openwrt-23.05 build at the same time and that's the result
<f00b4r0>
so these are phase2 builds, i see now
<f00b4r0>
I'm not familiar with that buildbot setup tbh
<f00b4r0>
what puzzles me looking at the build properties for each build, is that they target different repositories
<f00b4r0>
so that's not a buildbot bug, more likely a configuration bug
<f00b4r0>
ynezz: I guess I'll "fix" that (hopefully) when I rewrite the whole setup (which is the most likely scenario).
<schmars[m]>
Yeah, man, my buildbot also sometimes gets confused about repos, although the config is pretty straightforward... Not saying it's the same thing, but bugs are definitely not unheard of
<f00b4r0>
meanwhile, lunch. bbl :)
<ynezz>
f00b4r0: bon appétit, BTW what do you mean by that different property?
<f00b4r0>
ynezz: I'm sorry I do not understand your question :)
<f00b4r0>
as for the buildbot fix, i guess there's not much risk in trying to revert our workaround and see if it works. Push comes to shove we'll just revert the revert :) Also beware that the current git code does not include the "flunkOnFailure=False" bit
<f00b4r0>
if you do restart phase1, I know I've been saying this multiple times but please do add 22.03 *without* the rsync urls, so that 1) the database registers successful builds for the current HEAD, so that new phase1 has the same internal state as old 22.03-phase1; and 2) so we confirm that 22.03 builds fine in Debian 11 (I have no doubt it will, though)
<ynezz>
f00b4r0: "what puzzles me looking at the build properties for each build, is that they target different repositories"
<f00b4r0>
and I don't have time to dive for context just now sorry ;P
<f00b4r0>
i'll rewrite all this anyway, it'll be a better use of my time :)
<ynezz>
f00b4r0: Re: 22.03, I don't want to make the decision about the switch myself, planning to write an email about that to discuss it
<f00b4r0>
o_O
<f00b4r0>
I'm sorry if I'm blunt, what's the big deal?
<ynezz>
we've now Debian 10 base for phase1 and phase2 OpenWrt 22.03
<ynezz>
we're going to switch phase1 to Debian 11, leaving phase2 on Debian 10
<f00b4r0>
we're not supposed to bring anything from the build host into the artifacts. If we do, that's a bug we need to fix :)
<ynezz>
there is always some weird possibility, that something might go wrong
<ynezz>
but why would I waste time with doing that move and burning resources on the builds, if we're not going to do the switch in the end?
<f00b4r0>
on phase1 I can tell you there's no issue. I've been building 22.03, including full buildbot images, forever on debian 11
<f00b4r0>
so you plan to keep a separate buildbot running for the rest of 22.03's life? Meaning that 22.03 will still suffer from all the issues that the new system fixes?
<f00b4r0>
I'm sorry, that sounds completely absurd to me.
<ynezz>
if that is needed, then why not, it works so far, it's still going to work
<f00b4r0>
yeah well
<f00b4r0>
phase2 works so far.
<f00b4r0>
phase1 worked so far
<f00b4r0>
you saying this really gives me pause about dealing with phase2.
<ynezz>
well, this is quite a chicken-and-egg problem, isn't it? :)
<f00b4r0>
not really no. It's an extreme case of "if it ain't broke don't fix it".
<f00b4r0>
if I spend another couple weeks rewriting and testing phase2 (like I did on phase1), only to be then told "well, we're not so sure about this, we might want to have a vote about using the new code"; I'll do the only logical thing: nothing ;P
<Ansuel>
mhhh we are doing this discussion about 22.03 while we have 23.05 and master that would still use the new phase2 and that code will be used for anything newer
<Ansuel>
ynezz honestly the change from debian 10 to debian 11 can be problematic if we end up with different images.... for phase2 we can accept some funny situation since they are not core packages
tlj has quit [Remote host closed the connection]
<f00b4r0>
Ansuel: we have reproducible checking over our shoulder
<ynezz>
well, speaking about it, I'm going to switch master and 23.05 to that new Debian 11 base already
<ynezz>
once the rc1 is out
<f00b4r0>
and the world isn't going to stop if one snapshot 22.03 build fails. We can test it in new phase1, and if shit happens, old phase1 hasn't suddenly vanished into the void, right :P
tlj has joined #openwrt-devel
<ynezz>
phase2 packages are used for the stable releases as well
<ynezz>
if it was only about snapshots, fine with me as well
<f00b4r0>
ok well then it's sorted. I'm not touching phase2 :)
<ynezz>
step 1. move phase2 for snapshot to the new Debian 11 base, wait for feedback and continue with step 2. and bump 23.05 phase2 to new Debian 11 base, ideally before rc2 is out
<dwfreed>
f00b4r0: R-B does not consider "I updated the build server to a whole new version of Debian" to be a variant to consider for reproducibility
<dwfreed>
you can use R-B tools (like diffoscope) to diff images that should be nominally identical across builds, though
<f00b4r0>
that.
<robimarko>
If they change then it's a bug that needs to be fixed anyway as host components are leaking into the build
<f00b4r0>
also that.
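A rough sketch of the check described above: hash the artifacts from a Debian 10 build and a Debian 11 build of the same revision and list the ones that differ. The function names are made up for illustration, and this only flags which files differ; a tool like diffoscope would then be used on those files to see why.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def differing_artifacts(dir_a, dir_b):
    """List relative paths that are missing on one side or differ in content."""
    a, b = hash_tree(dir_a), hash_tree(dir_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

An empty list from `differing_artifacts` would support the claim that nothing from the build host leaks into the artifacts; any entry is a candidate for closer inspection.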
<dwfreed>
also, before putting in all this effort to move to debian 11, why not wait a month or so and move to debian 12 once the post-release dust settles?
<dwfreed>
it literally releases in a week
cmonroe has quit [Ping timeout: 480 seconds]
<dwfreed>
correction: 3 days
<dwfreed>
(barring last minute emergencies, Debian 12 will be released on the 10th)
<dwfreed>
realistically, you could most likely move to debian 12 now
<dwfreed>
most of the post release bugs are going to be in desktop, which is obviously not applicable here
<ynezz>
IMO the switch is still doable, but I'm not able to participate in the proposal/preparations/discussion around that due to lack of time
tlj has quit [Remote host closed the connection]
tlj has joined #openwrt-devel
<ynezz>
well, I was wrong, we're already using Debian 11 for phase2 on snapshot/23.05 as it's not about the master, but about the worker container image
<f00b4r0>
i am shocked there was no vote on this ;->
<ynezz>
so it seems good, we did the switch on master 3 weeks ago and I don't register any complaints so far
tlj has quit [Remote host closed the connection]
tlj has joined #openwrt-devel
<Ansuel>
sooo i guess concern solved ?
<f00b4r0>
except for 22.03
tlj has quit [Remote host closed the connection]
tlj has joined #openwrt-devel
<nbd>
Ansuel: i just noticed https://github.com/openwrt/openwrt/issues/12829. i think switching from stack-allocated data to kmalloc in the elf parsing code is a rather heavy hammer considering that it's only for getting rid of a warning (turned error). what do you think about reverting the patch and simply disabling the warning for that one source file instead?
<Slimey>
stintel lol, ill have to see if i can get it at the bar some time, google says "Ssh... the best Belgian Blonde beer in the world | Duvel" so we shall see ;)
<Ansuel>
nbd can you create a pr i still would love to see if upstream they have a better solution (if they will answer...) also is the perf hit that bad? that function is used that much on our images?
<nbd>
Ansuel: it's an extra kmalloc for every single exec()
<nbd>
not sure how much it is in practice, but it doesn't make any sense to me to resolve the stack warning in this manner
<nbd>
i can't create PR right now, since i'm on a flakey mobile data connection
<nbd>
the extra kmalloc would probably show up on some microbenchmarks used by automated kernel testing
<Slimey>
ath9k 0000:00:00.0: direct firmware load for ath9k-eeprom-pci-0000:00:00.0.bin failed with error -2, the art is in the ART partition and has the correct reference in the dts file, is this a ath9k-owl-loader requirement?
<ukleinek>
..ooOO(Oh fine, arnd is also here, didn't notice before and only accidentally highlighted him I guess)
<arnd>
in an allmodconfig build, you can have UBSAN + GCOV enabled together, which current gcc doesn't handle well, the question is just which of the two to disable for allmodconfig
<arnd>
KASAN_STACK is the main culprit, but that is already disabled for allmodconfig because of COMPILE_TEST, iirc
* ukleinek
reads backlog here
<Slimey>
should make kernel_menuconfig do make menuconfig instead?
<Slimey>
or will it ask after configuring that part first at some point
<arnd>
nbd: if you increase CONFIG_FRAME_WARN on 32-bit architectures, you should probably increase THREAD_SIZE_ORDER accordingly
<arnd>
32-bit architectures still mostly use 8KB stacks, which is not all that much, especially in configurations where a single function hits that 1KB warning limit
<ukleinek>
Ansuel, nbd, robimarko: Do we have UBSAN and GCOV both enabled here, or is this a different problem?
<nbd>
arnd: i think increasing CONFIG_FRAME_WARN from 1k to 1.2k shouldn't really need a THREAD_SIZE_ORDER adjustment...
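The stack-budget arithmetic behind this exchange, as a small sketch. It assumes 4 KiB pages and that the kernel stack size is `PAGE_SIZE << THREAD_SIZE_ORDER`, which gives the 8 KB stacks mentioned above at order 1; "1.2k" is taken as 1280 bytes, which is also an assumption.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


def thread_size(order, page_size=PAGE_SIZE):
    """Kernel stack size in bytes for a given THREAD_SIZE_ORDER (assumed formula)."""
    return page_size << order


def frame_share(frame_warn, order):
    """Fraction of the stack that one frame at the warning limit would use."""
    return frame_warn / thread_size(order)


# 8 KiB stack (order 1) with the two limits under discussion
share_1k = frame_share(1024, 1)   # 0.125
share_12k = frame_share(1280, 1)  # 0.15625, assuming "1.2k" means 1280 bytes
```

Under these assumptions, raising the limit moves a single worst-case frame from one eighth of the stack to a little under one sixth, which is the size of the change being argued about.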
<Slimey>
i fucking swear i cant win for losing
<robimarko>
ukleinek: I am not really that familiar, just hit similar issues recently
Borromini has quit [Quit: leaving]
<nbd>
ukleinek: this is without UBSAN/GCOV
<nbd>
ukleinek: parse_elf_properties has a stack variable that's bigger than 1k
<nbd>
so it's guaranteed to trip the warning if the limit is 1024
<ukleinek>
nbd: but even though -Wframe-larger-than=1024 is in the cmdline the file compiles fine for me?!
<ukleinek>
nbd: and that's because only ARCH=arm64 has CONFIG_ARCH_USE_GNU_PROPERTY
<ukleinek>
FTR: looking at v6.4-rc1
<nbd>
ah
<hauke>
on 64 bit systems we set CONFIG_FRAME_WARN to 2038 now
<hauke>
2048
<hauke>
Is the patch still needed?
<nbd>
i overlooked that the commit that reconfigured CONFIG_FRAME_WARN in openwrt came after the workaround patch
<nbd>
so i guess we can simply drop the patch without changing anything else
robimarko has quit [Quit: Leaving]
<Ansuel>
nbd fine by me to drop since the error was caused by us setting FRAME_WARN to a non standard value
<Ansuel>
in the meantime i will talk with upstream in search of a solution
torv has quit [Remote host closed the connection]
torv has joined #openwrt-devel
schwicht has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
Tapper has quit [Ping timeout: 480 seconds]
schwicht has joined #openwrt-devel
schwicht has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]