ChanServ changed the topic of #freedesktop to: https://www.freedesktop.org infrastructure and online services || for questions about freedesktop.org projects, please see each project's contact || for discussions about specifications, please use https://gitlab.freedesktop.org/xdg or xdg@lists.freedesktop.org
AbleBacon has quit [Read error: Connection reset by peer]
Kayden has joined #freedesktop
columbarius has joined #freedesktop
co1umbarius has quit [Ping timeout: 480 seconds]
ximion has quit [Quit: Detached from the Matrix]
GNUmoon has quit [Read error: Connection reset by peer]
GNUmoon has joined #freedesktop
keypresser86 has quit []
keypresser86 has joined #freedesktop
pixelcluster has quit [Ping timeout: 480 seconds]
krushia has quit [Ping timeout: 480 seconds]
keypresser86 has quit []
MajorBiscuit has joined #freedesktop
bmodem has joined #freedesktop
bmodem has quit [Remote host closed the connection]
sima has joined #freedesktop
bmodem has joined #freedesktop
ximion has joined #freedesktop
tzimmermann has joined #freedesktop
MajorBiscuit has quit [Quit: WeeChat 3.6]
pkira has joined #freedesktop
pkira_ has joined #freedesktop
ximion has quit [Quit: Detached from the Matrix]
pkira has quit [Ping timeout: 480 seconds]
AbleBacon has joined #freedesktop
bmodem has quit [Ping timeout: 480 seconds]
bmodem has joined #freedesktop
bmodem has quit [Excess Flood]
bmodem has joined #freedesktop
bmodem has quit [Remote host closed the connection]
bmodem has joined #freedesktop
pixelcluster has joined #freedesktop
tzimmermann has quit [Remote host closed the connection]
tzimmermann has joined #freedesktop
<MrCooper>
eric_engestrom: FYI, now everyone who had already fetched the older mesa-23.2.0 tag will need to manually delete that, or git fetch will fail
<MrCooper>
glslang (IIRC) keeps changing their tags like that, it's pretty annoying (breaks scripts using git fetch)
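[editor's note] The failure MrCooper describes is easy to reproduce with two throwaway repositories. A minimal sketch (paths and the `mesa-23.2.0` tag name are just for illustration) of the fetch rejection and the manual fix downstreams have to apply:

```shell
#!/bin/sh
# Sketch: upstream re-points a tag; a plain fetch then refuses to update it.
set -e
tmp=$(mktemp -d)

# "Upstream" repository with the original tag.
git init -q "$tmp/origin"
git -C "$tmp/origin" -c user.name=u -c user.email=u@e commit -q --allow-empty -m one
git -C "$tmp/origin" tag mesa-23.2.0

# Downstream clone picks up the tag.
git clone -q "$tmp/origin" "$tmp/clone"

# Upstream moves the tag to a new commit (the accidental re-tag).
git -C "$tmp/origin" -c user.name=u -c user.email=u@e commit -q --allow-empty -m two
git -C "$tmp/origin" tag -f mesa-23.2.0 > /dev/null

# A plain fetch now fails, rejecting the update:
git -C "$tmp/clone" fetch --tags 2>&1 | grep -o "clobber existing tag" || true

# The manual fix every downstream has to apply: delete, then re-fetch.
git -C "$tmp/clone" tag -d mesa-23.2.0 > /dev/null
git -C "$tmp/clone" fetch -q --tags
```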
nedko has joined #freedesktop
<nedko>
hello, I have an issue with gitlab.freedesktop.org. I've signed in with GitHub and I want to fork the pipewire repository to make a pull request. However GitLab tells me that I'm not allowed to have more repositories (there are zero "my" gitlab.fdo repos so far)
blatant has quit [Read error: Connection reset by peer]
blatant has joined #freedesktop
AbleBacon has quit [Quit: I am like MacArthur; I shall return.]
tzimmermann has quit [Remote host closed the connection]
tzimmermann has joined #freedesktop
blatant has quit [Read error: Connection reset by peer]
tzimmermann has quit [Quit: Leaving]
<eric_engestrom>
MrCooper: yeah I know, but there's no good solution for when a tag was created by accident :/
<eric_engestrom>
not sure what dcbaker and/or I should do when we really do release 23.2.0 final
<eric_engestrom>
the only thing I can think of is to skip .0 and go from .0-rc4 to .1
<sam_>
please don't delete it, it's even more confusing and it scares downstreams
heapify has joined #freedesktop
<psykose>
.0.1 :)
blatant has joined #freedesktop
<MrCooper>
or maybe something like mesa-23.2.0-fixed
<psykose>
the issue with adding -words to the tag is that most build templates do https://url/archive/$someversion.tar, so for that one release someone has to spend 15 seconds typing $someversion-fixed just the once (so it doesn't end up in the real package version afterwards, since that would be weird)
<psykose>
just .0.1/.1 needs no changes anywhere and people just increment it more
<psykose>
depends on how meaningful a real .0 release is or isn't
<eric_engestrom>
yeah adding words is not a good idea
<eric_engestrom>
sam_: "it scares downstreams" could you expand on that?
<eric_engestrom>
fdo admins, could you have a look at the workload of the runners?
<eric_engestrom>
the generic ones
<eric_engestrom>
eg. fdo-equinix-m3l-*
<eric_engestrom>
`sanity` on a Marge pipeline shouldn't be waiting 5 minutes before a runner can pick it up
<eric_engestrom>
I expect someone is overloading all of our runners
<sam_>
it generally makes distributors wonder what went on: we have to go checking through issues to see why the tag existed and got deleted, and if any of us packaged it then we end up getting checksum errors when we fetch the old tag and have to try to get a copy of old and new and diff them in case of malicious stuff
<sam_>
it's much easier to just do .0/.1 like psykose says and move on because we can do it without thinking
<daniels>
eric_engestrom: doesn't seem to be a DoS, no
<sam_>
> see .1 tagged not long after .0, "huh ok that's shorter than usual, they must have botched something", end of
<daniels>
bear in mind that there are 24 shared slots across the whole of x86-64
<daniels>
so it doesn't exactly take a concerted effort to tie them up
<eric_engestrom>
daniels: true
<daniels>
probably the bigger issue is that we only have two shared runners active atm, so 16 slots
<eric_engestrom>
ie. concurrent=8 ?
<eric_engestrom>
sam_: yeah, I don't think we should replace the .0, but I was wondering if there was an issue with deleting the bad one
<daniels>
eric_engestrom: yeah
<eric_engestrom>
sam_: sounds like the only issue might be some packagers jumping the gun and releasing .0 when they see the tag instead of waiting for the email
<eric_engestrom>
(in which case I'm tempted to say they brought it on themselves :P)
<eric_engestrom>
daniels: do you know if any fdo admin has experimented with making the non-hw runners scale with demand?
sumits has joined #freedesktop
<eric_engestrom>
I don't know how to actually do it so I also don't know if there are reasons that make it not workable for us, and also maybe it's too much work for not enough benefit, but it does pop in the back of my mind every now and then
sumits has quit []
heapify is now known as heapheap
sumits has joined #freedesktop
<DavidHeidelberg[m]>
eric_engestrom: joining the complaining party. I have a few quick jobs, but everything is pending :(
<daniels>
eric_engestrom: I'm quite sure no fd.o admins, but other non-fd.o people may have done
<daniels>
we do a bit of dynamic provisioning stuff at Collabora, but it doesn't directly apply; we have fewer & longer-running jobs rather than 10 billion jobs which all complete relatively quickly, so the elasticity is very different
<daniels>
anyway, I've just binned the dead runner and created a new one
<__tim>
I wonder if it would make sense to have a separate tag for linux image build jobs, similar to placeholder
<__tim>
99% of them finish super quick and the rest is i/o bound rather than cpu bound
<__tim>
and those often block the rest of the pipeline, so good to get them going quicker
heapheap has quit []
<daniels>
I think the thing which would actually make the most sense is that you run a single check-container job under the placeholder-jobs tag, which checks everything is up to date, and if not launches a child pipeline to rebuild
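[editor's note] A sketch of what daniels suggests, using GitLab's dynamic child pipeline mechanism (`trigger:include:artifact` and `strategy: depend` are real GitLab CI features; the job names and the `check-image-freshness.sh` script are hypothetical — when nothing is stale the script would emit a trivial child pipeline):

```yaml
# Hypothetical sketch: one cheap job on the placeholder runners decides
# whether the expensive container rebuilds are needed at all.
check-containers:
  tags: [placeholder-jobs]
  script:
    # Hypothetical script: compares image tags against the container
    # definitions and writes rebuild-pipeline.yml with only stale images.
    - ./ci/check-image-freshness.sh > rebuild-pipeline.yml
  artifacts:
    paths: [rebuild-pipeline.yml]

rebuild-containers:
  needs: [check-containers]
  trigger:
    include:
      - artifact: rebuild-pipeline.yml
        job: check-containers
    strategy: depend   # parent waits for the child pipeline's result
```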
<__tim>
hah
<daniels>
bentiss: ugh, have you seen rpm-ostree failing to install stuff on coreos? I just took main and tried to provision a new runner, but https://gitlab.freedesktop.org/-/snippets/7665
zxq9 has quit [Ping timeout: 480 seconds]
<alatiera>
btw, what I did recently in the gst runners was to bump the global job limit, but have two runners per config
<alatiera>
one for builds with the normal 8 or so limit
<alatiera>
and another with the placeholder tag and 24 jobs limit
<alatiera>
could probably go way up as well
<__tim>
the placeholder ones are not the problem though afaict
<daniels>
right, the one with placeholder-jobs has a global limit of like 32768 or something dumb
<alatiera>
we could do something similar for image builds
<alatiera>
since they don't really hammer cpu or ram
<alatiera>
nor io really unless they are done composing
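[editor's note] A trimmed, hypothetical `config.toml` sketch of the two-runners-per-host scheme alatiera describes (runner names are invented, `url`/`token` and executor details omitted; note that tags themselves live server-side and are set at registration, not in this file):

```toml
concurrent = 32          # global cap across both entries on this host

[[runners]]
  name  = "gst-builds"       # registered with the normal build tags
  limit = 8                  # heavy jobs: keep CPU/RAM contention bounded
  executor = "docker"

[[runners]]
  name  = "gst-placeholder"  # registered with the placeholder tag
  limit = 24                 # cheap jobs: mostly waiting, not computing
  executor = "docker"
```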
vkareh has joined #freedesktop
blatant has quit [Ping timeout: 480 seconds]
mvlad has joined #freedesktop
<bentiss>
daniels: i can be in front of a laptop in ~ 30 minutes
<bentiss>
it might be an issue with the layered packages
<daniels>
bentiss: no rush at all! just wondering if it was something you'd seen before
<daniels>
I've never seen anything that with Silverblue here :(
<bentiss>
yeah it's weird
<bentiss>
daniels: not sure if you saw but I think I got the ceph object store multisite working
<bentiss>
the harbor data was copied this morning and I started the opa one
<bentiss>
I spent my whole week spinning up clusters ;)
<daniels>
bentiss: oh wow! no I didn't see but that's super cool :D
<bentiss>
I think the new cluster is almost ready
blatant has joined #freedesktop
<MrCooper>
"`sanity` on a Marge pipeline shouldn't be waiting 5 minutes before a runner can pick it up" is unrealistic without overprovisioning runner capacity on average
blatant has quit []
<bentiss>
daniels: in front of a laptop now
<bentiss>
which runner did you bin?
<sam_>
eric_engestrom: I guess it's fine if you just delete and never re-use, yeah
<bentiss>
(and FWIW, OPA has replicated 2.5TB over the 7.5TB since this morning, so it's good)
<daniels>
bentiss: I deleted -17
<daniels>
bentiss: it was unresponsive to ping, unresponsive to SOS, and powering down/up wasn't doing anything either
<daniels>
bentiss: then provisioned -23
<bentiss>
daniels: ok, so it was one of the last 2 debians
<daniels>
yeah, I mean if we can get -23 working then we might as well replace -21 too so they're all COS?
<bentiss>
daniels: for coreos, I enabled the autologin on the console. I think it's safe enough and way more convenient
<bentiss>
yep
<daniels>
++
<bentiss>
actually, I should do the same for the runners
<bentiss>
daniels: rpm-ostree seems to state that it is correctly installed, even though the command failed... have you tried rebooting the server?
<bentiss>
it could be the "--apply-live" that fails and for the runners we have to force a reboot
<daniels>
bentiss: trying now
<bentiss>
k
Haaninjo has joined #freedesktop
<eric_engestrom>
kisak: yeah, it was created by mistake and deleted afterwards, but not before it got propagated; git doesn't have a way to propagate deletes so there's not much we can do :(
<eric_engestrom>
(haha sam_ yep you did)
<kisak>
meh? seems like I should complain less?
<eric_engestrom>
MrCooper: re- overprovisioning that's why I was wondering about automatic scaling, so that we don't overprovision, and we can instead have fewer runners up during slow periods (weekends, EU/US night), and automatically have more during high demand hours
AbleBacon has joined #freedesktop
psukys has joined #freedesktop
<bentiss>
eric_engestrom: it's way easier for us to request a fixed number of free machines from Equinix than autoscaling
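[editor's note] For reference, gitlab-runner's `docker+machine` executor does support schedule-based autoscaling of the kind eric_engestrom is asking about; a trimmed, hypothetical sketch (driver, credentials, and machine options omitted):

```toml
[[runners]]
  executor = "docker+machine"
  limit = 40
  [runners.machine]
    IdleCount = 2        # keep a couple of machines warm off-peak
    IdleTime  = 1800     # tear idle machines down after 30 minutes
    [[runners.machine.autoscaling]]
      Periods   = ["* * 8-20 * * mon-fri *"]  # working hours
      IdleCount = 8                           # pre-provision for the rush
      Timeout   = 600
```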
<eric_engestrom>
kisak: haha no, don't worry about complaining! you've always been right every time you've pointed something out, and in this specific case we made a mistake and partly fixed it but we can't fully fix it
<eric_engestrom>
bentiss: ack :(
<eric_engestrom>
daniels: +1 on the "build jobs need a very different elasticity than test jobs", and yeah having the container jobs (and also the build jobs?) be in a child pipeline would actually make a lot of sense, provided `needs:` works across child pipelines; since you mention it, do you know if someone's already looking into this?
<bentiss>
daniels: looks like the reboot went through and that binutils-ppc64le is correctly installed
<daniels>
bentiss: gitlab-runner failed to start because we can't pull the container because too many requests :(
<bentiss>
we can work around it by pushing the image we have on ml-22 to our registry...
<bentiss>
daniels: `sudo podman pull registry.gitlab.com/gitlab-org/gitlab-runner:latest` went through :)
<daniels>
\o/ \o/
<daniels>
though sadly the service still failed to start
<bentiss>
daniels: I manually edited the service in /etc to point at the gitlab.com address.
<bentiss>
daniels: I need to run now. The thing left to check is if the gitlab-runner-register service starts properly and we should have the runner up
<daniels>
bentiss: haha, I didn't realise gitlab.com had their own registry?
<daniels>
*their own registry which was publicly advertised and worked for gitlab-runner
<bentiss>
daniels: I'll let you update the config or I'll do it tonight for the last runner if you want
<daniels>
in any case, it's working now, thanks a lot!
<bentiss>
\o/
<daniels>
update the config -> add the /etc/hosts entry?
<bentiss>
no change the gitlab-runner.service in the butane config on fdo-infra
<zmike>
moreover why does this job block test jobs
<eric_engestrom>
zmike: you should kill it and retry it
<eric_engestrom>
the job hasn't even started, it's clearly having network issues when trying to set up a runner
<zmike>
🤕
<eric_engestrom>
as for blocking test jobs, this will be fixed once a future gitlab update is installed (not sure if it's the next one or the one after)
<eric_engestrom>
zmike: there you go: took 50 sec :)
<eric_engestrom>
I'm going to add a 2 or 5 min job timeout since this specific job should never take long, but what happened here can happen to every job, and most of them can be given a tiny timeout
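[editor's note] The per-job timeout eric_engestrom mentions is a one-line change in `.gitlab-ci.yml` (`timeout:` is a real GitLab CI keyword accepting human-readable durations; the job name and script are illustrative):

```yaml
# Hypothetical sketch: a short per-job timeout so a wedged runner fails
# fast instead of eating the project-wide default timeout.
sanity:
  timeout: 5 minutes
  script:
    - ./ci/run-sanity-checks.sh   # hypothetical script name
```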
<zmike>
timeouts++
kisak has left #freedesktop [#freedesktop]
pkira__ has joined #freedesktop
pkira_ has quit [Ping timeout: 480 seconds]
pkira__ has quit [Ping timeout: 480 seconds]
<bentiss>
daniels: thanks for bringing up ml24!
<bentiss>
daniels: we are however lacking one placeholder runner now that ml-17 is gone
lyudess has quit []
Lyude has joined #freedesktop
<daniels>
oops
ximion has joined #freedesktop
alanc has quit [Remote host closed the connection]
alanc has joined #freedesktop
mvlad has quit [Remote host closed the connection]
thaller is now known as Guest7629
thaller has joined #freedesktop
Guest7629 has quit [Ping timeout: 480 seconds]
psukys has quit [Ping timeout: 480 seconds]
random_james_away has quit [Remote host closed the connection]
rsripada_ has quit [Remote host closed the connection]
random_james has joined #freedesktop
rsripada has joined #freedesktop
Haaninjo has quit [Quit: Ex-Chat]
alpernebbi has quit [Ping timeout: 480 seconds]
sima has quit [Ping timeout: 480 seconds]
alpernebbi has joined #freedesktop
psykose has quit [Remote host closed the connection]