ChanServ changed the topic of #freedesktop to: https://www.freedesktop.org infrastructure and online services || for questions about freedesktop.org projects, please see each project's contact || for discussions about specifications, please use https://gitlab.freedesktop.org/xdg or xdg@lists.freedesktop.org
AbleBacon has quit [Read error: Connection reset by peer]
Kayden has joined #freedesktop
columbarius has joined #freedesktop
co1umbarius has quit [Ping timeout: 480 seconds]
ximion has quit [Quit: Detached from the Matrix]
GNUmoon has quit [Read error: Connection reset by peer]
GNUmoon has joined #freedesktop
keypresser86 has quit []
keypresser86 has joined #freedesktop
pixelcluster has quit [Ping timeout: 480 seconds]
krushia has quit [Ping timeout: 480 seconds]
keypresser86 has quit []
MajorBiscuit has joined #freedesktop
bmodem has joined #freedesktop
bmodem has quit [Remote host closed the connection]
sima has joined #freedesktop
bmodem has joined #freedesktop
ximion has joined #freedesktop
tzimmermann has joined #freedesktop
MajorBiscuit has quit [Quit: WeeChat 3.6]
pkira has joined #freedesktop
pkira_ has joined #freedesktop
ximion has quit [Quit: Detached from the Matrix]
pkira has quit [Ping timeout: 480 seconds]
AbleBacon has joined #freedesktop
bmodem has quit [Ping timeout: 480 seconds]
bmodem has joined #freedesktop
bmodem has quit [Excess Flood]
bmodem has joined #freedesktop
bmodem has quit [Remote host closed the connection]
bmodem has joined #freedesktop
pixelcluster has joined #freedesktop
tzimmermann has quit [Remote host closed the connection]
tzimmermann has joined #freedesktop
<MrCooper>
eric_engestrom: FYI, now everyone who had already fetched the older mesa-23.2.0 tag will need to manually delete that, or git fetch will fail
<MrCooper>
glslang (IIRC) keeps changing their tags like that, it's pretty annoying (breaks scripts using git fetch)
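[editor's note] The failure MrCooper describes is easy to reproduce with two throwaway repositories. A minimal sketch (paths and the `mesa-23.2.0` tag name are just for illustration) of the fetch rejection and the manual fix downstreams have to apply:

```shell
#!/bin/sh
# Sketch: upstream re-points a tag; a plain fetch then refuses to update it.
set -e
tmp=$(mktemp -d)

# "Upstream" repository with the original tag.
git init -q "$tmp/origin"
git -C "$tmp/origin" -c user.name=u -c user.email=u@e commit -q --allow-empty -m one
git -C "$tmp/origin" tag mesa-23.2.0

# Downstream clone picks up the tag.
git clone -q "$tmp/origin" "$tmp/clone"

# Upstream moves the tag to a new commit (the accidental re-tag).
git -C "$tmp/origin" -c user.name=u -c user.email=u@e commit -q --allow-empty -m two
git -C "$tmp/origin" tag -f mesa-23.2.0 > /dev/null

# A plain fetch now fails, rejecting the update:
git -C "$tmp/clone" fetch --tags 2>&1 | grep -o "clobber existing tag" || true

# The manual fix every downstream has to apply: delete, then re-fetch.
git -C "$tmp/clone" tag -d mesa-23.2.0 > /dev/null
git -C "$tmp/clone" fetch -q --tags
```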
nedko has joined #freedesktop
<nedko>
hello, I have an issue with gitlab.freedesktop.org. I've signed in with GitHub and I want to fork the pipewire repository to make a pull request. However GitLab tells me that I'm not allowed to have more repositories (there are zero "my" gitlab.fdo repos so far)
blatant has quit [Read error: Connection reset by peer]
blatant has joined #freedesktop
AbleBacon has quit [Quit: I am like MacArthur; I shall return.]
tzimmermann has quit [Remote host closed the connection]
tzimmermann has joined #freedesktop
blatant has quit [Read error: Connection reset by peer]
tzimmermann has quit [Quit: Leaving]
<eric_engestrom>
MrCooper: yeah I know, but there's no good solution for when a tag was created by accident :/
<eric_engestrom>
not sure what dcbaker and/or I should do when we really do release 23.2.0 final
<eric_engestrom>
the only thing I can think of is to skip .0 and go from .0-rc4 to .1
<sam_>
please don't delete it, it's even more confusing and it scares downstreams
heapify has joined #freedesktop
<psykose>
.0.1 :)
blatant has joined #freedesktop
<MrCooper>
or maybe something like mesa-23.2.0-fixed
<psykose>
the issue with adding -words to the tag is that most build templates do https://url/archive/$someversion.tar, so for that one release someone has to spend 15 seconds typing $someversion-fixed just the once (so it doesn't end up in the real package version afterwards, since that would be weird)
<psykose>
just .0.1/.1 needs no changes anywhere and people just increment it more
<psykose>
depends on how meaningful a real .0 release is or isn't
<eric_engestrom>
yeah adding words is not a good idea
<eric_engestrom>
sam_: "it scares downstreams" could you expand on that?
<eric_engestrom>
fdo admins, could you have a look at the workload of the runners?
<eric_engestrom>
the generic ones
<eric_engestrom>
eg. fdo-equinix-m3l-*
<eric_engestrom>
`sanity` on a Marge pipeline shouldn't be waiting 5 minutes before a runner can pick it up
<eric_engestrom>
I expect someone is overloading all of our runners
<sam_>
it generally makes distributors wonder what went on: we have to go checking through issues to see why the tag existed and got deleted, and if any of us packaged it then we end up getting checksum errors when we fetch the old tag and have to try to get a copy of old and new and diff them in case of malicious stuff
<sam_>
it's much easier to just do .0/.1 like psykose says and move on because we can do it without thinking
<daniels>
eric_engestrom: doesn't seem to be a DoS, no
<sam_>
> see .1 tagged not long after .0, "huh ok that's shorter than usual, they must have botched something", end of
<daniels>
bear in mind that there are 24 shared slots across the whole of x86-64
<daniels>
so it doesn't exactly take a concerted effort to tie them up
<eric_engestrom>
daniels: true
<daniels>
probably the bigger issue is that we only have two shared runners active atm, so 16 slots
<eric_engestrom>
ie. concurrent=8 ?
<eric_engestrom>
sam_: yeah, I don't think we should replace the .0, but I was wondering if there was an issue with deleting the bad one
<daniels>
eric_engestrom: yeah
<eric_engestrom>
sam_: sounds like the only issue might be some packagers jumping the gun and releasing .0 when they see the tag instead of waiting for the email
<eric_engestrom>
(in which case I'm tempted to say they brought it on themselves :P)
<eric_engestrom>
daniels: do you know if any fdo admin has experimented with making the non-hw runners scale with demand?
sumits has joined #freedesktop
<eric_engestrom>
I don't know how to actually do it so I also don't know if there are reasons that make it not workable for us, and also maybe it's too much work for not enough benefit, but it does pop in the back of my mind every now and then
sumits has quit []
heapify is now known as heapheap
sumits has joined #freedesktop
<DavidHeidelberg[m]>
eric_engestrom: joining the complaining party. I have a few quick jobs, but everything is pending :(
<daniels>
eric_engestrom: I'm quite sure no fd.o admins, but other non-fd.o people may have done
<daniels>
we do a bit of dynamic provisioning stuff at Collabora, but it doesn't directly apply; we have fewer & longer-running jobs rather than 10 billion jobs which all complete relatively quickly, so the elasticity is very different
<daniels>
anyway, I've just binned the dead runner and created a new one
<__tim>
I wonder if it would make sense to have a separate tag for linux image build jobs, similar to placeholder
<__tim>
99% of them finish super quick and the rest is i/o bound rather than cpu bound
<__tim>
and those often block the rest of the pipeline, so good to get them going quicker
heapheap has quit []
<daniels>
I think the thing which would actually make the most sense is that you run a single check-container job under the placeholder-jobs tag, which checks everything is up to date, and if not launches a child pipeline to rebuild
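[editor's note] A sketch of what daniels suggests, using GitLab's dynamic child pipeline mechanism (`trigger:include:artifact` and `strategy: depend` are real GitLab CI features; the job names and the `check-image-freshness.sh` script are hypothetical — when nothing is stale the script would emit a trivial child pipeline):

```yaml
# Hypothetical sketch: one cheap job on the placeholder runners decides
# whether the expensive container rebuilds are needed at all.
check-containers:
  tags: [placeholder-jobs]
  script:
    # Hypothetical script: compares image tags against the container
    # definitions and writes rebuild-pipeline.yml with only stale images.
    - ./ci/check-image-freshness.sh > rebuild-pipeline.yml
  artifacts:
    paths: [rebuild-pipeline.yml]

rebuild-containers:
  needs: [check-containers]
  trigger:
    include:
      - artifact: rebuild-pipeline.yml
        job: check-containers
    strategy: depend   # parent waits for the child pipeline's result
```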
<__tim>
hah
<daniels>
bentiss: ugh, have you seen rpm-ostree failing to install stuff on coreos? I just took main and tried to provision a new runner, but https://gitlab.freedesktop.org/-/snippets/7665
zxq9 has quit [Ping timeout: 480 seconds]
<alatiera>
btw, what I did recently in the gst runners was to bump the global job limit, but have two runners per config
<alatiera>
one for builds with the normal 8 or so limit
<alatiera>
and another with the placeholder tag and 24 jobs limit
<alatiera>
could probably go way up as well
<__tim>
the placeholder ones are not the problem though afaict
<daniels>
right, the one with placeholder-jobs has a global limit of like 32768 or something dumb
<alatiera>
we could do something similar for image builds
<alatiera>
since they don't really hammer cpu or ram
<alatiera>
nor io really unless they are done composing
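[editor's note] A trimmed, hypothetical `config.toml` sketch of the two-runners-per-host scheme alatiera describes (runner names are invented, `url`/`token` and executor details omitted; note that tags themselves live server-side and are set at registration, not in this file):

```toml
concurrent = 32          # global cap across both entries on this host

[[runners]]
  name  = "gst-builds"       # registered with the normal build tags
  limit = 8                  # heavy jobs: keep CPU/RAM contention bounded
  executor = "docker"

[[runners]]
  name  = "gst-placeholder"  # registered with the placeholder tag
  limit = 24                 # cheap jobs: mostly waiting, not computing
  executor = "docker"
```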
vkareh has joined #freedesktop
blatant has quit [Ping timeout: 480 seconds]
mvlad has joined #freedesktop
<bentiss>
daniels: i can be in front of a laptop in ~ 30 minutes
<bentiss>
it might be an issue with the layered packages
<daniels>
bentiss: no rush at all! just wondering if it was something you'd seen before
<daniels>
I've never seen anything that with Silverblue here :(
<bentiss>
yeah it's weird
<bentiss>
daniels: not sure if you saw but I think I got the ceph object store multisite working
<bentiss>
the harbor data was copied this morning and I started the opa one
<bentiss>
I spent my whole week spinning up clusters ;)
<daniels>
bentiss: oh wow! no I didn't see but that's super cool :D
<bentiss>
I think the new cluster is almost ready
blatant has joined #freedesktop
<MrCooper>
"`sanity` on a Marge pipeline shouldn't be waiting 5 minutes before a runner can pick it up" is unrealistic without overprovisioning runner capacity on average
blatant has quit []
<bentiss>
daniels: in front of a laptop now
<bentiss>
which runner did you bin?
<sam_>
eric_engestrom: I guess it's fine if you just delete and never re-use, yeah
<bentiss>
(and FWIW, OPA has replicated 2.5TB over the 7.5TB since this morning, so it's good)
<daniels>
bentiss: I deleted -17
<daniels>
bentiss: it was unresponsive to ping, unresponsive to SOS, and powering down/up wasn't doing anything either
<daniels>
bentiss: then provisioned -23
<bentiss>
daniels: ok, so it was one of the last 2 debians
<daniels>
yeah, I mean if we can get -23 working then we might as well replace -21 too so they're all COS?
<bentiss>
daniels: for coreos, I enabled the autologin on the console. I think it's safe enough and way more convenient
<bentiss>
yep
<daniels>
++
<bentiss>
actually, I should do the same for the runners
<bentiss>
daniels: rpm-ostree seems to state that it is correctly installed, even though the command failed... have you tried rebooting the server?
<bentiss>
it could be the "--apply-live" that fails and for the runners we have to force a reboot
<daniels>
bentiss: trying now
<bentiss>
k
Haaninjo has joined #freedesktop
<eric_engestrom>
kisak: yeah, it was created by mistake and deleted afterwards, but not before it got propagated; git doesn't have a way to propagate deletes so there's not much we can do :(
<eric_engestrom>
(haha sam_ yep you did)
<kisak>
meh? seems like I should complain less?
<eric_engestrom>
MrCooper: re- overprovisioning that's why I was wondering about automatic scaling, so that we don't overprovision, and we can instead have fewer runners up during slow periods (weekends, EU/US night), and automatically have more during high demand hours
AbleBacon has joined #freedesktop
psukys has joined #freedesktop
<bentiss>
eric_engestrom: it's way easier for us to request a fixed number of free machines from Equinix than autoscaling
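[editor's note] For reference, gitlab-runner's `docker+machine` executor does support schedule-based autoscaling of the kind eric_engestrom is asking about; a trimmed, hypothetical sketch (driver, credentials, and machine options omitted):

```toml
[[runners]]
  executor = "docker+machine"
  limit = 40
  [runners.machine]
    IdleCount = 2        # keep a couple of machines warm off-peak
    IdleTime  = 1800     # tear idle machines down after 30 minutes
    [[runners.machine.autoscaling]]
      Periods   = ["* * 8-20 * * mon-fri *"]  # working hours
      IdleCount = 8                           # pre-provision for the rush
      Timeout   = 600
```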
<eric_engestrom>
kisak: haha no, don't worry about complaining! you've always been right every time you've pointed something out, and in this specific case we made a mistake and partly fixed it but we can't fully fix it
<eric_engestrom>
bentiss: ack :(
<eric_engestrom>
daniels: +1 on the "build jobs need a very different elasticity than test jobs", and yeah having the container jobs (and also the build jobs?) be in a child pipeline would actually make a lot of sense, provided `needs:` works across child pipelines; since you mention it, do you know if someone's already looking into this?
<bentiss>
daniels: looks like the reboot went through and that binutils-ppc64le is correctly installed
<daniels>
bentiss: gitlab-runner failed to start because we can't pull the container because too many requests :(
<bentiss>
we can work around it by pushing the image we have on ml-22 to our registry...
<bentiss>
daniels: `sudo podman pull registry.gitlab.com/gitlab-org/gitlab-runner:latest` went through :)
<daniels>
\o/ \o/
<daniels>
though sadly the service still failed to start
<bentiss>
daniels: I manually edited the service in /etc to point at the gitlab.com address.
<bentiss>
daniels: I need to run now. The thing left to check is if the gitlab-runner-register service starts properly and we should have the runner up
<daniels>
bentiss: haha, I didn't realise gitlab.com had their own registry?
<daniels>
*their own registry which was publicly advertised and worked for gitlab-runner
<bentiss>
daniels: I'll let you update the config or I'll do it tonight for the last runner if you want
<daniels>
in any case, it's working now, thanks a lot!
<bentiss>
\o/
<daniels>
update the config -> add the /etc/hosts entry?
<bentiss>
no change the gitlab-runner.service in the butane config on fdo-infra
<zmike>
moreover why does this job block test jobs
<eric_engestrom>
zmike: you should kill it and retry it
<eric_engestrom>
the job hasn't even started, it's clearly having network issues when trying to set up a runner
<zmike>
🤕
<eric_engestrom>
as for blocking test jobs, this will be fixed once a future gitlab update is installed (not sure if it's the next one or the one after)
<eric_engestrom>
zmike: there you go: took 50 sec :)
<eric_engestrom>
I'm going to add a 2 or 5 min job timeout since this specific job should never take long, but what happened here can happen to every job, and most of them can be given a tiny timeout
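[editor's note] The per-job timeout eric_engestrom mentions is a one-line change in `.gitlab-ci.yml` (`timeout:` is a real GitLab CI keyword accepting human-readable durations; the job name and script are illustrative):

```yaml
# Hypothetical sketch: a short per-job timeout so a wedged runner fails
# fast instead of eating the project-wide default timeout.
sanity:
  timeout: 5 minutes
  script:
    - ./ci/run-sanity-checks.sh   # hypothetical script name
```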
<zmike>
timeouts++
kisak has left #freedesktop [#freedesktop]
pkira__ has joined #freedesktop
pkira_ has quit [Ping timeout: 480 seconds]
pkira__ has quit [Ping timeout: 480 seconds]
<bentiss>
daniels: thanks for bringing up ml24!
<bentiss>
daniels: we are however lacking one placeholder runner now that ml-17 is gone
lyudess has quit []
Lyude has joined #freedesktop
<daniels>
oops
ximion has joined #freedesktop
alanc has quit [Remote host closed the connection]
alanc has joined #freedesktop
mvlad has quit [Remote host closed the connection]
thaller is now known as Guest7629
thaller has joined #freedesktop
Guest7629 has quit [Ping timeout: 480 seconds]
psukys has quit [Ping timeout: 480 seconds]
random_james_away has quit [Remote host closed the connection]
rsripada_ has quit [Remote host closed the connection]
random_james has joined #freedesktop
rsripada has joined #freedesktop
Haaninjo has quit [Quit: Ex-Chat]
alpernebbi has quit [Ping timeout: 480 seconds]
sima has quit [Ping timeout: 480 seconds]
alpernebbi has joined #freedesktop
psykose has quit [Remote host closed the connection]