#freedesktop on 2025-04-03 — irc logs at oftc.irclog.whitequark.org

2024-07-16 04:52 ChanServ changed the topic of #freedesktop to: https://www.freedesktop.org infrastructure and online services || for questions about freedesktop.org projects, please see each project's contact || for discussions about specifications, please use https://gitlab.freedesktop.org/xdg or xdg@lists.freedesktop.org

00:05 scrumplex has joined #freedesktop

00:08 scrumplex_ has quit [Ping timeout: 480 seconds]

01:03 scrumplex_ has joined #freedesktop

01:06 scrumplex has quit [Ping timeout: 480 seconds]

01:10 bozo16 has quit [Remote host closed the connection]

03:07 swatish2 has joined #freedesktop

03:17 dcunit3d has joined #freedesktop

03:32 swatish2 has quit [Ping timeout: 480 seconds]

03:38 swatish2 has joined #freedesktop

03:52 swatish2 has quit [Ping timeout: 480 seconds]

04:29 eluks has quit [Remote host closed the connection]

04:30 eluks has joined #freedesktop

04:50 ximion has quit [Quit: Detached from the Matrix]

04:50 ximion has joined #freedesktop

05:11 swatish2 has joined #freedesktop

05:30 tlwoerner_ has joined #freedesktop

05:31 tlwoerner has quit [Ping timeout: 480 seconds]

05:40 tzimmermann has joined #freedesktop

05:45 swatish2 has quit [Ping timeout: 480 seconds]

05:56 jsa1 has joined #freedesktop

06:04 swatish2 has joined #freedesktop

06:44 <svuorela> I've gotten a feeling that gitlab has gotten slower ?

06:45 <MrCooper> compared to a pre- or post-migration baseline?

06:49 nephyrin has quit [Quit: ... besides, it was hot]

06:51 DemiMarie is now known as Guest12805

07:00 sghuge has quit [Remote host closed the connection]

07:00 ximion has quit [Remote host closed the connection]

07:00 sghuge has joined #freedesktop

07:04 <svuorela> definitely compared to a post-migration baseline, maybe even compared to a pre-migration baseline.

07:05 <svuorela> (I'm primarily in poppler if that makes a difference)

07:06 swatish2 has quit [Ping timeout: 480 seconds]

07:07 nephyrin has joined #freedesktop

07:11 swatish2 has joined #freedesktop

07:28 sima has joined #freedesktop

07:28 AbleBacon has quit [Read error: Connection reset by peer]

07:36 swatish21 has joined #freedesktop

07:36 MrCooper_ has joined #freedesktop

07:38 <bilboed> hm... indeed

07:39 MrCooper has quit [Ping timeout: 480 seconds]

07:42 swatish2 has quit [Ping timeout: 480 seconds]

07:55 jsa1 has quit [Ping timeout: 480 seconds]

08:02 <slomo> i think still a bit faster than pre-migration, but not as much as right after migration

08:03 jsa1 has joined #freedesktop

08:13 <bentiss> one thing that could explain it is the gitaly backups, which I only enabled last Sunday

08:13 <bentiss> the daily backup takes 6h, and we are 4h12m in

08:15 <bentiss> side note: I've enabled fastly for all *.freedesktop.org pages sites as of this morning. Of course I screwed up the DNS a bit, so if this is not working yet, wait a little longer for the DNS entries pointing to fastly to get cached properly

08:15 <bentiss> (IOW, mesa.freedesktop.org is using fastly, mesa3d.org is not)

08:22 overtime69ffcf[m] has joined #freedesktop

08:27 <eric_engestrom> bentiss: womp womp... we need to add other tags to fdo runners, otherwise just `priority:low` gets picked by any runner that has that tag, such as... a steamdeck in mupuf's farm: https://gitlab.freedesktop.org/mesa/mesa/-/jobs/73883719

08:27 <eric_engestrom> I think we need to have the fdo runners register both the priority tag and an `fdo-runner` tag or something like that, and jobs need to require both

08:29 kxkamil2 has quit []

08:30 <eric_engestrom> (ci-tron jobs are fine because they always have a tag for the farm they run on, so they don't risk being picked up by fdo runners, it's only the other way around that's a problem right now)
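
A minimal sketch of the tag-matching rule behind the problem described above, assuming GitLab's documented behaviour that a runner may only take a job when it carries every tag the job lists. The helper `runner_can_pick` and the `farm:example` tag are hypothetical; `fdo-runner` is only the placeholder name floated in the discussion, not a tag known to exist.

```python
def runner_can_pick(job_tags, runner_tags):
    """Return True if the runner carries every tag the job requires."""
    return set(job_tags) <= set(runner_tags)


# Hypothetical tag sets, for illustration only.
fdo_runner = {"priority:low", "fdo-runner"}
farm_runner = {"priority:low", "farm:example"}

# A job tagged only with the priority can land on either runner.
job_today = {"priority:low"}
# A job requiring both tags can only land on the fdo runner.
job_proposed = {"priority:low", "fdo-runner"}

for name, job in (("today", job_today), ("proposed", job_proposed)):
    print(name,
          "fdo:", runner_can_pick(job, fdo_runner),
          "farm:", runner_can_pick(job, farm_runner))
```

With only `priority:low` on the job, the farm runner is eligible as well, which matches the stray job linked above; requiring a second tag that only fdo runners carry excludes it.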

08:32 <bentiss> sigh, the tagging mechanism in gitlab is just shitty

08:32 <eric_engestrom> yeah :/

08:33 zerozero2 has quit [Ping timeout: 480 seconds]

08:34 <eric_engestrom> I think my solution should work though, what do you think?

08:36 <bentiss> I just checked, this runner from mupuf is the only one having priority:low (mupuf-gfx10-vangogh-1 and mupuf-gfx10-vangogh-5), so I wonder if we should not address that instead

08:36 <bentiss> your solution works, but I feel like that's not the best

08:37 MrCooper_ is now known as MrCooper

08:40 <bentiss> mupuf: do you use the priority in mupuf-gfx10-vangogh-*?

08:41 overtime69ffcf[m] has left #freedesktop [User left]

08:42 <eric_engestrom> yeah we use it, but we can rename it to eg. `ci-tron-priority:*`

08:43 <eric_engestrom> or `ci-tron:priority:*` to be more in line with our other tags

08:49 <eric_engestrom> bentiss, mupuf: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34358

08:49 <eric_engestrom> (I haven't renamed the tag on the farm side, I'll do that when merging this)

08:50 <bentiss> thanks!

08:59 <mupuf> I proposed an alternative name

09:08 swatish21 has quit [Ping timeout: 480 seconds]

09:08 samuelig has joined #freedesktop

09:22 ybogdano has quit [Remote host closed the connection]

09:22 ybogdano has joined #freedesktop

09:30 <__tim> we've been seeing loads of "WARNING: Uploading artifacts as "archive" to coordinator... 500 Internal Server Error" since yesterday, is that related to the hetzner S3 problems or something else?

09:31 <__tim> (and failed jobs/pipelines as a result)

09:33 ___nick___ has joined #freedesktop

09:39 swatish2 has joined #freedesktop

09:41 <mupuf> bentiss:

09:41 <mupuf> is fastly still hammering our bandwidth?

09:42 <mupuf> I keep getting KVM jobs to timeout, seemingly due to slow network but could also be insane CPU usage too

09:47 MrCooper_ has joined #freedesktop

09:51 kxkamil has joined #freedesktop

09:51 MrCooper has quit [Ping timeout: 480 seconds]

09:56 JerryXiao has quit [Quit: Bye]

10:04 <bentiss> mupuf: runner-x86-1 is pulling a lot of data from RIPE-ERX-146-75-0-0

10:05 <mupuf> bentiss: any idea what this is?

10:05 <bentiss> nothing seems abnormal on the runner

10:05 <bentiss> android-ndk is running, maybe that's related

10:06 <bentiss> it eventually stopped

10:07 <bentiss> mupuf: also one thing to remember, is that those runners now only have a single Gbit line, when the Equinix ones had a 10 Gbit (maybe dual)

10:08 <mupuf> bentiss: ack, but it shouldn't be using *that* much network

10:08 <bentiss> mupuf: link?

10:08 <bentiss> __tim: yeah, hetzner is still having a little bit of issues with their object storage

10:09 <__tim> "little bit" 😆

10:09 <mupuf> bentiss: https://gitlab.freedesktop.org/samueldr/ci-tron/-/jobs/73885848

10:09 <mupuf> this very same step takes less than 5 minutes on all the gateways we have, and none have as good a connection as one would expect from hetzner

10:10 <mupuf> at equinix, it took less than 2 minutes

10:11 <bentiss> I just don't know what I'm supposed to see

10:11 <mupuf> it shouldn't be pulling much data at all, no more than 100 MB... most of it coming from a DNF update

10:11 <mupuf> there isn't much to see, indeed

10:12 <mupuf> maybe I could re-run the job and from there you could tell if there is a high cpu load or something?

10:12 <bentiss> the pull, create, and init the container only took 32 secs, so I guess it's not a network issue

10:12 <mupuf> yeah, but sometimes I saw that just revalidating an artifact (a HEAD request) would take over a minute

10:13 <bentiss> artifacts is different, as mentioned above hetzner is having issues

10:13 MrCooper_ is now known as MrCooper

10:14 <mupuf> it's basically been ever since you moved the kvm runner, so that pre-dates the issues at hetzner

10:14 <bentiss> k

10:30 swatish2 has quit [Ping timeout: 480 seconds]

10:33 <mupuf> bentiss, eric_engestrom: the gitlab runner priority has a little bit of a bug

10:34 <mupuf> no jobs of a lower priority will be picked up as long as there is at least a high priority job executing

10:34 <mupuf> the script was designed for `parallel: 1`

10:37 <mupuf> in other words, we should probably drop the parallel and register as many runners as we can run in parallel

10:44 <eric_engestrom> ah indeed, good catch

10:45 <eric_engestrom> I haven't looked at how bentiss integrated my code into the fdo infra; do you have a link I could look at?

10:51 <mupuf> https://gitlab.freedesktop.org/freedesktop/helm-gitlab-infra/-/tree/main/cloud-init/ci-baremetal/files.d/etc/gitlab-runner?ref_type=heads is what I've found

10:51 <mupuf> but not sure how this works

10:52 <mupuf> https://gitlab.freedesktop.org/freedesktop/helm-gitlab-infra/-/blob/main/cloud-init/ci-baremetal/runcmd.d/70-prep-gitlab-runner.gotmpl?ref_type=heads seems like this is the invocation of the templates

10:56 <eric_engestrom> and https://gitlab.freedesktop.org/freedesktop/helm-gitlab-infra/-/blob/main/cloud-init/ci-baremetal/files.d/usr/local/bin/gitlab_runner_priority.py for the imported script

10:58 swatish2 has joined #freedesktop

10:58 todi1 has quit []

10:59 todi has joined #freedesktop

11:26 Caterpillar has quit [Quit: Konversation terminated!]

11:26 Caterpillar has joined #freedesktop

11:30 <bentiss> yep to all three files

11:31 <bentiss> https://gitlab.freedesktop.org/freedesktop/helm-gitlab-infra/-/commit/0123a6a14bd806be7c1063955c9293db54ce2b3a is the commit for handling concurrency

11:31 MrCooper_ has joined #freedesktop

11:32 <bentiss> mupuf, eric_engestrom: IIRC I fixed that in the deployed version. Each runner has a concurrent variable set to the number of threads, and the commit from above ensures each thread is independent of the others

11:33 <bentiss> https://gitlab.freedesktop.org/freedesktop/helm-gitlab-infra/-/blob/main/cloud-init/ci-baremetal/files.d/etc/gitlab-runner/config.toml?ref_type=heads for the deployed config

11:35 MrCooper has quit [Ping timeout: 480 seconds]

11:37 <mupuf> bentiss: great!

11:38 <bentiss> I just realized I promised eric_engestrom a MR with my changes... sorry

11:43 swatish2 has quit [Ping timeout: 480 seconds]

12:01 guludo has joined #freedesktop

12:02 sooc has quit [Remote host closed the connection]

12:02 mebious has quit [Write error: connection closed]

12:02 Guest9365 has quit [Remote host closed the connection]

12:02 moses has quit [Remote host closed the connection]

12:02 nucfreq has quit [Remote host closed the connection]

12:02 shymega[i] has quit [Remote host closed the connection]

12:02 elibrokeit_ has quit [Remote host closed the connection]

12:02 rpigott has quit [Remote host closed the connection]

12:02 ajhalili2006 has quit [Remote host closed the connection]

12:07 MrCooper_ is now known as MrCooper

12:12 moses has joined #freedesktop

12:17 raghavgururajan has joined #freedesktop

12:17 jsa1 has quit [Ping timeout: 480 seconds]

12:17 raghavgururajan is now known as Guest12825

12:19 shymega[i] has joined #freedesktop

12:21 elibrokeit_ has joined #freedesktop

12:24 rpigott has joined #freedesktop

12:26 jsa1 has joined #freedesktop

12:26 sooc has joined #freedesktop

12:27 nucfreq has joined #freedesktop

12:29 ajhalili2006 has joined #freedesktop

12:30 mebious has joined #freedesktop

12:40 MrCooper_ has joined #freedesktop

12:42 ximion has joined #freedesktop

12:43 swatish2 has joined #freedesktop

12:43 MrCooper has quit [Ping timeout: 480 seconds]

12:50 jsa1 has quit [Ping timeout: 480 seconds]

13:00 <eric_engestrom> bentiss: no worries! I applied some already

13:01 <eric_engestrom> the concurrent change didn't make enough sense to me so I didn't apply it back for now

13:01 <eric_engestrom> also, there's `os.cpu_count()` instead of calling `nproc` :)

13:05 <eric_engestrom> also, we might want a check that `cpu_count % concurrent == 0` to make sure we're not leaving some cpus unreachable
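
A minimal sketch of the divisibility check suggested above, using `os.cpu_count()` in place of shelling out to `nproc`. The function name `cpus_per_slot` and its parameters are hypothetical and not taken from `gitlab_runner_priority.py`; how the deployed script actually splits CPUs between concurrent jobs is an assumption here.

```python
import os


def cpus_per_slot(concurrent, cpu_count=None):
    """Split CPUs evenly across `concurrent` job slots.

    Raises ValueError when the split would leave some CPUs unused.
    """
    if cpu_count is None:
        # os.cpu_count() may return None when the count cannot be determined.
        cpu_count = os.cpu_count() or 1
    if concurrent <= 0:
        raise ValueError("concurrent must be a positive integer")
    if cpu_count % concurrent != 0:
        raise ValueError(
            f"{cpu_count} CPUs cannot be split evenly into {concurrent} slots"
        )
    return cpu_count // concurrent


if __name__ == "__main__":
    # One slot always divides evenly, so this works on any machine.
    print(cpus_per_slot(1))
```

For instance, `cpus_per_slot(4, cpu_count=16)` gives each slot 4 CPUs, while `cpus_per_slot(3, cpu_count=16)` raises because one CPU would be left unreachable.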

13:12 emersion_ has joined #freedesktop

13:12 r00tobo[BNC] has joined #freedesktop

13:14 phryk_ has joined #freedesktop

13:14 MrCooper__ has joined #freedesktop

13:14 minus_ has joined #freedesktop

13:15 emersion has quit [Read error: Connection reset by peer]

13:15 leftas has quit [Quit: Ping timeout (120 seconds)]

13:15 ocrete has quit [Quit: Ping timeout (120 seconds)]

13:15 dbrouwer has quit [Quit: Ping timeout (120 seconds)]

13:15 tintou has quit [Quit: Ping timeout (120 seconds)]

13:15 ndufresne has quit [Quit: Ping timeout (120 seconds)]

13:15 ao2_collabora has quit [Quit: Ping timeout (120 seconds)]

13:15 konstantin has quit [Remote host closed the connection]

13:15 r00tobo has quit [Quit: Quit]

13:15 phryk has quit [Read error: Connection reset by peer]

13:15 minus has quit [Remote host closed the connection]

13:15 mrpops2ko has quit [Remote host closed the connection]

13:16 fantom has joined #freedesktop

13:16 leftas has joined #freedesktop

13:16 konstantin has joined #freedesktop

13:17 ocrete has joined #freedesktop

13:17 mrpops2ko has joined #freedesktop

13:17 ao2_collabora has joined #freedesktop

13:17 jsa1 has joined #freedesktop

13:18 MrCooper_ has quit [Ping timeout: 480 seconds]

13:19 a_fantom has quit [Ping timeout: 480 seconds]

13:22 swatish2 has quit [Ping timeout: 480 seconds]

13:24 dbrouwer has joined #freedesktop

13:29 ndufresne has joined #freedesktop

13:34 tintou has joined #freedesktop

13:43 tintou has quit []

13:44 tintou has joined #freedesktop

13:44 tintou has quit []

13:44 tintou has joined #freedesktop

13:53 <bentiss> eric_engestrom: sure, I'll take any upgrade in that script. It was kind of a "let's get this thing working" situation

13:54 <__tim> ooc, how does mesa get any MRs merged? does mesa not do any artefact upload/download?

14:00 <bentiss> __tim: looks like they get a few merged regularly, so not sure if they have an issue

14:01 <__tim> yes exactly, I'm wondering if we're the only ones having issues :) (esp since it's with our own runners)

14:01 <bentiss> oh... link to a failed job?

14:03 <__tim> artefact download on mac os runner (this did pass on the 15th retry though): https://gitlab.freedesktop.org/gstreamer/cerbero/-/jobs/73923885

14:03 <__tim> usually it's the uploads that are failing

14:04 <__tim> and yes, I guess that's the hetzner s3 issue, but why is mesa not so affected?

14:04 <bentiss> maybe they use smaller artifacts?

14:05 <__tim> maybe :)

14:06 <bentiss> but yeah, right now, there isn't much I can do. Worst case we'll have to use a different bucket location, but that means we'd have to move all of the data first, which is a PITA

14:06 <__tim> ouch

14:07 <bentiss> the artifacts data, not the git data

14:08 <bentiss> https://status.hetzner.com/incident/da6b6285-b8a3-450f-b54b-19849ee9a09e is still "investigating"

14:08 <bentiss> I put the data there, to be closer to the machines

14:16 MrCooper_ has joined #freedesktop

14:19 MrCooper__ has quit [Ping timeout: 480 seconds]

14:20 haaninjo has joined #freedesktop

14:22 emersion_ has quit [Remote host closed the connection]

14:22 emersion has joined #freedesktop

14:27 jsa1 has quit [Ping timeout: 480 seconds]

14:46 MrCooper__ has joined #freedesktop

14:50 MrCooper_ has quit [Ping timeout: 480 seconds]

14:56 karolherbst has quit [Read error: Connection reset by peer]

14:56 karolherbst has joined #freedesktop

15:18 swatish2 has joined #freedesktop

15:26 tzimmermann has quit [Quit: Leaving]

15:35 swatish2 has quit [Ping timeout: 480 seconds]

15:51 mripard has quit [Quit: WeeChat 4.6.0]

15:59 Kayden has quit [Quit: -> jf]

16:22 MrCooper_ has joined #freedesktop

16:25 MrCooper__ has quit [Ping timeout: 480 seconds]

16:30 guludo has quit [Ping timeout: 480 seconds]

16:31 guludo has joined #freedesktop

16:34 tlwoerner_ has quit [Ping timeout: 480 seconds]

17:23 swatish2 has joined #freedesktop

17:27 ___nick___ has quit [Ping timeout: 480 seconds]

17:35 swatish2 has quit [Ping timeout: 480 seconds]

17:36 swatish2 has joined #freedesktop

18:00 swatish2 has quit [Ping timeout: 480 seconds]

19:20 Guest12805 is now known as DemiMarie

19:21 <DemiMarie> Was anyone ever concerned about the security of Hetzner’s bare-metal offerings in light of hardware/firmware infection attacks?

19:38 cascardo_ has joined #freedesktop

19:38 cascardo has quit [Ping timeout: 480 seconds]

19:47 haaninjo has quit [Quit: Ex-Chat]

19:49 AbleBacon has joined #freedesktop

20:16 sima has quit [Ping timeout: 480 seconds]

20:32 <pinchartl> DemiMarie: are you volunteering to go camp in the data centre to keep watch ? :-)

20:33 <DemiMarie> pinchartl: what I mean is “was it wise to pick a bare-metal offering run by a not-that-high-end provider, as opposed to one of the big name vendors or a colo”

20:34 <DemiMarie> https://eclypsium.com/blog/the-missing-security-primer-for-bare-metal-cloud-services/

20:35 <pinchartl> are the big names inherently safer ? especially when considering that many of them are USA companies, and are covered by the USA cloud act ?

20:35 <pinchartl> I don't think anyone can answer that question with any certainty

20:35 <DemiMarie> More resources to spend on things like custom board designs

20:36 <DemiMarie> I believe that generally the people who are really concerned about security go for colos

20:36 <DemiMarie> or their own datacenters if the scale justifies it (which this does not)

20:37 <pinchartl> it reminds me of https://xkcd.com/641/. do you pick the cereals guaranteed 100% free of asbestos, or the ones guaranteed 100% free of plutonium ?

20:37 <DemiMarie> see above w.r.t. colos

20:38 <DemiMarie> (read: giving up on cloud and using dedicated hardware)

20:38 <pinchartl> I don't think fdo can afford building its own data centre indeed :-)

20:39 <DemiMarie> I think the general rule is that if security is the top priority, you want to own hardware, not rent it

20:40 <pixelcluster> honestly hetzner isn't exactly a no-name provider either is it

20:40 <pixelcluster> this really seems like a "if it's so important to you, feel free to provide the resources to make it happen" scenario to me

20:40 * pixelcluster is not too involved in infra tbc

20:50 <DragoonAethis> DemiMarie: Would you consider "Oracle" to be enough of a big name to trust?

20:51 aswar002_ has quit []

20:51 <DemiMarie> DragoonAethis: for me, “big name” in the cloud space means “AWS/Azure/GCP”, especially Amazon or Google

20:52 <DragoonAethis> And all 3 of these options are at least an order of magnitude more expensive than what Hetzner gets you

20:52 <DemiMarie> personally, I would have gone with a colo facility and buying servers from a vendor, but if fd.o doesn’t have the resources for that it makes sense why they had to go with a different option

20:52 <DemiMarie> DragoonAethis: you get what you pay for in the hosting space

20:54 <DemiMarie> my concern, of course, is that someone would target https://gitlab.freedesktop.org so they can backdoor Mesa or one of the other giant projects

20:55 <vyivel> i would just pay someone to push vulnerable code

20:56 <vyivel> sounds much easier

20:56 <DemiMarie> vyivel: am I too paranoid?

20:57 <DragoonAethis> DemiMarie: kinda?

20:57 <DragoonAethis> This is a massive project that you would like to run at hyperscaler's levels of corporate security

20:58 aswar002 has joined #freedesktop

20:58 <DragoonAethis> Whereas the backend gets 3 part-time admins mostly trying to keep it held together with duct tape

20:58 <vyivel> oh right bribing/blackmailing an admin is even "better"

20:59 <pixelcluster> infecting a server with a baremetal malware to (I guess?) alter some files in the git repo honestly sounds like the most elaborate and expensive setup for the smallest possible result to me

21:04 <airlied> indeed if you wanted to run a botnet on hetzner it might be okay or hoping someone with corp secrets would provision the same server after you, but for a server hosting open source git repos, probably not worth it

21:11 kasper93 has quit [Ping timeout: 480 seconds]

21:26 <alanc> "if security is the top priority" - for fd.o though, security cannot be the top priority - something the org can afford (from both a monetary and admin time perspective) has to be the top priority, since otherwise the project is just dead

21:27 <alanc> security is important, and a high priority, but at a level appropriate to the project, not excluding everything else

21:32 kasper93 has joined #freedesktop

21:43 kasper93 has quit [Ping timeout: 480 seconds]

22:09 <zmike> anyone know what's going on with CI jobs on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34235

22:09 <zmike> seems like trace jobs are having issues maybe?

22:50 guludo has quit [Ping timeout: 480 seconds]

22:58 <robclark> for another example, https://gitlab.freedesktop.org/mesa/mesa/-/pipelines/1396465 .. the traces are not ok

23:40 alanc has quit [Remote host closed the connection]

23:44 alanc has joined #freedesktop