<gitlab-bot>
GStreamer issue (Merge request) 827 in gstreamer "buffer: rename new gst_buffer_new_copy() to gst_buffer_new_memdup()" [Api, Opened]
* __tim
merge button seems to do the trick ¯\_(ツ)_/¯
<__tim>
still getting things like this fwiw: WARNING: Uploading artifacts as "archive" to coordinator... failed id=10061560 responseStatus=500 Internal Server Error status=500 token=7xTpLTaY
<kusma>
This is the second time in a row for this MR
<bentiss>
sorry, this happens when minio is a little bit too loaded
<bentiss>
my guess is it'll be better once I've migrated all the repos and the db to the new cluster, so we don't end up encrypting the traffic 3 or 4 times
<bentiss>
the other option is to not use a minio cluster, but plain minio like before, and rely on ceph to do the replication, but I am not sure it'll be better :/
<__tim>
is it possible this affects external runners more?
<__tim>
I seem to be hitting it a lot more ("always" currently with very small sample size) with the gst-htz runners and the gst macos runners
<__tim>
and is it "new" (config changed) or just bad luck/timing?
<bentiss>
__tim: it shouldn't be specific to ext runners. The minio server is only seen by gitlab itself.
<bentiss>
The bad luck might also be related to the size of the artifacts in the pipeline. I saw that there tends to be a spike in cpu usage when sending big (or a lot of) files
<__tim>
and is it a recent change in setup?
<bentiss>
but right now, when you upload, we get: runner -> nginx (old cluster) -> wireguard to server-2 in the new cluster -> wireguard/flannel to one of the 2 minio servers, and then wireguard over to the other minio server
<bentiss>
it's a recent change, mostly because the elastic storage I was using on packet is being retired next week, so we have to host our files ourselves
<bentiss>
I need to get rid of the old cluster first, but for that I need to migrate the db and the remaining repos
<__tim>
currently gstreamer CI is pretty much completely broken because we can't get jobs to pass at all from the looks of it
<bentiss>
do you have an idea of the size of the artifacts in this job?
<__tim>
I don't think the integration-test/valgrind jobs have big artefacts
<__tim>
others maybe, but it doesn't seem related to that
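For reference, per-job artifact sizes can be queried through the GitLab jobs API. A minimal sketch in Python; the project ID, pipeline ID and token below are placeholders, not values from this log:
```python
# Minimal sketch: list per-job artifact sizes for a pipeline via the GitLab API.
# PROJECT_ID, PIPELINE_ID and the token are placeholders.
import requests

GITLAB = "https://gitlab.freedesktop.org/api/v4"
PROJECT_ID = 1234        # hypothetical project id
PIPELINE_ID = 5678       # hypothetical pipeline id
HEADERS = {"PRIVATE-TOKEN": "<read_api token>"}

jobs = requests.get(
    f"{GITLAB}/projects/{PROJECT_ID}/pipelines/{PIPELINE_ID}/jobs",
    headers=HEADERS,
    params={"per_page": 100},
).json()

for job in jobs:
    # the jobs API reports each artifact with a filename and a size in bytes
    total = sum(a.get("size", 0) for a in job.get("artifacts", []))
    print(f'{job["name"]}: {total / (1024 * 1024):.1f} MiB')
```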
<bentiss>
it's more of a global thing
<bentiss>
when I was mirroring the old artifact server, the load on the servers was between 700 and 800
<bentiss>
now it's at a more normal 10-15
<bentiss>
(it's on a 32-thread/processor machine)
<bentiss>
__tim: there might be something that is problematic, which is that the minio server is not exposed to the outside world. *maybe* this is an issue with hetzner, though I think everything should go through gitlab
<bentiss>
I'll try to have a look at it later today, but can't do much ATM
<__tim>
ok, thanks
<__tim>
I don't understand any of this, I'm just reporting what I'm seeing :)
<__tim>
the macos runners are not on hetzner fwiw, they're on macstadium.com
<gitlab-bot>
GitLab.org issue 285597 in gitlab "Avoid copying objects from one bucket to another for CI artifacts" [Accepting Merge Requests, Ci Artifacts, Category:Continuous Integration, Category:Git Lfs, Object Storage, Devops::Verify, Group::Pipeline Execution, Section::Ops, Technical Debt, Workflow::Planning Breakdown, Opened]
<__tim>
bentiss, 1.14, 17 June, ufff :)
<__tim>
bentiss, thanks for digging
<bentiss>
well, there is a workaround I should be able to put in place now
<bentiss>
__tim: deploying the fix now, we will see tomorrow if this helped (or if we get another error ;-P )
<__tim>
awesome, thanks a lot
<__tim>
why tomorrow?
<bentiss>
and FWIW, this issue is definitely making sense
<bentiss>
"why tomorrow" -> because I do not see errors in the past 2 hours :)
<bentiss>
the issue makes sense because before we were using a single minio server, so moving a file was just renaming it on the filesystem, while now it has to actually move the file around, and that can take more than 60s
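To illustrate the distinction: with filesystem-backed storage on a single server, finalizing an upload is an atomic rename, while with an S3-style object store a "move" is a server-side copy plus delete whose duration grows with the object size. A minimal sketch, not the actual GitLab/minio code path; the endpoint URL, bucket and keys are placeholders:
```python
# Illustrative sketch only, not the actual GitLab/minio code path.
import os
import boto3

def move_local(tmp_path: str, final_path: str) -> None:
    # Single-server, filesystem-backed storage: finalizing an upload is an
    # atomic rename, independent of file size.
    os.rename(tmp_path, final_path)

def move_s3(bucket: str, src_key: str, dst_key: str) -> None:
    # S3-compatible object store (e.g. a minio cluster): there is no rename,
    # so a "move" is a server-side copy followed by a delete. The copy time
    # grows with the object size and can exceed a 60s request timeout.
    s3 = boto3.client("s3", endpoint_url="https://minio.example.invalid")  # placeholder endpoint
    s3.copy_object(Bucket=bucket, Key=dst_key,
                   CopySource={"Bucket": bucket, "Key": src_key})
    s3.delete_object(Bucket=bucket, Key=src_key)
```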
<bentiss>
__tim: new config is deployed, feel free to monitor 500s :)
<bentiss>
yeah, though we get a different error this time on the server: "exception.class":"Excon::Error::Timeout","exception.message":"read timeout reached"
<bentiss>
previously it was "exception.class": "Rack::Timeout::RequestTimeoutException", "exception.message": "Request ran for longer than 60000ms"
<bentiss>
__tim: alright, I added a new env var, *maybe* it'll do something :/
<__tim>
should I retry?
<bentiss>
__tim: yes please
<bentiss>
same thing :(