whot has joined #freedesktop
whot has quit []
jstein has quit [Ping timeout: 480 seconds]
alatiea has left #freedesktop [#freedesktop]
alatiera has joined #freedesktop
<imirkin> Mithrandir: daniels --^
<imirkin> bentiss: --^
<imirkin> it says "taking too much time to respond", but the 502 comes back quite snappily (under 1s)
<alatiera> been getting 502s from gitlab for a couple minutes
<alatiera> imirkin though it's past midnight Europe time
<imirkin> quite a bit past midnight
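A 502 that comes back well under a second, as imirkin describes, usually means the reverse proxy answered on its own because the backend was unreachable, rather than waiting out an upstream timeout. A minimal probe to confirm that pattern could look like the following Python sketch; the URL and poll interval are assumptions, not something anyone in the channel actually ran.

    import time
    import requests

    # Hypothetical probe; URL and interval are assumptions.
    URL = "https://gitlab.freedesktop.org/"

    for _ in range(5):
        start = time.monotonic()
        try:
            status = requests.get(URL, timeout=30).status_code
        except requests.RequestException as exc:
            status = f"error: {exc}"
        # A fast 502 points at the proxy failing over immediately,
        # a ~60s one at an upstream timeout.
        print(f"{status} after {time.monotonic() - start:.2f}s")
        time.sleep(10)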
alatiera is now known as alatiea
alatiea has left #freedesktop [#freedesktop]
alatiera has joined #freedesktop
a_a has joined #freedesktop
a_a has quit []
blue__penquin has joined #freedesktop
blue__penquin is now known as Guest4818
Guest4818 has quit []
blue__penquin has joined #freedesktop
ygb has joined #freedesktop
ygb has quit []
ana_ana has joined #freedesktop
<ana_ana> gitlab looks down?
blue__penquin has quit []
danvet has joined #freedesktop
<bentiss> postgresql seems to be crashing, on it
<bentiss> disk full :(
<alatiera> ouch :/
<bentiss> that's fine, we can hot-resize it
blue__penquin has joined #freedesktop
<bentiss> and we are back
<bentiss> sorry for the inconvenience, haven't been monitoring the postgres disk for a while
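Since the root cause here was an unmonitored volume filling up, a minimal disk-usage check along these lines could be wired into an alert; the mount point and threshold below are assumptions, not the actual monitoring setup.

    import shutil

    # Assumed mount point for the postgres data volume and an assumed threshold.
    PG_DATA = "/var/lib/postgresql/data"
    ALERT_AT = 0.90  # warn when the volume is more than 90% used

    usage = shutil.disk_usage(PG_DATA)
    used = usage.used / usage.total
    if used > ALERT_AT:
        print(f"WARNING: postgres volume {used:.0%} full, "
              f"{usage.free / 2**30:.1f} GiB left")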
mceier has joined #freedesktop
pzanoni has quit [Ping timeout: 480 seconds]
blue__penquin has quit []
blue__penquin has joined #freedesktop
<__tim> don't suppose there's a trick to mark MRs as merged where the commits have been pushed directly?
<__tim> what's strange is that if I reload https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/827/commits it shows 1 commit, which has already been pushed to master. Usually gitlab picks up on that and closes the MR as merged, or it shows 0 commits in the MR
<gitlab-bot> GStreamer issue (Merge request) 827 in gstreamer "buffer: rename new gst_buffer_new_copy() to gst_buffer_new_memdup()" [Api, Opened]
* __tim merge button seems to do the trick ¯\_(ツ)_/¯
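For reference, the same nudge can be scripted against the API with python-gitlab instead of clicking the merge button. This is only a sketch, with a placeholder token; the project path and MR iid are taken from the discussion above.

    import gitlab  # python-gitlab

    # Placeholder token; instance URL, project path and iid come from the log above.
    gl = gitlab.Gitlab("https://gitlab.freedesktop.org", private_token="<token>")
    project = gl.projects.get("gstreamer/gstreamer")
    mr = project.mergerequests.get(827)

    # Pressing "merge" when the commit is already on the target branch simply
    # flips the MR state to "merged"; the API call is the same.
    if mr.state == "opened":
        mr.merge()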
<__tim> still getting things like this fwiw: WARNING: Uploading artifacts as "archive" to coordinator... failed id=10061560 responseStatus=500 Internal Server Error status=500 token=7xTpLTaY
DragoonAethis has joined #freedesktop
whot has joined #freedesktop
<__tim> (post-resize new pipeline, new jobs)
<__tim> for all ci pipelines
_whitelogger has joined #freedesktop
blue_penquin has joined #freedesktop
<kusma> I'm seeing a similar issue here: https://gitlab.freedesktop.org/mesa/mesa/-/jobs/10063369
<kusma> This is the second time in a row for this MR
<bentiss> sorry, this happens when minio is a little bit too loaded
<bentiss> my guess is it'll be better once I've migrated all the repos and the db to the new cluster, so we don't end up encrypting the traffic 3 or 4 times
<bentiss> the other option is to not use minio cluster, but plain minio like before, and rely on ceph to do replication, but I am not sure it'll be better :/
<__tim> is it possible this affects external runners more?
<__tim> I seem to be hitting it a lot more ("always" currently with very small sample size) with the gst-htz runners and the gst macos runners
<__tim> and is it "new" (config changed) or just bad luck/timing?
<bentiss> __tim: it shouldn't be specific to ext runners. The minio server is only seen by gitlab itself.
<bentiss> The bad luck might also be related to the size of the artifacts in the pipeline. I saw that there tends to be a spike in cpu usage when sending big (or a lot of) files
<__tim> and is it a recent change in setup?
<bentiss> but right now, when you upload, we get: runner -> nginx (old cluster) -> wireguard to server-2 in the new cluster -> wireguard flannel to one of the 2 minio servers, and then wireguard over to the next minio server
<bentiss> it's a recent change, mostly because next week the elastic storage I was using on packet is being retired, so we have to host our files ourselves
<bentiss> I need to get rid of the old cluster first, but for that I need to migrate the db and the remaining repos
<__tim> currently gstreamer CI is pretty much completely broken because we can't get jobs to pass at all from the looks of it
<gitlab-bot> GStreamer issue (Merge request) 246 in gst-devtools "validate: launcher: Simplify fakesink handling" [Opened]
<bentiss> do you have an idea of the size of the artifacts in this job?
<__tim> I don't think the integration-test/valgrind jobs have big artefacts
<__tim> others maybe, but it doesn't seem related to that
<bentiss> it's more a global thing
<bentiss> when I was mirroring the old artifact server, the load on the servers was between 700 and 800
<bentiss> now it's on a more normal 10-15
<bentiss> (it's on a 32 threaded/proc machine)
<bentiss> __tim: there might be something that is problematic, which is that the minio server is not exposed to the outside world. *maybe* this is an issue with hetzner, though I think everything should go through gitlab
<bentiss> I'll try to have a look at it later today, but can't do much ATM
<__tim> ok, thanks
<__tim> I don't understand any of this, I'm just reporting what I'm seeing :)
<__tim> the macos runners are not on hetzner fwiw, they're on macstadium.com
halfline has quit [Quit: Leaving]
blue__penquin has quit []
tanuk has joined #freedesktop
vsyrjala has joined #freedesktop
_whitelogger has joined #freedesktop
<tomeu> daniels: bentiss: getting connection refused on https://dri.freedesktop.org/libdrm/libdrm-2.4.105.tar.xz
ceyusa has joined #freedesktop
emersion has quit [Remote host closed the connection]
emersion has joined #freedesktop
tintou has joined #freedesktop
Guest4778 is now known as dcbaker
emersion has quit [Remote host closed the connection]
emersion has joined #freedesktop
Sheogorath[m] has joined #freedesktop
aaronp has joined #freedesktop
<alatiera> also affects gst ^
alanc has quit [Remote host closed the connection]
alanc has joined #freedesktop
<bentiss> __tim, daniels: FWIW the issues with the artifacts uploads are related to https://gitlab.com/gitlab-org/gitlab/-/issues/270077 -> we get a timeout if the upload doesn't finish in 60s
<gitlab-bot> GitLab.org issue 270077 in gitlab "Artifact object store timing out on S3 upload" [Ci Artifacts, Bug, Devops::Verify, Group::Pipeline Execution, Priority::3, Section::Ops, Closed]
<bentiss> and they point at https://gitlab.com/gitlab-org/gitlab/-/issues/285597 for a proper fix (14.0)
<gitlab-bot> GitLab.org issue 285597 in gitlab "Avoid copying objects from one bucket to another for CI artifacts" [Accepting Merge Requests, Ci Artifacts, Category:Continuous Integration, Category:Git Lfs, Object Storage, Devops::Verify, Group::Pipeline Execution, Section::Ops, Technical Debt, Workflow::Planning Breakdown, Opened]
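The longer-term fix tracked in issue 285597 is to stop copying artifacts between buckets and have them land in their final bucket directly. A presigned PUT URL is one common S3 pattern for that kind of direct upload; the sketch below only illustrates the pattern, with invented endpoint, bucket and key names, and does not claim this is how GitLab implements it.

    import boto3
    import requests

    # Illustrative only: endpoint, bucket and key names are invented; credentials
    # are assumed to come from the environment.
    s3 = boto3.client("s3", endpoint_url="https://minio.example.internal")
    url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": "gitlab-artifacts", "Key": "final/artifact.zip"},
        ExpiresIn=3600,
    )
    # The uploader PUTs straight to the final location, so no later
    # bucket-to-bucket copy is needed.
    with open("artifact.zip", "rb") as fh:
        requests.put(url, data=fh, timeout=300)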
pzanoni has joined #freedesktop
<__tim> bentiss, 14.0, 17 June, ufff :)
<__tim> bentiss, thanks for digging
<bentiss> well, there is a workaround I should be able to put in place now
<bentiss> __tim: deploying the fix now, we will see tomorrow if this helped (or if we get another error ;-P )
<__tim> awesome, thanks a lot
<__tim> why tomorrow?
<bentiss> and FWIW, this issue is definitely making sense
<bentiss> "why tomorrow" -> because I do not see errors in the past 2 hours :)
<bentiss> the issue makes sense because before we were using a single minio server, so moving a file was just a rename on the filesystem, while now it actually has to move the file around, and that can take more than 60s
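In S3 terms that "move" is a server-side copy followed by a delete, which is why a big artifact can blow past a 60s request deadline even though the same operation used to be an instant rename. A hedged illustration with boto3, using invented endpoint, credential and bucket names:

    import boto3

    # Illustrative only: endpoint, credentials and bucket/key names are invented.
    s3 = boto3.client(
        "s3",
        endpoint_url="https://minio.example.internal",
        aws_access_key_id="ACCESS",
        aws_secret_access_key="SECRET",
    )

    # Over the S3 API a "move" is a copy plus a delete; for a large artifact the
    # copy alone can outlast a 60s request timeout. (Objects above 5 GiB would
    # additionally need a multipart copy.)
    s3.copy_object(
        Bucket="gitlab-artifacts",
        Key="final/artifact.zip",
        CopySource={"Bucket": "gitlab-tmp-uploads", "Key": "tmp/artifact.zip"},
    )
    s3.delete_object(Bucket="gitlab-tmp-uploads", Key="tmp/artifact.zip")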
<bentiss> __tim: new config is deployed, feel free to monitor 500s :)
Seirdy has joined #freedesktop
<__tim> Thanks! Let's see...
<bentiss> daniels: I just pushed (almost) all of my changes. The one that will probably be problematic for you is https://gitlab.freedesktop.org/freedesktop/helm-gitlab-omnibus/-/commit/27d2f693afdd02b6246db1e5ccdd5812dfe0fee4 -> I am using features from helm v3.5+ because otherwise it was awful to write the chart
<bentiss> sigh: https://gitlab.freedesktop.org/johanneswolf/pipewire/-/jobs/10089706 -> doesn't seem to be working :(
<bentiss> yeah, though we get a different error this time on the server: "exception.class":"Excon::Error::Timeout","exception.message":"read timeout reached"
<bentiss> previously it was "exception.class": "Rack::Timeout::RequestTimeoutException", "exception.message": "Request ran for longer than 60000ms"
<bentiss> __tim: alright added a new env var, *maybe* it'll do something :/
<__tim> should I retry?
<bentiss> __tim: yes please
<bentiss> same thing :(
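The error class changed along the way: the first failures were Rack::Timeout (GitLab's 60s request deadline), the later ones Excon read timeouts (the Ruby S3 client giving up while waiting on minio). For comparison only, the equivalent client-side knobs on a Python S3 client look like this; it is an illustration, not the configuration bentiss actually changed.

    import boto3
    from botocore.config import Config

    # Illustrative only: these are timeouts/retries on a Python S3 client, not
    # GitLab's actual (Ruby/Excon) settings; endpoint name is invented.
    cfg = Config(connect_timeout=10, read_timeout=300,
                 retries={"max_attempts": 5, "mode": "standard"})
    s3 = boto3.client("s3", endpoint_url="https://minio.example.internal",
                      config=cfg)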
karolherbst has quit [Remote host closed the connection]
karolherbst has joined #freedesktop
aaronp has quit [Ping timeout: 480 seconds]
danvet has quit [Ping timeout: 480 seconds]
ocrete has joined #freedesktop
aaronp has joined #freedesktop
tsdgeos has joined #freedesktop
tsdgeos has quit [Ping timeout: 480 seconds]