<gitlab-bot>
GStreamer issue (Merge request) 827 in gstreamer "buffer: rename new gst_buffer_new_copy() to gst_buffer_new_memdup()" [Api, Opened]
* __tim
merge button seems to do the trick ¯\_(ツ)_/¯
<__tim>
still getting things like this fwiw: WARNING: Uploading artifacts as "archive" to coordinator... failed id=10061560 responseStatus=500 Internal Server Error status=500 token=7xTpLTaY
<kusma>
This is the second time in a row for this MR
<bentiss>
sorry, this happens when minio is a little bit too loaded
<bentiss>
my guess is it'll be better once I've migrated all the repos and the db to the new cluster, so we don't end up encrypting the traffic 3 or 4 times
<bentiss>
the other option is to not use a minio cluster, but plain minio like before, and rely on ceph to do the replication, but I am not sure it'll be better :/
<__tim>
is it possible this affects external runners more?
<__tim>
I seem to be hitting it a lot more ("always" currently with very small sample size) with the gst-htz runners and the gst macos runners
<__tim>
and is it "new" (config changed) or just bad luck/timing?
<bentiss>
__tim: it shouldn't be specific to ext runners. The minio server is only seen by gitlab itself.
<bentiss>
The bad luck might also be related to the size of the artifacts in the pipeline. I saw that there tends to be a spike in cpu usage when sending big (or a lot of) files
<__tim>
and is it a recent change in setup?
<bentiss>
but right now, when you upload, we get: runner -> nginx (old cluster) -> wireguard to server-2 in the new cluster -> wireguard/flannel to one of the 2 minio servers, and then wireguard over to the other minio server
<bentiss>
it's a recent change, mostly because the elastic storage I was using on packet is being retired next week, so we have to host our files ourselves
<bentiss>
I need to get rid of the old cluster first, but for that I need to migrate the db and the remaining repos
<__tim>
currently gstreamer CI is pretty much completely broken because we can't get jobs to pass at all from the looks of it
<bentiss>
do you have an idea of the size of the artifacts in this job?
<__tim>
I don't think the integration-test/valgrind jobs have big artefacts
<__tim>
others maybe, but it doesn't seem related to that
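For reference, per-job artifact sizes can be queried through the GitLab jobs API. A minimal sketch in Python; the project ID, pipeline ID and token below are placeholders, not values from this log:
```python
# Minimal sketch: list per-job artifact sizes for a pipeline via the GitLab API.
# PROJECT_ID, PIPELINE_ID and the token are placeholders.
import requests

GITLAB = "https://gitlab.freedesktop.org/api/v4"
PROJECT_ID = 1234        # hypothetical project id
PIPELINE_ID = 5678       # hypothetical pipeline id
HEADERS = {"PRIVATE-TOKEN": "<read_api token>"}

jobs = requests.get(
    f"{GITLAB}/projects/{PROJECT_ID}/pipelines/{PIPELINE_ID}/jobs",
    headers=HEADERS,
    params={"per_page": 100},
).json()

for job in jobs:
    # the jobs API reports each artifact with a filename and a size in bytes
    total = sum(a.get("size", 0) for a in job.get("artifacts", []))
    print(f'{job["name"]}: {total / (1024 * 1024):.1f} MiB')
```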
<bentiss>
it's more of a global thing
<bentiss>
when I was mirroring the old artifact server, the load on the servers was between 700 and 800
<bentiss>
now it's at a more normal 10-15
<bentiss>
(it's on a 32-thread/processor machine)
<bentiss>
__tim: there might be something that is problematic, which is that the minio server is not exposed to the outside world. *maybe* this is an issue with hetzner, though I think everything should go through gitlab
<bentiss>
I'll try to have a look at it later today, but can't do much ATM
<__tim>
ok, thanks
<__tim>
I don't understand any of this, I'm just reporting what I'm seeing :)
<__tim>
the macos runners are not on hetzner fwiw, they're on macstadium.com
<gitlab-bot>
GitLab.org issue 285597 in gitlab "Avoid copying objects from one bucket to another for CI artifacts" [Accepting Merge Requests, Ci Artifacts, Category:Continuous Integration, Category:Git Lfs, Object Storage, Devops::Verify, Group::Pipeline Execution, Section::Ops, Technical Debt, Workflow::Planning Breakdown, Opened]
<__tim>
bentiss, 1.14, 17 June, ufff :)
<__tim>
bentiss, thanks for digging
<bentiss>
well, there is a workaround I should be able to put in place now
<bentiss>
__tim: deploying the fix now, we will see tomorrow if this helped (or if we get another error ;-P )
<__tim>
awesome, thanks a lot
<__tim>
why tomorrow?
<bentiss>
and FWIW, this issue is definitely making sense
<bentiss>
"why tomorrow" -> because I do not see errors in the past 2 hours :)
<bentiss>
the issue makes sense because before we were using a single minio server, so moving a file was just renaming it on the filesystem, while now it has to actually move the file around, and that can take more than 60s
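To illustrate the distinction: with filesystem-backed storage on a single server, finalizing an upload is an atomic rename, while with an S3-style object store a "move" is a server-side copy plus delete whose duration grows with the object size. A minimal sketch, not the actual GitLab/minio code path; the endpoint URL, bucket and keys are placeholders:
```python
# Illustrative sketch only, not the actual GitLab/minio code path.
import os
import boto3

def move_local(tmp_path: str, final_path: str) -> None:
    # Single-server, filesystem-backed storage: finalizing an upload is an
    # atomic rename, independent of file size.
    os.rename(tmp_path, final_path)

def move_s3(bucket: str, src_key: str, dst_key: str) -> None:
    # S3-compatible object store (e.g. a minio cluster): there is no rename,
    # so a "move" is a server-side copy followed by a delete. The copy time
    # grows with the object size and can exceed a 60s request timeout.
    s3 = boto3.client("s3", endpoint_url="https://minio.example.invalid")  # placeholder endpoint
    s3.copy_object(Bucket=bucket, Key=dst_key,
                   CopySource={"Bucket": bucket, "Key": src_key})
    s3.delete_object(Bucket=bucket, Key=src_key)
```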
<bentiss>
__tim: new config is deployed, feel free to monitor 500s :)
<bentiss>
yeah, though we get a different error this time on the server: "exception.class":"Excon::Error::Timeout","exception.message":"read timeout reached"
<bentiss>
previously it was "exception.class": "Rack::Timeout::RequestTimeoutException", "exception.message": "Request ran for longer than 60000ms"
<bentiss>
__tim: alright, I added a new env var, *maybe* it'll do something :/
<__tim>
should I retry?
<bentiss>
__tim: yes please
<bentiss>
same thing :(