ChanServ changed the topic of #freedesktop to: https://www.freedesktop.org infrastructure and online services || for questions about freedesktop.org projects, please see each project's contact || for discussions about specifications, please use https://gitlab.freedesktop.org/xdg or xdg@lists.freedesktop.org
<imirkin> the latter was just hung
<imirkin> same job, same machine, same everything
<imirkin> without knowing the setup, feels like there's a load-balanced firewall, and one of the firewalls just sucks at TCP.
<__tim> my current theory is that because gstreamer is being stupid at the moment (it downloads 600MB of media data via git-lfs for every gstreamer integration job, plus possibly also a full couple-of-hundred MB git repo pull), that can lead to multiple jobs downloading lots of stuff in parallel, which then totally kills uploads because there's some
<__tim> bandwidth constraint/packet loss in the middle somewhere or something *waves hands*
<__tim> but that may well be complete nonsense
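A minimal sketch of how a job could avoid re-downloading the full media set every run, assuming only a subset of LFS-tracked paths is actually needed (the repository URL and include pattern below are illustrative):

    # clone without smudging LFS objects, then pull only what the job needs
    GIT_LFS_SKIP_SMUDGE=1 git clone https://gitlab.freedesktop.org/gstreamer/gstreamer.git
    cd gstreamer
    git lfs pull --include="subprojects/gst-integration-testsuites/medias/*"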
<imirkin> like a job was stuck for 5 minutes "downloading artifacts". i restarted it, lands on the same runner, and download goes totally fine.
jstein has joined #freedesktop
ngcortes has quit [Remote host closed the connection]
ybogdano has quit [Ping timeout: 480 seconds]
bnieuwenhuizen has quit [Quit: Bye]
bnieuwenhuizen has joined #freedesktop
ximion has quit []
Seirdy has joined #freedesktop
danvet has joined #freedesktop
alanc has quit [Remote host closed the connection]
alanc has joined #freedesktop
sunarch has quit []
thaller has joined #freedesktop
egbert has joined #freedesktop
egbert is now known as Guest7210
egbert has joined #freedesktop
egbert has quit [Quit: leaving]
egbert has joined #freedesktop
egbert has quit []
Guest7210 has quit []
egbert has joined #freedesktop
egbert has quit []
ximion has joined #freedesktop
<bentiss> __tim, daniels: FYI, I had a meeting with equinix a couple of weeks ago (a follow-up to my XDC presentation), and I told the guy that sometimes we were seeing packet loss between packet and hetzner
<bentiss> __tim, daniels: he told me that if/when this happens, we should give him the output of `mtr` and he'll be able to debug the things on the equinix side
<bentiss> I am running `mtr minio-packet.freedesktop.org` and it eventually manages to get all the hops in the middle
<__tim> I'm not sure it actually represents packet loss, I think it might just be routers in the middle dropping/throttling icmp packets, since it shows 0%-ish again for later hops
<bentiss> __tim: I think what matters is the route the packets are taking, and they can check if there is anything wrong (planned maintenance or something like that)
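A report-mode mtr run produces output that can be handed over as-is; TCP probes also sidestep the ICMP rate-limiting __tim mentions above (port 443 is an assumption here):

    # 100 cycles, wide report, TCP SYN probes to the HTTPS port
    mtr --report-wide --report-cycles 100 --tcp --port 443 minio-packet.freedesktop.org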
egbert has joined #freedesktop
pjakobsson has joined #freedesktop
<__tim> right
ximion has quit []
<__tim> bentiss, maybe we should run some iperf tests or such so we have a baseline?
<bentiss> __tim: sure
<bentiss> __tim: is minio-packet.fd.o the one having the most issues?
<bentiss> or the others from the k3s cluster
<__tim> I don't know tbh
<__tim> sec
<__tim> so we're having two problems afaict, one recent, one not so recent
jstein has quit []
<bentiss> __tim: which ip should I allow in the firewall for iperf3?
<__tim> the recent problem is that artefact upload stalls and times out. Not sure when that started happening, perhaps a week or so ago?
<__tim> 95.217.116.50 + 95.217.116.51
<bentiss> k, thanks
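A sketch of the corresponding firewall change, assuming plain iptables and the default iperf3 port (5201); adjust for whatever actually manages the rules:

    # allow the two Hetzner runner IPs to reach the iperf3 server
    iptables -A INPUT -p tcp -s 95.217.116.50 --dport 5201 -j ACCEPT
    iptables -A INPUT -p tcp -s 95.217.116.51 --dport 5201 -j ACCEPT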
<__tim> when that happens the upload seems to putter about at like 20-50 kB/sec, which is clearly ridiculous speed-wise
<__tim> (just from one or two random observations)
<bentiss> __tim: I have set up an iperf3 server on minio-packet.freedesktop.org and those 2 ips should be able to contact it (it's running in tmux, hopefully it'll stay up a bit)
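Roughly what the two ends look like; by default the iperf3 client sends, so the plain run measures the runner-to-minio (upload) path and -R reverses it. Adding -P 4 for parallel streams can show whether a single TCP connection is being throttled somewhere:

    # server side, kept alive in tmux
    iperf3 -s

    # client side (Hetzner runner): upload direction, then reverse for download
    iperf3 -c minio-packet.freedesktop.org -t 10
    iperf3 -c minio-packet.freedesktop.org -t 10 -R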
<__tim> [ 5] 0.00-10.00 sec 331 MBytes 277 Mbits/sec 0 sender
<__tim> [ 5] 0.00-10.09 sec 330 MBytes 274 Mbits/sec receiver
<__tim> I also have no explanation why e.g. a git clone tops out at 7-8MB/sec (avg 4.5MB/sec) then
<bentiss> __tim: maybe I should add an iperf3 server on the cluster itself
<__tim> I get much higher speeds elsewhere so the server seems plenty fast in general
<bentiss> __tim: try with 147.75.38.77 -> this is an in-cluster IP
<bentiss> seems fast enough too
<__tim> let me try from both machines at the same time
<__tim> seems just fine too, same values (one iperf to in-cluster, the other to minio)
<bentiss> maybe it's because we are using wireguard internally
<bentiss> for in cluster communication
<bentiss> but the CPU doesn't seem to be all that busy
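Two quick checks for the wireguard theory; the interface name wg0 is an assumption (with k3s' wireguard flannel backend it may be named differently):

    # per-peer byte counters on the tunnel
    wg show wg0 transfer

    # see whether any single core is pegged with softirq/crypto work
    mpstat -P ALL 1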
<__tim> at the same time, git clone from gnome.gitlab.org is at full speed (I presume that's gc hosted, not sure), so it's not that the client machine is incapable either
<__tim> it's all rather puzzling and I can't see any explanation for upload starvation to 10s of kB/sec
<__tim> my theory is still that there's some bottleneck "somewhere in the middle" and we're ending up with upload starvation when there's too much downloading going on
<bentiss> yep, there is a bottleneck :(
<bentiss> ideally I'd just drop wireguard in the middle, but that requires some changes in the infra and carries some risk...
<__tim> for giggles I set up a wireguard tunnel to the htz virginia DC (from the machine there I can git clone at 40MB/sec from fdo), and it's showing the same 280Mbit/sec-ish iperf throughput, but then actual git clone again is very pedestrian
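A minimal point-to-point test tunnel along those lines, with placeholder keys, addresses and endpoint:

    ip link add wg-test type wireguard
    wg set wg-test private-key /etc/wireguard/test.key \
        peer <peer-public-key> endpoint <remote-ip>:51820 allowed-ips 10.99.0.2/32
    ip address add 10.99.0.1/24 dev wg-test
    ip link set wg-test up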
<bentiss> the problem I see with wireguard is that all of our internal communication is encrypted with it, so when you access data from ceph, you end up encrypting it maybe 6 times
<bentiss> (access to 3 disks on nodes, from the gitaly pod, then forward to webservice, then nginx)
<bentiss> ok that's 5
<bentiss> and FWIW, on ceph, we are constantly reading at ~15-20 MB/s, with spikes at 80
<bentiss> from what I can see on the dashboard
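The same numbers are available from the CLI, assuming admin access to the ceph cluster:

    # cluster-wide and per-pool client I/O rates
    ceph -s
    ceph osd pool stats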
pendingchaos_ has joined #freedesktop
pendingchaos has quit [Ping timeout: 480 seconds]
pendingchaos_ is now known as pendingchaos
<alatiera> do we use a pre-clone script for minio?
<alatiera> and if so where do the scripts come from
<daniels> alatiera: a) yes, and b) the project
<daniels> projects can set $CI_PRE_CLONE_SCRIPT
<daniels> so the runners have pre_clone_script = "eval \"$CI_PRE_CLONE_SCRIPT\""
<alatiera> daniels: thanks! that seems like what I remember seeing in the past
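For illustration, a hypothetical shape for such a pre-clone script (the cache URL and path are made up and do not reflect the real fd.o setup): seed the build directory from a cached tarball so the runner's subsequent git fetch only has to transfer the delta.

    # hypothetical $CI_PRE_CLONE_SCRIPT body
    set -eu
    mkdir -p "$CI_PROJECT_DIR"
    curl -sfL "https://minio-packet.freedesktop.org/git-cache/$CI_PROJECT_PATH.tar" \
        | tar -C "$CI_PROJECT_DIR" -xf - || true   # fall back to a normal clone on failure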
Haaninjo has joined #freedesktop
jarthur has joined #freedesktop
jstein has joined #freedesktop
jstein has quit []
jarthur has quit [Quit: Textual IRC Client: www.textualapp.com]
jarthur has joined #freedesktop
ximion has joined #freedesktop
ybogdano has joined #freedesktop
___nick___ has joined #freedesktop
jstein has joined #freedesktop
pinkflames[m] has left #freedesktop [#freedesktop]
___nick___ has quit []
___nick___ has joined #freedesktop
___nick___ has quit []
___nick___ has joined #freedesktop
ngcortes has joined #freedesktop
___nick___ has quit [Ping timeout: 480 seconds]
Haaninjo has quit [Quit: Ex-Chat]
i-garrison has quit []
i-garrison has joined #freedesktop
danvet has quit [Ping timeout: 480 seconds]
iNKa has joined #freedesktop
Brocker has quit [Read error: Connection reset by peer]