ChanServ changed the topic of #freedesktop to: https://www.freedesktop.org infrastructure and online services || for questions about freedesktop.org projects, please see each project's contact || for discussions about specifications, please use https://gitlab.freedesktop.org/xdg or xdg@lists.freedesktop.org
oldpcuser_ has quit [Remote host closed the connection]
oldpcuser_ has joined #freedesktop
oldpcuser_ has quit []
oldpcuser has joined #freedesktop
ximion has quit [Remote host closed the connection]
ximion has joined #freedesktop
AbleBacon has quit [Quit: I am like MacArthur; I shall return.]
AbleBacon has joined #freedesktop
dcunit3d has quit [Remote host closed the connection]
ximion has quit [Quit: Detached from the Matrix]
Leopold___ has joined #freedesktop
alatiera has quit [Ping timeout: 480 seconds]
tzimmermann has joined #freedesktop
Leopold has quit [Ping timeout: 481 seconds]
alatiera has joined #freedesktop
sima has joined #freedesktop
alanc has quit [Remote host closed the connection]
alanc has joined #freedesktop
enunes has joined #freedesktop
AbleBacon has quit [Read error: Connection reset by peer]
<enunes>
hi, were there any changes on freedesktop infra/network over the weekend? in particular to s3.freedesktop.org. my shared runner on gitlab.fdo has always had a somewhat slow connection to it, but since this weekend it's unstable and sometimes doesn't connect at all
<bentiss>
enunes: no changes over the weekend, and I can not reproduce your timeouts with that URL. I guess bad timing? If other people are doing heavy tasks on s3 and gitlab, there is not much we can do :(
<enunes>
I just ran it locally around 10 times, either on my laptop or on the runner, and in one of the runs it just doesn't connect, waits for about 2 minutes, and gives up
<enunes>
so I tried over my mobile tethered network over a separate provider and it's the same...
<enunes>
so my runner is still disabled, I'm not sure what to do next to bring it back up
<bentiss>
enunes: have you tried using the magic curl retry options? It's a little bit slower when it fails, but at least that solves transient network errors in many cases
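The "magic curl retries" aren't spelled out in the log; as a rough illustration of the same idea, here is a minimal Python sketch that retries a download on transient network errors with a fixed delay. The URL, timeout and retry counts are made-up values for illustration, not the actual CI settings.

# sketch: retry a download on transient network errors, analogous to curl's retry options
import time
import urllib.error
import urllib.request

def fetch_with_retries(url, attempts=5, connect_timeout=30, backoff=60):
    """Download url, retrying on transient network errors with a fixed delay."""
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=connect_timeout) as resp:
                return resp.read()
        except (urllib.error.URLError, TimeoutError) as err:
            print(f"attempt {attempt}/{attempts} failed: {err}")
            if attempt == attempts:
                raise
            time.sleep(backoff)

# e.g. fetch_with_retries("https://s3.freedesktop.org/some-artifact.tar.zst")  # hypothetical path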
<enunes>
the CI scripts do that in some places but not all; also, sometimes the requests come from something like s3cp, and that fails in CI when it hits the bad connection now
<enunes>
the first LAVA job times out after a couple of tries to download a tarball from s3, and on the LAVA job's second try the s3cp fails
shbrngdo has quit [Read error: Connection reset by peer]
shbrngdo has joined #freedesktop
<bentiss>
enunes: not sure what is going on: I can follow the traces from the curl in the internal nginx logs, but the PUT that s3cp is supposed to do is not even appearing in the logs
<bentiss>
so the connect error likely means you couldn't connect to the server at all
<bentiss>
maybe it was BGP that dropped the packet outside of the cluster, or maybe nginx didn't have the capacity to accept new connections, but I can't see anything in the logs
<enunes>
since it started yesterday, the plan was to wait for a day and see if any "routing issues" just disappear; should we just wait another day?
<enunes>
mupuf mentioned there was something like that in other countries, so while I now seem to be the only affected runner, and likely the only one in my country, it has apparently happened before
kxkamil2 has joined #freedesktop
kxkamil has quit [Ping timeout: 480 seconds]
<karolherbst>
daniels: somehow the label maker got super unreliable :'(
<karolherbst>
daniels: yeah.. something is fucked on the API level
<karolherbst>
or in the bot..
<karolherbst>
it seems like the bot doesn't get all merge requests loaded
<bentiss>
karolherbst: IIRC we found a better solution with whot... but -ETIME
<karolherbst>
but that isn't the issue?
<bentiss>
karolherbst: it is. Given that mr-label-maker is now immediately notified about a new MR, it tries to apply the labels immediately, but gitlab takes its time to get the MR sorted out
<bentiss>
so mr-label-maker sees that no files are touched, and bails out
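A hedged sketch of one way to handle the race described above, assuming python-gitlab: poll the MR's diff until gitlab has populated it before deciding on labels. The project path, MR iid, path rule and label below are illustrative assumptions, not mr-label-maker's actual code.

# sketch: wait for gitlab to populate a brand-new MR's diff before labelling
import time
import gitlab

gl = gitlab.Gitlab("https://gitlab.freedesktop.org", private_token="REDACTED")
project = gl.projects.get("mesa/mesa")  # hypothetical project path

def changed_paths(mr, attempts=6, delay=10):
    """Return the files touched by the MR, waiting for gitlab to finish the diff."""
    for _ in range(attempts):
        changes = mr.changes().get("changes", [])
        if changes:
            return [c["new_path"] for c in changes]
        time.sleep(delay)  # diff not populated yet, give gitlab some time
    return []  # still empty: either no files touched, or gitlab never caught up

mr = project.mergerequests.get(12345)  # hypothetical MR iid
paths = changed_paths(mr)
if any(p.startswith("src/panfrost/") for p in paths):  # illustrative rule only
    mr.labels = sorted(set(mr.labels) | {"panfrost"})
    mr.save()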
<karolherbst>
ahh right.. but when I was running over all merge requests, some weren't processed either
<karolherbst>
but yeah... the other issue is more pressing I guess
<daniels>
karolherbst: so you're querying the API for 'all open merge requests', and it is returning a number smaller than reality?
<karolherbst>
yeah.. well... not sure. the script sees some MR I missed in dry-run
<karolherbst>
ehh, it doesn't see them in non-dry-run
<karolherbst>
it's weird
<karolherbst>
but now it's listed...
<karolherbst>
maybe something fails when applying the label and it fails silently or something.. dunno
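As a sketch of the two failure modes being speculated about here, assuming python-gitlab: request every page of open merge requests (a truncated first page would look like "missing" MRs) and report label-application failures instead of letting them pass silently. The project path and label are placeholders, not the bot's real configuration.

# sketch: list all open MRs with full pagination and surface labelling errors
import gitlab

gl = gitlab.Gitlab("https://gitlab.freedesktop.org", private_token="REDACTED")
project = gl.projects.get("mesa/mesa")  # hypothetical project path

# all=True makes python-gitlab walk every page; without it only the first
# page (20 results by default) comes back, which looks like missing MRs.
open_mrs = project.mergerequests.list(state="opened", all=True)

for mr in open_mrs:
    try:
        mr.labels = sorted(set(mr.labels) | {"needs-triage"})  # placeholder label
        mr.save()
    except gitlab.exceptions.GitlabError as err:
        # surface the failure instead of dropping it on the floor
        print(f"failed to label !{mr.iid}: {err}")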