ChanServ changed the topic of #freedesktop to: infrastructure and online services || for questions about projects, please see each project's contact || for discussions about specifications, please use or
bentiss: any news regarding fastly?
oh, and the migration page is lacking quite a few ticks now
mupuf: unfortunately no
and on the good news, the high load on the db stopped at 1:30 AM CET+1 (over the night) and the system is much more stable now
It would seem splitting the db triggered some re-indexing or something like that
well, we should probably wait for the US to wake up too, but the fact that the load dropped very suddenly makes me think of a process finishing its job
that is indeed amazing news!
yeah, no need to buy more expensive HW :)
yet* 😃
heh, yes
well, I think we should also enforce the 1 year expiration policy on pipelines in all projects
this should reduce the size of the db by a fair chunk
(and it's working blazing fast now)
there isn't much value in those
re: oh, and the migration page is lacking quite a few ticks now -> I'm mostly using this as a TODO list, so as I think of new things, I'm just adding them here
Is there a way that if a url has youtube in it then make xdg-open use mpv or else browser?
bentiss: gitlab is super fast now, good work :)
I'm here to complain that gitlab is too fast ! I used to have an excuse for not being so productive and it's gone !
thanks for the super job migrating this glorious thing
yeah, great it's back so fast, great job everybody involved
yah, more seriously : Awesome job !
alanc: fwiw, I am not receiving any email from fd.o either
(and that's on gmail, not my work email)
could be normal though, only update for the xserver I see is metux pushing new commits to their MR xorg/xserver!1865 and I usually don't get notified every time people push new commits to an existing MR (thankfully!).
alanc: and the IP didn't change, the smtp server is still gabe.fd.o
(FWIW, I received emails from gitlab with failed/fixed pipelines as the db server was overloaded)
same here
was surprised to see gitlab working and seems faster than before
great work
bentiss, thank you for the on-going hard work and the awesome maintenance page. Gitlab being out allowed me to be productive myself, so I think I'll continue so for the rest of the week. ;-)
thanks everybody ;)
congrats and thanks everyone
magnitudes faster 🚀 🎉 👍
yeah, it's super fast
pq: I knew I wasn't the only one thinking that :D
Thanks a lot for the migration! Did you change anything during the migration? It feels very fast. Congratulations!
bentiss: ooc, gnome gitlab has anubis put in front of it for making the problem of AI scrapers smaller. no idea how well that works but as it's there since quite a while now it probably has some impact at least. do you plan to do something similar for fdo?
alanc: FWIW, I saw him pushing to existing MRs, not creating new MRs
personally I'd be less worried about Git (can always just push again if something gets lost) than about new MRs / issues / comments / ...
kxkamil: yesterday I would have said no, today it's much better. But I'd like to see a couple of days where the US kicks in. So I would say you can push changes, just keep in mind what you did, and do not rely on issues, MRs, etc...
slomo: we should have Fastly as a CDN soon(tm), once this is set up, we won't need anubis because they have bots/spam protection
fomys: almost no changes: we just went from a single db node hosted on kubernetes with a ceph disk to 3 dedicated HA postgresql cluster with actual NVME disk. Very little impact as you can see :)
In the welcome to the new data center note, it says we should consider it roughly "read-only". does that mean we should e.g. create releases and expect them to be there once the migration is completed?
should *not* create releases
more seriously, I'm glad we managed to pinpoint the pain point :)
bentiss: so kubernetes and ceph is a conspiracy by Big HW to sell you more HW?
jadahl: if you can refrain from creating releases during a couple of days that would be better
mripard: no, it's just that your admin sys doesn't know what to do and thought the postgresql db was just a simple part, not an actual important part
bentiss: noted, thanks!
again, yesterday we had an average load on the db servers of 30-40 on 12-core machines
today it's fine (between 1 and 2) but I'd like to see what happens when the US starts hammering our servers
bentiss: I see, thank you for this big improvement! I hope you will not encounter issues with the remaining tasks
bentiss: let us enjoy the speed while it lasts :^)
Maybe some tariffs are in order for US traffic ?
hey guys, very nice work on the migration
* bilboed runs away because of bad joke
daniels: mind if I had an aggressive pipeline expiration policy on gfx-ci-bot/dummy-gitlab-jobs? this project has 73939 pipelines alone ;)
*if I add
bentiss: omg ... yes, please
they can expire after 1d
FWIW, the expiration policy is blazing fast compared to previously, so we can add more projects
\o/ \o/
how's the load holding with US waking up ?
still not processed the remaining 98227 pipelines from mesa, but over the past couple of weeks at equinix, we had like 5000 pipelines cleaned, while here, since yesterday, we are at 20000 down
daniels: gfx-ci/mesa-performance-tracking is at 132432 pipelines... how much retention we want?
bentiss: these projects do not require more than a week
Is there a better place to ask about the internals of a specific small freedesktop project?
tanty's projects can be dropped, quite likely
are those pipelines that have actually been run?
or just created
nirbheek: cerbero is in that list that bentiss pasted above. What amount of retention (in days) for past pipelines would we reasonably need ?
zmike: don't know if the pipeline ran, but it still takes some space in the db :)
I would say delete any pipelines that have not been run after 2-3 days at most
bilboed, nirbheek: this can be a year, I'm fine with that
hakzsam: I'm trying to purge the CI database with old pipelines to make it smaller and faster. Do you need to keep all of those pipelines or can we add a retention policy of 1 year (or less)
bentiss: I don't need them, a week should be enough
hakzsam: thanks!
zmike: I can not conditionally prune, so I put 1 year in the field, this should remove 11805 out of the 12892 :)
daniels: I assume gfx-ci/igt-ci-tags can also be like a couple of weeks?
bentiss: yeah
though tbh that service should just be moved to some intel git server somewhere
OK, done :)
thanks everyone who responded
daniels, bentiss: Yeah, igt-ci-tags et al should be moved elsewhere to avoid burning gitlab.fd.o resources
Both on pipelines and tags tbh
We have some code pushing resources there, but don't know who else might be using the results there
heh... 501597 pipelines scheduled for deletion, out of a grand total of 1281274 -> this alone should shrink the CI db by a fair bit
bentiss: do you have a way to set a global retention policy?
Something like 3 months unless you ask for more/less per project
DragoonAethis: in the long run I'll run a script to set to one year (the retention of the artifacts now)
should be doable with rails, but rn, easier for me to edit the URL, and change the 10 projects I'm focusing on
I can appreciate healthy amounts of duct tape too ;)
damn... PG errors: out of memory :(
don't know if it comes from sidekiq or the db itself...
minor issue, but not sure if the approve-users bot/webhook is working
* bentiss
oops, invalid token
restarting them (I had the same issue with marge, except marge crashes on boot if the token is invalid)
__tim: thanks :)
works now, thanks
(for retrying the labels)
daniels: any luck with the runners so far?
(to know if I should jump start on these)
TBH I haven't realized how fast it is to push code now, I was just focusing on the web UI :)
I did realize how fast dim ub is now :)
it's really awesome
bentiss: I've not had the chance yet, sorry :( have been stuck with other stuff all week so far
daniels: no worries
I'm a little bit concerned about these "ActiveRecord::StatementInvalid: PG::OutOfMemory: ERROR: out of memory" dead jobs
sigh... __vm_enough_memory: pid: 1054316, comm: postgres, no enough memory for the allocation in the journal of the leader PG
trying to reload the db...
I've changed a sysfs parameter
maybe we'll need beefier machines :(
I've reduced the number of sidekiq pods, this seemed to have an effect
honestly I think 1m retention instead of 1y for mesa would probably be fine
currently I see a lot of background jobs failing with 404 about ""MergeRequest", 107226, "Ci::CompareTestReportsService", nil, nil" -> and this is comparing test reports with old MRs
so it makes sense to keep the traces for a little bit
for personal repos, I guess we don't care
huh. wonder what's doing that
I think it's the test report summary in the MR
it compares the previous pipeline report to tell you if things improved or not
anyway, as long as we keep the db to a constant size, that would be nice
daniels: might want to keep gfx-ci-bot pipelines around for a bit longer than a day... I go back and look at them once in a while.. but a week or two would be ok?
robclark: this is a project which was just a temporary workaround for an old gitlab-runner bug; the uprev pipelines will stay for as long as any other mesa/virglrenderer pipeline does
bentiss: you can get rid of all logs except the last 3 months tbh