ChanServ changed the topic of #freedesktop to: https://www.freedesktop.org infrastructure and online services || for questions about freedesktop.org projects, please see each project's contact || for discussions about specifications, please use https://gitlab.freedesktop.org/xdg or xdg@lists.freedesktop.org
ngcortes has joined #freedesktop
Leopold_ has quit [Remote host closed the connection]
Leopold__ has joined #freedesktop
genpaku has quit [Remote host closed the connection]
genpaku has joined #freedesktop
ybogdano has quit [Ping timeout: 480 seconds]
ngcortes has quit [Remote host closed the connection]
ximion has quit []
alanc has quit [Remote host closed the connection]
thaller has joined #freedesktop
alanc has joined #freedesktop
thaller has quit []
thaller has joined #freedesktop
<bentiss>
so. gitlab 15.4.1 is a security update, and I'm going to try applying it now. I'll take the opportunity to also move the db to the new DC
danvet has joined #freedesktop
<bentiss>
still having issues bringing back up the db
Leopold__ has quit [Remote host closed the connection]
itoral_ has joined #freedesktop
<bentiss>
alright, db back up. I had to put it on one of the servers, not the agents :/
MajorBiscuit has joined #freedesktop
<bentiss>
the webservice pods are slowly re-attaching themselves
<bentiss>
OK, now time for the security upgrade
<bentiss>
reverting to the old deployment, the db still can't be upgraded :(
vbenes has joined #freedesktop
mvlad has joined #freedesktop
<pq>
I'm getting a 500 on gitlab, FWIW. Reversion still on-going?
kj has joined #freedesktop
scrumplex_ has quit []
scrumplex has joined #freedesktop
rgallaispou has joined #freedesktop
rgallaispou has quit []
<bentiss>
I was reverting the failed upgrade, but it seems it broke gitlab...
<bentiss>
k, reenabled it, and we are good now
<bentiss>
I'll retry the 15.4.1 migration now. I should have solved the failed migration (and now I know how to solve it)
<mupuf>
bentiss: third time's the charm!
<pq>
seems to work right now, thanks! :-)
<bentiss>
at least the migration went further this time
<bentiss>
even completed!
<bentiss>
everything but the gitaly pods managed to migrate to the new gitlab version
<bentiss>
and they are all up now :)
<daniels>
bentiss: that's weird, what happened with db migration?
<bentiss>
the thing is the tables were changed, but the migration was not marked completed
<bentiss>
whenever I tried to restart the migration with the normal upgrade through helm the migration pod would fail because it tried to insert a column that was already there
<bentiss>
the solution was: 1. revert the deployment
<bentiss>
2. edit the toolbox deployment so it has the next gitlab release
<bentiss>
3. "gitlab-rake gitlab:db:mark_migration_complete[20220406193806]" to mark the migration as complete
<bentiss>
4. gitlab-rake db:migrate:down VERSION=20220406193806 -> that made the 500 on gitlab
<bentiss>
5. gitlab-rake db:migrate:up VERSION=20220406193806 -> solved the 500 and the failing migration
<bentiss>
6. redo the helm upgrade, and this time the migration went through
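A rough shell sketch of the recovery sequence above, assuming the GitLab Helm chart with its toolbox deployment (the release name, namespace, image tag and container name here are placeholders, not necessarily fd.o's actual values):

    # 1. revert the failed deployment
    helm -n gitlab rollback gitlab

    # 2. point the toolbox at the target gitlab release so its rake tasks match
    kubectl -n gitlab set image deploy/gitlab-toolbox \
        toolbox=registry.gitlab.com/gitlab-org/build/cng/gitlab-toolbox-ee:v15.4.1

    # 3-5. repair the half-applied migration from inside the toolbox pod
    kubectl -n gitlab exec -it deploy/gitlab-toolbox -- \
        gitlab-rake gitlab:db:mark_migration_complete[20220406193806]
    kubectl -n gitlab exec -it deploy/gitlab-toolbox -- \
        gitlab-rake db:migrate:down VERSION=20220406193806
    kubectl -n gitlab exec -it deploy/gitlab-toolbox -- \
        gitlab-rake db:migrate:up VERSION=20220406193806

    # 6. redo the chart upgrade; the migrations job should now run through
    helm -n gitlab upgrade gitlab gitlab/gitlab -f values.yaml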
<bentiss>
daniels: also we have to monitor the new large-* machines, they sometimes hit too many open files and kubelet ends up in a weird state. I have 2 fixes: a temporary one with "sysctl fs.inotify.max_user_instances=512", and "systemctl reboot"
<bentiss>
(I blame the ceph migration for now)
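For reference, the two fixes mentioned above, plus the usual way to make the sysctl bump stick across reboots (the sysctl.d drop-in is the standard mechanism, not necessarily what was done on the fd.o machines):

    # temporary fix on an affected node
    sysctl fs.inotify.max_user_instances=512

    # or, if kubelet is already wedged
    systemctl reboot

    # to persist the bump (standard location; an assumption, not fd.o's config)
    echo 'fs.inotify.max_user_instances = 512' > /etc/sysctl.d/90-inotify.conf
    sysctl --system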
<bentiss>
besides that, the machines are working well, but kilo is not showing the peers on the new servers, so that means we have to keep server-2 around for now
<daniels>
bentiss: aha wow, thanks a lot for the migration tip
* bentiss
learned it the hard way :)
<daniels>
hehe
<daniels>
I've never had one fail on me before!
<bentiss>
the last thing we also have to take care of is that there are still a few places where we force the pods onto s1.large, so they will fail to spin up now if we kill them
<bentiss>
the ceph-s3 pod is one of them (easy enough to fix, but I'd like to upgrade rook too so we can be k8s 1.25 compliant)
<bentiss>
and the gitlab toolbox, because we need local disks
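One way to hunt down the remaining pins would be something like the query below; it assumes the pinning is done with a plain nodeSelector and that the label value simply contains "s1.large", both of which are guesses:

    kubectl get deploy -A -o json \
      | jq -r '.items[]
          | select(.spec.template.spec.nodeSelector != null)
          | select(.spec.template.spec.nodeSelector | tostring | test("s1.large"))
          | "\(.metadata.namespace)/\(.metadata.name)"'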
rgallaispou has joined #freedesktop
rgallaispou has quit []
<bentiss>
the toolbox is slightly more worrying. I haven't partitioned the s3.xlarge correctly, and there is no spare disk available on them, so we need to reorganize that data on at least one server
rgallaispou has joined #freedesktop
<bentiss>
I think that's where I am now. Next step would be to also migrate minio-packet.fd.o, but that requires some more deployments :(
<daniels>
local disk for toolbox being for backup?
<bentiss>
daniels: yes, we need to have space for it to fetch all repos and then gzip them
<daniels>
yep
<bentiss>
and local is way better than distributed for that :)
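For context, the toolbox backup in the GitLab chart is normally driven by backup-utility, roughly like this (a sketch; fd.o's actual cron/job wiring may differ):

    # inside the toolbox pod: bundle every repo plus the other components,
    # tar them up on local scratch space, then push the result to object storage
    kubectl -n gitlab exec -it deploy/gitlab-toolbox -- backup-utility

That local scratch space is exactly what needs the non-distributed disk.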
<bentiss>
FWIW, I'll go AFK a bit, the data still hasn't fully migrated yet (only 6% left), so there is not much I can do now on that
<daniels>
hehe yeah, backups over ceph would be a bit hostile ...
* daniels
nods, thanks
<daniels>
I have to go afk for a while too tbf
fahien has joined #freedesktop
fahien has quit []
AbleBacon has quit [Read error: Connection reset by peer]
fahien has joined #freedesktop
Guest1750 is now known as frytaped
thaller is now known as Guest1848
thaller has joined #freedesktop
Guest1848 has quit [Ping timeout: 480 seconds]
rgallaispou has quit [Read error: Connection reset by peer]
itoral_ has quit []
Kayden has quit [Read error: Connection reset by peer]
Kayden has joined #freedesktop
chipxxx has joined #freedesktop
fahien has quit [Ping timeout: 480 seconds]
Leopold_ has joined #freedesktop
fahien has joined #freedesktop
GNUtoo has quit [Remote host closed the connection]
GNUtoo has joined #freedesktop
ximion has joined #freedesktop
Haaninjo has joined #freedesktop
Leopold___ has joined #freedesktop
Leopold___ has quit [Remote host closed the connection]
<eric_engestrom>
^ that's when uploading artifacts, to be more specific
<eric_engestrom>
bentiss: ^
<glehmann>
I'm getting 500s when using gitlab search
alatiera has quit [Quit: Ping timeout (120 seconds)]
alatiera has joined #freedesktop
alyssa has joined #freedesktop
<alyssa>
getting error 500 commenting on MRs, is something up?
<bentiss>
all the pods are up, so I wonder what is happening there
<alyssa>
gitlab.exceptions.GitlabCreateError: 500: 500 Internal Server Error
<alyssa>
(from python)
<alyssa>
Your comment could not be submitted! Please check your network connection and try again.
<alyssa>
(from the web ui)
<MTCoster>
Just ran into the same issue creating an MR
<bentiss>
postgres: HINT: Check free disk space.
<alyssa>
uh oh
<bentiss>
sigh, now we are running out of space :)
<bentiss>
give me a sec
<alyssa>
have you tried turning it off and on again
<zmike>
no, definitely don't turn it off
<bentiss>
I can just request the disk to be bigger :)
<bentiss>
which I did
rkanwal has joined #freedesktop
<bentiss>
now we need to wait for kubernetes magic to happen
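The "magic" here is just a PVC resize; a minimal sketch, assuming the PVC name and namespace below (placeholders) and a storage class with allowVolumeExpansion enabled:

    # ask for a bigger volume
    kubectl -n gitlab patch pvc data-gitlab-postgresql-0 \
      -p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}'

    # watch until the new capacity shows up
    kubectl -n gitlab get pvc data-gitlab-postgresql-0 -w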
<alyssa>
download free disk space
<bentiss>
that's roughly it :)
<bentiss>
and done 30 GB more :)
<bentiss>
ok, now I think I can go on weekend
<alyssa>
it works :sparkles:
<alyssa>
thank u
<bentiss>
sigh... when I did the first migration, the postgres disk was 50GB. I just bumped it to 200GB, in a little over 2 years :(
<alyssa>
150 GB in 2 years produced by hundreds of full time engineers, and I assume GitLab's schema is not super dense ... doesn't seem so strange
<alyssa>
especially if we're keeping around all the random apitraces and videos that get attached to issues
<daniels>
uploads are separate, we don't store all that in psql
<daniels>
I'd put my money on build logs
<zmike>
so you're saying it's my fault
<daniels>
yeah I mean fdo was doing great until you came on the scene
<bentiss>
definitely ci logs, they are all stored in plain text in the db
<bentiss>
though it's weird, I thought they were in the S3 storage... but I know there is something related to ci jobs in the db that takes all the space
<daniels>
iirc they get put as a whole into artifact storage, but then they also get split into their sections and all those are shoved into psql
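A read-only way to check which tables are actually eating the space, assuming direct psql access to the postgres pod (the pod, user and database names below are the chart defaults, used here as placeholders):

    kubectl -n gitlab exec -it gitlab-postgresql-0 -- \
      psql -U gitlab -d gitlabhq_production -c "
        SELECT relname,
               pg_size_pretty(pg_total_relation_size(relid)) AS total_size
        FROM pg_statio_user_tables
        ORDER BY pg_total_relation_size(relid) DESC
        LIMIT 10;"

If the CI theory is right, ci_* tables should dominate the top of that list.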
<alyssa>
aren't build logs rotated after 30 days anyway?
<alyssa>
zmike: sorry zink has to go
* zmike
updates his presentation
<daniels>
alyssa: yep ...
chipxxx has quit [Read error: Connection reset by peer]
<alyssa>
zmike: think of how many times the word "zink" appears in build logs
<alyssa>
that's so many bytes
* zmike
counts the letters in panfrost and multiplies it by the number of panfrost jobs
<alyssa>
how dare u
<alyssa>
well what about uhhh
* alyssa
sweats
<alyssa>
freedreno
<daniels>
we execute ~12k builds per day fwiw
<DavidHeidelberg[m]>
"how dare u" ‒ Greta, after seeing Mesa3D CI farm...
rkanwal has quit [Quit: rkanwal]
<alyssa>
same
<alyssa>
that and XDC in person
jstein has joined #freedesktop
ybogdano has joined #freedesktop
<DragoonAethis>
discussions like these make me happy the Intel CI farm is behind 7 firewalls and nobody can see the state of that
Leopold___ has quit [Write error: connection closed]
MajorBiscuit has quit [Quit: WeeChat 3.5]
mvlad has quit [Remote host closed the connection]
ngcortes has joined #freedesktop
jstein has quit [Ping timeout: 480 seconds]
fahien has quit [Quit: fahien]
mattst88 has joined #freedesktop
<mattst88>
o/
<mattst88>
I've been moderating the mesa-announce@ list for years, and recently there's been a huge uptick in spam
<mattst88>
it's almost all from garbage TLDs like .lol and .click
<mattst88>
.quest, .art, .icu
<mattst88>
it's been interesting to learn that so many stupid TLDs exist, but is there any way we can more effectively block these so I don't have to discard so many messages every day?
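One blunt option, assuming the lists are still on Mailman 2: a header filter rule (Privacy options -> Spam filters -> header_filter_rules) set to Discard, with a regex along these lines, so those messages never reach the moderation queue (the TLD list is just a guess at what keeps showing up):

    ^From:.*@[^>]*\.(lol|click|quest|art|icu)\b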
<mattst88>
in a similar vein, I stopped receiving mail from xorg-announce@ in July (last message received was [ANNOUNCE] xf86-video-nv 2.1.22 on July 27)
<mattst88>
any idea why? Is gmail just dropping these messages now? I don't even see them in spam
alyssa has left #freedesktop [#freedesktop]
ngcortes has quit [Remote host closed the connection]
AbleBacon has joined #freedesktop
<daniels>
yeah gmail is very skeptical about our messages
<alanc>
there was a suggestion a while back that it was something to do with our reply-to header or something like that, but I forgot what it was
<alanc>
oh, right: probably because it adds a Reply-To: header, which breaks the DKIM signature
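A quick way to confirm that theory would be to save one delivered list message and look at which headers the sender's signature covers; if reply-to appears in the h= tag (many senders over-sign headers they don't set), then Mailman adding Reply-To: is enough to break verification (msg.eml is a placeholder filename):

    grep -i -A5 '^DKIM-Signature:' msg.eml | tr ';' '\n' | grep ' h='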
anholt_ has quit [Remote host closed the connection]
anholt has joined #freedesktop
Haaninjo has quit [Quit: Ex-Chat]
ngcortes has joined #freedesktop
<Mithrandir>
gmail recently started enforcing SPF a lot more strictly too, but that should just look at envelope sender, so _shouldn't_ matter?
jstein has joined #freedesktop
jstein has quit []
<zmike>
is there a reason why all my MRs say "Merge blocked: pipeline must succeed. It's waiting for a manual action to continue." ?