ChanServ changed the topic of #freedesktop to: https://www.freedesktop.org infrastructure and online services || for questions about freedesktop.org projects, please see each project's contact || for discussions about specifications, please use https://gitlab.freedesktop.org/xdg or xdg@lists.freedesktop.org
ngcortes has joined #freedesktop
Leopold_ has quit [Remote host closed the connection]
Leopold__ has joined #freedesktop
genpaku has quit [Remote host closed the connection]
genpaku has joined #freedesktop
ybogdano has quit [Ping timeout: 480 seconds]
ngcortes has quit [Remote host closed the connection]
ximion has quit []
alanc has quit [Remote host closed the connection]
thaller has joined #freedesktop
alanc has joined #freedesktop
thaller has quit []
thaller has joined #freedesktop
<bentiss> so. gitlab 15.4.1 is a security update, and I'm going to try applying it now. I'll take the opportunity to also move the db to the new DC
danvet has joined #freedesktop
<bentiss> still having issues bringing the db back up
Leopold__ has quit [Remote host closed the connection]
itoral_ has joined #freedesktop
<bentiss> alright, db back up. I had to put it on one of the servers, not the agents :/
MajorBiscuit has joined #freedesktop
<bentiss> the webservice pods are slowly re-attaching themselves
<bentiss> OK, now time for the security upgrade
<bentiss> reverting to the old deployment; the db still can't be upgraded :(
vbenes has joined #freedesktop
mvlad has joined #freedesktop
<pq> I'm getting a 500 on gitlab, FWIW. Reversion still on-going?
kj has joined #freedesktop
scrumplex_ has quit []
scrumplex has joined #freedesktop
rgallaispou has joined #freedesktop
rgallaispou has quit []
<bentiss> I was reverting the failed upgrade, but it seems it broke gitlab...
<bentiss> k, reenabled it, and we are good now
<bentiss> I'll retry the 15.4.1 migration now. I should have solved the failed migration (and now I know how to solve it)
<mupuf> bentiss: third time's the charm!
<pq> seems to work right now, thanks! :-)
<bentiss> at least the migration went further this time
<bentiss> even completed!
<bentiss> everything but the gitaly pods managed to migrate to the new gitlab version
<bentiss> and they are all up now :)
<daniels> bentiss: that's weird, what happened with db migration?
<mupuf> bentiss: congrats!
<bentiss> the thing is, the tables were changed, but the migration was not marked as completed
<bentiss> whenever I tried to restart the migration with the normal upgrade through helm, the migration pod would fail because it tried to add a column that was already there
<bentiss> the solution was: 1. revert the deployment
<bentiss> 2. edit the toolbox deployment so it has the next gitlab release
<bentiss> 3. "gitlab-rake gitlab:db:mark_migration_complete[20220406193806]" to mark the migration as complete
<bentiss> 4. gitlab-rake db:migrate:down VERSION=20220406193806 -> that caused the 500 on gitlab
<bentiss> 5. gitlab-rake db:migrate:up VERSION=20220406193806 -> solved the 500 and the failing migration
<bentiss> 6. redo the helm upgrade, and this time the migration went through
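Consolidated, the recovery above looks roughly like this. A sketch only: the rake commands and the migration version are the ones quoted above, while the helm/kubectl invocations, release name and namespace are assumptions.

    # 1. revert to the previous chart revision
    helm -n gitlab rollback gitlab
    # 2. point the toolbox deployment at the next gitlab release (bump the image tag)
    kubectl -n gitlab edit deploy/gitlab-toolbox
    # 3-5. from a shell inside the toolbox pod
    gitlab-rake gitlab:db:mark_migration_complete[20220406193806]
    gitlab-rake db:migrate:down VERSION=20220406193806
    gitlab-rake db:migrate:up VERSION=20220406193806
    # 6. redo the upgrade; the migration job should now pass
    helm -n gitlab upgrade gitlab gitlab/gitlab -f values.yaml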
<bentiss> daniels: also we have to monitor the new large-* machines, they sometimes run into too many open files, and kubelet ends up in a weird state. I have 2 fixes: a temporary one with "sysctl fs.inotify.max_user_instances=512", and "systemctl reboot"
<bentiss> (I blame the ceph migration for now)
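A sketch of making that inotify bump stick across reboots, assuming the nodes pick up drop-ins from /etc/sysctl.d (the file name is made up; the default for this knob is typically 128):

    # temporary fix, as quoted above
    sysctl fs.inotify.max_user_instances=512
    # persistent variant, applied at boot or with `sysctl --system`
    echo 'fs.inotify.max_user_instances = 512' > /etc/sysctl.d/90-inotify.conf
    sysctl --system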
<bentiss> besides that, the machines are working well, but kilo is not showing the peers on the new servers, so that means we have to keep server-2 around for now
<daniels> bentiss: aha wow, thanks a lot for the migration tip
* bentiss learned it the hard way :)
<daniels> hehe
<daniels> I've never had one fail on me before!
<bentiss> the last thing we also have to take care of is that there are still a few places where we force pods onto s1.large, so they will fail to spin up now if we kill them
<bentiss> the ceph-s3 pod is one of them (easy enough to fix, but I'd also like to upgrade rook to be k8s 1.25 compliant)
<bentiss> and the gitlab toolbox, because we need local disks
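A sketch of how to track down the remaining pinned workloads, assuming the pinning is a plain nodeSelector on the pod template (requires jq):

    # list deployments/statefulsets that still carry a nodeSelector, so the
    # ceph-s3 and toolbox pins can be found and removed or retargeted
    kubectl get deploy,sts -A -o json \
      | jq -r '.items[]
               | select(.spec.template.spec.nodeSelector != null)
               | "\(.metadata.namespace)/\(.metadata.name) \(.spec.template.spec.nodeSelector)"'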
rgallaispou has joined #freedesktop
rgallaispou has quit []
<bentiss> the toolbox is slightly more worrying. I haven't partitioned the s3.xlarge correctly, and there is no spare disk available on them, so we need to reorganize that data on at least one server
rgallaispou has joined #freedesktop
<bentiss> I think that's where I am now. Next step would be to also migrate minio-packet.fd.o, but that requires some more deployments :(
<daniels> local disk for toolbox being for backup?
<bentiss> daniels: yes, we need to have space for it to fetch all repos and then gzip them
<daniels> yep
<bentiss> and local is way better than distributed for that :)
<bentiss> FWIW, I'll go AFK a bit, the data still hasn't fully migrated (only 6% left), so there is not much I can do on that right now
<daniels> hehe yeah, backups over ceph would be a bit hostile ...
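For context, the backup that needs that local scratch space is roughly the following; a sketch, where the namespace and deployment name are assumptions. backup-utility is the cloud-native chart's wrapper that clones every repo and tars/gzips them locally before uploading the result to object storage, hence the local disk.

    kubectl -n gitlab exec -it deploy/gitlab-toolbox -- backup-utility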
* daniels nods, thanks
<daniels> I have to go afk for a while too tbf
fahien has joined #freedesktop
fahien has quit []
AbleBacon has quit [Read error: Connection reset by peer]
fahien has joined #freedesktop
Guest1750 is now known as frytaped
thaller is now known as Guest1848
thaller has joined #freedesktop
Guest1848 has quit [Ping timeout: 480 seconds]
rgallaispou has quit [Read error: Connection reset by peer]
itoral_ has quit []
Kayden has quit [Read error: Connection reset by peer]
Kayden has joined #freedesktop
chipxxx has joined #freedesktop
fahien has quit [Ping timeout: 480 seconds]
Leopold_ has joined #freedesktop
fahien has joined #freedesktop
GNUtoo has quit [Remote host closed the connection]
GNUtoo has joined #freedesktop
ximion has joined #freedesktop
Haaninjo has joined #freedesktop
Leopold___ has joined #freedesktop
Leopold___ has quit [Remote host closed the connection]
Leopold___ has joined #freedesktop
<zmike> running out of disk space on this job https://gitlab.freedesktop.org/mesa/mesa/-/jobs/29270453
<zmike> and some HTTP 400s on others in https://gitlab.freedesktop.org/mesa/mesa/-/pipelines/701854
Leopold_ has quit [Ping timeout: 480 seconds]
<eric_engestrom> ^ that's when uploading artifacts, to be more specific
<eric_engestrom> bentiss: ^
<glehmann> I'm getting 500s when using gitlab search
alatiera has quit [Quit: Ping timeout (120 seconds)]
alatiera has joined #freedesktop
alyssa has joined #freedesktop
<alyssa> getting error 500 commenting on MRs, is something up?
<bentiss> all the pods are up, so I wonder what is happening there
<alyssa> gitlab.exceptions.GitlabCreateError: 500: 500 Internal Server Error
<alyssa> (from python)
<alyssa> Your comment could not be submitted! Please check your network connection and try again.
<alyssa> (from the web ui)
<MTCoster> Just ran into the same issue creating an MR
<bentiss> postgres: HINT: Check free disk space.
<alyssa> uh oh
<bentiss> sigh, now we are running out of space :)
<bentiss> give me a sec
<alyssa> have you tried turning it off and on again
<zmike> no, definitely don't turn it off
<bentiss> I can just request the disk to be bigger :)
<bentiss> which I did
rkanwal has joined #freedesktop
<bentiss> now we need to wait for kubernetes magic to happen
<alyssa> download free disk space
<bentiss> that's roughly it :)
<bentiss> and done, 30 GB more :)
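The "kubernetes magic" here is an online PVC expansion; a minimal sketch with made-up PVC name, namespace and size (the storage class must allow volume expansion):

    # bump the requested size and watch the resize go through
    kubectl -n gitlab patch pvc data-gitlab-postgresql-0 --type merge \
      -p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}'
    kubectl -n gitlab get pvc data-gitlab-postgresql-0 -w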
<bentiss> ok, now I think I can go on weekend
<alyssa> it works :sparkles:
<alyssa> thank u
<bentiss> sigh... when I did the first migration, the postgres disk was 50GB. I just bumped it to 200GB, in just a bit over 2 years :(
<alyssa> 150 GB in 2 years produced by hundreds of full time engineers, and I assume GitLab's schema is not super dense ... doesn't seem so strange
<alyssa> especially if we're keeping around all the random apitraces and videos that get attached to issues
<daniels> uploads are separate, we don't store all that in psql
<daniels> I'd put my money on build logs
<zmike> so you're saying it's my fault
<daniels> yeah I mean fdo was doing great until you came on the scene
<bentiss> definitely ci logs, they are all stored in plain text in the db
<bentiss> though it's weird, I thought they were in the S3 storage... but I know there is something related to ci jobs in the db that takes all the space
<daniels> iirc they get put as a whole into artifact storage, but then they also get split into their sections and all those are shoved into psql
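A quick way to check which tables actually dominate, from a psql session on the GitLab database (the database name below is the usual default, an assumption here):

    # list the 15 biggest tables; if the CI-trace theory above is right,
    # ci_* tables should sit at the top
    psql -d gitlabhq_production -c "
      SELECT relname, pg_size_pretty(pg_total_relation_size(relid)) AS total
      FROM pg_catalog.pg_statio_user_tables
      ORDER BY pg_total_relation_size(relid) DESC
      LIMIT 15;"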
<alyssa> aren't build logs rotated after 30 days anyway?
<alyssa> zmike: sorry zink has to go
* zmike updates his presentation
<daniels> alyssa: yep ...
chipxxx has quit [Read error: Connection reset by peer]
<alyssa> zmike: think of how many times the word "zink" appears in build logs
<alyssa> that's so many bytes
* zmike counts the letters in panfrost and multiplies it by the number of panfrost jobs
<alyssa> how dare u
<alyssa> well what about uhhh
* alyssa sweats
<alyssa> freedreno
<daniels> we execute ~12k builds per day fwiw
<DavidHeidelberg[m]> "how dare u" ‒ Greta, after seeing Mesa3D CI farm...
rkanwal has quit [Quit: rkanwal]
<alyssa> same
<alyssa> that and XDC in person
jstein has joined #freedesktop
ybogdano has joined #freedesktop
<DragoonAethis> discussions like these make me happy the Intel CI farm is behind 7 firewalls and nobody can see the state of that
strugee has quit [Quit: ZNC - http://znc.in]
strugee has joined #freedesktop
Leopold___ has quit [Write error: connection closed]
MajorBiscuit has quit [Quit: WeeChat 3.5]
mvlad has quit [Remote host closed the connection]
ngcortes has joined #freedesktop
jstein has quit [Ping timeout: 480 seconds]
fahien has quit [Quit: fahien]
mattst88 has joined #freedesktop
<mattst88> o/
<mattst88> I've been moderating the mesa-announce@ list for years, and recently there's been a huge uptick in spam
<mattst88> it's almost all from garbage TLDs like .lol and .click
<mattst88> .quest, .art, .icu
<mattst88> it's been interesting to learn that so many stupid TLDs exist, but is there any way we can more effectively block these so I don't have to discard so many messages every day?
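One way to automate that, assuming the lists still run Mailman 2 (list name and file path are made up): regexes in discard_these_nonmembers are matched against the sender address, so posts from these TLDs never reach the moderation queue. Note that config_list -i replaces the attribute's current value rather than appending to it.

    cat > /tmp/drop-spam-tlds.py <<'EOF'
    discard_these_nonmembers = [r'^.*\.(lol|click|quest|art|icu)$']
    EOF
    bin/config_list -i /tmp/drop-spam-tlds.py mesa-announce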
<mattst88> in a similar vein, I stopped receiving mail from xorg-announce@ in July (last message received was [ANNOUNCE] xf86-video-nv 2.1.22 on July 27)
<mattst88> any idea why? Is gmail just dropping these messages now? I don't even see them in spam
alyssa has left #freedesktop [#freedesktop]
ngcortes has quit [Remote host closed the connection]
AbleBacon has joined #freedesktop
<daniels> yeah gmail is very skeptical about our messages
<alanc> there was a suggestion a while back that it was something to do with our reply-to header, but I forgot what it was
<alanc> oh, right: probably because the list adds a Reply-To: header, which breaks the DKIM signature
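For illustration only (a made-up signature, not one of ours): when the sender signs, or over-signs, reply-to in the h= tag, a Reply-To: header injected by the list afterwards invalidates b=, and receivers like gmail then treat the mail as suspect.

    DKIM-Signature: v=1; a=rsa-sha256; d=example.org; s=mail;
            h=from:to:subject:date:message-id:reply-to; bh=...; b=...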
anholt_ has quit [Remote host closed the connection]
anholt has joined #freedesktop
Haaninjo has quit [Quit: Ex-Chat]
ngcortes has joined #freedesktop
<Mithrandir> gmail recently started enforcing SPF a lot more strictly too, but that should just look at the envelope sender, so it _shouldn't_ matter?
jstein has joined #freedesktop
jstein has quit []
<zmike> is there a reason why all my MRs say "Merge blocked: pipeline must succeed. It's waiting for a manual action to continue." ?
<zmike> this is new
GNUtoo has quit [Remote host closed the connection]
GNUtoo has joined #freedesktop
<daniels> drop the Marge Part-of/Tested-by/etc trailers
<DavidHeidelberg[m]> zmike: I see it works; did dropping this stuff help?
<DavidHeidelberg[m]> or did you do any additional magic?
<zmike> I rebased
chipxxx has joined #freedesktop
<zmike> and prayed
<DavidHeidelberg[m]> Rebase, Pray and Love Marge?
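What the fix amounts to, roughly (a sketch; the branch name is an assumption): reword the affected commits to drop the stale trailers Marge added, then force-push so the pipeline check is re-evaluated.

    git rebase -i origin/main      # mark the commits as "reword" and delete the
                                   # Part-of: / Tested-by: trailer lines
    git push --force-with-lease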
danvet has quit [Ping timeout: 480 seconds]