daniels changed the topic of #freedesktop to: GitLab is currently down for upgrade; will be a while before it's back || https://www.freedesktop.org infrastructure and online services || for questions about freedesktop.org projects, please see each project's contact || for discussions about specifications, please use https://gitlab.freedesktop.org/xdg or xdg@lists.freedesktop.org
egbert is now known as Guest969
egbert has joined #freedesktop
Guest969 has quit [Ping timeout: 480 seconds]
bnilawar has quit [Ping timeout: 480 seconds]
Leopold__ has quit [Remote host closed the connection]
lack has quit [Read error: Connection reset by peer]
lack has joined #freedesktop
Leopold_ has joined #freedesktop
utsweetyfish has quit [Remote host closed the connection]
utsweetyfish has joined #freedesktop
AbleBacon has quit [Read error: Connection reset by peer]
co1umbarius has joined #freedesktop
columbarius has quit [Ping timeout: 480 seconds]
_DOOM_ has joined #freedesktop
<_DOOM_>
I am working on the StatusNotifierItem spec, implementing the watcher. When a host or item gets a NameOwnerChanged, what should the watcher do if the item/host has a new name?
<_DOOM_>
Should the watcher reannounce the item/host?
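A sketch of one possible policy for reacting to `org.freedesktop.DBus.NameOwnerChanged(name, old_owner, new_owner)` — an illustration, not what the spec mandates; `handle_name_owner_changed` and the printed actions are hypothetical stand-ins for emitting StatusNotifierItemUnregistered/Registered over D-Bus:

```shell
#!/bin/sh
# Hypothetical watcher-side decision logic for NameOwnerChanged.
# The echoed verbs stand in for the D-Bus signals a real watcher
# would emit (StatusNotifierItemUnregistered / ...Registered).
handle_name_owner_changed() {
    name=$1; old_owner=$2; new_owner=$3
    if [ -z "$new_owner" ]; then
        echo "unregister $name"   # owner vanished: drop the item/host
    elif [ -z "$old_owner" ]; then
        echo "register $name"     # name newly appeared on the bus
    else
        echo "reannounce $name"   # owner changed: re-announce to hosts
    fi
}
```

So under this (assumed) policy: yes, re-announce on an owner change, and unregister only when the new owner is empty.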
<bbhtt>
In the user verification template I don't have external: true, what do I do?
<mupuf>
bbhtt: you have "false"?
<mupuf>
If so, then you don't need to request anything
<bbhtt>
mupuf: Yea
<bbhtt>
Ah thanks
<mupuf>
you should be able to fork :)
<mupuf>
you must work for a company that fd.o trusts
<bbhtt>
I think it's because my account was created before all this
<mupuf>
oh, could be
<mupuf>
tpalli: checking it out
* mupuf
restored the banner about spam since new users need to know they need to request rights
<tpalli>
mupuf thanks!
<mupuf>
tpalli: looks to me like an issue with the unreliable network. It should be improved today
<tpalli>
mupuf okeydokkey
kode54 has joined #freedesktop
lazka has left #freedesktop [bye]
sima has joined #freedesktop
gbissett has joined #freedesktop
<kode54>
aaaaaa, apparently the gitlab was just migrated sideways?
gbissett has quit []
alanc has quit [Remote host closed the connection]
alanc has joined #freedesktop
MajorBiscuit has joined #freedesktop
tzimmermann has joined #freedesktop
An0num0us has joined #freedesktop
AbleBacon has joined #freedesktop
ximion has joined #freedesktop
ximion has quit []
pendingchaos has quit [Read error: Network is unreachable]
MTCoster has quit [Read error: Network is unreachable]
robclark has quit [Read error: Network is unreachable]
i509vcb has quit [Write error: connection closed]
bwidawsk has quit [Write error: connection closed]
kode54 has quit [Read error: Network is unreachable]
zmike has quit [Read error: Network is unreachable]
pendingchaos has joined #freedesktop
zmike has joined #freedesktop
kode54 has joined #freedesktop
i509vcb has joined #freedesktop
<daniels>
yes, it went up to 16.x
zzag has quit [Remote host closed the connection]
aswar002 has quit [Remote host closed the connection]
zzag has joined #freedesktop
<daniels>
which does feature some big UI changes
<kode54>
ah
bwidawsk has joined #freedesktop
robclark has joined #freedesktop
aswar002 has joined #freedesktop
ebassi has quit [Remote host closed the connection]
jsto has quit [Remote host closed the connection]
jsto has joined #freedesktop
MTCoster has joined #freedesktop
ebassi has joined #freedesktop
Yakov has joined #freedesktop
<Yakov>
is it possible to detect the Windows key with libevdev?
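For what it's worth: the kernel (and therefore libevdev) reports the Windows key as KEY_LEFTMETA (code 125) or KEY_RIGHTMETA (code 126). A tiny shell sketch of that mapping — in C you would compare the code returned by libevdev_next_event() against KEY_LEFTMETA; `meta_key_name` here is just an illustrative stand-in for libevdev_event_code_get_name():

```shell
#!/bin/sh
# Map Linux input event codes to the names libevdev would report.
# 125/126 are KEY_LEFTMETA/KEY_RIGHTMETA in linux/input-event-codes.h.
meta_key_name() {
    case "$1" in
        125) echo KEY_LEFTMETA ;;
        126) echo KEY_RIGHTMETA ;;
        *)   echo UNKNOWN ;;
    esac
}
```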
<alatiera>
registry usage should be transparent post-migration still right?
<alatiera>
linux runners seem to work fine, but the windows builds can't reach it to push, it seems
<bentiss>
alatiera: minus the fact that it's hosted on the new cluster which is showing some serious disk issues
<alatiera>
though I think the windows job did manage to login
<alatiera>
bentiss ack, thanks
<bentiss>
alatiera: plan is to solve this this morning, but I can not seem to pg_dump the current db right now
<bentiss>
alatiera: yeah, I don't think the login requires an access to the db
<Yakov>
can I get help with libevdev here?
<bentiss>
I'm glad I made a dump of the registry db yesterday and kept it around: the registry db on the new cluster is simply not answering any requests
<bentiss>
I'll reset it to yesterday's state soon
<alatiera>
what db is backing the registry now?
<alatiera>
old machine with the old dump?
<bentiss>
alatiera: the one on the new cluster which is failing
<alatiera>
hmm, seems to be working on my end mostly
<bentiss>
I'm making it point at the old cluster with the new db
<alatiera>
(as in it's pushing things)
<bentiss>
alatiera: yeah, you'll have to re-push, the db will be reset to yesterday's state
<alatiera>
weird that it doesn't ack requests on your side huh
<alatiera>
bentiss yea I don't mind
<bentiss>
when I run the db dump on the machine it's running on, it simply hangs, so I guess I'm not the only one having issues
fgdfgdfgd has quit [Ping timeout: 480 seconds]
AbleBacon has quit [Read error: Connection reset by peer]
blatant has joined #freedesktop
<mupuf>
bentiss: yeah, the registry has been unreliable
<mupuf>
daniels: did the update happen? I still see 15.X in the admin
<mupuf>
Anyway, the priority should be fixing the registry :)
<bentiss>
hmm... It seems I can now dump the registry that was failing
<bentiss>
and it seems that when no one is accessing the disks, they are fine. That's weird, isn't it :)
<alatiera>
if a disk does io and nobody hears it, did it do it at all?
* alatiera
knows where the door is
<bentiss>
good question :)
<bentiss>
anyway, big question: should I keep running the current registry db with the backup from yesterday, or should I dump the one from 30 min ago?
<bentiss>
mupuf: ^^?
<mupuf>
bentiss: the new one, please
<bentiss>
mupuf: ok.
<bentiss>
I need to take the registry down then
<bentiss>
it's down now
<mupuf>
Crossing fingers it will go well
<bentiss>
so far so good
<bentiss>
(replicating)
<bentiss>
creating indexes....
<bentiss>
mupuf: and regarding the gitlab migration to , yes it's not done, but I need a stable cluster for that
<bentiss>
16.x
<mupuf>
Exactly
<bentiss>
and done, respinning up the registry pods
<bentiss>
(they seem to be happy)
<mupuf>
Gitlab reports psql to be 14.9. Isn't that too old for gitlab 16.x?
<bentiss>
mupuf: it's supposed to be 15.9
<bentiss>
oops, no, 14.9, you are correct
<bentiss>
IIRC we were on 13.x before
<mupuf>
I see, hopefully this is good-enough for gitlab 16
<bentiss>
also, for mesa, we need to remove the CI variables pointing at harbor, it's useless now
<mupuf>
hakzsam: user tags are gone
<hakzsam>
like lost?
<mupuf>
Not lost, you can transfer them using skopeo. It is explained in the banner
<mupuf>
I'll send you a link when I reach my pc
<hakzsam>
ok
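The skopeo transfer mupuf mentions boils down to one copy per tag. A hedged sketch — the repository paths and tag below are hypothetical placeholders, and `SKOPEO` is overridable so the helper can be dry-run with `echo`:

```shell
#!/bin/sh
# Copy a tag between container registries with skopeo.
# SKOPEO is overridable for dry-runs; defaults to the real binary.
SKOPEO=${SKOPEO:-skopeo}

copy_tag() {
    src=$1; dst=$2
    "$SKOPEO" copy "docker://$src" "docker://$dst"
}

# Example (hypothetical paths -- substitute your own repo and tag):
# copy_tag old-registry.example.org/hakzsam/vk-cts-image:mytag \
#          registry.freedesktop.org/hakzsam/vk-cts-image:mytag
```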
<bentiss>
mupuf: the link in the banner disappeared
<mupuf>
oh, right, I'll add it back
<bentiss>
mupuf: no rush, I haven't updated it
<alatiera>
I wonder, do we have numbers of the size of the registry with and without user tags?
<bentiss>
or maybe we should promote the instructions to a wiki page
<bentiss>
alatiera: no, and we can not, the blobs are shared
<alatiera>
ah
<bentiss>
what I can give you is the size of the registry on gcs and the one we hold now that has garbage collection
<alatiera>
was curious how many of the blobs were exclusive to the users
<mupuf>
bentiss: I would love to see the size difference between the registries
<bentiss>
so on GCS, we had 27TB of data, and I pulled only 9.8TB
<alatiera>
if we remove old mesa/gst images we can probably half that
<bentiss>
on those 9.8 TB, the data contains the main projects plus all new registry repos that were created after I started the registry migration (I think last September, one year ago)
<mupuf>
alatiera: more like 75% down :D
<alatiera>
(but that only works when there are no user tags)
<bentiss>
well, harbor has some more numbers, and since I set it up mesa is roughly 1TB of data
<alatiera>
I have half a script to parse the image tags in yml for the gst repo
<bentiss>
but in any case, we have gc now, so in theory, if we can clear the tags, the blobs will be cleared eventually
<alatiera>
but never finished the "query the registry and delete everything not in main|stable branches" part
<mupuf>
dabrain34[m]1: it doesn't hurt to do: docker push ... || { sleep 5; docker push ...; }
<mupuf>
in case of network errors
<mupuf>
but still, seems pretty flaky
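mupuf's retry-on-flaky-push idea, fleshed out as a small generic helper — a sketch, not the actual CI template; `retry` is a hypothetical name:

```shell
#!/bin/sh
# retry TRIES DELAY CMD [ARGS...]
# Re-run CMD up to TRIES times, sleeping DELAY seconds between
# attempts; returns 0 on the first success, 1 if every attempt fails.
retry() {
    tries=$1; delay=$2; shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$tries" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}

# Usage in a job script:
# retry 3 5 docker push "$IMAGE"
```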
<bentiss>
mupuf: that runner is still pointing at the registry in the old cluster. And I can see errors from it. We either need to wait for the dns cache refresh or force one
<mupuf>
bentiss: oh, great, thanks :)
<dabrain34[m]1>
shall I do something ?
<bentiss>
dabrain34[m]1: unless you have root access on that server, no
vsyrjala_ is now known as vsyrjala
<mupuf>
bentiss: So, anything else you want to work on today or this week? I would like to write that the downtime is over
<mupuf>
(and that DNS may take some time to propagate, but otherwise, we are done)
<bentiss>
mupuf: would be nice if we could upgrade gitlab too
<dabrain34[m]1>
how long should I wait, more or less, for this DNS propagation?
<bentiss>
dabrain34[m]1: at most 4 hours
<dabrain34[m]1>
ok
<dabrain34[m]1>
thanks for the support :)
<mupuf>
bentiss: right, yeah, probably a good thing to do
<mupuf>
but the registry work is done for now, right?
<bentiss>
maybe? :)
<mupuf>
we have the data in the new cluster, the DB in the old one
<bentiss>
yeah
<mupuf>
good good
<bentiss>
I've disabled the gc while I am copying the data over to the old cluster
<bentiss>
so yeah, not entirely finished
<mupuf>
good call
<mupuf>
is that a hot transfer, or is there still potential for data loss?
<bentiss>
no hot transfer
<bentiss>
no, hot transfer
<bentiss>
well, there could be like a blob not transferred when I switch from the new to the old cluster, but I'll continue to sync the blobs in the background, so like a 10 min delay
<mupuf>
ack
<mupuf>
ok, I'll write something down and ask you for a review
<bentiss>
thanks!
<mupuf>
bentiss: how long do you think it would take to upgrade to gitlab 16?
<mupuf>
~30 minutes?
<bentiss>
mupuf: no idea. It can take a while, and it can be transparent or not depending on how the migration happens
* bentiss
<- lunch, bbl
<mupuf>
bentiss: enjoy!
Yakov has quit [Remote host closed the connection]
bnilawar has quit [Ping timeout: 480 seconds]
Ndfkjhw4 has quit []
Ndfkjhw4 has joined #freedesktop
Major_Biscuit has joined #freedesktop
Ndfkjhw4 has quit []
MajorBiscuit has quit [Ping timeout: 480 seconds]
bmodem has quit [Ping timeout: 480 seconds]
vkareh has joined #freedesktop
funestia[m] has left #freedesktop [#freedesktop]
<hakzsam>
looks like pushing new images to the registry is unavailable: received unexpected HTTP status: 500 Internal Server Error?
<bentiss>
hakzsam: which job, and which runner?
<hakzsam>
it happened to me when I wanted to push a new image to vk-cts-image
<bentiss>
hakzsam: I can see access to your registry on the failing registry pod, so hopefully when the dns cache gets properly expired, you should be fine
<hakzsam>
ok, I will wait a bit then, thanks!
<bentiss>
(should be another 2 hours tops)
<hakzsam>
sounds good
peelz has joined #freedesktop
raghavgururajan has joined #freedesktop
elibrokeit_ has joined #freedesktop
rpigott_ has joined #freedesktop
moses_ has joined #freedesktop
ifreund_ has joined #freedesktop
peelz is now known as Guest1037
_lemes has joined #freedesktop
raghavgururajan is now known as Guest1043
MTCoster_ has joined #freedesktop
MTCoster_ has quit []
dcunit3d_ has joined #freedesktop
ebassi_ has joined #freedesktop
melissawen_ has joined #freedesktop
Sachiel has quit [resistance.oftc.net larich.oftc.net]
vkareh has quit [resistance.oftc.net larich.oftc.net]
ebassi has quit [resistance.oftc.net larich.oftc.net]
MTCoster has quit [resistance.oftc.net larich.oftc.net]
aswar002 has quit [resistance.oftc.net larich.oftc.net]
dcunit3d has quit [resistance.oftc.net larich.oftc.net]
alanc has quit [resistance.oftc.net larich.oftc.net]
sumits has quit [resistance.oftc.net larich.oftc.net]
itaipu has quit [resistance.oftc.net larich.oftc.net]
mattst88 has quit [resistance.oftc.net larich.oftc.net]
kem has quit [resistance.oftc.net larich.oftc.net]
Guest8532 has quit [resistance.oftc.net larich.oftc.net]
lemes has quit [resistance.oftc.net larich.oftc.net]
siqueira has quit [resistance.oftc.net larich.oftc.net]
melissawen has quit [resistance.oftc.net larich.oftc.net]
bleb has quit [resistance.oftc.net larich.oftc.net]
anholt has quit [resistance.oftc.net larich.oftc.net]
lkundrak has quit [resistance.oftc.net larich.oftc.net]
elibrokeit has quit [resistance.oftc.net larich.oftc.net]
Guest7111 has quit [resistance.oftc.net larich.oftc.net]
rpigott has quit [resistance.oftc.net larich.oftc.net]
ifreund has quit [resistance.oftc.net larich.oftc.net]
moses has quit [resistance.oftc.net larich.oftc.net]
abrotman has quit [resistance.oftc.net larich.oftc.net]
Lyude has quit [resistance.oftc.net larich.oftc.net]
demarchi has quit [resistance.oftc.net larich.oftc.net]
abrotman has joined #freedesktop
vkareh has joined #freedesktop
moses_ is now known as moses
elibrokeit_ is now known as elibrokeit
ifreund_ is now known as ifreund
siqueira has joined #freedesktop
MTCoster has joined #freedesktop
Sachiel has joined #freedesktop
aswar002 has joined #freedesktop
demarchi has joined #freedesktop
lkundrak has joined #freedesktop
alanc has joined #freedesktop
Lyude has joined #freedesktop
kem has joined #freedesktop
itaipu has joined #freedesktop
anholt has joined #freedesktop
sumits has joined #freedesktop
mattst88_ has joined #freedesktop
bleb has joined #freedesktop
<bentiss>
alright, fixed the pages jobs and all artifacts uploads... it was trying to access the failing cluster instead of using the current one
MajorBiscuit has joined #freedesktop
Major_Biscuit has quit [Ping timeout: 480 seconds]
spiegela has joined #freedesktop
<bentiss>
I've removed harbor from the CI configuration in mesa. In theory, no visible impact
<bentiss>
zmike: I think karolherbst and daniels talked about that last week
melissawen_ has left #freedesktop [Leaving]
melissawen has joined #freedesktop
Haaninjo has joined #freedesktop
<karolherbst>
yeah, but it was unclear what's causing this problem or rather what we want to do to fix it... Those containers don't use our rustup script, so the installed rust version comes from _somewhere_
<zmike>
it's blocking further updates
<zmike>
so ideally we want to do something
<zmike>
even if it's just a stopgap
<karolherbst>
sure, but the infra update happened :) I guess we should get back to that issue
<karolherbst>
but I have no idea about that part of CI, it's something something the kernel stuff is doing there
<hakzsam>
yeah, it's blocking every new container
vkareh has quit [Quit: WeeChat 3.6]
<mupuf>
eric_engestrom: that may be something you can help with ^
<karolherbst>
it's probably the clap_lex upgrade to 0.5.1 which happened like 5 days ago and bindgen (or something) selects 0.5.x
<karolherbst>
yeah.. that bumped the rust req from 1.64.0 to 1.70.0
<karolherbst>
I think the solution here is to make crosvm use rustup (and our script for that) and install rustc 1.70 instead of relying on the distribution's rustc
<eric_engestrom>
I don't have much context here, but that last sentence makes sense to me karolherbst :)
<karolherbst>
hakzsam, zmike: there is a workaround you can try
<karolherbst>
mhh.. maybe not, not sure how --locked actually works here with binaries
<karolherbst>
yeah.. it's doing something else
<karolherbst>
there is a thing called a cargo.lock file, but I've never used it and have no idea how it works
<bentiss>
that's weird, the mesa images have not been mirrored from harbor to the registry, even though harbor says they were
<karolherbst>
who is maintaining/managing the crossvm stuff?
<karolherbst>
*crosvm
<karolherbst>
tintou and DavidHeidelberg[m]?
<karolherbst>
please read ^^
<karolherbst>
crosvm generation has to use a fixed rustc, not whatever the distribution uses, else a crate dependency update _might_ not compile because the rustc is too old
<karolherbst>
it's currently broken, see the pipeline link
killpid_ has joined #freedesktop
<tintou>
Yeah I actually just bumped on it
<karolherbst>
_maybe_ using a Cargo.lock file is the more reliable solution here
<karolherbst>
probably the one causing less issues
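Two standard Cargo mechanisms cover what's being discussed here: committing a Cargo.lock and building with `cargo build --locked` freezes exact crate versions (so a new clap_lex can't sneak in), and the `rust-version` field makes cargo fail fast with a clear error on a too-old toolchain. An illustrative fragment, not crosvm's real manifest:

```toml
# Cargo.toml fragment (illustrative only)
[package]
name = "crosvm"
# cargo 1.56+ refuses to build this package with an older rustc,
# instead of failing deep inside a dependency:
rust-version = "1.70.0"
```

With the lockfile committed, `cargo build --locked` errors out if the lockfile would need to change, instead of silently resolving to a newer dependency.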
mattst88_ has quit []
<bentiss>
FWIW, I'm babysitting the mesa pipeline by manually copying from harbor to registry the images that are not here
<zmike>
heroic
<bentiss>
I really don't know why harbor wasn't doing the replication properly
<eric_engestrom>
bentiss: is it possible that somehow multiarch images got lost in translation?
<eric_engestrom>
> .gitlab-ci/meson/build.sh: line 28: /usr/bin/llvm-config-15: cannot execute binary file: Exec format error
<bentiss>
eric_engestrom: it's not a multiarch image, isn't it?
<eric_engestrom>
wait no, that's not a multiarch image, that's an x86_64 image cross-building to s390x
<bentiss>
let me push it again
AbleBacon has joined #freedesktop
<bentiss>
the blobs did not match, they are pushed again
<eric_engestrom>
thanks!
<eric_engestrom>
is the push finished?
<bentiss>
not yet
<bentiss>
I'll restart the job
<eric_engestrom>
ack; I jumped the gun and already did
<bentiss>
I need to remove the image on the runners also
<eric_engestrom>
actually no rush on retrying that job; the MR will fail anyway because other jobs have been taking too long, so it's too late for marge
<bentiss>
well, it's for the next run
<bentiss>
but maybe it's because it was running on ml24, which I just reinstalled
<bentiss>
and *maybe* s390x is not working there
<eric_engestrom>
bentiss: your latest retry worked, thanks!
<bentiss>
ok... no idea what is happening, the image also works on ml-24
<bentiss>
hah! it's a runner issue
<bentiss>
/usr/bin/llvm-config-15: cannot execute binary file: Exec format error
<bentiss>
error: Checking out added file "/ppc64le-linux-gnu": mkdirat: No such file or directory
<mupuf>
No, I would have expected the DNS to be updated by now
<mupuf>
But I know it can take a while
<dabrain34[m]1>
the same
<dabrain34[m]1>
ok I'll give a try tomorrow morning
<mupuf>
Maybe you can modify the job to print the IP for registry.freedesktop.org?
<bentiss>
eric_engestrom: a reboot of the runner solved the issue (that's a package we cannot put in the current boot, apparently)
vyivel has quit [Remote host closed the connection]
<dabrain34[m]1>
it gives 172.29.208.1
<dabrain34[m]1>
where on my machine it gives 147.75.198.156
<bentiss>
looks like a proxy?
<dabrain34[m]1>
what should I expect ?
<bentiss>
147.75.198.156 is the correct IP
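mupuf's suggestion to print the IP from inside the job can be done with getent, which resolves through NSS and therefore sees the same resolver configuration and cache the job itself does — a sketch, with `resolve_ip` as a hypothetical helper name:

```shell
#!/bin/sh
# Print the first address a hostname resolves to, as the job sees it.
resolve_ip() {
    getent hosts "$1" | awk '{ print $1; exit }'
}

# In a CI job, to check for a stale DNS cache:
# resolve_ip registry.freedesktop.org
```

Comparing the job's output against the expected address (147.75.198.156 here) shows immediately whether the runner is still on a stale record.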
MajorBiscuit has quit [Ping timeout: 480 seconds]
<bentiss>
I think the errors on the registry are due to "FATAL: sorry, too many clients already / FATAL: remaining connection slots are reserved for non-replication superuser connections"
<bentiss>
so we are DoSing the db
<mupuf>
Oops
<bentiss>
I'll probably split the db in 2 pods, one for registry, and one for gitlab
vyivel has joined #freedesktop
<mupuf>
bentiss: can't just increase the connection count?
<bentiss>
maybe
<bentiss>
but we already increased it to 300, so maybe we are loading the db too much
<mupuf>
I'm all for splitting, but as quick workaround, it would help
<mupuf>
bentiss: I guess the load average and iostats would tell us that better than connection counts
<bentiss>
but if I increase the connection count, I'll have to cut gitlab (or at least the db), while if I split, I just have to stop the registry for 1-2 min
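For reference, the quick workaround being weighed is a single setting, but `max_connections` is a postmaster parameter in PostgreSQL: changing it requires a server restart, which is exactly why bumping it means cutting GitLab, while splitting the registry onto its own db pod does not. A sketch, assuming a stock config and an illustrative value:

```
# postgresql.conf (sketch -- value is illustrative)
max_connections = 600    # previously raised to 300 per the discussion
```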
<mupuf>
And how much work is it?
<tpalli>
btw I did bump the RUST_VERSION in the latest version of the particular pipeline that failed, not sure if that is the correct solution
<mupuf>
If it isn't much work, then fuck yeah!
<bentiss>
mupuf: shouldn't be too hard to do
<mupuf>
Modularity is good then
todi has quit []
<bentiss>
alright, cutting down the registry for a short amount of time, while I migrate to a separate db
<mupuf>
bentiss: crossing fingers
<bentiss>
db migrated, waiting for the new config to propagate
<bentiss>
pods are starting
<bentiss>
and running
<mupuf>
\o/
<bentiss>
seems to be working (as in skopeo inspect works)