ChanServ changed the topic of #freedesktop to: https://www.freedesktop.org infrastructure and online services || for questions about freedesktop.org projects, please see each project's contact || for discussions about specifications, please use https://gitlab.freedesktop.org/xdg or xdg@lists.freedesktop.org
GNUmoon has joined #freedesktop
pixelcluster has quit [Quit: Konversation terminated!]
Consolatis_ has joined #freedesktop
Consolatis is now known as Guest1900
Consolatis_ is now known as Consolatis
Guest1900 has quit [Ping timeout: 480 seconds]
Asmadeus has joined #freedesktop
Seirdy has quit [Ping timeout: 480 seconds]
Seirdy has joined #freedesktop
androidc512l has joined #freedesktop
anholt has joined #freedesktop
Rainer_Bielefeld_away has joined #freedesktop
androidc512l has quit [Remote host closed the connection]
ximion has quit []
Mithrandir has quit [Quit: brb]
Mithrandir has joined #freedesktop
Arsen has quit [Ping timeout: 480 seconds]
Arsen has joined #freedesktop
jinmiaol1o has joined #freedesktop
sirius has joined #freedesktop
sirius is now known as Guest1913
Guest1913 has quit []
jinmiaol1o has quit []
jinmiaol1o has joined #freedesktop
sirius_ has joined #freedesktop
sirius_ has quit []
msizanoen has joined #freedesktop
jinmiaol1o is now known as jinmiaoluo
jinmiaoluo has quit [Quit: leaving]
jinmiaoluo has joined #freedesktop
jinmiaoluo has quit []
jinmiaoluo has joined #freedesktop
handlerug has joined #freedesktop
pohly has joined #freedesktop
aenuivbiu has joined #freedesktop
Rainer_Bielefeld_away has quit [Ping timeout: 480 seconds]
aenuivbiu has quit [Remote host closed the connection]
danvet has joined #freedesktop
<Adrinael>
Is gitlab down?
dt9 has joined #freedesktop
<dt9>
guys, is freedesktop down right now?
<vyivel>
yep, still down, no eta, be patient
<dt9>
ack, 10x for confirmation
alanc has quit [Remote host closed the connection]
alanc has joined #freedesktop
msizanoen_ has joined #freedesktop
msizanoen is now known as Guest1922
msizanoen_ is now known as msizanoen
Guest1922 has quit [Ping timeout: 480 seconds]
ofourdan_ has joined #freedesktop
ofourdan_ has left #freedesktop [#freedesktop]
ofourdan has joined #freedesktop
progandy[m] has joined #freedesktop
progandy[m] has quit []
progandy has joined #freedesktop
<daniels>
what a day to not have coffee at home
<bentiss>
daniels: ouch :/
<bentiss>
so... I managed to mess up one disk on server-3, currently restoring it
<daniels>
ah yeah, I was just about to wonder why I wasn't able to SSH to it :P
* bentiss
basically nuked /var/lib/rancher :(
<daniels>
bentiss: oh ... ouch :(
<bentiss>
luckily given that it was at partitioning time I still had the device mounted, so I backed it up before the reboot
<daniels>
bentiss: is there anything I can do to help atm?
<bentiss>
daniels: if you can try to understand why server-5 is not happy with 10.41.x.x that would be good
<daniels>
bentiss: ok! :)
<bentiss>
and it's not kilo the culprit but flannel with wireguard backend FWIW
<emersion>
i'll also be available in a bit, if you have a noob-friendly task :P
<bentiss>
emersion: the more the merrier :)
<bentiss>
emersion: same thing as for daniels, it would be nice to understand why server-5 cannot talk to the other services
<emersion>
ok!
alpernebbi has joined #freedesktop
<mceier>
someone could write mail describing the situation (and maybe eta); there's at least one mail on xorg-devel ml asking about 504 error
<airlied>
fubar, no eta
<mceier>
;)
<bentiss>
mceier: 2 disks down in the cluster, which means everything on fire
<daniels>
mceier: good point, sent
<mceier>
cool :)
swivel has joined #freedesktop
<hakzsam>
good luck with fixing this guys!
lkundrak has joined #freedesktop
vbenes has quit []
vbenes has joined #freedesktop
<bentiss>
finally, server-3 is back
<Asmadeus>
m
<Asmadeus>
(sorry)
<bentiss>
daniels, emersion: I managed to get the ssd on server-3 back in the pool, it's currently recovering, so hopefully the cluster will restart in a few minutes
msizanoen has quit [Quit: msizanoen]
msizanoen has joined #freedesktop
<emersion>
\o/
* bentiss
attempts at fixing large-5 too
<daniels>
bentiss: I didn't manage to figure out large-5 yet; was looking just before the reboot but it's a mystery to me how the wg traffic gets captured in the first place ...
<bentiss>
daniels: I was asking about server-5 :) not large
<daniels>
ah
* daniels
rubs eyes
<bentiss>
daniels: TBH, wg is most of the time way simpler than regular traffic, but I just don't understand the flannel config that makes the service plane work :/
<daniels>
yeah, so I'd gone to large-5 to look at a working example, and couldn't figure out how it was supposed to work in the first place? at least judging by the routing table that's there
<bentiss>
I fear the issue is because it's not on the same subnet
<bentiss>
because different facility
<daniels>
ah yeah of course, NAT
<bentiss>
and the others are working because the default route makes it use 10.99.x.x and then the interfaces are magically picking up the traffic :(
<bentiss>
but maybe we can leverage kilo to route the 10.41.x.x addresses toward server-2
msizanoen_ has joined #freedesktop
<bentiss>
and then we will have to transfer the cluster to a facility that is not deprecated
msizanoen is now known as Guest1929
msizanoen_ is now known as msizanoen
Guest1929 has quit [Ping timeout: 480 seconds]
<bentiss>
OK, disk on large-5 is back up, we now need to wait for ceph to settle
<bentiss>
and then we can clean up the various OSD leftovers
<bentiss>
daniels: so, for extra safety, what I did now was just remove the failing deployment, drained the node, rebooted it, uncordon it, then zap the *correct* disk, then killed the operator
<bentiss>
it doesn't clean up the old OSD, but at least the disk is back up
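The recovery sequence bentiss describes above could be written down roughly as the ordered command plan below. This is only a sketch: the OSD id, the device path, the use of ssh for the reboot and the zap, the `rook-ceph` namespace and the `app=rook-ceph-operator` label selector are all assumptions that are not stated in the log. The function only builds the list of commands and does not run anything.

```python
def osd_disk_recovery_plan(node, osd_id, device, namespace="rook-ceph"):
    """Return the ordered commands for re-adding a disk, as argument lists.

    All arguments are placeholders chosen by the caller; nothing is executed.
    """
    return [
        # 1. remove the failing OSD deployment
        ["kubectl", "-n", namespace, "delete", "deployment", f"rook-ceph-osd-{osd_id}"],
        # 2. drain the node so no workloads are running on it
        ["kubectl", "drain", node, "--ignore-daemonsets"],
        # 3. reboot the node (assumed to be done over ssh)
        ["ssh", node, "sudo", "reboot"],
        # 4. allow scheduling on the node again
        ["kubectl", "uncordon", node],
        # 5. wipe the partition table of the *correct* disk
        ["ssh", node, "sudo", "sgdisk", "--zap-all", device],
        # 6. kill the operator pod so it restarts and picks up the clean disk
        ["kubectl", "-n", namespace, "delete", "pod", "-l", "app=rook-ceph-operator"],
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for command in osd_disk_recovery_plan("large-5", 3, "/dev/sdX"):
        print(" ".join(command))
```

As noted in the log, this does not clean up the old OSD entry; it only gets the disk back into the pool.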
<daniels>
hmm, so it _is_ at least hitting the right iptables rules to masquerade which should push it through wg ...
<bentiss>
maybe a wrong ufw config on the other side
<daniels>
but then why would it only be intermittent? :\
<daniels>
bentiss: \o/ thanks!
Mattia_98 has joined #freedesktop
___nick___ has joined #freedesktop
oSoMoN has joined #freedesktop
oSoMoN has quit []
msizanoen has quit [Remote host closed the connection]
<bentiss>
recovery stopped... running fstrim on all the nodes, that might be the issue
mihalycsaba has joined #freedesktop
pixelcluster has joined #freedesktop
mchehab has joined #freedesktop
mihalycsaba has quit []
msizanoen has joined #freedesktop
* bentiss
reboots large-7
* bentiss
upgrades rook from 1.6.8 to 1.6.11
chaim has joined #freedesktop
Rainer_Bielefeld_away has joined #freedesktop
kj has joined #freedesktop
<daniels>
bentiss: on the network side, tracing through the iptables rules, in the NAT table we go from POSTROUTING -> KUBE-SERVICES -> KUBE-SVC-NPX46M4PTMTKRN6Y for the HTTP plane
<daniels>
that balances connections between targets of 10.99.237.141 (server-2), 10.99.237.145 (server-3), and 10.66.151.3 (server-5)
<daniels>
perhaps unsurprisingly, server-5 is the one which fails to answer itself
<bentiss>
or the other way around :)
jrybar has joined #freedesktop
<daniels>
nope
<daniels>
every time we land in the server-5 chain it times out; every time we land in server-2 or server-3 it works fine
<bentiss>
hmm, interesting
<daniels>
it's on a probability distribution directing 50% to server-5, which explains why it works exactly half the time
<daniels>
(I've currently hacked it to always forward to server-2, which is now working 100% of the time)
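The "works exactly half the time" observation follows from how an ordered chain of probabilistic rules splits traffic: each rule takes its probability out of whatever the earlier rules let through. A minimal sketch of that arithmetic is below. The rule order and the per-rule probabilities are assumptions picked to reproduce the 50% share mentioned above; they are not copied from the real KUBE-SVC chain.

```python
from fractions import Fraction


def endpoint_shares(rules):
    """Overall traffic share per endpoint for an ordered chain.

    rules is a list of (endpoint, probability). Each rule is only reached by
    the traffic that no earlier rule matched; the last rule should use 1.
    """
    shares = {}
    remaining = Fraction(1)
    for endpoint, probability in rules:
        p = Fraction(probability)
        shares[endpoint] = shares.get(endpoint, Fraction(0)) + remaining * p
        remaining *= 1 - p
    return shares


def success_rate(rules, broken):
    """Fraction of connections that land on an endpoint not in `broken`."""
    shares = endpoint_shares(rules)
    return sum(share for endpoint, share in shares.items() if endpoint not in broken)


# Hypothetical chain: server-5 is tried first with probability 1/2.
chain = [
    ("server-5", Fraction(1, 2)),
    ("server-2", Fraction(1, 2)),
    ("server-3", Fraction(1)),
]
# Hypothetical "hack": always forward to server-2.
pinned = [("server-2", Fraction(1))]

if __name__ == "__main__":
    print(success_rate(chain, {"server-5"}))
    print(success_rate(pinned, {"server-5"}))
```

With server-5 timing out, the first chain succeeds for half of the connections, and the pinned chain succeeds for all of them, which matches what daniels reports.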
<bentiss>
heh, that's what I was about to suggest
<bentiss>
can I try to uncordon server-5 then?
<bentiss>
right now the pg are stuck backfilling, so adding a couple of disks might unblock them
<daniels>
yep, go for it
<daniels>
let's see what happens
<bentiss>
thanks
<daniels>
I need to go afk for a bit anyway, back at 1pm your time
<bentiss>
k
* bentiss
probably needs to find some food too
GNUmoon has quit [Remote host closed the connection]
jkhsjdhjs has joined #freedesktop
GNUmoon has joined #freedesktop
thaller is now known as Guest1934
thaller has joined #freedesktop
pobrn has joined #freedesktop
msizanoen has quit [Ping timeout: 480 seconds]
Guest1934 has quit [Ping timeout: 480 seconds]
<bentiss>
daniels: still failing a lot
<bentiss>
I think I'll grab some lunch and then remove server-5, and use a new c2-medium as an agent, not a server. This should solve the issues with control plane
mihalycsaba has joined #freedesktop
freddy has joined #freedesktop
glehmann has joined #freedesktop
AbleBacon has quit [Read error: Connection reset by peer]
<daniels>
ack
<daniels>
and yeah, I can keep looking into it, but reflexively I think it would be better to get everything in the same facility + use a VPC for all traffic + pare ufw down to the absolute bare minimum ruleset (allow WG into boundary host + allow HTTPS/SSH ingress to elastic + drop all other incoming external) so the k3s-internal traffic can be managed solely by k3s rules?
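The pared-down ruleset daniels proposes can be modelled as a short first-match list with a catch-all drop at the end. The sketch below is only a model of that policy, not ufw syntax. The host labels `boundary` and `elastic` are placeholders for the roles named above, and the WireGuard port (51820, the conventional default) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

WG_PORT = 51820  # assumption: WireGuard's conventional default UDP port


@dataclass(frozen=True)
class Rule:
    action: str            # "allow" or "drop"
    proto: Optional[str]   # None matches any protocol
    port: Optional[int]    # None matches any port
    dest: Optional[str]    # None matches any destination role

    def matches(self, proto, port, dest):
        return (
            self.proto in (None, proto)
            and self.port in (None, port)
            and self.dest in (None, dest)
        )


# Bare-minimum external ingress policy, evaluated top to bottom.
RULES = [
    Rule("allow", "udp", WG_PORT, "boundary"),  # WG into the boundary host
    Rule("allow", "tcp", 443, "elastic"),       # HTTPS ingress
    Rule("allow", "tcp", 22, "elastic"),        # SSH ingress
    Rule("drop", None, None, None),             # drop all other incoming external
]


def decide(proto, port, dest, rules=RULES):
    """Return the action of the first matching rule (default: drop)."""
    for rule in rules:
        if rule.matches(proto, port, dest):
            return rule.action
    return "drop"


if __name__ == "__main__":
    print(decide("udp", WG_PORT, "boundary"))
    print(decide("tcp", 6443, "server-2"))
```

Everything cluster-internal would then travel inside the VPC or the WireGuard tunnel and be left to the k3s-managed rules, which is the point of keeping this list so short.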
mihalycsaba has quit []
bengal has joined #freedesktop
GNUmoon has quit [Remote host closed the connection]
bengal has quit []
progandy has quit [Ping timeout: 480 seconds]
aleksander has quit [Quit: Leaving]
GNUmoon has joined #freedesktop
MajorBiscuit has joined #freedesktop
<bentiss>
daniels: sounds appealing :)
bfr__ has joined #freedesktop
<bentiss>
daniels: can I nuke server-5?
Rainer_Bielefeld_away_ has joined #freedesktop
GNUmoon has quit [Remote host closed the connection]
Rainer_Bielefeld_away has quit [Ping timeout: 480 seconds]
GNUmoon has joined #freedesktop
<daniels>
yep, fine by me :)
Rainer_Bielefeld_away_ has quit [Remote host closed the connection]
moses has joined #freedesktop
JoniSt has joined #freedesktop
<shadeslayer>
Hm, I seem to be hitting gateway timeout issues on gitlab
<bentiss>
\o/ managed to kick in the recovery once again
<JoniSt>
Man... Good luck!
<bentiss>
yes, no more stale pgs
<daniels>
JoniSt: there is no data loss, just annoyance
<JoniSt>
That's nice to hear. Reminds me of the fact that a single raid1 btrfs might also not be enough to keep my own Gitlab instance alive if something happens...
cmeissl[m] has joined #freedesktop
jarl has joined #freedesktop
jarl has quit [Remote host closed the connection]
progandy has joined #freedesktop
benzea has joined #freedesktop
<bentiss>
daniels: I think I'll reboot all machines one after the other, some processes are stuck
<daniels>
bentiss: yeah ... RBD I'm guessing
<bentiss>
fstrim is also hanging, and what actually made the recovery start again was to kill all the rook-ceph pods besides the osds
<bentiss>
so a full reboot might help
<daniels>
heh ...
ximion has joined #freedesktop
freddy has quit []
bingle has joined #freedesktop
<karolherbst>
JoniSt: I was thinking about having my own gitlab instance, but making it rock solid _is_ a huge investment and until that's settled, data is better off being replicated externally anyway... :D
<karolherbst>
I am thinking about doing raid6
<JoniSt>
Hmm... Well, raid5/6 always gives me a bit of a weird feeling :P
akselmo has joined #freedesktop
kelvium has joined #freedesktop
akselmo has left #freedesktop [#freedesktop]
<JoniSt>
I mostly store university stuff on that Gitlab (when I do group work) so it wouldn't be thaaat bad if the filesystem crashed
pjakobsson_ has joined #freedesktop
pjakobsson has quit [Ping timeout: 480 seconds]
kelvium has left #freedesktop [#freedesktop]
<karolherbst>
JoniSt: yeah... but I also plan to do a setup complete without any fans or noise, so I have to make sure to not go overboard with the budget
<karolherbst>
and complete mirroring can get extremely expensive
<karolherbst>
well. with SSDs that is
<karolherbst>
there are actually passively cooled cases which would allow a power budget of ~200W, so that's not even a huge issue
<karolherbst>
just.. expensive :D
msizanoen has joined #freedesktop
<DragoonAethis>
karolherbst: or you could go for an actively-cooled case and replace the fans
<karolherbst>
nah
<karolherbst>
it's still audible
<DragoonAethis>
Yeah, but with Noctua/be quiet fans it can be really quiet
<DragoonAethis>
Pretty much inaudible unless you've got the box in a silent room at night or something like that
<karolherbst>
the issue is, that motherboards are generally quite crappy in this regard
<karolherbst>
so if the CPU/whatever isn't hot/warm, the fans can be turned off
<karolherbst>
but firmware....
<Mattia_98>
noctua fans running on low rpm are inaudible, I can vouch for that
<karolherbst>
DragoonAethis: well.. my work laptop has its fans usually completely turned off, so it would be noticeable :P
<karolherbst>
and I do have noctua fans for other stuff
<karolherbst>
Mattia_98: thing is.. they are inaudible when it doesn't matter
<karolherbst>
under load is where it matters
jekstrand has joined #freedesktop
<DragoonAethis>
Or alternatively, get a fan controller and write custom cooling scripts
<karolherbst>
so I can go for complete fanless with a power budget of 200W or....
<karolherbst>
the noctua fans aren't silent if they have to cool away ~100W of CPU heat
<karolherbst>
DragoonAethis, Mattia_98: Streacom DB4 is what I was thinking about
<karolherbst>
can manage up to 110W CPU heat and 65W GPU heat
<karolherbst>
where I wouldn't need the GPU cooling
<karolherbst>
thing is.. there isn't much space in it :D
<DragoonAethis>
Unfortunately I'm more of a midi tower+ guy myself ;P
aXlH has joined #freedesktop
<DragoonAethis>
But it looks really nice
<karolherbst>
yeah.. I have one for my desktop for work
<karolherbst>
got myself the be quiet! Dark Rock 4 PRO BK022 CPU cooler which isn't all that bad actually
<DragoonAethis>
I have the non-Pro version, it's pretty good too (but getting the Pro for the next upgrade)
<karolherbst>
definitely worth the money
<karolherbst>
it manages 150W without getting loud
<karolherbst>
and my CPU stays at max clock pretty much all the time
<DragoonAethis>
And for the case it's Fractal Meshify C with the stock Fractal fans (which are almost inaudible, but they don't ramp up with the rest of the system)
<karolherbst>
but for devices which are like on 24/7 I want something without fans :P
JoniSt has quit [Ping timeout: 480 seconds]
<bentiss>
daniels: \o/ ceph is back in the game
<bentiss>
no more degraded objects
<daniels>
bentiss: woo! I saw the tools pod is working now that we're upgraded too, awesome
<bentiss>
though daemons are crashing like hell
<bentiss>
daniels: I fixed the deployment of the tools pod to use the same rook minor version :)
<bentiss>
it's not part of the helm chart :(
<daniels>
ahhhhh, right
<daniels>
I did think it was weird that they'd use :master
<daniels>
ooh, gitaly-2 now running too
<bentiss>
and postgres!
<daniels>
\o/
<daniels>
should I try redis next?
<bentiss>
still having connectivity issues
<daniels>
ah :(
<daniels>
control plane or inter-pod?
<bentiss>
I mean 504
<bentiss>
but yeah, feel free to re-enable redis
<daniels>
oh right, yeah it'll 504 since I killed the redis + webservice pods :P
<bentiss>
that's what I just realized :)
<daniels>
the log noise was getting annoying
<bentiss>
\o/ back online!!!!!
<daniels>
:D :D :D
<bentiss>
that's what... 24h of downtime?
<bentiss>
(back online, right before I got to pick up kids at school)
<pixelcluster>
many thanks to both of you for fixing it!!
<Mattia_98>
nice job guys!
<jekstrand>
\o/ y'all are heroes!
<karolherbst>
\o/ daniels, bentiss: thanks for all the work!
<bentiss>
daniels: I'm off for today I think. I have removed the PVC for elasticSearch, but we need to clean up the actual data on disk to reclaim the space
<bentiss>
and I am starting to have a strong headache now, so that will be something for tomorrow
<daniels>
bentiss: thanks so much Monsieur Storage Wizard <3
<daniels>
hope you have a nice & quiet night, drink lots of water
<bentiss>
daniels: I shall not say how many reboots it took me :)
JoniSt has joined #freedesktop
<DragoonAethis>
Congrats :D
<JoniSt>
Yay, nice! :D
<daniels>
bentiss: _cough_
<daniels>
bentiss: I'll look at how we can move to VPC-only networking
<hakzsam>
thanks, great job!
<pq>
Thank you! I got a full day of working on my own code instead of reviewing others' stuff. ;-D
Haaninjo has joined #freedesktop
<jkhsjdhjs>
thanks, appreciate it! finally I can look through the pipewire issues :D
<karolherbst>
yay, I can finally work again!
<daniels>
bentiss: oh yeah, when you're back tomorrow could you please push the helm changes?
<kisak>
Thanks for burning half your weekend on that snafu.
<Mattia_98>
Michael already wrote an article on Phoronix. He works fast XD
<danvet>
daniels, bentiss thx a lot!
<bentiss>
daniels: changes in helm-gitlab-config pushed
<daniels>
bentiss: merci!
i-garrison has quit [Read error: Connection reset by peer]
pixelcluster has quit [Read error: Connection reset by peer]
i-garrison has joined #freedesktop
bingle has quit []
freddy has joined #freedesktop
freddy has quit []
benzea has left #freedesktop [#freedesktop]
msizanoen has quit [Remote host closed the connection]
ybogdano has joined #freedesktop
jrybar has quit [Ping timeout: 480 seconds]
Rainer_Bielefeld_away has joined #freedesktop
Seirdy has quit []
progandy has quit [Remote host closed the connection]
MajorBiscuit has quit [Ping timeout: 480 seconds]
progandy has joined #freedesktop
ybogdano has quit [Ping timeout: 480 seconds]
reillybrogan_ has quit []
reillybrogan has joined #freedesktop
ybogdano has joined #freedesktop
___nick___ has quit []
___nick___ has joined #freedesktop
___nick___ has quit []
___nick___ has joined #freedesktop
bengal has quit [Ping timeout: 480 seconds]
Seirdy has joined #freedesktop
Rainer_Bielefeld_away has quit []
PuercoPop has joined #freedesktop
<eric_engestrom>
bentiss, daniels: awesome work these last couple of days! 💪
<eric_engestrom>
(adding to the pile of well deserved praise)
ybogdano is now known as Guest1956
ybogdano has joined #freedesktop
Guest1956 has quit [Ping timeout: 480 seconds]
ybogdano is now known as Guest1958
ybogdano has joined #freedesktop
Guest1958 has quit [Ping timeout: 480 seconds]
___nick___ has quit [Ping timeout: 480 seconds]
Haaninjo has quit [Quit: Ex-Chat]
<alanc>
+100
chaim has quit [Quit: Konversation terminated!]
bengal has joined #freedesktop
eroux_ has joined #freedesktop
eroux has quit [Ping timeout: 480 seconds]
bengal has quit [Ping timeout: 480 seconds]
pohly has quit []
danvet has quit [Ping timeout: 480 seconds]
jekstrand has left #freedesktop [#freedesktop]
Mattia_98 has quit [Remote host closed the connection]
ybogdano is now known as Guest1973
ybogdano has joined #freedesktop
bingle has joined #freedesktop
Guest1973 has quit [Ping timeout: 480 seconds]
ybogdano is now known as Guest1974
ybogdano has joined #freedesktop
thaller has quit [Ping timeout: 480 seconds]
Guest1974 has quit [Ping timeout: 480 seconds]
<dcbaker>
daniels: I have a docker file for the mr-label-maker: https://gitlab.freedesktop.org/dbaker/mr-label-maker-docker you can pass it GITLAB_TOKEN as an environment variable. I've gotten far enough with it to see that it wants a token, but that's it
<dcbaker>
I took Marcin's work and did a little cleanup to make it a little more pythonic, but it's otherwise the same (I added setup.py to make installation easier, for example)
<dcbaker>
let me know if that looks reasonable to you whenever you've calmed down from the ceph stuff :)