ChanServ changed the topic of #freedesktop to: infrastructure and online services || for questions about projects, please see each project's contact || for discussions about specifications, please use or
GNUmoon has joined #freedesktop
pixelcluster has quit [Quit: Konversation terminated!]
Consolatis_ has joined #freedesktop
Consolatis is now known as Guest1900
Consolatis_ is now known as Consolatis
Guest1900 has quit [Ping timeout: 480 seconds]
Asmadeus has joined #freedesktop
Seirdy has quit [Ping timeout: 480 seconds]
Seirdy has joined #freedesktop
androidc512l has joined #freedesktop
anholt has joined #freedesktop
Rainer_Bielefeld_away has joined #freedesktop
androidc512l has quit [Remote host closed the connection]
ximion has quit []
Mithrandir has quit [Quit: brb]
Mithrandir has joined #freedesktop
Arsen has quit [Ping timeout: 480 seconds]
Arsen has joined #freedesktop
jinmiaol1o has joined #freedesktop
sirius has joined #freedesktop
sirius is now known as Guest1913
Guest1913 has quit []
jinmiaol1o has quit []
jinmiaol1o has joined #freedesktop
sirius_ has joined #freedesktop
sirius_ has quit []
msizanoen has joined #freedesktop
jinmiaol1o is now known as jinmiaoluo
jinmiaoluo has quit [Quit: leaving]
jinmiaoluo has joined #freedesktop
jinmiaoluo has quit []
jinmiaoluo has joined #freedesktop
handlerug has joined #freedesktop
pohly has joined #freedesktop
aenuivbiu has joined #freedesktop
Rainer_Bielefeld_away has quit [Ping timeout: 480 seconds]
aenuivbiu has quit [Remote host closed the connection]
danvet has joined #freedesktop
<Adrinael> Is gitlab down?
dt9 has joined #freedesktop
<dt9> guys, is freedesktop down right now?
<vyivel> yep, still down, no eta, be patient
<dt9> ack, 10x for confirmation
alanc has quit [Remote host closed the connection]
alanc has joined #freedesktop
msizanoen_ has joined #freedesktop
msizanoen is now known as Guest1922
msizanoen_ is now known as msizanoen
Guest1922 has quit [Ping timeout: 480 seconds]
ofourdan_ has joined #freedesktop
ofourdan_ has left #freedesktop [#freedesktop]
ofourdan has joined #freedesktop
progandy[m] has joined #freedesktop
progandy[m] has quit []
progandy has joined #freedesktop
<daniels> what a day to not have coffee at home
<bentiss> daniels: ouch :/
<bentiss> so... I managed to mess up one disk on server-3, currently restoring it
<daniels> ah yeah, I was just about to wonder why I wasn't able to SSH to it :P
* bentiss basically nuked /var/lib/rancher :(
<daniels> bentiss: oh ... ouch :(
<bentiss> luckily given that it was at partitioning time I still had the device mounted, so I backed it up before the reboot
<daniels> bentiss: is there anything I can do to help atm?
<bentiss> daniels: if you can try to understand why server-5 is not happy with 10.41.x.x that would be good
<daniels> bentiss: ok! :)
<bentiss> and the culprit is not kilo but flannel with the wireguard backend, FWIW
<emersion> i'll also be available in a bit, if you have a noob-friendly task :P
<bentiss> emersion: the more the merrier :)
<bentiss> emersion: same thing as for daniels, it would be nice to understand why server-5 cannot talk to the other services
<emersion> ok!
alpernebbi has joined #freedesktop
<mceier> someone could write mail describing the situation (and maybe eta); there's at least one mail on xorg-devel ml asking about 504 error
<airlied> fubar, no eta
<mceier> ;)
<bentiss> mceier: 2 disks down in the cluster, which means everything on fire
<daniels> mceier: good point, sent
<mceier> cool :)
swivel has joined #freedesktop
<hakzsam> good luck with fixing this guys!
lkundrak has joined #freedesktop
vbenes has quit []
vbenes has joined #freedesktop
<bentiss> finally, server-3 is back
<Asmadeus> m
<Asmadeus> (sorry)
<bentiss> daniels, emersion: I managed to get the ssd on server-3 back in the pool, it's currently recovering, so hopefully the cluster will restart in a few minutes
msizanoen has quit [Quit: msizanoen]
msizanoen has joined #freedesktop
<emersion> \o/
* bentiss attempts at fixing large-5 too
<daniels> bentiss: I didn't manage to figure out large-5 yet; was looking just before the reboot but it's a mystery to me how the wg traffic gets captured in the first place ...
<bentiss> daniels: I was asking about server-5 :) not large
<daniels> ah
* daniels rubs eyes
<bentiss> daniels: TBH, wg is most of the time way simpler than regular traffic, but I just don't understand the flannel config that makes the service plane work :/
<daniels> yeah, so I'd gone to large-5 to look at a working example, and couldn't figure out how it was supposed to work in the first place? at least judging by the routing table that's there
<bentiss> I fear the issue is because it's not on the same subnet
<bentiss> because different facility
<daniels> ah yeah of course, NAT
<bentiss> and the others are working because the default route makes it use 10.99.x.x and then the interfaces are magically picking up the traffic :(
<bentiss> but maybe we can leverage kilo to route the 10.41.x.x addresses toward server-2
msizanoen_ has joined #freedesktop
<bentiss> and then we will have to transfer the cluster to a facility that is not deprecated
msizanoen is now known as Guest1929
msizanoen_ is now known as msizanoen
Guest1929 has quit [Ping timeout: 480 seconds]
<bentiss> OK, disk on large-5 is back up, we now need to wait for ceph to settle
<bentiss> and then we can clean up the various OSD leftovers
<bentiss> daniels: so, for extra safety, what I did now was just remove the failing deployment, drain the node, reboot it, uncordon it, then zap the *correct* disk, then kill the operator
<bentiss> it doesn't clean up the old OSD, but at least the disk is back up
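The sequence bentiss describes can be sketched as an ordered command plan. This is only an illustration under stated assumptions: the log does not show the actual commands, so the `rook-ceph` namespace, the `rook-ceph-osd-0` deployment name, the `/dev/sdX` device, the use of `ssh` for the reboot, and `sgdisk --zap-all` as the zapping tool are all hypothetical. The function only builds and prints the commands; it does not run them.

```python
import shlex


def osd_disk_recovery_plan(node, osd_deployment, disk, namespace="rook-ceph"):
    """Build (but do not run) the steps in the order given in the chat.

    All arguments are placeholders chosen for illustration.
    """
    return [
        # 1. remove the failing OSD deployment
        ["kubectl", "-n", namespace, "delete", "deployment", osd_deployment],
        # 2. drain the node so no workloads are running on it
        ["kubectl", "drain", node, "--ignore-daemonsets", "--delete-emptydir-data"],
        # 3. reboot the node (assumed to be done over ssh)
        ["ssh", node, "sudo", "reboot"],
        # 4. allow scheduling on the node again
        ["kubectl", "uncordon", node],
        # 5. wipe the partition table of the *correct* disk (tool is an assumption)
        ["ssh", node, "sudo", "sgdisk", "--zap-all", disk],
        # 6. restart the operator so it re-discovers the empty disk
        ["kubectl", "-n", namespace, "delete", "pod", "-l", "app=rook-ceph-operator"],
    ]


for cmd in osd_disk_recovery_plan("large-5", "rook-ceph-osd-0", "/dev/sdX"):
    print(shlex.join(cmd))
```

Keeping the plan as data makes it easy to review the device name before anything destructive is executed, which is the point of the "*correct* disk" remark.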
<daniels> hmm, so it _is_ at least hitting the right iptables rules to masquerade which should push it through wg ...
<bentiss> maybe a wrong ufw config on the other side
<daniels> but then why would it only be intermittent? :\
<daniels> bentiss: \o/ thanks!
Mattia_98 has joined #freedesktop
___nick___ has joined #freedesktop
oSoMoN has joined #freedesktop
oSoMoN has quit []
msizanoen has quit [Remote host closed the connection]
<bentiss> recovery stopped... running fstrim on all the nodes, that might be the issue
mihalycsaba has joined #freedesktop
pixelcluster has joined #freedesktop
mchehab has joined #freedesktop
mihalycsaba has quit []
msizanoen has joined #freedesktop
* bentiss reboots large-7
* bentiss upgrades rook from 1.6.8 to 1.6.11
chaim has joined #freedesktop
Rainer_Bielefeld_away has joined #freedesktop
kj has joined #freedesktop
<daniels> bentiss: on the network side, tracing through the iptables rules, in the NAT table we go from POSTROUTING -> KUBE-SERVICES -> KUBE-SVC-NPX46M4PTMTKRN6Y for the HTTP plane
<daniels> that balances connections between targets of (server-2), (server-3), and (server-5)
<daniels> perhaps unsurprisingly, server-5 is the one which fails to answer itself
<bentiss> or the other way around :)
jrybar has joined #freedesktop
<daniels> nope
<daniels> every time we land in the server-5 chain it times out; every time we land in server-2 or server-3 it works fine
<bentiss> hmm, interesting
<daniels> it's on a probability distribution directing 50% to server-5, which explains why it works exactly half the time
<daniels> (I've currently hacked it to always forward to server-2, which is now working 100% of the time)
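The "works exactly half the time" observation follows from how a chain of probabilistic rules splits traffic. Below is a minimal sketch, not the cluster's real ruleset: the per-rule probabilities are illustrative (the log does not show them), and `effective_shares` and `success_rate` are hypothetical helper names. Rules are tried in order, each one matching with its own probability, and the last rule catches whatever remains.

```python
from fractions import Fraction


def effective_shares(rule_probabilities):
    """Overall share of connections each rule receives.

    Rule i is only reached if all earlier rules did not match, so its
    share is its own probability times the probability of getting there.
    """
    shares = []
    remaining = Fraction(1)
    for p in rule_probabilities:
        p = Fraction(p)
        shares.append(remaining * p)
        remaining *= 1 - p
    return shares


def success_rate(shares, healthy):
    """Fraction of connections that land on a backend that answers."""
    return sum(share for share, ok in zip(shares, healthy) if ok)


# Illustrative cascade 1/3, 1/2, 1: an even split across three backends.
even = effective_shares([Fraction(1, 3), Fraction(1, 2), 1])
print(even)

# Case described in the chat: half of the connections go to server-5,
# which times out, and the rest go to backends that answer.
observed = [Fraction(1, 2), Fraction(1, 2)]
print(success_rate(observed, healthy=[False, True]))
```

Forcing every connection to a healthy backend, as in the workaround above, corresponds to `success_rate([Fraction(1)], [True])`, which is 1.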
<bentiss> heh, that's what I was about to suggest
<bentiss> can I try to uncordon server-5 then?
<bentiss> right now the pg are stuck backfilling, so adding a couple of disks might unblock them
<daniels> yep, go for it
<daniels> let's see what happens
<bentiss> thanks
<daniels> I need to go afk for a bit anyway, back at 1pm your time
<bentiss> k
* bentiss probably needs to find some food too
GNUmoon has quit [Remote host closed the connection]
jkhsjdhjs has joined #freedesktop
GNUmoon has joined #freedesktop
thaller is now known as Guest1934
thaller has joined #freedesktop
pobrn has joined #freedesktop
msizanoen has quit [Ping timeout: 480 seconds]
Guest1934 has quit [Ping timeout: 480 seconds]
<bentiss> daniels: still failing a lot
<bentiss> I think I'll grab some lunch and then remove server-5, and use a new c2-medium as an agent, not a server. This should solve the issues with the control plane
mihalycsaba has joined #freedesktop
freddy has joined #freedesktop
glehmann has joined #freedesktop
AbleBacon has quit [Read error: Connection reset by peer]
<daniels> ack
<daniels> and yeah, I can keep looking into it, but reflexively I think it would be better to get everything in the same facility + use a VPC for all traffic + pare ufw down to the absolute bare minimum ruleset (allow WG into boundary host + allow HTTPS/SSH ingress to elastic + drop all other incoming external) so the k3s-internal traffic can be managed solely by k3s rules?
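The pared-down ruleset daniels proposes can be modelled as a first-match rule list. This is a sketch under assumptions, not the actual ufw configuration: the host labels `boundary` and `elastic` are taken from the message, UDP port 51820 is assumed for WireGuard (its conventional default, not stated in the log), and `Rule` and `decide` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str            # "allow" or "drop"
    proto: Optional[str]   # None matches any protocol
    port: Optional[int]    # None matches any port
    dest: Optional[str]    # None matches any destination host


# Incoming external traffic only; k3s-internal traffic is out of scope here.
RULES = [
    Rule("allow", "udp", 51820, "boundary"),  # WG into boundary host (port assumed)
    Rule("allow", "tcp", 443, "elastic"),     # HTTPS ingress
    Rule("allow", "tcp", 22, "elastic"),      # SSH ingress
    Rule("drop", None, None, None),           # drop all other incoming external
]


def decide(proto, port, dest, rules=RULES):
    """Return the action of the first rule that matches the packet."""
    for rule in rules:
        if (rule.proto in (None, proto)
                and rule.port in (None, port)
                and rule.dest in (None, dest)):
            return rule.action
    return "drop"  # default-deny if no rule matched


print(decide("tcp", 443, "elastic"))   # allowed ingress
print(decide("tcp", 6443, "elastic"))  # anything else is dropped
```

With only four rules, every external packet is either one of three named exceptions or dropped, which leaves nothing in the host firewall that could interfere with the cluster's own rules.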
mihalycsaba has quit []
bengal has joined #freedesktop
GNUmoon has quit [Remote host closed the connection]
bengal has quit []
progandy has quit [Ping timeout: 480 seconds]
aleksander has quit [Quit: Leaving]
GNUmoon has joined #freedesktop
MajorBiscuit has joined #freedesktop
<bentiss> daniels: sounds appealing :)
bfr__ has joined #freedesktop
<bentiss> daniels: can I nuke server-5?
Rainer_Bielefeld_away_ has joined #freedesktop
GNUmoon has quit [Remote host closed the connection]
Rainer_Bielefeld_away has quit [Ping timeout: 480 seconds]
GNUmoon has joined #freedesktop
<daniels> yep, fine by me :)
Rainer_Bielefeld_away_ has quit [Remote host closed the connection]
moses has joined #freedesktop
JoniSt has joined #freedesktop
<shadeslayer> Hm, I seem to be hitting gateway timeout issues on gitlab
<shadeslayer> though hopefully it's temporary
<JoniSt> Sadly not temporary, the Gitlab had massive data loss
<shadeslayer> ah shit :(
bengal has joined #freedesktop
<pq> JoniSt, no, no data loss AFAIU.
<shadeslayer> I'd be surprised if there was ^^
<JoniSt> Pheeeew. I hadn't heard much news about it yet other than the Phoronix article
<JoniSt> But yeah, I'd assume that the Gitlab gets backed up very regularly
<bentiss> \o/ managed to kick in the recovery once again
<JoniSt> Man... Good luck!
<bentiss> yes, no more stale pgs
<daniels> JoniSt: there is no data loss, just annoyance
<JoniSt> That's nice to hear. Reminds me of the fact that a single raid1 btrfs might also not be enough to keep my own Gitlab instance alive if something happens...
cmeissl[m] has joined #freedesktop
jarl has joined #freedesktop
jarl has quit [Remote host closed the connection]
progandy has joined #freedesktop
benzea has joined #freedesktop
<bentiss> daniels: I think I'll reboot all machines one after the other, some processes are stuck
<daniels> bentiss: yeah ... RBD I'm guessing
<bentiss> fstrim is also hanging, and what actually made the recovery start again was to kill all the rook-ceph pods besides the osds
<bentiss> so a full reboot might help
<daniels> heh ...
ximion has joined #freedesktop
freddy has quit []
bingle has joined #freedesktop
<karolherbst> JoniSt: I was thinking about having my own gitlab instance, but making it rock solid _is_ a huge investment and until that's settled, data is better off being replicated externally anyway... :D
<karolherbst> I am thinking about doing raid6
<JoniSt> Hmm... Well, raid5/6 always gives me a bit of a weird feeling :P
akselmo has joined #freedesktop
kelvium has joined #freedesktop
akselmo has left #freedesktop [#freedesktop]
<JoniSt> I mostly store university stuff on that Gitlab (when I do group work) so it wouldn't be thaaat bad if the filesystem crashed
pjakobsson_ has joined #freedesktop
pjakobsson has quit [Ping timeout: 480 seconds]
kelvium has left #freedesktop [#freedesktop]
<karolherbst> JoniSt: yeah... but I also plan to do a setup completely without any fans or noise, so I have to make sure to not go overboard with the budget
<karolherbst> and complete mirroring can get extremely expensive
<karolherbst> well. with SSDs that is
<karolherbst> there are actually passively cooled cases which would allow a power budget of ~200W, so that's not even a huge issue
<karolherbst> just.. expensive :D
msizanoen has joined #freedesktop
<DragoonAethis> karolherbst: or you could go for an actively-cooled case and replace the fans
<karolherbst> nah
<karolherbst> it's still audible
<DragoonAethis> Yeah, but with Noctua/be quiet fans it can be really quiet
<DragoonAethis> Pretty much inaudible unless you've got the box in a silent room at night or something like that
<karolherbst> the issue is, that motherboards are generally quite crappy in this regard
<karolherbst> so if the CPU/whatever isn't hot/warm, the fans can be turned off
<karolherbst> but firmware....
<Mattia_98> noctua fans running on low rpm are inaudible, I can vouch for that
<karolherbst> DragoonAethis: well.. my work laptop has its fans usually completely turned off, so it would be noticeable :P
<karolherbst> and I do have noctua fans for other stuff
<karolherbst> Mattia_98: thing is.. they are inaudible when it doesn't matter
<karolherbst> under load is where it matters
jekstrand has joined #freedesktop
<DragoonAethis> Or alternatively, get a fan controller and write custom cooling scripts
<karolherbst> so I can go for complete fanless with a power budget of 200W or....
<karolherbst> the noctua fans aren't silent if they have to cool away ~100W of CPU heat
<karolherbst> DragoonAethis, Mattia_98: Streacom DB4 is what I was thinking about
<karolherbst> can manage up to 110W CPU heat and 65W GPU heat
<karolherbst> where I wouldn't need the GPU cooling
<karolherbst> thing is.. there isn't much space in it :D
<DragoonAethis> Unfortunately I'm more of a midi tower+ guy myself ;P
aXlH has joined #freedesktop
<DragoonAethis> But it looks really nice
<karolherbst> yeah.. I have one for my desktop for work
<karolherbst> got myself the be quiet! Dark Rock 4 PRO BK022 CPU cooler which isn't all that bad actually
<DragoonAethis> I have the non-Pro version, it's pretty good too (but getting the Pro for the next upgrade)
<karolherbst> definitely worth the money
<karolherbst> it manages 150W without getting loud
<karolherbst> and my CPU stays at max clock pretty much all the time
<DragoonAethis> And for the case it's Fractal Meshify C with the stock Fractal fans (which are almost inaudible, but they don't ramp up with the rest of the system)
<karolherbst> but for devices which are like on 24/7 I want something without fans :P
JoniSt has quit [Ping timeout: 480 seconds]
<bentiss> daniels: \o/ ceph is back in the game
<bentiss> no more degraded objects
<daniels> bentiss: woo! I saw the tools pod is working now that we're upgraded too, awesome
<bentiss> though daemons are crashing like hell
<bentiss> daniels: I fixed the deployment of the tools pod to use the same rook minor version :)
<bentiss> it's not part of the helm chart :(
<daniels> ahhhhh, right
<daniels> I did think it was weird that they'd use :master
<daniels> ooh, gitaly-2 now running too
<bentiss> and postgres!
<daniels> \o/
<daniels> should I try redis next?
<bentiss> still having connectivity issues
<daniels> ah :(
<daniels> control plane or inter-pod?
<bentiss> I mean 504
<bentiss> but yeah, feel free to re-enable redis
<daniels> oh right, yeah it'll 504 since I killed the redis + webservice pods :P
<bentiss> that's what I just realized :)
<daniels> the log noise was getting annoying
<bentiss> \o/ back online!!!!!
<daniels> :D :D :D
<bentiss> that's what... 24h of downtime?
<bentiss> (back online, right before I got to pick up kids at school)
<pixelcluster> many thanks to both of you for fixing it!!
<Mattia_98> nice job guys!
<jekstrand> \o/ y'all are heroes!
<karolherbst> \o/ daniels, bentiss: thanks for all the work!
<bentiss> daniels: I'm off for today I think. I have removed the PVC for elasticSearch, but we need to clean up the actual data on disk to reclaim the space
<bentiss> and I am starting to have a strong headache now, so that will be something for tomorrow
<daniels> bentiss: thanks so much Monsieur Storage Wizard <3
<daniels> hope you have a nice & quiet night, drink lots of water
<bentiss> daniels: I shall not say how many reboots it took me :)
JoniSt has joined #freedesktop
<DragoonAethis> Congrats :D
<JoniSt> Yay, nice! :D
<daniels> bentiss: _cough_
<daniels> bentiss: I'll look at how we can move to VPC-only networking
<hakzsam> thanks, great job!
<pq> Thank you! I got a full day of working on my own code instead of reviewing others' stuff. ;-D
Haaninjo has joined #freedesktop
<jkhsjdhjs> thanks, appreciate it! finally I can look through the pipewire issues :D
<karolherbst> yay, I can finally work again!
<daniels> bentiss: oh yeah, when you're back tomorrow could you please push the helm changes?
<kisak> Thanks for burning half your weekend on that snafu.
<Mattia_98> Michael already wrote an article on Phoronix. He works fast XD
<danvet> daniels, bentiss thx a lot!
<bentiss> daniels: changes in helm-gitlab-config pushed
<daniels> bentiss: merci!
i-garrison has quit [Read error: Connection reset by peer]
pixelcluster has quit [Read error: Connection reset by peer]
i-garrison has joined #freedesktop
bingle has quit []
freddy has joined #freedesktop
freddy has quit []
benzea has left #freedesktop [#freedesktop]
msizanoen has quit [Remote host closed the connection]
ybogdano has joined #freedesktop
jrybar has quit [Ping timeout: 480 seconds]
Rainer_Bielefeld_away has joined #freedesktop
Seirdy has quit []
progandy has quit [Remote host closed the connection]
MajorBiscuit has quit [Ping timeout: 480 seconds]
progandy has joined #freedesktop
ybogdano has quit [Ping timeout: 480 seconds]
reillybrogan_ has quit []
reillybrogan has joined #freedesktop
ybogdano has joined #freedesktop
___nick___ has quit []
___nick___ has joined #freedesktop
___nick___ has quit []
___nick___ has joined #freedesktop
bengal has quit [Ping timeout: 480 seconds]
Seirdy has joined #freedesktop
Rainer_Bielefeld_away has quit []
PuercoPop has joined #freedesktop
<eric_engestrom> bentiss, daniels: awesome work these last couple of days! 💪
<eric_engestrom> (adding to the pile of well deserved praise)
ybogdano is now known as Guest1956
ybogdano has joined #freedesktop
Guest1956 has quit [Ping timeout: 480 seconds]
ybogdano is now known as Guest1958
ybogdano has joined #freedesktop
Guest1958 has quit [Ping timeout: 480 seconds]
___nick___ has quit [Ping timeout: 480 seconds]
Haaninjo has quit [Quit: Ex-Chat]
<alanc> +100
chaim has quit [Quit: Konversation terminated!]
bengal has joined #freedesktop
eroux_ has joined #freedesktop
eroux has quit [Ping timeout: 480 seconds]
bengal has quit [Ping timeout: 480 seconds]
pohly has quit []
danvet has quit [Ping timeout: 480 seconds]
jekstrand has left #freedesktop [#freedesktop]
Mattia_98 has quit [Remote host closed the connection]
ybogdano is now known as Guest1973
ybogdano has joined #freedesktop
bingle has joined #freedesktop
Guest1973 has quit [Ping timeout: 480 seconds]
ybogdano is now known as Guest1974
ybogdano has joined #freedesktop
thaller has quit [Ping timeout: 480 seconds]
Guest1974 has quit [Ping timeout: 480 seconds]
<dcbaker> daniels: I have a docker file for the mr-lable-maker: you can pass it GITLAB_TOKEN as an environment variable. I've gotten far enough with it to see that it wants a token, but that's it
<dcbaker> I took Marcin's work and did a little cleanup to make it a little more pythonic, but it's otherwise the same (i added to make installation easier, for example)
<dcbaker> let me know if that looks reasonable to you whenever you've calmed down from the ceph stuff :)
karolherbst has quit [Ping timeout: 480 seconds]
karolherbst has joined #freedesktop
bingle has quit []
pobrn has quit [Ping timeout: 480 seconds]
romangg has quit [Ping timeout: 480 seconds]
dos1 has quit [Ping timeout: 480 seconds]
shadeslayer has quit [Ping timeout: 480 seconds]
mupuf has quit [Ping timeout: 480 seconds]
hakzsam has quit [Ping timeout: 480 seconds]
ivyl has quit [Ping timeout: 480 seconds]
dos1 has joined #freedesktop
ivyl has joined #freedesktop