ChanServ changed the topic of #freedesktop to: https://www.freedesktop.org infrastructure and online services || for questions about freedesktop.org projects, please see each project's contact || for discussions about specifications, please use https://gitlab.freedesktop.org/xdg or xdg@lists.freedesktop.org
Hooloovoo has quit [Ping timeout: 480 seconds]
jnoorman has quit [Read error: Network is unreachable]
linyaa has quit [Read error: Network is unreachable]
vignesh has quit [Read error: Network is unreachable]
markco has quit [Read error: Network is unreachable]
zmike has quit [Read error: Network is unreachable]
zmike has joined #freedesktop
kj2 has quit [Read error: Network is unreachable]
pendingchaos has quit [Read error: Network is unreachable]
samuelig has quit [Read error: Network is unreachable]
austriancoder has quit [Read error: Network is unreachable]
rg3igalia has quit [Read error: Network is unreachable]
pendingchaos has joined #freedesktop
austriancoder has joined #freedesktop
samuelig has joined #freedesktop
rg3igalia has joined #freedesktop
jnoorman has joined #freedesktop
linyaa has joined #freedesktop
markco has joined #freedesktop
cascardo_ has quit []
cascardo has joined #freedesktop
vignesh has joined #freedesktop
bwidawsk has quit [Read error: Network is unreachable]
bwidawsk has joined #freedesktop
kj2 has joined #freedesktop
balrog_ has joined #freedesktop
balrog has quit [Read error: Connection reset by peer]
Hooloovoo has joined #freedesktop
guludo has quit [Ping timeout: 480 seconds]
pjakobsson_ has joined #freedesktop
pjakobsson has quit [Ping timeout: 480 seconds]
m5zs7k has quit [Ping timeout: 480 seconds]
m5zs7k has joined #freedesktop
scrumplex has joined #freedesktop
scrumplex_ has quit [Ping timeout: 480 seconds]
jarthur has joined #freedesktop
sewn has joined #freedesktop
ximion has quit [Remote host closed the connection]
pjakobsson has joined #freedesktop
pjakobsson_ has quit [Ping timeout: 480 seconds]
jsa1 has joined #freedesktop
jarthur has quit [Ping timeout: 480 seconds]
jsa1 has quit [Ping timeout: 480 seconds]
haaninjo has joined #freedesktop
haaninjo has quit [Remote host closed the connection]
noodlez1232 has quit [Remote host closed the connection]
noodlez1232 has joined #freedesktop
blu has joined #freedesktop
swatish2 has joined #freedesktop
krastevm has quit [Ping timeout: 480 seconds]
swatish21 has joined #freedesktop
swatish2 has quit [Ping timeout: 480 seconds]
tzimmermann has joined #freedesktop
AbleBacon has quit [Read error: Connection reset by peer]
swatish2 has joined #freedesktop
jsa1 has joined #freedesktop
swatish21 has quit [Ping timeout: 480 seconds]
jsa1 has left #freedesktop [#freedesktop]
jsa1 has joined #freedesktop
swatish2 has quit [Ping timeout: 480 seconds]
swatish2 has joined #freedesktop
sima has joined #freedesktop
sghuge has quit [Remote host closed the connection]
sghuge has joined #freedesktop
dcunit3d has quit [Quit: No Ping reply in 180 seconds.]
dcunit3d has joined #freedesktop
mripard has joined #freedesktop
krastevm has joined #freedesktop
blu has quit [Ping timeout: 480 seconds]
blu has joined #freedesktop
martink has joined #freedesktop
krastevm has quit [Ping timeout: 480 seconds]
blu has quit [Ping timeout: 480 seconds]
<bentiss>
sigh, one OSD (disk) is full and ceph decided to stop all operations
<kode54>
:/
<soreau>
booo
<bentiss>
the weird part is ceph should have rebalanced the cluster way before, as other disks have some space
<kode54>
maybe the rebalance wasn't queued up yet
<kode54>
would be nice if those were queued by disk fill reaching a threshold instead of just on timers
<bentiss>
nah, it's been filling this one for days
<kode54>
oh
<kode54>
quality software :[
<kode54>
then again, for pro software, not much else you can do without losing a lot of functionality
<kode54>
gitlab is really without any competition for professional self hosting
<bentiss>
that's part of why I don't want to deal with ceph anymore
<kode54>
gitea and forgejo are nowhere near feature parity
krastevm has joined #freedesktop
<soreau>
needs 'cloud accessible' disks and runners, then run everything on a capable server, like an rpi5 ;)
<kode54>
haha, no
<kode54>
maybe forgejo is cloud capable, but it lacks a lot of the professional features present in gitlab
<kode54>
and it needs the CI functionality
<kode54>
I don't know if forgejo is capable of plugging in the level and diversity of build bots that FDO uses
<kode54>
considering I think forgejo is way lighter on a capable server than gitlab is
<soreau>
raid nfs :P
<kode54>
the problem was it was focusing a bunch of data on a single node of storage
<kode54>
ceph being a bit weird
<kode54>
technically it should have been balancing that across all the nodes
<bentiss>
it should have, but depending on the topology and the data on disk, sometimes it doesn't find a good solution and needs a little push
<soreau>
almost sounds like some other authoritative agencies - 'due to our error, you have been penalized'
<bentiss>
(though it must be at least 1 year since I last did that)
<kode54>
soreau: think of ceph like S3 or R2, only self hosted
<kode54>
it should have been fully capable of distributing objects across the entire array instead of plopping them all on one node
<kode54>
it could be a nfs of a bunch of raids if you like
<kode54>
and it will locate the objects on their correct nodes when retrieving them, or even duplicate them for load balancing
<kode54>
I think?
<kode54>
sounds like something it would do
martink has quit [Ping timeout: 480 seconds]
<kode54>
but spamming a bunch of data all on one node? that sounds like it's being fiddly
<soreau>
every company that has folks committing to gitlab should split the bill on something nice
<mupuf>
welcome back, gitlab!
<mupuf>
thanks bentiss
<bentiss>
mupuf: only temporary, as I assume the cluster might stop soon :(
<bentiss>
(I just pushed the full boundary a bit so the recovery starts)
<kode54>
ah
<kode54>
you only brought it back
<kode54>
maybe force a rebalance?
<kode54>
god I hope there's logging that can show why it's focusing all the writes to one node
<kode54>
unless it's like one huge object
<kode54>
that's just continuously growing in size
<kode54>
can't really rebalance a single object, unless ceph supports that?
<bentiss>
well, I managed to reweight that OSD and it started the recovery
<bentiss>
usually yeah, there are a lot of big objects on an OSD and this screws up the balance of the whole cluster
<bentiss>
but also, TBH, we are at 98% usage of the ssd pool. So that migration is not so bad in the end, we'll be able to breathe a little bit more
<bentiss>
and we are using too much of the SSDs because the db is constantly growing, and so are the git trees :/
<kode54>
having lots of copies of the kernel doesn't help that
guludo has joined #freedesktop
<kode54>
the mailing list paradigm is really "great" for such huge trees in keeping much of the data client side
<kode54>
but mailing list paradigm sucks so much
* bentiss
found the magic command: `ceph osd reweight-by-utilization`
<bentiss>
I was manually assigning reweight, when this does it automatically :)
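Note: a minimal sketch of the idea behind utilization-based reweighting, for readers unfamiliar with it. This is not Ceph's actual algorithm; the `Osd` class, the `propose_reweights` helper, and the `overload_factor` and `max_change` parameters are hypothetical names chosen for illustration. It lowers the override weight of any OSD whose utilization is well above the cluster average, in small steps, so that data gradually moves to emptier disks.

```python
from dataclasses import dataclass


@dataclass
class Osd:
    """Hypothetical stand-in for one OSD's state (not a Ceph API type)."""
    osd_id: int
    utilization: float      # fraction of the disk in use, 0.0 to 1.0
    reweight: float = 1.0   # override weight, 0.0 to 1.0


def propose_reweights(osds, overload_factor=1.2, max_change=0.05):
    """Return {osd_id: new_reweight} for OSDs well above average utilization.

    overload_factor and max_change are illustrative assumptions, not values
    read from any real cluster.
    """
    if not osds:
        return {}
    average = sum(o.utilization for o in osds) / len(osds)
    proposals = {}
    for o in osds:
        if average > 0 and o.utilization > average * overload_factor:
            # Scale the weight towards the average, but limit the step size
            # so a single pass does not trigger a huge data migration.
            target = o.reweight * average / o.utilization
            proposals[o.osd_id] = round(max(target, o.reweight - max_change), 4)
    return proposals


if __name__ == "__main__":
    cluster = [Osd(0, 0.96), Osd(1, 0.60), Osd(2, 0.60), Osd(3, 0.64)]
    print(propose_reweights(cluster))
```

Only the overfull OSD gets a proposal here; the others are left alone, which mirrors the "little push" described above rather than a full rebalance.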
<soreau>
script it :P
<bentiss>
heh
<bentiss>
though arguably, why on earth doesn't ceph do that automatically (or why doesn't the regular balancing do the equivalent)
<bentiss>
FWIW, we should be good now, max disk usage is 89%, so far from the 95% deadlock
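Note: a small sketch of how a fill level like the 89% above relates to the 95% limit. The constants are assumptions based on Ceph's usual default ratios (nearfull 0.85, backfillfull 0.90, full 0.95); the ratios configured on this cluster are not stated in the log, and `classify_fill` is a hypothetical helper, not a Ceph command.

```python
# Assumed ratios, matching Ceph's usual defaults; a given cluster may differ.
NEARFULL_RATIO = 0.85
BACKFILLFULL_RATIO = 0.90
FULL_RATIO = 0.95


def classify_fill(used_fraction,
                  nearfull=NEARFULL_RATIO,
                  backfillfull=BACKFILLFULL_RATIO,
                  full=FULL_RATIO):
    """Map a disk's used fraction (0.0 to 1.0) to a coarse fill state."""
    if used_fraction >= full:
        return "full"          # writes are blocked at this point
    if used_fraction >= backfillfull:
        return "backfillfull"  # the OSD refuses to accept backfill data
    if used_fraction >= nearfull:
        return "nearfull"      # warning only
    return "ok"


if __name__ == "__main__":
    for used in (0.80, 0.89, 0.95):
        print(f"{used:.0%}: {classify_fill(used)}")
```

Under these assumed ratios, 89% still sits in the warning band, which is why it leaves headroom before the 95% point where operations stop.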
<kode54>
perfect timing, considering the Equinix thing
<bentiss>
yeah...
<daniels>
bentiss: ah thanks for fixing, sorry I was still out at pilates
<bentiss>
daniels: no worries
<bentiss>
luckily I caught it before people started screaming too much
<daniels>
I was screaming too, but only because my hamstrings are )(*@#$
<bentiss>
heh
swatish2 has quit [Ping timeout: 480 seconds]
swatish21 has joined #freedesktop
m5zs7k has quit [Ping timeout: 480 seconds]
m5zs7k has joined #freedesktop
swatish2 has joined #freedesktop
<bentiss>
\o/ HEALTH_OK on the cluster :)
swatish21 has quit [Ping timeout: 480 seconds]
swatish2 has quit [Ping timeout: 480 seconds]
guludo has quit [Ping timeout: 480 seconds]
guludo has joined #freedesktop
jkhsjdhjs_ has joined #freedesktop
jkhsjdhjs has quit [Ping timeout: 480 seconds]
swatish2 has joined #freedesktop
todi1 has quit []
todi has joined #freedesktop
jsa1 has quit [Ping timeout: 480 seconds]
swatish2 has quit [Ping timeout: 480 seconds]
swatish2 has joined #freedesktop
codegirl has quit [Quit: Ping timeout (120 seconds)]
codegirl has joined #freedesktop
blu has joined #freedesktop
krastevm has quit [Ping timeout: 480 seconds]
jsa1 has joined #freedesktop
guludo has quit [Ping timeout: 480 seconds]
swatish2 has quit [Ping timeout: 480 seconds]
haaninjo has joined #freedesktop
guludo has joined #freedesktop
JerryXiao has quit [Quit: Bye]
JerryXiao has joined #freedesktop
codegirl has quit [Quit: Ping timeout (120 seconds)]