ChanServ changed the topic of #freedesktop to: https://www.freedesktop.org infrastructure and online services || for questions about freedesktop.org projects, please see each project's contact || for discussions about specifications, please use https://gitlab.freedesktop.org/xdg or xdg@lists.freedesktop.org
GNUmoon has quit [Quit: Leaving]
GNUmoon has joined #freedesktop
Haaninjo has quit [Quit: Ex-Chat]
ybogdano has quit [Ping timeout: 480 seconds]
ximion has quit []
ngcortes has quit [Remote host closed the connection]
jarthur has joined #freedesktop
GNUmoon has quit [Ping timeout: 480 seconds]
GNUmoon has joined #freedesktop
GNUmoon has quit [Ping timeout: 480 seconds]
danvet has joined #freedesktop
ppascher has quit [Quit: Gateway shutdown]
alanc has quit [Remote host closed the connection]
alanc has joined #freedesktop
thaller is now known as Guest8423
Guest8423 has quit [Read error: Connection reset by peer]
thaller has joined #freedesktop
shbrngdo has quit [Ping timeout: 480 seconds]
shbrngdo has joined #freedesktop
apteryx_ has joined #freedesktop
apteryx has quit [Ping timeout: 480 seconds]
ppascher has joined #freedesktop
ximion has joined #freedesktop
parafoll has joined #freedesktop
parafoll has quit [Excess Flood]
parafoll has joined #freedesktop
parafoll has quit [Remote host closed the connection]
parafoll has joined #freedesktop
parafoll has quit []
pjakobsson_ is now known as pjakobsson
iNKa has quit [Read error: Connection reset by peer]
GNUmoon has joined #freedesktop
hikiko has quit [Ping timeout: 480 seconds]
hikiko has joined #freedesktop
GNUmoon has quit [Ping timeout: 480 seconds]
GNUmoon has joined #freedesktop
pendingchaos has quit [Ping timeout: 480 seconds]
pendingchaos has joined #freedesktop
hikiko has quit [Ping timeout: 480 seconds]
hikiko has joined #freedesktop
ybogdano has joined #freedesktop
___nick___ has joined #freedesktop
___nick___ has quit []
___nick___ has joined #freedesktop
___nick___ has quit []
___nick___ has joined #freedesktop
ybogdano has quit [Ping timeout: 480 seconds]
alyssa has joined #freedesktop
ybogdano has joined #freedesktop
thongthai has joined #freedesktop
<alatiera> have been getting 502s and 504s for a bit now
<alatiera> known issue?
<ocrete> I'm seeing the same thing..
<airlied> yeah gl is hiccuping
newpons has joined #freedesktop
alextee has joined #freedesktop
<alextee> I'm getting 502 when cloning, is something wrong with infra?
<airlied> yes gitlab is having issues
newpons has quit [Excess Flood]
newpons has joined #freedesktop
newpons has quit []
thaller has quit [Remote host closed the connection]
<mupuf> bentiss, daniels: ^
<mupuf> In case you are still up
raghavgururajan has quit [Ping timeout: 480 seconds]
<craftyguy> ahh just came here to ring the bell, but I'm late
<alyssa> craftyguy: party's just getting started 🎉
<craftyguy> heh
thaller has joined #freedesktop
Haaninjo has joined #freedesktop
* emersion falls back to reviews over IRC :P
<alyssa> a classic
ryanpavlik has joined #freedesktop
* daniels works on it
<emersion> ♥
GNUmoon has quit [Ping timeout: 480 seconds]
Haaninjo has quit [Quit: Ex-Chat]
___nick___ has quit [Ping timeout: 480 seconds]
reillybrogan has joined #freedesktop
reillybrogan_ has joined #freedesktop
reillybrogan has quit [Ping timeout: 480 seconds]
<keithp> thanks daniels!
reillybrogan_ has quit []
reillybrogan has joined #freedesktop
r_elop has joined #freedesktop
raghavgururajan has joined #freedesktop
danvet has quit [Ping timeout: 480 seconds]
danvet has joined #freedesktop
lyudess has joined #freedesktop
Lyude has quit [Ping timeout: 480 seconds]
ngcortes has joined #freedesktop
<ngcortes> there a scheduled downtime on gitlab?
<emersion> nope, outage, being worked on
<ngcortes> word, tnx
<alyssa> ngcortes: daniels scheduled it at 21:02 UTC of course ;)
<ngcortes> alyssa, guess it's lunchtime :)
danvet has quit [Ping timeout: 480 seconds]
<daniels> bentiss: I have no idea what's going on with ceph. osd-26 is failing repeatedly with:
<daniels> debug 2021-12-14T22:17:38.270+0000 7f864936e080 -1 bluefs _allocate allocation failed, needed 0x807e9
<daniels> debug 2021-12-14T22:17:38.270+0000 7f864936e080 1 bluefs _allocate unable to allocate 0x90000 on bdev 1, allocator name block, allocator type hybrid, capacity 0x6fc8400000, block size 0x1000, free 0x8e21d5000, fragmentation 0.348327, allocated 0x0
<daniels> debug 2021-12-14T22:17:38.270+0000 7f864936e080 -1 bluefs _flush_range allocated: 0x0 offset: 0x0 length: 0x807e9
<daniels> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.5/rpm/el8/BUILD/ceph-16.2.5/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_flush_range(BlueFS::FileWriter*, uint64_t, uint64_t)' thread 7f864936e080 time 2021-12-14T22:17:38.274652+0000
<daniels> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.5/rpm/el8/BUILD/ceph-16.2.5/src/os/bluestore/BlueFS.cc: 2729: ceph_abort_msg("bluefs enospc")
<daniels> so like ... why can't it allocate 589824 (bytes? blocks?) when it has 38153310208 free
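A minimal sketch for decoding the hex fields in the osd-26 log line pasted above, so the request can be compared against the reported free space. The values are copied from the log; treating them as bytes is an assumption (the log does not state the unit), and the dictionary keys are illustrative names, not Ceph identifiers.

```python
# Values copied from the bluefs _allocate log lines above.
# Assumption: all of them are byte counts; the log does not say so explicitly.
fields = {
    "needed": 0x807E9,
    "requested": 0x90000,
    "block_size": 0x1000,
    "capacity": 0x6FC8400000,
    "free": 0x8E21D5000,
}

GIB = 1024 ** 3


def free_fraction(free: int, capacity: int) -> float:
    """Return the share of the device that is reported as free."""
    return free / capacity


for name, value in fields.items():
    print(f"{name:>10}: {value:>15,d}  ({value / GIB:.4f} GiB)")

fraction = free_fraction(fields["free"], fields["capacity"])
print(f"free / capacity: {fraction:.1%}")
```

Under that assumption the free space works out to a bit under 8% of the device capacity, so the device is mostly full even though the absolute free figure looks large. The log also reports fragmentation of 0.348327; whether that is what prevents the allocation is not established in this conversation.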
<bentiss> daniels: sorry I'm burned out today
<bentiss> daniels: have you cordoned server-4?
<craftyguy> is the free space there referring to disk or RAM?
<bentiss> daniels: I restarted k3s on server-4 after uncordoning it, the osd in pending are now up
<bentiss> craftyguy: no idea :(
justcheckn has joined #freedesktop
justcheckn has quit []
pinkflames[m] has joined #freedesktop
* Ford_Prefect sends hugs all around
<bentiss> daniels: I also kicked osd-0, it went in a bad state, which means we have only one osd down (the 26) with the error above
<craftyguy> if it's trying to create a file (I have no idea how bluefs works), then maybe the fs is out of free inodes?
<bentiss> maybe it's something like that
<bentiss> it seems everything is back up, but we probably need to audit tomorrow why the disks are almost full
<daniels> bentiss: oh I'm sorry to hear :( and yeah, from what I can see it's always just been 26 which is failing, but it's not clear why since the underlying storage seems like it has enough space - but I'm also frantically trying to learn how ceph actually works so I can see what that underlying storage is
<daniels> craftyguy: oh hmm
<bentiss> daniels: ceph df gives
<bentiss> POOL             ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
<bentiss> replicapool-ssd  4   32   715 GiB  188.57k  2.1 TiB  75.70  230 GiB
<bentiss> and 75% is close enough to the limits where the disks are stopping
<bentiss> so we have only 230GB of free space, but I think we have a watermark at 90%
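A rough sketch of the arithmetic behind the `ceph df` row quoted above. The four input numbers are taken from that row; the two derived ratios are only a reading of those numbers, not a statement of how Ceph computes its columns, and the variable names are illustrative.

```python
# Numbers copied from the replicapool-ssd row of the `ceph df` output above.
stored_gib = 715.0      # STORED
used_tib = 2.1          # USED (raw, including replicas)
reported_pct = 75.70    # %USED
max_avail_gib = 230.0   # MAX AVAIL

used_gib = used_tib * 1024

# Ratio of raw usage to stored data; a value near 3 would be consistent
# with three-way replication (an assumption, not confirmed in the chat).
raw_per_stored = used_gib / stored_gib

# Share of the pool's usable space that is already taken, assuming the
# usable total is STORED + MAX AVAIL.
pct_of_usable = 100 * stored_gib / (stored_gib + max_avail_gib)

print(f"raw / stored: {raw_per_stored:.2f}")
print(f"stored / (stored + max avail): {pct_of_usable:.2f}% (reported {reported_pct}%)")
```

The second ratio lands close to the reported %USED, which suggests the 230 GiB figure already accounts for replication. How this pool-level percentage relates to the 90% watermark mentioned above is not settled in the conversation.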
<craftyguy> what does df -i show?
<craftyguy> that should show inode usage
<bentiss> craftyguy: the ceph disks don't appear on the host 'df'
<bentiss> anyway, it seems to be OK for the past 5 min, going to bed
ybogdano has quit [Ping timeout: 480 seconds]
<craftyguy> thanks for getting it running again! back to work for me, break's over
<bentiss> sorry :)
<daniels> thanks very much indeed
<daniels> sleep well, take care
mooff has quit [Remote host closed the connection]
mooff has joined #freedesktop
<daniels> heh, I wonder if this explains it:
<daniels> pvc-db29ccdf-2e89-4acc-88ee-48fe68c13bd9 520Gi RWO Retain Released old-cluster/repo-data-gitlab-prod-gitaly-no-replicas-0 rook-ceph-block-ssd 189d
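A small sketch for reading the persistent volume row pasted above. It assumes the row comes from a `kubectl get pv` style listing with the columns NAME, CAPACITY, ACCESS MODES, RECLAIM POLICY, STATUS, CLAIM, STORAGECLASS, AGE, in that order and with no REASON value; `PersistentVolumeRow`, `parse_pv_row`, and `holds_storage_without_claim` are hypothetical helpers for illustration.

```python
from typing import NamedTuple


class PersistentVolumeRow(NamedTuple):
    """One whitespace-separated row; the column order is an assumption."""
    name: str
    capacity: str
    access_modes: str
    reclaim_policy: str
    status: str
    claim: str
    storage_class: str
    age: str


def parse_pv_row(line: str) -> PersistentVolumeRow:
    """Split a single listing row into named fields."""
    parts = line.split()
    if len(parts) != 8:
        raise ValueError(f"expected 8 columns, got {len(parts)}")
    return PersistentVolumeRow(*parts)


def holds_storage_without_claim(row: PersistentVolumeRow) -> bool:
    """A Released volume with the Retain policy keeps its backing storage
    until someone reclaims it by hand."""
    return row.status == "Released" and row.reclaim_policy == "Retain"


row = parse_pv_row(
    "pvc-db29ccdf-2e89-4acc-88ee-48fe68c13bd9 520Gi RWO Retain Released "
    "old-cluster/repo-data-gitlab-prod-gitaly-no-replicas-0 "
    "rook-ceph-block-ssd 189d"
)
print(row.capacity, row.status, holds_storage_without_claim(row))
```

If the column assumption holds, this row describes a 520Gi volume that is no longer bound to a claim but still retained, which would fit the question of why the disks are almost full. Whether it actually accounts for the usage is not confirmed in the log.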
ybogdano has joined #freedesktop
thongthai has quit [Remote host closed the connection]