Synopsis
Begin 2023-07-22 16:23:00
End 2023-08-01 10:30:00
Affected CephFS users, daily backups, ftp.science.ru.nl

Following the power-down of the Huygens building, we are having problems bringing the Ceph file system back online.

We do not currently know when the Ceph cluster will be operational again.

Update 2023-08-01 10:30

Ceph is working again. This CPK is now closed. CPK#1338 is also closed.

Update 2023-07-31 12:30

After some more support from 42on, we managed to restart CephFS. We cannot be sure that all files are there, but almost all of them are. Please let us know if you are missing crucial data. In theory we can find and restore it, but we would have to process 500 million entries to do so, which will take a long time.

Things are looking good, but the Ceph mounts do not work yet. Later today we will issue an all-clear and close this CPK.

Update 2023-07-27 14:00

We are replaying the journal. This process doesn’t report progress, so it’s hard for us to estimate how long this will actually take. After this is done, we will need to figure out the next steps.

Update 2023-07-26 14:20

We are still working to resolve the problem. Please note that our daily backups are also affected: since Saturday, July 22nd, we have had no daily backups of systems, and we cannot restore from daily backups made before the 22nd.

Update 2023-07-24 14:35

We have called in Ceph support (42on.com) to help us debug. We aim to have Ceph up and running in the next few days.

Update 2023-07-24 08:05

The filesystem is not OK yet.

Update 2023-07-23 10:45

After performing a Ceph file system scrub, the cluster is online.