1342: Some home directories not available
After the Monday morning reboot, the NFS server on home1 refused to start properly. We are investigating why a manual restart of NFS was needed.
CPK messages are initially sent to the CPK mailing list; you can (un)subscribe via this link. You can also follow the service interruption messages via RSS using the link in the title under the RSS icon. If a CPK takes more time to resolve, any updates are published on this website.
For RU-wide service interruptions, see meldingen.ru.nl.
RU mail management let us know that forwarding to external (non-RU) mail addresses was stopped yesterday, as announced earlier. Unfortunately, mail for several dozen Science users was, and still is, not forwarded to the Science mail servers. These mails can still be found in MS365 (RU mail), either in the Inbox or in the Deleted Items. RU mail management promised that the forwarding will be corrected tomorrow.
Maintenance announcement: on Wednesday afternoon we will replace the CPU of one of our main VM host servers. As a result, the VMs gitlab9 (pep), slurm22, pep3, jitsivm, poliep, indicoimapp2vm, pep4, mariavm01 and smtp2 will be down for up to 1 hour. Several services (websites, Slurm) depend on mariavm01, so they will be affected too.
Apologies for the short notice: we are now going to replace the motherboard of one of our main VM host servers. As a result, the VMs gitlab9 (pep), slurm22, pep3, jitsivm, poliep, indicoimapp2vm, pep4, mariavm01 and smtp2 will be down for up to 1 hour. Several services (websites, Slurm) depend on mariavm01, so they are affected too.
Our daily backup system relies on cephfs storage, which is currently offline; see CPK#1337. This means that as of July 22nd we are unable to perform or restore daily backups. Once the cephfs problems are resolved, the daily backups should work and be restorable again. NB: this has no effect on the monthly backups, which continue to work normally.
After the power-down of the Huygens building, we are experiencing a problem bringing the Ceph file system back online. We currently do not know when the Ceph cluster will be operational again. Update 2023-08-01 10:30: Ceph is working again. This CPK is now closed. CPK#1338 is also closed. Update 2023-07-31 12:30: After some more support from 42on, we managed to restart the cephfs. We cannot be sure that all files are there, but almost all of them are. Please let us know if you are missing crucial data. In theory we can find and restore the data, but we would have to process 500 million entries to find it, which will take a long time. ...
The VPNsec service will be moved to a new server. This move will cause downtime and existing VPN connections will be destroyed. Downtime is expected not to exceed several minutes.
Last Friday, a change in the mailman configuration was rolled out which had the inadvertent effect that mails were no longer delivered to external addresses. Mailman posts to internal Science mail addresses were still delivered successfully. The change has been rolled back for the moment, but it is necessary, so we are looking for another solution.
The connecting router (dr-huyg) for all servers in the subnets 131.174.30.0/24, 131.174.31.0/24 and 131.174.16.128/26 will be replaced. This is expected to interrupt connectivity for ca. 10 minutes, but unforeseen circumstances may lengthen the interruption. We are doing this now because of the planned power interruption on July 22, which the old router hardware is unlikely to survive.
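To check whether a particular machine falls within the affected subnets, something like the following sketch may help. It uses Python's standard `ipaddress` module; the function name `is_affected` and the example addresses are hypothetical and chosen only for illustration, while the subnet list is taken from the announcement.

```python
import ipaddress

# Subnets named in the router replacement announcement.
AFFECTED_SUBNETS = [
    ipaddress.ip_network("131.174.30.0/24"),
    ipaddress.ip_network("131.174.31.0/24"),
    ipaddress.ip_network("131.174.16.128/26"),
]


def is_affected(address: str) -> bool:
    """Return True if the address lies in one of the affected subnets."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in AFFECTED_SUBNETS)


if __name__ == "__main__":
    # Hypothetical example addresses, not real hosts from the announcement.
    for example in ("131.174.30.17", "131.174.16.130", "131.174.16.10"):
        print(example, is_affected(example))
```

Note that 131.174.16.128/26 only covers the addresses 131.174.16.128 through 131.174.16.191, so other hosts in 131.174.16.x are not behind this router according to the list above.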
On Friday July 21 from 17:00, we will start shutting down compute cluster nodes to prepare for the power outage of the Huygens building on Saturday July 22. Other servers will be shut down later. The most important servers (mail, home, file, Ceph, gitlab, login servers) will be shut down starting Saturday morning at 7:00. We will try to keep basic services (DNS/DHCP, SMTP (mail) and license servers) up during this power outage. RU services are not hosted in the Huygens building, so they will not be affected. ...