CPK messages are initially sent to the CPK mailing list; you can (un)subscribe via this link. You can also follow the service interruption messages via RSS, using the link in the title under the RSS icon. If a CPK takes more time to resolve, updates are published on this website.

For RU-wide service interruptions, see meldingen.ru.nl.


Service Interruptions


1407: NFS problems under investigation

Since we moved the gateways of our networks from the old location to the new firewalls, we have received some complaints about NFS filesystems being slow, showing longer delays, or being unavailable. In general, NFS should never be a requirement for a clusternode job if you can avoid it, because NFS I/O is always much slower than local I/O from /scratch. We are investigating how we can optimise the network to resolve this issue, but we are hard pressed to pin down the exact cause of the problem....
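A common way to avoid NFS I/O inside a job is to stage the input to node-local /scratch, run the work there, and copy the results back once at the end. A minimal sketch in Python, with hypothetical paths and a placeholder program name (replace them with your own):

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Hypothetical NFS locations; replace with your own paths.
nfs_input = Path("/vol/mydata/input.dat")
nfs_results = Path("/vol/mydata/results")

# Stage the input to node-local /scratch, work there, copy results back once.
with tempfile.TemporaryDirectory(dir="/scratch") as workdir:
    local_input = Path(workdir) / nfs_input.name
    shutil.copy2(nfs_input, local_input)  # one bulk copy instead of many small NFS reads

    # "my_analysis" is a placeholder for the actual job command.
    subprocess.run(["my_analysis", str(local_input)], cwd=workdir, check=True)

    for result in Path(workdir).glob("*.out"):
        shutil.copy2(result, nfs_results / result.name)
```

This keeps the job's random I/O on the local disk and limits NFS traffic to two sequential copies.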

Resolved Reports


1400: Mail forwarding broken to @ru.nl

Due to a change in how the Microsoft Exchange server validates e-mail, mail sent from an external domain to an @science.ru.nl address (or a similar domain for which we accept e-mail) and then forwarded to your @ru.nl mailbox is very likely to be dropped. The receiving server at Microsoft considers this mail forged, in the sense that the sending client did not address the same domain as the receiving domain....
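The report does not name the exact check, but this failure pattern matches sender-domain validation such as SPF, which plain forwarding breaks because the forwarding server is not listed for the original sender's domain. As an illustration only, a small sketch that looks up a domain's published SPF policy, assuming the third-party dnspython package is installed:

```python
import dns.resolver  # third-party package: dnspython

def spf_record(domain: str) -> str | None:
    """Return the domain's published SPF policy (a TXT record starting with v=spf1), if any."""
    for rdata in dns.resolver.resolve(domain, "TXT"):
        txt = b"".join(rdata.strings).decode()
        if txt.startswith("v=spf1"):
            return txt
    return None

# Example: list which hosts a sender's domain authorises to send its mail.
print(spf_record("science.ru.nl"))
```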

1399: Mattermost broken after automatic update

After an automatic update, Mattermost no longer starts. The issue has been identified; the solution requires relocating the installation, which may take some time. Update: Mattermost is now running on a different server than gitlab, which took a lot of work to figure out, but all should be well now. For some people it may have taken longer before everything worked, due to DNS caching of the hostname mattermost.science.ru.nl to the wrong address....
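If Mattermost still seems unreachable, a quick check is to see which address your local resolver currently returns for the hostname; a stale cached answer points at the DNS caching mentioned above. A minimal sketch in Python:

```python
import socket

# Print the address(es) the local resolver currently returns for the Mattermost host.
hostname = "mattermost.science.ru.nl"
for family, _, _, _, sockaddr in socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP):
    print(sockaddr[0])
```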

1397: Delay in receiving email

FNWI users have been experiencing delays in email reception since early this morning. We have been working on this issue but have not solved it yet. No emails have been lost, but we suspect certificate issues between the Science mail servers and the central RU mail filter. Update: the exact cause remains unknown. During investigations together with RU-ILS, we noticed a long-standing certificate chain issue. However, even after resolving this, the problem remained....


1398: Firewall change for server networks

In the process of migrating our server networks to new and faster hardware, we are moving the gateways to our (C&CZ) managed firewalls. These firewalls already handle the traffic to our 25 Gbit/s networks, while the older 1 Gbit/s-connected nodes had their gateway in a temporary router, which replaced the old DR-HUYG router a few years ago. While we don’t expect too much trouble, a hiccup of up to a minute or so may occur while the network changes are being applied....

1396: Sending science mail work in progress

We are working on upgrading the mail servers. Due to caching in various parts, you may get messages about incorrect certificates or other problems; we expect this to resolve itself once we finish the updates. Update: we apparently still had problems with the certificates, but somehow missed this after last week’s fix. There is a temporary fix in place, so we can work further on it tomorrow.
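If your mail client keeps complaining, you can check which certificate the server is actually presenting. The sketch below opens an SMTP connection with STARTTLS and prints the certificate subject and expiry; the hostname and port are assumptions, so replace them with the server your client is configured to use.

```python
import smtplib
import ssl

# Assumed submission endpoint; replace with the mail server your client uses.
HOST, PORT = "smtp.science.ru.nl", 587

context = ssl.create_default_context()
with smtplib.SMTP(HOST, PORT, timeout=10) as smtp:
    # starttls() raises ssl.SSLCertVerificationError if the chain does not validate.
    smtp.starttls(context=context)
    cert = smtp.sock.getpeercert()
    print("subject:", cert.get("subject"))
    print("expires:", cert.get("notAfter"))
```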

1395: Sending science mail broken certificate

A change in the configuration management code has broken the certificate on the mail servers. We are trying to fix this properly, but until then, mail clients will complain about the server certificate. Update: fixed by restarting sendmail.

1394: Clusternode maintenance day - August 5th

We picked August 5th for a planned downtime/reboot of all clusternodes in the Science Cluster. This will involve at least package updates and a reboot; if other maintenance can be included, we will try to fit that in as well.

1393: Servers and services unavailable due to physical move on August 7th

We will be moving servers from the server room ak008 (Huygens A-2.008) to other locations. A lot of services and virtual machines will be offline for a few hours while we disconnect, move and reconnect the hardware. Assuming everything goes well, the services should be back up after the machines are turned on again. This CPK will be updated with more servers and information about affected services in the weeks before the move on August 7th, 2025....

1392: Network failure in part of the network

There are servers unreachable due to an unknown problem with the 25 Gbit/s network switch in room ak008. We don’t know the root cause yet. Update: rebooting the affected switch has resolved the problem. Affected hosts (reported DOWN by monitoring): amanda22.science.ru.nl, cephgrafana.science.ru.nl, cephgw3.science.ru.nl, cephgw4.science.ru.nl, cephmon2.science.ru.nl, cephosd07.science.ru.nl, cephosd08.science.ru.nl, cephosd09.science.ru.nl, cephosd10.science.ru.nl, cephosd11.science.ru.nl, cephosd12.science.ru.nl, cephosd13.science.ru.nl, cephosd23.science.ru.nl, cephosd26.science.ru.nl, cephosd27.science.ru.nl, chemotionvm.science.ru.nl, containervm02.science.ru.nl, dockervm01.science.ru.nl, dockervm02....

1391: Planned network disruption on some server networks

Services on networks that are currently behind our old picos switches will be moved behind our firewalls. This may cause a few minutes of downtime due to the changes needed for moving the gateway functionality. If all goes well, it will be one period of a few minutes; if we need to roll back and fix things, there may be a repeat downtime. Update: all went well; there was an interruption of a few minutes on the cncz homepage due to ARP caching issues....