CPK messages are first sent to the CPK mailing list; you can (un)subscribe via this link. You can also follow the service interruption messages via RSS, using the link under the RSS icon in the title. If a CPK takes longer to resolve, any updates are published on this website.

For RU-wide service interruptions, see meldingen.ru.nl.


Service Interruptions


1407: NFS problems under investigation

Since we moved the gateways of our networks from the old location to the new firewalls, we have received complaints about NFS filesystems being slow, responding with long delays, or becoming unavailable. In general, a clusternode job should not depend on NFS if you can avoid it, because NFS I/O is always much slower than local I/O on /scratch. We are investigating how to optimise the network to resolve this issue, but the exact cause of the problem is still hard to pin down....

1415: Clusternode maintenance day - February 6th 2026

Every six months we carry out clusternode maintenance, which includes at least a package upgrade and a reboot. Other work, such as changes to filesystems or network configurations, may also be done. The upcoming maintenance date is Friday, February 6th, 2026.

Resolved Reports


1296: Network failure datacenter

The cause is still unclear; several links are severely degraded. Update: a defective switch has been replaced.

Updated Oct 20, 2022  ·  Bram Daams · Created Sep 4, 2022 · 

1295: Saturday May 14: adjacent buildings (Mercator, Proeftuin, Logistiek) without network for 5 minutes

RU/ILS network management will switch to new hardware. This will lead to a network interruption of at most 5 minutes.

Updated Oct 20, 2022  ·  Bram Daams · Created May 14, 2022 · 

1294: Coma, coma01 and coma46 network problem

This afternoon three coma nodes lost their network connection because of an incorrect network configuration. They probably showed intermittent network problems before that. It took us some time to find the cause of this network problem, but once found, it was easy to fix.

Updated Oct 20, 2022  ·  Bram Daams · Created May 3, 2022 · 

1293: Astro.ru.nl DNS(SEC) service down

During the regular rollover of the DNSSEC keys that secure DNS traffic, an incorrect key for astro.ru.nl was introduced in the external DNS of ru.nl. This made astro.ru.nl disappear from the internet. The error was partly corrected on 2022-05-02 at around 14:00, but the automated process used an encryption algorithm that was not accepted. After we eventually noticed this, ILS corrected it by hand on 2022-05-22. Because the DNS answer that astro....

Updated Oct 20, 2022  ·  Bram Daams · Created Apr 28, 2022 · 

1291: Network switch of Astro Coma cluster down

The network switch of the Coma cluster appears to be broken; all attached nodes are cut off from the rest of the network. We'll replace the switch as soon as possible and then analyze the problem (or have it analyzed).

Updated Oct 20, 2022  ·  Bram Daams · Created Feb 22, 2022 · 

1292: SUSE Linux 15.3: Eduroam doesn't work with U- or s-number, but does with Science account

On February 14, ILS switched off the outdated TLS versions 1.0 and 1.1 for Eduroam authentication on the ILS LDAP servers. Since then, SUSE Linux 15.3 clients cannot authenticate with a U- or s-number: these clients only support TLS 1.2, while the ILS servers offer TLS 1.3 first, after which an error occurs. Users who authenticate with their Science account can still connect to Eduroam.

Updated Oct 20, 2022  ·  Bram Daams · Created Feb 14, 2022 · 

1290: Interrupted link to new datacenter switches

Due to human error, the connection between the new datacenter switches and the central router was interrupted.

Updated Oct 20, 2022  ·  Bram Daams · Created Dec 15, 2021 · 

1289: vmhost07 poweroff

Vmhost07 was accidentally shut down due to human error. Affected VMs: labservanttest, neurotech2, printvm, msql01, indicoimapp, ldap2, eftw, jupytervm.

Updated Oct 20, 2022  ·  Bram Daams · Created Dec 2, 2021 · 

1288: Ceph storage expansion caused performance issues

The expansion of the Ceph storage cluster caused performance and availability issues in the cluster. The problems were resolved this morning.

Updated Oct 20, 2022  ·  Bram Daams · Created Nov 16, 2021 · 

1287: Server room network switch without power

Two modules of an important switch in the main C&CZ server room lost power during the preparation of planned maintenance. This disconnected about 75% of the servers in the room from the network. Moving the modules to new PDUs limited the downtime to about 15 minutes.

Updated Oct 20, 2022  ·  Bram Daams · Created Oct 12, 2021 ·