Data Storage

Data storage on C&CZ servers can be used from C&CZ-managed servers and personal computers, and also from other PCs or even from home with WinSCP or via VPN. Almost all disks managed by C&CZ are backed up regularly, so that data can be restored after small or large calamities.

Home directories

Every user with a Science login has, or is entitled to, a few gigabytes of disk space on a server. This disk space is called the “home directory” on Unix/Linux computers and the “H- or U-drive” on Windows computers. The location of this home directory (which server) can be viewed on the Do-It-Yourself website.

Naming

Naming example
Login name                        guest204
Windows path                      \\home1.science.ru.nl\guest204 (consult DHZ
MacOS / Linux path                smb://home1.science.ru.nl/guest204
Linux (on C&CZ managed systems)   /home/guest204

Access rights

Long ago, the (Unix) home directory of a user was, except for a few protected areas, readable by all users of the server. Nowadays a user’s home directory is accessible only to that user. The user can change the access rights. C&CZ checks for home directories that are writable by other users.
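If you have changed the access rights yourself, a quick check along the following lines may help. This is a minimal sketch, not the check that C&CZ runs: the function name writable_by_others is made up for illustration, and it only inspects the permission bits of the directory itself.

```shell
#!/bin/sh
# Sketch: report whether a directory is writable by group or others.
# writable_by_others is a hypothetical helper name, not a C&CZ tool.
writable_by_others() {
    # -H follows a symlink given on the command line; -prune keeps find
    # from descending, so only the directory itself is tested. It is
    # printed only if the group-write (020) or other-write (002) bit is set.
    [ -n "$(find -H "$1" -prune \( -perm -020 -o -perm -002 \) -print)" ]
}

dir=${HOME:-.}
if writable_by_others "$dir"; then
    echo "$dir is writable by other users; consider: chmod go-w \"$dir\""
else
    echo "$dir is not writable by other users"
fi
```

Running chmod go-w on the directory removes write permission for group and others while leaving the other permission bits untouched.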

Access through NFS

A home (U:) drive can be mounted on Linux via NFS/Kerberos.

Functionality and costs of network shares

RAID server shares

Data storage for groups, institutions and projects: there are a few file servers with RAID storage, with partitions that can be rented for a period of 3 years. As of July 2018, the price for FNWI departments of a new disk, or of a new 3-year extension of an existing disk, is:

Size                   With backup                                         Without backup
ca. 200 GB             € 40 per year                                       € 10 per year
ca. 400 GB             € 80 per year                                       € 20 per year
> 400 GB up to 1 TB*   € ??? per TB per year, depending on which backups   € 50 per TB per year
> 1 TB (no backup?)    N/A                                                 Have a look at Ceph storage

* Daily backup is not possible with lots of daily changing small files.

Although even the cheapest version is much more expensive than buying one disk for one PC, it often makes sense because of the reliability (redundant disks, backup, support contract) and security (stable server). One or more folders on such a partition can be mapped as a network drive on Windows PCs or NFS-mounted on Unix/Linux hosts. The ability to read and/or write files in these folders can be limited to a group of logins. That group can be managed by the department on the Do-It-Yourself website.
C&CZ has service contracts for these servers and has spares on site, so a failure can be resolved quite fast. Because the disks are part of a RAID set, the failure of a single disk, or even of 2 disks, will not disrupt service for users. The partitions are backed up (daily and incremental). Even if the whole server room is lost, data can (eventually) be restored.

Ceph Storage

Starting November 2019 we can provide almost unlimited storage for the Faculty of Science using our Ceph storage cluster. Because of the way Ceph works, there is a tradeoff between performance and redundancy. With the additional redundancy options, it is also possible to improve redundancy above the level of single-server RAID-6. The physical storage servers are spread across three locations (datacenters). NB: Ceph volumes have no backups; the volumes tend to be too large to back up.

Choices in redundancy

Ceph has different options for storing data (configurable per “pool”). By default, Ceph stores data with 3 copies, so when one copy is lost, the remaining two still provide redundancy. Because we have three locations, the 3-copy pool remains available when one whole datacenter becomes unavailable.

Besides storing copies of the data blocks, Ceph can use “Erasure Coding” (EC) as an alternative way of providing redundancy. The advantage is that much less overhead is required for secure storage; the disadvantage is a high overhead for storing small files. We have two different EC pools. EC8+3 is the cheapest, but when one datacenter is destroyed (very unlikely!), all the data is lost; when one datacenter becomes temporarily unavailable, the data is still safe, but offline. Our EC5+4 pool remains available when a whole datacenter is offline or lost; the data remains safe as long as two datacenters are working well.
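The difference between the two EC pools can be made plausible with a little arithmetic. The sketch below rests on an assumption that is not stated in this document: that the k+m chunks of each object are spread as evenly as possible over the three datacenters. It also ignores pool settings that influence availability in a real Ceph cluster. The function names are hypothetical.

```shell
#!/bin/sh
# Assumption: the k+m chunks of each object are spread as evenly as
# possible over the sites, so the fullest site holds ceil((k+m)/sites).
survives_site_loss() {
    k=$1
    m=$2
    sites=$3
    worst=$(( (k + m + sites - 1) / sites ))   # chunks lost with the fullest site
    # In this simplified model, data stays readable only while at most
    # m chunks are missing.
    [ "$worst" -le "$m" ]
}

report() {
    if survives_site_loss "$1" "$2" 3; then
        echo "EC$1+$2: readable with one of 3 sites down"
    else
        echo "EC$1+$2: not readable with one of 3 sites down"
    fi
}

report 8 3   # 11 chunks, up to 4 on one site, only 3 may be missing
report 5 4   # 9 chunks, 3 per site, up to 4 may be missing
```

Under this assumption the result agrees with the description above: EC8+3 cannot lose a whole datacenter, while EC5+4 can.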

Ceph erasure coding has a high overhead for smaller files. The prices mentioned below are based on the optimal storage overhead, which can be approximated when the stored files are 4 megabytes or larger.
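As an illustration of the optimal storage overhead: an erasure-coded pool with k data chunks and m coding chunks ideally stores (k + m) / k bytes of raw data per usable byte, whereas a 3-copy pool stores 3. The helper below is a minimal sketch with a made-up name; it does not model the extra overhead for small files.

```shell
#!/bin/sh
# Ideal raw-to-usable storage ratio for an erasure-coded pool with
# k data chunks and m coding chunks: (k + m) / k.
# ec_overhead is a hypothetical helper name.
ec_overhead() {
    LC_ALL=C awk -v k="$1" -v m="$2" 'BEGIN { printf "%.3f\n", (k + m) / k }'
}

ec_overhead 8 3   # EC8+3, prints 1.375
ec_overhead 5 4   # EC5+4, prints 1.800
```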

Pool                 Why                             Price per TB* per year (without backup)
Erasure coding 8+3   cheap                           € 50
Erasure coding 5+4   cheap + additional redundancy   € 60
3 copy               faster read + write             € 100

* 1 TB is 1,000,000,000,000 bytes
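To estimate what a volume would cost, the prices from the table can be multiplied by size and duration. The sketch below does just that; the pool labels (ec8+3, ec5+4, 3copy) and function names are made up for this example, and it only handles whole terabytes and whole years.

```shell
#!/bin/sh
# Yearly prices per TB from the table above (euros, without backup).
# The pool labels are hypothetical shorthand, not official names.
price_per_tb() {
    case "$1" in
        ec8+3) echo 50 ;;
        ec5+4) echo 60 ;;
        3copy) echo 100 ;;
        *) echo "unknown pool: $1" >&2; return 1 ;;
    esac
}

# storage_cost POOL SIZE_TB YEARS -> total cost in euros
storage_cost() {
    price=$(price_per_tb "$1") || return 1
    echo $(( price * $2 * $3 ))
}

storage_cost ec8+3 20 3   # 20 TB on EC8+3 for 3 years, prints 3000
```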

The Ceph storage can be used as a Windows/Samba share, an NFS share or an S3 object store. An object store differs fundamentally from a normal filesystem, so data stored in a Windows or NFS share cannot be accessed using the S3 protocol.

The performance properties of Ceph differ from those of traditional single-server storage: write speed usually exceeds read speed, and large numbers of small files severely reduce throughput, even more so than on traditional storage.

Naming for network disks

Naming example
Volume name                       sharename
Windows path                      \\sharename-srv.science.ru.nl\sharename
MacOS / Linux path                smb://sharename-srv.science.ru.nl/sharename
Linux (on C&CZ managed systems)   /vol/sharename
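The naming pattern above can be written out for any volume name. The following is a small sketch; share_paths is a hypothetical helper, and it only formats the paths without checking that the share exists.

```shell
#!/bin/sh
# Print the three access paths for a network share, following the
# naming pattern in the table above. share_paths is a hypothetical helper.
share_paths() {
    name=$1
    # In a printf format string, \\ produces one literal backslash.
    printf '\\\\%s-srv.science.ru.nl\\%s\n' "$name" "$name"   # Windows path
    printf 'smb://%s-srv.science.ru.nl/%s\n' "$name" "$name"  # MacOS / Linux path
    printf '/vol/%s\n' "$name"                                # C&CZ managed Linux
}

share_paths sharename
```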

Access rights

Most of the shared disks can be read and written by a specific group of users. The owners of this group can manage on the Do-It-Yourself website which accounts are members of the group.

Requests

A request for one or more network disks should contain:

  • requested name of the disk(s)
  • requested size (max ca. 500 GB with backup)
  • optionally, the requested backup schedules (Daily/Monthly/Yearly), to lower the price
  • Science login name of an owner
  • optionally, the Science login name of a member
  • charge account (kostenplaats) or project code for the costs in the first three years.

Temporary shared data storage

Every now and then you may want to send one or more large files (more than a few tens of MBs) to someone else within the Faculty; mail is unsuited for such large files. To make this easy, there is a network share where you can store large files temporarily so that someone else can copy them from that location. Note that this share is explicitly meant for temporary storage: we do not make backups of it, and every day we remove files that are older than 21 days. When copying files to this share, make sure the file timestamps are updated. Some copy programs (like rsync) can preserve the original timestamps, so files that were already old will be deleted. To update the timestamps, you can use the following command:

find . -exec touch {} +
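To see which files already carry old timestamps, something like the following can be used. This is a sketch with a hypothetical helper name; it says nothing about how the daily cleanup itself is implemented, so treat the result as an early warning only.

```shell
#!/bin/sh
# List regular files below DIR that were last modified more than DAYS
# days ago. find counts whole 24-hour periods, so "-mtime +N" matches
# files that are at least N+1 days old.
# list_older_than is a hypothetical helper name.
list_older_than() {
    find "${1:-.}" -type f -mtime +"${2:-14}" -print
}

# Example: files in the current directory tree not modified for over 14 days.
list_older_than . 14
```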

This share can also be used to store temporary files that are readable only by yourself, by using a different name for the share. Note that in this case, too, old files will be removed.

Please create a subdirectory with your name first, and put your files in that directory.

For files totaling less than 250 GB, Surfdrive is also an alternative. For sending files of up to 500 GB, SURFfilesender can be used.

Temporary disk space naming

Naming
Volume name                       temp
Windows path                      \\temp-srv.science.ru.nl\share or \\temp-srv.science.ru.nl\onlyme
MacOS / Linux path                smb://temp-srv.science.ru.nl/share or smb://temp-srv.science.ru.nl/onlyme
Linux (on C&CZ managed systems)   /vol/temp

Access rights

  • Readable by all users: share
  • Readable only by the owner: onlyme