Data storage solutions
While every scientist has their own way of working with data, shaped by their affiliation, knowledge and established group protocols, there always comes a time when they get stuck and need to find a new way of handling their data. Here are a few options, depending on your needs and how much you are willing to invest.
Note that none of these solutions is meant for mobility: moving files around on portable hard drives or USB keys is a bad habit security-wise, and it no longer complies with the level of security required by the various IT departments.
1) Simple fixed external hard drives
These are disks attached to a single machine, which the OS sees as it would a simple USB key. They are easy to use, but they require an external power supply, which makes them unsuitable for transport. A good option is to attach two of them to two separate computers and make regular copies with a tool such as https://syncthing.net/.
Since such a drive is attached to only one machine, that machine must remain powered on and share a folder so that multiple users can work with the data.
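To make the idea of "regular copies" concrete, here is a minimal sketch of a one-way copy from one folder to another, using only the Python standard library. The function name mirror_folder is hypothetical, and this is not how Syncthing works internally: it only copies new or more recently modified files and never deletes anything from the destination.

```python
import shutil
from pathlib import Path


def mirror_folder(source: Path, destination: Path) -> int:
    """Copy new or updated files from source into destination.

    One-way copy only: files deleted from source are NOT removed from
    destination. Returns the number of files copied.
    """
    copied = 0
    for src_file in source.rglob("*"):
        if not src_file.is_file():
            continue  # directories are recreated on demand below
        dst_file = destination / src_file.relative_to(source)
        # Skip files whose copy is at least as recent as the original
        if dst_file.exists() and dst_file.stat().st_mtime >= src_file.stat().st_mtime:
            continue
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)  # copy2 also preserves modification times
        copied += 1
    return copied
```

Run periodically (for instance from a scheduled task), a second call only copies files that changed since the previous run, because copy2 carries the original modification time over to the copy.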
2) External drives in RAID 1, 5 or 6 for fault tolerance
These are enclosures containing multiple disks aggregated into a RAID array; the computer sees only a single storage volume, as it would a USB key. The disks can be configured in RAID 0 (their capacity and speed are combined for more performance, but there is no redundancy: if any disk fails, all data is lost), or in RAID 1 (the disks are paired and mirrored: the total capacity is divided by two, but if a disk breaks, there is always a copy). RAID 5 needs at least three disks and is a compromise between RAID 0 and RAID 1: parity information equivalent to one disk's capacity is spread across all the disks, so the array survives the failure of any single disk, loses only one disk's worth of capacity and gains some read speed. RAID 6 works the same way with two disks' worth of parity: it needs at least four disks and survives two simultaneous disk failures.
Safer than a simple external disk, a RAID enclosure is used in the same way and is likewise suited to a single machine, which must remain powered on and share a folder so that multiple users can work with the data. Since RAID handles individual hard drive failures, you might be tempted to skip a duplicate system on another computer, but RAID will not prevent data loss if, for instance, the host system is hit by ransomware (a cryptovirus) or destroyed in a fire.
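The capacity and fault-tolerance trade-offs of the RAID levels described above can be summarized as simple arithmetic. The sketch below assumes identical disks and, following the description in this section, treats RAID 1 as disks in mirrored pairs; the names raid_info and RaidInfo are hypothetical, and real enclosures reserve some space for metadata, so actual usable capacity will be slightly lower.

```python
from typing import NamedTuple


class RaidInfo(NamedTuple):
    usable_tb: float          # usable capacity in TB
    failures_tolerated: int   # disk failures the array is guaranteed to survive


# Minimum number of disks for each supported RAID level
MIN_DISKS = {0: 2, 1: 2, 5: 3, 6: 4}


def raid_info(level: int, n_disks: int, disk_tb: float) -> RaidInfo:
    """Usable capacity and guaranteed fault tolerance for identical disks."""
    if level not in MIN_DISKS:
        raise ValueError(f"unsupported RAID level: {level}")
    if n_disks < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    if level == 0:
        return RaidInfo(n_disks * disk_tb, 0)          # striping, no redundancy
    if level == 1:
        if n_disks % 2:
            raise ValueError("RAID 1 is assumed here to use mirrored pairs")
        # Half the capacity; one failure is always survivable (more only if
        # the failures hit different pairs)
        return RaidInfo(n_disks / 2 * disk_tb, 1)
    if level == 5:
        return RaidInfo((n_disks - 1) * disk_tb, 1)    # one disk's worth of parity
    return RaidInfo((n_disks - 2) * disk_tb, 2)        # RAID 6: two disks of parity


if __name__ == "__main__":
    # Four hypothetical 4 TB disks:
    # RAID 0: 16.0 TB, 0 failures; RAID 1: 8.0 TB, 1; RAID 5: 12.0 TB, 1; RAID 6: 8.0 TB, 2
    for level in (0, 1, 5, 6):
        info = raid_info(level, 4, 4.0)
        print(f"RAID {level}: {info.usable_tb} TB usable, "
              f"survives {info.failures_tolerated} failure(s)")
```

With four disks, RAID 5 keeps the most capacity among the redundant options, while RAID 6 trades another disk's worth of space for surviving a second failure.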
3) NAS server
This is the most versatile solution, but also the most complicated and expensive to set up and maintain. The CIF ATHENA server solution is based on this kind of architecture. A NAS (Network Attached Storage) is an enclosure that contains several disks as well as its own operating system. It has a web interface and a small app store for installing applications that extend its functionality, such as a backup tool, a web server or a personal cloud. It supports multiple users and per-folder access rights.
The disks are aggregated as RAID, JBOD or a similar technology, but you can build multiple separate volumes. For example, with 12 disks, you could group 6 in RAID 5 for volume 1, 4 in RAID 1 for volume 2, and 2 in RAID 0 for maximum performance as volume 3.
This is the ideal solution for a small research group with no access to another solution, but it requires someone fairly computer-savvy to act as administrator of the machine.
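As a worked example of the 12-disk NAS layout described above, the sketch below computes the usable capacity of each volume. The 4 TB disk size is a hypothetical figure chosen for illustration, plan_volumes is a hypothetical helper name, and RAID 1 is again assumed to mean mirrored pairs (half the capacity).

```python
from typing import Dict, Tuple

# Disks' worth of usable capacity for n identical disks at each RAID level.
# RAID 1 is assumed to be mirrored pairs (half the capacity).
USABLE_DISKS = {
    0: lambda n: n,
    1: lambda n: n // 2,
    5: lambda n: n - 1,
}


def plan_volumes(layout: Dict[str, Tuple[int, int]], disk_tb: float) -> Dict[str, float]:
    """Map each volume name to its usable capacity in TB.

    layout maps a volume name to (RAID level, number of disks).
    """
    return {name: USABLE_DISKS[level](n) * disk_tb
            for name, (level, n) in layout.items()}


# The 12-disk example from the text; 4 TB per disk is a hypothetical size
LAYOUT = {"volume 1": (5, 6), "volume 2": (1, 4), "volume 3": (0, 2)}
DISK_TB = 4.0

if __name__ == "__main__":
    usable = plan_volumes(LAYOUT, DISK_TB)
    raw = sum(n for _, n in LAYOUT.values()) * DISK_TB
    for name, tb in usable.items():
        print(f"{name}: {tb} TB usable")
    # 20.0 + 8.0 + 8.0 = 36.0 TB usable out of 48.0 TB raw
    print(f"total: {sum(usable.values())} TB usable out of {raw} TB raw")
```

Under these assumptions, the redundancy of volumes 1 and 2 costs 12 TB of the 48 TB raw capacity, while volume 3 keeps all its capacity but loses everything if either of its disks fails.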
Examples (sold without disks, which must be bought separately)
NB: for most of these solutions, disk size is an option, and it greatly influences the total cost.
4) Storage space on the servers of the CHUV or the UNIL
For UNIL users, PIs can request storage space of any size on the DCSR website:
The CHUV probably has a similar solution; CHUV users should ask the helpdesk for information on this matter.
The advantage is that the service handles all management, and the storage size is flexible and can be enlarged on the fly. The only, small, drawback is the cost, billed as an annual subscription; taking into account the equivalent cost of the service offered, the price remains very advantageous. Groups that need to archive their data can also store it on tape: access is very slow, so this option suits only data that will rarely be accessed but must remain available.