File Server Builder's Guide
by Zach Throckmorton on September 4, 2011 3:30 PM EST
What is a file server?
Essentially, a file server is a computer that stores files, is attached to a network, and provides shared access to those files for multiple workstation computers. File servers do not perform computational tasks - that is, they do not run programs for client machines. Furthermore, they do not provide dynamic content like a web server. Still further, file servers are not like database servers in that the former do not provide access to a shared database whereas the latter do. File servers provide access to static files via a local intranet through Windows or Unix protocols as well as over the internet through file transfer or hypertext transfer protocols (FTP and HTTP).
What can you do with a file server?
The primary function of a file server is storage. For the home user, one central storage location can increase overall computing efficiency and reduce overall computing cost. By placing all of your important files in a single location, you do not need to worry about different versions of files you're actively working on, wasting disk space by having multiple copies of less-than-important files scattered on different systems, backing up the right files onto the right backup storage medium from the right computer, making sure every PC in your home has access to the appropriate files, and so on.
From a system builder's perspective, a file server can also liberate your various workstation computers from having to accommodate multiple hard drives, and decrease overall hard drive expenditures. With the rise of SSDs, which offer tremendous performance at a high cost per GB, a file server can free workstations from the performance shackles of platter-based disks - an especially useful consideration for laptops and netbooks, where the small capacity of an SSD is often a deal breaker since these mobile computers usually can house only one drive.
A dedicated file server allows every user in a home - whether they're at home or on the road - to access every file they might need, regardless of which particular device they might be using at any given time. Dedicated file servers also allow you to share your files with friends and coworkers - simply provide them with a URL, a login name and password, and specify what content they can access. For example, maybe you'd like to share your kids' camp photos with the in-laws - but your cloud storage capacity won't fit all of those photos plus all of the other stuff you have stored in your cloud drive locker. Maybe you'd like to share sensitive information with a colleague that you'd rather not upload to a server owned by Amazon or some other third party, but the files are too big to email. Or maybe you'd simply like to access your 200GB library of MP3s while you're holed up in a hotel on business with nothing but your 60GB SSD-based netbook. These few examples are really only the tip of the iceberg when it comes to the utility of a file server.
That said, there are alternatives to a file server for all of these needs. You could dump all of your photos onto a flash drive and give them to the in-laws the next time you see them - but you have to do this every time you want to share more photos - and who knows if you'll get your flash drives back? You could mail a DVD-R to your colleague - but perhaps a DVD-R's ~4GB capacity is insufficient, and snail mail takes days if not weeks to be delivered. If you're on the road, you could just bring along your portable external hard drive - which takes up space, and can be lost or stolen. A file server is a simple, singular solution to all of these problems. Home file servers do not require enterprise-grade hardware and can be very affordable. They can also be made from power-sipping components that won't spike your electrical bill.
What considerations are important in building a file server?
Because the primary role of a file server is storage, this is the most important aspect to think about. How much storage space do you need? Do you want to share 50GB of photos taken on a point and shoot digital camera? 500GB of music? 2TB of movie DVD ISOs? 30TB of mixed media and work-related files? Also, at what rate are your storage demands growing, and how easily do you want to be able to expand your file server?
How easily do you want to be able to administer your files? Many of the more powerful file server operating systems are unfortunately not particularly easy to run for the non-IT professional. However, there are file server OS's that are easy to run. What about being able to recover your files in the event of catastrophe? Placing your files in one computer is tantamount to putting all of your eggs in one basket, which can be risky. What about security? Anything on any sort of network is vulnerable to intrusion. While this guide answers all of these questions, it is aimed at home users and therefore necessarily makes some sacrifices to storage space, administration capabilities, recoverability, and security - simply because home users typically can neither afford nor require professional-grade file server solutions.
Why build a file server instead of using NAS?
Simply put, a NAS (network-attached storage) device is a computer appliance. It is built specifically to provide network-accessible storage. NAS devices typically offer easier administration than file servers (some are a few mouse clicks away from plug and play operability), but are often limited by proprietary software, and are neither as capacious nor as expandable as a dedicated file server. Further, higher-end NAS devices that can house as many hard drives as some of the builds outlined in this guide are more expensive than the file server alternative. Finally, because they are designed with only one purpose in mind, they are not as flexible as a file server, which in a multi-system home, might need to be co-opted into a basic workstation at a later point in time. That said, while NAS devices are outside the scope of this guide, they're worth investigating if you're not already familiar with them.
This guide is laid out differently than my previous builder's guides in that rather than detailing specific systems at specific price points capable of performing specific tasks, it instead discusses options for operating systems and types of components and how these different options are best suited to addressing different needs. That is, maybe you need a lot of storage space but you're not particularly concerned about backups. Or perhaps you don't need much storage space at all but want to use a very straightforward file server operating system. By mixing and matching recommendations to suit your needs, hopefully you'll be able to construct a file server with which you'll be pleased!
152 Comments
mino - Tuesday, September 6, 2011 - link
For a plenty of money :)
Basically, a SINGLE decent raid card costs ~200+ for which you have the rest of the system.
And you need at least 2 of them for redundancy.
Also, with a DEDICATED file server and open sourced ZFS, who needs HW RAID? ...
alpha754293 - Tuesday, September 6, 2011 - link
In most cases, the speed of the drives/controller/interface is almost immaterial because you're going to be streaming it over a 1 Gbps network at most.
And if you actually HAVE 10GbE or IB or Myrinet or any of the others, I'm pretty sure that if you can afford the $5000 switch, you'd "splurge" on the $1000 "proper" HW RAID card.
Amusing how all these people are like "speed speed speed!!!!" forgetting that the network will likely be the bottleneck. (And wifi is even worse, 0.45 Gbps is the best you can do with wifi-n.)
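To put the bottleneck argument above in rough numbers, here is a minimal sketch that converts a nominal link rate into MB/s and estimates how long a copy would take. It assumes decimal units (1 GB = 1000 MB) and ignores protocol overhead, so real-world throughput would be lower; the function names are hypothetical and chosen only for illustration.

```python
def link_rate_mb_per_s(gbps: float) -> float:
    """Convert a nominal link rate in gigabits/s to megabytes/s (8 bits per byte)."""
    return gbps * 1000 / 8


def transfer_seconds(size_gb: float, gbps: float) -> float:
    """Best-case time to move size_gb gigabytes over a link of the given rate."""
    return size_gb * 1000 / link_rate_mb_per_s(gbps)


# Link rates mentioned in the comment: gigabit Ethernet and best-case wifi-n.
for name, gbps in (("1 Gbps Ethernet", 1.0), ("wifi-n (0.45 Gbps)", 0.45)):
    rate = link_rate_mb_per_s(gbps)
    minutes = transfer_seconds(200, gbps) / 60
    print(f"{name}: {rate:.2f} MB/s, 200 GB in about {minutes:.0f} minutes")
```

Under these assumptions a gigabit link tops out at 125 MB/s, so any drive or controller that can sustain more than that will not make network copies faster.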
DigitalFreak - Sunday, September 4, 2011 - link
I've been using Dell PERC-5i cards for years. You can find them relatively cheap on E-bay, and they usually include the battery backup. I believe they're limited to 2TB drives though.
JohanAnandtech - Monday, September 5, 2011 - link
"But there's the fact that software RAID (which is what you're getting on your main board) is utterly inferior to those with dedicated RAID cards"
hmm. I am not sure those entry-level firmware thingies that have a R in front of them are so superior. They offload most processing tasks to the CPU anyway, and they tend to create problems if they break and you replace them with a new one with a newer firmware. I would be interested to know why you feel that Hardware RAID (except the high end stuff) is superior?
Brutalizer - Monday, September 5, 2011 - link
When you are saying that software raid is inferior to hardware raid, I hope you are aware that hw-raid is not safe against data corruption?
You have heard about ECC RAM? Spontaneous bit flips can occur in RAM, which is corrected by ECC memory sticks.
Guess what, the same spontaneous bit flips occur in disks too. And hw-raid does not detect nor correct such bit flips. In other words, hw-raid has no ECC correction functionality. Data might be corrupted by hw-raid!
Neither do NTFS, ext3, XFS, ReiserFS, etc. correct bit flips. Read here for more information, there are also research papers on data corruption vs hw-raid, NTFS, JFS, etc:
http://en.wikipedia.org/wiki/ZFS#Data_Integrity
In my opinion, the only reason to use ZFS is because it detects and corrects such bit flips. No other solution does. Read the link for more information.
sor - Monday, September 5, 2011 - link
Many RAID solutions scrub disks, comparing the data on one disk to the other disks in the array. This is not quite as robust as the filesystem being able to checksum, but as your own link points out, the chances of a hard drive flipping bits is something on the order of 1 in 1.6PB, so combined with a RAID that regularly scrubs the data I don't see home users needing to even think about this.
Brutalizer - Monday, September 5, 2011 - link
You are neglecting something important here.
Say that you repair a raid-5 array. Say that you are using 2TB disks, and you have an error rate of 1 in 10^16 (just as stated in the article). If you repair one disk, then you need to read 2 000 000 000 000 bytes, and every time you read a bit, an error can occur.
The chance of at LEAST ONE ERROR can be calculated by this well-known formula:
1 - (1-P)^n
where P is the probability of an error occurring, and "n" is the number of times the error can occur.
If you insert those numbers, then it turns out that during repair, there is something like a 25% chance of hitting at least one read error. You might hit two errors, or three errors, etc. Thus, there is a 25% chance of you getting read errors.
If you repair a raid, and then run into read errors - you have lost all your data, if you are using raid-5.
Thus, this silent corruption is a big problem. Say some bits in a video file are flipped - that is no problem. A white pixel might be red instead. Say your rar file has been affected, then you can not open it anymore. Or a database is affected. This is a huge problem for sysadmins:
http://jforonda.blogspot.com/2006/06/silent-data-c...
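As a rough check on the formula quoted in the comment above, the sketch below evaluates 1 - (1-P)^n for a single 2TB disk. It is only an illustration under stated assumptions: the function name is hypothetical, the two error rates are the per-bit figures mentioned in this thread (1 in 10^16 and 1 in 10^14), and `log1p`/`expm1` are used because evaluating (1-P)^n directly loses precision when P is this small.

```python
import math


def prob_at_least_one_error(p: float, n: float) -> float:
    """Return 1 - (1 - p)**n, computed in a way that stays accurate for tiny p."""
    return -math.expm1(n * math.log1p(-p))


# Assumption: one 2TB disk read end to end, 8 bits per byte.
bits_read = 2e12 * 8

for p in (1e-16, 1e-14):
    chance = prob_at_least_one_error(p, bits_read)
    print(f"per-bit error rate {p:g}: {chance:.4%} chance of at least one error")
```

With these inputs the result is roughly 0.16% at 1 in 10^16 and roughly 15% at 1 in 10^14. The outcome is very sensitive to both inputs: a raid-5 rebuild has to read every surviving disk, so n would be several times larger for a multi-disk array, and the probability rises accordingly.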
Brutalizer - Monday, September 5, 2011 - link
PS. There is a 1 in 10^16 chance that the disk will not be able to recover the bit. But there are more factors involved: current spikes (no raid can do this):
http://blogs.oracle.com/elowe/entry/zfs_saves_the_...
bugs in firmware, loose cables, etc. Thus, the chance is much higher than 1 in 10^16.
Also, raid does not scrub disks thoroughly. They only compute parity. That is not checksumming data. See here about raid problems:
http://en.wikipedia.org/wiki/RAID#Problems_with_RA...
alpha754293 - Tuesday, September 6, 2011 - link
@Brutalizer
Bit flips
I think that CERN was testing that and found that it was like 1 bit in 10^14 bits (read/write) or something like that. That works out (according to the CERN presentation) to be 1 BIT in 11.6 TiB.
If a) you're concerned about silent data corruption on that scale, and b) that you're running ZFS - make sure you have tape backups. Since there ARE no offline data recovery tools available. Not even at Sun/Oracle. (I asked.)
sor - Monday, September 5, 2011 - link
Inferior how? I've been doing storage evaluation for years, and I can say that software raid generally performs better, uses negligible CPU, and is easier to recover from failure (no proprietary hardware). The only reason I'd want a hardware RAID is for ease of use and the battery-backed writeback.