Data First: The File Server (part 2)

In part 1, we had a look at how to buy and set up a file server. Now you may be asking yourself a few questions about how to actually use that thing. The first question we’ll answer is “what should I put on there?”.

Q: What should I put on my file server?

The answer is two-fold: anything you want to access from more than one device, and anything you can’t easily back up automatically otherwise.

In my case, it means everything:

I surely do want to access my music and movies from a variety of devices. I even access them from work.
I’m accessing my pictures only from my laptop, but my wife also wants to access them from her laptop (both laptops have Lightroom installed), so pictures go on the server. It also means they will be backed up automatically – if they were on one of the laptops, it would be difficult to do that since the machine would most likely be asleep when the backup job kicks in, and in this age of SSD-only laptops you’d run out of space pretty quickly anyway.
My code repositories are on the file server too. I check out the code locally and commit/push changes back to the file server.
Documents, porn, whatever, it’s on there.

Of course there are some caveats. Things may be too slow for you. For instance, if I work in Lightroom, I’ll turn off Wi-Fi and plug the laptop into my home Gigabit network. And even then, it will be noticeably slower than if the pictures were stored locally (but it’s not too bad as far as I’m concerned, since raw picture editing is still the performance bottleneck on my machine). If you’re doing stuff like video editing, that’s not even an option.

When a particular piece of data can’t be efficiently accessed remotely, you can use your file server as the backup device – data would be stored locally on one machine, and backed up automatically to the file server. That’s fine, as long as the backup process is, again, automatic. This generally means the source machine is a desktop computer, so that it’s available during the night, when most backup jobs execute.

I would advise against storing data anywhere other than the file server or an automatically backed-up desktop machine (or otherwise always-on storage unit). Choosing where to put a given piece of data is always a balancing act between where it makes sense, where it’s convenient, and where it’s safe, but remember: when in doubt, always prefer safety.

10 June 2013

Data First: The File Server (part 1)

The central piece to a data-first methodology is, in my opinion, having a file server.

The reason for this is that you’re going to want to access your data anytime, anywhere: streaming your music to your work PC, your movies to your iPad, or accessing your documents from your phone. You need a secure and reliable way to store and serve that data, and this is best done with a file server.

(if you’re already about to ask why I don’t just use iCloud or something, you may need to read my introduction post again)

Let’s look at the basics for getting a file server after the break.

The hardware

Requirements

The absolute minimum requirements are a computer that’s connected to your home network, and that is always on. Most desktop computers would fit the job description, unless you’re one of those weird people that actually turn their computers off instead of letting them go to sleep. Ideally, however, that file server should have a few other requirements:

Low power: it’s going to be always on, so it might as well consume as little energy as possible, not just for the cute baby polar bears but for your electricity bill too.
Quiet: that’s important unless you intend to store it away in your basement or something.
Data redundancy: hard drives fail – it’s not a matter of “if” but a matter of “when”. And when it happens, you don’t want to have to recover your data from a backup. It’s tedious at best, and you may lose the last hour of work even if you’re using things like Time Machine on a Mac. Only with “copy-on-write” snapshot systems would you not lose anything, except maybe the file that was being saved when the disk failed. Shadow Copy, a little-known feature of Windows that has been available for almost 10 years, does exactly that, but nobody really uses it as far as I can tell because Microsoft never did any fancy UIs and marketing like Apple did with Time Machine. And this would mean running Windows on your file server, which would probably defeat the previous 2 bullet points as it would require desktop-grade hardware instead of a smaller dedicated box.

Anyway, I highly recommend you get a file server that handles some kind of data redundancy. RAID-1 mirroring is the minimum (2 disks), variants of RAID-5 with 4 disks are the most common, and anything above that is a nice bonus. With both RAID-1 and RAID-5 you get one “free” disk failure – i.e. you can have one disk dying on you, and if no other disk dies while you replace it and the system rebuilds itself, then you’re all good. Of course, even if it’s quite rare, it does happen that a second disk dies soon after (especially if all your disks are the same model, bought at the same time), so make sure you always have a backup ready (we’ll talk about that later in this article).
Connected with wires: ideally, the file server should be connected via Gigabit Ethernet. You can always hook it up to a Wi-Fi router down the line, but at least you know the bandwidth bottleneck is not at the source. Sure, the latest forms of Wi-Fi can stream HD videos, but how do you think that’s going to scale when you end up having 2 people streaming from 2 different devices while you’re transferring files and there’s a backup job going on? Yeah. If you can, use wires.
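
To make the RAID trade-off a bit more concrete, here is a rough sketch of how much usable space you get out of an array. It assumes identical disks and plain RAID-1/RAID-5 (vendor-specific variants may behave differently), and the 2 TB disk size is just a number picked for illustration.

```python
def usable_capacity_tb(level: str, disks: int, disk_size_tb: float) -> float:
    """Return the usable capacity (in TB) of an array of identical disks."""
    if level == "raid1":
        if disks < 2:
            raise ValueError("RAID-1 needs at least 2 disks")
        # Every disk holds a full copy, so you only get one disk's worth.
        return disk_size_tb
    if level == "raid5":
        if disks < 3:
            raise ValueError("RAID-5 needs at least 3 disks")
        # The equivalent of one disk is spent on parity.
        return (disks - 1) * disk_size_tb
    raise ValueError(f"unsupported RAID level: {level}")


# Illustrative numbers only: 2 TB disks.
print(usable_capacity_tb("raid1", 2, 2.0))  # 2-disk mirror
print(usable_capacity_tb("raid5", 4, 2.0))  # 4-disk RAID-5
```

In both cases you “pay” for the one free disk failure with space: half of a 2-disk mirror, but only a quarter of a 4-disk RAID-5.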

Buying a NAS box

Given these requirements, the easiest way is really to go for a dedicated NAS. Unless you want to go through the trouble of refurbishing an old computer of yours, figuring out how to get the RAID controller to work correctly, and finding a place for a box that’s probably 5 times bigger than it needs to be, that is. You could also build a custom PC, but if you’re considering it, then you’re probably able to figure it out on your own (although I may post a guide in a distant future).

A few years ago, ReadyNAS was all the rage for NASes, with Synology not far behind, but since ReadyNAS was bought by Netgear (which was around the time I bought my NV+), it seems to have fallen behind. Now, the top of the line seems to be Qnap, Thecus, and Synology, if I believe those performance benchmarks.

Have a look at those benchmarks and pick the best models that fit your budget, but do keep an eye out for which configuration is used on each test. For instance, RAID-0 with many bays can out-perform everything else on some tasks, even though in practice it’s probably not a configuration you’d use.

Synology are known to have the most user-friendly administration UIs, so you may want to bias your choice accordingly. You may also be distracted for a while by all the stuff these boxes can do on top of simple file sharing, like BitTorrent clients and web servers and photo galleries and whatnot. Stay focused on their main role: serving files.

The Software

There’s not much to say about the software. If you bought a pre-built NAS as recommended, it will usually come with its own custom Linux-based OS with a fancy administration UI. You will just boot it, set up the shared folders and permissions, and maybe configure some additional sharing services you may need.

If however you built your own NAS, or are using an existing desktop computer, install whatever you know best – MacOS, Windows, or, ideally, a lean Linux distro. Whatever you end up with, make sure your server is easy to remote into. You’re unlikely to hook up a screen and keyboard to it, so you’ll have to use Remote Desktop (on Windows) or SSH (on Mac/Linux) to do any kind of maintenance work. Note that some dedicated NAS systems, like the appropriately named FreeNAS (which is based on FreeBSD rather than Linux), have a web administration panel.

Backups

Once you have your file server running, the first thing you need to do is set up some automated backups.

Let me write this again: YOU NEED TO SET UP SOME AUTOMATED BACKUPS.

I’m not kidding. Don’t say “oh I’m bored, I’ll do it next week-end, I just want to watch Netflix already”. You know perfectly well next week-end you’ll be browsing Facebook and cleaning your house because you had been putting that off for 3 weeks already. So do it now.

Again, if you got a pre-built NAS, this is probably as easy as plugging an external drive into the box and going through the administration panel to find the backup settings. Just do an incremental backup of the whole thing every night to that drive. Bonus points if your backup drive is itself a RAID array.

Your NAS may also have some kind of continuous backup system (sometimes called “snapshots”), so you can enable that too.

If you have a custom box, you’re probably smart enough to set up a scheduled robocopy task (on Windows) or a cronjob running rsync (on Linux/Mac) to back up all your data to a secondary drive. If not, look it up online.
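
For the rsync route, a minimal sketch of what that nightly job could look like is below. The paths are hypothetical, so adjust them to your own data folder and backup drive, and keep in mind that `--delete` turns this into a plain mirror: a file deleted on the server also disappears from the backup on the next run.

```python
import subprocess
from pathlib import Path


def build_rsync_command(source: Path, destination: Path) -> list[str]:
    """Build an rsync invocation that mirrors source into destination."""
    return [
        "rsync",
        "-a",          # archive mode: recurse, keep permissions and timestamps
        "--delete",    # remove files from the backup that are gone from the source
        f"{source}/",  # trailing slash: copy the folder's contents, not the folder itself
        str(destination),
    ]


def run_backup(source: Path, destination: Path) -> None:
    """Run the backup; raises CalledProcessError if rsync reports a failure."""
    subprocess.run(build_rsync_command(source, destination), check=True)


# Hypothetical paths, printed here as a dry run instead of being executed.
print(" ".join(build_rsync_command(Path("/srv/data"), Path("/mnt/backup/data"))))
```

To make it automatic, you would have the script call `run_backup` and schedule it from cron, e.g. with a crontab line such as `0 3 * * * /usr/bin/python3 /home/me/backup.py` (again, a made-up path) to run it every night at 3 AM.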

What next?

In the next parts, we’ll discuss a couple things, like what should actually go on that new fancy file server of yours.

8 June 2013

PieCrust 1.0

PieCrust reached the big milestone of version 1.0 without much fanfare – and this post won’t be any different from the other release announcements. After a few release candidates I figured I would never be quite satisfied, so why not just keep going with the releases and not pay too much attention to the first digit.

You’ll see releases 1.1.0 and up coming soon, with the usual bunch of fixes, changes, and new features. The only difference is that the version number will now reflect better what’s going on, since I’ll be loosely following the semantic versioning specification. In a nutshell, the digit being incremented reflects whether a release is a bug fix, a non-breaking change, or a major and/or breaking change.
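
If you’re not familiar with semantic versioning, the rule can be sketched in a few lines. This is only an illustration of the general scheme, not PieCrust’s actual release tooling, and the `bump` function and its change names are made up for the example.

```python
def bump(version: str, change: str) -> str:
    """Return the next version number for a given kind of change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "breaking":
        # Major and/or breaking change: first digit goes up, the rest reset.
        return f"{major + 1}.0.0"
    if change == "feature":
        # Non-breaking change: second digit goes up, patch resets.
        return f"{major}.{minor + 1}.0"
    if change == "fix":
        # Bug fix: only the last digit goes up.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown kind of change: {change}")


print(bump("1.0.0", "fix"))
print(bump("1.0.1", "feature"))
print(bump("1.1.0", "breaking"))
```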

The one big new thing that comes with version 1.0 is an installer script, along with a .phar binary, to make it easier for people to use PieCrust if they don’t want or need the source code. Head over to the PieCrust documentation for more information.

For the rest of the changes, keep reading.

Auto-formats

One popular request has always been to make it possible for users to write pages and posts using extensions other than .html – most specifically .md or .markdown. This is now possible with the auto-format feature, which maps extensions to formats. As of 1.0, no auto-format is declared by default, so you have to specify the ones you want in your config.yml:

site:
    auto_formats:
        md: markdown
        markdown: markdown
        textile: textile

The example above maps extensions .md and .markdown to the Markdown format (same as if you specified format: markdown in the page’s config header), and extension .textile to the Textile format.

As of version 1.1, .md and .textile will be defined by default.
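
Conceptually, the mapping boils down to a lookup on the file extension. The sketch below is not PieCrust’s actual code (PieCrust is written in PHP, and `resolve_format` is a name I made up); it only shows the idea, with `None` standing for “no auto-format applies, do whatever would normally happen”.

```python
from pathlib import PurePosixPath
from typing import Optional

# Same mapping as the config.yml example above.
AUTO_FORMATS = {
    "md": "markdown",
    "markdown": "markdown",
    "textile": "textile",
}


def resolve_format(page_path: str, auto_formats: dict) -> Optional[str]:
    """Return the format mapped to the page's extension, or None if there is none."""
    extension = PurePosixPath(page_path).suffix.lstrip(".").lower()
    return auto_formats.get(extension)


print(resolve_format("posts/hello.md", AUTO_FORMATS))
print(resolve_format("about.textile", AUTO_FORMATS))
print(resolve_format("index.html", AUTO_FORMATS))
```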

Template data changes

Some page template variables have been changed:

asset is now assets.
link is now siblings, and returns the page’s sibling pages (i.e. in the same folder).
There’s a new family variable that returns a recursive version of siblings (i.e. sibling pages and all the children pages in sub-directories).

The old names are still available, but will trigger warnings when you bake.
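
To illustrate the difference between siblings and family, here is a small sketch working on a made-up list of page paths. It is not how PieCrust implements these variables, and whether the current page itself is part of the result is an assumption on my part (it is excluded here).

```python
from pathlib import PurePosixPath

# Hypothetical site layout, for illustration only.
PAGES = [
    "about.html",
    "docs/intro.html",
    "docs/setup.html",
    "docs/api/index.html",
]


def siblings(page: str, all_pages: list) -> list:
    """Pages living in the same folder as the given page."""
    folder = PurePosixPath(page).parent
    return [p for p in all_pages
            if p != page and PurePosixPath(p).parent == folder]


def family(page: str, all_pages: list) -> list:
    """Siblings plus every page in the sub-folders below them."""
    folder = PurePosixPath(page).parent
    return [p for p in all_pages
            if p != page and folder in PurePosixPath(p).parents]


print(siblings("docs/intro.html", PAGES))
print(family("docs/intro.html", PAGES))
```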

Feed preparation

The chef prepare command can now create more than pages and posts for you: you can run chef prepare feed and it will create a boilerplate RSS feed page for you.

You can specify --atom to create a boilerplate Atom feed instead.

Plugin update

If your website has some plugins, you can update them easily with the new chef plugins update command. Right now it will just stupidly re-download the plugins from their source, so it may re-install the same version, but that’s enough for now 🙂 It’s especially handy if you have some custom plugin that’s used by several websites.

Sass, Compass and YUICompressor

Speaking of plugins, the previously plugin-implemented Sass, Compass and YUICompressor processors are now part of the core PieCrust code.

They have also been improved in the process. Most importantly, Compass support is a lot better.

Miscellaneous changes

The monthly blog archives (blog.months) were incorrectly ordered chronologically, instead of reverse-chronologically. This is now fixed.
Anywhere that returns a list of pages or posts should now have consistent behaviour and features, e.g. filtering template functions.
You can get access to Twig’s debug functions by setting the twig/debug site configuration variable to true.
If you want PieCrust to use the JavaScript lessc compiler to process LessCSS stylesheets, set the less/use_lessc site configuration variable to true.
Pretty colors for chef commands on Mac/Linux! (this is important)

For the complete list of changes, see the CHANGELOG.

7 June 2013