The Stochastic Game

Ramblings of General Geekery

DRM-free backup on Comixology

Me, a few months ago, after the “scandal” of Comixology removing the ability to buy comics directly from inside their iOS app:

I would hope ComiXology manages to revert the change, but frankly I’d rather put my hopes in more DRM-free comics available directly from the creators and publishers instead.

Well, my hopes have been answered, in a way: Comixology announced last week that you can now download DRM-free versions of your Comixology books, for publishers who are OK with that:

The first wave of participating publishers making their books available as DRM-free backups include Image Comics, Dynamite Entertainment, Zenescope Entertainment, MonkeyBrain Comics, Thrillbent, and Top Shelf Productions. In addition, creators and publishers that are self-publishing through comiXology Submit are now able to choose to make their books available with a DRM-free backup.

No surprises here about which publishers are “OK with that”: they’re the ones who were already offering DRM-free comics on their own websites… but this is excellent news. I can’t stress enough how huge this is.

I’m not sure whose idea it was – whether publishers like Image pressured Comixology to do this, or whether Comixology came to this logical conclusion on their own – but I’m very happy either way. As I said before, I had completely stopped buying Image comics from Comixology, preferring instead their own DRM-free website… but that website was slow as hell and barely usable. Ideally I’d rather give 100% of my money to Image, instead of – probably – 70% through Comixology, but the usability is night and day between the two, and uploading independently acquired files to an iPad is still a huge pain in the ass1.

“For those out there who have not joined the comic reading community because of DRM – you have no excuse now,” said co-founder and Director of ComiXology Submit John D. Roberts


The only problem I’ve found so far is that those backups are extremely bare: just a ZIP file with the pages as JPEG images. They’re the “retina” hi-res versions, so that’s good, but the archive is missing any kind of metadata. The only way to know what it is, short of having a human open it and read the cover, is to parse the file name.
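If you want to script around these bare backups, Python’s standard zipfile module is enough to list the pages. Here’s a minimal sketch. It assumes that the page file names sort in reading order, and the guess_title helper is hypothetical: it simply turns the archive’s file name into readable text, since that name is the only clue available.

```python
import zipfile
from pathlib import Path

PAGE_EXTENSIONS = (".jpg", ".jpeg")


def list_pages(archive):
    """Return the JPEG page names inside a backup ZIP, in reading order.

    `archive` can be a path or an open binary file object.
    Assumption: page file names sort in reading order.
    """
    with zipfile.ZipFile(archive) as zf:
        names = [n for n in zf.namelist() if n.lower().endswith(PAGE_EXTENSIONS)]
    return sorted(names)


def guess_title(archive_path):
    """Hypothetical helper: with no metadata in the archive, the file
    name is the only clue, so turn its stem into readable text."""
    return Path(archive_path).stem.replace("_", " ").strip()
```

For example, guess_title("Saga_001.zip") returns "Saga 001", which is about as far as you can get without someone opening the file and looking at the cover.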

  1. Something I’m hoping will be greatly improved in iOS 8. ↩︎

ComiXology Scandal

Unless you’ve been living under a rock, you can’t have missed the news that ComiXology released a new version of their mobile app that drastically changes how comics are purchased. It was reported on technology, gadget, Apple-related, and of course comic book-related websites. It was even discussed heavily on RPG forums.

A summary of the situation is that:

  • The iPad/iPhone app doesn’t have in-app purchases anymore – you’re forced to buy directly from the web by switching to Safari.
  • The Android app still has in-app purchases, but as I understand it they don’t go through Google Play anymore and, instead, directly hit ComiXology’s servers.

Of course, the internet being what it is, a lot of people are pissed off and are voicing their rage on social networks. I’m not happy with the change either but I’m going to try and articulate my more moderate opinion in a few points here.

It’s probably too soon

The comics industry was in very, very bad shape until recently. Digital comics revived a moribund market in a completely unprecedented way, largely thanks to ComiXology on the iPad. Digital comics let new readers discover series at their own pace without having to enter an intimidating comic book shop and browse through stacks of TPBs to find the first story arc. When everything is just a tap away, especially single $2 issues instead of $15 collected volumes, it’s much easier to try things and, eventually, start following one of them. Impulse buying was a big part of ComiXology’s success and the market’s recovery.

But I’m not sure the market has recovered enough at this point. Adding several extra steps between the reader and a purchase may discourage a big percentage of users who are still casual readers and not “fans” yet, and effectively put a stop to the momentum built up over the past couple of years.

Profit trumps user experience

It is clear now that this change was made to align ComiXology with Amazon’s strategy after Amazon acquired it. Amazon is a company that has always walked an extremely fine line of near-zero margins in almost all aspects of their business.

So it’s not surprising that they’re first doing to ComiXology what they did with the Kindle app: avoiding the 30% cut that Apple and Google take on purchases made through their in-app purchasing systems. And it makes sense to do so when you already have your own micro-transaction infrastructure in place, which is the case with Amazon.

The problem is that Apple completely forbids developers from using their own payment system inside an iOS app… and in this case, Amazon chose its margin over its users’ experience.

That’s extremely disappointing but, again, not surprising coming from Amazon.

Glossing over details

Another disappointing aspect of this whole affair was how unclear the announcement was. This is the email I received:

Dear Comics Enthusiast,

We have introduced a new comiXology iPhone and iPad Comics app, and we are retiring the old one. All your purchased books will be readable in the new app once you’ve downloaded it and taken the following steps:

  • In the original Comics app, log into your comiXology account.
  • Sync your in-app purchases to your comiXology account by tapping the Restore button on the Purchases tab.
  • Download the new comiXology app. This will be your new home for downloading and reading comics.
  • Start shopping on New purchases will appear in the “In Cloud” tab in our new app.

Read this a couple times and tell me if you would have understood what it was all about. It says there’s a new app, but it never says why. Why are they switching to a new app instead of just updating the same one? And where does it say you won’t be able to purchase directly from the app anymore?

This is unacceptably bad communication.

It’s unclear where the money goes

And what happens with that 30% that ComiXology is going to save on each transaction? It’s totally unclear whether this will be redistributed in any way to the creators and publishers.

It would have been extremely easy for ComiXology to mention that more money will go to creators in order to get all fans behind the change. Instead, we’re left to assume this all goes into Jeff Bezos’ pockets… probably because that’s exactly what will happen.

Not really a change for me

That said, since the beginning of the platform, I’ve been buying comics directly on their website in the hope that this meant more money for the creators… and if not, at least I was giving more money to a small but growing company that was making the industry better. So the new iPad app is effectively not changing anything as far as I’m concerned… except that now I’m not sure where this extra money goes anymore.

Pricing and delayed releases

Some people have mentioned that moving to a true web store will let ComiXology and publishers use finer-grained pricing, e.g. comics sold at $1.50 (Apple enforces price points of $0.99, $1.99, $2.99, and so on). This may prove beneficial, but given that it’s Amazon we’re talking about, it may also prove to be another opportunity to put pressure on publishers’ margins.

It will also remove the occasional hassle of issues being delayed, or even blocked, by Apple’s crazy stupid approval process because – shocking! – some of them contain adult material. But then again, it was easy enough to switch to the web store for only those rare issues.

ComiXology is becoming obsolete anyway

Another reason I’m less annoyed by this change is that ComiXology was having a decreasing presence in my reading habits anyway. Image Comics has been offering DRM-free comics for a while now, so I effectively stopped buying anything from Image on ComiXology. Most Marvel titles I don’t really need to own, so I’m reading them through Marvel Unlimited. This leaves DC/Vertigo titles and indie comics, and those are increasingly purchasable directly from the author…


So all in all, I don’t care that much about the change from a personal user experience point of view, but it does make me worried about the future of the industry. It also doesn’t shine a good light on Amazon – although I guess that’s the least of their worries.

I would hope ComiXology manages to revert the change, but frankly I’d rather put my hopes in more DRM-free comics available directly from the creators and publishers instead.

Meeting Notes

These past couple years my free time has been consumed by work on PieCrust, Wikked, and, oh, yeah, having 2 kids and 2 cats (what I was thinking, I don’t know). As a result, I haven’t been playing music or drawing much, which I miss a lot.

So I started doing it at work. Well, not playing music, because a drumset in the middle of the open-space would probably be frowned upon, but drawing and doodling.

The result is a whole bunch of post-it notes with some pretty decent art, which I’ve collected over on a “Meeting Notes” page. Check it out!

Wikked Performance

Since I announced Wikked here, I’ve been mostly working on fixing bugs, editing the documentation1, and evaluating its performance – which is what we’ll look at here today.

The big question I wanted to answer was how far you can go with just the default configuration, which is based on SQLite and requires no setup from the user. The reason for this was twofold:

  • I needed to write some advice in the documentation about when you should start looking into more sophisticated setups.
  • I plan to set up a public test wiki where people can try Wikked directly, and I needed to know if it would go down after I post the link on Reddit or HackerNews.

Initial assessment

The first thing I did was to figure out the current status of the code. For this, I took the first stress-test service I could find (which was Load Impact), and got my own private wiki tested.

  • This private wiki runs on the same server as this blog, which is a fairly under-powered server since almost all of my public websites are just static files, thanks to PieCrust: it’s a Linode VPS with only 512MB of RAM.
  • The test requests a dozen different pages from the website, continually for around 10 seconds, with only a fraction of a second between each request. It increases the number of “users” running that test over time.

Here are some of the results:

As you can see, as the number of concurrent users increases, the average page load time stays under a second, at around 800ms. Then, at around 20 concurrent users, things break down horribly and a page can take between 3 and 10 seconds to load.

For a website running with SQLite on a server so small that Linode doesn’t even offer it anymore2, and designed mainly for private use, I think it’s pretty good. I mean, I initially didn’t plan for Wikked to serve groups larger than 10 or 15 people, let alone 20 people at the same time!

Still, I can obviously do better.

Request profiling

Werkzeug supports easy profiling of requests, so I added an option for that and looked at the output in QCacheGrind3. As I thought, pretty much all the time is spent running the SQL query to get the cached page, so there’s little opportunity to optimize the overall application’s Python code.
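Werkzeug ships a ready-made profiler middleware for this (werkzeug.middleware.profiler.ProfilerMiddleware in recent versions). To show what such a middleware does without any dependency, here’s a rough stdlib-only sketch that wraps a WSGI app in cProfile and dumps one .prof file per request. The profiling_middleware name and the file naming scheme are hypothetical, not Wikked’s or Werkzeug’s code. To open the output in QCacheGrind you still need to convert it to the callgrind format, for example with the third-party pyprof2calltree tool.

```python
import cProfile
import os
import time


def profiling_middleware(app, profile_dir):
    """Wrap a WSGI app so that every request runs under cProfile.

    One .prof file per request is written to `profile_dir`; the name
    combines the request path and a nanosecond timestamp.
    """
    os.makedirs(profile_dir, exist_ok=True)

    def wrapped(environ, start_response):
        def handle():
            result = app(environ, start_response)
            try:
                # Consume the whole body while profiling, so work done
                # lazily inside a generator is measured too.
                return list(result)
            finally:
                if hasattr(result, "close"):
                    result.close()

        profiler = cProfile.Profile()
        body = profiler.runcall(handle)
        path = environ.get("PATH_INFO", "/").strip("/").replace("/", ".")
        filename = f"{path or 'root'}.{time.time_ns()}.prof"
        profiler.dump_stats(os.path.join(profile_dir, filename))
        return body

    return wrapped
```

The resulting files can also be inspected directly with Python’s pstats module if you don’t want to install a GUI viewer.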

In Wikked, SQL queries are done through SQLAlchemy. This is because even though those queries are simple enough that even I could write them by hand, there are subtle differences in SQL dialects depending on the database implementation, especially when it comes to schema creation. I figured I would bypass the ORM layer if I ever needed to in the future.

SQLAlchemy can be forced to log all SQL queries it generates, and that highlighted many simple problems. I won’t go into details but it boiled down to:

  • A couple of unnecessary extra queries, which came from my object model lazily loading stuff from the database when it didn’t need to.
  • Loading more columns than needed for the most common use-case of reading a page. Some of them would generate JOIN statements, too.
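In SQLAlchemy, the switch for logging generated SQL is create_engine(..., echo=True). As a dependency-free sketch of the same idea, Python’s built-in sqlite3 module can report every executed statement through Connection.set_trace_callback. The pages table and the open_logged_connection helper are hypothetical, not Wikked’s actual schema; the last query illustrates selecting only the columns needed to display a page.

```python
import sqlite3


def open_logged_connection(path, log):
    """Open a SQLite connection that appends every executed SQL
    statement to `log` (a plain list here, for illustration)."""
    conn = sqlite3.connect(path)
    conn.set_trace_callback(log.append)
    return conn


queries = []
conn = open_logged_connection(":memory:", queries)
conn.execute(
    "CREATE TABLE pages (url TEXT, title TEXT, raw_text TEXT, formatted_text TEXT)"
)
conn.execute("INSERT INTO pages VALUES ('/home', 'Home', 'raw', '<p>formatted</p>')")

# Displaying a page only needs the title and the formatted text, so
# select just those instead of also pulling the raw source.
row = conn.execute(
    "SELECT title, formatted_text FROM pages WHERE url = ?", ("/home",)
).fetchone()

for statement in queries:
    print(statement)
```

Reading the printed log is usually enough to spot a query that runs more often than expected, or one that drags in columns nobody uses.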

I also realized I was doing my main query against an un-indexed column, so I changed the schema accordingly… derp duh derp (I’m a n00b at this stuff).
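SQLite will tell you when a query scans the whole table instead of using an index, via EXPLAIN QUERY PLAN. Here’s a minimal sketch with a hypothetical pages table keyed by URL; in SQLAlchemy the equivalent schema change is declaring the column with Column(..., index=True).

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN details for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable plan step.
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pages (id INTEGER PRIMARY KEY, url TEXT, title TEXT, formatted_text TEXT)"
)
lookup = "SELECT * FROM pages WHERE url = ?"

before = query_plan(conn, lookup, ("/home",))  # no index: full table scan
conn.execute("CREATE INDEX idx_pages_url ON pages (url)")
after = query_plan(conn, lookup, ("/home",))  # now an index lookup

print(before)
print(after)
```

The exact wording of the plan steps varies between SQLite versions, but the first plan reports a SCAN of the table and the second a SEARCH using idx_pages_url.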


Now I was ready to run some more stress tests and see if those optimizations made a difference. But although Load Impact is a very cool service, it’s also a commercial service and I was running out of free tests. I didn’t want to spend money on this, since this is all just hobby stuff, so I looked for an alternative I could set up myself.

I found a pretty neat library called FunkLoad, which does functional and load testing. Perfect!

I started 4 Amazon EC2 instances, wrote an equivalent test script, and ran the test. To make it work, I had to install FunkLoad from source (as opposed to from pip), and troubleshoot some problems, but it worked OK in the end.
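FunkLoad has its own test-case API, and this is not it. As a rough illustration of what such a stress test does (a dozen URLs fetched in a loop with a short pause between requests, with the number of simulated users stepped up), here’s a stdlib-only Python sketch. The function names, the 0.2 second default pause and the thread-per-user model are assumptions; a real tool also handles errors, ramp-up curves and running from several machines.

```python
import itertools
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def run_user(urls, duration, pause):
    """One simulated user: cycle through `urls` for `duration` seconds,
    sleeping `pause` seconds between requests.

    Returns the response time of every request, in seconds.
    """
    timings = []
    deadline = time.monotonic() + duration
    for url in itertools.cycle(urls):
        if time.monotonic() >= deadline:
            break
        start = time.monotonic()
        with urllib.request.urlopen(url, timeout=30) as response:
            response.read()
        timings.append(time.monotonic() - start)
        time.sleep(pause)
    return timings


def load_test(urls, user_counts, duration=10.0, pause=0.2):
    """Run the same scenario with more and more concurrent users.

    Returns {users: (average_seconds, worst_seconds)} for each step.
    """
    results = {}
    for users in user_counts:
        with ThreadPoolExecutor(max_workers=users) as pool:
            futures = [pool.submit(run_user, urls, duration, pause)
                       for _ in range(users)]
            timings = [t for future in futures for t in future.result()]
        results[users] = (statistics.mean(timings), max(timings))
    return results
```

Something like load_test(urls, [10, 20, 40]) against a list of (hypothetical) wiki page URLs gives the average and worst page load for each step, which is the kind of number discussed below.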

Without my optimizations, I got slightly better average page loads than before – probably coming from the fact that both my EC2 instances and my Linode server were on the west coast, whereas Load Impact was running from the east coast.

With the optimizations, however, it looked a lot better:

As you can see, Wikked on my small server can now serve 40 concurrent users without breaking a sweat: 300ms on average, and always less than 1s. And it could probably handle up to 50 or 60 concurrent users if you extrapolate the data a bit.

Moar hardware!

Next, I figured I would try to see if it made any difference to run the same setup (Wikked on SQLite) on a beefier server. I launched an EC2 instance that’s way better than my Linode VPS, with 3GB of RAM and 2 vCPUs.

Well: yes, it does make a difference. This bigger server can serve 80 concurrent users while staying under the 1 second mark most of the time. Yay!


Those numbers may not seem like much but this is as good a time as any to remind you that:

  • I’m sticking to sub-1s times as the limit, because I like fast websites. But I could easily move the limit up to 1.5 seconds and still be within a generally acceptable range (e.g. from my home laptop, Wikipedia serves its pages in around 1.3 seconds).
  • This is about testing the simplest Wikked setup, based on SQLite, because that means the easiest install experience ever compared to other wikis that need a proper SQL server. And SQLite is notoriously limited in terms of concurrent access.
  • Serving even just 40 concurrent users is actually quite a lot. If you assume, say, 10 minutes per visit on average, that’s around 240 visitors per hour, or 1920 visitors per day over an 8-hour window if they’re mostly coming from the same time zone. That’s more than 50,000 visitors a month4.
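The back-of-the-envelope numbers above can be written down as a tiny function. The 8 busy hours per day are implied by 1920 / 240, and the 30-day month is an assumption; the function name is just for illustration.

```python
def estimate_visitors(concurrent_users, minutes_per_visit, busy_hours_per_day,
                      days_per_month=30):
    """Rough traffic estimate for a site that can sustain
    `concurrent_users` simultaneous visitors."""
    per_hour = concurrent_users * 60 / minutes_per_visit
    per_day = per_hour * busy_hours_per_day
    per_month = per_day * days_per_month
    return per_hour, per_day, per_month


print(estimate_visitors(40, 10, 8))  # (240.0, 1920.0, 57600.0)
```

Shorter visits or a wider spread of time zones push the numbers up, which is why 50,000 visitors a month is a conservative reading.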

Still, this is my first real web application, so there’s probably even more room for improvement. I’m always open to suggestions and constructive criticism, so check out the code and see if you can spot anything stupid!

In the meantime, I’ve got some documentation to update, and a public test wiki to set up!

  1. It’s still missing a custom theme and a fancy logo, by the way. That will be coming as soon as I have any actual idea of what to do there! ↩︎

  2. That’s a referral link, by the way. ↩︎

  3. It’s not a typo. QCacheGrind is a Qt version of KCacheGrind, so that you don’t need to install KDE libraries, and it looks slightly less terrible. ↩︎

  4. The real issue is however how your site will behave if all of a sudden a lot of those visitors arrive at the same time. This is probably not uncommon if you have the kind of wiki where there can be announcements posted to a mailing list or a Facebook group, which can in turn get a lot of members to click the same link. ↩︎

Announcing Wikked

There haven’t been any updates on this blog for a few months, and there was a good reason for that: I was working on something new.

The problem is that I was trying to get this new project to a “good enough” state to launch publicly… but somehow I ended up in a seemingly infinite loop of improvements, refactorings, and bug fixing.

Eventually I snapped out of it: fuck it, let’s launch it as is, and see if anybody cares enough to complain that it’s not good enough. I wrote some basic documentation, fought with setuptools for packaging, and uploaded it to the Python package server.


So lo and behold, here is Wikked, a wiki engine entirely managed with text files sitting in a revision control system.

I think it’s pretty cool, so come read more about it after the break!


You’re too lazy to follow the link to the documentation? Here’s your quick start:

pip install wikked
wk init mywiki
cd mywiki
wk runserver

Text files again?

Yes, this is “Part 2” of my personal crusade to both learn about web technologies and have all my data in text files inside Mercurial or Git. I find it so much easier to manage and backup than some piece of data trapped in an SQL database or something.

It’s obviously not a magic bullet – for one, it doesn’t scale well – but for personal websites I find that it’s perfect.

What’s next

The plan for Wikked is to stabilize it, of course: fix any bugs, make it easier to deploy, make it more configurable. I also expect I’ll have to add proper support for Git, as right now only Mercurial is fully supported for storing page revisions.

Then, it needs a demo website. There’s one already, actually, but I need to make it a bit more solid, like a cron job that resets it to its original state every night.

Last, I want to get some proper feedback about the Wiki Syntax. It was mostly thrown together as I found I needed something for my own wiki, but I’m still not 100% happy about it.

Fly away, monkeys!

That’s it for now. Be sure to send me some feedback, and to report bugs. Especially the part about reporting bugs, because this thing has never seen any other computer than my laptop and my VPS, so it’s pretty much the mother of all “works on my machine”.

Enjoy! 🙂

PieCrust 1.1

It’s been long overdue, since PieCrust 1.0 was released more than 4 months ago, but at last it’s here: PieCrust 1.1!

Pumpkin pie

Every time I figure I will go with a “release small, release often” kind of philosophy, I still end up with more of a “wait, I’ll just get this last feature ready first” kind of vicious circle… sigh.

Anyway, grab the new release, or keep reading if you want to know about the most important changes. As always, big thanks go to the people who reported bugs and/or helped fix them, or generally participated in the evolution of PieCrust.

Removing deprecated stuff

The first, most important change is that anything that was marked as deprecated, and that usually triggered a warning message if you used it, has been removed. So if you’re still using it, it will just break or, worse, silently do something else.

Make sure you don’t have any of those warnings before you update.

Self updating

If you’re running PieCrust from an installed binary (a .phar file), you will be able to update it easily with the chef selfupdate command… well, not this time (you’ll have to re-run the installer), but next time!

By default, the installer gets you the stable version of PieCrust, so running the selfupdate command will always get you the latest stable release. But you can also switch to the master branch (where development happens) by running chef selfupdate master. From then on, running selfupdate with no argument will get you the latest master version, until you switch back with chef selfupdate stable.

I hope this will encourage people to update more often, and to not be afraid to try things out on the master branch.

Also, it’s probably not working 100% so make sure you report any issues with the self-updater 🙂

Post iterator improvements

The post iterator, the thing you get when you want to loop over pagination.posts or site.pages, has a few new tricks:

  • Each page object that it returns now has access to the assets of that page. This is pretty handy if you want to display thumbnails or something.
  • The iterator itself now has a few “magic” functions to make simple filtering easier and faster. You can use is_foo(value) or has_foo(value) directly on the iterator to filter pages whose foo setting is equal to value, or is an array that contains value, respectively. This saves you from having to use filter() and define a filter in the header.
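PieCrust is written in PHP and these functions are called from templates, so the following is not its code. It’s a hypothetical Python sketch of the idea: an iterator whose unknown is_<setting> and has_<setting> attributes turn into filters on each page’s settings, so filters can also be chained.

```python
class PostIterator:
    """Hypothetical sketch of an iterator with "magic" filter methods.

    is_<setting>(value):  keep pages whose <setting> equals value.
    has_<setting>(value): keep pages whose <setting> is a list
                          containing value.
    """

    def __init__(self, pages):
        self._pages = list(pages)

    def __iter__(self):
        return iter(self._pages)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("is_"):
            setting = name[len("is_"):]
            return lambda value: PostIterator(
                p for p in self._pages if p.get(setting) == value)
        if name.startswith("has_"):
            setting = name[len("has_"):]
            return lambda value: PostIterator(
                p for p in self._pages
                if isinstance(p.get(setting), (list, tuple))
                and value in p[setting])
        raise AttributeError(name)
```

For example, posts.is_category("news").has_tags("php") keeps the pages whose category is "news" and whose tags list contains "php".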

Temporary caching

You likely have something in your website layout that has to be computed for each page, but ends up being the same all the time during a single bake operation. To speed this up, there’s a new pccache operator that you can use.
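The pccache operator itself lives in PieCrust’s (PHP) templates; what follows is only a hypothetical Python sketch of the underlying idea. It uses a cache that lives for a single bake, so an expensive value like a tag cloud is computed for the first page and reused for every other page.

```python
class BakeCache:
    """Hypothetical cache that keeps values for the duration of one bake."""

    def __init__(self):
        self._values = {}

    def get_or_compute(self, key, compute):
        # Compute the value the first time a key is requested,
        # then hand back the stored result for every later page.
        if key not in self._values:
            self._values[key] = compute()
        return self._values[key]

    def clear(self):
        # Called at the start of each bake so stale data never leaks
        # from one bake into the next.
        self._values.clear()


def build_tag_cloud(posts):
    """Count how many posts use each tag."""
    counts = {}
    for post in posts:
        for tag in post["tags"]:
            counts[tag] = counts.get(tag, 0) + 1
    return counts
```

Clearing the cache between bakes is the important design choice: within one bake the tag cloud can’t change, but across bakes it can.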

Check out the documentation, with an example where it’s used to only compute a tag cloud in a sidebar once per bake. That example is incidentally from my own blog, and it cut the baking times in half… so yeah, I highly recommend it.

New baking infrastructure

PieCrust is now using a brand new system to keep track of what it’s baking now, compared to what was baked last time. This means that it’s now possible to delete files that we know we created last time, but are not valid anymore:

  • A page or asset that was deleted.
  • A page that doesn’t generate as many sub-pages as before.
  • A whole bunch of files that moved because the post_url, or some other URL-generation setting, changed.
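The principle fits in a few lines: if each bake records the output files it wrote, anything recorded by the previous bake but not by the current one is stale. Here’s a minimal Python sketch, assuming the record is just a list of paths relative to the output directory; PieCrust’s actual record format and code are different.

```python
import os


def stale_outputs(previous_record, current_record):
    """Files written by the previous bake but not by the current one."""
    return sorted(set(previous_record) - set(current_record))


def delete_stale_outputs(output_dir, previous_record, current_record):
    """Remove stale files under output_dir and return what was removed.

    Only paths that the previous bake itself recorded are candidates,
    so files the baker never created are left alone.
    """
    removed = []
    for rel_path in stale_outputs(previous_record, current_record):
        full_path = os.path.join(output_dir, rel_path)
        if os.path.isfile(full_path):
            os.remove(full_path)
            removed.append(rel_path)
    return removed
```

Restricting deletion to previously recorded files is what makes this safe: a file you dropped into the output directory by hand is never in a bake record, so it is never deleted.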


A few other noteworthy changes:

  • The .md and .textile file extensions are now added to the auto_formats by default, which means any file (page or blog post) with either extension will be treated as Markdown or Textile respectively.
  • The concept of “variants”, i.e. different versions of your website’s configuration, is now generalized to the whole of chef. See the documentation about it.
  • There’s no sample website anymore. If you’re feeling nostalgic, however, you can get back that ugly piece of blue as a theme.

That’s it! Grab the new version by re-running the installer, or getting the new source code from Github or BitBucket (where you can also report any issues).

Fucking pick one

Paul Stamatiou has been getting a lot of attention for his article “Android Is Better”. And beyond the obvious flamebait (which seems to be working quite well), he makes a couple of points that I agree with:

  • Most people probably use more Google services (for good or bad) than Apple services, and will find the Android experience better integrated if they tried it.
  • Notifications on Android are a million times more useful and productive than on iOS.
  • It’s a lot easier to customize your phone to your specific workflows.
  • The back button and intents make it a lot easier to work between apps.

These are actually the main points that made me switch to Android a couple years ago, along with a bigger screen.

There are some points I disagree with, however:

  • Google Now is not “magical”. It’s downright creepy and makes your device slow.
  • I don’t find Android’s UI inherently better or more elegant than iOS’, or vice-versa. I’m used to both either way.
  • You still find a lot more polished and refined apps on iOS, which is not to say they are more useful or functional, as people often mix up the two (if anything, Android’s ugly apps actually do more things). But since I’m not an app-whore – I must have only a dozen non-stock apps on my phone and they’re almost all cross-platform – I frankly don’t care. The only app I miss is Sparrow, but that bird is flying away.

Marco Arment has written a nice commentary on the story, where he first criticizes Stamatiou’s use of absolute statements (emphasis his):

Paul’s headline is his thesis, conclusion, and call to action: Android is better, and everyone should try it and will likely convert like he did. But after reading the article, I’m more convinced than ever that the best mobile platform for me is currently iOS.

That sentence contains two huge qualifiers: the best mobile platform for me is currently iOS. I’ve learned to write and think with a broader view, since it’s less insular and more accurately reflects reality. (The world is a big place.)

While reading Paul’s article, I was often struck by how differently he and I use the same technology.

His article exudes a narrow tech-world view by having no such qualifiers.

That’s fine, and as a guy who has always chosen his tech (hardware and software) based on specific needs, and not on generic opinions and reviews, I couldn’t agree more. I often say that if I ask a question like “what is the best X?”, and someone answers “it’s Y!” without even asking for more details about my situation first, I’m probably not going to listen to that person, quietly labeling him as a fanboy or short-sighted in my mental notebook.

I wish Marco would talk to his online buddies about this, actually. For example, MG Siegler once wrote:

I don’t know about you, but when I read my favorite technology writers, I want an opinion. Is the iPhone 4S the best smartphone, or is it the Galaxy Nexus? I need to buy one, I can’t buy both. Topolsky never gives us that. Instead, he pussyfoots around it. One is great at some things, the other is great at others. Barf.

Fucking pick one. I bet that even now he won’t.

Maybe he just doesn’t read reviews like I do. I just want a reliable opinion of what a product does well, and what it doesn’t. And then I’m going to decide which one is the best, based on what I need. But apparently, Siegler wants somebody to tell him which one is the best.

And then there’s John Gruber. I’m pretty happy with my Nexus 7, myself, but apparently “most people […] agree it was a turd”. In comparison, his first-generation iPad “works just as well as the day [he] bought it”. But oh, wait:

Update: A lot of pushback from readers on my claim above, arguing that their first-gen iPads have been rendered slow and unstable by iOS 5 (the last OS to support the hardware). My son uses mine for iBooks, watching movies, and playing games. Mileage clearly varies with other apps. (And yes, the App Store app in particular is a bit crashy.)

So yeah, mileage clearly varies on the iPad, but not the Nexus 7. And funnily enough, my iPhone 3G was also rendered slow and unstable by iOS 4, the last OS to support the hardware. If I was paranoid, I would think Apple likes to leave users with a broken device to force them to upgrade, but hey, your mileage may vary, maybe your iPhone 3G is doing great.

In the end, it’s important to keep in mind that everybody’s got different requirements, budgets and usage patterns. One thing that often gets overlooked by Apple fans, for instance, is that in some countries (like here in Canada) you can’t get an iPhone unless you spend a minimum of $40-ish/month on a data plan. If you want a cheaper plan like mine (I use a 500MB plan, which lets me do everything I want except streaming music/video), you have no choice but to go with another OS.

As far as I’m concerned, my iPad and my Nexus 7 get along fine in my backpack, and they must know I love them just the same – just for different things.

Image Comics going DRM-free

Image Comics is now selling DRM-free digital comics on their website.

This is huge. Image Comics is the third-biggest comics publisher in the U.S., after DC and Marvel. It “owns” famous titles like Spawn or The Walking Dead, and other very good series like Fatale, Invincible, Saga, or Morning Glories (I say “owns” with quotes because the whole concept of Image Comics is to publish creator-owned comics, so those titles are actually owned by their respective authors).

With Comic-Con only a couple of weeks away, they’re probably hoping (and I’m hoping too) that DRM-free digital comics will be a hot topic of discussion, with them at the top. I mean, look at what Image Comics’ Eric Stephenson has to say about it:

My stance on piracy is that piracy is bad for bad entertainment. There’s a pretty strong correlation with things that suck not being greatly pirated, while things that are successful have a higher piracy rate. If you put out a good comic book, even if somebody does download it illegally, if they enjoy it then the likelihood of them purchasing the book is pretty high. Obviously we don’t want everybody giving a copy to a hundred friends, but this argument has been around since home taping was supposedly killing music back in the ’70s, and that didn’t happen. And I don’t think it’s happening now.

This is quite an enlightened view on piracy for someone in his position. My hat’s off to you, sir.

Now does that mean Comixology’s in trouble? Not quite yet, no:

  • First, Image Comics will still sell their issues through them – they’re just adding the option to buy directly from them for people who are, like me, quite keen on actually owning the stuff they buy.
  • Second, as far as I can tell, there’s no back catalog available yet, so you can only buy the new issues coming out now.
  • Third, there’s quite a difference between tapping a button in the Comixology app to buy and read a comic, and buying a comic on a website, downloading the file, transferring the file to your reader app, etc. One takes 2 seconds, the other 2 minutes (if you count storing a copy on your file server). Most people go (erroneously) for the faster and more convenient solution.
  • Fourth, I noticed that some issues are more expensive on the Image Comics website compared to Comixology.

But I’m cautiously optimistic. Hopefully, Image Comics will want to invest a bit in making their own sales portal better, so that they can get the whole price of an issue for themselves (I can only imagine what’s left from an iPad sale once Apple and Comixology have taken their cut…).

I’m crossing my fingers for Dark Horse to start removing DRM in a few months. Unlike DC and Marvel, they’re independently owned and have their own online store, which puts them in the best position to follow suit. They may not be able to do it for their whole catalog (for example, licensed IPs like Star Wars, Buffy or Avatar may have constraints attached), but they could probably do it for titles like Hellboy and B.P.R.D., assuming Mike Mignola is on board with DRM-free comics. One can only hope…

Data First: The File Server (part 2)

In part 1, we had a look at how to buy and setup a file server. Now you may be asking yourself a few questions about how to actually use that thing. The first question we’ll answer is “what should I put on there?”.


Q: What should I put on my file server?

The answer is two-fold: anything you want to access from more than one device, and anything you can’t easily backup automatically otherwise.

In my case, it means everything:

  • I surely do want to access my music and movies from a variety of devices. I even access them from work.
  • I’m accessing my pictures only from my laptop, but my wife also wants to access them from her laptop (both laptops have Lightroom installed), so pictures go on the server. It also means they will be backed up automatically – if they were on one of the laptops, it would be difficult to do that since the machine would most likely be asleep when the backup job kicks in, and in this age of SSD-only laptops you’d run out of space pretty quickly anyway.
  • My code repositories are on the file server too. I check out the code locally and commit/push changes back to the file server.
  • Documents, porn, whatever, it’s on there.

Of course there are some caveats. Things may be too slow for you. For instance, when I work in Lightroom, I turn off Wi-Fi and plug the laptop into my home Gigabit network. And even then, it’s noticeably slower than if the pictures were stored locally (but it’s not too bad as far as I’m concerned, since raw picture editing is still the performance bottleneck on my machine). If you’re doing something like video editing, that’s not even an option.

When a particular piece of data can’t be efficiently accessed remotely, you can use your file server as the backup device – data would be stored locally on one machine, and backed up automatically to the file server. That’s fine, as long as the backup process is, again, automatic. This generally means the source machine is a desktop computer, so that it’s available during the night, when most backup jobs execute.
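In practice you’d use rsync, Time Machine, or whatever backup tool your file server ships with, scheduled at night with cron or the Task Scheduler. Just to illustrate the principle of a one-way copy to the server, here’s a hypothetical Python sketch that copies files that are new or changed (by size or modification time) and never deletes anything from the backup side. The function name and the comparison rule are assumptions, not a recommendation over a real backup tool.

```python
import os
import shutil


def mirror_new_and_changed(source_dir, backup_dir):
    """Copy files from source_dir to backup_dir when they are missing
    from the backup or differ in size or modification time.

    Returns the sorted relative paths that were copied. Nothing is ever
    deleted from the backup, so an accidental local delete is recoverable.
    """
    copied = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, source_dir)
            dst = os.path.join(backup_dir, rel)
            src_stat = os.stat(src)
            if os.path.exists(dst):
                dst_stat = os.stat(dst)
                if (dst_stat.st_size == src_stat.st_size
                        and int(dst_stat.st_mtime) == int(src_stat.st_mtime)):
                    continue  # unchanged since the last backup
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves the timestamp
            copied.append(rel)
    return sorted(copied)
```

Running it a second time without changing anything copies nothing, which is what makes a nightly job like this cheap.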

I would advise against storing data anywhere other than the file server or an automatically backed-up desktop machine (or otherwise always-on storage unit). Choosing where to put a given piece of data is always a balancing act between where it makes sense, where it’s convenient, and where it’s safe, but remember that, when in doubt, always prefer safety.

Data First: The File Server (part 1)

The central piece to a data-first methodology is, in my opinion, having a file server.

The reason for this is that you’re going to want to access your data anytime, anywhere: streaming your music to your work PC, your movies to your iPad, or accessing your documents from your phone. You need a secure and reliable way to store and serve that data, and this is best done with a file server.


(if you’re already about to ask why I don’t just use iCloud or something, you may need to read my introduction post again)

Let’s look at the basics for getting a file server after the break.

The hardware


The absolute minimum requirements are a computer that’s connected to your home network, and that is always on. Most desktop computers would fit the job description, unless you’re one of those weird people who actually turn their computers off instead of letting them go to sleep. Ideally, however, that file server should meet a few other requirements:

  • Low power: it’s going to be always on, so it might as well consume as little energy as possible, not just for the cute baby polar bears but for your electricity bill too.

  • Quiet: that’s important unless you intend to store it away in your basement or something.

  • Data redundancy: hard drives fail – it’s not a matter of “if” but a matter of “when”. And when it happens, you don’t want to have to recover your data from a backup. It’s tedious at best, and you may lose the last hour of work even if you’re using things like Time Machine on a Mac. Only with “copy-on-write” snapshot systems would you not lose anything, except maybe the file that was being saved when the disk failed. Shadow Copy, a little-known feature of Windows that has been available for almost 10 years, does exactly that, but as far as I can tell nobody really uses it, because Microsoft never gave it a fancy UI and a marketing push like Apple did with Time Machine. And this would mean running Windows on your file server, which would probably defeat the previous 2 bullet points, as it would require desktop-grade hardware instead of a smaller dedicated box.

    Anyway, I highly recommend you get a file server that handles some kind of data redundancy. RAID-1 mirroring is the minimum (2 disks), variants of RAID-5 with 4 disks are the most common, and anything above that is a nice bonus. With both RAID-1 and RAID-5 you get one “free” disk failure – i.e. you can have one disk dying on you, and if no other disk dies while you replace it and the system rebuilds itself, then you’re all good. Of course, even if it’s quite rare, it does happen that a second disk dies soon after (especially if all your disks are the same model, bought at the same time), so make sure you always have a backup ready (we’ll talk about that later in this article).

  • Connected with wires: ideally, the file server should be connected via Gigabit Ethernet. You can always hook it up to a Wi-Fi router down the line, but at least you know the bandwidth bottleneck is not at the source. Sure, the latest forms of Wi-Fi can stream HD videos, but how do you think that’s going to scale when you end up having 2 people streaming from 2 different devices while you’re transferring files and there’s a backup job going on? Yeah. If you can, use wires.
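To make the data redundancy point a bit more concrete, here’s a small Python sketch (my own illustration, not how any particular NAS implements it) of the two things that matter: how much space each RAID level leaves you, and where RAID-5 gets its one “free” failure. For each stripe, RAID-5 stores the XOR of the data blocks as parity, so any single missing block can be recomputed from the others. The disk counts and sizes below are made-up examples, and real RAID-5 also rotates the parity across the disks, which I left out.

```python
from __future__ import annotations

from functools import reduce


def usable_capacity(level: str, disks: int, disk_tb: float) -> float:
    """Usable space with identical disks (ignores filesystem overhead)."""
    if level == "RAID-0":
        return disks * disk_tb        # striping only: no redundancy at all
    if level == "RAID-1":
        return disk_tb                # every disk holds the same copy
    if level == "RAID-5":
        if disks < 3:
            raise ValueError("RAID-5 needs at least 3 disks")
        return (disks - 1) * disk_tb  # one disk's worth of space goes to parity
    raise ValueError(f"unknown level: {level}")


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical 4-disk RAID-5: 3 data blocks plus their parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# The disk holding data[1] dies: XOR-ing everything that survived
# gives back the missing block.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

print(usable_capacity("RAID-1", 2, 4.0))  # 4.0
print(usable_capacity("RAID-5", 4, 4.0))  # 12.0
```

With two blocks missing from the same stripe, there’s no longer enough information to XOR your way back, which is exactly why a second disk dying during the rebuild hurts so much.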

Buying a NAS box

Given these requirements, the easiest way is really to go for a dedicated NAS. Unless you want to go through the trouble of refurbishing an old computer of yours, figuring out how to get the RAID controller to work correctly, and finding a place for a box that’s probably 5 times bigger than it needs to be, that is. You could also build a custom PC, but if you’re considering it, then you’re probably able to figure it out on your own (although I may post a guide in a distant future).

A few years ago, ReadyNAS was all the rage for NASes, with Synology not far behind, but since ReadyNAS was bought by Netgear (which was around the time I bought my NV+), it seems to have fallen behind. Now, the top of the line seems to be QNAP, Thecus, and Synology, if I believe those performance benchmarks.

Have a look at those benchmarks and pick the best models that fit your budget, but do keep an eye out for which configuration is used on each test. For instance, RAID-0 with many bays can outperform everything else on some tasks, even though in practice it’s probably not a configuration you’d use.

Synology are known to have the most user-friendly administration UIs, so you may want to bias your choice accordingly. You may also be distracted for a while by all the stuff these boxes can do on top of simple file sharing, like BitTorrent clients and web servers and photo galleries and whatnot. Stay focused on their main role: serving files.

The Software

There’s not much to say about the software. If you bought a pre-built NAS as recommended, it will usually come with its own custom Linux-based OS with a fancy administration UI. You will just boot it, set up the shared folders and permissions, and maybe configure some additional sharing services you may need.

If however you built your own NAS, or are using an existing desktop computer, install whatever you know best – MacOS, Windows, or, ideally, a lean Linux distro. Whatever you end up with, make sure your server is easy to remote into. You’re unlikely to hook up a screen and keyboard to it, so you’ll have to use Remote Desktop (on Windows) or SSH (on Mac/Linux) to do any kind of maintenance work. Note that some dedicated NAS distributions, like the appropriately named FreeNAS (which is actually based on FreeBSD rather than Linux), have a web administration panel.


Once you have your file server running, the first thing you need to do is set up some automated backups.


I’m not kidding. Don’t say “oh I’m bored, I’ll do it next weekend, I just want to watch Netflix already”. You know perfectly well next weekend you’ll be browsing Facebook and cleaning your house because you had been putting that off for 3 weeks already. So do it now.

Again, if you got a pre-built NAS, this is probably as easy as plugging an external drive into the box and going through the administration panel to find the backup settings. Just do an incremental backup of the whole thing every night to that drive. Bonus points if your backup drive is itself a RAID array.

Your NAS may also have some kind of continuous backup system (sometimes called “snapshots”), so you can enable that too.
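If you’re curious what those snapshots do under the hood, they generally rely on some form of the copy-on-write trick mentioned earlier. Here’s a toy Python model of the idea, purely illustrative and not how Shadow Copy, ZFS, or any NAS vendor actually implements it (the `CowVolume` class is made up for this post). Data blocks are never overwritten in place: a write goes to a fresh block, and a snapshot is just a frozen copy of the small map saying which block holds what. In this toy version new data always lands in a new block; other implementations copy the old block aside before overwriting it, but the effect for you is the same.

```python
from __future__ import annotations


class CowVolume:
    """Toy volume whose snapshots share unchanged blocks (copy-on-write)."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks: dict[int, bytes] = {}  # block id -> data, never modified in place
        self.next_id = 0
        # Live block map: logical block index -> physical block id.
        self.live = [self._allocate(b"") for _ in range(num_blocks)]
        self.snapshots: dict[str, list[int]] = {}

    def _allocate(self, data: bytes) -> int:
        block_id = self.next_id
        self.blocks[block_id] = data
        self.next_id += 1
        return block_id

    def snapshot(self, name: str) -> None:
        # A snapshot only copies the (small) block map, not the data itself.
        self.snapshots[name] = list(self.live)

    def write(self, index: int, data: bytes) -> None:
        # Writes go to a freshly allocated block; snapshots keep pointing
        # at the previous block, so their contents never change.
        self.live[index] = self._allocate(data)

    def read(self, index: int, snapshot: str | None = None) -> bytes:
        block_map = self.live if snapshot is None else self.snapshots[snapshot]
        return self.blocks[block_map[index]]


vol = CowVolume(num_blocks=4)
vol.write(0, b"draft v1")
vol.snapshot("hourly")
vol.write(0, b"draft v2")     # the file gets overwritten after the snapshot
print(vol.read(0))            # b'draft v2'
print(vol.read(0, "hourly"))  # b'draft v1'
```

Taking a snapshot is cheap because nothing gets copied until something changes, which is why NASes can afford to take them often.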

If you have a custom box, you’re probably smart enough to set up a scheduled robocopy task (on Windows) or a cronjob running rsync (on Linux/Mac) to back up all your data to a secondary drive. If not, look it up online.
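For the custom-box route, here’s a rough Python sketch of what such a nightly job boils down to, in case you’d rather read it than look it up. It walks the source folder and copies only the files that are new or whose size or modification time changed, which is the same quick check rsync does by default. Unlike `rsync -a --delete` or `robocopy /MIR`, this sketch never deletes anything from the backup, and the `incremental_copy` name is just a placeholder of mine.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def incremental_copy(source: Path, destination: Path) -> list[Path]:
    """Copy new or changed files from source to destination.

    Returns the relative paths of the files that were copied.
    Files removed from the source are NOT removed from the backup.
    """
    copied = []
    for src in source.rglob("*"):
        if not src.is_file():
            continue
        dst = destination / src.relative_to(source)
        src_stat = src.stat()
        if dst.exists():
            dst_stat = dst.stat()
            # Quick check: same size and same mtime (to the second) means unchanged.
            if (dst_stat.st_size == src_stat.st_size
                    and int(dst_stat.st_mtime) == int(src_stat.st_mtime)):
                continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
        copied.append(src.relative_to(source))
    return copied
```

You’d call something like `incremental_copy(Path("/srv/data"), Path("/mnt/backup/data"))` from a script and schedule it with a crontab entry such as `0 3 * * * python3 /path/to/backup.py` to run it every night at 3am (all these paths are hypothetical). In practice, plain rsync or robocopy will do the same job better and faster.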

What next?

In the next parts, we’ll discuss a couple of things, like what should actually go on that new fancy file server of yours.