The Ten Commandments of Computer Backups

I recently put a lot of thought into how I perform my computer backups. I’m one of those people who, while only mildly pissed off by the failure of a hard drive, would be quite angry at myself if I lost even the merest hint of data I wanted to keep. I used to perform my backups manually, using the Windows backup utility to copy data onto an external hard drive. It worked fine most of the time, but it definitely had process defects… the largest being that I had to remember to do it. It required my interaction to succeed (because I had to plug the drive in), which meant there was always a human element involved. And humans are lazy.

So I set about designing myself the ultimate foolproof backup system. There would be multiple storage media, there would be encryption, there would be checks and validations and several custom-written applications. Then I started thinking, “what exactly am I protecting myself against?” It’s a good question. Here’s the list I came up with:

  1. I need my data to be safe from storage media failure. This may mean a single backup DVD being unreadable, or maybe my primary hard disk drives its head into the sand.
  2. I need my data to be safe from the simultaneous failure of every drive of a particular type. It happens more often than you would think, and the consequences aren’t pretty (a whole RAID array failing, taking all its ‘safe’ data with it, tends to make people a bit upset).
  3. I need to make sure my data can’t be stolen. If it is stolen (or people I don’t want reading my data try to do so) then it should appear as meaningless gibberish.
  4. I need my data to be safe from being corrupted while in storage, or while being transferred between storage devices.
  5. My data needs to be safe from theft or fire, which could mean every storage device in a particular location is unusable.
  6. My data needs to be safe from natural disasters, which could take out an entire city or state. Unlikely, but it’s the kind of thing most people don’t plan for.
  7. I need to be able to go back and find data that I’ve accidentally deleted and later realise I need back.
  8. If my data is anywhere not under my direct control, I need to be able to trust the people who do control it.
  9. I have to assume that if my backup hasn’t been tested (i.e. I haven’t tried to restore from it) then the backup isn’t any good.
  10. Finally, I shouldn’t have to do anything… computers should be smart enough these days to back themselves up.

That was all I could think of, though I’m sure there are additional points (leave a comment or email me, please!). Then I figured out what I had to do in order to prevent these situations from happening.

  • Points one and two are the easiest to solve, and are really what most people think of when they think of “backup” plans. The solution is simple: keep your data on multiple storage media, and make sure those media are of different types.
  • Point three is pretty simple to solve: encrypt everything you can possibly encrypt. This also partially side-steps point eight, because if your data is encrypted, you don’t have to trust whoever holds it not to read it, only not to delete it. And you don’t need to trust them not to delete it if you’ve got the data in multiple locations (i.e. somewhere not under their control).
  • Point four can be partially solved by taking checksums of the data (which can be done at the same time it is encrypted); there’s a rough checksum sketch after this list. If a checksum doesn’t match, something has gone wrong, and the operation should be retried or looked at by a human. There is still the issue of what happens if the original data itself is corrupted. I’ve put that in the too-hard basket for now, though the use of a RAID array can reduce the likelihood of it.
  • Points five and six are closely related, and are solved together. Every good backup plan should make use of off-site backups, where a copy of the data is kept away from the original. Point five might mean keeping a copy in another building (or in my case, at my parents’ house a few kilometres away). Point six means I might consider going further. Ideally I’d like to store a copy of my data on another continent, just in case of nuclear war. If I survive, my data should too.
  • Point seven means I should be creating archives, keeping copies of old files so that I can go back in time. I’d like to be able to choose from daily copies for a week, then weekly copies for a year (a rough pruning sketch follows this list). After a year, I’m probably not going to remember that I ever had the file.
  • Points nine and ten are quite possibly the trickiest. To solve them, I have to write automatic scripts to do all these backup tasks, then write more automatic scripts that try restoring from the backups and make sure the data comes back in a perfect state. I also need to do this manually from time to time, just in case my scripts stop working (it is a computer, after all).
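
As a rough illustration of the checksum idea from point four (and the automated verification from point nine), here is a minimal sketch using sha256sum. The directory and file names are placeholders, not my actual setup:

```bash
#!/bin/bash
# Record a checksum for every file in the backup set, then verify the set.
# BACKUP_DIR is a placeholder; substitute the real backup location.
BACKUP_DIR="/mnt/backup/archives"
SUMFILE="$BACKUP_DIR/SHA256SUMS"

cd "$BACKUP_DIR" || exit 1

# Generate checksums for everything except the checksum file itself.
find . -type f ! -name SHA256SUMS -print0 | xargs -0 sha256sum > "$SUMFILE"

# Later (or on another machine), verify the set and flag anything that changed.
if ! sha256sum --quiet -c "$SUMFILE"; then
    echo "Checksum mismatch detected -- a human should look at this." >&2
    exit 1
fi
```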
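
And for the archive retention from point seven, a pruning script along these lines could keep daily archives for a week and weekly archives for a year. The directory, the filename pattern and the choice of Sunday as the weekly keeper are all just assumptions for illustration:

```bash
#!/bin/bash
# Prune old archives: keep everything from the last week, keep one archive
# per week (Sundays) for a year, and drop anything older than a year.
ARCHIVE_DIR="/mnt/backup/archives"   # placeholder path

# Anything older than a year goes.
find "$ARCHIVE_DIR" -name '*.tar.gz.gpg' -mtime +365 -delete

# Between one week and one year old, keep only the archives made on a Sunday.
find "$ARCHIVE_DIR" -name '*.tar.gz.gpg' -mtime +7 -mtime -365 -print |
while read -r archive; do
    # GNU date: %u gives the day of the week, 7 = Sunday.
    if [ "$(date -r "$archive" +%u)" != "7" ]; then
        rm -- "$archive"
    fi
done
```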

So that was my analysis of the backup problem. Now for the design stage. My current working computer systems consist of a laptop (running Windows 7), a desktop (dual-booting Windows 7 and Debian GNU/Linux), and my home server (which runs Debian GNU/Linux). So I chose to do the following:

  • I decided that, since it is turned on all the time, my home server would be the primary location for all my treasured data. Every other copy of my data would feed off that. My laptop and my desktop will be synchronised to the server using software such as rsync, running on a very frequent schedule (a rough sketch of this appears after this list). Ideally I’ll code a switch into the script on my laptop so that it syncs less often when I’m not at home, to avoid wasting bandwidth. This will give me three or four working copies of my data, depending on how the implementation goes.
  • My server has two hard drives, and I’m going to use this to my advantage. The first hard drive holds my primary working copy of the data, and the second drive is where the backups go. So I’ll write another script that takes the working copies from the first hard drive, performs archival on them (using tar), encrypts and checksums them (using encryption that money can’t buy) and copies them to the second hard drive (see the archival sketch after this list). This gives me the ability to go back in time through my data, if need be. At this stage there are some things I won’t back up, either for legal reasons (I’m fairly sure the MP3 backups of my music collection shouldn’t be stored off-site under Australian law) or for practical reasons (videos are just too large to transfer off-site over the Internet).
  • I still haven’t solved the problem of off-site backups. To do that, I’m planning to make use of Amazon S3, the cloud storage service offered by everybody’s favourite friendly forgettable online book store, Amazon. Because my data has already been encrypted, I don’t have to trust them at all. I can just copy it across, mark it as invisible to the wider world, and forget about it (a small upload sketch follows this list). I will also take up an offer from my friend Jamie to store my data on his NAS, which gives me another off-site location. I’m in Tasmania, Jamie is in Queensland, Amazon is in the U.S.A., and my data is safe.
  • I’m also planning to fit my server with a DVD burner and write a script that backs up my most crucial data (such as financial information and treasured memories) onto a DVD every week or so. Encrypted, of course. The only problem is that I need to remember to go and change the DVD over every week.
  • Finally, I have to write scripts that occasionally check the consistency of my data, so that nothing quietly suffers from bit rot (the checksum verification sketched earlier does this job; a sample cron schedule tying all these scripts together follows this list).
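
First, the synchronisation between my machines and the server. A minimal sketch of the rsync script might look like the following; the hostname, the paths and the “am I at home?” check are placeholders for whatever I end up implementing:

```bash
#!/bin/bash
# Push my working files from the laptop (or desktop) to the home server.
# SERVER, SRC and DEST are placeholders for illustration only.
SERVER="homeserver"
SRC="$HOME/data/"
DEST="backup@$SERVER:/srv/data/laptop/"

# A crude stand-in for the "am I at home?" switch: only do the full sync
# when the server is reachable on the local network.
if ping -c 1 -W 2 "$SERVER" > /dev/null 2>&1; then
    # -a preserves permissions and timestamps, -z compresses over the wire,
    # --delete keeps the server copy an exact mirror of the source.
    rsync -az --delete "$SRC" "$DEST"
fi
```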
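
Next, the archival script on the server. This is only a sketch of the tar-and-encrypt idea, with GnuPG standing in for the “encryption that money can’t buy”; the paths and the GPG recipient are assumptions:

```bash
#!/bin/bash
# Take the working copy from the first drive, archive and compress it,
# encrypt it, checksum it, and store it on the second drive.
WORKING="/srv/data"                    # placeholder: primary working copy
ARCHIVE_DIR="/mnt/backup/archives"     # placeholder: second drive
STAMP="$(date +%Y-%m-%d)"
OUT="$ARCHIVE_DIR/data-$STAMP.tar.gz.gpg"

# Archive, compress and encrypt in a single pipeline.
tar -czf - "$WORKING" | gpg --encrypt --recipient backup@example.org --output "$OUT"

# Record a checksum so corruption can be detected later.
sha256sum "$OUT" >> "$ARCHIVE_DIR/SHA256SUMS"
```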
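
Then the off-site copy to Amazon S3. Because the archives are already encrypted, all Amazon ever sees is gibberish. s3cmd is just one of several tools that could do this, and the bucket name is made up:

```bash
#!/bin/bash
# Push the encrypted archives to an S3 bucket, keeping them private.
ARCHIVE_DIR="/mnt/backup/archives"     # placeholder path
BUCKET="s3://my-backup-bucket"         # placeholder bucket

for archive in "$ARCHIVE_DIR"/*.gpg; do
    # --acl-private marks the object as invisible to the wider world.
    s3cmd put --acl-private "$archive" "$BUCKET/$(basename "$archive")"
done
```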
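
Finally, the glue that makes all of this automatic is nothing more exotic than cron. The sync entry would live on the laptop and desktop, the rest on the server; the script names and times below are purely illustrative:

```
# Sample crontab (illustrative only -- the script names are placeholders).
# Every 15 minutes: sync working copies to the server.
*/15 * * * *  /usr/local/bin/sync-to-server.sh
# Nightly at 02:30: archive, encrypt and checksum onto the second drive.
30 2 * * *    /usr/local/bin/archive-and-encrypt.sh
# Weekly on Sunday at 04:00: push encrypted archives off-site to S3.
0 4 * * 0     /usr/local/bin/push-to-s3.sh
# Monthly: verify checksums to catch bit rot.
0 5 1 * *     /usr/local/bin/verify-checksums.sh
```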

I haven’t completed the implementation yet (in fact I’ve hardly started). Already, though, I feel a lot safer knowing that I’ve properly thought about the process of storing my data. Most people don’t think about backups until it’s too late, and perhaps they should.

TP-Link Homeplug AV200 Review

A few months ago I moved in with my girlfriend (yay! :D), and this necessitated moving all my computer equipment to her house. At my old place, I had strung Ethernet cables all down the hallway between the various rooms; although aesthetically unpleasant, they certainly did the trick. Moving in with my girlfriend meant I no longer had the freedom to string cables everywhere. It looked horrible, somebody was going to trip on something eventually, and being in a rented house meant that we had to keep the place looking half-decent (Ethernet cables, surprisingly enough, are not everybody’s idea of home decoration). So what to do?

My first thought, naturally enough, was to hook up some wireless adapters. This plan worked very well for one area of the house (where my server rack now sits), but horribly for another (where my desktop is). I read about the new-fangled Homeplug idea, which involves sending Ethernet frames over the AC power network in our home. I was dubious, but intrigued; Homeplug seemed to be the solution to my problems, in theory:

  • Turns the power cabling already present in every home into a computer network.
  • Doesn’t use up valuable space in the wireless spectrum.
  • Devices can just plug in via standard Ethernet, without the need for drivers.

Of course I decided to give it a go! I hurried on down to my local computer store and bought myself a pair of Homeplug adapters, these ones made by TP-Link (who, despite being Chinese owned and operated, make some excellent equipment). I plugged one in near my router and cabled it in, and plugged the other in near my desktop computer. Unfortunately I had to plug that one in via the powerboard due to the size of the adapter, but according to the documentation this makes no difference. I immediately noticed several problems:

  • The network is slow. Very slow. The theoretical speed of these Homeplug adapters is 200 Mb/s straight out of the box, which should compete with 802.11n very nicely. The real speed I got was about 10 Mb/s, which is slower than our Internet connection. Not good.
  • The whole Homeplug network is a single collision domain. For the un-Ethernet-savvy, this basically means that the 10 Mb/s I mentioned above is shared between every device connected via Homeplug, instead of standard switched Ethernet, where every device would get 10 Mb/s to itself.

Worst of all though, was this:

  • If my desktop was connected via Homeplug, then every two or three seconds, for no reason other than that Homeplug was plugged in, my computer would freeze. I have no idea why. I reinstalled Windows and used a different Ethernet adapter, and it made no difference at all. On the other hand, Homeplug worked absolutely fine in every other computer I plugged it into.

In the end, I couldn’t stand my computer pausing every three seconds to think, so I gave up on Homeplug (I handed the adapters to my housemate, who is successfully using them to plug a wireless black-hole in his bedroom). I’m now using a top-end wireless adapter and a strong aerial, and it seems to be working.

As an aside, I have read that Homeplug has serious security issues in its out-of-the-box configuration. You have to set up something similar to wireless network security to prevent your neighbours from connecting to your Homeplug network.

The short version is this: Homeplug is an awful idea; avoid it if at all possible. Just use wireless, which is faster and far better tested. But if you are going to buy a Homeplug adapter or two, the TP-Link models aren’t a bad choice; they’re pretty decent.

Saying goodbye to the cloud

My friend Michael Wheeler has written an excellent article on the whys and hows of removing your data from the cloud. This post is basically just to point you all towards it.

Over the last few years I’ve been going through a similar process, getting rid of my Google account and hosting my own email. I’ve attempted to get rid of Facebook, and learned a lot about myself, my friends, and Facebook in the process. I no longer have Twitter (again), and I’m just generally being a lot more careful with my data.

I think everybody will benefit from thinking just a little more about where their information goes, so I highly recommend you read this article.

linux.conf.au 2011 – Day 1

Today was the first proper day of linux.conf.au, which is being held this year in Brisbane. This morning we were treated to a welcoming speech by conference organiser Dr Shaun Nykvist, and a presentation on the Google Summer of Code happening this year. In the welcoming speech, Shaun detailed how the organisers and volunteers had to work against water and time to get the conference ready despite Brisbane’s horrific flooding:

“I’ve got some lovely photos of our old venue with sandbags against the flood zones. It’s a shame the sandbags were about three metres lower than the water.”

After morning tea (some very lovely cakes and biscuits were provided) it was time for the Miniconfs. During the morning session I attended the Open Programming Miniconf, organised by my friend Chris Neugebauer. The first talk was about perl5i, a package of library modules for Perl that makes it an almost usable language (almost; I don’t think there is anything that can truly save it). It was very interesting stuff, seeing how the syntax and semantics of a language can change. The speaker (Michael Schwern) was brilliant as well, which is always nice.

The next talk was about the F# programming language, designed by Microsoft. Brian McKenna’s delivery wasn’t great (but it was his first big talk, so that can be forgiven). Although I dislike the idea of languages that run on top of runtimes (such as the JVM and .NET), F# looks like a good invention. Indeed, it’s basically where Microsoft develops and tests features that might later make their way into C#.

After that was a talk by Brianna Laugher on generating English-language text from a set of data using software tools. She was using it as part of her job at the Bureau of Meteorology to automate the generation of weather reports from their models. The idea was hugely interesting, and something I want to try implementing. However, I didn’t really understand how the generation itself worked… quite a few arcane symbols seemed to be in use. I think I got the general gist, though.

The final talk before lunch was about Go, the programming language developed at Google. I originally thought Go was a programming language for children, but I’ve now been set straight. It looks like something to test out… so it has been added to my very long list of stuff to try.

After lunch I went to see two talks from the Haecksen Miniconf. The first was about how open source software can help save the world, mostly by developing open source software to fix natural disaster problems, and doing it really really quickly and cheaply. The second was about setting up an overly-complicated home network, an area with which I am well acquainted.

Then it was back to the Open Programming Miniconf, where I learned about the demise of Java (basically, the Java community is dead, but Java itself will probably survive, and the JVM will definitely survive). The final talk before afternoon tea was about how to create compilers for the JVM using a parser written in Scala. Unfortunately, thanks to the Scala, most of the detail of the talk went straight over my head. Which is a pity, because I was really looking forward to that one. Ah well, I guess you can’t win them all. After all this, though, I was really interested in designing programming languages and compilers. I might have to give it a go.

During the final session of the day I was treated to a brilliant talk by Adam Harvey, a PHP developer (i.e. someone who actually develops the PHP interpreter), on the state of the PHP language. It seems Debian Stable is hugely out of date… but this is nothing new. He’s a great speaker, and I look forward to hearing his talk tomorrow, even though I don’t know what it’s about.

Last up was Jethro Carr, a hacker from NZ whom I know from attempting to complete his 30 Days of Geek challenge. He talked about the software revision control management tool he wrote, and about the benefits of using such software. Personally I quite enjoy using Redmine, but its requirement for Ruby means that I might be looking for an alternative when I get around to setting up my own installation. Currently I use the Quokforge service, run by one of my friends on OSDev.org.

So that was day 1. Or rather, the official day 1. Since then I’ve bought a printer and been to an Irish pub. More LCA news coming tomorrow, I hope.

linux.conf.au – Day 0

Me at LCA

Yesterday was day 0 of my adventure to linux.conf.au in Brisbane. I woke up extremely early (5:15am AEDT, which is 4:15am AEST) and caught a flight to Sydney and then on to Brisbane. I caught the AirTrain into the city (which is awesome, so much better than any other capital city’s offering) and met my friend Michael to drop off my bag at the hotel. After catching up with my sister for lunch (she lives in Brisbane) I headed off to the venue for the conference, QUT in Kelvin Grove.

I sat around with a few of my friends from the ##australia IRC channel on Freenode, while discussing the conference’s preparation in #linux.conf.au. After a while, I went and registered for the conference, and got some awesome swag. The item of note is a Yubico YubiKey, which seems to be a really neat solution to the password problem.

The LCA Venue

After the registration, I went with a few of the other ##australia geeks to get pizza from a local place in Kelvin Grove. It was a lot of impromptu fun, especially when a few more geeks from linux.conf.au showed up and also had a bite to eat.

After the pizza, it was back to the conference venue for the “noobie’s talk”, which introduced us to what happens over the week. In short, it sounds like a lot of fun. The presenter of the talk, Rusty Russell, has a great sense of humour. We then went off to the pub with the other first-time attendees, but we didn’t stay long because it was loud, and we moved on to something better.

Most of the attendees are staying at a place called Urbanest, which looks like an interesting place, mostly because of the density of geeks. Last night I went up to Urbanest for an hour or so, and watched the cricket and talked to a few other geeks. I met Jethro Carr, a geek from NZ whose blog I read. I then retired for the night, because you can only be awake for so long without a drip of caffeine.