Why I Have A Home Server

My current HP Microserver and ADSL gateway.

It’s pretty much impossible to use a computer these days without also using the Internet. It’s also pretty much impossible to use the Internet without using a cloud service of some kind. Most people I know depend on cloud services entirely, but not me. There are several good reasons I have my own servers, including my own home server.

It’s a learning experience. This is certainly one for the geeks, but hey, I’m a geek. By running my own servers I learn about the building blocks of the Internet. I’m a professional systems administrator, and my own home environment is a good place for me to try out things that I don’t get to try at work, or don’t have time to. Part of IT is constantly learning, and that’s what I try to do.

I can run whatever software I want. I’m not limited by whatever Google decides to put into Gmail. I can run my own Exchange server if I want (I do). It may not be free software, but it gives me huge advantages in syncing between devices. If I want to try something out, I just can.

My own privacy is assured. I don’t have to trust my email provider not to read my emails or look through my online backups. I only have to trust myself with my data, and if you can’t trust yourself, who can you trust? I don’t have anything to hide, but I think we should value privacy far more than most people currently do. After listening to Jacob Appelbaum at linux.conf.au in January 2012, I’m convinced of this.

I run backups to my own server, and for geographic protection send self-encrypted files to the cloud. I use GPG to encrypt my data, and so should you. I know Dropbox and similar services say they encrypt your data so they can’t read it, but how would you ever know?

I will admit that running a home server can be more expensive than trusting the cloud with all my data, as I have to pay for hardware (I spend about $500 a year just on server hardware, but you could spend much less), for power, for a static IP address, and for software licensing (I spend $450 a year here, but with free software I could spend much less).

All in all, running my own home server gives me great satisfaction, confidence in my own abilities, more freedom and more privacy, at the expense of some time (though now it’s up and running, I probably do 10 minutes of maintenance a month) and a bit of cash. Not a bad deal.

The Ten Commandments of Computer Backups

I recently put a lot of thought into how I perform my computer backups. I’m one of those people who, while I would only be mildly pissed off by the failure of a hard drive, would be quite angry at myself if I lost even the merest hint of data that I wanted to keep. I used to perform my backups manually, using the Windows backup utility to back data up onto an external hard drive. It worked fine most of the time, but it definitely had process defects… the largest of which was that I had to remember to do it. It required my interaction to succeed (because I had to plug the drive in) and this meant there was always a human element involved. And humans are lazy.

So I set about designing myself the ultimate foolproof backup system. There would be multiple storage media, there would be encryption, there would be checks and validations and several custom-written applications. Then I started thinking, “what exactly am I protecting myself against?” It’s a good question. Here’s the list I came up with:

  1. I need my data to be safe from storage media failure. This may mean a single backup DVD being unreadable, or maybe my primary hard disk drives its head into the sand.
  2. I need my data to be safe from the failure of every drive of a particular type, simultaneously. It happens more than you would think, and the consequences usually aren’t pretty (whole RAID arrays failing, with all their ‘safe’ data, usually makes people a bit upset).
  3. I need to make sure my data can’t be stolen. If it is stolen (or people I don’t want reading my data try to do so) then it should appear as meaningless gibberish.
  4. I need my data to be safe from being corrupted while in storage, or while being transferred between storage devices.
  5. My data needs to be safe from theft or fire, which could mean every storage device in a particular location is unusable.
  6. My data needs to be safe from natural disasters, which could take out an entire city or state. Unlikely, but it’s the kind of thing most people don’t plan for.
  7. I need to be able to search for data that I’ve accidentally deleted and need back.
  8. If my data is anywhere not under my direct control, I need to be able to trust the people who do control it.
  9. I have to assume that if my backup hasn’t been tested (i.e. I haven’t tried to restore from it) then the backup isn’t any good.
  10. Finally, I shouldn’t have to do anything… computers should be smart enough these days to back themselves up.

That was all I could think of, though I’m sure there are additional points (leave a comment or email me, please!). Then I figured out what I had to do in order to prevent these situations from happening.

  • Points one and two are the easiest to solve, and are really what most people think of when they think of “backup” plans. The solution is simple: keep your data on multiple storage media, and those different storage media should be different types.
  • Point three is pretty simple to solve: encrypt everything you can possibly encrypt. This also partially side-steps point eight, because if your data is encrypted, you don’t have to trust them to not read it, you only have to trust them to not delete it. And you don’t need to trust them to not delete it if you’ve got the data in multiple locations (i.e. somewhere not under their control).
  • Point four can be partially solved by taking checksums of the data (which can be done at the same time it is encrypted). If a checksum doesn’t match, something has gone wrong and should be tried again or looked at by a human. There is the issue of what happens if the original data is corrupted. I put this in the too-hard basket for now, though the use of a RAID array can reduce the likelihood of this.
  • Points five and six are closely related, and also solved together. Every good backup plan should make use of off-site backups, where a copy of data is kept away from the original. Point five might mean keeping a copy in another building (or in my case, at my parents’ house a few kilometres away). Point six means I might consider going further. Ideally I’d like to store a copy of my data on another continent, just in case of nuclear war. If I survive, my data should too.
  • Point seven means I should be creating archives of data, keeping copies of old files so that I can go back in time. I would like to be able to choose copies from every day for a week, then every week for a year. After a year, I’m probably not going to remember that I once had a file.
  • Points nine and ten are quite possibly the trickiest. To solve them, I have to write automatic scripts to do all these backup tasks, then write automatic scripts to try recovering from the data and make sure it’s in perfect state. I also need to do this manually, just in case my scripts stop working (it is a computer, after all).
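
The retention rule from point seven (every day for a week, then every week for a year) could be sketched roughly like this in Python. This is only an illustration under a few assumptions of mine: the function name backups_to_keep and its parameters are made up, "a year" is approximated as 52 weeks, and "every week" is taken to mean the newest backup in each ISO calendar week.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, today, daily_days=7, weekly_weeks=52):
    """Return the set of backup dates worth keeping (hypothetical helper).

    Keeps every backup from the last `daily_days` days, plus the newest
    backup in each ISO week between that point and `weekly_weeks` weeks ago.
    Anything older, or dated in the future, is left out.
    """
    keep = set()
    newest_in_week = {}  # (iso_year, iso_week) -> newest backup date in that week
    daily_cutoff = today - timedelta(days=daily_days)
    weekly_cutoff = today - timedelta(weeks=weekly_weeks)

    for d in backup_dates:
        if d > today:
            continue  # ignore backups with a date in the future
        if d > daily_cutoff:
            keep.add(d)  # recent enough: keep every daily copy
        elif d > weekly_cutoff:
            iso = d.isocalendar()
            key = (iso[0], iso[1])
            if key not in newest_in_week or d > newest_in_week[key]:
                newest_in_week[key] = d

    keep.update(newest_in_week.values())
    return keep
```

A cleanup script would then delete any archive whose date is not in the returned set. Keeping the newest copy of each week is an arbitrary choice; keeping the oldest, or the copy from a fixed weekday, would work just as well.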

So there was my analysis of the backup problem done. Now for the design stage. My current working computer systems consist of a laptop (running Windows 7), a desktop (dual-booting Windows 7 and Debian GNU/Linux), and my home server (which runs Debian GNU/Linux). So I chose to do the following:

  • I decided that, since it was turned on all the time, my home server would be the primary location for all my treasured data. Every other location for my data would feed off that. My laptop and my desktop will be synchronised to my server using software such as rsync running on a very frequent schedule. Ideally I will code a switch into the script on my laptop that syncs less often when I’m not at home, to avoid wasting bandwidth. This will give me three or four working copies of my data, depending on how implementation goes.
  • My server has two hard drives, and I’m going to use this to my advantage. The first hard drive has my primary working copy of data, and the second drive is where the backups go. So I’ll write another script that will take my working copies from my first hard drive, perform archival on them (using tar), encrypt them and checksum them (using encryption that money can’t buy) and copy them to my second hard drive. This gives me the ability to go back in time through my data, if need be. At this stage there are some things I won’t back up, either for legal reasons (I’m fairly sure the MP3 backups of my music collection shouldn’t be stored off-site under Australian law) or for practical reasons (videos are just too large to transfer off-site over the Internet).
  • I still haven’t solved the problem of off-site backups. To solve this, I’m planning to make use of Amazon S3, which is a cloud backup solution offered by everybody’s favourite friendly forgettable online book store, Amazon. Because my data has now been encrypted, I don’t have to trust them at all. I can just copy it across, mark it as being invisible to the wider world, and forget about it. I will also take up an offer from my friend Jamie to store my data on his NAS, which gives me another off-site backup location. I’m in Tasmania, Jamie is in Queensland, Amazon is in the U.S.A., and my data is safe.
  • I’m also planning to fit my server with a DVD burner and write a script that backs up my most crucial data (such as financial information and treasured memories) onto a DVD every week or so. Encrypted, of course. The only problem is that I need to remember to go and change the DVD over every week.
  • Finally, I have to write scripts to occasionally check the consistency of my data, so that nothing suffers from bit rot.
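
As a rough idea of what the consistency check in the last point might look like, here is a minimal Python sketch that records a SHA-256 checksum for every file under a directory and later reports anything that has gone missing or changed. It is not the script I actually run; the function names (sha256_of, build_manifest, verify_manifest) are made up for illustration, and it leaves out the encryption and archival steps entirely.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest):
    """Compare the files under root against an earlier manifest.

    Returns a dict of {relative_path: "missing" | "changed"}; an empty
    dict means every recorded file is still present and unmodified.
    """
    current = build_manifest(root)
    problems = {}
    for name, digest in manifest.items():
        if name not in current:
            problems[name] = "missing"
        elif current[name] != digest:
            problems[name] = "changed"
    return problems
```

This suits the archive drive best, because finished archives should never change, so any "changed" entry points to corruption. On a working copy, ordinary edits would show up as changes too.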

I haven’t completed the process of implementation yet (in fact I’ve hardly started). Already though, I feel a lot safer knowing that I’ve thought about the process of storing my data. Most people don’t think about backups until it’s too late, and perhaps they should.

Saying goodbye to the cloud

My friend Michael Wheeler has written an excellent article on the whys and hows of removing your data from the cloud. This post is basically just to point you all towards it.

Over the last few years I’ve been through a similar process, getting rid of my Google account and hosting my own email. I’ve attempted to get rid of Facebook, and learned a lot about myself, my friends, and Facebook in the process. I no longer have Twitter (again) and I’m just generally being a lot more careful with my data.

I think everybody will benefit from thinking just a little more about where their information goes, so I highly recommend you read this article.

30 Days of Geek #8: Preferred method of communication with humans

I’ve decided to partake in Jethro Carr’s 30 Days of Geek challenge, so I’ll be writing a post a day on my geekiness for an entire month! You can find all the posts in one spot here.

Naturally, I prefer to communicate with other humans in person. Every other form of communication leaves something to be desired (and usually, that something is something big).

I’m quite sure that I’m not the only person in the world who has trouble picking up on the subtle cues found in all human communications. The hints of sarcasm (or, in my case, the never-ending stream of it), the smiles, the hand movements, the stances, the tones of voice. A lot of it falls under the umbrella term of body language. Body language is just something the Internet cannot do at all. The telephone, surprisingly, does it even worse (at least in my experience). So I like talking in person the best, because it gives me the best chance to pick up on all these cues.

So, my preferences as far as communication goes:

  1. Human contact one on one or in a small group.
  2. Human contact in a large group conversation (there’s a large gap between 1 and 2).
  3. Instant Messaging (I use MSN and Facebook chat the most). Simply because I can log it.
  4. Internet Relay Chat (IRC). If you don’t know what this is, just think chatrooms.
  5. Text messages (SMS).
  6. And, right down the bottom, in a dusty box underneath the staircase, talking on the telephone.

I think the reason I hate the phone so much is because the person who gets called (usually me) has no choice about when the conversation happens. I could be in the middle of something requiring a lot of concentration (such as programming or web scripting, which requires juggling dozens of variables and logical statements in your head) and the phone rings. Concentration lost.

Of course, if I like you enough, I’ll be happy to shelve whatever I’m doing to talk to you. It’s just that this category isn’t large enough for my boss to be part of it.

Where’s My Server? There it is!

If you enjoyed my review of some of the hosting providers I have used over the past few years, you may be interested in Michael Wheeler’s review of Where’s My Server?, a New Zealand-based VPS provider.

One of the most interesting things about WMS, and certainly the thing that caught my eye, was the on-demand pricing they have available, making it very much like cloud computing (as far as the user is concerned, anyway). It’s an interesting concept, and certainly a move that I support (I hardly ever use 100% of my servers’ resources, let alone 100% all month). The only problem is that it makes comparisons with traditional VPS providers who charge a fixed price per month a bit of a pain, and I haven’t quite figured out how to do it exactly.

The other issue I notice is that bandwidth out of New Zealand is very expensive, but this being a function of New Zealand and not Where’s My Server, I don’t think that’s cause for complaint.