Posts Tagged ‘Internet’

The Ten Commandments of Computer Backups

Sunday, June 19th, 2011
Hard Drive Dying

Photo courtesy of Michael Wheeler

I recently put a lot of thought into how I perform my computer backups. I’m one of those people who, while I would only be mildly pissed off by the failure of a hard drive, would be quite angry at myself if I lost even the merest hint of data that I wanted to keep. I used to perform my backups manually, using the Windows backup utility to back data up onto an external hard drive. It worked fine most of the time, but it definitely had process defects… the largest of which was that I had to remember to do it. It required my interaction to succeed (because I had to plug the drive in) and this meant there was always a human element involved. And humans are lazy.

So I set about designing myself the ultimate foolproof backup system. There would be multiple storage media, there would be encryption, there would be checks and validations and several custom-written applications. Then I started thinking, “what exactly am I protecting myself against?” It’s a good question. Here’s the list I came up with:

  1. I need my data to be safe from storage media failure. This may mean a single backup DVD being unreadable, or maybe my primary hard disk drives its head into the sand.
  2. I need my data to be safe from the failure of every drive of a particular type, simultaneously. It happens more than you would think, and the consequences usually aren’t pretty (whole RAID arrays failing, with all their ‘safe’ data, usually makes people a bit upset).
  3. I need to make sure my data can’t be stolen. If it is stolen (or people I don’t want reading my data try to do so) then it should appear as meaningless gibberish.
  4. I need my data to be safe from being corrupted while in storage, or while being transferred between storage devices.
  5. My data needs to be safe from theft or fire, which could mean every storage device in a particular location is unusable.
  6. My data needs to be safe from natural disasters, which could take out an entire city or state. Unlikely, but it’s the kind of thing most people don’t plan for.
  7. I need to be able to search for data that I’ve accidentally deleted and need back.
  8. If my data is anywhere not under my direct control, I need to be able to trust the people who do control it.
  9. I have to assume that if my backup hasn’t been tested (i.e. I haven’t tried to restore from it) then the backup isn’t any good.
  10. Finally, I shouldn’t have to do anything… computers should be smart enough these days to back themselves up.

That was all I could think of, though I’m sure there are additional points (leave a comment or email me, please!). Then I figured out what I had to do in order to prevent these situations from happening.

  • Points one and two are the easiest to solve, and are really what most people think of when they think of “backup” plans. The solution is simple: keep your data on multiple storage media, and those different storage media should be different types.
  • Point three is pretty simple to solve: encrypt everything you can possibly encrypt. This also partially side-steps point eight, because if your data is encrypted, you don’t have to trust them to not read it, you only have to trust them to not delete it. And you don’t need to trust them to not delete it if you’ve got the data in multiple locations (i.e. somewhere not under their control).
  • Point four can be partially solved by taking checksums of the data (which can be done at the same time it is encrypted). If a checksum doesn’t match, something has gone wrong and should be tried again or looked at by a human. There is the issue of what happens if the original data is corrupted. I put this in the too-hard basket for now, though the use of a RAID array can reduce the likelihood of this.
  • Points five and six are closely related, and also solved together. Every good backup plan should make use of off-site backups, where a copy of data is kept away from the original. Point five might mean keeping a copy in another building (or in my case, at my parents’ house a few kilometres away). Point six means I might consider going further. Ideally I’d like to store a copy of my data on another continent, just in case of nuclear war. If I survive, my data should too.
  • Point seven means I should be creating archives of data, keeping copies of old files so that I can go back in time. I would like to be able to choose copies from every day for a week, then every week for a year. After a year, I’m probably not going to remember that I once had a file.
  • Points nine and ten are quite possibly the trickiest. To solve them, I have to write automatic scripts to do all these backup tasks, then write automatic scripts to try recovering from the data and make sure it’s in a perfect state. I also need to do this manually, just in case my scripts stop working (it is a computer, after all).
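
The retention rule from point seven (every day for a week, then every week for a year) could be sketched roughly as follows. This is only an illustration under a few assumptions: the function name `backups_to_keep` is made up, a "week" is taken to mean an ISO calendar week, and the oldest backup in each week is the one kept.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, today):
    """Return the set of backup dates worth retaining.

    Keeps every backup from the last 7 days, plus the oldest backup in
    each ISO week going back 365 days. Anything older is not kept.
    """
    keep = set()
    weekly = {}  # (iso_year, iso_week) -> oldest backup date seen in that week
    for d in sorted(backup_dates):
        age = (today - d).days
        if age < 0 or age > 365:
            continue  # dated in the future, or more than a year old
        if age < 7:
            keep.add(d)  # daily tier: keep everything
        else:
            week = tuple(d.isocalendar())[:2]
            weekly.setdefault(week, d)  # weekly tier: first one wins
    keep.update(weekly.values())
    return keep


# Usage: 400 consecutive daily backups ending on 19 June 2011.
today = date(2011, 6, 19)
dates = [today - timedelta(days=i) for i in range(400)]
keep = backups_to_keep(dates, today)
to_delete = sorted(set(dates) - keep)
```

Anything in `to_delete` is a candidate for pruning; a real script would map these dates back to archive file names before removing anything.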

So there was my analysis of the backup problem done. Now for the design stage. My current working computer systems consist of a laptop (running Windows 7), a desktop (dual-booting Windows 7 and Debian GNU/Linux), and my home server (which runs Debian GNU/Linux). So I chose to do the following:

  • I decided that, since it was turned on all the time, my home server would be the primary location for all my treasured data. Every other location for my data would feed off that. My laptop and my desktop will be synchronised to my server using software such as rsync running on a very frequent schedule. Ideally I will code a switch into the script on my laptop that syncs less often when I’m not at home, to avoid wasting bandwidth. This will give me three or four working copies of my data, depending on how implementation goes.
  • My server has two hard drives, and I’m going to use this to my advantage. The first hard drive has my primary working copy of data, and the second drive is where the backups go. So I’ll write another script that will take my working copies from my first hard drive, perform archival on them (using tar), encrypt them and checksum them (using encryption that money can’t buy) and copy them to my second hard drive. This gives me the ability to go back in time through my data, if need be. At this stage there are some things I won’t back up, either for legal reasons (I’m fairly sure the MP3 backups of my music collection shouldn’t be stored off-site under Australian law) or for practical reasons (videos are just too large to transfer off-site over the Internet).
  • I still haven’t solved the problem of off-site backups. To solve this, I’m planning to make use of Amazon S3, which is a cloud storage service offered by everybody’s favourite friendly forgettable online book store, Amazon. Because my data has now been encrypted, I don’t have to trust them to not read it. I can just copy it across, mark it as being invisible to the wider world, and forget about it. I will also take up an offer from my friend Jamie to store my data on his NAS, which gives me another off-site backup location. I’m in Tasmania, Jamie is in Queensland, Amazon is in the U.S.A., and my data is safe.
  • I’m also planning to fit my server with a DVD burner and write a script that backs up my most crucial data (such as financial information and treasured memories) onto a DVD every week or so. Encrypted, of course. The only problem is that I need to remember to go and change the DVD over every week.
  • Finally, I have to write scripts to occasionally check the consistency of my data, so that nothing suffers from bit rot.
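
The archive-then-checksum step, and the later consistency check for bit rot, might look something like the sketch below. It is a minimal illustration, not the finished scripts: the function names and the `.sha256` sidecar file are assumptions, SHA-256 is just one reasonable choice of checksum, and encryption is left out entirely (it would sit between creating the archive and taking the checksum).

```python
import hashlib
import tarfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_archive(src_dir, dest_dir, name):
    """Archive src_dir into dest_dir/name.tar.gz and record its checksum."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / f"{name}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        tar.add(str(src_dir), arcname=name)
    # Encryption would happen here, before the checksum (not shown).
    checksum_file = archive.with_name(archive.name + ".sha256")
    checksum_file.write_text(sha256_of(archive) + "\n")
    return archive


def verify_archive(archive):
    """Recompute the checksum and compare it with the recorded one."""
    archive = Path(archive)
    recorded = archive.with_name(archive.name + ".sha256").read_text().strip()
    return sha256_of(archive) == recorded
```

Running `verify_archive` on a schedule over everything on the second drive would flag any archive whose contents have changed since it was written. It can't tell whether the original data was already corrupt when it was archived, which is the too-hard basket mentioned above.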

I haven’t completed the process of implementation yet (in fact I’ve hardly started). Already, though, I feel a lot safer knowing that I’ve thought about the process of storing my data. Most people don’t think about backups until it’s too late, and perhaps they should.

Saying goodbye to the cloud

Sunday, January 30th, 2011

My friend Michael Wheeler has written an excellent article on the whys and hows of removing your data from the cloud. This post is basically just to point you all towards it.

Over the last few years I’ve been going through a similar process, getting rid of my Google account and hosting my own email. I’ve attempted to get rid of Facebook, and learned a lot about myself, my friends, and Facebook in the process. I no longer have Twitter (again) and I’m just generally being a lot more careful with my data.

I think everybody will benefit from thinking just a little more about where their information goes, so I highly recommend you read this article.

30 Days of Geek #8: Preferred method of communication with humans

Monday, November 8th, 2010

I’ve decided to partake in Jethro Carr’s 30 Days of Geek challenge, so I’ll be writing a post a day on my geekiness for an entire month! You can find all the posts in one spot here.

Naturally, I prefer to communicate with other humans in person. Every other form of communication leaves something to be desired (and usually, that something is something big).

I’m quite sure that I’m not the only person in the world who has trouble picking up on the subtle cues found in all human communications. The hints of sarcasm (or, in my case, the never-ending stream of it), the smiles, the hand movements, the stances, the tones of voice. A lot of it falls under the umbrella term of body language. Body language is just something the Internet cannot do at all. The telephone, surprisingly, does it even worse (at least in my experience). So I like talking in person the best, because it gives me the best chance to pick up on all these cues.

So, my preferences as far as communication goes:

  1. Human contact one on one or in a small group.
  2. Human contact in a large group conversation (there’s a large gap between 1 and 2).
  3. Instant Messaging (I use MSN and Facebook chat the most). Simply because I can log it.
  4. Internet Relay Chat (IRC). If you don’t know what this is, just think chatrooms.
  5. Text messages (SMS).
  6. And, right down the bottom, in a dusty box underneath the staircase, talking on the telephone.

I think the reason I hate the phone so much is that the person who gets called (usually me) has no choice about when the conversation happens. I could be in the middle of something requiring a lot of concentration (such as programming or web scripting, which requires juggling dozens of variables and logical statements in your head) and the phone rings. Concentration lost.

Of course, if I like you enough, I’ll be happy to shelve whatever I’m doing to talk to you. It’s just that this category isn’t large enough for my boss to be part of it.

Where’s My Server? There it is!

Tuesday, October 12th, 2010

If you enjoyed my review of some of the hosting providers I have used over the past few years, you may be interested in Michael Wheeler’s review of Where’s My Server?, a New Zealand-based VPS provider.

One of the most interesting things about WMS, and certainly the thing that caught my eye, was the on-demand pricing they have available, making it very much like cloud computing (as far as the user is concerned, anyway). It’s an interesting concept, and certainly a move that I support (I hardly ever use 100% of my servers’ resources, let alone 100% all month). The only problem is that it makes comparisons with traditional VPS providers who charge a fixed price per month a bit of a pain, and I haven’t quite figured out how to do it exactly.

The other issue I notice is that bandwidth out of New Zealand is very expensive, but this being a function of New Zealand and not Where’s My Server, I don’t think that’s cause for complaint.

Quick Hosting Reviews

Tuesday, September 28th, 2010

Over the last few years I’ve used quite a few different hosting providers, so I thought I would share my experience by giving a quick review of each.

Silentflame Web Hosting

I was with Silentflame during 2007-2008, and I was very happy. Although the service was a bit slow for me, I suspect that this was purely because I was on the other side of the world. The novel thing about this service is the fact that it gives away all its profits to charity. It’s a great idea, and one that I think we should see in more businesses (perhaps a tithe would be better though). No native IPv6 on their services yet, unfortunately.

DirectSpace Networks

I only had a VPS with DirectSpace for a month or two before I switched to a different provider. I was fairly happy with these guys, and never had any issues that weren’t resolved promptly. The biggest criticism I had of my service was that the CPU allocation was too low. I had severe speed issues from the lack of time my processes had to run. No native IPv6 either.

ServerPronto

I had a dedicated server with ServerPronto for around six months this year, and although I no longer have it, this server performed very well for me over the time I was with them. They do have a somewhat convoluted exit process (it involves filling out a paper form and sending it to them snail mail along with a copy of some ID) but this is no problem to navigate, and unlike what others are saying on the Internet, does not result in your identity being stolen and your credit card being abused. The main selling point of ServerPronto is the price: their dedicated servers are extremely cheap. That said, quality does not appear to be an issue. No unexpected restarts, hardware never failed, and the network is very fast. No native IPv6 here either.

The only reason I got rid of this server was that it was costing more than an equivalent VPS and I wasn’t really using it. I’ve since turned the money over to other VPS services, increasing the number of services I can test.

Nullshells Networks

I’ve used Nullshells Networks’ web hosting for a few years now, and I am extremely happy with the service. All the services I have with them (web and email hosting) have always worked flawlessly, and if I’ve had any queries, the owner of the business has been more than happy to help out. I have only two nitpicks: one is the lack of IPv6, and the other is the fact they use a self-signed SSL certificate. While a self-signed certificate is no technical problem, and while I’m savvy enough to check the certificate and add an exception, it is a bit unprofessional. I’m still using Nullshells for my web and email hosting.

For my full review of Nullshells from around a year ago, click here.

Mammoth VPS

Overall I’ve been very pleased with Mammoth VPS, which is an Australian-owned company with servers located in Sydney’s CBD. While they are more expensive than other offerings, this is simply because the bandwidth in Australia is much more expensive than it is in Europe or the USA, so this is no fault of Mammoth. I have had a few issues with unexpected reboots, but apart from messing with my uptime statistics, this is no real problem. It’s always nice to support local businesses, too. No native IPv6 yet, but almost nowhere does.

BuildYourVPS

I’ve only had a BuildYourVPS (actually TOCICI) VPS for a couple of days now, but I wouldn’t recommend them, based on what I’ve experienced so far. When I first signed up, it took 4 rebuilds of the VPS before I could even log in via SSH. I’m assured this is not a regular thing, but I’d take care. After it was set up, the VPS did work very well. No CPU cloggage issues like on most other VPS providers. The network was extremely fast (as you’d expect from having the servers located in a US west-coast Internet exchange). One thing I did notice is that the server is behind a NAT. Fine, I guess, except that it makes some network configuration tasks a bit more confusing, and that the gateway IP they use is actually a special-use IP reserved for testing. Ouch! Zero marks for that one. On the up side, they do support native IPv6, albeit on request.
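
For anyone wanting to check an address against the ranges set aside for testing, here is a small sketch. The review doesn’t say which address the gateway used, so the list below is an assumption: it covers the three documentation blocks from RFC 5737 and the benchmarking block from RFC 2544, and the name `is_test_address` is made up for the example.

```python
import ipaddress

# IPv4 ranges reserved for documentation and testing (assumed list).
TEST_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),     # TEST-NET-1 (RFC 5737)
    ipaddress.ip_network("198.51.100.0/24"),  # TEST-NET-2 (RFC 5737)
    ipaddress.ip_network("203.0.113.0/24"),   # TEST-NET-3 (RFC 5737)
    ipaddress.ip_network("198.18.0.0/15"),    # benchmarking (RFC 2544)
]


def is_test_address(addr):
    """Return True if addr falls inside one of the ranges above."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in TEST_RANGES)
```

Feeding it the default gateway shown by the VPS’s routing table would confirm whether the provider is using one of these blocks.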

Edit 8/10/2010: After playing around a bit more with BuildYourVPS services, I’m happy to report that the issue with server builds has been fixed. All my other complaints were simply OpenVZ issues. Pending a few more weeks with the server, I’d be happy to give them a thumbs-up.