John Woltman's
Attempt to Stay Current at Bleeding Edge Blog Technology

Local "Time Machine" backups of remote systems with rsync

Posted Oct. 30, 2010 by John Woltman

The Problem: I need to back up remote websites that have lots of files.  I want to have incremental snapshots, so that I can have multiple backups of the same site without wasting lots of disk space.  In other words, I want Apple's Time Machine for websites.

The Solution: I knew that the familiar rsync program could copy files from a server to my home computer.  So I Googled and came across Michael Jakl's excellent Time Machine for every Unix out there.  I've expanded his script and provided my results here, so that you can use it too.  Read on for more info, and to download the script.

What is Time Machine?

Time Machine is Apple's backup system.  It copies your files from your computer to an external drive.  When it makes the backups, it only copies files that have changed since the last time it ran.  Time Machine's backups act as snapshots - each backup is a complete copy, or snapshot, of your computer from a certain time and date.  This lets you "go back in time" to different versions of your files.  So if you decide you don't like the changes you made to your company newsletter, Time Machine will let you restore from an earlier version - say, from 2 hours ago, or 2 weeks ago!

Time Machine has saved me numerous times when I've accidentally deleted a presentation, or made dumb changes to a spreadsheet.  So if I'm going to be backing up websites, I'd like to have the same conveniences that Time Machine provides.

Enter rsync

Rsync is a command-line program that copies files from one place to another.  The places can be on the same computer, or they can be remote computers like web servers.  Rsync can be told to only copy files that have changed since the last backup, so it can perform incremental backups, much like Time Machine.  The rsync program predates Time Machine, and it turns out to have very similar features.  So using Michael Jakl's article as a base, I present to you my script, backup.sh.

How the Script Works

The script uses rsync over SSH to connect to a remote server and back up the server's files.  Every time it runs, it creates a new folder with a name based on the current date and time.  For example, if you were backing up on January 15, 2010 at 10:30AM, the folder would be called 2010-01-15-103000.  It then connects to the remote server and starts backing up, using an rsync option called "--link-dest".  This option tells rsync to hard-link to the files from the previous backup if they haven't changed, instead of copying them again.  This lets us keep multiple snapshot backups without wasting space.

When it is finished, it creates a shortcut to the latest completed backup called "latest."
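Here's a rough sketch of that cycle, so you can see the moving parts.  It is not the contents of backup.sh: the function name run_snapshot, the RSYNC override variable, and the choice to make "latest" a symbolic link are all my assumptions for illustration.  The "incomplete-" prefix and the --link-dest option come from the description in this post.

```shell
#!/bin/sh
# Sketch of one snapshot cycle. Function and variable names are illustrative
# assumptions; this is not the contents of backup.sh.

RSYNC=${RSYNC:-rsync}   # override (e.g. RSYNC=echo) to try the flow without a transfer

# run_snapshot SOURCE BACKUPDIR NAME
#   SOURCE     e.g. webadmin@example.com:/var/www
#   BACKUPDIR  local folder that holds the snapshots
#   NAME       folder name for this snapshot, e.g. 2010-01-15-103000
run_snapshot() {
    src=$1 dir=$2 name=$3
    link=""
    # Reuse unchanged files from the previous snapshot, if there is one.
    # A relative --link-dest is resolved against the destination folder,
    # hence "../latest".
    if [ -e "$dir/latest" ]; then
        link="--link-dest=../latest"
    fi
    mkdir -p "$dir/incomplete-$name" || return 1
    # $link is deliberately unquoted so an empty value adds no argument.
    $RSYNC -az -e ssh $link "$src/" "$dir/incomplete-$name/" || return $?
    # Only a finished transfer gets its final name and the "latest" link.
    mv "$dir/incomplete-$name" "$dir/$name" || return 1
    rm -f "$dir/latest"
    ln -s "$name" "$dir/latest"
}
```

You would call it with something like run_snapshot webadmin@example.com:/var/www example-live "$(date +%Y-%m-%d-%H%M%S)".  If rsync fails, the folder keeps its "incomplete-" name and "latest" still points at the last good backup.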

Setup and Configuration

Note: This is a command-line script, and as such it assumes familiarity with basic shell usage.

I'm going to walk you through a practical example that is similar to my own setup.  I have two servers, example.com and example.local (a testing and development server).  The general steps are:

  1. Create a folder for each server you want to back up.
  2. Copy the backup-config.txt file to the folder you just created.
  3. Edit the backup-config.txt to reflect your configuration.
  4. Run backup.sh with the name of your backup directory. Ex: ./backup.sh mysite.

The Configuration File

The script will read a configuration file to get the information it needs to start each backup.  The file only contains 4 parameters (and one of them is optional!).

  1. HOST is the remote server to back up.  Ex: HOST=example.com
  2. REMOTEDIR is the folder you want to back up.  To back up a typical website, you might use REMOTEDIR=/var/www.
  3. REMOTEUSER is the username you'll log in to the server with.  Ex: REMOTEUSER=webadmin
  4. RSYNCPARAMS are additional parameters you want to pass to rsync.  This is optional, and is turned off in the sample configuration.
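Because the file is just NAME=value lines, a shell script can read it by sourcing it.  The sketch below shows one way that could look.  I'm assuming that approach here, and the load_config function is a made-up name for illustration, not something taken from backup.sh.

```shell
#!/bin/sh
# Sketch: read FOLDER/backup-config.txt and check the required parameters.
# load_config is a hypothetical helper, not part of backup.sh.

load_config() {
    config="$1/backup-config.txt"
    if [ ! -r "$config" ]; then
        echo "Cannot read $config" >&2
        return 1
    fi
    # Clear old values so one server's settings never leak into the next.
    HOST= REMOTEDIR= REMOTEUSER= RSYNCPARAMS=
    # The file holds plain NAME=value lines, so sourcing it sets the variables.
    . "$config"
    # RSYNCPARAMS is optional; the other three are required.
    for var in HOST REMOTEDIR REMOTEUSER; do
        eval "value=\$$var"
        if [ -z "$value" ]; then
            echo "$var is not set in $config" >&2
            return 1
        fi
    done
}
```

After load_config example-live succeeds, the rsync source would be "$REMOTEUSER@$HOST:$REMOTEDIR".  Keep in mind that sourcing a file runs whatever is in it, so only do this with config files you wrote yourself.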

An Example

Let's start by making some folders to store our backups.  I created a folder called backups and copied backup.sh into it.  Inside the backups folder I created two more folders, one called example-live and one called example-testing.  The -live folder holds backups of the live example.com, and the -testing folder holds backups of the testing site.  You can name your folders whatever you'd like, but I would recommend against using spaces in the names, since they might cause problems.
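From a terminal, that layout takes a couple of commands.  This is only a sketch: it assumes you downloaded backup.sh into the folder you're currently in, which may not match where you saved it.

```shell
#!/bin/sh
# Create the backups folder with one subfolder per server.
mkdir -p backups/example-live backups/example-testing

# Copy the script in, if it is sitting in the current folder
# (the download location is an assumption).
if [ -f backup.sh ]; then
    cp backup.sh backups/
fi
```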

Now let's configure the example-live backup.  First, copy the sample backup-config.txt into the example-live folder.  Open the text file in your favorite editor (which is VIM), and change the options to something like this:

  • HOST=example.com
  • REMOTEDIR=/var/www
  • REMOTEUSER=backupadmin
  • RSYNCPARAMS="--progress"

Using the "--progress" option shows the transfer progress of each file as it is copied, which lets you know it's working.  Now open a terminal and change into your backups folder.  Then run:

./backup.sh example-live

You will be prompted for backupadmin's password, and after you enter it rsync will begin backing up the files from example.com to your own computer.  When it is finished, you should have two new folders inside of example-live.  One will be named after the time the backup started, and the second will be called "latest", which is a shortcut to the first folder.  This shortcut gets updated whenever you complete another backup.

Set up example-testing in a similar manner: copy the sample backup-config.txt into example-testing/, edit it, and run ./backup.sh example-testing.

Features, and Changes from Michael's Script

I must confess, I didn't read his article all the way to the bottom on my first read.  I got to the part where he provided a good example of what the rsync command looks like, and jumped in feet first.  So without further disclaimers, here are the features my script has:

  • Changed: Script is still single file, but can be used with many different servers because it uses a configuration file for each server.
  • Changed: If there are no previous backups (based on existence of the latest shortcut), notifies the user that we're starting the initial backup.
  • Changed: Forgoes the "-x" parameter (which tells rsync not to cross filesystem boundaries).  Read about rsync's options in the official documentation.
  • Changed: Duplicates the folder naming convention of Time Machine (YYYY-MM-DD-HHMMSS), and calls the latest backup latest instead of current.
  • Changed: The script's error messages are more informative.
  • Changed: To change global rsync options, edit the DEFAULTOPTIONS.  The default options are:
    • -a ... archive mode
    • -v ... show extra information (verbose mode)
    • -z ... compress the data before sending
    • --delete ... delete files from the backup that don't exist on the server
    • --delete-excluded ... delete files from the backup that have been explicitly excluded (see the rsync documentation).
  • Changed: Returns rsync's error code if rsync fails.
  • Changed: Prefixes the folder names of failed backups with "incomplete-".
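To make the options and error-code items a bit more concrete, here is a small sketch of how global defaults, per-server extras, and rsync's exit code could fit together.  DEFAULTOPTIONS and RSYNCPARAMS are the names used above; the build_options and run_rsync helpers and the RSYNC override are names I made up for this example, so don't expect to find them in backup.sh.

```shell
#!/bin/sh
# Sketch: combine the global defaults with per-server extras and pass
# rsync's exit code back to the caller. Helper names are hypothetical.

DEFAULTOPTIONS="-avz --delete --delete-excluded"

# Print the full option string: the defaults, plus RSYNCPARAMS if it is set.
build_options() {
    printf '%s' "$DEFAULTOPTIONS${RSYNCPARAMS:+ $RSYNCPARAMS}"
}

# run_rsync SOURCE DEST: run rsync and return its exit code unchanged.
run_rsync() {
    # The unquoted substitution splits the option string into separate arguments.
    ${RSYNC:-rsync} $(build_options) "$@"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "rsync failed with exit code $status" >&2
    fi
    return "$status"
}
```

Returning the code unchanged means whatever calls the script, such as a scheduler, can tell a failed backup from a good one.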

Conclusion

I wrote this script in the wee hours of Saturday morning.  I've since backed up 4 different websites with it, and it seems to be working great.  I've only tested it on Mac OS X 10.6, but it should work fine on earlier versions, and on most Linux and BSD distributions, provided you have rsync and OpenSSH.  I doubt that it works on Windows.

You could call this script from a task scheduler like cron to automate the creation of backups.  If you set up SSH private key authentication, you shouldn't even need a password.
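If you go the cron route, a small wrapper that runs backup.sh once per configured folder keeps the crontab down to one line.  This is a sketch under a few assumptions: backup_all is a name I invented, it expects backup.sh to live in the backups folder as in the example above, and the crontab line in the comment uses a placeholder path.

```shell
#!/bin/sh
# Sketch: run backup.sh for every subfolder that has a backup-config.txt.
# backup_all is a hypothetical helper; it assumes backup.sh sits in DIR.
#
# Example crontab entry (placeholder path), running nightly at 2:30 AM:
#   30 2 * * * /bin/sh /path/to/backup-all.sh

# The body is a subshell, so the cd does not affect the caller.
backup_all() (
    cd "$1" || exit 1
    failed=0
    for config in */backup-config.txt; do
        # If nothing matches, the pattern stays literal; skip it.
        [ -f "$config" ] || continue
        # Strip the file name to get the folder, e.g. example-live.
        ./backup.sh "${config%/backup-config.txt}" || failed=1
    done
    exit "$failed"
)
```

Calling backup_all /path/to/backups would then back up example-live and example-testing in turn, and return non-zero if any of them failed.  This only works unattended once key authentication is set up, since cron can't type a password for you.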

I'd like to thank Michael for the time he put into his script and his article.  And I suppose I should thank Google, since Michael's page was the very first hit when I searched for "rsync time machine." :)

I hope you find the script useful.  Download it below.