
Practical Script — Automated Backup with rsync

Build a production-worthy backup script using rsync, with rotating hard-linked snapshots, locking, logging, cron email notifications, and proper error handling. A real-world application of everything in this course.

February 4, 2026 · 9 min read
Bash · rsync · Backup · Practical · Scripting

A backup script is one of the first "real" scripts most sysadmins write, and one of the most important to get right. This script uses rsync for efficient incremental backups, hard links for storage efficiency, automated snapshot rotation, and a summary report.

Design Goals

What this script does

  • Rsync-based incremental backup with hard-link snapshots (Time Machine style)
  • Each run creates a new dated snapshot; unchanged files are hard-linked from the previous snapshot, so they take no additional data space on disk
  • Keeps a configurable number of snapshots (14 by default) and prunes the oldest
  • Uses a PID lock file to prevent concurrent runs
  • Logs to a file with timestamps
  • Prints a summary to stdout (which cron can email)
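The "keep the newest N snapshots" goal is easy to get subtly wrong, so it helps to look at the selection logic on its own first. The sketch below assumes snapshot names are timestamps in the `YYYY-MM-DD_HH-MM-SS` form used by this script, which means a plain name sort is also a chronological sort. The function name `snapshots_to_prune` is hypothetical and not part of the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the snapshot names that fall outside the newest $1.
snapshots_to_prune() {
    local keep=$1; shift
    # Nothing to do when no snapshot names were passed.
    if (( $# == 0 )); then return 0; fi
    # Timestamped names sort chronologically, so reverse order is newest-first;
    # tail then skips the first $keep entries and prints the rest.
    printf '%s\n' "$@" | sort -r | tail -n +"$((keep + 1))"
}

# Keep the 2 newest of 4 snapshots; the 2 oldest are printed for removal.
snapshots_to_prune 2 \
    2026-02-01_02-00-00 2026-02-02_02-00-00 \
    2026-02-03_02-00-00 2026-02-04_02-00-00
```

Sorting by name rather than by modification time matters here because `rsync -a` preserves directory timestamps, so a snapshot directory's mtime reflects the source directory and not the time of the backup.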
backup.sh
#!/usr/bin/env bash
# backup.sh — Incremental snapshot backup using rsync hard-links

set -euo pipefail

# ── Configuration ─────────────────────────────────────────────────────────────
SOURCE="${1:?Usage: $0 <source> <destination> [keep]}"
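DEST="${2:?Usage: $0 <source> <destination> [keep]}"
# rsync resolves a relative --link-dest against the destination directory,
# and the 'latest' symlink needs a stable target, so make DEST absolute.
[[ "$DEST" == /* ]] || DEST="$PWD/$DEST"
KEEP="${3:-14}"                              # number of snapshots to keep
[[ "$KEEP" =~ ^[1-9][0-9]*$ ]] || { echo "keep must be a positive integer: $KEEP" >&2; exit 2; }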
LOCKFILE="/tmp/backup-$(echo "$DEST" | tr '/' '_').lock"
LOGFILE="$DEST/backup.log"
DATE=$(date '+%Y-%m-%d_%H-%M-%S')
SNAPSHOT_DIR="$DEST/snapshots/$DATE"
LATEST_LINK="$DEST/latest"

# ── Logging ───────────────────────────────────────────────────────────────────
function log {
    local ts; ts=$(date '+%F %T')
    echo "[$ts] $*" | tee -a "$LOGFILE"
}
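# Fatal messages go to stderr (and still to the log file via tee),
# so cron can mail them even when stdout is discarded.
function die { log "FATAL: $*" >&2; exit 1; }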

# ── Lock ──────────────────────────────────────────────────────────────────────
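LOCK_HELD=0   # set to 1 once this process owns the lock

function acquire_lock {
    if [[ -f "$LOCKFILE" ]]; then
        local old_pid; old_pid=$(cat "$LOCKFILE")
        # kill -0 sends no signal; it only tests whether the PID is alive
        if kill -0 "$old_pid" 2>/dev/null; then
            die "Already running (PID $old_pid)"
        fi
        log "Stale lock removed (PID $old_pid)"
    fi
    echo $$ > "$LOCKFILE"
    LOCK_HELD=1
}

# Only remove the lock if this process created it; otherwise a run that exits
# with "Already running" would delete the other process's lock on its way out.
function release_lock {
    if [[ $LOCK_HELD -eq 1 ]]; then rm -f "$LOCKFILE"; fi
}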

# ── Cleanup ───────────────────────────────────────────────────────────────────
function cleanup {
    local code=$?
    release_lock
    if [[ $code -ne 0 ]]; then
        log "Backup FAILED (exit code $code)" >&2   # stderr, so cron can mail it
    fi
}
trap cleanup EXIT
trap 'die "Interrupted by signal"' INT TERM

# ── Main ──────────────────────────────────────────────────────────────────────
function run_backup {
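    # Create the destination first: log() appends to $LOGFILE inside $DEST,
    # and under `set -e` with pipefail a failed write would abort the script.
    mkdir -p "$DEST/snapshots"

    log "=== Backup started ==="
    log "Source:      $SOURCE"
    log "Destination: $DEST"
    log "Snapshot:    $SNAPSHOT_DIR"

    # Validate
    [[ -d "$SOURCE" ]] || die "Source not found: $SOURCE"
    acquire_lock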

    # rsync with hard-link reference to latest snapshot
    local rsync_opts=(-av --delete --stats)
    if [[ -e "$LATEST_LINK" ]]; then
        rsync_opts+=(--link-dest="$LATEST_LINK")
    fi

    mkdir -p "$SNAPSHOT_DIR"
    log "Running rsync..."
    rsync "${rsync_opts[@]}" "$SOURCE/" "$SNAPSHOT_DIR/" 2>&1 | tee -a "$LOGFILE"

    # Update the 'latest' symlink
    ln -sfn "$SNAPSHOT_DIR" "$LATEST_LINK"
    log "Latest symlink updated → $SNAPSHOT_DIR"

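    # Prune old snapshots
    log "Pruning old snapshots (keeping $KEEP)..."
    mapfile -t old_snapshots < <(
        # Snapshot names are timestamps, so a reverse name sort is newest-first.
        # (ls -t is unreliable here: rsync -a copies the source directory's
        # mtime onto each snapshot directory.)
        ls -d "$DEST/snapshots"/2* 2>/dev/null | sort -r | tail -n +"$((KEEP + 1))"
    )
    local pruned=0 snap
    for snap in "${old_snapshots[@]}"; do
        log "  Removing: $snap"
        rm -rf "$snap"
        # (( pruned++ )) returns status 1 when pruned is 0, which trips set -e
        pruned=$((pruned + 1))
    done
    log "Pruned $pruned old snapshot(s)"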

    # Report
    local total_snaps; total_snaps=$(ls -d "$DEST/snapshots"/2* 2>/dev/null | wc -l)
    local disk_used; disk_used=$(du -sh "$DEST/snapshots" 2>/dev/null | cut -f1)
    log "=== Backup complete ==="
    log "Snapshots:   $total_snaps"
    log "Disk used:   $disk_used"
}

run_backup
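One limitation of the PID-file approach in acquire_lock is that checking for the lock file and writing it are two separate steps, so two runs started at the same instant could both pass the check. A common way to narrow that window is Bash's `noclobber` option, which makes `>` fail if the file already exists. The following is a minimal sketch of that idea, not a drop-in replacement: the function name `acquire_lock_atomic` and the demo lock path are hypothetical, and stale-lock detection is left out for brevity.

```shell
#!/usr/bin/env bash
# Hypothetical alternative to the check-then-write lock.
acquire_lock_atomic() {
    local lockfile=$1
    # noclobber is enabled only inside the subshell; with it set, the
    # redirection fails if $lockfile already exists instead of overwriting it.
    # $$ still expands to the PID of the main script inside a subshell.
    if ( set -o noclobber; echo "$$" > "$lockfile" ) 2>/dev/null; then
        return 0
    fi
    return 1
}

lock="${TMPDIR:-/tmp}/backup-demo.$$.lock"   # demo path, assumed not to exist yet
if acquire_lock_atomic "$lock"; then echo "lock acquired"; fi
if ! acquire_lock_atomic "$lock"; then echo "second attempt refused"; fi
rm -f "$lock"
```

On Linux systems with util-linux installed, `flock` is another option worth considering, since the kernel releases the lock automatically when the process exits.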

Setting Up as a Cron Job

# Run at 2 AM daily
crontab -e

# Add:
# 0 2 * * * /usr/local/bin/backup.sh /home/alice /backup/alice 30 >> /var/log/backup.log 2>&1

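# Or to receive email on failure only. Cron mails any output the job produces,
# so stdout is discarded here and only stderr is mailed. This relies on the
# script writing its failure messages to stderr: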
MAILTO="admin@example.com"
0 2 * * * /usr/local/bin/backup.sh /home/alice /backup/alice 30 >/dev/null

rsync --link-dest

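rsync --link-dest=PREV_DIR compares each source file against its counterpart in PREV_DIR. Files that are unchanged are hard-linked from PREV_DIR into the new snapshot instead of copied. By default rsync's quick check compares size and modification time, and preserved attributes such as permissions and ownership must also match. Each snapshot therefore looks like a full backup, but only changed files consume new space on disk. If PREV_DIR is a relative path, rsync interprets it relative to the destination directory, so an absolute path is the safer choice.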
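To see the effect directly, the sketch below builds two snapshots of a throwaway directory and checks which files share an inode. It assumes rsync is installed and that the temporary directory lives on a filesystem that supports hard links; the function name `link_dest_demo` and the file names are made up for the demonstration.

```shell
#!/usr/bin/env bash
set -euo pipefail

link_dest_demo() {
    local demo f
    demo=$(mktemp -d)                      # absolute path, as --link-dest prefers
    mkdir "$demo/src"
    echo "unchanged" > "$demo/src/a.txt"
    echo "v1"        > "$demo/src/b.txt"

    # First snapshot: a plain full copy.
    rsync -a "$demo/src/" "$demo/snap1/"

    # Change one file (its size differs, so rsync's quick check notices).
    echo "v2, now longer" > "$demo/src/b.txt"

    # Second snapshot: unchanged files are hard-linked from snap1.
    rsync -a --link-dest="$demo/snap1" "$demo/src/" "$demo/snap2/"

    for f in a.txt b.txt; do
        # -ef is true when both paths refer to the same device and inode.
        if [[ "$demo/snap1/$f" -ef "$demo/snap2/$f" ]]; then
            echo "$f: hard-linked (shared inode)"
        else
            echo "$f: separate copy"
        fi
    done
    rm -rf "$demo"
}

# Run the demo only where rsync is available.
if command -v rsync >/dev/null 2>&1; then
    link_dest_demo
else
    echo "rsync not installed; skipping demo"
fi
```

The unchanged a.txt should be reported as hard-linked and the modified b.txt as a separate copy, which is exactly why a directory of many snapshots can stay close to the size of one full backup plus the changes.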

Exercise

Extend the backup script to add:

  1. A --exclude flag that accepts a comma-separated list of patterns to exclude from the backup (e.g., *.tmp,node_modules,.git)
  2. Convert the comma-separated list to rsync --exclude arguments
  3. Add a --dry-run mode that passes --dry-run to rsync and logs "DRY RUN MODE"
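As one possible starting point for the first two items, the sketch below splits a comma-separated string into an array of `--exclude=` arguments that could later be appended to rsync_opts. The function name `build_exclude_args` and the global array `EXCLUDE_ARGS` are hypothetical; parsing the `--exclude` flag itself and wiring in `--dry-run` are left as the exercise.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fill the global array EXCLUDE_ARGS from a
# comma-separated pattern list such as '*.tmp,node_modules,.git'.
build_exclude_args() {
    local csv=$1 pattern
    local -a patterns=()
    EXCLUDE_ARGS=()
    # read -a splits on IFS without performing glob expansion,
    # so a pattern like *.tmp is kept literally.
    IFS=',' read -r -a patterns <<< "$csv"
    for pattern in "${patterns[@]}"; do
        # Skip empty entries produced by stray commas.
        if [[ -n "$pattern" ]]; then
            EXCLUDE_ARGS+=(--exclude="$pattern")
        fi
    done
}

build_exclude_args '*.tmp,node_modules,.git'
printf '%s\n' "${EXCLUDE_ARGS[@]}"
```

Keeping the arguments in an array (rather than a single string) means each pattern reaches rsync as one argument even if it contains spaces or wildcard characters.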