Practical Script — Automated Backup with rsync
Build a production-worthy backup script using rsync, rotating archives, email notifications, and proper error handling. A real-world application of everything in this course.
A backup script is one of the first "real" scripts most sysadmins write, and one of the most important to get right. This script uses rsync for efficient incremental backups, hard links for storage efficiency, automatic pruning of old snapshots, and a summary report.
Design Goals
What this script does
- Rsync-based incremental backup with hard-link snapshots (Time Machine style)
- Each run creates a new dated snapshot; unchanged files are hard-linked from the previous snapshot, so they take no additional disk space for their contents
- Keeps a configurable number of snapshots (14 by default) and deletes older ones
- Locked with a PID file to prevent concurrent runs
- Logs timestamped messages to a file
- Sends a summary to stdout (for cron email)
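The retention goal above depends on one property of the snapshot names: with the script's YYYY-MM-DD_HH-MM-SS format, sorting the names also sorts the snapshots by time. Below is a minimal sketch of just the selection step, assuming that naming scheme. list_prunable is a hypothetical helper, and the demo runs in a throwaway temporary directory.

```bash
#!/usr/bin/env bash
# Sketch: which dated snapshots would be pruned when keeping the newest KEEP.
# Assumes directories are named YYYY-MM-DD_HH-MM-SS, so a byte-wise name sort
# is also a chronological sort. list_prunable is a hypothetical helper name.
set -euo pipefail

# list_prunable DIR KEEP: print the paths of snapshots older than the newest KEEP
list_prunable() {
    local dir=$1 keep=$2
    find "$dir" -mindepth 1 -maxdepth 1 -type d -name '2*' \
        | LC_ALL=C sort -r \
        | tail -n +"$((keep + 1))"
}

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
for ts in 2024-01-01_02-00-00 2024-01-02_02-00-00 2024-01-03_02-00-00 \
          2024-01-04_02-00-00 2024-01-05_02-00-00; do
    mkdir "$demo/$ts"
done

mapfile -t prunable < <(list_prunable "$demo" 2)
printf '%s\n' "${prunable[@]##*/}"   # strip the directory part for display
# Prints:
# 2024-01-03_02-00-00
# 2024-01-02_02-00-00
# 2024-01-01_02-00-00
```

LC_ALL=C makes sort compare bytes, so the result does not depend on the user's locale.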
#!/usr/bin/env bash
# backup.sh — Incremental snapshot backup using rsync hard-links
set -euo pipefail
# ── Configuration ─────────────────────────────────────────────────────────────
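SOURCE="${1:?Usage: $0 <source> <destination> [keep]}"
DEST="${2:?Usage: $0 <source> <destination> [keep]}"
KEEP="${3:-14}"   # number of snapshots to keep
# Create the destination now (the log file lives there, and log() runs before
# anything else creates it) and make the path absolute: --link-dest and the
# 'latest' symlink both resolve relative paths against the wrong directory.
mkdir -p -- "$DEST"
DEST=$(cd -- "$DEST" && pwd)
LOCKFILE="/tmp/backup-$(echo "$DEST" | tr '/' '_').lock"
LOGFILE="$DEST/backup.log"
DATE=$(date '+%Y-%m-%d_%H-%M-%S')
SNAPSHOT_DIR="$DEST/snapshots/$DATE"
LATEST_LINK="$DEST/latest"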
# ── Logging ───────────────────────────────────────────────────────────────────
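function log {
    local ts; ts=$(date '+%F %T')
    echo "[$ts] $*" | tee -a "$LOGFILE"
}
# Redirecting log's output sends fatal errors to stderr as well as the log file,
# so cron can mail them even when stdout is discarded.
function die { log "FATAL: $*" >&2; exit 1; }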
# ── Lock ──────────────────────────────────────────────────────────────────────
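function acquire_lock {
    if [[ -f "$LOCKFILE" ]]; then
        local old_pid; old_pid=$(cat "$LOCKFILE")
        if kill -0 "$old_pid" 2>/dev/null; then
            die "Already running (PID $old_pid)"
        fi
        log "Stale lock removed (PID $old_pid)"
    fi
    echo $$ > "$LOCKFILE"
}
# Remove the lock only if this process owns it. Otherwise a second instance that
# exits with "Already running" would delete the running instance's lock file.
function release_lock {
    if [[ -f "$LOCKFILE" && "$(cat "$LOCKFILE")" == "$$" ]]; then
        rm -f "$LOCKFILE"
    fi
}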
# ── Cleanup ───────────────────────────────────────────────────────────────────
function cleanup {
    local code=$?
    release_lock
    if [[ $code -ne 0 ]]; then
        # Also written to stderr so that cron mails the failure
        log "Backup FAILED (exit code $code)" >&2
    fi
}
trap cleanup EXIT
trap 'die "Interrupted by signal"' INT TERM
# ── Main ──────────────────────────────────────────────────────────────────────
function run_backup {
    log "=== Backup started ==="
    log "Source:      $SOURCE"
    log "Destination: $DEST"
    log "Snapshot:    $SNAPSHOT_DIR"

    # Validate
    [[ -d "$SOURCE" ]] || die "Source not found: $SOURCE"
    [[ "$KEEP" =~ ^[1-9][0-9]*$ ]] || die "KEEP must be a positive integer (got '$KEEP')"
    mkdir -p "$DEST/snapshots"
    acquire_lock

    # rsync with hard-link reference to the latest snapshot
    local rsync_opts=(-av --delete --stats)
    if [[ -e "$LATEST_LINK" ]]; then
        rsync_opts+=(--link-dest="$LATEST_LINK")
    fi
    mkdir -p "$SNAPSHOT_DIR"
    log "Running rsync..."
    # With pipefail, an rsync failure stops the script here (set -e)
    rsync "${rsync_opts[@]}" "$SOURCE/" "$SNAPSHOT_DIR/" 2>&1 | tee -a "$LOGFILE"

    # Update the 'latest' symlink
    ln -sfn "$SNAPSHOT_DIR" "$LATEST_LINK"
    log "Latest symlink updated → $SNAPSHOT_DIR"

    # Prune old snapshots. Sort by name (the timestamp), not with ls -t:
    # rsync -a copies the source directory's mtime onto each snapshot directory.
    log "Pruning old snapshots (keeping $KEEP)..."
    local pruned=0 snap
    while IFS= read -r snap; do
        log "  Removing: $snap"
        rm -rf -- "$snap"
        pruned=$((pruned + 1))   # (( pruned++ )) returns 1 when pruned is 0 and trips set -e
    done < <(
        find "$DEST/snapshots" -mindepth 1 -maxdepth 1 -type d -name '2*' \
            | LC_ALL=C sort -r | tail -n +"$((KEEP + 1))"
    )
    log "Pruned $pruned old snapshot(s)"

    # Report (du counts each hard-linked file once, so this is the real disk usage)
    local total_snaps disk_used
    total_snaps=$(find "$DEST/snapshots" -mindepth 1 -maxdepth 1 -type d -name '2*' | wc -l)
    disk_used=$(du -sh "$DEST/snapshots" 2>/dev/null | cut -f1)
    log "=== Backup complete ==="
    log "Snapshots: $total_snaps"
    log "Disk used: $disk_used"
}
run_backup
Setting Up as a Cron Job
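# Edit the crontab of the user the backup should run as
crontab -e
# Then add one of the following entries (both run at 2 AM daily).
# Append all output to a log file (it must be writable by that user):
0 2 * * * /usr/local/bin/backup.sh /home/alice /backup/alice 30 >> /var/log/backup.log 2>&1
# Or discard stdout, so cron mails only what the script writes to stderr (such as errors):
MAILTO="admin@example.com"
0 2 * * * /usr/local/bin/backup.sh /home/alice /backup/alice 30 >/dev/null
How Hard-Link Snapshots Work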
rsync --link-dest
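rsync --link-dest=PREV_DIR compares each source file against the file at the same path in PREV_DIR. If the two match (same size and modification time, and identical in every other attribute rsync is preserving, such as permissions and ownership), the file is hard-linked from PREV_DIR into the new snapshot instead of being copied. Each snapshot therefore looks like a full backup, but only new or changed files take up additional disk space; a changed file is stored in full, not as a delta. If PREV_DIR is a relative path, rsync resolves it relative to the destination directory, so an absolute path is the safer choice.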
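To make the per-file decision concrete, here is a toy emulation in bash, assuming GNU coreutils (stat -c) and file names without newlines. It is not how rsync is implemented, and unlike rsync it checks only size and modification time; link_dest_copy is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Toy emulation of the per-file decision rsync --link-dest makes (illustration
# only, not rsync's code). Assumes GNU coreutils (stat -c) and file names
# without newlines. Unlike rsync, it compares only size and mtime.
set -euo pipefail

# link_dest_copy SRC PREV NEW  (hypothetical helper name)
# Copy SRC into NEW; a file whose size and mtime match the same path in PREV
# is hard-linked from PREV instead of copied.
link_dest_copy() {
    local src=$1 prev=$2 new=$3 rel
    while IFS= read -r rel; do
        mkdir -p -- "$new/$(dirname -- "$rel")"
        if [[ -f "$prev/$rel" ]] &&
           [[ "$(stat -c '%s %Y' -- "$src/$rel")" == "$(stat -c '%s %Y' -- "$prev/$rel")" ]]; then
            ln -- "$prev/$rel" "$new/$rel"      # unchanged: share the existing inode
        else
            cp -p -- "$src/$rel" "$new/$rel"    # new or changed: store a full copy
        fi
    done < <(cd -- "$src" && find . -type f)
}

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/src"
echo "unchanged" > "$work/src/a.txt"
echo "version 1" > "$work/src/b.txt"

# First snapshot: no previous snapshot exists, so every file is copied
link_dest_copy "$work/src" "$work/none" "$work/snap1"

# Change b.txt (different length, so the size check catches it)
echo "version 2, longer" > "$work/src/b.txt"
link_dest_copy "$work/src" "$work/snap1" "$work/snap2"

# Show inode numbers and link counts for both snapshots
stat -c '%n inode=%i links=%h' "$work"/snap1/*.txt "$work"/snap2/*.txt
```

After the second run, a.txt in both snapshots is the same inode with a link count of 2, while b.txt in snap2 is a separate full copy.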
Extend the backup script:
- Add an --exclude flag that accepts a comma-separated list of patterns to exclude from the backup (e.g., *.tmp,node_modules,.git)
- Convert the comma-separated list into individual rsync --exclude arguments
- Add a --dry-run mode that passes --dry-run to rsync and logs "DRY RUN MODE"
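For the conversion step, one possible approach (a sketch, not the only solution) splits the list on commas and builds one rsync argument per pattern in a bash array; because each element is quoted, glob characters such as * reach rsync instead of being expanded by the shell. build_exclude_args and EXCLUDE_ARGS are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch of the comma-list -> rsync --exclude arguments step (bash 4+).
# build_exclude_args and EXCLUDE_ARGS are hypothetical names for illustration.
set -euo pipefail

# build_exclude_args LIST: split LIST on commas and store one
# --exclude=PATTERN element per non-empty pattern in the global array EXCLUDE_ARGS.
build_exclude_args() {
    local list=$1 pattern
    EXCLUDE_ARGS=()
    while IFS= read -r pattern; do
        if [[ -n "$pattern" ]]; then
            EXCLUDE_ARGS+=(--exclude="$pattern")   # quoted: no glob expansion
        fi
    done < <(tr ',' '\n' <<< "$list")
}

build_exclude_args '*.tmp,node_modules,.git'
printf '%s\n' "${EXCLUDE_ARGS[@]}"
# Prints:
# --exclude=*.tmp
# --exclude=node_modules
# --exclude=.git
```

In backup.sh the array could then be appended with rsync_opts+=("${EXCLUDE_ARGS[@]}") before the rsync call.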