A cron job like
just do the job, at least for small databases.sqlite3 $DB .dump >$DUMP
However, when the database grows upto some 100's of megabytes, the dump process takes more than 1 minute, while holding lock, resulting in timeout error in application.
If you are on LVM you can use snapshot to do a lock-free backup. Unfortunately we are on normal partition so locking database is inevitable.
After some research and experiments, I decided to use rdiff to make the locking period as short as possible.
rdiff is a tool to make and apply deltas of binary files. Instead of just comparing two files on the fly, rdiff works in two steps:
- compute signature of old file
- compute delta from signatgure and new file
- compute signature of previous backup
- lock the database
- compute delta from the signature and database
- unlock database
- applying delta to the previous backup makes a new backup
In our case, rdiff delta only takes 4-5 seconds while cp'ing the whole database takes 25 seconds.
I wrote a Python script, sqlite-lock (attached at the end), to lock the SQLite database, and taking backup like below:
Actually the database is at a datacenter and backup is on a local server, so rdiff delta is executed on remote side. Network traffic is minimum because only signature and delta are transmitted. As I understand this is basically what rsync does.rdiff signature $LATEST $SIG
sqlite-lock rdiff delta $SIG $DB $DELTA
rdiff patch $DELTA $LATEST $NEW
ln -f $NEW $LATEST
import sys, os
from pysqlite2 import dbapi2 as sqlite3
if len(sys.argv) < 3:
print >>sys.stderr, "Usage: %s DB CMDLINE..." % sys.argv
print >>sys.stderr, "Executes CMDLINE while locking DB"
src = sys.argv
cmdline = sys.argv[2:]
db = sqlite3.connect(src, timeout=60.0, isolation_level=None)
db.execute("SELECT COUNT(*) FROM sqlite_master") # This creates SHARED lock
retval = os.spawnvp(os.P_WAIT, cmdline, cmdline)
if retval != 0:
print >>sys.stderr, "Command returned %d" % retval