Tuesday, June 23, 2009

SSH access control with GeoIP

All administrators of our systems are in Japan, so it is reasonable to block SSH login attempts from other countries. (Disabling password login is the best defense against brute-force attacks, but in some cases it simply isn't an option... Also, iptables -m recent does a good job of throttling them.)

On Debian-based systems it is possible to call an external program from /etc/hosts.allow to allow or deny a client. I implemented a country filter based on GeoIP for SSH protection.

Firstly, we need to update the GeoIP database regularly. The free database seems to be updated every month. I run this script from cron.daily to check whether a new version is available and download it if so. (The database is saved to the directory where the script is.)
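The update script itself was attached to the original post; the following is only a rough sketch of the idea in Python. The download URL, the install path, and the age-based check are assumptions, not the attached script:

```python
import gzip
import os
import time
import urllib.request

# Assumed locations; the real script keeps the database next to itself.
DB_PATH = "/usr/local/geoip/GeoIP.dat"
DB_URL = ("http://geolite.maxmind.com/download/geoip/database/"
          "GeoLiteCountry/GeoIP.dat.gz")

MAX_AGE = 25 * 86400  # the free database is refreshed roughly monthly


def needs_update(local_mtime, now, max_age=MAX_AGE):
    """True when the local copy is old enough that a new release may exist."""
    return now - local_mtime > max_age


def update():
    """Download and install a fresh database if the local one looks stale."""
    if os.path.exists(DB_PATH) and not needs_update(
            os.path.getmtime(DB_PATH), time.time()):
        return False
    raw = urllib.request.urlopen(DB_URL).read()
    tmp = DB_PATH + ".tmp"
    with open(tmp, "wb") as f:
        f.write(gzip.decompress(raw))
    os.rename(tmp, DB_PATH)  # atomic replace: lookups never see a partial file
    return True
```

The atomic rename matters: the aclexec check script may be invoked at any moment, so the database file must never be half-written.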

/etc/hosts.allow is something like this:
sshd: ALL: aclexec /usr/local/geoip/check %a
then in /etc/hosts.deny:
sshd: ALL
so that connections not explicitly allowed are denied.

Finally, this is the script I call from aclexec. The filtering rules are written in the function "rule"; in this case clients from Japan, as well as addresses the database cannot resolve ("unknown"), are allowed to connect.
Both scripts use syslog for diagnostic output.
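For reference, a minimal sketch of what such an aclexec filter might look like, assuming the legacy Python GeoIP bindings and a database under /usr/local/geoip. The paths and the exact rule are illustrative, not the attached script:

```python
import sys
import syslog

ALLOWED = ("JP",)  # assumption: only Japan is whitelisted


def rule(country):
    # Allow Japan; also allow "unknown" (addresses the database cannot resolve).
    return country is None or country in ALLOWED


def check(addr):
    import GeoIP  # legacy python-geoip bindings for GeoIP.dat
    gi = GeoIP.open("/usr/local/geoip/GeoIP.dat", GeoIP.GEOIP_MEMORY_CACHE)
    country = gi.country_code_by_addr(addr)
    verdict = "allow" if rule(country) else "deny"
    syslog.syslog("sshd connection from %s (%s): %s" % (addr, country, verdict))
    return 0 if rule(country) else 1  # aclexec allows only on exit status 0


if len(sys.argv) == 2:  # invoked as "check %a": aclexec passes the client address
    sys.exit(check(sys.argv[1]))
```

Exit status is the whole interface: aclexec treats 0 as "allow" and anything else as "deny", so an unhandled exception fails closed.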

This setup can be used for other services such as FTP.
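For example, assuming an FTP daemon linked against libwrap (proftpd here is only an illustration), the same aclexec line works in /etc/hosts.allow:

```
proftpd: ALL: aclexec /usr/local/geoip/check %a
```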

Thursday, June 4, 2009

Backup SQLite database with rdiff

We want to take regular backups of running web applications' SQLite databases (you can't just copy the file: what if the app modifies the database while the copy is in progress?)

A cron job like
sqlite3 $DB .dump >$DUMP
does the job, at least for small databases.
However, once the database grows to a few hundred megabytes, the dump takes more than a minute while holding the lock, causing timeout errors in the application.

If you are on LVM you can use a snapshot to take a lock-free backup. Unfortunately we are on a normal partition, so locking the database is inevitable.

After some research and experiments, I decided to use rdiff to make the locking period as short as possible.

rdiff is a tool for making and applying deltas of binary files. Instead of just comparing two files on the fly, rdiff works in two steps:
  1. compute the signature of the old file
  2. compute a delta from the signature and the new file
To use this to back up an SQLite database:
  1. compute the signature of the previous backup
  2. lock the database
  3. compute the delta from the signature and the database
  4. unlock the database
  5. apply the delta to the previous backup, producing the new backup
As you can see, reading the database is the only heavy I/O that happens while holding the lock; the signature and delta are small. Thus this method can be faster than the alternatives, provided I/O is the bottleneck.

In our case, rdiff delta takes only 4-5 seconds, while cp'ing the whole database takes 25 seconds.

I wrote a Python script, sqlite-lock (attached at the end), to lock the SQLite database, and take backups like this:
rdiff signature $LATEST $SIG
sqlite-lock $DB rdiff delta $SIG $DB $DELTA
rdiff patch $DELTA $LATEST $NEW
ln -f $NEW $LATEST
Actually, the database is at a datacenter and the backup is on a local server, so rdiff delta is executed on the remote side. Network traffic is minimal because only the signature and delta are transmitted. As I understand it, this is basically what rsync does.
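Put together, the four steps can be driven from a small wrapper; this is a hypothetical local-only sketch (file names are placeholders), relying on sqlite-lock's DB-first argument order:

```python
import subprocess


def backup_commands(db, latest, sig, delta, new):
    """Build the rdiff backup pipeline as argument vectors."""
    return [
        ["rdiff", "signature", latest, sig],                    # signature of previous backup
        ["sqlite-lock", db, "rdiff", "delta", sig, db, delta],  # delta computed under lock
        ["rdiff", "patch", delta, latest, new],                 # produce the new backup
        ["ln", "-f", new, latest],                              # rotate: NEW becomes LATEST
    ]


def run_backup(db, latest, sig, delta, new):
    for cmd in backup_commands(db, latest, sig, delta, new):
        subprocess.check_call(cmd)
```

In the remote setup, only the second command would run on the datacenter side (e.g. via ssh), since the signature and delta are the only data that need to cross the network.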

#!/usr/bin/env python
# sqlite-lock: run a command while holding a SHARED lock on an SQLite database.
import sys, os
try:
    import sqlite3
except ImportError:
    from pysqlite2 import dbapi2 as sqlite3  # Python < 2.5

if len(sys.argv) < 3:
    print >>sys.stderr, "Usage: %s DB CMDLINE..." % sys.argv[0]
    print >>sys.stderr, "Executes CMDLINE while locking DB"
    sys.exit(2)

src = sys.argv[1]
cmdline = sys.argv[2:]

db = sqlite3.connect(src, timeout=60.0, isolation_level=None)
db.execute("BEGIN DEFERRED")  # takes no lock until the first read
db.execute("SELECT COUNT(*) FROM sqlite_master")  # this acquires the SHARED lock
retval = os.spawnvp(os.P_WAIT, cmdline[0], cmdline)
db.rollback()  # release the lock
if retval != 0:
    print >>sys.stderr, "Command returned %d" % retval
sys.exit(0 if retval == 0 else 1)