Saturday, March 24, 2012

LOTO (Lock-Out Tag-Out) - Linux Command Line Mutex

I've been doing quite a bit of shell programming in a variety of Linux environments lately, and have for various reasons needed some way to control access to specific files or resources.  For example, I like to have a very aggressive backup system between my computers, so I have a backup script that runs on my laptop every five minutes to rsync my entire /home directory back to my file server and save a snapshot.  Rsync is a very intelligent differential copy system, and the majority of the time my laptop is on my apartment LAN, so it usually finishes in a matter of seconds.

Unfortunately, "usually" is such a sticky word in computer science.  All it takes is downloading one video file from my camera, or trying to get online in a 300-person lecture hall, and my speedy rsync script gets hung up and can take several minutes.  If it takes more than five minutes, the next copy of the same script starts running and slows down the first.  Five more minutes, and a third starts.  This is somewhat analogous to the Sorcerer's Apprentice Syndrome bug in TFTP, in that each copy slows down all the previous copies to the point where nothing useful is being accomplished by anyone.  I wanted a way to easily check whether the previous copy of rsync (or any other network-intensive job) is still running and, if so, give up.

Source code.

I decided to solve this by writing a program in C which uses the flock() system call (declared in sys/file.h) to gain an exclusive lock on some magic resource file to signify "owning" the file.  I named this tool "loto," after the OSHA "Lock-Out Tag-Out" policies which are used around dangerous equipment to ensure everyone knows who's working on what (I thought it was a clever analogy).  By using the flock() call instead of trying to grant access based on just the existence of the lock file, I prevent the possibility of stale locks from scripts which create the lock, start running, then either die or get killed before getting a chance to release the lock.  Because the lock is tied to an open file descriptor, and the kernel closes every descriptor when a process exits, it becomes the operating system's job to clean up after loto regardless of how it exits.
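To make the idea concrete, here is a minimal sketch of the "try the lock, give up if it is busy" step.  It is an illustration only and not loto's actual source; the function name try_exclusive_lock and its return codes are made up for this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper, not taken from loto.
 * Try to take an exclusive lock on `path` without blocking.
 * Returns an open descriptor that holds the lock,
 * -1 if someone else already holds it, or -2 on any other error. */
int try_exclusive_lock(const char *path)
{
    /* Create the lock file if it does not exist yet. */
    int fd = open(path, O_RDONLY | O_CREAT, 0644);
    if (fd < 0)
        return -2;

    /* LOCK_NB makes flock() fail right away instead of waiting. */
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        int busy = (errno == EWOULDBLOCK);
        close(fd);
        return busy ? -1 : -2;
    }

    /* The lock lives as long as this descriptor stays open. */
    return fd;
}
```

The caller keeps the returned descriptor open while the job runs.  Closing it, or simply exiting or being killed, releases the lock, which is why no stale lock is ever left behind.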

Now, every shell script I write which starts a large upload has loto get a lock on /var/lock/loto.net, and every computationally expensive script or simulation I spin off into the background locks /var/lock/loto.cpu so my computers don't go catatonic on me.  Further usage details can be found in the README included in the source code repository:
$ git clone git://github.com/PhirePhly/loto.git

The basic usage is to call loto with a lock file and optional verbosity and wait behavior flags, and then a "--" and a single command to run and wait to finish:
$ loto -L /path/to/lockfile -- echo I have a lock on this situation
$ loto -L cpu -- echo You also get shorthand macros for cpu and net

To have loto wait for the lock instead of giving up right away, add the -w flag for unlimited patience or -t #OfSeconds for a limited timeout wait on the lock.
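For the curious, one way such waiting behavior can be built on top of flock() is sketched below.  This is an assumption about a reasonable approach, not a copy of how loto implements -w and -t; the name wait_for_lock and the convention that a negative timeout means "wait forever" are invented for the example.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper, not taken from loto.
 * Wait for an exclusive lock on `path`.
 * timeout_sec < 0 waits indefinitely; otherwise retry roughly once per
 * second until timeout_sec seconds have passed.
 * Returns the locked descriptor, or -1 on timeout or error. */
int wait_for_lock(const char *path, int timeout_sec)
{
    int fd = open(path, O_RDONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    if (timeout_sec < 0) {
        /* Unlimited patience: a plain blocking flock() sleeps in the
         * kernel until the lock is free.  A signal can interrupt it
         * (EINTR); this sketch treats that as a failure. */
        if (flock(fd, LOCK_EX) == 0)
            return fd;
        close(fd);
        return -1;
    }

    time_t deadline = time(NULL) + timeout_sec;
    for (;;) {
        if (flock(fd, LOCK_EX | LOCK_NB) == 0)
            return fd;
        /* Give up on a real error or once the deadline has passed. */
        if (errno != EWOULDBLOCK || time(NULL) >= deadline) {
            close(fd);
            return -1;
        }
        sleep(1);
    }
}
```

Polling once per second keeps the sketch short.  A tighter implementation could instead arm a timer with alarm() and let a blocking flock() be interrupted when it fires.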

The experienced Unix guru will ask why I didn't just use the flock command line tool.  The honest answer is half that I didn't manage to find it until I was already mostly done with loto.  The other half is that flock isn't part of any Busybox environment I've seen, so I found it easier to write my own little single-file tool than to try to get all the build dependencies for flock.  Flock does allow for some fancy shell-fu which loto doesn't support, and it is likely already installed on your system, but I figured I'd release my code on the off chance someone wants a simple version of flock whose behavior they can tweak.

Depending on how you look at it, loto walks precariously close to being a direct copy of a tool I found online called lockrun, but I changed the style a bit and have started adding some features.  In any case, credit where credit is due.

One thing I am a little stuck on is that I can't come up with a use case where you would want the ability to get non-exclusive locks on a resource.  Can anyone think of a reason to implement a -l (little L) flag to complement the -L exclusive lock?  Anything I come up with just seems contrived...
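For reference, this is what a non-exclusive lock would mean at the flock() level, assuming a -l flag were simply wired to LOCK_SH (that mapping is a guess, and try_lock is a made-up helper): any number of holders can share the lock at once, but an exclusive request is refused until all of them let go.

```c
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper, not taken from loto.
 * Open `path` and try to lock it without blocking.
 * `mode` is LOCK_SH (shared) or LOCK_EX (exclusive).
 * Returns the locked descriptor, or -1 if the lock was refused. */
int try_lock(const char *path, int mode)
{
    int fd = open(path, O_RDONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    if (flock(fd, mode | LOCK_NB) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With this helper, two calls with LOCK_SH on the same path both succeed, while a call with LOCK_EX fails until both shared descriptors have been closed.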

1 comment:

  1. I use the lockfile program (provided by the procmail package on RedHat derivatives) for this in shell scripts. Not as much fun as writing your own, but still useful. :-)
