Wednesday, October 27, 2010

Moving Lots of Files in Linux

Sorry about the whole missing in action thing lately.  Taking four lab classes this quarter means I've been generating ~20-30 pages of lab reports per week, which leaves less time for writing hobbyist blog posts than I'd like.

I recently needed to upgrade the hard drives in my main file server, KWF2.  It has been limping along with a huge LVM volume built from every 300GB IDE drive I could get my hands on, and I knew that it was just a matter of time before one of them failed and took the whole file system with it.  2TB SATA drives have fallen to a price that I can afford, so I decided to migrate the entire server onto those.

The problem is that I have to move 600GB of files from the old set of hard drives to the new one, but KWF2 is the only system with enough hard drive slots to accommodate the old disk stack, and it is also the only system with SATA ports, so the data has to be parked on another machine while the drives are swapped.  On top of that, the rest of my desktops are all so slow that SFTP and Samba fall far short of saturating the network, dragging out an already painfully slow process.

I needed a solution that would move files over the network with as little processing overhead as possible, while preserving permissions and ownership even though the files would be sitting on a second system in the meantime.  That's when I came up with the following netcat and tar magic:

root@KWF2# tar -cf - /home/ | nc -l 1234
anyuser@olddesktop# nc KWF2 1234 > homeback.tar

The first line creates a tar archive of the entire home directory and, instead of writing it to a file, pipes it into netcat, which waits for a connection on port 1234 before sending the tar output across the network.  (Depending on which netcat you have installed, the listening side may need to be written as nc -l -p 1234 instead.)
The second line, run on the other desktop, connects to the server and dumps the tar stream onto its disk.
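
Before wiping the old drives, it seems worth checking that the archive arrived in one piece. What follows is only a sketch, assuming GNU tar and coreutils: it builds a throwaway directory in place of /home and uses tee on a local pipe in place of the netcat link, so it can be tried on a single machine. The paths and variable names are made up for illustration.

```shell
set -e
# Scratch directory standing in for /home (hypothetical paths).
work=$(mktemp -d)
mkdir -p "$work/src/home/user"
echo "lab report draft" > "$work/src/home/user/notes.txt"

# Sending side: hash the tar stream as it leaves. tee writes the copy that
# the receiving machine would have saved as homeback.tar.
sent=$( (cd "$work/src" && tar -cf - home) | tee "$work/homeback.tar" | sha256sum | cut -d' ' -f1 )

# Receiving side: hash the saved archive and list what it contains.
received=$(sha256sum < "$work/homeback.tar" | cut -d' ' -f1)
listing=$(tar -tf "$work/homeback.tar")

[ "$sent" = "$received" ] && echo "archive intact"
echo "$listing"
rm -rf "$work"
```

On the real machines the same idea would mean running sha256sum over the stream on KWF2 and over homeback.tar on the desktop, then comparing the two hashes.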

Note that you could turn on compression in the original tar command (with -z for gzip, for example), but since most of my files are compressed media already (AVI, MP3, etc.), compression would only slow down the already slow process.
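
To see why compression buys so little here, a rough sketch: it compares the size of a plain tar stream with a gzip-compressed one. It assumes GNU tar, and uses 1 MiB of random bytes as a stand-in for an already-compressed media file, since random data is about as incompressible as it gets. The file and directory names are invented for the demo.

```shell
set -e
work=$(mktemp -d)
mkdir "$work/media"
# 1 MiB of random bytes stands in for an already-compressed AVI or MP3.
head -c 1048576 /dev/urandom > "$work/media/clip.bin"

# Size of the stream without and with gzip compression.
plain_size=$( (cd "$work" && tar -cf - media) | wc -c )
gzip_size=$( (cd "$work" && tar -czf - media) | wc -c )

echo "uncompressed stream: $plain_size bytes"
echo "gzip stream:         $gzip_size bytes"
rm -rf "$work"
```

The two numbers should come out nearly equal, so the CPU time spent in gzip would be wasted on a slow machine.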

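After replacing the drives, to restore the entire home file system:
root@KWF2# nc -l 1234 | tar -xf -
anyuser@olddesktop# nc KWF2 1234 < homeback.tar
This simply performs the entire process in reverse.  Since tar strips the leading / from the paths it stores, the extracting side should be run from / so the files land back in /home.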
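
The full round trip can be rehearsed on one machine before trusting it with real data. This is a minimal sketch assuming GNU tar, with cat standing in for both netcat ends and scratch directories (hypothetical names) standing in for the old and new drives.

```shell
set -e
work=$(mktemp -d)
mkdir -p "$work/old/home/user" "$work/new"
echo "important data" > "$work/old/home/user/file.txt"

# Backup: stands in for tar piped into nc on the server, saved on the desktop.
(cd "$work/old" && tar -cf - home) | cat > "$work/homeback.tar"

# Restore: stands in for the desktop feeding the archive back to the server.
# tar extracts relative to the current directory, hence the cd.
cat "$work/homeback.tar" | (cd "$work/new" && tar -xf -)

# Compare the original tree with the restored one.
if diff -r "$work/old/home" "$work/new/home" >/dev/null; then
    result="identical"
else
    result="different"
fi
echo "restored tree is $result"
rm -rf "$work"
```

The cd before the extract plays the same role as running the real restore from /.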

There are tar options to preserve permissions and ownership (-p and --same-owner), but when you run tar as root, it does this automatically.
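
If you are not root, the permission half of this can still be tried out. The sketch below assumes GNU tar and GNU stat, and uses a made-up script file with mode 754. It extracts with -p under a restrictive umask and prints the resulting mode. Restoring ownership cannot be shown this way, since only root can give files to other users.

```shell
set -e
work=$(mktemp -d)
mkdir -p "$work/old/home" "$work/new"
echo "#!/bin/sh" > "$work/old/home/script.sh"
chmod 754 "$work/old/home/script.sh"

(cd "$work/old" && tar -cf - home) > "$work/homeback.tar"

# A restrictive umask would normally strip group/other bits from new files.
umask 077
# -p (--preserve-permissions) tells tar to apply the archived mode exactly.
(cd "$work/new" && tar -xpf -) < "$work/homeback.tar"

mode=$(stat -c %a "$work/new/home/script.sh")
echo "restored mode: $mode"
rm -rf "$work"
```

With -p the restored file should keep its original 754 mode despite the umask, which is what root gets by default.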

This is a good example of the clever things you can do with the simple tools in UNIX, which each do one thing and do it well.  I was able to maintain a 200Mbps transfer rate over my gigabit Ethernet with neither CPU pegged (something else must be the bottleneck.  PCI?), which meant I was able to replace the hard drives after only 3 days of both computers running, instead of the good week and a half I was looking forward to with SFTP or Samba.

I really need to get myself a Drobo FS or something...