Got thousands of tar archives being transferred from various machines to one repository. Found that some of the archives are bad, either because the network transfer didn’t complete or because archive creation was interrupted on the source machine (insufficient disk space, etc.). Whatever the cause, I needed a way to verify archive integrity.

Tar provides options to verify an archive by comparing it with the file system (--compare, for example). In this case, I needed to verify the archive without access to the original file system.

The Bash one-liner below finds every archive under the current directory, tries to list each one’s contents, throws the listing away, and appends tar’s exit status and the archive name to a log. An exit status of 0 means the archive is good, 2 means it’s bad (haven’t seen a 1). Redirecting output to /dev/null while capturing the exit status turned out trickier than I thought, but this seems to work well:

find . -name "*tar.bz2" -print0 | while IFS= read -r -d '' f; do tar -tjf "$f" &> /dev/null; echo "$? $f" >> tar-check.list; done
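For repeated runs, the same check can be wrapped in a couple of small functions. This is only a sketch: the names check_archives and bad_archives and their arguments are made up for illustration, and it assumes bash plus a tar that understands -j.

```shell
# Sketch: log "<exit status> <path>" for every *.tar.bz2 under a directory.
# check_archives and bad_archives are illustrative names, not from the post.
check_archives() {
    local dir="${1:-.}" log="${2:-tar-check.list}"
    : > "$log"    # start with an empty log
    # -print0 with read -d '' keeps file names containing spaces intact
    find "$dir" -name '*.tar.bz2' -print0 |
        while IFS= read -r -d '' f; do
            tar -tjf "$f" > /dev/null 2>&1
            printf '%s %s\n' "$?" "$f" >> "$log"
        done
}

# Print only the archives whose logged status was non-zero.
bad_archives() {
    awk '$1 != 0 { sub(/^[0-9]+ /, ""); print }' "${1:-tar-check.list}"
}
```

Running check_archives . followed by bad_archives should print just the paths that need to be transferred again.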


The split command is one gem of a Giant Dork tool, but it expects to be told how many lines go in each piece. It’s easy enough to fire up a calculator to work that out, but sometimes a programmatic method is desirable. Here’s a bashism that works it out for a three-way split:

split -l $(( ($(wc -l < sourcefile) + 2) / 3 )) sourcefile sourcefile.
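The same idea generalizes to any number of pieces. Plain integer division rounds down, so a line count that is not a multiple of the piece count spills into one extra small file; rounding up avoids that. A minimal sketch, where the function name split_into is hypothetical:

```shell
# Sketch: split a file into at most N pieces by line count.
# split_into is an illustrative name, not an existing tool.
split_into() {
    local file="$1" parts="$2" lines
    lines=$(wc -l < "$file")
    # split rejects a zero line count, so bail out on an empty file
    [ "$lines" -gt 0 ] || return 1
    # round up so a remainder does not create an extra piece
    split -l $(( (lines + parts - 1) / parts )) "$file" "$file."
}
```

For example, split_into sourcefile 3 on a 10-line file should give pieces of 4, 4 and 2 lines. If portability isn't a concern, GNU split's -n l/3 option does something similar without the arithmetic.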
