Shell script to verify tar archives

I've got thousands of tar archives being transferred from various machines to one repository. Some of them turned out to be bad, either because the network transfer didn't complete or because the archive process was interrupted on the machine that created the archive (insufficient disk space, etc.). Whatever the cause, I needed a way to verify archive integrity.

Tar provides options to verify an archive by comparing it with the file system. In this case, I needed to verify the archives without access to the original file system.
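For reference, when the original files are still around, GNU tar's --compare (-d) operation does that kind of check. A minimal sketch, assuming GNU tar and an archive whose member names are relative to the source directory; the function name compare_with_source and the paths in the usage line are hypothetical:

```shell
# Sketch only: assumes GNU tar. compare_with_source is a made-up helper name.
# Returns 0 when every archive member matches the files under source_root,
# non-zero when something differs or the archive cannot be read.
compare_with_source() {
    local archive=$1 source_root=$2
    tar --compare --file="$archive" --directory="$source_root" > /dev/null 2>&1
}

# Hypothetical usage:
# compare_with_source /repo/313854.tar.bz2 /original/data && echo "matches"
```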

The Bash one-liner below finds all archives below the current directory, loops through them attempting to list each one's contents, discards the listing, and appends tar's exit status along with the archive name to a log. An exit status of 0 means the archive is good, 2 means it's bad (haven't seen a 1, which GNU tar documents as "some files differ", so it isn't expected when merely listing). Redirecting output to /dev/null while capturing the exit status turned out trickier than I thought, but this seems to work well:

find . -name "*tar.bz2" -print0 | while IFS= read -r -d '' f; do tar tfj "$f" &> /dev/null; err=$?; echo "$err $f" >> tar-check.list; done

Output:

cat tar-check.list
2 ./755806.tar.bz2
2 ./708955.tar.bz2
0 ./313854.tar.bz2
0 ./313857.tar.bz2
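
To pull just the failures out of the log, something like the following should do. It's a sketch that assumes the "status, space, path" line format written above; list_bad_archives is a name made up for illustration:

```shell
# Print the path of every archive whose recorded exit status is not 0.
# Assumes each log line looks like "<status> <path>".
list_bad_archives() {
    local log=${1:-tar-check.list}
    local status path
    while read -r status path; do
        # Skip blank lines; report anything that did not exit with 0.
        if [ -n "$status" ] && [ "$status" != "0" ]; then
            printf '%s\n' "$path"
        fi
    done < "$log"
    return 0
}

# Usage: list_bad_archives tar-check.list
```

With the sample log above, this prints ./755806.tar.bz2 and ./708955.tar.bz2.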

BTW, the specific error I've gotten, which the above method will identify, is:

bzip2: Compressed file ends unexpectedly;
        perhaps it is corrupted?  *Possible* reason follows.
bzip2: Inappropriate ioctl for device
        Input file = (stdin), output file = (stdout)

It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.

tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
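
As that message suggests, bzip2 can also test the compressed stream on its own with -t. A small sketch (bz2_stream_ok is a hypothetical helper name); note that this only checks the bzip2 layer, so it won't catch a tar stream that was already cut short before it was compressed:

```shell
# Returns 0 if bzip2 can decompress the whole file without error.
# Only the compression layer is tested; the tar structure inside is not.
bz2_stream_ok() {
    bzip2 -t -- "$1" 2> /dev/null
}

# Hypothetical usage:
# bz2_stream_ok ./755806.tar.bz2 || echo "corrupt: ./755806.tar.bz2"
```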

3 Comments

  • 1. Alain Kelder is a Giant D… replies at 21st June 2010, 2:40 pm :

    […] is similar to my little shell script to verify tar archives. Exit status of 0 means the archive is good, 2 means it’s bad (haven’t seen a 1). Using a while […]

  • 2. Chris M replies at 6th February 2012, 4:50 am :

    Many Thanks, I had exactly the same problem. I’ve had to copy nearly 1000 tar files, and needed to find out which ones had been corrupted.

  • 3. annon replies at 19th February 2013, 10:29 am :

    I think you probably want “*tar\.bz2”
