This is similar to my little shell one-liner to verify tar archives. An exit status of 0 means the archive is good; anything else means there’s a problem. Most of the problem archives I encountered had a status of 2, but there were a few 3’s and 9’s as well (see “man unzip” for an explanation of the status codes). I’m using a while loop because some of the archive file names have spaces in them, which trips up the for loop:

find . -name "*.zip" | while IFS= read -r f; do echo "checking $f"; unzip -t "$f" &> /dev/null; err="$?"; echo "$err $f" >> zip-check.list; done
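Once the log is written, the next thing I usually want is just the list of bad archives. A minimal sketch, assuming the log has the "status name" layout produced above; the function name failed_archives is made up for this example:

```shell
#!/bin/bash

# Print the names of archives whose logged exit status is not 0.
# Assumes each log line is "<status> <file name>"; the name may contain spaces.
failed_archives() {
    local log=$1
    # Drop the leading status field and keep the rest of the line as the name
    awk '$1 != 0 { sub(/^[^ ]+ /, ""); print }' "$log"
}
```

Usage would be something like: failed_archives zip-check.list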


The goal is to download PDF files from http://example.com/dir1/dir2/, where the dir1 name is constant, but dir2 is a number between 100 and 499 (e.g. http://example.com/dir1/309/file258.pdf).

First, I’ll say that wget is a powerful tool and can place a burden on the site we’re grabbing data from (and probably get us banned, as we’ll be perceived as carrying out a DoS attack). To avoid this, the example below will use a 5 second wait between downloads so as not to trouble the web server too much and to keep the site usable for others.

Read the rest of this entry…
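The original example is in the full entry. As a rough sketch of the idea only, and assuming each numbered directory serves an index page that links to its PDFs, something along these lines would walk the range with a 5 second pause. The function names dir_urls and fetch_pdfs are invented for this illustration:

```shell
#!/bin/bash

base="http://example.com/dir1"

# Print one directory URL per line for dir2 = first..last
dir_urls() {
    local first=$1 last=$2 n
    for (( n = first; n <= last; n++ )); do
        printf '%s/%d/\n' "$base" "$n"
    done
}

# Read directory URLs on stdin and fetch the PDFs linked from each one.
# --wait=5 pauses between files within a directory; sleep 5 pauses between directories.
fetch_pdfs() {
    local url
    while IFS= read -r url; do
        wget --recursive --level=1 --no-parent --no-directories \
             --accept pdf --wait=5 "$url"
        sleep 5
    done
}

# Usage: dir_urls 100 499 | fetch_pdfs
```

Splitting URL generation from fetching makes it easy to eyeball the list before hitting the server.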

BASH script to enumerate the titles and chapters of a DVD and rip the longest one into an AVI using mencoder. The DVD device can be a path to a mounted ISO (e.g. /mnt/iso/) or a real device (e.g. /media/cdrom0/).

To rip all titles and chapters, try this.

Read the rest of this entry…

I’ve always used telnet for SMTP testing. Finally, got sick of copy/pasting and decided to write a quick script. But scripting telnet is a pain. Netcat to the rescue!

Call the script with the following arguments:

$ ./smtp.netcat.test mx.example.com 25 from@example.com to@example.com

And here’s the script:

Read the rest of this entry…
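The actual script is in the full entry. Purely as an illustration of the approach, here is a sketch that prints a bare-bones SMTP dialogue which can be piped into netcat. The function name smtp_dialogue, the SMTP_DELAY and SMTP_HELO variables, and the message text are all assumptions of this example, not taken from the original script:

```shell
#!/bin/bash

# Print a minimal SMTP conversation, one command per line, ending each in CRLF.
# A short pause after each line gives the server time to answer (default 1 second).
smtp_dialogue() {
    local from=$1 to=$2 delay=${SMTP_DELAY:-1} line
    local lines=(
        "HELO ${SMTP_HELO:-localhost}"
        "MAIL FROM:<$from>"
        "RCPT TO:<$to>"
        "DATA"
        "Subject: SMTP test"
        ""
        "Test message."
        "."
        "QUIT"
    )
    for line in "${lines[@]}"; do
        printf '%s\r\n' "$line"
        sleep "$delay"
    done
}

# Usage: smtp_dialogue from@example.com to@example.com | nc mx.example.com 25
```

This does not check the server's replies; it only saves the copy/pasting and lets you read the responses as they scroll by.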

I’ve got a script that loops through a list of directory names and compresses them. There are a lot of them, and once in a while I want to check on the progress and find out which directory in the list it’s working on.

The name of the script is compress.sh. tar runs as a child of it, so I can locate tar using the parent PID:

me@my:~$ show.proc.sh compress.sh
USER     TT         PID  PPID %CPU %MEM    VSZ  STARTED     TIME COMMAND
me       pts/19    1810 24357  0.0  0.0   4496 10:15:37 00:00:00  |           \_ /bin/bash ./compress.sh
me       pts/19    1857  1810  0.5  0.0   3696 10:16:03 00:00:01  |           |   \_ tar cfj 833444.tar.bz2 833444

show.proc.sh:

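#!/bin/bash

# Get the PID of the process named in the first argument
pid=$(pgrep "$1")

# Print the header row, then every line of the process tree that mentions the PID
# (the process itself in the PID column, its children in the PPID column)
ps axfo user,tty,pid,ppid,pcpu,pmem,vsz,start,time,args | head -1
ps axfo user,tty,pid,ppid,pcpu,pmem,vsz,start,time,args | grep -w "$pid" | grep -v grep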
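Grepping the ps output for a PID can also match unrelated numbers in other columns. A possible alternative, sketched here with a made-up function name (children_of), is to ask pgrep directly for the children of a given parent PID:

```shell
#!/bin/bash

# Print "PID command line" for each direct child of the given parent PID.
children_of() {
    local ppid=$1 pid
    # pgrep -P lists processes whose parent PID matches
    pgrep -P "$ppid" | while IFS= read -r pid; do
        ps -o pid=,args= -p "$pid"
    done
}

# Usage: children_of "$(pgrep -o compress.sh)"
```

Here pgrep -o picks the oldest matching process, which is an assumption about which compress.sh you care about if several are running.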

compress.sh:

#!/bin/bash

# Compress each directory named in the list; remove it only if tar succeeded
while IFS= read -r d; do tar cfj "$d.tar.bz2" "$d" && rm -rf "$d"; done < dir.list.2010-05-11

BASH script to enumerate the titles and chapters of a DVD and rip each into a separate AVI using mencoder. The DVD device can be a path to a mounted ISO (e.g. /mnt/iso/) or a real device (e.g. /media/cdrom0/).

To rip just the longest title, try this.

Read the rest of this entry…

Let’s say we’ve got a bunch of directories and we’d like to get counts of how many directories are how many levels deep.

Our directory structure:

$ find . -type d
.
./test1
./test1/test1.1
./test2
./test2/test2.1
./test2/test2.1/test2.2
./test3
./test3/test3.1
./test3/test3.1/test3.2
./test3/test3.1/test3.2/test3.3

There are a number of ways to do this, but I found awk to be the most elegant, as I wanted to be able to subtract an arbitrary number from the result and awk can count and do arithmetic all by itself:

$ find . -mindepth 1 -type d | awk -F'/' '{print NF -1}' | sort | uniq -c | sort -nr
      3 2
      3 1
      2 3
      1 4
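To make the "subtract an arbitrary number" part explicit, the offset can be passed to awk as a variable. A small sketch; depth_counts is just a name picked for this example, and it reads paths on stdin:

```shell
#!/bin/bash

# Read paths on stdin and print "count depth" pairs, most common depth first.
# depth = number of '/'-separated fields minus the offset (default 1).
depth_counts() {
    local offset=${1:-1}
    awk -F'/' -v off="$offset" '{ print NF - off }' | sort -n | uniq -c | sort -nr
}

# Usage: find . -mindepth 1 -type d | depth_counts 1
```

With an offset of 2, for example, the top-level directories would be reported as depth 0.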

Got thousands of tar archives being transferred from various machines to one repository. Found that some of the archives are bad, either because the network transfer didn’t complete or the archive process was interrupted on the machine that created the archive (due to insufficient disk space, etc). Whatever the cause, needed a method to verify the archive integrity.

Tar provides options to verify archive integrity by comparing the archive with the file system. In this case, I needed the ability to verify the archive without access to the original file system.

The BASH one-liner below will find all archives below the current directory, loop through each one attempting to list its contents, discard the content listing, and append the tar command’s exit status along with the archive name to a log. An exit status of 0 means the archive is good, 2 means it’s bad (haven’t seen a 1). Redirecting output to /dev/null while capturing the exit status turned out trickier than I thought, but this seems to work well:

for f in $(find . -name "*tar.bz2"); do tar tfj "$f" &> /dev/null; err="$?"; echo "$err $f" >> tar-check.list; done
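The for loop splits on whitespace, so archive names with spaces will trip it up. A sketch of a variant that survives such names by passing them null-delimited from find to a while loop; check_archives is a name invented for this example, and it prints to stdout rather than to a fixed log file:

```shell
#!/bin/bash

# List every *.tar.bz2 under the given directory and print "<tar exit status> <name>".
# Null-delimited names keep spaces and other odd characters intact.
check_archives() {
    local dir=${1:-.} f
    find "$dir" -name '*.tar.bz2' -print0 |
        while IFS= read -r -d '' f; do
            # Try to list the contents, throwing away both the listing and any errors
            tar tjf "$f" > /dev/null 2>&1
            printf '%s %s\n' "$?" "$f"
        done
}

# Usage: check_archives . >> tar-check.list
```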


The split command is one gem of a Giant Dork tool, but it expects to be given the number of lines to split a file along. It’s easy enough to fire up a calculator to do this, but sometimes a programmatic method is desirable. Here’s a bashism that’ll do it:

split -l $(( $(wc -l < sourcefile) / 3 )) sourcefile sourcefile.
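One thing to watch: integer division rounds down, so when the line count isn't evenly divisible, split produces an extra small file with the leftovers. A sketch that rounds up instead, so the number of pieces comes out as requested; lines_per_part is a hypothetical helper name:

```shell
#!/bin/bash

# Print the number of lines per piece needed to split a file into at most N pieces.
# Uses ceiling division: (total + parts - 1) / parts
lines_per_part() {
    local file=$1 parts=$2 total
    total=$(wc -l < "$file")
    echo $(( (total + parts - 1) / parts ))
}

# Usage: split -l "$(lines_per_part sourcefile 3)" sourcefile sourcefile.
```

For a 10 line file and 3 pieces this gives 4 lines per piece (4 + 4 + 2), where plain division would give 3 lines per piece and four files.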


I liked Jason’s solution of passing a file list to du via xargs to produce results sorted by size in human readable format, but wanted the ability to limit the file list by name and age. Combining the find command with du worked out nicely:

find . -name "*tar.bz2" -mmin -120 -ls | sort -k7rn | awk '{print$NF}' | xargs du -sh
122M	./sls-monitor/830144.tar.bz2
98M	./sls-monitor/830156.tar.bz2
67M	./sls-off1/905895.tar.bz2
50M	./sls-off1/893748.tar.bz2
16M	./sls-off1/893759.tar.bz2
7.3M	./sls-off1/905897.tar.bz2
5.1M	./sls-off1/854850.tar.bz2
4.4M	./sls-monitor/804331.tar.bz2
3.8M	./sls-monitor/804333.tar.bz2
612K	./sls-off1/893755.tar.bz2
512K	./sls-off1/905898.tar.bz2
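The pipeline above relies on the column layout of -ls and on file names without spaces. A sketch of a variant that assumes GNU find (for -printf) and GNU xargs (for -r), sorting on the byte size directly and handing names to du null-delimited; recent_by_size is a name made up for this example:

```shell
#!/bin/bash

# Print human readable sizes, largest first, of files under DIR matching PATTERN
# that were modified within the last MINUTES minutes.
recent_by_size() {
    local dir=$1 pattern=$2 minutes=$3
    # Emit "bytes<TAB>path", sort numerically descending, then drop the size column
    find "$dir" -name "$pattern" -mmin "-$minutes" -printf '%s\t%p\n' |
        sort -rn | cut -f2- | tr '\n' '\0' | xargs -0 -r du -sh
}

# Usage: recent_by_size . "*tar.bz2" 120
```

du prints its arguments in the order given, so the size ordering from sort carries through to the final listing.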