Needed a script for work to recursively spider one of our sites to check for problems and build a content inventory. As a side project, produced a little shell script that just checks links and produces a report by HTTP response code.
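The link-checking script itself isn't shown here, so what follows is only a minimal sketch of the idea, assuming curl is available and a list of URLs arrives on stdin. The function names check_links and tally_codes are made up for illustration.

```shell
#!/bin/bash
# Sketch only: a per-status-code link report. Function names are hypothetical.

# Print "<status> <url>" for each URL read from stdin.
# curl reports 000 when no HTTP response was received.
check_links() {
    local url code
    while read -r url; do
        [ -n "$url" ] || continue
        # -I sends a HEAD request; -w prints just the status code
        code=$(curl -s -o /dev/null -I --max-time 10 -w '%{http_code}' "$url")
        printf '%s %s\n' "$code" "$url"
        sleep 1   # be gentle with the server
    done
}

# Count how many URLs returned each status code.
tally_codes() {
    awk '{ n[$1]++ } END { for (c in n) print c, n[c] }' | sort
}
```

Something like `check_links < urls.txt | tee links.report | tally_codes` would keep the full per-URL list in a file and print the summary (urls.txt and links.report are placeholder names). Some servers answer HEAD differently from GET, so dropping -I may be needed.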
The goal is to download PDF files from http://example.com/dir1/dir2/, where the dir1 name is constant but dir2 is a number between 100 and 499 (e.g. http://example.com/dir1/309/file258.pdf).
First, I’ll say that wget is a powerful tool and can place a burden on the site we’re grabbing data from (and probably get us banned, as we’ll be perceived as carrying out a DoS attack). To avoid this, the example below will use a 5-second wait between downloads so as to not trouble the web server too much and keep the site usable for others.
Let’s say you run a front end system like a reverse proxy, load balancer or HTTP accelerator (e.g. big-ip, squid, pound, varnish, apache, nginx, etc.) which passes dynamic requests to back end application servers. Then you’re likely to have to deal with your applications seeing client requests as coming from the IP(s) of your front end boxes.
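A rough sketch of such a download, assuming each numbered directory serves an index page that links to its PDFs (wget needs links to follow). BASE and the function names are placeholders, not taken from the original post.

```shell
#!/bin/bash
# Sketch only: polite recursive PDF fetch. BASE and function names are assumptions.
BASE=${BASE:-http://example.com/dir1}

# Print one directory URL per numbered dir2 (100..499).
list_dir_urls() {
    local n
    for n in $(seq 100 499); do
        printf '%s/%s/\n' "$BASE" "$n"
    done
}

# Fetch only PDFs from each directory, one level deep,
# waiting 5 seconds between retrievals.
fetch_pdfs() {
    list_dir_urls | wget --input-file=- --recursive --level=1 --no-parent \
        --no-host-directories --accept pdf --wait=5
}
```

Calling `fetch_pdfs` starts the crawl; files land under dir1/NNN/ in the current directory. Adding --random-wait would vary the pause a little if a fixed interval is a concern.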
This presents a slew of quality of life reducing issues, unless of course your back end system is Apache, in which case you can use mod_rpaf to make your life pleasant again.
Was trying to figure out how to insert line numbers into a huge file with sed or awk, then stumbled across “nl”, a sweet little baby that comes with GNU coreutils and numbering files is what it does for a living. GNU coreutils is full of gems!
Boring:
ak@gd:~$ cat file
blah
blah blah
bleh blah
Awesome:
ak@gd:~$ cat file | nl -n ln -w1 -s\|
1|blah
2|blah blah
3|bleh blah
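For reference, here is the same nl invocation wrapped in a small function with each flag spelled out. The -b a flag is an addition: by default nl leaves empty lines unnumbered, which may or may not be what you want. The name number_lines is made up.

```shell
#!/bin/bash
# Sketch: number every line, including blank ones. number_lines is a hypothetical name.
number_lines() {
    # -b a   number all lines (the default style skips empty ones)
    # -n ln  left-justify the number, no leading zeros
    # -w1    use one column for the number
    # -s'|'  separator placed between number and text
    nl -b a -n ln -w1 -s'|' "$@"
}
```

It reads stdin or takes file names, so `number_lines file` does the job without the extra cat.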
Got a CSV file created with OpenOffice from a Microsoft Excel file. It contains characters which show up in less as “^K” (caret, upper-case K; that is a vertical tab) and which trip up the shell and vi. One way to get rid of them is to display them with “cat --show-nonprinting” (short form: cat -v) and then replace them with sed:
cat -v file | sed 's/\^K//g'
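One caveat with the cat -v route: it rewrites every other non-printing byte into caret or M- notation as well, so the output can differ from the input in more places than intended. A possible alternative, sketched here under the assumption that only the vertical tab (Ctrl-K, octal 013) needs to go, is to delete that byte directly with tr. The function name strip_vt is made up.

```shell
#!/bin/bash
# Sketch: delete vertical-tab bytes (Ctrl-K, octal 013) and nothing else.
strip_vt() {
    tr -d '\013'
}
```

Usage would be along the lines of `strip_vt < file > file.clean`.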
Bash script to enumerate the titles and chapters of a DVD and rip the longest title to AVI using mencoder. The DVD device can be a path to a mounted ISO (e.g. /mnt/iso/) or a real device (e.g. /media/cdrom0/).
To rip all titles and chapters, try this.
I’ve always used telnet for SMTP testing. Finally, got sick of copy/pasting and decided to write a quick script. But scripting telnet is a pain. Netcat to the rescue!
Call the script with the following arguments:
$ ./smtp.netcat.test mx.example.com 25 from@example.com to@example.com
And here’s the script:
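The script body isn't included on this page, so the following is a stand-in sketch rather than the original. It assumes a plain nc binary, a server that accepts HELO without authentication or TLS, and the argument order shown above (host, port, from, to). The SMTP_DELAY variable and the function names are inventions for this example.

```shell
#!/bin/bash
# Sketch only, not the original script. Arguments: host port from to

# Send one SMTP line (CRLF-terminated), then pause so the server can reply.
# SMTP_DELAY (seconds) is a made-up knob, default 1.
say() {
    printf '%s\r\n' "$1"
    sleep "${SMTP_DELAY:-1}"
}

# Emit a minimal SMTP conversation on stdout.
smtp_dialogue() {
    local from=$1 to=$2
    say "HELO ${HOSTNAME:-localhost}"
    say "MAIL FROM:<$from>"
    say "RCPT TO:<$to>"
    say "DATA"
    printf 'From: %s\r\nTo: %s\r\nSubject: SMTP test\r\n\r\nTest message.\r\n' "$from" "$to"
    say "."
    say "QUIT"
}

# Talk to the server only when all four arguments are given.
if [ "$#" -eq 4 ]; then
    smtp_dialogue "$3" "$4" | nc "$1" "$2"
fi
```

The server's replies are printed by nc as they arrive, which is usually enough to spot a rejected sender or recipient. The fixed pause is crude; a slow server may need a larger SMTP_DELAY.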
I’ve got a script that loops through a list of directory names and compresses them. There are lots and once in a while I want to check on the progress and find out which directory in the list it’s working on.
The name of the script is compress.sh. tar will run as a child of it, which I can locate using the parent PID:
me@my:~$ show.proc.sh compress.sh
USER TT     PID  PPID  %CPU %MEM VSZ  STARTED  TIME     COMMAND
me   pts/19 1810 24357 0.0  0.0  4496 10:15:37 00:00:00 | \_ /bin/bash ./compress.sh
me   pts/19 1857 1810  0.5  0.0  3696 10:16:03 00:00:01 | | \_ tar cfj 833444.tar.bz2 833444
show.proc.sh:
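#!/bin/bash
# Show the process named in $1 and its children as a ps forest.
# get pid(s); stop if nothing matches
pid=$(pgrep "$1") || exit 1
fmt=user,tty,pid,ppid,pcpu,pmem,vsz,start,time,args
ps axfo "$fmt" | head -1
# -w matches whole numbers only, so pid 181 does not also select 1810
ps axfo "$fmt" | grep -w "$pid" | grep -v grep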
compress.sh:
#!/bin/bash
# compress each listed directory; remove it only if tar succeeded
while read -r d; do
    tar cfj "$d.tar.bz2" "$d" && rm -rf "$d"
done < dir.list.2010-05-11
Was looking for a way to purge the Varnish cache when pages are updated or added. The WordPress Varnish plugin looks exactly like what I’m after, except it threw up some errors out of the box.
I’m considering implementing Varnish at Stanford Law and have been playing with it here to understand the implications and work out issues. One issue is purging pages from the cache as they are modified or new pages are added. Varnish provides several methods to do this (varnishadm, telnet and HTTP).
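As an illustration of the HTTP route, here is a rough sketch that sends a PURGE request with netcat. It assumes the VCL has been set up to accept PURGE from the sending host, which should not be expected to work out of the box. VARNISH_ADDR, VARNISH_PORT and the function names are placeholders.

```shell
#!/bin/bash
# Sketch only: assumes the VCL is configured to honour PURGE requests
# from this client. Names and defaults below are assumptions.

# Build a raw HTTP PURGE request for one path on one virtual host.
purge_request() {
    local host=$1 path=$2
    printf 'PURGE %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n' "$path" "$host"
}

# Send the request to the Varnish listener and print its response.
purge() {
    purge_request "$1" "$2" | nc "${VARNISH_ADDR:-127.0.0.1}" "${VARNISH_PORT:-80}"
}
```

For example, `purge example.com /about/` would ask the local Varnish to drop its cached copy of /about/ for that host; the response status shows whether the VCL accepted the request.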