Responsibly download lots of files with wget

The goal is to download PDF files from a site whose paths have the form dir1/dir2/, where the dir1 name is constant but dir2 is a number between 100 and 499.

First, I'll say that wget is a powerful tool and can place a real burden on the site we're grabbing data from (and will probably get us banned, since the traffic can look like a DoS attack). To avoid this, the example below uses a 5 second wait between downloads and limits download bandwidth to 1 MB/s, so we don't trouble the web server too much and the site stays usable for others.

for dir2 in $(seq 100 499); do
  wget -A pdf -w 5 --random-wait --limit-rate=1m --retr-symlinks -k -r \
    -U "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.19) Gecko/2010040118 Debian Lenny Firefox/3.0.19" \
    "https://example.com/dir1/$dir2/"   # example.com/dir1 stands in for the real site path
done
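Before running the full range, it can help to preview the URLs the loop will hit; this is just the loop with `echo` in place of `wget`, over a short range (example.com/dir1 is a placeholder, since the real site isn't shown here):

```shell
# Print one URL per directory number without downloading anything.
for dir2 in $(seq 100 102); do
  echo "https://example.com/dir1/$dir2/"
done
```

If the printed URLs look right, swap `echo` back for the wget invocation and widen the range to 100–499.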

Wget created a dir structure I don't want, e.g.:


Rename each file, embedding its parent directory names in the file name, e.g.:

find . -name "*.pdf" | while read -r f; do new=$(echo "$f" | sed 's_^\./__;s_/_._g') && echo "renaming $f to $new" && mv "$f" "$new"; done
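To see what the sed expression does, run it on a sample path (the dir1/123 path here is hypothetical, just for illustration):

```shell
# s_^\./__   strips the leading "./" that find prepends
# s_/_._g    turns each remaining "/" into "."
echo "./dir1/123/file.pdf" | sed 's_^\./__;s_/_._g'
# → dir1.123.file.pdf
```

Because the full directory path is encoded into each new name, two files that had the same base name in different directories can't collide after the rename.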

Then move them all into a single directory:

mkdir -p pdfs && find . -maxdepth 1 -name "*.pdf" | while read -r f; do echo "moving $f" && mv "$f" pdfs/; done   # "pdfs" is a placeholder; use whatever directory name you like
