Programmatically download all available Linux Journal Magazine issues
My work subscribed to Linux Journal and I wanted to grab all the available issues to read at my leisure, but couldn't be bothered to download them all by hand. Here's how to grab them all with wget. Please note that you'll need a valid subscription to use this method; this is not a guide to stealing the mags. 😉
The Linux Journal folks give issues away for free after a couple of months, and for just a $30/year subscription you not only get access to the current issues for a year, but to all the back issues as well, not to mention the "Linux Journal's System Administration Special Edition" bonus PDF. So please don't ask me to give them away.
Ok, so you purchased a subscription. Good for you. Here's how to grab them.
1. Go to www.linuxjournal.com/digital
2. Sign in with your account number (LJxxxxxxx) and your Zip code
3. Click on "Digital Downloads"
4. Save the source of the page as "dljdownload.html"
5. Then run the following to download all available issues:
for pdfcode in $(grep get-pdf.php dljdownload.html | cut -d\" -f2); do
    # Fetch the intermediate page once instead of twice
    page=$(curl -s "$pdfcode" | grep action=spit2)
    pdfaddress=$(echo "$page" | cut -d\" -f2 | sed 's/amp;//g')
    pdfname=$(echo "$page" | grep -o "dlj.*pdf")
    wget "http://download.linuxjournal.com$pdfaddress" -O "$pdfname"
done
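In case the grep/cut/sed plumbing above looks opaque, here's how it behaves on a single hypothetical line of the kind the script expects from the intermediate page (the sample markup below is an assumption for illustration, not the site's actual HTML):

```shell
# Hypothetical anchor line of the shape the "action=spit2" grep would match
sample='<a href="/pdf/get?id=132&amp;key=abc">dlj132.pdf</a>'

# Field 2 between double quotes is the download path; sed strips the "amp;" entity
echo "$sample" | cut -d\" -f2 | sed 's/amp;//g'
# → /pdf/get?id=132&key=abc

# The file name is whatever matches dlj...pdf
echo "$sample" | grep -o "dlj.*pdf"
# → dlj132.pdf
```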
6. You should end up with a directory of PDF files (69 for me):
ls | head
dlj132.pdf
dlj133.pdf
dlj134.pdf
dlj135.pdf
dlj136.pdf
dlj137.pdf
dlj138.pdf
dlj139.pdf
dlj140.pdf
dlj141.pdf
7. Let's say you were particularly bored and wanted to use the issue date and number in the file name:
for old in dlj*.pdf; do
    num=$(echo "$old" | grep -Eo "[0-9]+")
    date=$(pdftotext "$old" - | head -200 | grep -Eo "(JANUARY|FEBRUARY|MARCH|APRIL|MAY|JUNE|JULY|AUGUST|SEPTEMBER|OCTOBER|NOVEMBER|DECEMBER) [0-9]{4}" | head -1 | awk '{print $2"."$1}')
    new=ISSUE.$num-$date.pdf
    echo "renaming $old to $new"
    mv "$old" "$new"
done
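The awk at the end of that pipeline just swaps the "MONTH YEAR" string that pdftotext finds into "YEAR.MONTH" order; in isolation:

```shell
# pdftotext surfaces something like "APRIL 2005"; awk swaps the two fields
echo "APRIL 2005" | awk '{print $2"."$1}'
# → 2005.APRIL
```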
Which should produce:
ls | head
ISSUE.132-2005.APRIL.pdf
ISSUE.133-2005.MAY.pdf
ISSUE.134-2005.JUNE.pdf
ISSUE.135-2005.JULY.pdf
ISSUE.136-2005.AUGUST.pdf
ISSUE.137-2005.SEPTEMBER.pdf
ISSUE.138-2005.OCTOBER.pdf
ISSUE.139-2005.NOVEMBER.pdf
ISSUE.140-2005.DECEMBER.pdf
ISSUE.141-2006.JANUARY.pdf
13 Comments
1. Matt Dunlap replies at 10th September 2010, 4:35 pm :
I’ll trade you my subscription to “Seventeen” for your subscription of “Linux Journal”
2. Alain replies at 10th September 2010, 4:47 pm :
Hah, I would of course, except I already subscribe to all the teen magazines.
3. anyone replies at 2nd June 2011, 12:22 pm :
Thanks a lot, made my day 🙂
4. abeltje replies at 24th June 2011, 12:21 am :
Thanks a lot for this explanation
great work 🙂
5. Hamradio replies at 20th August 2011, 4:09 am :
Your scripts work fine, thanks a lot Alain!
6. Josiah Ritchie replies at 23rd August 2011, 7:13 pm :
Excellent, just saved me a bunch of time.
7. Josiah Ritchie replies at 23rd August 2011, 7:23 pm :
For some reason, the script didn’t work when put in the background. Any idea why?
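I can't say for sure without seeing the session, but a common reason loops like this die in the background is the shell delivering SIGHUP when the terminal goes away; wrapping the loop in nohup sidesteps that (download.sh here is a hypothetical file holding the download loop):

```shell
# Run the loop immune to hangups, logging everything;
# check progress later with: tail -f download.log
nohup bash download.sh > download.log 2>&1 &
```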
8. Jan van Haarst replies at 1st March 2012, 1:02 pm :
I have changed the script so that it works with the current state of the website, letting curl get the name of the file from the remote website's response, which saves one curl request.
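Jan's updated script isn't preserved in this copy of the page, but the "let the response name the file" idea maps onto curl's -O -J flags, which save under the server's Content-Disposition file name; the same header can also be parsed by hand (the sample header below is illustrative, not a real response):

```shell
# With a subscriber URL you could simply run: curl -O -J "$pdfurl"
# Parsing a Content-Disposition header manually, on a sample line:
hdr='Content-Disposition: attachment; filename="dlj198.pdf"'
echo "$hdr" | sed 's/.*filename="\([^"]*\)".*/\1/'
# → dlj198.pdf
```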
9. Jon replies at 7th April 2012, 4:13 am :
Thanks to Jan van Haarst – this now works, although the OP’s solution no longer does. Thanks to both, though, as this is the *only* page where anyone has attempted to solve this problem. You guys rock 🙂
10. Bjarte replies at 27th July 2012, 2:40 am :
Hi,
If you get tired of the first four steps, try curl!
11. Jeronimo replies at 28th March 2013, 8:32 pm :
Good script. I took Bjarte's script (the curl version) and appended this to the end, in case anyone wants all the file formats (mobi, pdf, epub). curl is cool.
12. Mattcen replies at 11th April 2013, 2:02 pm :
Hi all,
I’ve taken all the ideas from the OP and previous comments, added a few of my own improvements, and ended up with a single script to do everything:
Hope this helps.
--
Mattcen
13. Antonio replies at 8th April 2014, 1:59 pm :
Hello guys,
I had the need to do a clever download of my favourite magazine.
Basically it just adds a few enhancements to the previous ones, but I had to satisfy some different needs, such as looking at the issues before downloading them, doing a selective download (filtered by month, by type or by content), and adding a bit of multithreaded support to reduce the time spent on the download.
In case you're interested in trying it out, please feel free to clone and collaborate on my project; you can find it at the following link on Bitbucket:
https://bitbucket.org/tonyqui/linux-journal-downloader
Hoping it would help you,
Cheers,
Antonio
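Antonio's downloader adds parallelism; even without a dedicated tool, plain xargs -P gets you concurrent downloads. A minimal sketch, with echo standing in for the real fetch command (the example.com URLs are placeholders):

```shell
# Hand each URL to one of up to 4 parallel jobs; swap 'echo would-download'
# for 'wget -q' (or 'curl -s -O') to fetch for real
printf '%s\n' \
    http://example.com/dlj132.pdf \
    http://example.com/dlj133.pdf \
    http://example.com/dlj134.pdf |
xargs -n1 -P4 echo would-download
```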