Shell one-liner to rename files based on their content

The shell one-liner below will:

1. Find all files below the current directory
2. Rename each file, using its download source and date range to build the new file name
3. Keep each renamed file in its original directory (e.g. file "./path/to/dir/crappyname" becomes "./path/to/dir/DC.Circuit-2006.06.02-2006.07.01.txt")
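
The three steps can also be written out as a small function that prints the proposed new path for a single file, which is easier to read and test than the one-liner. This is a minimal sketch, not the author's original command: the function name `newname` is hypothetical, it keeps only the first `Source:` and `SearchTerms:` line, it only accepts four-digit years, and it builds the date range with awk instead of the sed pair used in the one-liner.

```shell
#!/bin/sh
# newname FILE: print the proposed new path for FILE, or nothing if the
# Source: or SearchTerms: line cannot be parsed. (Hypothetical helper.)
newname() {
    f=$1
    dir=$(dirname "$f")
    # First two words after "Source:", joined with a dot (e.g. DC.Circuit).
    source=$(head -20 "$f" | grep '^Source:' | head -1 |
        awk -F: '{ print $2 }' | awk '{ print $1 "." $2 }')
    # First two m/d/yyyy dates on the SearchTerms: line, rewritten as
    # yyyy.mm.dd and joined with "-" (e.g. 2006.06.02-2006.07.01).
    dates=$(head -20 "$f" | grep '^SearchTerms:' | head -1 |
        grep -E -o '[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}' | head -2 |
        awk -F/ '{ printf "%s%s.%s.%s", (NR > 1 ? "-" : ""), $3, $1, $2 }')
    # Print a name only if a source was found and the range has two dates.
    case $dates in
        *-*) [ -n "$source" ] && printf '%s/%s-%s.txt\n' "$dir" "$source" "$dates" ;;
    esac
}
```

Usage: `newname ./path/to/dir/crappyname` prints the new path without touching the file, so it can be called from a dry-run loop before any `mv`.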

Note: For this to work, each file must contain, within its first 20 lines, a line starting with "Source:"; the first two words after the colon become the source part of the name (e.g. "Source:DC Circuit - US Court of Appeals, District & Bankruptcy Cases, ..." becomes "DC.Circuit"). It must also contain a line starting with "SearchTerms:", from which the two dates of the range are extracted and reordered as year.month.day (e.g. "SearchTerms: date(geq (06/02/2006) and leq (07/01/2006))" becomes "2006.06.02-2006.07.01").
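
To see what each half of the pipeline produces, you can feed it the two sample lines from the note directly. This sketch stops the date half just before the reordering sed, so its second result is the intermediate string that sed rewrites; the variable names are illustrative only.

```shell
#!/bin/sh
# The two sample header lines from the note above.
src_line='Source:DC Circuit - US Court of Appeals, District & Bankruptcy Cases, ...'
terms_line='SearchTerms: date(geq (06/02/2006) and leq (07/01/2006))'

# Source part: text after the first colon, then its first two words joined by ".".
source=$(echo "$src_line" | awk -F: '{print $2}' | awk '{print $1"."$2}')

# Date part, up to (but not including) the reordering sed: extract the m/d/y
# dates, turn "/" into ".", and join them with "-".
dates_raw=$(echo "$terms_line" | grep -E -o '[0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4}' | sed 's_/_._g' | tr '\n' '-')

echo "$source"      # DC.Circuit
echo "$dates_raw"   # 06.02.2006-07.01.2006-
```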

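Do a dry run first and examine rename.test for errors. This also checks the quality of the download, since the log lists the source and date range of every downloaded file. Note that the loop splits the output of $(find ...) on whitespace, so file and directory names must not contain spaces. Each run appends to rename.test, so delete it before repeating the dry run.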

$ for f in $(find . -type f ! -name rename.test); do dir=$(dirname "$f") && source=$(head -20 "$f" | grep "^Source:" | awk -F: '{print $2}' | awk '{print $1"."$2}' | sed 's/ //g') && dates=$(head -20 "$f" | grep "^SearchTerms:" | grep -E -o "[0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4}" | sed 's_/_._g' | tr "\n" "-" | sed 's/\([0-9]\{1,2\}.[0-9]\{1,2\}.\)\([0-9]\{4\}\)\(-\)\([0-9]\{1,2\}.[0-9]\{1,2\}.\)\([0-9]\{4\}\)\(-\)/\2.\1-\5.\4txt/' | sed 's/\.-/-/') && echo "would rename $f to $dir/$source-$dates" | tee -a rename.test; done
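
To use the dry-run log as a download-quality check, it can help to count how many files map to each source. This is a minimal sketch with a hypothetical helper name; it assumes each log line ends with the proposed new path and that source names contain no "-". An empty source in the tally points to files without a usable "Source:" line.

```shell
#!/bin/sh
# tally_sources LOG: count dry-run entries per source, where the source is the
# part of the new file name before the first "-". (Hypothetical helper.)
tally_sources() {
    awk '{ print $NF }' "$1" |   # last field: the proposed new path
        sed 's|.*/||' |          # drop the directory part
        cut -d- -f1 |            # keep the source, e.g. DC.Circuit
        sort | uniq -c
}

# Usage, after the dry run:
# tally_sources rename.test
```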

Examine the output, sorted by the new file name:

$ sort -dk4 rename.test | less
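
Two kinds of problems are worth searching for rather than spotting by eye: new names that were not fully parsed (they do not end in ".txt"), and new names claimed by more than one file. With a plain `mv`, the second file of such a pair would overwrite the first. This is a sketch with a hypothetical helper name, assuming the log format written by the dry run.

```shell
#!/bin/sh
# check_dry_run LOG: report suspicious dry-run entries. (Hypothetical helper.)
check_dry_run() {
    # A fully parsed name ends in .txt; anything else (e.g. "./-") means the
    # Source: or SearchTerms: line was missing or in an unexpected format.
    echo "Entries whose new name was not fully parsed:"
    grep -v '\.txt$' "$1"
    # Two files with the same source and date range get the same new name.
    echo "New names claimed by more than one file:"
    awk '{ print $NF }' "$1" | sort | uniq -d
}

# Usage:
# check_dry_run rename.test
```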

Once the dry run looks right, do the rename. This version skips files whose source or date range could not be extracted, and uses mv -n so that an existing file is never overwritten:

$ for f in $(find . -type f ! -name rename.test); do dir=$(dirname "$f") && source=$(head -20 "$f" | grep "^Source:" | awk -F: '{print $2}' | awk '{print $1"."$2}' | sed 's/ //g') && dates=$(head -20 "$f" | grep "^SearchTerms:" | grep -E -o "[0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4}" | sed 's_/_._g' | tr "\n" "-" | sed 's/\([0-9]\{1,2\}.[0-9]\{1,2\}.\)\([0-9]\{4\}\)\(-\)\([0-9]\{1,2\}.[0-9]\{1,2\}.\)\([0-9]\{4\}\)\(-\)/\2.\1-\5.\4txt/' | sed 's/\.-/-/') && [ -n "$source" ] && [ -n "$dates" ] && echo "renaming $f to $dir/$source-$dates" && mv -n "$f" "$dir/$source-$dates"; done
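
After the rename, any file that still lacks a SOURCE-YYYY...-YYYY....txt style name was not handled and needs a closer look. This is a minimal sketch with a hypothetical helper name; the glob only approximates the new naming scheme, so a file that already had such a name before the run would not be listed.

```shell
#!/bin/sh
# leftovers DIR: list regular files under DIR whose names do not look like
# SOURCE-YYYY.M.D-YYYY.M.D.txt, i.e. files the rename loop did not handle.
# (Hypothetical helper.)
leftovers() {
    find "$1" -type f ! -name 'rename.test' \
        ! -name '*-[0-9][0-9][0-9][0-9].*-[0-9][0-9][0-9][0-9].*.txt'
}

# Usage:
# leftovers .
```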

To debug the sed expression, wrap each back reference in parentheses and print it:

$ echo 01.01.1910-12.31.1914- | sed 's/\([0-9]\{2\}.[0-9]\{2\}.\)\([0-9]\{4\}\)\(-\)\([0-9]\{2\}.[0-9]\{2\}.\)\([0-9]\{4\}\)\(-\)/(\1)(\2)(\3)(\4)(\5)(\6)/'
(01.01.)(1910)(-)(12.31.)(1914)(-)

$ echo 01.01.1910-12.31.1914- | sed 's/\([0-9]\{1,2\}.[0-9]\{1,2\}.\)\([0-9]\{4\}\)\(-\)\([0-9]\{1,2\}.[0-9]\{1,2\}.\)\([0-9]\{4\}\)\(-\)/\2.\1-\5.\4txt/' | sed 's/\.-/-/'
1910.01.01-1914.12.31.txt
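
Running the same sed pair on a few other inputs shows what it assumes. One- or two-digit months and days are reordered but not zero-padded, and years must have four digits; otherwise the string passes through unchanged. Since the one-liner's grep accepts two- to four-digit years ([0-9]{2,4}), short years can reach this sed and end up unconverted in the new name, which the dry-run log will show. The wrapper function name is hypothetical; the sed commands are copied from the one-liner.

```shell
#!/bin/sh
# reorder STRING: the date-reordering sed pair from the one-liner, wrapped in a
# function so it can be tried on several inputs. (Hypothetical helper.)
reorder() {
    echo "$1" | sed 's/\([0-9]\{1,2\}.[0-9]\{1,2\}.\)\([0-9]\{4\}\)\(-\)\([0-9]\{1,2\}.[0-9]\{1,2\}.\)\([0-9]\{4\}\)\(-\)/\2.\1-\5.\4txt/' | sed 's/\.-/-/'
}

reorder 01.01.1910-12.31.1914-   # 1910.01.01-1914.12.31.txt
reorder 6.2.2006-7.1.2006-       # 2006.6.2-2006.7.1.txt (reordered, not zero-padded)
reorder 06.02.06-07.01.06-       # 06.02.06-07.01.06- (two-digit years: no match)
```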
