Shell one-liner to rename files based on content
The shell one-liner below will:
1. Find all files below the current directory
2. Rename each file so that its download source and date range become part of the new file name
3. Leave the renamed file in its current location (e.g. file "./path/to/dir/crappyname" becomes "./path/to/dir/DC.Circuit-2006.06.02-2006.07.01.txt")
Note: For this to work as expected, each file must contain a line that starts with "Source:", from which the first two words are used (e.g. "Source:DC Circuit - US Court of Appeals, District & Bankruptcy Cases, ..." becomes "DC.Circuit"). It must also contain a line that starts with "SearchTerms:", from which the date range is extracted (e.g. "SearchTerms: date(geq (06/02/2006) and leq (07/01/2006))" becomes "2006.06.02-2006.07.01"). Only the first 20 lines of each file are searched.
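To see the two extractions in isolation, here is a minimal sketch that runs them against a throwaway sample file. The sample file and the variable names source_part and dates_raw are made up for illustration; the pipelines are the ones used in the one-liner, minus the date reordering.

```shell
# Build a throwaway sample file with the two header lines the one-liner expects.
sample=$(mktemp)
printf '%s\n' \
  'Source:DC Circuit - US Court of Appeals, District & Bankruptcy Cases, ...' \
  'SearchTerms: date(geq (06/02/2006) and leq (07/01/2006))' > "$sample"

# First two words after "Source:", joined with a dot.
source_part=$(head -20 "$sample" | grep '^Source:' | awk -F: '{print $2}' | awk '{print $1"."$2}')

# Every date on the SearchTerms line, one per line, still in mm/dd/yyyy order.
dates_raw=$(head -20 "$sample" | grep '^SearchTerms:' | grep -E -o '[0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4}')

echo "$source_part"   # prints: DC.Circuit
echo "$dates_raw"     # prints: 06/02/2006 and 07/01/2006 on separate lines
rm -f "$sample"
```

The remaining sed steps in the one-liner only reorder these dates to yyyy.mm.dd and append the txt extension.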
Do a dry run first and examine rename.test for errors. This also verifies the quality of the download, since it reveals the source and date range of every downloaded file.
$ for f in $(find . -type f ! -name rename.test); do dir=`dirname $f` && source=`head -20 $f | grep "^Source:" | awk -F: '{print$2}' | awk '{print$1"."$2}' | sed 's/ //g'` && dates=`head -20 $f | grep "^SearchTerms:" | grep -E -o "[0-9]{1,2}\/[0-9]{1,2}\/[0-9]{2,4}" | sed 's_/_._g' | tr "\n" "-" | sed 's/\([0-9]\{1,2\}.[0-9]\{1,2\}.\)\([0-9]\{4\}\)\(-\)\([0-9]\{1,2\}.[0-9]\{1,2\}.\)\([0-9]\{4\}\)\(-\)/\2.\1-\5.\4txt/' | sed 's/\.-/-/'` && echo would rename $f to $dir/$source-$dates && echo would rename $f to $dir/$source-$dates >> rename.test; done
Examine output:
$ cat rename.test | sort -dk4 | less
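Beyond reading the sorted list, two quick checks on rename.test can catch the most likely problems. This is only a sketch: the function names bad_targets and duplicate_targets are hypothetical, and both assume file names without spaces. A file that lacks the Source: and SearchTerms: lines ends up with a target of just "-", and two files with the same source and date range would get the same target, so the second mv would overwrite the first.

```shell
# bad_targets FILE: dry-run lines whose new name does not end in
# <source>-YYYY.MM.DD-YYYY.MM.DD.txt (missing source or missing dates).
bad_targets() {
  grep -v -E -e '/[^/]+-[0-9]{4}\.[0-9]{1,2}\.[0-9]{1,2}-[0-9]{4}\.[0-9]{1,2}\.[0-9]{1,2}\.txt$' "$1"
}

# duplicate_targets FILE: new names that occur more than once.
# Assumes the new name is the last whitespace-separated field of each line.
duplicate_targets() {
  awk '{print $NF}' "$1" | sort | uniq -d
}
```

Run them as "bad_targets rename.test" and "duplicate_targets rename.test"; no output from either means nothing suspicious was found by these two checks.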
Do the rename:
$ for f in $(find . -type f ! -name rename.test); do dir=`dirname $f` && source=`head -20 $f | grep "^Source:" | awk -F: '{print$2}' | awk '{print$1"."$2}' | sed 's/ //g'` && dates=`head -20 $f | grep "^SearchTerms:" | grep -E -o "[0-9]{1,2}\/[0-9]{1,2}\/[0-9]{2,4}" | sed 's_/_._g' | tr "\n" "-" | sed 's/\([0-9]\{1,2\}.[0-9]\{1,2\}.\)\([0-9]\{4\}\)\(-\)\([0-9]\{1,2\}.[0-9]\{1,2\}.\)\([0-9]\{4\}\)\(-\)/\2.\1-\5.\4txt/' | sed 's/\.-/-/'` && echo renaming $f to $dir/$source-$dates && mv $f $dir/$source-$dates; done
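The one-liner splits the find output on whitespace, so it breaks on file names that contain spaces, and it renames files even when no Source: or SearchTerms: line was found. If that matters for your download tree, a more defensive variant could look roughly like the sketch below. It is not the one-liner above: the function names new_name and rename_tree are hypothetical, the date reordering is done with awk instead of sed back references, and it assumes no file name contains a newline.

```shell
# new_name FILE: print "<Source>-<from>-<to>.txt" for FILE, or return 1
# when the first 20 lines lack a usable Source: or SearchTerms: line.
new_name() {
  src=$(head -20 "$1" | grep '^Source:' | awk -F: '{print $2}' | awk '{print $1"."$2}')
  # Turn each mm/dd/yyyy into yyyy.mm.dd and join the dates with "-".
  dates=$(head -20 "$1" | grep '^SearchTerms:' |
    grep -E -o '[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}' |
    awk -F/ '{printf "%s%s.%s.%s", (NR > 1 ? "-" : ""), $3, $1, $2}')
  [ -n "$src" ] && [ -n "$dates" ] || return 1
  printf '%s-%s.txt\n' "$src" "$dates"
}

# rename_tree DIR: rename every regular file under DIR in place,
# skipping files without the header lines and targets that already exist.
rename_tree() {
  find "$1" -type f ! -name rename.test | while IFS= read -r f; do
    if name=$(new_name "$f"); then
      target=$(dirname "$f")/$name
      if [ -e "$target" ]; then
        echo "skipping $f: $target already exists" >&2
        continue
      fi
      echo "renaming $f to $target"
      mv -- "$f" "$target"
    else
      echo "skipping $f: no Source:/SearchTerms: line found" >&2
    fi
  done
}
```

After pasting both functions into a shell, "rename_tree ." from the top of the download tree does the rename. Because existing targets are skipped, running it a second time leaves already renamed files alone.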
Test the sed expression by wrapping each match (back reference) in parentheses:
$ echo 01.01.1910-12.31.1914- | sed 's/\([0-9]\{2\}.[0-9]\{2\}.\)\([0-9]\{4\}\)\(-\)\([0-9]\{2\}.[0-9]\{2\}.\)\([0-9]\{4\}\)\(-\)/(\1)(\2)(\3)(\4)(\5)(\6)/'
(01.01.)(1910)(-)(12.31.)(1914)(-)
$ echo 01.01.1910-12.31.1914- | sed 's/\([0-9]\{1,2\}.[0-9]\{1,2\}.\)\([0-9]\{4\}\)\(-\)\([0-9]\{1,2\}.[0-9]\{1,2\}.\)\([0-9]\{4\}\)\(-\)/\2.\1-\5.\4txt/' | sed 's/\.-/-/'
1910.01.01-1914.12.31.txt