Easily find and replace text in all blog posts

Thanks to a comment left by John McLear, I discovered today that characters such as &, <, > and " in some of my older posts had been converted to HTML entities (&amp;, &lt;, &gt; and &quot;).

Fixing this straight in the DB is probably the easiest way to go...

First thing is to figure out which of the WordPress tables are for my blog:

CMD="mysql giantdorks --skip-column-names -Be"
TBS=$($CMD "show tables like 'wp%options'")
for TB in $TBS; do
  URL=$($CMD "select option_value from $TB where option_name='siteurl'")
  echo $TB is for $URL
done
wp_1_options is for http://giantdorks.org/
wp_2_options is for http://giantdorks.org/alain/
wp_3_options is for http://giantdorks.org/jason/

Dump my posts:

mysqldump giantdorks wp_2_posts > giantdorks.alain.posts.sql

Let's count how many special HTML entities are found in my posts:

grep -Eo '&[A-Za-z]+;' giantdorks.alain.posts.sql | sort | uniq -c
    539 &amp;
    744 &gt;
    362 &lt;
    416 &quot;
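
For anyone unfamiliar with the counting trick, here is a small sketch on a made-up sample string (the `sample` and `counts` names are mine, not part of the dump). It also shows that the pattern picks up any named entity, so something like &nbsp; would appear in the tally too and would need its own handling:

```shell
# Sample text is made up for illustration; it stands in for the SQL dump.
sample='x &lt; y &amp;&amp; y &gt; z &amp; &nbsp;done'

# grep -o prints each match on its own line; uniq -c only tallies
# adjacent duplicates, hence the sort in between.
counts=$(printf '%s\n' "$sample" | grep -Eo '&[A-Za-z]+;' | sort | uniq -c)
printf '%s\n' "$counts"
```

If the tally lists entities beyond the four shown above, they need their own replacement rule or a closer look.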

I'll use the following sed one liner to do the replacement for me:

sed 's/&amp;/\&/g;s/&gt;/>/g;s/&lt;/</g;s/&quot;/"/g'
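
One thing worth keeping in mind is the order of the substitutions. If a post contains a double-encoded entity such as &amp;lt; (say, a post that talks about entities), decoding &amp; first turns it into &lt;, which the next rule then turns into <. A possible variation, sketched below, decodes &amp; last. The `decode_entities` function name is hypothetical and the sample strings are made up:

```shell
# Hypothetical helper, not from the original post: decode the four entities,
# handling &amp; last so that a double-encoded "&amp;lt;" ends up as the
# literal text "&lt;" rather than "<".
decode_entities() {
  sed 's/&gt;/>/g;s/&lt;/</g;s/&quot;/"/g;s/&amp;/\&/g'
}

printf '%s\n' 'if (a &lt; b &amp;&amp; c &gt; d) echo &quot;ok&quot;' | decode_entities
# prints: if (a < b && c > d) echo "ok"

printf '%s\n' 'write &amp;lt; to show a less-than sign' | decode_entities
# prints: write &lt; to show a less-than sign
```

In the replacement part of a sed `s` command a bare & means "the whole match", which is why the literal ampersand is written as `\&`.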

Let's test it.

This post contained &lt; instead of < and several '&quot;' instead of double quotes:

grep -Eo "(.){15}fp = fsockopen(.){40}" giantdorks.alain.posts.sql | head -1
\n&lt;?php\r\n$fp = fsockopen(&quot;127.0.0.1&quot;, &quot;80&quot;, 

Let's pass to sed for replacement:

grep -Eo "(.){15}fp = fsockopen(.){40}" giantdorks.alain.posts.sql | head -1 | sed 's/&amp;/\&/g;s/&gt;/>/g;s/&lt;/</g;s/&quot;/"/g'
\n<?php\r\n$fp = fsockopen("127.0.0.1", "80", 

This post contained &gt; instead of > and &amp; instead of &:

grep -Eo "(.){45}Example: spam-stats-month Oct 2009" giantdorks.alain.posts.sql | head -1
  echo -e 1&gt;&amp;2 \"\\n Usage error..\\n Example: spam-stats-month Oct 2009

Let's pass to sed for replacement:

grep -Eo "(.){45}Example: spam-stats-month Oct 2009" giantdorks.alain.posts.sql | head -1 | sed 's/&amp;/\&/g;s/&gt;/>/g;s/&lt;/</g;s/&quot;/"/g'
  echo -e 1>&2 \"\\n Usage error..\\n Example: spam-stats-month Oct 2009

Looks good, let's do it:

1. Will make a backup of the sql dump in case something goes terribly wrong
2. Then do the replacement with sed
3. Load the table dump
4. Finally purge the varnish cache to see my changes:

cp giantdorks.alain.posts.sql giantdorks.alain.posts.sql.bak
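sed -i 's/&amp;/\&/g;s/&gt;/>/g;s/&lt;/</g;s/&quot;/"/g' giantdorks.alain.posts.sql
mysql giantdorks < giantdorks.alain.posts.sql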
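
Before loading the modified dump, a quick sanity check doesn't hurt. The sketch below runs the backup, replace and verify steps against a throwaway sample file rather than the real dump. The file names, the sample row and the `remaining` variable are made up for illustration, and `sed -i` without a suffix assumes GNU sed:

```shell
# Self-contained sketch: a throwaway sample file stands in for the real dump.
work=$(mktemp -d)
dump="$work/sample.posts.sql"
printf '%s\n' "INSERT INTO wp_2_posts VALUES (1,'a &lt; b &amp;&amp; c &gt; d');" > "$dump"

# Back up, then replace in place (GNU sed assumed for -i).
cp "$dump" "$dump.bak"
sed -i 's/&amp;/\&/g;s/&gt;/>/g;s/&lt;/</g;s/&quot;/"/g' "$dump"

# Count the targeted entities still present; 0 means all were replaced.
remaining=$( { grep -Eo '&(amp|gt|lt|quot);' "$dump" || true; } | wc -l | tr -d ' ')
echo "remaining entities: $remaining"

# Review exactly what changed before loading the dump back into MySQL.
# diff exits non-zero when the files differ, which is expected here.
diff "$dump.bak" "$dump" || true

rm -rf "$work"
```

On the real dump the diff will be long, so piping it through a pager or `head` is probably more practical.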

Yay..
