Shell script to export data from MySQL tables, modify, then import into another set of tables

For a research project I'm working on, I have a data set with about 12.5 million records, continuously being added to. I'm using a combination of horizontal and vertical database partitioning to split the data among multiple tables to achieve reasonably quick query performance that should hold as the data set grows. I also need to track additional information for each item, which I'm doing in separate tables. For this, needed to take data from two columns of each row from each of the source tables, modify and then import in the destination tables.

Specific steps were:

  • Export (dump) columns filename and downloaded from each source table (filings_%)
  • Edit data from the downloaded column, replacing no with nothing and replacing yes with done
  • Import (load) modified data into filename and status columns into corresponding destination table (downloads_%)

The following script processed 12.5 million rows across 80 tables in under 2 min:

#!/bin/bash
 
# Script to export all tables matching a pattern from a MySQL DB using 
# the SELECT INTO OUTFILE method, then modify contents of the export 
# file, then load processed data into another set of tables. Requires 
# superuser privileges to modify and remove tempoary export files in 
# the MySQL data dir.

log=$(echo $(basename $0) | sed 's/.sh/.log/')
D="sec" # database
T="filings_%" # source table pattern
myopts="--skip-column-names -Be" # add username/password if not using ~/.my.cnf
 
err_exit()
{
 echo -e 1>&2
 exit 1
}
 
mysql_data_dir=$(mysql $myopts "show variables like 'datadir'" | awk '{sub(/\/$/,"");print$NF}')
 
if ! (mysqlshow $D 1>/dev/null); then
  echo ERROR: unable to access database
  exit 1
fi
 
if ! (sudo touch $mysql_data_dir/$D/test 2>/dev/null); then
  echo ERROR: this script will need sudo access to
  echo move exported files out of mysql data dir. 
  echo Come back when you get sudo, son.
  exit 1
else
  sudo rm $mysql_data_dir/$D/test
fi
 
ExportTable()
{
 mysql $D $myopts "
  SELECT filename,downloaded 
  INTO OUTFILE '${srcT}.psv'
  FIELDS TERMINATED BY '|'
  LINES TERMINATED BY '\n'
  FROM $srcT
 "
}

ImportTable()
{
 mysql $D $myopts "
  DELETE FROM $dstT;
  ALTER TABLE $dstT AUTO_INCREMENT = 1;
  LOAD DATA INFILE '${srcT}.psv'
  INTO TABLE ${dstT}
  FIELDS TERMINATED BY '|' 
  LINES TERMINATED BY '\n'
  (filename,status);
 "
}
 
ProcessExport()
{
 sudo sed -i '
   s/no//
   s/yes/done/
 ' $mysql_data_dir/$D/${srcT}.psv
}

for srcT in $(mysql $D $myopts "SHOW TABLES LIKE '$T'"); do 
  echo -e "\n$(date +%F\ %T.%N) INFO: exporting from $srcT" | tee -a $log
  ExportTable || err_exit
  echo "$(date +%F\ %T.%N) INFO: processing export file" | tee -a $log
  ProcessExport || err_exit
  dstT=$(echo $srcT | sed 's/filings/downloads/')
  echo "$(date +%F\ %T.%N) INFO: loading into destination table" | tee -a $log
  ImportTable || err_exit
  echo "$(date +%F\ %T.%N) INFO: removing export file" | tee -a $log
  sudo rm $mysql_data_dir/$D/${srcT}.psv || err_exit
done

Leave a comment

NOTE: Enclose quotes in <blockquote></blockquote>. Enclose code in <pre lang="LANG"></pre> (where LANG is one of these).