Bash and awk to convert delimited data (csv, tsv, etc) to HTML tables

A shell wrapper script that uses awk to convert a delimited file (the delimiter can be any character or string) into an HTML table.

Example 1 -- simple comma delimited file

Simple comma separated file "test.csv" containing:

abc,efg,hij
klm,nop,qrs

Running the script with just input file name as the argument:

$ csv2htm.sh test.csv

Would produce:

  <table>
      <tr>
        <td>abc</td>
        <td>efg</td>
        <td>hij</td>
      </tr>
      <tr>
        <td>klm</td>
        <td>nop</td>
        <td>qrs</td>
      </tr>
  </table>

Example 2 -- comma delimited with column labels

Again a comma separated file "test.csv", but with first and last rows containing column labels:

H1,H2,H3
abc,efg,hij
klm,nop,qrs
H1,H2,H3

Running the script with the optional "--head" and "--foot" arguments wraps the fields from the first and last lines of the input file in "thead", "tfoot", and "th" HTML tags:

$ csv2htm.sh --head --foot test.csv

Result:

  <table>
    <thead>
      <tr>
        <th>H1</th>
        <th>H2</th>
        <th>H3</th>
      </tr>
    </thead>
      <tr>
        <td>abc</td>
        <td>efg</td>
        <td>hij</td>
      </tr>
      <tr>
        <td>klm</td>
        <td>nop</td>
        <td>qrs</td>
      </tr>
    <tfoot>
      <tr>
        <th>H1</th>
        <th>H2</th>
        <th>H3</th>
      </tr>
    </tfoot>
  </table>
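
For "--foot" to work, the script has to know which input line is the last one. The script further down does this by deleting blank lines with sed and counting what remains with wc. The sketch below reproduces just that step on inline sample data; the data is illustrative only, and the arithmetic expansion is an extra precaution (not in the original script) against wc implementations that pad the count with spaces.

```shell
# Sample input with blank lines between and after the rows (illustrative data).
data=$(printf 'H1,H2\n\nabc,efg\n\nH1,H2\n\n' | sed '/^$/d')

# Count the remaining lines; $(( )) drops any padding around the number.
last=$(( $(printf '%s\n' "$data" | wc -l) ))

echo "$last"
```

With the blank lines removed, the line count matches the record number (NR) that awk sees for the final row, which is what lets the footer test succeed.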

Example 3 -- tab delimited with column labels

The input file can be delimited by characters other than a comma -- tab, pipe, colon, and so on. The delimiter can also be more than one character, such as the double tabs in this case:

col1		col2		col3
abc		efg		hij
klm		nop		qrs

First line contains column labels (col1, col2, col3), so in addition to specifying double tab as the delimiter, we'll add the "--head" argument also:

$ csv2htm.sh -d '\t\t' --head test.tsv

Would produce:

  <table>
    <thead>
      <tr>
        <th>col1</th>
        <th>col2</th>
        <th>col3</th>
      </tr>
    </thead>
      <tr>
        <td>abc</td>
        <td>efg</td>
        <td>hij</td>
      </tr>
      <tr>
        <td>klm</td>
        <td>nop</td>
        <td>qrs</td>
      </tr>
  </table>
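
The delimiter has to match the file exactly, because awk splits on whatever it is given. A minimal sketch of the difference, using an inline sample line rather than a real file (the variable names here are illustrative): with a double-tab separator the line has three fields, while a single-tab separator also counts the empty field between each pair of tabs. Note that awk treats a separator longer than one character as a regular expression, so multi-character delimiters containing regex metacharacters may need escaping.

```shell
# One sample line whose columns are separated by two tabs.
line='col1\t\tcol2\t\tcol3\n'

# Double tab as the separator: three fields.
double=$(printf "$line" | awk -F '\t\t' '{ print NF }')

# Single tab as the separator: the empty fields between tabs are counted too.
single=$(printf "$line" | awk -F '\t' '{ print NF }')

echo "double tab: $double fields, single tab: $single fields"
```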

And here's the script:

#!/bin/bash
 
usage()
{
cat <<EOF
 
Usage: $(basename "$0") [OPTIONS] input > output
 
Script to produce HTML tables from delimited input. Delimiter can be specified
as an optional argument. If omitted, script defaults to comma.
 
Options:
 
  -d       Specify delimiter to look for, instead of comma.
 
  --head   Treat first line as header, enclosing in <thead> and <th> tags.
 
  --foot   Treat last line as footer, enclosing in <tfoot> and <th> tags. 
 
Examples:
 
  1. $(basename "$0") input.csv
 
  Above will parse file 'input.csv' with comma as the field separator and
  output HTML tables to STDOUT.
 
  2. $(basename "$0") -d '|' input.psv > output.htm
 
  Above will parse file "input.psv", looking for the pipe character as the
  delimiter, then output results to "output.htm".
 
  3. $(basename "$0") -d '\t' --head --foot input.tsv > output.htm
 
  Above will parse file "input.tsv", looking for tab as the delimiter, then
  process first and last lines as header/footer (that contain data labels), then
  write output to "output.htm".
 
EOF
}
 
while true; do
  case "$1" in
    -d)
      shift
      d="$1"
      ;;
    --foot)
      foot="-v ftr=1"
      ;;
    --help)
      usage
      exit 0
      ;;
    --head)
      head="-v hdr=1"
      ;;
    -*)
      echo "ERROR: unknown option '$1'"
      echo "see '--help' for usage"
      exit 1
      ;;
    *)
      f=$1
      break
      ;;
  esac
  shift
done
 
if [ -z "$d" ]; then
  d=","
fi
 
if [ -z "$f" ]; then
  echo "ERROR: input file is required"
  echo "see '--help' for usage"
  exit 1
fi
 
if ! [ -f "$f" ]; then
  echo "ERROR: input file '$f' is not readable"
  exit 1
else
  data=$(sed '/^$/d' "$f")
  # Strip whitespace: some wc implementations pad the count with spaces.
  last=$(wc -l <<< "$data" | tr -d '[:space:]')
fi
 
awk -F "$d" -v last=$last $head $foot '
  BEGIN {
    print "  <table>"
  }       
  {
    # Escape "&" first so the entities added below are not escaped again.
    gsub(/&/, "\\&amp;")
    gsub(/</, "\\&lt;")
    gsub(/>/, "\\&gt;")
    if(NR == 1 && hdr) {
      printf "    <thead>\n"
    }
    if(NR == last && ftr) {  
      printf "    <tfoot>\n"
    }
    print "      <tr>"
    for(f = 1; f <= NF; f++)  {
      if((NR == 1 && hdr) || (NR == last && ftr)) {
        printf "        <th>%s</th>\n", $f
      }
      else printf "        <td>%s</td>\n", $f
    }     
    print "      </tr>"
    if(NR == 1 && hdr) {
      printf "    </thead>\n"
    }
    if(NR == last && ftr) {
      printf "    </tfoot>\n"
    }
  }       
  END {
    print "  </table>"
  }
' <<< "$data"
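
The escaping step inside the awk program can be tried in isolation. This is a minimal sketch, not part of the script: `escape_html` is a hypothetical helper name, and the input string is made up. Ampersands are replaced first, since replacing them after "<" and ">" would escape the ampersands of the entities that were just inserted.

```shell
# Hypothetical helper: escape the HTML special characters &, < and >.
escape_html() {
  awk '{
    gsub(/&/, "\\&amp;")   # must run first
    gsub(/</, "\\&lt;")
    gsub(/>/, "\\&gt;")
    print
  }'
}

escaped=$(printf '%s\n' 'a<b & c>d' | escape_html)
echo "$escaped"
```

In the replacement strings, the doubled backslash is needed because a bare "&" in a gsub replacement stands for the matched text.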

2 Comments

  • 1. David Ward replies at 4th March 2014, 3:37 pm :

    Thank you! Just what I was after.

  • 2. Morten replies at 23rd August 2014, 6:59 am :

    Thx, great script and just what I was looking for! :)
