Alain Kelder | Bash and awk to convert delimited data (csv, tsv, etc) to HTML tables

15 Feb 2014 | Awk, Bash

A shell wrapper script that uses awk to convert a delimited file (the delimiter can be any character or sequence of characters) into an HTML table.

Example 1 -- simple comma delimited file

A simple comma-separated file, "test.csv", containing:

abc,efg,hij
klm,nop,qrs

Running the script with just the input file name as the argument:

$ csv2htm.sh test.csv

Would produce:

abc	efg	hij
klm	nop	qrs
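
In terms of markup, the idea is that every line becomes a table row and every field a cell. A stripped-down sketch of that core step, assuming comma as the delimiter and no header or footer handling (csv_to_table is a hypothetical helper name, not part of csv2htm.sh):

```shell
# Hypothetical helper: wrap each comma-separated field in <td>
# and each input line in <tr>, inside a single <table>.
csv_to_table() {
  awk -F ',' '
    BEGIN { print "<table>" }
    {
      print "  <tr>"
      for (i = 1; i <= NF; i++) printf "    <td>%s</td>\n", $i
      print "  </tr>"
    }
    END { print "</table>" }
  '
}

# Same two lines as test.csv above.
printf 'abc,efg,hij\nklm,nop,qrs\n' | csv_to_table
```

For the two-line sample this should print one table with two rows of three cells each. The full script further down adds escaping, header and footer handling, and a configurable delimiter on top of this.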

Example 2 -- comma delimited with column labels

Again a comma-separated file "test.csv", but with the first and last rows containing column labels:

H1,H2,H3
abc,efg,hij
klm,nop,qrs
H1,H2,H3

Running the script with the optional "--head" and "--foot" arguments wraps the fields from the first and last lines of the input file in "th" tags, and the rows themselves in "thead" and "tfoot" tags:

$ csv2htm.sh --head --foot test.csv

Result:

H1	H2	H3
abc	efg	hij
klm	nop	qrs
H1	H2	H3
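
Because awk reads its input as a stream, it cannot tell on its own which line is the last one, so the line count has to be passed in from outside. A minimal sketch of that idea, assuming comma-delimited input and always treating the first and last lines as labels (label_rows is a hypothetical helper name, and the compact one-row-per-line output is a simplification):

```shell
# Hypothetical helper: mark the first and last input lines as
# header/footer rows, everything in between as ordinary rows.
label_rows() {
  local data last
  data=$(cat)                                         # slurp stdin
  last=$(printf '%s\n' "$data" | wc -l | tr -d ' ')   # number of lines
  printf '%s\n' "$data" | awk -F ',' -v last="$last" '
    {
      cell = (NR == 1 || NR == last) ? "th" : "td"
      if (NR == 1)    print "<thead>"
      if (NR == last) print "<tfoot>"
      printf "<tr>"
      for (i = 1; i <= NF; i++) printf "<%s>%s</%s>", cell, $i, cell
      print "</tr>"
      if (NR == 1)    print "</thead>"
      if (NR == last) print "</tfoot>"
    }
  '
}

printf 'H1,H2,H3\nabc,efg,hij\nklm,nop,qrs\nH1,H2,H3\n' | label_rows
```

The first and last rows should come out with "th" cells inside "thead" and "tfoot", and the two middle rows with plain "td" cells.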

Example 3 -- tab delimited with column labels

The input file can be delimited by characters other than a comma: tab, pipe, colon, whatever. The delimiter can even be several characters long, such as the double tabs in this case:

col1		col2		col3
abc		efg		hij
klm		nop		qrs

The first line contains column labels (col1, col2, col3), so in addition to specifying double tab as the delimiter, we'll also add the "--head" argument:

$ csv2htm.sh -d '\t\t' --head test.tsv

Would produce:

col1	col2	col3
abc	efg	hij
klm	nop	qrs
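
Multi-character delimiters work because the delimiter is handed to awk's -F option, and awk treats a field separator longer than one character as a regular expression. A quick way to check how a given delimiter splits a line, as a sketch (count_fields is a hypothetical helper name, and the sample lines are made up):

```shell
# Hypothetical helper: print how many fields awk sees on each
# input line when DELIM ($1) is used as the field separator.
count_fields() {
  awk -F "$1" '{ print NF }'
}

printf 'col1\t\tcol2\t\tcol3\n' | count_fields '\t\t'     # double tab
printf 'a|b|c\n'                | count_fields '|'        # single character, taken literally
printf 'a||b||c\n'              | count_fields '[|][|]'   # multi-character, so written as a regex
```

Each call should print 3. The last case is the one to watch: since a multi-character separator is a regex, characters such as "|" need to be bracketed or escaped to be matched literally.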

And here's the script:

#!/bin/bash

usage()
{
cat <<EOF

Script to produce HTML tables from delimited input. Delimiter can be specified
as an optional argument. If omitted, script defaults to comma.

Options:

  -d       Specify delimiter to look for, instead of comma.

  --head   Treat first line as header, enclosing in <thead> and <th> tags.

  --foot   Treat last line as footer, enclosing in <tfoot> and <th> tags.

Examples:

  1. $(basename $0) input.csv

  Above will parse file 'input.csv' with comma as the field separator and
  output HTML tables to STDOUT.

  2. $(basename $0) -d '|' input.psv > output.htm

  Above will parse file "input.psv", looking for the pipe character as the
  delimiter, then output results to "output.htm".

  3. $(basename $0) -d '\t' --head --foot input.tsv > output.htm

  Above will parse file "input.tsv", looking for tab as the delimiter, then
  process first and last lines as header/footer (that contain data labels), then
  write output to "output.htm".

EOF
}

while true; do
  case "$1" in
    -d)
      shift
      d="$1"
      ;;
    --foot)
      foot="-v ftr=1"
      ;;
    --help)
      usage
      exit 0
      ;;
    --head)
      head="-v hdr=1"
      ;;
    -*)
      echo "ERROR: unknown option '$1'" >&2
      echo "see '--help' for usage" >&2
      exit 1
      ;;
    *)
      f=$1
      break
      ;;
  esac
  shift
done

if [ -z "$d" ]; then
  d=","
fi

if [ -z "$f" ]; then
  echo "ERROR: input file is required" >&2
  echo "see '--help' for usage" >&2
  exit 1
fi

if ! [ -f "$f" ]; then
  echo "ERROR: input file '$f' is not readable" >&2
  exit 1
else
  # Drop empty lines so they do not become empty table rows
  data=$(sed '/^$/d' "$f")
  last=$(wc -l <<< "$data")
fi

awk -F "$d" -v last=$last $head $foot '
  BEGIN {
    print "  <table>"
  }
  {
    # Escape HTML special characters; "&" has to be handled first
    gsub(/&/, "\\&amp;")
    gsub(/</, "\\&lt;")
    gsub(/>/, "\\&gt;")
    if(NR == 1 && hdr) {
      printf "    <thead>\n"
    }
    if(NR == last && ftr) {
      printf "    <tfoot>\n"
    }
    print "      <tr>"
    for(f = 1; f <= NF; f++)  {
      if((NR == 1 && hdr) || (NR == last && ftr)) {
        printf "        <th>%s</th>\n", $f
      }
      else printf "        <td>%s</td>\n", $f
    }
    print "      </tr>"
    if(NR == 1 && hdr) {
      printf "    </thead>\n"
    }
    if(NR == last && ftr) {
      printf "    </tfoot>\n"
    }
  }
  END {
    print "  </table>"
  }
' <<< "$data"
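
The gsub() calls near the top of the awk program are there to escape characters that have special meaning in HTML, so that field text cannot be mistaken for markup. A minimal sketch of that step in isolation (html_escape is a hypothetical helper name, not part of the script, and the sample input is made up):

```shell
# Hypothetical helper: escape &, < and > on every input line.
html_escape() {
  awk '{
    gsub(/&/, "\\&amp;")   # first, so the "&" added below is not escaped again
    gsub(/</, "\\&lt;")
    gsub(/>/, "\\&gt;")
    print
  }'
}

printf '%s\n' 'R&D <b>bold</b>' | html_escape
# prints: R&amp;D &lt;b&gt;bold&lt;/b&gt;
```

The double backslash matters: in a gsub() replacement string a bare "&" stands for the matched text, so a literal ampersand has to be written as "\\&".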

2 Comments

1. David Ward replies at 4th March 2014, 3:37 pm :

Thank you! Just what I was after.

2. Morten replies at 23rd August 2014, 6:59 am :

Thx, great script and just what I was looking for! 🙂
