Bash shell scripting - csv parsing


Question

I am trying to parse a CSV containing potentially 100k+ lines. Here are the criteria I have:

  1. The index of the identifier
  2. The identifier value

I would like to retrieve all lines in the CSV that have the given value in the given index (delimited by commas).

Any ideas, with special consideration for performance?

1
33
10/13/2009 1:53:13 PM

Accepted Answer

First prototype using plain old grep and cut:

grep "${VALUE}" inputfile.csv | cut -d, -f"${INDEX}"

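If that's fast enough and gives the proper output, you're done. :) Keep in mind that grep matches the value anywhere on the line, not only in the chosen field, and that cut prints just that field rather than the whole line.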
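If you need whole lines where the field at the given index is exactly the value, a single awk pass is one way to do it. This is only a minimal sketch, not part of the original answer: the function name filter_csv is made up for illustration, and it assumes simple CSV with no quoted fields that contain commas or newlines.

```shell
# Hypothetical helper (name is an assumption, not from the answer above).
# Prints every line of a CSV whose field at a 1-based index equals a value.
#   $1 = field index, $2 = value to match, $3 = path to the CSV file
# Assumes plain comma-separated data without quoted or escaped commas.
filter_csv() {
    # Appending "" forces a string comparison, so 33 does not match 33.0.
    awk -F, -v idx="$1" -v val="$2" '($idx "") == val' "$3"
}
```

Usage would look like: filter_csv "$INDEX" "$VALUE" inputfile.csv. It reads the file once and spawns a single process, so it should cope with 100k+ lines, but measure on your own data. Note that awk interprets backslash escapes in values passed with -v.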

26
10/14/2009 10:30:55 AM

As an alternative to cut- or awk-based one-liners, you could use the specialized csvtool aka ocaml-csv:

$ csvtool -t ',' col "$index" yourfile | grep "$value"

According to the docs, it handles escaping, quoting, etc.
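To see why quoting matters, here is a small sketch with a made-up record (the sample data is an assumption for illustration only). Splitting on every comma, as cut does, truncates a quoted field that contains a comma:

```shell
# Made-up sample record: the second field contains a quoted comma.
line='1,"Smith, John",42'

# cut splits on every comma, so field 2 stops at the embedded comma.
naive_field=$(printf '%s\n' "$line" | cut -d, -f2)
printf '%s\n' "$naive_field"   # prints: "Smith
```

A CSV-aware tool would treat "Smith, John" as one field, which is the case the plain cut and awk approaches do not cover.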


Licensed under: CC-BY-SA with attribution