I am trying to parse a CSV containing potentially 100k+ lines. Here are the criteria I have:
I would like to retrieve all lines in the CSV that have the given value in the given index (delimited by commas).
Any ideas, with special consideration for performance?
First prototype using plain old grep and cut:
grep "${VALUE}" inputfile.csv | cut -d, -f"${INDEX}"
If that's fast enough and gives the proper output, you're done. :)
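If the goal is to keep whole lines whose field at the given index matches the value exactly, rather than lines that contain the value anywhere, a single awk pass is one possible sketch. The function name filter_by_field is made up for illustration, and the sketch assumes plain comma-separated data with no quoted fields that contain commas:

```shell
# filter_by_field VALUE INDEX FILE   (hypothetical helper, not a standard tool)
# Prints every line of FILE whose INDEX-th comma-separated field equals VALUE.
# Assumption: no quoted fields with embedded commas; awk splits on every comma.
filter_by_field() {
    # Appending "" to both sides forces a string comparison, so 1.0 does not match 1.
    # Note: awk processes backslash escapes in values passed with -v.
    awk -F, -v val="$1" -v idx="$2" '($idx "") == (val "")' "$3"
}
```

It would be called as filter_by_field "$VALUE" "$INDEX" inputfile.csv. It reads the file once and tests only the chosen field, so a value that appears in some other column does not produce a match.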
As an alternative to cut- or awk-based one-liners, you could use the specialized csvtool, aka ocaml-csv:
$ cat yourfile | csvtool -t ',' col "$index" - | grep "$value"
According to the docs, it handles escaping, quoting, etc.