find field pattern in a csv file then count unique field values

Question

I am using Cygwin (bash) to create a script to find, group and count fields in multiple CSV files. Each row will have comma-separated fields with each field following a similar convention. There is a numeric value, then an equal sign (=), then an alphanumeric value. The "(number)=" may or may not be present in a row and, if present, the field position may vary, but appear only once in the row. In addition, the value after the equal sign will vary in length.

An example of my objective will be best. CSV File:

35=D,11=ABCD1,1=ABC,55=XYZ,38=100,40=P,18=M,54=1,59=0,10=111
35=D,11=ABCD2,1=ABC,55=XYZ,38=200,40=P,18=M,54=1,44=10.00,59=0,10=133
35=D,11=ABCD3,1=ABC,55=XYZ,38=300,40=P,18=M B,54=1,44=10.00,59=0,110=200,10=113
35=D,11=ABCD4,1=ABC,55=XYZ,38=400,40=P,18=M B F,54=1,44=10.00,59=0,110=300,10=144
35=D,11=ABCD5,1=ABC,55=ZYX,38=300,40=2,54=1,44=10.00,59=3,10=132
35=D,11=ABCD6,1=ABC,55=QQQ,38=100,40=1,18=C,54=2,59=3,10=131

The "18=" field values are space-separated. I would like to have a script or one-liner that would identify each unique "18=" value and then count the appearance of each. The output using the above file would be (sort is optional):

18=M 2
18=M B 1
18=M B F 1
18=C 1

As mentioned, this script should read a number of files with records in this format. I have tried different grep combinations and dabbled with awk, but I am less familiar with its proper implementation.

The first two answers do work (thanks a lot!). Would it be possible to expand to aggregate the "38=" values grouped by the unique "18=" count results?

You should not change or expand your question with additional constraints as it invalidates the existing answers. Put up a new question if commenting to the answerers is helping you out. Questions are not the place for feedback. You read the help-tour: "no distractions, no chit-chat" — Anthon
– Anthon, Commented Feb 11, 2015 at 7:08

Joseph R. · Accepted Answer · 2014-11-10 19:26:57Z

This is probably best done in Perl with a hash structure:

perl -nle '($x)=/(18=[^,]+)/;$y{$x}++; END{print "$_ $y{$_}" for keys %y}' files

Explanation

For each line, Perl looks for 18= followed by as many non-comma characters as possible; whatever it finds, it stores in the variable $x. This variable is then used as a key to the hash %y, whose associated value is incremented for each key $x found.

At the very END (i.e., after all lines have been processed), we print the keys (18=... fields) and associated values (number of occurrences) of the hash variable %y.

Costas · Accepted Answer · 2014-11-10 19:49:48Z

1

Do you try the combination?

grep -ho "18=[^,]*" list_of_files | sort | uniq -c

answered Nov 10, 2014 at 19:49

Costas

15k24 silver badges38 bronze badges

Add a comment |

Stack Exchange Network

find field pattern in a csv file then count unique field values

2 Answers 2

You must log in to answer this question.

Hot Network Questions

find field pattern in a csv file then count unique field values

2 Answers 2

You must log in to answer this question.

Related

Hot Network Questions