0

I'd like to count the number of XML nodes in my XML file (with grep or some other tool).

....
<countryCode>GBR</countryCode>
<countryCode>USA</countryCode>
<countryCode>CAN</countryCode>
...
<countryCode>CAN</countryCode>
<someNode>USA</someNode>
<countryCode>CAN</countryCode>
<someNode>Otherone</someNode>
<countryCode>GBR</countryCode>
...

How can I get a count for each individual country, like CAN = 3, USA = 1, GBR = 2, without passing in the names of the countries? There might be more countries than these.

Update:

There are other nodes besides countryCode.

1
  • Do you know that each line contains exactly one XML element? There are no lines with two elements? No elements that span multiple lines? Do you know that all equivalent country codes are on identical lines? Commented Mar 6, 2012 at 16:27

8 Answers

5

My simple suggestion would be to use sort and uniq -c:

$ echo '<countryCode>GBR</countryCode>
<countryCode>USA</countryCode>
<countryCode>CAN</countryCode>
<countryCode>CAN</countryCode>
<countryCode>CAN</countryCode>
<countryCode>GBR</countryCode>' | sort | uniq -c
      3 <countryCode>CAN</countryCode>
      2 <countryCode>GBR</countryCode>
      1 <countryCode>USA</countryCode>

Here you'd pipe in the output of your grep instead of the echo. A more robust solution would be to use XPath. If your XML file looks like this:

<countries>
  <countryCode>GBR</countryCode>
  <countryCode>USA</countryCode>
  <countryCode>CAN</countryCode>
  <countryCode>CAN</countryCode>
  <countryCode>CAN</countryCode>
  <countryCode>GBR</countryCode>
</countries>

Then you could use:

$ xpath -q -e '/countries/countryCode/text()'  countries.xml  | sort | uniq -c
      3 CAN
      2 GBR
      1 USA

I say it's more robust because using tools designed for parsing flat text will be inherently flaky for dealing with XML. Depending on the context of the original XML file, a different XPath query might work better, which would match them anywhere:

$ xpath -q -e '//countryCode/text()'  countries.xml  | sort | uniq -c
      3 CAN
      2 GBR
      1 USA

2

grep can give a total count, but it doesn't count per pattern; for that you should use uniq -c:

$ uniq -c <(sort file)
  1 
  1  
  3 <countryCode>CAN</countryCode>
  2 <countryCode>GBR</countryCode>
  1 <countryCode>USA</countryCode>

If you want to get rid of the empty lines and tags, add sed:

$ sed -e '/^[[:space:]]*$/d' -e 's/<.*>\([A-Z]*\)<.*>/\1/g' test | sort | uniq -c
  3 CAN
  2 GBR
  1 USA

To delete lines that don't have a country code, add another command to sed:

$ sed -e '/countryCode/!d' -e '/^[[:space:]]*$/d' -e 's/<.*>\([A-Z]*\)<.*>/\1/g' test | sort | uniq -c
  3 CAN
  2 GBR
  1 USA


1

Quick and dirty (based only on your example text):

awk -F'>|<' '{a[$3]++;}END{for(x in a)print x,a[x]}' file

test:

kent$  cat t.txt
<countryCode>GBR</countryCode>
<countryCode>USA</countryCode>
<countryCode>CAN</countryCode>
<countryCode>CAN</countryCode>
<countryCode>CAN</countryCode>
<countryCode>GBR</countryCode>

kent$  awk -F'>|<' '{a[$3]++;}END{for(x in a)print x,a[x]}' t.txt 
USA 1
GBR 2
CAN 3
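
Since the question's update says the file also contains other nodes, the one-liner above would tally their values too (for example, `<someNode>USA</someNode>` would be counted as USA). The following is a minimal sketch of a stricter variant, assuming at most one element per line as in the question. The function name `count_country_codes` is made up for illustration.

```shell
# Hypothetical helper: count <countryCode> values in the file given as $1.
# Assumes at most one <countryCode> element per line, as in the question.
count_country_codes() {
  awk -F'[<>]' '
    # With "<" and ">" as separators, $2 is the tag name and $3 its text.
    $2 == "countryCode" { count[$3]++ }   # lines holding other elements are skipped
    END { for (code in count) print code, count[code] }
  ' "$1" | sort   # awk array order is unspecified, so sort for stable output
}
```

Called as `count_country_codes file.xml` (the file name is only an example), it prints one "code count" pair per line.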


1
sed -n "s/<countryCode>\(.*\)<\/countryCode>/\1/p" file.xml | sort | uniq -c


0
cat dummy | sort |cut -c14-16 | sort |tail -6 |awk  '{col[$1]++} END {for (i in col) print i, col[i]}'

Here dummy is your file name; replace the 6 in -6 with n-2, where n is the number of lines in your data file.


0

Something like this maybe:

grep -e 'regex' file.xml | sort | uniq -c

Of course, you need to provide a regex that matches your needs.
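
One regex that might fit the question's data is sketched below. It assumes a grep that supports `-o` (GNU and BSD grep do, but it is not in POSIX) and that each `<countryCode>` element sits on a single line. The function name `count_codes_grep` is hypothetical.

```shell
# Hypothetical helper: print "code count" pairs for the file given as $1.
count_codes_grep() {
  # -o prints each match on its own line, so several elements on one
  # line are still counted separately; other nodes never match.
  grep -o '<countryCode>[^<]*</countryCode>' "$1" |
    sed -e 's/<[^>]*>//g' |      # strip the opening and closing tags
    sort | uniq -c |
    awk '{ print $2, $1 }'       # normalise uniq's padded "count code" output
}
```

This is still text matching rather than real XML parsing, so comments, CDATA sections, or attributes on the tag would need a proper XML tool.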


0

If your file is laid out as you showed, awk can do it like this:

awk -F '</?countryCode>' 'NF > 1 { a[$2]++ } END { for (e in a) { printf("%s\t%i\n", e, a[e]) } }' INPUTFILE

If there is more than one <countryCode> tag on a line, you can still use a pipe to put each tag on its own line first, e.g.:

sed 's/<countryCode>/\n<countryCode>/g' INPUTFILE | awk ...

Note that if a <countryCode> element spans multiple lines, this does not work as expected.

Anyway, I'd recommend using XPath for this kind of task (Perl's XML::XPath module has a CLI utility for this).
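
If an XPath tool is not at hand, one rough workaround for elements that span lines is to join the lines before matching. This is only a sketch under stated assumptions: the file is small enough to handle as one long line, grep supports `-o`, and country codes contain no whitespace of their own. The function name `count_codes_multiline` is invented for the example.

```shell
# Hypothetical helper: like the pipelines above, but tolerant of
# <countryCode> elements whose text is split across several lines.
count_codes_multiline() {
  tr '\n' ' ' < "$1" |                                  # join all lines into one
    grep -o '<countryCode>[^<]*</countryCode>' |        # one element per output line
    sed -e 's/<[^>]*>//g' -e 's/[[:space:]]//g' |       # drop tags and padding
    sort | uniq -c |
    awk '{ print $2, $1 }'                              # print "code count"
}
```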


0

Quick and simple:

grep countryCode ./file.xml | sort | uniq -c
