I'm trying to write a bash script that parses an XML file and saves the result to a CSV file.

For example:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <List>
    <Job id="1" name="John/>
    <Job id="2" name="Zack"/>
    <Job id="3" name="Bob"/>
</List>

I would like the script to save information into a csv file as such:

John | 1
Zack | 2
Bob  | 3

The name and id should each go in a separate cell.

Is there any way I can do this?

4 Answers

You've posted a query similar to your previous one. I'd again suggest using an XML parser. You could say:

xmlstarlet sel -t -m //List/Job -v @name -o "|" -v @id -n file.xml

It would return

John|1
Zack|2
Bob|3

for your sample data.

Pipe the output to sed: sed "s/|/\t| /" if you want it to appear as in your example.
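If the goal is separate cells in a spreadsheet rather than the visual layout of the example, most CSV readers expect a comma delimiter. A minimal sketch, assuming the name|id lines shown above arrive on stdin; to_csv is a hypothetical helper name, and names containing commas or quotes would need extra quoting that this does not do:

```bash
#!/bin/bash
# to_csv (hypothetical helper): turn "name|id" or "name | id" lines
# into "name,id" by replacing each pipe and any surrounding whitespace.
to_csv() {
  sed 's/[[:space:]]*|[[:space:]]*/,/g'
}

# Sample input in both spacing styles; prints John,1 / Zack,2 / Bob,3
printf 'John|1\nZack|2\nBob  | 3\n' | to_csv
```

In practice the xmlstarlet command would be piped into to_csv and redirected to a file, e.g. `... | to_csv > out.csv`.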

Try something like this

#!/bin/bash
while read -r line; do
  [[ $line =~ "name=\""(.*)"\"" ]] && name="${BASH_REMATCH[1]}" && [[ $line =~ "Job id=\""([^\"]+) ]] &&  echo "$name | ${BASH_REMATCH[1]}"
done < file 

The line with John is malformed (the closing quote is missing). With that fixed, the example output is:

John | 1
Zack | 2
Bob | 3
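The loop relies on the fact that `[[ string =~ regex ]]` stores the whole match in BASH_REMATCH[0] and each parenthesized group in BASH_REMATCH[1], BASH_REMATCH[2], and so on. A small sketch of the same idea applied to one line at a time; parse_job_line, name_re and id_re are hypothetical names, and the name pattern stops at a quote or slash so the malformed John line is tolerated too:

```bash
#!/bin/bash
# Keeping the regexes in variables avoids backslash-escaping quotes inside [[ ]].
name_re='name="([^"/]*)'   # group 1: characters after name=" up to a quote or slash
id_re='id="([0-9]+)'       # group 1: the digits after id="

# parse_job_line (hypothetical helper): print "name | id" for one <Job .../> line.
parse_job_line() {
  local line=$1 name
  [[ $line =~ $name_re ]] || return 1
  name=${BASH_REMATCH[1]}            # save it: the next match overwrites BASH_REMATCH
  [[ $line =~ $id_re ]] || return 1
  printf '%s | %s\n' "$name" "${BASH_REMATCH[1]}"
}

parse_job_line '<Job id="2" name="Zack"/>'   # prints: Zack | 2
```

The name has to be copied out before the second match because every successful `=~` replaces the contents of BASH_REMATCH.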

3 Comments

In this instance, name="John/>, there is no double quote after John, so I recommend replacing [[ $line =~ "name=\""(.*)"\"" ]] with [[ $line =~ "name=\""([^\"|/]*) ]]
@BMW Thanks. I assumed the XML wouldn't be malformed, but if it is, you could do that or use something like ([A-Za-z]*)
dude, can u elaborate on that short script? I am quite confused. :) nevertheless its looking crazy good.

Extending the xmlstarlet approach:

Given this xml file as input:

<DATA>
  <RECORD>
    <NAME>John</NAME>
    <SURNAME>Smith</SURNAME>
    <CONTACTS>
      "Smith" LTD,
      London, Mtg Str, 12,
      UK
    </CONTACTS>
  </RECORD>
</DATA>

And this script:

xmlstarlet sel -e utf-8 -t \
  -o "NAME, SURNAME, CONTACTS" -n \
  -m //DATA/RECORD \
  -o "\"" \
  -v "str:replace(normalize-space(NAME), '\"', '\"\"')" -o "\",\"" \
  -v "str:replace(normalize-space(SURNAME), '\"', '\"\"')" -o "\",\"" \
  -v "str:replace(normalize-space(CONTACTS), '\"', '\"\"')" \
  -o "\"" \
  -n file.xml

You'll have the following output:

NAME, SURNAME, CONTACTS
"John","Smith","""Smith"" LTD, London, Mtg Str, 12, UK"
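The quoting rule the script relies on (wrap every field in double quotes and double any quote inside it) can also be sketched in plain bash, which may help when the values come from somewhere other than xmlstarlet. csv_field and csv_row are hypothetical helper names, and the sketch assumes fields do not end in newlines:

```bash
#!/bin/bash
# csv_field (hypothetical helper): quote one value for CSV.
csv_field() {
  local s=$1 q='"'
  s=${s//$q/$q$q}        # double every embedded double quote
  printf '"%s"' "$s"     # wrap the result in double quotes
}

# csv_row (hypothetical helper): join all arguments into one CSV line.
csv_row() {
  local out='' f
  for f in "$@"; do
    out+=${out:+,}$(csv_field "$f")   # comma before every field except the first
  done
  printf '%s\n' "$out"
}

csv_row John Smith '"Smith" LTD, London, Mtg Str, 12, UK'
```

Holding the quote character in a variable sidesteps the differences between bash versions in how backslash-escaped quotes inside ${var//pattern/replacement} are treated.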

2 Comments

This is a good, elegant solution. However, I got: compilation error: element with-param XSLT-with-param: Failed to compile select expression 'str:replace' because of an unclosed parenthesis in the normalize-space call; it should read "str:replace(normalize-space(NAME) , '\"', '\"\"')"
Thanks for this. Anyone else extracting URLs from XML may find the &amp; isn't escaped. Fix this by adding -T after the sel command, e.g. xmlstarlet sel -T -e utf-8...... (see stackoverflow.com/questions/46255304/…)

Using sed

sed -nr 's/.*id=\"([0-9]*)\"[^\"]*\"(\w*).*/\2 | \1/p' file

Additionally, based on BroSlow's script, I merged the two matches into a single regex:

#!/bin/bash

while read -r line; do
  [[ $line =~ id=\"([0-9]+).*name=\"([^\"|/]*) ]] && echo "${BASH_REMATCH[2]} | ${BASH_REMATCH[1]}"
done < file
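To get from this loop to an actual CSV file, the same regex can be wrapped in a function that prints a header and comma-separated rows. This is only a sketch under a few assumptions: jobs_to_csv and job_re are hypothetical names, each Job element sits on its own line with id before name, and names contain no commas or quotes (otherwise the fields would need quoting):

```bash
#!/bin/bash
# group 1: the id digits, group 2: the name up to a quote or slash
job_re='id="([0-9]+).*name="([^"/]*)'

# jobs_to_csv (hypothetical helper): read XML lines on stdin, print "name,id" rows.
jobs_to_csv() {
  local line
  printf 'name,id\n'
  # "|| [[ -n $line ]]" also handles a last line without a trailing newline
  while IFS= read -r line || [[ -n $line ]]; do
    [[ $line =~ $job_re ]] && printf '%s,%s\n' "${BASH_REMATCH[2]}" "${BASH_REMATCH[1]}"
  done
  return 0
}

jobs_to_csv <<'EOF'
<List>
<Job id="1" name="John"/>
<Job id="2" name="Zack"/>
</List>
EOF
```

With a real input it would be called as `jobs_to_csv < file > out.csv`.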
