
The task:

I'm trying to extract attribute values from XML tags with a shell script, split each value into its parts, and save them in a .csv file.

This is what the XML looks like:

<host>
  <servers>
    <server name="Type1Name1-Port1" >...</server>
    <server name="Type2Name2-Port2" >...</server>
    <server name="Type3Name3-Port3" >...</server>
    ...
    <server name="TypexNamex-Portx" >...</server>
  </servers>
</host>

I'd like to get the values of the "name" attribute and split them up as follows:
Type;Name;Port

The output csv file I want should look like this:

Type1;Name1;Port1
Type2;Name2;Port2
Type3;Name3;Port3
...
Typex;Namex;Portx

The problem:

  • I can't install anything on the server
  • I can only use "ksh-awk" / "xmllint without --xpath" / "standard linux commands"

I can use any shell-language I want to. I prefer bash and ksh.

My questions:

  • Do you think it is possible to solve my task?
  • What is the best approach for the sub-tasks? (reading, splitting, writing)

EDIT:

Example data of a server-name:

T-TTT_AAA-A-SSS-PPPP

Where T represents the Type, A the Application name, S the Server name, and P the Port. The lengths of T, A and S vary; the length of P is constant.

  • Yes, it should be possible with a mix of xmllint --shell and awk or sed. What is the separator between your Type1 and Name1? Capitalization? Commented Feb 10, 2017 at 14:17
  • Could you show an example using real data (or data in a form similar to what you really have)? This will matter for the awk part. Commented Feb 10, 2017 at 14:31
  • @Aserre .. Done.. Commented Feb 10, 2017 at 15:01
  • So, if you have T-TTT_AAA-A-SSS-PPPP, the output should be T-TTT;AAA-A-SSS;PPPP, right ? Commented Feb 10, 2017 at 15:03

3 Answers

1

Here is what I came up with, using only common tools, xmllint and sed:

echo 'cat //host/servers/server/@name' | xmllint --shell data.xml | sed -n 's: name=\"\([A-Z][a-z0-9]*\)\([A-Z][a-z0-9]*\)-\(.*\)\":\1;\2;\3:p'

The sed expression is based on the OP's examples at the time of posting.

Breakdown:

  • echo 'cat //host/servers/server/@name' : this is the command we pass to xmllint. It selects the name attribute of every node matching <host><servers><server ...> ... </server></servers></host>
  • xmllint --shell data.xml : loads data.xml into xmllint's interactive shell, which reads the command from standard input and executes it.
  • sed -n 's: name=\"\([A-Z][a-z0-9]*\)\([A-Z][a-z0-9]*\)-\(.*\)\":\1;\2;\3:p' : we process the output of xmllint to keep only the data we are interested in
    • For each match, xmllint prints a line of the form : name="Type1Name1-Port1"
    • We define 3 capture groups : a capital letter followed by any number of lowercase letters or digits (for Type), another capital letter followed by any number of lowercase letters or digits (for Name), and everything between the - and the closing " character (for Port)
    • The -n option and the p flag tell sed to print only the lines that matched, with the three groups separated by semicolons

Output :

Type1;Name1;Port1
Type2;Name2;Port2
Type3;Name3;Port3
Typex;Namex;Portx
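
The sed stage can be tried on its own, without xmllint. The following is only a sketch: the sample text assumes that xmllint --shell prints one name="..." line per match between separator lines, as described in the breakdown, and servers.csv is a hypothetical file name. The expression is the one from the answer, written with plain double quotes, which need no escaping inside single quotes.

```shell
# Simulated xmllint --shell output (assumption: one ' name="..."' line per
# match, with ' -------' separator lines in between).
xmllint_output=' -------
 name="Type1Name1-Port1"
 -------
 name="Type2Name2-Port2"'

# -n together with the p flag drops every line that is not a name attribute.
csv=$(printf '%s\n' "$xmllint_output" |
  sed -n 's: name="\([A-Z][a-z0-9]*\)\([A-Z][a-z0-9]*\)-\(.*\)":\1;\2;\3:p')

printf '%s\n' "$csv"

# With real data, the result would be redirected into the CSV file instead:
# echo 'cat //host/servers/server/@name' | xmllint --shell data.xml | sed -n '...' > servers.csv
```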

EDIT:

To fit the pattern you indicated in the comments, you'll just have to change the sed regex, for instance :

sed -n 's: name=\"\(.*\)_\(.*\)-\(.\{4\}\)\":\1;\2;\3:p'

This matches the format T-TTT_AAA-A-SSS-PPPP, with any length for the type and for the application/server part, and a 4-character port. Adjust the regex, or ask another question under the regex tag, if this is not exactly what you need.
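
As a quick check of that regex against the sample name from the question, here is a small sketch. split_server_names is a hypothetical helper name, and the input line imitates what xmllint is assumed to print.

```shell
# split_server_names: hypothetical helper. Reads ' name="..."' lines on stdin
# and prints Type;Application-Server;Port, assuming a 4-character port.
split_server_names() {
  sed -n 's: name="\(.*\)_\(.*\)-\(.\{4\}\)":\1;\2;\3:p'
}

sample=' name="T-TTT_AAA-A-SSS-PPPP"'
result=$(printf '%s\n' "$sample" | split_server_names)
printf '%s\n' "$result"   # prints T-TTT;AAA-A-SSS;PPPP
```

Because .* is greedy, a name containing more than one underscore is split at the last underscore that still lets the rest of the pattern match.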


4 Comments

This worked for me. Thank you. But if there's another tag with the attribute "name", won't the output be wrong?
No, it will specifically look for the "name" attribute inside the <server> tag.
But no part of the statement: sed -n 's: name=\"\(.*\)_\(.*\)-\(.\{4\}\)\":\1;\2;\3:p' says it has to be inside the <server> tag.
Yeah, but it doesn't matter, as it is executed after xmllint has extracted the data. So sed will apply the regex only after xmllint has extracted the name value from the <server> tags.
1

Without xmllint you can parse input like

<host>
  <servers>
    <server name="Type1_Name1-Port1" >...</server>
    <server name="Type-2_Name2-Port2" >...</server>
    <server name="Type3_Name-3-Port3" >...</server>
  </servers>
</host>

with

sed -n '/<server name=/ s/[^"]*"\([^_]*\)_\([^"]*\)-\([^"]*\)".*/\1;\2;\3/p' inputfile
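
Since the question mentions that awk is available, the same split can also be sketched with awk instead of sed. This is an alternative, not part of the answer above: parse_servers is a hypothetical function name, and it assumes each <server> tag sits on its own line with name as its first attribute. It splits at the first underscore and the last hyphen, as the sed expression does.

```shell
# parse_servers: hypothetical helper. Reads XML from the given files (or stdin)
# and prints Type;Name;Port for every <server name="..."> line.
parse_servers() {
  awk -F'"' '/<server name=/ {
    name = $2                      # text between the first pair of quotes
    us = index(name, "_")          # first underscore ends the type
    dash = 0
    for (i = length(name); i > 0; i--)   # last hyphen starts the port
      if (substr(name, i, 1) == "-") { dash = i; break }
    if (us > 0 && dash > us)
      print substr(name, 1, us - 1) ";" substr(name, us + 1, dash - us - 1) ";" substr(name, dash + 1)
  }' "$@"
}

result=$(parse_servers <<'EOF'
<host>
  <servers>
    <server name="Type1_Name1-Port1" >...</server>
    <server name="Type-2_Name2-Port2" >...</server>
    <server name="Type3_Name-3-Port3" >...</server>
  </servers>
</host>
EOF
)
printf '%s\n' "$result"
```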

1 Comment

Works. See accepted answer for a smaller statement with the exact same output.
0
xidel -e '//server/@name' f.xml |  sed ...
