I need to parse a given XML file for specific content. Unfortunately, the only tool I have is an xmllint without the --xpath option, and I'm not allowed to install or update any other software. The XML looks like this:

<?xml version="1.0"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <CreateIncidentResponse xmlns="http://schemas.hp.com/SM/7" xmlns:cmn="http://schemas.hp.com/SM/7/Common" xmlns:xmime="http://www.w3.org/2005/05/xmlmime" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" message="Success" returnCode="0" schemaRevisionDate="2016-02-16" schemaRevisionLevel="2" status="SUCCESS" xsi:schemaLocation="http://schemas.hp.com/SM/7 /Incident.xsd">
      <model>
        <keys>
          <IncidentID type="String">IM0832268</IncidentID>
        </keys>
        <instance recordid="IM0832268 - Paul test 3 incident via soap" uniquequery="number=&quot;IM0832268&quot;">
          <IncidentID type="String">IM0832268</IncidentID>
          <Category type="String">request for change</Category>
          <OpenTime type="DateTime">2016-03-18T16:06:28+00:00</OpenTime>
          <OpenedBy type="String">Harlass, Alexander</OpenedBy>
          <Priority type="String">4</Priority>
          <Urgency type="String">medium</Urgency>
          <UpdatedTime type="DateTime">2016-03-18T16:06:28+00:00</UpdatedTime>
          <AssignmentGroup type="String">TS3-AOS</AssignmentGroup>
          <Description type="Array">
            <Description type="String">RH test incident description via soap row 1</Description>
            <Description type="String">RH test incident description via soap row 2</Description>
          </Description>
          <Contact type="String">Harlass, Rudolf</Contact>
          <Title type="String">Paul test 3 incident via soap</Title>
          <TicketOwner type="String">INTEGRATION.OVO</TicketOwner>
          <UpdatedBy type="String">INTEGRATION.OVO</UpdatedBy>
          <Status type="String">Open</Status>
          <Area type="String">it products</Area>
          <Subarea type="String">utilization</Subarea>
          <ProblemType type="String">request for change</ProblemType>
          <Impact type="String">low</Impact>
          <Service type="String">PI Automation and Orchestration Service</Service>
          <VIP type="Boolean">false</VIP>
          <TargetResolutionDate type="DateTime">2016-03-25T15:00:00+00:00</TargetResolutionDate>
          <SOD type="String">OML</SOD>
          <SourceId type="String">4711</SourceId>
          <UserIncident type="Boolean">false</UserIncident>
          <AlertId type="String">4712</AlertId>
          <MonitoredId type="String">MI4713</MonitoredId>
        </instance>
      </model>
      <messages>
        <cmn:message type="String">Audit Record successfully recorded and added.</cmn:message>
      </messages>
    </CreateIncidentResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

In the end, I need output like this:

Create SUCCESS
Messages:
    Audit Record successfully recorded and added.
Incident ID: IM0832268
    Status: Open
    Severity: 4
    Brief Description: RH test incident description via soap row 1
    Opened by: integration.ovo
    Opened time: March 20, 2016 11:54:08 PM CET

I know how to assemble the output string once I have the values, but unfortunately I'm not very familiar with sed or similar tools.
Any help on how to extract the needed strings from the XML would be appreciated.
Thanks in advance

  • Do you have xsltproc on your system? Commented Mar 21, 2016 at 12:05
  • (Follow this comment if you need a quick hack; this is not a long-term solution.) Even though you cannot install anything, you can usually compile the binary elsewhere and copy it (and its dependencies) to a path on that system where you have write permission. Worst case, /tmp is read-write. You can try copying a newer version of xmllint to that path and running it from there. Commented Mar 21, 2016 at 12:14
  • Maybe awk.info/?doc/tools/xmlparse.html can help. I'm not sure whether gawk is required or whether your system has it, but most gawk-specific code can be rewritten as plain awk without too much trouble. Good luck. Commented Mar 21, 2016 at 12:29
  • Does this answer your question? How to parse XML in Bash? Commented Feb 28, 2024 at 14:40
  • See this wrapper, which provides an --xpath option even if your xmllint lacks one: github.com/sputnick-dev/xmllint (though in 2024, every xmllint should have the --xpath option). Commented Mar 5, 2024 at 11:46

1 Answer

Most systems have Python, Perl, or some other language with real XML processing capabilities. Using one of those would give a far better solution than attempting to produce a nicely formatted report from a large chunk of XML in bash. Having said that, here are some ideas for extracting this data with bash.
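As a point of comparison for the bash ideas below, here is a minimal sketch of the parser-based route: a shell function that hands the file to Python's standard-library xml.etree.ElementTree. It assumes python3 is installed (the question does not say so), and sm_field is a hypothetical name. The namespace URI is copied from the xmlns attribute on CreateIncidentResponse in the sample; because a real parser does the work, indentation, attribute order and line breaks inside tags stop mattering.

```shell
# Hypothetical helper: print the text of the first element with the given
# name in the http://schemas.hp.com/SM/7 namespace (URI taken from the sample).
# Assumes python3 is available; only its standard library is used.
sm_field() {
  python3 - "$1" "$2" <<'EOF'
import sys
import xml.etree.ElementTree as ET

name, path = sys.argv[1], sys.argv[2]
ns = {"sm": "http://schemas.hp.com/SM/7"}
root = ET.parse(path).getroot()
# find() understands a limited XPath subset; ".//" searches all descendants
# and returns the first match in document order
el = root.find(".//sm:" + name, ns)
if el is not None and el.text is not None:
    print(el.text.strip())
EOF
}
```

For example, sm_field IncidentID data.xml should print IM0832268, and sm_field Description/sm:Description data.xml selects the first description row rather than the Array wrapper element.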

Given a string like:

<IncidentID type="String">IM0832268</IncidentID>

You can get the value using awk like this (assuming your data is in a file called data.xml):

awk -F'[<>]' '/IncidentID/ {print $3}' data.xml

The -F'[<>]' sets the awk field separator to either < or >, so the given line is split into fields like this ($1 holds the indentation before the tag, and $5 is empty):

| $1 | $2                       | $3        | $4          | $5 |
|    | IncidentID type="String" | IM0832268 | /IncidentID |    |
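The same idea can be wrapped in a small shell function that takes the tag name as a parameter. This is a sketch (the name xml_text is hypothetical) that still assumes each element opens and closes on a single line, as in the sample. Instead of a plain regex, it compares the element name in $2 exactly and requires the matching closing tag in $4, so lines like <Description type="Array"> and </Description>, which a /Description/ pattern would turn into empty output lines, are skipped.

```shell
# Hypothetical helper: print the text of every <TAG ...>text</TAG> element
# that starts and ends on the same line (true for the sample response).
xml_text() {
  awk -F'[<>]' -v tag="$1" '
    {
      # $2 holds the opening tag: element name followed by any attributes
      split($2, parts, " ")
    }
    # Exact name match on the opening tag, plus the closing tag in $4
    parts[1] == tag && $4 == ("/" tag) { print $3 }
  ' "$2"
}
```

For example, xml_text Description data.xml prints only the two description rows; pipe it through head -n 1 to keep the first one.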

The above example will actually return two lines (because there are two IncidentID tags in your data):

IM0832268
IM0832268

If you know these will always be the same, you can just take the first one:

awk -F'[<>]' '/IncidentID/ {print $3; exit}' data.xml

To extract an attribute from a line like:

<CreateIncidentResponse xmlns="http://schemas.hp.com/SM/7" xmlns:cmn="http://schemas.hp.com/SM/7/Common" xmlns:xmime="http://www.w3.org/2005/05/xmlmime" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" message="Success" returnCode="0" schemaRevisionDate="2016-02-16" schemaRevisionLevel="2" status="SUCCESS" xsi:schemaLocation="http://schemas.hp.com/SM/7 /Incident.xsd">

You can first split it into one line per attribute, like this:

grep '<CreateIncidentResponse' data.xml | tr ' ' '\n'

Which will give you the following (preceded by a few empty lines, one for each space of indentation before the tag):

<CreateIncidentResponse
xmlns="http://schemas.hp.com/SM/7"
xmlns:cmn="http://schemas.hp.com/SM/7/Common"
xmlns:xmime="http://www.w3.org/2005/05/xmlmime"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
message="Success"
returnCode="0"
schemaRevisionDate="2016-02-16"
schemaRevisionLevel="2"
status="SUCCESS"
xsi:schemaLocation="http://schemas.hp.com/SM/7
/Incident.xsd">

Which you can then pass to awk to extract attribute values. For example, to get the value of the message attribute:

grep '<CreateIncidentResponse' data.xml | tr ' ' '\n' |
awk -F'"' '/message/ {print $2}'

Which would yield:

Success
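Splitting on spaces with tr is fragile: an attribute value containing a space, like xsi:schemaLocation above, is cut in two. A sketch like the following pulls out a single attribute with awk's match() instead. Again, xml_attr is a hypothetical name, and it assumes double-quoted values and an opening tag that fits on one line.

```shell
# Hypothetical helper: print the value of attribute ATTR on the first line
# that opens element TAG. Assumes double-quoted values and that the whole
# opening tag is on one line, as in the sample response.
xml_attr() {
  awk -v tag="$1" -v attr="$2" '
    index($0, "<" tag " ") || index($0, "<" tag ">") {
      # Require whitespace before the name so that, for example, "Level"
      # cannot match the tail of "schemaRevisionLevel"
      if (match($0, "[ \t]" attr "=\"[^\"]*\"")) {
        v = substr($0, RSTART, RLENGTH)
        sub(/^[ \t][^=]*="/, "", v)   # drop the leading whitespace, name and ="
        sub(/"$/, "", v)              # drop the closing quote
        print v
        exit
      }
    }
  ' "$3"
}
```

For example, xml_attr CreateIncidentResponse status data.xml should print SUCCESS, which is the value the requested "Create SUCCESS" line needs.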

Hopefully this is enough to get you started.
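Putting these pieces together, here is a rough sketch that produces the report from the question in a single awk pass. It makes several assumptions that are not in the question: one element per line, as in the sample; Priority maps to Severity and the OpenedBy element to "Opened by" (the sample report shows integration.ovo, which looks like a lower-cased TicketOwner or UpdatedBy instead); only the first message is printed; and OpenTime is printed unchanged, since reformatting it would need a date tool whose availability is unknown. XML entities such as &quot; are not decoded. incident_report is a hypothetical name.

```shell
# Hypothetical one-pass report over a response laid out one element per line.
# Assumed field mapping: Priority -> Severity, OpenedBy -> Opened by;
# OpenTime is printed as-is and XML entities are not decoded.
incident_report() {
  awk -F'[<>]' '
    # <Name attrs>text</Name> on one line: keep the first text seen per name
    NF >= 4 && $4 ~ /^\// {
      split($2, parts, " ")
      if (!(parts[1] in first)) first[parts[1]] = $3
    }
    # The overall result is the status attribute of <CreateIncidentResponse>
    /<CreateIncidentResponse[ >]/ && match($0, /[ \t]status="[^"]*"/) {
      result = substr($0, RSTART + 9, RLENGTH - 10)   # strip  status=" and the closing quote
    }
    END {
      print "Create " result
      print "Messages:"
      print "    " first["cmn:message"]
      print "Incident ID: " first["IncidentID"]
      print "    Status: " first["Status"]
      print "    Severity: " first["Priority"]
      print "    Brief Description: " first["Description"]
      print "    Opened by: " first["OpenedBy"]
      print "    Opened time: " first["OpenTime"]
    }
  ' "$1"
}
```

Run it as incident_report data.xml. To take a line from a different element, change the key used in the END block.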

3 Comments

+1 for first paragraph but -1 for undermining the correct first paragraph with the brittle hack presented in the rest of your answer.
"brittle hack", I like that. I think I need to order a t-shirt.
:-) May I suggest: "My brittle hacks are more robust than your production code."
