26

I have an xml file that I want to configure using a bash script. For example if I had this xml:

<a>

  <b>
    <bb>
        <yyy>
            Bla 
        </yyy>
    </bb>
  </b>

  <c>
    <cc>
      Something
    </cc>
  </c>

  <d>
    bla
  </d>
</a>

(confidential info removed)

I would like to write a bash script that will remove section <b> (or comment it out) but keep the rest of the xml intact. I am pretty new to the whole scripting thing. I was wondering if anyone could give me a hint as to what I should look into.

I was thinking that sed could be used, except that sed is a line editor. I think it would be easy to remove the <b> tags; however, I am unsure whether sed would be able to remove all the text between the <b> tags.

I will also need to write a script to add back the deleted section.

    I would have to advise against using bash/sed/awk/etc. for this sort of thing and recommend using Python, Ruby or Perl. Commented Apr 1, 2010 at 16:36

7 Answers

31

This would not be difficult to do in sed, as sed also works on ranges.

Try this (assuming xml is in a file named foo.xml):

sed -i '/<b>/,/<\/b>/d' foo.xml

-i will write the change into the original file (use -i.bak to keep a backup copy of the original)

This sed command will perform an action d (delete) on all of the lines specified by the range

# all of the lines between a line that matches <b>
# and the next line that matches <\/b>, inclusive
/<b>/,/<\/b>/

So, in plain English, this command will delete all of the lines between and including the line with <b> and the line with </b>
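
To see the range address at work without touching a real file, here is a minimal sketch. The sample contents and the variable names (sample, result) are made up for illustration, and -i is left out so that sed only prints the result:

```shell
# Build a small throwaway sample file (contents are illustrative only).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<a>
  <b>
    <bb>x</bb>
  </b>
  <c>y</c>
</a>
EOF

# Without -i, sed writes the result to stdout and leaves the file untouched.
# The range starts at the line containing <b> and ends at the line containing </b>.
result=$(sed '/<b>/,/<\/b>/d' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

Note that the <bb> line is deleted only because it sits inside the range; the patterns themselves match neither <bb> nor </bb>.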

If you'd rather comment out the lines, try one of these:

# block comment
sed -i 's/<b>/<!-- <b>/; s/<\/b>/<\/b> -->/' foo.xml

# comment out every line in the range
sed -i '/<b>/,/<\/b>/s/.*/<!-- & -->/' foo.xml
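
Since the question also asks about adding the section back, the block comment has the advantage of being reversible. A minimal sketch of the round trip, assuming the <b> and </b> tags appear exactly once and were commented out with the block-comment command above (the function names comment_b and uncomment_b are hypothetical):

```shell
# Hypothetical helpers: print the commented / uncommented version of a file to stdout.
comment_b()   { sed 's/<b>/<!-- <b>/; s/<\/b>/<\/b> -->/' "$1"; }
uncomment_b() { sed 's/<!-- <b>/<b>/; s/<\/b> -->/<\/b>/' "$1"; }

# Throwaway files for the demonstration; removed when the shell exits.
orig=$(mktemp); commented=$(mktemp); restored=$(mktemp)
trap 'rm -f "$orig" "$commented" "$restored"' EXIT

printf '<a>\n  <b>\n    <bb>x</bb>\n  </b>\n  <c>y</c>\n</a>\n' > "$orig"

comment_b "$orig" > "$commented"
uncomment_b "$commented" > "$restored"

# The restored file should be byte-for-byte identical to the original.
cmp -s "$orig" "$restored" && echo "round trip OK"
```

Running comment_b twice on the same file would nest the comment markers, so a real script would probably want to check for an existing <!-- <b> first.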

4 Comments

This works if there's nothing of importance preceding <b> on the same line, and nothing of importance following </b> on the same line, i.e. it doesn't work for XML in general but may work for the asker's special case.
The block comment (replacing <b> with <!-- <b> and </b> with </b> -->) would work if there was anything important on the line before <b> or after </b>. The biggest problem with that would be if there was already a comment inside of the commented block -- xml doesn't like nested comments.
Plenty of other cases this doesn't handle -- for instance, it can't distinguish tags from literal text inside a CDATA block. Much, much better to use XML-aware tools for the job.
Thanks :) It works well for multi-line tags, but doesn't work at all for a tag that opens and closes on a single line, such as <b>my text here</b>
16

Using xmlstarlet:

#xmlstarlet ed -d "/a/b" file.xml > tmp.xml
xmlstarlet ed -d "//b" file.xml > tmp.xml
mv tmp.xml file.xml
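
A slightly more defensive variant of the temp-file-and-rename step might look like the sketch below. It replaces the file only if the command succeeds, and uses mktemp rather than a fixed temporary name. The helper name rewrite_in_place is hypothetical, and it assumes the command takes the input file as its last argument and writes the result to stdout:

```shell
# Hypothetical helper: rewrite_in_place FILE COMMAND [ARGS...]
# Runs COMMAND ARGS... FILE, and replaces FILE with the output only on success.
rewrite_in_place() {
  local file tmp
  file=$1
  shift
  # Create the temp file next to the target so that mv is a same-filesystem rename.
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  if "$@" "$file" > "$tmp"; then
    mv "$tmp" "$file"
  else
    # Leave the original untouched and clean up the partial output.
    rm -f "$tmp"
    return 1
  fi
}
```

With xmlstarlet installed, the call would be rewrite_in_place file.xml xmlstarlet ed -d "//b". One caveat: mktemp creates the file with restrictive permissions, so the replaced file may end up with a different mode than the original.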

2 Comments

Should probably do it right and add a conditional (to only rename over the original file if the operation succeeds). To do it even more right, one might also use mktemp to generate a temporary file with a guaranteed-nonconflicting name, which can also have the side effect of avoiding some security attacks related to use of constant temporary file names.
Even with those caveats, though, this is still a much better answer than using sed.
10

You can use an XSLT stylesheet such as this one, which is a modified identity transform. It copies all of the content by default, and has an empty template for b that does nothing (effectively deleting it from the output):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<!--Identity transform copies all items by default -->
<xsl:template match="@* | node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<!--Empty template to match on b elements and prevent it from being copied to output -->
<xsl:template match="b"/>

</xsl:stylesheet>

Create a bash script that executes the transform using Java and the Xalan commandline utility like this:

java org.apache.xalan.xslt.Process -IN foo.xml -XSL foo.xsl -OUT foo.out

The result is this:

<?xml version="1.0" encoding="UTF-16"?><a><c><cc>
      Something
    </cc></c><d>
    bla
  </d></a>

EDIT: if you would prefer to have the b commented out, to make it easier to put back, then use this stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <!--Identity transform copies all items by default -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!--Match on b element, wrap in a comment and construct text representing XML structure by applying templates in "comment" mode -->
    <xsl:template match="b">
        <xsl:comment>
            <xsl:apply-templates select="self::*" mode="comment" />
        </xsl:comment>
    </xsl:template>

    <xsl:template match="*" mode="comment">
        <xsl:value-of select="'&lt;'"/>
            <xsl:value-of select="name()"/>
            <!-- Attributes belong inside the start tag, before the closing '>' -->
            <xsl:apply-templates select="@*" mode="comment" />
        <xsl:value-of select="'&gt;'"/>
            <xsl:apply-templates select="node()" mode="comment" />
        <xsl:value-of select="'&lt;/'"/>
            <xsl:value-of select="name()"/>
        <xsl:value-of select="'&gt;'"/>
    </xsl:template>

    <xsl:template match="text()" mode="comment">
        <xsl:value-of select="."/>
    </xsl:template>

    <!-- Emit each attribute as: space, name="value" -->
    <xsl:template match="@*" mode="comment">
        <xsl:text> </xsl:text>
        <xsl:value-of select="name()"/>
        <xsl:text>="</xsl:text>
        <xsl:value-of select="."/>
        <xsl:text>"</xsl:text>
    </xsl:template>

</xsl:stylesheet>

It produces this output:

<?xml version="1.0" encoding="UTF-16"?><a><!--<b><bb><yyy>
            Bla
        </yyy></bb></b>--><c><cc>
      Something
    </cc></c><d>
    bla
  </d></a>

Comments

6

If you want the most appropriate replacement for sed for XML data, it would be an XSLT processor. Like sed it's a complex language but specialized for the task of XML-to-anything transformations.

On the other hand, this does seem to be the point at which I would seriously consider switching to a real programming language, like Python.

1 Comment

That consideration adds to my perception of XML being overkill for configuration files ;)
4
# edit the file in place
xmlstarlet ed -L -d "//b" file.xml

1 Comment

I installed it with apt-get install xmlstarlet on Ubuntu 9.x, from the default repositories, and did not find the -L flag in the documentation. Is it available in Ubuntu 10.04?
3

@OP, you can use awk, e.g.:

$ cat file
<a>                              

some text before   <b>
    <bb>
        <yyy>
            Bla
        </yyy>
    </bb>
  </b> some text after

  <c>
    <cc>
      Something
    </cc>
  </c>

  <d>
    bla
  </d>
</a>

$ awk 'BEGIN{RS="</b>"}/<b>/{gsub(/<b>.*/,"")}1' file
<a>

some text before
 some text after

  <c>
    <cc>
      Something
    </cc>
  </c>

  <d>
    bla
  </d>
</a>
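
The one-liner above relies on a multi-character RS, which gawk and mawk support but POSIX leaves unspecified. As a rough alternative, here is a sketch that keeps the default line-based records and tracks whether it is inside a <b> block. The function name strip_b is hypothetical, and it assumes plain <b> and </b> tags with no attributes, no nesting, and no CDATA:

```shell
# Hypothetical helper: print FILE with everything from <b> to </b> removed,
# keeping any text before <b> and after </b> on the same line.
strip_b() {
  awk '
  {
    line = $0
    out = ""
    changed = inside            # true if this line starts inside a <b> block
    while (line != "") {
      if (!inside) {
        i = index(line, "<b>")
        if (i == 0) { out = out line; line = "" }
        else {
          out = out substr(line, 1, i - 1)   # keep the text before <b>
          line = substr(line, i + 3)
          inside = 1
          changed = 1
        }
      } else {
        j = index(line, "</b>")
        if (j == 0) { line = "" }            # whole remainder is inside the block
        else { line = substr(line, j + 4); inside = 0 }
      }
    }
    # Print untouched lines as-is; drop edited lines that are now whitespace only.
    if (!changed || out ~ /[^ \t]/) print out
  }' "$1"
}

# Small demonstration on a throwaway file (contents are illustrative only).
sample=$(mktemp)
printf 'before <b>x</b> after\n<b>\n  <bb>y</bb>\n</b>\n<c>z</c>\n' > "$sample"
result=$(strip_b "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Unlike the sed range, this also copes with a tag that opens and closes on a single line, although it is still text matching rather than real XML parsing.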

Comments

0
sed -i '/<b>/,/<\/b>/d' foo.xml

Will this work if the b tag also has an attribute defined, i.e. if, in the XML above, the b tag starts as <b id="Test Step">?
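
A quick experiment suggests the answer: the pattern /<b>/ needs the literal text <b>, so an opening tag with attributes is not matched. A minimal sketch of one possible adjustment, assuming the opening tag is followed by whitespace or > (the sample contents and variable names are made up for illustration):

```shell
# Throwaway sample with an attribute on the opening tag.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<a>
  <b id="Test Step">
    <bb>x</bb>
  </b>
  <c>y</c>
</a>
EOF
original=$(cat "$sample")

# The original pattern never matches the start of the range, so nothing is deleted.
unchanged=$(sed '/<b>/,/<\/b>/d' "$sample")

# Accept whitespace or ">" after "<b": matches <b> and <b id="..."> but not <bb>.
removed=$(sed '/<b[[:space:]>]/,/<\/b>/d' "$sample")
printf '%s\n' "$removed"

rm -f "$sample"
```

This is still line-based matching, so the caveats raised for the original sed command apply here as well.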

1 Comment

Is this a question or an answer?
