Grep and Sed Equivalent for XML Command Line Processing

Question

In shell scripts, data typically lives in files of single-line records such as CSV, which are simple to handle with grep and sed. But I often have to deal with XML, so I'd like a way to script access to XML data from the command line. What are the best tools?

xml_grep is fine for grepping, as stated in stackoverflow.com/a/2222224/871134
– Deleplace, Commented Jul 28, 2015 at 16:18

Russ · Accepted Answer · 2008-09-18 12:14:07Z

108

I've found xmlstarlet to be pretty good at this sort of thing.

http://xmlstar.sourceforge.net/

Should be available in most distro repositories, too. An introductory tutorial is here:

http://www.ibm.com/developerworks/library/x-starlet.html

answered Sep 18, 2008 at 12:14

Russ

3 Comments

Steve Bennett Over a year ago

Thought I'd point out that there are Windows binaries available on the Sourceforge site.

Steve Bennett Over a year ago

Doesn't support XQuery though, as far as I can tell.

Charles Duffy Over a year ago

@SteveBennett indeed it doesn't, but the features it adds on top of raw XPath are good enough to make it competitive with "grep and sed". If you want the fancy, fancy goodness of XQuery... well, that's more like an XML equivalent to perl or awk. :)

Joseph Holsten · Accepted Answer · 2025-03-09 19:30:33Z

40

Some promising tools:

nokogiri: parsing HTML/XML DOMs in ruby using XPath & CSS selectors
hpricot: deprecated
fxgrep: Uses its own XPath-like syntax to query documents. Written in SML, so installation may be difficult.
LT XML: XML toolkit derived from SGML tools, including sggrep, sgsort, xmlnorm and others. Uses its own query syntax. The documentation is very formal. Written in C. LT XML 2 claims support of XPath, XInclude and other W3C standards.
xmlgrep2: simple and powerful searching with XPath. Written in Perl using XML::LibXML and libxml2.
XQSharp: Supports XQuery, the extension to XPath. Written for the .NET Framework.
xml-coreutils: Laird Breyer's toolkit equivalent to GNU coreutils. Discussed in an interesting essay on what the ideal toolkit should include.
xmldiff: Simple tool for comparing two xml files.
xmltk: doesn't seem to have a package in Debian, Ubuntu, Fedora, or MacPorts; hasn't had a release since 2007; and uses non-portable build automation.

xml-coreutils seems the best documented and most UNIX-oriented.

Update 2025-03-09: I've been referring to https://github.com/dbohdan/structured-text-tools for CSV and other field-separated data (I like miller, sc-im & visidata). But notably it has many XML tools in https://github.com/dbohdan/structured-text-tools?tab=readme-ov-file#xml which have not been mentioned here.

edited Mar 9 at 19:30

answered Sep 18, 2008 at 11:39

Joseph Holsten

2 Comments

alastairs Over a year ago

Couldn't you create a wrapper script for the Ruby program, and pass in the arguments' array in the script to hpricot? E.g., in a PHP shell script, something like the following should work: <?php /path/to/hpricot $argv ?>

daparic Over a year ago

xmldiff is cool

Vi. · Accepted Answer · 2010-06-29 09:31:10Z

26

There is also the xml2 / 2xml pair, which lets the usual string-editing tools process XML.

Example. q.xml:

<?xml version="1.0"?>
<foo>
    text
    more text
    <textnode>ddd</textnode><textnode a="bv">dsss</textnode>
    <![CDATA[ asfdasdsa <foo> sdfsdfdsf <bar> ]]>
</foo>

xml2 < q.xml

/foo=
/foo=   text
/foo=   more text
/foo=   
/foo/textnode=ddd
/foo/textnode
/foo/textnode/@a=bv
/foo/textnode=dsss
/foo=
/foo=    asfdasdsa <foo> sdfsdfdsf <bar> 
/foo=

xml2 < q.xml | grep textnode | sed 's!/foo!/bar/baz!' | 2xml

<bar><baz><textnode>ddd</textnode><textnode a="bv">dsss</textnode></baz></bar>

P.S. There are also html2 / 2html.
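
To make the idea of a line-oriented representation concrete, here is a rough Python sketch of the kind of flattening shown above. It is not xml2's actual implementation: the function name flatten is made up for illustration, it uses only the standard library, and it skips details that xml2 handles (whitespace-only text, text that follows child elements, the bare separator lines).

```python
import xml.etree.ElementTree as ET

def flatten(elem, prefix=""):
    """Yield 'path=value' lines for elem and its descendants (hypothetical helper)."""
    path = f"{prefix}/{elem.tag}"
    # Attributes become /path/@name=value lines, as in the xml2 output above.
    for name, value in elem.attrib.items():
        yield f"{path}/@{name}={value}"
    # Only the text directly inside the element is kept; tail text is ignored.
    text = (elem.text or "").strip()
    if text:
        yield f"{path}={text}"
    for child in elem:
        yield from flatten(child, path)

doc = '<foo><textnode>ddd</textnode><textnode a="bv">dsss</textnode></foo>'
lines = list(flatten(ET.fromstring(doc)))
print("\n".join(lines))
```

Because every line carries the full path, plain grep and sed can filter or rename elements without tracking nesting themselves.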

edited Jun 29, 2010 at 9:31

answered Jun 22, 2010 at 22:01

Vi.

4 Comments

Vi. Over a year ago

@Joseph Holsten Yes. It allows hacking with XML without thinking through XPath things.

Joseph Holsten Over a year ago

Nice! I had been focusing on tools that don't use an intermediate format, but the idea of a high-fidelity, line-oriented representation of xml seems like a great way to keep using real grep and sed. Have you tried pyxie? How does it compare? Any other line oriented representations? Would you consider this better than just replacing xml newlines with an entity (&#10;)? This would let you stick records on the same line at least. Oh, and could you edit your post to include a link to the project?

Vi. Over a year ago

@Joseph Holsten No, I don't think the pyxie format would be more useful than the xml2 format. xml2 provides the "full path" of nested XML elements, so it allows more line-oriented matching and substitution. Also, 2xml can easily recreate XML from partial (filtered) xml2 output.

mogsie Over a year ago

+1 I can't upvote this enough... cat foo.xml | xml2 | grep /bar | 2xml — gives you the same structure as the original, but all elements have been stripped except "bar" elements. Awesome.

bortzmeyer · Accepted Answer · 2009-03-04 08:12:52Z

25

To Joseph Holsten's excellent list, I would add the xpath command-line script that ships with the Perl library XML::XPath. It is a great way to extract information from XML files:

 xpath -q -e '/entry[@xml:lang="fr"]' *xml
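
If the Perl script is not available, a similar check can be sketched with Python's standard library. This is only an approximation of the XPath above, not a replacement for the xpath tool: the helper name is_entry_in and the two sample documents are invented for illustration, and the test is written as plain Python rather than as an XPath expression.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def is_entry_in(xml_text, lang):
    """True if the root element is <entry> with the given xml:lang (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return root.tag == "entry" and root.get(XML_LANG) == lang

# Stand-ins for the files matched by *xml; names and contents are made up.
docs = {
    "a.xml": '<entry xml:lang="fr"><title>Bonjour</title></entry>',
    "b.xml": '<entry xml:lang="en"><title>Hello</title></entry>',
}
hits = [name for name, text in docs.items() if is_entry_in(text, "fr")]
print(hits)  # ['a.xml']
```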

answered Mar 4, 2009 at 8:12

bortzmeyer

1 Comment

antonj Over a year ago

This is installed by default on OS X, but without the -q and -e options. For example, to get the value of the "package" attribute from the "manifest" node in "AndroidManifest.xml": xpath AndroidManifest.xml 'string(/manifest/@package)' 2> /dev/null

Dave Jarvis · Accepted Answer · 2013-04-18 17:52:14Z

18

You can use xmllint:

xmllint --xpath //title books.xml

Should be bundled with most distros, and is also bundled with Cygwin.

$ xmllint --version
xmllint: using libxml version 20900

See:

$ xmllint
Usage : xmllint [options] XMLfiles ...
        Parse the XML files and output the result of the parsing
        --version : display the version of the XML library used
        --debug : dump a debug tree of the in-memory document
        ...
        --schematron schema : do validation against a schematron
        --sax1: use the old SAX1 interfaces for processing
        --sax: do not build a tree but work just at the SAX level
        --oldxml10: use XML-1.0 parsing rules before the 5th edition
        --xpath expr: evaluate the XPath expression, inply --noout
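
On systems where xmllint predates --xpath, the //title query from the start of this answer can be approximated with Python's standard library. This is a sketch under assumptions: the sample catalog stands in for books.xml (whose contents are not shown here), and grep_elements is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for books.xml.
BOOKS = """<catalog>
  <book><title>First</title></book>
  <book><title>Second</title></book>
</catalog>"""

def grep_elements(xml_text, tag):
    """Serialize every element named `tag` at any depth, roughly like //tag."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, including the root element itself.
    return [ET.tostring(e, encoding="unicode").strip() for e in root.iter(tag)]

titles = grep_elements(BOOKS, "title")
print("\n".join(titles))
```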

edited Apr 18, 2013 at 17:52

answered Jan 24, 2013 at 0:41

Dave Jarvis

4 Comments

Miserable Variable Over a year ago

There is no --xpath argument to xmllint: manpagez.com/man/1/xmllint

Dave Jarvis Over a year ago

@MiserableVariable: The man page is incorrect. I just looked at the man page for my version: the xpath argument is not listed. This is a documentation error. Try running the program, instead.

Daniel Beck Over a year ago

@MiserableVariable --xpath is a fairly recent addition and e.g. not in RHEL 6 versions of xmllint.

marbu Over a year ago

To be more precise, xmllint --xpath was introduced in libxml2 2.7.7 (in 2010).

Community · Accepted Answer · 2017-04-13 12:13:47Z

9

If you're looking for a solution on Windows, PowerShell has built-in functionality for reading and writing XML.

test.xml:

<root>
  <one>I like applesauce</one>
  <two>You sure bet I do!</two>
</root>

PowerShell script:

# load XML file into local variable and cast as XML type.
$doc = [xml](Get-Content ./test.xml)

$doc.root.one                                   #echoes "I like applesauce"
$doc.root.one = "Who doesn't like applesauce?"  #replace inner text of <one> node

# create new node...
$newNode = $doc.CreateElement("three")
$newNode.set_InnerText("And don't you forget it!")

# ...and position it in the hierarchy
$doc.root.AppendChild($newNode)

# write results to disk
$doc.save("./testNew.xml")

testNew.xml:

<root>
  <one>Who doesn't like applesauce?</one>
  <two>You sure bet I do!</two>
  <three>And don't you forget it!</three>
</root>

Source: https://serverfault.com/questions/26976/update-xml-from-the-command-line-windows
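
For readers not on Windows, the same load / modify / append steps can be sketched with Python's standard library. This mirrors the script above rather than reproducing it: the document is parsed from a string instead of test.xml, and the result is kept in a variable instead of being written to disk.

```python
import xml.etree.ElementTree as ET

# Same content as test.xml above, inlined so the sketch is self-contained.
root = ET.fromstring(
    "<root><one>I like applesauce</one><two>You sure bet I do!</two></root>"
)

# Replace the inner text of <one>.
root.find("one").text = "Who doesn't like applesauce?"

# Create a new node and append it under the root element.
three = ET.SubElement(root, "three")
three.text = "And don't you forget it!"

# Serialize; ET.ElementTree(root).write(path) would save to a file instead.
result = ET.tostring(root, encoding="unicode")
print(result)
```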

edited Apr 13, 2017 at 12:13

CommunityBot

answered Jul 29, 2013 at 21:29

Clay

2 Comments

Richard Hauer Over a year ago

battled with various linux tools for a few hours before resorting to Powershell. I'm surprised this is so hard - linux cmd-line is normally really good but there seems to be a hole here. Note: Use case for me was: 1) locate nodes by xpath, 2) remove if found, 3) add new nodes, 4) save file. I was updating a bunch of solr configs. If anyone knows of an easy/reliable way to do this I'm all ears

Joseph Holsten Over a year ago

Wow, this really tiptoes up to the line of an acceptable solution. But honestly, I'd probably accept it if it looked like xps $doc .root.one xps $doc 'AppendChild("three")' and xps $doc '.three.set_InnerText("And don't you forget it!")', which is clearly inferior!

taggo · Accepted Answer · 2011-05-30 12:20:37Z

8

There are also xmlsed and xmlgrep from the NetBSD xmltools!

http://blog.huoc.org/xmltools-not-dead.html

answered May 30, 2011 at 12:20

taggo

Adrian Mouat · Accepted Answer · 2008-09-18 20:41:16Z

6

Depends on exactly what you want to do.

XSLT may be the way to go, but there is a learning curve. Try xsltproc and note that you can hand in parameters.

answered Sep 18, 2008 at 20:41

Adrian Mouat

Devy · Accepted Answer · 2019-09-21 04:38:17Z

6

D. Bohdan maintains an open-source GitHub repo that lists command-line tools for structured text; it has a section for XML/HTML tools:

https://github.com/dbohdan/structured-text-tools#xml-html

answered Sep 21, 2019 at 4:38

Devy

Gilles Quénot · Accepted Answer · 2015-01-13 03:32:08Z

4

There's also saxon-lint, a command-line tool that supports XPath 3.0 and XQuery 3.0. (Most other command-line tools are limited to XPath 1.0.)

Examples:

http/html:

$ saxon-lint --html --xpath 'count(//a)' http://stackoverflow.com/q/91791
328

xml :

$ saxon-lint --xpath '//a[@class="x"]' file.xml

edited Jan 13, 2015 at 3:32

answered Jan 12, 2015 at 14:48

Gilles Quénot

Oliver Hallam · Accepted Answer · 2009-03-03 20:59:32Z

3

XQuery might be a good solution. It is (relatively) easy to learn and is a W3C standard.

I would recommend XQSharp for a command line processor.

edited Mar 3, 2009 at 20:59

answered Oct 30, 2008 at 23:12

Oliver Hallam

1 Comment

Charles Duffy Over a year ago

BaseX also has a command-line XQuery processor (in addition to its database mode), and stays up-to-date with bleeding-edge versions of the standard (following the evolving draft of XQuery 3.0 quite closely).

daparic · Accepted Answer · 2017-03-16 03:21:13Z

3

I started with xmlstarlet and still use it. When a query gets tough and I need XPath 2 and XQuery support, I turn to xidel: http://www.videlibri.de/xidel.html

answered Mar 16, 2017 at 3:21

daparic

methuselah-0 · Accepted Answer · 2020-03-21 18:16:48Z

Grep Equivalent

You can define a bash function, say "xp" ("xpath") that wraps some python3 code. To use it you need to install python3 and python-lxml. Benefits:

Regex matching, which xmllint, for example, lacks.
Usable as a filter (in a pipe) on the command line.

It's easy and powerful to use like this:

xmldoc=$(cat <<EOF
<?xml version="1.0" encoding="utf-8"?>
<job xmlns="http://www.sample.com/">programming</job>
EOF
)
selection='//*[namespace-uri()="http://www.sample.com/" and local-name()="job" and re:test(.,"^pro.*ing$")]/text()'
echo "$xmldoc" | xp "$selection"
# prints programming

xp() looks something like this:

xp()
{ 
local selection="$1";
local xmldoc;
if ! [[ -t 0 ]]; then
    read -rd '' xmldoc;
else
    xmldoc="$2";
fi;
python3 <(printf '%b' "from lxml.html import tostring\nfrom lxml import etree\nfrom sys import stdin\nregexpNS = \"http://exslt.org/regular-expressions\"\ntree = etree.parse(stdin)\nfor e in tree.xpath('""$selection""', namespaces={'re':regexpNS}):\n  if isinstance(e, str):\n    print(e)\n  else:\n    print(tostring(e).decode('UTF-8'))") <<< "$xmldoc"
}
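
If python-lxml is not installed, a reduced version of the same idea can be sketched with the standard library alone. This is a simplification, not a drop-in for xp(): it matches on the element's local name and a regular expression over its text, ignores the namespace URI, and does not accept arbitrary XPath. The name grep_text is made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

def grep_text(xml_text, local_name, pattern):
    """Return the text of elements whose local name and text both match."""
    regex = re.compile(pattern)
    hits = []
    for elem in ET.fromstring(xml_text).iter():
        # Namespaced tags look like "{uri}name"; keep only the part after "}".
        name = elem.tag.rsplit("}", 1)[-1]
        text = elem.text or ""
        if name == local_name and regex.search(text):
            hits.append(text)
    return hits

doc = '<job xmlns="http://www.sample.com/">programming</job>'
hits = grep_text(doc, "job", r"^pro.*ing$")
print(hits)  # ['programming']
```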

Sed Equivalent

Consider using xq, which gives you the full power of the jq "programming language". If you have python-pip installed, you can install xq with pip install yq. The example below replaces "Keep Accounts" with "Keep Accounts 2":

xmldoc=$(cat <<'EOF'
<resources>
    <string name="app_name">Keep Accounts</string>
    <string name="login">"login"</string>
    <string name="login_password">"password："</string>
    <string name="login_account_hint">input to login</string>
    <string name="login_password_hint">input your password</string>
    <string name="login_fail">login failed</string>
</resources>
EOF
)
echo "$xmldoc" | xq '.resources.string = ([.resources.string[]|select(."#text" == "Keep Accounts") ."#text" = "Keep Accounts 2"])' -x
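
For comparison, the same replacement can be sketched without xq, using only Python's standard library. The helper name replace_text is hypothetical, the sample document is a shortened version of the one above, and the match is on exact element text rather than a jq expression.

```python
import xml.etree.ElementTree as ET

def replace_text(xml_text, old, new):
    """Replace the text of every element whose text is exactly `old`."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.text == old:
            elem.text = new
    return ET.tostring(root, encoding="unicode")

doc = (
    '<resources><string name="app_name">Keep Accounts</string>'
    '<string name="login_fail">login failed</string></resources>'
)
result = replace_text(doc, "Keep Accounts", "Keep Accounts 2")
print(result)
```

Unlike a text-level sed substitution, this only touches element text, so attribute values or tag names containing the same string are left alone.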

Ben · Accepted Answer · 2008-09-18 11:47:15Z

-1

JEdit has a plugin called "XQuery" which provides querying functionality for XML documents.

Not quite the command line, but it works!

answered Sep 18, 2008 at 11:47

Ben

1 Comment

Joseph Holsten Over a year ago

While JEdit likely has a way to search through a file, that does not make it a competitor to grep(1).
