When doing shell scripting, typically data will be in files of single line records like csv. It's really simple to handle this data with grep and sed. But I have to deal with XML often, so I'd really like a way to script access to that XML data via the command line. What are the best tools?
xml_grep is fine for grepping, as stated in stackoverflow.com/a/2222224/871134 (Deleplace, Jul 28, 2015 at 16:18)
14 Answers
I've found xmlstarlet to be pretty good at this sort of thing.
http://xmlstar.sourceforge.net/
Should be available in most distro repositories, too. An introductory tutorial is here:
Some promising tools:
nokogiri: parsing HTML/XML DOMs in ruby using XPath & CSS selectors
hpricot: deprecated
fxgrep: Uses its own XPath-like syntax to query documents. Written in SML, so installation may be difficult.
LT XML: XML toolkit derived from SGML tools, including sggrep, sgsort, xmlnorm and others. Uses its own query syntax. The documentation is very formal. Written in C. LT XML 2 claims support of XPath, XInclude and other W3C standards.
xmlgrep2: simple and powerful searching with XPath. Written in Perl using XML::LibXML and libxml2.
XQSharp: Supports XQuery, the extension to XPath. Written for the .NET Framework.
xml-coreutils: Laird Breyer's toolkit equivalent to GNU coreutils. Discussed in an interesting essay on what the ideal toolkit should include.
xmldiff: Simple tool for comparing two xml files.
xmltk: doesn't seem to have a package in Debian, Ubuntu, Fedora, or MacPorts, hasn't had a release since 2007, and uses non-portable build automation.
xml-coreutils seems the best documented and most UNIX-oriented.
Update 2025-03-09: I've been referring to https://github.com/dbohdan/structured-text-tools for CSV and other field-separated data (I like miller, sc-im & visidata). But notably it has many XML tools in https://github.com/dbohdan/structured-text-tools?tab=readme-ov-file#xml which have not been mentioned here.
There is also the xml2 and 2xml pair. It lets the usual string-editing tools process XML.
Example. q.xml:
<?xml version="1.0"?>
<foo>
text
more text
<textnode>ddd</textnode><textnode a="bv">dsss</textnode>
<![CDATA[ asfdasdsa <foo> sdfsdfdsf <bar> ]]>
</foo>
xml2 < q.xml
/foo=
/foo= text
/foo= more text
/foo=
/foo/textnode=ddd
/foo/textnode
/foo/textnode/@a=bv
/foo/textnode=dsss
/foo=
/foo= asfdasdsa <foo> sdfsdfdsf <bar>
/foo=
xml2 < q.xml | grep textnode | sed 's!/foo!/bar/baz!' | 2xml
<bar><baz><textnode>ddd</textnode><textnode a="bv">dsss</textnode></baz></bar>
P.S. There are also html2 / 2html.
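To make the idea behind xml2 concrete, here is a minimal Python sketch that flattens a document into path=value lines. It is only loosely modelled on the xml2 output shown above, not a reimplementation: the function name flatten is hypothetical, and it ignores text that follows a child element, CDATA handling, and the bare lines xml2 prints between sibling elements.

```python
import xml.etree.ElementTree as ET


def flatten(xml_text):
    """Return 'path=value' lines for a document, loosely modelled on xml2."""
    lines = []

    def walk(elem, prefix):
        path = prefix + "/" + elem.tag
        # Attributes become /path/@name=value lines.
        for name, value in elem.attrib.items():
            lines.append(path + "/@" + name + "=" + value)
        # Only the text directly inside the element is kept (assumption:
        # text after child elements is ignored in this sketch).
        text = (elem.text or "").strip()
        if text:
            lines.append(path + "=" + text)
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return lines


doc = '<foo><textnode>ddd</textnode><textnode a="bv">dsss</textnode></foo>'
lines = flatten(doc)
print("\n".join(lines))

# Line-oriented filtering, in the spirit of piping through grep.
attribute_lines = [line for line in lines if "/@" in line]
print(attribute_lines)
```

Once the data is one record per line, ordinary string tools can filter or rewrite it, which is the whole appeal of the xml2 / 2xml pair.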
4 Comments
2xml can easily recreate XML from partial (filtered) xml2 output.
cat foo.xml | xml2 | grep /bar | 2xml gives you the same structure as the original, but all elements have been stripped except "bar" elements. Awesome.
To Joseph Holsten's excellent list, I add the xpath command-line script which comes with the Perl library XML::XPath. A great way to extract information from XML files:
xpath -q -e '/entry[@xml:lang="fr"]' *xml
1 Comment
-q -e options. Example: get the "package" attribute value from the "manifest" node in "AndroidManifest.xml": xpath AndroidManifest.xml 'string(/manifest/@package)' 2> /dev/null
You can use xmllint:
xmllint --xpath //title books.xml
Should be bundled with most distros, and is also bundled with Cygwin.
$ xmllint --version
xmllint: using libxml version 20900
See:
$ xmllint
Usage : xmllint [options] XMLfiles ...
Parse the XML files and output the result of the parsing
--version : display the version of the XML library used
--debug : dump a debug tree of the in-memory document
...
--schematron schema : do validation against a schematron
--sax1: use the old SAX1 interfaces for processing
--sax: do not build a tree but work just at the SAX level
--oldxml10: use XML-1.0 parsing rules before the 5th edition
--xpath expr: evaluate the XPath expression, inply --noout
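On a system where xmllint has no --xpath option, a similar query can be approximated with Python's standard library. This is a rough sketch, not a replacement for xmllint: ElementTree implements only a subset of XPath, and the BOOKS document is a hypothetical stand-in for books.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the contents of books.xml.
BOOKS = """<catalog>
  <book><title>First</title></book>
  <book><title>Second</title></book>
</catalog>"""

root = ET.fromstring(BOOKS)
# ".//title" is ElementTree's spelling of the XPath "//title".
titles = [title.text for title in root.findall(".//title")]
print(titles)
```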
4 Comments
--xpath is a fairly recent addition and e.g. not in RHEL 6 versions of xmllint.
xmllint --xpath was introduced in libxml2 2.7.7 (in 2010).
If you're looking for a solution on Windows, Powershell has built-in functionality for reading and writing XML.
test.xml:
<root>
<one>I like applesauce</one>
<two>You sure bet I do!</two>
</root>
Powershell script:
# load XML file into local variable and cast as XML type.
$doc = [xml](Get-Content ./test.xml)
$doc.root.one #echoes "I like applesauce"
$doc.root.one = "Who doesn't like applesauce?" #replace inner text of <one> node
# create new node...
$newNode = $doc.CreateElement("three")
$newNode.set_InnerText("And don't you forget it!")
# ...and position it in the hierarchy
$doc.root.AppendChild($newNode)
# write results to disk
$doc.save("./testNew.xml")
testNew.xml:
<root>
<one>Who doesn't like applesauce?</one>
<two>You sure bet I do!</two>
<three>And don't you forget it!</three>
</root>
Source: https://serverfault.com/questions/26976/update-xml-from-the-command-line-windows
2 Comments
xps $doc .root.one xps $doc 'AppendChild("three")' and xps $doc '.three.set_InnerText("And don't you forget it!")', which is clearly inferior!
There are also xmlsed & xmlgrep of the NetBSD xmltools!
Depends on exactly what you want to do.
XSLT may be the way to go, but there is a learning curve. Try xsltproc and note that you can hand in parameters.
D. Bohdan maintains an open-source GitHub repo that keeps a list of command-line tools for structured text; there is a section for XML/HTML tools:
There's also saxon-lint from command line with the ability to use XPath 3.0/XQuery 3.0. (Other command-line tools use XPath 1.0).
EXAMPLES :
http/html:
$ saxon-lint --html --xpath 'count(//a)' http://stackoverflow.com/q/91791
328
xml :
$ saxon-lint --xpath '//a[@class="x"]' file.xml
XQuery might be a good solution. It is (relatively) easy to learn and is a W3C standard.
I would recommend XQSharp for a command line processor.
I first used xmlstarlet and am still using it. When the query gets tough and I need XPath 2 and XQuery support, I turn to xidel: http://www.videlibri.de/xidel.html
Grep Equivalent
You can define a bash function, say "xp" ("xpath") that wraps some python3 code. To use it you need to install python3 and python-lxml. Benefits:
- regex matching which you lack in e.g. xmllint.
- Use as a filter (in a pipe) on the commandline
It's easy and powerful to use like this:
xmldoc=$(cat <<EOF
<?xml version="1.0" encoding="utf-8"?>
<job xmlns="http://www.sample.com/">programming</job>
EOF
)
selection='//*[namespace-uri()="http://www.sample.com/" and local-name()="job" and re:test(.,"^pro.*ing$")]/text()'
echo "$xmldoc" | xp "$selection"
# prints programming
xp() looks something like this:
xp()
{
local selection="$1";
local xmldoc;
if ! [[ -t 0 ]]; then
read -rd '' xmldoc;
else
xmldoc="$2";
fi;
python3 <(printf '%b' "from lxml.html import tostring\nfrom lxml import etree\nfrom sys import stdin\nregexpNS = \"http://exslt.org/regular-expressions\"\ntree = etree.parse(stdin)\nfor e in tree.xpath('""$selection""', namespaces={'re':regexpNS}):\n if isinstance(e, str):\n print(e)\n else:\n print(tostring(e).decode('UTF-8'))") <<< "$xmldoc"
}
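If installing lxml is not an option, roughly the same grep-like behaviour can be sketched with the standard library alone. Note the swap: this version uses the re module and ElementTree instead of lxml's EXSLT re:test, so it takes a namespace, a local name and a regex rather than a full XPath expression. The function name grep_xml is hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def grep_xml(xml_bytes, namespace, local_name, pattern):
    """Return the text of every matching element whose text matches the regex."""
    regex = re.compile(pattern)
    # ElementTree writes a namespaced tag as "{uri}local".
    wanted_tag = "{" + namespace + "}" + local_name
    root = ET.fromstring(xml_bytes)
    return [
        elem.text
        for elem in root.iter(wanted_tag)
        if elem.text is not None and regex.search(elem.text)
    ]


doc = b"""<?xml version="1.0" encoding="utf-8"?>
<job xmlns="http://www.sample.com/">programming</job>"""

matches = grep_xml(doc, "http://www.sample.com/", "job", r"^pro.*ing$")
print(matches)
```

The lxml-based xp() above remains the more flexible choice, since it accepts arbitrary XPath 1.0 expressions.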
Sed Equivalent
Consider using xq, which gives you the full power of the jq "programming language". If you have python-pip installed, you can install xq with pip install yq. The example below replaces "Keep Accounts" with "Keep Accounts 2":
xmldoc=$(cat <<'EOF'
<resources>
<string name="app_name">Keep Accounts</string>
<string name="login">"login"</string>
<string name="login_password">"password:"</string>
<string name="login_account_hint">input to login</string>
<string name="login_password_hint">input your password</string>
<string name="login_fail">login failed</string>
</resources>
EOF
)
echo "$xmldoc" | xq '.resources.string = ([.resources.string[]|select(."#text" == "Keep Accounts") ."#text" = "Keep Accounts 2"])' -x
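For comparison, the same kind of replacement can be sketched in plain Python without xq. This is an illustrative stand-in using only the standard library, not what xq does internally; the function name replace_text is hypothetical and the sample document is a shortened copy of the one above.

```python
import xml.etree.ElementTree as ET


def replace_text(xml_text, tag, old, new):
    """Replace the text of every <tag> element whose text equals `old`."""
    root = ET.fromstring(xml_text)
    for elem in root.iter(tag):
        if elem.text == old:
            elem.text = new
    # Serialise back to a string; the XML declaration is omitted.
    return ET.tostring(root, encoding="unicode")


doc = """<resources>
<string name="app_name">Keep Accounts</string>
<string name="login">"login"</string>
</resources>"""

updated = replace_text(doc, "string", "Keep Accounts", "Keep Accounts 2")
print(updated)
```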
JEdit has a plugin called "XQuery" which provides querying functionality for XML documents.
Not quite the command line, but it works!
1 Comment
grep(1).