XPath in PowerShell

Question

I am converting some Ruby scripts to PowerShell:

> gem install nokogiri

> irb

> require 'nokogiri'

> $html = Nokogiri::HTML("<div><img src='//127.0.0.1:5598/user/first.png' />
                       <img src='//127.0.0.1:5598/user/second.png' /></div>")

> $html.xpath('//img[contains(@src,"first")]')

# Output: <img src='//127.0.0.1:5598/user/first.png' />

In PowerShell, I have:

> [System.Reflection.Assembly]::LoadWithPartialName("System.Xml.Linq")

> [System.Reflection.Assembly]::LoadWithPartialName("System.Xml.XPath")

> $html = [System.Xml.Linq.XDocument]::Parse("<div>
                       <img src='//127.0.0.1:5598/user/first.png' />
                       <img src='//127.0.0.1:5598/user/second.png' /></div>")

> [System.Xml.XPath.Extensions]::XPathSelectElement($html, 
                                  '//img[contains(@src,"first")]')

# It displays the properties of XElement type object

How can I get the same output?

Is there a better way to parse HTML in PowerShell v4?

mousio · Accepted Answer · 2013-07-11 21:48:10Z

2

Just call .ToString() on the XElement that XPathSelectElement returns and you will get the same output.
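As a rough sketch of that suggestion applied to the question's own snippet: this assumes Windows PowerShell on the .NET Framework, where System.Xml.Linq.dll supplies both XDocument and the XPath extension methods. The variable names $doc, $img and $markup are illustrative only. XElement writes attribute values with double quotes, so the quoting may differ from the input.

```powershell
# Assumption: Windows PowerShell / .NET Framework, where this one assembly
# contains XDocument and System.Xml.XPath.Extensions.
Add-Type -AssemblyName System.Xml.Linq

$doc = [System.Xml.Linq.XDocument]::Parse(
    "<div><img src='//127.0.0.1:5598/user/first.png' />" +
    "<img src='//127.0.0.1:5598/user/second.png' /></div>")

# XPathSelectElement returns an XElement; ToString() serializes it as markup
$img = [System.Xml.XPath.Extensions]::XPathSelectElement(
    $doc, '//img[contains(@src,"first")]')
$markup = $img.ToString()
$markup
```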

Here is a simpler alternative that produces the same output:

$html = [xml] "<div><img src='//127.0.0.1:5598/user/first.png' />
                    <img src='//127.0.0.1:5598/user/second.png' /></div>"
$html.SelectSingleNode('//img[contains(@src,"first")]').OuterXml

or even

($html.div.img | ?{ $_.src -match 'first' }).outerxml

Note that I am assuming you are dealing with XML as per your own PowerShell example (I am not used to handling HTML)…
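To illustrate why that assumption matters, here is a minimal sketch showing that the [xml] cast only accepts well-formed XML. The fragment with an unclosed img tag is a made-up example of typical HTML, and $fragment and $parsed are illustrative names.

```powershell
# Hypothetical HTML-style fragment: the <img> tag is never closed,
# which is legal HTML but not well-formed XML.
$fragment = "<div><img src='//127.0.0.1:5598/user/first.png'></div>"

try {
    $doc = [xml]$fragment   # the cast throws on malformed XML
    $parsed = $true
} catch {
    $parsed = $false        # conversion failed, so XPath cannot be used on it
}
$parsed
```

Markup that fails this cast needs a real HTML parser rather than the XML classes.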

edited Jul 11, 2013 at 21:48

answered Jul 11, 2013 at 12:05


CB. · Accepted Answer · 2013-07-11 12:27:06Z

2

Another way to query XML, using only a cmdlet:

$xml = [xml]@"
<div>
<img src='//127.0.0.1:5598/user/first.png' />
<img src='//127.0.0.1:5598/user/second.png' />
</div>
"@

(select-xml -xml $xml -xpath '//img[contains(@src,"first")]' ) | % { $_.node.src }
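The pipeline above prints only the src attribute value. To get the element's markup, as the question asked, a small variation would read the node's OuterXml property instead. This is a sketch rather than part of the original answer, and $markup is an illustrative name.

```powershell
$xml = [xml]@"
<div>
<img src='//127.0.0.1:5598/user/first.png' />
<img src='//127.0.0.1:5598/user/second.png' />
</div>
"@

# Select-Xml emits SelectXmlInfo objects; .Node is the matched XmlElement
$markup = Select-Xml -Xml $xml -XPath '//img[contains(@src,"first")]' |
    ForEach-Object { $_.Node.OuterXml }
$markup
```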

answered Jul 11, 2013 at 12:27


Loïc MICHEL · Accepted Answer · 2013-07-11 12:15:43Z

1

Another alternative, using the InternetExplorer COM object (or Invoke-WebRequest, available from PowerShell v3, for remote pages):

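$ie = New-Object -ComObject "InternetExplorer.Application"
$ie.Navigate("c:\temp\test.html")
# Navigate() returns immediately, so wait until the page has finished loading
while ($ie.Busy) { Start-Sleep -Milliseconds 100 }
$html = $ie.Document
$html.images | % { if ($_.src -match "first") { echo $_.outerHTML } }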

Note that if it is not a local file, you can use:

 $html = Invoke-WebRequest "http://yourURL"

and then parse $html.ParsedHtml.body.

answered Jul 11, 2013 at 12:15


mousio Over a year ago

Good thinking, nice approach! Don't forget to close IE, though :]
