
I need a little help understanding XML in PowerShell. I have several XML files like this:

<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="http://www.example.com/xml/catalog/2006-10-31">
    <product product-id="11210">
        ...
        <available-flag>true</available-flag>
        <online-flag>false</online-flag>
        <online-flag site-id="ru">true</online-flag>
        <online-flag site-id="fr">true</online-flag>
        <online-flag site-id="uk">false</online-flag>
        <online-flag site-id="de">true</online-flag>
        ...
    </product>
    <product product-id="50610">
        ...
        <available-flag>true</available-flag>
        <online-flag>true</online-flag>
        <online-flag site-id="ru">false</online-flag>
        <online-flag site-id="fr">true</online-flag>
        <online-flag site-id="uk">false</online-flag>
        <online-flag site-id="de">false</online-flag>
        ...
    </product>
    <product product-id="82929">
        ...
        <available-flag>true</available-flag>
        <online-flag>true</online-flag>
        <online-flag site-id="ru">false</online-flag>
        <online-flag site-id="fr">true</online-flag>
        <online-flag site-id="uk">false</online-flag>
        <online-flag site-id="de">true</online-flag>
        ...
    </product>
</catalog>

I need to get the values of two elements in PowerShell:

  • <online-flag> (without site-id attribute)
  • <online-flag site-id="ru">

for the product with product-id="50610".

I have the following code:

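$Path = "C:\Temp\0\2017-08-12_190211.xml"
$XPath = "/ns:catalog/ns:product[@product-id='50610']"

# -File (PowerShell 3.0+) returns files only, so no PSIsContainer filter is needed
$files = Get-ChildItem -Path $Path -File

# Keep $null on the left: with an array on the left, -eq filters instead of comparing
if ($null -eq $files) {
    return
}

foreach ($file in $files) {
    # -Raw reads the whole file as one string instead of an array of lines
    [xml]$xml = Get-Content -LiteralPath $file.FullName -Raw
    # Bind the prefix "ns" to the default namespace declared on <catalog>
    $namespace = $xml.DocumentElement.NamespaceURI
    $ns = New-Object System.Xml.XmlNamespaceManager($xml.NameTable)
    $ns.AddNamespace("ns", $namespace)
    $product = $xml.SelectSingleNode($XPath, $ns)
}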

Several questions:

  1. With this code I am able to select the needed product node. PowerShell shows:

    online-flag        : {true, online-flag, online-flag, online-flag...}
    

    But how can I then select the values of the online-flag elements I need (ideally both ways: with XPath and with the object notation)?

  2. Is it possible to select a node in the "object" way? Like this:

    $product = $xml.catalog.product |
               Where-Object {$_."product-id".value -eq "50610"}
    
  3. If I have several files, what is the best way to get, for each file, the filename, the global online-flag (without attributes), and a specific online-flag?

3 Answers

4

Use two different XPath expressions:

  1. for selecting a node without a particular attribute:

    //ns:product[@product-id='50610']/ns:online-flag[not(@site-id)]
    
  2. for selecting a node with a particular attribute value:

    //ns:product[@product-id='50610']/ns:online-flag[@site-id='ru']
    
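As a sketch, the same two expressions can also be run with the `Select-Xml` cmdlet, which takes the prefix-to-namespace mapping as a hashtable instead of an `XmlNamespaceManager`. The XML below is a trimmed copy of product 50610 from the question, and `$nsMap`, `$globalXPath` and `$ruXPath` are illustrative names, not anything the original code defines.

```powershell
# Trimmed copy of product 50610 from the question's sample
[xml]$xml = @'
<catalog xmlns="http://www.example.com/xml/catalog/2006-10-31">
    <product product-id="50610">
        <available-flag>true</available-flag>
        <online-flag>true</online-flag>
        <online-flag site-id="ru">false</online-flag>
        <online-flag site-id="fr">true</online-flag>
    </product>
</catalog>
'@

# Bind the prefix "ns" to the document's default namespace
$nsMap = @{ ns = $xml.DocumentElement.NamespaceURI }

$globalXPath = "//ns:product[@product-id='50610']/ns:online-flag[not(@site-id)]"
$ruXPath     = "//ns:product[@product-id='50610']/ns:online-flag[@site-id='ru']"

# Select-Xml wraps each match in a SelectXmlInfo object; .Node is the matched XmlElement
$globalFlag = (Select-Xml -Xml $xml -XPath $globalXPath -Namespace $nsMap).Node.InnerText
$ruFlag     = (Select-Xml -Xml $xml -XPath $ruXPath -Namespace $nsMap).Node.InnerText

"global=$globalFlag ru=$ruFlag"
```

`Select-Xml` also accepts `-Path`, so for files on disk it can read them directly; the `XmlNamespaceManager` route in the answer works just as well.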

You can select nodes relative to an already selected node by making the XPath expression relative to the current node (.):

$XPath = "/ns:catalog/ns:product[@product-id='50610']"
...
$product = $xml.SelectSingleNode($XPath, $ns)
$product.SelectSingleNode("./ns:online-flag[not(@site-id)]", $ns)
$product.SelectSingleNode("./ns:online-flag[@site-id='ru']", $ns)

If you need result data consisting of the filename and the two node values, I'd recommend building custom objects:

$files | ForEach-Object {
    [xml]$xml = Get-Content -LiteralPath $_.FullName -Raw
    ...
    # [PSCustomObject] (PowerShell 3.0+) keeps the properties in the order written
    [PSCustomObject]@{
        'Filename'  = $_.Name
        'online'    = $product.SelectSingleNode("./ns:online-flag[not(@site-id)]", $ns).'#text'
        'ru_online' = $product.SelectSingleNode("./ns:online-flag[@site-id='ru']", $ns).'#text'
    }
}
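A fuller sketch of the custom-object approach, assuming PowerShell 3.0 or later and that the XML files sit together in one folder. `Get-ProductOnlineFlag` and its parameters are hypothetical names chosen for this example; files that do not contain the requested product are simply skipped.

```powershell
function Get-ProductOnlineFlag {
    # Hypothetical helper: emits one object per XML file in $Path that contains the product
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $ProductId,
        [string] $SiteId = 'ru'
    )

    Get-ChildItem -Path $Path -Filter *.xml -File | ForEach-Object {
        # XmlDocument.Load reads the file itself and honours its encoding declaration
        $xml = New-Object System.Xml.XmlDocument
        $xml.Load($_.FullName)

        $ns = New-Object System.Xml.XmlNamespaceManager($xml.NameTable)
        $ns.AddNamespace('ns', $xml.DocumentElement.NamespaceURI)

        $product = $xml.SelectSingleNode("/ns:catalog/ns:product[@product-id='$ProductId']", $ns)
        if ($null -eq $product) { return }   # product not in this file: skip to the next file

        # Relative paths: evaluated from $product, not from the document root
        $globalNode = $product.SelectSingleNode('ns:online-flag[not(@site-id)]', $ns)
        $siteNode   = $product.SelectSingleNode("ns:online-flag[@site-id='$SiteId']", $ns)

        $online = $null
        if ($null -ne $globalNode) { $online = $globalNode.InnerText }
        $siteOnline = $null
        if ($null -ne $siteNode) { $siteOnline = $siteNode.InnerText }

        [PSCustomObject]@{
            Filename   = $_.Name
            Online     = $online
            SiteOnline = $siteOnline
        }
    }
}
```

Usage would look like `Get-ProductOnlineFlag -Path C:\Temp\0 -ProductId 50610 | Format-Table`, where the folder is only a placeholder taken from the question.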

Using dot-notation and filtering via Where-Object should be possible, but I wouldn't recommend it. I find XPath far more efficient.


Comments

Hello Ansgar! Thanks for your answer. I already mentioned that dot notation works, and I agree that it is not convenient. The issue with your example is that my XML files are quite big, and selecting two nodes from the whole document will take time. Is it possible to select the product first, as in my example, and then use XPath to select the values of the online-flag elements? What would the XPath be in this case?
I tried all of the following with no luck: $product.SelectSingleNode("/ns:product/ns:online-flag[@site-id='ru']", $ns), $product.SelectSingleNode("/ns:online-flag[@site-id='ru']", $ns), $product.SelectSingleNode("/product/online-flag[@site-id='ru']"), $product.SelectSingleNode("/online-flag[@site-id='ru']"). $product.GetElementsByTagName("online-flag") works, but the result is a list of values rather than a single value.
You need to use $product.SelectSingleNode("ns:online-flag[not(@site-id)]", $ns) or $product.SelectSingleNode("./ns:online-flag[not(@site-id)]", $ns) to search within the current node. Thanks a lot!
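Why the attempts above return nothing, as a small sketch on a trimmed copy of the sample: an XPath that starts with `/` is absolute, so it is evaluated from the document root no matter which node `SelectSingleNode` is called on, and the root's only child here is `<catalog>`. The variable names are illustrative.

```powershell
[xml]$xml = @'
<catalog xmlns="http://www.example.com/xml/catalog/2006-10-31">
    <product product-id="50610">
        <online-flag>true</online-flag>
        <online-flag site-id="ru">false</online-flag>
    </product>
</catalog>
'@
$ns = New-Object System.Xml.XmlNamespaceManager($xml.NameTable)
$ns.AddNamespace('ns', $xml.DocumentElement.NamespaceURI)
$product = $xml.SelectSingleNode("/ns:catalog/ns:product[@product-id='50610']", $ns)

# Leading "/": absolute path from the document root, so nothing matches
$absolute = $product.SelectSingleNode("/ns:online-flag[@site-id='ru']", $ns)

# No leading "/" (or "./"): the path is relative to $product
$relative = $product.SelectSingleNode("ns:online-flag[@site-id='ru']", $ns)

"absolute is null: $($null -eq $absolute); relative: $($relative.InnerText)"
```

The two unprefixed attempts fail for a second reason as well: without a prefix bound to the catalog namespace, the names do not match the namespaced elements.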
Isn't a namespace something like <xml><ns:node>val</ns:node>? I cannot see one in the XML file.
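In the sample the namespace is declared as a default namespace: `xmlns="..."` on `<catalog>` has no prefix, so it applies to every unprefixed element below it. XPath 1.0 has no notion of a default namespace, so an unprefixed name in an XPath only matches elements in no namespace, and you have to bind some prefix to that URI yourself. A minimal sketch (the prefix `c` is arbitrary and chosen only for illustration):

```powershell
[xml]$xml = @'
<catalog xmlns="http://www.example.com/xml/catalog/2006-10-31">
    <product product-id="50610"/>
</catalog>
'@

# The URI declared by xmlns="..." on <catalog>
$uri = $xml.DocumentElement.NamespaceURI

# Unprefixed names match only elements in no namespace, so this returns $null
$unprefixed = $xml.SelectSingleNode('/catalog/product')

# Any prefix works, as long as it maps to the same namespace URI
$ns = New-Object System.Xml.XmlNamespaceManager($xml.NameTable)
$ns.AddNamespace('c', $uri)
$prefixed = $xml.SelectSingleNode('/c:catalog/c:product', $ns)

"uri=$uri unprefixed is null: $($null -eq $unprefixed) product-id=$($prefixed.GetAttribute('product-id'))"
```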
1

I was able to get the data I need the "object" way:

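# Dot notation: an attribute reads as a plain string, so no .Value is needed
$product = $xml.catalog.product | Where-Object {$_."product-id" -eq "50610"}
$of = $product."online-flag"
# An <online-flag> without attributes comes back as a plain string...
$glblsid = $of | Where-Object {$_ -is [System.String]}
# ...while one with a site-id attribute is an XmlElement whose text is in '#text'
$specsid = ($of | Where-Object {$_."site-id" -eq "ru"})."#text"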

But I don't like the way I managed to do this. Is there a more convenient solution?

And the answer to the second question is yes; see the first line.
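One possibly more uniform variant of the "object" way, offered as a sketch rather than a recommendation: `$product.'online-flag'` mixes strings (elements without attributes) and `XmlElement` objects, but enumerating the product's child nodes keeps every `<online-flag>` as an `XmlElement`, so the .NET members `HasAttribute`, `GetAttribute` and `InnerText` work on all of them. The XML is a trimmed copy of the question's sample and `$globalFlag`/`$ruFlag` are illustrative names.

```powershell
[xml]$xml = @'
<catalog xmlns="http://www.example.com/xml/catalog/2006-10-31">
    <product product-id="11210">
        <online-flag>false</online-flag>
        <online-flag site-id="ru">true</online-flag>
    </product>
    <product product-id="50610">
        <online-flag>true</online-flag>
        <online-flag site-id="ru">false</online-flag>
        <online-flag site-id="fr">true</online-flag>
    </product>
</catalog>
'@

# Dot notation still selects the product; the attribute reads as a string
$product = $xml.catalog.product | Where-Object { $_.'product-id' -eq '50610' }

# Walk the child nodes directly so every <online-flag> stays an XmlElement
$flags = $product.ChildNodes | Where-Object { $_.LocalName -eq 'online-flag' }

$globalFlag = ($flags | Where-Object { -not $_.HasAttribute('site-id') }).InnerText
$ruFlag     = ($flags | Where-Object { $_.GetAttribute('site-id') -eq 'ru' }).InnerText

"global=$globalFlag ru=$ruFlag"
```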

Comments

1

To complete this topic, I measured the performance of three methods (dot style, XPath on the file, and XPath on the node). There is no significant difference between them. Here are the details.

I ran each version twice over 2 files of 60 MB each; the last number in each result block is the elapsed time in milliseconds.

  1. Object style (dot style)

    ...
    $StartTime = Get-Date
    foreach ($file in $files) {
        [xml]$xml = Get-Content $file
    
        #Object style
        $product = $xml.catalog.product | Where-Object {$_."product-id" -eq "50610"}
        $of = $product."online-flag"
        $glblsid = $of | Where-Object {$_ -is [System.String]}
        $specsid = ($of | Where-Object {$_."site-id" -eq "ru"})."#text"
        Write-Output "$($file.Name) $glblsid $specsid"
    }
    $EndTime = Get-Date
    $TimeSpan = New-TimeSpan -Start $StartTime -End $EndTime
    Write-Output $TimeSpan.TotalMilliseconds
    

    Results:

    PS> .\ParseXML2.ps1
    2017-08-10_190159.xml false false
    2017-08-11_190203.xml false true
    36269,535
    PS> .\ParseXML2.ps1
    2017-08-10_190159.xml false false
    2017-08-11_190203.xml false true
    36628,3304
    
  2. XPath on the file:

    ...
    $StartTime = Get-Date
    foreach ($file in $files) {
        [xml]$xml = Get-Content $file
    
        #XPath on the file
        $namespace = $xml.DocumentElement.NamespaceURI
        $ns = New-Object System.Xml.XmlNamespaceManager($xml.NameTable)
        $ns.AddNamespace("ns", $namespace)
        $glblsid = $xml.SelectSingleNode("/ns:catalog/ns:product[@product-id='50610']/ns:online-flag[not(@site-id)]", $ns).'#text'
        $specsid = $xml.SelectSingleNode("/ns:catalog/ns:product[@product-id='50610']/ns:online-flag[@site-id='ru']", $ns).'#text'
        Write-Output "$($file.Name) $glblsid $specsid"
    }
    $EndTime = Get-Date
    $TimeSpan = New-TimeSpan -Start $StartTime -End $EndTime
    Write-Output $TimeSpan.TotalMilliseconds
    

    Results:

    PS> .\ParseXML2.ps1
    2017-08-10_190159.xml false false
    2017-08-11_190203.xml false true
    36129,1368
    PS> .\ParseXML2.ps1
    2017-08-10_190159.xml false false
    2017-08-11_190203.xml false true
    38890,3014
    
  3. XPath on the node:

    ...
    $StartTime = Get-Date
    foreach ($file in $files) {
        [xml]$xml = Get-Content $file
    
        #XPath on the node
        $namespace = $xml.DocumentElement.NamespaceURI
        $ns = New-Object System.Xml.XmlNamespaceManager($xml.NameTable)
        $ns.AddNamespace("ns", $namespace)
        $product = $xml.SelectSingleNode("/ns:catalog/ns:product[@product-id='50610']", $ns)
        $glblsid = $product.SelectSingleNode("ns:online-flag[not(@site-id)]", $ns).'#text'
        $specsid = $product.SelectSingleNode("ns:online-flag[@site-id='ru']", $ns).'#text'
        Write-Output "$($file.Name) $glblsid $specsid"
    }
    $EndTime = Get-Date
    $TimeSpan = New-TimeSpan -Start $StartTime -End $EndTime
    Write-Output $TimeSpan.TotalMilliseconds
    

    Results:

    PS> .\ParseXML2.ps1
    2017-08-10_190159.xml false false
    2017-08-11_190203.xml false true
    33477,1708
    PS> .\ParseXML2.ps1
    2017-08-10_190159.xml false false
    2017-08-11_190203.xml false true
    34116,7626
    
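If you want to see where the time goes, one option (a sketch, not part of the measurements above) is to time loading the document separately from the queries with `System.Diagnostics.Stopwatch`. The generated file is a hypothetical stand-in for the real 60 MB catalogs, and its product count of 20000 is arbitrary; the numbers it prints will depend on your machine.

```powershell
# Hypothetical test file: a generated catalog with many small products
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'catalog-timing-sample.xml'
$products = foreach ($i in 1..20000) {
    '<product product-id="{0}"><online-flag>true</online-flag><online-flag site-id="ru">false</online-flag></product>' -f $i
}
Set-Content -Path $path -Value ('<catalog xmlns="http://www.example.com/xml/catalog/2006-10-31">' + ($products -join '') + '</catalog>')

# Time only the parsing step
$sw = [System.Diagnostics.Stopwatch]::StartNew()
$xml = New-Object System.Xml.XmlDocument
$xml.Load($path)
$loadMs = $sw.Elapsed.TotalMilliseconds

# Time only the XPath queries ("XPath on the node" style)
$sw.Restart()
$ns = New-Object System.Xml.XmlNamespaceManager($xml.NameTable)
$ns.AddNamespace('ns', $xml.DocumentElement.NamespaceURI)
$product = $xml.SelectSingleNode("/ns:catalog/ns:product[@product-id='19999']", $ns)
$ru = $product.SelectSingleNode("ns:online-flag[@site-id='ru']", $ns).InnerText
$queryMs = $sw.Elapsed.TotalMilliseconds

"load: {0:N0} ms, query: {1:N0} ms" -f $loadMs, $queryMs
Remove-Item -Path $path
```

If loading dominates, the choice between the three query styles matters less than how the file is read.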

Comments
