I'm trying to get a PowerShell script to extract information from an XML file. I've tried a few different things, such as Select-Xml and SelectNodes, but I'm struggling. The information is in a format like this:

<school>
   <students>
      <student name="Bob" subject="Math" year="5">
         <details SID="38571273" code="1122" group="" />
      </student>
        
      <student name="John" subject="Science" year="5">
         <details SID="38343555" code="1123" group="" />
      </student>
   </students>
</school>

I want to extract the name, subject, year, SID, code, and group for each student and store them in an array so I can process them. I'm writing a PowerShell script to do this, but I'm quite new to both PowerShell and XML. Any help would be greatly appreciated!

  • Please post the actual XML, SID="38343555" is not a valid tag name. Commented Mar 16, 2022 at 1:30
  • Fair enough. I edited the post. Forgot to add the tags. Commented Mar 16, 2022 at 1:50

2 Answers

For simple tasks like this one, you can access the XML items with dot notation once the file has been loaded into an XML document object (called $xml here), i.e. $xml.school.students.student will give an array of <student> elements.

After that, you can use select (an alias for Select-Object) to pick out certain properties: either direct properties by name, or name/expression pairs (@{name='...'; expression={...}}) for more complex values or when you want to rename the resulting "columns":

$xml.school.students.student | select name,subject,year,@{n='SID'; e={$_.details.SID}}

gives an array of PSCustomObjects:

name subject year SID     
---- ------- ---- ---     
Bob  Math    5    38571273
John Science 5    38343555
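
The same pattern extends to all six fields the question asks for. Below is a minimal sketch, assuming the sample XML from the question is held in an inline string so the snippet is self-contained (in practice you would load it from your file). The variable names $xml and $students are only illustrative.

```powershell
# Sample data copied from the question, inlined so the snippet runs on its own.
[xml]$xml = @'
<school>
   <students>
      <student name="Bob" subject="Math" year="5">
         <details SID="38571273" code="1122" group="" />
      </student>
      <student name="John" subject="Science" year="5">
         <details SID="38343555" code="1123" group="" />
      </student>
   </students>
</school>
'@

# One object per <student>, with the <details> attributes flattened in.
$students = @($xml.school.students.student | Select-Object name, subject, year,
    @{ n = 'SID';   e = { $_.details.SID } },
    @{ n = 'Code';  e = { $_.details.code } },
    @{ n = 'Group'; e = { $_.details.group } })

# Each element of the array can then be processed individually.
foreach ($s in $students) {
    "{0} ({1}, year {2}): SID {3}, code {4}" -f $s.name, $s.subject, $s.year, $s.SID, $s.Code
}
```

Wrapping the pipeline in @( ) keeps $students an array even if the file happens to contain only one student.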

I revised my answer to match the updated XML.

I think this is what you're looking for. You just have to update the path to your XML file.

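$XMLPath = "ENTER_PATH"   # path to your XML file

# Returns the attribute $title of every node matching $xPATH under /school/students.
# Note: each call re-reads and re-parses the file at $XMLPath.
function Get-StudentInfo {
    param (
        [string]$xPATH,
        [string]$title
    )
    Select-Xml -Path $XMLPath -XPath "/school/students/$xPATH" | ForEach-Object { $_.Node.$title }
}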

$students = [PSCustomObject]@{
    Names    = Get-StudentInfo -xPath "student" -title "name"
    Subjects = Get-StudentInfo -xPath "student" -title "subject"
    Years    = Get-StudentInfo -xPath "student" -title "year"
    SIDs     = Get-StudentInfo -xPath "student/details" -title "SID"
    Codes    = Get-StudentInfo -xPath "student/details" -title "code"
    Groups   = Get-StudentInfo -xPath "student/details" -title "group"
}

$students.Names


This gives you a single object, $students, whose properties each hold an array with one value per student.

Names    : {Bob, John}
Subjects : {Math, Science}
Years    : {5, 5}
SIDs     : {38571273, 38343555}
Codes    : {1122, 1123}
Groups   : {, }

To get a list of all the students' names, use $students.Names; for the subjects, use $students.Subjects.


A version that loads the XML file only once, instead of once per call to Get-StudentInfo:

$students = Select-Xml -Path "test.xml" -XPath '/school/students/student' | Select-Object @(
    @{name='Name';    expr={ $_ | Select-Xml '@name' }}
    @{name='Subject'; expr={ $_ | Select-Xml '@subject' }}
    @{name='Year';    expr={ $_ | Select-Xml '@year' }}
    @{name='SID';     expr={ $_ | Select-Xml 'details/@SID' }}
    @{name='Code';    expr={ $_ | Select-Xml 'details/@code' }}
    @{name='Group';   expr={ $_ | Select-Xml 'details/@group' }}
)
$students

Result:

Name Subject Year SID      Code Group
---- ------- ---- ---      ---- -----
Bob  Math    5    38571273 1122      
John Science 5    38343555 1123 
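
One thing to watch for, as far as I can tell: in the version above each property holds the object returned by Select-Xml (a wrapper around the attribute node) rather than a plain string, which can get in the way of later comparisons. Below is a sketch of a variant that still parses the XML only once but stores plain values. It assumes the XML is available as a string and uses the -Content parameter of Select-Xml so the snippet is self-contained; $content is a hypothetical variable name, and converting the year to [int] is just an illustration.

```powershell
# Sample data from the question, inlined as a string (stand-in for the real file).
$content = @'
<school>
   <students>
      <student name="Bob" subject="Math" year="5">
         <details SID="38571273" code="1122" group="" />
      </student>
      <student name="John" subject="Science" year="5">
         <details SID="38343555" code="1123" group="" />
      </student>
   </students>
</school>
'@

# Parse once, then read attribute values straight from each <student> node.
$students = @(Select-Xml -Content $content -XPath '/school/students/student' | ForEach-Object {
    $node = $_.Node
    [PSCustomObject]@{
        Name    = $node.GetAttribute('name')
        Subject = $node.GetAttribute('subject')
        Year    = [int]$node.GetAttribute('year')
        SID     = $node.SelectSingleNode('details/@SID').Value
        Code    = $node.SelectSingleNode('details/@code').Value
        Group   = $node.SelectSingleNode('details/@group').Value
    }
})

# Plain strings and numbers make filtering straightforward.
$students | Where-Object Subject -eq 'Math'
```

With a file on disk, swapping -Content $content for -Path should behave the same way.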

7 Comments

You probably want to parameterize "test.xml" too :)
Eh kinda pointless, since the XPath is only somewhat parameterized. Can only assume the file is one big file and not frequently changing, but easy enough haha
Tomalak's answer is def better! Should note that it needs to be declared as an xml object first [xml]$xml = Get-Content -Path "C:\file.xml"
Select-Xml has the issue that it loads and parses the XML file from scratch every time, so for selecting 6 XPaths, you would load the file 6 times, which is a huge amount of overhead. It's better to call it once, just to retrieve the <student> elements, and then work with those after that.
Also, [xml]$xml = Get-Content -Path "C:\file.xml" is the wrong way to load an XML file. It will read the file with whatever encoding Get-Content feels like using today, ignoring the <?xml encoding="..."?> declaration, and thus silently mangling your data in the process if you're not lucky. The correct way is $xml = New-Object xml and then $xml.Load($path). This parses the file while paying attention to its encoding.
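
The loading approach described in the last comment might look like the sketch below. The temporary file is created by the snippet purely so there is something to load; $path is a stand-in for your real file, and the sample content is cut down from the question.

```powershell
# Write a small sample file (stand-in for a real XML file on disk).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'students-sample.xml'
$sample = @'
<?xml version="1.0" encoding="utf-8"?>
<school>
   <students>
      <student name="Bob" subject="Math" year="5">
         <details SID="38571273" code="1122" group="" />
      </student>
   </students>
</school>
'@
$utf8NoBom = New-Object System.Text.UTF8Encoding $false
[System.IO.File]::WriteAllText($path, $sample, $utf8NoBom)

# Let the XML parser open the file itself, so it honours the declared encoding.
# Use a full path: .NET methods do not follow PowerShell's current location.
$xml = New-Object xml
$xml.Load($path)

# Dot notation works on the loaded document as in the first answer.
$xml.school.students.student.name
```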