1

I am trying to delete files in a folder based on whether the folder contains an XML file in which the Modality element has an anyType value of "CT", but I quickly ran into an issue when trying to filter by XML content.

I am able to return some content, but as soon as I try to filter it or drill down into it, I get an empty result.

This is as deep as I can query and still return content from the XML file:

$xmlfile = get-Content .\7.86.7.7053.61.159438.472144765.1719.XML
$xmlfile.ArrayOfPublicXMLElement.PublicXMLElement.ElementName

As soon as I try to drill down any deeper, I get no result, e.g.:

$xmlfile.ArrayOfPublicXMLElement.PublicXMLElement.Elementname |where {$_.name -eq "Modality"}
$xmlfile.ArrayOfPublicXMLElement.PublicXMLElement.Elementname |where {$_.name -eq "anyType"}
$xmlfile.ArrayOfPublicXMLElement.PublicXMLElement.Elementname |where {$_.name -eq "CT"}
$xmlfile.ArrayOfPublicXMLElement.PublicXMLElement | where {$_.name -eq "00080060"}

Here is a copy of the XML I am trying to filter. I assume I am having so much difficulty either because of the format of the XML file, or because of a massive misunderstanding on my part of the XML format or of how PowerShell interacts with it?

<?xml version="1.0" encoding="utf-8"?>
<ArrayOfPublicXMLElement xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <PublicXMLElement>
    <ElementName>Acquisition Time</ElementName>
    <Tag>00080032</Tag>
    <VR>TM</VR>
    <ElementData>
      <anyType xsi:type="xsd:string">105343</anyType>
    </ElementData>
  </PublicXMLElement>    <ElementName>Accession Number</ElementName>
    <Tag>00080050</Tag>
    <VR>SH</VR>
    <ElementData>
      <anyType xsi:type="xsd:string" />
    </ElementData>
  </PublicXMLElement>
  <PublicXMLElement>
    <ElementName>Modality</ElementName>
    <Tag>00080060</Tag>
    <VR>CS</VR>
    <ElementData>
      <anyType xsi:type="xsd:string">CT</anyType>
    </ElementData>
  </PublicXMLElement>
  <PublicXMLElement>
    <ElementName>Station Name</ElementName>
    <Tag>00081010</Tag>
    <VR>SH</VR>
    <ElementData>
      <anyType xsi:type="xsd:string">M_Source</anyType>
    </ElementData>
  </PublicXMLElement>
  <PublicXMLElement>
    <ElementName>Rescale Slope</ElementName>
    <Tag>00281053</Tag>
    <VR>DS</VR>
    <ElementData>
      <anyType xsi:type="xsd:string">1.0</anyType>
    </ElementData>
  </PublicXMLElement>
</ArrayOfPublicXMLElement>
4
  • The XML is invalid. The first </PublicXMLElement> has no matching start tag. Commented Nov 14, 2020 at 9:00
  • I would assume then there is nothing I can do about it? I am unable to change the XML. Commented Nov 14, 2020 at 9:49
  • Who or what prevents you from getting valid XML? If you can't edit it, all you have is GIGO. Commented Nov 14, 2020 at 9:57
  • I would like to apologise, I had trimmed the document as the full XML file has around 1635. I trimmed 1 line too many !!!! Commented Nov 14, 2020 at 13:09

2 Answers

3

If all you have is invalid XML, and if I understand correctly that you wish to remove all of these files where:

  • there is a tag <ElementName>Modality</ElementName>
  • that has a tag <ElementData>,
  • which in turn has a tag <anyType> containing value CT

then you will have to resort to using regex.

$regex = '(?s)<ElementName>Modality</ElementName>.*<ElementData>\s*<anyType[^>]*>CT</anyType>'
Get-ChildItem -Path 'D:\Test' -Filter '*.xml' -File -Recurse | ForEach-Object {
    $content = Get-Content -Path $_.FullName -Raw
    if ($content -match $regex) {
        $_ | Remove-Item -Force -WhatIf  # see below
    }
}

Once you are satisfied that the code would remove the correct files, remove the -WhatIf switch to actually delete them.

Regex details

(?s)                                    Dot matches line breaks
<ElementName>Modality</ElementName>     Match the character string “<ElementName>Modality</ElementName>” literally
.                                       Match any single character
   *                                    Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
<ElementData>                           Match the character string “<ElementData>” literally
\s                                      Match a single character that is a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line)
   *                                    Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
<anyType                                Match the character string “<anyType” literally
[^>]                                    Match any character that is NOT a “>”
   *                                    Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
>CT</anyType>                           Match the character string “>CT</anyType>” literally
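
One caveat worth sketching: because `.*` is greedy and `(?s)` lets the dot cross line breaks, the pattern can pair a Modality element with a CT value that belongs to a later, unrelated element. The variant below is a hedged sketch, not part of the original answer. It assumes every record is closed by `</PublicXMLElement>` and stops the scan at that closing tag. The names `$strictRegex`, `$looseRegex`, and `$sample` are made up for illustration.

```powershell
# The original pattern from the answer, kept for comparison.
$looseRegex = '(?s)<ElementName>Modality</ElementName>.*<ElementData>\s*<anyType[^>]*>CT</anyType>'

# Assumption: each record ends with </PublicXMLElement>.
# (?:(?!</PublicXMLElement>).)*? consumes one character at a time and
# refuses to step onto the closing tag of the current record.
$strictRegex = '(?s)<ElementName>Modality</ElementName>(?:(?!</PublicXMLElement>).)*?<ElementData>\s*<anyType[^>]*>CT</anyType>'

# Hypothetical sample: Modality is MR, but a later element happens to contain CT.
$sample = @'
<PublicXMLElement>
  <ElementName>Modality</ElementName>
  <ElementData>
    <anyType xsi:type="xsd:string">MR</anyType>
  </ElementData>
</PublicXMLElement>
<PublicXMLElement>
  <ElementName>Station Name</ElementName>
  <ElementData>
    <anyType xsi:type="xsd:string">CT</anyType>
  </ElementData>
</PublicXMLElement>
'@

$looseResult  = $sample -match $looseRegex    # True: matches across two records
$strictResult = $sample -match $strictRegex   # False: the Modality record holds MR
```

If such false positives matter for your files, the stricter pattern can be dropped into the loop above in place of `$regex`.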

2

Is this working the way you like it?

$xmlfile.ArrayOfPublicXMLElement.PublicXMLElement | Where-Object { ($_.ElementName -like "Modality") -and ($_.ElementData.anyType.InnerText -like "CT")}

This is an easy method to get the number of matches:

(@($xmlfile.ArrayOfPublicXMLElement.PublicXMLElement | Where-Object { ($_.ElementName -like "Modality") -and ($_.ElementData.anyType.InnerText -like "CT")})).Count
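
Both snippets above assume that `$xmlfile` holds a parsed XML document, not the plain strings that `Get-Content` returns on its own. A minimal sketch of the whole check, assuming the file is valid XML with the element names shown in the question, could look like this. The function name `Test-ModalityIsCT` is hypothetical, and the sketch uses `-eq` where the answer uses `-like`, which behaves the same when no wildcards are involved.

```powershell
function Test-ModalityIsCT {
    param(
        [Parameter(Mandatory)]
        [string]$Path
    )

    try {
        # The [xml] cast parses the text; without it the property paths return nothing.
        [xml]$doc = Get-Content -LiteralPath $Path -Raw -ErrorAction Stop
    }
    catch {
        # Unreadable or invalid XML is reported as "no match" instead of failing.
        return $false
    }

    # Collect the records named Modality whose anyType text is CT.
    $hits = @($doc.ArrayOfPublicXMLElement.PublicXMLElement | Where-Object {
        $_.ElementName -eq 'Modality' -and $_.ElementData.anyType.InnerText -eq 'CT'
    })

    return ($hits.Count -gt 0)
}

# Possible usage (the folder path is only an example; -WhatIf shows what would be deleted):
# Get-ChildItem -Path 'D:\Test' -Filter '*.xml' -File -Recurse |
#     Where-Object { Test-ModalityIsCT -Path $_.FullName } |
#     Remove-Item -WhatIf
```

Files that fail to parse are skipped by this sketch, so for invalid XML a text-based approach is still needed.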

2 Comments

Hi, apologies for not replying sooner. This is how I imagined it would work in PowerShell, but for some reason I get an empty result every time I try this method of querying the XML file (the count comes back as 0). The response above from Theo is working for me, but I would like to thank you for the time you took to answer my question, and I apologise that I cannot get it to work.
And perseverance counts! It turns out I was rather daft in missing the [xml] cast; as soon as I added it, I got the results I wanted. Thank you as well! I cannot accept two answers, although both resolve my issue.
