I need some help with this issue:

I have a lot of XML files like this one in a directory, and I need to delete part of the XML data (every opex:* element, plus ExtendedXIP and LegacyXIP), but I can't figure out what I'm doing wrong.

This is an example of my XML:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<opex:OPEXMetadata xmlns:opex="http://www.openpreservationexchange.org/opex/v1.0">
<opex:Transfer>
<opex:SourceID>4720184e-9d02-47f6-867f-2603ab116669</opex:SourceID>
</opex:Transfer>
<opex:Properties>
<opex:Title>Fakamae_2018002_RM</opex:Title>
<opex:Description>Fakamae_2018002_RM</opex:Description>
<opex:Identifiers>
  <opex:Identifier type="code">Fakamae_2018002_RM</opex:Identifier>
</opex:Identifiers>
</opex:Properties>
<opex:DescriptiveMetadata>
<LegacyXIP xmlns="http://preservica.com/LegacyXIP">
  <AccessionRef>88158870-ba1a-44a1-ad70-5cc898a5b436</AccessionRef>
  <AccumulationRef>3b955682-e827-43bb-a446-2dd635f01ef0</AccumulationRef>
</LegacyXIP>
<ExtendedXIP xmlns="http://preservica.com/ExtendedXIP/v6.0">
  <DigitalSurrogate>false</DigitalSurrogate>
  <CoverageFrom>2019-09-21T00:00:00.000Z</CoverageFrom>
     </ExtendedXIP>
<METATRANSCRIPT:METATRANSCRIPT xmlns:METATRANSCRIPT="http://www.mpi.nl/IMDI/Schema/IMDI"
     xmlns="http://www.mpi.nl/IMDI/Schema/IMDI" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ArchiveHandle="hdl:2196/00-0000-0000-0013-4C3F-7" Date="2018-12-17" FormatId="IMDI 3.04" Originator="CMDI Maker by CLASS - Cologne Language Archive Services" Type="SESSION" Version="1.0" xsi:schemaLocation="http://www.mpi.nl/IMDI/Schema/IMDI http://www.mpi.nl/IMDI/Schema/IMDI_3.0.xsd">
  <Session>
    <Name>Fakamae_2018002_RM</Name>
    <Title>Rua Tau Tupuna</Title>
    <Date>2018-05-09</Date>
    <Description LanguageId="ISO639-3:eng" Link="">A story about a grandmother and granddaughtertranslations.</Description>
    <MDGroup>
      <Location>
        <Continent Link="http://www.mpi.nl/IMDI/Schema/Continents.xml"  Type="ClosedVocabulary">Oceania</Continent>
        <Country Link="http://www.mpi.nl/IMDI/Schema/Countries.xml"  Type="OpenVocabulary">Vanuatu</Country>
        <Region>Shefa Province</Region>
        <Address>Tongamea village Emae island</Address>
      </Location>
      <Project>
        <Name>fakamae-dewar-0487</Name>
          <Contact>
          <Name>Amy Dewar</Name>
          <Address />
          <Email>[email protected]</Email>
          <Organisation>University of Newcastle, Australia</Organisation>
        </Contact>
        <Description LanguageId="ISO639-3:eng" Link="" />
      </Project>
    </MDGroup>
  </Session>
  </METATRANSCRIPT:METATRANSCRIPT>
  </opex:DescriptiveMetadata>
  </opex:OPEXMetadata>

This is what I need to get:

 <METATRANSCRIPT:METATRANSCRIPT xmlns:METATRANSCRIPT="http://www.mpi.nl/IMDI/Schema/IMDI"  xmlns="http://www.mpi.nl/IMDI/Schema/IMDI" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"  ArchiveHandle="hdl:2196/00-0000-0000-0013-4C3F-7" Date="2018-12-17" FormatId="IMDI 3.04" Originator="CMDI Maker by CLASS - Cologne Language Archive Services" Type="SESSION" Version="1.0" xsi:schemaLocation="http://www.mpi.nl/IMDI/Schema/IMDI http://www.mpi.nl/IMDI/Schema/IMDI_3.0.xsd">
    <Name>Fakamae_2018002_RM</Name>
    <Title>Rua Tau Tupuna</Title>
    <Date>2018-05-09</Date>
    <Description LanguageId="ISO639-3:eng" Link="">A story about a grandmother The grandmother sends her granddaughter around the village asking for fire from all the old women. This text was recorded in video and archived files are in mp4 video format and wav audio format. The eaf ELAN file contains both English and Bislama translations.</Description>
    <MDGroup>
      <Location>
        <Continent Link="http://www.mpi.nl/IMDI/Schema/Continents.xml"  Type="ClosedVocabulary">Oceania</Continent>
        <Country Link="http://www.mpi.nl/IMDI/Schema/Countries.xml" Type="OpenVocabulary">Vanuatu</Country>
        <Region>Shefa Province</Region>
        <Address>Tongamea village Emae island</Address>
      </Location>
      <Project>
        <Name>fakamae-dewar-0487</Name>
        <Title>Documentation of Fakamae, a Polynesian Outlier of Vanuatu</Title>
        <Id>MDP0369</Id>
        <Contact>
          <Name>Amy Dewar</Name>
          <Address />
          <Email>[email protected]</Email>
          <Organisation>University of Newcastle, Australia</Organisation>
        </Contact>
        <Description LanguageId="ISO639-3:eng" Link="" />
      </Project>
   </METATRANSCRIPT:METATRANSCRIPT>

This is the code I have so far:

$XMLFile = "C:\Users\User\Documents\task.xml"
$xml = [xml](Get-Content $XMLFile)

# Load the existing document

$DeleteNames = Select-Xml -Xml $xml -Namespace @{opex='http://www.openpreservationexchange.org /opex/v1.0'} -Xpath //opex:Transfer/opex:Properties
# Specify tag names to delete and then find them

($Doc.Task.ChildNodes |Where-Object { $DeleteNames -contains $_.Name }) | ForEach-Object {
# Remove each node from its parent
[void]$_.ParentNode.RemoveChild($_)
}

# Save the modified document
$xml.Save($XMLFile)

I only need the XML data between the opening and closing METATRANSCRIPT tags.

Thanks a lot for any help.

  • You said you only need the XML data between the METATRANSCRIPT tags. Is there a reason you're deleting nodes instead of just pulling METATRANSCRIPT out? Commented May 23, 2020 at 3:14
  • Hi Zachary, thanks for your answer. No reason for that; I just thought it was easier to remove the nodes I don't need than to pull out METATRANSCRIPT. If that is a better solution, I'm OK with it, but I don't know how to do it. Commented May 25, 2020 at 11:08

1 Answer

In PowerShell, an XML document is a tree of nodes nested within nodes. The problem you are facing is that removing a parent node also removes all of its children. METATRANSCRIPT is a child of opex:DescriptiveMetadata, so if you remove that node you remove METATRANSCRIPT along with it. One approach would be to treat the file as plain text rather than XML and delete the lines that start with <opex and so on. Another would be to get all the nodes, recursively check each one's parents to decide which nodes have to be kept, and clean up the rest.
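
To make the parent/child point concrete, here is a minimal sketch on a cut-down, made-up document. The element names mirror the question, but the inline XML and the variable names are only illustrative. Removing a single unwanted sibling leaves METATRANSCRIPT in place, whereas removing opex:DescriptiveMetadata takes METATRANSCRIPT with it.

```powershell
# Cut-down stand-in for the question's file (illustrative content only)
[xml]$doc = @'
<opex:OPEXMetadata xmlns:opex="http://www.openpreservationexchange.org/opex/v1.0">
  <opex:DescriptiveMetadata>
    <LegacyXIP xmlns="http://preservica.com/LegacyXIP"><AccessionRef>x</AccessionRef></LegacyXIP>
    <METATRANSCRIPT:METATRANSCRIPT xmlns:METATRANSCRIPT="http://www.mpi.nl/IMDI/Schema/IMDI"><Session /></METATRANSCRIPT:METATRANSCRIPT>
  </opex:DescriptiveMetadata>
</opex:OPEXMetadata>
'@

# Removing one unwanted sibling keeps METATRANSCRIPT in the tree
$legacy = $doc.GetElementsByTagName('LegacyXIP')[0]
[void]$legacy.ParentNode.RemoveChild($legacy)
$afterSibling = $doc.GetElementsByTagName('METATRANSCRIPT:METATRANSCRIPT').Count   # 1

# Removing the parent removes every descendant, METATRANSCRIPT included
$parent = $doc.GetElementsByTagName('opex:DescriptiveMetadata')[0]
[void]$parent.ParentNode.RemoveChild($parent)
$afterParent = $doc.GetElementsByTagName('METATRANSCRIPT:METATRANSCRIPT').Count    # 0
```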

That being said, deleting unwanted nodes may be the wrong approach to the problem you are describing. If you just want the METATRANSCRIPT element and its contents, the following would do the trick:

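# Read the whole file as a single string and parse it as XML
[xml]$xml = Get-Content test.xml -Raw
# Write the first METATRANSCRIPT element, tags included, to a new file
$xml.GetElementsByTagName("METATRANSCRIPT:METATRANSCRIPT")[0].OuterXml | Out-File Newxml.xml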
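
Since the question mentions a whole directory of files, the same idea could be wrapped in a loop. This is only a sketch under some assumptions: the function name Export-MetaTranscript and both folder parameters are made up for illustration, every input file is assumed to be well-formed XML, and the element is assumed to live in the IMDI namespace shown in the question's sample. It looks the element up by local name and namespace URI rather than by prefixed name, so it does not depend on which prefix a given file uses.

```powershell
function Export-MetaTranscript {
    param(
        [Parameter(Mandatory)] [string]$SourceDir,   # hypothetical input folder
        [Parameter(Mandatory)] [string]$TargetDir    # hypothetical output folder
    )
    # Namespace URI taken from the question's sample
    $imdi = 'http://www.mpi.nl/IMDI/Schema/IMDI'
    # Create the output folder if needed and keep its absolute path for .NET calls
    $TargetDir = (New-Item -ItemType Directory -Path $TargetDir -Force).FullName

    foreach ($file in Get-ChildItem -Path $SourceDir -Filter *.xml -File) {
        $xml = New-Object System.Xml.XmlDocument
        $xml.Load($file.FullName)

        # Match on local name + namespace URI instead of the prefixed name
        $node = $xml.GetElementsByTagName('METATRANSCRIPT', $imdi) | Select-Object -First 1
        if ($null -eq $node) {
            Write-Warning "No METATRANSCRIPT element in $($file.Name); skipped"
            continue
        }

        # Copy the element into a fresh document and save it under the same file name
        $out = New-Object System.Xml.XmlDocument
        [void]$out.AppendChild($out.ImportNode($node, $true))
        $out.Save((Join-Path $TargetDir $file.Name))
    }
}

# Example call (both paths are placeholders):
# Export-MetaTranscript -SourceDir 'C:\in' -TargetDir 'C:\out'
```

Writing to a separate folder leaves the original files untouched, which makes it easy to check a few results before replacing anything.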