I have a list of nodes that I would like to remove from an XML document, but I am running into an issue when removing the elements and writing the modified document to a new XML file.
Here is the Python program I wrote (I am using ElementTree):
from xml.etree.ElementTree import ElementTree
tree = ElementTree()
tree.parse('autogen_test.xml')
root = tree.getroot()
keeper_data = ['4294905264']
instances = tree.findall('./DIMENSION/DIMENSION_NODE/DIMENSION_NODE')
removeList = list()
for instance in instances:
    #print instance
    data1 = instance.find('./DVAL/DVAL_ID')
    if data1.attrib.get("ID") not in keeper_data:
        removeList.append(instance)
for tag in removeList:
    parent = tree.findall('./DIMENSION/DIMENSION_NODE/DIMENSION_NODE')
    parent.remove(tag)
tree.write("out.xml")
My sample XML is below (this is a standard input and I cannot modify it):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE DIMENSIONS SYSTEM "dimensions.dtd">
<DIMENSIONS>
  <NUM_DVALS>88816</NUM_DVALS>
  <DIMENSION NAME="Brand" SRC_FILE="" SRC_TYPE="INTERNAL">
    <DIMENSION_ID ID="4294905334"/>
    <DIMENSION_NODE>
      <DVAL TYPE="EXACT">
        <DVAL_ID ID="2"/>
        <SYN DISPLAY="TRUE" SEARCH="FALSE" CLASSIFY="FALSE">Brand</SYN>
      </DVAL>
      <DIMENSION_NODE>
        <DVAL TYPE="EXACT">
          <DVAL_ID ID="4294905325"/>
          <SYN DISPLAY="TRUE" SEARCH="TRUE" CLASSIFY="TRUE">hanes</SYN>
        </DVAL>
      </DIMENSION_NODE>
      <DIMENSION_NODE>
        <DVAL TYPE="EXACT">
          <DVAL_ID ID="4294905315"/>
          <SYN DISPLAY="TRUE" SEARCH="TRUE" CLASSIFY="TRUE">lee</SYN>
        </DVAL>
      </DIMENSION_NODE>
      <DIMENSION_NODE>
        <DVAL TYPE="EXACT">
          <DVAL_ID ID="4294905281"/>
          <SYN DISPLAY="TRUE" SEARCH="TRUE" CLASSIFY="TRUE">levi's</SYN>
        </DVAL>
      </DIMENSION_NODE>
      <DIMENSION_NODE>
        <DVAL TYPE="EXACT">
          <DVAL_ID ID="4294905264"/>
          <SYN DISPLAY="TRUE" SEARCH="TRUE" CLASSIFY="TRUE">braun</SYN>
        </DVAL>
      </DIMENSION_NODE>
    </DIMENSION_NODE>
  </DIMENSION>
</DIMENSIONS>
Even after iterating through the list and finding all the nodes to remove, tree.write("out.xml") always writes out the original XML. Basically, I need to remove the identified nodes from the original XML.
Expected output:
<DIMENSIONS>
  <NUM_DVALS>88816</NUM_DVALS>
  <DIMENSION NAME="Brand" SRC_FILE="" SRC_TYPE="INTERNAL">
    <DIMENSION_ID ID="4294905334" />
    <DIMENSION_NODE>
      <DVAL TYPE="EXACT">
        <DVAL_ID ID="2" />
        <SYN CLASSIFY="FALSE" DISPLAY="TRUE" SEARCH="FALSE">Brand</SYN>
      </DVAL>
      <DIMENSION_NODE>
        <DVAL TYPE="EXACT">
          <DVAL_ID ID="4294905264" />
          <SYN CLASSIFY="TRUE" DISPLAY="TRUE" SEARCH="TRUE">braun</SYN>
        </DVAL>
      </DIMENSION_NODE>
    </DIMENSION_NODE>
  </DIMENSION>
</DIMENSIONS>
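A likely reason the output never changes, offered as a sketch rather than a confirmed fix: `findall` returns a plain Python list, so calling `remove` on that list only drops the entry from the list and leaves the tree untouched. ElementTree elements do not keep a reference to their parent, so one way around this is to loop over the parent `DIMENSION_NODE` elements and call `Element.remove` on each unwanted child. The helper name `prune_nodes` and the trimmed `SAMPLE` string below are hypothetical and used only for illustration; nested nodes without a `DVAL_ID` are assumed to be left alone.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the sample document (illustrative only).
SAMPLE = """<DIMENSIONS>
  <DIMENSION NAME="Brand">
    <DIMENSION_NODE>
      <DVAL TYPE="EXACT"><DVAL_ID ID="2"/></DVAL>
      <DIMENSION_NODE>
        <DVAL TYPE="EXACT"><DVAL_ID ID="4294905325"/></DVAL>
      </DIMENSION_NODE>
      <DIMENSION_NODE>
        <DVAL TYPE="EXACT"><DVAL_ID ID="4294905264"/></DVAL>
      </DIMENSION_NODE>
    </DIMENSION_NODE>
  </DIMENSION>
</DIMENSIONS>"""


def prune_nodes(root, keep_ids):
    """Remove nested DIMENSION_NODE elements whose DVAL_ID is not in keep_ids.

    Hypothetical helper: returns the number of elements removed.
    """
    removed = 0
    # Walk the parents, because removal has to go through the parent element.
    for parent in root.findall('./DIMENSION/DIMENSION_NODE'):
        # findall returns a new list, so removing children while looping is safe.
        for child in parent.findall('DIMENSION_NODE'):
            dval_id = child.find('./DVAL/DVAL_ID')
            # Assumption: nodes without a DVAL_ID are kept as they are.
            if dval_id is not None and dval_id.get('ID') not in keep_ids:
                parent.remove(child)  # this modifies the tree itself
                removed += 1
    return removed


root = ET.fromstring(SAMPLE)
removed = prune_nodes(root, {'4294905264'})
remaining = [e.get('ID') for e in root.iter('DVAL_ID')]
```

With a real file, the same helper could be applied to `tree.getroot()` after `tree.parse('autogen_test.xml')`, followed by `tree.write("out.xml")`.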