I need to merge two XML files on the third block of the XML, the 'results' element. The files A.xml and B.xml look like this:

A.xml

<sample id="1">
<workflow value="x" version="1"/>
  <results>
   <result type="T">
      <result_data type="value" value="19"/>
      <result_data type="value" value="15"/>
      <result_data type="value" value="14"/>
      <result_data type="value" value="13"/>
      <result_data type="value" value="12"/>
    </result>
  </results>
</sample>

B.xml

<sample id="1">
<workflow value="x" version="1"/>
  <results>
   <result type="Q">
      <result_data type="value" value="11"/>
      <result_data type="value" value="21"/>
      <result_data type="value" value="13"/>
      <result_data type="value" value="12"/>
      <result_data type="value" value="15"/>
    </result>
  </results>
</sample>

I need to merge on 'results', so that the output looks like this:

<sample id="1">
<workflow value="x" version="1"/>
  <results>
   <result type="T">
      <result_data type="value" value="19"/>
      <result_data type="value" value="15"/>
      <result_data type="value" value="14"/>
      <result_data type="value" value="13"/>
      <result_data type="value" value="12"/>
   </result>
   <result type="Q">
      <result_data type="value" value="11"/>
      <result_data type="value" value="21"/>
      <result_data type="value" value="13"/>
      <result_data type="value" value="12"/>
      <result_data type="value" value="15"/>
   </result>
  </results>
</sample>

What I have done so far is this:

import os, os.path, sys
import glob
from xml.etree import ElementTree

def run(files):
    xml_files = glob.glob(files +"/*.xml")
    xml_element_tree = None
    for xml_file in xml_files:
        # get root
        data = ElementTree.parse(xml_file).getroot()
        # print ElementTree.tostring(data)
        for result in data.iter('result'):
            if xml_element_tree is None:
                xml_element_tree = data 
            else:
                xml_element_tree.extend(result) 
    if xml_element_tree is not None:
        print ElementTree.tostring(xml_element_tree)

As you can see, for the first file I assign data (which has the header elements) to xml_element_tree, and then I extend it with each 'result'. However, this gives me:

<sample id="1">
<workflow value="x" version="1"/>
  <results>
   <result type="T">
      <result_data type="value" value="19"/>
      <result_data type="value" value="15"/>
      <result_data type="value" value="14"/>
      <result_data type="value" value="13"/>
      <result_data type="value" value="12"/>
   </result>
  </results>
   <result_data type="value" value="11"/>
      <result_data type="value" value="21"/>
      <result_data type="value" value="13"/>
      <result_data type="value" value="12"/>
      <result_data type="value" value="15"/>
   </result>
</sample>

where the results need to be at the bottom. Any help will be appreciated.

  • Possible duplicate of my question: stackoverflow.com/questions/14878706/… Commented Apr 10, 2013 at 9:27
  • Your sample XML files are malformed, and yeah, it's a duplicate Commented Apr 10, 2013 at 9:27
  • Why are they malformed? Commented Apr 10, 2013 at 9:30
  • <sample="1"> is not valid XML. Anyway, this is a duplicate question, so read the answer in that one. Commented Apr 10, 2013 at 9:35

2 Answers

Although this is mostly a duplicate and the answer can be found in the linked duplicate, I had already written this, so I can share the Python code:

import glob
from xml.etree import ElementTree

def run(files):
    xml_files = glob.glob(files + "/*.xml")
    xml_element_tree = None
    insertion_point = None
    for xml_file in xml_files:
        data = ElementTree.parse(xml_file).getroot()
        # Iterate over the <results> containers, not the individual <result> items
        for results in data.iter('results'):
            if xml_element_tree is None:
                # The first file becomes the container for everything else
                xml_element_tree = data
                insertion_point = xml_element_tree.findall("./results")[0]
            else:
                # extend() appends the children of <results>, i.e. each <result>
                insertion_point.extend(results)
    if xml_element_tree is not None:
        # decode() keeps the print readable on both Python 2 and Python 3
        print(ElementTree.tostring(xml_element_tree).decode())
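
The difference from the code in the question is where `extend` is called and what is passed to it: it is called on the `results` element of the first file, and it receives the `results` element of each later file, so the `result` children are what get copied. A minimal sketch of that behaviour, using two in-memory documents (the inline XML and the variable names are illustrative assumptions, not taken from the files above):

```python
from xml.etree import ElementTree

# Two cut-down documents standing in for A.xml and B.xml (illustrative only)
a = ElementTree.fromstring(
    '<sample id="1"><results><result type="T"/></results></sample>')
b = ElementTree.fromstring(
    '<sample id="1"><results><result type="Q"/></results></sample>')

# extend() appends the *children* of its argument, so passing b's <results>
# copies its <result> elements into a's <results>.
a.find("results").extend(b.find("results"))

merged_types = [r.get("type") for r in a.find("results")]
print(merged_types)  # ['T', 'Q']
```

Calling `extend` on the root with a single `result`, as in the question, would instead attach that element's `result_data` children directly under `sample`.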

However, this question contains another problem not present in the other post. The sample XML files are not valid XML: an element name cannot be followed directly by a value, so a tag like this will not parse:

<sample="1">
    ...
</sample>

Instead, give the value an attribute name, for example:

<sample id="1">
    ...
</sample>
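
One quick way to check this is to hand both forms to the parser and see which one it accepts. The helper below is only a sketch; `is_well_formed` is a hypothetical name introduced for illustration, not part of `ElementTree`:

```python
from xml.etree import ElementTree

def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ElementTree.fromstring(text)
        return True
    except ElementTree.ParseError:
        return False

bad_ok = is_well_formed('<sample="1"></sample>')      # value without a name
good_ok = is_well_formed('<sample id="1"></sample>')  # value as an attribute
print(bad_ok, good_ok)  # False True
```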

You could try this solution:

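import glob
from xml.etree import ElementTree

def newRunRun(folder):
    xml_files = glob.glob(folder + "/*.xml")
    node = None
    for xmlFile in xml_files:
        tree = ElementTree.parse(xmlFile)
        root = tree.getroot()
        if node is None:
            # The first document is used as the container
            node = root
        else:
            elements = root.find("./results")
            # Iterate over the element itself instead of the private _children list
            for element in elements:
                # node[1] is the second child of <sample>, i.e. <results>
                node[1].append(element)
    # decode() keeps the print readable on both Python 2 and Python 3
    print(ElementTree.tostring(node).decode())

folder = "resources"
newRunRun(folder)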

As you can see, I'm using the first doc as a container and inserting the elements of the other docs into it. This is the output generated:

<sample id="1">
<workflow value="x" version="1" />
  <results>
   <result type="Q">
      <result_data type="value" value="11" />
      <result_data type="value" value="21" />
      <result_data type="value" value="13" />
      <result_data type="value" value="12" />
      <result_data type="value" value="15" />
    </result>
  <result type="T">
      <result_data type="value" value="19" />
      <result_data type="value" value="15" />
      <result_data type="value" value="14" />
      <result_data type="value" value="13" />
      <result_data type="value" value="12" />
    </result>
  </results>
</sample>

Tested with Python 2.7.15.
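
Note that `glob.glob` does not guarantee any particular file order, which is probably why the Q results come before the T results in the output above. A minimal Python 3 sketch that sorts the file names first and looks up `results` by name rather than by position could look like this (`merge_results` is a hypothetical name, and the sketch assumes every file has the same root and a single `results` child):

```python
import glob
import os
from xml.etree import ElementTree

def merge_results(folder):
    """Merge the <result> elements of every *.xml file in folder.

    Returns the root of the first file (in sorted name order) with the
    <result> elements of all later files appended to its <results>,
    or None if the folder contains no XML files.
    """
    merged_root = None
    for path in sorted(glob.glob(os.path.join(folder, "*.xml"))):
        root = ElementTree.parse(path).getroot()
        if merged_root is None:
            # The first file becomes the container for the others
            merged_root = root
            continue
        target = merged_root.find("results")
        source = root.find("results")
        if target is not None and source is not None:
            # Copy each <result> child of this file into the container
            target.extend(source)
    return merged_root
```

With the two files from the question in a folder named "resources", `ElementTree.tostring(merge_results("resources"), encoding="unicode")` should give the T block followed by the Q block.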
