I have a DataFrame in the following format, which I want to transform to XML:

Parameter Name     | Value | Comment
lev1.lev12         | 5     | "Comment 1"
lev1.lev13.lev14   | 10    | "Comment 2"
lev2.lev22         | "hi"  | "Comment 3"
lev2.lev23         | NaN   | "No need to set value"

The levels of the XML structure are defined by the Parameter Name, where each level is separated by ".". A comment should be written on a separate line before the actual key-value pair. If the value is NaN, both the comment and the empty element should be written as comments.

So the desired output here would be:

<lev1>
      <!-- Comment 1 -->
      <lev12>5</lev12>
      <lev13>
             <!-- Comment 2 -->
             <lev14>10</lev14>
      </lev13>
</lev1>
<lev2>
      <!-- Comment 3 -->
      <lev22> "hi" </lev22>
            
      <!-- No need to set value -->
      <!-- <lev23></lev23> --> 
</lev2>

I have written an initial function that iterates through the DataFrame, but I don't fully understand how to use ElementTree or lxml to build the XML structure.

import xml.etree.ElementTree as ET

def df_to_xml(row, etree):
    param = row['Parameter Name']
    val = row['Value']
    comment = row['Comment']

    param_levels = param.split(".")

    for level in param_levels[:-1]:
        ## With each level iterate down the tree structure.
        pass

    ## At the lowest level, add the comment and then the value

tree = ET.ElementTree()
df.apply(lambda x: df_to_xml(x, tree), axis=1)

# Write tree to xml.

How would I go about traversing the tree to the right level and adding the comment and value in the for loop?

I'd appreciate any tips or input.
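The traversal asked about here is usually written as a "find or create" walk: for every level except the last, look for a direct child with that tag under the current node and create it if it is missing; at the last level, append the comment and then the value element. The sketch below is one way to do that with `xml.etree.ElementTree`, assuming the column names from the table above, a `<data>` root element (an assumption, since an XML document needs a single root), and that NaN means "write the element as a comment". The helper name `add_row` is hypothetical. Because it loops over whatever `split(".")` returns, it is not tied to a fixed number of levels.

```python
import xml.etree.ElementTree as ET

import numpy as np
import pandas as pd


def add_row(root, param, value, comment):
    """Hypothetical helper: walk/create the dotted path, then add comment + value."""
    *parents, leaf = param.split(".")
    node = root
    for level in parents:
        child = node.find(level)  # first direct child with this tag, or None
        if child is None:
            child = ET.SubElement(node, level)
        node = child
    node.append(ET.Comment(f" {comment} "))
    if pd.isna(value):
        # NaN: serialize an empty <leaf></leaf> and store it inside a comment
        empty = ET.tostring(ET.Element(leaf), encoding="unicode", short_empty_elements=False)
        node.append(ET.Comment(f" {empty} "))
    else:
        ET.SubElement(node, leaf).text = str(value)


df = pd.DataFrame(
    {
        "Parameter Name": ["lev1.lev12", "lev1.lev13.lev14", "lev2.lev22", "lev2.lev23"],
        "Value": [5, 10, "hi", np.nan],
        "Comment": ["Comment 1", "Comment 2", "Comment 3", "No need to set value"],
    }
)

root = ET.Element("data")  # assumed root tag
for param, value, comment in zip(df["Parameter Name"], df["Value"], df["Comment"]):
    add_row(root, param, value, comment)

ET.indent(root, space="    ")  # pretty-printing, Python 3.9+
print(ET.tostring(root, encoding="unicode"))
```

Rows are processed in DataFrame order, so each comment lands directly before its element, as in the desired output.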

Answer

With the dataframe you provided:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Parameter Name": [
            "lev1.lev12",
            "lev1.lev13.lev14",
            "lev2.lev22",
            "lev2.lev23",
        ],
        "Value": [5, 10, "hi", np.nan],
        "Comment": ["Comment 1", "Comment 2", "Comment 3", "No need to set value"],
    }
)

Here is one way to do it:

# Setup
df["Parameter Name"] = df["Parameter Name"].apply(lambda x: x.split("."))
df[["A", "B", "C"]] = pd.DataFrame(df["Parameter Name"].tolist(), index=df.index)
df = (
    df.drop(columns="Parameter Name")
    .reindex(columns=["A", "B", "C", "Comment", "Value"])
    .set_index(["A", "B", "C"])
)

print(df)
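The setup above hardcodes three index columns (`A`, `B`, `C`), so it only fits parameter names with at most three levels. If the depth varies, pandas' `Series.str.split` with `expand=True` returns one column per level of the deepest name and pads shorter names with missing values. A sketch of just that setup step, with column names `level_0`, `level_1`, ... chosen here for illustration (note that the ElementTree loops in this answer still unpack exactly three index levels, so they would need to be generalized as well):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Parameter Name": ["lev1.lev12", "lev1.lev13.lev14", "lev2.lev22", "lev2.lev23"],
        "Value": [5, 10, "hi", np.nan],
        "Comment": ["Comment 1", "Comment 2", "Comment 3", "No need to set value"],
    }
)

# One column per level; shorter names are padded with missing values
levels = df["Parameter Name"].str.split(".", expand=True)
levels.columns = [f"level_{i}" for i in range(levels.shape[1])]

out = pd.concat([levels, df[["Comment", "Value"]]], axis=1).set_index(list(levels.columns))
print(out.index.nlevels)  # depth of the deepest parameter name → 3
```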

Then, using the ElementTree XML module from Python's standard library:

import xml.etree.ElementTree as ET

# Initialize an empty tree
tree = ET.fromstring("<?xml version='1.0'?><data></data>")

# Add levels
# pd.isna is used instead of `is np.nan`, which relies on object identity
for a, b, c in df.index:
    if tree.find(a) is None:
        elm_a = ET.SubElement(tree, a)
        elm_a.insert(0, ET.Comment(df.loc[(a, b, c), "Comment"]))
    else:
        elm_a = tree.find(a)
    # Create the second level only if it holds a real value (NaN rows become comments below)
    if (elm_a.find(b) is None) and not pd.isna(df.loc[(a, b, c), "Value"]):
        ET.SubElement(elm_a, b)

# Add comments and values
for a, b, c in df.index:
    comment = df.loc[(a, b, c), "Comment"]
    value = df.loc[(a, b, c), "Value"]

    if pd.isna(c) and not pd.isna(value):  # e.g. lev1.lev12
        elm = tree.find(a).find(b)
        elm.text = str(value)

    if pd.isna(c) and pd.isna(value):  # e.g. lev2.lev23
        elm = tree.find(a)
        elm.insert(0, ET.Comment(comment))
        elm.insert(1, ET.Comment(f"<{b}></{b}>"))

    if not pd.isna(c):  # e.g. lev1.lev13.lev14
        elm = tree.find(a).find(b)
        elm.insert(0, ET.Comment(comment))
        elm3 = ET.SubElement(elm, c)
        elm3.text = str(value)

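Printing the result:

ET.indent(tree, space="    ")  # Python 3.9+; without it, ET.dump prints everything on one line
ET.dump(tree)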
# Output
<data>
    <lev1>
        <!--Comment 1-->
        <lev12>5</lev12>
        <lev13>
            <!--Comment 2-->
            <lev14>10</lev14>
        </lev13>
    </lev1>
    <lev2>
        <!--No need to set value-->
        <!--<lev23></lev23>-->
        <!--Comment 3-->
        <lev22>hi</lev22>
    </lev2>
</data>
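`ET.dump` only writes to standard output (the Python docs describe it as a debugging aid), while the question's skeleton ends with a "Write tree to xml" step. One way to write to a file is to wrap the root element in an `ElementTree` and call its `write` method. In this sketch the small stand-in tree replaces the `tree` element built above, and the file name `params.xml` is a placeholder:

```python
import xml.etree.ElementTree as ET

# Small stand-in for the `tree` element built above
root = ET.Element("data")
lev1 = ET.SubElement(root, "lev1")
lev1.append(ET.Comment("Comment 1"))
ET.SubElement(lev1, "lev12").text = "5"

ET.indent(root, space="    ")  # optional pretty-printing, Python 3.9+
# xml_declaration=True prepends <?xml version='1.0' encoding='utf-8'?>
ET.ElementTree(root).write("params.xml", encoding="utf-8", xml_declaration=True)

with open("params.xml", encoding="utf-8") as f:
    content = f.read()
print(content)
```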
Comments

This is excellent. Is it possible to make the number of levels dynamic? Some days there may be more than 3 levels, and other days just 2.
Hi, I'm not sure I understand what you are asking. Since this question has been properly answered, I think, you should post a separate question with the new requirements, base conditions, and expected output; it will be easier to help. In any case, please consider accepting and upvoting the answer if you found it helpful. Cheers.