Creating dataframe from xml

Question

I have an XML document that I want to parse into a dataframe. What I have tried so far is something like this:

all_dicts = []
fields = ['f1','f2','f3','f4','f5','f6','f7']
for i in root.findall('.//item'):
    d = {}
    for j in i.findall('.//subitems'):
        for k in j.findall('.//subitem'):
            if k.attrib['name'] in fields:
                d[k.attrib['name']] = k.text

    all_dicts.append(d)

This gives me a list of dictionaries that I can pass straight to pd.DataFrame(all_dicts) to get what I want. However, the subitems often contain several sub-elements with the same name. For example, k.attrib['name'] == 'f1' can be true several times within one item, so each occurrence writes to the same dictionary key and overwrites the previous value, when I need all of them. Is there an easy way to create such a data frame?

Trenton McKinney · Accepted Answer · 2020-06-08 17:51:40Z

Use dict.get to check whether the key already exists:
- If the key does not exist, add it with a one-element list as its value.
- If the key does exist, append to that list.
Without a comprehensive example of the XML, I can't offer a more detailed example.

all_dicts = []
fields = ['f1','f2','f3','f4','f5','f6','f7']
for i in root.findall('.//item'):
    d = dict()
    for j in i.findall('.//subitems'):  # search within the current item
        for k in j.findall('.//subitem'):
            n = k.attrib['name']
            if n in fields:
                if d.get(n) is None:  # key not seen yet for this item
                    d[n] = [k.text]  # start a new list for the key
                else:
                    d[n].append(k.text)  # append to the existing list

    all_dicts.append(d)
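
Since the question does not show the XML, here is a minimal self-contained sketch of the same idea run against a made-up document. The sample XML, its tag layout, and the field values are assumptions for illustration only. It uses dict.setdefault, which is a shorter way of writing the get/append check above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real XML layout is not shown in the question.
xml_text = """
<root>
  <item>
    <subitems>
      <subitem name="f1">a</subitem>
      <subitem name="f1">b</subitem>
      <subitem name="f2">c</subitem>
      <subitem name="other">ignored</subitem>
    </subitems>
  </item>
  <item>
    <subitems>
      <subitem name="f1">d</subitem>
      <subitem name="f3">e</subitem>
    </subitems>
  </item>
</root>
"""

root = ET.fromstring(xml_text)
fields = ['f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7']

all_dicts = []
for i in root.findall('.//item'):
    d = {}
    for j in i.findall('.//subitems'):
        for k in j.findall('.//subitem'):
            n = k.attrib['name']
            if n in fields:
                # setdefault inserts an empty list the first time a key
                # is seen, then returns the list so we can append to it
                d.setdefault(n, []).append(k.text)
    all_dicts.append(d)

print(all_dicts)
# [{'f1': ['a', 'b'], 'f2': ['c']}, {'f1': ['d'], 'f3': ['e']}]
```

The resulting list can be passed to pd.DataFrame(all_dicts) as in the question; each cell then holds a list, and fields missing from an item come out as NaN.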

Alternatively, only add the dict value as a list, if the field is 'f1'.

all_dicts = []
fields = ['f1','f2','f3','f4','f5','f6','f7']
for i in root.findall('.//item'):
    d = dict()
    for j in i.findall('.//subitems'):  # search within the current item
        for k in j.findall('.//subitem'):
            n = k.attrib['name']
            if n in fields and n == 'f1':  # 'f1' can repeat, so collect a list
                if d.get(n) is None:  # key not seen yet for this item
                    d[n] = [k.text]  # start a new list for the key
                else:
                    d[n].append(k.text)  # append to the existing list
            elif n in fields:  # any other field keeps a single text value
                d[n] = k.text

    all_dicts.append(d)
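
If more than one field can repeat, the same approach could be generalized with a set of multi-valued names. The sketch below is only an illustration: the helper item_to_row, the multi_fields parameter, and the sample element are hypothetical names and data, not part of the question.

```python
import xml.etree.ElementTree as ET

def item_to_row(item, fields, multi_fields):
    """Build one row dict for an <item>; names in multi_fields become lists."""
    row = {}
    for j in item.findall('.//subitems'):
        for k in j.findall('.//subitem'):
            n = k.attrib['name']
            if n not in fields:
                continue
            if n in multi_fields:
                row.setdefault(n, []).append(k.text)  # keep every occurrence
            else:
                row[n] = k.text  # single value; a repeat would overwrite it
    return row

# Hypothetical sample element, assumed to mirror the structure in the question.
item = ET.fromstring(
    '<item><subitems>'
    '<subitem name="f1">a</subitem>'
    '<subitem name="f1">b</subitem>'
    '<subitem name="f2">c</subitem>'
    '</subitems></item>'
)

row = item_to_row(item, fields={'f1', 'f2', 'f3'}, multi_fields={'f1'})
print(row)
# {'f1': ['a', 'b'], 'f2': 'c'}
```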
