I have an XML file that I want to parse into a pandas DataFrame. So far I have been trying something like this:

all_dicts = []
fields = ['f1','f2','f3','f4','f5','f6','f7']
for i in root.findall('.//item'):
    d = {}
    for j in i.findall('.//subitems'):
        for k in j.findall('.//subitem'):
            if k.attrib['name'] in fields:
                d[k.attrib['name']] = k.text

    all_dicts.append(d)

This gives me a list of dictionaries that I can pass straight to pd.DataFrame(all_dicts) to get what I want. However, an item often contains several subitem elements with the same name. For example, k.attrib['name'] == 'f1' can be true more than once within a single item, so each new value is stored under the same key and overwrites the previous one, when I need all of them. Is there an easy way to build such a DataFrame?

1 Answer

  • Use dict.get to check whether the key already exists in the dict
    • If the key does not exist, add it with a one-element list as its value
    • If the key does exist, append to that list
  • Without a representative sample of the XML, I can't offer a more detailed example.
all_dicts = []
fields = ['f1','f2','f3','f4','f5','f6','f7']
for i in root.findall('.//item'):
    d = dict()
    for j in i.findall('.//subitems'):  # search within the current item
        for k in j.findall('.//subitem'):
            n = k.attrib['name']
            if n in fields:
                if d.get(n) is None:  # key not seen yet for this item
                    d[n] = [k.text]  # start a new list for this key
                else:
                    d[n].append(k.text)  # append to the existing list

    all_dicts.append(d)
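For reference, here is a minimal self-contained sketch of the same idea. The sample XML is hypothetical, since the question does not show the real layout, and collect_items is a name made up for illustration. It uses dict.setdefault, which does the "create the list if missing, then append" step in one call.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the real XML layout is an assumption.
SAMPLE_XML = """
<root>
  <item>
    <subitems>
      <subitem name="f1">a</subitem>
      <subitem name="f1">b</subitem>
      <subitem name="f2">c</subitem>
      <subitem name="zz">ignored</subitem>
    </subitems>
  </item>
  <item>
    <subitems>
      <subitem name="f3">d</subitem>
    </subitems>
  </item>
</root>
"""


def collect_items(root, fields):
    """Return one dict per <item>, mapping each wanted name to a list of texts."""
    wanted = set(fields)
    all_dicts = []
    for item in root.findall('.//item'):
        d = {}
        for subitems in item.findall('.//subitems'):
            for sub in subitems.findall('.//subitem'):
                name = sub.attrib.get('name')
                if name in wanted:
                    # setdefault returns the existing list, or inserts a new one
                    d.setdefault(name, []).append(sub.text)
        all_dicts.append(d)
    return all_dicts


root = ET.fromstring(SAMPLE_XML)
all_dicts = collect_items(root, ['f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7'])
print(all_dicts)
# → [{'f1': ['a', 'b'], 'f2': ['c']}, {'f3': ['d']}]
```

Passing this result to pd.DataFrame(all_dicts) should give one row per item with a list in each populated cell. If one row per value is needed instead, DataFrame.explode on the relevant column is one option.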
  • Alternatively, store the value as a list only when the field is 'f1'. Every other field keeps a single value, so a repeated name still overwrites the earlier one.
all_dicts = []
fields = ['f1','f2','f3','f4','f5','f6','f7']
for i in root.findall('.//item'):
    d = dict()
    for j in i.findall('.//subitems'):  # search within the current item
        for k in j.findall('.//subitem'):
            n = k.attrib['name']
            if n in fields and n == 'f1':  # 'f1' values are collected in a list
                if d.get(n) is None:  # key not seen yet for this item
                    d[n] = [k.text]  # start a new list for this key
                else:
                    d[n].append(k.text)  # append to the existing list
            elif n in fields:  # any other wanted field stores only the text
                d[n] = k.text

    all_dicts.append(d)
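If more than one field can repeat, the 'f1' check can be generalized to a set of multi-valued names. The following is only a sketch under assumptions: collect_mixed and multi_fields are hypothetical names, and the sample XML is invented because the real structure is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; 'f1' repeats and should keep every value,
# 'f2' repeats but only the last value is kept.
SAMPLE_XML = """
<root>
  <item>
    <subitems>
      <subitem name="f1">a</subitem>
      <subitem name="f1">b</subitem>
      <subitem name="f2">c</subitem>
      <subitem name="f2">d</subitem>
    </subitems>
  </item>
</root>
"""


def collect_mixed(root, fields, multi_fields):
    """One dict per <item>: lists for multi_fields, plain text for the rest."""
    wanted = set(fields)
    multi = set(multi_fields)
    all_dicts = []
    for item in root.findall('.//item'):
        d = {}
        for subitems in item.findall('.//subitems'):
            for sub in subitems.findall('.//subitem'):
                name = sub.attrib.get('name')
                if name not in wanted:
                    continue  # skip names we were not asked for
                if name in multi:
                    d.setdefault(name, []).append(sub.text)  # keep all values
                else:
                    d[name] = sub.text  # last value wins
        all_dicts.append(d)
    return all_dicts


rows = collect_mixed(ET.fromstring(SAMPLE_XML), ['f1', 'f2'], multi_fields={'f1'})
print(rows)
# → [{'f1': ['a', 'b'], 'f2': 'd'}]
```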