I need some help with my Python code for handling an XML file. I want to get the subtags, store them in lists, and process them. Until now my code worked because I assumed the XML structure was the same for every file I had, so I used the ElementTree library for parsing, then .findall(tagname), and then processed the resulting lists. But then I realized that some files have additional tags, so I don't get everything I need. To give you an idea:

<parent tag (same for every file)>
  <tag1>
    .....
  </tag1>
  <tag2>
    .....
  </tag2>
  <tag3>
    .....
  </tag3>
  <unknown tag1>
    .....
  </unknown tag1>
  <unknown tag2>
    .....
  </unknown tag2>
  <tag2>
    .....
  </tag2>
  <tag2>
    .....
  </tag2>
  <unknown tag1>
    .....
  </unknown tag1>
</parent tag>

So my current code is:

list1 = root.findall('tag1')
list2 = root.findall('tag2')
list3 = root.findall('tag3')

and then I process the contents of those tags, which works. I need help detecting every tag under the parent tag and storing the tag names in a list, so I can call findall() for each tag in the list. Something like:

list_of_tags = [tag1, tag2, tag3, unknown tag1, etc]

for tag in list_of_tags:

....

Thank you in advance!

I actually parse XML files with ElementTree like this:

import xml.etree.ElementTree as ET

try:
    tree = ET.parse(filename)
except IOError as e:
    print('No such file or directory')
else:
    root = tree.getroot()
2 Answers

You can use xmltodict:

pip install xmltodict

Here's how you can get all the child tags under a parent tag:

import xmltodict
my_xml = """<parent_tag>
  <tag1>
    .....
  </tag1>
  <tag2>
    .....
  </tag2>
  <tag3>
    .....
  </tag3>
  <unknown_tag1>
    .....
  </unknown_tag1>
  <unknown_tag2>
    .....
  </unknown_tag2>
  <tag2>
    .....
  </tag2>
  <tag2>
    .....
  </tag2>
  <unknown_tag1>
    .....
  </unknown_tag1>
</parent_tag>"""

xmld = xmltodict.parse(my_xml)

child_tags = xmld['parent_tag'].keys()

for child_tag in child_tags:
    print(child_tag)

The output will look like this:

tag1
tag2
tag3
unknown_tag1
unknown_tag2

6 Comments

First of all, thank you for your time. I actually parse the XML file with ElementTree and then get the root. I used your code and was thinking of using root instead of my_xml, but I get the error "must be string or read-only buffer, not Element". I tried the fromstring() method but got the same error. Can you help?
Can you please update your question to include how you are parsing the XML file?
I just did, sorry about that.
See if this helps you convert the element tree to a string: stackoverflow.com/questions/15304229/…
If you want to use ElementTree instead of xmltodict to do this, you may find this helpful as well: stackoverflow.com/questions/10408927/…
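The error quoted in the first comment arises because xmltodict.parse expects XML text rather than an Element. Here is a minimal sketch of the Element-to-string conversion, assuming Python 3 and using only the standard library. The sample XML and the name xml_text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document standing in for the parsed file.
root = ET.fromstring("<parent_tag><tag1>a</tag1><tag2>b</tag2></parent_tag>")

# Serialize the Element back to XML text.
# encoding="unicode" makes tostring() return a str instead of bytes.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The resulting string could then be passed to xmltodict.parse(xml_text) in place of my_xml.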

----- SOLUTION -----

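# Collect the tag name of every direct child of the root.
# (Element.getchildren() was removed in Python 3.9; iterate over root instead.)
tags = []
for child in root:
    tags.append(child.tag)

# Gather the matching elements for each tag name.
tagslist = []
for tag in tags:
    tagslist = tagslist + root.findall(tag)

# Remove duplicates while preserving order
tagslist = list(dict.fromkeys(tagslist))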
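An alternative sketch, assuming Python 3 and only the standard library: iterating over the root once and grouping the children by tag name gives the per-tag lists directly, without calling findall() for each name. The sample XML and the name children_by_tag are illustrative and do not come from the original post.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample mirroring the structure in the question.
root = ET.fromstring(
    "<parent_tag>"
    "<tag1>a</tag1><tag2>b</tag2><unknown_tag1>c</unknown_tag1><tag2>d</tag2>"
    "</parent_tag>"
)

# Map each tag name to the list of direct children carrying that tag.
children_by_tag = defaultdict(list)
for child in root:
    children_by_tag[child.tag].append(child)

# Each key is a tag name found under the root, known in advance or not.
for tag, elements in children_by_tag.items():
    print(tag, [el.text for el in elements])
```

The keys of children_by_tag play the role of list_of_tags from the question, and each value is the list that root.findall(tag) would have returned.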
