7

I'm certain this type of question has been asked before, but I can't seem to find the right set of words to search for the answer myself...

I've got an XML file, for example:

<document>
   <page>
      <title>title1</title>
      <id>1</id>
      <text>this is text1</text>
   </page>
   <page>
      <title>title2</title>
      <id>2</id>
      <text>this is text2</text>
   </page>
   <page>
      <title>title3</title>
      <id>3</id>
      <comment>random comment</comment>
      <text>this is text3</text>
   </page>
</document>

I am trying to find a way to, ideally, store the value inside each tag in an array.

I originally tried just printing everything with the code below, but that only works until a page contains the extra <comment> tag, which throws off the indexing. So, is there a way to simply get the text from a tag by name? Or is there an absolute need to know the array index?

import xml.etree.ElementTree as ET
tree = ET.parse('./xml_file.xml')
root = tree.getroot()

for child in root:
    print(child[2].text)

I apologize if this is a common question; I really couldn't find any answers online.

3 Answers

7
import xml.etree.ElementTree as ET
tree = ET.parse('./all_foods.xml')
my_text = [item.text for item in tree.iter()]

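This will give you a list with the text of every element in the tree. Note that container elements such as <document> and <page> contribute whitespace-only strings. If you want some specific text, you can use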

my_tags = [item.text for item in tree.iter() if item.text == "title1"]
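
As a minimal sketch of dropping the whitespace-only entries, the comprehension can skip any element whose text is empty after stripping. The inline xml_data string and the leaf_text name are assumptions made for illustration; with a file on disk you would keep using ET.parse as above.

```python
import xml.etree.ElementTree as ET

# Inline sample standing in for the file on disk (an assumption for illustration).
xml_data = """<document>
   <page>
      <title>title1</title>
      <id>1</id>
      <text>this is text1</text>
   </page>
</document>"""

root = ET.fromstring(xml_data)

# Keep only elements whose text is more than whitespace,
# which skips the <document> and <page> containers.
leaf_text = [el.text.strip() for el in root.iter() if el.text and el.text.strip()]
print(leaf_text)  # ['title1', '1', 'this is text1']
```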

1 Comment

You saved my life! How do I find a value using the tag? For example, for id, I need the id value.
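
One way to do that, as a sketch rather than the answerer's own method: iter() accepts a tag name, so it can be restricted to the elements you care about. The inline xml_data and the ids name are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample of the question's document (an assumption for illustration).
xml_data = """<document>
   <page><title>title1</title><id>1</id></page>
   <page><title>title2</title><id>2</id></page>
</document>"""

root = ET.fromstring(xml_data)

# iter("id") walks the whole tree but yields only <id> elements.
ids = [el.text for el in root.iter("id")]
print(ids)  # ['1', '2']
```

The values come back as strings, so convert with int() if you need numbers.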
4

Since it sounds from your question like you're looking to get a specific key, you can simply use find(<key_name>).text to get the contents of the XML element with that name:

import xml.etree.ElementTree as ET
tree = ET.parse('./all_foods.xml')
root = tree.getroot()
for x in root:
    print(x.find("title").text)

>>>
   title1
   title2
   title3
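
If the goal is to store every value per page rather than print one field, a possible sketch is to build one dict per <page>, keyed by tag name, so the extra <comment> tag no longer matters. The inline xml_data and the pages name are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Two pages from the question, one with the extra <comment> tag
# (inline string is an assumption for illustration).
xml_data = """<document>
   <page>
      <title>title1</title>
      <id>1</id>
      <text>this is text1</text>
   </page>
   <page>
      <title>title3</title>
      <id>3</id>
      <comment>random comment</comment>
      <text>this is text3</text>
   </page>
</document>"""

root = ET.fromstring(xml_data)

# One dict per page, mapping each child tag to its text.
pages = [{child.tag: child.text for child in page} for page in root.findall("page")]

for page in pages:
    print(page["title"], page["text"])
```

Lookups are by name instead of position, and "comment" is simply absent from pages that do not have one.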

3 Comments

I'm not sure why, but this is giving me "TypeError: 'ElementTree' object is not iterable." Regardless, thanks for the help; I will look into this find function. In the meantime I got an alternate answer from nick_gabpe which works. I do appreciate this, though; more to learn is always great.
Sorry, it should be "for x in root", not "for x in tree". It'll work if you change that.
Does this still work? I appear to have an issue where x.find(string) returns None on items that don't match the search string, which then causes an error when trying to get the .text attribute, so it just crashes. I can't think of a nice way around this without a massive sequence of try/except blocks (I need to access several differently named tags in a single file).
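
One way around the None problem without try/except, sketched here as a suggestion: findtext() takes a default that is returned when the tag is missing. The inline xml_data and the comments name are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# One page without <comment>, one with it (inline string is an assumption).
xml_data = """<document>
   <page><title>title2</title><id>2</id></page>
   <page><title>title3</title><id>3</id><comment>random comment</comment></page>
</document>"""

root = ET.fromstring(xml_data)

comments = []
for page in root.findall("page"):
    # findtext returns the default instead of raising when the tag is absent.
    comments.append(page.findtext("comment", default=""))

print(comments)  # ['', 'random comment']
```

If you need to tell "missing" apart from "empty", leave the default as None and check for it explicitly.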
0

You can also use pandas read_xml():

import pandas as pd

xml_="""<document>
   <page>
      <title>title1</title>
      <id>1</id>
      <text>this is text1</text>
   </page>
   <page>
      <title>title2</title>
      <id>2</id>
      <text>this is text2</text>
   </page>
   <page>
      <title>title3</title>
      <id>3</id>
      <comment>random comment</comment>
      <text>this is text3</text>
   </page>
</document>"""

from io import StringIO  # pandas 2.1+ deprecates passing a literal XML string

df = pd.read_xml(StringIO(xml_), xpath="page")
print(df.to_string())

Output:

    title  id           text         comment
0  title1   1  this is text1            None
1  title2   2  this is text2            None
2  title3   3  this is text3  random comment
