I have parsed an XML file with BeautifulSoup in Python and I am having trouble extracting the data out of it. An example of the structure of the XML is below:

<Products page="0" pages="-1" records="27">
  <Product id="ABC001">
    <Name>This product name</Name>
    <Cur>USD</Cur>
    <Tag>Text</Tag>
    <Classes>
      <Class id="USD">
        <ClassCur>USD</ClassCur>
        <Identifier>XYZ123456</Identifier>
      </Class>
    </Classes>
  </Product>
  <Product id="XYZ002">
    <Name>That product name</Name>
    <Cur>EUR</Cur>
    <Tag>More Text</Tag>
    <Classes>
      <Class id="EUR">
        <ClassCur>EUR</ClassCur>
        <Identifier>VDSHG123456</Identifier>
      </Class>
    </Classes>
  </Product>
</Products>

The first thing I have been trying, and so far failing, to do is extract all of the Product and Class ids ("ABC001", "XYZ002", etc.).

What I have tried is:

products = soup.find_all("Product")

for p in products:
    print(p.find("name")) # gets the name tag
    print(p.find("cur")) # gets the cur tag
    # ...etc

However, I can't figure out how to access id within Product. For example, p.find("product") returns None.

Note that while I am using bs4 I don't have to - it's just that I have done a lot of web scraping with Python + bs4 and have found bs4 to be useful in navigating through HTML, so assumed it would be the ideal way of handling XML.

Show us the code you've tried so far. Commented Aug 31, 2016 at 18:03

1 Answer

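id is an attribute of the Product tag, not a child element. find() only searches a tag's descendants, which is why p.find("product") returns None. Attributes are read by indexing the tag like a dictionary: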

p['id']
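
To collect every Product and Class id in one pass, something along these lines should work. This is only a sketch under a few assumptions: the XML is trimmed from the question, the variable names (xml, product_ids, class_ids) are made up for illustration, and the built-in "html.parser" is assumed. HTML parsers lowercase tag names, so the lookups use "product" and "class"; with an XML parser (for example BeautifulSoup(xml, "xml"), which needs lxml installed) the original casing is kept and the lookups would be "Product" and "Class" instead.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the XML from the question (illustrative only).
xml = """<Products page="0" pages="-1" records="27">
  <Product id="ABC001">
    <Name>This product name</Name>
    <Classes>
      <Class id="USD">
        <Identifier>XYZ123456</Identifier>
      </Class>
    </Classes>
  </Product>
  <Product id="XYZ002">
    <Name>That product name</Name>
    <Classes>
      <Class id="EUR">
        <Identifier>VDSHG123456</Identifier>
      </Class>
    </Classes>
  </Product>
</Products>"""

# Assumption: html.parser, which lowercases tag names.
soup = BeautifulSoup(xml, "html.parser")
products = soup.find_all("product")

# Attributes are read with dictionary-style indexing on the tag.
product_ids = [p["id"] for p in products]

# Map each product id to the ids of the Class tags nested inside it.
class_ids = {p["id"]: [c["id"] for c in p.find_all("class")] for p in products}

print(product_ids)
print(class_ids)
```

If an attribute might be missing on some tags, p.get("id") returns None instead of raising KeyError.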