Python extract nodes containing tag using ElementTree

Question

I need to extract a few nodes from an XML file if one of them contains a keyword. I have got to the point where the keyword is printed when it is found. Now comes the tricky part (at least for me ;-)), which I explain in more detail below. XML:

<?xml version="1.0"?>
<ItemSearchResponse xmlns="http://url">
  <Items>
    <Item>
      <ItemAttributes>
        <ListPrice>
          <Amount>2260</Amount>
        </ListPrice>
      </ItemAttributes>
      <Offers>
        <Offer>
          <OfferListing>
            <Price>
              <Amount>1853</Amount>
            </Price>
          </OfferListing>
        </Offer>
      </Offers>
      <Offers>
        <Offer>
          <OfferListing>
            <Price>
              <Amount>1853</Amount>
            </Price>
          </OfferListing>
        </Offer>
      </Offers>
      <Offers>
        <Offer>
          <OfferListing>
            <Price>
              <Amount>1200</Amount>
            </Price>
          </OfferListing>
        </Offer>
      </Offers>
    </Item>
  </Items>
</ItemSearchResponse>

My script prints the Amount value if it is found and equals 1853. What I actually need is this: when 1853 is found, the script should extract the whole <Offers> element to a new file. I got the script running and then got stuck. I have no idea how to get back from <Amount> and copy the whole <Offers> group.

Script 1:

import xml.etree.ElementTree as ET
import sys

name = str.strip(sys.argv[1])
filename = str.strip(sys.argv[2])

fp = open("sample.xml","r")
element = ET.parse(fp)

for elem in element.iter():
    if elem.tag == '{http://url}Price':
        output = {}
        for elem1 in list(elem):
            if elem1.tag == '{http://url}Amount':
                if elem1.text == name:
                    output['Amount'] = elem1.text
                    print(output)

And my output:

python sample1.py '1853' x
{'Amount': '1853'}
{'Amount': '1853'}

The 'x' argument is not relevant here.

How do I get back from <Amount> and copy the whole <Offers> group to a new file, or just print it out? It needs to be done with ElementTree.

Only ElementTree? I ask because this package, pythonhosted.org/pyquery, is fun for doing this kind of thing; it is a jQuery-like system.
– Philippe T., Commented Sep 5, 2013 at 10:06

Community · Accepted Answer · 2017-05-23 12:05:11Z

3

What about this:

import xml.etree.ElementTree as ET
import sys

name = str.strip(sys.argv[1])
filename = str.strip(sys.argv[2])

fp = open("sample.xml","r")
tree = ET.parse(fp)
root = tree.getroot()

for offers in root.findall('.//{http://url}Offers'):
    value_found = False
    for amount in offers.findall('.//{http://url}Amount'):
        if amount.text == name:
            value_found = True
            break
    if value_found:
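        # Python 3: encoding='unicode' returns str; on Python 2 use: print ET.tostring(offers)
        print(ET.tostring(offers, encoding='unicode'))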

Prints

<url:Offers xmlns:url="http://url">
    <url:Offer>
      <url:OfferListing>
        <url:Price>
          <url:Amount>1853</url:Amount>
        </url:Price>
      </url:OfferListing>
    </url:Offer>
  </url:Offers>

<url:Offers xmlns:url="http://url">
    <url:Offer>
      <url:OfferListing>
        <url:Price>
          <url:Amount>1853</url:Amount>
        </url:Price>
      </url:OfferListing>
    </url:Offer>
  </url:Offers>
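
For readers on Python 3, the same top-down search can be sketched as a small self-contained function. This is only an illustration under a few assumptions: the XML is inlined in a trimmed form instead of being read from sample.xml, and the names `find_offers`, `NS`, and `SAMPLE` are made up for this example.

```python
import xml.etree.ElementTree as ET

NS = "http://url"  # namespace URI taken from the sample document

# Trimmed copy of the sample XML (assumption: inlined here instead of sample.xml)
SAMPLE = """<?xml version="1.0"?>
<ItemSearchResponse xmlns="http://url">
  <Items>
    <Item>
      <Offers><Offer><OfferListing><Price><Amount>1853</Amount></Price></OfferListing></Offer></Offers>
      <Offers><Offer><OfferListing><Price><Amount>1853</Amount></Price></OfferListing></Offer></Offers>
      <Offers><Offer><OfferListing><Price><Amount>1200</Amount></Price></OfferListing></Offer></Offers>
    </Item>
  </Items>
</ItemSearchResponse>"""


def find_offers(root, wanted):
    """Return every <Offers> element that has a descendant <Amount> equal to `wanted`."""
    offers_tag = "{%s}Offers" % NS
    amount_tag = "{%s}Amount" % NS
    return [
        offers
        for offers in root.iter(offers_tag)
        if any(amount.text == wanted for amount in offers.iter(amount_tag))
    ]


root = ET.fromstring(SAMPLE)
matches = find_offers(root, "1853")
for offers in matches:
    # encoding="unicode" makes tostring() return str rather than bytes
    print(ET.tostring(offers, encoding="unicode"))
```

Searching from each <Offers> downwards avoids the need to walk back up from <Amount>, which ElementTree does not support directly.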

To write to files, you can do something like: (borrowed from this answer)

for i, offers in enumerate(root.findall('.//{http://url}Offers'), start=1):
    value_found = False
    for amount in offers.findall('.//{http://url}Amount'):
        if amount.text == name:
            value_found = True
            break
    if value_found:
        tree = ET.ElementTree(offers)
        tree.write("offers%d.xml" % i,
           xml_declaration=True, encoding='utf-8',
           method="xml", default_namespace='http://url')

which writes files like:

<?xml version='1.0' encoding='utf-8'?>
<Offers xmlns="http://url">
    <Offer>
      <OfferListing>
        <Price>
          <Amount>1853</Amount>
        </Price>
      </OfferListing>
    </Offer>
  </Offers>
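
The question asked how to get back from an <Amount> to its enclosing <Offers>. ElementTree elements do not keep a reference to their parent, so one alternative is to build a child-to-parent map first and then walk upwards. The sketch below assumes Python 3 and an inlined, trimmed copy of the sample XML; `parent_of`, `enclosing`, and `found` are names invented for this example.

```python
import xml.etree.ElementTree as ET

NS = "http://url"  # namespace URI taken from the sample document

# Trimmed copy of the sample XML (assumption: inlined here instead of sample.xml)
SAMPLE = """<?xml version="1.0"?>
<ItemSearchResponse xmlns="http://url">
  <Items>
    <Item>
      <ItemAttributes><ListPrice><Amount>2260</Amount></ListPrice></ItemAttributes>
      <Offers><Offer><OfferListing><Price><Amount>1853</Amount></Price></OfferListing></Offer></Offers>
      <Offers><Offer><OfferListing><Price><Amount>1853</Amount></Price></OfferListing></Offer></Offers>
      <Offers><Offer><OfferListing><Price><Amount>1200</Amount></Price></OfferListing></Offer></Offers>
    </Item>
  </Items>
</ItemSearchResponse>"""

root = ET.fromstring(SAMPLE)

# Map every child element to its parent; the root has no entry.
parent_of = {child: parent for parent in root.iter() for child in parent}


def enclosing(elem, tag):
    """Walk upwards from `elem` to the nearest ancestor-or-self with `tag`, or None."""
    while elem is not None and elem.tag != tag:
        elem = parent_of.get(elem)
    return elem


found = []
for amount in root.iter("{%s}Amount" % NS):
    if amount.text == "1853":
        offers = enclosing(amount, "{%s}Offers" % NS)
        # An <Amount> outside any <Offers> (e.g. under ListPrice) yields None.
        if offers is not None and offers not in found:
            found.append(offers)

print(len(found))
```

Building the map costs one extra pass over the tree, so the top-down loop shown earlier is usually simpler when the container tag is known in advance.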

answered Sep 5, 2013 at 10:28 by paul trmbrth

edited May 23, 2017 at 12:05 by Community Bot


7 Comments

jakkolwiek Over a year ago

It's not like that. I'm looking for the Offers with 1853 in Amount. If found, I need to extract the whole <Offers> with its child nodes to a new file. So, when 1853 is given, two groups should be extracted: <Offers><Offer><OfferListing><Price><Amount>1853</Amount></Price></OfferListing></Offer></Offers><Offers><Offer><OfferListing><Price><Amount>1853</Amount></Price></OfferListing></Offer></Offers>. I also thought about xml.dom, but I'm not sure if I'm thinking about this the right way.

paul trmbrth Over a year ago

My bad. I removed the 2nd break and called ET.tostring(offers)

jakkolwiek Over a year ago

Yep, this is just perfect! I see I still need to learn about enumerate to fully understand it, but thank you a lot! This is a great help!

paul trmbrth Over a year ago

@jakkolwiek, enumerate() is simply a very neat helper to count in loops. My greatest discovery lately was the "start" parameter ;)
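
As a small illustration of the "start" parameter mentioned here, the snippet below numbers items from 1 instead of 0, which is how the answer builds file names like offers1.xml. The `names` list is placeholder data for the example.

```python
names = ["first", "second", "third"]  # placeholder data for illustration

# By default enumerate() counts from 0.
default_pairs = list(enumerate(names))

# With start=1 the counter begins at 1, which suits human-friendly file names.
filenames = ["offers%d.xml" % i for i, _ in enumerate(names, start=1)]

print(default_pairs)
print(filenames)
```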

jakkolwiek Over a year ago

Actually I still don't get one thing... Let's say my source XML contains about 300 <Amount> tags with the value 1853. They are all printed nicely in the terminal, but only the last one is written to the file. I've also tried to stream the strings to a file but still can't get it right. Everything is fine in the terminal, but only the last record ends up in the file.
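
The code that writes the file is not shown in this comment, so the cause can only be guessed. One common reason for seeing only the last record is writing to the same file name (or reopening it in write mode) on every loop iteration, so each match overwrites the previous one. A possible way around it, sketched below for Python 3, is to collect all matching <Offers> under a single wrapper element and write once. The wrapper name `MatchingOffers`, the output name all_offers.xml, and the temporary directory are assumptions made for this example.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

NS = "http://url"  # namespace URI taken from the sample document

# Trimmed copy of the sample XML (assumption: inlined here instead of sample.xml)
SAMPLE = """<?xml version="1.0"?>
<ItemSearchResponse xmlns="http://url">
  <Items>
    <Item>
      <Offers><Offer><OfferListing><Price><Amount>1853</Amount></Price></OfferListing></Offer></Offers>
      <Offers><Offer><OfferListing><Price><Amount>1853</Amount></Price></OfferListing></Offer></Offers>
      <Offers><Offer><OfferListing><Price><Amount>1200</Amount></Price></OfferListing></Offer></Offers>
    </Item>
  </Items>
</ItemSearchResponse>"""

root = ET.fromstring(SAMPLE)

# Collect every <Offers> that contains an <Amount> equal to the wanted value.
matches = [
    offers
    for offers in root.iter("{%s}Offers" % NS)
    if any(a.text == "1853" for a in offers.iter("{%s}Amount" % NS))
]

# Hypothetical wrapper element. It is placed in the same namespace because
# default_namespace requires every element in the tree to be qualified.
wrapper = ET.Element("{%s}MatchingOffers" % NS)
wrapper.extend(matches)

# Write all matches in a single call, outside any loop.
out_path = os.path.join(tempfile.mkdtemp(), "all_offers.xml")
ET.ElementTree(wrapper).write(
    out_path, xml_declaration=True, encoding="utf-8", default_namespace=NS
)
print("wrote", len(matches), "matches to", out_path)
```

If one file per match is preferred instead, the file name has to change on each iteration, as the "offers%d.xml" % i pattern in the answer does.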
