I need to extract a few nodes from an XML file if one of them contains a keyword. I have got to the point where the keyword is printed when it is found. Now comes the tricky part (at least for me ;-)), which I explain in more detail below. XML:

<?xml version="1.0"?>
<ItemSearchResponse xmlns="http://url">
  <Items>
    <Item>
      <ItemAttributes>
        <ListPrice>
          <Amount>2260</Amount>
        </ListPrice>
      </ItemAttributes>
      <Offers>
        <Offer>
          <OfferListing>
            <Price>
              <Amount>1853</Amount>
            </Price>
          </OfferListing>
        </Offer>
      </Offers>
      <Offers>
        <Offer>
          <OfferListing>
            <Price>
              <Amount>1853</Amount>
            </Price>
          </OfferListing>
        </Offer>
      </Offers>
      <Offers>
        <Offer>
          <OfferListing>
            <Price>
              <Amount>1200</Amount>
            </Price>
          </OfferListing>
        </Offer>
      </Offers>
    </Item>
  </Items>
</ItemSearchResponse>

My script prints the Amount value if it is found and equals 1853. What I actually need is this: when 1853 is found, the script should extract the whole <Offers> element to a new file. The script runs, but I am stuck: I have no idea how to get back up from <Amount> and copy the whole <Offers> group.

Script 1:

import xml.etree.ElementTree as ET
import sys

name = sys.argv[1].strip()      # Amount value to look for, e.g. 1853
filename = sys.argv[2].strip()  # output file name (not used yet)

# parse() accepts a file name directly, so no file handle is left open
element = ET.parse("sample.xml")

for elem in element.iter():
    if elem.tag == '{http://url}Price':
        output = {}
        for elem1 in list(elem):
            if elem1.tag == '{http://url}Amount':
                if elem1.text == name:
                    output['Amount'] = elem1.text
                    print(output)

And my output:

python sample1.py '1853' x
{'Amount': '1853'}
{'Amount': '1853'}

The 'x' argument is not relevant here.

How can I get back up from <Amount> and copy the whole <Offers> group to a new file, or just print it out? It needs to be done with ElementTree.

  • Only ElementTree? The pyquery package (pythonhosted.org/pyquery) is fun for this kind of thing; it is a jQuery-like system. Commented Sep 5, 2013 at 10:06
  • I'm limited to the standard library here :/ Commented Sep 5, 2013 at 10:22

1 Answer

What about this:

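import xml.etree.ElementTree as ET
import sys

name = sys.argv[1].strip()      # Amount value to look for
filename = sys.argv[2].strip()  # not used in this snippet

# parse() accepts a file name directly
tree = ET.parse("sample.xml")
root = tree.getroot()

# Check every <Offers> group for a matching <Amount> anywhere below it
for offers in root.findall('.//{http://url}Offers'):
    value_found = False
    for amount in offers.findall('.//{http://url}Amount'):
        if amount.text == name:
            value_found = True
            break
    if value_found:
        # tostring() returns bytes on Python 3, so decode before printing
        print(ET.tostring(offers).decode())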

Prints

<url:Offers xmlns:url="http://url">
    <url:Offer>
      <url:OfferListing>
        <url:Price>
          <url:Amount>1853</url:Amount>
        </url:Price>
      </url:OfferListing>
    </url:Offer>
  </url:Offers>

<url:Offers xmlns:url="http://url">
    <url:Offer>
      <url:OfferListing>
        <url:Price>
          <url:Amount>1853</url:Amount>
        </url:Price>
      </url:OfferListing>
    </url:Offer>
  </url:Offers>
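
As a side note on the literal question ("how to get back from <Amount>"): ElementTree elements keep no reference to their parent, so one common workaround is to build a child-to-parent lookup once and walk upwards from each match. The following is only a sketch written for Python 3; the helper name `find_offers_by_amount` and the shortened inline sample are assumptions made for illustration, not part of the original scripts.

```python
import xml.etree.ElementTree as ET

NS = '{http://url}'

# Shortened stand-in for sample.xml (assumption: same structure as above)
SAMPLE = """<ItemSearchResponse xmlns="http://url"><Items><Item>
<ItemAttributes><ListPrice><Amount>2260</Amount></ListPrice></ItemAttributes>
<Offers><Offer><OfferListing><Price><Amount>1853</Amount></Price></OfferListing></Offer></Offers>
<Offers><Offer><OfferListing><Price><Amount>1853</Amount></Price></OfferListing></Offer></Offers>
<Offers><Offer><OfferListing><Price><Amount>1200</Amount></Price></OfferListing></Offer></Offers>
</Item></Items></ItemSearchResponse>"""


def find_offers_by_amount(root, wanted):
    """Return every <Offers> element that has an <Amount> equal to `wanted`."""
    # Elements store no parent pointer, so map each child to its parent once.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    matches = []
    for amount in root.iter(NS + 'Amount'):
        if amount.text != wanted:
            continue
        # Climb from <Amount> until an <Offers> ancestor is reached;
        # node becomes None if the root is passed without finding one.
        node = amount
        while node is not None and node.tag != NS + 'Offers':
            node = parent_of.get(node)
        if node is not None and node not in matches:
            matches.append(node)
    return matches


root = ET.fromstring(SAMPLE)
for offers in find_offers_by_amount(root, '1853'):
    print(ET.tostring(offers, encoding='unicode'))
```

An <Amount> that does not sit under any <Offers> (such as the ListPrice one) is skipped, because the upward walk runs past the root without finding a match.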

To write to files, you can do something like this (borrowed from this answer):

for i, offers in enumerate(root.findall('.//{http://url}Offers'), start=1):
    value_found = False
    for amount in offers.findall('.//{http://url}Amount'):
        if amount.text == name:
            value_found = True
            break
    if value_found:
        tree = ET.ElementTree(offers)
        tree.write("offers%d.xml" % i,
           xml_declaration=True, encoding='utf-8',
           method="xml", default_namespace='http://url')

which writes files like:

<?xml version='1.0' encoding='utf-8'?>
<Offers xmlns="http://url">
    <Offer>
      <OfferListing>
        <Price>
          <Amount>1853</Amount>
        </Price>
      </OfferListing>
    </Offer>
  </Offers>
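
If all matches should end up in one file instead of one file per match, a possible approach is to append every matching <Offers> to a single wrapper element and write that once, since a well-formed XML document needs exactly one root. This is a sketch under assumptions: the wrapper tag `MatchingOffers` and both function names are invented for illustration, and the wrapper is placed in the same namespace so that `default_namespace` can be used on write, as in the snippet above.

```python
import xml.etree.ElementTree as ET

NS = 'http://url'


def collect_matching_offers(root, wanted):
    """Return a wrapper element holding every <Offers> that contains `wanted`."""
    # 'MatchingOffers' is a hypothetical wrapper name, not part of the source XML.
    wrapper = ET.Element('{%s}MatchingOffers' % NS)
    for offers in root.iter('{%s}Offers' % NS):
        if any(a.text == wanted for a in offers.iter('{%s}Amount' % NS)):
            wrapper.append(offers)
    return wrapper


def write_matching_offers(source_path, target_path, wanted):
    """Write all matching <Offers> groups to one file; return how many matched."""
    root = ET.parse(source_path).getroot()
    wrapper = collect_matching_offers(root, wanted)
    # The file is written once, after the loop, so no match overwrites another.
    ET.ElementTree(wrapper).write(target_path, xml_declaration=True,
                                  encoding='utf-8', default_namespace=NS)
    return len(wrapper)
```

Calling `write_matching_offers('sample.xml', 'offers.xml', '1853')` would then put both matching groups into `offers.xml`. Opening or writing the output inside the loop under a fixed file name is a typical reason for ending up with only the last record.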

7 Comments

It's not quite like that. I'm looking for the Offers with 1853 in Amount. If found, I need to extract the whole <Offers> element with its child nodes to a new file. So when 1853 is given, two groups should be extracted: <Offers><Offer><OfferListing><Price><Amount>1853</Amount></Price></OfferListing></Offer></Offers><Offers><Offer><OfferListing><Price><Amount>1853</Amount></Price></OfferListing></Offer></Offers>. I also thought about xml.dom, but I'm not sure whether I'm thinking along the right lines here.
My bad. I removed the 2nd break and called ET.tostring(offers).
Yep, this is just perfect! I see I still need to learn about enumerate to fully understand it, but thank you a lot! This is a great help!
@jakkolwiek, enumerate() is simply a very neat helper for counting in loops. My greatest discovery lately was the "start" parameter ;)
Actually, I still don't get one thing... Let's say my source XML has around 300 <Amount> tags with the value 1853. Everything is printed nicely in the terminal, but only the last tag is written to the file. I've also tried to stream the strings to a file but still can't get it right: the terminal output is fine, but only the last record ends up in the file.