
This is my XML file:

<?xml version="1.0" ?>
<Items>
    <Item>
        <ASIN>3570102769</ASIN>
        <DetailPageURL>http://www.amazon.de/Inside-IS-Tage-Islamischen-Staat/dp/3570102769%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3D3570102769</DetailPageURL>
        <ItemLinks>
            <ItemLink>
                <Description>Add To Wishlist</Description>
                <URL>http://www.amazon.de/gp/registry/wishlist/add-item.html%3Fasin.0%3D3570102769%26SubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3570102769</URL>
            </ItemLink>
            <ItemLink>
                <Description>Tell A Friend</Description>
                <URL>http://www.amazon.de/gp/pdp/taf/3570102769%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3570102769</URL>
            </ItemLink>
            <ItemLink>
                <Description>All Customer Reviews</Description>
                <URL>http://www.amazon.de/review/product/3570102769%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3570102769</URL>
            </ItemLink>
            <ItemLink>
                <Description>All Offers</Description>
                <URL>http://www.amazon.de/gp/offer-listing/3570102769%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3570102769</URL>
            </ItemLink>
        </ItemLinks>
        <ItemAttributes>
            <Author>Jürgen Todenhöfer</Author>
            <Binding>Gebundene Ausgabe</Binding>
            <EAN>9783570102763</EAN>
            <EANList>
                <EANListElement>9783570102763</EANListElement>
            </EANList>
            <ISBN>3570102769</ISBN>
            <IsEligibleForTradeIn>1</IsEligibleForTradeIn>
            <ItemDimensions>
                <Height Units="hundredths-inches">874</Height>
                <Length Units="hundredths-inches">575</Length>
                <Width Units="hundredths-inches">126</Width>
            </ItemDimensions>
            <Label>C. Bertelsmann Verlag</Label>
            <Languages>
                <Language>
                    <Name>Deutsch</Name>
                    <Type>Published</Type>
                </Language>
                <Language>
                    <Name>Deutsch</Name>
                    <Type>Original</Type>
                </Language>
                <Language>
                    <Name>Deutsch</Name>
                    <Type>Unbekannt</Type>
                </Language>
            </Languages>
            <ListPrice>
                <Amount>1799</Amount>
                <CurrencyCode>EUR</CurrencyCode>
                <FormattedPrice>EUR 17,99</FormattedPrice>
            </ListPrice>
            <Manufacturer>C. Bertelsmann Verlag</Manufacturer>
            <ManufacturerMinimumAge Units="months">192</ManufacturerMinimumAge>
            <NumberOfPages>288</NumberOfPages>
            <PackageDimensions>
                <Height Units="hundredths-inches">118</Height>
                <Length Units="hundredths-inches">567</Length>
                <Weight Units="hundredths-pounds">93</Weight>
                <Width Units="hundredths-inches">252</Width>
            </PackageDimensions>
            <PackageQuantity>1</PackageQuantity>
            <ProductGroup>Book</ProductGroup>
            <ProductTypeName>ABIS_BOOK</ProductTypeName>
            <PublicationDate>2015-04-27</PublicationDate>
            <Publisher>C. Bertelsmann Verlag</Publisher>
            <Studio>C. Bertelsmann Verlag</Studio>
            <Title>Inside IS - 10 Tage im 'Islamischen Staat'</Title>
            <TradeInValue>
                <Amount>930</Amount>
                <CurrencyCode>EUR</CurrencyCode>
                <FormattedPrice>EUR 9,30</FormattedPrice>
            </TradeInValue>
        </ItemAttributes>
        <OfferSummary>
            <LowestNewPrice>
                <Amount>1799</Amount>
                <CurrencyCode>EUR</CurrencyCode>
                <FormattedPrice>EUR 17,99</FormattedPrice>
            </LowestNewPrice>
            <LowestUsedPrice>
                <Amount>1390</Amount>
                <CurrencyCode>EUR</CurrencyCode>
                <FormattedPrice>EUR 13,90</FormattedPrice>
            </LowestUsedPrice>
            <LowestCollectiblePrice>
                <Amount>4999</Amount>
                <CurrencyCode>EUR</CurrencyCode>
                <FormattedPrice>EUR 49,99</FormattedPrice>
            </LowestCollectiblePrice>
            <TotalNew>56</TotalNew>
            <TotalUsed>8</TotalUsed>
            <TotalCollectible>1</TotalCollectible>
            <TotalRefurbished>0</TotalRefurbished>
        </OfferSummary>
        <Offers>
            <TotalOffers>1</TotalOffers>
            <TotalOfferPages>1</TotalOfferPages>
            <MoreOffersUrl>http://www.amazon.de/gp/offer-listing/3570102769%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3570102769</MoreOffersUrl>
            <Offer>
                <OfferAttributes>
                    <Condition>New</Condition>
                </OfferAttributes>
                <OfferListing>
                    <OfferListingId>9KHCZj9qtL6ucVBPASfXaryQjU8tWbc0n%2F3F4F7GraOKW6Csji2OxpD93%2FkoHwgIGQctlnrtx4RWIeJULAcvvsFhiopFi08JdsZ%2FeO3u6g0%3D</OfferListingId>
                    <Price>
                        <Amount>1799</Amount>
                        <CurrencyCode>EUR</CurrencyCode>
                        <FormattedPrice>EUR 17,99</FormattedPrice>
                    </Price>
                    <Availability>Gewöhnlich versandfertig in 24 Stunden</Availability>
                    <AvailabilityAttributes>
                        <AvailabilityType>now</AvailabilityType>
                        <MinimumHours>0</MinimumHours>
                        <MaximumHours>0</MaximumHours>
                    </AvailabilityAttributes>
                    <IsEligibleForSuperSaverShipping>1</IsEligibleForSuperSaverShipping>
                </OfferListing>
            </Offer>
        </Offers>
    </Item>
    <Item>
        <ASIN>3813506479</ASIN>
        <DetailPageURL>http://www.amazon.de/Altes-Land-Roman-D%C3%B6rte-Hansen/dp/3813506479%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3D3813506479</DetailPageURL>
        <ItemLinks>
            <ItemLink>
                <Description>Add To Wishlist</Description>
                <URL>http://www.amazon.de/gp/registry/wishlist/add-item.html%3Fasin.0%3D3813506479%26SubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3813506479</URL>
            </ItemLink>
            <ItemLink>
                <Description>Tell A Friend</Description>
                <URL>http://www.amazon.de/gp/pdp/taf/3813506479%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3813506479</URL>
            </ItemLink>
            <ItemLink>
                <Description>All Customer Reviews</Description>
                <URL>http://www.amazon.de/review/product/3813506479%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3813506479</URL>
            </ItemLink>
            <ItemLink>
                <Description>All Offers</Description>
                <URL>http://www.amazon.de/gp/offer-listing/3813506479%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3813506479</URL>
            </ItemLink>
        </ItemLinks>
        <ItemAttributes>
            <Author>Dörte Hansen</Author>
            <Binding>Gebundene Ausgabe</Binding>
            <EAN>9783813506471</EAN>
            <EANList>
                <EANListElement>9783813506471</EANListElement>
            </EANList>
            <ISBN>3813506479</ISBN>
            <IsEligibleForTradeIn>1</IsEligibleForTradeIn>
            <ItemDimensions>
                <Height Units="hundredths-inches">870</Height>
                <Length Units="hundredths-inches">567</Length>
                <Width Units="hundredths-inches">114</Width>
            </ItemDimensions>
            <Label>Albrecht Knaus Verlag</Label>
            <Languages>
                <Language>
                    <Name>Deutsch</Name>
                    <Type>Published</Type>
                </Language>
                <Language>
                    <Name>Deutsch</Name>
                    <Type>Original</Type>
                </Language>
            </Languages>
            <ListPrice>
                <Amount>1999</Amount>
                <CurrencyCode>EUR</CurrencyCode>
                <FormattedPrice>EUR 19,99</FormattedPrice>
            </ListPrice>
            <Manufacturer>Albrecht Knaus Verlag</Manufacturer>
            <NumberOfPages>288</NumberOfPages>
            <PackageDimensions>
                <Height Units="hundredths-inches">118</Height>
                <Length Units="hundredths-inches">858</Length>
                <Weight Units="hundredths-pounds">101</Weight>
                <Width Units="hundredths-inches">559</Width>
            </PackageDimensions>
            <ProductGroup>Book</ProductGroup>
            <ProductTypeName>ABIS_BOOK</ProductTypeName>
            <PublicationDate>2015-02-16</PublicationDate>
            <Publisher>Albrecht Knaus Verlag</Publisher>
            <Studio>Albrecht Knaus Verlag</Studio>
            <Title>Altes Land: Roman</Title>
            <TradeInValue>
                <Amount>965</Amount>
                <CurrencyCode>EUR</CurrencyCode>
                <FormattedPrice>EUR 9,65</FormattedPrice>
            </TradeInValue>
        </ItemAttributes>
        <OfferSummary>
            <LowestNewPrice>
                <Amount>1999</Amount>
                <CurrencyCode>EUR</CurrencyCode>
                <FormattedPrice>EUR 19,99</FormattedPrice>
            </LowestNewPrice>
            <LowestUsedPrice>
                <Amount>1599</Amount>
                <CurrencyCode>EUR</CurrencyCode>
                <FormattedPrice>EUR 15,99</FormattedPrice>
            </LowestUsedPrice>
            <TotalNew>72</TotalNew>
            <TotalUsed>8</TotalUsed>
            <TotalCollectible>0</TotalCollectible>
            <TotalRefurbished>0</TotalRefurbished>
        </OfferSummary>
        <Offers>
            <TotalOffers>1</TotalOffers>
            <TotalOfferPages>1</TotalOfferPages>
            <MoreOffersUrl>http://www.amazon.de/gp/offer-listing/3813506479%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3813506479</MoreOffersUrl>
            <Offer>
                <OfferAttributes>
                    <Condition>New</Condition>
                </OfferAttributes>
                <OfferListing>
                    <OfferListingId>aeRv5KPt26T8S0hLrgV8Bv9UPYABYOMijGRxffbNJXUZSN4XfeeOZZpCZ28EURzmgMLlcYEBSRlMXS%2F8Z0pN1JbYerndME%2B2VK3RosfdQJA%3D</OfferListingId>
                    <Price>
                        <Amount>1999</Amount>
                        <CurrencyCode>EUR</CurrencyCode>
                        <FormattedPrice>EUR 19,99</FormattedPrice>
                    </Price>
                    <Availability>Gewöhnlich versandfertig in 24 Stunden</Availability>
                    <AvailabilityAttributes>
                        <AvailabilityType>now</AvailabilityType>
                        <MinimumHours>0</MinimumHours>
                        <MaximumHours>0</MaximumHours>
                    </AvailabilityAttributes>
                    <IsEligibleForSuperSaverShipping>1</IsEligibleForSuperSaverShipping>
                </OfferListing>
            </Offer>
        </Offers>
    </Item>
</Items>

I want to get the ASIN element of each Item, one per Item. So I tried this:

from lxml import etree

doc = etree.fromstring(xmlstring)  # xmlstring holds the XML document shown above
items = doc.xpath('//Items/Item')
for a in items:
    asin = a.xpath('//ASIN/text()')
    print(asin)

What I get is this:

['3570102769', '3813506479']
['3570102769', '3813506479']

But I want this:

['3570102769']
['3813506479']

I don't understand what the problem is. I thought I was iterating over each Item element, and each Item contains exactly one ASIN. Why do I get both ASINs twice?

There are only 2 ASIN elements in the given XML? You're expecting two 2-element lists? Commented May 11, 2015 at 9:01

1 Answer

When you call a.xpath('//ASIN/text()'), you're searching the complete document tree again, not just the current Item. Quoting from the XML Path Language (XPath) specification:

//para selects all the para descendants of the document root and thus selects all para elements in the same document as the context node

So what you're doing is iterating over the matched Item nodes and, for each one, saying "Give me all ASIN nodes in this document, please". The context node (the Item) is ignored as a starting point for the search.
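To see the effect in isolation, here is a minimal sketch on a trimmed-down copy of the XML above (only the ASIN elements are kept, which is an assumption made for brevity). It compares '//ASIN/text()' evaluated from the root element with the same expression evaluated from each Item:

```python
from lxml import etree

# Trimmed-down stand-in for the question's XML: only the ASIN elements are kept.
xmlstring = b"""<Items>
    <Item><ASIN>3570102769</ASIN></Item>
    <Item><ASIN>3813506479</ASIN></Item>
</Items>"""

doc = etree.fromstring(xmlstring)
items = doc.xpath('//Items/Item')

# Evaluated from the root element: every ASIN in the document.
from_root = doc.xpath('//ASIN/text()')

# Evaluated from each Item: '//' still starts at the document root,
# so every Item yields the same full list.
from_each_item = [item.xpath('//ASIN/text()') for item in items]

print(from_root)       # ['3570102769', '3813506479']
print(from_each_item)  # [['3570102769', '3813506479'], ['3570102769', '3813506479']]
```

Both lines print the full list, which is exactly the duplicated output from the question.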

What you should do instead is select the ASIN child node directly. Keeping to your original implementation, this could look like this:

from lxml import etree

doc = etree.fromstring(xmlstring)
items = doc.xpath('//Items/Item')
for a in items:
    # Relative path: only the ASIN children of the current Item
    asin = a.xpath('ASIN/text()')
    print(asin)

which gives the output you desire:

['3570102769']
['3813506479']

Alternatively, if you're not certain where inside the Item node your ASIN appears, you could use .//ASIN/text(), which searches all descendants of the current Item rather than the whole document.
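A small sketch of the difference between 'ASIN/text()' and './/ASIN/text()'. It assumes a hypothetical layout in which the second ASIN is wrapped in an invented Wrapper element (not part of the Amazon response), so it is no longer a direct child of its Item:

```python
from lxml import etree

# Hypothetical layout: in the second Item the ASIN is wrapped in an invented
# <Wrapper> element, so it is not a direct child of that Item.
xmlstring = b"""<Items>
    <Item><ASIN>3570102769</ASIN></Item>
    <Item><Wrapper><ASIN>3813506479</ASIN></Wrapper></Item>
</Items>"""

doc = etree.fromstring(xmlstring)

results = []
for item in doc.xpath('/Items/Item'):
    direct = item.xpath('ASIN/text()')       # direct ASIN children of this Item only
    anywhere = item.xpath('.//ASIN/text()')  # ASIN at any depth below this Item
    results.append((direct, anywhere))
    print(direct, anywhere)
# ['3570102769'] ['3570102769']
# [] ['3813506479']
```

The child path misses the wrapped ASIN, while the relative descendant path still finds it and stays within the current Item.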


2 Comments

Thank you! This works perfectly. Now it's absolutely clear to me.
@JulianBaehr You can use the current node (.) as context, i.e. asin = a.xpath('.//ASIN/text()'), but if <ASIN> is a direct child of <Item> that's not really necessary. //foo is an absolute path that starts at the root and traverses the entire tree (it's shorthand for /descendant-or-self::node()/child::foo). Prepending it with a dot makes it a relative path, but it still traverses the entire subtree. Avoid it if you don't really need to traverse the entire tree.
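The shorthand described in this comment can be checked directly. A minimal sketch, again on a trimmed-down two-Item document (an assumption for brevity), evaluating the abbreviated and fully expanded forms from the second Item's context:

```python
from lxml import etree

# Trimmed-down stand-in for the question's XML: only the ASIN elements are kept.
xmlstring = b"""<Items>
    <Item><ASIN>3570102769</ASIN></Item>
    <Item><ASIN>3813506479</ASIN></Item>
</Items>"""

doc = etree.fromstring(xmlstring)
second_item = doc.xpath('/Items/Item')[1]

# Absolute: '//ASIN' abbreviates '/descendant-or-self::node()/child::ASIN'
# and starts at the document root regardless of the context node.
abbreviated_abs = second_item.xpath('//ASIN/text()')
expanded_abs = second_item.xpath('/descendant-or-self::node()/child::ASIN/text()')

# Relative: './/ASIN' abbreviates 'self::node()/descendant-or-self::node()/child::ASIN'
# and only searches the subtree below the context node.
abbreviated_rel = second_item.xpath('.//ASIN/text()')
expanded_rel = second_item.xpath('self::node()/descendant-or-self::node()/child::ASIN/text()')

print(abbreviated_abs == expanded_abs)  # True
print(abbreviated_rel == expanded_rel)  # True
print(abbreviated_abs)                  # ['3570102769', '3813506479']
print(abbreviated_rel)                  # ['3813506479']
```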
