I have the HTML below, which I am trying to parse using XPath, but I only get an empty string in return. Can anyone tell me where I am going wrong? I have tried everything I can think of without success.

XPath for the labels:

divLbl=ch.xpath("//div[@class='left-container']/article/ul[@class='list-unstyled row']/li[@class='col-sm-6 mrg-bottom']/span[@class='text-light']")

XPath for the value corresponding to each label:

divVal=ch.xpath("//div[@class='left-container']/article/ul[@class='list-unstyled row']/li[@class='col-sm-6 mrg-bottom']/span[@class='text-light']/strong")

HTML:

<div>
                        <h2 class="rowbreak"><strong>Information of the Car</strong></h2>
                        <ul class=" list-unstyled row">
                            <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-calendar text-light"></span> <span class=" text-light">Make Year:</span> <strong>Aug 2009</strong></li>
                            <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-road text-light"></span> <span class=" text-light">Kilometers:</span> <strong>127,553</strong></li>
                            <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-map-marker text-light"></span> <span class=" text-light">City:</span> 
                                <strong class="carCity_795606">  
                                                                        <a href="javascript:void(0);" onclick="javascript: $( &quot;#maplinkbtn&quot; ).trigger( &quot;click&quot; ); ">
                                    Sambalpur                                    </a>
                                                                    </strong>

                            </li>
                            <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-calendar text-light"></span> <span class=" text-light">Listing Date:</span> <strong>27 Apr 2015</strong></li>
                            <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-user text-light"></span> <span class=" text-light">No. of Owners:</span> <strong> First Owner</strong>
                            </li>
                            <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-tint text-light"></span> <span class=" text-light">Fuel Type:</span> <strong> Petrol</strong></li>
                              <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-user text-light"></span> <span class=" text-light">Posted by:</span> <strong> 
                                  Dealer</strong>
                            </li>
                        </ul>
           </div>

Edited HTML:

 <div>
                    <h2 class="rowbreak"><strong>Information of the Car</strong></h2>
                    <ul class=" list-unstyled row">
                        <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-calendar text-light"></span> <span class=" text-light">Make Year:</span> <strong>Aug 2009</strong></li>
                        <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-road text-light"></span> <span class=" text-light">Kilometers:</span> <strong>127,553</strong></li>
                        <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-map-marker text-light"></span> <span class=" text-light">City:</span> 
                            <strong class="carCity_795606">  
                                                                    <a href="javascript:void(0);" onclick="javascript: $( &quot;#maplinkbtn&quot; ).trigger( &quot;click&quot; ); ">
                                Sambalpur                                    </a>
                                                                </strong>

                        </li>
                        <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-calendar text-light"></span> <span class=" text-light">Listing Date:</span> <strong>27 Apr 2015</strong></li>
                        <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-user text-light"></span> <span class=" text-light">No. of Owners:</span> <strong> First Owner</strong>
                        </li>
                        <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-tint text-light"></span> <span class=" text-light">Fuel Type:</span> <strong> Petrol</strong></li>
                          <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-user text-light"></span> <span class=" text-light">Posted by:</span> <strong> 
                              Dealer</strong>
                        </li>
                    </ul>
       </div>

 <h2 class="rowbreak"></h2>
    <ul class=" list-unstyled row">
                            <li class="col-sm-6 mrg-bottom"><span class=" text-light">One Time Tax :</span> <strong>Individual</strong></li>
                            <li class="col-sm-6 mrg-bottom"><span class=" text-light">Registration No. :</span> <strong>OR03F3141</strong></li>
                            <li class="col-sm-6 mrg-bottom"><span class=" text-light"> Insurance &amp; Expiry :</span> <strong>No Insurance&nbsp;</strong></li>
                            <li class="col-sm-6 mrg-bottom"><span class=" text-light">Registration Place: </span> <strong> Sambalpur</strong></li>
                            <li class="col-sm-6 mrg-bottom"><span class=" text-light">Transmission :</span> <strong>Manual</strong></li>
                            <li class="col-sm-6 mrg-bottom"><span class=" text-light">Color :</span> <strong>Silver</strong></li>
                        </ul>

1 Answer

The XPath you are currently using is quite fragile - you are checking every single element in the chain and using "layout-oriented" classes.
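To make the fragility concrete, here is a minimal sketch of two reasons the original expressions can come back empty on the posted markup. It assumes `ch` is built with `lxml.html.fromstring` (the thread only shows `fromstring`), and the `html` string is a trimmed, single-item copy of the question's snippet. In the posted HTML the class attributes start with a space (`" text-light"`), so an exact `@class='text-light'` comparison does not match, and `strong` is a sibling of the label `span`, not its child. The posted snippet also contains no `left-container` div or `article` element, and the first list uses `col-sm-4` rather than `col-sm-6`.

```python
from lxml.html import fromstring

# Trimmed-down copy of the markup from the question (one <li> only).
html = """
<div>
  <h2 class="rowbreak"><strong>Information of the Car</strong></h2>
  <ul class=" list-unstyled row">
    <li class="col-sm-4 mrg-bottom"><span class=" text-light">Make Year:</span> <strong>Aug 2009</strong></li>
  </ul>
</div>
"""
ch = fromstring(html)

# Exact comparison fails: the attribute value is " text-light" (leading space).
exact = ch.xpath("//span[@class='text-light']")

# normalize-space() trims the attribute value before comparing, so this matches.
trimmed = ch.xpath("//span[normalize-space(@class)='text-light']/text()")

# <strong> is a sibling of the label <span>, not a child of it.
as_child = ch.xpath("//span[normalize-space(@class)='text-light']/strong")
as_sibling = ch.xpath(
    "//span[normalize-space(@class)='text-light']/following-sibling::strong/text()"
)

print(exact)       # []
print(trimmed)     # ['Make Year:']
print(as_child)    # []
print(as_sibling)  # ['Aug 2009']
```

Even with these fixes the expression still depends on presentation classes, which is why anchoring on the heading text is the sturdier choice.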

I would start with the h2 element that contains a strong element with the text "Information of the Car" and take the ul element that follows it. For example, to get all the labels:

//h2[strong = 'Information of the Car']/following-sibling::ul/li/span/text()

Demo:

In [2]: from lxml.html import fromstring

In [3]: ch = fromstring(data)

In [4]: ch.xpath("//h2[strong = 'Information of the Car']/following-sibling::ul/li/span/text()")
['Make Year:', 'Kilometers:', 'City:', 'Listing Date:', 'No. of Owners:', 'Fuel Type:', 'Posted by:']

Sample (getting names and values):

In [25]: for field in ch.xpath("//h2/following-sibling::ul/li"):
   ....:     name = ''.join(field.xpath(".//span/text()")).strip()
   ....:     value = ''.join(field.xpath(".//strong//text()")).strip()
   ....:     print(name, value)
   ....:
Make Year: Aug 2009
Kilometers: 127,553
City: Sambalpur
Listing Date: 27 Apr 2015
No. of Owners: First Owner
Fuel Type: Petrol
Posted by: Dealer
One Time Tax : Individual
Registration No. : OR03F3141
Insurance & Expiry : No Insurance
Registration Place: Sambalpur
Transmission : Manual
Color : Silver
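If the pairs are needed as data instead of printed output, the same loop can fill a dictionary. The following is only a sketch under assumptions: `extract_fields` is a hypothetical helper name, the parser is assumed to be `lxml.html`, and the `html` string is a shortened version of the edited snippet from the question. Stripping the trailing colon from each label is an extra step not present in the loop above.

```python
from lxml.html import fromstring

# Shortened version of the edited HTML from the question.
html = """
<div>
  <h2 class="rowbreak"><strong>Information of the Car</strong></h2>
  <ul class=" list-unstyled row">
    <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-calendar text-light"></span> <span class=" text-light">Make Year:</span> <strong>Aug 2009</strong></li>
    <li class="col-sm-4 mrg-bottom"><span class=" text-light">City:</span>
      <strong class="carCity_795606">
        <a href="javascript:void(0);">
          Sambalpur   </a>
      </strong>
    </li>
  </ul>
</div>
<h2 class="rowbreak"></h2>
<ul class=" list-unstyled row">
  <li class="col-sm-6 mrg-bottom"><span class=" text-light">Color :</span> <strong>Silver</strong></li>
</ul>
"""


def extract_fields(tree):
    """Return {label: value} for every <li> in a <ul> that follows an <h2>."""
    fields = {}
    for item in tree.xpath("//h2/following-sibling::ul/li"):
        label = "".join(item.xpath(".//span/text()")).strip()
        # .//strong//text() also picks up text nested in the <a> element.
        value = "".join(item.xpath(".//strong//text()")).strip()
        # Drop the trailing colon (and any space before it) from the label.
        fields[label.rstrip(": ")] = value
    return fields


fields = extract_fields(fromstring(html))
print(fields)
# {'Make Year': 'Aug 2009', 'City': 'Sambalpur', 'Color': 'Silver'}
```

A dictionary keeps only one value per label, so this fits pages like this one where every label is unique. A list of tuples would be the safer choice if labels could repeat.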
8 Comments

totally agreed on "layout-oriented" classes +1
Thanks alecxe. But what is the problem with the layout-oriented classes? Why can't they be matched using XPath? Could you please elaborate?
@Naresh sure. Classes such as col-sm-6 are used for page layout, design and container/box sizing. These are the things most likely to change in the future, which would break your code. Besides, relying on classes like these makes your expressions less readable and less concise.
@Naresh Content should be separated from presentation if possible. If you use layout instructions to find the content, then any change to the layout will prevent you from finding the content - even if it is still there.
@Naresh sure, I've used the updated HTML snippet and added example code that gets the names and values from both ul elements. Check it out, hope that helps.