I am working with Scrapy, scraping a site and using XPath to extract items. Some of the divs contain JavaScript. When my XPath goes all the way down to the div that contains the JavaScript code, it returns an empty list; when I leave that div out of the XPath, I can fetch the HTML data.

HTML code

<div class="subContent2">    
   <div id="contentDetails">
       <div class="eventDetails">
            <h2>
                <a href="javascript:;" onclick="jdevents.getEvent(117032)">Some data</a>
            </h2>
       </div>
   </div>
</div> 

Spider Code

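# Import paths as used by the old Scrapy releases that ship BaseSpider and HtmlXPathSelector
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector


class ExampleSpider(BaseSpider):
    name = "example"
    domain_name = "www.example.com"
    start_urls = ["http://www.example.com/jkl/index.php"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        # Selects the wrapping div only, not the text inside the anchor
        required_data = hxs.select('//div[@class="subContent2"]/div[@id="contentDetails"]/div[@class="eventDetails"]')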

So how can I get the text ("Some data") from the anchor tag inside the h2 element shown above? Is there an alternative way in Scrapy to fetch data from elements that contain JavaScript?

1 Answer

<div class="subContent2">    
   <div id="contentDetails">
       <div class="eventDetails">
            <h2>
                <a href="javascript:;" onclick="jdevents.getEvent(117032)">Some data</a>
            </h2>
       </div>
   </div>
</div> 

The JavaScript is not the problem here: the 'Some data' string is ordinary text inside the anchor, and the href and onclick attributes do not affect it.

You need either to select the text node of the anchor:

required_data = hxs.select('//div[@class="subContent2"]/div[@id="contentDetails"]/div[@class="eventDetails"]/h2/a/text()')

or use the XPath string() function on the containing div:

required_data = hxs.select('string(//div[@class="subContent2"]/div[@id="contentDetails"]/div[@class="eventDetails"])')
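
To check both ideas outside a spider, here is a minimal sketch that uses only Python's standard library instead of Scrapy. It is an assumption-laden stand-in: xml.etree.ElementTree supports only a subset of XPath (no text() or string()), so .text and itertext() play those roles, and it works here only because the snippet from the question happens to be well-formed XML. The names HTML, link_text and all_text are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# The markup from the question. It is well-formed XML, so the
# standard-library parser can read it; real pages usually are not.
HTML = """
<div class="subContent2">
   <div id="contentDetails">
       <div class="eventDetails">
            <h2>
                <a href="javascript:;" onclick="jdevents.getEvent(117032)">Some data</a>
            </h2>
       </div>
   </div>
</div>
"""

# The root element is the outer <div class="subContent2">
root = ET.fromstring(HTML.strip())

# Stand-in for .../div[@class="eventDetails"]/h2/a/text():
# walk down to the anchor and read its own text
link = root.find("./div[@id='contentDetails']/div[@class='eventDetails']/h2/a")
link_text = link.text

# Stand-in for string(.../div[@class="eventDetails"]):
# concatenate every descendant text node, then trim the indentation
event_div = root.find("./div[@id='contentDetails']/div[@class='eventDetails']")
all_text = "".join(event_div.itertext()).strip()

print(link_text)
print(all_text)
```

Both values come out as the anchor text, which shows that the javascript: href and the onclick handler play no part in the extraction when the markup is actually present in the page.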

1 Comment

Thanks for your reply. When I used the string function, I got an empty unicode string, as shown below: [<HtmlXPathSelector xpath='string(//div[@class="subContent2"]/div[@id="contentDetails"]/div[@class="eventDetails"]/h2/a/text())' data=u''>]
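
One thing worth knowing when reading that result: in XPath 1.0, string() applied to an empty node-set returns the empty string, so an empty result is what you would see if the path matched nothing in the HTML that was actually downloaded. The sketch below illustrates the difference using the standard library rather than Scrapy. The helper describe_match and both sample pages are hypothetical; in particular WITHOUT_EVENT is an invented example of a page whose event markup is missing, and nothing here is known about what the real site returns.

```python
import xml.etree.ElementTree as ET


def describe_match(markup, path):
    """Report whether `path` matched, and the text it holds if it did."""
    root = ET.fromstring(markup.strip())
    nodes = root.findall(path)
    if not nodes:
        # The situation in which XPath string() yields ''
        return "no match"
    text = "".join(nodes[0].itertext()).strip()
    return text or "matched, but empty"


# Path relative to the outer <div class="subContent2">
PATH = "./div[@id='contentDetails']/div[@class='eventDetails']"

# Hypothetical page that already contains the event markup
WITH_EVENT = (
    '<div class="subContent2"><div id="contentDetails">'
    '<div class="eventDetails"><h2><a href="javascript:;">Some data</a></h2></div>'
    "</div></div>"
)

# Hypothetical page where the event markup is absent from the raw HTML
WITHOUT_EVENT = '<div class="subContent2"><div id="contentDetails"></div></div>'

print(describe_match(WITH_EVENT, PATH))
print(describe_match(WITHOUT_EVENT, PATH))
```

If the second case is what happens with the real page, comparing the downloaded response body with what the browser shows would be a reasonable next check.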
