Scrape data with XPath from a div that contains JavaScript in Scrapy (Python)

Question

I am working with Scrapy, scraping a site and using XPath to extract items. But some of the divs contain JavaScript. When my XPath goes as far as the div (selected by id) that contains the JavaScript code, it returns an empty list, whereas without that div element in the path I can fetch the HTML data.

HTML code

<div class="subContent2">    
   <div id="contentDetails">
       <div class="eventDetails">
            <h2>
                <a href="javascript:;" onclick="jdevents.getEvent(117032)">Some data</a>
            </h2>
       </div>
   </div>
</div>

Spider Code

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector


class ExampleSpider(BaseSpider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = ["http://www.example.com/jkl/index.php"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        # This selects the eventDetails div element, not the text inside it
        required_data = hxs.select('//div[@class="subContent2"]/div[@id="contentDetails"]/div[@class="eventDetails"]')

So how can I get the text (Some data) from the anchor tag inside the h2 element shown above? Is there an alternative way in Scrapy to fetch data from elements that contain JavaScript?

warvariuc · Accepted Answer · 2012-06-13 06:30:26Z

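In this case, the JavaScript code is not what keeps you from getting the 'Some data' string: the onclick handler is only an attribute value, and your expression selects the eventDetails div element itself rather than the text inside it.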
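A quick way to see that the onclick JavaScript does not block selection is to parse the snippet from the question and walk down to the link. The sketch below is a stand-in, not Scrapy itself: it assumes Python's standard xml.etree.ElementTree, whose find() understands only a limited XPath subset, and it relies on the snippet happening to be well-formed XML (real pages generally need an HTML parser such as the lxml-based one behind Scrapy's selectors).

```python
import xml.etree.ElementTree as ET

# The snippet from the question. It happens to be well-formed XML, so the
# standard library parser can read it (assumption: real pages usually are not).
HTML = """<div class="subContent2">
   <div id="contentDetails">
       <div class="eventDetails">
            <h2>
                <a href="javascript:;" onclick="jdevents.getEvent(117032)">Some data</a>
            </h2>
       </div>
   </div>
</div>"""

root = ET.fromstring(HTML)  # root is the subContent2 div

# ElementTree's limited XPath subset, following the same steps as
# //div[@class="subContent2"]/div[@id="contentDetails"]/div[@class="eventDetails"]/h2/a
link = root.find("div[@id='contentDetails']/div[@class='eventDetails']/h2/a")

# To the parser the onclick JavaScript is just an attribute string,
# and the link text is an ordinary text node.
print(link.text)            # Some data
print(link.get("onclick"))  # jdevents.getEvent(117032)
```

Because the script lives in an attribute, the path to the text is the same as for any other link.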

You need either to select the text node directly:

required_data = hxs.select('//div[@class="subContent2"]/div[@id="contentDetails"]/div[@class="eventDetails"]/h2/a/text()')


or use the XPath string() function, which returns all the text inside the first matching node:

required_data = hxs.select('string(//div[@class="subContent2"]/div[@id="contentDetails"]/div[@class="eventDetails"])')
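The difference between the two options comes down to what XPath string() returns: the concatenation of every text node inside the first matching node, indentation included, or an empty string when nothing matches. The sketch below imitates that with the standard library's xml.etree.ElementTree instead of Scrapy; xpath_string and normalize_space are hypothetical helper names standing in for the XPath string() and normalize-space() functions.

```python
import xml.etree.ElementTree as ET

HTML = """<div class="subContent2">
   <div id="contentDetails">
       <div class="eventDetails">
            <h2>
                <a href="javascript:;" onclick="jdevents.getEvent(117032)">Some data</a>
            </h2>
       </div>
   </div>
</div>"""


def xpath_string(root, path):
    """Hypothetical analogue of XPath string(path): the concatenated text
    of the first matching element, or "" when nothing matches."""
    node = root.find(path)
    if node is None:
        return ""  # string() of an empty node-set is the empty string
    # itertext() yields the element's text plus all descendant text and tails
    return "".join(node.itertext())


def normalize_space(text):
    """Hypothetical analogue of XPath normalize-space(): trim and collapse whitespace."""
    return " ".join(text.split())


root = ET.fromstring(HTML)

raw = xpath_string(root, "div[@id='contentDetails']/div[@class='eventDetails']")
print(repr(raw))             # 'Some data' surrounded by newlines and indentation
print(normalize_space(raw))  # Some data
print(repr(xpath_string(root, "div[@id='missing']")))  # ''
```

In an XPath expression itself, wrapping the path in normalize-space(...) instead of string(...) trims that whitespace in the same way.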

edited Jun 13, 2012 at 6:30

answered Jun 12, 2012 at 13:55

warvariuc

Comment

Shiva Krishna Bavandla Over a year ago

Thanks for your reply. I got an empty Unicode string, shown below, when I used the string function: [<HtmlXPathSelector xpath='string(//div[@class="subContent2"]/div[@id="contentDetails"]/div[@class="eventDetails"]/h2/a/text())' data=u''>]
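An empty string from string() means the expression matched nothing in the HTML that Scrapy actually downloaded, because string() of an empty node-set is the empty string. One possible cause, assumed here rather than confirmed for this site, is that the browser shows markup inserted by JavaScript after the page loads, so it is absent from the raw response; the markup may also simply differ from what was expected. The sketch narrows the path one step at a time to find where it stops matching. It uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selectors, and first_failing_step, STEPS and the empty_shell document are hypothetical.

```python
import xml.etree.ElementTree as ET


def first_failing_step(root, steps):
    """Return (index, step) for the first path step with no match, or None.

    Each prefix steps[:i + 1] is joined with "/" and evaluated from root,
    mirroring how you would shorten an XPath one location step at a time.
    """
    for i, step in enumerate(steps):
        prefix = "/".join(steps[: i + 1])
        if root.find(prefix) is None:
            return i, step
    return None


# Steps below the subContent2 root, as in the question's XPath (hypothetical list).
STEPS = ["div[@id='contentDetails']", "div[@class='eventDetails']", "h2", "a"]

# Markup as shown in the question.
full = ET.fromstring(
    '<div class="subContent2"><div id="contentDetails"><div class="eventDetails">'
    '<h2><a href="javascript:;">Some data</a></h2></div></div></div>'
)

# Hypothetical raw response in which the details are not present yet,
# e.g. because a script would fill them in after the page loads.
empty_shell = ET.fromstring(
    '<div class="subContent2"><div id="contentDetails"></div></div>'
)

print(first_failing_step(full, STEPS))         # None
print(first_failing_step(empty_shell, STEPS))  # (1, "div[@class='eventDetails']")
```

Inside Scrapy, the same narrowing can be done interactively: in recent versions, scrapy shell <url> fetches the page, response.xpath(...) can be tried on shorter and shorter paths, and view(response) opens the downloaded HTML in a browser for comparison.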
