I am learning Scrapy and building a web spider with it. I need help extracting some information from the following JavaScript code:

<script language="JavaScript" type="text/javascript+gk-onload">

    SKART = (SKART) ? SKART : {};
    SKART.analytics = SKART.analytics || {};
    SKART.analytics["category"] = "television";
    SKART.analytics["vertical"] = "television";
    SKART.analytics["supercategory"] = "homeentertainmentlarge";
    SKART.analytics["subcategory"] = "television";

</script>

I want to extract the category value, television, using XPath. Which selectors should I use?

1 Answer

XPath can locate the script element, but it cannot parse the JavaScript inside it. For that, you can use the Selector's built-in support for regular expressions through re():

pattern = r'SKART\.analytics\["category"\] = "(\w+)";'
response.xpath('//script[@type="text/javascript+gk-onload"]').re(pattern)

Demo (using scrapy shell):

$ scrapy shell index.html
In [1]: pattern = r'SKART\.analytics\["category"\] = "(\w+)";'

In [2]: response.xpath('//script[@type="text/javascript+gk-onload"]').re(pattern)
Out[2]: [u'television']
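
If you need the other analytics fields as well, the same idea extends to capturing every key/value pair at once. The following is only a sketch using Python's standard re module on the script text from the question, so it runs without Scrapy. The names SCRIPT_TEXT, PAIR_PATTERN and extract_analytics are made up for illustration, and the pattern assumes the values are plain word characters in double quotes, as in the question.

```python
import re

# Script body copied from the question. In a spider, this text would come
# from the <script> element selected with the XPath shown above.
SCRIPT_TEXT = '''
    SKART = (SKART) ? SKART : {};
    SKART.analytics = SKART.analytics || {};
    SKART.analytics["category"] = "television";
    SKART.analytics["vertical"] = "television";
    SKART.analytics["supercategory"] = "homeentertainmentlarge";
    SKART.analytics["subcategory"] = "television";
'''

# Same idea as the single-key pattern, but the key is captured too and
# optional whitespace is allowed around "=".
PAIR_PATTERN = re.compile(r'SKART\.analytics\["(\w+)"\]\s*=\s*"(\w+)";')


def extract_analytics(script_text):
    """Return every SKART.analytics["key"] = "value" pair as a dict."""
    # findall() yields (key, value) tuples because the pattern has two groups.
    return dict(PAIR_PATTERN.findall(script_text))


analytics = extract_analytics(SCRIPT_TEXT)
print(analytics["category"])
```

Inside a spider, if you only want the single value rather than a list, the selector's re_first() method takes the same pattern as re() and returns the first match.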