How can I extract data from JavaScript content using Scrapy (Python)? The JavaScript looks like this:

<script type="text/javascript">
  var ad_reply_url = "http://www2.mudah.my/ar/send/0?ca=3_s&id=49825097&l=0";
  var mcvl = "";
  var images = [
     'http://img.rnudah.com/images/13/133608119523265.jpg', 
     'http://img.rnudah.com/images/13/135608116569903.jpg', 
     'http://img.rnudah.com/images/13/137608113616541.jpg', 
     'http://img.rnudah.com/images/13/139608119186498.jpg'
  ];
 var thumbnails = [
    'http://img.rnudah.com/thumbs/13/133608119523265.jpg',
    'http://img.rnudah.com/thumbs/13/135608116569903.jpg',
    'http://img.rnudah.com/thumbs/13/137608113616541.jpg',
    'http://img.rnudah.com/thumbs/13/139608119186498.jpg'
 ];</script>

What I want is to get the data from var images and print it like this:

['http://img.rnudah.com/images/13/133608119523265.jpg','http://img.rnudah.com/images/13/135608116569903.jpg', 'http://img.rnudah.com/images/13/137608113616541.jpg','http://img.rnudah.com/images/13/139608119186498.jpg' ];

Can anyone help me? Thanks.

1 Answer

I'm not using Scrapy here, just plain Python, but the approach is fairly straightforward: find each var statement with a regular expression, then parse its value with ast.literal_eval.

Code Sample:

import ast
import re

page_source = '''
<script type="text/javascript">
  var ad_reply_url = "http://www2.mudah.my/ar/send/0?ca=3_s&id=49825097&l=0";
  var mcvl = "";
  var images = [
     'http://img.rnudah.com/images/13/133608119523265.jpg',
     'http://img.rnudah.com/images/13/135608116569903.jpg',
     'http://img.rnudah.com/images/13/137608113616541.jpg',
     'http://img.rnudah.com/images/13/139608119186498.jpg'
  ];
 var thumbnails = [
    'http://img.rnudah.com/thumbs/13/133608119523265.jpg',
    'http://img.rnudah.com/thumbs/13/135608116569903.jpg',
    'http://img.rnudah.com/thumbs/13/137608113616541.jpg',
    'http://img.rnudah.com/thumbs/13/139608119186498.jpg'
 ];</script>
'''

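# Capture every "var name = value;" statement; (?s) lets "." span newlines.
variables = re.findall(r'(?s)\bvar\s+(.*?);', page_source)

var_collection = {}
for var in variables:
    # Split on the first "=" only, so values containing "=" (e.g. URLs) stay intact.
    var_key, var_value = var.split('=', 1)
    # literal_eval works because these JS literals are also valid Python literals.
    var_collection[var_key.strip()] = ast.literal_eval(var_value.strip())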

print(var_collection['images'])

Output:

['http://img.rnudah.com/images/13/133608119523265.jpg', 'http://img.rnudah.com/images/13/135608116569903.jpg', 'http://img.rnudah.com/images/13/137608113616541.jpg', 'http://img.rnudah.com/images/13/139608119186498.jpg']
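
If only one variable is needed, a narrower variant may be easier to drop into a spider. The following is a minimal sketch using only the standard library; the helper name `extract_js_array` and the shortened `sample` string are made up for illustration, and it assumes the array contains nothing but plain quoted strings (so it is also a valid Python literal).

```python
import ast
import re


def extract_js_array(page_source, name):
    """Return the list assigned to `var <name> = [...]`, or None if absent."""
    # Match only the array literal of the requested variable;
    # (?s) lets "." span the newlines inside the array.
    pattern = r'(?s)\bvar\s+' + re.escape(name) + r'\s*=\s*(\[.*?\])\s*;'
    match = re.search(pattern, page_source)
    if match is None:
        return None
    # Assumption: the array holds only quoted strings, so it parses as Python.
    return ast.literal_eval(match.group(1))


# Shortened, illustrative copy of the page source from the question.
sample = """
<script type="text/javascript">
  var mcvl = "";
  var images = [
     'http://img.rnudah.com/images/13/133608119523265.jpg',
     'http://img.rnudah.com/images/13/135608116569903.jpg'
  ];
</script>
"""

print(extract_js_array(sample, 'images'))
```

Inside a Scrapy callback, the same helper could presumably be called as `extract_js_array(response.text, 'images')`, since `response.text` holds the decoded page body for HTML responses.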

Related: https://stackoverflow.com/a/18108644/295246

3 Comments

Okay, thanks for the hint. I was just adapting your code and now I have what I want. Thanks, man! :)
@shahril Glad it helped. Feel free to upvote or accept this answer as your solution, at your discretion. Thanks!
@Ale Pleasure is mine! Seems this question doesn't come up too often :)
