I have several problems that I am trying to tackle.

First, I am trying to parse this JavaScript, which I extracted from an HTML page.

$(document).ready(function() { $('#commodity-show-thumbnails').bxSlider({ mode: 'vertical', auto: false, controls: true, pager: false, minSlides: 4, maxSlides: 4, moveSlides: 1, slideWidth: 250 }); itemSelector('commodity-show-form', 'commodity-show-addcart-submit', [['color', 'Choose color'], ['size', 'Choose size']], { "39805": { "params": ["Smokey Blue/Mica Blue", "36"]}, "39806": { "params": ["Smokey Blue/Mica Blue", "36,5"]}, "39807": { "params": ["Smokey Blue/Mica Blue", "37,5"]}, "39808": { "params": ["Smokey Blue/Mica Blue", "38"]}, "39809": { "params": ["Smokey Blue/Mica Blue", "38,5"]}, "39810": { "params": ["Smokey Blue/Mica Blue", "39"]}, "39811": { "params": ["Smokey Blue/Mica Blue", "40"]}, "39812": { "params": ["Smokey Blue/Mica Blue", "40,5"]}, "39814": { "params": ["Smokey Blue/Mica Blue", "42"]} }, [39805,39806,39807,39808,39809,39810,39811,39812,39814], 'main-cart', 'commodity-show-image'); });

import ast
import re

res = re.findall(r'{ "params": (.+?)}', text)  # text holds the JavaScript source

final = [ast.literal_eval(i) for i in res]  # literal_eval only accepts literals, unlike eval

print(final)

I got the following output:

[['Smokey Blue/Mica Blue', '36'], ['Smokey Blue/Mica Blue', '36,5'], ['Smokey Blue/Mica Blue', '37,5'], ['Smokey Blue/Mica Blue', '38'], ['Smokey Blue/Mica Blue', '38,5'], ['Smokey Blue/Mica Blue', '39'], ['Smokey Blue/Mica Blue', '40'], ['Smokey Blue/Mica Blue', '40,5'], ['Smokey Blue/Mica Blue', '42']]

But now I don't know where to go from here. I want to get the ID 39805 from

{ "39805": { "params": ["Smokey Blue/Mica Blue", "36"]}. How would I parse it so that, if I look up the value associated with 36, it gives me 39805?

I am sorry, but I am really bad at parsing and pretty new to this.

  • It seems like you really want the result of the parsing to be a dict like {'39805':'36', '39807':'37',...}. Is that correct? Commented Dec 30, 2016 at 17:56
  • Are you trying to scrape a web page for content generated by JavaScript? Commented Dec 30, 2016 at 17:56
  • @saulspatz Yes. But instead of building the whole dict, I am thinking of parsing for a particular value, e.g. look up the value 36 and get its ID, 39805. Commented Dec 30, 2016 at 18:05
  • @emett Speer I am not sure. I believe it is JavaScript. Commented Dec 30, 2016 at 18:06
  • @b0baboi You shouldn't use requests for that then. You want something like selenium, dryscrape, ghost.py. They will render the JS so you can access the HTML it generates. Commented Dec 30, 2016 at 18:35

2 Answers

You can get the ID associated with size 36 like this:

import re
import ast

a="""$(document).ready(function() { $('#commodity-show-thumbnails').bxSlider({ mode: 'vertical', auto: false, controls: true, pager: false, minSlides: 4, maxSlides: 4, moveSlides: 1, slideWidth: 250 }); itemSelector('commodity-show-form', 'commodity-show-addcart-submit', [['color', 'Choose color'], ['size', 'Choose size']], { "39805": { "params": ["Smokey Blue/Mica Blue", "36"]}, "39806": { "params": ["Smokey Blue/Mica Blue", "36,5"]}, "39807": { "params": ["Smokey Blue/Mica Blue", "37,5"]}, "39808": { "params": ["Smokey Blue/Mica Blue", "38"]}, "39809": { "params": ["Smokey Blue/Mica Blue", "38,5"]}, "39810": { "params": ["Smokey Blue/Mica Blue", "39"]}, "39811": { "params": ["Smokey Blue/Mica Blue", "40"]}, "39812": { "params": ["Smokey Blue/Mica Blue", "40,5"]}, "39814": { "params": ["Smokey Blue/Mica Blue", "42"]} }, [39805,39806,39807,39808,39809,39810,39811,39812,39814], 'main-cart', 'commodity-show-image'); });"""
b = re.findall(r'.*?({ ".*?} }).*}', a)[0]

# The extracted text contains only string, list and dict literals,
# so ast.literal_eval can turn it into a Python dict safely.
d1 = ast.literal_eval(b)
print(d1, '\n')

# Find the ID whose second param (the size) is '36'.
for item_id, info in d1.items():
    if info['params'][1] == '36':
        print(item_id)

Output (Python 3.7+, where dicts keep insertion order):

{'39805': {'params': ['Smokey Blue/Mica Blue', '36']}, '39806': {'params': ['Smokey Blue/Mica Blue', '36,5']}, '39807': {'params': ['Smokey Blue/Mica Blue', '37,5']}, '39808': {'params': ['Smokey Blue/Mica Blue', '38']}, '39809': {'params': ['Smokey Blue/Mica Blue', '38,5']}, '39810': {'params': ['Smokey Blue/Mica Blue', '39']}, '39811': {'params': ['Smokey Blue/Mica Blue', '40']}, '39812': {'params': ['Smokey Blue/Mica Blue', '40,5']}, '39814': {'params': ['Smokey Blue/Mica Blue', '42']}} 

39805
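
As a possible variation on the approach above, the extracted object also happens to be valid JSON, so json.loads could be used in place of ast.literal_eval. This is only a sketch: the helper names extract_items and find_item_id are made up for illustration, the script string is a shortened stand-in for the real page text, and the regex assumes the object keeps the same spacing as in the question.

```python
import json
import re

# Shortened stand-in for the page's script text (same shape as in the question).
script = '''itemSelector('commodity-show-form', 'commodity-show-addcart-submit', [['color', 'Choose color'], ['size', 'Choose size']], { "39805": { "params": ["Smokey Blue/Mica Blue", "36"]}, "39806": { "params": ["Smokey Blue/Mica Blue", "36,5"]}, "39814": { "params": ["Smokey Blue/Mica Blue", "42"]} }, [39805,39806,39814], 'main-cart', 'commodity-show-image');'''


def extract_items(text):
    """Return the {id: {"params": [color, size]}} object embedded in the script.

    Hypothetical helper: relies on the object starting with '{ "<digits>": '
    and ending with '} }', as it does in the question's text.
    """
    match = re.search(r'\{ "\d+": .*?\} \}', text)
    if match is None:
        raise ValueError("item table not found in script text")
    return json.loads(match.group(0))


def find_item_id(text, size):
    """Return the first ID whose size param equals `size`, or None."""
    for item_id, info in extract_items(text).items():
        if info["params"][1] == size:
            return item_id
    return None


print(find_item_id(script, "36"))
print(find_item_id(script, "36,5"))
```

Returning None for a missing size (instead of raising) is a design choice here, so the caller can decide how to handle a size that is not offered.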

2 Comments

I was actually looking for the value associated with 36, so it would be 39805.
@b0baboi Modified. Check now. You may accept the answer by clicking on the tick mark. meta.stackoverflow.com/a/251399/4082217

EDIT: I just realized that in some cases the size has two numbers, like "36,5". I assume this means 36 and a half. My original script didn't account for that, which is why it gave the wrong answer (which I carelessly didn't notice). Here's a revised script that seems to work:

import re
text='''$(document).ready(function() { $('#commodity-show-thumbnails').bxSlider({ mode: 'vertical', auto: false, controls: true, pager: false, minSlides: 4, maxSlides: 4, moveSlides: 1, slideWidth: 250 }); itemSelector('commodity-show-form', 'commodity-show-addcart-submit', [['color', 'Choose color'], ['size', 'Choose size']], { "39805": { "params": ["Smokey Blue/Mica Blue", "36"]}, "39806": { "params": ["Smokey Blue/Mica Blue", "36,5"]}, "39807": { "params": ["Smokey Blue/Mica Blue", "37,5"]}, "39808": { "params": ["Smokey Blue/Mica Blue", "38"]}, "39809": { "params": ["Smokey Blue/Mica Blue", "38,5"]}, "39810": { "params": ["Smokey Blue/Mica Blue", "39"]}, "39811": { "params": ["Smokey Blue/Mica Blue", "40"]}, "39812": { "params": ["Smokey Blue/Mica Blue", "40,5"]}, "39814": { "params": ["Smokey Blue/Mica Blue", "42"]} }, [39805,39806,39807,39808,39809,39810,39811,39812,39814], 'main-cart', 'commodity-show-image'); });'''
pattern = re.compile(r' "([0-9]+).*?params.*?([0-9]+(,5)?)')

s = {size: item_id for item_id, size, _ in pattern.findall(text)}  # map size -> ID

print(s['36'], s['36,5'])

Now this prints 39805 39806, which looks right to me.

Here's all the data:

for a in sorted(s):
    print(a, s[a])
36 39805
36,5 39806
37,5 39807
38 39808
38,5 39809
39 39810
40 39811
40,5 39812
42 39814
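
If the page ever lists more than one color, a lookup keyed on both color and size might be more robust. The following is only a sketch under assumptions: it treats "36,5" as a decimal comma (36.5), the names entry and by_color_and_size are invented for illustration, and the text is a shortened stand-in with the same layout as the original.

```python
import re

# Shortened stand-in for the item table from the page's script.
text = '''{ "39805": { "params": ["Smokey Blue/Mica Blue", "36"]}, "39806": { "params": ["Smokey Blue/Mica Blue", "36,5"]}, "39807": { "params": ["Smokey Blue/Mica Blue", "37,5"]} }'''

# Capture the ID, the color and the size of each entry.
entry = re.compile(r'"(\d+)": \{ "params": \["([^"]*)", "([^"]*)"\]\}')

# Key on (color, numeric size); assumes the comma is a decimal separator.
by_color_and_size = {
    (color, float(size.replace(",", "."))): item_id
    for item_id, color, size in entry.findall(text)
}

print(by_color_and_size[("Smokey Blue/Mica Blue", 36.5)])
print(by_color_and_size[("Smokey Blue/Mica Blue", 36)])
```

Converting sizes to floats lets 36 and 36.0 hit the same key, at the cost of assuming every size string is numeric.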
