Scrapy to extract data from javascript script

Question

I'm trying to extract the odds of games from an ESPN page. The 'moneyLine' odds are buried in a script that I can't figure out how to access. Ideally I would have the odds in rows, one per game. I've managed to extract team names and scores in rows, and I would like the odds to go with them.

scrapy shell
fetch('http://www.espn.com/nfl/schedule/_/week/1')
response.xpath("//script[contains(., 'moneyLine')]/text()")

This is the output

[<Selector xpath="//script[contains(., 'moneyLine')]/text()" data='\n\t\t\tvar espn = espn || {};\n\n\t\t\t// Build '>]

Here is a sample from the Firefox inspector window. I can see the 'moneyLine' items, I just can't isolate them.

It gives you the whole script as one string, and you have to use standard string functions or regex to work with it. If you get a string that is valid JSON, you can use the json module to convert it into a Python dictionary.
– furas, Commented Dec 9, 2017 at 8:30
The page can use JavaScript to read data from another URL (mostly as JSON data). If you find this URL using DevTools in Firefox, you can read it with Scrapy.
– furas, Commented Dec 9, 2017 at 8:33
Is 'page' a method? What 'another url' are you referring to?
– xristian, Commented Dec 9, 2017 at 10:08
"Page" means the web page / portal. You have to use DevTools to check all XHR requests; if one of them sends back your data, then you have the other URL.
– furas, Commented Dec 9, 2017 at 10:18

furas · Accepted Answer · 2017-12-09 11:09:44Z

3

Your data is in the <script>, between data: and queue:, in JSON format.

You can use standard string functions (e.g. find(), slicing) to cut out this part.
Then you can use the json module to convert it to a Python dictionary.
After that, you only have to find where moneyLine is in this dictionary.

scrapy shell 'http://www.espn.com/nfl/schedule/_/week/1'

# get `<script>` as text
items = response.xpath("//script[contains(., 'moneyLine')]/text()")
txt = items.extract_first()

# find start and end of the data
# (I found these by manually checking txt)
start = txt.find('data:') + 6 # offset found manually so the slice starts at the JSON string
end = txt.find('queue:') - 6  # offset found manually so the slice ends at the JSON string

json_string = txt[start:end]

# convert to python dictionary
import json
data = json.loads(json_string)

# example data
# (I found this path manually using `data.keys()`, `data['sports'][0].keys()`, etc.)
data['sports'][0]['leagues'][0]['events'][0]['odds']['home']['moneyLine']
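
The hand-tuned offsets (+ 6 and - 6) depend on the exact whitespace around data: and queue:, so they may break if the page layout shifts. A minimal sketch of an alternative, assuming the text after data: starts with a valid JSON value: json.JSONDecoder.raw_decode parses one JSON value from a given position and ignores whatever follows it, so no end offset is needed. The helper name extract_json_after and the sample script text are hypothetical, made up for illustration; they are not ESPN's actual markup.

```python
import json


def extract_json_after(text, marker):
    """Parse the first JSON value that follows `marker` in `text`.

    Hypothetical helper: raises ValueError if the marker is missing or
    if what follows it is not valid JSON.
    """
    pos = text.find(marker)
    if pos == -1:
        raise ValueError(f"marker {marker!r} not found")
    pos += len(marker)
    # raw_decode does not skip leading whitespace, so step over it first
    while pos < len(text) and text[pos].isspace():
        pos += 1
    # raw_decode returns (parsed_value, index_where_parsing_stopped)
    value, _end = json.JSONDecoder().raw_decode(text, pos)
    return value


# Made-up stand-in for the <script> text, not real ESPN output
sample_txt = """
var espn = espn || {};
espn.scoreboardData = {
    data: {"sports": [{"leagues": [{"events": [
        {"odds": {"home": {"moneyLine": -150}}}
    ]}]}]},
    queue: []
};
"""

data = extract_json_after(sample_txt, "data:")
money_line = data["sports"][0]["leagues"][0]["events"][0]["odds"]["home"]["moneyLine"]
print(money_line)
```

In the scrapy shell session above, data = extract_json_after(txt, 'data:') could stand in for the start/end slicing and the json.loads call, assuming the first occurrence of data: in txt is the one you want.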


1 Comment

xristian Over a year ago

Thank you! This gets me most of the way there. Clearly, I have some learning to do with XPath and JSON... I just need to find the time.
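
For the remaining step from the question, one row per game with its odds, a minimal sketch is to walk the parsed dictionary and collect the moneyLine values. Only the path data['sports'][0]['leagues'][0]['events'][0]['odds']['home']['moneyLine'] is confirmed by the answer; the 'away' key, the 'name' field, the function name odds_rows, and the sample values are assumptions for illustration and should be checked against data.keys() on the real page.

```python
def odds_rows(data):
    """Return one dict per event with its home/away moneyLine odds.

    Missing keys yield None instead of raising, since not every game
    is guaranteed to carry odds.
    """
    rows = []
    for sport in data.get("sports", []):
        for league in sport.get("leagues", []):
            for event in league.get("events", []):
                odds = event.get("odds") or {}
                rows.append({
                    # "name" is an assumed field; adjust to the real key
                    "game": event.get("name"),
                    "home_moneyline": (odds.get("home") or {}).get("moneyLine"),
                    # "away" is assumed to mirror "home"
                    "away_moneyline": (odds.get("away") or {}).get("moneyLine"),
                })
    return rows


# Made-up sample shaped like the path used in the answer
sample = {"sports": [{"leagues": [{"events": [
    {"name": "Team A at Team B",
     "odds": {"home": {"moneyLine": -150}, "away": {"moneyLine": 130}}},
    {"name": "Team C at Team D"},  # an event without odds
]}]}]}

rows = odds_rows(sample)
for row in rows:
    print(row)
```

Each row can then be yielded from a Scrapy callback or merged with the team names and scores already being extracted.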
