I'm trying to extract the odds for games from an ESPN page. The 'moneyLine' odds are buried in a script that I can't figure out how to access. Ideally I would have the odds in rows, one per game. I've managed to extract team names and scores in rows, and I would like the odds to go with them.

scrapy shell
fetch('http://www.espn.com/nfl/schedule/_/week/1')
response.xpath("//script[contains(., 'moneyLine')]/text()")

This is the output:

[<Selector xpath="//script[contains(., 'moneyLine')]/text()" data='\n\t\t\tvar espn = espn || {};\n\n\t\t\t// Build '>]

In the Firefox inspector window I can see the 'moneyLine' items; I just can't isolate them.

  • This gives you the whole script as one string, so you have to use standard string functions or a regex to work with it. If you end up with a string that is valid JSON, you can use the json module to convert it into a Python dictionary. Commented Dec 9, 2017 at 8:30
  • The page can use JavaScript to read data from another URL (mostly as JSON data). If you find this URL using DevTools in Firefox, you can read it with Scrapy. Commented Dec 9, 2017 at 8:33
  • Is 'page' a method? What 'another url' are you referring to? Commented Dec 9, 2017 at 10:08
  • "Page" means the web page / portal. You have to use DevTools to check all XHR requests; if one of them sends back your data, then you have the other URL. Commented Dec 9, 2017 at 10:18

1 Answer

Your data is in the <script>, in JSON format, between data: and queue:.

You can use standard string functions (e.g. find() and slicing) to cut out this part, then use the json module to convert it to a Python dictionary.
After that, you only have to find where moneyLine is in this dictionary.

scrapy shell 'http://www.espn.com/nfl/schedule/_/week/1'

# get `<script>` as text
items = response.xpath("//script[contains(., 'moneyLine')]/text()")
txt = items.extract_first()

# find start and end of data
# (I found these offsets by manually checking txt)
start = txt.find('data:') + 6 # offset found manually to get a correct JSON string
end = txt.find('queue:') - 6  # offset found manually to get a correct JSON string

json_string = txt[start:end]

# convert to python dictionary
import json
data = json.loads(json_string)

# example data
# (I found this path manually using `data.keys()`, `data['sports'][0].keys()`, etc.)
data['sports'][0]['leagues'][0]['events'][0]['odds']['home']['moneyLine']
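
The hand-tuned offsets above depend on the exact whitespace around data: and queue:, so they may break if the page layout changes. As a rough sketch of one alternative, the standard library's json.JSONDecoder().raw_decode() can parse a single JSON value starting at the first { after data: and ignore whatever follows it. The helper names extract_data_object and money_line_rows, the sample script text, and the 'away' key are assumptions made for illustration; only the sports / leagues / events / odds / home / moneyLine path comes from the code above.

```python
import json


def extract_data_object(script_text, marker="data:"):
    """Return the JSON object that follows `marker` in the script text."""
    marker_pos = script_text.find(marker)
    if marker_pos == -1:
        raise ValueError(f"marker {marker!r} not found")
    # Skip ahead to the opening brace of the JSON object.
    start = script_text.find("{", marker_pos)
    if start == -1:
        raise ValueError("no JSON object after marker")
    # raw_decode parses one JSON value and ignores the text after it,
    # so no end offset has to be worked out by hand.
    obj, _end = json.JSONDecoder().raw_decode(script_text, start)
    return obj


def money_line_rows(data):
    """Collect one row per event: (event index, home moneyLine, away moneyLine)."""
    rows = []
    events = data["sports"][0]["leagues"][0]["events"]
    for i, event in enumerate(events):
        odds = event.get("odds") or {}
        home = (odds.get("home") or {}).get("moneyLine")
        # The 'away' key is an assumption; only 'home' appears in the answer.
        away = (odds.get("away") or {}).get("moneyLine")
        rows.append((i, home, away))
    return rows


# Made-up stand-in for the real <script> text (all values are invented).
sample_script = """
var espn = espn || {};
espn.scoreboard = {
    data: {"sports": [{"leagues": [{"events": [
        {"odds": {"home": {"moneyLine": -150}, "away": {"moneyLine": 130}}},
        {"odds": {}}
    ]}]}]},
    queue: []
};
"""

data = extract_data_object(sample_script)
rows = money_line_rows(data)
for row in rows:
    print(row)
```

In the scrapy shell, txt from the code above would take the place of sample_script. Events without odds come back as None rather than raising KeyError, which keeps the rows aligned with the team names and scores already extracted.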

1 Comment

Thank you! This gets me most of the way there. Clearly, I have some learning to do with XPath and JSON... just need to find the time.
