I am trying to scrape running routes, to geoprocess in R, from the following site: http://runkeeper.com/user/127244964/route/1149604

I am trying to do that with this code:

from bs4 import BeautifulSoup

import urllib2
import csv
import os
import requests

page1 = urllib2.urlopen("http://runkeeper.com/user/212579518/route/513771")
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(page1, "html.parser")
print(soup)

When I print the results, I see that the data I need is inside a text/javascript script block:


var routePoints = [{"latitude":38.918704,"longitude":-77.036478,"deltaDistance":0,"type":"StartPoint","altitude":40,"deltaPause":0}

I need to extract the values inside these dictionaries. Any suggestions on how to do this?

Thanks.

2 Answers


This searches the text of the soup with a regex and loads the matched JSON into a Python object you can work with.

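import re
import json

# Capture everything between "routePoints =" and the last ";" on that line
point_re = re.compile(r'.*routePoints =(.*);')
# group(1) is the JavaScript array literal, which json.loads can parse if it is valid JSON
point_json = point_re.search(str(soup)).group(1)
# Parse it into a list of dictionaries, one per route point
point_data = json.loads(point_json)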

2 Comments

Thanks, this seems to get all the points that I need. If I wanted to save this to a CSV file, what would you suggest? Also, if you can recommend a good BeautifulSoup tutorial or book, I would appreciate it.
You could use the csv module (docs.python.org/2/library/csv.html), but it is just as easy to open a file and write the lines you want. As long as you are only dumping numeric values, it will be pretty easy.
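
As a rough sketch of the csv route mentioned in the comment above: this assumes Python 3 and that point_data is a list of dictionaries as returned by json.loads. The function name write_points_csv, the sample point, and the file name in the note below are illustrative only and are not from the original answer.

```python
import csv
import io


def write_points_csv(points, out_file):
    """Write a list of point dictionaries to an open text file as CSV."""
    # Collect column names in first-seen order, in case points differ in keys.
    fieldnames = []
    for point in points:
        for key in point:
            if key not in fieldnames:
                fieldnames.append(key)

    # DictWriter fills any key missing from a row with an empty string.
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(points)


# Sample data shaped like the point shown in the question (an assumption
# standing in for the real point_data).
sample_points = [
    {"latitude": 38.918704, "longitude": -77.036478, "deltaDistance": 0,
     "type": "StartPoint", "altitude": 40, "deltaPause": 0},
]

# Write to an in-memory buffer so the example has no side effects.
buffer = io.StringIO()
write_points_csv(sample_points, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, open the file with open("route_points.csv", "w", newline="") and pass the file object to write_points_csv. The resulting file can then be read in R with read.csv.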

Use a regular expression to strip everything outside the square brackets (or, alternatively, to select only the content of the outermost brackets), then call json.loads on the bracketed text.
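
A minimal sketch of the bracket-selection idea, assuming the text being searched holds only the routePoints statement and that the array is valid JSON. The script_text value is a made-up sample with a closing "];" added for illustration; the real page will contain many more points and other scripts.

```python
import json
import re

# Hypothetical sample of the script text (not the real page content).
script_text = (
    'var routePoints = [{"latitude":38.918704,"longitude":-77.036478,'
    '"deltaDistance":0,"type":"StartPoint","altitude":40,"deltaPause":0}];'
)

# Greedy match from the first "[" to the last "]" in the text, which
# selects the outermost pair of square brackets.
match = re.search(r'\[.*\]', script_text, re.DOTALL)

# Parse the bracketed text as JSON; fall back to an empty list if absent.
route_points = json.loads(match.group(0)) if match else []
print(route_points[0]["latitude"], route_points[0]["longitude"])
```

Because the pattern is greedy, running it on a whole page would span from the first "[" anywhere to the last "]" anywhere, so the text should be narrowed to the routePoints statement first.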
