I want to scrape the following data from http://maps.latimes.com/neighborhoods/population/density/neighborhood/list/:
var hoodFeatures = {
type: "FeatureCollection",
features: [{
type: "Feature",
properties: {
name: "Koreatown",
slug: "koreatown",
url: "/neighborhoods/neighborhood/koreatown/",
has_statistics: true,
label: 'Rank: 1<br>Population per Sqmi: 42,611',
population: "115,070",
stratum: "high"
},
geometry: { "type": "MultiPolygon", "coordinates": [ [ [ [ -118.286908, 34.076510 ], [ -118.289208, 34.052511 ], [ -118.315909, 34.052611 ], [ -118.323009, 34.054810 ], [ -118.319309, 34.061910 ], [ -118.314093, 34.062362 ], [ -118.313709, 34.076310 ], [ -118.286908, 34.076510 ] ] ] ] }
},
From the JavaScript above, which is embedded in the page's HTML, I want to take each of:
name
population per sqmi
population
geometry
and turn them into a data frame keyed by name.
So far I've tried:
import requests
import json
from bs4 import BeautifulSoup
response_obj = requests.get('http://maps.latimes.com/neighborhoods/population/density/neighborhood/list/').text
soup = BeautifulSoup(response_obj,'lxml')
The soup object contains the script contents, but I don't understand how to use the json module as advised in this thread: Parsing variable data out of a javascript tag using python
json_text = '{%s}' % (soup.partition('{')[2].rpartition('}')[0],)
value = json.loads(json_text)
value
I get this error
TypeError Traceback (most recent call last)
<ipython-input-12-37c4c0188ed0> in <module>
1 #Splits the text on the first bracket and last bracket of the javascript into JSON format
----> 2 json_text = '{%s}' % (soup.partition('{')[2].rpartition('}')[0],)
3 value = json.loads(json_text)
4 value
5 #import pprint
TypeError: 'NoneType' object is not callable
Any suggestions? Thanks
soup is a BeautifulSoup object, not a string, so soup.partition is treated as a lookup for a <partition> tag, which does not exist, and you get None. Calling None is what raises the TypeError. You would have to work with soup.text, which is a string. You could also find the <script> tag and work only with the text that may contain the JavaScript code: code = soup.find('script').text
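Even with a string in hand, json.loads is likely to fail on the whole hoodFeatures literal, because its keys are unquoted and the label uses single quotes, so it is not valid JSON. Below is a minimal sketch of one way around that, using regular expressions on the raw text. It assumes every feature on the live page has the same layout as the snippet in the question (a properties block followed by a geometry block with no nested braces). The names SAMPLE_JS, BLOCK_RE, _field and parse_hood_features are made up for this illustration, and the coordinates in the sample are trimmed.

```python
import json
import re

# Sample text copied from the question (coordinates trimmed); the live page
# is assumed to follow the same layout for every feature.
SAMPLE_JS = '''
var hoodFeatures = {
type: "FeatureCollection",
features: [{
type: "Feature",
properties: {
name: "Koreatown",
slug: "koreatown",
url: "/neighborhoods/neighborhood/koreatown/",
has_statistics: true,
label: 'Rank: 1<br>Population per Sqmi: 42,611',
population: "115,070",
stratum: "high"
},
geometry: { "type": "MultiPolygon", "coordinates": [ [ [ [ -118.286908, 34.076510 ], [ -118.289208, 34.052511 ], [ -118.286908, 34.076510 ] ] ] ] }
},
]
};
'''

# One match per feature: the properties body and the geometry object.
# Assumption: neither block contains nested braces.
BLOCK_RE = re.compile(
    r'properties:\s*\{(?P<props>[^{}]*)\}\s*,\s*geometry:\s*(?P<geom>\{[^{}]*\})'
)


def _field(pattern, text):
    """Return the first capture group of pattern in text, or None if absent."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


def _to_int(value):
    """Convert a string like '42,611' to an int; pass None through."""
    return int(value.replace(',', '')) if value else None


def parse_hood_features(js_text):
    """Build one dict per feature with name, density, population and geometry."""
    rows = []
    for block in BLOCK_RE.finditer(js_text):
        props = block.group('props')
        rows.append({
            'name': _field(r'\bname:\s*"([^"]*)"', props),
            'population_per_sqmi': _to_int(
                _field(r'Population per Sqmi:\s*([\d,]+)', props)),
            'population': _to_int(
                _field(r'\bpopulation:\s*"([\d,]+)"', props)),
            # The geometry object uses quoted keys, so it parses as JSON.
            'geometry': json.loads(block.group('geom')),
        })
    return rows


rows = parse_hood_features(SAMPLE_JS)
print(rows[0]['name'], rows[0]['population_per_sqmi'], rows[0]['population'])
```

To run this against the real page, pass requests.get(url).text to parse_hood_features instead of SAMPLE_JS; BeautifulSoup is not needed for this approach. If pandas is installed, pd.DataFrame(rows).set_index('name') should give a data frame keyed by name. A feature that lacks one of the fields gets None for it rather than raising an error.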