
I have a movie dataset with the following structure:

{u'detail_url': u'http://www.movieinsider.com/m4032/open-grave-/',
 u'douban_info': {u'aka': [u'\u5929\u5751\u575f\u5730',
                           u'\u655e\u5f00\u7684\u575f\u5893'],
                  u'alt': u'http://movie.douban.com/subject/13899371/',
                  u'casts': [{u'alt': u'http://movie.douban.com/celebrity/1314499/',
                              u'avatars': {u'large': u'http://img4.douban.com/img/celebrity/large/46059.jpg',
                                           u'medium': u'http://img4.douban.com/img/celebrity/medium/46059.jpg',
                                           u'small': u'http://img4.douban.com/img/celebrity/small/46059.jpg'},
                              u'id': u'1314499',
                              u'name': u'\u7ea6\u745f\u592b\xb7\u6469\u6839'},
                             {u'alt': u'http://movie.douban.com/celebrity/1036300/',
                              u'avatars': {u'large': u'http://img4.douban.com/img/celebrity/large/1393386887.76.jpg',
                                           u'medium': u'http://img4.douban.com/img/celebrity/medium/1393386887.76.jpg',
                                           u'small': u'http://img4.douban.com/img/celebrity/small/1393386887.76.jpg'},
                              u'id': u'1036300',
                              u'name': u'\u6c99\u5c14\u6258\xb7\u79d1\u666e\u96f7'},
                             {u'alt': u'http://movie.douban.com/celebrity/1049595/',
                              u'avatars': {u'large': u'http://img3.douban.com/img/celebrity/large/1377843669.55.jpg',
                                           u'medium': u'http://img3.douban.com/img/celebrity/medium/1377843669.55.jpg',
                                           u'small': u'http://img3.douban.com/img/celebrity/small/1377843669.55.jpg'},
                              u'id': u'1049595',
                              u'name': u'\u6258\u9a6c\u65af\xb7\u514b\u83b1\u8212\u66fc'},
                             {u'alt': u'http://movie.douban.com/celebrity/1318450/',
                              u'avatars': {u'large': u'http://img3.douban.com/img/celebrity/large/43010.jpg',
                                           u'medium': u'http://img3.douban.com/img/celebrity/medium/43010.jpg',
                                           u'small': u'http://img3.douban.com/img/celebrity/small/43010.jpg'},
                              u'id': u'1318450',
                              u'name': u'\u827e\u7433\xb7\u7406\u67e5\u5179'}],
                  u'collect_count': 1507,
                  u'comments_count': 468,
                  u'countries': [u'\u7f8e\u56fd'],
                  u'current_season': None,
                  u'directors': [{u'alt': u'http://movie.douban.com/celebrity/1302444/',
                                  u'avatars': {u'large': u'http://img3.douban.com/img/celebrity/large/45953.jpg',
                                               u'medium': u'http://img3.douban.com/img/celebrity/medium/45953.jpg',
                                               u'small': u'http://img3.douban.com/img/celebrity/small/45953.jpg'},
                                  u'id': u'1302444',
                                  u'name': u'\u5188\u624e\u7f57\xb7\u6d1b\u4f69\u5179-\u52a0\u52d2\u679c'}],
                  u'do_count': None,
                  u'douban_site': u'',
                  u'episodes_count': None,
                  u'genres': [u'\u6050\u6016'],
                  u'id': u'13899371',
                  u'images': {u'large': u'http://img3.douban.com/view/movie_poster_cover/lpst/public/p2162073011.jpg',
                              u'medium': u'http://img3.douban.com/view/movie_poster_cover/spst/public/p2162073011.jpg',
                              u'small': u'http://img3.douban.com/view/movie_poster_cover/ipst/public/p2162073011.jpg'},
                  u'mobile_url': u'http://movie.douban.com/subject/13899371/mobile',
                  u'original_title': u'Open Grave',
                  u'rating': {u'average': 5.6,
                              u'max': 10,
                              u'min': 0,
                              u'stars': u'30'},
                  u'ratings_count': 1283,
                  u'reviews_count': 6,
                  u'schedule_url': u'',
                  u'seasons_count': None,
                  u'subtype': u'movie',
                  u'summary': u'\u672c\u8bb2\u8ff0\u4e86\u67d0\u4e2a\u504f\u50fb\u8352\u51c9\u7684\u68ee\u6797\u91cc\uff0c\u516d\u4e2a\u4eba\u5728\u4e00\u5904\u6709\u8150\u5c38\u7684\u9732\u5929\u575f\u573a\u65c1\u9192\u6765\uff0c\u5374\u597d\u50cf\u5f97\u4e86\u5065\u5fd8\u4e00\u822c\u7684\u6050\u6016\u95f9\u9b3c\u6545\u4e8b\u3002\u4ed6\u4eec\u65e0\u5904\u53ef\u53bb\uff0c\u88ab\u8feb\u628a\u795e\u79d8\u4e8b\u4ef6\u7684\u7ebf\u7d22\u62fc\u51d1\u8d77\u6765\uff0c\u6700\u7ec8\u5c06\u5e26\u9886\u4ed6\u4eec\u8d70\u8fdb\u60ca\u4eba\u7684\u7ed3\u5c40\uff0c\u800c\u4e0d\u81f3\u4e8e\u4f7f\u771f\u76f8\u6765\u7684\u592a\u665a\u3002',
                  u'title': u'\u5f00\u68fa',
                  u'wish_count': 640,
                  u'year': u'2013'},
 u'movie_tt_id': u'tt2071550',
 u'name': u'Open Grave ',
 u'omdb_info': {u'Actors': u'Sharlto Copley, Thomas Kretschmann, Josie Ho, Joseph Morgan',
                u'Awards': u'2 nominations.',
                u'Country': u'USA',
                u'Director': u'Gonzalo L\xf3pez-Gallego',
                u'Genre': u'Horror, Mystery, Thriller',
                u'Language': u'English',
                u'Metascore': u'33',
                u'Plot': u'A man wakes up in the wilderness, in a pit full of dead bodies, with no memory and must determine if the murderer is one of the strangers who rescued him, or if he himself is the killer.',
                u'Poster': u'http://ia.media-imdb.com/images/M/MV5BMTc1MDM5MTI0Ml5BMl5BanBnXkFtZTgwOTMyODI1MDE@._V1_SX300.jpg',
                u'Rated': u'R',
                u'Released': u'3 Jan 2014',
                u'Response': u'True',
                u'Runtime': u'102 min',
                u'Title': u'Open Grave',
                u'Type': u'movie',
                u'Writer': u'Eddie Borey, Chris Borey',
                u'Year': u'2013',
                u'imdbID': u'tt2071550',
                u'imdbRating': u'6.3',
                u'imdbVotes': u'18896'}}

So it is a deeply nested dataset. To read it into Pandas, I think there are 2 options:

  1. Extract just the necessary info from the nested parts and make each piece a column in the dataframe.
  2. Turn the nested parts into dataframes as well, and then merge them into the parent dataset.

==============

I'm not sure what the current best practice is for this, and I ran into these problems when trying the above:

1. I don't know how to extract the inner JSON.

import json
from pprint import pprint
import pandas as pd
from pandas import Series, DataFrame

# Raw string so the backslashes in the Windows path are taken literally
with open(r'..\lib\movie_list_2014_v2.json') as data_file:
    data = json.load(data_file)

pd_data = DataFrame(data)
pprint(data[1])
# The next line is the one that raises the error
pd_data['imdb_rating'] = pd_data['omdb_info']['imdbRating']

This gives me an error. I believe it's because [omdb_info] is un-parsed JSON.

2. I looked into books, and it looks like there is no automatic conversion for reading nested data into a dataframe, so I need to turn all of the nested parts into dataframes manually. I think this is very painful (there is a lot of nested info in douban_info).

  • You may want to consider flattening the deeply nested JSON in JavaScript (see this link). If you prefer to do it in Python, I believe you need a for loop that checks whether each value in a dict is a list (with isinstance) and, if so, replicates the data to flatten the embedded/nested structure. However, running a for loop is not a strength of Python. Alternatively, consider putting your data into MongoDB and processing it there with the $unwind functionality. Commented Jul 28, 2015 at 15:48
  • The JSON you've posted is not in a valid format for the Python json library. I guess you are getting "Expecting property name enclosed in double quotes"? You should first transform your JSON in a similar way as in here. I tried your file with the fix they include there and it does not work, but I don't know enough regular expressions to fix it (I mean, in less than a week or so). Anyway, once your JSON is OK, I would load it just the way you do, and then load data['omdb_info'] into your pd_data. Commented Jul 29, 2015 at 15:42
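
The flattening loop suggested in the first comment can be sketched in plain Python. This is only an illustration under stated assumptions: the helper name flatten, the '.' separator, and the decision to leave list values untouched (rather than replicating rows) are choices made here, not something from the original post. The sample record is a hand-trimmed version of the one in the question.

```python
def flatten(record, parent_key='', sep='.'):
    """Hypothetical helper: flatten nested dicts into one dict with joined keys."""
    flat = {}
    for key, value in record.items():
        full_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts and merge their flattened keys
            flat.update(flatten(value, full_key, sep))
        else:
            # Scalars and lists are kept as they are
            flat[full_key] = value
    return flat


# Trimmed sample based on the record shown in the question
movie = {
    'movie_tt_id': 'tt2071550',
    'douban_info': {'rating': {'average': 5.6, 'max': 10},
                    'genres': ['\u6050\u6016']},
    'omdb_info': {'imdbRating': '6.3', 'Year': '2013'},
}

flat_movie = flatten(movie)
```

A list of such flattened dicts could then be passed to DataFrame directly, with each joined key becoming one column.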

1 Answer

You can use the pandas json_normalize function to flatten the JSON, although it will result in some long column names for the more deeply nested data.

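# pandas 1.0+ also exposes this as pd.json_normalize; this import path is deprecated there
from pandas.io.json import json_normalize

# movies is the list of movie dicts loaded from the JSON file
result = json_normalize(movies)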

From there, you can deal with only the columns that you need.
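
As a rough illustration of keeping only the needed columns, the sketch below runs json_normalize on a hand-trimmed version of the record from the question. It assumes pandas 1.0 or later, where the function is available as pd.json_normalize; the variable name wanted and the choice of columns are placeholders. Nested keys are joined with '.' by default, which is where the long column names come from.

```python
import pandas as pd

# Trimmed sample based on the record shown in the question
movies = [{
    'movie_tt_id': 'tt2071550',
    'name': 'Open Grave ',
    'douban_info': {'rating': {'average': 5.6, 'max': 10, 'min': 0},
                    'year': '2013'},
    'omdb_info': {'imdbRating': '6.3', 'Runtime': '102 min'},
}]

# Nested dicts become columns such as 'douban_info.rating.average'
result = pd.json_normalize(movies)

# Keep only the columns of interest (an arbitrary choice for this sketch)
wanted = result[['movie_tt_id',
                 'douban_info.rating.average',
                 'omdb_info.imdbRating']].copy()

# The OMDb rating arrives as a string, so convert it to a number
wanted['omdb_info.imdbRating'] = wanted['omdb_info.imdbRating'].astype(float)
```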

http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.io.json.json_normalize.html
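
For the second option in the question (separate dataframes merged back into the parent), json_normalize also accepts record_path, meta and record_prefix arguments. The following is a minimal sketch, assuming pandas 1.0 or later and assuming every record has a douban_info dict containing a casts list; the sample is trimmed from the question and the variable names are placeholders.

```python
import pandas as pd

# Trimmed sample based on the record shown in the question
movies = [{
    'movie_tt_id': 'tt2071550',
    'name': 'Open Grave ',
    'douban_info': {
        'casts': [
            {'id': '1314499', 'name': '\u7ea6\u745f\u592b\xb7\u6469\u6839'},
            {'id': '1036300', 'name': '\u6c99\u5c14\u6258\xb7\u79d1\u666e\u96f7'},
        ],
    },
}]

# One row per cast member, carrying the movie id along as a key
casts = pd.json_normalize(
    movies,
    record_path=['douban_info', 'casts'],
    meta=['movie_tt_id'],
    record_prefix='cast_',   # avoids a clash with the movie-level 'name'
)

# Parent table with one row per movie
parent = pd.json_normalize(movies)[['movie_tt_id', 'name']]

# Join the cast rows back onto their movie
merged = parent.merge(casts, on='movie_tt_id')
```

Keeping the cast members in their own table avoids repeating every movie-level column for each actor until the merge is actually needed.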
