I have a pandas DataFrame where one of the columns is stored as JSON-like strings. It contains lists of movie production companies for a given title. Below is a sample of its structure:
ID | production_companies
---------------
1 | "[{'name': 'Paramount Pictures', 'id': 4}, {'name': 'United Artists', 'id': 60}, {'name': 'Metro-Goldwyn-Mayer (MGM)', 'id': 8411}]"
2 | "[{'name': 'Walt Disney Pictures', 'id': 2}]"
3 | "[{'name': 'Bold Films', 'id': 2266}, {'name': 'Blumhouse Productions', 'id': 3172}, {'name': 'Right of Way Films', 'id': 32157}]"
4 | nan
5 | nan
6 | nan
7 | "[{'name': 'Ghost House Pictures', 'id': 768}, {'name': 'North Box Productions', 'id': 22637}]"
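Note that the quotes inside these strings are single quotes, so strictly speaking they are Python literal syntax rather than valid JSON. The sketch below, which assumes the column holds such strings or NaN, shows that `json.loads` rejects them while `ast.literal_eval` from the standard library parses them into real lists of dicts. The helper name `parse_companies` is hypothetical.

```python
import ast
import json

import numpy as np
import pandas as pd

# Two cells modelled on the sample: one stored string and one missing value
raw = pd.Series([
    "[{'name': 'Walt Disney Pictures', 'id': 2}]",
    np.nan,
])

# json.loads requires double-quoted keys and strings, so it rejects this text
try:
    json.loads(raw[0])
    json_ok = True
except json.JSONDecodeError:
    json_ok = False


def parse_companies(value):
    """Hypothetical helper: parse one cell into a list of dicts; NaN becomes []."""
    if isinstance(value, str):
        return ast.literal_eval(value)  # safely evaluates Python literals only
    return []  # NaN is a float, so non-strings are treated as "no companies"


parsed = raw.apply(parse_companies)
print(json_ok)
print(parsed[0])
print(parsed[1])
```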
As you can see, one movie (row) can have multiple production companies. For each movie I want to create separate columns containing the names of the producers. The columns should look like name_1, name_2, name_3, etc. If there is no second or third producer, the value should be NaN.
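One possible approach, sketched under the assumption that the cells are strings like the sample (or NaN): extract the list of names per row, pad every list with NaN to the length of the longest one, and build the name_N columns from the padded lists. The data below reconstructs four of the sample rows, and `company_names` is a hypothetical helper.

```python
import ast

import numpy as np
import pandas as pd

# Reconstruction of part of the sample data (IDs 1, 2, 4, 7)
movies = pd.DataFrame({
    "ID": [1, 2, 4, 7],
    "production_companies": [
        "[{'name': 'Paramount Pictures', 'id': 4}, {'name': 'United Artists', 'id': 60}, "
        "{'name': 'Metro-Goldwyn-Mayer (MGM)', 'id': 8411}]",
        "[{'name': 'Walt Disney Pictures', 'id': 2}]",
        np.nan,
        "[{'name': 'Ghost House Pictures', 'id': 768}, {'name': 'North Box Productions', 'id': 22637}]",
    ],
})


def company_names(value):
    """Hypothetical helper: list of 'name' values in one cell, or [] for NaN."""
    if not isinstance(value, str):
        return []
    return [company["name"] for company in ast.literal_eval(value)]


names = movies["production_companies"].apply(company_names)

# Pad each list with NaN so every row has the same number of name slots
width = max(map(len, names), default=0)
padded = [lst + [np.nan] * (width - len(lst)) for lst in names]

name_cols = pd.DataFrame(
    padded,
    index=movies.index,
    columns=[f"name_{i + 1}" for i in range(width)],
)

result = pd.concat([movies[["ID"]], name_cols], axis=1)
print(result)
```

Padding explicitly with `np.nan` is a deliberate choice: building a DataFrame directly from ragged lists of strings tends to fill the gaps with `None` rather than NaN.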
I don't have much experience working with JSON, and I've tried a few methods (iterators with lambda functions), but they are not even close to what I need.
Therefore I hope you can help!
EDIT:
The following code ("movies" is the main DataFrame):
from pandas.io.json import json_normalize
companies = list(movies['production_companies'])
json_normalize(companies)
gives me the following error:
AttributeError: 'str' object has no attribute 'values'
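A likely reason for this error is that `json_normalize` expects records that are already dicts, while each element of `companies` is still a `str`, so calling dict methods such as `.values()` on it fails. The sketch below, assuming pandas 1.0 or later (where the function is available as `pd.json_normalize`), parses the strings first, uses `explode` to get one company per row, flattens those dicts with `pd.json_normalize`, and then pivots to name_1, name_2, ... columns. The sample rows are reconstructed from the question, and this is an alternative route to the same layout, not the only one.

```python
import ast

import numpy as np
import pandas as pd

movies = pd.DataFrame({
    "ID": [2, 3, 5],
    "production_companies": [
        "[{'name': 'Walt Disney Pictures', 'id': 2}]",
        "[{'name': 'Bold Films', 'id': 2266}, {'name': 'Blumhouse Productions', 'id': 3172}, "
        "{'name': 'Right of Way Films', 'id': 32157}]",
        np.nan,
    ],
})

# Step 1: turn each string into a real list of dicts (NaN -> empty list)
parsed = movies["production_companies"].apply(
    lambda v: ast.literal_eval(v) if isinstance(v, str) else []
)

# Step 2: one row per (movie, company); empty lists explode to NaN, which we drop
exploded = (
    movies[["ID"]]
    .assign(company=parsed)
    .explode("company")
    .dropna(subset=["company"])
    .reset_index(drop=True)
)

# Step 3: json_normalize now receives dicts and flattens them into 'name' and 'id'
flat = pd.json_normalize(exploded["company"].tolist())
flat["ID"] = exploded["ID"]  # movie ID (distinct from the company 'id' column)

# Step 4: number the companies within each movie and pivot to name_1, name_2, ...
flat["slot"] = flat.groupby("ID").cumcount() + 1
wide = flat.pivot(index="ID", columns="slot", values="name")
wide.columns = [f"name_{n}" for n in wide.columns]

# Step 5: reattach movies without companies so they appear as all-NaN rows
result = movies[["ID"]].merge(wide.reset_index(), on="ID", how="left")
print(result)
```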