1

I have called some data from the World Health Organisation in JSON format, I would like to read this into a Pandas DataFrame.

I have called this from this page: WHO Measles First Dose Rate

{u'dimension': [{u'display': u'Indicator', u'label': u'GHO'},
  {u'display': u'PUBLISH STATES', u'label': u'PUBLISHSTATE'},
  {u'display': u'Year', u'label': u'YEAR'},
  {u'display': u'WHO region', u'label': u'REGION'},
  {u'display': u'World Bank income group', u'label': u'WORLDBANKINCOMEGROUP'},
  {u'display': u'Country', u'label': u'COUNTRY'}],
 u'fact': [{u'Value': u'25',
   u'dim': {u'COUNTRY': u'Afghanistan',
    u'GHO': u'Measles-containing-vaccine first-dose (MCV1) immunization coverage among 1-year-olds (%)',
    u'PUBLISHSTATE': u'Published',
    u'REGION': u'Eastern Mediterranean',
    u'WORLDBANKINCOMEGROUP': u'Low-income',
    u'YEAR': u'1993'}},
  {u'Value': u'57',
   u'dim': {u'COUNTRY': u'Afghanistan',
    u'GHO': u'Measles-containing-vaccine first-dose (MCV1) immunization coverage among 1-year-olds (%)',
    u'PUBLISHSTATE': u'Published',
    u'REGION': u'Eastern Mediterranean',
    u'WORLDBANKINCOMEGROUP': u'Low-income',
    u'YEAR': u'2013'}},
  {u'Value': u'62',
   u'dim': {u'COUNTRY': u'Angola',
    u'GHO': u'Measles-containing-vaccine first-dose (MCV1) immunization coverage among 1-year-olds (%)',
    u'PUBLISHSTATE': u'Published',
    u'REGION': u'Africa',
    u'WORLDBANKINCOMEGROUP': u'Upper-middle-income',
    u'YEAR': u'1996'}},
  {u'Value': u'94',
   u'dim': {u'COUNTRY': u'Andorra',
    u'GHO': u'Measles-containing-vaccine first-dose (MCV1) immunization coverage among 1-year-olds (%)',
    u'PUBLISHSTATE': u'Published',
    u'REGION': u'Europe',
    u'WORLDBANKINCOMEGROUP': u'High-income',
    u'YEAR': u'2005'}},
  {u'Value': u'34',
   u'dim': {u'COUNTRY': u'United Arab Emirates',
    u'GHO': u'Measles-containing-vaccine first-dose (MCV1) immunization coverage among 1-year-olds (%)',
    u'PUBLISHSTATE': u'Published',
    u'REGION': u'Eastern Mediterranean',
    u'WORLDBANKINCOMEGROUP': u'High-income',
    u'YEAR': u'1980'}},

I have tried

#Setting Up and loading JSON into object ready to turn into dataframe
url = "http://apps.who.int/gho/athena/data/GHO/WHS8_110.json?profile=simple&filter=COUNTRY:*"
response = requests.get(url)
response_json = response.content
json.loads(response_json)
whoDataSetVaccinationRate = json.loads(response_json)    

#Attempt to load JSON Data into Pandas Dataframe
whoDataSetVaccinationRateDF = pd.DataFrame(whoDataSetVaccinationRate['fact']
, columns=['COUNTRY', 'YEAR','REGION'])
whoDataSetVaccinationRateDF

And that seems to read - but I only get NaN values in the dataframe for COUNTRY and YEAR: DataFrameError

And I realise I want it laid out differently in the dataframe anyway - and I'm not sure how to call that. This is how I want my dataframe to look: DataFrameGood

1 Answer 1

3

Use json_normalize with pivot:

from pandas.io.json import json_normalize   
import urllib.request, json 

#https://stackoverflow.com/a/12965254
url = "http://apps.who.int/gho/athena/data/GHO/WHS8_110.json?profile=simple&filter=COUNTRY:*"

with urllib.request.urlopen(url) as url:
    data = json.loads(url.read().decode())

df = json_normalize(data['fact']).pivot('dim.COUNTRY','dim.YEAR','Value').astype(float)
print (df.head())

dim.YEAR     1980  1981  1982  1983  1984  1985  1986  1987  1988  1989  ...   \
dim.COUNTRY                                                              ...    
Afghanistan  11.0   NaN   8.0   9.0  14.0  14.0  14.0  31.0  34.0  22.0  ...    
Albania      90.0  90.0  93.0  96.0  96.0  96.0  96.0  96.0  96.0  96.0  ...    
Algeria       NaN   NaN   NaN   NaN   NaN  68.0  67.0  73.0  81.0  82.0  ...    
Andorra       NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN  ...    
Angola        NaN   NaN   NaN  26.0  35.0  44.0  44.0  55.0  56.0  48.0  ...    

dim.YEAR     2007  2008  2009  2010  2011  2012  2013  2014  2015  2016  
dim.COUNTRY                                                              
Afghanistan  55.0  59.0  60.0  62.0  64.0  59.0  57.0  60.0  62.0  62.0  
Albania      97.0  98.0  97.0  99.0  99.0  98.0  99.0  98.0  97.0  96.0  
Algeria      92.0  88.0  92.0  95.0  95.0  95.0  95.0  95.0  95.0  94.0  
Andorra      94.0  98.0  98.0  99.0  99.0  98.0  95.0  96.0  96.0  97.0  
Angola       71.0  61.0  57.0  72.0  64.0  72.0  66.0  60.0  55.0  49.0  

[5 rows x 37 columns]
Sign up to request clarification or add additional context in comments.

6 Comments

This is AWESOME! You are truly a lifesaver - this is exactly what i was trying to do. The extra information you added on a better method to read the JSON in is also very elegant code - Thank You
Unfortunately the code you have provided is python 3 specific - I am limited to python 2.7. I am trying to modify using urllib2.urlopen but not having much luck :( Maybe you can suggest a method in 2.7?
@kiltannen - Solution from https://stackoverflow.com/a/12965254 for python2 does not work?
Thank you - working my way through this now. Will get back to you. Thx Heaps for all the help
@kiltannen - Good news, super!
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.