I am trying to extract one field from nested JSON data and put it into a new column of my pandas DataFrame.
Here is the code I have so far:
# Import libraries
import json
import requests
from IPython.display import JSON
import pandas as pd
# Load data
astronaut_db_url = 'https://supercluster-iadb.s3.us-east-2.amazonaws.com/adb.json'
astronauts_db = requests.get(astronaut_db_url).json()
# Format data: flatten the JSON and keep only the columns of interest
df = pd.json_normalize(astronauts_db['astronauts'])
df_astro = df[['_id', 'astroNumber', 'awards', 'name', 'gender', 'inSpace',
               'overallNumber', 'spacewalkCount', 'species', 'speciesGroup',
               'totalMinutesInSpace', 'totalSecondsSpacewalking', 'lastLaunchDate.utc']]
# Get one row per award
df_awards = df_astro.explode(['awards']).reset_index(drop=True)
df_awards.head()
# Each cell in 'awards' is now a dict; this returns the title for the first row only
df_awards['awards'][0]['title']
I want to extract the title of every award for each astronaut and build a new column that holds the list of award titles in a single cell, like this:
Astronaut_ID    Awards
dh3405kdmnd     [First Person In Space, First Person to Cross Karman Line]
ert549fkfl3     [Crossed Karman Line, First Person on Moon]
My idea for tackling this problem was to:
1. Get a row for each award for every astronaut
2. Strip the JSON cells down to just the title
3. Recombine in one cell per astronaut
I am not sure how to complete step 2 of this process. Can someone help point me in the right direction?
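For reference, here is a minimal sketch of how the three steps might fit together. It runs on a small hypothetical sample rather than the real download, and it assumes each entry in `awards` is a dict with a `title` key (as the `df_awards['awards'][0]['title']` lookup suggests). The sample IDs, names, and the `award_title` and `awards_by_astronaut` names are made up for illustration.

```python
import pandas as pd

# Hypothetical sample that mimics astronauts_db['astronauts']; the IDs, names,
# and award titles are illustrative, not real records.
sample_astronauts = [
    {"_id": "dh3405kdmnd", "name": "Astronaut A",
     "awards": [{"title": "First Person In Space"},
                {"title": "First Person to Cross Karman Line"}]},
    {"_id": "ert549fkfl3", "name": "Astronaut B",
     "awards": [{"title": "Crossed Karman Line"},
                {"title": "First Person on Moon"}]},
    {"_id": "zzz000noawd", "name": "Astronaut C", "awards": []},
]

df_astro = pd.json_normalize(sample_astronauts)[["_id", "name", "awards"]]

# Step 1: one row per award (an empty awards list becomes a single NaN row).
df_awards = df_astro.explode("awards").reset_index(drop=True)

# Step 2: keep only the title of each award dict; rows without an award get None.
df_awards["award_title"] = df_awards["awards"].apply(
    lambda award: award.get("title") if isinstance(award, dict) else None
)

# Step 3: collect the titles back into one list per astronaut,
# dropping the placeholder rows that have no award.
awards_by_astronaut = (
    df_awards.groupby("_id", sort=False)["award_title"]
    .apply(lambda titles: titles.dropna().tolist())
    .reset_index(name="Awards")
    .rename(columns={"_id": "Astronaut_ID"})
)

print(awards_by_astronaut)
```

The `isinstance` check is there because `explode` leaves a NaN in the `awards` cell for astronauts with no awards, and indexing NaN with `['title']` would raise an error. Those astronauts end up with an empty list in this sketch.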