1

All, I have the following nested dictionary (from a JSON API response). I would like to access the individual items by means of a pandas dataframe.

The dictionary looks as follows:

{'pagination': {'limit': 2, 'offset': 0, 'count': 2, 'total': 1474969}, 'data': [{'flight_date': '2022-10-12', 'flight_status': 'active', 'departure': {'airport': 'Tullamarine', 'timezone': 'Australia/Melbourne', 'iata': 'MEL', 'icao': 'YMML', 'terminal': '2', 'gate': '16', 'delay': 20, 'scheduled': '2022-10-12T00:50:00+00:00', 'estimated': '2022-10-12T00:50:00+00:00', 'actual': '2022-10-12T01:09:00+00:00', 'estimated_runway': '2022-10-12T01:09:00+00:00', 'actual_runway': '2022-10-12T01:09:00+00:00'}, 'arrival': {'airport': 'Hong Kong International', 'timezone': 'Asia/Hong_Kong', 'iata': 'HKG', 'icao': 'VHHH', 'terminal': '1', 'gate': None, 'baggage': None, 'delay': None, 'scheduled': '2022-10-12T06:55:00+00:00', 'estimated': '2022-10-12T06:55:00+00:00', 'actual': None, 'estimated_runway': None, 'actual_runway': None}, 'airline': {'name': 'Finnair', 'iata': 'AY', 'icao': 'FIN'}, 'flight': {'number': '5844', 'iata': 'AY5844', 'icao': 'FIN5844', 'codeshared': {'airline_name': 'cathay pacific', 'airline_iata': 'cx', 'airline_icao': 'cpa', 'flight_number': '178', 'flight_iata': 'cx178', 'flight_icao': 'cpa178'}}, 'aircraft': None, 'live': None}, {'flight_date': '2022-10-12', 'flight_status': 'active', 'departure': {'airport': 'Tullamarine', 'timezone': 'Australia/Melbourne', 'iata': 'MEL', 'icao': 'YMML', 'terminal': '2', 'gate': '5', 'delay': 25, 'scheduled': '2022-10-12T00:30:00+00:00', 'estimated': '2022-10-12T00:30:00+00:00', 'actual': '2022-10-12T00:55:00+00:00', 'estimated_runway': '2022-10-12T00:55:00+00:00', 'actual_runway': '2022-10-12T00:55:00+00:00'}, 'arrival': {'airport': 'Kuala Lumpur International Airport (klia)', 'timezone': 'Asia/Kuala_Lumpur', 'iata': 'KUL', 'icao': 'WMKK', 'terminal': '1', 'gate': None, 'baggage': None, 'delay': 3, 'scheduled': '2022-10-12T06:00:00+00:00', 'estimated': '2022-10-12T06:00:00+00:00', 'actual': None, 'estimated_runway': None, 'actual_runway': None}, 'airline': {'name': 'KLM', 'iata': 'KL', 'icao': 'KLM'}, 'flight': {'number': '4109', 'iata': 'KL4109', 'icao': 'KLM4109', 'codeshared': {'airline_name': 'malaysia airlines', 'airline_iata': 'mh', 'airline_icao': 'mas', 'flight_number': '128', 'flight_iata': 'mh128', 'flight_icao': 'mas128'}}, 'aircraft': None, 'live': None}]}

The dictionary is stored under the variable name api_response. I am using the following code to convert to a dataframe as described in https://sparkbyexamples.com/pandas/pandas-convert-json-to-dataframe/

My code:

import boto3
import json
from datetime import datetime
import calendar
import random
import time
import requests
import pandas as pd

aircraftdata = ''
params = {
'access_key': 'KEY',
'limit': '2',
'flight_status':'active'
  }


url = "http://api.aviationstack.com/v1/flights"


api_result = requests.get('http://api.aviationstack.com/v1/flights', params)
api_statuscode =  api_result.status_code
api_response = api_result.json()


print (type(api_response)) #dictionary
print (api_response)


df = pd.DataFrame.from_dict(api_response, orient = 'index')

This yields the following error:

AttributeError: 'list' object has no attribute 'items'

I would like to obtain a dataframe with for each flight the live data:


flight_iata, live_latitude, live_longitude
AA1004, 36.2, -106.8

2 Answers 2

2
df = pd.json_normalize(api_response["data"])
df = df[df.loc[:, df.columns.str.contains("live", case=False)].columns]

print(df)

                live.updated  live.latitude  live.longitude  live.altitude  live.direction  live.speed_horizontal  live.speed_vertical  live.is_ground
0  2019-12-12T10:00:00+00:00        36.2856        -106.807        8846.82          114.34                894.348                1.188           False

If you want to drop live. from the headers you can:

df.columns = df.columns.str.split(".").str[-1]
print(df)

                     updated  latitude  longitude  altitude  direction  speed_horizontal  speed_vertical  is_ground
0  2019-12-12T10:00:00+00:00   36.2856   -106.807   8846.82     114.34           894.348           1.188      False
Sign up to request clarification or add additional context in comments.

8 Comments

Hi Jason, thanks for the answer, however, "NameError: name 'data' is not defined " is the error Im receiving when trying your solution?
your data is called "api_response" edited to reflect
Thanks, this seems to be working. Somehow however, under live data the result i am getting is None under all live items
Hmmm I'm obviously just using the dict you posted, perhaps there is more data that I can't see?
Hi Jason, I have edited the question to show the actual dictionary I get when making the API call, this indeed differs from my original example (sorry for this)
|
1

Considering the desired output, let's say that the dictionary dic looks like the following

dic = {
    "pagination": {
        "limit": 100,
        "offset": 0,
        "count": 100,
        "total": 1669022
    },
    "data": [
        {
            "flight_date": "2019-12-12",
            "flight_status": "active",
            "departure": {
                "airport": "San Francisco International",
                "timezone": "America/Los_Angeles",
                "iata": "SFO",
                "icao": "KSFO",
                "terminal": "2",
                "gate": "D11",
                "delay": 13,
                "scheduled": "2019-12-12T04:20:00+00:00",
                "estimated": "2019-12-12T04:20:00+00:00",
                "actual": "2019-12-12T04:20:13+00:00",
                "estimated_runway": "2019-12-12T04:20:13+00:00",
                "actual_runway": "2019-12-12T04:20:13+00:00"
            },
            "arrival": {
                "airport": "Dallas/Fort Worth International",
                "timezone": "America/Chicago",
                "iata": "DFW",
                "icao": "KDFW",
                "terminal": "A",
                "gate": "A22",
                "baggage": "A17",
                "delay": 0,
                "scheduled": "2019-12-12T04:20:00+00:00",
                "estimated": "2019-12-12T04:20:00+00:00",
                "actual": None,
                "estimated_runway": None,
                "actual_runway": None
            },
            "airline": {
                "name": "American Airlines",
                "iata": "AA",
                "icao": "AAL"
            },
            "flight": {
                "number": "1004",
                "iata": "AA1004",
                "icao": "AAL1004",
                "codeshared": None
            },
            "aircraft": {
                "registration": "N160AN",
                "iata": "A321",
                "icao": "A321",
                "icao24": "A0F1BB"
            },
            "live": {
                "updated": "2019-12-12T10:00:00+00:00",
                "latitude": 36.28560000,
                "longitude": -106.80700000,
                "altitude": 8846.820,
                "direction": 114.340,
                "speed_horizontal": 894.348,
                "speed_vertical": 1.188,
                "is_ground": False
            }
        }
    ]
}

In order to obtain the desired output, one can start by converting the dictionary to a dataframe with pandas.DataFrame

df = pd.DataFrame(dic['data'], columns=['flight', 'live'])

[Out]:

                                              flight                                               live
0  {'number': '1004', 'iata': 'AA1004', 'icao': '...  {'updated': '2019-12-12T10:00:00+00:00', 'lati...

Then, one can use .apply() with a custom lambda function as follows

df = df[['flight', 'live']].apply(lambda x: pd.Series([x['flight']['iata'], x['live']['latitude'], x['live']['longitude'], x['live']['altitude']]), axis=1)

[Out]:

        0        1        2        3
0  AA1004  36.2856 -106.807  8846.82

Finally, the only thing missing is the name of the columns. And in order to change it, one can do the following

df.columns = ['flight_iata', 'live_latitude', 'live_longitude', 'live_altitude']

[Out]:

  flight_iata  live_latitude  live_longitude  live_altitude
0      AA1004        36.2856        -106.807        8846.82

And that is the desired output.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.