I'm working with nested JSON data that I want to transform into a pandas DataFrame. The json_normalize function offers a way to do this.

{'locations': [{'accuracy': 17,
                'activity': [{'activity': [{'confidence': 100,
                                            'type': 'STILL'}],
                              'timestampMs': '1542652'}],
                'altitude': -10,
                'latitudeE7': 3777321,
                'longitudeE7': -122423125,
                'timestampMs': '1542654',
                'verticalAccuracy': 2}]}

I used the function to normalize 'locations', but the nested 'activity' part is not flattened.

Here's my attempt:

activity_data = json_normalize(d, 'locations', ['activity','type', 'confidence'], 
                               meta_prefix='Prefix.',
                               errors='ignore') 

DataFrame:

[{u'activity': [{u'confidence': 100, u'type': ...   -10.0   NaN 377777377   -1224229340 1542652023196   

The activity column still contains nested elements, which I need unpacked into their own columns.

Any suggestions/tips would be much appreciated.

1 Answer

Use a recursive function to flatten the nested dicts and lists into a single level:

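def flatten_json(nested_json: dict, exclude: tuple = ()) -> dict:
    """
    Flatten a nested dict (which may contain lists and dicts) into one level.
    Nested keys are joined with '_'; list items contribute their index.
    Keys listed in `exclude` are skipped.
    """
    out = {}

    def flatten(x, name: str = ''):
        if isinstance(x, dict):
            for key, value in x.items():
                if key not in exclude:
                    flatten(value, f'{name}{key}_')
        elif isinstance(x, list):
            for i, item in enumerate(x):
                flatten(item, f'{name}{i}_')
        else:
            # Leaf value: drop the trailing '_' from the accumulated name
            out[name[:-1]] = x

    flatten(nested_json)
    return out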
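
If the nesting can get very deep, the same flattening can also be done with an explicit stack instead of recursion, which sidesteps Python's recursion limit. This is only a sketch of that idea: `flatten_json_iterative` and its `sep` parameter are names I made up for illustration, and it has no `exclude` option.

```python
def flatten_json_iterative(nested, sep="_"):
    """Flatten nested dicts/lists into a single-level dict without recursion."""
    out = {}
    # Each stack entry is (key_prefix, value); the root has an empty prefix.
    stack = [("", nested)]
    while stack:
        prefix, value = stack.pop()
        if isinstance(value, dict):
            items = list(value.items())
        elif isinstance(value, list):
            items = list(enumerate(value))
        else:
            # Leaf value: store it under the accumulated key.
            out[prefix] = value
            continue
        # Push children in reverse so they are popped in their original order.
        for key, child in reversed(items):
            new_prefix = f"{prefix}{sep}{key}" if prefix else str(key)
            stack.append((new_prefix, child))
    return out


record = {
    "accuracy": 17,
    "activity": [
        {"activity": [{"confidence": 100, "type": "STILL"}],
         "timestampMs": "1542652"}
    ],
    "altitude": -10,
}
flat = flatten_json_iterative(record)
print(flat)
```

The keys follow the same naming scheme as the recursive version, e.g. `activity_0_activity_0_type`.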

Data:

  • To create the dataset, I repeated the record from the question three times.
  • data is the parsed JSON, i.e. a Python dict.
data = {'locations': [{'accuracy': 17,'activity': [{'activity': [{'confidence': 100,'type': 'STILL'}],'timestampMs': '1542652'}],'altitude': -10,'latitudeE7': 3777321,'longitudeE7': -122423125,'timestampMs': '1542654','verticalAccuracy': 2},
                      {'accuracy': 17,'activity': [{'activity': [{'confidence': 100,'type': 'STILL'}],'timestampMs': '1542652'}],'altitude': -10,'latitudeE7': 3777321,'longitudeE7': -122423125,'timestampMs': '1542654','verticalAccuracy': 2},
                      {'accuracy': 17,'activity': [{'activity': [{'confidence': 100,'type': 'STILL'}],'timestampMs': '1542652'}],'altitude': -10,'latitudeE7': 3777321,'longitudeE7': -122423125,'timestampMs': '1542654','verticalAccuracy': 2}]}

Using flatten_json:

import pandas as pd

df = pd.DataFrame([flatten_json(x) for x in data['locations']])

Output:

 accuracy  activity_0_activity_0_confidence activity_0_activity_0_type activity_0_timestampMs  altitude  latitudeE7  longitudeE7 timestampMs  verticalAccuracy
       17                               100                      STILL                1542652       -10     3777321   -122423125     1542654                 2
       17                               100                      STILL                1542652       -10     3777321   -122423125     1542654                 2
       17                               100                      STILL                1542652       -10     3777321   -122423125     1542654                 2
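
Since the question was about json_normalize, here is a hedged sketch of how it could be used directly. It assumes pandas 1.0 or newer (where the function is available as `pd.json_normalize`) and that every record has all of the keys listed in `meta`. `record_path` walks down to the innermost 'activity' list, and `meta` pulls the outer fields alongside each inner record.

```python
import pandas as pd

data = {'locations': [{'accuracy': 17,
                       'activity': [{'activity': [{'confidence': 100,
                                                   'type': 'STILL'}],
                                     'timestampMs': '1542652'}],
                       'altitude': -10,
                       'latitudeE7': 3777321,
                       'longitudeE7': -122423125,
                       'timestampMs': '1542654',
                       'verticalAccuracy': 2}]}

df = pd.json_normalize(
    data['locations'],
    # Path to the innermost list of records (one row per inner activity).
    record_path=['activity', 'activity'],
    # Fields to carry along from the outer levels; a nested path is a list.
    meta=['accuracy', 'altitude', 'latitudeE7', 'longitudeE7',
          'timestampMs', 'verticalAccuracy',
          ['activity', 'timestampMs']],
)
print(df)
```

Note that this gives one row per innermost activity entry rather than one row per location, and the nested meta field ends up in a column named `activity.timestampMs`. If a location has several activities, flatten_json puts them in extra columns, while this approach puts them in extra rows.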