I have the following JSON structure:

{
  "comments_v2": [
    {
      "timestamp": 1196272984,
      "data": [
        {
          "comment": {
            "timestamp": 1196272984,
            "comment": "OSI Beach Party Weekend, CA",
            "author": "xxxx"
          }
        }
      ],
      "title": "xxxx commented on his own photo."
    },
    {
      "timestamp": 1232918783,
      "data": [
        {
          "comment": {
            "timestamp": 1232918783,
            "comment": "We'll see about that.",
            "author": "xxxx"
          }
        }
      ]
    }
  ]
}

I'm trying to flatten this JSON into a pandas dataframe and here is my solution:

import codecs
import pandas as pd

# Read file (utf-8-sig strips a leading BOM, if present)
df = pd.read_json(codecs.open(infile, "r", "utf-8-sig"))

# Normalize
df = pd.json_normalize(df["comments_v2"])
child_column = pd.json_normalize(df["data"])
child_column = pd.concat([child_column.drop([0], axis=1), child_column[0].apply(pd.Series)], axis=1)
df_merge = df.join(child_column)
df_merge.drop(["data"], axis=1, inplace=True)

The resulting dataframe is as follows:

 timestamp                            title  comment.timestamp              comment.comment  comment.author  comment.group
1196272984  xxxx commented on his own photo         1196272984  OSI Beach Party Weekend, CA           XXXXX            NaN

Is there a simpler way to flatten the JSON to obtain the result shown above?

Thank you!

2 Answers

1

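Pass record_path='data' to pd.json_normalize so that each entry of the nested "data" list becomes one row:

import json
import codecs
import pandas as pd

# utf-8-sig strips a leading BOM, if present
with codecs.open(infile, 'r', 'utf-8-sig') as jsonfile:
    data = json.load(jsonfile)

df = pd.json_normalize(data['comments_v2'], record_path='data')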

Output:

>>> df
   comment.timestamp              comment.comment comment.author
0         1196272984  OSI Beach Party Weekend, CA           xxxx
1         1232918783        We'll see about that.           xxxx
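
The output above drops the top-level timestamp and title columns that the question's target table includes. A minimal sketch of one way to keep them, assuming every record looks like the sample JSON: list those keys in the meta argument, and set errors='ignore' so that records without a title get NaN instead of raising a KeyError. The inline data dict is a copy of the question's sample and stands in for the parsed file.

```python
import pandas as pd

# Sample copied from the question (stands in for json.load(jsonfile))
data = {
    "comments_v2": [
        {
            "timestamp": 1196272984,
            "data": [{"comment": {"timestamp": 1196272984,
                                  "comment": "OSI Beach Party Weekend, CA",
                                  "author": "xxxx"}}],
            "title": "xxxx commented on his own photo.",
        },
        {
            "timestamp": 1232918783,
            "data": [{"comment": {"timestamp": 1232918783,
                                  "comment": "We'll see about that.",
                                  "author": "xxxx"}}],
        },
    ]
}

df = pd.json_normalize(
    data["comments_v2"],
    record_path="data",            # one row per element of each "data" list
    meta=["timestamp", "title"],   # parent-level keys to carry onto each row
    errors="ignore",               # missing meta keys become NaN
)
print(df.columns.tolist())
```

The meta columns are appended after the record columns, so select them in the desired order (for example df[["timestamp", "title", ...]]) if the layout from the question matters.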

1

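Try the flatten_json package (in this example, js holds the parsed JSON):

import pandas as pd
from flatten_json import flatten

# Flatten each record; nested keys and list indices are joined with '.'
dic_flattened = (flatten(d, '.') for d in js['comments_v2'])
df = pd.DataFrame(dic_flattened)
df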


    timestamp  data.0.comment.timestamp       data.0.comment.comment data.0.comment.author                             title
0  1196272984                1196272984  OSI Beach Party Weekend, CA                  xxxx  xxxx commented on his own photo.
1  1232918783                1232918783        We'll see about that.                  xxxx                               NaN
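
If installing an extra package is not an option, the same flattening can be sketched by hand. The helper flatten_record below is hypothetical (it is not part of flatten_json or pandas) and only mimics the key layout shown above: nested dict keys and list indices are joined with '.'. The inline js dict is a copy of the question's sample.

```python
import pandas as pd

def flatten_record(obj, sep=".", prefix=""):
    """Hypothetical helper: flatten nested dicts and lists into one flat dict."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)  # list positions become part of the key
    else:
        return {prefix: obj}    # scalar leaf: emit the accumulated key
    flat = {}
    for key, value in items:
        new_prefix = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(flatten_record(value, sep, new_prefix))
    return flat

# Sample copied from the question
js = {
    "comments_v2": [
        {
            "timestamp": 1196272984,
            "data": [{"comment": {"timestamp": 1196272984,
                                  "comment": "OSI Beach Party Weekend, CA",
                                  "author": "xxxx"}}],
            "title": "xxxx commented on his own photo.",
        },
        {
            "timestamp": 1232918783,
            "data": [{"comment": {"timestamp": 1232918783,
                                  "comment": "We'll see about that.",
                                  "author": "xxxx"}}],
        },
    ]
}

df = pd.DataFrame([flatten_record(d) for d in js["comments_v2"]])
print(df.columns.tolist())
```

Because list indices end up in the column names, a record with several entries in "data" would produce extra columns (data.1.…) rather than extra rows, which is the main difference from the record_path approach.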
