
I receive a JSON file generated by Pandas, which I load to create a DataFrame. The DataFrame has some columns containing nested arrays that I need to iterate over.

Simply calling pandas.read_json() gives me a column of arrays, and I can't work with it that way. Here is the code:

import pandas

dataframe = pandas.read_json('/Users/***/Downloads/df_teste.json', orient='table')
print(dataframe)

It returns a DataFrame like the one below:

student_id name created_at languages
1 Foo 2019-01-03 14:30:32.146000+00:00 [{'language_id': 1, 'name': 'English', 'optin_...
2 Bar 2019-01-03 14:30:32.146000+00:00 [{'language_id': 1, 'name': 'English', 'optin_...

My question is: how can I read it into a DataFrame like this one?

student_id language_id language_name optin_at
1 1 English 2019-01-03T14:30:32.148Z
2 1 English 2021-05-30T00:33:02.915Z
2 2 Portuguese 2022-03-07T07:42:07.082Z

For testing purposes, I am loading the JSON below:

{
  "schema": {
    "fields": [
      { "name": "student_id", "type": "string" },
      { "name": "name", "type": "string" },
      { "name": "created_at", "type": "datetime", "tz": "UTC" },
      { "name": "languages", "type": "string" }
    ],
    "pandas_version": "0.20.0"
  },
  "data": [
    {
      "student_id": "1",
      "name": "Foo",
      "created_at": "2019-01-03T14:30:32.146Z",
      "languages": [
        {
          "language_id": 1,
          "name": "English",
          "optin_at": "2019-01-03T14:30:32.148Z"
        }
      ]
    },
    {
      "student_id": "2",
      "name": "Bar",
      "created_at": "2019-01-03T14:30:32.146Z",
      "languages": [
        {
          "language_id": 1,
          "name": "English",
          "optin_at": "2021-05-30T00:33:02.915Z"
        },
        {
          "language_id": 2,
          "name": "Portuguese",
          "optin_at": "2022-03-07T07:42:07.082Z"
        }
      ]
    }
  ]
}

1 Answer


You can use json_normalize to get the expected output. Here I have built it with json_normalize from your input JSON:

import pandas as pd

data = {
  "schema": {
    "fields": [
      { "name": "student_id", "type": "string" },
      { "name": "name", "type": "string" },
      { "name": "created_at", "type": "datetime", "tz": "UTC" },
      { "name": "languages", "type": "string" }
    ],
    "pandas_version": "0.20.0"
  },
  "data": [
    {
      "student_id": "1",
      "name": "Foo",
      "created_at": "2019-01-03T14:30:32.146Z",
      "languages": [
        {
          "language_id": 1,
          "name": "English",
          "optin_at": "2019-01-03T14:30:32.148Z"
        }
      ]
    },
    {
      "student_id": "2",
      "name": "Bar",
      "created_at": "2019-01-03T14:30:32.146Z",
      "languages": [
        {
          "language_id": 1,
          "name": "English",
          "optin_at": "2021-05-30T00:33:02.915Z"
        },
        {
          "language_id": 2,
          "name": "Portuguese",
          "optin_at": "2022-03-07T07:42:07.082Z"
        }
      ]
    }
  ]
}


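student_detail = ['student_id']
# record_path: the nested list to expand into rows; meta: top-level fields repeated on each row
df = pd.json_normalize(data['data'], record_path='languages', meta=student_detail)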
print(df)

Printing df gives the output you want, shown below. Note that the columns are not in the order you listed, but you can reorder them.

   language_id        name                  optin_at student_id
0            1     English  2019-01-03T14:30:32.148Z          1
1            1     English  2021-05-30T00:33:02.915Z          2
2            2  Portuguese  2022-03-07T07:42:07.082Z          2
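Since your data actually comes from a file, here is a minimal sketch that reads it with the standard json module (not read_json, so the nested lists stay plain Python objects), then renames and reorders the columns to match your expected layout. The temporary file and the abridged payload are stand-ins so the snippet runs on its own; swap in your own path. The language_name rename is my assumption about the column name you want.

```python
import json
import os
import tempfile

import pandas as pd

# Abridged copy of the question's JSON (schema omitted; json_normalize only
# needs the "data" list). Written to a temp file so the sketch is runnable.
payload = {
    "data": [
        {"student_id": "1", "name": "Foo",
         "languages": [{"language_id": 1, "name": "English",
                        "optin_at": "2019-01-03T14:30:32.148Z"}]},
        {"student_id": "2", "name": "Bar",
         "languages": [{"language_id": 1, "name": "English",
                        "optin_at": "2021-05-30T00:33:02.915Z"},
                       {"language_id": 2, "name": "Portuguese",
                        "optin_at": "2022-03-07T07:42:07.082Z"}]},
    ]
}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(payload, tmp)

# Parse the file as plain JSON; replace tmp.name with your real file path.
with open(tmp.name) as fh:
    raw = json.load(fh)
os.remove(tmp.name)  # clean up the temporary file

# One row per language, with student_id repeated from the parent record.
df = pd.json_normalize(raw["data"], record_path="languages", meta=["student_id"])

# "name" here is the language name; rename it so it can't be confused with
# the student's name, then put the columns in the desired order.
df = df.rename(columns={"name": "language_name"})
df = df[["student_id", "language_id", "language_name", "optin_at"]]
print(df)
```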

Hope it helps; if not, please let me know.

PS: I'm not sure this is exactly what you are expecting, but you can simply use pd.to_datetime on the optin_at column, as below:

df['optin_at'] = pd.to_datetime(df['optin_at'])
print(df['optin_at'])

Adding these two lines converts optin_at to datetime and prints it. Output:

0   2019-01-03 14:30:32.148000+00:00
1   2021-05-30 00:33:02.915000+00:00
2   2022-03-07 07:42:07.082000+00:00
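Once the column is converted, it behaves like a real datetime column: you get the .dt accessor and can compare against timestamps. A small sketch using the three optin_at values from the output above; passing utc=True is my own addition (not in the answer's two lines) to make the UTC timezone explicit, and the 2021-01-01 cutoff is just an arbitrary example.

```python
import pandas as pd

# The three optin_at strings as json_normalize returns them (plain text).
df = pd.DataFrame({
    "student_id": ["1", "2", "2"],
    "optin_at": [
        "2019-01-03T14:30:32.148Z",
        "2021-05-30T00:33:02.915Z",
        "2022-03-07T07:42:07.082Z",
    ],
})

# utc=True guarantees a tz-aware UTC column (the trailing "Z" already means UTC).
df["optin_at"] = pd.to_datetime(df["optin_at"], utc=True)

# The .dt accessor now exposes datetime fields...
years = df["optin_at"].dt.year.tolist()

# ...and the column can be filtered against a tz-aware Timestamp.
recent = df[df["optin_at"] >= pd.Timestamp("2021-01-01", tz="UTC")]
print(years)
print(recent)
```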

Hope this helps; let me know if this is not the case.
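If you would rather keep the DataFrame you already get from read_json (where languages is a column of lists of dicts, as in your printout), a possible alternative sketch is DataFrame.explode followed by json_normalize on the exploded column. The hand-built dataframe below is only a stand-in for the read_json result, and the sketch assumes every student has at least one language (an empty list would explode to NaN, which json_normalize cannot expand).

```python
import pandas as pd

# Stand-in for the read_json result: "languages" holds a list of dicts per row.
dataframe = pd.DataFrame({
    "student_id": ["1", "2"],
    "name": ["Foo", "Bar"],
    "languages": [
        [{"language_id": 1, "name": "English",
          "optin_at": "2019-01-03T14:30:32.148Z"}],
        [{"language_id": 1, "name": "English",
          "optin_at": "2021-05-30T00:33:02.915Z"},
         {"language_id": 2, "name": "Portuguese",
          "optin_at": "2022-03-07T07:42:07.082Z"}],
    ],
})

# One row per list element; ignore_index gives a fresh 0..n-1 index.
exploded = dataframe[["student_id", "languages"]].explode("languages", ignore_index=True)

# Expand each dict into columns; same 0..n-1 index, so concat lines up row by row.
langs = pd.json_normalize(exploded["languages"].tolist())
langs = langs.rename(columns={"name": "language_name"})

result = pd.concat([exploded[["student_id"]], langs], axis=1)
result = result[["student_id", "language_id", "language_name", "optin_at"]]
print(result)
```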


1 Comment

How can I read the optin_at column as a datetime object?
