I have a project.json file that contains data like this:

{"student_id": "ST0001", "project": [{"subject_id": "S003", "date_of_submission": "2021-05-23 20:03:05"}, {"subject_id": "S004", "date_of_submission": "2021-05-24 21:03:05"}, {"subject_id": "S005", "date_of_submission": "2021-05-30 05:09:30"}], "project_year": "Second"}
{"student_id": "ST0002", "project": [{"subject_id": "S003", "date_of_submission": "2021-06-02 15:05:05"}, {"subject_id": "S007", "date_of_submission": "2021-04-28 21:03:01"}], "project_year": "Second"}
{"student_id": "ST0002", "project": [{"subject_id": "S0018", "date_of_submission": "2020-06-03 08:15:21"}], "project_year": "First"}

I need to extract project_subject_id and project_date_of_submission into separate columns, one row per project, like this:

student_id  project_subject_id  project_date_of_submission  project_year
ST0001      S003                23/05/2021 20:03            Second
ST0001      S004                24/05/2021 21:03            Second
ST0001      S005                30/05/2021 05:09            Second
ST0002      S003                02/06/2021 15:05            Second
ST0002      S007                28/04/2021 21:03            Second
ST0002      S0018               03/06/2020 08:15            First

Here's what I have tried:

import pandas as pd

df_pr = pd.read_json('project.json', lines=True)

1 Answer

import pandas as pd

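# Each line of project.json is one JSON object, hence lines=True.
df = pd.read_json('project.json', lines=True)

# One row per project entry. Reset the index: explode() repeats the original
# row label, and joining on those duplicated labels misaligns the rows.
df = df.explode('project').reset_index(drop=True)

# Expand each project dict into its own columns and join them back by position.
df = df.join(pd.json_normalize(df.pop('project').tolist()))

df.set_index("student_id", inplace=True)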

print(df)
"""
student_id project_year subject_id   date_of_submission
ST0001           Second       S003  2021-05-23 20:03:05
ST0001           Second       S004  2021-05-24 21:03:05
ST0001           Second       S005  2021-05-30 05:09:30
ST0002           Second       S003  2021-06-02 15:05:05
ST0002           Second       S007  2021-04-28 21:03:01
ST0002            First      S0018  2020-06-03 08:15:21
"""
# To move project_year to the last column:

df["project_year"] = df.pop("project_year")

# pop() removes the column and the assignment re-adds it at the end.

This explodes the project column so that each project entry gets its own row, normalizes the project dicts into separate columns, and joins those columns back onto the main DataFrame.
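The answer above leaves the columns named subject_id and date_of_submission and keeps the original timestamp format. Here is a minimal sketch of how the layout from the question could be reached. It assumes the same explode/normalize/join steps, reads two sample records from an in-memory string instead of project.json, and uses a helper variable called projects that is only for illustration.

```python
import io
import pandas as pd

# Two sample records in the same shape as project.json (one JSON object per line).
SAMPLE = io.StringIO(
    '{"student_id": "ST0001", "project": [{"subject_id": "S003", "date_of_submission": "2021-05-23 20:03:05"}, {"subject_id": "S004", "date_of_submission": "2021-05-24 21:03:05"}], "project_year": "Second"}\n'
    '{"student_id": "ST0002", "project": [{"subject_id": "S0018", "date_of_submission": "2020-06-03 08:15:21"}], "project_year": "First"}\n'
)

df = pd.read_json(SAMPLE, lines=True)

# One row per project; a fresh 0..n-1 index keeps the later join aligned.
df = df.explode("project").reset_index(drop=True)

# Expand the dicts and prefix the new columns with "project_".
projects = pd.json_normalize(df.pop("project").tolist()).add_prefix("project_")
df = df.join(projects)

# Reformat the timestamp as DD/MM/YYYY HH:MM.
df["project_date_of_submission"] = pd.to_datetime(
    df["project_date_of_submission"], format="%Y-%m-%d %H:%M:%S"
).dt.strftime("%d/%m/%Y %H:%M")

# Reorder the columns to match the layout asked for in the question.
df = df[["student_id", "project_subject_id", "project_date_of_submission", "project_year"]]
print(df.to_string(index=False))
```

Note that strftime turns the column into plain strings, so it is best done as the last step, after any sorting or filtering by date.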

You can try this as well:

import pandas as pd
import json

with open("project.json") as f:
    lines=f.readlines()

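frames = []
for line in lines:
    # Scalar fields are broadcast across the entries of the "project" list.
    df = pd.DataFrame.from_dict(json.loads(line))
    # Expand the project dicts into subject_id / date_of_submission columns.
    df = df.join(pd.DataFrame(df.pop('project').values.tolist()))
    frames.append(df)
# DataFrame.append was removed in pandas 2.0; concatenate once instead.
dd = pd.concat(frames)
print(dd)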
"""
  student_id project_year subject_id   date_of_submission
0     ST0001       Second       S003  2021-05-23 20:03:05
1     ST0001       Second       S004  2021-05-24 21:03:05
2     ST0001       Second       S005  2021-05-30 05:09:30
0     ST0002       Second       S003  2021-06-02 15:05:05
1     ST0002       Second       S007  2021-04-28 21:03:01
0     ST0002        First      S0018  2020-06-03 08:15:21
"""
# If you need student_id as the index:

dd.set_index("student_id",inplace=True)
print(dd)
"""
student_id project_year subject_id   date_of_submission
ST0001           Second       S003  2021-05-23 20:03:05
ST0001           Second       S004  2021-05-24 21:03:05
ST0001           Second       S005  2021-05-30 05:09:30
ST0002           Second       S003  2021-06-02 15:05:05
ST0002           Second       S007  2021-04-28 21:03:01
ST0002            First      S0018  2020-06-03 08:15:21
"""

This reads project.json line by line, converts each line into its own DataFrame (df), and combines those DataFrames into dd.
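The per-line loop can also be handed to pd.json_normalize in a single call, using its record_path, meta and record_prefix parameters. The sketch below is an alternative to the loop rather than part of the original answer; the LINES list is a hypothetical stand-in for the lines read from project.json.

```python
import json
import pandas as pd

# Hypothetical stand-in for the lines of project.json.
LINES = [
    '{"student_id": "ST0001", "project": [{"subject_id": "S003", "date_of_submission": "2021-05-23 20:03:05"}, {"subject_id": "S004", "date_of_submission": "2021-05-24 21:03:05"}], "project_year": "Second"}',
    '{"student_id": "ST0002", "project": [{"subject_id": "S0018", "date_of_submission": "2020-06-03 08:15:21"}], "project_year": "First"}',
]

# Parse every non-empty line into a dict.
records = [json.loads(line) for line in LINES if line.strip()]

# record_path picks the nested list to flatten (one row per project),
# meta copies the parent fields onto each row,
# record_prefix renames the nested keys to project_subject_id etc.
flat = pd.json_normalize(
    records,
    record_path="project",
    meta=["student_id", "project_year"],
    record_prefix="project_",
)
print(flat)
```

The nested columns come first and the meta columns last, so the columns still need reordering if student_id should lead.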


6 Comments

I hadn't noticed your code and forgot that JSON can be read like that, so I wrote the code below, but you can use the code above. :)
The first answer results in:

|            | project_year | subject_id | date_of_submission |
|------------|--------------|------------|--------------------|
| student_id |              |            |                    |
| ST0001     | Second       | S003       | 23/05/2021 20:03   |
| ST0001     | Second       | S003       | 23/05/2021 20:03   |
| ST0001     | Second       | S003       | 23/05/2021 20:03   |
Isn't that what you expect?
I am not getting the output you show in the first answer; instead, the student_id is repeated three times. Please see drive.google.com/file/d/1H4kvTifQXQm9xzdZpj0ibrTOHbFr7D9C/…
I find it is the same as what you asked for in the question; please share your expected output. I think your screenshot matches the example output in my answer, which is also what your question shows.