2

I have a column consists of JSON structured data. My df looks like this:

ClientToken                      Data
7a9ee887-8a09-ff9592e08245       [{"summaryId":"4814223456","duration":952,"startTime":1587442919}]
bac49563-2cf0-cb08e69daa48       [{"summaryId":"4814239586","duration":132,"startTime":1587443876}]

I want to expand it to:

ClientToken                      summaryId         duration           startTime
7a9ee887-8a09-ff9592e08245       4814223456             952           1587442919
bac49563-2cf0-cb08e69daa48       4814239586             132           1587443876`

Any ideas?

1
  • "My df looks like this" is df an instance of pandas dataframe? if no, what is df an instance of? Commented Apr 29, 2020 at 9:37

2 Answers 2

2

You can try:

df[["ClientToken"]].join(df.Data.apply(lambda x: pd.Series(json.loads(x[1:-1]))))

Explanations:

  1. Select the Data column and apply the following steps:
    1. Because the "Data" content is wrapped in a list and this is a string, we can remove [] manually using x[1:-1] (remove first and last character).
    2. Since the "Data" column is a string and we actually want a JSON, we need to convert it. One solution is to use the json.loads() function from the json module. The code becomes json.loads(x[1:-1])
    3. Then, convert the dictto a pd.Series using pd.Series(json.loads(x[1:-1]))
  2. Add these new columns to the existing dataframe using join. Also, you will notice I used double [] to select the "ClientToken" column as a dataframe.

Code + illustration:

import pandas as pd
import json

# step 1.1
print(df.Data.apply(lambda x: x[1:-1]))
# 0    {"summaryId":"4814223456","duration":952,"star...
# 1    {"summaryId":"4814239586","duration":132,"star...
# Name: Data, dtype: object

# step 1.2
print(df.Data.apply(lambda x: json.loads(x[1:-1])))
# 0    {'summaryId': '4814223456', 'duration': 952, '...
# 1    {'summaryId': '4814239586', 'duration': 132, '...
# Name: Data, dtype: object

# step 1.3
print(df.Data.apply(lambda x: pd.Series(json.loads(x[1:-1]))))
#     summaryId  duration   startTime
# 0  4814223456       952  1587442919
# 1  4814239586       132  1587443876

# step 2
print(df[["ClientToken"]].join(df.Data.apply(lambda x: pd.Series(json.loads(x[1:-1])))))
#                   ClientToken   summaryId  duration   startTime
# 0  7a9ee887-8a09-ff9592e08245  4814223456       952  1587442919
# 1  bac49563-2cf0-cb08e69daa48  4814239586       132  1587443876

Edit 1:

As it seems that there are some rows where the list in Data has multiple dicts, you can try:

df[["ClientToken"]].join(df.Data.apply(lambda x: [pd.Series(y)
                                                  for y in json.loads(x)]) \
                    .explode() \
                    .apply(pd.Series))
Sign up to request clarification or add additional context in comments.

22 Comments

Tried this: df2 = df[["ClientToken"]].join(df.Data.apply(lambda x: pd.Series(x[0]))) After running this df2 contained 2 cols: 1) ClientToken 2) Column name 0 with "["
Does the "Data" is a string representing a JSON or a real JSON ? Obviously, it's a string. I will update the answer
Thank you Alexandre! Data is text. When looking at the df.info it is an "object"
I get the following error: JSONDecodeError: Extra data: line 1 column 368 (char 367)
With Pandas > 0.25 this indeed works well!! Many thanks!!
|
0

An alternative, using defaultdict and ast literal eval:

from collections import defaultdict
import ast
d = defaultdict(list)
#iterate through the Data column and append to dictionary for each key
for ent in df.Data:
    for entry in ast.literal_eval(ent):
        for k, v in entry.items():
            d[k].append(v)

#concat to ClientToken column
pd.concat([df.ClientToken,pd.DataFrame(d)],axis=1)

    ClientToken summaryId   duration    startTime
0   7a9ee887-8a09-ff9592e08245  4814223456  952 1587442919
1   bac49563-2cf0-cb08e69daa48  4814239586  132 1587443876

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.