I have searched for a lot of similar topics online, but I have not found the solution yet.

My pandas dataframe looks like this:

index    FOR
0        [{'id': '2766', 'name': '0803 Computer Softwar...
1        [{'id': '2766', 'name': '0803 Computer Softwar...
2        [{'id': '2766', 'name': '0803 Computer Softwar...
3        [{'id': '2766', 'name': '0803 Computer Softwar...
4        [{'id': '2766', 'name': '0803 Computer Softwar...

I would like to flatten every row into a dataframe like the following (only the result for the first row is shown):

index   id      name
0       2766    0803 Computer Software

I found a similar solution here. Unfortunately, it raised the following error: TypeError: the JSON object must be str, bytes or bytearray, not 'list'

My code was:

dfs = []
for i in test['FOR']:
    data = json.loads(i)
    dfx = pd.json_normalize(data)
    dfs.append(dfx)   

df = pd.concat(dfs).reset_index(inplace = True)
print(df)

Could anyone help me here? Thank you very much!

2 Answers

Try using literal_eval from the ast module in the standard library:

from ast import literal_eval


df_flattened = pd.json_normalize(df['FOR'].map(literal_eval))

Then drop the duplicates:

print(df_flattened.drop_duplicates())

     id                    name
0  2766  0803 Computer Software
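
Note that literal_eval only accepts strings. As the comments below show, if each cell is already a Python list it raises "ValueError: malformed node or string". A minimal sketch that handles both cases, assuming a hypothetical helper named to_records and made-up sample data (neither is from the original post):

```python
from ast import literal_eval

import pandas as pd


def to_records(cell):
    """Return a list of dicts from a cell that is a list or the string repr of one."""
    # literal_eval only accepts strings, so parse only when the cell is a str
    return literal_eval(cell) if isinstance(cell, str) else cell


# Sample data (made up for illustration): one cell is a string, one is a real list
df = pd.DataFrame({
    "FOR": [
        "[{'id': '2766', 'name': '0803 Computer Software'}]",
        [{"id": "2766", "name": "0803 Computer Software"}],
    ]
})

# explode() yields one entry per dict; dropna() discards rows whose list was empty
records = df["FOR"].map(to_records).explode().dropna()

# json_normalize expands each dict into columns
df_flattened = pd.json_normalize(records.tolist())
print(df_flattened.drop_duplicates())
```

Passing a flat list of dicts to json_normalize avoids relying on how it treats a Series of lists, which may differ between pandas versions.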

7 Comments

Thanks for the help. Unfortunately, I got the error below: ValueError: malformed node or string: [{'id': '2766', 'name': '0803 Computer Software'}]
@Chen update your sample to mimic your question - I don't see a list of json objects.
I am so sorry, I copied the table from Colab, so it might have lost something. I am new to Colab. But when I print type(test['FOR'][0]), it does show a list. I apologize for the confusion.
@Chen what's in each list, just a single json object or many?
If I understand you correctly, there are two: [{"id": "2766", "name": "0803 Computer Software"}]

After a few weeks of not touching related work, I encountered a similar case, and I think I have a solution for it. Please feel free to correct me or suggest other ideas. I really appreciate all the help and generous support!

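import pandas as pd

chunks = []

# Each cell of test['FOR'] is already a list of dicts, so it can be normalized directly
for i in range(len(test)):
    chunks.append(pd.json_normalize(test.iloc[i]['FOR']))

# Stack the per-row frames into one dataframe
test_df = pd.concat(chunks)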

Then drop the duplicates from test_df.
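
For completeness, here is a self-contained sketch of the whole approach, including the de-duplication step. The sample data is made up to resemble the question, and the name frames is only illustrative. It also sidesteps a pitfall in the question's code: reset_index(inplace=True) returns None, so its result should not be assigned.

```python
import pandas as pd

# Sample data (made up): every cell is already a list of dicts
test = pd.DataFrame({
    "FOR": [[{"id": "2766", "name": "0803 Computer Software"}] for _ in range(5)]
})

# Normalize each cell, then stack the per-row frames
frames = [pd.json_normalize(records) for records in test["FOR"]]
test_df = pd.concat(frames, ignore_index=True)

# drop_duplicates() and reset_index() return new frames by default;
# with inplace=True they return None, so do not assign the result in that case
test_df = test_df.drop_duplicates().reset_index(drop=True)
print(test_df)
```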
