I have a DataFrame column containing a JSON array that I want to split into columns for every row.

Dataframe

      FIRST_NAME                                       CUSTOMFIELDS
    0       Maria [{'FIELD_NAME': 'CONTACT_FIELD_1', 'FIELD_VALU...
    1       John  [{'FIELD_NAME': 'CONTACT_FIELD_1', 'FIELD_VALU...
    ...

Goal

I need to convert the JSON content in that column into a dataframe:

+------------+-----------------+-------------+-----------------+
| FIRST NAME |   FIELD_NAME    | FIELD_VALUE | CUSTOM_FIELD_ID |
+------------+-----------------+-------------+-----------------+
| Maria      | CONTACT_FIELD_1 | EN          | CONTACT_FIELD_1 |
| John       | CONTACT_FIELD_1 | false       | CONTACT_FIELD_1 |
+------------+-----------------+-------------+-----------------+

1 Answer

The code snippet below should work for you.

import pandas as pd
df = pd.DataFrame()
df['FIELD'] = [[{'FIELD_NAME': 'CONTACT_FIELD_1', 'FIELD_VALUE': 'EN', 'CUSTOM_FIELD_ID': 'CONTACT_FIELD_1'}, {'FIELD_NAME': 'CONTACT_FIELD_10', 'FIELD_VALUE': 'false', 'CUSTOM_FIELD_ID': 'CONTACT_FIELD_10'}]]

temp_dict = {}
counter = 0
for entry in df['FIELD'][0]:
    temp_dict[counter] = entry
    counter += 1

new_dataframe = pd.DataFrame.from_dict(temp_dict, orient='index')

new_dataframe #outputs dataframe
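
As a shorter alternative to the counter loop, the `pd.DataFrame` constructor accepts a list of dicts directly, using the dict keys as column names. A minimal sketch, assuming the same two sample records as above (the variable name `records` is only for illustration):

```python
import pandas as pd

# Same sample data as in the snippet above: one list holding two dicts.
records = [
    {'FIELD_NAME': 'CONTACT_FIELD_1', 'FIELD_VALUE': 'EN', 'CUSTOM_FIELD_ID': 'CONTACT_FIELD_1'},
    {'FIELD_NAME': 'CONTACT_FIELD_10', 'FIELD_VALUE': 'false', 'CUSTOM_FIELD_ID': 'CONTACT_FIELD_10'},
]

# Each dict becomes one row; the keys become the columns.
new_dataframe = pd.DataFrame(records)
```

This should give the same two-row, three-column frame as the loop with `from_dict(..., orient='index')`.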

Edited answer to reflect edited question:

Assuming each entry in CUSTOMFIELDS is a list with one element (unlike the original question, where the entry had two elements), the following creates a dataframe in the requested format.

import pandas as pd

# Need to recreate example problem
df = pd.DataFrame()
df['CUSTOMFIELDS'] = [[{'FIELD_NAME': 'CONTACT_FIELD_1', 'FIELD_VALUE': 'EN', 'CUSTOM_FIELD_ID': 'CONTACT_FIELD_1'}], 
                      [{'FIELD_NAME': 'CONTACT_FIELD_1', 'FIELD_VALUE': 'FR', 'CUSTOM_FIELD_ID': 'CONTACT_FIELD_1'}]]
df['FIRST_NAME'] = ['Maria', 'John']

# begin solution
# Collect one flat record per row, then build the dataframe once
# (avoids calling pd.concat repeatedly inside the loop).
rows = []
for index, row in df.iterrows():
    record = dict(row['CUSTOMFIELDS'][0])  # copy the single dict in the list
    record['FIRST_NAME'] = row['FIRST_NAME']
    rows.append(record)

dataframe_solution = pd.DataFrame(rows)

Your dataframe is in dataframe_solution
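
If the lists in CUSTOMFIELDS can hold more than one dict (as in the original version of the question), the one-element assumption no longer holds. A possible sketch for that case uses `DataFrame.explode` (available in pandas 0.25 and later) to give every dict its own row first. The sample data below is made up for illustration, and the sketch assumes every list is non-empty:

```python
import pandas as pd

# Hypothetical sample: Maria has two custom fields, John has one.
df = pd.DataFrame({
    'FIRST_NAME': ['Maria', 'John'],
    'CUSTOMFIELDS': [
        [{'FIELD_NAME': 'CONTACT_FIELD_1', 'FIELD_VALUE': 'EN', 'CUSTOM_FIELD_ID': 'CONTACT_FIELD_1'},
         {'FIELD_NAME': 'CONTACT_FIELD_10', 'FIELD_VALUE': 'false', 'CUSTOM_FIELD_ID': 'CONTACT_FIELD_10'}],
        [{'FIELD_NAME': 'CONTACT_FIELD_1', 'FIELD_VALUE': 'FR', 'CUSTOM_FIELD_ID': 'CONTACT_FIELD_1'}],
    ],
})

# One row per dict; FIRST_NAME is repeated for each element of the list.
exploded = df.explode('CUSTOMFIELDS').reset_index(drop=True)

# Turn the column of dicts into its own dataframe (keys become columns).
fields = pd.DataFrame(exploded['CUSTOMFIELDS'].tolist())

# Put FIRST_NAME next to the expanded fields; both share the same 0..n-1 index.
result = pd.concat([exploded[['FIRST_NAME']], fields], axis=1)
```

Here `result` has one row per custom field, so a person with several fields appears on several rows.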

4 Comments

Thank you jammin, I forgot to mention that I need the JSON array field values for every JSON document. I've completed my question.
Originally, you had a list in the CUSTOMFIELDS column with 2 elements in it. Is that still the case?
Made an assumption about the data but let me know if that doesn't work and I need to adjust it
@Vince no problem! Happy to help!
