Pandas DataFrame - Convert columns into JSON and add as a new column

Question

Consider the following DataFrame, which I loaded from a MySQL table with 11k rows:

col1 |  col2 | col3  | col4
-----------------------------
 cat | black | small | lovely
-----------------------------
 dog | white | medium| brave 
-----------------------------
mice | grey  | tinny | fast

...

I want to convert it dynamically to the following:

col1 |     newcol
------------------------------------------------------------
 cat | {"col2": "black", "col3": "small", "col4": "lovely"}
------------------------------------------------------------
 dog | {"col2": "white", "col3": "medium", "col4": "brave"}
------------------------------------------------------------
mice | {"col2": "grey", "col3": "tinny", "col4": "fast"}

...

I described the full end-to-end solution to the problem here: codeflex.co/mysql-table-migration-with-pandas-dataframe
– ybonda, Commented Jan 19, 2021 at 7:28

anky · Accepted Answer · 2020-12-30 09:54:54Z

10

You can call agg with dict on axis=1, which applies dict to each row.

For dictionary:

out = df[['col1']].assign(new_col=df.iloc[:,1:].agg(dict, axis=1))

For json:

out = df[['col1']].assign(new_col=df.iloc[:,1:].agg(pd.Series.to_json, axis=1))

Printing the dictionary version:

print(out)

   col1                                            new_col
0   cat  {'col2': 'black', 'col3': 'small', 'col4': 'lo...
1   dog  {'col2': 'white', 'col3': 'medium', 'col4': 'b...
2  mice  {'col2': 'grey', 'col3': 'tinny', 'col4': 'fast'}
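
To make the difference between the two variants concrete, here is a minimal self-contained sketch. It rebuilds the question's sample data by hand (the real table comes from MySQL) and checks that the dict variant stores Python dictionaries while the to_json variant stores strings that parse back to the same content. The names rest, as_dict and as_json are illustrative only.

```python
import json
import pandas as pd

# Sample data copied from the question; the real frame comes from MySQL.
df = pd.DataFrame({
    "col1": ["cat", "dog", "mice"],
    "col2": ["black", "white", "grey"],
    "col3": ["small", "medium", "tinny"],
    "col4": ["lovely", "brave", "fast"],
})

rest = df.iloc[:, 1:]  # every column after the first

# axis=1 hands each row to the function as a Series.
as_dict = df[["col1"]].assign(new_col=rest.agg(dict, axis=1))
as_json = df[["col1"]].assign(new_col=rest.agg(pd.Series.to_json, axis=1))

print(type(as_dict.loc[0, "new_col"]))  # a dict
print(type(as_json.loc[0, "new_col"]))  # a str holding JSON text

# The JSON text parses back to the same mapping as the dict variant.
print(json.loads(as_json.loc[0, "new_col"]) == as_dict.loc[0, "new_col"])
```

If the column is going back into a database text field, the JSON string variant is probably the one you want, since a dict would need serializing first.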

edited Dec 30, 2020 at 9:54

answered Dec 29, 2020 at 18:02

anky


3 Comments

Benjamin Rowell Over a year ago

This is definitely the cleanest solution so far, I'd mark this one as correct.

ybonda Over a year ago

It looks good, but how do I get JSON instead of a dictionary?

anky Over a year ago

@ybonda Then replace dict with pd.Series.to_json in the code, like: df[['col1']].assign(new_col=df.iloc[:,1:].agg(pd.Series.to_json,1)). I edited my answer.

Benjamin Rowell · Accepted Answer · 2020-12-29 17:59:01Z

2

As you might expect, there are many ways to do this, but this is what came to mind:

>>> import pandas as pd
>>> d = {"col1": ["cat", 'dog', 'mice'], "col2": ["black", "white", "grey"], "col3": ["small", 'medium', 'tinny'], 'col4': ['lovely', 'brave','fast']}
>>> df = pd.DataFrame(d)
>>> pd.concat([df[['col1']], pd.DataFrame({"newcol": df[['col2','col3','col4']].to_dict(orient='records')})], axis=1)

If you do not know the column names in advance, you can select columns by position with iloc instead. In this case, from column 1 to the end:

>>> pd.concat([df[['col1']], pd.DataFrame({"newcol": df.iloc[:, 1:].to_dict(orient='records')})], axis=1)
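
One thing to watch with the concat approach: pd.concat(..., axis=1) aligns on the index, and the DataFrame built from the records gets a fresh 0..n-1 index. If the frame's index is not the default range (for example after filtering rows), the two pieces may not line up. A hedged sketch of one way around this is to reuse the original index when building the new column. The index values 10, 20, 30 and the names records and out are made up for illustration.

```python
import pandas as pd

# Same sample data as above, but with a non-default index (assumed here
# purely to show the alignment issue).
df = pd.DataFrame(
    {
        "col1": ["cat", "dog", "mice"],
        "col2": ["black", "white", "grey"],
        "col3": ["small", "medium", "tinny"],
        "col4": ["lovely", "brave", "fast"],
    },
    index=[10, 20, 30],
)

# One dict per row, in row order, for every column after the first.
records = df.iloc[:, 1:].to_dict(orient="records")

# Attach the records under the original index so rows stay aligned.
out = df[["col1"]].assign(newcol=pd.Series(records, index=df.index))
print(out)
```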

edited Dec 29, 2020 at 17:59

answered Dec 29, 2020 at 17:51

Benjamin Rowell


1 Comment

ybonda Over a year ago

I have 11k rows... I need to do it dynamically.

Ferris · Accepted Answer · 2020-12-30 07:30:53Z

Use df.to_json(orient='records') to dump the rows as a JSON list of records, load that JSON back into a list of dicts, then assign the list to a new column.

import json
import pandas as pd

df = pd.DataFrame({'col1': ['cat', 'dog', 'mice'],
                   'col2': ['black', 'white', 'grey'],
                   'col3': ['small', 'medium', 'tinny']})

# build one dict per row from every column after the first
# alternative via a JSON round trip:
# data_json = df.iloc[:, 1:].to_json(orient='records')
# data = json.loads(data_json)
data = df.iloc[:, 1:].to_dict(orient='records')

# keep first column
dfn = df.iloc[:, [0]].copy()
# serialize each dict to a JSON string (use dfn['newcol'] = data to keep dicts)
dfn['newcol'] = pd.Series(data, index=dfn.index).map(json.dumps)

dfn

   col1                               newcol
0   cat   {"col2": "black", "col3": "small"}
1   dog  {"col2": "white", "col3": "medium"}
2  mice    {"col2": "grey", "col3": "tinny"}

data_json (type str), as produced by the commented-out to_json call:

[{"col2":"black","col3":"small"},{"col2":"white","col3":"medium"},{"col2":"grey","col3":"tinny"}]
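
The to_json route that is commented out above matters when the data has missing values. As a sketch (the None and NaN values here are invented for illustration, they are not in the question's data): to_json writes missing values as null, whereas json.dumps on the output of to_dict would emit NaN for a float NaN, which is not valid JSON.

```python
import json
import pandas as pd

# Hypothetical data with missing values, to show how they are serialized.
df = pd.DataFrame({
    "col1": ["cat", "dog"],
    "col2": ["black", None],
    "col3": [1.5, float("nan")],
})

# Round trip through pandas' JSON writer: NaN/None become JSON null,
# which json.loads turns into Python None.
records = json.loads(df.iloc[:, 1:].to_json(orient="records"))

dfn = df.iloc[:, [0]].copy()
# Serialize each record to its own JSON string, one per row.
dfn["newcol"] = [json.dumps(record) for record in records]
print(dfn)
```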

JavaGoPro · Accepted Answer · 2021-01-26 09:48:29Z

1

In addition to Anky's answer I found this article that describes more manipulations: https://codeflex.co/mysql-table-migration-with-pandas-dataframe/

answered Jan 26, 2021 at 9:48

JavaGoPro


basicknowledge · Accepted Answer · 2020-12-29 17:58:23Z

I am using three columns in the example below.

data = {'col1': ['cat', 'dog', 'mice'], 'col2' : ['black', 'white', 'grey'], 'col3' : ['small', 'medium', 'tinny']}
import pandas as pd
df = pd.DataFrame(data)
col = list(df.columns)

We can use a lambda function as follows, looking each value up by its column label:

df.apply(lambda x: {col[1]: x[col[1]], col[2]: x[col[2]]}, axis=1)

You can add it to the dataframe as follows:

df['new_col'] = df.apply(lambda x: {col[1]: x[col[1]], col[2]: x[col[2]]}, axis=1)

This produces the following output.

df
   col1   col2    col3                              new_col
0   cat  black   small   {'col2': 'black', 'col3': 'small'}
1   dog  white  medium  {'col2': 'white', 'col3': 'medium'}
2  mice   grey   tinny    {'col2': 'grey', 'col3': 'tinny'}

Then drop the unwanted columns using df.drop, which should produce the required output:

df.drop(['col2', 'col3'], axis = 1)
   col1                              new_col
0   cat   {'col2': 'black', 'col3': 'small'}
1   dog  {'col2': 'white', 'col3': 'medium'}
2  mice    {'col2': 'grey', 'col3': 'tinny'}
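
The lambda above names each column by hand. With more columns, a variant that builds the dict from whatever columns follow the first may be easier to maintain. A minimal sketch, assuming the first column is the one to keep; value_cols and out are names chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["cat", "dog", "mice"],
    "col2": ["black", "white", "grey"],
    "col3": ["small", "medium", "tinny"],
})

# Every column except the first goes into the dict, whatever its name.
value_cols = df.columns[1:]

out = df[["col1"]].copy()
# Series.to_dict maps each column label to that row's value.
out["new_col"] = df[value_cols].apply(lambda row: row.to_dict(), axis=1)
print(out)
```

Because the new column is written to a copy, no df.drop step is needed afterwards.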

Hemant · Accepted Answer · 2020-12-30 05:02:32Z

For the given requirement, I would suggest using itertuples to generate a list of dicts and assigning it to the dataframe as follows:

import pandas as pd 
data = {'col1': ['cat', 'dog', 'mice'], 'col2' : ['black', 'white', 'grey'], 'col3' : ['small', 'medium', 'tinny'], 'col4': ['lovely','brave','fast']} 
df = pd.DataFrame(data) 

def getDictColumn_df1(df, new_col_name="newcol", cols_from_start=1):
    # index=False keeps the row index out of each dict
    rows = df.iloc[:, cols_from_start:].itertuples(index=False)
    df[new_col_name] = [row._asdict() for row in rows]
    return df[['col1', new_col_name]]

getDictColumn_df1(df)

itertuples() is documented in the pandas API reference under DataFrame.itertuples.

For a slightly different case, where you want to convert all the columns of each row into a single dict, you can do:

def getDictColumn_df2(df, new_col_name="newcol"):
    df[new_col_name] = tuple(map(lambda row: row._asdict(), df.itertuples(index=False)))
    return df[[new_col_name]]

getDictColumn_df2(df)
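
One caveat with _asdict(): itertuples() builds namedtuples, so column names that are not valid Python identifiers (for example names containing spaces) are replaced with positional names. A sketch of a variant that sidesteps this by requesting plain tuples with name=None and pairing them with the column labels. The function name rows_to_dicts and the "fur color" column are hypothetical, chosen only for this example.

```python
import pandas as pd

def rows_to_dicts(df, cols_from_start=1):
    """Return one dict per row for the columns from cols_from_start onward."""
    sub = df.iloc[:, cols_from_start:]
    cols = list(sub.columns)
    # name=None yields plain tuples, so column labels are never renamed.
    return [dict(zip(cols, row)) for row in sub.itertuples(index=False, name=None)]

# Hypothetical frame with a column name that is not a valid identifier.
df = pd.DataFrame({
    "col1": ["cat", "dog"],
    "fur color": ["black", "white"],
    "col3": ["small", "medium"],
})

out = df[["col1"]].copy()  # leave the input frame untouched
out["newcol"] = rows_to_dicts(df)
print(out)
```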
