4

Consider the following DataFrame, which I loaded from a MySQL table with about 11k rows:

col1 |  col2 | col3  | col4
-----------------------------
 cat | black | small | lovely
-----------------------------
 dog | white | medium| brave 
-----------------------------
mice | grey  | tinny | fast

...

I want to convert it dynamically to the following:

col1 |     newcol
------------------------------------------------------------
 cat | {"col2": "black", "col3": "small", "col4": "lovely"}
------------------------------------------------------------
 dog | {"col2": "white", "col3": "medium", "col4": "brave"}
------------------------------------------------------------
mice | {"col2": "grey", "col3": "tinny", "col4": "fast"}

...

6 Answers

10

You can use agg with dict along axis=1 to build one dictionary per row.

For a dictionary:

out = df[['col1']].assign(new_col=df.iloc[:,1:].agg(dict, axis=1))

For JSON:

out = df[['col1']].assign(new_col=df.iloc[:,1:].agg(pd.Series.to_json, axis=1))

print(out)

   col1                                            new_col
0   cat  {'col2': 'black', 'col3': 'small', 'col4': 'lo...
1   dog  {'col2': 'white', 'col3': 'medium', 'col4': 'b...
2  mice  {'col2': 'grey', 'col3': 'tinny', 'col4': 'fast'}
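
For anyone who wants to run this end to end, here is a minimal sketch of the same idea on the question's sample data. It uses apply with explicit lambdas rather than agg, which is a choice made for this sketch and not part of the answer above; the names rest, as_dict and as_json are illustrative.

```python
import json
import pandas as pd

# Sample data mirroring the question; the real frame would come from MySQL.
df = pd.DataFrame({
    "col1": ["cat", "dog", "mice"],
    "col2": ["black", "white", "grey"],
    "col3": ["small", "medium", "tinny"],
    "col4": ["lovely", "brave", "fast"],
})

# Every column except the first, selected by position so no names are hard-coded.
rest = df.iloc[:, 1:]

# Dictionary version: one Python dict per row.
as_dict = df[["col1"]].assign(new_col=rest.apply(lambda row: row.to_dict(), axis=1))

# JSON version: one JSON string per row.
as_json = df[["col1"]].assign(new_col=rest.apply(lambda row: row.to_json(), axis=1))

print(as_dict)
print(as_json)
```

The dictionary column holds Python objects, while the JSON column holds plain strings, which is usually what you want when writing back to a database text column.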

3 Comments

This is definitely the cleanest solution so far, I'd mark this one as correct.
It looks good, but how do I get JSON instead of a dictionary?
@ybonda Replace dict with pd.Series.to_json, like this: df[['col1']].assign(new_col=df.iloc[:,1:].agg(pd.Series.to_json,1)). I've edited my answer.
2

As you might expect, there are many ways to do this, but this is what came to mind:

>>> import pandas as pd
>>> d = {"col1": ["cat", 'dog', 'mice'], "col2": ["black", "white", "grey"], "col3": ["small", 'medium', 'tinny'], 'col4': ['lovely', 'brave','fast']}
>>> df = pd.DataFrame(d)
>>> pd.concat([df[['col1']], pd.DataFrame({"newcol": df[['col2','col3','col4']].to_dict(orient='records')})], axis=1)

If you don't know the column names in advance, you can select the columns by position instead. Here, that means every column from index 1 to the end.

>>> pd.concat([df[['col1']], pd.DataFrame({"newcol": df.iloc[:, 1:].to_dict(orient='records')})], axis=1)

1 Comment

I have 11k rows... I need to do this dynamically.
2

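Use df.to_json(orient='records') to dump a JSON list of records, load that JSON into a list of dicts, and assign the list to a new column. Calling to_dict(orient='records') produces the list of dicts directly, which is what the code below does.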

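import json  # only needed for the commented-out JSON variants below
import pandas as pd

df = pd.DataFrame({'col1': ['cat', 'dog', 'mice'],
                   'col2': ['black', 'white', 'grey'],
                   'col3': ['small', 'medium', 'tinny']})

# build one dict per row from every column after the first
# (JSON round trip that yields the same list:)
# data_json = df.iloc[:, 1:].to_json(orient='records')
# data = json.loads(data_json)
data = df.iloc[:, 1:].to_dict(orient='records')

# keep the first column
dfn = df.iloc[:, [0]].copy()
dfn['newcol'] = data
# to store JSON strings instead of dicts (this variant gives the output shown below):
# dfn['newcol'] = pd.Series(data).map(json.dumps)

dfn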

   col1                               newcol
0   cat   {"col2": "black", "col3": "small"}
1   dog  {"col2": "white", "col3": "medium"}
2  mice    {"col2": "grey", "col3": "tinny"}

data_json (type str):

[{"col2":"black","col3":"small"},{"col2":"white","col3":"medium"},{"col2":"grey","col3":"tinny"}]
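
The two routes mentioned above (to_json followed by json.loads, versus to_dict directly) can be compared in a short sketch. It assumes plain string columns like the sample data; with NaN values or dates the two routes can differ, because JSON serialization converts those values. The variable names are illustrative.

```python
import json
import pandas as pd

df = pd.DataFrame({
    "col1": ["cat", "dog", "mice"],
    "col2": ["black", "white", "grey"],
    "col3": ["small", "medium", "tinny"],
})
rest = df.iloc[:, 1:]

# Route 1: serialize the frame to a JSON string, then parse it back.
via_json = json.loads(rest.to_json(orient="records"))

# Route 2: ask pandas for the list of dicts directly.
direct = rest.to_dict(orient="records")

# Store each record as a JSON string next to the key column.
dfn = df.iloc[:, [0]].copy()
dfn["newcol"] = [json.dumps(rec) for rec in direct]

# Parsing the strings recovers the original dicts.
restored = dfn["newcol"].map(json.loads).tolist()

print(via_json == direct, restored == direct)
```

For string-only data both routes give the same list, so to_dict is the shorter path when you want dicts, and json.dumps is only needed when the column must hold JSON text.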

1

In addition to Anky's answer I found this article that describes more manipulations: https://codeflex.co/mysql-table-migration-with-pandas-dataframe/

0

I am using three columns in the example below.

import pandas as pd
data = {'col1': ['cat', 'dog', 'mice'], 'col2' : ['black', 'white', 'grey'], 'col3' : ['small', 'medium', 'tinny']}
df = pd.DataFrame(data)
col = list(df.columns)

We can use a lambda function as follows (x.iloc is used for positional access, since plain x[1] on a label-indexed row is deprecated in recent pandas):

df.apply(lambda x: {col[1]: x.iloc[1], col[2]: x.iloc[2]}, axis=1)

You can add it to the DataFrame as follows:

df['new_col'] = df.apply(lambda x: {col[1]: x.iloc[1], col[2]: x.iloc[2]}, axis=1)

This produces the following output.

df
   col1   col2    col3                              new_col
0   cat  black   small   {'col2': 'black', 'col3': 'small'}
1   dog  white  medium  {'col2': 'white', 'col3': 'medium'}
2  mice   grey   tinny    {'col2': 'grey', 'col3': 'tinny'}

Then drop the unwanted columns using df.drop, which should produce the required output:

df.drop(['col2', 'col3'], axis = 1)
   col1                              new_col
0   cat   {'col2': 'black', 'col3': 'small'}
1   dog  {'col2': 'white', 'col3': 'medium'}
2  mice    {'col2': 'grey', 'col3': 'tinny'}
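
Since the question asks for a dynamic conversion, the lambda above can be generalized so that no column position is hard-coded. This is only a sketch, assuming the first column is the key and every other column goes into the dict; key_col and value_cols are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["cat", "dog", "mice"],
    "col2": ["black", "white", "grey"],
    "col3": ["small", "medium", "tinny"],
})

# Derive the column roles from the frame instead of naming them by hand.
key_col = df.columns[0]
value_cols = list(df.columns[1:])

# Build one dict per row from the value columns only.
new_col = df.apply(lambda row: {c: row[c] for c in value_cols}, axis=1)

# Assemble the result without modifying the original frame.
result = df[[key_col]].assign(new_col=new_col)

print(result)
```

Because the result is built with assign on a selection, the original df keeps its columns and no drop step is needed.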

0

For the given requirement, I would suggest using itertuples to generate a list of dicts and assigning it to the DataFrame as follows:

import pandas as pd 
data = {'col1': ['cat', 'dog', 'mice'], 'col2' : ['black', 'white', 'grey'], 'col3' : ['small', 'medium', 'tinny'], 'col4': ['lovely','brave','fast']} 
df = pd.DataFrame(data) 

def getDictColumn_df1(df, new_col_name="newcol", cols_from_start=1):
    # index=False keeps the row index out of each dict (the default adds an 'Index' key)
    df[new_col_name] = tuple(map(lambda row: row._asdict(), df.iloc[:,cols_from_start:].itertuples(index=False)))
    return df[['col1', new_col_name]]

getDictColumn_df1(df)

To learn more about itertuples(), see the pandas documentation for DataFrame.itertuples.

For a slightly different case, where you want to keep every column and convert all the values in each row to a single dict, you can do:

def getDictColumn_df2(df, new_col_name="newcol"):
    df[new_col_name] = tuple(map(lambda row: row._asdict(), df.itertuples(index=False)))
    return df[[new_col_name]]

getDictColumn_df2(df)
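
One caveat worth knowing with the itertuples approach: as far as I know, itertuples builds namedtuples, so column names that are not valid Python identifiers are replaced with positional names such as _0. The sketch below illustrates this with a made-up column name, "fur color", and compares it with to_dict(orient="records"), which keeps the original names.

```python
import pandas as pd

# "fur color" is a hypothetical column name containing a space.
df = pd.DataFrame({"col1": ["cat"], "fur color": ["black"], "size": ["small"]})
rest = df.iloc[:, 1:]

# itertuples renames fields that are not valid identifiers.
via_itertuples = [row._asdict() for row in rest.itertuples(index=False)]

# to_dict keeps the column labels exactly as they are.
via_to_dict = rest.to_dict(orient="records")

print(via_itertuples)
print(via_to_dict)
```

If the MySQL column names contain spaces or other special characters, to_dict is probably the safer choice.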
