Efficient way to add new column to pandas dataframe

Question

I know two ways of adding a new column to a pandas DataFrame:

df_new = df.assign(new_column=default_value)

and

df[new_column] = default_value

The first one does not add the column in place, but the second one does. So which one is more efficient to use?

Apart from these two, is there an even more efficient method?
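The difference described above, where one form returns a new object and the other modifies the existing one, can be checked directly. This is only a minimal sketch on a three-row frame; the frame contents and the two boolean variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})
default_value = 10

# assign returns a new DataFrame; the original keeps its columns.
df_new = df.assign(new_column=default_value)
assign_left_original_untouched = "new_column" not in df.columns

# Item assignment adds the column to the existing object.
df["new_column"] = default_value
setitem_modified_original = "new_column" in df.columns

print(assign_left_original_untouched, setitem_modified_original)
```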

The original question discusses relative performance in the comments: stackoverflow.com/questions/12555323/…
– Alexander, Commented Sep 12, 2018 at 7:22

jezrael · Accepted Answer · 2018-09-12 07:30:03Z

17

I think the second one is faster. assign is mainly useful when you want clean code that chains all the function calls into a single expression. Timings:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.random.rand(10000)})

default_value = 10

In [114]: %timeit df_new = df.assign(new_column=default_value)
228 µs ± 4.26 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [115]: %timeit df['new_column'] = default_value
86.1 µs ± 654 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
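The %timeit lines above need IPython. Outside it, a similar comparison could be sketched with the standard-library timeit module. The helper names with_assign and with_setitem and the repeat count are assumptions for this sketch, not part of the original answer. The item-assignment helper also copies the frame first so that every call really adds a column rather than overwriting one left by an earlier call. Its timing therefore includes the copy and is not directly comparable to the %timeit figure above.

```python
import timeit

import numpy as np
import pandas as pd

default_value = 10
n_rows = 10_000  # same size as the frame timed above
n_calls = 200    # arbitrary repeat count for this sketch


def with_assign(df):
    # Returns a new DataFrame; df itself is not changed.
    return df.assign(new_column=default_value)


def with_setitem(df):
    # Work on a fresh copy so every call adds a new column
    # instead of overwriting one created by an earlier call.
    out = df.copy()
    out["new_column"] = default_value
    return out


base = pd.DataFrame({"A": np.random.rand(n_rows)})

t_assign = timeit.timeit(lambda: with_assign(base), number=n_calls) / n_calls
t_setitem = timeit.timeit(lambda: with_setitem(base), number=n_calls) / n_calls

print(f"assign:  {t_assign * 1e6:.1f} us per call")
print(f"setitem: {t_setitem * 1e6:.1f} us per call (includes the copy)")
```

Absolute numbers depend on the machine and the pandas version, so only the relative ordering on your own setup is meaningful.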

I used perfplot for plotting:

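import numpy as np
import pandas as pd
import perfplot

default_value = 10

def chained(df):
    # assign returns a new DataFrame with the extra column
    df = df.assign(new_column=default_value)
    return df

def no_chained(df):
    # item assignment adds the column to the DataFrame passed in
    df['new_column'] = default_value
    return df

def make_df(n):
    # one random float column with n rows
    df = pd.DataFrame({'A': np.random.rand(n)})
    return df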

perfplot.show(
    setup=make_df,
    kernels=[chained, no_chained],
    n_range=[2**k for k in range(2, 25)],
    logx=True,
    logy=True,
    equality_check=False,
    xlabel='len(df)')
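To illustrate the chaining style that assign makes possible, here is a small sketch. The total column, the filter threshold, and the five-row frame are hypothetical and chosen only for the example. Each step returns a new DataFrame, so the whole pipeline is one expression and the input frame is left unchanged.

```python
import numpy as np
import pandas as pd

default_value = 10
df = pd.DataFrame({"A": np.arange(5)})

# Each method returns a new DataFrame, so the steps chain into one expression.
result = (
    df
    .assign(new_column=default_value)
    # The callable receives the frame that already contains new_column.
    .assign(total=lambda d: d["A"] + d["new_column"])
    .query("total > 11")
    .reset_index(drop=True)
)

print(result)
```

With item assignment the same steps would need separate statements that modify df along the way, which is the readability trade-off against the speed difference measured above.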

edited Sep 12, 2018 at 7:30

answered Sep 12, 2018 at 7:17

jezrael
