Add multiple empty columns to pandas DataFrame

Question

How do I add multiple empty columns to a DataFrame from a list?

I can do:

df["B"] = None
df["C"] = None
df["D"] = None

But I can't do:

df[["B", "C", "D"]] = None

KeyError: "['B' 'C' 'D'] not in index"

None is different from 0, but some answers assume the two are equivalent. Also, assigning None gives a dtype of object, whereas assigning 0 gives an integer dtype.
– smci, Commented Apr 19, 2020 at 11:11
Also, you can't do df[['B','C','D']] = None, None, None or [None, None, None] or pd.DataFrame([None, None, None]).
– smci, Commented Apr 19, 2020 at 11:13
Related: the more general "How to add multiple columns to pandas dataframe in one assignment?"
– smci, Commented Apr 19, 2020 at 11:16
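
The dtype difference raised in these comments can be checked directly. The following is a minimal sketch, assuming a reasonably recent pandas; the frame and its column names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})

df["B"] = None    # scalar None is broadcast down the column; dtype is object
df["C"] = 0       # integer scalar; the column gets an integer dtype
df["D"] = np.nan  # NaN is a float; the column gets dtype float64

print(df.dtypes)
```

Both B and D count as missing under isna(), but only D is numeric, which matters if the columns are later filled with numbers.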

Dror · Accepted Answer · 2017-09-12 13:48:22Z

129

You could use df.reindex to add new columns:

In [18]: df = pd.DataFrame(np.random.randint(10, size=(5,1)), columns=['A'])

In [19]: df
Out[19]: 
   A
0  4
1  7
2  0
3  7
4  6

In [20]: df.reindex(columns=list('ABCD'))
Out[20]: 
   A   B   C   D
0  4 NaN NaN NaN
1  7 NaN NaN NaN
2  0 NaN NaN NaN
3  7 NaN NaN NaN
4  6 NaN NaN NaN

reindex will return a new DataFrame, with columns appearing in the order they are listed:

In [31]: df.reindex(columns=list('DCBA'))
Out[31]: 
    D   C   B  A
0 NaN NaN NaN  4
1 NaN NaN NaN  7
2 NaN NaN NaN  0
3 NaN NaN NaN  7
4 NaN NaN NaN  6

The reindex method has a fill_value parameter as well:

In [22]: df.reindex(columns=list('ABCD'), fill_value=0)
Out[22]: 
   A  B  C  D
0  4  0  0  0
1  7  0  0  0
2  0  0  0  0
3  7  0  0  0
4  6  0  0  0
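
Because reindex returns a new DataFrame, the result has to be assigned to a name to be kept. A small sketch of that point, using fixed values instead of random ones (the name expanded is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({"A": [4, 7, 0]})

# reindex does not modify df; it returns a new frame with the extra columns.
expanded = df.reindex(columns=list("ABCD"))

print(list(df.columns))        # the original frame still has only 'A'
print(list(expanded.columns))  # the new frame has all four columns
```

To keep the change, rebind the name: df = df.reindex(columns=list("ABCD")).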

edited Sep 12, 2017 at 13:48

Dror

answered Jun 19, 2015 at 17:00

unutbu


4 Comments

Marco Spinaci Over a year ago

After experimenting with a moderately large DataFrame (~2.5k rows by 80k columns), this solution appears to be orders of magnitude faster than the accepted one. BTW, is there a reason why this specific command does not accept an "inplace=True" parameter? df = df.reindex(...) appears to use up quite a bit of RAM.

unutbu Over a year ago

@MarcoSpinaci: I recommend never using inplace=True. It doesn't do what most people think it does. Under the hood, an entirely new DataFrame is always created, and then the data from the new DataFrame is copied into the original DataFrame. That doesn't save any memory. So inplace=True is window-dressing without substance, and moreover, is misleadingly named. I haven't checked the code, but I expect df = df.reindex(...) requires at least 2x the memory required for df, and of course more when reindex is used to expand the number of rows.

toto_tico Over a year ago

@unutbu, nevertheless, it is useful when you are iterating containers, e.g. a list or a dictionary, it would avoid the use of indexes that makes the code a bit more dirty...

Sam Over a year ago

@unutbu it is indeed a lot faster when I profiled my code that creates ~200 columns. Could you briefly explain why reindex is so much faster than concat or simply setting multiple columns to a numpy array?
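
For anyone who wants to reproduce the comparison discussed in these comments, here is a rough timing sketch. It only measures; it does not explain the difference. The frame size, the column names, and the repeat count are arbitrary assumptions, and the numbers will vary with machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({"A": np.arange(1_000)})
new_cols = [f"col{i}" for i in range(80)]  # hypothetical column names

def with_loop():
    """Add the columns one at a time on a copy of the frame."""
    out = base.copy()
    for col in new_cols:
        out[col] = np.nan
    return out

def with_reindex():
    """Add all the columns in a single reindex call."""
    return base.reindex(columns=[*base.columns, *new_cols])

loop_time = timeit.timeit(with_loop, number=20)
reindex_time = timeit.timeit(with_reindex, number=20)
print(f"loop:    {loop_time:.4f} s")
print(f"reindex: {reindex_time:.4f} s")
```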

floer32 · Accepted Answer · 2018-12-19 23:08:13Z

93

I'd concat using a DataFrame:

In [23]:
df = pd.DataFrame(columns=['A'])
df

Out[23]:
Empty DataFrame
Columns: [A]
Index: []

In [24]:    
pd.concat([df,pd.DataFrame(columns=list('BCD'))])

Out[24]:
Empty DataFrame
Columns: [A, B, C, D]
Index: []

So by passing a list containing your original df, and a new one with the columns you wish to add, this will return a new df with the additional columns.

Caveat: See the discussion of performance in the other answers and/or the comment discussions. reindex may be preferable where performance is critical.
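
The example above starts from an empty frame. For a frame that already has rows, one variant (an assumption on my part, not what the answer shows) is to build the empty columns on the same index and concatenate column-wise with axis=1:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})

# An all-NaN frame that shares df's index, so the rows line up when joined.
empty = pd.DataFrame(columns=list("BCD"), index=df.index)

# axis=1 places the frames side by side instead of stacking them.
out = pd.concat([df, empty], axis=1)
print(out)
```

The new columns created this way have object dtype, so convert them if numeric columns are needed.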

edited Dec 19, 2018 at 23:08

floer32

answered Jun 18, 2015 at 22:13

EdChum

toto_tico · Accepted Answer · 2017-12-05 09:30:05Z

50

If you don't want to retype the names of the existing columns, you can use reindex like this:

df.reindex(columns=[*df.columns.tolist(), 'new_column1', 'new_column2'], fill_value=0)

Full example:

In [1]: df = pd.DataFrame(np.random.randint(10, size=(3,1)), columns=['A'])

In [1]: df
Out[1]: 
   A
0  4
1  7
2  0

In [2]: df.reindex(columns=[*df.columns.tolist(), 'col1', 'col2'], fill_value=0)
Out[2]: 
   A  col1  col2
0  4     0     0
1  7     0     0
2  0     0     0

And if you already have a list with the column names:

In [3]: my_cols_list=['col1','col2']

In [4]: df.reindex(columns=[*df.columns.tolist(), *my_cols_list], fill_value=0)
Out[4]: 
   A  col1  col2
0  4     0     0
1  7     0     0
2  0     0     0
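
If the list of names may overlap with columns the frame already has, the same idea can be wrapped in a small helper. This is only a sketch; add_missing_columns is a hypothetical name, not a pandas function.

```python
import numpy as np
import pandas as pd

def add_missing_columns(df, names, fill_value=np.nan):
    """Return a new DataFrame with any names it lacks appended as columns."""
    # dict.fromkeys drops repeated names while keeping their order.
    missing = [name for name in dict.fromkeys(names) if name not in df.columns]
    return df.reindex(columns=[*df.columns, *missing], fill_value=fill_value)

df = pd.DataFrame({"A": [4, 7, 0]})
out = add_missing_columns(df, ["A", "col1", "col2"], fill_value=0)
print(out)
```

Existing columns keep their position and data; only the names that are absent are appended.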

edited Dec 5, 2017 at 9:30

answered Jul 6, 2017 at 14:11

toto_tico

Yonas Kassa · Accepted Answer · 2020-07-15 15:33:29Z

12

Summary of alternative solutions:

columns_add = ['a', 'b', 'c']

for loop:

for newcol in columns_add:
    df[newcol]= None

dict method:

df.assign(**dict([(_,None) for _ in columns_add]))

tuple assignment:

df['a'], df['b'], df['c'] = None, None, None

answered Jul 15, 2020 at 15:33

Yonas Kassa

1 Comment

Joe Ferndz Over a year ago

df.assign(**dict.fromkeys(columns_add, None)) should also work
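
Note that assign, unlike the for loop, leaves df untouched and returns a new DataFrame. A short sketch of the dict.fromkeys variant from the comment, with an illustrative frame:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})
columns_add = ["a", "b", "c"]

# dict.fromkeys maps every name to None by default.
result = df.assign(**dict.fromkeys(columns_add))

print(list(df.columns))      # the original frame is unchanged
print(list(result.columns))  # the returned frame has the new columns
```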

alexprice · Accepted Answer · 2020-06-06 15:50:36Z

10

Why not just use a loop:

for newcol in ['B','C','D']:
    df[newcol]=np.nan

edited Jun 6, 2020 at 15:50

answered May 4, 2019 at 17:04

alexprice

1 Comment

smci Over a year ago

0 is not the same value as None. Also, it'll force the dtype to integer, whereas None won't.

Mykola Zotko · Accepted Answer · 2021-09-09 07:40:59Z

8

You can make use of Pandas broadcasting:

df = pd.DataFrame({'A': [1, 1, 1]})

df[['B', 'C']] = 2, 3
# df[['B', 'C']] = [2, 3]

Result:

   A  B  C
0  1  2  3
1  1  2  3
2  1  2  3

To add empty columns:

df[['B', 'C', 'D']] = 3 * [np.nan]

Result:

   A   B   C   D
0  1 NaN NaN NaN
1  1 NaN NaN NaN
2  1 NaN NaN NaN
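
With this form, the number of values has to match the number of new columns. The sketch below assumes a pandas version where the list-style assignment above works; there I would expect a mismatch to raise a ValueError, though the exact message may differ between versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 1]})
new_cols = ["B", "C", "D"]

# One value per new column: each value is broadcast down its column.
df[new_cols] = len(new_cols) * [np.nan]

mismatch_raised = False
try:
    # Two values for three columns cannot be matched up.
    df[["E", "F", "G"]] = [np.nan, np.nan]
except ValueError as exc:
    mismatch_raised = True
    print(f"ValueError: {exc}")
```

Building the value list from len(new_cols) keeps the two lengths in step when the column list changes.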

edited Sep 9, 2021 at 7:40

answered Sep 9, 2021 at 7:12

Mykola Zotko

jizhihaoSAMA · Accepted Answer · 2020-06-22 03:14:40Z

4

I'd use

df["B"], df["C"], df["D"] = None, None, None

or

df["B"], df["C"], df["D"] = [None for _ in range(3)]

edited Jun 22, 2020 at 3:14

jizhihaoSAMA

answered Jun 22, 2020 at 2:38

lumiere_profues

Oleg O · Accepted Answer · 2019-11-20 12:26:04Z

1

Just to add to the list of funny ways:

columns_add = ['a', 'b', 'c']
df = df.assign(**dict(zip(columns_add, [0] * len(columns_add))))

answered Nov 20, 2019 at 12:26

Oleg O

1 Comment

smci Over a year ago

0 is not the same value as None. Also, it'll force the dtype to integer, whereas None won't.

Ka Wa Yip · Accepted Answer · 2024-07-04 15:07:28Z

0

You can now do this directly in Pandas 2.0:

df = pd.DataFrame({'A': [1, 1, 1]})
df[['B','C','D']] = np.nan

or

df = pd.DataFrame({'A': [1, 1, 1]})
df[['B','C','D']] = None

Both forms assign np.nan or None to every entry of the three columns 'B', 'C' and 'D'.
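
If the same code has to run on an older pandas that raises the KeyError shown in the question, one possible approach is to try the list-style assignment first and fall back to a loop. This is a sketch only; add_empty_columns is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def add_empty_columns(df, names, value=np.nan):
    """Return a new frame with `names` added, each filled with `value`."""
    out = df.copy()
    try:
        out[names] = value      # list-style assignment from this answer
    except KeyError:
        for name in names:      # fallback for versions that reject it
            out[name] = value
    return out

df = pd.DataFrame({"A": [1, 1, 1]})
out = add_empty_columns(df, ["B", "C", "D"])
print(out)
```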

answered Jul 4, 2024 at 15:07

Ka Wa Yip
