126

How do I add multiple empty columns to a DataFrame from a list?

I can do:

df["B"] = None
df["C"] = None
df["D"] = None

But I can't do:

df[["B", "C", "D"]] = None

KeyError: "['B' 'C' 'D'] not in index"

3
  • None is different to 0, but some answers are assuming it's equivalent. Also, assigning None will give a dtype of object, but assigning 0 will give a dtype of int. Commented Apr 19, 2020 at 11:11
  • Also you can't do df[['B','C','D']] = None, None, None or [None, None, None] or pd.DataFrame([None, None, None]) Commented Apr 19, 2020 at 11:13
  • Related: the more general "How to add multiple columns to pandas dataframe in one assignment?" Commented Apr 19, 2020 at 11:16
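
To make the dtype point in the comments above concrete, here is a minimal sketch. The frame and its values are illustrative assumptions, not taken from the question; it only shows how the chosen fill value affects the dtype of each new column.

```python
import numpy as np
import pandas as pd

# Illustrative frame; the values are made up for this sketch.
df = pd.DataFrame({"A": [1, 2, 3]})

df["B"] = None    # column of None values
df["C"] = np.nan  # column of NaN values
df["D"] = 0       # column of zeros

# Inspect the resulting dtypes to see how the three fills differ.
print(df.dtypes)
```

Which fill to pick depends on what will be stored in the columns later: NaN suits numeric data, while None suits columns that will hold arbitrary Python objects.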

9 Answers

129

You could use df.reindex to add new columns:

In [18]: df = pd.DataFrame(np.random.randint(10, size=(5,1)), columns=['A'])

In [19]: df
Out[19]: 
   A
0  4
1  7
2  0
3  7
4  6

In [20]: df.reindex(columns=list('ABCD'))
Out[20]: 
   A   B   C   D
0  4 NaN NaN NaN
1  7 NaN NaN NaN
2  0 NaN NaN NaN
3  7 NaN NaN NaN
4  6 NaN NaN NaN

reindex will return a new DataFrame, with columns appearing in the order they are listed:

In [31]: df.reindex(columns=list('DCBA'))
Out[31]: 
    D   C   B  A
0 NaN NaN NaN  4
1 NaN NaN NaN  7
2 NaN NaN NaN  0
3 NaN NaN NaN  7
4 NaN NaN NaN  6

The reindex method has a fill_value parameter as well:

In [22]: df.reindex(columns=list('ABCD'), fill_value=0)
Out[22]: 
   A  B  C  D
0  4  0  0  0
1  7  0  0  0
2  0  0  0  0
3  7  0  0  0
4  6  0  0  0
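
Because reindex returns a new DataFrame, the result has to be assigned back (or to a new name) for the added columns to persist. A small sketch of wrapping this in a helper follows; the name add_empty_columns is hypothetical, not a pandas function, and the sample values are illustrative.

```python
import numpy as np
import pandas as pd


def add_empty_columns(df, names, fill_value=np.nan):
    """Return a new frame with `names` appended as columns (hypothetical helper)."""
    # Keep the existing columns in order, then append the new labels.
    return df.reindex(columns=[*df.columns, *names], fill_value=fill_value)


df = pd.DataFrame({"A": [4, 7, 0]})
wider = add_empty_columns(df, ["B", "C", "D"])

print(list(df.columns))     # the original frame is left as it was
print(list(wider.columns))  # the returned frame carries the new columns
```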

4 Comments

After experimenting with a moderately large DataFrame (~2.5k rows by 80k columns), this solution appears to be orders of magnitude faster than the accepted one. BTW, is there a reason why this specific command does not accept an "inplace=True" parameter? df = df.reindex(...) appears to use up quite a bit of RAM.
@MarcoSpinaci: I recommend never using inplace=True. It doesn't do what most people think it does. Under the hood, an entirely new DataFrame is always created, and then the data from the new DataFrame is copied into the original DataFrame. That doesn't save any memory. So inplace=True is window-dressing without substance, and moreover, is misleadingly named. I haven't checked the code, but I expect df = df.reindex(...) requires at least 2x the memory required for df, and of course more when reindex is used to expand the number of rows.
@unutbu, nevertheless, it is useful when you are iterating containers, e.g. a list or a dictionary, it would avoid the use of indexes that makes the code a bit more dirty...
@unutbu It is indeed a lot faster when I profiled my code that creates ~200 columns. Could you briefly explain why reindex is so much faster than concat or simply setting multiple columns to a NumPy array?
93

I'd concat using a DataFrame:

In [23]:
df = pd.DataFrame(columns=['A'])
df

Out[23]:
Empty DataFrame
Columns: [A]
Index: []

In [24]:    
pd.concat([df,pd.DataFrame(columns=list('BCD'))])

Out[24]:
Empty DataFrame
Columns: [A, B, C, D]
Index: []

So by passing a list containing your original df, and a new one with the columns you wish to add, this will return a new df with the additional columns.


Caveat: See the discussion of performance in the other answers and/or the comment discussions. reindex may be preferable where performance is critical.
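
The example above starts from an empty frame. For a frame that already has rows, one variant (a sketch under that assumption, not the exact call from the answer) is to build a NaN-filled frame on the same index and concatenate it column-wise with axis=1. The sample values are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [4, 7, 0]})
new_cols = ["B", "C", "D"]

# Build a NaN-filled frame that shares df's index, then join it column-wise.
empty = pd.DataFrame(np.nan, index=df.index, columns=new_cols)
result = pd.concat([df, empty], axis=1)

print(result)
```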


50

If you don't want to retype the names of the existing columns, you can unpack them into the reindex call:

df.reindex(columns=[*df.columns.tolist(), 'new_column1', 'new_column2'], fill_value=0)

Full example:

In [1]: df = pd.DataFrame(np.random.randint(10, size=(3,1)), columns=['A'])

In [1]: df
Out[1]: 
   A
0  4
1  7
2  0

In [2]: df.reindex(columns=[*df.columns.tolist(), 'col1', 'col2'], fill_value=0)
Out[2]: 
   A  col1  col2
0  4     0     0
1  7     0     0
2  0     0     0

And if you already have a list with the column names:

In [3]: my_cols_list=['col1','col2']

In [4]: df.reindex(columns=[*df.columns.tolist(), *my_cols_list], fill_value=0)
Out[4]: 
   A  col1  col2
0  4     0     0
1  7     0     0
2  0     0     0
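
If the list might contain names that are already columns, it may be worth filtering those out first so that no label is listed twice. A minimal sketch, with illustrative data and a variable name (missing) introduced only for this example:

```python
import pandas as pd

df = pd.DataFrame({"A": [4, 7, 0], "col1": [1, 2, 3]})
my_cols_list = ["col1", "col2"]

# Keep only the names that are not already columns of df.
missing = [c for c in my_cols_list if c not in df.columns]

result = df.reindex(columns=[*df.columns, *missing], fill_value=0)
print(result)
```

Here the existing col1 keeps its data and only col2 is added.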


12

Summary of alternative solutions:

columns_add = ['a', 'b', 'c']
  1. for loop:

    for newcol in columns_add:
        df[newcol]= None
    
  2. dict method:

    df.assign(**{col: None for col in columns_add})
    
  3. tuple assignment:

    df['a'], df['b'], df['c'] = None, None, None
    

1 Comment

df.assign(**dict.fromkeys(columns_add, None)) should also work
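
A self-contained sketch of the dict.fromkeys variant from the comment above, using an illustrative frame. Note that assign returns a new DataFrame, so the result needs to be bound to a name:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})
columns_add = ["a", "b", "c"]

# dict.fromkeys maps every new column name to the same fill value (None here).
out = df.assign(**dict.fromkeys(columns_add, None))

print(list(out.columns))
print(list(df.columns))  # df itself is not modified by assign
```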
10

Why not just use a loop:

for newcol in ['B','C','D']:
    df[newcol]=np.nan

1 Comment

0 is not the same value as None. Also, it'll force the dtype to integer, whereas None won't.
8

You can make use of Pandas broadcasting:

df = pd.DataFrame({'A': [1, 1, 1]})

df[['B', 'C']] = 2, 3
# df[['B', 'C']] = [2, 3]

Result:

   A  B  C
0  1  2  3
1  1  2  3
2  1  2  3

To add empty columns:

df[['B', 'C', 'D']] = 3 * [np.nan]

Result:

   A   B   C   D
0  1 NaN NaN NaN
1  1 NaN NaN NaN
2  1 NaN NaN NaN


4

I'd use

df["B"], df["C"], df["D"] = None, None, None

or

df["B"], df["C"], df["D"] = [None for _ in range(3)]


1

Just to add to the list of funny ways:

columns_add = ['a', 'b', 'c']
df = df.assign(**dict(zip(columns_add, [0] * len(columns_add)))

2 Comments

0 is not the same value as None. Also, it'll force the dtype to integer, whereas None won't.
Anyway you're missing a trailing fourth close-parenthesis.
0

You can do this now in Pandas 2.0

df = pd.DataFrame({'A': [1, 1, 1]})
df[['B','C','D']] = np.nan

or

df = pd.DataFrame({'A': [1, 1, 1]})
df[['B','C','D']] = None

Either form assigns np.nan or None to every entry of the three new columns 'B', 'C' and 'D'.

