How do I add multiple empty columns to a DataFrame from a list?
I can do:
df["B"] = None
df["C"] = None
df["D"] = None
But I can't do:
df[["B", "C", "D"]] = None
KeyError: "['B' 'C' 'D'] not in index"
You could use df.reindex to add new columns:
In [18]: df = pd.DataFrame(np.random.randint(10, size=(5,1)), columns=['A'])
In [19]: df
Out[19]:
A
0 4
1 7
2 0
3 7
4 6
In [20]: df.reindex(columns=list('ABCD'))
Out[20]:
A B C D
0 4 NaN NaN NaN
1 7 NaN NaN NaN
2 0 NaN NaN NaN
3 7 NaN NaN NaN
4 6 NaN NaN NaN
reindex will return a new DataFrame, with columns appearing in the order they are listed:
In [31]: df.reindex(columns=list('DCBA'))
Out[31]:
D C B A
0 NaN NaN NaN 4
1 NaN NaN NaN 7
2 NaN NaN NaN 0
3 NaN NaN NaN 7
4 NaN NaN NaN 6
The reindex method has a fill_value parameter as well:
In [22]: df.reindex(columns=list('ABCD'), fill_value=0)
Out[22]:
A B C D
0 4 0 0 0
1 7 0 0 0
2 0 0 0 0
3 7 0 0 0
4 6 0 0 0
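Because reindex returns a new DataFrame rather than modifying df, the result has to be assigned to a name to keep the new columns. A minimal sketch, assuming a small hand-built frame (the sample values and the name new_cols are illustrative, not from the question):

```python
import pandas as pd

df = pd.DataFrame({"A": [4, 7, 0]})
new_cols = ["B", "C", "D"]  # hypothetical list of column names to add

# reindex leaves df untouched, so bind the result to keep the new columns.
expanded = df.reindex(columns=[*df.columns, *new_cols])

# Without fill_value the new columns hold NaN; with it, the given value.
zero_filled = df.reindex(columns=[*df.columns, *new_cols], fill_value=0)
```

Unpacking df.columns in front of the new names keeps the existing columns in their current order.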
A note on inplace=True: it doesn't do what most people think it does. Under the hood, an entirely new DataFrame is always created, and then the data from the new DataFrame is copied into the original DataFrame. That doesn't save any memory. So inplace=True is window-dressing without substance and, moreover, is misleadingly named. I haven't checked the code, but I expect df = df.reindex(...) requires at least 2x the memory required for df, and of course more when reindex is used to expand the number of rows.
I'd concat using a DataFrame:
In [23]:
df = pd.DataFrame(columns=['A'])
df
Out[23]:
Empty DataFrame
Columns: [A]
Index: []
In [24]:
pd.concat([df,pd.DataFrame(columns=list('BCD'))])
Out[24]:
Empty DataFrame
Columns: [A, B, C, D]
Index: []
Passing concat a list containing your original df and a new, empty DataFrame with the columns you wish to add returns a new df with the additional columns.
Caveat: See the discussion of performance in the other answers and/or the comment discussions. reindex may be preferable where performance is critical.
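For a DataFrame that already has rows, one variant of the same idea is to concatenate along the columns instead. This is a sketch rather than the exact method above: it assumes the empty frame is built on df's index so the rows line up, and the sample values are made up.

```python
import pandas as pd

df = pd.DataFrame({"A": [4, 7, 0]})

# An all-missing frame that shares df's index, so concat aligns row by row.
empty = pd.DataFrame(index=df.index, columns=["B", "C", "D"])

# axis=1 places the new columns beside the existing ones.
result = pd.concat([df, empty], axis=1)
```

The new columns should come out with dtype object here, since the empty frame has no data to infer a type from.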
If you don't want to retype the names of the existing columns, you can build the list from df.columns and pass it to reindex:
df.reindex(columns=[*df.columns.tolist(), 'new_column1', 'new_column2'], fill_value=0)
Full example:
In [1]: df = pd.DataFrame(np.random.randint(10, size=(3,1)), columns=['A'])
In [2]: df
Out[2]:
A
0 4
1 7
2 0
In [3]: df.reindex(columns=[*df.columns.tolist(), 'col1', 'col2'], fill_value=0)
Out[3]:
A col1 col2
0 4 0 0
1 7 0 0
2 0 0 0
And, if you already have a list with the column names:
In [4]: my_cols_list = ['col1', 'col2']
In [5]: df.reindex(columns=[*df.columns.tolist(), *my_cols_list], fill_value=0)
Out[5]:
A col1 col2
0 4 0 0
1 7 0 0
2 0 0 0
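If the list may contain names that df already has, filtering them out first avoids repeating a column label. A minimal sketch, assuming a hypothetical helper called add_missing_columns (not part of pandas) and made-up sample data:

```python
import pandas as pd

def add_missing_columns(df, names, fill_value=0):
    """Return a new DataFrame with any names not already present appended."""
    # Skip names that already exist so no column label is repeated.
    missing = [name for name in names if name not in df.columns]
    return df.reindex(columns=[*df.columns, *missing], fill_value=fill_value)

df = pd.DataFrame({"A": [4, 7, 0], "col1": [1, 2, 3]})
out = add_missing_columns(df, ["col1", "col2"])
```

Here col1 keeps its existing values and only col2 is added.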
Summary of alternative solutions:
columns_add = ['a', 'b', 'c']
for loop:
for newcol in columns_add:
    df[newcol] = None
dict method:
df = df.assign(**{col: None for col in columns_add})
tuple assignment:
df['a'], df['b'], df['c'] = None, None, None
df.assign(**dict.fromkeys(columns_add, None)) should also work.
Why not just use a loop:
for newcol in ['B','C','D']:
    df[newcol] = np.nan
You can make use of Pandas broadcasting:
df = pd.DataFrame({'A': [1, 1, 1]})
df[['B', 'C']] = 2, 3
# df[['B', 'C']] = [2, 3]
Result:
A B C
0 1 2 3
1 1 2 3
2 1 2 3
To add empty columns:
df[['B', 'C', 'D']] = 3 * [np.nan]
Result:
A B C D
0 1 NaN NaN NaN
1 1 NaN NaN NaN
2 1 NaN NaN NaN
Just to add to the list of funny ways:
columns_add = ['a', 'b', 'c']
df = df.assign(**dict(zip(columns_add, [0] * len(columns_add))))
None is different to 0, but some answers are assuming it's equivalent. Also, assigning None will give a dtype of object, but assigning 0 will give a dtype of int.
df[['B','C','D']] = None, None, None or [None, None, None] or pd.DataFrame([None, None, None])
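The dtype difference mentioned in the comment above can be checked directly. A small sketch, assuming a made-up one-column frame and using assign with dict.fromkeys as suggested earlier in the thread:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1]})
columns_add = ["a", "b", "c"]

# Same column names each time; only the fill value changes.
with_none = df.assign(**dict.fromkeys(columns_add, None))
with_zero = df.assign(**dict.fromkeys(columns_add, 0))
with_nan = df.assign(**dict.fromkeys(columns_add, np.nan))

# Collect the resulting dtype of one new column per variant.
dtypes = {
    "None": with_none["a"].dtype,
    "0": with_zero["a"].dtype,
    "np.nan": with_nan["a"].dtype,
}
```

Which fill value to pick depends on what the columns will hold later: np.nan keeps a numeric column numeric, while None produces an object column.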