I have two Pandas dataframes, as shown below:

import pandas as pd
main_df = pd.DataFrame({
    'day1': [1, 2, 3, 4],
    'day2': [2, 1, 3, 4],
    'day3': [3, 1, 2, 5],
    'day4': [2, 1, 3, 5],
    'day5': [4, 1, 2, 3],
    'day6': [5, 3, 4, 2]}, index=['a', 'b', 'c', 'd'])

df = pd.DataFrame({
    'day1': [0, 1, 0],
    'day3': [0, 0, 1]
})

I want to add the columns of main_df that are missing from df, and set their values to 0. My expected output is:

df
   day1  day2  day3  day4  day5  day6
0     0     0     0     0     0     0
1     1     0     0     0     0     0
2     0     0     1     0     0     0

I can do this the following way in a loop:

cols_to_add = main_df.columns[~main_df.columns.isin(df.columns)]
for c in cols_to_add:
    df[c] = 0

Is there a way to do this without looping? Note that the two dataframes have different indices.

Comment (Aug 25, 2020 at 20:06): Please show your expected output and share your attempt at solving this. Thanks.

4 Answers

1

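You can build a dict that maps each missing column to 0 and pass it to assign:

cols_to_add = main_df.columns[~main_df.columns.isin(df.columns)]
d = dict.fromkeys(cols_to_add, 0)
# assign returns a new DataFrame, so keep the result
df = df.assign(**d)

Or concatenate with an empty frame that has the missing columns and fill the resulting NaNs (the new columns may not come out with an integer dtype):

df = pd.concat([df, pd.DataFrame(columns=cols_to_add)]).fillna(0)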

   day1  day3  day2  day4  day5  day6
0     0     0     0     0     0     0
1     1     0     0     0     0     0
2     0     1     0     0     0     0
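The output above lists the new columns after the existing ones, whereas the question's expected output follows the column order of main_df. A minimal sketch of one way to get that order, assuming every column of df also exists in main_df (true for the question's data); the name result is only illustrative:

```python
import pandas as pd

main_df = pd.DataFrame({
    'day1': [1, 2, 3, 4],
    'day2': [2, 1, 3, 4],
    'day3': [3, 1, 2, 5],
    'day4': [2, 1, 3, 5],
    'day5': [4, 1, 2, 3],
    'day6': [5, 3, 4, 2]}, index=['a', 'b', 'c', 'd'])

df = pd.DataFrame({
    'day1': [0, 1, 0],
    'day3': [0, 0, 1]
})

# Columns present in main_df but missing from df
cols_to_add = main_df.columns.difference(df.columns)

# assign returns a new DataFrame; df itself is left unchanged
result = df.assign(**dict.fromkeys(cols_to_add, 0))

# Reorder to match main_df (assumes df has no columns outside main_df)
result = result[main_df.columns]
print(result)
```

Selecting with main_df.columns would raise a KeyError if a label were missing, so the reordering step only works once all the zero columns have been added.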

1

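With cols_to_add defined as in the question, you can assign all the new columns in one statement:

df[cols_to_add] = pd.DataFrame(columns=cols_to_add, index=df.index).fillna(0)

Assigning to a list of columns accepts a DataFrame with a compatible index and the same number of columns as the value.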
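Since the fill value is a single constant, it may also be possible to skip the helper frame and assign the scalar to the list of new labels directly. This is a sketch that assumes a recent pandas release in which a scalar can be assigned to a list of column labels that do not exist yet:

```python
import pandas as pd

main_df = pd.DataFrame({
    'day1': [1, 2, 3, 4],
    'day2': [2, 1, 3, 4],
    'day3': [3, 1, 2, 5],
    'day4': [2, 1, 3, 5],
    'day5': [4, 1, 2, 3],
    'day6': [5, 3, 4, 2]}, index=['a', 'b', 'c', 'd'])

df = pd.DataFrame({
    'day1': [0, 1, 0],
    'day3': [0, 0, 1]
})

# Columns present in main_df but missing from df
cols_to_add = main_df.columns.difference(df.columns)

# Broadcast the scalar 0 to every new column (modifies df in place)
df[list(cols_to_add)] = 0
print(df)
```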


1

Use df.reindex(columns=x) and join the result to df, where x is the set of columns that are in main_df but not in df:

df.join(df.reindex(columns=list(main_df.columns.difference(df.columns)))).fillna(0)

   day1  day3  day2  day4  day5  day6
0     0     0     0     0     0     0
1     1     0     0     0     0     0
2     0     1     0     0     0     0
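A related variant reindexes df against the full column set of main_df in one call and uses the fill_value parameter of reindex in place of the join and fillna steps. This is a sketch rather than the approach above, and it assumes df has no columns outside main_df, because reindex would drop them; the name result is only illustrative:

```python
import pandas as pd

main_df = pd.DataFrame({
    'day1': [1, 2, 3, 4],
    'day2': [2, 1, 3, 4],
    'day3': [3, 1, 2, 5],
    'day4': [2, 1, 3, 5],
    'day5': [4, 1, 2, 3],
    'day6': [5, 3, 4, 2]}, index=['a', 'b', 'c', 'd'])

df = pd.DataFrame({
    'day1': [0, 1, 0],
    'day3': [0, 0, 1]
})

# Conform df to main_df's columns; labels missing from df are filled with 0.
# Only the columns are reindexed, so df's own row index is kept.
result = df.reindex(columns=main_df.columns, fill_value=0)
print(result)
```

Because the missing columns are created with the fill value rather than with NaN, this should avoid the integer-to-float conversion that a later fillna(0) can leave behind, and the columns come out in main_df order.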


0

You can try this:

import pandas as pd
import numpy as np

# 1 - List the columns of main_df that are missing from df
col_to_add = [col for col in main_df.columns if col not in df.columns]
# 2 - Build a frame of zeros with one row per row of df (instead of a hard-coded 3)
zero_vals_df = pd.DataFrame(data=np.zeros((len(df), len(col_to_add)), dtype=int), columns=col_to_add, index=df.index)
# 3 - Concatenate column-wise to obtain the desired result
df = pd.concat([df, zero_vals_df], axis=1)

On my machine, using the %%timeit cell magic, I measured the following:

429 µs ± 3.37 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Note that this method requires importing numpy, which pandas itself already depends on.
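If the explicit numpy import is unwanted, the zero frame can also be built by passing a scalar to the DataFrame constructor together with index and columns. A minimal sketch of that variant, not timed here; the name result is only illustrative:

```python
import pandas as pd

main_df = pd.DataFrame({
    'day1': [1, 2, 3, 4],
    'day2': [2, 1, 3, 4],
    'day3': [3, 1, 2, 5],
    'day4': [2, 1, 3, 5],
    'day5': [4, 1, 2, 3],
    'day6': [5, 3, 4, 2]}, index=['a', 'b', 'c', 'd'])

df = pd.DataFrame({
    'day1': [0, 1, 0],
    'day3': [0, 0, 1]
})

# Columns of main_df that are missing from df
col_to_add = [col for col in main_df.columns if col not in df.columns]

# A scalar is broadcast over the given index and columns
zero_vals_df = pd.DataFrame(0, index=df.index, columns=col_to_add)

# Column-wise concatenation; the row indices match, so no NaNs appear
result = pd.concat([df, zero_vals_df], axis=1)
print(result)
```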

