python pandas merge missing columns from right dataframe

Question

I encountered a problem just like this Python pandas merge keyerror but none of the solutions worked for me. I assume some of the code isn't relevant but I just included it just in case. I basically want to calculate some gradients on some of the rows from one column and join it back to the original dataframe

import numpy as np
from scipy import signal

df = item_loc_df_dict['key_name']
prob_series = df['col1'].astype(float).values[5:(len(df['col1']) - 5)]
week_series = df['date_col'].values[5:(len(df['date_col']) - 5)]

gradient = np.square(np.gradient(prob_series))
win = signal.windows.hann(8)
gradients = signal.convolve(gradient, win, mode='same') / sum(win)
grad_df = pd.DataFrame({'date_col': week_series, 'peak_grads': gradients})

gradient_mean = np.mean(gradients)
gradient_std = np.std(gradients)
min_height = gradient_mean + gradient_std
peak_indices = signal.find_peaks(gradients, height=min_height)[0]
peak_grad_df = grad_df.iloc[peak_indices].reset_index(drop=True)

peak_grad_df['date_col'] = peak_grad_df['date_col'].astype('datetime64[ns]')
df['date_col'] = df['date_col'].astype('datetime64[ns]')

joined_df = df.merge(peak_grad_df, left_on=['date_col'], right_on=['date_col'], how='left')
joined_df['peak_grads'].fillna(0, inplace=True)

I would get KeyError: peak_grads on the last line, so the merge wasn't including the column from my right dataframe. The left dataframe df has about 200 columns, so I tried adding

df = df[['date_col', 'col2', 'col3']]

to the top of my code, and then the KeyError went away and my merge worked as expected. Why does this merge only work when I reduce the number of columns from my left dataframe?

I would take a look at what the dataframes look like before and after merge. My initial guess is that you have a "peak_grads" column in both which, when you merge, would create "peak_grads_x" and "peak_grads_y" columns. When you subset your dataframe, you remove the problematic column — Christina Stebbins
– Christina Stebbins, Commented Sep 7, 2022 at 21:03

Fernanda Milagres · Accepted Answer · 2022-09-07 21:51:02Z

1

Use suffixes parameter or print columns (joined_df.columns) to test @christina Stebbins hypothesis

joined_df = df.merge(peak_grad_df, left_on=['date_col'], right_on=['date_col'], how='left', suffixes=(False, False)))

This must raise "ValueError: columns overlap but no suffix specified: Index(['peak_grads']"

answered Sep 7, 2022 at 21:51

Fernanda Milagres

866 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

python pandas merge missing columns from right dataframe

1 Answer 1

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related