
I ran into a problem just like the one in "Python pandas merge KeyError", but none of the solutions there worked for me. Some of the code below is probably irrelevant, but I've included it just in case. I basically want to calculate gradients over some of the rows of one column and join the result back to the original dataframe.

import numpy as np
import pandas as pd
from scipy import signal

df = item_loc_df_dict['key_name']
# Drop the first and last 5 rows before computing gradients
prob_series = df['col1'].astype(float).to_numpy()[5:-5]
week_series = df['date_col'].to_numpy()[5:-5]

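# Squared first derivative of the series
gradient = np.square(np.gradient(prob_series))
# Smooth with an 8-point Hann window, normalized by the window sum
win = signal.windows.hann(8)
gradients = signal.convolve(gradient, win, mode='same') / win.sum()
grad_df = pd.DataFrame({'date_col': week_series, 'peak_grads': gradients})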

# Keep only peaks at least one standard deviation above the mean
gradient_mean = np.mean(gradients)
gradient_std = np.std(gradients)
min_height = gradient_mean + gradient_std
peak_indices = signal.find_peaks(gradients, height=min_height)[0]
peak_grad_df = grad_df.iloc[peak_indices].reset_index(drop=True)

# Give both merge keys the same dtype
peak_grad_df['date_col'] = peak_grad_df['date_col'].astype('datetime64[ns]')
df['date_col'] = df['date_col'].astype('datetime64[ns]')

joined_df = df.merge(peak_grad_df, on='date_col', how='left')
joined_df['peak_grads'] = joined_df['peak_grads'].fillna(0)

I get KeyError: 'peak_grads' on the last line, so the merged result apparently doesn't contain the column from my right dataframe. The left dataframe df has about 200 columns, so I tried adding

df = df[['date_col', 'col2', 'col3']]

to the top of my code, and then the KeyError went away and my merge worked as expected. Why does this merge only work when I reduce the number of columns from my left dataframe?
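A quick check for this kind of problem is to list the non-key column names that the two frames share before merging. The sketch below is a minimal illustration; overlapping_columns is a hypothetical helper, and the frames are toy stand-ins that assume the wide left frame happens to contain a peak_grads column.

```python
import pandas as pd


def overlapping_columns(left: pd.DataFrame, right: pd.DataFrame, key: str) -> list:
    """Return the non-key column names that appear in both frames.

    With merge's default suffixes, each of these names is renamed to
    name + '_x' (left) and name + '_y' (right) in the merged result.
    """
    shared = left.columns.intersection(right.columns)
    return [col for col in shared if col != key]


# Hypothetical frames: the wide left frame happens to contain 'peak_grads'.
left = pd.DataFrame(columns=["date_col", "col2", "col3", "peak_grads"])
right = pd.DataFrame(columns=["date_col", "peak_grads"])

print(overlapping_columns(left, right, "date_col"))  # ['peak_grads']
# Subsetting the left frame removes the collision.
print(overlapping_columns(left[["date_col", "col2", "col3"]], right, "date_col"))  # []
```

If the first call returns a non-empty list, the merged frame will not have those names unsuffixed.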

    I would look at the columns of both dataframes before and after the merge. My initial guess is that you have a "peak_grads" column in both, which the merge turns into "peak_grads_x" and "peak_grads_y" columns. When you subset your dataframe, you remove the problematic column. (Commented Sep 7, 2022 at 21:03)
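The hypothesis in this comment can be reproduced with small toy frames. This sketch assumes, hypothetically, that the left frame already carries a peak_grads column (for example, left over from an earlier run); the dates and values are made up.

```python
import pandas as pd

# Toy left frame that already contains a 'peak_grads' column.
df = pd.DataFrame({
    "date_col": pd.to_datetime(["2022-01-03", "2022-01-10", "2022-01-17"]),
    "col1": [0.1, 0.4, 0.2],
    "peak_grads": [0.0, 0.0, 0.0],
})
# Toy right frame with one detected peak.
peak_grad_df = pd.DataFrame({
    "date_col": pd.to_datetime(["2022-01-10"]),
    "peak_grads": [0.03],
})

joined_df = df.merge(peak_grad_df, on="date_col", how="left")

# The shared non-key column is split into two suffixed columns,
# so there is no plain 'peak_grads' column left to index.
print(list(joined_df.columns))
# ['date_col', 'col1', 'peak_grads_x', 'peak_grads_y']
print("peak_grads" in joined_df.columns)  # False
```

Indexing joined_df['peak_grads'] on this result raises the same KeyError as in the question.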

1 Answer


Use the suffixes parameter, or print the columns (joined_df.columns), to test @Christina Stebbins's hypothesis:

joined_df = df.merge(peak_grad_df, left_on=['date_col'], right_on=['date_col'], how='left', suffixes=(False, False))

If the hypothesis is right, this will raise "ValueError: columns overlap but no suffix specified: Index(['peak_grads'], dtype='object')".
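Here is a minimal sketch of this check on toy data, followed by one possible fix that is not part of the answer itself: dropping the stale peak_grads column from the left frame before merging. The data is hypothetical, and the exact text of the error message may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    "date_col": pd.to_datetime(["2022-01-03", "2022-01-10", "2022-01-17"]),
    "col1": [0.1, 0.4, 0.2],
    "peak_grads": [0.0, 0.0, 0.0],  # stale column already on the left frame
})
peak_grad_df = pd.DataFrame({
    "date_col": pd.to_datetime(["2022-01-10"]),
    "peak_grads": [0.03],
})

# 1. With both suffixes disabled, pandas refuses to merge overlapping columns.
error_message = None
try:
    df.merge(peak_grad_df, on="date_col", how="left", suffixes=(False, False))
except ValueError as exc:
    error_message = str(exc)
    print(error_message)  # columns overlap but no suffix specified: ...

# 2. One possible fix: drop the stale column from the left frame first,
#    so the merged result has a single 'peak_grads' column.
joined_df = df.drop(columns=["peak_grads"], errors="ignore").merge(
    peak_grad_df, on="date_col", how="left"
)
# Rows without a peak get NaN from the left join; replace with 0.
joined_df["peak_grads"] = joined_df["peak_grads"].fillna(0)
print(joined_df)
```

Dropping the column assumes the old values are not needed. If they are, keep suffixes instead (for example suffixes=('_old', '')) so that only the left copy is renamed.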
