0

I am trying to replace hours in df with hours from replacements for project IDs that exist in both dataframes:

import pandas as pd

df = pd.DataFrame({
    'project_ids': [1, 2, 3, 4, 5],
    'hours': [111, 222, 333, 444, 555],
    'else' :['a', 'b', 'c', 'd', 'e']
})

replacements = pd.DataFrame({
    'project_ids': [2, 5, 3],
    'hours': [666, 999, 1000],
})

for project in replacements['project_ids']:
    df.loc[df['project_ids'] == project, 'hours'] = replacements.loc[replacements['project_ids'] == project, 'hours']

print(df)

However, only the project ID 3 gets correct assignment (1000), but both 2 and 5 get NaN:

 projects   hours else
0         1   111.0    a
1         2     NaN    b
2         3  1000.0    c
3         4   444.0    d
4         5     NaN    e
  1. How can I fix it?
  2. Is there a better way to do this?

3 Answers 3

1

Use Series.map with another Series created by replacements with DataFrame.set_index:

s = replacements.set_index('project_ids')['hours']
df['hours'] = df['project_ids'].map(s).fillna(df['hours'])
print(df)
   project_ids   hours else
0            1   111.0    a
1            2   666.0    b
2            3  1000.0    c
3            4   444.0    d
4            5   999.0    e
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks! Am I correct that df['project_ids'].map(s) is looking for each project ID from df in the index of s, and returns hours value for this index when found?
@barciewicz - exactly. And if no match, there are NaNs, which are replace by original by .fillna(df['hours'])
1

Another way using df.update():

m=df.set_index('project_ids')
m.update(replacements.set_index('project_ids')['hours'])
print(m.reset_index())

   project_ids   hours else
0            1   111.0    a
1            2   666.0    b
2            3  1000.0    c
3            4   444.0    d
4            5   999.0    e

Comments

0

Another solution would be to use pandas.merge and then use fillna:

df_new = pd.merge(df, replacements, on='project_ids', how='left', suffixes=['_1', ''])
df_new['hours'].fillna(df_new['hours_1'], inplace=True)
df_new.drop('hours_1', axis=1, inplace=True)

print(df_new)
   project_ids else   hours
0            1    a   111.0
1            2    b   666.0
2            3    c  1000.0
3            4    d   444.0
4            5    e   999.0

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.