
I have 2 dataframes, df1 and df2. df2 is a subset of df1 that I extracted to do some cleaning. Both dataframes can be matched on index. I've seen a lot of merges on the site, but I don't want to add more columns to df1, and the dataframes are not the same size (df1 has 1000 rows and df2 has 275 rows), so I don't want to replace the entire column. I want to update df1['AgeBin'] with the df2['AgeBin'] values wherever the indexes of these dataframes match.

# attempted approach: locate the matching rows in df1, then write df2's values back
indexes = df1.loc[df1.AgeBin.isin(df2.AgeBin.values)].index
df1.at[indexes, 'AgeBin'] = df2['AgeBin'].values

This is what I came up with, but there seems to be an issue since the dataframes are different sizes:

ValueError: Must have equal len keys and value when setting with an iterable

Below is an oversimplification: df1 has 26 columns and df2 has 12 columns, with AgeBin being the last column in both. This is, in essence, what my objective is (a rough runnable sketch of these toy frames follows the tables below):

df2
    AgeBin
0     2 
1     3 
2     1 
3     3 


df1
    AgeBin
0     NaN 
1     NaN 
2     NaN 
3     NaN 

df1 after update
    AgeBin
0     2 
1     3 
2     1 
3     3 
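
Here is a rough, runnable sketch of the toy frames above (simplified, hypothetical values; the real df1 has 26 columns and 1000+ rows):

import numpy as np
import pandas as pd

# simplified stand-ins for the real frames: df1's AgeBin is still empty,
# df2 holds the cleaned values on (a subset of) the same index
df1 = pd.DataFrame({"AgeBin": [np.nan, np.nan, np.nan, np.nan]})
df2 = pd.DataFrame({"AgeBin": [2, 3, 1, 3]})

# the attempt from above, written with these names, was roughly:
#   indexes = df1.loc[df1.AgeBin.isin(df2.AgeBin.values)].index
#   df1.at[indexes, 'AgeBin'] = df2['AgeBin'].values
# which breaks when the selected index and df2 have different lengths
print(df1)
print(df2)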

Here are the dataframe specs, first for df1:

RangeIndex: 1309 entries, 0 to 1308
Data columns (total 26 columns):
 #   Column            Non-Null Count  Dtype   
---  ------            --------------  -----   
 0   PassengerId       1046 non-null   float64 
 1   Survived          714 non-null    category
 2   Pclass            1046 non-null   category
 3   Name              1046 non-null   object  
 4   Sex               1046 non-null   object  
 5   Age               1046 non-null   float64 
 6   SibSp             1046 non-null   float64 
 7   Parch             1046 non-null   float64 
 8   Ticket            1046 non-null   object  
 9   Fare              1046 non-null   float64 
 10  Embarked          1046 non-null   category
 11  FamilySize        1046 non-null   float64 
 12  Surname           1046 non-null   object  
 13  Title             1046 non-null   object  
 14  IsChild           1046 non-null   float64 
 15  isMale            1046 non-null   category
 16  GroupID           1046 non-null   float64 
 17  GroupSize         1046 non-null   float64 
 18  GroupType         1046 non-null   object  
 19  GroupNumSurvived  1046 non-null   float64 
 20  GroupNumPerished  1046 non-null   float64 
 21  LargeGroup        1046 non-null   float64 
 22  SplitFare         1046 non-null   float64 
 23  log10Fare         1046 non-null   float64 
 24  log10SplitFare    1046 non-null   float64 
 25  AgeBin            1046 non-null   category
dtypes: category(5), float64(15), object(6)
memory usage: 221.9+ KB
  

dfageResults.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 263 entries, 5 to 1308
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype   
---  ------  --------------  -----   
 0   AgeBin  263 non-null    category
dtypes: category(1)
memory usage: 12.4 KB

Here are the categories

[67] dfageResults.groupby(["AgeBin"])["AgeBin"].count()
AgeBin
0-14      25
15-29    192
30-44     46
Name: AgeBin, dtype: int64

[68] df.groupby(["AgeBin"])["AgeBin"].count()
AgeBin
0-14     107
15-29    462
30-44    301
45-59    136
60+       40
Name: AgeBin, dtype: int64
  • Could you please provide sample data of the two dataframes? Commented Jul 7, 2020 at 2:13
  • What you need is df1.combine_first(df2) Commented Jul 7, 2020 at 2:30
  • I tried your solution and it did not work. Thanks tho for helping! Commented Jul 7, 2020 at 2:39

2 Answers


Assuming all the indexes in df2 exist in df1 (which I understand is the case), the below will suffice:

df1.loc[df2.index,:]=df2

In case the above assumption about the index doesn't hold, this is the alternative (same result; it only updates indexes that exist in df1):

df1.loc[df2.index.intersection(df1.index), :] = df2

Sample output (with more representative sample data):

import pandas as pd
import numpy as np

df1=pd.DataFrame({"AgeBin": [1,2,3,'x', np.nan,np.nan,'a']})

df2=pd.DataFrame({"AgeBin": ['new1', 'new2', 123]}, index=[5,2,3])

print(df1)
print(df2)
df1.loc[df2.index,:]=df2
print(df1)

Outputs:

  AgeBin
0      1
1      2
2      3
3      x
4    NaN
5    NaN
6      a

  AgeBin
5   new1
2   new2
3    123

  AgeBin
0      1
1      2
2   new2
3    123
4    NaN
5   new1
6      a
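
And a quick sketch of the intersection variant above, for the case where df2 carries labels that df1 does not have (hypothetical toy data):

import numpy as np
import pandas as pd

df1 = pd.DataFrame({"AgeBin": [np.nan] * 5})                   # index 0-4
df2 = pd.DataFrame({"AgeBin": [10, 20, 30]}, index=[2, 3, 7])  # label 7 is not in df1

# only the overlapping labels (2 and 3) are written back; label 7 is ignored
df1.loc[df2.index.intersection(df1.index), :] = df2
print(df1)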

9 Comments

This looks great! It also makes sense. But I am getting an error:
ValueError: Cannot set a Categorical with another, without identical categories
I reduced df2 to only 1 column and its index to make this easier. I also changed the df2 column values to categoricals that match df1. Yet I am still getting the error. I am on Python 3.7.7, if that helps.
Remember that df1 has 26 columns; I have reduced df2 to just 1 now.
Try df1.loc[df2.index,'AgeBin']=df2
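
If that single-column form still raises the categorical ValueError, one possible workaround (an assumption here, since the category counts in the question suggest df2's AgeBin categories are a subset of df1's) is to give both columns an identical category set before assigning:

# hypothetical workaround: make df2's category set identical to df1's,
# then do the index-aligned assignment on the single column
df2['AgeBin'] = df2['AgeBin'].cat.set_categories(df1['AgeBin'].cat.categories)
df1.loc[df2.index, 'AgeBin'] = df2['AgeBin']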

Try:

print('df2')
print(df2)

print('\ndf1')
print(df1)

df1.update(df2)

print('\ndf1 after update')
print(df1)

Output:

df2
  AgeBin
0  2    
1  3    
2  1    
3  3    

df1
   AgeBin
0 NaN    
1 NaN    
2 NaN    
3 NaN    

df1 after update
  AgeBin
0  2    
1  3    
2  1    
3  3   
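
Note that update aligns on the index (and on column labels) and only overwrites df1 where df2 has non-NaN values, so differently sized frames are fine. A small sketch with hypothetical data:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({"AgeBin": [np.nan] * 6})          # 6 rows, index 0-5
df2 = pd.DataFrame({"AgeBin": [2, 1]}, index=[2, 5])  # 2 rows, a subset of df1's index

df1.update(df2)   # only rows 2 and 5 are overwritten; the rest stay NaN
print(df1)

With the real categorical AgeBin column, the category-mismatch error mentioned in the other answer's comments may still require aligning the category sets first.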

5 Comments

Your output does not match my desired result. Also, I am not sure I follow your logic; how am I supposed to implement your code on 1000+ rows with 275 possible results?
Sorry, I created my own dataset before. I also tested this out on different-sized dataframes and it worked fine.
I see the revised code; I am getting the same error as above:
ValueError: Cannot set a Categorical with another, without identical categories
I apologize; I thought the number of rows was the only thing different in your problem, not the number of columns.
