Given these data frames:

DF = pd.DataFrame({'COL1': ['A', 'B', 'C', 'D','D','D'], 
                   'COL2': [11032, 1960, 11400, 11355, 8, 7], 
                   'year': ['2016', '2017', '2018', '2019', '2020', '2021']})
DF

  COL1   COL2  year
0    A  11032  2016
1    B   1960  2017
2    C  11400  2018
3    D  11355  2019
4    D      8  2020
5    D      7  2021

DF2 = pd.DataFrame({'ColX': ['D'], 'ColY': ['2021'], 'ColZ': [100]})
DF2

  ColX  ColY  ColZ
0    D  2021   100

If the following conditions are met:

COL1 = ColX from DF2

year = ColY from DF2

Then change the value in COL2 to ColZ from DF2.

  • What if there were multiple ColZ values for the same matching pairs of ColX and ColY? Commented Oct 9, 2015 at 3:50
  • There will not be, I promise. Commented Oct 9, 2015 at 4:10
  • DF2['ColY'] should be ['2021'] correct? It says 2012, but 2021 in the output. Commented Oct 9, 2015 at 4:14
  • Yes, sorry. I'll fix. Commented Oct 9, 2015 at 4:14

1 Answer

It looks like you want to update DF with data from DF2.

Assuming each pair of values in ColX and ColY appears at most once in DF2:

DF = DF.merge(DF2.set_index(['ColX', 'ColY'])[['ColZ']], 
              how='left', 
              left_on=['COL1', 'year'], 
              right_index=True)
DF.COL2.update(DF.ColZ)
del DF['ColZ']

>>> DF
  COL1   COL2  year
0    A  11032  2016
1    B   1960  2017
2    C  11400  2018
3    D  11355  2019
4    D      8  2020
5    D    100  2021

I merge a temporary dataframe (DF2.set_index(['ColX', 'ColY'])[['ColZ']]) into DF. This adds a ColZ column holding the value from DF2 wherever its index (ColX and ColY) matches COL1 and year in DF. Rows with no match are filled with NA.

I then use update to overwrite the values in DF.COL2 with the non-null values in DF.ColZ.

Finally, I delete DF['ColZ'] to clean up.
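The same three steps can also be sketched with plain assignment instead of an in-place update on DF.COL2. This is only an illustrative variant, not the method above: it assumes the pairs in DF2 are unique (so the merge keeps the row count), and it sidesteps the possibility that an in-place update on an attribute-accessed column does not write back to DF in pandas versions with Copy-on-Write enabled. The name merged is introduced here for illustration.

```python
import pandas as pd

DF = pd.DataFrame({'COL1': ['A', 'B', 'C', 'D', 'D', 'D'],
                   'COL2': [11032, 1960, 11400, 11355, 8, 7],
                   'year': ['2016', '2017', '2018', '2019', '2020', '2021']})
DF2 = pd.DataFrame({'ColX': ['D'], 'ColY': ['2021'], 'ColZ': [100]})

# Step 1: left merge keeps every row of DF; ColZ is NaN where no pair matches.
merged = DF.merge(DF2, how='left',
                  left_on=['COL1', 'year'], right_on=['ColX', 'ColY'])

# Step 2: prefer ColZ where it exists, otherwise keep the original COL2.
# The NaNs force a float column, so cast back to integers afterwards.
# to_numpy() assigns by position, which is valid because the left merge
# preserves the row order and (with unique pairs in DF2) the row count.
DF['COL2'] = merged['ColZ'].fillna(merged['COL2']).astype('int64').to_numpy()

# Step 3: nothing to delete, because ColZ only ever lived in `merged`.
print(DF)
```

Because the helper columns stay in merged, DF itself never gains a ColZ column.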

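If DF already has a column named ColZ, the merge will rename the overlapping columns with suffixes (_x and _y by default), so the update and delete steps would need to refer to the suffixed names instead.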
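As a rough sketch of one such adjustment, the merge's suffixes argument can give the incoming column a distinct name. The frame DF2_clash and the suffix _new are hypothetical, made up here to force a name collision with COL2; they are not part of the question.

```python
import pandas as pd

DF = pd.DataFrame({'COL1': ['A', 'B', 'C', 'D', 'D', 'D'],
                   'COL2': [11032, 1960, 11400, 11355, 8, 7],
                   'year': ['2016', '2017', '2018', '2019', '2020', '2021']})

# Hypothetical variant of DF2 whose value column clashes with DF's COL2.
DF2_clash = pd.DataFrame({'ColX': ['D'], 'ColY': ['2021'], 'COL2': [100]})

# suffixes=('', '_new') leaves DF's column as COL2 and renames the
# incoming one to COL2_new, so the two can be told apart.
merged = DF.merge(DF2_clash, how='left',
                  left_on=['COL1', 'year'], right_on=['ColX', 'ColY'],
                  suffixes=('', '_new'))

# Overwrite COL2 only where a replacement value was found.
merged['COL2'] = merged['COL2_new'].fillna(merged['COL2']).astype('int64')

# Keep only the original columns.
result = merged[['COL1', 'COL2', 'year']]
print(result)
```

Renaming the column in DF2 before merging would work just as well; the suffix approach simply avoids touching either input frame.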

An alternative solution is as follows:

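DF = DF.set_index(['COL1', 'year'])
# update works in place (it returns None) and only touches columns whose names match
DF.update(DF2.rename(columns={'ColX': 'COL1', 'ColY': 'year', 'ColZ': 'COL2'}).set_index(['COL1', 'year']))
DF.reset_index(inplace=True)

The values match those above, although reset_index places COL1 and year before COL2.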

3 Comments

As the song goes: Thank you...thank you...thank God for you the wind beneath my wings...
One more thing (hopefully): what if I wanted to add the condition that, if fewer than both conditions are met (found), the current value is replaced with 'n/a'?
With the first method above, I believe DF.ColZ will give you what you want (i.e. don't delete it). It is all matching values from DF2 given your two conditions, with n/a for unmatched values.
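A minimal sketch of that idea, assuming unique pairs in DF2 and using pandas' NaN as the 'n/a' marker (the name merged is introduced here for illustration):

```python
import pandas as pd

DF = pd.DataFrame({'COL1': ['A', 'B', 'C', 'D', 'D', 'D'],
                   'COL2': [11032, 1960, 11400, 11355, 8, 7],
                   'year': ['2016', '2017', '2018', '2019', '2020', '2021']})
DF2 = pd.DataFrame({'ColX': ['D'], 'ColY': ['2021'], 'ColZ': [100]})

# Left merge: ColZ holds the DF2 value where both conditions match, NaN otherwise.
merged = DF.merge(DF2, how='left',
                  left_on=['COL1', 'year'], right_on=['ColX', 'ColY'])

# Replace COL2 wholesale with ColZ, so unmatched rows become NaN.
# The column turns into floats because NaN cannot be stored as an integer.
DF['COL2'] = merged['ColZ'].to_numpy()
print(DF)
```

If the literal string 'n/a' is needed rather than NaN, a fillna('n/a') on the column would do it, at the cost of turning it into an object column.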
