I have the following wide df1:

Area  geotype  type    ...
1     a        2       ...
1     a        1       ...
2     b        4       ...
4     b        8       ...

And the following two-column df2:

Area   geotype
1      London
4      Cambridge

And I want the following:

Area  geotype    type    ...
1     London     2       ...
1     London     1       ...
2     b          4       ...
4     Cambridge  8       ...

So I need to match on the non-unique Area column and, only where there is a match, replace the existing values in the geotype column.

Apologies if this is a duplicate, I did actually search hard for a solution to this.

3 Answers

3

Use update + map:

df1.geotype.update(df1.Area.map(df2.set_index('Area').geotype))

   Area    geotype  type
0     1     London     2
1     1     London     1
2     2          b     4
3     4  Cambridge     8
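
For reference, here is a self-contained sketch of the same idea on the example data from the question. The frame construction and the names lookup, mapped and updated are assumptions added for illustration. It updates a copy of the column and assigns it back instead of calling update on df1.geotype directly, because whether an in-place change to a selected column reaches the parent frame can depend on the pandas version and its copy-on-write settings.

```python
import pandas as pd

# Example frames rebuilt from the question (values are illustrative)
df1 = pd.DataFrame({"Area": [1, 1, 2, 4],
                    "geotype": ["a", "a", "b", "b"],
                    "type": [2, 1, 4, 8]})
df2 = pd.DataFrame({"Area": [1, 4],
                    "geotype": ["London", "Cambridge"]})

# Lookup Series: Area -> geotype (its index must be unique for map)
lookup = df2.set_index("Area")["geotype"]

# NaN wherever an Area from df1 has no entry in df2
mapped = df1["Area"].map(lookup)

# Series.update only uses the non-NA values of `mapped`,
# so unmatched rows keep their original geotype
updated = df1["geotype"].copy()
updated.update(mapped)
df1["geotype"] = updated

print(df1)
```

Duplicated Area values in df1 are not a problem here; only the lookup built from df2 needs unique keys.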

1 Comment

@jezrael fixed.

2

I think you can use map with a Series created by set_index, and then fill the resulting NaN values with combine_first or fillna:

df1.geotype = df1.ID.map(df2.set_index('ID')['geotype']).combine_first(df1.geotype)
#df1.geotype = df1.ID.map(df2.set_index('ID')['geotype']).fillna(df1.geotype)
print (df1)
   ID    geotype type
0   1     London    2
1   2          a    1
2   3          b    4
3   4  Cambridge   8e

Another solution with mask and numpy.isin (numpy.in1d in older NumPy, deprecated since NumPy 2.0):

import numpy as np

df1.geotype = df1.geotype.mask(np.isin(df1.ID, df2.ID),
                               df1.ID.map(df2.set_index('ID')['geotype']))
print (df1)
   ID    geotype type
0   1     London    2
1   2          a    1
2   3          b    4
3   4  Cambridge   8e

EDIT by comment:

The problem is non-unique ID values in df2, for example:

df2 = pd.DataFrame({'ID': [1, 1, 4], 'geotype': ['London', 'Paris', 'Cambridge']})
print (df2)
   ID    geotype
0   1     London
1   1      Paris
2   4  Cambridge

So map cannot decide which value to use and raises an error.
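
Before mapping, it may help to check whether the lookup keys are unique and to see which rows conflict. This is a minimal sketch on the df2 above; the names lookup, is_unique and conflicts are illustrative assumptions.

```python
import pandas as pd

df2 = pd.DataFrame({"ID": [1, 1, 4],
                    "geotype": ["London", "Paris", "Cambridge"]})

# The Series that would be passed to map
lookup = df2.set_index("ID")["geotype"]

# False here, because ID 1 appears twice
is_unique = lookup.index.is_unique

# All rows of df2 whose ID occurs more than once
conflicts = df2[df2["ID"].duplicated(keep=False)]

print(is_unique)
print(conflicts)
```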

The solution is to remove the duplicates with drop_duplicates, which keeps the first value by default:

df2 = df2.drop_duplicates('ID')
print (df2)
   ID    geotype
0   1     London
2   4  Cambridge

Or, if you need to keep the last value:

df2 = df2.drop_duplicates('ID', keep='last')
print (df2)
   ID    geotype
1   1      Paris
2   4  Cambridge

If you cannot remove the duplicates, another solution is an outer merge, but it produces duplicated rows wherever an ID is duplicated in df2:

df1 = pd.merge(df1, df2, on='ID', how='outer', suffixes=('_',''))
df1.geotype = df1.geotype.combine_first(df1.geotype_)
df1 = df1.drop('geotype_', axis=1)
print (df1)
   ID type    geotype
0   1    2     London
1   1    2      Paris
2   2    1          a
3   3    4          b
4   4   8e  Cambridge
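
If the keys in df2 are unique, as in the question's example, a left merge is a possible variant that keeps the row count and order of df1. The sketch below uses the Area column from the question; the frame construction and the "_old" suffix are assumptions for illustration. Passing validate="many_to_one" asks pandas to raise an error if the keys in df2 turn out not to be unique.

```python
import pandas as pd

df1 = pd.DataFrame({"Area": [1, 1, 2, 4],
                    "geotype": ["a", "a", "b", "b"],
                    "type": [2, 1, 4, 8]})
df2 = pd.DataFrame({"Area": [1, 4],
                    "geotype": ["London", "Cambridge"]})

# df1's geotype becomes geotype_old; df2's geotype keeps its name.
# validate="many_to_one" checks that Area is unique in df2.
merged = df1.merge(df2, on="Area", how="left",
                   suffixes=("_old", ""), validate="many_to_one")

# Where df2 had no match, fall back to the original value
merged["geotype"] = merged["geotype"].fillna(merged["geotype_old"])
merged = merged.drop(columns="geotype_old")

print(merged)
```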

2 Comments

Sorry, I got 'Reindexing only valid with uniquely valued Index objects' as the ID column is really an area column, so there are multiple entries.
I see the problem: you have multiple values for one ID in df2, so map is impossible because pandas does not know which value to use. You need unique ID values in df2.

2

Alternative solution:

In [78]: df1.loc[df1.ID.isin(df2.ID), 'geotype'] = df1.ID.map(df2.set_index('ID').geotype)

In [79]: df1
Out[79]:
   ID    geotype  type
0   1     London     2
1   2          a     1
2   3          b     4
3   4  Cambridge     8

UPDATE: this addresses the updated question. If you have duplicates in the Area column of df2:

In [152]: df1.loc[df1.Area.isin(df2.Area), 'geotype'] = df1.Area.map(df2.set_index('Area').geotype)
...
skipped
...
InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Get rid of the duplicates:

In [153]: df1.loc[df1.Area.isin(df2.Area), 'geotype'] = df1.Area.map(df2.drop_duplicates(subset='Area').set_index('Area').geotype)

In [154]: df1
Out[154]:
   Area    geotype  type
0     1     London     2
1     1     London     1
2     2          b     4
3     4  Cambridge     8
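
The same steps as a standalone sketch, assuming a df2 that repeats Area 1 (the extra 'Paris' row is made up for illustration, as are the names lookup, mask and mapped):

```python
import pandas as pd

df1 = pd.DataFrame({"Area": [1, 1, 2, 4],
                    "geotype": ["a", "a", "b", "b"],
                    "type": [2, 1, 4, 8]})
# Hypothetical df2 with a duplicated Area value
df2 = pd.DataFrame({"Area": [1, 1, 4],
                    "geotype": ["London", "Paris", "Cambridge"]})

# Keep the first row per Area so the lookup index is unique
lookup = df2.drop_duplicates(subset="Area").set_index("Area")["geotype"]

mask = df1["Area"].isin(lookup.index)   # rows of df1 that have a match
mapped = df1["Area"].map(lookup)

# Write only the matched rows; the others keep their value
df1.loc[mask, "geotype"] = mapped[mask]

print(df1)
```

With drop_duplicates(subset="Area", keep="last") the lookup would hold the last row per Area instead.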

3 Comments

Sorry, I got 'Reindexing only valid with uniquely valued Index objects' as the ID column is really an area column, so there are multiple entries.
@ThirstforKnowledge, do you also have duplicates in the df2 DF?
No just duplicates in df1 @MaxU
