Using VLOOKUP with merge in Python

Question

I have this pandas DataFrame with almost 540000 rows:

df1.head()

    username  hour    totalCount
0   lowi      00:00   12
1   klark     00:00   0
2   sturi     00:00   2
3   nukr      00:00   10
4   irore     00:00   2

I also have a second pandas DataFrame with almost 52000 rows, some of them duplicated:

df2.head()

   username   community
0    klark       0
1    irore       2
2    sturi       2
3    sturi       2
4    sturi       2

I want to add the 'community' column from df2 to df1, matching each row by username. I have used this code (df_hu and df_comm are the two DataFrames shown above as df1 and df2):

df_merge = df_hu.merge(df_comm, on='username')
df_merge

But I get the following DataFrame with almost 1205880 rows, many of them duplicated:

    username    hour    totalCount  community
0   lowi        00:00   12          2
1   lowi        00:00   12          2
2   lowi        00:00   12          2
3   lowi        01:00   9           2
4   lowi        01:00   9           2

The expected output would be this:

df_merge.head()

    username  hour    totalCount community
0   lowi      00:00   12         2
1   klark     00:00   0          0
2   sturi     00:00   2          2
3   nukr      00:00   10         1 (not shown in the example)
4   irore     00:00   2          2

Assuming there is only one community per username: df_hu.merge(df_comm.drop_duplicates(), on='username', how='left')
– Alexander, Commented Jul 31, 2019 at 6:01
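
The suggestion in this comment can be sketched as follows. The two small frames are stand-ins rebuilt from the head() output in the question, not the real data. The sketch shows why the plain merge misbehaves (duplicate usernames in df2 multiply the matching rows, and an inner merge drops usernames missing from df2) and how deduplicating plus how='left' keeps exactly one row per row of df1. The validate='many_to_one' argument is an addition that is not in the original comment; it asks pandas to raise an error if a username still appears more than once on the right side.

```python
import pandas as pd

# Stand-in data copied from the head() output in the question
df1 = pd.DataFrame({
    "username": ["lowi", "klark", "sturi", "nukr", "irore"],
    "hour": ["00:00"] * 5,
    "totalCount": [12, 0, 2, 10, 2],
})
df2 = pd.DataFrame({
    "username": ["klark", "irore", "sturi", "sturi", "sturi"],
    "community": [0, 2, 2, 2, 2],
})

# Plain (inner) merge: sturi matches three times, and usernames
# that are missing from df2 disappear from the result
naive = df1.merge(df2, on="username")

# Deduplicate the lookup side, then left-merge so every df1 row survives
lookup = df2.drop_duplicates()
merged = df1.merge(lookup, on="username", how="left", validate="many_to_one")

print(naive)
print(merged)
```

If a username can belong to several different communities, drop_duplicates() will not make the keys unique and the validate check will fail, which is a useful signal that the one-community-per-username assumption does not hold.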

Chris · Accepted Answer · 2019-07-31 05:58:32Z

2

Using pandas.Series.map:

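# Deduplicate df2 and index it by username so it works as a lookup table
df2 = df2.drop_duplicates().set_index('username')
# Look up each username's community; usernames missing from df2 become NaN
df1['community'] = df1['username'].map(df2['community'])
print(df1)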

Output:

  username   hour  totalCount  community
0     lowi  00:00          12        NaN
1    klark  00:00           0        0.0
2    sturi  00:00           2        2.0
3     nukr  00:00          10        NaN
4    irore  00:00           2        2.0

Note that lowi and nukr aren't in the example df2, so their community is NaN. The NaN values are also why the column is shown as float (0.0, 2.0).
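
A slightly more defensive variant of the same map approach is sketched below, again on stand-in frames rebuilt from the head() output in the question. Two things here are additions and not part of the answer: the uniqueness check, which fails early if a username has more than one distinct community, and the conversion to the nullable Int64 dtype, which keeps community values as integers and shows missing usernames as <NA>.

```python
import pandas as pd

# Stand-in data copied from the head() output in the question
df1 = pd.DataFrame({
    "username": ["lowi", "klark", "sturi", "nukr", "irore"],
    "hour": ["00:00"] * 5,
    "totalCount": [12, 0, 2, 10, 2],
})
df2 = pd.DataFrame({
    "username": ["klark", "irore", "sturi", "sturi", "sturi"],
    "community": [0, 2, 2, 2, 2],
})

# Build a username -> community lookup Series
lookup = df2.drop_duplicates().set_index("username")["community"]

# map needs a single value per key, so stop here if a username
# is still listed with more than one community
assert lookup.index.is_unique, "a username maps to more than one community"

# Nullable Int64 keeps 0 and 2 as integers; unmatched usernames become <NA>
df1["community"] = df1["username"].map(lookup).astype("Int64")
print(df1)
```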


3 Comments

Mohamed Thasin ah Over a year ago

May I know why you used map instead of merge? I think merge is more efficient than map.

Chris Over a year ago

@MohamedThasinah Used map since it ran about 1.5x faster than merge in my environment.

anky Over a year ago

Yes, map is faster than merge for such use cases. :) @MohamedThasinah
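
The speed difference mentioned in these comments depends on the environment, the pandas version, and the data, so it is worth measuring locally. The sketch below is one way to do that with timeit. The data is synthetic and all sizes and names (users, with_map, with_merge) are made up for illustration; it does not reproduce the 1.5x figure from the comment.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: sizes are arbitrary and much smaller than in the question
rng = np.random.default_rng(0)
users = np.array([f"user{i}" for i in range(5_000)])

df1 = pd.DataFrame({"username": rng.choice(users, size=100_000)})
df2 = pd.DataFrame({
    "username": users,
    "community": rng.integers(0, 10, size=users.size),
})

# Lookup Series for the map approach (usernames are already unique here)
lookup = df2.set_index("username")["community"]

def with_map():
    return df1["username"].map(lookup)

def with_merge():
    return df1.merge(df2, on="username", how="left")["community"]

# Time several repetitions of each approach
t_map = timeit.timeit(with_map, number=5)
t_merge = timeit.timeit(with_merge, number=5)
print(f"map:   {t_map:.3f} s")
print(f"merge: {t_merge:.3f} s")
```

Both functions return the same community values in the same row order, so whichever is faster on your data can be used interchangeably for this lookup.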
