I have this pandas DataFrame with almost 540000 rows:

df1.head()

    username  hour    totalCount
0   lowi      00:00   12
1   klark     00:00   0
2   sturi     00:00   2
3   nukr      00:00   10
4   irore     00:00   2

I also have another pandas DataFrame with almost 52000 rows, some of which are duplicated:

df2.head()

   username   community
0    klark       0
1    irore       2
2    sturi       2
3    sturi       2
4    sturi       2

I want to add the 'community' column of df2 to df1, placing each value in the row with the matching username. I used this code (df_hu and df_comm here are the df1 and df2 shown above):

df_merge = df_hu.merge(df_comm, on='username')
df_merge

But I get the following DataFrame, which has almost 1205880 rows, many of them duplicated:

    username    hour    totalCount  community
0   lowi        00:00   12          2
1   lowi        00:00   12          2
2   lowi        00:00   12          2
3   lowi        01:00   9           2
4   lowi        01:00   9           2

The expected output would be this:

df_merge.head()

    username  hour    totalCount community
0   lowi      00:00   12         2
1   klark     00:00   0          0
2   sturi     00:00   2          2
3   nukr      00:00   10         1 (not shown in the example)
4   irore     00:00   2          1 (not shown in the example)

Assuming there is only one community per username: df_hu.merge(df_comm.drop_duplicates(), on='username', how='left') (commented Jul 31, 2019 at 6:01)
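
A minimal sketch of the suggestion in the comment above, using the sample rows from the question. The frames are rebuilt by hand here, and the validate="many_to_one" argument is an addition that was not in the comment: it asks pandas to raise an error if any username still appears more than once in the right-hand frame, which is the situation that multiplied the rows in the first place.

```python
import pandas as pd

# Sample data copied from the question (only the head() rows).
df_hu = pd.DataFrame({
    "username": ["lowi", "klark", "sturi", "nukr", "irore"],
    "hour": ["00:00"] * 5,
    "totalCount": [12, 0, 2, 10, 2],
})
df_comm = pd.DataFrame({
    "username": ["klark", "irore", "sturi", "sturi", "sturi"],
    "community": [0, 2, 2, 2, 2],
})

# Drop fully duplicated rows, then left-merge so every row of df_hu is kept.
# validate="many_to_one" (an added safety check) raises MergeError if a
# username maps to more than one community after de-duplication.
df_merge = df_hu.merge(
    df_comm.drop_duplicates(),
    on="username",
    how="left",
    validate="many_to_one",
)
print(df_merge)
```

With how='left' the result has the same number of rows as df_hu, and usernames missing from df_comm get NaN in the community column.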

1 Answer

Using pandas.Series.map:

df2 = df2.drop_duplicates().set_index('username')
df1['community'] = df1['username'].map(df2['community'])
print(df1)

Output:

  username   hour  totalCount  community
0     lowi  00:00          12        NaN
1    klark  00:00           0        0.0
2    sturi  00:00           2        2.0
3     nukr  00:00          10        NaN
4    irore  00:00           2        2.0

Note that lowi and nukr weren't in the example df2, so their community is NaN.
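
For reference, a self-contained sketch of the same map approach on the sample rows from the question. Two details are assumptions added here and are not part of the answer above: drop_duplicates(subset='username') keeps only the first row per username, so the lookup index is unique even if a user were listed with two different communities, and the final astype('Int64') is an optional step that stores the result as nullable integers instead of floats.

```python
import pandas as pd

# Sample data copied from the question (only the head() rows).
df1 = pd.DataFrame({
    "username": ["lowi", "klark", "sturi", "nukr", "irore"],
    "hour": ["00:00"] * 5,
    "totalCount": [12, 0, 2, 10, 2],
})
df2 = pd.DataFrame({
    "username": ["klark", "irore", "sturi", "sturi", "sturi"],
    "community": [0, 2, 2, 2, 2],
})

# Build a username -> community lookup Series with a unique index.
# subset="username" (an assumption) keeps the first community seen per user.
lookup = (
    df2.drop_duplicates(subset="username")
       .set_index("username")["community"]
)

# Map each username in df1 to its community; unknown users become NaN.
result = df1.copy()
result["community"] = result["username"].map(lookup)

# Optional: nullable integer dtype, so missing values show as <NA>
# and known values stay integers rather than floats.
result["community"] = result["community"].astype("Int64")
print(result)
```

The row count of df1 is unchanged, because map only looks values up and never adds rows.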

3 Comments

May I know why you didn't use merge instead of map? I thought merge was more efficient than map.
@MohamedThasinah I used map because it ran about 1.5x faster than merge in my environment.
Yes, map is faster than merge for use cases like this. :) @MohamedThasinah