The two DataFrames I am comparing have different sizes (they are indexed the same way, though), and I suppose that is why I am getting the error. Can you suggest a way around it? I am looking for the rows in df2 whose user_id matches a user_id in df1. Thanks, I appreciate your response.

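import numpy as np
import pandas as pd

# The first row of each array holds the column names; the remaining rows are the data.
# Note: np.array stores these mixed values as strings, so user_id is '100', not 100.
data = np.array([['user_id', 'comment', 'label'],
                 [100, 'RT @Dvillain_: #oomf should text me.', 0],
                 [100, 'Buy viagra', 1],
                 [101, '#nowplaying M.C. Shan - Juice Crew Law on', 0],
                 [101, 'Buy viagra two', 1]])

data2 = np.array([['user_id', 'comment', 'label'],
                  [100, 'First comment', 0],
                  [100, 'Buy viagra', 1],
                  [102, 'Buy viagra two', 1]])

df1 = pd.DataFrame(data=data[1:, 0:], columns=data[0, 0:])
df2 = pd.DataFrame(data=data2[1:, 0:], columns=data2[0, 0:])

# Raises ValueError (Can only compare identically-labeled Series objects):
# df1 has 4 rows and df2 has 3, so the two columns cannot be compared element-wise.
df = df2[df2['user_id'] == df1['user_id']]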

1 Answer

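You are looking for isin, which checks each user_id in df2 for membership in df1['user_id'] instead of comparing the two columns row by row: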

df = df2[df2['user_id'].isin(df1['user_id'])]
df
Out[814]: 
  user_id        comment label
0     100  First comment     0
1     100     Buy viagra     1
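To make the difference between == and isin concrete, here is a minimal, self-contained sketch. It assumes the same rows as the question, but builds the frames from dicts (my choice, not the question's np.array route) so that user_id and label stay integers. The names raised, mask, and matched are only illustrative.

```python
import pandas as pd

# Same rows as in the question; built from dicts (an assumption) so dtypes stay numeric.
df1 = pd.DataFrame({
    "user_id": [100, 100, 101, 101],
    "comment": [
        "RT @Dvillain_: #oomf should text me.",
        "Buy viagra",
        "#nowplaying M.C. Shan - Juice Crew Law on",
        "Buy viagra two",
    ],
    "label": [0, 1, 0, 1],
})
df2 = pd.DataFrame({
    "user_id": [100, 100, 102],
    "comment": ["First comment", "Buy viagra", "Buy viagra two"],
    "label": [0, 1, 1],
})

# == compares two Series position by position and requires identical index labels.
# df1 has 4 rows and df2 has 3, so the comparison fails with a ValueError.
try:
    df2["user_id"] == df1["user_id"]
    raised = False
except ValueError:
    raised = True

# isin asks, for every row of df2, whether its user_id occurs anywhere in df1.
# The result is a boolean Series aligned with df2, so it can be used as a row filter.
mask = df2["user_id"].isin(df1["user_id"])
matched = df2[mask]
print(matched)
```

Because the mask has one entry per row of df2, the sizes of df1 and df2 no longer need to agree.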
3 Comments

Just beat me to it!
@pault Just a little bit quicker. I got lucky :-)
@Chandan yw~ happy coding
