
I have a DataFrame with the columns df['article_id'] and df['user_id']. I also have a NumPy array (or a list; I figured a NumPy array would be faster for this) in which each row holds an article_id and a user_id. I want to compare the DataFrame with the array so I can filter out duplicate entries. A row only counts as a duplicate when both its user_id and its article_id match the same array row. So the idea is:

    if (df['article_id'] == nparray[:, 0]) & (df['user_id'] == nparray[:, 1]):
        remove the row from the dataframe
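The condition in the pseudocode can be made runnable by testing each row's (article_id, user_id) pair against a set of pairs built from the array. This is a minimal sketch, assuming column 0 of the array is article_id and column 1 is user_id (as in the pseudocode) and that both id columns are integers:

```python
import numpy as np
import pandas as pd

# Sample data from the question; column 0 = article_id, column 1 = user_id.
nparray = np.array([[1127087222, 1],
                    [1202623831, 1],
                    [1747352473, 1],
                    [1748645480, 1],
                    [1759957596, 1],
                    [1811054956, 1]])
df = pd.DataFrame({
    'user_id': [1, 1, 1, 1, 1],
    'article_id': [2579244390, 2580336884, 1202623831, 2450784233, 1747352473],
    'date_saved': ['2019-05-09 10:46:23', '2019-05-09 10:46:22',
                   '2019-05-09 10:46:20', '2019-01-11 12:36:44',
                   '2019-01-03 21:38:34'],
})

# Set of (article_id, user_id) pairs for O(1) membership checks.
seen = {(int(a), int(u)) for a, u in nparray}

# A row is a duplicate only if BOTH ids match the same array row.
is_dup = np.array([(int(a), int(u)) in seen
                   for a, u in zip(df['article_id'], df['user_id'])])

result = df[~is_dup]  # keeps the original index (0, 1, 3)
print(result)
```

Because the pair is checked as a unit, a row whose article_id appears in the array under a different user_id is kept.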

Here is what the DataFrame and the array look like (for now there is only one user_id, but there will be more later). If the array contains a row's (article_id, user_id) pair, that row should be deleted from the DataFrame:

array([[1127087222,          1],
       [1202623831,          1],
       [1747352473,          1],
       [1748645480,          1],
       [1759957596,          1],
       [1811054956,          1]])

   user_id  article_id           date_saved
0        1  2579244390  2019-05-09 10:46:23
1        1  2580336884  2019-05-09 10:46:22
2        1  1202623831  2019-05-09 10:46:20
3        1  2450784233  2019-01-11 12:36:44
4        1  1747352473  2019-01-03 21:38:34
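For anyone who wants to try this, the sample above can be rebuilt as a reproducible example. This is a sketch; the dtypes (integer ids, date_saved parsed to datetime) are assumptions, since the question only shows the printed values:

```python
import numpy as np
import pandas as pd

# Array rows are (article_id, user_id), as printed above.
nparray = np.array([[1127087222, 1],
                    [1202623831, 1],
                    [1747352473, 1],
                    [1748645480, 1],
                    [1759957596, 1],
                    [1811054956, 1]], dtype=np.int64)

df = pd.DataFrame({
    'user_id': [1, 1, 1, 1, 1],
    'article_id': [2579244390, 2580336884, 1202623831, 2450784233, 1747352473],
    # Assumption: the timestamps are stored as datetimes, not strings.
    'date_saved': pd.to_datetime(['2019-05-09 10:46:23', '2019-05-09 10:46:22',
                                  '2019-05-09 10:46:20', '2019-01-11 12:36:44',
                                  '2019-01-03 21:38:34']),
})
print(df.dtypes)
```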

Desired output:

   user_id  article_id           date_saved
0        1  2579244390  2019-05-09 10:46:23
1        1  2580336884  2019-05-09 10:46:22
3        1  2450784233  2019-01-11 12:36:44

How can I achieve this?

  • Can you post a small sample of the dataframe, list and desired output? Commented May 10, 2019 at 17:59
  • @Vink I added some code snippets Commented May 11, 2019 at 15:53
  • @SomeName: check my update Commented May 11, 2019 at 19:49

1 Answer


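After your clarification: np.isin checks each value on its own, so it cannot tell whether an article_id and a user_id match as a pair (and on a two-column selection it returns a 2-D mask). Instead, build a MultiIndex from the two id columns, test it against the array's (user_id, article_id) pairs with isin, and negate the 1-D mask with the '~' operator:

pairs = list(zip(nparray[:, 1], nparray[:, 0]))  # (user_id, article_id) tuples
mask = pd.MultiIndex.from_arrays([df['user_id'], df['article_id']]).isin(pairs)
df[~mask]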

Out[17]:
   user_id  article_id           date_saved
0        1  2579244390  2019-05-09 10:46:23
1        1  2580336884  2019-05-09 10:46:22
3        1  2450784233  2019-01-11 12:36:44
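Another common way to do this kind of pairwise anti-join is a left merge with indicator=True, keeping only the rows that exist in df alone. A sketch under the same assumptions (array column 0 = article_id, column 1 = user_id, integer ids); reset_index/set_index is used only to carry the original row labels through the merge:

```python
import numpy as np
import pandas as pd

nparray = np.array([[1127087222, 1], [1202623831, 1], [1747352473, 1],
                    [1748645480, 1], [1759957596, 1], [1811054956, 1]])
df = pd.DataFrame({
    'user_id': [1, 1, 1, 1, 1],
    'article_id': [2579244390, 2580336884, 1202623831, 2450784233, 1747352473],
    'date_saved': ['2019-05-09 10:46:23', '2019-05-09 10:46:22',
                   '2019-05-09 10:46:20', '2019-01-11 12:36:44',
                   '2019-01-03 21:38:34'],
})

# Name the array columns so they line up with df's key columns.
blocked = pd.DataFrame(nparray, columns=['article_id', 'user_id'])
# Deduplicate so a repeated pair cannot multiply rows in the merge.
blocked = blocked.drop_duplicates()

# indicator=True adds a '_merge' column: 'left_only' means no match in blocked.
merged = df.reset_index().merge(blocked, on=['user_id', 'article_id'],
                                how='left', indicator=True)

# Keep unmatched rows, restore the original row labels, drop the helper column.
result = (merged[merged['_merge'] == 'left_only']
          .set_index('index')
          .drop(columns='_merge'))
result.index.name = None
print(result)
```

The merge version is handy when the "blocked" pairs already live in a DataFrame or database table; for a small array, the isin mask is shorter.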

1 Comment

They do not have the same shape. I added a code snippet showing what they look like. The df has 2 more columns.
