I have two dataframes. The first file contains a lot of data, and the second file contains a list of ids that I want to select from the first file.

I use

merged = pd.merge(buys, chunk, left_on='id', right_on='ID')

where chunk is a part of the first (big) file and buys is the file with the list of ids. The output file contains IDs that are not in buys. What am I doing wrong?

buys:

id
7602962fb83ac2e2a0cb44158ca88464
bc8a731e4c7e6f6b96e56ebe7f766bcd
a703114aa8a03495c3e042647212fa63
77138e9245857e5449e9474293e31e19

chunk:

id  date
7602962fb83ac2e2a0cb44158ca88464    01.01.2016
7602962fb83ac2e2a0cb44158ca88464    02.01.2016
7602962fb83ac2e2a0cb44158ca88464    03.01.2016
77138e9245857e5449e9474293e31e19    09.05.2016
77138e9245857e5449e9474293e31e19    10.05.2016
671cfd6702c74f017209c2f1a888c279    10.01.2016
671cfd6702c74f017209c2f1a888c279    11.01.2016
029cfd6702c68f243423c2f1a234c232    11.03.2016

And I need to get

7602962fb83ac2e2a0cb44158ca88464    01.01.2016
7602962fb83ac2e2a0cb44158ca88464    02.01.2016
7602962fb83ac2e2a0cb44158ca88464    03.01.2016
77138e9245857e5449e9474293e31e19    09.05.2016
77138e9245857e5449e9474293e31e19    10.05.2016
  • Can you post some sample data and the desired output? Commented Jul 7, 2016 at 13:32
  • @JoeR, I added the dataframes. Commented Jul 7, 2016 at 14:00
  • Add how='left' to pd.merge. Commented Jul 7, 2016 at 14:03
  • Is it id or ID? I can't reproduce it. Could you prepare a small example and paste the output of print(buys), print(chunk), and pd.merge(buys, chunk, left_on='id', right_on='ID'). Commented Jul 7, 2016 at 14:29

1 Answer

IIUC, you want to merge your two dataframes and keep only the ids present in buys? Then you can pass the how option to merge like this:

merged = pd.merge(buys, chunk, left_on='id', right_on='ID', how = 'left')

Note that if there are ids in buys that are not in chunk.ID, you will get NaN where the corresponding dates are missing. If you don't want that, change the how option to inner:

merged = pd.merge(buys, chunk, left_on='id', right_on='ID', how = 'inner')

This way you will only get the rows that are present in both dataframes.
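To make the difference concrete, here is a minimal sketch built from the sample data in the question. It assumes the key column is named id in both dataframes, as in the posted sample (the merge call in the question uses right_on='ID', so adjust the column names to match your real files). The last line shows an alternative that skips the merge and filters chunk with isin; that variant is my own suggestion, not something from the question.

```python
import pandas as pd

# Sample data copied from the question; the column name "id" in chunk is an assumption.
buys = pd.DataFrame({"id": [
    "7602962fb83ac2e2a0cb44158ca88464",
    "bc8a731e4c7e6f6b96e56ebe7f766bcd",
    "a703114aa8a03495c3e042647212fa63",
    "77138e9245857e5449e9474293e31e19",
]})

chunk = pd.DataFrame({
    "id": [
        "7602962fb83ac2e2a0cb44158ca88464",
        "7602962fb83ac2e2a0cb44158ca88464",
        "7602962fb83ac2e2a0cb44158ca88464",
        "77138e9245857e5449e9474293e31e19",
        "77138e9245857e5449e9474293e31e19",
        "671cfd6702c74f017209c2f1a888c279",
        "671cfd6702c74f017209c2f1a888c279",
        "029cfd6702c68f243423c2f1a234c232",
    ],
    "date": [
        "01.01.2016", "02.01.2016", "03.01.2016",
        "09.05.2016", "10.05.2016",
        "10.01.2016", "11.01.2016", "11.03.2016",
    ],
})

# how='left': every id in buys is kept; ids missing from chunk get NaN in date.
left = pd.merge(buys, chunk, on="id", how="left")

# how='inner': only ids that appear in both dataframes are kept.
inner = pd.merge(buys, chunk, on="id", how="inner")

# Alternative without a merge: keep the rows of chunk whose id is listed in buys.
filtered = chunk[chunk["id"].isin(buys["id"])]

print(inner)
```

With this sample, inner and filtered both contain the five rows listed as the desired output, while left has two extra rows with NaN dates for the ids that do not occur in chunk.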
