I have a dataframe like this:

df1:

     col1  col2        data1       data2    data3
0     A     A_1         2            4        5
1     A     A_2         11           58       87
2     A     A_3         14           24       54
3     B     B_1         3            6        9
4     B     B_2         1            38       77
5     B     B_3         54           13       10

and I also have a dataframe like this one:

df2:

     col1  col2        sample1    sample2  sample3
0     A     A_0         98          57       102
2     A     A_1         6           13       5
2     A     A_2         13          52       17
3     A     A_3         8           29       50
4     B     B_0         60          75       98
5     B     B_1         3           6        9
6     B     B_2         1           8        77
7     B     B_3         2           1        10

So, how can I combine these dataframes based on col1 and col2 to create a dataframe like this one?

     col1  col2        sample1    sample2  sample3     data1   data2   data3
0     A     A_0         98          57       102        NaN     NaN     NaN
2     A     A_1         6           13       5          2       4       5
2     A     A_2         13          52       17         11      58      87
3     A     A_3         8           29       50         14      24      54
4     B     B_0         60          75       98         NaN     NaN     NaN
5     B     B_1         3           6        9          3       6       9
6     B     B_2         1           8        77         1       38      77
7     B     B_3         2           1        10         54      13      10

2 Answers

1

Use pandas.merge. The on argument specifies which column(s) to merge the dataframes on, and the how argument specifies the type of merge (inner, outer, left, or right). Check the documentation to confirm which type you need, but in this case I think you want an outer merge, which keeps rows whose key appears in either dataframe.

print(pd.merge(df1, df2, on='col2',how='outer'))
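
To see what the how argument changes, here is a minimal sketch. The frames `left` and `right` are abbreviated stand-ins for the question's data (illustrative names and a subset of the values, not the full frames), and `indicator=True` is an extra I added to show where each row came from.

```python
import pandas as pd

# Abbreviated stand-ins for the question's frames (illustrative subset only).
left = pd.DataFrame({"col2": ["A_1", "A_2"], "data1": [2, 11]})
right = pd.DataFrame({"col2": ["A_0", "A_1", "A_2"], "sample1": [98, 6, 13]})

# 'inner' keeps only keys present in both frames, so A_0 is dropped.
inner = pd.merge(left, right, on="col2", how="inner")

# 'outer' keeps keys from either frame; A_0 gets NaN for data1.
# indicator=True adds a '_merge' column recording the source of each row.
outer = pd.merge(left, right, on="col2", how="outer", indicator=True)

print(inner)
print(outer)
```

If rows such as A_0 go missing from your result, the merge type is the first thing to check.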

4 Comments

Hi @kinshukdua, when I try that it is removing index 0 and 4 where I have that A_0 and B_0, and I also want index 0 and 4 rows in the output dataframe.
@user14073111 I updated the answer to make it work.
Hey @kinshukdua, I just saw that when I use that line, I get two columns, col1_x and col1_y. How can I have only one col1 in the final dataframe?
@user14073111 Assuming they both hold the same values, you can delete one of the columns and rename the other. I'll let you know if I find a better solution.
0

If you want to match on the values of both 'col1' and 'col2', pass both column names to on. Because col1 is then a merge key, it appears once in the result instead of as col1_x and col1_y.

data = pd.merge(df1, df2, on=['col1', 'col2'], how='outer')
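
For reference, here is a self-contained sketch that rebuilds both frames from the question and merges them on both keys. Passing df2 first is my own choice, made only so the sample columns come before the data columns as in the desired output. The name `merged` is illustrative, and the frames use a default index rather than the index shown in the question.

```python
import pandas as pd

# Frames rebuilt from the question's tables (default integer index).
df1 = pd.DataFrame({
    "col1": ["A", "A", "A", "B", "B", "B"],
    "col2": ["A_1", "A_2", "A_3", "B_1", "B_2", "B_3"],
    "data1": [2, 11, 14, 3, 1, 54],
    "data2": [4, 58, 24, 6, 38, 13],
    "data3": [5, 87, 54, 9, 77, 10],
})
df2 = pd.DataFrame({
    "col1": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "col2": ["A_0", "A_1", "A_2", "A_3", "B_0", "B_1", "B_2", "B_3"],
    "sample1": [98, 6, 13, 8, 60, 3, 1, 2],
    "sample2": [57, 13, 52, 29, 75, 6, 8, 1],
    "sample3": [102, 5, 17, 50, 98, 9, 77, 10],
})

# Merge on both key columns; df2 goes first so its columns lead the result.
# Rows present only in df2 (A_0, B_0) get NaN in the data columns.
merged = pd.merge(df2, df1, on=["col1", "col2"], how="outer")
print(merged)
```

Since every key in df1 also appears in df2 here, how='left' with df2 on the left should give the same rows.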
