
I have four data frames, as below:

import pandas as pd

df = pd.DataFrame({'_id': [1, 2, 3, 4], 'name': ['Charan', 'Kumar', 'Nikhil', 'Kumar']})
df1 = pd.DataFrame({'_id': [1, 3, 4], 'count_of_apple': [5, 3, 1]})
df2 = pd.DataFrame({'_id': [1, 2, 3], 'count_of_organge': [8, 4, 6]})
df3 = pd.DataFrame({'_id': [2, 3, 4], 'count_of_lime': [7, 9, 2]})

I want to merge all of these data frames into a single data frame called final.

I have tried pd.merge, but the problem is that I have to call it three separate times. Is there a simpler way to do it?

This is the code I used to get the result:

final = pd.merge(df, df1, on='_id', how='left')
final = pd.merge(final, df2, on='_id', how='left')
final = pd.merge(final, df3, on='_id', how='left')
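If you want to keep pd.merge but avoid writing one call per frame, a minimal sketch is to fold the list of frames with functools.reduce, which produces the same chain of left merges as the three lines above. This is a swapped-in technique, not the concat approach from the answers, and it assumes _id is the join key in every frame.

```python
from functools import reduce

import pandas as pd

df = pd.DataFrame({'_id': [1, 2, 3, 4], 'name': ['Charan', 'Kumar', 'Nikhil', 'Kumar']})
df1 = pd.DataFrame({'_id': [1, 3, 4], 'count_of_apple': [5, 3, 1]})
df2 = pd.DataFrame({'_id': [1, 2, 3], 'count_of_organge': [8, 4, 6]})
df3 = pd.DataFrame({'_id': [2, 3, 4], 'count_of_lime': [7, 9, 2]})

# Fold left to right: merge(merge(merge(df, df1), df2), df3), each a left join on _id
final = reduce(lambda left, right: pd.merge(left, right, on='_id', how='left'),
               [df, df1, df2, df3])
print(final)
```

Adding a fifth frame then only means appending it to the list.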

I want the final result to look something like this:

final.head()

_id | name   | count of apple | count of orange | count of lime
1   | Charan | 5              | 8               | NaN
2   | Kumar  | NaN            | 4               | 7
3   | Nikhil | 3              | 6               | 9
4   | Kumar  | 1              | NaN             | 2

2 Answers


You can use concat, but first you need to make _id the index of each DataFrame with DataFrame.set_index:

dfs = [df, df1, df2, df3]

df = pd.concat([x.set_index('_id') for x in dfs], axis=1).reset_index()
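Here is the one-liner as a self-contained sketch using the question's data. It stores the result in final instead of overwriting df; that variable name is just a choice for illustration. Because every frame is indexed by _id, concat(axis=1) lines rows up by _id, and any _id missing from a frame becomes NaN in that frame's column.

```python
import pandas as pd

df = pd.DataFrame({'_id': [1, 2, 3, 4], 'name': ['Charan', 'Kumar', 'Nikhil', 'Kumar']})
df1 = pd.DataFrame({'_id': [1, 3, 4], 'count_of_apple': [5, 3, 1]})
df2 = pd.DataFrame({'_id': [1, 2, 3], 'count_of_organge': [8, 4, 6]})
df3 = pd.DataFrame({'_id': [2, 3, 4], 'count_of_lime': [7, 9, 2]})

dfs = [df, df1, df2, df3]
# Move _id into the index so rows are aligned by _id, not by position,
# then turn the index back into a regular _id column
final = pd.concat([x.set_index('_id') for x in dfs], axis=1).reset_index()
print(final)
```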

This is the same as:

df = df.set_index('_id')
df1 = df1.set_index('_id')
df2 = df2.set_index('_id')
df3 = df3.set_index('_id')

df = pd.concat([df, df1, df2, df3], axis=1).reset_index()

print(df)
   _id    name  count_of_apple  count_of_organge  count_of_lime
0    1  Charan             5.0               8.0            NaN
1    2   Kumar             NaN               4.0            7.0
2    3  Nikhil             3.0               6.0            9.0
3    4   Kumar             1.0               NaN            2.0
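To check that this gives the same frame as the question's three pd.merge calls, a small sketch compares the two with pandas.testing.assert_frame_equal. They match here because df already contains every _id. In general concat(axis=1) is an outer join on the index, while the question's merges are left joins, so an _id that is missing from df would appear only in the concat result.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

df = pd.DataFrame({'_id': [1, 2, 3, 4], 'name': ['Charan', 'Kumar', 'Nikhil', 'Kumar']})
df1 = pd.DataFrame({'_id': [1, 3, 4], 'count_of_apple': [5, 3, 1]})
df2 = pd.DataFrame({'_id': [1, 2, 3], 'count_of_organge': [8, 4, 6]})
df3 = pd.DataFrame({'_id': [2, 3, 4], 'count_of_lime': [7, 9, 2]})

# The question's approach: three chained left merges on _id
merged = pd.merge(df, df1, on='_id', how='left')
merged = pd.merge(merged, df2, on='_id', how='left')
merged = pd.merge(merged, df3, on='_id', how='left')

# The answer's approach: index every frame by _id, then align column-wise
concatenated = pd.concat([x.set_index('_id') for x in [df, df1, df2, df3]],
                         axis=1).reset_index()

# Raises AssertionError if columns, values, dtypes or row order differ
assert_frame_equal(merged, concatenated)
```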

From the documentation (https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html):

In [1]: df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
   ...:                     'B': ['B0', 'B1', 'B2', 'B3'],
   ...:                     'C': ['C0', 'C1', 'C2', 'C3'],
   ...:                     'D': ['D0', 'D1', 'D2', 'D3']},
   ...:                    index=[0, 1, 2, 3])
   ...:

In [8]: df4 = pd.DataFrame({'B': ['B2', 'B3', 'B6', 'B7'],
   ...:                     'D': ['D2', 'D3', 'D6', 'D7'],
   ...:                     'F': ['F2', 'F3', 'F6', 'F7']},
   ...:                    index=[2, 3, 6, 7])
   ...: 

In [9]: result = pd.concat([df1, df4], axis=1, sort=False)

Output: (shown as an image in the original answer)
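Since the output image is not reproduced, here is the same documentation example as a runnable sketch. It shows that concat(axis=1) matches rows by index label and, with the default join='outer', keeps the union of both indexes (0, 1, 2, 3, 6, 7), filling missing cells with NaN. Note that same-named columns such as B and D are not combined; both copies are kept side by side.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 1, 2, 3])
df4 = pd.DataFrame({'B': ['B2', 'B3', 'B6', 'B7'],
                    'D': ['D2', 'D3', 'D6', 'D7'],
                    'F': ['F2', 'F3', 'F6', 'F7']},
                   index=[2, 3, 6, 7])

# Rows are matched by index label; labels present in only one frame get NaN elsewhere
result = pd.concat([df1, df4], axis=1, sort=False)
print(result)
```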

1 Comment

You need to provide an index column: set the index of every data frame with something like df.set_index('_id'), and then it will work.
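A short sketch of why the index matters, using two of the question's frames: without set_index, concat(axis=1) matches rows by their default positional index (0, 1, 2, ...), so Kumar (_id 2) wrongly receives the apple count that belongs to _id 3, and _id appears twice as a column. After set_index('_id') the rows are matched by _id instead.

```python
import pandas as pd

df = pd.DataFrame({'_id': [1, 2, 3, 4], 'name': ['Charan', 'Kumar', 'Nikhil', 'Kumar']})
df1 = pd.DataFrame({'_id': [1, 3, 4], 'count_of_apple': [5, 3, 1]})

# Without set_index: rows are paired by position, not by _id
wrong = pd.concat([df, df1], axis=1)

# With set_index: rows are paired by _id; _id 2 has no apple count, so it gets NaN
right = pd.concat([df.set_index('_id'), df1.set_index('_id')], axis=1).reset_index()

print(wrong)
print(right)
```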
