Pandas Python: Concatenate dataframes having same columns

Question

I have 3 dataframes having the same column names as each other. Say :

df1
column1   column2   column3
a         b         c
d         e         f


df2
column1   column2   column3
g         h         i
j         k         l


df3
column1   column2   column3
m         n         o
p         q         r

Each dataframe has different values but the same columns. I tried append and concat, as well as an outer merge, but got errors. Here's what I tried:

df_final = df1.append(df2, sort=True, ignore_index=True).append(df3, sort=True, ignore_index=True)

I also tried: df_final = pd.concat([df1, df2, df3], axis=1)

But I get this error: AssertionError: Number of manager items must equal union of block items # manager items: 61, # tot_items: 62

I've googled the error but I can't seem to understand why it's happening in my case. Any guidance is much appreciated!

I think the problem is duplicated column names in one of the DataFrames.
– jezrael, Commented Sep 6, 2018 at 12:31
Your code and sample dataframes work fine, but they give a different output.
– Space Impact, Commented Sep 6, 2018 at 12:33

jezrael · Accepted Answer · 2018-09-06 12:44:58Z

11

I think the problem is duplicated column names in some or all of the DataFrames.

# simulate the error by assigning duplicated column names
df1.columns = ['column3','column1','column1']
df2.columns = ['column5','column1','column1']
df3.columns = ['column2','column1','column1']

df_final = pd.concat([df1, df2, df3])

AssertionError: Number of manager items must equal union of block items # manager items: 4, # tot_items: 5

You can find the duplicated column names:

print (df3.columns[df3.columns.duplicated(keep=False)])
Index(['column1', 'column1'], dtype='object')
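With many dataframes (or many columns, as in the question), it can help to run the same check over every frame at once. The following is a minimal sketch; the helper name `duplicated_columns` and the `frames` dictionary are assumptions made for illustration and are not part of the original answer.

```python
import pandas as pd

def duplicated_columns(df):
    """Return the column labels that occur more than once in df."""
    cols = df.columns
    # keep=False marks every occurrence of a repeated label
    return cols[cols.duplicated(keep=False)].unique().tolist()

# Hypothetical frames: df3 carries a duplicated label, the others do not
frames = {
    "df1": pd.DataFrame([["a", "b", "c"]], columns=["column1", "column2", "column3"]),
    "df2": pd.DataFrame([["g", "h", "i"]], columns=["column1", "column2", "column3"]),
    "df3": pd.DataFrame([["m", "n", "o"]], columns=["column2", "column1", "column1"]),
}

report = {name: duplicated_columns(df) for name, df in frames.items()}
print(report)
```

Any frame whose entry in `report` is a non-empty list needs its column names fixed before concatenating.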

One possible solution is to set the column names from a list:

df3.columns = ['column1','column2','column3']
print (df3)
  column1 column2 column3
0       m       n       o
1       p       q       r

Or remove the columns with duplicated names, keeping only the first occurrence of each:

df31 = df3.loc[:, ~df3.columns.duplicated()]
print (df31)
  column2 column1
0       m       n
1       p       q
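Dropping the duplicates discards the data in the later columns. If that data matters, one alternative is to rename repeated labels instead. This is only a sketch: `dedupe_columns` is a hypothetical helper, not a pandas function, and the `_1`, `_2` suffix scheme is an assumption that could still collide with an existing label.

```python
import pandas as pd

def dedupe_columns(df):
    """Return a copy of df in which repeated labels get a numeric suffix."""
    seen = {}
    new_cols = []
    for col in df.columns:
        n = seen.get(col, 0)
        # first occurrence keeps its name, later ones become col_1, col_2, ...
        new_cols.append(col if n == 0 else f"{col}_{n}")
        seen[col] = n + 1
    out = df.copy()
    out.columns = new_cols
    return out

df3 = pd.DataFrame([["m", "n", "o"], ["p", "q", "r"]],
                   columns=["column2", "column1", "column1"])
fixed = dedupe_columns(df3)
print(fixed.columns.tolist())
```

All three columns survive, and the renamed one can then be given its intended name by hand.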

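After that, concat or append should work fine. (DataFrame.append was removed in pandas 2.0, so pd.concat is the option that works on current versions.)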
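Putting the fix together, a minimal end-to-end sketch might look like this. The sample values come from the question; the starting state of `df3` (with a duplicated label) mirrors the simulated error and is an assumption about what the real data looked like.

```python
import pandas as pd

cols = ["column1", "column2", "column3"]
df1 = pd.DataFrame([["a", "b", "c"], ["d", "e", "f"]], columns=cols)
df2 = pd.DataFrame([["g", "h", "i"], ["j", "k", "l"]], columns=cols)
# df3 starts out with a duplicated label, as in the simulated error
df3 = pd.DataFrame([["m", "n", "o"], ["p", "q", "r"]],
                   columns=["column2", "column1", "column1"])

# Repair the labels, then stack the frames row-wise
df3.columns = cols
df_final = pd.concat([df1, df2, df3], ignore_index=True)
print(df_final)
```

Passing `ignore_index=True` gives the result a fresh 0 to 5 index instead of repeating each frame's original 0 and 1.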

edited Sep 6, 2018 at 12:44

answered Sep 6, 2018 at 12:37

jezrael


1 Comment

GeoSal Over a year ago

That was it actually. Thanks a lot!! I have so many columns, and I renamed some of them for better readability. Because of a copy/paste I forgot to rename one column, so I ended up with two different columns with the same name.

0-_-0 · Answer · 2020-10-05 23:22:29Z

2

Given:

df1
column1   column2   column3
a         b         c
d         e         f


df2
column1   column2   column3
g         h         i
j         k         l

You can specify a suffix when using the df.join() method.

df1.join(df2, lsuffix="_first", rsuffix="_second")

This results in a single dataframe with the two frames placed side by side (joined on the index):

column1_first   column2_first   column3_first   column1_second   column2_second   column3_second
a               b               c               g                h                i
d               e               f               j                k                l
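For reference, here is a runnable sketch of the join, using the sample values from the question. It also prints the shape next to that of a row-wise `pd.concat`, because the two operations produce different layouts: `join` widens the frame, while `concat` with the default axis lengthens it.

```python
import pandas as pd

cols = ["column1", "column2", "column3"]
df1 = pd.DataFrame([["a", "b", "c"], ["d", "e", "f"]], columns=cols)
df2 = pd.DataFrame([["g", "h", "i"], ["j", "k", "l"]], columns=cols)

# join aligns rows on the index; the suffixes disambiguate overlapping names
joined = df1.join(df2, lsuffix="_first", rsuffix="_second")
print(joined.columns.tolist())
print(joined.shape)

# Row-wise stacking keeps three columns and doubles the row count instead
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked.shape)
```

If the goal is one long table with the original three columns, as the question suggests, the stacked form is probably the one to use.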

answered Oct 5, 2020 at 23:22

0-_-0


Hassoo · Answer · 2024-10-04 15:07:17Z

2

Try it without passing axis. For example:

import pandas as pd

# same sample values as in the question
mydict1 = {'column1': ['a', 'd'],
           'column2': ['b', 'e'],
           'column3': ['c', 'f']}
mydict2 = {'column1': ['g', 'j'],
           'column2': ['h', 'k'],
           'column3': ['i', 'l']}
mydict3 = {'column1': ['m', 'p'],
           'column2': ['n', 'q'],
           'column3': ['o', 'r']}
df1 = pd.DataFrame(mydict1)
df2 = pd.DataFrame(mydict2)
df3 = pd.DataFrame(mydict3)

# default axis=0 stacks the rows; ignore_index=False keeps each frame's own index
pd.concat([df1, df2, df3], ignore_index=False)

Output

  column1 column2 column3
0       a       b       c
1       d       e       f
0       g       h       i
1       j       k       l
0       m       n       o
1       p       q       r

edited Oct 4, 2024 at 15:07

Hassoo


answered Sep 6, 2018 at 12:37

mad_


1 Comment

Hassoo Over a year ago

ignore_index should be set to False to achieve the output you described
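To make the effect of `ignore_index` concrete, the sketch below builds the same three frames and compares the resulting index under both settings. The variable names `kept` and `renumbered` are chosen here for illustration.

```python
import pandas as pd

cols = ["column1", "column2", "column3"]
df1 = pd.DataFrame([["a", "b", "c"], ["d", "e", "f"]], columns=cols)
df2 = pd.DataFrame([["g", "h", "i"], ["j", "k", "l"]], columns=cols)
df3 = pd.DataFrame([["m", "n", "o"], ["p", "q", "r"]], columns=cols)

# ignore_index=False (the default) keeps each frame's own row labels
kept = pd.concat([df1, df2, df3], ignore_index=False)
print(kept.index.tolist())

# ignore_index=True discards them and numbers the rows consecutively
renumbered = pd.concat([df1, df2, df3], ignore_index=True)
print(renumbered.index.tolist())

# With repeated labels, a label lookup returns several rows
print(len(kept.loc[0]))
```

Keeping the original labels reproduces the output shown in the answer, but label-based lookups such as `kept.loc[0]` then match one row per source frame.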

Chandu · Answer · 2018-09-06 12:49:16Z

0

You can remove axis=1 from your code. With axis=1, concat places the frames side by side; the default (axis=0) stacks them row-wise:

import pandas as pd
a = {"column1":['a','d'],
     "column2":['b','e'],
     "column3":['c','f']}
b = {"column1":['g','j'],
     "column2":['h','k'],
     "column3":['i','l']}

c = {"column1":['m','p'],
     "column2":['n','q'],
     "column3":['o','r']}


df1 = pd.DataFrame(a)
df2 = pd.DataFrame(b)
df3 = pd.DataFrame(c)

# default axis=0 stacks rows; chain .reset_index(drop=True) for a fresh index
df_final = pd.concat([df1, df2, df3])
print(df_final)

# output
    column1 column2 column3
0       a       b       c
1       d       e       f
0       g       h       i
1       j       k       l
0       m       n       o
1       p       q       r
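The difference between the two axes is easiest to see from the shapes. The following sketch, which assumes the same three sample frames, concatenates them both ways; the names `tall` and `wide` are illustrative.

```python
import pandas as pd

cols = ["column1", "column2", "column3"]
df1 = pd.DataFrame([["a", "b", "c"], ["d", "e", "f"]], columns=cols)
df2 = pd.DataFrame([["g", "h", "i"], ["j", "k", "l"]], columns=cols)
df3 = pd.DataFrame([["m", "n", "o"], ["p", "q", "r"]], columns=cols)

# axis=0 (default): rows are appended, columns are matched by name
tall = pd.concat([df1, df2, df3]).reset_index(drop=True)
print(tall.shape)

# axis=1: columns are appended, rows are matched by index label
wide = pd.concat([df1, df2, df3], axis=1)
print(wide.shape)
print(wide.columns.tolist())
```

Note that the `axis=1` result repeats every column name three times, which is itself a source of duplicated labels in later operations.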

edited Sep 6, 2018 at 12:49

answered Sep 6, 2018 at 12:37

Chandu
