9

I have 3 dataframes that all share the same column names. For example:

df1
column1   column2   column3
a         b         c
d         e         f


df2
column1   column2   column3
g         h         i
j         k         l


df3
column1   column2   column3
m         n         o
p         q         r

Each dataframe has different values but the same columns. I tried append and concat, as well as an outer merge, but each attempt raised an error. Here's what I tried:

df_final = df1.append(df2, sort=True, ignore_index=True).append(df3, sort=True, ignore_index=True)

I also tried: df_final = pd.concat([df1, df2, df3], axis=1)

But I get this error: AssertionError: Number of manager items must equal union of block items # manager items: 61, # tot_items: 62

I've googled the error but I can't seem to understand why it's happening in my case. Any guidance is much appreciated!

2 Comments

  • I think the problem is duplicated column names in one of the DataFrames. Commented Sep 6, 2018 at 12:31
  • Your code and sample dataframes work fine, but they give a different output. Commented Sep 6, 2018 at 12:33

4 Answers

11

I think the problem is duplicated column names in some or all of the DataFrames.

#simulate error
df1.columns = ['column3','column1','column1']
df2.columns = ['column5','column1','column1']
df3.columns = ['column2','column1','column1']

df_final = pd.concat([df1, df2, df3])

AssertionError: Number of manager items must equal union of block items # manager items: 4, # tot_items: 5

You can find the duplicated column names:

print (df3.columns[df3.columns.duplicated(keep=False)])
Index(['column1', 'column1'], dtype='object')

One possible solution is to set the column names from a list:

df3.columns = ['column1','column2','column3']
print (df3)
  column1 column2 column3
0       m       n       o
1       p       q       r

Or drop the columns whose names are duplicates, keeping the first occurrence of each name:

df31 = df3.loc[:, ~df3.columns.duplicated()]
print (df31)
  column2 column1
0       m       n
1       p       q

Then concat or append should work fine.
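
The check and the fix described above can be put together in one short script. This is a minimal sketch, not the asker's real data: the helper name duplicated_columns is hypothetical, and df3 is built with a repeated 'column1' label to imitate the copy/paste mistake. It uses pd.concat because DataFrame.append was removed in pandas 2.0.

```python
import pandas as pd


def duplicated_columns(df):
    """Return the column labels that occur more than once in df (hypothetical helper)."""
    return df.columns[df.columns.duplicated()].unique().tolist()


cols = ["column1", "column2", "column3"]
df1 = pd.DataFrame([["a", "b", "c"], ["d", "e", "f"]], columns=cols)
df2 = pd.DataFrame([["g", "h", "i"], ["j", "k", "l"]], columns=cols)
# Assumed for illustration: df3 has 'column1' twice, as after a bad rename.
df3 = pd.DataFrame([["m", "n", "o"], ["p", "q", "r"]],
                   columns=["column2", "column1", "column1"])

# Report the duplicated labels per frame before attempting to concatenate.
frames = {"df1": df1, "df2": df2, "df3": df3}
report = {name: duplicated_columns(df) for name, df in frames.items()}
print(report)

# Repair the labels, then stack the frames row-wise with a fresh index.
df3.columns = cols
df_final = pd.concat([df1, df2, df3], ignore_index=True)
print(df_final)
```

Running a check like this over every frame first points to the one that needs renaming, which helps when there are too many columns to inspect by eye.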

1 Comment

That was it actually. Thanks a lot!! I have a lot of columns and renamed some of them for readability. Because of a copy/paste I forgot to rename one column, so I ended up with two different columns with the same name.
2

Given:

df1
column1   column2   column3
a         b         c
d         e         f


df2
column1   column2   column3
g         h         i
j         k         l

You can specify a suffix when using the df.join() method.

df1.join(df2, lsuffix="_first", rsuffix="_second")

This results in a single data frame with the two sets of columns side by side:

df1
column1_first   column2_first   column3_first   column1_second   column2_second   column3_second
a               b               c               g                h                i
d               e               f               j                k                l
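
As a runnable sketch of the join above (the frames are rebuilt here from the sample values in the question): join aligns the two frames on their index and places them side by side, so the result gets wider, not longer. If the goal is to stack the rows, pd.concat is the better fit.

```python
import pandas as pd

df1 = pd.DataFrame({"column1": ["a", "d"],
                    "column2": ["b", "e"],
                    "column3": ["c", "f"]})
df2 = pd.DataFrame({"column1": ["g", "j"],
                    "column2": ["h", "k"],
                    "column3": ["i", "l"]})

# Overlapping column names get the suffixes; rows are matched by index.
joined = df1.join(df2, lsuffix="_first", rsuffix="_second")
print(joined.columns.tolist())
print(joined.shape)  # two rows, six columns
```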

2

Try it without providing axis. For example:

import pandas as pd
mydict1 = {'column1' : ['a','d'],
          'column2' : ['b','e'],
          'column3' : ['c','f']}
mydict2 = {'column1' : ['g','j'],
          'column2' : ['h','k'],
          'column3' : ['i','l']}
mydict3= {"column1":['m','p'],
          "column2":['n','q'],
          "column3":['o','r']}
df1=pd.DataFrame(mydict1)
df2=pd.DataFrame(mydict2)
df3=pd.DataFrame(mydict3)

pd.concat([df1,df2,df3],ignore_index=False)

Output

     column1    column2    column3
0      a           b         c
1      d           e         f
0      g           h         i
1      j           k         l
0      m           n         o
1      p           q         r

1 Comment

ignore_index should be set to false to achieve the output you described
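
To make the effect of ignore_index concrete, here is a small sketch using the sample values from the question. With the default ignore_index=False each frame keeps its own row labels, so 0 and 1 repeat. With ignore_index=True the result is renumbered from 0.

```python
import pandas as pd

cols = ["column1", "column2", "column3"]
df1 = pd.DataFrame([["a", "b", "c"], ["d", "e", "f"]], columns=cols)
df2 = pd.DataFrame([["g", "h", "i"], ["j", "k", "l"]], columns=cols)
df3 = pd.DataFrame([["m", "n", "o"], ["p", "q", "r"]], columns=cols)

# Default: the original row labels are kept, so they repeat.
kept = pd.concat([df1, df2, df3], ignore_index=False)
print(kept.index.tolist())

# ignore_index=True: the original labels are discarded and rows are renumbered.
renumbered = pd.concat([df1, df2, df3], ignore_index=True)
print(renumbered.index.tolist())
```

Which one is preferable depends on whether the original row labels carry meaning. Repeated labels make kept.loc[0] return three rows.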
0

You can remove axis=1 from your code. By default, concat uses axis=0 and stacks the frames row-wise:

import pandas as pd
a = {"column1":['a','d'],
     "column2":['b','e'],
     "column3":['c','f']}
b = {"column1":['g','j'],
     "column2":['h','k'],
     "column3":['i','l']}

c = {"column1":['m','p'],
      "column2":['n','q'],
      "column3":['o','r']}


df1 = pd.DataFrame(a)
df2 = pd.DataFrame(b)
df3 = pd.DataFrame(c)

df_final = pd.concat([df1, df2, df3]) #.reset_index()
print(df_final)

#output
    column1 column2 column3
0       a       b       c
1       d       e       f
0       g       h       i
1       j       k       l
0       m       n       o
1       p       q       r
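
A short sketch of what the axis argument changes, again with the sample values from the question. It also shows the commented-out reset_index: called without arguments it keeps the old row labels as a new column named index, while drop=True discards them.

```python
import pandas as pd

cols = ["column1", "column2", "column3"]
df1 = pd.DataFrame([["a", "b", "c"], ["d", "e", "f"]], columns=cols)
df2 = pd.DataFrame([["g", "h", "i"], ["j", "k", "l"]], columns=cols)
df3 = pd.DataFrame([["m", "n", "o"], ["p", "q", "r"]], columns=cols)

# axis=0 (the default) appends rows; axis=1 places the frames side by side.
stacked = pd.concat([df1, df2, df3])
side_by_side = pd.concat([df1, df2, df3], axis=1)
print(stacked.shape, side_by_side.shape)

# reset_index() keeps the old labels in an 'index' column; drop=True discards them.
with_old_labels = stacked.reset_index()
clean = stacked.reset_index(drop=True)
print(with_old_labels.columns.tolist())
print(clean.index.tolist())
```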
