
I want to append df1, df2 and df3 into one DataFrame, df_All, but each of the DataFrames has different columns. How can I do this in a for loop? (I have other things that I have to do in the for loop.)

import pandas as pd
import numpy as np

# DataFrame.from_items was removed in pandas 1.0; a dict keeps the column order (Python 3.7+)
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'B': [5, 6, 7], 'A': [8, 9, 10]})
df3 = pd.DataFrame({'C': [5, 6, 7], 'D': [8, 9, 10], 'A': [1, 2, 3], 'B': [4, 5, 7]})
list = ['df1','df2','df3']
df_All = pd.DataFrame()
for i in list:
    # doing something else as well ---
    df_All = df_All.append(i)


I want df_All to contain only columns A and B. Is there a way to do this in the loop above, something like appending only these two columns?

2 Answers


If I understand what you want, you need to select just columns 'A' and 'B' from df3 and then use pd.concat:

In [35]:

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'B': [5, 6, 7], 'A': [8, 9, 10]})
df3 = pd.DataFrame({'C': [5, 6, 7], 'D': [8, 9, 10], 'A': [1, 2, 3], 'B': [4, 5, 7]})
df_list = [df1,df2,df3[['A','B']]]
pd.concat(df_list, ignore_index=True)
Out[35]:
    A  B
0   1  4
1   2  5
2   3  6
3   8  5
4   9  6
5  10  7
6   1  4
7   2  5
8   3  7

Note that in your original code this is poor practice:

list = ['df1','df2','df3']

This shadows the built-in type list. Moreover, even with a non-conflicting name such as df_list, you've created a list of strings, not a list of DataFrames.
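On doing this inside a for loop: a minimal sketch, assuming the frames are already held in a Python list of any length (the names pieces and df_All are illustrative). It loops over the DataFrame objects themselves, leaves room for the other per-frame work, keeps only columns A and B, and concatenates once after the loop. Growing df_All with df_All.append on every pass copies the accumulated data each time, and DataFrame.append was removed in pandas 2.0, so collecting pieces in a list is the safer pattern.

```python
import pandas as pd

# Stand-ins for the question's frames; the list may hold any number of them
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'B': [5, 6, 7], 'A': [8, 9, 10]})
df3 = pd.DataFrame({'C': [5, 6, 7], 'D': [8, 9, 10], 'A': [1, 2, 3], 'B': [4, 5, 7]})

df_list = [df1, df2, df3]  # the DataFrame objects themselves, not their names

pieces = []  # hypothetical accumulator for the trimmed frames
for df in df_list:
    # ... do the other per-frame work here ...
    pieces.append(df[['A', 'B']])  # keep only columns A and B

# Concatenate once, after the loop, instead of appending on every iteration
df_All = pd.concat(pieces, ignore_index=True)
print(df_All)
```

Because pd.concat aligns on column labels, df2's B-then-A order does not matter.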

If you want to determine the common columns, you can use the pandas Index.intersection method on the columns:

In [39]:

common_cols = df1.columns.intersection(df2.columns).intersection(df3.columns)
common_cols
Out[39]:
Index(['A', 'B'], dtype='object')
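Chaining .intersection by hand only works for a fixed number of frames. A sketch for a list of any length, assuming df_list is non-empty (functools.reduce raises TypeError on an empty iterable with no initial value): fold Index.intersection over every frame's columns, then select those columns from each frame before concatenating.

```python
from functools import reduce

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'B': [5, 6, 7], 'A': [8, 9, 10]})
df3 = pd.DataFrame({'C': [5, 6, 7], 'D': [8, 9, 10], 'A': [1, 2, 3], 'B': [4, 5, 7]})
df_list = [df1, df2, df3]  # assumed non-empty; could hold 2 or 500 frames

# Fold Index.intersection over all column indexes, left to right
common_cols = reduce(lambda left, right: left.intersection(right),
                     (df.columns for df in df_list))

# Keep only the shared columns from each frame, then stack them
df_All = pd.concat([df[common_cols] for df in df_list], ignore_index=True)
print(common_cols)
print(df_All)
```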

3 Comments

I am trying to do this in a for loop, since the actual code has a varying number of dfs: sometimes (df1, df2), sometimes (df1, df2, df3). There are also other calculations that I have to do in the loop. Do you know if there is a way to do this?
You'll need to flesh out your question significantly, as it's unclear to me. I see no reason why you can't perform your operations on the dfs and then concatenate them all at the end.
Oh, sorry, I was not clear. Basically, I have to do this in the loop (over the "list") because sometimes when I run the code there will be 100 DataFrames that need to be combined, and sometimes 500. The number of DataFrames is different each time I run the code, so I cannot plan out how many I need each time; it must come from the "list". Let me know if this makes sense...

You can also use set intersection to find the columns common to an arbitrary list of DataFrames and concatenate just those:

df_list = [df1, df2, df3]
# set order is arbitrary, so sort to get a predictable column order (A, B)
common_cols = sorted(set.intersection(*(set(df.columns) for df in df_list)))
df_new = pd.concat([df[common_cols] for df in df_list], ignore_index=True)
>>> df_new 
    A  B
0   1  4
1   2  5
2   3  6
3   8  5
4   9  6
5  10  7
6   1  4
7   2  5
8   3  7
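As a further alternative (not used in either answer above), pd.concat itself accepts join='inner', which keeps only the column labels shared by every frame, so no explicit column selection is needed. A sketch using the same three frames; the shared columns come out in the first frame's order here, but the test below only relies on which columns survive.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'B': [5, 6, 7], 'A': [8, 9, 10]})
df3 = pd.DataFrame({'C': [5, 6, 7], 'D': [8, 9, 10], 'A': [1, 2, 3], 'B': [4, 5, 7]})
df_list = [df1, df2, df3]

# join='inner' drops any column not present in every frame (C and D here)
df_new = pd.concat(df_list, join='inner', ignore_index=True)
print(df_new)
```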

