45

I have multiple (more than 100) dataframes. How can I concat all of them?

The problem is that I have too many dataframes to write them into a list by hand, like this:

>>> cluster_1 = pd.DataFrame([['a', 1], ['b', 2]],
...                    columns=['letter', 'number'])


>>> cluster_1
  letter  number
0      a       1
1      b       2


>>> cluster_2 = pd.DataFrame([['c', 3], ['d', 4]],
...                    columns=['letter', 'number'])


>>> cluster_2
  letter  number
0      c       3
1      d       4


>>> pd.concat([cluster_1, cluster_2])
  letter  number
0      a       1
1      b       2
0      c       3
1      d       4

The names of my N dataframes are cluster_1, cluster_2, cluster_3,..., cluster_N. The number N can be very high.

How can I concat N dataframes?

4
  • "I can not write them manually in a list." The solution to this has nothing to do with concat. You need to fix your upstream process to produce a list rather than 100s of variables. Commented Dec 21, 2018 at 0:38
  • I don't see how the answer in the other post can help with my question. I can see how it works for a small number of dataframes, but not for many dataframes, like 100 or more. Commented Dec 21, 2018 at 0:41
  • I've added a second duplicate to help you. You need to restructure your logic to NOT create a variable number of variables. A dict or list would work fine with pd.concat. Commented Dec 21, 2018 at 0:43
  • @jpp I totally agree. I was trying to do this the last 2 days but I failed. Commented Dec 21, 2018 at 0:43
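
A minimal sketch of what the comments above suggest: keep the frames in a dict (or list) from the start instead of creating cluster_1 ... cluster_N as separate variables. The function make_cluster is a hypothetical stand-in for whatever step actually produces each dataframe; it is not from the original post.

```python
import pandas as pd


def make_cluster(i):
    # Hypothetical placeholder for the real step that builds one cluster.
    return pd.DataFrame({"letter": [chr(ord("a") + i)], "number": [i + 1]})


# Store the frames in a dict keyed by cluster number, not in N variables.
clusters = {i: make_cluster(i) for i in range(3)}

# pd.concat accepts a list of frames ...
combined = pd.concat(list(clusters.values()), ignore_index=True)

# ... or the dict itself, in which case the keys become an outer index level.
combined_keyed = pd.concat(clusters)
```

With the dict form, the outer index level records which cluster each row came from, which can be handy later.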

3 Answers

96

I think you can just put them into a list and then concat the list. Reading a file in chunks with pandas (for example, the chunksize option of pd.read_csv) works in much the same way: it yields a sequence of dataframes, and I personally concat them like this.

pdList = [df1, df2, ...]  # List of your dataframes
new_df = pd.concat(pdList)

To create pdList automatically, assuming the names of your dfs always start with "cluster_":

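# Collect every variable in the current scope whose name starts with 'cluster_'
# (list(...) takes a snapshot so the namespace can change safely afterwards)
pdList = [value for name, value in list(locals().items()) if name.startswith('cluster_')]
new_df = pd.concat(pdList)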

5 Comments

How can I avoid writing the list pdList manually? It gets too long with more than 100 dataframes. This is my key problem.
Hi PParker, I updated the answer to show how to create pdList for you.
Thank you very much. This is a nice solution and it works. For other people who want to try it: first initialise the list with pdList=[]. Also make sure that you don't have other dataframes whose names start with "cluster_" but which have different dimensions and which you don't want to include.
@RuiNian How do I concatenate if my list holds the dataframe names as strings, i.e. pdList=['df1','df2','df3',.....]? In this case new_df=pd.concat(pdList) throws an error.
I don't think you can concatenate it that way, because the dataframes are objects in memory whereas the strings that represent the dataframe names are just strings. Python cannot recognize that they are df names. To overcome this, all you need to do is remove the quotation marks in your list. That way, the list holds the actual dataframes instead of strings.
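
If the names really do arrive as strings, one option is to look each name up in a mapping from name to dataframe before calling pd.concat. This is only a sketch: the dict frames_by_name and the sample frames are assumptions made for illustration, not part of the thread.

```python
import pandas as pd

df1 = pd.DataFrame({"letter": ["a"], "number": [1]})
df2 = pd.DataFrame({"letter": ["b"], "number": [2]})

# The names as plain strings, as in the comment above.
names = ["df1", "df2"]

# Hypothetical explicit mapping from each name to its dataframe object.
frames_by_name = {"df1": df1, "df2": df2}

# Turn the strings into the objects they refer to, then concatenate.
new_df = pd.concat([frames_by_name[name] for name in names], ignore_index=True)
```

An explicit dict keeps the lookup under your control; pulling the objects out of globals() would also work at module level, but it picks up whatever else happens to share the name.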
12

Generally it goes like:

frames = [df1, df2, df3]
result = pd.concat(frames)

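Note: By default, pd.concat keeps each frame's original index rather than resetting it; pass ignore_index=True if you want a fresh index running from 0. Read more details on the different types of merging here.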

For a large number of data frames: If you have hundreds of data frames, you can still create the list ("frames" in the code snippet) using a for loop, whether they are on disk or in memory. If they are on disk, this is easy: save all the dfs in one single folder, then read all the files from that folder.

If you are generating the dfs in memory, maybe try saving them as .pkl files first.
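
A rough sketch of the folder approach described above. The file names (cluster_1.pkl and so on) and the temporary folder are assumptions for the sake of a self-contained example; in practice the folder would already hold your saved dataframes.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)

    # Set-up for the sketch only: save three small frames as pickle files.
    for i in range(1, 4):
        frame = pd.DataFrame({"letter": [f"x{i}"], "number": [i]})
        frame.to_pickle(folder / f"cluster_{i}.pkl")

    # Read every matching file in the folder back into a list.
    # sorted() orders the names as text, so cluster_10 would come before cluster_2.
    paths = sorted(folder.glob("cluster_*.pkl"))
    frames = [pd.read_pickle(path) for path in paths]

result = pd.concat(frames, ignore_index=True)
```

The same loop works for CSV files by swapping in to_csv and pd.read_csv.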

2 Comments

Can you be more specific on that, please? So you suggest that I export all the dataframes and then read them into a list using a loop?
How do you have the data frames saved right now? Where are they saved? Or are they being generated in memory by your code?
4

Use:

pd.concat(your_list_of_dataframes)

And if you want a regular index:

pd.concat(your_list_of_dataframes, ignore_index=True)
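
To make the difference concrete, here is a small sketch using the two frames from the question; the variable names kept and renumbered are only illustrative.

```python
import pandas as pd

cluster_1 = pd.DataFrame([["a", 1], ["b", 2]], columns=["letter", "number"])
cluster_2 = pd.DataFrame([["c", 3], ["d", 4]], columns=["letter", "number"])

# Default: each frame keeps its own index, so labels repeat.
kept = pd.concat([cluster_1, cluster_2])

# ignore_index=True: the result gets a fresh index starting at 0.
renumbered = pd.concat([cluster_1, cluster_2], ignore_index=True)

print(list(kept.index))        # [0, 1, 0, 1]
print(list(renumbered.index))  # [0, 1, 2, 3]
```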
