Pandas Python DataFrames: How to split dataframes

Question

I have a df

df = pd.DataFrame(np.random.randn(11,3))

           0         1         2
0   0.102645 -1.530977  0.408735
1   1.081442  0.615082 -1.457931
2   1.852951  0.360998  0.178162
3   0.726028  2.072609 -1.167996
4  -0.454453  1.310887 -0.969910
5  -0.098552 -0.718283  0.372660
6   0.334170 -0.347934 -0.626079
7  -1.034541 -0.496949 -0.287830
8   1.870277  0.508380 -2.466063
9   1.464942 -0.020060 -0.684136
10 -1.057930  0.295145  0.161727

How can I split this into a given number of subsections, let's say 2 for now?

Something like this:

           0         1         2
0   0.102645 -1.530977  0.408735
1   1.081442  0.615082 -1.457931
2   1.852951  0.360998  0.178162
3   0.726028  2.072609 -1.167996
4  -0.454453  1.310887 -0.969910

           0         1         2
5  -0.098552 -0.718283  0.372660
6   0.334170 -0.347934 -0.626079
7  -1.034541 -0.496949 -0.287830
8   1.870277  0.508380 -2.466063
9   1.464942 -0.020060 -0.684136
10 -1.057930  0.295145  0.161727

Ideally I would like to use np.array_split(df, 2), but it throws an error because df is not an array.

Is there a built-in function to do this? I don't particularly want to use df.loc[a:b] because it's difficult to calculate the start and end positions for a given number of sub-dataframes.

mtadd · Accepted Answer · 2014-11-06 18:28:55Z


Try the following. It should return a list of n sub-dataframes that, concatenated in order, give back the original dataframe.

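import math

def split(df, n):
    # Rows per chunk, rounded up (relies on true division, as in Python 3).
    size = int(math.ceil(len(df) / n))
    # Take consecutive positional row slices; the last chunk may be shorter.
    return [df.iloc[i:i + size] for i in range(0, len(df), size)]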
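As an alternative to computing a chunk size by hand, the remainder handling of np.array_split can be borrowed by splitting the row positions instead of the dataframe itself. This is only a sketch, not the answer author's method, and the helper name split_evenly is made up for illustration. It assumes the split should be by row position and that chunk sizes may differ by at most one row.

```python
import numpy as np
import pandas as pd


def split_evenly(df, n):
    """Split df into exactly n row-wise pieces whose sizes differ by at most one."""
    # Split the row positions rather than the frame, then slice with iloc.
    position_chunks = np.array_split(np.arange(len(df)), n)
    return [df.iloc[chunk] for chunk in position_chunks]


df = pd.DataFrame(np.random.randn(11, 3))
parts = split_evenly(df, 2)
print([len(p) for p in parts])  # [6, 5]
```

Note that np.array_split puts the extra rows in the leading chunks, so 11 rows split in two gives 6 and 5 rather than the 5 and 6 shown in the question.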


4 Comments

Boosted_d16 Over a year ago

Thanks for this, but the only issue is the remainder. split(df, 2) for my df returns 3 sub-dfs. Is there no way to use np.array_split() somehow, as that handles remainders automatically?

mtadd Over a year ago

If you're using Python 2.x, try changing the line that calculates size to size = math.ceil(float(len(df)) / n)

Boosted_d16 Over a year ago

I have no idea what you have done, but it's working well. I'll run some more tests and let you know how it goes. Thanks!

mtadd Over a year ago

In Python 2.x, / performs integer division when both operands are integers. In Python 3, it performs floating-point division, which is required for the bucket size to be calculated properly. That is why explicitly converting the dataframe length to a floating-point number fixed your problem.
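The rounding-up step can also be written with integer arithmetic only, which avoids the difference between the two Python versions. A minimal sketch, where chunk_size is a hypothetical helper name and not part of the answer above:

```python
def chunk_size(n_rows, n_chunks):
    """Ceiling of n_rows / n_chunks using only integer arithmetic."""
    # Floor division rounds toward negative infinity, so flooring the
    # negated value and negating again rounds the quotient up.
    return -(-n_rows // n_chunks)


print(chunk_size(11, 2))  # 6, the bucket size wanted for 11 rows in 2 chunks
print(11 // 2)            # 5, what integer division alone produces
```

With a bucket size of 5 instead of 6, the slices start at rows 0, 5 and 10, which is why three sub-dataframes came back instead of two.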
