2

I have a Pandas data frame that has 51034 rows and 10 columns. I want to slice this data frame into 158 smaller data frames based on a list that contains the rows to slice.

How is it possible to slice a pandas data frame into smaller data frames?

For example, if I have a data frame with 10 rows and 4 columns:

      A    B    C    D
0     1    2    3    4
1     5    6    7    8
2     9    10   11   12
3     13   14   15   16
4     17   18   19   20
5     21   22   23   24
6     25   26   27   28
7     29   30   31   32
8     33   34   35   36
9     37   38   39   40

This example data frame will be sliced every 2 rows to create 5 new smaller data frames:

DataFrame1:

      A    B    C    D
0     1    2    3    4
1     5    6    7    8

DataFrame2:

      A    B    C    D
0     9    10   11   12
1     13   14   15   16

DataFrame3:

      A    B    C    D
0     17   18   19   20
1     21   22   23   24

DataFrame4:

      A    B    C    D
0     25   26   27   28
1     29   30   31   32

DataFrame5:

      A    B    C    D
0     33   34   35   36
1     37   38   39   40

I am not sure how to use the slice the larger data frame to create the smaller data frames.

Any suggestions on how to accomplish this goal?

Thank you.

Rodrigo

1 Answer 1

2

You can use groupby with a simple index-to-group mapping function, assuming that the index is consecutive and starts from 0:

for _, df_k in df.groupby(lambda x: x/2):
    print df_k.reset_index(drop=True)

Output:

   A  B  C  D
0  1  2  3  4
1  5  6  7  8
    A   B   C   D
0   9  10  11  12
1  13  14  15  16
    A   B   C   D
0  17  18  19  20
1  21  22  23  24
    A   B   C   D
0  25  26  27  28
1  29  30  31  32
    A   B   C   D
0  33  34  35  36
1  37  38  39  40

If you have a list of numbers indicating the slicing positions, you can pass in a dictionary as the group mapping:

import numpy as np
slice_at = [3, 5]
group_sizes = np.diff([0] + slice_at + [len(df)])
mapping = dict(zip(df.index, np.repeat(range(len(group_sizes)), group_sizes)))
for _, df_k in df.groupby(mapping):
    print df_k.reset_index(drop=True)

Output:

   A   B   C   D
0  1   2   3   4
1  5   6   7   8
2  9  10  11  12
    A   B   C   D
0  13  14  15  16
1  17  18  19  20
    A   B   C   D
0  21  22  23  24
1  25  26  27  28
2  29  30  31  32
3  33  34  35  36
4  37  38  39  40
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks YS-L. The one thing I haven't figured out is instead of printing the subset data frames, how can I create new data frame variables for each subset?
You can assign the subset DataFrames to some variables, or put them inside a dict or list. They are copies of part of the original DataFrame.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.