Splitting a dataframe with python

Question

What I want to do is pretty simple in other languages: I want to use a "for" loop to split a data frame every fifth row.

The idea is that I have a dataframe that gets a new row every so often, like answers to a form where each answer goes into a specific column (think Google Forms feeding a spreadsheet).

What I have tried is the following:

import pandas as pd
dp=[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
df1=pd.DataFrame(data=dp)
for i in range(0, len(dp)):
   if i%5==0:
      df = df1.iloc[i,:]
      print(df)          
print(df)

Which I know isn't much, but it is a try. What I can't figure out is how to create a new variable holding a new dataframe every time the loop reaches a row where i mod 5 == 0.

It's not clear what your desired output is. Do you want lots of individual five-row dataframes stored in variables? Or are you trying to convert a flat list into a single dataframe with rows and columns? Maybe you're just trying to print to the screen five rows at a time.
– Joshua R., Commented Sep 26, 2018 at 2:15
I understand the confusion. What I want to do is the first one. I want to generate variables that each store a dataframe of five rows. For example: I want rows 0 through 4 stored in a variable named V1, rows 5 through 9 stored in V2, etc. Can this be done?
– Jason Statham, Commented Sep 26, 2018 at 7:16
Thanks for the clarification, Jason. See my updated answer below. I create a list of dataframes, dfs. That way dfs[0] is the first dataframe, dfs[1] is the second, etc.
– Joshua R., Commented Sep 26, 2018 at 7:35

Joshua R. · Accepted Answer · 2018-09-26 07:41:33Z

I think you're trying to convert a flat list into rows and columns using a known number of fields.

I'd do something like this:

import numpy as np
import pandas as pd

numFields = 3   # this is five in your case
fieldNames = ['color', 'animal', 'amphibian'] # totally optional 

# this is your 'dp'
inputData = ['brown', 'dog','false','green', 'toad','true']

flatDataArray = np.asarray(inputData)

reshapedData = flatDataArray.reshape(-1, numFields)

df = pd.DataFrame(reshapedData, columns=fieldNames) # you only need 'columns' if you want to name fields

print(df)

which gives:

    color   animal  amphibian
0   brown   dog     false
1   green   toad    true
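
One caveat with reshape(-1, numFields): it raises a ValueError when the list length is not a multiple of numFields, which is the case for the 16-item dp in the question with five fields. Below is a minimal sketch of one way around that, assuming it is acceptable to pad the last row with a filler value. The names rows_from_flat and fill are hypothetical, not part of NumPy or pandas.

```python
import numpy as np
import pandas as pd


def rows_from_flat(values, num_fields, fill=None):
    """Hypothetical helper: pad a flat list so it reshapes cleanly into rows."""
    values = list(values)
    remainder = len(values) % num_fields
    if remainder:
        # Append filler items until the length is a multiple of num_fields.
        values += [fill] * (num_fields - remainder)
    # dtype=object keeps the original items and the filler side by side.
    return np.asarray(values, dtype=object).reshape(-1, num_fields)


dp = list(range(16))                  # the 'dp' from the question
df_padded = pd.DataFrame(rows_from_flat(dp, 5))
print(df_padded)
```

With 16 items and five fields this should give four rows, the last one holding 15 followed by four filler values. Dropping the incomplete tail instead of padding would be an equally reasonable choice.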

--UPDATE--

From your comment above, I see that you'd like an arbitrary number of dataframes, one for each five-row group. Why not create a list of dataframes (i.e. so you have dfs[0], dfs[1], and so on)?

# continuing from where the previous code left off...

dfs = []

# each row of reshapedData becomes its own single-column dataframe
for group in reshapedData:
    dfs.append(pd.DataFrame(group))

for df in dfs:
    print(df)

which prints:

   0
0  brown
1    dog
2  false

   0
0  green
1   toad
2   true

piRSquared · Answer · 2018-11-19 07:46:08Z (edited by Jean-François Corbett)

`numpy.split`

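import numpy as np

# df1 is the 16-row dataframe from the question.
# Split points 1, 6 and 11 give groups that are off by one row:
lod = np.split(df1, np.arange(1, 16, 5))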

print(*lod, sep='\n\n')

   0
0  0

   0
1  1
2  2
3  3
4  4
5  5

     0
6    6
7    7
8    8
9    9
10  10

     0
11  11
12  12
13  13
14  14
15  15

# Split points 5, 10 and 15 give the intended five-row groups:
lod = np.split(df1, np.arange(0, 16, 5)[1:])

print(*lod, sep='\n\n')

   0
0  0
1  1
2  2
3  3
4  4

   0
5  5
6  6
7  7
8  8
9  9

     0
10  10
11  11
12  12
13  13
14  14

     0
15  15
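
The same split points can also be applied to row positions rather than to the dataframe itself, so that NumPy only ever handles a plain integer array. This is a sketch under that assumption; positions and split_points are illustrative names.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(data=list(range(16)))  # same frame as in the question

positions = np.arange(len(df1))              # 0, 1, ..., 15
split_points = np.arange(5, len(df1), 5)     # 5, 10, 15

# np.split cuts the position array before each split point;
# iloc then picks the matching rows out of the dataframe.
lod = [df1.iloc[idx] for idx in np.split(positions, split_points)]

print(*lod, sep='\n\n')
```

Each element of lod is a dataframe that keeps its original index labels, as in the output above.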

edited Nov 19, 2018 at 7:46 by Jean-François Corbett
answered Sep 25, 2018 at 21:46 by piRSquared
