Split pandas dataframe by String

Question

I'm new to using Pandas dataframes. I have data in a .csv like this:

foo, 1234,
bar, 4567
stuff, 7894
New Entry,,
morestuff,1345

I'm reading it into the dataframe with

 df = pd.read_csv

But what I really want is a new dataframe (or a way of splitting the current one) every time I have a "New Entry" line (obviously without including it). How could this be done?

Zero · Accepted Answer · 2015-04-19 09:03:59Z


1) One approach is to do it on the fly: read the file line by line and start a new chunk whenever a NewEntry line appears.

2) The other way, if the dataframe already exists, is to find the NewEntry rows and slice the dataframe at those positions, collecting the pieces in a dict, dff = {}

df                                                                 
        col1  col2  
0        foo  1234    
1        bar  4567                
2      stuff  7894                                                        
3   NewEntry   NaN                       
4  morestuff  1345

Find the positions of the NewEntry rows, then add -1 at the front and len(df.index) at the end as boundaries so that the first and last segments are included

import numpy as np
rows = [-1] + np.where(df['col1']=='NewEntry')[0].tolist() + [len(df.index)]
[-1, 3, 5]
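To see why the two extra boundaries are needed, here is a small sketch that pairs up consecutive entries of rows and turns each pair into a slice range. The names pairs and slices are only for illustration and are not part of the answer's code.

```python
# Boundaries from the example above: -1, the NewEntry position, and len(df.index)
rows = [-1, 3, 5]

# Each consecutive pair (start, stop) brackets one segment
pairs = list(zip(rows[:-1], rows[1:]))

# Skip the boundary row itself by starting one position after it
slices = [(start + 1, stop) for start, stop in pairs]
print(slices)  # [(0, 3), (4, 5)]
```

Without the leading -1 the first segment would have no start, and without the trailing len(df.index) the rows after the last NewEntry would be lost.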

Create the dict of dataframes

dff = {}                                                                            
for i, r in enumerate(rows[:-1]):                                                   
    dff[i] = df[r+1: rows[i+1]]

Dict of dataframes {0: dataframe1, 1: dataframe2}

dff                           
{0:     col1  col2            
 0    foo  1234               
 1    bar  4567               
 2  stuff  7894, 1:         col1  col2  
 4  morestuff  1345}

Dataframe 1

dff[0]              
    col1  col2      
0    foo  1234      
1    bar  4567      
2  stuff  7894

Dataframe 2

dff[1]              
        col1  col2  
4  morestuff  1345
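Approach 1 (splitting while reading) is not shown above. A minimal sketch could look like the following, assuming a two-column file without trailing commas and a separator whose first field is exactly "NewEntry". The function name read_sections, its separator parameter, and the sample string are made up for this illustration.

```python
import io
import pandas as pd

def read_sections(lines, separator="NewEntry"):
    """Split an iterable of CSV lines into one DataFrame per section."""
    chunks, current = [], []
    for line in lines:
        first_field = line.split(",")[0].strip()
        if first_field == separator:
            # Separator line: close the current section and drop the line itself
            if current:
                chunks.append(current)
            current = []
        elif line.strip():
            # Ordinary data line (blank lines are ignored)
            current.append(line.rstrip("\n"))
    if current:
        chunks.append(current)
    # Parse each section's lines on their own
    return [
        pd.read_csv(io.StringIO("\n".join(chunk)), header=None, names=["col1", "col2"])
        for chunk in chunks
    ]

# Simplified sample data (assumption: no trailing commas)
sample = "foo,1234\nbar,4567\nstuff,7894\nNewEntry,\nmorestuff,1345\n"
sections = read_sections(io.StringIO(sample))
```

With a real file you would pass an open file object instead of the io.StringIO wrapper. Unlike the slicing approach, each resulting frame gets its own index starting at 0.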

answered Apr 19, 2015 at 9:03


1 Comment

user2757902 Over a year ago

Nice answer. Quick question: how would I handle data that has "NewEntry" at the beginning? As in, NewEntry was the title of each section rather than a separator?

EdChum · Accepted Answer · 2015-04-19 08:59:25Z

So, using your example data, which I concatenated 3 times after loading (I named the cols 'a','b','c' for convenience), we find the indices where you have 'New Entry' and then produce a list of tuples of these positions stepwise to mark the beginning and end of each range.

We can then iterate over this list of tuples, slice the original df, and append each piece to a list:

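In [22]:

import io
import pandas as pd

t="""foo,1234,
bar,4567
stuff,7894
New Entry,,
morestuff,1345"""
df = pd.read_csv(io.StringIO(t),header=None,names=['a','b','c'] )
# repeat the sample three times so there are several 'New Entry' rows
df = pd.concat([df]*3, ignore_index=True)
df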
Out[22]:
            a     b   c
0         foo  1234 NaN
1         bar  4567 NaN
2       stuff  7894 NaN
3   New Entry   NaN NaN
4   morestuff  1345 NaN
5         foo  1234 NaN
6         bar  4567 NaN
7       stuff  7894 NaN
8   New Entry   NaN NaN
9   morestuff  1345 NaN
10        foo  1234 NaN
11        bar  4567 NaN
12      stuff  7894 NaN
13  New Entry   NaN NaN
14  morestuff  1345 NaN
In [30]:

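idx = df[df['a'] == 'New Entry'].index
# first segment: from the top of the frame to the first 'New Entry'
idx_list = [(0,idx[0])]
# middle segments: between consecutive 'New Entry' rows
idx_list = idx_list + list(zip(idx, idx[1:]))
# last segment: rows after the final 'New Entry'
idx_list.append((idx[-1], len(df)))
idx_list


Out[30]:
[(0, 3), (3, 8), (8, 13), (13, 15)]
In [31]:

df_list = []
for i in idx_list:  
    print(i)
    if i[0] == 0:
        # the first range starts at a data row, so keep its start
        df_list.append(df[i[0]:i[1]])
    else:
        # later ranges start at a 'New Entry' row, so skip it
        df_list.append(df[i[0]+1:i[1]])
df_list
(0, 3)
(3, 8)
(8, 13)
(13, 15)
Out[31]:
[       a     b   c
 0    foo  1234 NaN
 1    bar  4567 NaN
 2  stuff  7894 NaN,            a     b   c
 4  morestuff  1345 NaN
 5        foo  1234 NaN
 6        bar  4567 NaN
 7      stuff  7894 NaN,             a     b   c
 9   morestuff  1345 NaN
 10        foo  1234 NaN
 11        bar  4567 NaN
 12      stuff  7894 NaN,             a     b   c
 14  morestuff  1345 NaN]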
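As a more compact alternative (a different technique from the index-pair slicing above, offered only as a sketch), the separator rows can be turned into a running group number with cumsum and the remaining rows grouped on it. The names is_sep and group_id are introduced here for illustration.

```python
import io
import pandas as pd

t = """foo,1234,
bar,4567
stuff,7894
New Entry,,
morestuff,1345"""
df = pd.read_csv(io.StringIO(t), header=None, names=["a", "b", "c"])
df = pd.concat([df] * 3, ignore_index=True)

# True on every separator row
is_sep = df["a"] == "New Entry"

# The running count of separators gives each section its own number
group_id = is_sep.cumsum()

# Drop the separator rows, then collect one DataFrame per section
df_list = [g for _, g in df[~is_sep].groupby(group_id[~is_sep])]
```

Because every row gets a group number, the rows after the last 'New Entry' form their own frame, and no special case is needed for the first segment.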
