
I have a long list of data in which the meaningful values are sandwiched between runs of 0 values. Here is what it looks like:

0
0
1
0
0
2
3
1
0
0
0
0
1
0

The lengths of the zero runs and of the meaningful sequences are both variable. I want to extract each meaningful sequence into its own row of a dataframe. For example, the data above would be extracted to this:

1
2   3   1
1

I used this code to 'slice' the meaningful data:

import pandas as pd
import numpy as np

raw = pd.read_csv('data.csv')

df = pd.DataFrame(index=np.arange(0, 10000),columns = ['DT01', 'DT02', 'DT03', 'DT04', 'DT05', 'DT06', 'DT07', 'DT08', 'DT02', 'DT09', 'DT10', 'DT11', 'DT12', 'DT13', 'DT14', 'DT15', 'DT16', 'DT17', 'DT18', 'DT19', 'DT20',])
a = 0
b = 0
n=0

for n in range(0,999999):
    if raw.iloc[n].values > 0:
        df.iloc[a,b] = raw.iloc[n].values
        a=a+1
        if raw [n+1] == 0:
            b=b+1
            a=0

but I keep getting KeyError: n, where n is the row right after the first row whose value differs from 0.

Where is the problem with my code? And is there any way to improve it, in terms of speed and memory cost? Thank you very much.
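The error can be reproduced in isolation: plain `raw[n + 1]` looks up a column *label*, not a row, so the first time the inner `if` runs it raises `KeyError` for that integer. A minimal sketch (assuming a single column named `col`, a stand-in for whatever header `data.csv` actually has):

```python
import pandas as pd

# Stand-in for the CSV: one column of the example values.
raw = pd.DataFrame({'col': [0, 0, 1, 0, 0, 2, 3, 1, 0, 0, 0, 0, 1, 0]})

try:
    raw[3]             # column lookup: there is no column labelled 3
except KeyError as e:
    print('KeyError:', e)

print(raw.iloc[3, 0])  # row lookup with .iloc works: prints 0
```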

4 Answers


You can use:

df['Group'] = df['col'].eq(0).cumsum()
df = df.loc[df['col'] != 0]

df = df.groupby('Group')['col'].apply(list)
print (df)

Group
2          [1]
4    [2, 3, 1]
8          [1]
Name: col, dtype: object

df = pd.DataFrame(df.values.tolist())
print (df)
   0    1    2
0  1  NaN  NaN
1  2  3.0  1.0
2  1  NaN  NaN
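Putting the snippet together as one runnable piece (a sketch; the input frame with a `col` column is the one given in the comments below):

```python
import pandas as pd

# Input taken from the comments: a single column named 'col'.
df = pd.DataFrame({'col': [0, 0, 1, 0, 0, 2, 3, 1, 0, 0, 0, 0, 1, 0]})

# Each zero bumps a running group id; dropping the zero rows leaves
# only the meaningful runs, one group per run.
group = df['col'].eq(0).cumsum()
runs = df.loc[df['col'] != 0, 'col'].groupby(group).apply(list)

# Ragged rows are padded with NaN when rebuilt as a dataframe.
out = pd.DataFrame(runs.tolist())
print(out)
```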

5 Comments

I'm trying all the solutions, I will let you know if I have any problems, thanks a ton!
I got KeyError 'col' while using the first code, and a Group error while using the second one; what am I missing here?
My input dataframe has col as the column name: df = pd.DataFrame({'col' : [0,0,1,0,0,2,3,1,0,0,0,0,1,0]}). So you only need to change that.
If the column name is 0, then df['col'] can be changed to df[0] or df['0'], depending on whether 0 is an int or a string.
Oh right, I'm so silly, it works like a charm now, thank you!

Let's try this; it outputs a dataframe:

df.groupby(df[0].eq(0).cumsum().mask(df[0].eq(0)),as_index=False)[0]\
  .apply(lambda x: x.reset_index(drop=True)).unstack(1)

Output:

     0    1    2
0  1.0  NaN  NaN
1  2.0  3.0  1.0
2  1.0  NaN  NaN

Or a string:

df.groupby(df[0].eq(0).cumsum().mask(df[0].eq(0)),as_index=False)[0]\
  .apply(lambda x: ' '.join(x.astype(str)))

Output:

0        1
1    2 3 1
2        1
dtype: object

Or as a list:

df.groupby(df[0].eq(0).cumsum().mask(df[0].eq(0)),as_index=False)[0]\
  .apply(list)

Output:

0          [1]
1    [2, 3, 1]
2          [1]
dtype: object
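The shared grouper in these three snippets is worth unpacking (a sketch on the same one-column frame): `eq(0).cumsum()` counts the zeros seen so far, and `mask` blanks the zero rows themselves so `groupby` skips them and each run shares one label.

```python
import pandas as pd

df = pd.DataFrame([0, 0, 1, 0, 0, 2, 3, 1, 0, 0, 0, 0, 1, 0])

# Running count of zeros, with the zero rows masked to NaN so they
# are excluded from grouping; each nonzero run keeps one group id.
grouper = df[0].eq(0).cumsum().mask(df[0].eq(0))
print(grouper.dropna().tolist())  # [2.0, 4.0, 4.0, 4.0, 8.0]
```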

1 Comment

I'm trying all the solutions, I will let you know if I have any problems, thanks a ton!

Try this; I'll break down the steps:

df.LIST=df.LIST.replace({0:np.nan})
df['Group']=df.LIST.isnull().cumsum()
df=df.dropna()
df.groupby('Group').LIST.apply(list)
Out[384]: 
Group
2              [1]
4        [2, 3, 1]
8              [1]
Name: LIST, dtype: object

Data Input

df = pd.DataFrame({'LIST' : [0,0,1,0,0,2,3,1,0,0,0,0,1,0]})
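The steps above, run top to bottom against that input frame (a consolidated sketch; the values come out as floats because replacing 0 with NaN upcasts the column):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'LIST': [0, 0, 1, 0, 0, 2, 3, 1, 0, 0, 0, 0, 1, 0]})

df.LIST = df.LIST.replace({0: np.nan})   # zeros become NaN separators
df['Group'] = df.LIST.isnull().cumsum()  # each NaN bumps the group id
df = df.dropna()                         # drop the separator rows
out = df.groupby('Group').LIST.apply(list)
print(out.tolist())  # [[1.0], [2.0, 3.0, 1.0], [1.0]]
```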

1 Comment

I'm trying all the solutions, I will let you know if I have any problems, thanks a ton!

Let's start with packing your original data into a pandas dataframe (in real life, you will probably use pd.read_csv() to generate this dataframe):

raw = pd.DataFrame({'0' : [0,0,1,0,0,2,3,1,0,0,0,0,1,0]})

The default index will help you locate zero spans:

s1 = raw.reset_index()
s1['index'] = np.where(s1['0'] != 0, np.nan, s1['index'])
s1['index'] = s1['index'].ffill().fillna(0).astype(int)
s1[s1['0'] != 0].groupby('index')['0'].apply(list).tolist()
#[[1], [2, 3, 1], [1]]
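To turn that list of lists into the row-per-run dataframe the question asks for, `pd.DataFrame` pads the ragged rows with NaN (a small follow-on sketch; `runs` stands for the result of the snippet above):

```python
import pandas as pd

runs = [[1], [2, 3, 1], [1]]   # result of the snippet above
out = pd.DataFrame(runs)       # shorter runs are padded with NaN
print(out)
```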

1 Comment

I'm trying all the solutions, I will let you know if I have any problems, thanks a ton!
