
I have a long list of data in which the meaningful values are sandwiched between runs of 0 values. Here is what it looks like:

0
0
1
0
0
2
3
1
0
0
0
0
1
0

The lengths of the zero runs and of the meaningful sequences are both variable. I want to extract each meaningful sequence into its own row of a dataframe. For example, the data above would be extracted to this:

1
2   3   1
1

I used this code to 'slice' the meaningful data:

import pandas as pd
import numpy as np

raw = pd.read_csv('data.csv')

df = pd.DataFrame(index=np.arange(0, 10000),columns = ['DT01', 'DT02', 'DT03', 'DT04', 'DT05', 'DT06', 'DT07', 'DT08', 'DT02', 'DT09', 'DT10', 'DT11', 'DT12', 'DT13', 'DT14', 'DT15', 'DT16', 'DT17', 'DT18', 'DT19', 'DT20',])
a = 0
b = 0
n=0

for n in range(0,999999):
    if raw.iloc[n].values > 0:
        df.iloc[a,b] = raw.iloc[n].values
        a=a+1
        if raw [n+1] == 0:
            b=b+1
            a=0

but I keep getting KeyError: n, where n is the row right after the first row whose value differs from 0.

Where is the problem with my code? And is there any way to improve it, in terms of speed and memory cost? Thank you very much.
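The error can be reproduced in isolation: plain `raw[n + 1]` looks up a column *label*, not a row, so the first time the inner `if` runs it raises `KeyError` for that integer. A minimal sketch (assuming a single column named `col`, a stand-in for whatever header `data.csv` actually has):

```python
import pandas as pd

# Stand-in for the CSV: one column of the example values.
raw = pd.DataFrame({'col': [0, 0, 1, 0, 0, 2, 3, 1, 0, 0, 0, 0, 1, 0]})

try:
    raw[3]             # column lookup: there is no column labelled 3
except KeyError as e:
    print('KeyError:', e)

print(raw.iloc[3, 0])  # row lookup with .iloc works: prints 0
```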

4 Answers


You can use:

df['Group'] = df['col'].eq(0).cumsum()
df = df.loc[df['col'] != 0]

df = df.groupby('Group')['col'].apply(list)
print (df)

Group
2          [1]
4    [2, 3, 1]
8          [1]
Name: col, dtype: object

df = pd.DataFrame(df.values.tolist())
print (df)
   0    1    2
0  1  NaN  NaN
1  2  3.0  1.0
2  1  NaN  NaN
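Putting the snippet together as one runnable piece (a sketch; the input frame with a `col` column is the one given in the comments below):

```python
import pandas as pd

# Input taken from the comments: a single column named 'col'.
df = pd.DataFrame({'col': [0, 0, 1, 0, 0, 2, 3, 1, 0, 0, 0, 0, 1, 0]})

# Each zero bumps a running group id; dropping the zero rows leaves
# only the meaningful runs, one group per run.
group = df['col'].eq(0).cumsum()
runs = df.loc[df['col'] != 0, 'col'].groupby(group).apply(list)

# Ragged rows are padded with NaN when rebuilt as a dataframe.
out = pd.DataFrame(runs.tolist())
print(out)
```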

5 Comments

I'm trying all the solutions, I will let you know if I have any problems, thanks a ton!
I got KeyError 'col' while using the first code, and a Group error while using the second one; what am I missing here?
My input dataframe has col as the column name: df = pd.DataFrame({'col' : [0,0,1,0,0,2,3,1,0,0,0,0,1,0]}). So you only need to change that.
If the column name is 0, then df['col'] can be changed to df[0] or df['0'], depending on whether 0 is an int or a string.
Oh right, I'm so silly, it works like a charm now, thank you!

Let's try this; it outputs a dataframe:

df.groupby(df[0].eq(0).cumsum().mask(df[0].eq(0)),as_index=False)[0]\
  .apply(lambda x: x.reset_index(drop=True)).unstack(1)

Output:

     0    1    2
0  1.0  NaN  NaN
1  2.0  3.0  1.0
2  1.0  NaN  NaN

Or a string:

df.groupby(df[0].eq(0).cumsum().mask(df[0].eq(0)),as_index=False)[0]\
  .apply(lambda x: ' '.join(x.astype(str)))

Output:

0        1
1    2 3 1
2        1
dtype: object

Or as a list:

df.groupby(df[0].eq(0).cumsum().mask(df[0].eq(0)),as_index=False)[0]\
  .apply(list)

Output:

0          [1]
1    [2, 3, 1]
2          [1]
dtype: object
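The shared grouper in these three snippets is worth unpacking (a sketch on the same one-column frame): `eq(0).cumsum()` counts the zeros seen so far, and `mask` blanks the zero rows themselves so `groupby` skips them and each run shares one label.

```python
import pandas as pd

df = pd.DataFrame([0, 0, 1, 0, 0, 2, 3, 1, 0, 0, 0, 0, 1, 0])

# Running count of zeros, with the zero rows masked to NaN so they
# are excluded from grouping; each nonzero run keeps one group id.
grouper = df[0].eq(0).cumsum().mask(df[0].eq(0))
print(grouper.dropna().tolist())  # [2.0, 4.0, 4.0, 4.0, 8.0]
```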

1 Comment

I'm trying all the solutions, I will let you know if I have any problems, thanks a ton!

Try this; I'll break down the steps:

df.LIST=df.LIST.replace({0:np.nan})
df['Group']=df.LIST.isnull().cumsum()
df=df.dropna()
df.groupby('Group').LIST.apply(list)
Out[384]: 
Group
2              [1]
4        [2, 3, 1]
8              [1]
Name: LIST, dtype: object

Data Input

df = pd.DataFrame({'LIST' : [0,0,1,0,0,2,3,1,0,0,0,0,1,0]})
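The steps above, run top to bottom against that input frame (a consolidated sketch; the values come out as floats because replacing 0 with NaN upcasts the column):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'LIST': [0, 0, 1, 0, 0, 2, 3, 1, 0, 0, 0, 0, 1, 0]})

df.LIST = df.LIST.replace({0: np.nan})   # zeros become NaN separators
df['Group'] = df.LIST.isnull().cumsum()  # each NaN bumps the group id
df = df.dropna()                         # drop the separator rows
out = df.groupby('Group').LIST.apply(list)
print(out.tolist())  # [[1.0], [2.0, 3.0, 1.0], [1.0]]
```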

1 Comment

I'm trying all the solutions, I will let you know if I have any problems, thanks a ton!

Let's start with packing your original data into a pandas dataframe (in real life, you will probably use pd.read_csv() to generate this dataframe):

raw = pd.DataFrame({'0' : [0,0,1,0,0,2,3,1,0,0,0,0,1,0]})

The default index will help you locate zero spans:

s1 = raw.reset_index()
s1['index'] = np.where(s1['0'] != 0, np.nan, s1['index'])
s1['index'] = s1['index'].ffill().fillna(0).astype(int)
s1[s1['0'] != 0].groupby('index')['0'].apply(list).tolist()
#[[1], [2, 3, 1], [1]]
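To turn that list of lists into the row-per-run dataframe the question asks for, `pd.DataFrame` pads the ragged rows with NaN (a small follow-on sketch; `runs` stands for the result of the snippet above):

```python
import pandas as pd

runs = [[1], [2, 3, 1], [1]]   # result of the snippet above
out = pd.DataFrame(runs)       # shorter runs are padded with NaN
print(out)
```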

1 Comment

I'm trying all the solutions, I will let you know if I have any problems, thanks a ton!
