0

I'm not sure if "explode" is the right term here. I have a DataFrame with columns event_id, num_steps, and avg_time.

I'd like to end up with a DataFrame where, for each event_id, there are num_steps rows with each row counting from 0 to num_steps.

event_id, num_steps, avg_time

1, 3, 5

Should become:

event_id, num_steps, avg_time

1, 0, 5

1, 1, 5

1, 2, 5

Currently I'm iterating over the dataframe and creating this manually, but I'm wondering if there is any way to do this directly within Pandas to increase processing time?

Thanks!

2 Answers 2

1

This assumes the index is unique:

# toy data
df = pd.DataFrame({'event_id':[1,2],
                  'num_steps':[3,2],
                  'avg_time':[5,4]})

(df.loc[df.index.repeat(df['num_steps'])]
   .assign(num_steps=lambda x: x.groupby(level=0).cumcount())
   .reset_index(drop=True)
)

Output:

   event_id  num_steps  avg_time
0         1          0         5
1         1          1         5
2         1          2         5
3         2          0         4
4         2          1         4
Sign up to request clarification or add additional context in comments.

Comments

1

When we talking about the explode we do explode , before that we need convert the max steps to list of steps .

Notice make sure your pandas version is later than 0.25.0

s=df.groupby(['event_id','avg_time']).num_steps.\
      apply(lambda x : list(range(1,x.iloc[0]+1))).\
        explode().reset_index()
Out[43]: 
   event_id  avg_time num_steps
0         1         5         1
1         1         5         2
2         1         5         3

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.