I have a pandas DataFrame (df) with many columns. For simplicity, I am posting only three columns with dummy data here.

Timestamp    Source    Length
0            1              5
1            1              5
2            1              5
3            2              5
4            2              5
5            3              5
6            1              5
7            3              5
8            2              5
9            1              5

Using pandas functions, I first set the timestamp as the index of df.

index = pd.DatetimeIndex(df[df.columns[1]]*10**9) # Convert timestamp from seconds to nanoseconds since the epoch
df = df.set_index(index) # Set Timestamp as index
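
For reference, here is a minimal self-contained sketch of this step on the three dummy columns above. It assumes the Timestamp values are whole seconds since the Unix epoch, and it uses pd.to_datetime with unit="s" as an alternative to multiplying by 10**9. The column names are taken from the table in the question.

```python
import pandas as pd

# Dummy data copied from the question: epoch seconds, a source id, a length.
df = pd.DataFrame({
    "Timestamp": range(10),
    "Source": [1, 1, 1, 2, 2, 3, 1, 3, 2, 1],
    "Length": [5] * 10,
})

# Assumption: the integers are seconds since 1970-01-01.
df["Timestamp"] = pd.to_datetime(df["Timestamp"], unit="s")

# Use the converted column as the index so time-based grouping can work on it.
df = df.set_index("Timestamp")
print(df.head(3))
```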

Next, I can use groupby with pd.Grouper (pd.TimeGrouper in older pandas versions) to group the data into 5-second bins and compute the total length for each bin as follows:

df_length = df[df.columns[5]].groupby(pd.Grouper(freq='5s')).sum()

The resulting df_length should look like:

Timestamp     Length
0             25
5             25
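
A hedged sketch of the binning step on the dummy data, so the totals above can be checked. It selects the Length column by name rather than by position, because the dummy table has only three columns, and it assumes pd.Grouper(freq="5s") as the time-based grouper.

```python
import pandas as pd

# Rebuild the dummy data from the question with a datetime index.
df = pd.DataFrame({
    "Timestamp": range(10),
    "Source": [1, 1, 1, 2, 2, 3, 1, 3, 2, 1],
    "Length": [5] * 10,
})
df["Timestamp"] = pd.to_datetime(df["Timestamp"], unit="s")
df = df.set_index("Timestamp")

# Group the index into 5-second bins and sum Length within each bin.
df_length = df["Length"].groupby(pd.Grouper(freq="5s")).sum()
print(df_length)
```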

Now the problem is: I want to get the same 5-second bins, but I want to compute the total length for each source (1, 2 and 3) in separate columns, in the following format:

Timestamp    1     2     3
0            15    10    0
5            10    5     10

I think I can use df.groupby with some conditions to get it, but I am confused and tired now :(

I would appreciate a solution that uses pandas functions only.

  • Your "dummy data" does not have 5 columns, so your df_length function will not work. Commented Sep 25, 2017 at 10:32

1 Answer


You can pass the Source column to groupby as a second key, which gives a result with a MultiIndex, and then reshape it with unstack, which moves the last level of the MultiIndex into the columns:

# pd.Grouper(freq=...) replaces pd.TimeGrouper, which was removed in pandas 1.0
print(df[df.columns[2]].groupby([pd.Grouper(freq='5s'), df['Source']]).sum())
Timestamp            Source
1970-01-01 00:00:00  1         15
                     2         10
1970-01-01 00:00:05  1         10
                     2          5
                     3         10
Name: Length, dtype: int64

# Parentheses let the method chain span several lines
df1 = (df[df.columns[2]]
       .groupby([pd.Grouper(freq='5s'), df['Source']])
       .sum()
       .unstack(fill_value=0))
print(df1)
Source                1   2   3
Timestamp                      
1970-01-01 00:00:00  15  10   0
1970-01-01 00:00:05  10   5  10
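
As a rough breakdown of what the chained call does, the sketch below rebuilds the question's dummy data and runs each step separately. The intermediate names (grouped, per_pair, wide) are illustrative only, and pd.Grouper(freq="5s") is assumed as the time-based grouper.

```python
import pandas as pd

# Dummy data from the question, indexed by timestamp.
df = pd.DataFrame({
    "Timestamp": range(10),
    "Source": [1, 1, 1, 2, 2, 3, 1, 3, 2, 1],
    "Length": [5] * 10,
})
df["Timestamp"] = pd.to_datetime(df["Timestamp"], unit="s")
df = df.set_index("Timestamp")

# Step 1: group by two keys, the 5-second bin of the index and the Source value.
grouped = df.groupby([pd.Grouper(freq="5s"), "Source"])["Length"]

# Step 2: sum per (bin, Source) pair; the result is a Series with a
# two-level MultiIndex that only contains pairs present in the data.
per_pair = grouped.sum()

# Step 3: move the last index level (Source) into the columns and
# fill (bin, Source) pairs that never occurred with 0.
wide = per_pair.unstack(fill_value=0)
print(wide)
```

Without fill_value=0, the pair that never occurs (Source 3 in the first bin) would show up as NaN and the columns would be converted to floats.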

2 Comments

I was going to suggest using a pivot table but this is much better. Nice work!
Thank you so much. It works! However, would you please explain how the code works? I might face similar grouping problems with my data.
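
The pivot-table route mentioned in the first comment might look roughly like the sketch below. It is not taken from the answer. It assumes Timestamp is kept as a regular datetime column (rather than the index) so that pd.Grouper can refer to it by key.

```python
import pandas as pd

# Dummy data from the question; Timestamp stays a regular column here.
df = pd.DataFrame({
    "Timestamp": range(10),
    "Source": [1, 1, 1, 2, 2, 3, 1, 3, 2, 1],
    "Length": [5] * 10,
})
df["Timestamp"] = pd.to_datetime(df["Timestamp"], unit="s")

# Rows: 5-second bins of Timestamp; columns: Source; cells: summed Length.
table = df.pivot_table(
    index=pd.Grouper(key="Timestamp", freq="5s"),
    columns="Source",
    values="Length",
    aggfunc="sum",
    fill_value=0,
)
print(table)
```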
