I have a pandas DataFrame (df) with many columns. For simplicity, I am posting only three columns with dummy data here.
Timestamp Source Length
0 1 5
1 1 5
2 1 5
3 2 5
4 2 5
5 3 5
6 1 5
7 3 5
8 2 5
9 1 5
Using pandas functions, I first set the timestamp as the index of df.
index = pd.DatetimeIndex(df[df.columns[1]]*10**9) # Convert timestamp from seconds to nanoseconds
df = df.set_index(index) # Set Timestamp as index
Next, I can use groupby with pd.TimeGrouper to group the data into 5-second bins and compute the total length for each bin as follows:
df_length = df[df.columns[5]].groupby(pd.TimeGrouper('5S')).sum()
So df_length should look like:
Timestamp Length
0 25
5 25
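For anyone who wants to reproduce this, here is a minimal sketch of the steps above on the dummy data. It makes a few assumptions that are not in my real code: the columns are addressed by name instead of by position, the timestamps are treated as seconds since the epoch, and pd.Grouper(freq=...) is used in place of pd.TimeGrouper, which has been removed from newer pandas releases.

```python
import pandas as pd

# Dummy data from the table above (the real frame has many more columns).
df = pd.DataFrame(
    {"Source": [1, 1, 1, 2, 2, 3, 1, 3, 2, 1], "Length": [5] * 10},
    # Assumption: integer timestamps are seconds since the epoch.
    index=pd.to_datetime(list(range(10)), unit="s"),
)
df.index.name = "Timestamp"

# Group the Length column into 5-second bins on the DatetimeIndex and sum each bin.
df_length = df["Length"].groupby(pd.Grouper(freq="5s")).sum()
print(df_length)
```

The index of the result is shown as full datetimes (starting at 1970-01-01) rather than the plain 0 and 5 in the table.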
Now the problem is: I want to get the same 5-second bins, but I want to compute the total length for each source (1, 2 and 3) in separate columns, in the following format:
Timestamp 1 2 3
0 15 10 0
5 10 5 10
I think I can use df.groupby with some conditions to get it, but I am confused and tired now :(
I would appreciate a solution using pandas functions only.
The df_length approach above will not work here, because it sums the lengths of all sources together.
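One direction that might work, sketched on the dummy data only: group by both the 5-second bin and the Source column, sum Length, and then move Source into the columns with unstack. The column names, the epoch-seconds interpretation of the timestamps, and the use of pd.Grouper are assumptions for this sketch, not something taken from my real data.

```python
import pandas as pd

# Dummy data from the question, indexed by timestamp (assumed to be epoch seconds).
df = pd.DataFrame(
    {"Source": [1, 1, 1, 2, 2, 3, 1, 3, 2, 1], "Length": [5] * 10},
    index=pd.to_datetime(list(range(10)), unit="s"),
)
df.index.name = "Timestamp"

per_source = (
    # Two grouping keys: the 5-second bin of the index, and the Source column.
    df.groupby([pd.Grouper(freq="5s"), "Source"])["Length"]
    .sum()
    # Pivot the Source level into columns; bins with no rows for a source get 0.
    .unstack("Source", fill_value=0)
)
print(per_source)
```

On the dummy data this should give one row per 5-second bin and one column per source, matching the desired format. A pivot_table call with aggfunc="sum" would be another way to get the same shape.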