Grouping column data in Pandas Dataframes

Question

I have a pandas DataFrame (df) with many columns. For the sake of simplicity, I am posting only three columns with dummy data here.

Timestamp    Source    Length
0            1              5
1            1              5
2            1              5
3            2              5
4            2              5
5            3              5
6            1              5
7            3              5
8            2              5
9            1              5

Using pandas functions, I first set the timestamp as the index of the df.

index = pd.DatetimeIndex(data[data.columns[1]]*10**9) # Convert timestamp
df = df.set_index(index) # Set Timestamp as index

Next, I can use groupby and pd.TimeGrouper to group the data into 5-second bins and compute the total length for each bin as follows:

df_length = data[data.columns[5]].groupby(pd.TimeGrouper('5S')).sum()

So df_length should look like:

Timestamp     Length
0             25
5             25
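For reference, the setup above can be reproduced on the dummy data alone. This is a minimal sketch, assuming Timestamp holds seconds and using current pandas calls (pd.to_datetime and resample) in place of the column-position lookups and pd.TimeGrouper from my real code:

```python
import pandas as pd

# Dummy data from the question; Timestamp is assumed to be in seconds.
df = pd.DataFrame({
    "Timestamp": range(10),
    "Source": [1, 1, 1, 2, 2, 3, 1, 3, 2, 1],
    "Length": [5] * 10,
})

# Convert the seconds to datetimes and use them as the index.
df["Timestamp"] = pd.to_datetime(df["Timestamp"], unit="s")
df = df.set_index("Timestamp")

# Sum Length within each 5-second bin.
df_length = df["Length"].resample("5s").sum()
print(df_length)
```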

Now the problem is: I want to keep the same 5-second bins, but compute the total length for each source (1, 2 and 3) in separate columns, in the following format:

Timestamp    1     2     3
0            15    10    0
5            10    5     10

I think I can use df.groupby with some conditions to get it, but I am confused and tired now :(

I would appreciate a solution that uses pandas functions only.

Your "dummy data" does not have 5 columns, so your df_length code will not work.
– asongtoruin, Commented Sep 25, 2017 at 10:32

jezrael · Accepted Answer · 2017-09-25 10:43:26Z

1

You can pass the Source column as a second key to groupby, which gives a Series with a MultiIndex, and then reshape it with unstack, which moves the last level of the MultiIndex into the columns:

print (df[df.columns[2]].groupby([pd.TimeGrouper('5S'), df['Source']]).sum())
Timestamp            Source
1970-01-01 00:00:00  1         15
                     2         10
1970-01-01 00:00:05  1         10
                     2          5
                     3         10
Name: Length, dtype: int64

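# Parentheses are needed so the chained calls can span several lines.
# On newer pandas, where pd.TimeGrouper is no longer available,
# use pd.Grouper(freq='5S') instead.
df1 = (df[df.columns[2]].groupby([pd.TimeGrouper('5S'), df['Source']])
                        .sum()
                        .unstack(fill_value=0))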
print (df1)
Source                1   2   3
Timestamp                      
1970-01-01 00:00:00  15  10   0
1970-01-01 00:00:05  10   5  10
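
The same two steps can be sketched on a current pandas version, where pd.Grouper(freq=...) takes the place of pd.TimeGrouper. The DataFrame construction below is an assumption based on the dummy data in the question, and the names grouped and wide are only illustrative:

```python
import pandas as pd

# Dummy data from the question, with Timestamp (seconds) as a DatetimeIndex.
df = pd.DataFrame({
    "Timestamp": pd.to_datetime(list(range(10)), unit="s"),
    "Source": [1, 1, 1, 2, 2, 3, 1, 3, 2, 1],
    "Length": [5] * 10,
}).set_index("Timestamp")

# Step 1: group by 5-second bin of the index AND by Source, then sum Length.
# The result is a Series indexed by (Timestamp, Source).
grouped = df.groupby([pd.Grouper(freq="5s"), "Source"])["Length"].sum()

# Step 2: move the Source level into the columns; combinations that never
# occur (for example Source 3 in the first bin) are filled with 0.
wide = grouped.unstack(fill_value=0)
print(wide)
```

Grouping on two keys first and reshaping afterwards keeps the aggregation and the layout separate, so the same pattern carries over to other aggregations such as mean or count.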

edited Sep 25, 2017 at 10:43

answered Sep 25, 2017 at 10:36

jezrael

2 Comments

asongtoruin Over a year ago

I was going to suggest using a pivot table but this is much better. Nice work!
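
For comparison, the pivot table idea mentioned here might look like the sketch below. It is not code from this thread: the DataFrame is rebuilt from the question's dummy data, Timestamp is kept as a regular column, and the name wide is illustrative.

```python
import pandas as pd

# Dummy data from the question; Timestamp (seconds) stays a regular column.
df = pd.DataFrame({
    "Timestamp": pd.to_datetime(list(range(10)), unit="s"),
    "Source": [1, 1, 1, 2, 2, 3, 1, 3, 2, 1],
    "Length": [5] * 10,
})

# Rows: 5-second bins of Timestamp. Columns: one per Source.
# Cells: summed Length, with 0 where a Source has no rows in a bin.
wide = df.pivot_table(
    index=pd.Grouper(key="Timestamp", freq="5s"),
    columns="Source",
    values="Length",
    aggfunc="sum",
    fill_value=0,
)
print(wide)
```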

Muhammad Asif Khan Over a year ago

Thank you so much. It works! However, would you please explain how the code works? I might face other, similar grouping problems with my data.
