
I have a DataFrame:

ID,"url","app_name","used_at","active_seconds","device_connection","device_os","device_type","device_usage"
1ca9bb884462c3ba2391bf669c22d4bd,"",VK Client,2016-01-01 00:00:13,5,3g,ios,smartphone,home
b8f4df3f99ad786a77897c583d98f615,"",VKontakte,2016-01-01 00:01:45,107,wifi,android,smartphone,home
1ca9bb884462c3ba2391bf669c22d4bd,"",Twitter,2016-01-01 00:02:48,20,3g,ios,smartphone,home
1ca9bb884462c3ba2391bf669c22d4bd,"",VK Client,2016-01-01 00:03:08,796,3g,ios,smartphone,home
b8f4df3f99ad786a77897c583d98f615,"",WhatsApp Messenger,2016-01-01 00:03:32,70,wifi,android,smartphone,home
b8f4df3f99ad786a77897c583d98f615,"",VKontakte,2016-01-01 00:04:42,27,wifi,android,smartphone,home
b8f4df3f99ad786a77897c583d98f615,"",VKontakte,2016-01-01 00:05:30,5,wifi,android,smartphone,home
b8f4df3f99ad786a77897c583d98f615,"",WhatsApp Messenger,2016-01-01 00:05:36,47,wifi,android,smartphone,home
b8f4df3f99ad786a77897c583d98f615,"",VKontakte,2016-01-01 00:06:23,20,wifi,android,smartphone,home
a703114aa8a03495c3e042647212fa63,"",Instagram,2016-01-01 00:06:41,118,3g,android,smartphone,home
1637ce5a4c4868e694004528c642d0ac,"",Camera,2016-01-01 00:06:43,16,wifi,android,smartphone,home
1637ce5a4c4868e694004528c642d0ac,"",VKontakte,2016-01-01 00:07:00,45,wifi,android,smartphone,home
a703114aa8a03495c3e042647212fa63,"",VKontakte,2016-01-01 00:08:40,99,3g,android,smartphone,home
1637ce5a4c4868e694004528c642d0ac,"",VKontakte,2016-01-01 00:10:05,1,wifi,android,smartphone,home

I need to compute the share of each app_name for each ID. What I can't work out is the next step: the sum for each app within an ID should be divided by the sum over all apps for that ID, then multiplied by 100 to get a percentage. So far I have:

short = df.groupby(['ID', 'app_name']).agg({'app_name': len, 'active_seconds': sum}).rename(columns={'active_seconds': 'count_sec', 'app_name': 'sum_app'}).reset_index()

but that only returns the totals for each app. When I try

short = df.groupby(['ID', 'app_name']).agg({'app_name': len, 'active_seconds': sum / df.ID.app_name.sum() * 100}).rename(columns={'active_seconds': 'count_sec', 'app_name': 'sum_app'}).reset_index()

it raises an error.

How can I fix that?

  • Can you show the expected output? Commented Sep 27, 2016 at 14:30

1 Answer


IIUC you need:

# the outer parentheses let the method chain span several lines
short = (df.groupby(['ID', 'app_name'])
           .agg({'app_name': len,
                 'active_seconds': lambda x: 100 * x.sum() / df.active_seconds.sum()})
           .rename(columns={'active_seconds': 'count_sec', 'app_name': 'sum_app'})
           .reset_index())

print (short)

                                 ID            app_name  count_sec  sum_app
0  1637ce5a4c4868e694004528c642d0ac              Camera   1.162791        1
1  1637ce5a4c4868e694004528c642d0ac           VKontakte   3.343023        2
2  1ca9bb884462c3ba2391bf669c22d4bd             Twitter   1.453488        1
3  1ca9bb884462c3ba2391bf669c22d4bd           VK Client  58.212209        2
4  a703114aa8a03495c3e042647212fa63           Instagram   8.575581        1
5  a703114aa8a03495c3e042647212fa63           VKontakte   7.194767        1
6  b8f4df3f99ad786a77897c583d98f615           VKontakte  11.555233        4
7  b8f4df3f99ad786a77897c583d98f615  WhatsApp Messenger   8.502907        2

Another solution:

# aggregate first under a separate name (e.g. short1), then rescale the column
short1 = (df.groupby(['ID', 'app_name'])
            .agg({'app_name': len, 'active_seconds': sum})
            .rename(columns={'active_seconds': 'count_sec', 'app_name': 'sum_app'})
            .reset_index())
short1.count_sec = 100 * short1.count_sec / df.active_seconds.sum()
print (short1)
                                 ID            app_name  count_sec  sum_app
0  1637ce5a4c4868e694004528c642d0ac              Camera   1.162791        1
1  1637ce5a4c4868e694004528c642d0ac           VKontakte   3.343023        2
2  1ca9bb884462c3ba2391bf669c22d4bd             Twitter   1.453488        1
3  1ca9bb884462c3ba2391bf669c22d4bd           VK Client  58.212209        2
4  a703114aa8a03495c3e042647212fa63           Instagram   8.575581        1
5  a703114aa8a03495c3e042647212fa63           VKontakte   7.194767        1
6  b8f4df3f99ad786a77897c583d98f615           VKontakte  11.555233        4
7  b8f4df3f99ad786a77897c583d98f615  WhatsApp Messenger   8.502907        2
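
Both versions above divide by the total over the whole frame (df.active_seconds.sum()). If the share is instead meant relative to each ID's own total, as the wording of the question suggests, a minimal sketch could look like the one below. It assumes a pandas version that supports named aggregation (0.25 or later); the shortened IDs and the share_pct column name are made up for illustration.

```python
import io
import pandas as pd

# A few rows modelled on the question's sample; IDs are shortened for readability.
csv = io.StringIO("""ID,app_name,active_seconds
1ca9bb88,VK Client,5
b8f4df3f,VKontakte,107
1ca9bb88,Twitter,20
1ca9bb88,VK Client,796
b8f4df3f,WhatsApp Messenger,70
""")
df = pd.read_csv(csv)

# Per (ID, app_name): number of rows and total active seconds.
short = (df.groupby(['ID', 'app_name'])
           .agg(sum_app=('active_seconds', 'size'),
                count_sec=('active_seconds', 'sum'))
           .reset_index())

# Total seconds per ID, aligned row by row with `short`.
id_total = short.groupby('ID')['count_sec'].transform('sum')

# Share of each app within its own ID, in percent (hypothetical column name).
short['share_pct'] = 100 * short['count_sec'] / id_total
print(short)
```

With this variant the percentages add up to 100 within each ID rather than across the whole frame.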

5 Comments

My df is much larger, and it returns all 0s in the count_sec column. I tried multiplying by 10000, but that doesn't change anything.
I think it returns an int. How do I convert that to float?
Use .astype(float)
Where should I use it? 100 * x.sum() / df.active_seconds.sum().astype(float)
Yes, or try 100 * x.sum().astype(float) / df.active_seconds.sum()
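
For illustration, here is a small sketch of the cast discussed in these comments. It assumes the zeros come from integer division on Python 2, where dividing one integer by another truncates; on Python 3 the / operator already returns a float, so the cast is harmless but unnecessary. The toy frame is invented for the example.

```python
import pandas as pd

# Toy data, not the asker's real frame.
df = pd.DataFrame({'active_seconds': [5, 107, 20, 796]})

part = df['active_seconds'].iloc[:2].sum()   # integer sum of a subset (112)
total = df['active_seconds'].sum()           # integer sum of the column (928)

# Casting one operand to float forces true division even on Python 2,
# where an int / int expression would be truncated to an integer.
share = 100 * float(part) / total
print(share)
```

On Python 2, adding from __future__ import division at the top of the script is another way to get the same effect.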
