I have a data frame loaded from a CSV file:
time m_srcaddr log_type m_type m_fwd_bytes m_rev_bytes
1 1441590784 172.19.139.165 closed 10 295 146
11 1441590785 172.19.139.174 closed 10 441 183
65 1441590792 172.19.139.166 closed 10 441 200
68 1441590792 172.19.139.166 closed 10 3423 461
73 1441590792 172.19.139.172 closed 10 441 379
76 1441590792 172.19.139.172 closed 10 3423 789
77 1441590792 172.19.139.166 closed 10 441 463
81 1441590792 172.19.139.166 closed 10 3423 963
82 1441590793 172.19.139.173 closed 10 295 168
85 1441590793 172.19.139.172 closed 10 4929 542
89 1441590793 172.19.139.166 closed 10 5135 799
93 1441590793 172.19.139.166 closed 10 4929 510
96 1441590794 172.19.139.166 closed 10 0 198
98 1441590794 172.19.139.167 closed 10 0 455
100 1441590794 172.19.139.166 closed 10 4945 495
I am trying to group by m_srcaddr, sum m_fwd_bytes and m_rev_bytes per group, divide each sum by 1000, and store the results in new columns called total_fwd_size and total_rev_size:
subdata['total_fwd_size'] = subdata.groupby('m_srcaddr').sum().reset_index()['m_fwd_bytes']/1000
subdata['total_rev_size'] = subdata.groupby('m_srcaddr').sum().reset_index()['m_rev_bytes']/1000
This is not working: the newly created columns contain only NaN. Is there a better way to do the same thing?
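The NaN values most likely come from index misalignment: `groupby(...).sum().reset_index()` returns a frame indexed 0..n_groups-1, and assigning one of its columns back to `subdata` aligns by index, so rows whose index is not in that small range get NaN. A sketch of a fix using `transform`, which returns a result aligned to the original rows (the sample below is a small comma-separated subset of the frame shown above, used here only for illustration):

```python
import pandas as pd
from io import StringIO

# A small subset of the data shown above (assumption: comma-separated for brevity)
csv = """time,m_srcaddr,log_type,m_type,m_fwd_bytes,m_rev_bytes
1441590784,172.19.139.165,closed,10,295,146
1441590785,172.19.139.174,closed,10,441,183
1441590792,172.19.139.166,closed,10,441,200
1441590792,172.19.139.166,closed,10,3423,461
"""
subdata = pd.read_csv(StringIO(csv))

# transform('sum') returns a Series aligned to subdata's original index,
# so the per-group totals broadcast back to every row without NaN
subdata['total_fwd_size'] = subdata.groupby('m_srcaddr')['m_fwd_bytes'].transform('sum') / 1000
subdata['total_rev_size'] = subdata.groupby('m_srcaddr')['m_rev_bytes'].transform('sum') / 1000

# If a one-row-per-address summary table is wanted instead, aggregate and rename:
totals = (subdata.groupby('m_srcaddr')[['m_fwd_bytes', 'm_rev_bytes']].sum() / 1000) \
    .rename(columns={'m_fwd_bytes': 'total_fwd_size',
                     'm_rev_bytes': 'total_rev_size'}) \
    .reset_index()
```

With `transform`, every row of a group carries the same group total (e.g. both 172.19.139.166 rows get total_fwd_size 3.864), whereas `totals` collapses to one row per address.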