3

I wanted to calculate the percentage of some object in one hour ('time'), so I wrote a lambda function. I think it does the job, but the index columns (the columns the dataframe is grouped by) disappeared.

df = df.groupby(['id', 'name', 'time', 'object', 'type'], as_index=True, sort=False)['col1', 'col2', 'col3', 'col4', 'col5'].apply(lambda x: x * 100 / 3600).reset_index()

After running that code, I printed df.columns and got this:

Index([u'index', u'col1', u'col2', u'col3',
       u'col4', u'col5'],
      dtype='object')

If needed, I can add a table with sample values for each column. Thanks in advance.

    Please provide the data you are using or a sample of it that enables us to test it. Commented May 11, 2018 at 13:17

4 Answers

3

Moving the loop outward will make the code run significantly faster:

for c in ['col1', 'col2', 'col3', 'col4', 'col5']:
    df[c] *= 100. / 3600

This is because the calculation in each loop iteration is done in a vectorized way on a whole column.

This also won't modify the index in any way.
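
A minimal sketch of this, assuming a small made-up frame (the values and the `id`/`name` index are illustrative, not the asker's data), suggests that scaling the columns in place leaves the index untouched:

```python
import pandas as pd

# Hypothetical sample data; the asker's real values are not shown in the question.
df = pd.DataFrame(
    {"id": [1, 2, 1], "name": ["A", "B", "A"],
     "col1": [36.0, 72.0, 18.0], "col2": [360.0, 180.0, 90.0]}
).set_index(["id", "name"])

index_before = df.index.copy()

# Scale each column in place; each iteration is a single vectorized operation.
for c in ["col1", "col2"]:
    df[c] *= 100.0 / 3600

# Compare the index after the loop with the copy taken before it.
index_unchanged = df.index.equals(index_before)
print(df)
print(index_unchanged)
```

Only the values in the listed columns change; no grouping or `reset_index()` call is involved, so there is nothing that could drop the index levels.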


2

pd.DataFrame.groupby is used to aggregate data, not to apply a function to multiple columns.

For simple functions, you should look for a vectorised solution. For example:

import pandas as pd

# set up simple dataframe
df = pd.DataFrame({'id': [1, 2, 1], 'name': ['A', 'B', 'A'],
                   'col1': [5, 6, 8], 'col2': [9, 4, 5]})

# apply logic in a vectorised way on multiple columns
df[['col1', 'col2']] = df[['col1', 'col2']].values * 100 / 3600

If you wish to set your index as multiple columns, and are keen to use pd.DataFrame.apply, this is possible as two separate steps. For example:

df = df.set_index(['id', 'name'])
df[['col1', 'col2']] = df[['col1', 'col2']].apply(lambda x: x * 100 / 3600)
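
As a quick check, here is a sketch on the same toy frame (the names `via_apply` and `via_arithmetic` are introduced only for illustration). It compares the `apply` version with plain column arithmetic and inspects the index afterwards:

```python
import pandas as pd

# Same toy frame as in the example above.
df = pd.DataFrame({'id': [1, 2, 1], 'name': ['A', 'B', 'A'],
                   'col1': [5, 6, 8], 'col2': [9, 4, 5]})
df = df.set_index(['id', 'name'])
cols = ['col1', 'col2']

# Column-wise apply with a lambda versus arithmetic on the whole sub-frame.
via_apply = df[cols].apply(lambda x: x * 100 / 3600)
via_arithmetic = df[cols] * 100 / 3600

# Both results should hold the same values and keep the (id, name) index.
same_values = via_apply.equals(via_arithmetic)
index_names = list(via_apply.index.names)
print(same_values, index_names)
```

Since both routes give the same frame here, the lambda adds nothing for a simple scaling like this one; it would matter only for logic that cannot be expressed as column arithmetic.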


1

You call .reset_index(), which resets the index. Take a look at the pandas documentation and you'll see that .reset_index() moves the index into the columns.
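
A small sketch of that behaviour, using made-up frames rather than the asker's data: a named MultiIndex comes back as columns with those names, while an unnamed index becomes a column called `index`. One plausible reading of the question's output is that the result of `apply` carried only the original unnamed row index, which would explain the `u'index'` column.

```python
import pandas as pd

# Hypothetical frame with a named two-level index.
named = pd.DataFrame(
    {"col1": [5, 6]},
    index=pd.MultiIndex.from_tuples([(1, "A"), (2, "B")], names=["id", "name"]),
)
flat_named = named.reset_index()
print(list(flat_named.columns))    # ['id', 'name', 'col1']

# Hypothetical frame with a plain, unnamed index.
unnamed = pd.DataFrame({"col1": [5, 6]}, index=[10, 20])
flat_unnamed = unnamed.reset_index()
print(list(flat_unnamed.columns))  # ['index', 'col1']
```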


1

Using the data from Jpp's answer:

df[['col1','col2']]*=100/3600
df
Out[110]: 
       col1      col2  id name
0  0.138889  0.250000   1    A
1  0.166667  0.111111   2    B
2  0.222222  0.138889   1    A
