How to write a conditional array operation on a Pandas DataFrame

Question

Suppose I have a DataFrame in which one column (call it 'power') holds integer values from 1 to 10000. I would like to produce a NumPy array containing, for each row, a value that indicates whether that row's 'power' value is greater than 9000.

I could do something like this:

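import numpy as np

def categorize(frame):
    # Element-wise comparison yields a boolean Series; convert it to a NumPy array
    return np.array(frame['power'] > 9000)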

This gives me a boolean array of True and False values. However, suppose I want the array to contain 1 and -1 instead of True and False. How can I accomplish this without iterating through each row of the frame?

For background, the application is preparing data for binary classification via machine learning with scikit-learn.

stackoverflow.com/questions/19913659/… – cheekybastard, commented Jul 3, 2015 at 0:34

Ami Tavory · Accepted Answer · 2015-07-01 17:35:33Z

You can use np.where for this kind of element-wise conditional; it works on the whole column at once, so no row iteration is needed.

Consider the following:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': range(20)})
df['even'] = df.a % 2 == 0

So now 'even' is a boolean column. To create the 1/-1 array you want, you can use

np.where(df.even, 1, -1)

You can assign this back to the DataFrame, if you like:

df['foo'] = np.where(df.even, 1, -1)
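Applied to the question's 'power' column, the same np.where call turns categorize into a loop-free function that returns 1 and -1 directly. This is a minimal sketch; the sample 'power' values are made up for illustration.

```python
import numpy as np
import pandas as pd


def categorize(frame):
    # The comparison produces a boolean Series; np.where picks 1 where it is
    # True and -1 where it is False, all in vectorized NumPy code.
    return np.where(frame['power'] > 9000, 1, -1)


# Hypothetical sample values for the 'power' column (range 1..10000).
df = pd.DataFrame({'power': [1, 9000, 9001, 10000, 4500]})
labels = categorize(df)
print(labels.tolist())  # [-1, -1, 1, 1, -1]
```

Note that 9000 itself maps to -1, because the condition is strictly greater than 9000.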

See the pandas cookbook for more recipes of this sort.
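An alternative that is not part of the original answer: booleans cast to 0 and 1, so the expression 2 * b - 1 maps True to 1 and False to -1 with plain arithmetic. The sketch below rebuilds the example frame and checks that both approaches agree.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': range(20)})
df['even'] = df.a % 2 == 0

# Arithmetic mapping: True -> 2*1 - 1 = 1, False -> 2*0 - 1 = -1.
arith = 2 * df['even'].astype(int) - 1

# The np.where approach from the answer.
via_where = np.where(df.even, 1, -1)

# Both produce the same 1/-1 labels.
assert (arith.to_numpy() == via_where).all()

df['foo'] = via_where
print(df.head(4))
```

np.where is the more general tool, since it accepts any pair of output values, while the arithmetic form only works for this specific 1/-1 encoding.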