Suppose I have a DataFrame in which one of the columns (we'll call it 'power') holds integer values from 1 to 10000. I would like to produce a NumPy array with one value per row, indicating whether that row's 'power' value is greater than 9000.

I could do something like this:

import numpy as np

def categorize(frame):
    return np.array(frame['power'] > 9000)

This gives me a boolean array whose entries are True and False. However, suppose I want the array to contain 1 and -1 instead. How can I accomplish this without iterating through each row in the frame?

For background, the application is preparing data for binary classification via machine learning with scikit-learn.

1 Answer

You can use np.where for this: it picks one of two values for each element, depending on a boolean condition.

Consider the following:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': range(20)})
# True where 'a' is even, False otherwise
df['even'] = df.a % 2 == 0

Now even is a boolean column. To build an array with the values you want, use

np.where(df.even, 1, -1)

You can assign this back to the DataFrame, if you like:

df['foo'] = np.where(df.even, 1, -1)
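Applied to the question's case, the same idea might look like the sketch below. The sample data and the threshold parameter are made up for illustration; only the 'power' column name and the 9000 cutoff come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the 'power' column name is from the question.
df = pd.DataFrame({'power': [1, 9000, 9001, 10000, 42]})

def categorize(frame, threshold=9000):
    """Return a NumPy array with 1 where 'power' > threshold, else -1."""
    # np.where evaluates the condition element-wise, so no row loop is needed.
    return np.where(frame['power'] > threshold, 1, -1)

labels = categorize(df)
print(labels.tolist())  # [-1, -1, 1, 1, -1]
```

Note that the comparison is strict, so a row with exactly 9000 gets -1. The result is a plain NumPy integer array, which should be usable as the label vector for a scikit-learn binary classifier.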

See the pandas cookbook for more examples along these lines.
