Pandas creating a new variable based on two existing variables

Question

I have the following code, which I think is highly inefficient. Is there a better way to do this type of common recoding in pandas?

df['F'] = 0
df['F'][(df['B'] >=3) & (df['C'] >=4.35)] = 1
df['F'][(df['B'] >=3) & (df['C'] < 4.35)] = 2
df['F'][(df['B'] < 3) & (df['C'] >=4.35)] = 3
df['F'][(df['B'] < 3) & (df['C'] < 4.35)] = 4

jezrael · Accepted Answer · 2018-06-14 06:49:20Z

11

Use numpy.select and cache boolean masks to variables for better performance:

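import numpy as np

# Compute each comparison once and reuse the masks.
m1 = df['B'] >= 3
m2 = df['C'] >= 4.35
m3 = df['C'] < 4.35
m4 = df['B'] < 3

# The first matching condition sets the value; rows matching none get the default.
df['F'] = np.select([m1 & m2, m1 & m3, m4 & m2, m4 & m3], [1, 2, 3, 4], default=0)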
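For anyone who wants to try this end to end, here is a minimal sketch on made-up data. The sample values are an assumption for illustration only; the column names B and C and the thresholds come from the question. A row with a missing B is included to show where the default of 0 ends up.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    "B": [5, 3, 1, 2, np.nan],
    "C": [4.35, 1.0, 9.9, 4.0, 5.0],
})

# Each comparison is evaluated once and reused in the combined conditions.
m1 = df["B"] >= 3
m2 = df["C"] >= 4.35
m3 = df["C"] < 4.35
m4 = df["B"] < 3

conditions = [m1 & m2, m1 & m3, m4 & m2, m4 & m3]
choices = [1, 2, 3, 4]

# Rows that satisfy none of the conditions (here, the NaN row) get the default.
df["F"] = np.select(conditions, choices, default=0)
print(df)
```

Because the choices are an ordinary list, they do not have to be 1 to 4; any values of a common type should work in the same position.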

edited Jun 14, 2018 at 6:49

answered Jun 14, 2018 at 6:47


1 Comment

Joe Over a year ago

Good one! I like it

Rob · Accepted Answer · 2018-06-14 06:56:21Z

3

In your specific case, you can make use of the fact that booleans are actually integers (False == 0, True == 1) and use simple arithmetic:

df['F'] = 1 + (df['C'] < 4.35) + 2 * (df['B'] < 3)

Note that this does not treat NaNs in your B and C columns specially: a comparison with NaN is always False, so missing values are counted as being at or above your limit.
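The NaN behaviour can be checked with a small sketch. The data and the F_checked column are assumptions made up for illustration, and masking the result is only one possible way to handle missing inputs, not something the answer prescribes.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first three rows each contain a missing value.
df = pd.DataFrame({
    "B": [np.nan, 1.0, 5.0, 1.0],
    "C": [5.0, np.nan, np.nan, 1.0],
})

# Comparisons with NaN are False, so NaN behaves like a value at or above the limit.
df["F"] = 1 + (df["C"] < 4.35) + 2 * (df["B"] < 3)

# One possible guard: blank out the result wherever either input is missing.
has_nan = df[["B", "C"]].isna().any(axis=1)
df["F_checked"] = df["F"].where(~has_nan)
print(df)
```

Note that F_checked becomes a float column, since it has to hold NaN.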

answered Jun 14, 2018 at 6:56


2 Comments

RJL Over a year ago

Clever, thanks for the solution. I am looking for a generic solution because we do this type of data processing all the time, and sometimes the target values may not line up arithmetically as 1, 2, 3, 4.

Rob Over a year ago

This answer is in some sense more general, because it is easier to add more columns (by using 4 *, 8 *, etc.) without having to write out all combinations of masks (the number of which grows exponentially).
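The two comments above can be combined in a rough sketch: build a bit-style code from the boolean flags, then look the code up in an array of arbitrary target values. The sample data and the label strings are hypothetical, and this is just one way to do it, assuming no missing values in B and C.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names and thresholds follow the question.
df = pd.DataFrame({"B": [5, 3, 1, 2], "C": [4.35, 1.0, 9.9, 4.0]})

# Each flag contributes one bit: weight 1 for C, 2 for B
# (a third or fourth column would use 4, 8, and so on).
code = (df["C"] < 4.35).astype(int) + 2 * (df["B"] < 3).astype(int)

# Hypothetical target values indexed by the combined code 0..3,
# written here as "<B level>/<C level>".
labels = np.array(["high/high", "high/low", "low/high", "low/low"])
df["F"] = labels[code.to_numpy()]
print(df)
```

The lookup array can hold any values, so the result no longer has to follow the 1, 2, 3, 4 pattern.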
