Conditional Logic on Pandas DataFrame [duplicate]

Question

How can I apply conditional logic to a Pandas DataFrame?

See the DataFrame shown below:

   data desired_output
0     1          False
1     2          False
2     3           True
3     4           True

My original data is shown in the 'data' column, and the desired_output is shown next to it. If the number in 'data' is below 2.5, the desired_output is False; otherwise it is True.

I could apply a loop and reconstruct the DataFrame, but that would be 'un-pythonic'.

Maybe I don't know pandas, but it seems that you have two numbers in data. Which one are you checking against (seemingly the one on the right)? What relevance does the number on the left have?
– mgilson, Commented Feb 5, 2013 at 18:26
The number on the left is the index and the one on the right is the data.
– nitin, Commented Feb 5, 2013 at 18:31
Does this answer your question? Pandas conditional creation of a series/dataframe column
– AMC, Commented Jan 25, 2020 at 19:14

Zelazny7 · Accepted Answer · 2013-02-05 18:35:28Z

78

In [1]: df
Out[1]:
   data
0     1
1     2
2     3
3     4

You want to apply a function that conditionally returns a value based on the selected dataframe column.

In [2]: df['data'].apply(lambda x: 'true' if x <= 2.5 else 'false')
Out[2]:
0     true
1     true
2    false
3    false
Name: data

You can then assign that returned column to a new column in your dataframe:

In [3]: df['desired_output'] = df['data'].apply(lambda x: 'true' if x <= 2.5 else 'false')

In [4]: df
Out[4]:
   data desired_output
0     1           true
1     2           true
2     3          false
3     4          false
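Because apply accepts any function, the same pattern extends to more than two outcomes. The following is a minimal sketch of that idea; the helper name `label`, the column name `band`, and the thresholds are hypothetical and chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"data": [1, 2, 3, 4]})

def label(x):
    # Hypothetical bands, not taken from the question.
    if x < 1.5:
        return "low"
    if x < 3.5:
        return "mid"
    return "high"

# apply calls label once per element and collects the results in a new Series.
df["band"] = df["data"].apply(label)
print(df)
```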

answered Feb 5, 2013 at 18:35

Zelazny7


2 Comments

Jacques Mathieu Over a year ago

Although this answer is more verbose and not as simple as the answer @Jasc gave, it is more general and can be applied to other situations in which one wants output other than true and false.

jpp Over a year ago

apply + lambda is not recommended for easily vectorisable operations. Use np.where or loc methods instead to utilize Pandas / NumPy vectorisation.
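The vectorised alternatives mentioned in the comment above could look roughly like this. It is a sketch, not the commenter's own code: the column names `via_where` and `via_loc` are made up for the example, and the labels follow the question's desired output (values below 2.5 map to 'false'), which is the reverse of the mapping used in the answer's session.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"data": [1, 2, 3, 4]})

# np.where picks the first label where the condition holds, the second elsewhere.
df["via_where"] = np.where(df["data"] > 2.5, "true", "false")

# .loc: start from a default label, then overwrite the rows matching the mask.
df["via_loc"] = "false"
df.loc[df["data"] > 2.5, "via_loc"] = "true"

print(df)
```

Both forms evaluate the comparison once for the whole column instead of calling a Python function per row.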

Jasc (community wiki, 2 revs) · Accepted Answer · 2013-09-30 15:45:39Z

33

Just compare the column with that value:

In [8]: import pandas

In [9]: df = pandas.DataFrame([1,2,3,4], columns=["data"])

In [10]: df
Out[10]: 
   data
0     1
1     2
2     3
3     4

In [11]: df["desired"] = df["data"] > 2.5
In [12]: df
Out[12]: 
   data desired
0     1   False
1     2   False
2     3    True
3     4    True
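The comparison returns a boolean Series, so it can be stored in a variable and reused. The sketch below assumes the same four-row frame; the names `mask` and `selected` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"data": [1, 2, 3, 4]})

# Element-wise comparison: one boolean per row, no explicit loop.
mask = df["data"] > 2.5
df["desired"] = mask

# The same mask can also be used to select the matching rows.
selected = df[mask]
print(mask.dtype)
print(selected)
```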

edited Sep 30, 2013 at 15:45

community wiki

2 revs
Jasc


Surya Chhetri · Accepted Answer · 2017-03-17 02:47:57Z

18

In [34]: import pandas as pd

In [35]: import numpy as np

In [36]: df = pd.DataFrame([1,2,3,4], columns=["data"])

In [37]: df
Out[37]: 
   data
0     1
1     2
2     3
3     4

In [38]: df["desired_output"] = np.where(df["data"] < 2.5, "False", "True")

In [39]: df
Out[39]: 
   data desired_output
0     1          False
1     2          False
2     3           True
3     4           True

answered Mar 17, 2017 at 2:47

Surya Chhetri


1 Comment

Wesley Kitlasten Over a year ago

This is good, but the < seems unnecessarily confusing. If the condition is true, the first value results, if false the second value results. So it seems far more clear (and equivalent) to have the right side = np.where(df["data"] >= 2.5, "True", "False")
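A quick way to check the equivalence claimed in this comment is to build both arrays and compare them. This is a sketch on the example data only; the variable names are invented for the check.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"data": [1, 2, 3, 4]})

# Form used in the answer: condition "< 2.5", labels in False/True order.
original = np.where(df["data"] < 2.5, "False", "True")

# Form suggested in the comment: condition ">= 2.5", labels in True/False order.
flipped = np.where(df["data"] >= 2.5, "True", "False")

same = np.array_equal(original, flipped)
print(same)
```

The two forms agree here because the column has no missing values. A NaN fails both comparisons, so the first form would label it "True" and the second "False".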

Andy Hayden · Accepted Answer · 2013-02-05 21:58:25Z

15

In this specific example, where the DataFrame is only one column, you can write this elegantly as:

df['desired_output'] = df.le(2.5)

le tests whether elements are less than or equal to 2.5. Similarly, lt tests for less than, gt for greater than, and ge for greater than or equal to.

answered Feb 5, 2013 at 21:58

Andy Hayden


1 Comment

rachwa Over a year ago

OP wants to return False if df['data'] < 2.5. So you should use gt here.
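Following this comment, a version using gt might look like the sketch below. Calling the method on the 'data' column rather than on the whole DataFrame is a choice made here so that the line keeps working once the frame has more than one column.

```python
import pandas as pd

df = pd.DataFrame({"data": [1, 2, 3, 4]})

# gt(2.5) is the method form of "> 2.5"; values below 2.5 become False.
df["desired_output"] = df["data"].gt(2.5)
print(df)
```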

rachwa · Accepted Answer · 2022-06-19 17:10:51Z

0

You can also use eval here:

In [3]: df.eval('desired_output = data >= 2.5', inplace=True)

In [4]: df
Out[4]: 
   data  desired_output
0     1           False
1     2           False
2     3            True
3     4            True

Because inplace=True is set, you don't need to assign the result back to df.
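For comparison, here is a sketch of the same expression without inplace=True. In that case eval leaves df untouched and returns a new DataFrame containing the extra column; the name `result` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"data": [1, 2, 3, 4]})

# Without inplace=True, the assignment inside the expression is applied to a copy.
result = df.eval("desired_output = data >= 2.5")

print(list(df.columns))
print(result)
```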

answered Jun 19, 2022 at 17:10

rachwa

