49

How to apply conditional logic to a Pandas DataFrame.

See the DataFrame shown below:

   data desired_output
0     1          False
1     2          False
2     3           True
3     4           True

My original data is shown in the 'data' column and the desired_output is shown next to it. If the number in 'data' is below 2.5, the desired_output is False; otherwise it is True.

I could write a loop and reconstruct the DataFrame... but that would be 'un-pythonic'.

3
  • maybe I don't know pandas, but it seems that you have two numbers in data -- which one are you checking against (seemingly the one on the right? What relevance is the number on the left?) Commented Feb 5, 2013 at 18:26
  • 4
    the number on the left is the index and the one on the right is the data Commented Feb 5, 2013 at 18:31
  • Does this answer your question? Pandas conditional creation of a series/dataframe column Commented Jan 25, 2020 at 19:14

5 Answers

78
In [1]: df
Out[1]:
   data
0     1
1     2
2     3
3     4

You want to apply a function that conditionally returns a value based on the selected dataframe column.

In [2]: df['data'].apply(lambda x: 'true' if x > 2.5 else 'false')
Out[2]:
0    false
1    false
2     true
3     true
Name: data

You can then assign that returned column to a new column in your dataframe:

In [3]: df['desired_output'] = df['data'].apply(lambda x: 'true' if x > 2.5 else 'false')

In [4]: df
Out[4]:
   data desired_output
0     1          false
1     2          false
2     3           true
3     4           true

2 Comments

Although this answer is more verbose and not as simple as the answer @Jasc gave, it is more general and can be applied to other situations in which one wants output other than true and false.
apply + lambda is not recommended for easily vectorisable operations. Use np.where or loc methods instead to utilize Pandas / NumPy vectorisation.
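
To illustrate the comment above, here is a minimal sketch of the two vectorised alternatives it names, np.where and .loc. The labels "high" and "low" and the column names label_where and label_loc are placeholders chosen for this example; they do not appear in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"data": [1, 2, 3, 4]})

# np.where returns the first value where the condition holds, else the second.
df["label_where"] = np.where(df["data"] > 2.5, "high", "low")

# Same result with .loc: fill in a default, then overwrite the matching rows.
df["label_loc"] = "low"
df.loc[df["data"] > 2.5, "label_loc"] = "high"

print(df)
```

Both forms evaluate the comparison once over the whole column instead of calling a Python function per row, which is the point the comment is making.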
33

Just compare the column with that value:

In [9]: df = pandas.DataFrame([1,2,3,4], columns=["data"])

In [10]: df
Out[10]: 
   data
0     1
1     2
2     3
3     4

In [11]: df["desired"] = df["data"] > 2.5

In [12]: df
Out[12]: 
   data desired
0     1   False
1     2   False
2     3    True
3     4    True

18
In [34]: import pandas as pd

In [35]: import numpy as np

In [36]: df = pd.DataFrame([1,2,3,4], columns=["data"])

In [37]: df
Out[37]: 
   data
0     1
1     2
2     3
3     4

In [38]: df["desired_output"] = np.where(df["data"] < 2.5, "False", "True")

In [39]: df
Out[39]: 
   data desired_output
0     1          False
1     2          False
2     3           True
3     4           True

1 Comment

This is good, but the < seems unnecessarily confusing. If the condition is true, the first value results, if false the second value results. So it seems far more clear (and equivalent) to have the right side = np.where(df["data"] >= 2.5, "True", "False")
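
A quick check of the equivalence claimed in the comment above, as a sketch on the four-row example from the question. The variable names original and flipped are made up for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"data": [1, 2, 3, 4]})

# Form used in the answer: condition is "below 2.5", so "False" comes first.
original = np.where(df["data"] < 2.5, "False", "True")

# Form suggested in the comment: condition and both values are flipped.
flipped = np.where(df["data"] >= 2.5, "True", "False")

print((original == flipped).all())
```

Note that both forms produce the strings "True" and "False", not booleans. The two forms also stop agreeing if the column contains NaN, because any comparison with NaN is False and each form then falls through to its second value.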
15

In this specific example, where the DataFrame has only one column, you can write this elegantly as:

df['desired_output'] = df.le(2.5)

le tests whether elements are less than or equal to 2.5. Similarly, lt tests for less than, gt for greater than, and ge for greater than or equal to.

1 Comment

OP wants to return False if df['data'] < 2.5. So you should use gt here.
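
Following the comment above, a minimal sketch of the gt variant. It is applied to the 'data' column rather than to the whole DataFrame, so it does not depend on the frame having a single column.

```python
import pandas as pd

df = pd.DataFrame({"data": [1, 2, 3, 4]})

# gt(2.5) is the method form of `> 2.5`; values below 2.5 map to False.
df["desired_output"] = df["data"].gt(2.5)

print(df)
```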
0

You can also use eval here:

In [3]: df.eval('desired_output = data >= 2.5', inplace=True)

In [4]: df
Out[4]: 
   data  desired_output
0     1           False
1     2           False
2     3            True
3     4            True

Since inplace=True you don't need to assign it back to df.
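
For comparison, a sketch of the same expression without inplace=True. In that case eval leaves df untouched and returns a new DataFrame with the extra column; the name result is just a placeholder for this example.

```python
import pandas as pd

df = pd.DataFrame({"data": [1, 2, 3, 4]})

# Without inplace=True, eval returns a copy that includes the new column.
result = df.eval("desired_output = data >= 2.5")

print(result)
print(list(df.columns))  # the original frame still has only 'data'
```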
