How to replace Specific values of a particular column in Pandas Dataframe based on a certain condition?

Question

I have a Pandas dataframe which contains students and percentages of marks obtained by them. There are some students whose marks are shown as greater than 100%. Obviously these values are incorrect and I would like to replace all percentage values which are greater than 100% by NaN.

I have tried some code, but I can't quite get the result I want.

import numpy as np
import pandas as pd

new_DF = pd.DataFrame({'Student' : ['S1', 'S2', 'S3', 'S4', 'S5'],
                       'Percentages' : [85, 70, 101, 55, 120]})

#  Percentages  Student
#0          85       S1
#1          70       S2
#2         101       S3
#3          55       S4
#4         120       S5

new_DF[(new_DF.iloc[:, 0] > 100)] = np.NaN

#  Percentages  Student
#0        85.0       S1
#1        70.0       S2
#2         NaN      NaN
#3        55.0       S4
#4         NaN      NaN

As you can see, the code kind of works, but it replaces every value in each row where Percentages is greater than 100 with NaN. I would only like to replace the value in the Percentages column with NaN where it is greater than 100. Is there any way to do that?

anky · Accepted Answer · 2019-03-23 19:04:04Z

3

Try and use np.where:

new_DF.Percentages = np.where(new_DF.Percentages.gt(100), np.nan, new_DF.Percentages)

or

new_DF.loc[new_DF.Percentages.gt(100), 'Percentages'] = np.nan

print(new_DF)

  Student  Percentages
0      S1         85.0
1      S2         70.0
2      S3          NaN
3      S4         55.0
4      S5          NaN
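
For anyone who wants to check that the two variants agree, here is a minimal self-contained sketch using the data from the question. The helper `make_df` and the names `df_where` and `df_loc` are only for illustration. The cast to float in the second variant is an assumption on my part so that the column dtype is explicit before NaN is written into it.

```python
import numpy as np
import pandas as pd


def make_df():
    # Same sample data as in the question
    return pd.DataFrame({'Student': ['S1', 'S2', 'S3', 'S4', 'S5'],
                         'Percentages': [85, 70, 101, 55, 120]})


# Variant 1: np.where builds a new array, which is assigned back to the column
df_where = make_df()
df_where['Percentages'] = np.where(df_where['Percentages'].gt(100),
                                   np.nan,
                                   df_where['Percentages'])

# Variant 2: boolean mask plus column label in .loc, so only that column is touched
df_loc = make_df()
df_loc['Percentages'] = df_loc['Percentages'].astype(float)
df_loc.loc[df_loc['Percentages'].gt(100), 'Percentages'] = np.nan

# Both frames should now hold NaN for S3 and S5 only
print(df_where.equals(df_loc))
```

Either way, the Student column is left alone because only the Percentages column is selected for assignment.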

edited Mar 23, 2019 at 19:04

answered Mar 23, 2019 at 18:15

anky


2 Comments

anky Over a year ago

@JohnE Yes, though it also depends on the size of the df, I think? For larger dfs, shouldn't np.where be faster? BTW, uncommented now. :) Thanks

JohnE Over a year ago

Yeah, I think you are right. Generally np.where is very fast.

Loochie · Accepted Answer · 2019-03-23 19:14:15Z

2

Also,

new_DF.Percentages = new_DF.Percentages.apply(lambda x: np.nan if x > 100 else x)

or,

new_DF.Percentages = new_DF.Percentages.where(new_DF.Percentages <= 100, np.nan)
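
One thing that is easy to get backwards: Series.where keeps the values where the condition is True and replaces the rest, so the condition describes what to keep. To keep a score of exactly 100, the condition needs to be `<= 100`. Series.mask is the inverse and takes the condition for what to replace. A small sketch, assuming a made-up Series that includes the boundary value 100:

```python
import pandas as pd

# Hypothetical sample that includes the boundary value 100
s = pd.Series([85, 100, 101, 120], name='Percentages')

# where: keep values where the condition holds, replace the rest (NaN by default)
kept = s.where(s <= 100)

# mask: replace values where the condition holds, keep the rest
masked = s.mask(s > 100)

# Both spellings should agree, and 100 itself should survive
print(kept.equals(masked))
print(kept.tolist())
```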

edited Mar 23, 2019 at 19:14

answered Mar 23, 2019 at 18:22

Loochie


2 Comments

anky Over a year ago

This will work too. :) However, avoid apply when you can; it's slow.

Erfan Over a year ago

Agree with @anky_91, try to avoid .apply when it's not needed.
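
If you want to measure the difference yourself, something like the following rough sketch could be used. The column size, the random data and the function names `with_apply` and `with_mask` are my own assumptions for illustration, and the timings will vary with machine and library versions, so no numbers are quoted.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger column: 100,000 random integers between 0 and 150
rng = np.random.default_rng(0)
s = pd.Series(rng.integers(0, 151, size=100_000))


def with_apply():
    # Calls the Python lambda once per element
    return s.apply(lambda x: np.nan if x > 100 else x)


def with_mask():
    # Evaluates the comparison and the replacement on the whole column at once
    return s.mask(s > 100)


# Both approaches should give the same result
print(with_apply().equals(with_mask()))

# Total seconds for 5 runs of each approach
print('apply:', timeit.timeit(with_apply, number=5))
print('mask: ', timeit.timeit(with_mask, number=5))
```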

heena bawa · Accepted Answer · 2019-03-23 18:25:57Z

1

You can use .loc:

new_DF.loc[new_DF['Percentages'] > 100, 'Percentages'] = np.nan

Output:

  Student  Percentages
0      S1         85.0
1      S2         70.0
2      S3          NaN
3      S4         55.0
4      S5          NaN

answered Mar 23, 2019 at 18:25

heena bawa


2 Comments

anky Over a year ago

This is already in my solution (check the commented part); not sure how this is any different.

anky Over a year ago

Understood now :)

wafi · Accepted Answer · 2019-03-23 18:43:44Z

0

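import numpy as np
import pandas as pd

new_DF = pd.DataFrame({'Student': ['S1', 'S2', 'S3', 'S4', 'S5'],
                       'Percentages': [85, 70, 101, 55, 120]})

# Cast to float first so the column can hold NaN
new_DF['Percentages'] = new_DF['Percentages'].astype(float)

# Loop over the column and blank out any percentage above 100
for index, value in new_DF['Percentages'].items():
    if value > 100:
        # .at avoids chained assignment; np.nan is a real missing value, not the string "nan"
        new_DF.at[index, 'Percentages'] = np.nan

print(new_DF)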

answered Mar 23, 2019 at 18:43

wafi

