How to use if statements to read from two columns in Python?

Question

I have a file called students.csv with several columns. I want to apply conditions to two of them, the gender and the scores, and display the male students who have the highest scores (in descending order). So I need a script that reads the CSV file and combines these two columns (gender and scores).

I tried to use:

import pandas as pd

data = pd.read_csv('students.csv')

print(data[data["Gender"] == 1])

Here I encoded male students as 1 and female students as 0. But I don't know how to print the male students who have the highest scores.

You need to sort the filtered df or just get the max of that column: data.loc[data['Gender']==1, 'Scores'].max()
– EdChum, Commented Mar 16, 2016 at 10:15

Community · Accepted Answer · 2020-06-20 09:12:55Z

1

You can use loc to select the rows where Gender is 1, then nlargest with the parameter n if you need more than one value:

n : int

Return this many descending sorted values

print(data)
   Scores  Gender
0      10       0
1       5       1
2       5       0
3       7       1
4       8       1
5       3       0

print(data.loc[data['Gender']==1, 'Scores'].nlargest(n=3))
4    8
3    7
1    5
Name: Scores, dtype: int64

If you need only the highest score, use max, as EdChum mentioned in a comment:

print(data.loc[data['Gender']==1, 'Scores'].max())
8
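
For reference, here is a self-contained Python 3 sketch of the two calls above. The DataFrame is rebuilt by hand from the sample shown (in practice it would come from pd.read_csv), and the variable names male_scores, top_three and highest are illustrative only.

```python
import pandas as pd

# Sample data copied from the example above (Gender: 1 = male, 0 = female).
data = pd.DataFrame({
    "Scores": [10, 5, 5, 7, 8, 3],
    "Gender": [0, 1, 0, 1, 1, 0],
})

# Boolean mask on Gender, then pick the Scores column in one loc call.
male_scores = data.loc[data["Gender"] == 1, "Scores"]

# Three largest male scores, already sorted in descending order.
top_three = male_scores.nlargest(n=3)

# Single highest male score.
highest = male_scores.max()

print(top_three)
print(highest)
```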

Or use groupby on Gender with nlargest to get the top scores for every gender:

print(data.groupby('Gender')['Scores'].nlargest(n=2))
Gender   
0       0    10
        2     5
1       4     8
        3     7
dtype: int64

If you need names, you can merge on both indexes:

print(data)
  Names  Scores  Gender
0     a      10       0
1     b       5       1
2     c       5       0
3     d       7       1
4     e       8       1
5     f       3       0

print(data.groupby('Gender')['Scores'].nlargest(n=2).reset_index(level=0, name='Max'))
   Gender  Max
0       0   10
2       0    5
4       1    8
3       1    7

df = pd.merge(data[['Names']],
              data.groupby('Gender')['Scores'].nlargest(n=2).reset_index(level=0, name='Max'),
              left_index=True,
              right_index=True)
print(df)
               
  Names  Gender  Max
0     a       0   10
2     c       0    5
4     e       1    8
3     d       1    7
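
As a possible alternative to the merge above, this minimal sketch sorts by Gender and by descending Scores, then takes the first two rows of each group with groupby(...).head(2). That keeps the Names column without a separate merge step. The sample data is rebuilt by hand so the snippet runs on its own, and the name top_two is made up for illustration.

```python
import pandas as pd

# Sample data copied from the example above.
data = pd.DataFrame({
    "Names": ["a", "b", "c", "d", "e", "f"],
    "Scores": [10, 5, 5, 7, 8, 3],
    "Gender": [0, 1, 0, 1, 1, 0],
})

# Sort so that each gender's rows are contiguous and ordered best-first,
# then keep the first two rows of every gender group.
top_two = (
    data.sort_values(["Gender", "Scores"], ascending=[True, False])
        .groupby("Gender")
        .head(2)
)

print(top_two[["Names", "Gender", "Scores"]])
```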

If you need only one gender, use concat:

print(data)
  Names  Scores  Gender
0     a      10       0
1     b       5       1
2     c       5       0
3     d       7       1
4     e       8       1
5     f       3       0

print(data.loc[data['Gender']==1, 'Scores'].nlargest(n=2))
4    8
3    7
Name: Scores, dtype: int64

print(pd.concat([data['Names'],
                 data.loc[data['Gender']==1, 'Scores'].nlargest(n=2)],
                axis=1,
                join='inner'))
                 
  Names  Scores
4     e       8
3     d       7

Or, as a simpler solution, use loc again:

print(data)
  Names  Scores  Gender
0     a      10       0
1     b       5       1
2     c       5       0
3     d       7       1
4     e       8       1
5     f       3       0

print(data.loc[data['Gender'] == 1, 'Scores'].nlargest(n=2).index)
Int64Index([4, 3], dtype='int64')

print(data.loc[data.loc[data['Gender'] == 1, 'Scores'].nlargest(n=2).index, ['Names', 'Scores']])
  Names  Scores
4     e       8
3     d       7
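
The same rows can probably be reached more directly with DataFrame.nlargest, which takes the number of rows and the column to rank by. This is a sketch under the assumption that the data looks like the sample above. The name top_males is illustrative.

```python
import pandas as pd

# Sample data copied from the example above.
data = pd.DataFrame({
    "Names": ["a", "b", "c", "d", "e", "f"],
    "Scores": [10, 5, 5, 7, 8, 3],
    "Gender": [0, 1, 0, 1, 1, 0],
})

# Filter the male rows first, then keep the two rows with the largest Scores.
top_males = data[data["Gender"] == 1].nlargest(2, "Scores")[["Names", "Scores"]]

print(top_males)
```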

edited Jun 20, 2020 at 9:12

CommunityBot

answered Mar 16, 2016 at 10:26

jezrael

2 Comments

S. Slade Over a year ago

Thank you guys. I think nlargest is more appropriate for my work. Just one more thing: can I add the names of the students together with the results? I mean, I want to display the names of the students who have the highest scores as well.

S. Slade Over a year ago

Thanks jezrael - That's great. Since I only need to display the male students, I tried: df =pd.merge(data[['Names']], data.groupby['Gender']==1 and I completed the script. But it does not give me the male students!!

dl.meteo · Accepted Answer · 2016-03-16 10:30:36Z

1

You can also use

pandas.DataFrame.sort_values(by='Scores')
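
A minimal sketch of how sort_values might be combined with the gender filter from the question, assuming the column names Gender and Scores and the 1 = male encoding used there. The sample rows and the name males_sorted are illustrative only.

```python
import pandas as pd

# Illustrative data; in practice this would come from pd.read_csv('students.csv').
data = pd.DataFrame({
    "Names": ["a", "b", "c", "d", "e", "f"],
    "Scores": [10, 5, 5, 7, 8, 3],
    "Gender": [0, 1, 0, 1, 1, 0],
})

# Keep only the male rows, then order them from highest to lowest score.
males_sorted = data[data["Gender"] == 1].sort_values(by="Scores", ascending=False)

print(males_sorted)
```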

answered Mar 16, 2016 at 10:30

dl.meteo

Gil Baggio · Accepted Answer · 2016-03-16 13:48:08Z

1

You could use the .max() method in pandas:

import pandas as pd

df = pd.read_csv("student.csv")

data = df[df["Gender"]==1].max()

print(data)

Output:

stud       daniel
Gender     1
marks     78
dtype: object
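
One caveat worth checking: DataFrame.max() computes the maximum of each column independently, so the name and the mark in the result may come from different rows. The sketch below uses made-up rows (the column names stud, Gender and marks follow the output above) to show the difference, and uses idxmax to fetch the actual row of the top-scoring male student.

```python
import pandas as pd

# Hypothetical rows chosen so that the alphabetically last name
# does not belong to the student with the highest mark.
df = pd.DataFrame({
    "stud": ["adam", "daniel", "bella", "chris"],
    "Gender": [1, 1, 0, 1],
    "marks": [78, 60, 90, 55],
})

males = df[df["Gender"] == 1]

# Column-wise maxima: the largest name and the largest mark, taken separately.
column_max = males.max()

# Row of the male student whose mark is highest.
top_row = males.loc[males["marks"].idxmax()]

print(column_max)
print(top_row)
```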

edited Mar 16, 2016 at 13:48

answered Mar 16, 2016 at 10:33

Gil Baggio
