Python Pandas Dataframe get count of rows after filtering using values from multiple columns

Question

I have a dataframe that looks like the one below. I want to build a data profile by getting the following counts.

1) Count of unique student IDs (number of students). My answer works:

print(len(df['Student ID'].unique()))

2) Count of unique student IDs where International = N (number of non-international students).

My answer does not work: print(len(df1.loc[(df1['Student ID'].unique())['International Student'] == N]))

3) Count of unique student IDs where International = N and ATAR is not null (number of non-international students who have an ATAR).

4) Count of unique student IDs where ATAR is between 0 and 50.

One other question:

5) How can I create a new dataframe with only unique student IDs and all other columns, dropping all rows per student ID after the first?

Answers to questions 2-5 would be much appreciated.

Student_ID            International       marks      ATAR

119                   N                    60         80
119                   N                    70         80
119                   N                    75         80
129                   Y                    78         75
129                   Y                    60         75 
155                   Y                    85         
155                   Y                    80

jezrael · Accepted Answer · 2017-04-20 06:42:36Z


import numpy as np
import pandas as pd

# Sample data; reindex_axis was removed in pandas 1.0, so the column order is set with columns=
df = pd.DataFrame({
'International': ['N', 'N', 'N', 'Y', 'Y', 'Y', 'Y'], 
'marks': [60, 70, 75, 78, 60, 85, 80], 
'Student_ID': [119, 119, 130, 140, 155, 155, 155], 
'ATAR': [80.0, 20.0, np.nan, 50.0, 15.0, np.nan, np.nan]
}, columns=['Student_ID','International','marks','ATAR'])

print (df)
   Student_ID International  marks  ATAR
0         119             N     60  80.0
1         119             N     70  20.0
2         130             N     75   NaN
3         140             Y     78  50.0
4         155             Y     60  15.0
5         155             Y     85   NaN
6         155             Y     80   NaN

You need Series.nunique, combined with boolean indexing and loc to select a single column (a Series) from the filtered rows. For the new DataFrame, use drop_duplicates, which keeps the first row per Student_ID by default:

print(df['Student_ID'].nunique())
4
print(df.loc[df['International'] == 'N', 'Student_ID'].nunique())
2
print(df.loc[(df['International'] == 'N') & (df['ATAR'].notnull()), 'Student_ID'].nunique())
1
print(df.loc[df['ATAR'].between(0,50), 'Student_ID'].nunique())
3
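One detail worth checking for the last count: as far as I know, Series.between includes both endpoints by default and evaluates to False for NaN, so an ATAR of exactly 50 is counted and missing ATARs are not. A minimal sketch on the same sample data (the names mask and in_range_ids are only for illustration):

```python
import numpy as np
import pandas as pd

# Same Student_ID and ATAR values as the sample data in this answer
df = pd.DataFrame({
    'Student_ID': [119, 119, 130, 140, 155, 155, 155],
    'ATAR': [80.0, 20.0, np.nan, 50.0, 15.0, np.nan, np.nan],
})

# Boolean mask: True where 0 <= ATAR <= 50, False for NaN
mask = df['ATAR'].between(0, 50)
print(mask.tolist())
# [False, True, False, True, True, False, False]

# Count distinct students among the matching rows
in_range_ids = df.loc[mask, 'Student_ID'].nunique()
print(in_range_ids)
# 3
```

If the boundaries should be excluded, comparing explicitly with (df['ATAR'] > 0) & (df['ATAR'] < 50) avoids relying on the default.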

df1 = df.drop_duplicates('Student_ID')
print (df1)
   Student_ID International  marks  ATAR
0         119             N     60  80.0
2         130             N     75   NaN
3         140             Y     78  50.0
4         155             Y     60  15.0
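Since the goal in the question is a data profile, the individual counts could also be collected in one place. This is only a sketch of one possible layout, not part of the original answer; the helper count_students and the label names are made up for illustration:

```python
import numpy as np
import pandas as pd

# Same sample data as above
df = pd.DataFrame({
    'Student_ID': [119, 119, 130, 140, 155, 155, 155],
    'International': ['N', 'N', 'N', 'Y', 'Y', 'Y', 'Y'],
    'marks': [60, 70, 75, 78, 60, 85, 80],
    'ATAR': [80.0, 20.0, np.nan, 50.0, 15.0, np.nan, np.nan],
})

# Reusable boolean masks, one per condition
non_intl = df['International'] == 'N'
has_atar = df['ATAR'].notnull()
low_atar = df['ATAR'].between(0, 50)


def count_students(mask):
    """Hypothetical helper: number of distinct Student_ID values where mask is True."""
    return df.loc[mask, 'Student_ID'].nunique()


# Collect all counts into a single labelled Series
profile = pd.Series({
    'students': df['Student_ID'].nunique(),
    'non_international': count_students(non_intl),
    'non_international_with_atar': count_students(non_intl & has_atar),
    'atar_0_to_50': count_students(low_atar),
})
print(profile)
```

Keeping the masks as named variables makes it easy to combine them with & and | for further counts.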

edited Apr 20, 2017 at 6:42

answered Apr 20, 2017 at 6:37

jezrael


3 Comments

jezrael Over a year ago

No problem, it is my university and the shortcut is BA STU ;)

BA stu Over a year ago

:D for me it means BA(Business Analytics) student. thank u. u too

jezrael Over a year ago

:D :D Ya, it is nice :)
