
I have a dataframe that looks like the one below. I want to build a data profile by getting the following counts.

1) Count of unique student IDs (number of students). My answer works:

print(len(df['Student ID'].unique()))

2) Count of unique student IDs where International = N (number of non-international students).

My answer does not work: print(len(df1.loc[(df1['Student ID'].unique())['International Student'] == N]))

3) Count of unique student IDs where International = N and ATAR is not null (number of non-international students who have an ATAR).

4) Count of unique student IDs where ATAR is between 0 and 50.

One other question:

5) How can I create a new dataframe with only unique student IDs and all other columns, dropping all rows per student ID after the first?

Answers to questions 2-5 would be much appreciated.

Student_ID   International   marks   ATAR
119          N               60      80
119          N               70      80
119          N               75      80
129          Y               78      75
129          Y               60      75
155          Y               85
155          Y               80

1 Answer

import numpy as np
import pandas as pd

# reindex_axis was removed in pandas 1.0; columns= fixes the column order instead
df = pd.DataFrame({
    'International': ['N', 'N', 'N', 'Y', 'Y', 'Y', 'Y'],
    'marks': [60, 70, 75, 78, 60, 85, 80],
    'Student_ID': [119, 119, 130, 140, 155, 155, 155],
    'ATAR': [80.0, 20.0, np.nan, 50.0, 15.0, np.nan, np.nan]
}, columns=['Student_ID', 'International', 'marks', 'ATAR'])

print(df)
   Student_ID International  marks  ATAR
0         119             N     60  80.0
1         119             N     70  20.0
2         130             N     75   NaN
3         140             Y     78  50.0
4         155             Y     60  15.0
5         155             Y     85   NaN
6         155             Y     80   NaN

You mainly need Series.nunique combined with boolean indexing, using loc to return a single column (a Series). For the last question, use drop_duplicates to build the new df1:

print(df['Student_ID'].nunique())
4
print(df.loc[df['International'] == 'N', 'Student_ID'].nunique())
2
print(df.loc[(df['International'] == 'N') & (df['ATAR'].notnull()), 'Student_ID'].nunique())
1
print(df.loc[df['ATAR'].between(0,50), 'Student_ID'].nunique())
3
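
As for why the attempt in the question fails: Series.unique() returns a plain NumPy array, which carries no column labels, so indexing it with a column name raises an error before any filtering happens. The bare N would also need to be the string 'N'. A minimal sketch of the difference, assuming the sample df above (column names follow this answer's spelling, not the question's):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Student_ID': [119, 119, 130, 140, 155, 155, 155],
    'International': ['N', 'N', 'N', 'Y', 'Y', 'Y', 'Y'],
    'marks': [60, 70, 75, 78, 60, 85, 80],
    'ATAR': [80.0, 20.0, np.nan, 50.0, 15.0, np.nan, np.nan],
})

# unique() gives back only the ID values; the other columns are gone.
ids = df['Student_ID'].unique()
print(type(ids))

try:
    ids['International']  # a NumPy array cannot be indexed by column label
    failed = False
except IndexError:
    failed = True
print(failed)

# Filter the rows first, then count the distinct IDs that remain.
non_international = df.loc[df['International'] == 'N', 'Student_ID'].nunique()
print(non_international)
```

The order matters: the condition has to be applied while the International column is still attached to the rows, and only then are the IDs reduced to distinct values.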

df1 = df.drop_duplicates('Student_ID')
print(df1)
   Student_ID International  marks  ATAR
0         119             N     60  80.0
2         130             N     75   NaN
3         140             Y     78  50.0
4         155             Y     60  15.0
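
If the goal is a reusable data profile, the four counts could be gathered in one place. The following is only a sketch under the assumptions of this answer's sample data: profile_counts and the result labels are hypothetical names chosen for illustration, not part of pandas.

```python
import numpy as np
import pandas as pd

def profile_counts(df):
    """Return the unique-student counts as a Series (hypothetical helper)."""
    ids = df['Student_ID']
    non_international = df['International'] == 'N'
    has_atar = df['ATAR'].notnull()
    low_atar = df['ATAR'].between(0, 50)  # inclusive at both ends by default
    return pd.Series({
        'students': ids.nunique(),
        'non_international': ids[non_international].nunique(),
        'non_international_with_atar': ids[non_international & has_atar].nunique(),
        'atar_0_to_50': ids[low_atar].nunique(),
    })

df = pd.DataFrame({
    'Student_ID': [119, 119, 130, 140, 155, 155, 155],
    'International': ['N', 'N', 'N', 'Y', 'Y', 'Y', 'Y'],
    'marks': [60, 70, 75, 78, 60, 85, 80],
    'ATAR': [80.0, 20.0, np.nan, 50.0, 15.0, np.nan, np.nan],
})

profile = profile_counts(df)
print(profile)
```

Naming each boolean mask once keeps the conditions readable and lets them be combined with & for further counts.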

3 Comments

No problem, it is my university and the shortcut is BA STU ;)
:D for me it means BA(Business Analytics) student. thank u. u too
:D :D Ya, it is nice :)
