1

I have data from excel loaded into a pandas dataframe. I now want to select only those rows whose ASSESSMENT ID is the max ASSESSMENT ID per APPID and for all the UI SEQ NUMBERS for that APPID.

APPID   APPNAME ASSESSMENT ID   UI SEQ NUMBER   QUESTION    ANSWER TEXT .   
1   appname 2493    11  Question    No .   
1   appname 13808   11  Question    Ctry of domicile .   
1   appname 13808   11  Question    Name .   
1   appname 35316   11  Question    Ctry of domicile .       
1   appname 35316   11  Question    Name .   
1   appname 35316   11  Question    Nationality .       
1   appname 2493    12  Question    Corp name .   
1   appname 2493    12  Question    Cr Br Scr .   
1   appname 2493    12  Question    Inc And Assests .   
1   appname 2493    12  Question    Int, Ext Reg Reports .   
1   appname 13808   12  Question    Corp name .   
1   appname 35316   12  Question    Corp name .   
1   appname 2493    13  Question    No .   
1   appname 13808   13  Question    No .   
1   appname 35316   13  Question    No .   
1   appname 2493    14  Question    No .   
1   appname 13808   14  Question    firms Pos .   
1   appname 35316   14  Question    firms Pos .   

And the result would be

APPID   APPNAME ASSESSMENT ID   UI SEQ NUMBER   QUESTION    ANSWER TEXT .   
1   appname 35316   11  Question    Ctry of domicile .   
1   appname 35316   11  Question    Name .   
1   appname 35316   11  Question    Nationality .   
1   appname 35316   12  Question    Corp name .   
1   appname 35316   13  Question    No .   
1   appname 35316   14  Question    firms Pos .   
4
  • Please Don't post images of code (or links to them) Commented May 28, 2017 at 9:37
  • Apologies for posting the image, But there was no other way I could post the data from excel onto here without proper formatting Commented May 28, 2017 at 9:38
  • Hmmm, if copy paste and add 4 spaces before each row it doesnt work? Commented May 28, 2017 at 9:40
  • That worked. Thanks jezrael for that suggestion! Commented May 28, 2017 at 9:52

1 Answer 1

1

I think you need boolean indexing with mask created by apply:

df1 = df[df.groupby(['APPID', 'UI SEQ NUMBER'])['ASSESSMENT ID'].apply(lambda x:x==x.max())]
print (df1)
    APPID  APPNAME  ASSESSMENT ID  UI SEQ NUMBER  QUESTION       ANSWER TEXT.
3       1  appname          35316             11  Question  Ctry of domicile.
4       1  appname          35316             11  Question              Name.
5       1  appname          35316             11  Question       Nationality.
11      1  appname          35316             12  Question         Corp name.
14      1  appname          35316             13  Question                No.
17      1  appname          35316             14  Question         firms Pos.

Or if dont need all duplicated values use idxmax:

df1 = df.loc[df.groupby(['APPID', 'UI SEQ NUMBER'])['ASSESSMENT ID'].idxmax()]
print (df1)
    APPID  APPNAME  ASSESSMENT ID  UI SEQ NUMBER  QUESTION       ANSWER TEXT.
3       1  appname          35316             11  Question  Ctry of domicile.
11      1  appname          35316             12  Question         Corp name.
14      1  appname          35316             13  Question                No.
17      1  appname          35316             14  Question         firms Pos.
Sign up to request clarification or add additional context in comments.

2 Comments

Perfect jezrael. that solved it. I was doing the following -df[df.groupby(['APPID', 'UI SEQ NUMBERS'])['ASSESSMENT ID'].max()
Then better is use df1 = df.loc[df.groupby(['APPID', 'UI SEQ NUMBER'])['ASSESSMENT ID'].idxmax()]

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.