I have a table which looks something like this:

Identified  Software          Version  Date
0           Microsoft Office  2        2022-05-25
0           Microsoft Office  1        2022-03-21
0           Adobe Photoshop   2        2022-04-20
1           Adobe Photoshop   1        2021-04-04

The 'Identified' column is a column I have created using this code:

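import pandas as pd

# Parse 'Date' as datetime64 so it can be compared with a Timestamp
df = pd.read_csv('version-data.csv', encoding='utf8', parse_dates=['Date'])

# Cutoff: midnight today minus one year
olderdata = pd.Timestamp.today().normalize() - pd.DateOffset(years=1)

# 1 if the row is dated on or before the cutoff, else 0
df['Identified'] = (df['Date'] <= olderdata).astype(int)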

This marks every row dated more than one year ago. Now I want to create a new dataframe containing all rows for each software package that has at least one identified row. Here is the output I am looking for:

Identified  Software         Version  Date
0           Adobe Photoshop  2        2022-04-20
1           Adobe Photoshop  1        2021-04-04

How do I achieve this?

1 Answer

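You can use groupby.filter, which keeps all rows of every group for which the function returns True. Here that means every 'Software' group with at least one row where Identified is 1: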

out = df.groupby('Software').filter(lambda x: (x.Identified==1).any())

print(out)

   Identified          Software   Version        Date
2           0   Adobe Photoshop         2  2022-04-20
3           1   Adobe Photoshop         1  2021-04-04
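
If the lambda in filter becomes slow on many groups, a boolean mask built with groupby.transform should give the same rows. The following is a minimal sketch, assuming the four sample rows from the question with 'Identified' hard-coded so the result does not depend on today's date. The names mask, flagged and out_isin are only illustrative.

```python
import pandas as pd

# Sample data copied from the question; 'Identified' is hard-coded here
# (an assumption) instead of being derived from the current date.
df = pd.DataFrame({
    'Identified': [0, 0, 0, 1],
    'Software': ['Microsoft Office', 'Microsoft Office',
                 'Adobe Photoshop', 'Adobe Photoshop'],
    'Version': [2, 1, 2, 1],
    'Date': pd.to_datetime(['2022-05-25', '2022-03-21',
                            '2022-04-20', '2021-04-04']),
})

# Per-row mask: True when the row's Software group contains any Identified == 1
mask = df.groupby('Software')['Identified'].transform('max').eq(1)
out = df[mask]

# Equivalent without groupby: collect the flagged names, then select with isin
flagged = df.loc[df['Identified'].eq(1), 'Software'].unique()
out_isin = df[df['Software'].isin(flagged)]

print(out)
```

Both variants keep the original index and row order, so they can be swapped in for the filter call without changing downstream code.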