Python - Create a new column based on multiple dates

Question

I have a very large dataframe.
I want to create a new column 'result' based on the existing columns 'userid' and 'date'.
A userid can appear in more than one record.

import pandas as pd
import numpy as np

userid = ['1','1','22','48','48','48','393','393','555','555'] 
date = ['11/01/2016','11/02/2016','11/05/2016','11/08/2016','12/02/2016','02/12/2017','02/22/2017','02/28/2017','12/15/2016','02/28/2017'] 
df1 = pd.DataFrame({"userid": userid, "date": date})

userid  date
  1   11/01/2016
  1   11/02/2016
 22   11/05/2016
 48   11/08/2016
 48   12/02/2016
 48   02/12/2017
393   02/22/2017
393   02/28/2017
555   12/15/2016
555   02/28/2017

The new column 'result' takes one of two values.
'1': The userid has at least one record before 02/01/2017 and at least one record on or after 02/01/2017 (both conditions must be satisfied).
'0': Otherwise, every row for that userid is assigned '0'.

Example 1: userid 48 appears twice before 02/01/2017 and once after 02/01/2017. Both conditions are satisfied, so every row of userid 48 should have '1' in the result column.
Example 2: userid 393 appears twice in the data, but both records are dated after 02/01/2017. The first condition fails, so every row of userid 393 should have '0' in the result column.

In this case, my output data frame will be:

userid     date   result
  1    11/01/2016   0
  1    11/02/2016   0
 22    11/05/2016   0
 48    11/08/2016   1
 48    12/02/2016   1
 48    02/12/2017   1
393    02/22/2017   0
393    02/28/2017   0
555    12/15/2016   1
555    02/28/2017   1

I don't know the best way to achieve this.
Can anyone help? Thanks in advance!

Ian · Accepted Answer · 2020-02-10 03:22:03Z

This should do the trick

import pandas as pd
import datetime

userid = ['1','1','22','48','48','48','393','393','555','555']
date = ['11/01/2016','11/02/2016','11/05/2016','11/08/2016','12/02/2016','02/12/2017','02/22/2017','02/28/2017','12/15/2016','02/28/2017']
df1 = pd.DataFrame({"userid": userid, "date": date})

# convert the date strings (MM/DD/YYYY) to datetime
df1['date'] = pd.to_datetime(df1['date'], format='%m/%d/%Y')

# define threshold date
dt = datetime.datetime(2017, 2, 1)

# logic: 1 if the user's earliest date is before the threshold
# and the latest date is on or after it, else 0
fn = lambda dates: 1 if dates.min() < dt and dates.max() >= dt else 0
res = df1.groupby('userid')['date'].agg(fn).reset_index()
res = res.rename(columns={'date': 'result'})

# attach the per-user flag to every row of the original frame
df1 = df1.merge(res, on='userid')
print(df1)

Output

userid     date   result
  1    11/01/2016   0
  1    11/02/2016   0
 22    11/05/2016   0
 48    11/08/2016   1
 48    12/02/2016   1
 48    02/12/2017   1
393    02/22/2017   0
393    02/28/2017   0
555    12/15/2016   1
555    02/28/2017   1
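
For a very large dataframe, a variant that avoids the Python-level lambda and the merge may be worth trying. The sketch below is not part of the answer above; it assumes the same sample data and uses `groupby(...).transform` to broadcast each user's earliest and latest date back onto the rows. The names `cutoff`, `first_seen` and `last_seen` are illustrative only.

```python
import pandas as pd

userid = ['1', '1', '22', '48', '48', '48', '393', '393', '555', '555']
date = ['11/01/2016', '11/02/2016', '11/05/2016', '11/08/2016', '12/02/2016',
        '02/12/2017', '02/22/2017', '02/28/2017', '12/15/2016', '02/28/2017']
df1 = pd.DataFrame({"userid": userid, "date": date})

# Parse the MM/DD/YYYY strings into datetimes
df1['date'] = pd.to_datetime(df1['date'], format='%m/%d/%Y')

# Threshold date from the question (illustrative variable name)
cutoff = pd.Timestamp('2017-02-01')

# transform returns a Series aligned with df1, one value per original row
grouped = df1.groupby('userid')['date']
first_seen = grouped.transform('min')
last_seen = grouped.transform('max')

# 1 when the user has a record before the cutoff and one on or after it
df1['result'] = ((first_seen < cutoff) & (last_seen >= cutoff)).astype(int)
print(df1)
```

Because `transform` keeps the original index, the new column can be assigned directly and the row order of `df1` is unchanged.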
