I have a table (CAMP_31323_TODATE in the code below) that looks like this, where ID corresponds to PEL_ID and DATE to TRANSACTIONDATEDIFF:
ID DATE
ABC 2018-02-07
ABC 2018-02-10
ABC 2018-02-13
ABC 2018-02-22
ABC 2018-02-26
ABC 2018-02-28
ABC 2018-04-06
ABC 2018-04-06
ABC 2018-04-12
I am trying to add three columns: (1) the earliest date across all records for the ID, (2) the number of days between each date and that earliest date, and (3) the nth occurrence number of the record in date order. For duplicate dates, only one row should be returned, carrying the highest occurrence number. I am expecting the following output:
PEL_ID TRANSACTIONDATEDIFF EARLIESTEXPOSURE TIMEDIFF NTH_FREQUENCY
ABC 2018-02-07 2018-02-07 0 1
ABC 2018-02-10 2018-02-07 3 2
ABC 2018-02-13 2018-02-07 6 3
ABC 2018-02-22 2018-02-07 15 4
ABC 2018-02-26 2018-02-07 19 5
ABC 2018-02-28 2018-02-07 21 6
ABC 2018-04-06 2018-02-07 58 8
ABC 2018-04-12 2018-02-07 64 9
This is my SQL code:
SELECT PEL_ID, TRANSACTIONDATEDIFF, EARLIESTEXPOSURE, TIME_DIFF,
       MAX(NTH_FREQUENCY)
FROM (
    SELECT C.*,
           ROW_NUMBER() OVER (PARTITION BY PEL_ID ORDER BY PEL_ID) AS NTH_FREQUENCY
    FROM (
        SELECT A.PEL_ID, A.TRANSACTIONDATEDIFF, B.EARLIESTEXPOSURE,
               (A.TRANSACTIONDATEDIFF - B.EARLIESTEXPOSURE) AS TIME_DIFF
        FROM CAMP_31323_TODATE A
        JOIN (
            SELECT PEL_ID, MIN(TRANSACTIONDATEDIFF) AS EARLIESTEXPOSURE
            FROM CAMP_31323_TODATE
            GROUP BY PEL_ID
        ) B ON A.PEL_ID = B.PEL_ID
        ORDER BY A.PEL_ID
    ) C
)
GROUP BY PEL_ID, TRANSACTIONDATEDIFF, EARLIESTEXPOSURE, TIME_DIFF
ORDER BY PEL_ID, TRANSACTIONDATEDIFF ASC;
Everything in this query works except NTH_FREQUENCY. This is the output I actually get:
PEL_ID TRANSACTIONDATEDIFF EARLIESTEXPOSURE TIMEDIFF NTH_FREQUENCY
ABC 2018-02-07 2018-02-07 0 3
ABC 2018-02-10 2018-02-07 3 6
ABC 2018-02-13 2018-02-07 6 8
ABC 2018-02-22 2018-02-07 15 2
ABC 2018-02-26 2018-02-07 19 7
ABC 2018-02-28 2018-02-07 21 1
ABC 2018-04-06 2018-02-07 58 5
ABC 2018-04-12 2018-02-07 64 9
I am not sure why NTH_FREQUENCY comes back in this seemingly arbitrary order instead of following the dates. Any help would be appreciated. Thanks in advance.
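One possible cause, offered as a sketch rather than a confirmed diagnosis: ROW_NUMBER() is ordered by PEL_ID, the same column it is partitioned by, so every row in a partition ties and the numbering within it is left to the database. Ordering the window by TRANSACTIONDATEDIFF instead should give the expected sequence. The snippet below checks that idea against the sample data using Python's built-in sqlite3 module, so it needs an SQLite build with window functions (3.25 or later). The in-memory table, the text-typed date column, and the julianday() date arithmetic are assumptions made for this illustration. The original query subtracts the dates directly, which may not carry over to SQLite.

```python
import sqlite3

# Hypothetical in-memory copy of the sample data; dates are stored as ISO text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CAMP_31323_TODATE (PEL_ID TEXT, TRANSACTIONDATEDIFF TEXT)"
)
dates = [
    "2018-02-07", "2018-02-10", "2018-02-13", "2018-02-22", "2018-02-26",
    "2018-02-28", "2018-04-06", "2018-04-06", "2018-04-12",
]
conn.executemany(
    "INSERT INTO CAMP_31323_TODATE VALUES ('ABC', ?)", [(d,) for d in dates]
)

query = """
SELECT PEL_ID, TRANSACTIONDATEDIFF, EARLIESTEXPOSURE, TIME_DIFF,
       MAX(NTH_FREQUENCY) AS NTH_FREQUENCY
FROM (
    SELECT A.PEL_ID, A.TRANSACTIONDATEDIFF, B.EARLIESTEXPOSURE,
           -- SQLite-specific day difference (assumption for this sketch)
           CAST(julianday(A.TRANSACTIONDATEDIFF)
                - julianday(B.EARLIESTEXPOSURE) AS INTEGER) AS TIME_DIFF,
           -- Order the window by the date, not by the partition column
           ROW_NUMBER() OVER (PARTITION BY A.PEL_ID
                              ORDER BY A.TRANSACTIONDATEDIFF) AS NTH_FREQUENCY
    FROM CAMP_31323_TODATE A
    JOIN (
        SELECT PEL_ID, MIN(TRANSACTIONDATEDIFF) AS EARLIESTEXPOSURE
        FROM CAMP_31323_TODATE
        GROUP BY PEL_ID
    ) B ON A.PEL_ID = B.PEL_ID
)
GROUP BY PEL_ID, TRANSACTIONDATEDIFF, EARLIESTEXPOSURE, TIME_DIFF
ORDER BY PEL_ID, TRANSACTIONDATEDIFF
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With the sample data this returns eight rows. The two 2018-04-06 records collapse into one row, and MAX keeps the higher of their two row numbers, which matches the expected table above.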