My data frame has a date column whose values are currently strings. I am trying to fix a problem with that column.

df[:15]

    Date    Customer ID
0   01/25/2016  104064596300
1   02/28/2015  102077474472
2   11/17/2016  106430081724
3   02/24/2016  107770391692
4   10/05/2016  106523680888
5   02/24/2016  107057691592
6   11/24/2015  102472820188
7   10/12/2016  107195498128
8   01/05/2016  104796266660
9   09/30/2016  107812562924
10  10/13/2015  102809057000
11  11/21/2016  107379017712
12  11/08/2015  106642145040
13  02/26/2015  107862343816
14  10/16/2016  107383084928

My data is supposed to be within the date range of: Sept 2015 to Feb 2016.

Some of the rows have their years mixed up (see row 2 above, for example: it's November 17, 2016!)

What I am trying to do is change the years for the observations with incorrect dates.

I have played around with the str.replace() method in pandas but cannot come up with a command that works:

df.Date.str.replace(('^(09|10|11|12)\/\d\d\/2016$'), '2015')

0         01/25/2016
1         02/28/2015
2               2015
3         02/24/2016
4               2015
5         02/24/2016
6         11/24/2015
7               2015
8         01/05/2016
9               2015
10        10/13/2015
11              2015
12        11/08/2015
13        02/26/2015
14              2015
15        12/17/2015
16        01/05/2015
17        01/21/2015
18              2015
19              2015
20        02/06/2016
21        10/06/2015
22        02/18/2016

To be specific, I am simply trying to change the last 4 digits (the year) of each row depending on some conditions:

  1. If the month is within September to December (09 to 12) and has year 2016, change the year for this observation to 2015

  2. If the month is January or February (01 or 02) and has year 2015, change the year for this observation to 2016

The command I wrote above identifies the correct observations for scenario 1, but it replaces the whole string instead of just the last 4 digits, and I am having trouble writing the results back into the original data frame.

One final note: you might wonder why I don't simply convert the column to a datetime type and then add or subtract a year as needed. If I attempt that, I get an error because some observations have the date 2/29/2015, and there was no Feb. 29 in 2015!
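
A minimal sketch of that failure, assuming the column is parsed with pd.to_datetime and an explicit format (the two sample values are illustrative, not taken from the data above):

```python
import pandas as pd

# Illustrative values: one valid date and the impossible Feb. 29, 2015.
dates = pd.Series(["02/28/2015", "02/29/2015"])

# Strict parsing raises because 2015 was not a leap year.
try:
    pd.to_datetime(dates, format="%m/%d/%Y")
    strict_ok = True
except ValueError:
    strict_ok = False

# errors="coerce" avoids the exception but turns the bad date into NaT,
# which loses the month and day that still need to be kept.
coerced = pd.to_datetime(dates, format="%m/%d/%Y", errors="coerce")
```

Coercing is therefore not a fix on its own here: the invalid rows would have to be corrected as strings first.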

1 Answer

Do not treat dates as strings. You can first convert the date strings to timestamps, then slice.

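import pandas as pd
# Parse the strings into timestamps (this raises on invalid dates such as 02/29/2015)
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')
# Partial-string slicing needs a sorted DatetimeIndex
df = df.set_index('Date').sort_index()
df.loc['2015-09':'2016-02']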

Update:

df.loc[:, 'year_month'] = df.Date.map(lambda s: int(s[-4:] + s[:2]))
df.query('201509<=year_month<=201602').drop('year_month', axis=1)

Sorry, I misunderstood your question. To correct the years in place:

def transform(date_string):
    # Dates are 'MM/DD/YYYY' strings, so fixed slices pick out each part.
    month = date_string[:2]
    day = date_string[3:5]
    year = date_string[-4:]
    # Sept-Dec belongs to 2015, so a 2016 year is one too high.
    if year == '2016' and month in ['09', '10', '11', '12']:
        return month + '/' + day + '/' + str(int(year) - 1)
    # Jan-Feb belongs to 2016, so a 2015 year is one too low.
    elif year == '2015' and month in ['01', '02']:
        return month + '/' + day + '/' + str(int(year) + 1)
    else:
        return date_string

df.loc[:, 'Date'] = df.Date.map(transform)
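
As an alternative closer to the regex attempt in the question, the two rules can also be sketched with str.replace and a capture group, so that only the year is swapped and the month and day are kept. This is a sketch under assumptions: the small frame below is made-up sample data, and it relies on every value being a zero-padded 'MM/DD/YYYY' string.

```python
import pandas as pd

# Made-up sample rows in the same 'MM/DD/YYYY' layout as the question.
df = pd.DataFrame(
    {"Date": ["01/25/2016", "02/28/2015", "11/17/2016", "02/29/2015", "10/13/2015"]}
)

# Rule 1: months 09-12 with year 2016 become 2015.
# Group 1 captures 'MM/DD/' and \g<1> writes it back before the new year.
fixed = df["Date"].str.replace(
    r"^((?:09|10|11|12)/\d{2}/)2016$", r"\g<1>2015", regex=True
)

# Rule 2: months 01-02 with year 2015 become 2016.
fixed = fixed.str.replace(r"^((?:01|02)/\d{2}/)2015$", r"\g<1>2016", regex=True)

# Assign the result back; str.replace returns a new Series.
df["Date"] = fixed
```

The \g<1> form is used instead of \1 because a digit follows the group reference. Because the values stay strings throughout, a date like 02/29/2015 never has to be parsed before its year is corrected.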
1 Comment

Thanks -- just some small edits are needed for your function to work: 1) the month slice should be date_string[:2] rather than [:3], as [:3] also captures the "/" in the string; 2) you misspelt transform when defining the function (it's written transfrom).
