Reformat date inside string using pandas replace with regex

Question

I have a column of strings like below that contain date information, and I need to add leading zeros to single-digit months and days. I've run into some issues trying to do this purely with pandas.DataFrame.replace and regular expressions.

import pandas as pd
df = pd.DataFrame({'Key':['0123456789_1/2/2019','0123456789_11/23/2019','0145892367_10/2/2019','0145892367_4/13/2019']})

df
Out[323]: 
                     Key
0    0123456789_1/2/2019
1  0123456789_11/23/2019
2   0145892367_10/2/2019
3   0145892367_4/13/2019

For the above column, the output I'd want after reformatting would be:

                     Key
0  0123456789_01/02/2019
1  0123456789_11/23/2019
2  0145892367_10/02/2019
3  0145892367_04/13/2019

By now I've figured out I can do this by splitting the strings:

r = df['Key'].str.split('_|/', expand=True)
df2 = r[0] + '_' + r[1].str.zfill(2) + '/' + r[2].str.zfill(2) + '/' + r[3]

df2
Out[333]: 
0    0123456789_01/02/2019
1    0123456789_11/23/2019
2    0145892367_10/02/2019
3    0145892367_04/13/2019
dtype: object

...But when I was initially trying to do it with pandas.DataFrame.replace, the closest I was able to get was:

df2 = df.replace(r'(_|/)([1-9]/)',r'\1 0\2',regex=True)

df2
Out[335]: 
                      Key
0   0123456789_ 01/2/2019
1   0123456789_11/23/2019
2  0145892367_10/ 02/2019
3  0145892367_ 04/13/2019

There are two problems with this that I'd like to know more about:

1. In cases like row 0, where both the month and day are single-digit, it only finds the month. How can I get it to match both?
2. I don't want the spaces, but when I try to replace using r'\10\2', of course I get an error because it thinks I'm trying to substitute in group 10, and there is no such group in the first regex. If I try r'(\1)0\2', it works, except it prints the literal parentheses. Why does it do this, and how can I properly write this so that it prints group 1 immediately followed by a literal zero?

Edit for clarification: I'm aware I could also fix it by parsing the dates, but I'm specifically interested in the regex solution, both as a learning exercise and because a single replace is much faster for large DataFrames.
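As a rough sketch of the regex-only route asked about here (not taken from the answers below): the month-only match in row 0 happens because the pattern consumes the trailing '/', so that slash is no longer available to start the match for the day. A lookahead, (?=/), checks for the slash without consuming it. For the replacement, Python's re syntax offers \g<1> as an unambiguous way to write \1, so a literal 0 can follow it directly. Parentheses have no special meaning in a replacement string, which is why r'(\1)0\2' prints them literally. The names keys, PATTERN and REPL are illustrative, and the example uses plain re on the question's sample strings.

```python
import re

# Sample strings copied from the question's 'Key' column.
keys = [
    "0123456789_1/2/2019",
    "0123456789_11/23/2019",
    "0145892367_10/2/2019",
    "0145892367_4/13/2019",
]

# Group 1: the separator before the number ('_' or '/').
# Group 2: a lone digit 1-9.
# (?=/) is a lookahead: it requires a following '/' but does not consume it,
# so the same '/' can still begin the next match (the day in row 0).
PATTERN = r"([_/])([1-9])(?=/)"

# \g<1> means the same as \1, but the literal 0 after it cannot be read
# as part of the group number (r"\10\2" would refer to group 10).
REPL = r"\g<1>0\2"

padded = [re.sub(PATTERN, REPL, key) for key in keys]
for before, after in zip(keys, padded):
    print(before, "->", after)

# Assumed pandas equivalent, using the question's DataFrame:
#   df2 = df.replace(PATTERN, REPL, regex=True)
```

A lookbehind variant, r"(?<=[_/])([1-9])(?=/)" with the replacement r"0\1", should avoid the group-numbering problem altogether, since the separator is then never captured.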

anky · Accepted Answer · 2019-04-18 17:16:20Z

3

IIUC, you can use:

parts = df.Key.str.split("_", expand=True)
df.Key = parts[0] + "_" + pd.to_datetime(parts[1]).dt.strftime('%m/%d/%Y')
print(df)

                     Key
0  0123456789_01/02/2019
1  0123456789_11/23/2019
2  0145892367_10/02/2019
3  0145892367_04/13/2019

1 Comment

LuminosityXVII Over a year ago

That does work, but I'm trying to understand how to get around the specific issues I encountered using regex. I'd like to be able to use the regex solution for other cases in the future that may not involve dates.

iamklaus · Accepted Answer · 2019-04-18 17:19:13Z

1

Using the datetime module:

from datetime import datetime

df['Key'] = df.Key.str.split('_').apply(lambda x: x[0]+'_'+datetime.strptime(x[1], "%m/%d/%Y").strftime("%m/%d/%Y"))

Output

                     Key
0  0123456789_01/02/2019
1  0123456789_11/23/2019
2  0145892367_10/02/2019
3  0145892367_04/13/2019

2 Comments

LuminosityXVII Over a year ago

Thank you, but I'm trying to understand how to get around the specific issues I encountered using regex. I'd like to be able to use the regex solution for other cases in the future.

iamklaus Over a year ago

Using datetime or pd.to_datetime, like @anky_91 did, is better according to my understanding. It covers all the cases since it understands dates, whereas regex doesn't, so it might fail in some.
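One possible reading of this point, sketched with a hypothetical malformed key: a zero-padding regex only looks at characters, so it will happily pad a string whose date is invalid, while date parsing rejects it. The pattern, the replacement and the bad_key value below are assumptions made for illustration, not something from the thread.

```python
import re
from datetime import datetime

# Illustrative zero-padding pattern: pad a lone digit that sits between
# a separator ('_' or '/') and a following '/'.
PATTERN = r"([_/])([1-9])(?=/)"
REPL = r"\g<1>0\2"

# Hypothetical malformed key: there is no day 45.
bad_key = "0123456789_1/45/2019"

# The regex pads the month and leaves the impossible day untouched.
padded = re.sub(PATTERN, REPL, bad_key)
print(padded)

# Parsing the date part raises ValueError for the same input.
date_part = bad_key.split("_")[1]
try:
    datetime.strptime(date_part, "%m/%d/%Y")
    parsed_ok = True
except ValueError:
    parsed_ok = False
print(parsed_ok)
```

Whether that validation is wanted depends on the data. For strings that are not dates at all, which is the future use the asker mentions, the regex route has nothing to validate against anyway.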
