I have a column of strings like the ones below that contain date information, and I need to add leading zeros to single-digit months and days. I've run into some issues trying to do this purely with pandas.DataFrame.replace and regular expressions.
import pandas as pd
df = pd.DataFrame({'Key':['0123456789_1/2/2019','0123456789_11/23/2019','0145892367_10/2/2019','0145892367_4/13/2019']})
df
Out[323]:
Key
0 0123456789_1/2/2019
1 0123456789_11/23/2019
2 0145892367_10/2/2019
3 0145892367_4/13/2019
For the above column, the output I'd want after reformatting would be:
Key
0 0123456789_01/02/2019
1 0123456789_11/23/2019
2 0145892367_10/02/2019
3 0145892367_04/13/2019
By now I've figured out I can do this by splitting the strings:
r = df['Key'].str.split('_|/', expand=True)
df2 = r[0] + '_' + r[1].str.zfill(2) + '/' + r[2].str.zfill(2) + '/' + r[3]
df2
Out[333]:
0 0123456789_01/02/2019
1 0123456789_11/23/2019
2 0145892367_10/02/2019
3 0145892367_04/13/2019
dtype: object
...But when I was initially trying to do it with pandas.DataFrame.replace, the closest I was able to get was:
df2 = df.replace(r'(_|/)([1-9]/)',r'\1 0\2',regex=True)
df2
Out[335]:
Key
0 0123456789_ 01/2/2019
1 0123456789_11/23/2019
2 0145892367_10/ 02/2019
3 0145892367_ 04/13/2019
There are two problems with this that I'd like to know more about:
- In cases like row 0 where both the month and day are single-digit, it only finds the month. How can I get it to match both?
- I don't want the spaces, but when I try to replace using r'\10\2', of course I get an error because it thinks I'm trying to substitute in group 10, and there is no such group in the pattern. If I try r'(\1)0\2', it works, except it prints the literal parentheses. Why does it do this, and how can I properly write this so that it prints group 1 immediately followed by a literal zero?
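For reference, both issues can be reproduced outside pandas with Python's re module. The sketch below assumes pandas passes the pattern and replacement string through to re.sub unchanged. The names keys, consumed, pattern, and fixed are only illustrative. It shows one possible way around each problem: a lookahead so the slash is not consumed by the first match, and the \g<1> form so the reference to group 1 cannot be read as group 10.

```python
import re

keys = ['0123456789_1/2/2019', '0123456789_11/23/2019',
        '0145892367_10/2/2019', '0145892367_4/13/2019']

# Original attempt: the trailing "/" is consumed by the first match, so it is
# no longer available as the delimiter in front of the day.
consumed = [re.sub(r'(_|/)([1-9]/)', r'\1 0\2', k) for k in keys]

# Lookahead (?=/) only checks for the slash without consuming it, and
# \g<1> is the unambiguous spelling of \1, so the "0" after it is literal.
pattern = r'([_/])([1-9])(?=/)'
fixed = [re.sub(pattern, r'\g<1>0\2', k) for k in keys]

for k in fixed:
    print(k)
```

If the assumption about pandas holds, passing the same pattern and replacement to df.replace(pattern, r'\g<1>0\2', regex=True) should produce the desired column.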
Edit for clarification:
I'm aware I could also fix it by parsing the dates, but I'm specifically interested in the regex solution, both as a learning exercise and because I expect a single replace to be much faster for large dataframes.