I have a DataFrame with a column of dates in mixed formats, as follows:
```python
import pandas as pd

dates = ['Dec-03',
         '03/11/2003 - 05/04/2004',
         'Apr-04',
         '2004 - 2005',
         '01/02/2005 - 31/03/2005']
df = pd.DataFrame(dates, columns=["date_range"])
```
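As the example shows, each entry comes in one of three formats: a single month (e.g. "Dec-03"), a range of two years (e.g. "2004 - 2005"), or a range of two full dates in dd/mm/yyyy format (e.g. "03/11/2003 - 05/04/2004").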
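Because each format has a fixed shape, one way to tell them apart without a Python-level loop is a set of regular expressions applied with `Series.str.fullmatch`. This is a minimal sketch that assumes only the three patterns above occur; the labels "month", "year_range" and "date_range" are hypothetical names chosen for illustration, and any other string is tagged "unknown".

```python
import numpy as np
import pandas as pd

dates = ['Dec-03',
         '03/11/2003 - 05/04/2004',
         'Apr-04',
         '2004 - 2005',
         '01/02/2005 - 31/03/2005']
s = pd.Series(dates)

# One regex per format; str.fullmatch is applied to the whole column at once
conditions = [
    s.str.fullmatch(r"[A-Za-z]{3}-\d{2}"),                         # e.g. "Dec-03"
    s.str.fullmatch(r"\d{4}\s+-\s+\d{4}"),                          # e.g. "2004 - 2005"
    s.str.fullmatch(r"\d{2}/\d{2}/\d{4}\s+-\s+\d{2}/\d{2}/\d{4}"),  # e.g. "03/11/2003 - 05/04/2004"
]
labels = ["month", "year_range", "date_range"]  # hypothetical label names

# np.select picks the label of the first matching pattern; no match -> "unknown"
kind = pd.Series(np.select(conditions, labels, default="unknown"), index=s.index)
print(kind.tolist())
```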
I am looking for an efficient, "pythonic" way to add "start_dates" and "end_dates" columns to the DataFrame, giving the following result:
```
                date_range  start_dates   end_dates
0                   Dec-03   01/12/2003  31/12/2003
1  03/11/2003 - 05/04/2004   03/11/2003  05/04/2004
2                   Apr-04   01/04/2004  30/04/2004
3              2004 - 2005   01/01/2004  31/12/2005
4  01/02/2005 - 31/03/2005   01/02/2005  31/03/2005
```
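For the single-month and year-range rows, the end date in the table is the last day of the period. A minimal sketch of that step, assuming the periods have already been parsed to their first day, uses pandas anchored offsets: adding `MonthEnd(0)` or `YearEnd(0)` rolls a date forward to the end of its month or year (a date already on that anchor is left unchanged).

```python
import pandas as pd

# First day of each period, as they would be parsed from "Dec-03", "Apr-04" and the year "2005"
month_starts = pd.Series(pd.to_datetime(["2003-12-01", "2004-04-01"]))
year_start = pd.Series(pd.to_datetime(["2005-01-01"]))

# An anchored offset with n=0 rolls forward to the end of the current period
month_ends = month_starts + pd.offsets.MonthEnd(0)
year_end = year_start + pd.offsets.YearEnd(0)

print(month_ends.dt.strftime("%d/%m/%Y").tolist())  # ['31/12/2003', '30/04/2004']
print(year_end.dt.strftime("%d/%m/%Y").tolist())    # ['31/12/2005']
```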
I have experimented with solutions using df.iterrows and a chain of if statements, but I am wondering whether there is a more efficient approach. The full dataset contains millions of rows, so a vectorised solution (or something comparably fast) would be ideal.
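One possible vectorised approach, sketched under the assumption that every row matches one of the three formats above: parse each format with `pd.to_datetime` using an explicit `format` and `errors="coerce"`, so rows in any other format become `NaT`, then stitch the partial results together with `fillna`. The helper `parse` and the intermediate names are illustrative only, the month abbreviations are assumed to be English, and the speed on millions of rows is not measured here.

```python
import pandas as pd

dates = ['Dec-03',
         '03/11/2003 - 05/04/2004',
         'Apr-04',
         '2004 - 2005',
         '01/02/2005 - 31/03/2005']
df = pd.DataFrame(dates, columns=["date_range"])
s = df["date_range"]

# Single month such as "Dec-03" (%y maps "03" to 2003); other formats become NaT
month = pd.to_datetime(s, format="%b-%y", errors="coerce")

# Ranges "X - Y": the spaces around the dash keep "Dec-03" from matching
rng = s.str.extract(r"^(?P<left>.+?)\s+-\s+(?P<right>.+)$")

def parse(col, fmt):
    # Hypothetical helper: parse one side of a range with a fixed format, mismatches -> NaT
    return pd.to_datetime(col, format=fmt, errors="coerce")

full_left, full_right = parse(rng["left"], "%d/%m/%Y"), parse(rng["right"], "%d/%m/%Y")
year_left, year_right = parse(rng["left"], "%Y"), parse(rng["right"], "%Y")

# Each row matches exactly one format, so chained fillna merges the pieces row by row
start = full_left.fillna(year_left).fillna(month)
end = (full_right
       .fillna(year_right + pd.offsets.YearEnd(0))   # "2005" -> 31/12/2005
       .fillna(month + pd.offsets.MonthEnd(0)))      # "Apr-04" -> 30/04/2004

df["start_dates"] = start.dt.strftime("%d/%m/%Y")
df["end_dates"] = end.dt.strftime("%d/%m/%Y")
print(df)
```

Keeping `start` and `end` as datetime64 columns, rather than converting them with `strftime`, is usually cheaper and keeps them usable for date arithmetic; the string conversion here only reproduces the table above.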