i have a dataframe where within the raw text column certain text with Dates in different format is given. i am looking to extract this dates in separate column
sample Raw Text :
"Sales Assistant @ DFS Duration - June 2021 - 2023 Currently working in XYZ Within the role I am expected to achieve sales targets which I currently have no problems reaching. Job Role/Establishment - Plasterer @ XX Plasterer’s Duration - September 2016 - Nov 2016 Job Role/Establishment - Customer Advisor @ AA Duration - (2015 – 2016) Job Role/Establishment - Warehouse Operative @ xyz Duration - 03/2014 to 08/2015 In the xyz warehouse Job Role/Establishment - Airport Terminal Assistant @ port Duration - 01/2012 - 06/2013 Working at the airport . Job Role/Establishment - Apprentice Floorer @ YY Floors Duration - DEC 2010 – APRIL 2012 "
Expected Dataframe :
id Raw_text Dates
01 "sample_raw_text" June 2021 - 2023 , September 2016 - Nov 2016,(2015 – 2016),03/2014 to 08/2015 , 01/2012 - 06/2013, DEC 2010 – APRIL 2012
I have Tried below pattern :
def extract_dates(df, column):
# Define the regex pattern to match dates in different month formats
pattern = r'(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)?[-,\s]*\d{1,2}[-,\s]*(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)?[-,\s]*\d{2,4}\s*[-–]\s*(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)?[-,\s]*\d{1,2}[-,\s]*(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)?[-,\s]*\d{2,4}'
# Extract the dates from the specified column
df['Dates'] = df[column].str.extract(pattern)
with above i am unable to fetch required output. please guide what am i missing