This is a subset of a data frame:
Index duration
1 4 months20mg 1X D
2 1 years10 1X D
3 2 weeks10 mg
4 8 years300 MG 1X D
5 20 days
6 10 months
The output should be like this:
Index duration
1 4 month
2 1 year
3 2 week
4 8 year
5 20 day
6 10 month
This is my code:
df.dosage_duration.replace(r'year[0-9a-zA-z]*' , 'year', regex=True)
df.dosage_duration.replace(r'day[0-9a-zA-z]*' , 'day', regex=True)
df.dosage_duration.replace(r'month[0-9a-zA-z]*' , 'month', regex=True)
df.dosage_duration.replace(r'week[0-9a-zA-z]*' , 'week', regex=True)
But it does not work. Any suggestions?
df['duration'] = df.duration.str.replace(r'((?<=year)|(?<=month)|(?<=week)|(?<=day)).*', '', regex=True)
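For reference, here is a minimal runnable sketch of this approach. The sample frame is rebuilt from the rows in the question and the column name `duration` is an assumption (the question's own code uses `dosage_duration`). Two things likely explain why the original attempt appeared to do nothing: `Series.replace` returns a new Series, so the result has to be assigned back, and the class `[0-9a-zA-z]*` stops at the first space, so a tail such as ` 1X D` would survive anyway. Recent pandas versions also treat the `str.replace` pattern as a literal string unless `regex=True` is passed. The `str.extract` variant at the end is an alternative I am adding for comparison, not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question; the column name is an assumption.
df = pd.DataFrame({
    "duration": [
        "4 months20mg 1X D",
        "1 years10 1X D",
        "2 weeks10 mg",
        "8 years300 MG 1X D",
        "20 days",
        "10 months",
    ]
})

# Each lookbehind anchors the match right after a unit word; `.*` then
# consumes the rest of the string (plural "s", dosage, frequency).
pattern = r"((?<=year)|(?<=month)|(?<=week)|(?<=day)).*"

# regex=True is required on recent pandas; assign the result back,
# because str.replace does not modify the column in place.
df["duration"] = df["duration"].str.replace(pattern, "", regex=True)

# Alternative sketch: keep only the leading "<number> <unit>" part.
raw = pd.Series(["4 months20mg 1X D", "20 days"])
extracted = raw.str.extract(r"^(\d+ (?:year|month|week|day))", expand=False)

print(df)
```

The replace version leaves rows without a recognised unit unchanged, while the extract version turns them into NaN, so which one fits depends on how you want unexpected values handled.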