
This is a subset of a data frame:

Index     duration 
1          4  months20mg 1X D
2          1  years10 1X D
3          2  weeks10 mg
4          8  years300 MG 1X D
5          20  days
6          10  months

The output should be like this:

Index     duration 
1          4  month
2          1  year
3          2  week
4          8  year
5          20  day
6          10  month

This is my code:

df.dosage_duration.replace(r'year[0-9a-zA-z]*' , 'year', regex=True)
df.dosage_duration.replace(r'day[0-9a-zA-z]*' , 'day', regex=True)
df.dosage_duration.replace(r'month[0-9a-zA-z]*' , 'month', regex=True)
df.dosage_duration.replace(r'week[0-9a-zA-z]*' , 'week', regex=True)

But it does not work. Any suggestions?

  • df.duration.str.replace(r'((?<=year)|(?<=month)|(?<=week)|(?<=day)).*', '', regex=True) Commented Jun 28, 2017 at 4:09
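A minimal runnable sketch of the lookbehind approach from the comment above. It assumes the sample table is loaded as a DataFrame with a `duration` column, and it passes `regex=True` explicitly because `Series.str.replace` treats the pattern as a literal string by default since pandas 2.0.

```python
import pandas as pd

# Sample data rebuilt from the question's table (column name "duration" as shown there)
df = pd.DataFrame({"duration": [
    "4  months20mg 1X D",
    "1  years10 1X D",
    "2  weeks10 mg",
    "8  years300 MG 1X D",
    "20  days",
    "10  months",
]})

# Each lookbehind (?<=unit) succeeds at the position right after a unit word;
# ".*" then matches everything from there to the end, and it is replaced by "".
pattern = r"((?<=year)|(?<=month)|(?<=week)|(?<=day)).*"
df["duration"] = df["duration"].str.replace(pattern, "", regex=True)
print(df["duration"].tolist())
```

Because the lookbehinds consume no characters, the unit word itself stays in the string and only the trailing text is removed.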

1 Answer


There are two problems.

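The first is that your regular expression doesn't match everything you want it to match. Look at months20mg 1X D: the part you want to replace contains spaces, and your character class [0-9a-zA-z] does not include a space, so the match stops at the first one. (The range A-z is also wider than intended: it includes the characters [ \ ] ^ _ and the backtick, which sit between Z and a.) You could probably just use 'year.*' (and likewise 'month.*', 'week.*', 'day.*') as your patterns, since .* matches everything up to the end of the string.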
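A small sketch of the difference on one value from the table. It uses Python's `re.sub` directly, on the assumption that this reflects what pandas does, since pandas' regex replacement uses the same `re` syntax; the variable names are illustrative only.

```python
import re

value = "4  months20mg 1X D"

# The question's pattern: the character class has no space in it, so the match
# ends at the first space and " 1X D" is left behind.
old = re.sub(r"month[0-9a-zA-z]*", "month", value)

# ".*" matches any run of characters (except newlines) up to the end of the string.
new = re.sub(r"month.*", "month", value)

print(old)
print(new)
```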

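The second is that you are calling replace without storing the results: replace returns a new Series and leaves the DataFrame unchanged. Either assign the result back (df['dosage_duration'] = df['dosage_duration'].replace(...)) or, if you want to keep the call the way you have it, pass inplace=True. Assigning back is the safer choice, because under pandas' Copy-on-Write behavior an in-place call on a column selected from the DataFrame may not update the DataFrame itself.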
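A minimal sketch showing that the return value of `replace` has to be kept. The two-row DataFrame is a made-up subset of the question's data, and the single `month.*` pattern is used only to keep the example short.

```python
import pandas as pd

df = pd.DataFrame({"dosage_duration": ["4  months20mg 1X D", "20  days"]})

# replace() returns a new Series; this result is discarded, so df is unchanged.
df["dosage_duration"].replace(r"month.*", "month", regex=True)
unchanged = df["dosage_duration"].tolist()

# Assigning the result back to the column is what actually updates df.
df["dosage_duration"] = df["dosage_duration"].replace(r"month.*", "month", regex=True)
updated = df["dosage_duration"].tolist()

print(unchanged)
print(updated)
```

The second row is untouched in both cases because it contains no "month" for the pattern to match.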

You can also do it in a single call with a slightly extended regular expression. In the replacement string, \1 refers to whatever the first capturing group matched. Groups are marked by parentheses, so here group 1 is the unit word (year|month|week|day):

# group 1 captures the unit word; \1 puts it back and everything after it is dropped
df['dosage_duration'] = df['dosage_duration'].replace(r'(year|month|week|day).*', r'\1',
                                                      regex=True)
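A self-contained sketch of the single-call version on the full sample table, assuming the column is named `dosage_duration` as in the question's code. It also runs the same pattern through `re.sub` on one string to show what `\1` expands to.

```python
import re
import pandas as pd

df = pd.DataFrame({"dosage_duration": [
    "4  months20mg 1X D", "1  years10 1X D", "2  weeks10 mg",
    "8  years300 MG 1X D", "20  days", "10  months",
]})

# Group 1 captures the unit word; ".*" matches everything after it.
# In the replacement, \1 is substituted with the text group 1 matched.
pattern = r"(year|month|week|day).*"
df["dosage_duration"] = df["dosage_duration"].replace(pattern, r"\1", regex=True)

# The same substitution on a single string with the re module:
single = re.sub(pattern, r"\1", "8  years300 MG 1X D")

print(df["dosage_duration"].tolist())
print(single)
```

For "8  years300 MG 1X D" the whole match is "years300 MG 1X D", group 1 is "year", and the match is replaced by that group, leaving "8  year".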

5 Comments

Thank you. Would you please explain how r'\1' works?
Also, if it is guaranteed that day, week, month and year always end with an s, I guess you can simply do df.dosage_duration.str.split('s').str[0].
@Mary I've added to my answer.
@officialaimm, what is the meaning of [0] at the end of your code?
It takes the first (0-indexed) piece of the split, i.e. the part to the left of the 's'. For '4 months20mg 1X D', split('s') returns the list ['4 month', '20mg 1X D'], and element 0 of that list is the part we are interested in.
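A sketch of the split approach from the comments, written with pandas' vectorized string methods (a Series has no `.split()` of its own). The last line uses a hypothetical singular entry, not from the question, to show what happens when the assumption that every unit ends in "s" does not hold.

```python
import pandas as pd

s = pd.Series([
    "4  months20mg 1X D", "1  years10 1X D", "2  weeks10 mg",
    "8  years300 MG 1X D", "20  days", "10  months",
])

# .str.split("s") gives a list per row; .str[0] takes element 0 of each list,
# i.e. the text before the first "s".
first_part = s.str.split("s").str[0]
print(first_part.tolist())

# Caveat: this relies on the unit ending in "s" and no "s" appearing earlier.
# A hypothetical singular entry has no "s", so the whole string is kept.
singular = pd.Series(["1  year10 mg"]).str.split("s").str[0].tolist()
print(singular)
```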
