
This is my DataFrame:

index     duration 
1           7 year   
2           2day
3           4 week
4           8 month

I need to separate the numbers from the time units and put them in two new columns. The desired output is:

index     duration         number     time
1           7 year          7         year
2           2day            2         day
3           4 week          4         week
4           8 month         8         month

This is my code:

df ['numer'] = df.duration.replace(r'\d.*' , r'\d', regex=True, inplace = True)
df [ 'time']= df.duration.replace (r'\.w.+',r'\w.+', regex=True, inplace = True )

But it does not work. Any suggestions?

I also need to create another column based on the values of the time column, so the new dataset looks like this:

index     duration         number     time      time_day
1           7 year          7         year      365
2           2day            2         day       1
3           4 week          4         week      7
4           8 month         8         month     30

I tried:

df['time_day']= df.time.replace(r'(year|month|week|day)', r'(365|30|7|1)', regex=True, inplace=True)

Any suggestions?

  • What is your end goal? How are you going to use those parsed columns? Commented Jun 28, 2017 at 15:34

2 Answers


We can use Series.str.extract here:

In [67]: df[['number','time']] = df.duration.str.extract(r'(\d+)\s*(.*)', expand=True)

In [68]: df
Out[68]:
   index duration number    time
0      1   7 year      7    year
1      2     2day      2     day
2      3   4 week      4    week
3      4  8 month      8   month

RegEx explained: regex101.com is, IMO, one of the best online RegEx parsers, testers and explainers.
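As a rough offline illustration of what the pattern does, here is a minimal sketch that runs the same regex through Python's built-in re module on the four sample strings from the question. The names PATTERN, samples and parsed are illustrative only, not part of the answer.

```python
import re

# (\d+)  -> group 1: one or more digits (the number)
# \s*    -> zero or more whitespace characters, so "2day" matches too
# (.*)   -> group 2: everything that remains (the time unit)
PATTERN = re.compile(r'(\d+)\s*(.*)')

samples = ['7 year', '2day', '4 week', '8 month']
parsed = [PATTERN.search(s).groups() for s in samples]

for s, groups in zip(samples, parsed):
    print(repr(s), '->', groups)
# '7 year' -> ('7', 'year')
# '2day' -> ('2', 'day')
# '4 week' -> ('4', 'week')
# '8 month' -> ('8', 'month')
```

Because of \s*, the same pattern handles both "7 year" and "2day".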

You may also want to convert the number column to an integer dtype:

In [69]: df['number'] = df['number'].astype(int)

In [70]: df.dtypes
Out[70]:
index        int64
duration    object
number       int32
time        object
dtype: object
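The int32 above comes from the platform: astype(int) uses NumPy's default integer type, which was 32-bit on Windows before NumPy 2.0 and 64-bit on most other systems. If a fixed width matters, a minimal sketch (rebuilding the question's sample data; the extracted name is illustrative) that requests int64 explicitly:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'index': [1, 2, 3, 4],
                   'duration': ['7 year', '2day', '4 week', '8 month']})

# Unnamed groups produce columns labelled 0 and 1
extracted = df['duration'].str.extract(r'(\d+)\s*(.*)', expand=True)
df['number'] = extracted[0]
df['time'] = extracted[1]

# Ask for a 64-bit integer instead of the platform-dependent default
df['number'] = df['number'].astype('int64')
print(df.dtypes)
```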

UPDATE:

In [167]: df['time_day'] = df['time'].replace(['year','month','week','day'], [365, 30, 7, 1], regex=True)

In [168]: df
Out[168]:
   index duration number    time  time_day
0      1   7 year      7    year       365
1      2     2day      2     day         1
2      3   4 week      4    week         7
3      4  8 month      8   month        30

2 Comments

Would you please explain how this code, df.duration.str.extract(r'(\d+)\s*?(.*)$', expand=True), works? I do not understand the ? and $ here.
@Mary, I've slightly optimized the RegEx and added a link to the explained RegEx - please check.
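To make the comment's question concrete, here is a small sketch comparing the original pattern with the simplified one using Python's re module. The ? after * makes the quantifier lazy, so \s*? matches as little whitespace as possible; because (.*) can absorb the space instead, the lazy version leaves a leading space in group 2. The $ anchors the match at the end of the string, which changes nothing for these single-line values since .* already runs to the end.

```python
import re

# Pattern from the comment: lazy \s*? followed by (.*) anchored with $
lazy = re.compile(r'(\d+)\s*?(.*)$')
# Simplified pattern from the answer: greedy \s*, no anchor
greedy = re.compile(r'(\d+)\s*(.*)')

for s in ['7 year', '2day']:
    print(repr(s), lazy.search(s).groups(), greedy.search(s).groups())
# '7 year' ('7', ' year') ('7', 'year')
# '2day' ('2', 'day') ('2', 'day')
```

So the greedy \s* is what keeps the space out of the time column.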

You can use str.extract with astype:

df = df['duration'].str.extract(r'(?P<number>\d+)\s*(?P<time>\w+)', expand=True)
#convert to int
df['number'] = df['number'].astype(int)
print (df)
   number   time
0       7   year
1       2    day
2       4   week
3       8  month

See "Extracting substrings" in the pandas documentation.
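The named groups (?P<number>...) and (?P<time>...) are what give the extracted columns their labels. A minimal sketch on the question's sample data (the named and unnamed variables are illustrative) shows the difference from unnamed groups, which get positional labels:

```python
import pandas as pd

df = pd.DataFrame({'index': [1, 2, 3, 4],
                   'duration': ['7 year', '2day', '4 week', '8 month']})

# Named groups become the column labels of the result
named = df['duration'].str.extract(r'(?P<number>\d+)\s*(?P<time>\w+)', expand=True)
# Unnamed groups are labelled 0, 1, ...
unnamed = df['duration'].str.extract(r'(\d+)\s*(\w+)', expand=True)

print(list(named.columns))    # ['number', 'time']
print(list(unnamed.columns))  # [0, 1]
```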

To add the new columns to the original DataFrame:

df = df.join(df['duration'].str.extract(r'(?P<number>\d+)\s*(?P<time>\w+)', expand=True))
#convert to int
df['number'] = df['number'].astype(int)
print (df)
   index duration  number   time
0      1   7 year       7   year
1      2     2day       2    day
2      3   4 week       4   week
3      4  8 month       8  month

Or assign both columns directly, using unnamed groups:

df[['number','time']] = df['duration'].str.extract(r'(\d+)\s*(\w+)', expand=True)
#convert to int
df['number'] = df['number'].astype(int)
print (df)
   index duration  number   time
0      1   7 year       7   year
1      2     2day       2    day
2      3   4 week       4   week
3      4  8 month       8  month
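One design difference between the two answers: (\w+) stops at the first non-word character, while (.*) takes the rest of the string. The sample data never shows the difference, so this sketch uses a hypothetical input, '3 weeks ago', that is not part of the question:

```python
import re

text = '3 weeks ago'  # hypothetical input, not from the question's data

dot_star = re.search(r'(\d+)\s*(.*)', text).groups()
word = re.search(r'(\d+)\s*(\w+)', text).groups()

print(dot_star)  # ('3', 'weeks ago')
print(word)      # ('3', 'weeks')
```

For the four values in the question both patterns give identical results.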

1 Comment

Sorry, I'm only on my phone right now, so briefly: I prefer map with a dict: d = {'year':365, 'month':30, 'week':7, 'day':1}; df['time_day'] = df['time'].map(d). It works perfectly if there are only these 4 possible values in column time; otherwise you get NaNs. replace is meant for changing only some values, not all of them, so map is better here. But if the column contains values a, b, c, d, e, f... and you only need to change a, c to b, d, replace is the better solution. Good luck!
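The map-versus-replace advice in the comment above can be sketched as follows, using the dict from the comment. The extra 'hour' value and the 'yr'/'mo' replacements are hypothetical, added only to show how the two methods treat values they have no entry for.

```python
import pandas as pd

days_per_unit = {'year': 365, 'month': 30, 'week': 7, 'day': 1}

time = pd.Series(['year', 'day', 'week', 'month', 'hour'])  # 'hour' is hypothetical

# map looks every value up in the dict; values missing from it become NaN
mapped = time.map(days_per_unit)

# replace swaps only the values it finds and leaves everything else as-is
replaced = time.replace({'year': 'yr', 'month': 'mo'})

print(mapped.tolist())    # [365.0, 1.0, 7.0, 30.0, nan]
print(replaced.tolist())  # ['yr', 'day', 'week', 'mo', 'hour']
```

The NaN also turns the mapped column into float64; with only the four known units it would stay integer.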
