I have a dataframe containing sentences like the following but with more rows:

import pandas as pd

data = pd.DataFrame({"text": ["see you in five minutes.", "she is my friend.", "she goes to school in five minutes."]})

I would like to split each sentence containing 'five minutes' into the part before the phrase and the phrase itself, as shown below:

desired output:

     first part              desired part     
0    see you in              five minutes.
1    NaN                     NaN
2    she goes to school in   five minutes.

I am using the following code, but it returns NaN for every row:

data.text.str.extract(r"(?i)(?P<before>.*)\s(?P<minutes>(?=five minutes\s))\w+ \w+")    

1 Answer

Your lookahead requires a whitespace character after "five minutes", but in your data the phrase is followed by a period, so nothing matches:

(?i)(?P<before>.*)\s(?P<minutes>(?=five minutes\s))\w+ \w+
#                                              ^^^
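To make the failure concrete, here is a minimal sketch using Python's built-in re module on one of the sample sentences. The variable names (text, original, relaxed) are only illustrative. It also shows that the minutes group wraps nothing but a lookahead, so it captures an empty string even once the pattern matches.

```python
import re

text = "see you in five minutes."

# Pattern from the question: the lookahead demands whitespace after "minutes",
# but the sentence has a period there, so the search fails.
original = r"(?i)(?P<before>.*)\s(?P<minutes>(?=five minutes\s))\w+ \w+"
print(re.search(original, text))

# Making the whitespace optional (\s*) lets the pattern match, but the
# "minutes" group contains only a zero-width lookahead and stays empty.
relaxed = r"(?i)(?P<before>.*)\s(?P<minutes>(?=five minutes\s*))\w+ \w+"
m = re.search(relaxed, text)
print(m.groupdict())
```

The empty minutes group is the reason restructuring the expression is the better fix than only relaxing the whitespace.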

Either make that whitespace optional with the star quantifier (\s*, zero or more times) or rethink your expression so that the phrase itself is captured. The following works:

import pandas as pd

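data = {"text": ["see you in five minutes.", "she is my friend.", "she goes to school in five minutes."]}

df = pd.DataFrame(data)
# "before" lazily takes everything up to the phrase; the lookahead checks that
# the phrase follows, and "after" captures it together with the rest of the line.
df2 = df.text.str.extract(r"(?i)(?P<before>.*?)(?=five minutes)(?P<after>.*)")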
print(df2)

This yields:

                   before          after
0             see you in   five minutes.
1                     NaN            NaN
2  she goes to school in   five minutes.
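Note that the before column above keeps the trailing space in front of the phrase. If that is unwanted, one possible variation (a sketch, not the only way) is to consume optional whitespace outside the capture groups and capture the phrase directly instead of using a lookahead. The names pattern and parts are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"text": ["see you in five minutes.",
                            "she is my friend.",
                            "she goes to school in five minutes."]})

# \s* sits between the two groups, so whitespace before the phrase is
# matched but not captured; "after" starts exactly at "five minutes".
pattern = r"(?i)(?P<before>.*?)\s*(?P<after>five minutes.*)"
parts = df["text"].str.extract(pattern)
print(parts)
```

Rows without the phrase still come back as NaN in both columns, as in the desired output.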