Consider the following string:

 string = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

I want to extract the first three sentences, that is:

This is the first sentence.\nIt is the second one.\nNow, this is the third one.

Apparently, the following regular expression does not work:

re.search('(?<=This)(.*?)(?=\n)', string)

What is the correct expression for extracting text between This and the third \n?

Thanks.

  • I think you meant to have \nNow ... instead of \Now. I also think you could split on \n to make it simpler (and join the first 3 elements of the split back together using \n). Commented Feb 27, 2019 at 6:44
  • Thank you, Jerry. But I want to know how to solve it using regex. Commented Feb 27, 2019 at 6:46
  • Do you have any reason for wanting to do that? Why do you want to use a tool that you don't know how to use, especially when there are more efficient solutions to that particular problem? Commented Feb 27, 2019 at 6:50

4 Answers

You can use this regex to capture three lines, starting from the text This:

This(?:[^\n]*\n){3}

Edit:

Python code:

import re

s = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

m = re.search(r'This(?:[^\n]*\n){3}', s)
if m:
    print(m.group())

Prints:

This is the first sentence.
It is the second one.
Now, this is the third one.
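
Note that the match above also includes the third \n. If the trailing newline should be left out, one possible variation (a sketch only, not part of the original answer) is to consume two complete lines and then the rest of the third line without its newline. The names pattern and result are just illustrative:

```python
import re

s = ('Other unwanted text here and start here: This is the first sentence.\n'
     'It is the second one.\nNow, this is the third one.\nThis is not I want.\n')

# "This", then two complete lines (each ending in \n),
# then the remainder of the third line, stopping before its \n.
pattern = re.compile(r'This(?:[^\n]*\n){2}[^\n]*')

m = pattern.search(s)
result = m.group() if m else None  # None if the pattern is not found
print(result)
```

Here [^\n]* can never cross a line boundary, so the match stops right before the third \n.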

6 Comments

How to extract text between This and the third \n?
I tried re.search('(?<=This)((?:[^\n]*\n){3})', string) but it returns None.
I also created a demo: repl.it/repls/OvercookedAdventurousDownload. Please check.
Thank you very much, Pushpesh. But how to include This in the result?
@Chan: For including This itself too in the match, change positive look behind to literal This and use this regex This(?:[^\n]*\n){3}. Check your modified demo

Jerry's right: regex isn't the right tool for this job, and there are much easier and more efficient ways of tackling the problem:

this = 'This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

print('\n'.join(this.split('\n', 3)[:-1]))

OUTPUT:

This is the first sentence.
It is the second one.
Now, this is the third one.

If you just want to practice using regex, following a tutorial would be much easier.
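
The snippet above starts from a string that already begins with This. For the string in the question, which has unwanted text in front, a minimal sketch (assuming the first occurrence of This marks the start, and using the illustrative names text, start and extracted) could locate that position with str.find before splitting:

```python
text = ('Other unwanted text here and start here: This is the first sentence.\n'
        'It is the second one.\nNow, this is the third one.\nThis is not I want.\n')

# Position of the first "This"; str.find returns -1 when it is absent.
start = text.find('This')

if start != -1:
    # Split at most three times, then keep the first three pieces.
    lines = text[start:].split('\n', 3)[:3]
    extracted = '\n'.join(lines)
else:
    extracted = None

print(extracted)
```

Taking [:3] instead of [:-1] keeps at most three lines even when the text contains fewer than three newlines.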


(?s)(This.*?)(?=\nThis)

The (?s) flag makes . match newlines as well; the pattern then looks for a sequence that starts with This and is followed by \nThis.

Don't forget that the __repr__ of the search result doesn't show the whole matched string, so to see the full match you'll need to index it:

print(re.search('(?s)(This.*?)(?=\nThis)', string)[0])
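
For reference, here is a self-contained sketch of the same idea, written with the re.DOTALL flag instead of the inline (?s) and with a guard against a failed search. It assumes, as the pattern does, that the line after the wanted text starts with This; the name extracted is only illustrative:

```python
import re

string = ('Other unwanted text here and start here: This is the first sentence.\n'
          'It is the second one.\nNow, this is the third one.\nThis is not I want.\n')

# re.DOTALL is the flag form of the inline (?s): "." also matches "\n".
# The lazy ".*?" stops at the first position that is followed by "\nThis".
m = re.search(r'This.*?(?=\nThis)', string, flags=re.DOTALL)

# The repr of a match object may show only part of the matched text.
print(repr(m))

# group(0), or equivalently m[0], returns the full match.
extracted = m.group(0) if m else None
print(extracted)
```

Checking m before indexing avoids the TypeError raised when re.search returns None.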

3 Comments

My output for Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\Now, this is the third one.\nThis is not I want.\n: This is the first sentence.\nIt is the second one.\nNow, this is the third one. Is this not what you want?
I tried print(re.search('(?s)(This.*?)(?=\nThis)', string)[0]) and I got TypeError: 'NoneType' object is not subscriptable.
This may be caused by assigning the string as a raw string, string = r'blah', rather than as a regular one, string = 'blah'. In a raw string, \n stays a literal backslash followed by n instead of becoming a newline, so you need a regular string here.

Try the following:

import re

string = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'
extracted_text = re.search(r'This(.*?\n.*?\n.*?)\n', string).group(1)
print(extracted_text)

Giving you:

 is the first sentence.
It is the second one.
Now, this is the third one.

This assumes there was a missing n before Now (that is, \nNow rather than \Now). If you wish to keep This in the result, you can move it inside the opening parenthesis of the capture group.
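
As a sketch of that variation, with This moved inside the capture group and a check for a failed search added (the check is an addition, not part of the answer above):

```python
import re

string = ('Other unwanted text here and start here: This is the first sentence.\n'
          'It is the second one.\nNow, this is the third one.\nThis is not I want.\n')

# "This" is now inside the capture group, so group(1) includes it.
# Each ".*?" stays on one line because "." does not match "\n" by default.
m = re.search(r'(This.*?\n.*?\n.*?)\n', string)

extracted_text = m.group(1) if m else None
print(extracted_text)
```

The third \n is matched outside the group, so it is required by the pattern but not included in extracted_text.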

5 Comments

How to extract text between This and the third \n?
.match() will return a match object, and then .group() will give you the matched text.
Sorry for unclear question. Please see the modified question. I want the text between This and the third \n and This is the first word of the sentence.
I tried your suggestions but failed. Please see repl.it/@Chan2019/OvercookedAdventurousDownload
It wasn't clear if you actually wanted \n literally or as a newline. I have updated the solution.
