How to extract a certain paragraph from a file using regex in Python?

Question

I want to extract a particular paragraph (usually one in the middle) from a file using a regex in Python.

An example file is as follows:

poem = """The time will come
when, with elation,
you will greet yourself arriving
at your own door, in your own mirror,
and each will smile at the other's welcome,
and say, sit here. Eat.
You will love again the stranger who was your self.
Give wine. Give bread. Give back your heart
to itself, to the stranger who has loved you

all your life, whom you ignored
for another, who knows you by heart.
Take down the love letters from the bookshelf,

the photographs, the desperate notes,
peel your own image from the mirror.
Sit. Feast on your life."""

How can I extract the second paragraph (that is, "all your life ... the bookshelf,") of this poem using regex in Python?

I am struggling with the pattern of the second paragraph right now. NEED HELP!
– hoperose, Commented Oct 4, 2017 at 5:08
@BurhanKhalid Could you provide me with the specific code to capture anything that's between two \n\n? Thank you so much
– hoperose, Commented Oct 4, 2017 at 5:09

Aaditya Ura · Accepted Answer · 2017-10-04 16:01:04Z

1

Use group capturing and try this out:

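import re

# Anchored on the paragraph's first and last words, so it is specific to this poem.
pattern = r'^(all.*bookshelf[,\s])'

# MULTILINE lets ^ match at the start of any line; DOTALL lets . match newlines.
second = re.search(pattern, poem, re.MULTILINE | re.DOTALL)
print(second.group(0))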
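
To see why both flags matter here, the following is a minimal sketch that runs the same pattern with and without them. It uses a shortened excerpt of the poem so that it is self-contained; the variable names `no_dotall`, `no_multiline` and `both` are only illustrative.

```python
import re

# Shortened excerpt of the poem from the question (three paragraphs).
poem = (
    "Give wine. Give bread. Give back your heart\n"
    "to itself, to the stranger who has loved you\n"
    "\n"
    "all your life, whom you ignored\n"
    "for another, who knows you by heart.\n"
    "Take down the love letters from the bookshelf,\n"
    "\n"
    "the photographs, the desperate notes,\n"
    "Sit. Feast on your life."
)

pattern = r'^(all.*bookshelf[,\s])'

# Without DOTALL, '.' stops at the first newline, so 'bookshelf' is never reached.
no_dotall = re.search(pattern, poem, re.MULTILINE)

# Without MULTILINE, '^' only matches at the very start of the string.
no_multiline = re.search(pattern, poem, re.DOTALL)

# With both flags the whole second paragraph is captured.
both = re.search(pattern, poem, re.MULTILINE | re.DOTALL)

print(no_dotall, no_multiline)
print(both.group(1))
```

Because the pattern names the first and last words of the paragraph, it would need to be rewritten for any other text.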


Sweeper · Accepted Answer · 2017-10-04 05:45:10Z

0

Use a positive look-ahead and look-behind:

(?<=\n\n).+(?=\n\n)

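The (?<=\n\n) at the start is a look-behind: whatever follows it matches only if it is preceded by \n\n.

The (?=\n\n) at the end is a look-ahead: whatever precedes it matches only if it is followed by \n\n.

Note that . does not match a newline by default, so for a paragraph that spans several lines the pattern has to be used with the re.DOTALL flag.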

Try it out: https://regex101.com/r/7XnDjS/1
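
As a minimal sketch of how this pattern behaves in Python, the snippet below runs it on a shortened excerpt of the poem. It assumes the paragraphs are separated by exactly one blank line with \n line endings; the lazy `.+?` is my own adjustment so that the match stops at the first blank line when there are more than three paragraphs.

```python
import re

# Shortened excerpt of the poem from the question (three paragraphs).
poem = (
    "Give wine. Give bread. Give back your heart\n"
    "to itself, to the stranger who has loved you\n"
    "\n"
    "all your life, whom you ignored\n"
    "for another, who knows you by heart.\n"
    "Take down the love letters from the bookshelf,\n"
    "\n"
    "the photographs, the desperate notes,\n"
    "Sit. Feast on your life."
)

# Without re.DOTALL, '.' cannot cross the line breaks inside the paragraph,
# so no single line is both preceded and followed by a blank line.
without_flag = re.search(r'(?<=\n\n).+(?=\n\n)', poem)

# With re.DOTALL the match can span lines; '.+?' stops at the first blank line.
match = re.search(r'(?<=\n\n).+?(?=\n\n)', poem, re.DOTALL)
second_paragraph = match.group(0) if match else None

print(without_flag)
print(second_paragraph)
```

Checking the result for `None` before calling `group(0)` avoids the `AttributeError` that appears when nothing matches.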


5 Comments

hoperose Over a year ago

Thank you for your help. I added your code like this: paragraph = re.match(r'(?<=\n\n).+(?=\n\n)', poem) followed by print(paragraph). However, the result is "None" in the shell.

Sweeper Over a year ago

@hoperose You have to use search instead of match. Also, call group(0) on the return value to get the matched string.

hoperose Over a year ago

like this: paragraph = re.search(r'(?<=\n\n).+(?=\n\n)', poem) print(paragraph.group(0))?

hoperose Over a year ago

result=paragraph.group(0) AttributeError: 'NoneType' object has no attribute 'group'

Sweeper Over a year ago

It does work: repl.it/MD7v/0 One reason it might fail is that you are on Windows, where line endings are \r\n; I don't have a Windows PC, so I'm not sure. Try replacing each \n\n with \r\n\r\n. @hoperose
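
The points raised in these comments can be sketched together as follows. The helper name `second_paragraph` and the sample string are hypothetical. Python's `re` requires a fixed-width look-behind, so rather than writing `\r?\n` inside it, this sketch normalises `\r\n` to `\n` first; it also passes `re.DOTALL`, without which a multi-line paragraph cannot match.

```python
import re

def second_paragraph(text):
    """Return the first block that sits between two blank lines, or None."""
    # Convert Windows line endings so the fixed-width look-behind still applies.
    normalized = text.replace('\r\n', '\n')
    match = re.search(r'(?<=\n\n).+?(?=\n\n)', normalized, re.DOTALL)
    return match.group(0) if match else None

# Hypothetical sample with Windows-style line endings.
windows_text = "first\r\n\r\nsecond a\r\nsecond b\r\n\r\nthird"

# re.match only tries position 0, where the look-behind can never succeed.
with_match = re.match(r'(?<=\n\n).+?(?=\n\n)',
                      windows_text.replace('\r\n', '\n'), re.DOTALL)

print(with_match)
print(second_paragraph(windows_text))
```

When a file is opened in text mode in Python 3 with the default `newline` setting, `\r\n` is already translated to `\n` on reading, so the normalisation step mainly matters for text obtained some other way.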

Ken Schumack · Accepted Answer · 2017-10-04 06:20:22Z

0

It may be important that some Windows text files end a line with \r\n instead of just \n.

Python has excellent documentation on regular expressions. Just google "python regexp". You could even google "perl regexp" since Python copied regexp from Perl ;-)

One way to get just the second paragraph text would be to use () to grab the text between two groups of two or more line endings. Note that [^\r\n]+ cannot cross a line break, so this assumes each paragraph is stored on a single line:

myPattern = re.compile('[^\r\n]+\r?\n\r?\n+([^\r\n]+)\r?\n\r?\n.*')

and then use it like this:

secondPara = myPattern.sub("\\1", content)

Here's my script in action:

schumack@linux2 137> ./poem2.py
secondPara: all your life, whom you ignored for another, who knows you by heart. Take down the love letters from the bookshelf,


1 Comment

hoperose Over a year ago

Thank you, @Ken Schumack. However, running it gives back the whole content, and I don't know why.
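
A likely explanation, as far as I can tell: `[^\r\n]+` cannot span the line breaks inside a paragraph, so on the multi-line poem the pattern never matches and `sub` returns its input unchanged. As an alternative sketch (not the answerer's method), the text can be split on blank lines and the paragraph picked by index. The helper name `paragraphs` and the sample text are hypothetical.

```python
import re

def paragraphs(text):
    """Split text into paragraphs separated by one or more blank lines."""
    # (?:\r?\n){2,} matches a run of two or more line endings, \n or \r\n.
    return [p for p in re.split(r'(?:\r?\n){2,}', text.strip()) if p]

# Hypothetical sample mixing \n and \r\n line endings.
text = "one\nstill one\n\ntwo\nstill two\r\n\r\nthree"

# The pattern from the answer finds nothing here, so sub() leaves the text as it is.
my_pattern = re.compile('[^\r\n]+\r?\n\r?\n+([^\r\n]+)\r?\n\r?\n.*')
unchanged = my_pattern.sub("\\1", text)

parts = paragraphs(text)
print(unchanged == text)
print(parts[1])
```

Indexing the list also makes it easy to pick any other paragraph, for example `parts[-1]` for the last one.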
