extract strings between two strings in python using regular expression

Question

"<>THIS is the place to stay at when visiting the historical area of Seattle.

Your right on the water front near the ferry's and great sea food hotel.

The breakfast was great. <>"

Above is my sample text. I want to print the string that falls between <> and <>, and I want the output to be free of newline characters (\n), like this:

THIS is the place to stay at when visiting the historical area of Seattle. Your right on the water front near the ferry's and great sea food hotel.The breakfast was great.

I have tried the following piece of code:

import re
pattern = re.compile(r'\<>(.+?)\<>',re.DOTALL|re.MULTILINE)
text = """<>THIS is the place to stay at when visiting the historical area of Seattle.

Your right on the water front near the ferry's and great sea food hotel.

The breakfast was great.
<>"""
results = pattern.findall(text)
print(results)  # parenthesized so it runs on Python 3 as well as Python 2

But I get this result:

["THIS is the place to stay at when visiting the historical area of Seattle.\n\nYour right on the water front near the ferry's and great sea food hotel.\n\nThe breakfast was great.\n"]

But I don't want any new line characters in my resulting string.
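The newlines appear because re.DOTALL lets . match \n, so the lazy group (.+?) spans all three lines and keeps the line breaks. A small sketch, reusing the sample text from the code above, showing that without re.DOTALL the same pattern finds nothing, because . stops at the first newline:

```python
import re

text = """<>THIS is the place to stay at when visiting the historical area of Seattle.

Your right on the water front near the ferry's and great sea food hotel.

The breakfast was great.
<>"""

# Without DOTALL, "." does not match "\n", so no <>...<> span can cross a line break.
without_dotall = re.findall(r"<>(.+?)<>", text)
# With DOTALL, "." also matches "\n", so the captured group keeps every newline.
with_dotall = re.findall(r"<>(.+?)<>", text, re.DOTALL)

print(without_dotall)              # []
print(with_dotall[0].count("\n"))  # 5
```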

Just use .replace("\n", "") on each found match. See ideone.com/2i5Rl8 – Wiktor Stribiżew, Commented Jun 14, 2016 at 10:31
The ideas in both answers look similar, but the question doesn't make clear whether the list should remain (then Wiktor's answer is the best match) or there should be a single string at the end (then UpZone's answer solves that). In any case, both answers work, I guess ;-) – Dilettant, Commented Jun 14, 2016 at 10:41
But I don't want any extra code to slow down the processing. Can I combine it with pattern = re.compile(r'\<>(.+?)\<>',re.DOTALL|re.MULTILINE)? – albin antony, Commented Jun 14, 2016 at 10:51

Wiktor Stribiżew · Accepted Answer · 2016-06-14 10:34:13Z · Score: 4

Call .replace("\n", "") on each match (in a list comprehension) to replace every newline with an empty string.

See the demo:

results = [x.replace("\n", "") for x in pattern.findall(text)]
# => ["THIS is the place to stay at when visiting the historical area of Seattle.Your right on the water front near the ferry's and great sea food hotel.The breakfast was great."]
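A self-contained version of the answer, with the question's sample text inlined. re.MULTILINE is dropped here because it only changes how ^ and $ behave, and the pattern uses neither; the unnecessary backslashes before < are dropped as well:

```python
import re

# DOTALL lets "." match newlines so the match can span several lines.
pattern = re.compile(r"<>(.+?)<>", re.DOTALL)

text = """<>THIS is the place to stay at when visiting the historical area of Seattle.

Your right on the water front near the ferry's and great sea food hotel.

The breakfast was great.
<>"""

# Remove every newline from each captured match.
results = [x.replace("\n", "") for x in pattern.findall(text)]
print(results)
```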



Comments

Wiktor Stribiżew Over a year ago

@albinantony: no, you cannot match discontinuous text within one match operation.
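If the aim is only to avoid a per-match loop, one option (not suggested in the thread) is to remove the newlines from the whole input once, before matching. This is still a separate step, as noted above, but the pattern then no longer needs re.DOTALL. A sketch, assuming newlines outside the <> delimiters can be discarded as well:

```python
import re

text = """<>THIS is the place to stay at when visiting the historical area of Seattle.

Your right on the water front near the ferry's and great sea food hotel.

The breakfast was great.
<>"""

# Strip all newlines from the input first; "." then never has to match "\n".
pattern = re.compile(r"<>(.+?)<>")
results = pattern.findall(text.replace("\n", ""))
print(results)
```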

Wiktor Stribiżew Over a year ago

I have also written a general article about extracting strings between two strings with regex; feel free to read it if you have trouble approaching a similar problem.
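A general sketch of the same technique for arbitrary delimiters. extract_between is a hypothetical helper name, not a library function. re.escape makes delimiters containing regex metacharacters match literally, and " ".join(s.split()) collapses each run of whitespace, newlines included, into a single space (so "Seattle." and "Your" stay separated):

```python
import re


def extract_between(text: str, start: str, end: str) -> list:
    """Hypothetical helper: return every substring between start and end,
    with each run of whitespace (newlines included) collapsed to one space."""
    # re.escape treats the delimiters literally; the lazy (.*?) keeps matches
    # from merging, and DOTALL lets a match span several lines.
    pattern = re.compile(re.escape(start) + r"(.*?)" + re.escape(end), re.DOTALL)
    return [" ".join(m.split()) for m in pattern.findall(text)]


print(extract_between("[a\nb] and [c]", "[", "]"))  # ['a b', 'c']
print(extract_between("(x) (y)", "(", ")"))         # ['x', 'y']
```

Unlike the pattern in the question, (.*?) also accepts an empty string between the delimiters.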

товіаѕ · Accepted Answer · 2016-06-14 10:35:03Z · Score: 3

Just replace the characters you don't want, e.g.:

# join the list returned by findall() into one string, then drop the newlines
result_without_newline = "".join(results).replace("\n", "")

hope this helps :)
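Converting the list with str() does not work here: str(results) returns the list's representation, in which each newline is written as the two characters \ and n, so .replace('\n', '') finds nothing to remove and the brackets and quotes remain. A small sketch comparing it with joining the matches first, using a shortened hypothetical match list:

```python
results = ["Seattle.\n\nYour right.\n"]

# str() of a list uses repr() for each item, so real newlines become "\\n" text.
as_repr = str(results)
print(as_repr.replace("\n", "") == as_repr)  # True: no real newline to remove

# Joining keeps the real newline characters, which replace() can then remove.
joined = "".join(results).replace("\n", "")
print(joined)  # Seattle.Your right.
```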

