Python: find a string between 2 strings in text

Question

I have a text like this

s = """
...

(1) Literature

1. a.
2. b.
3. c.

...
"""

I want to extract the Literature section, but I have a problem detecting it.

I tried:

re.search("(1) Literature\n\n(.*).\n\n", s).group(1)

but search returns None.

The desired output is:

(1) Literature

1. a.
2. b.
3. c.

What did I do wrong?

What is your desired output?

– Pubudu Sitinamaluwa, Commented Jul 14, 2021 at 15:33

Probably you need r'\(1\)\s+Literature\s+((?:.+\n)+)'

– anubhava, Commented Jul 14, 2021 at 15:36

The fourth bird · Accepted Answer · 2021-07-14 16:14:45Z

2

You could match (1) Literature and 2 newlines, and then capture all lines that start with digits followed by a dot.

\(1\) Literature\n\n((?:\d+\..*(?:\n|$))+)

The pattern matches:

\(1\) Literature\n\n Match (1) Literature and 2 newlines
( Capture group 1
- (?: Non-capture group
  - \d+\..*(?:\n|$) Match 1+ digits, a dot and the rest of the line, followed by either a newline or the end of the string
- )+ Close the non-capture group and repeat it 1 or more times to match all the lines
) Close group 1
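
As a minimal sketch of how this pattern could be used from Python, the snippet below runs it against the sample text from the question. The variable names (`pattern`, `section`, `items`) are illustrative choices, not part of the answer.

```python
import re

s = """
...

(1) Literature

1. a.
2. b.
3. c.

...
"""

pattern = r"\(1\) Literature\n\n((?:\d+\..*(?:\n|$))+)"
m = re.search(pattern, s)

if m:
    # group(0) is the whole match, heading included; group(1) is only the list.
    section = m.group(0).rstrip("\n")
    items = m.group(1).rstrip("\n")
    print(section)
```

Here `group(0)` corresponds to the output asked for in the question, while `group(1)` holds only the numbered lines.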

Another option is to capture all following lines that do not start with ( digits ) using a negative lookahead, and then trim the leading and trailing whitespaces.

\(1\) Literature((?:\n(?!\(\d+\)).*)*)
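
A small sketch of the lookahead variant follows. Because the sample in the question does not show what comes after the list, the text here adds a second section, "(2) History", purely as an assumption so that the lookahead has something to stop at.

```python
import re

# Hypothetical text: the "(2) History" section is invented for illustration.
text = """(1) Literature

1. a.
2. b.
3. c.

(2) History

1. d.
"""

pattern = r"\(1\) Literature((?:\n(?!\(\d+\)).*)*)"
m = re.search(pattern, text)

# The group starts and ends with blank lines, so strip them afterwards.
body = m.group(1).strip()
print(body)
```

The match ends just before the line that starts with `(2)`, because the negative lookahead rejects any line beginning with digits in parentheses.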

answered Jul 14, 2021 at 16:14

The fourth bird


Pubudu Sitinamaluwa · Accepted Answer · 2021-07-14 15:50:12Z

1

Parentheses have a special meaning in regex: they create capturing groups.

(1) - Captures 1 as the first capturing group.

Because the text contains literal parentheses, which the unescaped pattern does not match, the search fails. Also, .* stops capturing at the end of the line, because . does not match a newline by default.
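
Both points can be checked with a short sketch against the sample text from the question. The use of `re.escape` and `re.DOTALL` is an addition for illustration and is not something the answer itself relies on.

```python
import re

s = """
...

(1) Literature

1. a.
2. b.
3. c.

...
"""

# Unescaped: "(1)" is a group matching only the character "1", so the
# pattern looks for "1 Literature", which does not occur in the text.
unescaped = re.search(r"(1) Literature", s)
print(unescaped)  # None

# Escaped by hand: "\(" and "\)" match literal parentheses.
escaped = re.search(r"\(1\) Literature", s)
print(escaped.group(0))  # (1) Literature

# re.escape builds the escaped form of an arbitrary literal string.
auto = re.search(re.escape("(1) Literature"), s)
print(auto.group(0))  # (1) Literature

# "." does not match a newline unless re.DOTALL is passed.
one_line = re.search(r"1\..*", s).group(0)
all_lines = re.search(r"1\..*", s, re.DOTALL).group(0)
print(repr(one_line))  # '1. a.'
```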

Based on your regex, I assumed you wanted to capture the line containing the word Literature and the lines below it, 5 lines in total. Here is a regex to do so.

\(1\) Literature(.*\n){5}

Note the escape characters (backslashes) used on the parentheses around 1.
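
A quick sketch of this pattern in Python, using the sample text from the question. One detail worth seeing in practice: a repeated group such as `(.*\n){5}` keeps only its last repetition, so the full section has to be read from `group(0)`.

```python
import re

s = """
...

(1) Literature

1. a.
2. b.
3. c.

...
"""

m = re.search(r"\(1\) Literature(.*\n){5}", s)

whole = m.group(0)  # heading line, blank line and the three list lines
last = m.group(1)   # only the final repetition of the group

print(repr(whole))
print(repr(last))   # '3. c.\n'
```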

EDIT

Based on zr0gravity7's comment, I came up with this regex to capture the middle section of the string.

\(1\)\sLiterature\n+((.*\n){3})

This regex captures the string below in capturing group 1.

1. a.
2. b.
3. c.
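
The fixed count `{3}` assumes the list always has exactly three lines. The sketch below tries the same pattern on two hypothetical inputs (the "(2) History" heading and the two-item list are invented for illustration) to show what happens when that assumption does not hold.

```python
import re

pattern = r"\(1\)\sLiterature\n+((.*\n){3})"

# Hypothetical inputs: one list with three items, one with only two.
three = "(1) Literature\n\n1. a.\n2. b.\n3. c.\n\n(2) History\n"
two = "(1) Literature\n\n1. a.\n2. b.\n\n(2) History\n"

got_three = re.search(pattern, three).group(1)
got_two = re.search(pattern, two).group(1)

print(repr(got_three))  # '1. a.\n2. b.\n3. c.\n'
print(repr(got_two))    # '1. a.\n2. b.\n\n' (the blank line is the third line)
```

If the number of items varies, a pattern that repeats "one or more" item lines may be a better fit than a fixed count.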

edited Jul 14, 2021 at 15:50

answered Jul 14, 2021 at 15:42

Pubudu Sitinamaluwa

1 Comment

zr0gravity7 Over a year ago

Most likely they want to capture the Literature part in a group, and the choices in a group, they do not want to capture newlines.

anubhava · Accepted Answer · 2021-07-14 16:58:43Z

1

You may use this regex with a capture group:

r'\(1\)\s+Literature\s+((?:.+\n)+)'

Explanation:

\(1\): Match (1) text
\s+: Match 1+ whitespaces
Literature: Match the text Literature
\s+: Match 1+ whitespaces
(: Start capture group #1
- (?:.+\n)+: Match a line with 1+ characters followed by a newline. Repeat this 1 or more times to match multiple such lines
): End capture group #1
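
A minimal usage sketch for this regex, run against the sample text from the question. Splitting the captured block into a list with `splitlines()` is an extra step added here for illustration.

```python
import re

s = """
...

(1) Literature

1. a.
2. b.
3. c.

...
"""

m = re.search(r'\(1\)\s+Literature\s+((?:.+\n)+)', s)

if m:
    block = m.group(1)          # the non-empty lines after the heading
    lines = block.splitlines()  # one list entry per line
    print(lines)  # ['1. a.', '2. b.', '3. c.']
```

The match stops at the first blank line, because `.+` needs at least one character on each line.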

answered Jul 14, 2021 at 16:58

anubhava

zr0gravity7 · Accepted Answer · 2021-07-14 16:22:25Z

0

Regex for capturing the generic question with that structure:

\(\d+\)\s+(\w+)\s+((?:\d+\.\s.+\n)+)

It will capture the title "Literature", then the choices in another group (for a total of 2 groups).

A repeated capture group keeps only its last match, so in order to get each of your "1. a." entries separately you would have to match the second group from above again, with this pattern:

(\d+\.\s+.+)\n and then match globally to get every item.
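
The two-step approach could look like the sketch below, using the sample text from the question. The per-item pattern passed to `re.findall` is a simplified single-line variant chosen for this sketch, and the names `generic`, `title`, `choices` and `items` are illustrative.

```python
import re

s = """
...

(1) Literature

1. a.
2. b.
3. c.

...
"""

# Step 1: capture the title and the whole block of choices.
generic = re.compile(r"\(\d+\)\s+(\w+)\s+((?:\d+\.\s.+\n)+)")
m = generic.search(s)
title, choices = m.groups()

# Step 2: match every item inside the captured block.
items = re.findall(r"(\d+\.\s+.+)\n", choices)

print(title)  # Literature
print(items)  # ['1. a.', '2. b.', '3. c.']
```

`re.findall` plays the role of the "global" match here: it returns every non-overlapping match instead of only the first one.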

edited Jul 14, 2021 at 16:22

answered Jul 14, 2021 at 15:48

zr0gravity7
