Suppose that I have the following string:

mystr = """
<p>Some text and another text. </p> ![image_file_1][image_desc_1] some other text. 
<p>some text</p> 
![image_file_2][image_desc_2] and image: ![image_file_3][image_desc_3] 
test case 1: ![dont_match_1]
test case 2: [dont_match_2][dont_match_3]
finally: ![image_file_4][image_desc_4]
"""

I can get image_file_X's using the following code:

import re
re.findall(r'(?<=!\[)[^]]+(?=\]\[.*?\])', mystr)

I want to capture the image_desc_X's, but the following does not work:

re.findall('(?!\[.*?\]\[)[^]]+(?=\])', mystr)

Any suggestions? If I can get both image_file's and image_desc's using one command that would be even better.

2 Answers

Use the following approach:

result = re.findall(r'!\[([^]]+)\]\[([^]]+)\]', mystr)
print(result)

The output:

[('image_file_1', 'image_desc_1'), ('image_file_2', 'image_desc_2'), ('image_file_3', 'image_desc_3'), ('image_file_4', 'image_desc_4')]
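
If the file names and descriptions are needed as two separate lists, the tuples returned by `re.findall` can be split afterwards. This is only a minimal sketch built on the pattern above; the names `pairs`, `image_files` and `image_descs` are illustrative and not part of the original answer.

```python
import re

mystr = """
<p>Some text and another text. </p> ![image_file_1][image_desc_1] some other text.
<p>some text</p>
![image_file_2][image_desc_2] and image: ![image_file_3][image_desc_3]
test case 1: ![dont_match_1]
test case 2: [dont_match_2][dont_match_3]
finally: ![image_file_4][image_desc_4]
"""

# With two capture groups, findall returns a list of (file, desc) tuples.
pairs = re.findall(r'!\[([^]]+)\]\[([^]]+)\]', mystr)

# Split the tuples into two parallel lists (hypothetical variable names).
image_files = [file for file, _ in pairs]
image_descs = [desc for _, desc in pairs]

print(image_files)
print(image_descs)
```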

1 Comment

There is no need to express existence of ! with a positive lookbehind. Literal ! is enough.
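
To illustrate the point of this comment, here is a sketch that gets only the image_desc_X's without any lookarounds: the `![...]` prefix is matched literally and only the second bracket is captured. It assumes the same `mystr` as in the question; the variable name `descs` is made up for the example.

```python
import re

mystr = """
<p>Some text and another text. </p> ![image_file_1][image_desc_1] some other text.
<p>some text</p>
![image_file_2][image_desc_2] and image: ![image_file_3][image_desc_3]
test case 1: ![dont_match_1]
test case 2: [dont_match_2][dont_match_3]
finally: ![image_file_4][image_desc_4]
"""

# The "![file]" part is matched literally but not captured.
# With exactly one capture group, findall returns a flat list of strings.
descs = re.findall(r'!\[[^]]+\]\[([^]]+)\]', mystr)
print(descs)
```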

I guess you can use:

# Each match exposes the file name as group 1 and the description as group 2.
for match in re.finditer(r"!\[(.*?)\]\[(.*?)\]", mystr):
    print(match.group(1))
    print(match.group(2))

output:

image_file_1
image_desc_1
image_file_2
image_desc_2
image_file_3
image_desc_3
image_file_4
image_desc_4
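
A possible variation on the loop above, in case the numbered groups get hard to read: named groups let each match be turned into a small dict. This is a sketch, not part of the original answer; the group names `file` and `desc` and the variable `images` are assumptions chosen for the example.

```python
import re

mystr = """
<p>Some text and another text. </p> ![image_file_1][image_desc_1] some other text.
<p>some text</p>
![image_file_2][image_desc_2] and image: ![image_file_3][image_desc_3]
test case 1: ![dont_match_1]
test case 2: [dont_match_2][dont_match_3]
finally: ![image_file_4][image_desc_4]
"""

# (?P<name>...) gives each group a name; "file" and "desc" are hypothetical names.
pattern = re.compile(r"!\[(?P<file>[^\]]+)\]\[(?P<desc>[^\]]+)\]")

# groupdict() returns the named groups of one match as a dict.
images = [match.groupdict() for match in pattern.finditer(mystr)]

for image in images:
    print(image["file"], image["desc"])
```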

2 Comments

Are you sure case-insensitivity flag should be set here?
It came by default... I'll remove it, thanks!
