Python - find all regex

Question

I have a list of fake IDs in a text file. I want to capture all IDs that start with 'A0015'. I tried different regexes, but none of them produces the expected output. Which regex should I be using?

import re

text = "Here are the fake student ids: IDs A0015-4737, IDs: A0015-384721-ADA2ad, A0015WE382 \n A00152838. Please enter this."
capture_id_list = re.findall(r"A0015 ([\w-]+)", text, flags=re.IGNORECASE)
print(capture_id_list) # prints []
# print(text.startswith('A0015')) # Gives False... not useful

find_this = "A0015"
capture_id_list = text[:text.find(find_this) + len(find_this)]
print(capture_id_list) # Here are the fake student ids: IDs A0015. Not the results

Expected output:

['A0015-4737','A0015-384721-ADA2ad','A0015WE382','A00152838']

Your regex has a space in it that the patterns don't.

– Daniel Roseman, Commented Oct 31, 2018 at 18:11
You can experiment with your regex at regex101.com.

– David Culbreth, Commented Oct 31, 2018 at 18:17

Koikos · Accepted Answer · 2018-10-31 18:18:33Z

1

I suggest using r"(A0015[^ ,.]+)" in your code:

>>> import re
>>> text = "Here are the fake student ids: IDs A0015-4737, IDs: A0015-384721-ADA2ad, A0015WE382 \n A00152838. Please enter this."
>>> capture_id_list = re.findall(r"(A0015[^ ,.]+)", text, flags=re.IGNORECASE)
>>> print(capture_id_list)
['A0015-4737', 'A0015-384721-ADA2ad', 'A0015WE382', 'A00152838']

Here () is a capture group. It captures a string that begins with A0015 followed by one or more characters (the + sign) other than a space, comma, or dot (the characters inside the [] brackets, negated by the ^ sign).
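
One caveat worth illustrating: the class [^ ,.] excludes only the literal space, so an ID followed directly by a newline would run on into the next line. The sketch below uses a made-up `sample` string (an assumption for illustration, not the text from the question) to compare it with a class that excludes all whitespace via \s.

```python
import re

# Hypothetical sample: the second ID follows a newline with no space before it.
sample = "A0015WE382\nA00152838. Please enter this."

# Excluding only space, comma and dot lets the newline through,
# so both IDs are swallowed into a single match.
space_only = re.findall(r"A0015[^ ,.]+", sample, flags=re.IGNORECASE)

# Excluding all whitespace (\s) stops each match at the newline.
any_whitespace = re.findall(r"A0015[^\s,.]+", sample, flags=re.IGNORECASE)

print(space_only)      # ['A0015WE382\nA00152838']
print(any_whitespace)  # ['A0015WE382', 'A00152838']
```

In the question's text there happens to be a space before the newline, which is why the space-only class still gives the expected list there.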

answered Oct 31, 2018 at 18:18

Koikos


David Culbreth · Accepted Answer · 2018-10-31 18:30:42Z

0

This should work for you: r"(A0015[^\s,.]*)". Inline, it would look like this:

capture_id_list = (re.findall(r"(A0015[^\s,.]*)", text,flags=re.IGNORECASE))

(A0015[^\s,.]*)

1st Capturing Group (A0015[^\s,.]*)
- A0015 matches the characters A0015 literally (case insensitive)
- Match a single character not present in the list below: [^\s,.]*
  - * Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
  - \s matches any whitespace character (equal to [\r\n\t\f\v ])
  - ,. matches a single character in the list ,. (case insensitive)
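
To make the effect of the * quantifier concrete, here is a small sketch with a hypothetical `sample` string (an assumption, not taken from the question). With *, a bare A0015 with nothing after it is also returned, whereas + requires at least one more character.

```python
import re

# Hypothetical sample: the first occurrence is a bare prefix with no suffix.
sample = "A0015, A0015-1"

# '*' allows zero trailing characters, so the bare prefix matches too.
zero_or_more = re.findall(r"A0015[^\s,.]*", sample, flags=re.IGNORECASE)

# '+' needs at least one trailing character, so the bare prefix is skipped.
one_or_more = re.findall(r"A0015[^\s,.]+", sample, flags=re.IGNORECASE)

print(zero_or_more)  # ['A0015', 'A0015-1']
print(one_or_more)   # ['A0015-1']
```

Which quantifier fits depends on whether a prefix-only ID should count as a valid match in your data.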

answered Oct 31, 2018 at 18:30

David Culbreth
