
I have a list of fake IDs in a text file. I want to capture all IDs that start with 'A0015'. I tried different regexes, but none of them produce the expected output. Which regex should I be using?

import re

text = "Here are the fake student ids: IDs A0015-4737, IDs: A0015-384721-ADA2ad, A0015WE382 \n A00152838. Please enter this."
capture_id_list = re.findall(r"A0015 ([\w-]+)", text, flags=re.IGNORECASE)
print(capture_id_list)  # prints []
# print(text.startswith('A0015'))  # gives False, so not useful

find_this = "A0015"
capture_id_list = text[:text.find(find_this) + len(find_this)]
print(capture_id_list)  # prints "Here are the fake student ids: IDs A0015", not the desired result

Expected output:

['A0015-4737','A0015-384721-ADA2ad','A0015WE382','A00152838']
  • Your regex has a space in it that the patterns don't. Commented Oct 31, 2018 at 18:11
  • You can toy with your regex at regex101.com. Commented Oct 31, 2018 at 18:17

2 Answers

I suggest using r"(A0015[^ ,.]+)" in your code:

>>> import re
>>> text = "Here are the fake student ids: IDs A0015-4737, IDs: A0015-384721-ADA2ad, A0015WE382 \n A00152838. Please enter this."
>>> capture_id_list = re.findall(r"(A0015[^ ,.]+)", text, flags=re.IGNORECASE)
>>> print(capture_id_list)
['A0015-4737', 'A0015-384721-ADA2ad', 'A0015WE382', 'A00152838']

Here () is a capture group. It captures a string that begins with A0015 and is followed by one or more characters (the + sign) that are not a space, comma, or dot (the characters inside the [] brackets, negated by the ^ sign).
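As an alternative to the negated class, here is a minimal sketch that keeps the asker's original `[\w-]` whitelist and only removes the stray space after `A0015`. It is not part of the answer above; the variable name `whitelist_ids` is just for illustration, and it assumes IDs contain only word characters and hyphens.

```python
import re

text = ("Here are the fake student ids: IDs A0015-4737, IDs: A0015-384721-ADA2ad, "
        "A0015WE382 \n A00152838. Please enter this.")

# No space after the prefix, and no capture group, so findall returns
# the whole match (prefix included) instead of only the part after it.
whitelist_ids = re.findall(r"A0015[\w-]*", text, flags=re.IGNORECASE)
print(whitelist_ids)
# ['A0015-4737', 'A0015-384721-ADA2ad', 'A0015WE382', 'A00152838']
```

A whitelist stops at any character you did not list, while the negated class stops only at the characters you did list, so the two can differ on input with other punctuation.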

This should work for you: r"(A0015[^\s,.]*)". Inline, it would look like this:

capture_id_list = (re.findall(r"(A0015[^\s,.]*)", text,flags=re.IGNORECASE))

(A0015[^\s,.]*)

  • 1st Capturing Group (A0015[^\s,.]*)
    • A0015 matches the characters A0015 literally (case insensitive)
    • Match a single character not present in the list below: [^\s,.]*
      • * Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
      • \s matches any whitespace character (equal to [\r\n\t\f\v ])
      • ,. matches a single character in the list ,. (a literal comma or a literal dot)
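To make the breakdown above concrete, here is a small sketch of where `[^\s,.]*` behaves differently from a class that excludes only the space character and uses `+`. The sample strings and variable names are made up for illustration and are not from the question.

```python
import re

space_only = r"(A0015[^ ,.]+)"    # excludes space, comma, dot; needs 1+ chars
any_space = r"(A0015[^\s,.]*)"    # excludes all whitespace, comma, dot; 0+ chars

# IDs separated by a newline with no space in between.
newline_text = "A0015-1\nA0015-2"
merged = re.findall(space_only, newline_text)  # newline is not excluded here
split = re.findall(any_space, newline_text)    # \s stops at the newline
print(merged)  # ['A0015-1\nA0015-2']
print(split)   # ['A0015-1', 'A0015-2']

# A bare prefix with nothing after it.
bare_text = "ID A0015 only"
print(re.findall(space_only, bare_text))  # [] because + needs one more char
print(re.findall(any_space, bare_text))   # ['A0015'] because * allows zero
```

Whether a bare `A0015` should count as a match depends on your data; use `+` instead of `*` if it should not.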
