2

I have a string formatted as results_item12345. The numeric part is either four or five digits long. The letters will always be lowercase and there will always be an underscore somewhere in the non-numeric part.

I tried to extract it using the following:

 import re
 string = 'results_item12345'
 re.search(r'[^a-z][\d]',string)

However, I only get the leftmost two digits. How can I get the entire number?

  • Your regex is currently matching "a single character that is not a-z followed by a single digit". That should shed some light on what is happening. Commented Oct 11, 2012 at 19:11
  • Ah that explains why there were two characters. Commented Oct 11, 2012 at 19:22
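
To make the comment above concrete, here is a minimal sketch of what the original pattern consumes on the question's sample string. The variable names `match` and `match_all` are only illustrative, and `\d+` is shown as one possible fix, not the only one.

```python
import re

string = 'results_item12345'

# [^a-z] consumes one character that is not a lowercase letter ('1'),
# then [\d] consumes exactly one digit ('2'), so the match is two characters.
match = re.search(r'[^a-z][\d]', string)
print(match.group())  # 12

# A quantifier lets the digit class repeat and pick up the whole run.
match_all = re.search(r'\d+', string)
print(match_all.group())  # 12345
```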

3 Answers

7

Assuming you only care about the number at the end of the string, the following expression matches 4 or 5 digits immediately before the end of the string.

\d{4,5}$

Otherwise, the following would be the full regex matching the provided requirements.

^[a-z_]+\d{4,5}$
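
As a rough sketch of how the full pattern might be used from Python, a capture group can be added around the digits so they can be pulled out after validation. The helper name `extract_number` and the capture group are my own additions for illustration, not part of the expressions above.

```python
import re

def extract_number(text):
    """Return the trailing 4- or 5-digit number as a string, or None."""
    # Same pattern as above, with parentheses added to capture the digits.
    match = re.search(r'^[a-z_]+(\d{4,5})$', text)
    return match.group(1) if match else None

print(extract_number('results_item12345'))  # 12345
print(extract_number('results_item1234'))   # 1234
print(extract_number('results_item123'))    # None
```

Note that the result is a string; wrap it in `int()` if you need a number.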

2 Comments

It's a backslash, but that's the solution ;)
Yep, saw that… one second after hitting the button
2

If you just want to match a four- or five-digit number anywhere in the string, you could search for:

r'[\d]{4,5}'

If you need to validate the whole string, anchor the pattern at both ends:

r'^results_item[\d]{4,5}$'
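
The difference between the two patterns shows up on input that has too many digits. The sketch below assumes the literal prefix is `results_item`, as in the question's sample string; the names `loose`, `strict`, and `too_long` are only illustrative.

```python
import re

loose = re.compile(r'\d{4,5}')                  # matches anywhere in the string
strict = re.compile(r'^results_item\d{4,5}$')   # must describe the whole string

too_long = 'results_item1234567'

# The unanchored pattern happily returns the first five digits.
print(loose.search(too_long).group())  # 12345

# The anchored pattern rejects the string because seven digits follow the prefix.
print(strict.search(too_long))  # None

print(strict.search('results_item12345').group())  # results_item12345
```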

2 Comments

@JasonMcCreary updated it just before you posted comment... Thanks anyway.
@JasonMcCreary thanks, I know that, but I prefer to always wrap character classes in brackets; it's easier for me to read :)
1
import re

a = "results_item12345"
# Group 1 captures the non-digit prefix, group 2 the run of digits.
pattern = re.compile(r"(\D+)(\d+)")
x = pattern.match(a).groups()
print(x[1])  # 12345
