32

I am quite new to Python and regex, and I have the following simple string:

s=r"""99-my-name-is-John-Smith-6376827-%^-1-2-767980716"""

I would like to extract only the last digits in the above string, i.e. 767980716, and I was wondering how I could achieve this using Python regex.

I wanted to do something similar along the lines of:

re.compile(r"""-(.*?)""").search(str(s)).group(1)

The idea was for (.*?) to capture everything that starts after a "-" and runs to the end of the string, but this returns nothing (an empty string).

I was wondering if anyone could point me in the right direction. Thanks.

7 Answers

45

You can use re.match to capture only the trailing digits:

>>> import re
>>> s=r"""99-my-name-is-John-Smith-6376827-%^-1-2-767980716"""
>>> re.match('.*?([0-9]+)$', s).group(1)
'767980716'

Alternatively, re.finditer works just as well:

>>> next(re.finditer(r'\d+$', s)).group(0)
'767980716'

Explanation of all regexp components:

  • .*? is a non-greedy match and consumes only as little as possible (a greedy match would consume everything except for the last digit).
  • [0-9] and \d are two different ways of matching digits. Note that in Python 3, \d in a str pattern also matches digits in other writing systems, like ୪ or ൨, unless re.ASCII is set.
  • Parentheses (()) make the content of the expression a group, which can be retrieved with group(1) (or 2 for the second group, 0 for the whole match).
  • + means one or more repetitions of the preceding element (here, at least one digit at the end).
  • $ matches at the end of the input (or just before a trailing newline).
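
To make the points in the list above concrete, here is a small sketch that compares the lazy and greedy prefixes, \d against [0-9], and $ against \Z. The variable names are only illustrative, and the Unicode behaviour assumes Python 3 with a str pattern.

```python
import re

s = "99-my-name-is-John-Smith-6376827-%^-1-2-767980716"

# Lazy prefix: the group receives the whole trailing run of digits.
lazy = re.match(r'.*?([0-9]+)$', s).group(1)

# Greedy prefix: .* gives back only enough to leave a single digit.
greedy = re.match(r'.*([0-9]+)$', s).group(1)

# \d matches any Unicode decimal digit in a str pattern; [0-9] does not.
malayalam_two = 'abc-\u0d68'
unicode_digit = re.search(r'\d+$', malayalam_two)
ascii_only = re.search(r'[0-9]+$', malayalam_two)

# $ also matches just before a single trailing newline; \Z does not.
dollar = re.search(r'\d+$', '123\n')
strict = re.search(r'\d+\Z', '123\n')

print(lazy, greedy)
```

If the input may end in a newline and that should not count as "the end", \Z is the stricter anchor.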

8

Your Regex should be (\d+)$.

  • \d+ matches one or more digits.
  • $ matches at the end of the string.

So your code should be:

>>> s = "99-my-name-is-John-Smith-6376827-%^-1-2-767980716"
>>> import re
>>> re.compile(r'(\d+)$').search(s).group(1)
'767980716'

You also don't need the str() call here, as s is already a string.
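
As a side note on why search is used here rather than match: a pattern such as (\d+)$ has no leading .*, so it has to be allowed to start anywhere in the string. A minimal sketch, with illustrative variable names:

```python
import re

s = "99-my-name-is-John-Smith-6376827-%^-1-2-767980716"
pattern = re.compile(r'(\d+)$')

# search scans forward until the pattern matches somewhere in the string.
found = pattern.search(s)

# match only tries at index 0: it sees "99", then fails on $.
anchored = pattern.match(s)

print(found.group(1))
print(anchored)
```

Here anchored is None, so calling .group(1) on it would raise AttributeError.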

1 Comment

If you write your regex pattern as r'(\d+)$', then you don't have to escape the backslash.

8

Nice and simple with findall:

import re

s=r"""99-my-name-is-John-Smith-6376827-%^-1-2-767980716"""

print(re.findall(r'^.*-([0-9]+)$', s))

# prints ['767980716']

Regex Explanation:

^         # Match the start of the string
.*        # Followed by anything
-         # Up to the last hyphen
([0-9]+)  # Capture the digits after the hyphen
$         # Up to the end of the string

Or, more simply, just match the digits at the end of the string: '([0-9]+)$'
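
The commented breakdown above can also be kept inside the pattern itself with re.VERBOSE, which ignores whitespace and allows # comments. This is just a sketch; the name LAST_NUMBER is made up for the example.

```python
import re

s = "99-my-name-is-John-Smith-6376827-%^-1-2-767980716"

LAST_NUMBER = re.compile(r"""
    ^           # start of the string
    .*          # anything, greedily
    -           # the last hyphen before the digits
    ([0-9]+)    # capture the digits after that hyphen
    $           # end of the string
""", re.VERBOSE)

result = LAST_NUMBER.findall(s)       # one group, so a list of strings
no_hyphen = LAST_NUMBER.findall("12345")

print(result)
```

Note that this version requires a hyphen before the digits, so a string like "12345" gives an empty list, whereas '([0-9]+)$' would still match it.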

4

Use the regex below:

\d+$

$ denotes the end of the string.

\d is a digit.

+ matches the preceding element one or more times.

4

Save the regular expressions for something that requires more heavy lifting.

>>> def parse_last_digits(line): return line.split('-')[-1]
>>> s = parse_last_digits(r"99-my-name-is-John-Smith-6376827-%^-1-2-767980716")
>>> s
'767980716'
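
One caveat with the split approach is that it returns the last hyphen-separated field whatever it contains. If that matters, a check can be added. The sketch below uses a hypothetical helper name, last_field_if_digits, and assumes Python 3.7+ for str.isascii.

```python
def last_field_if_digits(line):
    """Return the final hyphen-separated field if it is all ASCII digits, else None."""
    last = line.rsplit('-', 1)[-1]  # split once, from the right
    # isdigit alone also accepts non-ASCII digits; an empty field fails it.
    return last if last.isascii() and last.isdigit() else None

ok = last_field_if_digits("99-my-name-is-John-Smith-6376827-%^-1-2-767980716")
not_digits = last_field_if_digits("99-my-name-is-John")
trailing_hyphen = last_field_if_digits("99-my-name-")

print(ok, not_digits, trailing_hyphen)
```

rsplit('-', 1) avoids splitting the whole string when only the last field is needed.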

3

I have been playing around with several of these solutions, but many seem to fail if there are no numeric digits at the end of the string. The following code should work.

import re

W = input("Enter a string:")
m = re.match('.*?([0-9]+)$', W)  # match once and reuse the result
if m is None:
    last_digits = "None"
else:
    last_digits = m.group(1)
print("Last digits of "+W+" are "+last_digits)

1 Comment

m = re.findall(r"\d+\s*$", W); last_digits = m[0] if m else 'None' eliminates the redundant expression match.
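
One detail worth checking with the findall one-liner: with no capturing group, findall returns the whole match, so any trailing whitespace allowed by \s* is part of the result. A sketch of the difference, using a made-up input string:

```python
import re

W = "order 42 \n"

# No group: the match includes the whitespace consumed by \s*.
whole = re.findall(r"\d+\s*$", W)

# With a group, findall returns only the captured digits.
digits_only = re.findall(r"(\d+)\s*$", W)

# No trailing digits: findall returns an empty list rather than raising.
nothing = re.findall(r"(\d+)\s*$", "no digits here")

last_digits = digits_only[0] if digits_only else 'None'
print(repr(whole), repr(digits_only), last_digits)
```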

2

Try using \d+$ instead. That matches one or more numeric characters followed by the end of the string.
