2

This will be really quick marks for someone...

Here's my string:

Jan 13.BIGGS.04222 ABC DMP 15

I'm looking to match:

  1. the date at the front, in mmm yy format
  2. the name in the second field
  3. the digits at the end. There could be between one and three.

Here is what I have so far:

(\w{3} \d{2})\.(\w*)\..*(\d{1,3})$

Through a lot of playing around with http://www.pythonregex.com/ I can get to matching the '5', but not '15'.

What am I doing wrong?

3 Answers

6

Use .*? to match .* non-greedily:

In [9]: re.search(r'(\w{3} \d{2})\.(\w*)\..*?(\d{1,3})$', text).groups()
Out[9]: ('Jan 13', 'BIGGS', '15')

Without the question mark, .* matches as many characters as possible. Here that includes the '1' of '15', because \d{1,3} is already satisfied by the single digit '5'.
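To see the difference side by side, here is a small sketch that runs the greedy and the non-greedy pattern against the string from the question. The variable names are only illustrative.

```python
import re

text = 'Jan 13.BIGGS.04222 ABC DMP 15'

# Greedy: .* grabs as much as it can and leaves a single digit for the last group.
greedy = re.search(r'(\w{3} \d{2})\.(\w*)\..*(\d{1,3})$', text)

# Non-greedy: .*? grabs as little as it can, so the last group gets both digits.
lazy = re.search(r'(\w{3} \d{2})\.(\w*)\..*?(\d{1,3})$', text)

print(greedy.groups())  # ('Jan 13', 'BIGGS', '5')
print(lazy.groups())    # ('Jan 13', 'BIGGS', '15')
```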


2

As an alternative to what @unutbu proposed, you can use the word boundary \b, which matches the position between a word character and a non-word character:

(\w{3} \d{2})\.(\w*)\..*\b(\d{1,3})$

From the site you referred to:

>>> regex = re.compile(r"(\w{3} \d{2})\.(\w*)\..*\b(\d{1,3})$")
>>> regex.findall('Jan 13.BIGGS.04222 ABC DMP 15')
[(u'Jan 13', u'BIGGS', u'15')]
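As a quick check that the word-boundary version handles one, two, and three trailing digits, here is a minimal sketch. The extra sample strings are made-up variations of the one in the question. Note the raw string: in an ordinary Python string literal, "\b" is a backspace character rather than a word boundary.

```python
import re

# Raw string so that \b reaches the regex engine as a word boundary.
pattern = re.compile(r'(\w{3} \d{2})\.(\w*)\..*\b(\d{1,3})$')

# Hypothetical variations of the question's string with 1 to 3 trailing digits.
samples = [
    'Jan 13.BIGGS.04222 ABC DMP 5',
    'Jan 13.BIGGS.04222 ABC DMP 15',
    'Jan 13.BIGGS.04222 ABC DMP 150',
]

# Collect only the last captured group (the trailing digits) for each sample.
results = [pattern.search(s).group(3) for s in samples]
print(results)  # ['5', '15', '150']
```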


1

The .* before the final group is greedy and matches as much as it can, leaving as few digits as possible for the last group. You either need to make it non-greedy (with ?, as unutbu said) or stop it from swallowing the trailing digits, for example by requiring a non-digit (\D) right before the last group.
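One way to apply the "do not match digits" idea is sketched below. It is an interpretation of the suggestion, not the only possible one: .* stays greedy, but a \D is placed right before the last group. Swapping .* for \D* outright would not work on this particular string, because the third field (04222) itself contains digits.

```python
import re

text = 'Jan 13.BIGGS.04222 ABC DMP 15'

# .* is still greedy, but \D forces the character just before the
# final group to be a non-digit, so both trailing digits are captured.
with_nondigit = re.search(r'(\w{3} \d{2})\.(\w*)\..*\D(\d{1,3})$', text)
print(with_nondigit.groups())  # ('Jan 13', 'BIGGS', '15')

# Replacing .* with \D* fails here: \D* cannot get past the digits in "04222".
only_nondigits = re.search(r'(\w{3} \d{2})\.(\w*)\.\D*(\d{1,3})$', text)
print(only_nondigits)  # None
```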
