I am trying to use a regex to identify data of the format "XX days, XX hours, XX minutes" (expecting only minor structural variations in whitespace, commas, and plurals). I am looking for an efficient Python way to use a regex to extract the numbers associated with days, hours, and minutes.

I tried the following:

import re
matchingTime = "27 days, 21 hours, 23 minutes ago"
re.search(r'([0-9]{0,2}).*day.* ([0-9]+) .*hour.* ([0-9]+) .*minute.*', matchingTime)

For the above case it works fine, and I get the values in groups 1, 2, and 3 respectively.

The issue is that any of the fields may be absent, for example:

matchingTime = "21 hours, 23 minutes ago"

For the above case it fails.

I know I can handle this with try/except blocks, but I was hoping to find a concise and efficient way of doing it.

Any inputs will be really helpful. Would be glad to give any further clarifications to my query.

EDIT: The [0-9]{0,2} for the days part is just one of a few variations I was trying while working on this.

2
  • Just put non-capturing parens followed by ? around the optional part: '(?:([0-9]{0,2}).*day.* )?([0-9]+) .*hour.* ([0-9]+) .*minute.*' Commented Oct 8, 2013 at 16:33
  • You want optional groups; have a look at "python regex optional capture group". Commented Oct 8, 2013 at 16:39

1 Answer

You could perhaps use a regex like:

(?:(?P<days>[0-9]{0,2})\s*day[^, ]*,? *)?(?:(?P<hrs>[0-9]+)\s*hour[^, ]*,? *)?(?:(?P<min>[0-9]+)\s*minute[^, ]*,? *)?

regex101 demo

I'm using [^, ]*,? * to consume the rest of the word (such as a plural "s") plus the optional comma and spaces, rather than .*, so that there's not too much backtracking.

I also used named capture groups and wrapped each day/hour/minute part in a non-capturing group, followed by a ? to mark it as optional. Each group is fairly similar:

(?:                       # Start of non-capture group
    (?P<days>[0-9]{0,2})  # Numbers to capture
    \s*                   # Spaces if any
    day                   # Literal match
    [^, ]*,? *            # Anything until first comma and optional spaces
)?                        # Close of non-capture group and marking it as optional
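As a rough sketch, the annotated layout above can be given to Python almost as-is by compiling with re.VERBOSE. The name TIME_RE is just an illustrative choice, and the literal spaces outside character classes are written as [ ] because verbose mode ignores bare whitespace in the pattern:

```python
import re

# Same pattern as above, spread out with re.VERBOSE so it can carry comments.
# TIME_RE is a hypothetical name chosen for this example.
TIME_RE = re.compile(r"""
    (?:                        # optional days part
        (?P<days>[0-9]{0,2})   # numbers to capture
        \s* day                # spaces, if any, then the literal word
        [^, ]* ,? [ ]*         # rest of the word, optional comma and spaces
    )?
    (?: (?P<hrs>[0-9]+) \s* hour   [^, ]* ,? [ ]* )?   # optional hours part
    (?: (?P<min>[0-9]+) \s* minute [^, ]* ,? [ ]* )?   # optional minutes part
""", re.VERBOSE)

for text in ("27 days, 21 hours, 23 minutes ago", "21 hours, 23 minutes ago"):
    m = TIME_RE.match(text)
    print(m.group("days"), m.group("hrs"), m.group("min"))
# prints:
# 27 21 23
# None 21 23
```

Because every part is optional, the pattern can also match an empty string, so it is worth checking that at least one group is not None before trusting the result.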

3 Comments

parenthesis is unbalanced.
So do I access the results in groups 1, 2, and 3?
@AjayNair In a similar fashion to what you have, except that you pass the name of the capture group as a string: call .group('days') on the match object for days, .group('hrs') for hours, and so on. A group that did not take part in the match returns None.
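To illustrate the named-group access described in the comment above, here is a minimal sketch that turns the captures into a datetime.timedelta. The helper name parse_duration and the choice to treat a missing field as 0 are assumptions made for this example, not part of the original answer:

```python
import re
from datetime import timedelta

# The answer's pattern, split across lines for readability only.
PATTERN = re.compile(
    r"(?:(?P<days>[0-9]{0,2})\s*day[^, ]*,? *)?"
    r"(?:(?P<hrs>[0-9]+)\s*hour[^, ]*,? *)?"
    r"(?:(?P<min>[0-9]+)\s*minute[^, ]*,? *)?"
)

def parse_duration(text):
    """Hypothetical helper: return the matched fields as a timedelta."""
    # match() cannot return None here because every part of the pattern is optional.
    m = PATTERN.match(text)
    # Missing groups come back as None (or '' for the {0,2} days group); count them as 0.
    parts = {name: int(value) if value else 0
             for name, value in m.groupdict().items()}
    return timedelta(days=parts["days"], hours=parts["hrs"], minutes=parts["min"])

print(parse_duration("27 days, 21 hours, 23 minutes ago"))  # 27 days, 21:23:00
print(parse_duration("21 hours, 23 minutes ago"))           # 21:23:00
```

Using groupdict() avoids a separate try/except per field, since absent fields are handled in one place.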
