I am looking to strip dates out of a list of longer strings, each of which may or may not contain a date. An example of one such string might be:

"Jane Doe 76554334 12/15/2017 - 8:35 pm 700945 - SDFTRD $550.95"

I have built a method which is returning an error:

AttributeError: 'NoneType' object has no attribute 'match_object'

My aim has been to look for regex matches on (\d+/\d+/\d+) and then convert that match to a string so that it can be used with .replace(). I cannot seem to solve this using match_object.

Here is my method:

def replace_match(string):
    match=re.search(r'(\d+/\d+/\d+)',string)
    if match:
        match=re.match(r'(\d+/\d+/\d+)',string).match_object.group(0)
        print("match = " + match)
        string = string.replace(match, "")
    else:
        print("no match found")
    return string

I am using Python 3.6.3

  • You can find other ways to do that, using try/except, in this post. Commented Dec 21, 2017 at 19:23

1 Answer

You can use re.sub:

import re
s = "Jane Doe 76554334 12/15/2017 - 8:35 pm 700945 - SDFTRD $550.95"
new_s = re.sub('\d+\/\d+\/\d+', '', s)

Output:

'Jane Doe 76554334  - 8:35 pm 700945 - SDFTRD $550.95'
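
As for the AttributeError in the question: re.match only tries to match at the very start of the string, so it returns None for this input, and match objects have no match_object attribute in any case. The object returned by re.search already exposes .group(). A minimal sketch of the original function with just those two things changed (the name replace_match is taken from the question; everything else is kept close to it):

```python
import re

def replace_match(string):
    # re.search scans the whole string; re.match only tries position 0
    match = re.search(r'\d+/\d+/\d+', string)
    if match:
        # group(0) is the full matched text, here the date
        date = match.group(0)
        string = string.replace(date, "")
    return string

s = "Jane Doe 76554334 12/15/2017 - 8:35 pm 700945 - SDFTRD $550.95"
print(replace_match(s))
print(replace_match("no date here"))
```

Strings without a date are returned unchanged, which covers the "may or may not contain a date" case.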

Edit, removing the timestamp:

import re
s = "Jane Doe 76554334 12/15/2017 - 8:35 pm 700945 - SDFTRD $550.95"
new_s = re.sub('\d+\/\d+\/\d+|\d+:\d+(?=\spm)|\d+:\d+(?=\sam)', '', s)

Output:

'Jane Doe 76554334  -  pm 700945 - SDFTRD $550.95'

Explanation for timestamp removal regex:

\d+:\d+ matches the hour, a colon, and then the minutes.

(?=\sam) is a positive lookahead: \d+:\d+ will not register a match unless the matched characters are followed by a whitespace character and then am, indicating that it is indeed a timestamp.

\d+:\d+(?=\spm) does the same, except that it checks whether the time is followed by pm, so both am and pm times are covered.
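
If it helps, the two am/pm alternatives can probably be folded into one lookahead with a character class, and the pattern compiled once and applied to a whole list. This is only a sketch: the names DATE_OR_TIME, strip_dates_and_times and rows are made up for illustration, and the second sample string is invented to show that a bare 3:2 is left alone.

```python
import re

# Hypothetical name: a date, or a time that is followed by " am" / " pm"
DATE_OR_TIME = re.compile(r'\d+/\d+/\d+|\d+:\d+(?=\s[ap]m)')

def strip_dates_and_times(strings):
    # sub() returns the string unchanged when nothing matches
    return [DATE_OR_TIME.sub('', s) for s in strings]

rows = [
    "Jane Doe 76554334 12/15/2017 - 8:35 pm 700945 - SDFTRD $550.95",
    "ratio 3:2 with no date",
]
cleaned = strip_dates_and_times(rows)
print(cleaned)
```

The lookahead only checks what follows the time, so the am or pm itself stays in the string, exactly as in the output above.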

6 Comments

That worked! I'm curious though, I'm used to using regex in the format of r'(\d+\/\d+\/\d+)'. Just curious if you knew why the r() was not necessary in this instance.
@HMLDude The r prefix marks a raw string, in which Python leaves every "\" as a literal character instead of processing escape sequences. In a normal string, "\" is only consumed when it starts a recognized escape sequence such as "\n"; "\d" and "\/" are not recognized, so the backslashes pass through to the regex engine unchanged, which is why the pattern works here without r. A raw string is still the safer habit, because a sequence like "\b" would otherwise become a backspace character before the regex engine sees it.
I'd recommend avoiding redundant escaping, use '\d+/\d+/\d+'. In Python regex patterns, / is never special.
I have been messing around with this quite a bit since this answer was posted, including using @WiktorStribiżew's suggestion of removing unnecessary slashes. Curious what you guys would suggest for removing the timestamp as in 8:35 in the example above.
@Ajax1234 that did it. Trying to understand why the timestamp is so much more complicated an expression as compared to the date.