I'm trying to change a bunch of filenames using regex groups but can't seem to get it to work (despite writing what regexr.com tells me should be a valid regex). The 93,000 files I currently have all look something like this:

Mr. McCONNELL.2012-07-31.2014sep19_at_182325.txt    
Mrs. HAGAN.2012-12-06.2014sep19_at_182321.txt
Ms. MURRAY.2012-06-18.2014sep19_at_182246.txt

And I want them to look like this:

20120731McCONNELL2014sep19_at_182325.txt

But every time I run the script below, I get the following error:

Traceback (most recent call last):
  File "changefilenames.py", line 11, in <module>
    date = m.group(2)
AttributeError: 'NoneType' object has no attribute 'group'

Thanks so much for your help. My apologies if this is a silly question. I'm just starting with RegEx and Python and can't seem to figure this one out.

import os
import re
from dateutil.parser import parse


for filename in os.listdir("."):
    if filename.startswith("Mr."):

        m = re.match("Mr.\s(\w*).(\d*-\d*-\d*).(\w*).txt", filename)
        date = m.group(2)
        name = m.group(1)
        timestamp = m.group(3)

        dt = parse(date)
        new_filename = "{dt.year}{dt.month}{dt.day}".format(dt=dt) + name + timestamp + ".txt"

        os.rename(filename, new_filename)
        print new_filename

    print "All done with the Mr"

    if filename.startswith("Mrs."):

        m = re.match("Ms.\s(\w*).(\d*-\d*-\d*).(\w*).txt", filename)
        date = m.group(2)
        name = m.group(1)
        timestamp = m.group(3)

        dt = parse(date)
        new_filename = "{dt.year}{dt.month}{dt.day}".format(dt=dt) + name + timestamp + ".txt"

        os.rename(filename, new_filename)
        print new_filename

    print "All done with the Mrs"

    if filename.startswith("Ms."):

        m = re.match("Mrs.\s(\w*).(\d*-\d*-\d*).(\w*).txt", filename)
        date = m.group(2)
        name = m.group(1)
        timestamp = m.group(3)

        dt = parse(date)
        new_filename = "{dt.year}{dt.month}{dt.day}".format(dt=dt) + name + timestamp + ".txt"

        os.rename(filename, new_filename)
        print new_filename

    print "All done with the Mrs" 

EDIT: I changed the script based on the suggestions below but am still getting the exact same error. Here's the new script:

for filename in os.listdir("."):

    m = re.search("(Mr|Mrs|Ms)\.\s(\w*)\.(\d*\-\d*\-\d*)\.(\w*)\.txt", filename)
    date = m.group(2)
    name = m.group(1)
    timestamp = m.group(3)

    dt = parse(date)
    new_filename = "{dt.year}{dt.month}{dt.day}".format(dt=dt) + name + timestamp + ".txt"

    os.rename(filename, new_filename)
    print new_filename
  • Neither of your regexes matches Mrs. HAGAN.2012-12-06.2014sep19_at_182321.txt, so re.match returns None. Note that you're checking if filename.startswith("Ms."), but the regex matches Mrs., not Ms.. Commented Jan 26, 2015 at 15:54

4 Answers

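Use re.search instead of re.match. re.match only matches at the beginning of the string, whereas re.search scans the whole string for a match; for more detail, read search() vs. match() in the re documentation: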

>>> s="Mr. McCONNELL.2012-07-31.2014sep19_at_182325.txt "
>>> import re
>>> m = re.search("Mr.\s(\w*).(\d*-\d*-\d*).(\w*).txt", s)

>>> date = m.group(2)
>>> date
'2012-07-31'
>>> name = m.group(1)
>>> name
'McCONNELL'
>>> timestamp = m.group(3)
>>> timestamp
'2014sep19_at_182325'
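
To make the difference concrete, here is a small sketch in Python 3 syntax, using filenames from the question with the dots in the pattern escaped. The "copy of " prefix is made up for illustration. Note that re.match also succeeds on the plain "Mr." filename, and that neither function can match a "Mrs." filename against a "Mr." pattern, which is one way to end up with None:

```python
import re

# Same pattern as above, as a raw string with the literal dots escaped.
pattern = r"Mr\.\s(\w*)\.(\d*-\d*-\d*)\.(\w*)\.txt"

mr_file = "Mr. McCONNELL.2012-07-31.2014sep19_at_182325.txt"
mrs_file = "Mrs. HAGAN.2012-12-06.2014sep19_at_182321.txt"

# re.match anchors at position 0, so it works when the name starts with "Mr."
match_at_start = re.match(pattern, mr_file)

# With leading text, only re.search finds the pattern.
match_fails_when_prefixed = re.match(pattern, "copy of " + mr_file)
found_anywhere = re.search(pattern, "copy of " + mr_file)

# A "Mrs." filename contains no "Mr." at all, so even re.search returns None.
no_match = re.search(pattern, mrs_file)
```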

Here are my suggestions on your regular expression.

  1. Escape special characters (periods; escaping the dashes is optional outside a character class, but harmless).
  2. Combine the regular expressions by grouping the prefix.
  3. Group the digits so you can retrieve them by group later.

    (Mr|Mrs|Ms)\.\s(\w*)\.(\d*)\-(\d*)\-(\d*)\.(\w*)\.txt
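
As a rough sketch of how the groups in this pattern line up (Python 3 syntax, with variable names chosen only for illustration): the title becomes group 1, so the name moves to group 2 and the date parts to groups 3 to 5. Because the date parts stay strings, the zero padding in "07" or "06" is kept, which "{dt.month}{dt.day}" would drop.

```python
import re

pattern = r"(Mr|Mrs|Ms)\.\s(\w*)\.(\d*)\-(\d*)\-(\d*)\.(\w*)\.txt"
m = re.match(pattern, "Mrs. HAGAN.2012-12-06.2014sep19_at_182321.txt")

title = m.group(1)       # 'Mrs'
name = m.group(2)        # 'HAGAN'
year, month, day = m.group(3), m.group(4), m.group(5)  # strings, padding kept
timestamp = m.group(6)   # '2014sep19_at_182321'

new_filename = year + month + day + name + timestamp + ".txt"
```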

2 Comments

Ok. Try combining my answer with Kasra's below, using re.search instead of re.match. Beware that the group numbers in my regular expression are different, but you won't have to pass the date portions through a date parser.
Did that and no luck! I posted the new script above. If there are certain files in the folder that are a different format, will it just skip over them or spit out that error? Because I figured it'd just skip over them but I might be wrong there.
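
On the question in the last comment: files that don't match are not skipped automatically. re.match and re.search return None for them, and calling m.group(...) on None raises the AttributeError from the traceback. Since os.listdir(".") lists everything in the folder, quite possibly including the script itself, at least one non-matching name is likely. A minimal sketch of a loop that skips such files explicitly (Python 3 syntax; the function name rename_all and its return value are hypothetical):

```python
import os
import re

PATTERN = re.compile(r"(Mr|Mrs|Ms)\.\s(\w*)\.(\d*)-(\d*)-(\d*)\.(\w*)\.txt")

def rename_all(directory):
    """Rename matching files in `directory`; return the names that were skipped."""
    skipped = []
    for filename in os.listdir(directory):
        m = PATTERN.match(filename)
        if m is None:
            # No match: m.group(...) would raise AttributeError, so skip explicitly.
            skipped.append(filename)
            continue
        title, name, year, month, day, timestamp = m.groups()
        new_filename = year + month + day + name + timestamp + ".txt"
        os.rename(os.path.join(directory, filename),
                  os.path.join(directory, new_filename))
    return skipped
```

Printing the returned list after a run shows which files had a different format, which may be worth checking before renaming all 93,000 files.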

re.sub(r'^(?:Mr|Mrs|Ms)\. (\w+)\.(\d{4})-(\d{2})-(\d{2})\.(\d{4}\w+\d+_at_\d+)(\.txt)$',r'\2\3\4\1\5\6','Mr. McCONNELL.2012-07-31.2014sep19_at_182325.txt')
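
The substitution captures six groups and reorders them in the replacement string: year, month and day first, then the surname, the timestamp and the extension. Here is a sketch that wraps the same idea in a function (Python 3 syntax; the name rename_with_sub is hypothetical). The title is written as (?:Mr|Mrs|Ms) so that all three prefixes from the question are covered.

```python
import re

TITLE_FILE = re.compile(
    r"^(?:Mr|Mrs|Ms)\. (\w+)\.(\d{4})-(\d{2})-(\d{2})\.(\d{4}\w+\d+_at_\d+)(\.txt)$"
)

def rename_with_sub(filename):
    # \2\3\4 = year, month, day; \1 = surname; \5 = timestamp; \6 = extension
    return TITLE_FILE.sub(r"\2\3\4\1\5\6", filename)
```

When the pattern does not match, re.sub returns the input unchanged, so comparing the result with the original name tells you whether a file needs renaming at all.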

1 Comment

Please edit your answer, and explain why this is a good solution to the problem.

I did the transformation like this (Disclaimer, I have not cleaned this up at all):

import re

from pprint import pprint

names = """
Mr. McCONNELL.2012-07-31.2014sep19_at_182325.txt
Mrs. HAGAN.2012-12-06.2014sep19_at_182321.txt
Ms. MURRAY.2012-06-18.2014sep19_at_182246.txt
""".strip()

for record in names.splitlines():
    name, part2 = re.split('\.(?=\d)', record, 1)
    date, at_time, fileext = re.split('\.', part2)

    pprint(record)
    pprint(''.join([
        date.replace('-', ''),
        name.translate(None, ' .',),
        at_time,
    ]) + '.' + fileext)


    print('\n')
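
Two caveats on the snippet above: name.translate(None, ' .') is the Python 2 form of str.translate, and it removes only spaces and dots, so the title letters ("Mr", "Mrs", "Ms") stay in the result. A possible variant of the same split-based idea that runs on Python 3 and drops the title (the function name rearrange is made up for this sketch):

```python
import re

def rearrange(record):
    # Split once at the first dot that is followed by a digit (start of the date).
    name, rest = re.split(r"\.(?=\d)", record, maxsplit=1)
    date, at_time, fileext = rest.split(".")
    # Drop the title ("Mr. ", "Mrs. ", "Ms. ") so only the surname remains.
    surname = name.split(". ", 1)[1]
    return date.replace("-", "") + surname + at_time + "." + fileext
```

Like the original, this assumes every record has exactly the title, name, date, timestamp and extension parts. A name with a different shape raises a ValueError or IndexError rather than being skipped.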
