Using Regex to Change Filenames with Python

Question

I'm trying to change a bunch of filenames using regex groups but can't seem to get it to work (despite writing what regexr.com tells me should be a valid regex statement). The 93,000 files I currently have all look something like this:

Mr. McCONNELL.2012-07-31.2014sep19_at_182325.txt    
Mrs. HAGAN.2012-12-06.2014sep19_at_182321.txt
Ms. MURRAY.2012-06-18.2014sep19_at_182246.txt

And I want them to look like this:

20120731McCONNELL2014sep19_at_182325.txt

But every time I run the script below, I get the following error:

Traceback (most recent call last):
  File "changefilenames.py", line 11, in <module>
    date = m.group(2)
AttributeError: 'NoneType' object has no attribute 'group'

Thanks so much for your help. My apologies if this is a silly question. I'm just starting with RegEx and Python and can't seem to figure this one out.

import os
import re
from dateutil.parser import parse


for filename in os.listdir("."):
    if filename.startswith("Mr."):

        m = re.match("Mr.\s(\w*).(\d*-\d*-\d*).(\w*).txt", filename)
        date = m.group(2)
        name = m.group(1)
        timestamp = m.group(3)

        dt = parse(date)
        new_filename = "{dt.year}{dt.month}{dt.day}".format(dt=dt) + name + timestamp + ".txt"

        os.rename(filename, new_filename)
        print new_filename

    print "All done with the Mr"

    if filename.startswith("Mrs."):

        m = re.match("Ms.\s(\w*).(\d*-\d*-\d*).(\w*).txt", filename)
        date = m.group(2)
        name = m.group(1)
        timestamp = m.group(3)

        dt = parse(date)
        new_filename = "{dt.year}{dt.month}{dt.day}".format(dt=dt) + name + timestamp + ".txt"

        os.rename(filename, new_filename)
        print new_filename

    print "All done with the Mrs"

    if filename.startswith("Ms."):

        m = re.match("Mrs.\s(\w*).(\d*-\d*-\d*).(\w*).txt", filename)
        date = m.group(2)
        name = m.group(1)
        timestamp = m.group(3)

        dt = parse(date)
        new_filename = "{dt.year}{dt.month}{dt.day}".format(dt=dt) + name + timestamp + ".txt"

        os.rename(filename, new_filename)
        print new_filename

    print "All done with the Mrs"

EDIT: I changed the script based on the suggestions below but am still getting the exact same error. Here's the new script:

for filename in os.listdir("."):

    m = re.search("(Mr|Mrs|Ms)\.\s(\w*)\.(\d*\-\d*\-\d*)\.(\w*)\.txt", filename)
    date = m.group(2)
    name = m.group(1)
    timestamp = m.group(3)

    dt = parse(date)
    new_filename = "{dt.year}{dt.month}{dt.day}".format(dt=dt) + name + timestamp + ".txt"

    os.rename(filename, new_filename)
    print new_filename

Neither of your regexes matches Mrs. HAGAN.2012-12-06.2014sep19_at_182321.txt, so re.match returns None. Note that you're checking if filename.startswith("Ms."), but the regex in that branch matches Mrs., not Ms.
– Aran-Fey, Commented Jan 26, 2015 at 15:54
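
To make the comment above concrete, here is a small sketch (Python 3 syntax, unlike the Python 2 script in the question) that applies the pattern from the question's "Ms." branch to one of the sample names. The variable names `wrong` and `right` are only illustrative, and the corrected pattern is one possible fix, not the only one.

```python
import re

filename = "Ms. MURRAY.2012-06-18.2014sep19_at_182246.txt"

# The question's "Ms." branch uses a pattern written for "Mrs.".
wrong = re.match(r"Mrs.\s(\w*).(\d*-\d*-\d*).(\w*).txt", filename)

# A pattern that actually starts with "Ms." (periods escaped as well).
right = re.match(r"Ms\.\s(\w*)\.(\d*-\d*-\d*)\.(\w*)\.txt", filename)

print(wrong)           # None, so wrong.group(2) would raise AttributeError
print(right.groups())  # ('MURRAY', '2012-06-18', '2014sep19_at_182246')
```

Calling `.group()` on the `None` result is what produces the `'NoneType' object has no attribute 'group'` traceback.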

Kasravnd · Accepted Answer · 2015-01-26 15:54:17Z

0

You should use re.search instead of re.match; for more detail, read search() vs. match():

>>> s="Mr. McCONNELL.2012-07-31.2014sep19_at_182325.txt "
>>> import re
>>> m = re.search("Mr.\s(\w*).(\d*-\d*-\d*).(\w*).txt", s)

>>> date = m.group(2)
>>> date
'2012-07-31'
>>> name = m.group(1)
>>> name
'McCONNELL'
>>> timestamp = m.group(3)
>>> timestamp
'2014sep19_at_182325'
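
As a rough illustration of the difference (Python 3 syntax; the "copy of " prefix is a made-up example, not one of the asker's files): `re.match` only tries the pattern at the start of the string, while `re.search` scans for it anywhere. For a name that already begins with "Mr." both calls succeed, so the choice only matters when something precedes the title.

```python
import re

pattern = r"Mr\.\s(\w*)\.(\d*-\d*-\d*)\.(\w*)\.txt"
name = "Mr. McCONNELL.2012-07-31.2014sep19_at_182325.txt"

# The name starts with "Mr.", so match and search both find it.
both_found = bool(re.match(pattern, name)) and bool(re.search(pattern, name))
print(both_found)  # True

# Hypothetical name with extra text in front of the title.
prefixed = "copy of " + name
print(re.match(pattern, prefixed))             # None: match is anchored at position 0
print(re.search(pattern, prefixed).group(1))   # McCONNELL
```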


Paul Rowe · Accepted Answer · 2015-01-26 15:58:58Z

0

Here are my suggestions on your regular expression.

Escape special characters (periods and dashes).
Combine the regular expressions by grouping the prefix.
Group the digits so you can retrieve them by group later.

(Mr|Mrs|Ms)\.\s(\w*)\.(\d*)\-(\d*)\-(\d*)\.(\w*)\.txt
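
A minimal sketch of how this pattern might be used (Python 3 syntax; the helper name `new_name` is hypothetical). With six groups, the title is group 1, the name is group 2, the year, month and day are groups 3 to 5, and the timestamp is group 6.

```python
import re

PATTERN = re.compile(r"(Mr|Mrs|Ms)\.\s(\w*)\.(\d*)\-(\d*)\-(\d*)\.(\w*)\.txt")

def new_name(filename):
    """Return the reordered filename, or None if the name does not fit the pattern."""
    m = PATTERN.search(filename)
    if m is None:
        return None
    _title, name, year, month, day, timestamp = m.groups()
    # The date pieces are already zero-padded strings, so plain concatenation works.
    return year + month + day + name + timestamp + ".txt"

print(new_name("Mr. McCONNELL.2012-07-31.2014sep19_at_182325.txt"))
# 20120731McCONNELL2014sep19_at_182325.txt
```

Taking the date digits straight from the groups also sidesteps a formatting problem in the question's script: `"{dt.year}{dt.month}{dt.day}"` formats the month and day as plain integers, so 2012-07-31 would come out as 2012731 rather than 20120731.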


2 Comments

Paul Rowe Over a year ago

Ok. Try combining my answer with Kasra's below, using re.search instead of re.match. Beware that the group numbers in my regular expression are different, but you won't have to pass the date portions through a date parser.

Jolijt Tamanaha Over a year ago

Did that and no luck! I posted the new script above. If there are certain files in the folder that are a different format, will it just skip over them or spit out that error? Because I figured it'd just skip over them but I might be wrong there.
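
On the question of skipping: files that don't fit the pattern are not skipped automatically. `re.search` and `re.match` return `None` for them, and the next `.group()` call raises the `AttributeError` from the traceback. Since `os.listdir(".")` lists everything in the folder, that probably includes `changefilenames.py` itself. Below is a sketch of a loop that skips such files explicitly (Python 3 syntax; the function name `rename_all` and the stricter pattern are assumptions, not part of the original script).

```python
import os
import re

PATTERN = re.compile(r"(Mr|Mrs|Ms)\.\s(\w+)\.(\d{4})-(\d{2})-(\d{2})\.(\w+)\.txt$")

def rename_all(directory="."):
    """Rename matching files in `directory`; return the names that were skipped."""
    skipped = []
    for filename in os.listdir(directory):
        m = PATTERN.match(filename)
        if m is None:
            # Not in the expected format: leave the file alone and move on.
            skipped.append(filename)
            continue
        _title, name, year, month, day, timestamp = m.groups()
        new_filename = year + month + day + name + timestamp + ".txt"
        os.rename(os.path.join(directory, filename),
                  os.path.join(directory, new_filename))
    return skipped
```

Note also that the edited script in the question still reads groups 1 to 3. Once the `(Mr|Mrs|Ms)` group is added, those are the title, the name and the date, so the group numbers need to shift by one.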

John Hua · Accepted Answer · 2015-01-26 16:22:54Z

0

re.sub(r'^Mrs?\. (\w+)\.(\d{4})-(\d{2})-(\d{2})\.(\d{4}\w+\d+_at_\d+)(\.txt)$',r'\2\3\4\1\5\6','Mr. McCONNELL.2012-07-31.2014sep19_at_182325.txt')
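
A brief sketch of how the one-liner above works, as far as I can tell (Python 3 syntax). Each parenthesised part of the pattern is captured as a numbered group, and the replacement `\2\3\4\1\5\6` writes the groups back in a new order: year, month, day, name, timestamp, extension. One caveat: `Mrs?\.` covers "Mr." and "Mrs." but not "Ms.", and `re.sub` returns the string unchanged when nothing matches. The variant below, with `M(?:rs?|s)`, is my own adjustment to cover all three titles.

```python
import re

ORIGINAL = r'^Mrs?\. (\w+)\.(\d{4})-(\d{2})-(\d{2})\.(\d{4}\w+\d+_at_\d+)(\.txt)$'
# Assumed variant: also accepts "Ms." via a non-capturing alternation.
VARIANT = r'^M(?:rs?|s)\. (\w+)\.(\d{4})-(\d{2})-(\d{2})\.(\d{4}\w+\d+_at_\d+)(\.txt)$'
REPLACEMENT = r'\2\3\4\1\5\6'

ms_file = "Ms. MURRAY.2012-06-18.2014sep19_at_182246.txt"

unchanged = re.sub(ORIGINAL, REPLACEMENT, ms_file)
renamed = re.sub(VARIANT, REPLACEMENT, ms_file)

print(unchanged)  # Ms. MURRAY.2012-06-18.2014sep19_at_182246.txt
print(renamed)    # 20120618MURRAY2014sep19_at_182246.txt
```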


1 Comment

Ben Over a year ago

Please edit your answer, and explain why this is a good solution to the problem.

ThorSummoner · Accepted Answer · 2015-01-26 23:56:53Z

0

I did the transformation like this (Disclaimer, I have not cleaned this up at all):

import re

from pprint import pprint

names = """
Mr. McCONNELL.2012-07-31.2014sep19_at_182325.txt
Mrs. HAGAN.2012-12-06.2014sep19_at_182321.txt
Ms. MURRAY.2012-06-18.2014sep19_at_182246.txt
""".strip()

for record in names.splitlines():
    name, part2 = re.split('\.(?=\d)', record, 1)
    date, at_time, fileext = re.split('\.', part2)

    pprint(record)
    pprint(''.join([
        date.replace('-', ''),
        name.translate(None, ' .',),
        at_time,
    ]) + '.' + fileext)


    print('\n')
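
The code above relies on the Python 2 form `name.translate(None, ' .')`, which raises a `TypeError` on Python 3. It also keeps the title, producing names like 20120731MrMcCONNELL2014sep19_at_182325.txt. Here is a possible Python 3 adaptation of the same split-based idea that drops the title to match the format asked for in the question (the function name `convert` is hypothetical).

```python
import re

def convert(record):
    """Reorder 'Title. NAME.YYYY-MM-DD.timestamp.ext' into 'YYYYMMDDNAMEtimestamp.ext'."""
    # Split once, at the first period that is followed by a digit (just before the date).
    title_and_name, rest = re.split(r'\.(?=\d)', record, maxsplit=1)
    date, at_time, fileext = rest.split('.')
    # Drop the "Mr. " / "Mrs. " / "Ms. " title, which the desired names omit.
    name = title_and_name.split('. ', 1)[1]
    return date.replace('-', '') + name + at_time + '.' + fileext

print(convert("Mrs. HAGAN.2012-12-06.2014sep19_at_182321.txt"))
# 20121206HAGAN2014sep19_at_182321.txt
```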
