I'm trying to rename a large batch of files using regex groups, but I can't get it to work (even though regexr.com tells me the regex should be valid). The 93,000 files I currently have all look something like this:

Mr. McCONNELL.2012-07-31.2014sep19_at_182325.txt    
Mrs. HAGAN.2012-12-06.2014sep19_at_182321.txt
Ms. MURRAY.2012-06-18.2014sep19_at_182246.txt
The PRESIDING OFFICER.2012-12-06.2014sep19_at_182320.txt

And I want them to look like this:

20120731McCONNELL2014sep19_at_182325.txt

The script should ignore any file that starts with anything other than Mr., Mrs., or Ms.

But every time I run the script below, I get the following error:

Traceback (most recent call last):
  File "changefilenames.py", line 11, in <module>
    date = m.group(2)
AttributeError: 'NoneType' object has no attribute 'group'

Thanks so much for your help. My apologies if this is a silly question. I'm just starting with RegEx and Python and can't seem to figure this one out.

import io
import os
import re
from dateutil.parser import parse


for filename in os.listdir("/Users/jolijttamanaha/Desktop/thesis2/Republicans/CRspeeches"):
    if filename.startswith("Mr."):

        m = re.search("Mr.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
        date = m.group(2)
        name = m.group(1)
        timestamp = m.group(3)

        dt = parse(date)
        new_filename = "{dt.year}.{dt.month}.{dt.day}".format(dt=dt) + name + timestamp + ".txt"

        os.rename(filename, new_filename)
        print new_filename

    print "All done with the Mr"

    if filename.startswith("Mrs."):

        m = re.search("Mrs.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
        date = m.group(2)
        name = m.group(1)
        timestamp = m.group(3)

        dt = parse(date)
        new_filename = "{dt.year}.{dt.month}.{dt.day}".format(dt=dt) + name + timestamp + ".txt"

        os.rename(filename, new_filename)
        print new_filename

    print "All done with the Mrs"

    if filename.startswith("Ms."):

        m = re.search("Ms.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
        date = m.group(2)
        name = m.group(1)
        timestamp = m.group(3)

        dt = parse(date)
        new_filename = "{dt.year}.{dt.month}.{dt.day}".format(dt=dt) + name + timestamp + ".txt"

        os.rename(filename, new_filename)
        print new_filename

    print "All done with the Ms"

I've made the adjustments suggested in Using Regex to Change Filenames with Python but still no luck.

EDIT: Made the following changes based on answer below:

for filename in os.listdir("/Users/jolijttamanaha/Desktop/thesis2/Republicans/CRspeeches"):
    if filename.startswith("Mr."):
        print filename
        m = re.search("^Mr.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
        if m:
            date = m.group(2)
            name = m.group(1)
            timestamp = m.group(3)

            dt = parse(date)
            new_filename = "{dt.year}.{dt.month}.{dt.day}".format(dt=dt) + name + timestamp + ".txt"

            os.rename(filename, new_filename)
            print new_filename

print "All done with the Mr"

And it spit out this:

Mr. Adams was right.2009-05-18.2014sep17_at_22240.txt
Mr. ADAMS.2009-12-16.2014sep18_at_223650.txt
Traceback (most recent call last):
  File "changefilenames.py", line 19, in <module>
    os.rename(filename, new_filename)
OSError: [Errno 2] No such file or directory
  • To gain more insight, I'd suggest print filename before the re.search, so you see the failing file name. Commented Feb 1, 2015 at 21:13
  • @Jasper did that and it prints no filenames but says: ".DS_Store" so I have no idea what that means! Commented Feb 1, 2015 at 21:19
  • That's a hidden file that Apple systems produce. Is this print really after the if filename.startswith("Mr.") line and properly indented? See also the answer below; this should work. Commented Feb 1, 2015 at 21:21
  • @Jasper ah yes I indented it incorrectly. sorry about that. and I did the answer below but got an OSError. I think that means my regex isn't robust enough? Commented Feb 1, 2015 at 21:29
  • The regex seems OK, but you are trying to rename a nonexistent file. See my answer Commented Feb 1, 2015 at 21:54

2 Answers

1

You are passing bare file names to os.rename, without the directory they live in, so they are resolved against the current working directory.

Consider the following layout:

yourscript.py
subdir/
  - one
  - two

This is similar to your code:

import os

for fn in os.listdir('subdir'):
    print(fn)
    os.rename(fn, fn + '_moved')

and it throws an exception (somewhat nicer in Python 3):

FileNotFoundError: [Errno 2] No such file or directory: 'two' -> 'two_moved'

because in the current working directory, there is no file named two. But consider this:

import os

for fn in os.listdir('subdir'):
    print(fn)
    os.rename(os.path.join('subdir',fn), os.path.join('subdir', fn+'_moved'))

This works, because the full path is used. Instead of using 'subdir' again and again (or in a variable), you should perhaps change the working directory as a first step:

import os

os.chdir('subdir')

for fn in os.listdir():
    print(fn)
    os.rename(fn, fn + '_moved')
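
If changing the working directory is undesirable, the full-path variant can be wrapped in a small helper. The following is only a sketch: the function name rename_in_dir and the '_moved' suffix are made up for illustration, and the demo runs against a temporary directory rather than your real data.

```python
import os
import tempfile


def rename_in_dir(directory, suffix="_moved"):
    """Rename every regular file in `directory`, always using full paths.

    `rename_in_dir` and `suffix` are hypothetical names for this sketch.
    """
    renamed = []
    for fn in sorted(os.listdir(directory)):
        src = os.path.join(directory, fn)
        if not os.path.isfile(src):
            continue  # leave subdirectories alone
        dst = os.path.join(directory, fn + suffix)
        os.rename(src, dst)  # both arguments include the directory
        renamed.append(fn + suffix)
    return renamed


# Demo on a throwaway directory that mirrors the 'subdir' layout above.
with tempfile.TemporaryDirectory() as subdir:
    for name in ("one", "two"):
        open(os.path.join(subdir, name), "w").close()
    result = rename_in_dir(subdir)
    remaining = sorted(os.listdir(subdir))
    print(remaining)
```

Because both the source and the destination are joined with the directory, the script behaves the same no matter where it is started from.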

1

After you do a search, you'll always want to make sure you have a match before doing any processing. It looks like you may have a file that starts with 'Mr.' but doesn't match your expression in general.

if filename.startswith("Mr."):

    m = re.search("Mr.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
    if m: # Only look at groups if we have a match.
        date = m.group(2)
        name = m.group(1)
        ....

I would also suggest not using startswith('Mr.') and regex at the same time, since your regex should already only work on strings that start with 'Mr.', though you may want to add a '^' to the beginning of the regex to enforce this:

m = re.search("^Mr.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
if m:        # ^ added caret to signify start of string.
    date = m.group(2)
    name = m.group(1)
    ...

Additionally, you may want to verify what files you are not matching, since with that much data, you will often run into problems like extra whitespace or improper case, so you may want to look into making your regex more robust.
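
As a rough illustration of a stricter pattern, here is a sketch that handles all three titles with one regex, escapes the literal dots, and collects the names it could not match so you can inspect them. The names PATTERN and new_name are invented for this example, and it assumes the surname is a single word; multi-word or punctuated names will land in the skipped list. The target name is built straight from the captured groups, so dateutil is not needed here.

```python
import re

# Raw strings keep the backslashes intact; "\." matches a literal dot.
PATTERN = re.compile(
    r"^(?:Mr|Mrs|Ms)\.\s"          # title, literal dot, one whitespace
    r"(\w+)\."                     # surname (assumed to be a single word)
    r"(\d{4})-(\d{2})-(\d{2})\."   # date as YYYY-MM-DD
    r"(\w+)\.txt$"                 # timestamp, then the extension
)


def new_name(filename):
    """Return the new file name, or None if `filename` does not match."""
    m = PATTERN.match(filename)
    if m is None:
        return None
    name, year, month, day, timestamp = m.groups()
    return year + month + day + name + timestamp + ".txt"


# File names taken from the question, plus the hidden macOS file.
samples = [
    "Mr. McCONNELL.2012-07-31.2014sep19_at_182325.txt",
    "Mrs. HAGAN.2012-12-06.2014sep19_at_182321.txt",
    "Ms. MURRAY.2012-06-18.2014sep19_at_182246.txt",
    "The PRESIDING OFFICER.2012-12-06.2014sep19_at_182320.txt",
    "Mr. Adams was right.2009-05-18.2014sep17_at_22240.txt",
    ".DS_Store",
]

converted = {fn: new_name(fn) for fn in samples if new_name(fn)}
skipped = [fn for fn in samples if new_name(fn) is None]

for old, new in converted.items():
    print(old, "->", new)
print("Skipped:", skipped)
```

Printing the skipped list on the real directory should show quickly whether the pattern is too strict for your data.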

1 Comment

Skylor, thanks for your help. I made the changes but I'm now getting an OSError, so I guess you're right that my regex isn't robust enough?
