
I have a bunch of files:

File Completed for 123456 1 - Platform Junk (AP .msg

File Completed for 1234566 1 - More Junk here and Stuf.msg

File Completed for 654321 1 - ® Stuff and Junk.msg

So each file contains a 6- or 7-digit number (not including that 1 after the number), and some files also have stupid ® (registered trademark) symbols.

My goal is to search the current directory for all .msg files, find the 6- or 7-digit number in each name, and rename the file to just that number, e.g. 123456.msg or 1234566.msg.

I have the regex that should work properly to extract the number:

(?<!\d)(\d{6}|\d{7})(?!\d)

Now I just need to loop through all .msg files and rename them. I've got my foot in the door with the following code, but I don't quite know how to extract what I want and rename:

for filename in glob.glob(script_dir + '*.msg'):
    new_name = re.sub(r'(?<!\d)(\d{6}|\d{7})(?!\d)')

Any help or step in the right direction would be much appreciated!

1 Answer


Don't take this the wrong way, but only the regex is right here. I'll explain step by step how to fix your code so that it renames your files:

First, build the glob pattern with os.path.join; otherwise script_dir would have to end with a / for the pattern to match anything inside it:

for filename in glob.glob(os.path.join(script_dir,'*.msg')):
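To see why the join matters, here is a minimal sketch comparing the two patterns. The directory path is a made-up example, not something from the question:

```python
import os

# Hypothetical directory, written without a trailing separator.
script_dir = "/home/user/mail"

# Plain concatenation glues the wildcard onto the directory name,
# so glob would look for siblings named "mail<something>.msg".
concatenated = script_dir + "*.msg"

# os.path.join inserts the separator, so the wildcard applies
# to the files inside the directory.
joined = os.path.join(script_dir, "*.msg")

print(concatenated)
print(joined)
```

On a POSIX system the first line prints /home/user/mail*.msg and the second prints /home/user/mail/*.msg.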

Let's test your regex, adapted to keep only the regex match and drop the rest:

>>> re.sub(r".*((?<!\d)(\d{6}|\d{7})(?!\d)).*",r"\1.msg","File Completed for 1234566 1 - More Junk here and Stuf.msg")
'1234566.msg'
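The same substitution can be checked against the other sample names from the question. This is only a sketch: extract_name is a hypothetical helper name, and "notes.msg" is an invented example of a file without a number.

```python
import re

# Same pattern as above: anything, then a standalone 6- or 7-digit run, then anything.
PATTERN = r".*((?<!\d)(\d{6}|\d{7})(?!\d)).*"

def extract_name(base_filename):
    """Return '<number>.msg', or the name unchanged if no number is found."""
    return re.sub(PATTERN, r"\1.msg", base_filename)

samples = [
    "File Completed for 123456 1 - Platform Junk (AP .msg",
    "File Completed for 1234566 1 - More Junk here and Stuf.msg",
    "File Completed for 654321 1 - ® Stuff and Junk.msg",
    "notes.msg",  # invented name with no 6- or 7-digit number
]

for name in samples:
    print(repr(extract_name(name)))
```

When the pattern does not match, re.sub returns the string untouched, which is what makes the "did anything change?" check in the last step possible.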

OK, since that works, compute the new name like this:

base_filename = os.path.basename(filename)
new_name = re.sub(r".*((?<!\d)(\d{6}|\d{7})(?!\d)).*",r"\1.msg",base_filename) # keep only matched pattern, restore .msg suffix in the end

That way the regex is applied only to the file name, not to the full path.

Last, use os.rename to rename the files. Check first that something was actually replaced, so you don't try to rename a file to itself:

if base_filename != new_name:
    os.rename(filename, os.path.join(script_dir, new_name))
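Putting the steps together, a complete script might look roughly like the sketch below. The function name rename_msg_files and the returned list are illustrative choices. The os.path.exists check is an extra assumption that is not part of the steps above: it leaves a file alone if two names would collapse to the same number.

```python
import glob
import os
import re

# Anything, then a standalone 6- or 7-digit run, then anything.
PATTERN = r".*((?<!\d)(\d{6}|\d{7})(?!\d)).*"

def rename_msg_files(script_dir):
    """Rename each .msg file in script_dir to '<number>.msg'.

    Returns a list of (old_name, new_name) pairs for the files renamed.
    """
    renamed = []
    for filename in glob.glob(os.path.join(script_dir, "*.msg")):
        # Apply the regex to the file name only, not the full path.
        base_filename = os.path.basename(filename)
        new_name = re.sub(PATTERN, r"\1.msg", base_filename)
        target = os.path.join(script_dir, new_name)
        # Skip files with no number in the name (nothing was replaced).
        if base_filename == new_name:
            continue
        # Assumption: never overwrite an existing file with the same number.
        if os.path.exists(target):
            continue
        os.rename(filename, target)
        renamed.append((base_filename, new_name))
    return renamed
```

Calling rename_msg_files(script_dir) on a copy of the directory first is a cheap way to confirm the result before touching the real files.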

9 Comments

Hey, thanks for the answer! The only thing it's saying is on the new_name line "TypeError: sub() missing 1 required positional argument: 'string' ". Also, I believe there's an extra ) at the end, so I removed it.
you're right! fixed. I assume you want to remove the matched pattern. was missing a parameter (the empty string), and the parameter order was wrong too:
So the code is now working great, except that it did the opposite: it removed the 6 or 7 numbers and left the text. But the numbers should stay and the rest disappear. Think that can be fixed with ^ in the regex?
I think we're almost there, I'm getting a TypeError: sub() missing 2 required positional arguments: 'repl' and 'string'. So I guess the method has parameters we need to fill? Can I just put a couple commas in? Also, I think there might need another ) after ." at the end maybe.
I feel like the easiest way might be to somehow invert the regex. I'm playing with it now trying to invert it, then we can use what you had before maybe.
