Python Regex error

Question

I have a list of file paths, and each file name contains something I need to retrieve, e.g. C:\PATH\PATH\PATH\PATH\THE_THING_I_NEED.xlsx

Using Pythex, I created a regular expression that picks exactly what I want: everything between the last \ and .xlsx. Below is the code and the error I get:

import re
files = ['C:\\PATH\\PATH\\PATH\\thing1.xlsx', 'C:\\PATH\\PATH\\PATH\\PATH\\thing2.xlsx']

pattern = re.compile('(?<=\\)?[a-zA-Z]+(?=\.xlsx)')
for x in files:
    matches = re.findall(pattern, x)
    print(matches)

# error I get below
error: missing ), unterminated subpattern at position 0

So, following the error, I added an extra ) and it works:

pattern = re.compile('(?<=\\))?[a-zA-Z]+(?=\.xlsx)')
#                           ^ added right there

What exactly is that extra ) doing? Pythex doesn't seem to need it, and to my eye it seems unnecessary.

That is why it is recommended to use raw string literals when defining regex patterns in Python.
– Wiktor Stribiżew, Commented Mar 1, 2017 at 13:53
If you're trying to extract data from file paths, consider using the functions in os.path, which exist for exactly that purpose. os.path.splitext(os.path.split('C:\\PATH\\PATH\\PATH\\thing1.xlsx')[1])[0] gives you "thing1".
– Kevin, Commented Mar 1, 2017 at 13:55
The lookbehind is not necessary since you are using re.findall. Just use a capturing group around the pattern you need to extract. Your file names also contain digits, which [a-zA-Z]+ does not match. Use pattern = re.compile(r'\\([^\\]+)\.xlsx').
– Wiktor Stribiżew, Commented Mar 1, 2017 at 13:57
You need to escape the backslash for the regex engine, so you get \\. Now comes the next layer: to create those two backslashes in Python, you have to escape each of them in your string literal, leading to \\\\. That's why you should use a raw string (which doesn't process escape sequences), as Wiktor said: r'\\'.
– Matthias, Commented Mar 1, 2017 at 14:01
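
To make the comments above concrete, here is a minimal sketch of what the regex engine actually receives. It uses only the standard re module; the names broken, error_message and names are illustrative and not from the original post. In a normal string literal, '\\' is a single backslash, so the engine sees \) as an escaped literal parenthesis and the (?<= group is never closed. The extra ) from the question therefore closes the lookbehind, which then looks for a literal ) and is made optional by the trailing ?.

```python
import re

# The pattern from the question, with every backslash spelled out so that it
# is unambiguous what the regex engine receives.
broken = '(?<=\\)?[a-zA-Z]+(?=\\.xlsx)'
print(broken)  # (?<=\)?[a-zA-Z]+(?=\.xlsx)

error_message = None
try:
    re.compile(broken)
except re.error as exc:
    # "\)" is a literal ")", so nothing closes the "(?<=" group.
    error_message = str(exc)
print("re.error:", error_message)

# Raw-string pattern with a capturing group, as suggested in the comments.
pattern = re.compile(r'\\([^\\]+)\.xlsx')
files = ['C:\\PATH\\PATH\\PATH\\thing1.xlsx',
         'C:\\PATH\\PATH\\PATH\\PATH\\thing2.xlsx']
names = [pattern.findall(path) for path in files]
print(names)  # [['thing1'], ['thing2']]
```

With a raw string, the backslashes in the source code are exactly the ones the regex engine sees, which is why the second pattern needs no extra escaping layer.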

user6165050 · Accepted Answer · 2017-03-01 14:03:52Z

You're using the wrong tool for this job. I'd recommend os.path for what you want to accomplish:

import os

files = ['C:\\PATH\\PATH\\PATH\\thing1.xlsx', 'C:\\PATH\\PATH\\PATH\\PATH\\thing2.xlsx']
for file in files:
    base = os.path.basename(file)
    print(os.path.splitext(base)[0])

On Windows, this prints exactly what you want:

thing1
thing2
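
One caveat, offered as a hedged aside: os.path follows the conventions of the platform the script runs on, so on Linux or macOS a backslash is not treated as a separator and os.path.basename returns the whole string unchanged. If these Windows-style paths might be processed on another system, a sketch using ntpath (the Windows implementation of os.path, importable on any platform) could look like this. The name stems is illustrative.

```python
import ntpath  # Windows flavour of os.path, available on every platform

files = ['C:\\PATH\\PATH\\PATH\\thing1.xlsx',
         'C:\\PATH\\PATH\\PATH\\PATH\\thing2.xlsx']

# basename() drops the directories, splitext() drops the extension.
stems = [ntpath.splitext(ntpath.basename(path))[0] for path in files]
print(stems)  # ['thing1', 'thing2']
```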

You can also wrap this as a one-liner inside a function, as suggested in the comments:

import os


def get_filename(files):
    return [os.path.splitext(os.path.basename(file))[0] for file in files]

if __name__ == '__main__':
    files = ['C:\\PATH\\PATH\\PATH\\thing1.xlsx', 'C:\\PATH\\PATH\\PATH\\PATH\\thing2.xlsx']
    print(get_filename(files))
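
As an alternative sketch (not part of the original answer), pathlib offers the same result through the stem attribute. PureWindowsPath parses backslash-separated paths regardless of the operating system the script runs on. The helper name get_stems is hypothetical.

```python
from pathlib import PureWindowsPath


def get_stems(paths):
    """Return each file name without its directory or final extension."""
    # PureWindowsPath only parses the string; it never touches the file system.
    return [PureWindowsPath(path).stem for path in paths]


if __name__ == '__main__':
    files = ['C:\\PATH\\PATH\\PATH\\thing1.xlsx',
             'C:\\PATH\\PATH\\PATH\\PATH\\thing2.xlsx']
    print(get_stems(files))  # ['thing1', 'thing2']
```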
