I have a list of file paths, and each file name contains something I need to retrieve, e.g. C:\PATH\PATH\PATH\PATH\THE_THING_I_NEED.xlsx

Using Pythex, I created a regular expression that picks out exactly what I want: everything between the last \ and .xlsx. Below are the code and the error I get:

import re
files = ['C:\\PATH\\PATH\\PATH\\thing1.xlsx', 'C:\\PATH\\PATH\\PATH\\PATH\\thing2.xlsx']

pattern = re.compile('(?<=\\)?[a-zA-Z]+(?=\.xlsx)')
for x in files:
    matches = re.findall(pattern, x)
    print(matches)

#error i get below   
error: missing ), unterminated subpattern at position 0

So, following the error, I added an extra ) and it works:

pattern = re.compile('(?<=\\))?[a-zA-Z]+(?=\.xlsx)')
#                           ^ added right there

What exactly is that extra ) doing? Pythex doesn't seem to need it, and to my eye it seems unnecessary.

  • That is why it is recommended to use raw string literals when defining regex in Python. Commented Mar 1, 2017 at 13:53
  • You don't need an extra ), you need an extra \\. Commented Mar 1, 2017 at 13:54
  • If you're trying to extract data from file paths, consider using the functions in os.path, which exist for exactly that purpose. os.path.splitext(os.path.split('C:\\PATH\\PATH\\PATH\\thing1.xlsx')[1])[0] gives you "thing1". Commented Mar 1, 2017 at 13:55
  • The lookbehind is not necessary since you are using re.findall. Just use a capturing group around the pattern you need to extract. Your filenames also contain digits. Use pattern = re.compile(r'\\([^\\]+)\.xlsx'), see this online demo. Commented Mar 1, 2017 at 13:57
  • You need to escape the backslash for regex so you get \\ . Now comes the next layer. To create those two backslashes in Python you have to escape each of them in your string literal leading to \\\\ . That's why you should use a raw string (which doesn't do escaping), as Wiktor said: r'\\ '. Commented Mar 1, 2017 at 14:01

1 Answer

You're using the wrong tool. I'd recommend the os module for what you want to accomplish:

import os

files = ['C:\\PATH\\PATH\\PATH\\thing1.xlsx', 'C:\\PATH\\PATH\\PATH\\PATH\\thing2.xlsx']
for file in files:
    base = os.path.basename(file)
    print(os.path.splitext(base)[0])

On Windows, this will print exactly what you want:

thing1
thing2
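
One caveat: os.path follows the path conventions of the platform the script runs on, so on Linux or macOS a backslash is not treated as a separator and os.path.basename returns the whole string. If the paths are always Windows-style, a minimal sketch using pathlib.PureWindowsPath (a swapped-in alternative, not part of the code above) should give the same result on any platform:

```python
from pathlib import PureWindowsPath

files = ['C:\\PATH\\PATH\\PATH\\thing1.xlsx', 'C:\\PATH\\PATH\\PATH\\PATH\\thing2.xlsx']

# PureWindowsPath always parses with Windows rules, whatever OS runs the script.
# .stem is the final path component without its last suffix.
stems = [PureWindowsPath(f).stem for f in files]
print(stems)
```

PureWindowsPath only manipulates the string and never touches the file system, so the files do not need to exist.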

You can also wrap this as a one-liner inside a function as stated in comments:

import os


def get_filename(files):
    return [os.path.splitext(os.path.basename(file))[0] for file in files]

if __name__ == '__main__':
    files = ['C:\\PATH\\PATH\\PATH\\thing1.xlsx', 'C:\\PATH\\PATH\\PATH\\PATH\\thing2.xlsx']
    print(get_filename(files))
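
As for the regex itself, the comments under the question already point at the cause: in a normal string literal, '\\)' reaches the regex engine as \), which is an escaped literal parenthesis, so the lookbehind opened at position 0 is never closed. The extra ) closes it, but the lookbehind then looks for a literal ")" and is made optional by the ?, so it contributes nothing to the match. If you do want a regex, here is a rough sketch using a raw string and the capturing-group pattern suggested in the comments (the trailing $ anchor is my own addition):

```python
import re

files = ['C:\\PATH\\PATH\\PATH\\thing1.xlsx', 'C:\\PATH\\PATH\\PATH\\PATH\\thing2.xlsx']

# Non-raw literal: '(?<=\\)' becomes the regex (?<=\) , so \) is a literal ")"
# and the group never closes. ('\\.' is used here only to avoid an
# invalid-escape warning; it does not change the regex.)
try:
    re.compile('(?<=\\)?[a-zA-Z]+(?=\\.xlsx)')
except re.error as exc:
    print('compile failed:', exc)

# Raw string: \\ is one literal backslash for the regex engine.
# [^\\]+ also accepts digits, which [a-zA-Z]+ would not.
pattern = re.compile(r'\\([^\\]+)\.xlsx$')
names = [m.group(1) for m in (pattern.search(f) for f in files) if m]
print(names)
```

Note that [a-zA-Z]+ from the question would still miss names such as thing1, because the digit sits between the letters and .xlsx.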