I have a list of file paths, and each file name contains something I need to retrieve, for example C:\PATH\PATH\PATH\PATH\THE_THING_I_NEED.xlsx
Using Pythex I created a regular expression that picks exactly what I want, which is everything between \ and .xlsx. Below is the code and the error I get:
import re
files = ['C:\\PATH\\PATH\\PATH\\thing1.xlsx', 'C:\\PATH\\PATH\\PATH\\PATH\\thing2.xlsx']
pattern = re.compile('(?<=\\)?[a-zA-Z]+(?=\.xlsx)')
for x in files:
    matches = re.findall(pattern, x)
    print(matches)
# error I get below
error: missing ), unterminated subpattern at position 0
So, following the error, I added an extra ) and it works:
pattern = re.compile('(?<=\\))?[a-zA-Z]+(?=\.xlsx)')
#                            ^ added right there
What exactly is that extra ) doing? Pythex doesn't seem to need it, and to my eye it seems unnecessary.
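Use the functions in os.path, which exist for exactly that purpose. On Windows, os.path.splitext(os.path.split('C:\\PATH\\PATH\\PATH\\thing1.xlsx')[1])[0] gives you "thing1".
There is no need for lookarounds with re.findall. Just use a capturing group around the pattern you need to extract. Your filenames also contain digits, which [a-zA-Z]+ does not match. Use pattern = re.compile(r'\\([^\\]+)\.xlsx').
To match a literal backslash, the regular expression itself needs \\. Now comes the next layer. To create those two backslashes in Python, you have to escape each of them in your string literal, leading to \\\\. With only \\ in a normal string literal, the regex engine receives a single backslash followed by ), which it reads as an escaped literal parenthesis, so the lookbehind group is never closed; the extra ) you added is what closes it. That's why you should use a raw string (which doesn't do escaping), as Wiktor said: r'\\'.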
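The points above can be checked with a small sketch. It is only an illustration under a few assumptions: the variable names (seen_by_re, compile_error, names_re, names_path) are made up for this example, the `\.` from the question is written as `\\.` so the non-raw literal produces the same pattern text without an invalid-escape warning, and ntpath is swapped in for os.path so the Windows-style paths are split the same way on any operating system (os.path only behaves like ntpath on Windows).

```python
import re
import ntpath  # Windows path rules on every OS; an assumption made for portability

files = ['C:\\PATH\\PATH\\PATH\\thing1.xlsx', 'C:\\PATH\\PATH\\PATH\\PATH\\thing2.xlsx']

# 1. The non-raw literal from the question: Python collapses '\\' to a single
#    backslash, so the regex engine sees '\)' (an escaped, literal parenthesis).
seen_by_re = '(?<=\\)?[a-zA-Z]+(?=\\.xlsx)'
print(seen_by_re)  # shows the text the regex engine actually receives

try:
    re.compile(seen_by_re)
    compile_error = None
except re.error as exc:
    # The lookbehind group was opened but never closed.
    compile_error = str(exc)
print(compile_error)

# 2. Raw string plus a capturing group: findall returns only the group's text.
pattern = re.compile(r'\\([^\\]+)\.xlsx')
names_re = [pattern.findall(f) for f in files]
print(names_re)

# 3. No regex at all: take the last path component, then drop the extension.
names_path = [ntpath.splitext(ntpath.basename(f))[0] for f in files]
print(names_path)
```

The path-based variant is usually the easier one to maintain, since it does not depend on the extension or on which characters the file name contains.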