
Need help with a regex inside re.sub(). In this case I am replacing the match with nothing ("").

My Current Code:

import re

file_list = ['F_5500_SF_PART7_[0-9][0-9][0-9][0-9]_all.zip',
             'F_5500_SF_[0-9][0-9][0-9][0-9]_All.zip',
             'F_5500_[0-9][0-9][0-9][0-9]_All.zip',
             'F_SCH_A_PART1_[0-9][0-9][0-9][0-9]_All.zip']

foldernames = [re.sub(r'(\d{4})_All.zip', '', i) for i in file_list]

The result I am trying to achieve is:

foldernames = ['F_5500_SF_PART7','F_5500_SF','F_5500','F_SCH_A_PART1']

I think part of the complexity is that the names in my file_list already contain regex syntax (the [0-9] character classes). Hoping someone smarter can help.
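A quick check shows why the current call has no effect: the pattern (\d{4})_All.zip looks for four real digits followed by "_All.zip", but in these names the text before "_All.zip" is the literal string "[0-9][0-9][0-9][0-9]", and the first name uses lowercase "all". The sketch below (an illustration, not taken from either answer) confirms that, then shows one way to keep re.sub(): pass the fixed suffix through re.escape() so the brackets are treated as literal characters, and use re.IGNORECASE to cover "all"/"All".

```python
import re

file_list = ['F_5500_SF_PART7_[0-9][0-9][0-9][0-9]_all.zip',
             'F_5500_SF_[0-9][0-9][0-9][0-9]_All.zip',
             'F_5500_[0-9][0-9][0-9][0-9]_All.zip',
             'F_SCH_A_PART1_[0-9][0-9][0-9][0-9]_All.zip']

# Original pattern: needs four actual digits right before "_All.zip",
# which never occurs in these names, so every string comes back unchanged.
unchanged = [re.sub(r'(\d{4})_All.zip', '', i) for i in file_list]

# re.escape() escapes "[", "]", "-" and ".", so the suffix is matched literally;
# "$" anchors it to the end and re.IGNORECASE accepts "_all.zip" or "_All.zip".
suffix = re.compile(re.escape('_[0-9][0-9][0-9][0-9]_all.zip') + '$', re.IGNORECASE)
foldernames = [suffix.sub('', i) for i in file_list]

print(unchanged == file_list)
print(foldernames)
```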


2 Answers


You don't need a regular expression; you're removing fixed strings, so you can just use the str.replace() method.

foldernames = [i.replace('_[0-9][0-9][0-9][0-9]_All.zip', '').replace('_[0-9][0-9][0-9][0-9]_all.zip', '') for i in file_list]

The two calls to replace() are needed to handle both All and all. Or if the rest of the filename is always uppercase, you could use:

foldernames = [i.upper().replace('_[0-9][0-9][0-9][0-9]_ALL.ZIP', '') for i in file_list]
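Both replace() variants can be checked against the desired output with a short script. This is a sketch that simply reruns the two one-liners above on the question's file_list; note that the upper() version uppercases the whole name, so it only gives the right folder names because the prefixes are already uppercase.

```python
file_list = ['F_5500_SF_PART7_[0-9][0-9][0-9][0-9]_all.zip',
             'F_5500_SF_[0-9][0-9][0-9][0-9]_All.zip',
             'F_5500_[0-9][0-9][0-9][0-9]_All.zip',
             'F_SCH_A_PART1_[0-9][0-9][0-9][0-9]_All.zip']
expected = ['F_5500_SF_PART7', 'F_5500_SF', 'F_5500', 'F_SCH_A_PART1']

# Two chained replace() calls: one removes "..._All.zip", the other "..._all.zip".
two_calls = [i.replace('_[0-9][0-9][0-9][0-9]_All.zip', '')
              .replace('_[0-9][0-9][0-9][0-9]_all.zip', '') for i in file_list]

# upper() first, so a single replace() with "_ALL.ZIP" covers both spellings.
upper_only = [i.upper().replace('_[0-9][0-9][0-9][0-9]_ALL.ZIP', '') for i in file_list]

print(two_calls == expected, upper_only == expected)
```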


Barmar's answer is the most appropriate for your problem. But if you actually need to use regex (let's say not all the files have the same fixed "[0-9][0-9][0-9][0-9]" string), then you can use:

r'_(\[[-\d]*\]){4}_[aA]ll\.zip'

(The [aA]ll at the end is for matching the lowercase "all" in your first filename.)
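Here is a sketch of that pattern applied with re.sub() to the question's list, written as a raw string with the dot escaped so it only matches a literal ".". The last line uses a made-up filename (hypothetical, not from the question) to show that any four bracket groups of digits and hyphens are removed, not just "[0-9]".

```python
import re

file_list = ['F_5500_SF_PART7_[0-9][0-9][0-9][0-9]_all.zip',
             'F_5500_SF_[0-9][0-9][0-9][0-9]_All.zip',
             'F_5500_[0-9][0-9][0-9][0-9]_All.zip',
             'F_SCH_A_PART1_[0-9][0-9][0-9][0-9]_All.zip']

# \[[-\d]*\] matches one bracket group such as "[0-9]" or "[2]";
# {4} requires four of them, and [aA]ll accepts "all" or "All".
pattern = r'_(\[[-\d]*\]){4}_[aA]ll\.zip'

foldernames = [re.sub(pattern, '', i) for i in file_list]
print(foldernames)

# Hypothetical name with different bracket contents, still stripped.
other = re.sub(pattern, '', 'F_5500_[2][0-2][0-9][0-9]_All.zip')
print(other)
```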
