2

I have a a filename below and I want to extract year and _TEXT part.

fle_2019-11-17A17-21-09.01(_TEXT).txt

I am able to do this using two regex and then join the results.

(?<=\_)(\d{4})(?=\-) This gives me year

(?<=\()(.*)(?=\)) This gives me _TEXT

Is there a way to get this from a single expression?

2
  • What programming language are you using? Commented Dec 13, 2019 at 10:11
  • Python. Sorry forgot to mention previously. Commented Dec 13, 2019 at 10:12

2 Answers 2

1

One option is to use 2 capturing groups. Depending on what you would allow to match before the first underscore, you could for example use a character class to match word characters without an underscore [^\W_]+

^[^\W_]+_(\d{4})-[\w.-]+\(([^)]+)\)\.\w+$

In parts

  • ^ Start of string
  • [^\W_]+ Match 1+ word chars except _
  • _ Match the _
  • (\d{4}) Capture group 1, match 1+ digits
  • -[\w.-]+ Match - and 1+ word chars, . or - (extend the character class with what you would allow to match
  • \( Match (
    • ([^)]+) Capture group 2, match 1+ times any char except )
  • \) Match )
  • \.\w+ Match a . and 1+ word chars
  • $ End of string

Regex demo | Python demo

For example

import re

regex = r"^[^\W_]+_(\d{4})-[\w.-]+\(([^)]+)\)\.\w+$"
test_str = "fle_2019-11-17A17-21-09.01(_TEXT).txt"
print(re.findall(regex, test_str))

Output

[('2019', '_TEXT')]
Sign up to request clarification or add additional context in comments.

1 Comment

This is not a complete answer, because it fails to provide any actual Python code.
1

In the interest of simplicity, we could just try using re.findall with an alternation which captures either a 4 digit year or a file name:

file = "fle_2019-11-17A17-21-09.01(_TEXT).txt"
parts = re.findall(r'\d{4}(?=-\d{2})|(?<=\().*?(?=\))', file)
print(parts)

This prints:

['2019', '_TEXT']

I like this approach because the output already yields separate logical values for the year and file name.

3 Comments

@Thefourthbird Thanks for casting doubt on my answer.
It is not meant to cast doubt on the answer, but perhaps something to mention. I will remove the comment.
I agree with you, but I am assuming the OP has a bunch of files all with the same format in a folder somewhere (in a list in Python), and it wants to extract the year and name features. My answer does not attempt any validation.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.