
This question follows on from a previous question about If-Then-Else Regular Expressions.

Because of how I phrased my problem in the other question, the solutions there didn't use the (?(A)X|Y) syntax. But I think I need to use that approach.

Here is my problem re-phrased...

I need a regex that takes as input a string representing a filename.

Here are my test strings...

The Edge Of Seventeen 2016 720p.mp4
20180511 2314 - Film4 - Northern Soul.ts
20150526 2059 - BBC Four - We Need to Talk About Kevin.ts

If the filename matches this regex:

\d{8} \d{4} -.*?- .*?\.ts

Then this RegEx should be applied:

\d{8} \d{4} -.*?- ?(.*)\.ts

If the filename does not match that first regex then this regex should be applied to it:

(.*[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+(19[0-9][0-9]|20[0-9][0-9])([ _\,\.\(\)\[\]\-]|[^0-9]$)?

This is the expected output...

Test string: The Edge Of Seventeen 2016 720p.mp4
Expected output: "The Edge Of Seventeen 2016 " (quotes only included to show that a trailing space can be left at the end)

Test String: 20180511 2314 - Film4 - Northern Soul.ts
Expected output: Northern Soul

Test String: 20150526 2059 - BBC Four - We Need to Talk About Kevin.ts
Expected output: We Need to Talk About Kevin
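The if-then-else rule above can also be applied in ordinary code instead of inside a single regex. Below is a minimal sketch that assumes Python's built-in re module (not PCRE) and uses a hypothetical helper named extract_title. It returns group 1 when X is used and the whole match when Y is used, because the expected output for the first test string is Y's full match, trailing space included.

```python
import re

# The three patterns from the question, unchanged (Python's re accepts them as written).
A = re.compile(r"\d{8} \d{4} -.*?- .*?\.ts")
X = re.compile(r"\d{8} \d{4} -.*?- ?(.*)\.ts")
Y = re.compile(r"(.*[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+(19[0-9][0-9]|20[0-9][0-9])([ _\,\.\(\)\[\]\-]|[^0-9]$)?")


def extract_title(filename):
    """Hypothetical helper: apply X if A matches, otherwise apply Y."""
    if A.search(filename):
        m = X.search(filename)
        return m.group(1) if m else None  # X: the title is capture group 1
    m = Y.search(filename)
    return m.group(0) if m else None  # Y: the expected output is the whole match


for name in [
    "The Edge Of Seventeen 2016 720p.mp4",
    "20180511 2314 - Film4 - Northern Soul.ts",
    "20150526 2059 - BBC Four - We Need to Talk About Kevin.ts",
]:
    print(repr(extract_title(name)))
# 'The Edge Of Seventeen 2016 '
# 'Northern Soul'
# 'We Need to Talk About Kevin'
```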

Here is what I have tried for the If-Then-Else regex, but it doesn't work:

I use this format --> (?(A)X|Y)

(?(\d{8} \d{4} -.*?- .*?\.ts)\d{8} \d{4} -.*?- ?(.*)\.ts|(.*[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+(19[0-9][0-9]|20[0-9][0-9])([ _\,\.\(\)\[\]\-]|[^0-9]$)?)

This is A

\d{8} \d{4} -.*?- .*?\.ts

This is X

\d{8} \d{4} -.*?- ?(.*)\.ts

This is Y

(.*[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+(19[0-9][0-9]|20[0-9][0-9])([ _\,\.\(\)\[\]\-]|[^0-9]$)?

I have tested the A, X and Y regexes and they work individually, but not when I put them together. Can someone help me piece them together using the PCRE flavor?

Cheers,

Flex

  • It is even simpler and you still misunderstand the conditional construct. You may just use an alternation: \d{8} \d{4} -.*?- ?(.*)\.ts|(.*[^] _,.()[-])[] _.()[-]+(19[0-9][0-9]|20[0-9][0-9])([] _,.()[-]|[^0-9]$)? Commented Feb 26, 2020 at 8:23
  • Even simpler: ^\d{8} \d{4} -.*?- ?\K.*(?=\.ts$)|^.*[^][ _,.()-][][ _.()-]+(?:19|20)\d{2}(?!\d) Commented Feb 26, 2020 at 8:53
  • See regex101.com/r/oZzNIV/1 Commented Feb 26, 2020 at 9:05
  • Thanks Wiktor. If you want to make this into an answer I'll accept it and give another thumbs up but up to you. Cheers. Commented Feb 27, 2020 at 23:04
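On the conditional construct itself: in PCRE the condition in (?(condition)yes|no) has to be a group reference (by number or name), one of a few special forms such as R or DEFINE, or an assertion. A bare pattern like A is not accepted, which is likely why the attempt in the question fails; the usual PCRE form wraps A in a lookahead, ^(?(?=A)X|Y). Python's built-in re module only supports group-reference conditions, so the sketch below (Python, not PCRE; title_from_conditional is a hypothetical name) expresses the same if-then-else as ^(?:(?=A)X|(?!A)Y).

```python
import re

# Sub-patterns from the question, unchanged.
A = r"\d{8} \d{4} -.*?- .*?\.ts"
X = r"\d{8} \d{4} -.*?- ?(.*)\.ts"
Y = r"(.*[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+(19[0-9][0-9]|20[0-9][0-9])([ _\,\.\(\)\[\]\-]|[^0-9]$)?"

# Emulates PCRE's ^(?(?=A)X|Y): X is tried only where A matches,
# Y only where A does not. Plain concatenation is used because an
# f-string would treat the {8} and {4} quantifiers as placeholders.
CONDITIONAL = re.compile("^(?:(?=" + A + ")" + X + "|(?!" + A + ")" + Y + ")")


def title_from_conditional(filename):
    m = CONDITIONAL.match(filename)
    if m is None:
        return None
    # Group 1 belongs to X; when the Y branch was taken it is None.
    return m.group(1) if m.group(1) is not None else m.group(0)
```

Because the second branch starts with (?!A), this behaves like the PCRE conditional: if A matches but X fails, Y is not tried.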

2 Answers


You may use

^\d{8} \d{4} -.*?- ?\K.*(?=\.ts$)|^.*[^][ _,.()-][][ _.()-]+(?:19|20)\d{2}(?!\d)

See the regex demo

The pattern combines two alternatives. As with any NFA regex engine, the first alternative that matches "wins", and the engine does not go on to try the remaining alternatives at that level:

  • ^\d{8} \d{4} -.*?- ?\K.*(?=\.ts$) - matches
    • ^ - start of string
    • \d{8} \d{4} - - 8 digits, a space, 4 digits, a space and then a hyphen
    • .*? - 0+ chars other than line break chars, as few as possible
    • - ? - a hyphen followed by an optional space
    • \K - the match reset operator, which discards the text matched so far from the reported match
    • .* - any 0+ chars other than line break chars, as many as possible
    • (?=\.ts$) - a positive lookahead that requires .ts followed by the end of the string immediately to the right of the current position
  • | - or, if the above alternative does not match, try
    • ^ - start of a string
    • .* - any 0+ chars other than line break chars, as many as possible
    • [^][ _,.()-] - a char other than ], [, space, _, comma, ., (, ) and -
    • [][ _.()-]+ - 1+ ], [, space, _, ., (, ) or - chars
    • (?:19|20) - the substring 19 or 20
    • \d{2}(?!\d) - two digits, not followed by another digit.
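Python's built-in re module does not support \K, so this pattern cannot be ported there verbatim. As a rough adaptation (a swapped-in technique, not the answer's PCRE pattern), the sketch below keeps both alternatives but captures the title in group 1 or group 2 instead of resetting the match start. The literal ] and [ inside the character classes are backslash-escaped, and title_of is a hypothetical name.

```python
import re

# First alternative: the capture (.*) stands in for "\K.*".
# Second alternative: wrapped in group 2 so both branches report a group.
PATTERN = re.compile(
    r"^\d{8} \d{4} -.*?- ?(.*)(?=\.ts$)"
    r"|^(.*[^\]\[ _,.()-][\]\[ _.()-]+(?:19|20)\d{2}(?!\d))"
)


def title_of(filename):
    m = PATTERN.search(filename)
    if m is None:
        return None
    # Exactly one of the two groups participates in a successful match.
    return m.group(1) if m.group(1) is not None else m.group(2)
```

Unlike the question's Y, this second alternative stops right after the year, so the first test string yields "The Edge Of Seventeen 2016" without a trailing space.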

Continuing from my answer to How to make an If Then Else Regex conditional statement: the same method still applies. I have tested it on the Java engine.

One thing that will help you is to name the groups whose values you are interested in. For example, I have rewritten the regexes below with the named groups x and y (lowercase). Once the engine has finished matching, check group x for a value and, if group x has none, use group y.

Regex X: \d{8} \d{4} -.*?- ?(?<x>.*)\.ts

Regex Y: (?<y>(.*[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+(19[0-9][0-9]|20[0-9][0-9]))([ _\,\.\(\)\[\]\-]|[^0-9]$)?

You will have to choose the right group for y as I don't think I have done that part correctly.
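A minimal Python sketch of the check-x-then-y approach, assuming X and Y are simply joined with | and applied with re.match (which anchors both alternatives at the start of the string). Python writes named groups as (?P<name>...) rather than the (?<name>...) syntax used above, and pick_named_group is a hypothetical helper. With y wrapping the title and the year, the first test string yields the title without the trailing separator.

```python
import re

# Regex X and Regex Y from this answer, with Python's (?P<name>...) group syntax.
X = r"\d{8} \d{4} -.*?- ?(?P<x>.*)\.ts"
Y = (r"(?P<y>(.*[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+(19[0-9][0-9]|20[0-9][0-9]))"
     r"([ _\,\.\(\)\[\]\-]|[^0-9]$)?")
PATTERN = re.compile(X + "|" + Y)


def pick_named_group(filename):
    m = PATTERN.match(filename)  # match() anchors both alternatives at the start
    if m is None:
        return None
    # Prefer group x (recording-style names); fall back to group y (title + year).
    return m.group("x") if m.group("x") is not None else m.group("y")
```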

2 Comments

Sree, thank you for once again supplying an answer. I don't mean to be rude, only to give feedback: both of your answers were too complicated for me to follow, since I don't really know any regex! Also, neither of your answers was complete or actually worked.
Oh! :-D Sorry about that... I tested them on the Java engine only. And I welcome your feedback. It wasn't rude by any means. :)
