I am trying to make an If-Then-Else conditional statement in regular expressions.

The regex takes as input a string representing a filename.

Here are my test strings...

The Edge Of Seventeen 2016 720p.mp4
20180511 2314 - Film4 - Northern Soul.ts
20150526 2059 - BBC Four - We Need to Talk About Kevin.ts

In the first string, 2016 represents a year, but in the other two strings 2314 and 2059 represent times in 24-hour clock format.

The filename should be retained unchanged if it matches this regex:

\d{8} \d{4} -.*?- .*?\.ts

I have tested this and it works. It matches these test strings:

20180511 2314 - Film4 - Northern Soul.ts
20150526 2059 - BBC Four - We Need to Talk About Kevin.ts

If the filename does not match that first regex then this regex should be applied to it:

(.*[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+(19[0-9][0-9]|20[0-9][0-9])([ _\,\.\(\)\[\]\-]|[^0-9]$)?

This is a cleandatetime regexp that is used by Kodi to remove everything from a string AFTER a four-digit number, if one exists, representing a year between 1900 and 2099. I have also tested this and it works.
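
As a reference for the intended behaviour, the two-step logic described above can be sketched procedurally in Python. This is only an illustration, not the pure-regex solution being asked for: the function name `clean_filename` is hypothetical, and treating "matches" as a full-string match for the first regex is an assumption.

```python
import re

# First regex from the question: "YYYYMMDD HHMM - Channel - Title.ts" recordings.
RECORDING = re.compile(r"\d{8} \d{4} -.*?- .*?\.ts")

# Second regex from the question (the Kodi-style clean-up), copied unchanged.
CLEAN_YEAR = re.compile(
    r"(.*[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+"
    r"(19[0-9][0-9]|20[0-9][0-9])"
    r"([ _\,\.\(\)\[\]\-]|[^0-9]$)?"
)

def clean_filename(name: str) -> str:
    """Keep recordings unchanged; otherwise cut everything after the year."""
    # Assumption: "matches" means the whole filename matches the first regex.
    if RECORDING.fullmatch(name):
        return name
    m = CLEAN_YEAR.search(name)
    # If there is no year at all, fall back to the original name.
    return m.group(0) if m else name
```

With the three test strings, this keeps the two `.ts` names as they are and shortens the first one to the title plus year, with one trailing space.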

Here is what I have tried to make the If-Then-Else Regex but it doesn't work:

I use this format --> (?(A)X|Y)

(?(\d{8} \d{4} -.*?- .*?\.ts)^.*$|(.*[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+(19[0-9][0-9]|20[0-9][0-9])([ _\,\.\(\)\[\]\-]|[^0-9]$)?)

This is A

\d{8} \d{4} -.*?- .*?\.ts

This is X

^.*$

This is Y

(.*[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+(19[0-9][0-9]|20[0-9][0-9])([ _\,\.\(\)\[\]\-]|[^0-9]$)?

This is the expected output...

Test string: The Edge Of Seventeen 2016 720p.mp4 Expected output: "The Edge Of Seventeen 2016 " (quotes only included to show that a trailing space can be left at the end)

Test String: 20180511 2314 - Film4 - Northern Soul.ts Expected output: 20180511 2314 - Film4 - Northern Soul.ts

Test String: 20150526 2059 - BBC Four - We Need to Talk About Kevin.ts Expected output: 20150526 2059 - BBC Four - We Need to Talk About Kevin.ts

I am looking for a solution entirely in regular expression syntax. Can someone help me to make it work please?

Cheers,

Flex

3 Comments

  • If you use PCRE, try regex101.com/r/LTtcJv/2 Commented Feb 25, 2020 at 1:09
  • Magic! Thank you Wiktor. Commented Feb 25, 2020 at 1:57
  • Sorry, it is actually much easier, I posted the comment late at night :) Commented Feb 25, 2020 at 7:49

2 Answers

You may use a PCRE pattern like

^(?!\d{8} \d{4} -.*?- .*?\.ts$)(.*[^ _,.()\[\]-][ _.()\[\]-]+(?:19|20)[0-9]{2})(?:[ _,.()\[\]-]|[^0-9]$)?.*

Replace with $1, see the regex demo.

It matches

  • ^ - start of string
  • (?!\d{8} \d{4} -.*?- .*?\.ts$) - a negative lookahead that fails the match if the whole string matches the following pattern:
    • \d{8} \d{4} - 8 digits, space, 4 digits, space
    • -.*?- .*? - -, then any 0 or more chars other than line break chars, as few as possible, - and a space and then again 0 or more chars other than line break chars, as few as possible
    • \.ts$ - .ts at the end of string
  • (.*[^ _,.()\[\]-][ _.()\[\]-]+(?:19|20)[0-9]{2})(?:[ _,.()\[\]-]|[^0-9]$)?.*: Group 1, an optional delimiter and then the rest of the string:
    • .* - any 0+ chars other than line break chars as many as possible
    • [^ _,.()\[\]-] - a char other than a space, _, ,, ., (, ), [, ] or -
    • [ _.()\[\]-]+ - 1+ spaces, _, ., (, ), [, ] or -
    • (?:19|20) - 19 or 20
    • [0-9]{2} - two digits
    • (?:[ _,.()\[\]-]|[^0-9]$)? - an optional non-capturing group matching a space, _, ,, ., (, ), [, ] or -, or any char other than a digit at the end of the string
    • .* - any 0+ chars other than line break chars as many as possible.
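
A minimal sketch of the same replacement in Python, assuming the pattern carries over unchanged to the `re` module (it uses no PCRE-only syntax) and that `$1` becomes `\1` in the replacement string. The function name `strip_after_year` is hypothetical.

```python
import re

# The pattern from this answer, split across lines only for readability.
PATTERN = re.compile(
    r"^(?!\d{8} \d{4} -.*?- .*?\.ts$)"                    # skip recordings
    r"(.*[^ _,.()\[\]-][ _.()\[\]-]+(?:19|20)[0-9]{2})"   # Group 1: title + year
    r"(?:[ _,.()\[\]-]|[^0-9]$)?.*"                       # delimiter and the rest
)

def strip_after_year(name: str) -> str:
    """Replace the whole match with Group 1; non-matching names stay as-is."""
    return PATTERN.sub(r"\1", name)

for name in (
    "The Edge Of Seventeen 2016 720p.mp4",
    "20180511 2314 - Film4 - Northern Soul.ts",
    "20150526 2059 - BBC Four - We Need to Talk About Kevin.ts",
):
    print(repr(strip_after_year(name)))
```

Because the optional delimiter sits outside Group 1, the first name comes back without a trailing space; the question allows a trailing space but does not require one.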

1 Comment

Thanks Wiktor. Now my requirements have changed a little. I posted another question separately seeking help. It's a small modification of the challenge here that I think will require the (?(A)X|Y) approach. I would be grateful if you could help.

Since you have mentioned that A, X and Y are tested and found working, and since there are only 2 patterns, I think this pattern will work (Python style):

pattern = "(.?(?=" + A + ")" + X + ")|(" + Y + ")"

which means:

(.?(?=A)X)|(Y)

Explanation:

  1. There are two groups - one for X and one for Y.
  2. The group for capturing X starts with .? just to make the engine start moving and check whether there is a part matching A ahead (a lookahead). If there is, it continues with matching X, which it encounters right after the lookahead block.
  3. If the lookahead in (2) doesn't match, then the | (or) part, which is Y, takes over. If that matches, you get a result. Otherwise there is no output.

(Sadly, the patterns for A and Y you posted were not working for me in Python, so I replaced them with my own for testing. Please confirm whether the pattern works with the original ones.)
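
One way to check the construction against the original A, X and Y is a sketch like the one below. It assumes the patterns are written as raw strings and that the result is read from the whole match (`group(0)`); the helper name `apply_combined` is hypothetical.

```python
import re

# A, X and Y exactly as posted in the question, as raw strings.
A = r"\d{8} \d{4} -.*?- .*?\.ts"
X = r"^.*$"
Y = (
    r"(.*[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+"
    r"(19[0-9][0-9]|20[0-9][0-9])([ _\,\.\(\)\[\]\-]|[^0-9]$)?"
)

# Same concatenation as above: (.?(?=A)X)|(Y)
pattern = "(.?(?=" + A + ")" + X + ")|(" + Y + ")"
combined = re.compile(pattern)

def apply_combined(name):
    """Return the text matched by whichever branch succeeds, or None."""
    m = combined.search(name)
    return m.group(0) if m else None

for name in (
    "The Edge Of Seventeen 2016 720p.mp4",
    "20180511 2314 - Film4 - Northern Soul.ts",
    "20150526 2059 - BBC Four - We Need to Talk About Kevin.ts",
):
    print(repr(apply_combined(name)))
```

For the two `.ts` names the first branch matches the whole string. For the first name the lookahead fails, so the Y branch matches up to and including the space after the year.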

1 Comment

Thank you for your help. But my problem has now changed and I need to use PCRE. Please see this question if you have time?
