
I have this text:

This is test 2019 -(dsd) g1-d2 720p test
This test 2019 - g1-d2 test

They are separate lines, not one combined string.

I am trying to capture everything between 2019 and 720p, when 720p is present.

(.+?) ([0-9]{4})(.+?)([0-9]{3,4}p)?(.*)

The problem is that the third group, (.+?), only matches a single character. I want it to match everything up to 720p. If I make it greedy instead, it matches everything to the end of the line.
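A quick way to see what is going on is to run the question's pattern against the first sample line. This sketch uses Python's re module, which is an assumption (the question does not name a language, though the split answer below uses Python). Because the fourth group is optional and (.*) can absorb the rest, the lazy third group is satisfied after one character:

```python
import re

# The pattern from the question, unchanged.
pattern = re.compile(r"(.+?) ([0-9]{4})(.+?)([0-9]{3,4}p)?(.*)")

line = "This is test 2019 -(dsd) g1-d2 720p test"
m = pattern.match(line)

# The lazy (.+?) in group 3 takes the fewest characters that still let the
# whole match succeed. Since ([0-9]{3,4}p)? may match nothing and (.*) can
# take everything else, one character (the space after 2019) is enough.
print(m.groups())
# ('This is test', '2019', ' ', None, '-(dsd) g1-d2 720p test')
```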

The output I want is:

G1: This is test
G2: 2019
G3:  -(dsd) g1-d2 
G4: 720p
G5:  test
  • I think you want to capture everything up to 720p or the end of the string. So instead of the "unsure" ?, use (([0-9]{3,4}p)(.*))|$ (commented Jun 13, 2019 at 5:23)

3 Answers


You need to remove the ? quantifier after ([0-9]{3,4}p). It makes that group optional, so the engine is never forced to match it, and the lazy (.+?) before it can stop after a single character.
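A minimal Python sketch of this first suggestion (Python's re is an assumption here). With the ? removed, the first sample line splits the way the question wants, but the second sample line, which has no resolution, no longer matches at all:

```python
import re

# The question's pattern with the ? after ([0-9]{3,4}p) removed.
pattern = re.compile(r"(.+?) ([0-9]{4})(.+?)([0-9]{3,4}p)(.*)")

with_res = "This is test 2019 -(dsd) g1-d2 720p test"
without_res = "This test 2019 - g1-d2 test"

# The resolution group is now mandatory, so (.+?) has to stretch up to it.
print(pattern.match(with_res).groups())
# ('This is test', '2019', ' -(dsd) g1-d2 ', '720p', ' test')

# A line without a resolution fails to match entirely.
print(pattern.match(without_res))
# None
```

That failure is what the first comment below points out, and what the edit addresses.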

Edit

To match strings with or without 720p, you can wrap (.+?) and ([0-9]{3,4}p) in an optional non-capturing group, (?:...)?, like so:

(.+?)([0-9]{4})(?:(.+?)([0-9]{3,4}p))?(.*)
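Here is a small Python sketch checking this pattern against both sample lines from the question, plus a made-up third line with a 1080p resolution (the third line is hypothetical, added only to show that 4-digit resolutions work too):

```python
import re

# Year, then an optional (text, resolution) pair, then whatever is left.
pattern = re.compile(r"(.+?)([0-9]{4})(?:(.+?)([0-9]{3,4}p))?(.*)")

lines = [
    "This is test 2019 -(dsd) g1-d2 720p test",
    "This test 2019 - g1-d2 test",
    "This is test 2021 -(dsd) g1-d2 1080p test",  # hypothetical sample
]

results = [pattern.match(line).groups() for line in lines]
for groups in results:
    print(groups)
# ('This is test ', '2019', ' -(dsd) g1-d2 ', '720p', ' test')
# ('This test ', '2019', None, None, ' - g1-d2 test')
# ('This is test ', '2021', ' -(dsd) g1-d2 ', '1080p', ' test')
```

When the resolution is missing, groups 3 and 4 are None and group 5 holds the rest of the line. Since this version drops the literal space before the year, group 1 keeps a trailing space; strip the groups if that padding matters.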



3 Comments

But that part may or may not be present, so it won't always be there.
OK, I thought you wanted to match the string only if 720p was present. See the edit; it should work as expected now.
Thanks, your solution worked perfectly and was simple as well :) and close to mine.

We can try doing a regex split on the following pattern:

 (?=\d{4})|(?<=\d{4}) | (?=\d{3,4}p)|(?<=\d{3}p) |(?<=\d{4}p) 

Sample script:

import re

text = "This is test 2019 -(dsd) g1-d2 720p test"
# Split on single spaces that sit next to a 4-digit year or a 3-4 digit resolution.
parts = re.split(r' (?=\d{4})|(?<=\d{4}) | (?=\d{3,4}p)|(?<=\d{3}p) |(?<=\d{4}p) ', text)
print(parts)

This prints:

['This is test', '2019', '-(dsd) g1-d2', '720p', 'test']

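The idea here is to split on a single space, using lookarounds to decide which spaces qualify. Lookarounds assert but do not consume anything in the input, so only the space itself is removed. We split on a space whenever a lookahead or lookbehind sees a 4-digit year, or a 3- or 4-digit number followed by p. The lookbehinds for \d{3}p and \d{4}p are separate alternatives because Python only allows fixed-width lookbehinds.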
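To see how the split behaves on other inputs, here is the same pattern applied to the question's second sample line and to a made-up line with a 1080p resolution (the 1080p line is hypothetical):

```python
import re

# Each alternative consumes exactly one space; the lookarounds only decide
# which spaces count as split points.
SPLIT = re.compile(r' (?=\d{4})|(?<=\d{4}) | (?=\d{3,4}p)|(?<=\d{3}p) |(?<=\d{4}p) ')

samples = [
    "This test 2019 - g1-d2 test",                # no resolution
    "This is test 2021 -(dsd) g1-d2 1080p test",  # hypothetical 4-digit resolution
]

results = [SPLIT.split(s) for s in samples]
for parts in results:
    print(parts)
# ['This test', '2019', '- g1-d2 test']
# ['This is test', '2021', '-(dsd) g1-d2', '1080p', 'test']
```

The line without a resolution yields three parts instead of five, so code that unpacks the result has to check its length. Any other space next to a 4-digit number in a title would also become a split point.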

Comments


You just need to use lookbehind and lookahead assertions.

(?<=2019)(.+?)(?=720p)

On the first sample line, this captures " -(dsd) g1-d2 " (including the surrounding spaces).

More info about lookahead and lookbehind assertions here.

EDIT:

You can use regex patterns inside the lookarounds if you need more flexibility. Here's one take:

(?<=[0-9]{4})(.+?)(?=[0-9]{3,4}p)
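A short Python sketch checking the generalized pattern against the follow-up comments (Python's re is an assumption; [0-9]{4} is fixed width, so Python accepts it inside a lookbehind). The 1080p line is a hypothetical sample:

```python
import re

# Any 4-digit number behind, any 3- or 4-digit resolution ending in "p" ahead.
pattern = re.compile(r"(?<=[0-9]{4})(.+?)(?=[0-9]{3,4}p)")

lines = [
    "This is test 2019 -(dsd) g1-d2 720p test",
    "This is test 2021 -(dsd) g1-d2 1080p test",  # hypothetical sample
    "This test 2019 - g1-d2 test",                # no resolution present
]

results = []
for line in lines:
    m = pattern.search(line)
    # group(1) keeps the spaces around the captured text; None means no match.
    results.append(m.group(1) if m else None)

print(results)
# [' -(dsd) g1-d2 ', ' -(dsd) g1-d2 ', None]
```

The third line returns None: without a resolution the lookahead never succeeds, so this pattern on its own does not handle the optional case raised in the comments.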


3 Comments

The 2019 and 720p are not fixed; they can vary, like any year, and any resolution like 1080p or 2160p.
@rgd You can always substitute other regex patterns. See the edit.
The group with 720p is optional, so it won't always be there.
