0

Here is the test string that I'm trying to match using Ruby:

<?lang 
  this_should_be_captured();
  and_also_this();
  and_this();
?>

this text should NOT be captured

<?lang this_should_also_be_captured(); ?>

When I use this regular expression:

(<\?lang(\n|.)*\?>)

The match captures everything (including the part that I don't want: "this text should NOT be captured"), as shown on http://rubular.com/r/qSOOzq6HAx.

How can I capture the two different blocks correctly without capturing what I don't want?

2 Answers

4

You want to use a lazy quantifier.

(<\?lang(\n|.)*?\?>)

Adding ? after the * quantifier makes it lazy. Instead of consuming as many characters as possible (greedy), it consumes as few as it can while still letting the rest of the pattern match, so each match stops at the first ?> rather than the last one.
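To see the difference on the question's input, here is a minimal sketch. The variable names are only illustrative, and the groups are written as non-capturing (?: ) so that String#scan returns whole matches:

```ruby
# The test string from the question, as a heredoc.
text = <<~TEXT
  <?lang
    this_should_be_captured();
    and_also_this();
    and_this();
  ?>

  this text should NOT be captured

  <?lang this_should_also_be_captured(); ?>
TEXT

# Greedy: * runs to the end of the input, then backtracks to the LAST ?>.
greedy = text.scan(/<\?lang(?:\n|.)*\?>/)

# Lazy: *? stops at the FIRST ?> after each <?lang.
lazy = text.scan(/<\?lang(?:\n|.)*?\?>/)

p greedy.length  # => 1
p lazy.length    # => 2
```

The single greedy match spans both blocks and the unwanted text between them; the lazy version yields the two blocks separately.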


1

You can make it simpler by using multiline mode (the m flag), in which . also matches newlines. You also do not need the outer parentheses, because that group is the same as the entire match, which is already available as $& (or $~[0]). If you want to capture what is inside <?lang ?>, you can put the parentheses there.

/<\?lang(.*?)\?>/m
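As a rough sketch of how that pattern might be used, the following pulls out just the code inside each block. The sample text is shortened from the question, and the strip call is an added assumption to trim the surrounding whitespace:

```ruby
text = "<?lang\n  first();\n  second();\n?>\n\nskip me\n\n<?lang third(); ?>\n"
pattern = /<\?lang(.*?)\?>/m

# With one capture group, scan returns an array of one-element arrays.
bodies = text.scan(pattern).map { |(body)| body.strip }
p bodies  # => ["first();\n  second();", "third();"]

# For a single match, MatchData exposes the whole match and the group.
m = text.match(pattern)
whole = m[0]  # the entire "<?lang ... ?>" block
inner = m[1]  # only the text between "<?lang" and "?>"
```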

PS.

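  • When every alternative is a single literal character, you can use a character class [ ] instead of the parentheses ( ). E.g., [ab] instead of (a|b). This does not work for (\n|.), though: inside [ ], the . is a literal dot, so [\n.] matches only a newline or a period.
  • Even when you need to use parentheses to show alternation, you should use the non-capturing parentheses (?: ) unless you need to refer to the content, because that avoids the overhead of storing a capture and is typically a little faster than the capturing parentheses ( ). E.g., (?:\n|.)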
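One caveat when applying these tips: inside a character class, . is a literal dot rather than "any character". A short check, using a made-up sample string, shows this along with the difference between capturing and non-capturing groups:

```ruby
sample = "a\nb.c"

# Inside [ ], "." is literal: only the newline and the period match.
in_class = sample.scan(/[\n.]/)
p in_class  # => ["\n", "."]

# The alternation matches every character, newline included.
any_char = sample.scan(/(?:\n|.)/)
p any_char.length  # => 5

# A capturing group stores its last iteration; a non-capturing one stores nothing.
capturing     = "ab".match(/(a|b)+/).captures
non_capturing = "ab".match(/(?:a|b)+/).captures
p capturing      # => ["b"]
p non_capturing  # => []
```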
