
I'm trying to match "custom" tags that may be complete or incomplete, as described below.

The text I'm trying to match is the tag content, i.e. some text and more text in the examples below.

  1. %end{some text

  2. %start{some text

  3. %start{some text}%end

  4. %start{some text}%end%start{more text}%end

Also, these tags can appear multiple times within a string. For example, the regex:

/%start(.*)%end/gi

applied to the 4th example matches the whole string, %start{some text}%end%start{more text}%end, instead of each tag separately.
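A quick way to reproduce this, as a sketch assuming Node.js or a browser console: the greedy `(.*)` runs on and only backtracks as far as the last `%end` in the string, so both tags end up in a single match.

```javascript
const example4 = "%start{some text}%end%start{more text}%end";

// Greedy: (.*) grabs as much as it can, then backtracks only as far as the
// LAST %end in the string, so both tags end up in one match.
const greedy = /%start(.*)%end/gi;

// With the g flag, String.prototype.match returns every full match.
console.log(example4.match(greedy));
// → [ '%start{some text}%end%start{more text}%end' ]
```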

How would I go about achieving the matches described in the four examples above?

  • That 4th one is the easiest: /%start(.*?)%end/gi. What about 1 and 2? Are you sure 1 is %end{some text and not some text}%end? Commented Jun 15, 2016 at 13:22
  • I would also like to match those (when no "closing" tag is available). Is it possible with one regex?
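To make the exchange above concrete, here is a small sketch (assuming Node.js) that runs the lazy pattern from the first comment on all four examples. The first two print `null`, because the pattern requires a closing `%end`, which is exactly the gap the follow-up comment asks about.

```javascript
// The four examples from the question.
const examples = [
  "%end{some text",
  "%start{some text",
  "%start{some text}%end",
  "%start{some text}%end%start{more text}%end",
];

// Lazy: (.*?) stops at the FIRST %end after each %start.
const lazy = /%start(.*?)%end/gi;

for (const text of examples) {
  // match() with the g flag returns null when nothing matches.
  console.log(JSON.stringify(text), "->", text.match(lazy));
}
```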

4 Answers


If your data can contain multiple tags per line, with unclosed tags in positions other than the last one, and the tag content can contain %, it's a little tricky:

Use /%(?:start|end){((?:(?!%(?:start|end){)[^}])+)/g and retrieve the first group.

Here is a regex101 test.

Note that it is about three times as expensive as the next two expressions: it takes 112 steps to match your fourth example, while the other two take only 34 steps.
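A minimal sketch of retrieving the first group in JavaScript, assuming an engine with `String.prototype.matchAll` (ES2020; Node.js 12+). The helper name `tagContents` is hypothetical. The second input is made up to show an unclosed tag followed by another tag, with a `%` inside the content.

```javascript
// Hypothetical helper: collect capture group 1 from every match.
function tagContents(text) {
  // Same pattern as above, with the braces escaped as \{ so it is also
  // valid under the u flag; otherwise identical.
  const re = /%(?:start|end)\{((?:(?!%(?:start|end)\{)[^}])+)/g;
  return Array.from(text.matchAll(re), (m) => m[1]);
}

console.log(tagContents("%start{some text}%end%start{more text}%end"));
// → [ 'some text', 'more text' ]

// Unclosed tag followed by another tag, with a % inside the content:
// the lookahead stops the content only at the next %start{ or %end{.
console.log(tagContents("%end{50% off%start{some other text}"));
// → [ '50% off', 'some other text' ]
```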


If your data can contain multiple tags per line, with unclosed tags in positions other than the last one, but the tag content can't contain %, it's already a lot easier:

Use /%(?:start|end){([^}%]+)/g and retrieve the first group.

Here is a regex101 test. Note how it fails on the last dataset.
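A short JavaScript sketch of this second pattern (the helper `tagContents` is hypothetical, and the inputs are made up): the unclosed tag in the middle is handled because `%` ends the content, but content that itself contains `%` gets truncated.

```javascript
// Hypothetical helper: collect capture group 1 from every match.
function tagContents(text) {
  // Same pattern as above, with the braces escaped.
  const re = /%(?:start|end)\{([^}%]+)/g;
  return Array.from(text.matchAll(re), (m) => m[1]);
}

// Unclosed tag followed by another tag: handled, because % ends the content.
console.log(tagContents("%end{some text%start{some other text}"));
// → [ 'some text', 'some other text' ]

// Content that itself contains %: cut off at the %.
console.log(tagContents("%start{50% off}%end"));
// → [ '50' ]
```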


If your data can't contain unclosed tags in positions other than the last one, it's even easier:

Use /%(?:start|end){([^}]+)/g and retrieve the first group.

Here is a regex101 test. Note that you will need to add the linefeed character (\n) to the negated class if you parse multiple lines at once, and note how it fails on the last two datasets.
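A sketch of both caveats in JavaScript, using made-up inputs and a hypothetical `captureAll` helper: an unclosed tag that is not the last one swallows the following tag, and without `\n` in the negated class a capture can run across a line break.

```javascript
// Hypothetical helper: run `re` (which must have the g flag) and collect group 1.
function captureAll(re, text) {
  return Array.from(text.matchAll(re), (m) => m[1]);
}

const simple = /%(?:start|end)\{([^}]+)/g;
// Variant with a linefeed added to the negated class, for multi-line input.
const simpleMultiline = /%(?:start|end)\{([^}\n]+)/g;

// An unclosed tag that is not the last one swallows the following tag.
console.log(captureAll(simple, "%end{some text%start{some other text}"));
// → [ 'some text%start{some other text' ]

const twoLines = "%start{first line\n%start{second line}%end";
// Without \n in the class, the single capture spans the line break.
console.log(captureAll(simple, twoLines));
// With \n in the class, each line's tag is captured separately.
console.log(captureAll(simpleMultiline, twoLines));
// → [ 'first line', 'second line' ]
```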


4 Comments

Lookbehind is not available in JavaScript (it was only added later, in ES2018), but the first group works just fine.
@pelican_george Whoops, I always forget which language has which regex features. By the way, be careful with the regex: it will fail if an unclosed tag is followed by another tag. In %end{some text%start{some other text}, the returned result will be some text%start{some other text. If your tags' content never contains %, you can simply add it to the negated class. If it can contain %, you could use a negative lookahead (which JavaScript supports).
Thanks for the great feedback. How would you apply the negative lookahead in that specific case?
@pelican_george I've heavily edited my answer to present the different solutions that work in JavaScript for each kind of dataset, and I've highlighted the cost of the most generic solution.

You can use this pattern:

/%start([^%]*(?:%(?!end)[^%]*)*)(?:%end)?/gi

The idea is to describe the content greedily, in a way that can't match the closing tag, and to make the closing tag optional.

[^%]*          # everything that is not a %
(?:
    %(?!end)   # a % not followed by "end"
    [^%]*      # then more non-% characters
)*             # repeated any number of times
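A minimal JavaScript sketch of this pattern on three of the question's examples (the `captureAll` helper is hypothetical). Note that group 1 keeps the braces, because capturing starts right after `%start`, and that example 1 produces no match, since the pattern only looks for `%start`.

```javascript
// The pattern from this answer, unchanged.
const pattern = /%start([^%]*(?:%(?!end)[^%]*)*)(?:%end)?/gi;

// Hypothetical helper: run `re` (which must have the g flag) and collect group 1.
function captureAll(re, text) {
  return Array.from(text.matchAll(re), (m) => m[1]);
}

console.log(captureAll(pattern, "%start{some text"));
// → [ '{some text' ]
console.log(captureAll(pattern, "%start{some text}%end%start{more text}%end"));
// → [ '{some text}', '{more text}' ]
console.log(captureAll(pattern, "%end{some text"));
// → []
```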

2 Comments

Fails with the scenario %start{some text%start{some text}%end
@Justinas: There is no example of nested tags in the question. But nested tags can also be handled: use the same pattern for a replacement inside a while loop, and change (?!end) to (?!start|end) to target the innermost occurrences.
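A small sketch of the effect of widening the lookahead to `(?!start|end)`, as suggested in the reply. It does not implement the while-loop replacement; it only shows, in a single global pass, that the content now also stops at the next `%start`, so the input from the first comment splits into two matches. The `captureAll` helper is hypothetical.

```javascript
const original = /%start([^%]*(?:%(?!end)[^%]*)*)(?:%end)?/gi;
// Lookahead widened to (?!start|end): content also stops at the next %start.
const widened = /%start([^%]*(?:%(?!start|end)[^%]*)*)(?:%end)?/gi;

// Hypothetical helper: run `re` (which must have the g flag) and collect group 1.
function captureAll(re, text) {
  return Array.from(text.matchAll(re), (m) => m[1]);
}

const input = "%start{some text%start{some text}%end";

console.log(captureAll(original, input));
// → [ '{some text%start{some text}' ]
console.log(captureAll(widened, input));
// → [ '{some text', '{some text}' ]
```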

I assume the first example is an invalid tag, since it does not start with %start, and that if you omit %end, the tag ends at the last word.

So the regex would be, for example: %start{([a-z0-9\s]+)}?
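A sketch of this pattern in JavaScript, assuming the `g` flag is added so that every tag in a string is found and the braces are escaped (both are assumptions, not part of the answer). It prints an empty array for the first example and the tag content for the other three. The class only allows lowercase letters, digits and whitespace, so other characters end the content.

```javascript
// Hypothetical helper: run `re` (which must have the g flag) and collect group 1.
function captureAll(re, text) {
  return Array.from(text.matchAll(re), (m) => m[1]);
}

// This answer's pattern, with escaped braces and an added g flag.
const re = /%start\{([a-z0-9\s]+)\}?/g;

const examples = [
  "%end{some text",
  "%start{some text",
  "%start{some text}%end",
  "%start{some text}%end%start{more text}%end",
];

for (const text of examples) {
  console.log(JSON.stringify(text), "->", captureAll(re, text));
}
```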

3 Comments

It wouldn't match the first example.
@pelican_george Yes, as I stated: if tags start with %start{, then %end{some text is not a valid tag.
It would work flawlessly; however, I have a very specific requirement where I need to catch those too. Thanks for the help.

You could try this one:

/{([a-z0-9 ]*)}/gi

You can see the result here: https://regex101.com/r/uY8jE5/1
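A quick JavaScript check of this pattern (braces escaped; the `captureAll` helper and the third input are made up for illustration). It only matches content between a `{` and a closing `}`, so the unclosed tags from examples 1 and 2 are missed, and it also picks up braces that have nothing to do with `%start`/`%end`.

```javascript
// Hypothetical helper: run `re` (which must have the g flag) and collect group 1.
function captureAll(re, text) {
  return Array.from(text.matchAll(re), (m) => m[1]);
}

// This answer's pattern, with the braces escaped.
const re = /\{([a-z0-9 ]*)\}/gi;

console.log(captureAll(re, "%start{some text}%end%start{more text}%end"));
// → [ 'some text', 'more text' ]
console.log(captureAll(re, "%start{some text"));
// → []
console.log(captureAll(re, "plain {braces} outside any tag"));
// → [ 'braces' ]
```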

1 Comment

It would fail if any extra curly brackets are present.
