Javascript REGEX to match multiple custom tags (also incomplete)

Question

I'm trying to match "custom" tags that might be complete or incomplete, as described below.

The bold text is what I'm trying to match.

%end{some text
%start{some text
%start{some text}%end
%start{some text}%end%start{more text}%end

Also, these tags can appear multiple times within a string. For example, the regex:

/%start(.*)%end/gi

applied to the 4th example matches the whole string as one single match, because .* is greedy and runs on to the last %end: %start{some text}%end%start{more text}%end
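The greedy behaviour can be reproduced with a short sketch (it assumes Node.js 12+ or a modern browser, since it relies on `String.prototype.matchAll`). It compares the pattern above with its lazy variant `.*?`, the fix suggested in the first comment. `captureAll` is a hypothetical helper name.

```javascript
// Hypothetical helper: return capture group 1 of every match of a global regex.
function captureAll(input, regex) {
  return Array.from(input.matchAll(regex), (m) => m[1]);
}

const example4 = "%start{some text}%end%start{more text}%end";

// Greedy: .* runs to the LAST %end, so the whole string is one match.
const greedy = captureAll(example4, /%start(.*)%end/gi);

// Lazy: .*? stops at the FIRST %end, so each tag is matched separately.
const lazy = captureAll(example4, /%start(.*?)%end/gi);

console.log(greedy); // [ '{some text}%end%start{more text}' ]
console.log(lazy);   // [ '{some text}', '{more text}' ]
```

Neither variant matches the first two examples, since both patterns require a %start and an %end.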

How would I go about achieving the matches described in the first 4 examples?

That 4th one is the easiest: /%start(.*?)%end/gi. What about 1 and 2? Are you sure 1 is %end{some text and not some text}%end?
– Wiktor Stribiżew, Commented Jun 15, 2016 at 13:22
I would also want to match those (when no "closing" tag is available). Would that be possible with one regex?
– pelican_george, Commented Jun 15, 2016 at 13:23

Aaron · Accepted Answer · 2016-06-15 15:21:44Z

1

If your data can contain multiple tags on a line, with unclosed tags in positions other than the last one, and the tag content can contain %, it's a little tricky:

Use /%(?:start|end){((?:(?!%(?:start|end){)[^}])+)/g and retrieve the first group.

Here is a regex101 test.

Note that it is about 3 times as expensive as the next two expressions: regex101 reports 112 steps to match your fourth data example, while the other two take only 34 steps.
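A sketch of how this first expression might be used from JavaScript, collecting group 1 with `matchAll`. Here `{` is escaped as `\{` for readability; the unescaped form in the answer also works as long as the `u` flag is not used. `extractTagContents` is a hypothetical name, and the inputs are the four examples from the question plus the unclosed-tag case discussed in the comments.

```javascript
// Matches %start{ or %end{, then captures characters up to a closing },
// stopping early if another %start{ or %end{ begins (a "tempered greedy token").
const TAG = /%(?:start|end)\{((?:(?!%(?:start|end)\{)[^}])+)/g;

// Hypothetical helper: collect group 1 from every match.
function extractTagContents(input) {
  return Array.from(input.matchAll(TAG), (m) => m[1]);
}

const samples = [
  "%end{some text",
  "%start{some text",
  "%start{some text}%end",
  "%start{some text}%end%start{more text}%end",
  "%end{some text%start{some other text}", // unclosed tag followed by another tag
];

for (const s of samples) {
  console.log(JSON.stringify(s), "->", extractTagContents(s));
}
```

The lookahead is evaluated before every content character, which is where the extra steps mentioned above come from.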

If your data can contain multiple tags on a line, with unclosed tags in positions other than the last one, but the tag content can't contain %, it's a lot easier:

Use /%(?:start|end){([^}%]+)/g and retrieve the first group.

Here is a regex101 test. Note how it fails on the last dataset.
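To illustrate the trade-off, a sketch that runs this cheaper expression next to the general one. The input `%start{50% off}%end` is made up for the illustration: it is a tag whose content contains a %, which is exactly what this version gives up on.

```javascript
// Most general version: content may contain %, unless a new tag starts there.
const general = /%(?:start|end)\{((?:(?!%(?:start|end)\{)[^}])+)/g;
// Cheaper version: % is excluded from the content entirely.
const noPercent = /%(?:start|end)\{([^}%]+)/g;

// Collect group 1 of every match.
const groups = (input, regex) => Array.from(input.matchAll(regex), (m) => m[1]);

// Made-up input whose tag content contains a %.
const withPercent = "%start{50% off}%end";
console.log(groups(withPercent, general));   // [ '50% off' ]
console.log(groups(withPercent, noPercent)); // [ '50' ]  (cut off at the %)

// Unclosed tag followed by another tag: the cheaper version still splits it.
const unclosed = "%end{some text%start{some other text}";
console.log(groups(unclosed, noPercent)); // [ 'some text', 'some other text' ]
```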

If your data can't contain unclosed tags in positions other than the last one, it's even easier:

Use /%(?:start|end){([^}]+)/g and retrieve the first group.

Here is a regex101 test. Note that you will need to add linefeed characters to the negated class if you parse multiple lines at once. Also note how it fails on the last two datasets.
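A sketch of both caveats for this simplest version. The two-line input is made up: the tag on its first line is never closed, so `[^}]` runs on into the next line, while adding `\n` to the negated class keeps the capture on its own line. The second input is the unclosed-tag case from the comments.

```javascript
// Simplest version: assumes an unclosed tag can only be the last one.
const simple = /%(?:start|end)\{([^}]+)/g;
// Same pattern with \n added to the negated class, for multi-line input.
const perLine = /%(?:start|end)\{([^}\n]+)/g;

// Collect group 1 of every match.
const groups = (input, regex) => Array.from(input.matchAll(regex), (m) => m[1]);

// Made-up two-line input: the tag on the first line is never closed.
const twoLines = "%start{unclosed tag\nplain text on the next line";
console.log(groups(twoLines, simple));  // the capture spans both lines
console.log(groups(twoLines, perLine)); // [ 'unclosed tag' ]

// Unclosed tag followed by another tag: [^}] runs on into the next tag.
const unclosed = "%end{some text%start{some other text}";
console.log(groups(unclosed, simple)); // [ 'some text%start{some other text' ]
```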

edited Jun 15, 2016 at 15:21

answered Jun 15, 2016 at 13:23

Aaron



4 Comments

pelican_george Over a year ago

Lookbehind is not available in JavaScript, but retrieving the first group works just fine.

Aaron Over a year ago

@pelican_george Whoops, I always forget which language has which regex features. By the way, be careful with the regex: it will fail if an unclosed tag is followed by another tag. In %end{some text%start{some other text}, the returned result will be some text%start{some other text. If your tag content never contains %, you can easily add it to the negated class. If it can contain %, you could use a negative lookahead, which JavaScript does support.

pelican_george Over a year ago

Thanks for the great feedback. How would you apply the negative lookahead in that specific case?

Aaron Over a year ago

@pelican_george I've heavily edited my answer to present the different solutions that are valid in JavaScript, depending on your dataset. I've also pointed out the cost of the most generic solution.

Casimir et Hippolyte · Accepted Answer · 2016-06-15 13:27:54Z

1

You can use this pattern:

/%start([^%]*(?:%(?!end)[^%]*)*)(?:%end)?/gi

The idea is to describe the content in a greedy way that can't match the closing tag and to make the closing tag optional.

[^%]*          # all that is not a %
(?:
    %(?!end)   # a % not followed by "end"
    [^%]*
)*
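A sketch applying this pattern with `matchAll` to the question's four examples. Two things show up in the output: the captured group includes the braces, and example 1 (%end{some text) is not matched at all, because the pattern requires %start.

```javascript
// Content: runs of non-% characters, or a % that does not start "end".
// The closing %end is optional, so unclosed tags still match.
const TAG = /%start([^%]*(?:%(?!end)[^%]*)*)(?:%end)?/gi;

const examples = [
  "%end{some text",
  "%start{some text",
  "%start{some text}%end",
  "%start{some text}%end%start{more text}%end",
];

for (const s of examples) {
  const groups = Array.from(s.matchAll(TAG), (m) => m[1]);
  console.log(JSON.stringify(s), "->", groups);
}
```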

answered Jun 15, 2016 at 13:27

Casimir et Hippolyte


2 Comments

Justinas Over a year ago

Fails with the scenario %start{some text%start{some text}%end

Casimir et Hippolyte Over a year ago

@Justinas: There is no example of nested tags in the question. But dealing with nested tags is also possible if you use the same pattern for a replacement in a while loop and if you change (?!end) to (?!start|end) to target the innermost occurrences.
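A sketch of only the lookahead change mentioned in this reply (it does not implement the replacement loop for real nesting), applied to the input from the previous comment. With (?!start|end), an unclosed %start ends where the next %start begins instead of swallowing it.

```javascript
// Original pattern: a % may appear in the content unless it starts "end".
const original = /%start([^%]*(?:%(?!end)[^%]*)*)(?:%end)?/gi;
// Variant from the comment: a % may not start "start" either,
// so a new %start cuts the previous (unclosed) tag short.
const variant = /%start([^%]*(?:%(?!start|end)[^%]*)*)(?:%end)?/gi;

const input = "%start{some text%start{some text}%end";

const originalGroups = Array.from(input.matchAll(original), (m) => m[1]);
const variantGroups = Array.from(input.matchAll(variant), (m) => m[1]);

console.log(originalGroups); // [ '{some text%start{some text}' ]
console.log(variantGroups);  // [ '{some text', '{some text}' ]
```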

Justinas · Accepted Answer · 2016-06-15 13:33:58Z

1

I assume that the first tag is invalid, as it does not have %start, and that if you omit %end, the tag ends at the last word.

So the regex would be (for example): %start{([a-z0-9\s]+)}?
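A sketch of this pattern on the question's four examples. The braces are escaped, and the `g` flag is an assumption (the answer gives no flags) so that both tags in example 4 are found. As the comments point out, example 1 is not matched because it has no %start.

```javascript
// Requires %start{, captures letters, digits and whitespace, and makes the
// closing brace optional. The g flag is assumed so every tag is found.
const TAG = /%start\{([a-z0-9\s]+)\}?/g;

const examples = [
  "%end{some text",
  "%start{some text",
  "%start{some text}%end",
  "%start{some text}%end%start{more text}%end",
];

for (const s of examples) {
  const groups = Array.from(s.matchAll(TAG), (m) => m[1]);
  console.log(JSON.stringify(s), "->", groups);
}
```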

answered Jun 15, 2016 at 13:33

Justinas


3 Comments

pelican_george Over a year ago

it wouldn't match the first example.

Justinas Over a year ago

@pelican_george Yes, as I stated: if tags start with %start{, then %end{some text is not a valid tag.

pelican_george Over a year ago

It would work flawlessly; however, I have a very specific requirement where I need to catch those too. Thanks for the help.

Samuel C. · Accepted Answer · 2016-06-15 13:42:59Z

0

You could try to use this one:

/{([a-z0-9 ]*)}/gi

You can see the result here: https://regex101.com/r/uY8jE5/1
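A sketch of this pattern (braces escaped) on the question's four examples plus one made-up string containing a stray brace pair. Because the pattern needs a closing }, it does not catch the incomplete tags in examples 1 and 2, and because it ignores %start/%end entirely, it picks up any {...} pair in the text.

```javascript
// Captures letters, digits and spaces between a literal { and a literal }.
const BRACES = /\{([a-z0-9 ]*)\}/gi;

const inputs = [
  "%end{some text",
  "%start{some text",
  "%start{some text}%end",
  "%start{some text}%end%start{more text}%end",
  "%start{some text}%end and {not a tag}", // made-up: stray brace pair
];

for (const s of inputs) {
  const groups = Array.from(s.matchAll(BRACES), (m) => m[1]);
  console.log(JSON.stringify(s), "->", groups);
}
```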

edited Jun 15, 2016 at 13:42

Aaron


answered Jun 15, 2016 at 13:38

Samuel C.


1 Comment

pelican_george Over a year ago

would fail if any extra curly brackets are present
