
We want to use a regex to validate a document structure. To do this, we simplify both the document and the regex. The regex is generated from the schema that is used for the validation. The application is completely client-side and written in JavaScript.

A simple example is this regex:

regex1 = new RegExp(/~(A{1}B?C?(D*|E*|F*|G*)+){1}~/g)

That means the document can have this structure:

A
-B
-D
-D
-D
-D
-D

So the document structure is converted to the string ~ABDDDDD~

Now I want to check whether I can add "A" to the end, which would result in this string: ~ABDDDDDA~

This no longer matches the regex:

"~ABDDDDDA~".match(regex1)

This works quite well, but the document structure can grow and look like this: ~ABDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD~

A matching value is matched quite fast, but if the value is then: ~ABDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDA~

It takes very long; most of the time I just close the browser and reopen it.
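A minimal sketch to reproduce the slowdown on small inputs. It uses the same pattern as regex1, written as a literal without the g flag. The helper `buildDoc` is hypothetical and only builds test strings; the run count is kept small on purpose, since each extra D appears to roughly double the time of a failing match.

```javascript
// Same pattern as regex1, as a regex literal without the g flag.
const regex1 = /~(A{1}B?C?(D*|E*|F*|G*)+){1}~/;

// Hypothetical helper: "~AB" + n times "D" + optional trailing "A" + "~".
function buildDoc(n, trailingA) {
  return "~AB" + "D".repeat(n) + (trailingA ? "A" : "") + "~";
}

// Time the failing case for a few small values of n.
// Do not raise n much further: the time grows very quickly.
for (let n = 10; n <= 18; n += 2) {
  const doc = buildDoc(n, true);
  const start = Date.now();
  const matched = regex1.test(doc);
  console.log(n, matched, (Date.now() - start) + " ms");
}
```

The absolute timings depend on the browser and machine; the growth from one row to the next is the interesting part.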

Does anyone have ideas how to solve it?

Thanks!

UPDATE

The regex should also cover more cases, since the structure can be quite dynamic. I have not used a regex generator; this example comes from a self-developed script and is just an example.

In this case there is one root element A, followed by an optional B and an optional C, and then any number of D, E, F and G in any order, but at least one of them.

So it should be valid for: "~ABDDDDDFEG~" "~AGGGGGEGGD~" "~ABCDEFG~" "~ABCDDDDDDDDDDDDDDDEFGGGGGG~"

Additionally, it is possible that E is limited to 0-5 occurrences.

As soon as I work with alternation ("match either", A | B), there are real performance issues in all browsers (IE, Chrome, Firefox).

Any ideas? Are there any alternatives to "match either(A | B)" with better performance?

  • What is your question? Your regex seems to do the job... is it the running time? Commented Jan 30, 2018 at 13:48
  • This is a case of catastrophic backtracking. You need to fix your regex generator. Commented Jan 30, 2018 at 13:49
  • If you want help fixing your Regex generator, you should post some info about it. Commented Jan 30, 2018 at 13:53

1 Answer

The resulting regex should be as close as possible to:

~AB?C?[DEFG]*A?~

There are a lot of simplifications to do in your regex generator to get rid of the following points:

  • {1}: is redundant, you can remove it everywhere
  • (A*|B*)+: matches exactly the same strings as [AB]*, which has no nested quantifiers to backtrack through
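The simplifications above can be sketched as follows. This is only an illustration under assumptions taken from the question, not the output of any actual generator: the trailing `A?` is dropped and `+` is used instead of `*`, because the question wants a trailing A rejected and at least one of D, E, F, G required. The names `structure` and `structureMaxFiveE` are made up for this example, and the second pattern assumes the optional limit of five E's.

```javascript
// Assumed validator: one A, optional B, optional C, then one or more of D/E/F/G.
// ^ and $ anchor the whole string; the character class replaces (D*|E*|F*|G*)+.
const structure = /^~AB?C?[DEFG]+~$/;

// Variant with an assumed cap of five E's: the negative lookahead rejects
// any document in which a sixth E can be found.
const structureMaxFiveE = /^(?!(?:[^E]*E){6})~AB?C?[DEFG]+~$/;

// The valid examples from the question.
const valid = [
  "~ABDDDDDFEG~",
  "~AGGGGGEGGD~",
  "~ABCDEFG~",
  "~ABCDDDDDDDDDDDDDDDEFGGGGGG~",
];

// The long failing input from the question: 50 D's followed by an A.
const longInvalid = "~AB" + "D".repeat(50) + "A~";

console.log(valid.every((doc) => structure.test(doc)));          // true
console.log(structure.test(longInvalid));                        // false
console.log(structureMaxFiveE.test("~A" + "E".repeat(5) + "~")); // true
console.log(structureMaxFiveE.test("~A" + "E".repeat(6) + "~")); // false
```

Because each character can only be consumed in one way, the failing case returns after roughly one pass over the string instead of trying every way of splitting the D's into groups.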

Here is a Regex101: https://regex101.com/r/Lc6Fx8/1

Also, if you want help fixing your Regex generator, you should post some info about it.
