
I am building a JSON validator from scratch, but I am stuck on the string part. My hope was to build a regex that matches the following sequence found on JSON.org:

JSON.org String Sequence

My regex so far is:

/^\"((?=\\)\\(\"|\/|\\|b|f|n|r|t|u[0-9a-f]{4}))*\"$/

It matches a backslash followed by one of the escape characters, and it matches an empty string. But I'm not sure how to handle the Unicode part.

Is there a regex to match any Unicode character except " or \ or a control character? And will it match a newline or horizontal tab?

I ask the last question because the regex matches the string "\t", but not "    " (shown here as four spaces, but meant to be a literal tab). Otherwise I will need to extend the regex to cover it, which is not a problem, but my guess is that the horizontal tab is a Unicode character.

Thanks to Jaeger Kor, I now have the following regex:

/^\"((?=\\)\\(\"|\/|\\|b|f|n|r|t|u[0-9a-f]{4})|[^\\"]*)*\"$/

It appears to be correct, but is there a way to check for control characters, or is this unnecessary given that they are listed among the non-printable characters on regular-expressions.info? The input to validate always comes from a textarea.

Update: the regex is as follows, in case anyone needs it:

/^("(((?=\\)\\(["\\\/bfnrt]|u[0-9a-fA-F]{4}))|[^"\\\0-\x1F\x7F]+)*")$/
  • The above regular expression suffers from inefficiency and ambiguity, which can allow a malicious user to mount a Denial of Service ("DoS") attack. Here is a version that is free of the inefficiency: /^("(((?=\\)\\(["\\\/bfnrt]|u[0-9a-fA-F]{4}))|[^"\\\x00-\x1F\x7F])*")$/ Commented Jan 25, 2023 at 13:10
  • @VladimírGorej Further, one of the regex capture groups is unnecessary; /^("((?=\\)\\(["\\\/bfnrt]|u[0-9a-fA-F]{4})|[^"\\\x00-\x1F\x7F])*")$/ is enough. Commented Dec 25, 2024 at 1:59
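
As a rough illustration of how the simplified pattern from the comments behaves, here is a minimal Python sketch. The name is_json_string and the sample inputs are made up for this example, and the outer ^...$ anchors are replaced by re.fullmatch, because Python's $ also matches just before a trailing newline. Note that excluding \x7F makes the pattern slightly stricter than RFC 8259, which only requires U+0000 through U+001F to be escaped.

```python
import re

# Simplified pattern from the comment above, as a Python raw string.
# The ^ and $ anchors are dropped because fullmatch() is used instead.
JSON_STRING = re.compile(
    r'"((?=\\)\\(["\\/bfnrt]|u[0-9a-fA-F]{4})|[^"\\\x00-\x1F\x7F])*"'
)


def is_json_string(text: str) -> bool:
    """Hypothetical helper: True if text is one complete JSON string literal."""
    return JSON_STRING.fullmatch(text) is not None


# Sample inputs mapped to the result this pattern gives for them.
SAMPLES = {
    '""': True,              # empty string
    '"abc"': True,           # plain characters
    r'"\t"': True,           # escape sequence: backslash followed by t
    '"\t"': False,           # raw tab is a control character
    '"line\nbreak"': False,  # raw newline is a control character
    r'"\u00e9"': True,       # \u followed by four hex digits
    '"é"': True,             # literal non-ASCII character
    r'"\u12"': False,        # too few hex digits
    r'"\x41"': False,        # \x is not a JSON escape
    '"a"b"': False,          # unescaped quote inside the string
    '"abc"\n': False,        # trailing newline after the closing quote
    '"\x7f"': False,         # DEL: rejected here, although RFC 8259 allows it
}

for text, expected in SAMPLES.items():
    assert is_json_string(text) is expected, repr(text)
```

This also answers the tab question from the post: the two-character escape \t is accepted, while a raw tab typed into a textarea is rejected because it falls in the \x00-\x1F range.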

2 Answers


For your exact question, create a character class:

# Matches any character that isn't a \ or "
/[^\\"]/

You can then add * to the end to match zero or more of them, or alternatively + to match one or more:

/[^\\"]*/

or

/[^\\"]+/
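
To see what these character classes accept, here is a small Python sketch (the helper name whole_match is hypothetical). It also shows that this class matches a raw tab or newline, so control characters have to be excluded explicitly if the goal is strict JSON.

```python
import re


def whole_match(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


ONE = r'[^\\"]'    # exactly one character that is not \ or "
STAR = r'[^\\"]*'  # zero or more such characters
PLUS = r'[^\\"]+'  # one or more such characters

print(whole_match(ONE, "a"))             # True
print(whole_match(ONE, "\t"))            # True: a raw tab is not excluded
print(whole_match(ONE, "\n"))            # True: neither is a raw newline
print(whole_match(ONE, '"'))             # False
print(whole_match(ONE, "\\"))            # False
print(whole_match(STAR, ""))             # True: * allows an empty match
print(whole_match(PLUS, ""))             # False: + needs at least one character
print(whole_match(PLUS, "hello world"))  # True
```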

There is also the pattern below, found in the library tab at https://regex101.com/ when searching for "json":

/(?(DEFINE)
# Note that everything is atomic, JSON does not need backtracking if it's valid
# and this prevents catastrophic backtracking
(?<json>(?>\s*(?&object)\s*|\s*(?&array)\s*))
(?<object>(?>\{\s*(?>(?&pair)(?>\s*,\s*(?&pair))*)?\s*\}))
(?<pair>(?>(?&STRING)\s*:\s*(?&value)))
(?<array>(?>\[\s*(?>(?&value)(?>\s*,\s*(?&value))*)?\s*\]))
(?<value>(?>true|false|null|(?&STRING)|(?&NUMBER)|(?&object)|(?&array)))
(?<STRING>(?>"(?>\\(?>["\\\/bfnrt]|u[a-fA-F0-9]{4})|[^"\\\0-\x1F\x7F]+)*"))
(?<NUMBER>(?>-?(?>0|[1-9][0-9]*)(?>\.[0-9]+)?(?>[eE][+-]?[0-9]+)?))
)
\A(?&json)\z/x

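This should match any valid JSON, and you can test it at the website above. Note that it relies on PCRE features ((?(DEFINE)...), named subroutine calls such as (?&json), and atomic groups), so it will not compile in engines that lack them, such as JavaScript's.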

EDIT:

Link to the regex


4 Comments

Thanks for your quick response. I added it to my first regular expression and it seems to be working fine. I don't know anything about control characters, but maybe I don't need to worry about them, as the input comes from a textarea where they might not be accepted. The last regex you provided is a complete one, but I want to know where the error is. Then again, I'll check whether it might be more useful!
I have been playing with your latest regex, and when splitting them, they work great! Thanks!
If you could post a code snippet that uses this regex, it would be helpful. I got a lot of syntax errors when I pasted it into my code.

Use this; it also works with JSON arrays such as [{...},{...}]:

((\[[^\}]{3,})?\{s*[^\}\{]{3,}?:.*\}([^\{]+\])?)

Demo: https://regex101.com/r/aHAnJL/1
