5

I need a regex in JavaScript. I have a string:

'*window.some1.some\.2.(a.b + ")" ? cc\.c : d.n [a.b, cc\.c]).some\.3.(this.o.p ? ".mike." [ff\.]).some5'

I want to split this string by periods such that I get an array:

[
    '*window',
    'some1',
    'some\.2',   //ignore the . because it's escaped
    '(a.b + ")" ? cc\.c : d.n [a.b, cc\.c])',  //ignore everything inside ()
    'some\.3',
    '(this.o.p ? ".mike." [ff\.])',
    'some5'
]

What regex will do this?

5
  • What about {foo.bar}, etc... Commented Nov 5, 2011 at 20:07
  • What are you trying to do with this? It sounds like you want something more powerful than a regex... Commented Nov 5, 2011 at 20:12
  • Perhaps stackoverflow.com/questions/812144/…? Commented Nov 5, 2011 at 20:14
  • A split will always return a simple string or something in parenthesis. So I will never end up with {foo.bar} Commented Nov 5, 2011 at 20:31
  • Friend, you are in need of a full-fledged parser... Commented Nov 5, 2011 at 20:45

5 Answers 5

7
var string = '*window.some1.some\\.2.(a.b + ")" ? cc\\.c : d.n [a.b, cc\\.c]).some\\.3.(this.o.p ? ".mike." [ff\\.]).some5';
var pattern = /(?:\((?:(['"])\)\1|[^)]+?)+\)+|\\\.|[^.]+?)+/g;
var result = string.match(pattern);
result = Array.apply(null, result); //match() already returns an Array, or null when nothing matches; this turns null into []

Fiddle: http://jsfiddle.net/66Zfh/3/
Explanation of the RegExp. Match a consecutive set of characters, satisfying:

/             Start of RegExp literal
(?:            Create a group without reference (example: say, group A)
   \(          `(` character
   (?:         Create a group without reference (example: say, group B)
      (['"])     ONE `'` OR `"`, group 1, referable through `\1` (inside RE)
      \)         `)` character
      \1         The character as matched at group 1, either `'` or `"`
     |          OR
      [^)]+?     Any non-`)` character, at least once (see below)
   )+          End of group (B). Let this group occur at least once
   \)+         `)` character, at least once
  |           OR
   \\\.        `\.` (escaped backslash and dot, because they're special chars)
  |           OR
   [^.]+?      Any non-`.` character, at least once (see below)
)+            End of group (A). Let this group occur at least once
/g           "End of RegExp, global flag"
        /*Summary: Match everything which is not satisfying the split-by-dot
                 condition as specified by the OP*/

There's a difference between + and +?. A single plus (greedy) attempts to match as many characters as possible, while +? (lazy) matches only as many characters as are necessary for the RegExp to match. Example: against 123, \d+? matches 1, while \d+ matches 123.
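
The greedy versus lazy difference can be checked directly. This is only a small sketch; the variable names are made up for illustration:

```javascript
var digits = '123';

// Lazy: +? stops as soon as the overall pattern can succeed.
var lazy = digits.match(/\d+?/)[0];

// Greedy: + keeps consuming digits for as long as it can.
var greedy = digits.match(/\d+/)[0];

console.log(lazy);   // 1
console.log(greedy); // 123
```

In the pattern above, the lazy quantifiers sit inside a repeated group, so the group as a whole still consumes a full segment; the laziness only affects how much each single repetition takes.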

The String.match method performs a global match because of the g (global) flag. With the g flag, match returns an array consisting of all matched substrings.

When the g flag is omitted, only the first match will be selected. The array will then consist of the following elements:

Index 0: <Whole match>
Index 1: <Group 1>
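
A short illustration of the two return shapes, using a throwaway sample string and pattern (both are assumptions made up for this sketch, not part of the question):

```javascript
var sample = 'a1.b2';

// With the g flag: every whole match, capture groups are not included.
var allMatches = sample.match(/([a-z])\d/g);

// Without the g flag: the first match only, followed by its groups.
var firstMatch = sample.match(/([a-z])\d/);

console.log(allMatches);       // ['a1', 'b2']
console.log(firstMatch[0]);    // a1  (whole match)
console.log(firstMatch[1]);    // a   (group 1)
console.log(firstMatch.index); // 0   (position of the match)
```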

10 Comments

I am developing a JavaScript binding framework. The split values are property chains. The above example is something I quickly made up. It really means ...
*window (the JavaScript window object) has a property called "some1", which has a property called some\.2; then evaluate the expression a.b ? cc\.c : d.n whenever a.b or cc\.c changes, and so on. I hope this answers the question, and sorry about the multiple posts: hitting Enter posts the comment instead of inserting a new line.
This is not correct. (a.b ? cc\.c : d.n [a.b, cc\.c]) you split this result, while you shouldn't.
@FailedDev Updated, it now correctly deals with quoted parentheses.
Rob W, it seems like this might be the solution. Slight change to the string, adding ')': *window.some1.some\.2.(a.b + ")" + ')' ? cc\.c : d.n [a.b, cc\.c]).some\.3.(this.o.p ? ".mike." [ff\.]).some5
3

The regex below :

result = subject.match(/(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))/g);

Can be used to acquire the desired results. Group 1 has the results since you want to omit the .

Use this :

var myregexp = /(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))/g;
var results = [];
var match = myregexp.exec(subject); // subject is the string to split
while (match != null) {
    // match[0] includes the trailing dot; match[1] is the segment without it
    results.push(match[1]);
    match = myregexp.exec(subject);
}

Explanation :

// (?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))
// 
// Match the regular expression below «(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))»
//    Match the regular expression below and capture its match into backreference number 1 «(\(.*?[^'"]\)|.*?[^\\])»
//       Match either the regular expression below (attempting the next alternative only if this one fails) «\(.*?[^'"]\)»
//          Match the character “(” literally «\(»
//          Match any single character that is not a line break character «.*?»
//             Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
//          Match a single character NOT present in the list “'"” «[^'"]»
//          Match the character “)” literally «\)»
//       Or match regular expression number 2 below (the entire group fails if this one fails to match) «.*?[^\\]»
//          Match any single character that is not a line break character «.*?»
//             Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
//          Match any character that is NOT a “\” character «[^\\]»
//    Match the regular expression below «(?:\.|$)»
//       Match either the regular expression below (attempting the next alternative only if this one fails) «\.»
//          Match the character “.” literally «\.»
//       Or match regular expression number 2 below (the entire group fails if this one fails to match) «$»
//          Assert position at the end of the string (or before the line break at the end of the string, if any) «$»

1 Comment

your solution works too but leaves a period in the end. Thanks
2

It is notoriously difficult to use a regex to match balanced parentheses, especially in JavaScript.

You would be way better off creating your own parser. Here's a clever way to do this that still uses the strengths of regexes:

  • Create a Regex that matches and captures any "pattern of interest" - /(?:(\\.)|([\(\[\{])|([\)\]\}])|(\.))/g
  • Use string.replace(pattern, function (...)), and in the function, keep a count of opening braces and closing braces.
  • Add the matching text to a buffer.
  • If the split character is found and the opening and closing braces are balanced, add the buffer to your results array.

This solution will take a bit of work, and requires knowledge of closures, and you should probably see the documentation of string.replace, but I think it is a great way to solve your problem!
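
The steps above can be sketched roughly as follows. This is not the answerer's linked code, just one possible reading of the approach: the function name splitOnDots is hypothetical, and the quote handling (group 2) is an assumption added on top of the pattern listed above so that the ")" inside quotes in the question's string is ignored. Instead of a text buffer, it slices the input between split positions.

```javascript
// Hypothetical helper: splits on dots that are not escaped,
// not inside brackets, and not inside quotes.
function splitOnDots(input) {
  // Groups: 1 escaped char, 2 quote, 3 opening bracket, 4 closing bracket, 5 dot
  var tokens = /(\\.)|(["'])|([(\[{])|([)\]}])|(\.)/g;
  var parts = [];
  var depth = 0;    // number of currently open brackets
  var quote = null; // quote character we are inside, or null
  var start = 0;    // index where the current part begins

  // replace() is used only to walk the tokens; its return value is discarded.
  input.replace(tokens, function (token, escaped, q, open, close, dot, offset) {
    if (escaped) return token;             // "\." and friends never split
    if (q) {
      if (quote === null) quote = q;       // entering a quoted run
      else if (quote === q) quote = null;  // leaving it
    } else if (quote !== null) {
      // brackets and dots inside quotes are plain text
    } else if (open) {
      depth++;
    } else if (close) {
      if (depth > 0) depth--;
    } else if (dot && depth === 0) {
      parts.push(input.slice(start, offset));
      start = offset + 1;
    }
    return token;
  });

  parts.push(input.slice(start)); // whatever follows the last split
  return parts;
}

var chain = '*window.some1.some\\.2.(a.b + ")" ? cc\\.c : d.n [a.b, cc\\.c]).some\\.3.(this.o.p ? ".mike." [ff\\.]).some5';
console.log(splitOnDots(chain));
```

Note the doubled backslashes in the string literal: a single \. inside a JavaScript string literal would silently drop the backslash. The sketch does not report unbalanced brackets or unterminated quotes; it simply stops splitting until they close.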

Update:
After noticing the number of questions related to this one, I decided to take on the above challenge.
Here is the live code to use a Regex to split a string.
This code has the following features:

  • Uses a Regex pattern to find the splits
  • Only splits if there are balanced parentheses
  • Only splits if there are balanced quotes
  • Allows escaping of parentheses, quotes, and splits using \

This code will work perfectly for your example.

Comments

0

You do not need a regex for this.

var s = '*window.some1.some\.2.(a.b + ")" ? cc\.c : d.n [a.b, cc\.c]).some\.3.(this.o.p ? ".mike." [ff\.]).some5';

console.log(s.match(/(?:\([^\)]+\)|.*?\.)/g));

output:

  ["*window.", "some1.", "some.", "2.", "(a.b + ")", "" ? cc.", "c : d.", "n [a.", "b, cc.", "c]).", "some.", "3.", "(this.o.p ? ".mike." [ff.])", "."]

3 Comments

That doesn't appear to meet the requirements of the question (ignoring \., and ignoring . inside parentheses...)
Yet this isn't what the OP wanted. The OP wanted the text inside the () to remain one unit (even though there are dots inside it), and an escaped dot (\.) should be ignored as well.
This is the first time I have posted a question on Stack Overflow, and I am amazed at the quick responses. Thanks, Stack Overflow, and thanks to all who responded.
0

So, was working with this, and now I see that @FailedDev is rather not a failure, since that was pretty nice. :)

Anyhow, here's my solution. I'll just post the regex only.

((\(.*?((?<!")\)(?!")))|((\\\.)|([^.]))+)

Sadly, this won't work in your case, since I'm using a negative lookbehind, which I don't think is supported by the JavaScript regex engine. It should work as intended in other engines, however, as can be confirmed here: http://gskinner.com/RegExr/. Replace with $1\n.

4 Comments

As you have mentioned, you're using a look-behind, which is not supported in JavaScript. Even if look-behinds were supported, ?! has to be ?= (look-ahead).
No, I wanted negative lookbehind, not lookahead. I wanted to match the ) character that's not preceded by a " character => Negative lookbehind.
I am referring to the ?! (at \)(?!). You want to match a parenthesis which is preceded and postfixed by a double-quote character.
In that case, no I want to end my match at the first ) which is not enclosed in double quotes. So I want to match a parenthesis which is not preceded nor postfixed by double-quote characters. So my reasoning was sound. I do see a bug in it however, but I'm not going to point it out for you since you're poking at me. :)
