8

I have the following string:

a,b,c,d.e(f,g,h,i(j,k)),l,m,n

Could someone tell me how to build a regex that returns only the "first level" of parentheses, something like this:

[0] = a,b,c,
[1] = d.e(f,g,h,i.j(k,l))
[2] = m,n

The goal is to keep each first-level parenthesized section, together with everything nested inside it, under a single index so that I can process it later.

Thank you.

EDIT

Trying to improve the example...

Imagine I have this string

username,TB_PEOPLE.fields(FirstName,LastName,TB_PHONE.fields(num_phone1, num_phone2)),password

My goal is to turn this string into a dynamic query. Fields that do not begin with "TB_" belong to the main table; otherwise, the fields listed within the parentheses belong to a related table. I am having difficulty retrieving all the "first level" fields. Once I can separate them from the related tables, I can recover the remaining fields recursively.

In the end, I would have something like:

[0] = username,password
[1] = TB_PEOPLE.fields(FirstName,LastName,TB_PHONE.fields(num_phone1, num_phone2))

I hope I have explained a little better, sorry.

7
  • I don't understand your example. Commented Oct 25, 2013 at 17:59
  • Shouldn't match [1] be (f,g,h,i.j(k,l))? If not, can you explain a bit more please? Commented Oct 25, 2013 at 18:01
  • From what I know, regex cannot parse nested structures. Commented Oct 25, 2013 at 18:03
  • The example input and output don't make sense: one has (j,k) and the other (k,l). Commented Oct 25, 2013 at 18:04
  • @CasimiretHippolyte: source or prove it... Commented Oct 25, 2013 at 18:13

3 Answers

13

You can use this:

(?>\w+\.)?\w+\((?>\((?<DEPTH>)|\)(?<-DEPTH>)|[^()]+)*\)(?(DEPTH)(?!))|\w+

With your example you obtain:

0 => username
1 => TB_PEOPLE.fields(FirstName,LastName,TB_PHONE.fields(num_phone1, num_phone2))
2 => password

Explanation:

(?>\w+\.)? \w+ \(    # the opening parenthesis (with the function name)
(?>                  # open an atomic group
    \(  (?<DEPTH>)   # when an opening parenthesis is encountered,
                     #  then increment the stack named DEPTH
  |                  # OR
    \) (?<-DEPTH>)   # when a closing parenthesis is encountered,
                     #  then decrement the stack named DEPTH
  |                  # OR
    [^()]+           # content that is not a parenthesis
)*                   # close the atomic group, repeat zero or more times
\)                   # the closing parenthesis
(?(DEPTH)(?!))       # conditional: if the stack named DEPTH is not empty
                     #  then fail (i.e. the parentheses are not balanced)

You can try it with this code:

using System;
using System.Text.RegularExpressions;

string input = "username,TB_PEOPLE.fields(FirstName,LastName,TB_PHONE.fields(num_phone1, num_phone2)),password";
string pattern = @"(?>\w+\.)?\w+\((?>\((?<DEPTH>)|\)(?<-DEPTH>)|[^()]+)*\)(?(DEPTH)(?!))|\w+";

// Regex.Matches returns every non-overlapping match, one per first-level item.
MatchCollection matches = Regex.Matches(input, pattern);
foreach (Match match in matches)
{
    Console.WriteLine(match.Value); // the whole match (same as Groups[0].Value)
}
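The matches come back as three separate items, while the question asks for the main-table fields grouped together and the related-table expressions kept apart. A minimal sketch of that grouping step follows, written in Python for brevity and assuming the first-level pieces are already available as a list (hard-coded here to the three matches shown above). The helper name `group_first_level` is hypothetical; the "TB_" prefix rule comes from the question.

```python
def group_first_level(pieces):
    """Separate main-table fields from related-table expressions.

    Assumption (from the question): anything starting with "TB_" refers
    to a related table; everything else is a field of the main table.
    """
    main_fields = [p for p in pieces if not p.startswith("TB_")]
    related = [p for p in pieces if p.startswith("TB_")]
    return ",".join(main_fields), related


# The three first-level matches from the example above.
pieces = [
    "username",
    "TB_PEOPLE.fields(FirstName,LastName,TB_PHONE.fields(num_phone1, num_phone2))",
    "password",
]
main, related = group_first_level(pieces)
print(main)        # username,password
print(related[0])  # the TB_PEOPLE.fields(...) expression, ready for a recursive pass
```

The same grouping could be done in C# with a loop over the MatchCollection; only the prefix test matters.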

2 Comments

Hi! I'm trying to apply the regex exactly as you wrote it, but the result I get is this: [0] => "" [1] => "," [2] => ",", [3] => "". Could you tell me what I'm forgetting to do? Thank you.
You might be better off using a nested quantified group inside of your atomic one to prevent backtracking and speed up recognition of a failed match. I.e. \( (?> (?: \( (?<DEPTH>) | \) (?<-DEPTH>) | [^()]+ )* ) \). However, if you don't care about failure performance, then it's not necessary. (spaces there for readability only)
0

I suggest a new strategy, R2 - do it algorithmically. While you can build a Regex that will eventually come close to what you're asking, it'll be grossly unmaintainable, and hard to extend when you find new edge cases. I don't speak C#, but this pseudo code should get you on the right track:

function parenthetical_depth(some_string):
    open = count '(' in some_string
    close = count ')' in some_string
    return open - close

function smart_split(some_string):
    bits = split some_string on ','
    new_bits = empty list
    bit = empty string
    while bits has next:
        bit = fetch next from bits
        while parenthetical_depth(bit) != 0 and bits has next:
            bit = bit + ',' + fetch next from bits
        place bit into new_bits
    return new_bits
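For reference, here is one way the pseudocode might look as runnable code. It is a sketch in Python rather than C#, and the `ValueError` on unbalanced input is an added assumption, since the pseudocode does not say what should happen when the pieces run out.

```python
def parenthetical_depth(text):
    """Return the count of '(' minus the count of ')' in text."""
    return text.count("(") - text.count(")")


def smart_split(text):
    """Split on commas, then re-join pieces until the parentheses balance."""
    bits = iter(text.split(","))
    new_bits = []
    for bit in bits:
        # Keep absorbing the following pieces while this one is unbalanced.
        while parenthetical_depth(bit) != 0:
            nxt = next(bits, None)
            if nxt is None:
                # Assumption: treat input that never balances as an error.
                raise ValueError("unbalanced parentheses in input")
            bit = bit + "," + nxt
        new_bits.append(bit)
    return new_bits


query = ("username,TB_PEOPLE.fields(FirstName,LastName,"
         "TB_PHONE.fields(num_phone1, num_phone2)),password")
for piece in smart_split(query):
    print(piece)
```

With the question's input this prints three lines: `username`, the whole `TB_PEOPLE.fields(...)` expression, and `password`.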

This is the easiest version to understand, but the algorithm is currently O(n^2) because the depth of the growing bit is recounted from scratch on every pass of the inner loop. Tracking a running depth instead makes it O(n) (string copying aside, which is the worst part of this):

depth = parenthetical_depth(bit)
while depth != 0 and bits has next:
    nbit = fetch next from bits
    depth = depth + parenthetical_depth(nbit)
    bit = bit + ',' + nbit

The string copying can be made more efficient with clever use of buffers and buffer size, at the cost of space efficiency. In C#, a StringBuilder created with an initial capacity gives that kind of control.
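The running-depth idea can be taken one step further by scanning characters directly instead of splitting first and re-joining. The sketch below is a swapped-in variant, not this answer's exact loop. It is written in Python, keeps one depth counter, and takes one slice per piece, so nothing is concatenated repeatedly. The name `split_top_level` and the error handling are assumptions made for illustration.

```python
def split_top_level(text):
    """Split on commas that sit outside all parentheses, in a single pass."""
    pieces = []  # finished first-level pieces
    depth = 0    # current parenthesis nesting level
    start = 0    # index where the current piece begins
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError(f"unmatched ')' at index {i}")
        elif ch == "," and depth == 0:
            # A comma at depth 0 ends the current piece: take one slice.
            pieces.append(text[start:i])
            start = i + 1
    if depth != 0:
        raise ValueError("unmatched '(' in input")
    pieces.append(text[start:])  # the last piece has no trailing comma
    return pieces


print(split_top_level("a,b,c,d.e(f,g,h,i(j,k)),l,m,n"))
# ['a', 'b', 'c', 'd.e(f,g,h,i(j,k))', 'l', 'm', 'n']
```

Each character is visited once and each piece is copied once, so the work grows linearly with the input length.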

0

If I understood your example correctly, you are looking for something like this:

(?<head>[a-zA-Z._]+\,)*(?<body>[a-zA-Z._]+[(].*[)])(?<tail>.*)

For the given string:

username,TB_PEOPLE.fields(FirstName,LastName,TB_PHONE.fields(num_phone1, num_phone2)),password

This expression will match:

  • username, for group head
  • TB_PEOPLE.fields(FirstName,LastName,TB_PHONE.fields(num_phone1, num_phone2)) for group body
  • ,password for group tail
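One caveat worth checking before relying on this pattern: the `.*` inside the body group is greedy, so it runs to the last closing parenthesis in the string. The sketch below uses Python's `re` module, which needs `(?P<name>...)` in place of .NET's `(?<name>...)`, and a hypothetical input with two separate first-level sections to show the effect.

```python
import re

# Same pattern as above, with .NET's (?<name>...) rewritten as Python's
# (?P<name>...) and the redundant backslash before the comma dropped.
PATTERN = re.compile(
    r"(?P<head>[a-zA-Z._]+,)*(?P<body>[a-zA-Z._]+[(].*[)])(?P<tail>.*)"
)

# Hypothetical input: two independent parenthesized sections at the first level.
text = "a,TB_X.f(x),b,TB_Y.f(y),c"
m = PATTERN.match(text)
print(m.group("body"))  # TB_X.f(x),b,TB_Y.f(y)
print(m.group("tail"))  # ,c
```

Here the body group swallows both sections and the field between them, so the pattern fits inputs like the question's example, where there is a single first-level section, better than the general case.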
