
I'm trying to create a regex that captures a whole array, including any objects nested inside it.

Here is an example input string:

[2020-05-29T10:00:00, 12.5, 'Test text'][][[], ['Some Data']][['String with[ \'escaped quote][ and parenthesis inside it']]

Expected matches are:

Match 1: [2020-05-29T10:00:00, 12.5, 'Test text']
Match 2: []
Match 3: [[], ['Some Data']]
Match 4: [['String with[ \'escaped quote][ and parenthesis inside it']] // If this one is possible too, that would be brilliant

The regex I've created so far is \[[a-zA-Z0-9\-,' :\.\[]*\], but it doesn't handle arrays of arrays or brackets inside strings.
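To see exactly where that pattern breaks down, here is a minimal sketch that runs it over the example input with Python's re module. This is not the asker's code, and it assumes the pattern behaves the same in Python as in .NET, which should hold because it uses only a plain character class.

```python
import re

# The asker's pattern, copied verbatim; its character class contains neither ']' nor '\'.
NAIVE = re.compile(r"\[[a-zA-Z0-9\-,' :\.\[]*\]")

# Example input from the question; the raw string keeps the literal \' intact.
SAMPLE = r"[2020-05-29T10:00:00, 12.5, 'Test text'][][[], ['Some Data']][['String with[ \'escaped quote][ and parenthesis inside it']]"

for found in NAIVE.findall(SAMPLE):
    print(found)
```

Only the first two results are correct. The nested array is cut at its first ], so it comes back as [[] and ['Some Data'] separately, and the last array yields only the fragment [ and parenthesis inside it'.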

I would be really grateful for your help!

  • There can be no [[], []] match here. Commented Sep 22, 2021 at 8:33
  • If you use PCRE, something that could work is \[\s*(?>((?:'[^\\']*(?:\\[\s\S][^\\']*)*'|[^]'\s,])+)(?:\s*,\s*\g<1>)*|(?R))*\s*], but it might not work in all cases. \[\s*(?>(\w+(?:\.\w+)*(?:\[\w+])*|(?:'[^\\']*(?:\\[\s\S][^\\']*)*'|[^]\w])+)(?:\s*,\s*\g<1>)*|(?R))*\s*] might... But this is all too fragile, you need to get the appropriate parser. Commented Sep 22, 2021 at 9:19
  • I have something that will match your 4 matches, but I really need to know the engine before I can post it. It would be helpful if you could add a language tag, as the regex tag asks "this tag should also include a tag specifying the applicable programming language or tool". Commented Sep 22, 2021 at 9:45
  • @Scratte I've added the platform: it's .NET (C#). Commented Sep 22, 2021 at 9:57
  • You cannot parse these with a regex, for the reasons explained in detail (for the equivalent problem of parsing HTML with regex) in this answer: stackoverflow.com/a/1732454 Commented Sep 22, 2021 at 10:19
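Following the suggestion to use a parser instead of a regex, here is a minimal hand-written scanner in Python. It is not taken from any answer here, and the function name top_level_arrays is hypothetical. It assumes that strings are single-quoted, that a backslash escapes the next character inside a string, and that unbalanced brackets or an unterminated string should raise ValueError.

```python
from typing import List


def top_level_arrays(text: str) -> List[str]:
    """Return every top-level [...] span in text, in order.

    Assumptions (not from the question): strings are single-quoted,
    a backslash escapes the next character inside a string, and
    unbalanced brackets or an unterminated string raise ValueError.
    """
    results: List[str] = []
    depth = 0          # number of '[' currently open
    start = -1         # index of the '[' that opened the current top-level array
    in_string = False  # True while inside a '...' literal
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            if ch == "\\":
                i += 1  # skip the escaped character, e.g. the quote in \'
            elif ch == "'":
                in_string = False
        elif ch == "'":
            in_string = True
        elif ch == "[":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "]":
            if depth == 0:
                raise ValueError(f"unmatched ']' at index {i}")
            depth -= 1
            if depth == 0:
                results.append(text[start : i + 1])
        i += 1
    if depth != 0 or in_string:
        raise ValueError("unterminated array or string literal")
    return results


sample = r"[2020-05-29T10:00:00, 12.5, 'Test text'][][[], ['Some Data']][['String with[ \'escaped quote][ and parenthesis inside it']]"
for array in top_level_arrays(sample):
    print(array)
```

This prints the four expected matches, including the last one with the escaped quote. Because brackets inside a string literal are never counted, unpaired [ or ] inside '...' does not affect the result.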

1 Answer


This is similar to the question Regex nested parentheses - you should look at the accepted answer for a great explanation of what's going on.

The regex you want is, I believe:

\[(?>'(?:[^'\\]|\\.)*'|\[(?<DEPTH>)|\](?<-DEPTH>)|'(?:[^'\\]|\\.)*'|[^\[\]]+)*\](?(DEPTH)(?!))

Comments

Why did you decide it is a .NET-related question? Also, it won't work when [ and ] are not paired inside a '...' string literal. Just checking the order and number of opening and closing brackets in the DEPTH group stack is not a solution here.
Fair point about [ and ] inside a string, Wiktor. I tested against the provided example (which does work). As for .NET, it's what I'm familiar with. The question doesn't specify a platform; I should have cast my net wider! I'll update the "answer" to point out the very fair issues you've raised. :-)
@Brett Sorry for leaving the platform out of the tags. It is indeed .NET (C#). This regex is almost perfect and matches the example very well, but it mismatches this: [['String with[ ][ parenthesis inside it']] The first bracket is ignored in this case. Almost there!
I think, then, that we need to consume anything within single quotes (a string) before considering [ and ] for DEPTH. I have tested the following, which appears to work, assuming that a single quote within a string is escaped by doubling it (e.g. 'a string''s length'). Here you go: \[(?>'.*?'|\[(?<DEPTH>)|\](?<-DEPTH>)|'.*?'|[^\[\]]+)*\](?(DEPTH)(?!))
@Brett Brilliant. I've tried to make it work with ' escaped inside strings as \', which is more natural in .NET. Is it possible to include this one last adjustment?
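The two string alternatives seen in this thread can be compared with a small Python sketch. Python's re is used only because both sub-patterns rely on features it shares with .NET, and the variable names are hypothetical. The lazy '.*?' from the comment's regex stops at the first quote it meets, even an escaped \'. By contrast, '(?:[^'\\]|\\.)*', the string alternative already present in the answer's original regex, treats a backslash as escaping the next character.

```python
import re

# String alternative from the comment's regex: ends at the first quote it sees.
LAZY_STRING = re.compile(r"'.*?'")
# String alternative from the answer's original regex: a backslash escapes
# the next character, so \' does not end the literal.
ESCAPED_STRING = re.compile(r"'(?:[^'\\]|\\.)*'")

# Last literal from the question's example, with its backslash-escaped quote.
literal = r"'String with[ \'escaped quote][ and parenthesis inside it'"

lazy = LAZY_STRING.match(literal).group()
escaped = ESCAPED_STRING.match(literal).group()

print(lazy)                # 'String with[ \'
print(escaped == literal)  # True
```

With the lazy form, the rest of the literal, including its ][, would be scanned as ordinary text and those brackets would be counted. The escaped form consumes the whole literal in one step. Whether it behaves identically inside the full .NET pattern has not been verified here.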