I have a requirement to split a string with a regex according to several rules. I got part of the way with the help of previous posts here, but I don't know how to complete it.

Input string (intentionally written ugly) is:

Berlin "New York"Madrid 'Frankfurt Am Main' Quebec Łódź   München Seattle,Milano

Splitting code is:

Dim subStrings() As String = Regex.Split(myText, """([^""]*)""|,| ")

Result of this is:

0)  
1)  
2)Berlin  
3)  
4)New York  
5)Madrid  
6)'Frankfurt  
7)Am  
8)Main'  
9)Quebec  
10)Łódź  
11)  
12)  
13)München  
14)Seattle  
15)Milano  

In short, the string should be split into an array on " " (space) and/or "," and/or on single or double quotes. Quoted terms should be treated as a single word, so a term in single quotes should be handled the same way as a term in double quotes: 'Frankfurt Am Main' (at index 6) should be one word, just as "New York" (at index 4) is. I would also like the regex, if possible, to keep empty matches out of the subStrings() array. The ideal result for the given example would be:

0)Berlin  
1)New York  
2)Madrid  
3)Frankfurt Am Main  
4)Quebec  
5)Łódź  
6)München  
7)Seattle  
8)Milano  

So, could someone please help me solve this particular regex?

1 Answer

You may extract the strings by using Regex.Matches with the following regex:

"([^"]*)"|'([^']*)'|([^,\s]+)

See the regex demo.

Details

  • "([^"]*)" - ", then Group 1 matching any 0+ chars other than ", and then "
  • | - or
  • '([^']*)' - ', then Group 2 matching any 0+ chars other than ', and then '
  • | - or
  • ([^,\s]+) - Group 3: any 1+ chars other than , and whitespace
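To see the three alternatives at work, here is a minimal sketch that prints which group took part in each match. The module name GroupDemo and the shortened sample string are made up for illustration; only the pattern comes from the answer.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GroupDemo
    Sub Main()
        ' Shortened sample input, chosen for illustration only.
        Dim sample As String = "Berlin ""New York"" 'Am Main',Milano"
        Dim pattern As String = """([^""]*)""|'([^']*)'|([^,\s]+)"

        For Each m As Match In Regex.Matches(sample, pattern)
            ' Only one alternative can match at a time, so exactly one group succeeds.
            For g As Integer = 1 To 3
                If m.Groups(g).Success Then
                    Console.WriteLine("Group {0}: {1}", g, m.Groups(g).Value)
                End If
            Next
        Next
        ' Prints:
        ' Group 3: Berlin
        ' Group 1: New York
        ' Group 2: Am Main
        ' Group 3: Milano
    End Sub
End Module
```

Since the groups that did not take part have an empty Value, concatenating all three values returns the text of whichever one matched.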

VB.NET code snippet:

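' Needs Imports System.Linq and Imports System.Text.RegularExpressions at the top of the file.
Dim text = "Berlin ""New York""Madrid 'Frankfurt Am Main' Quebec Łódź   München Seattle,Milano"
Dim pattern As String = """([^""]*)""|'([^']*)'|([^,\s]+)"
' Only one of the three groups takes part in each match, so concatenating
' their values yields the text of whichever group matched.
Dim matches() As String = Regex.Matches(text, pattern) _
          .Cast(Of Match)() _
          .Select(Function(m) m.Groups(1).Value & m.Groups(2).Value & m.Groups(3).Value) _
          .ToArray()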

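Results: the array contains the nine items listed as the ideal result in the question (Berlin, New York, Madrid, Frankfurt Am Main, Quebec, Łódź, München, Seattle, Milano).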

The same can be obtained with the following Regex.Split approach:

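' The quoted alternatives keep their capturing groups so that Regex.Split returns the quoted text.
pattern = """([^""]*)""|'([^']*)'|[,\s]+"
' Named parts rather than matches to avoid redeclaring the array from the previous snippet.
Dim parts() As String = Regex.Split(text, pattern).Where(Function(m) Not String.IsNullOrWhiteSpace(m)).ToArray()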

See the regex demo.
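If LINQ is to be avoided, the empty entries can also be dropped with a plain loop. The following is only a sketch under that assumption: SplitTerms and SplitDemo are hypothetical names, and a temporary List(Of String) collects the kept items because the final array size is not known in advance.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module SplitDemo
    ' Hypothetical helper: splits on quoted terms, commas and whitespace,
    ' and skips the empty entries that Regex.Split produces.
    Function SplitTerms(input As String) As String()
        Dim pattern As String = """([^""]*)""|'([^']*)'|[,\s]+"
        Dim kept As New List(Of String)()
        For Each part As String In Regex.Split(input, pattern)
            If Not String.IsNullOrWhiteSpace(part) Then kept.Add(part)
        Next
        Return kept.ToArray()
    End Function

    Sub Main()
        Dim text = "Berlin ""New York""Madrid 'Frankfurt Am Main' Quebec Łódź   München Seattle,Milano"
        Dim subStrings() As String = SplitTerms(text)
        For i As Integer = 0 To subStrings.Length - 1
            Console.WriteLine("{0}){1}", i, subStrings(i))
        Next
    End Sub
End Module
```

For the sample input, subStrings() should hold the nine expected items. The empty entries that get filtered out come from places where two matches are adjacent, for example the space directly before "New York": Regex.Split emits the zero-length text between them.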

5 Comments

Hello Wiktor, thank you for the fast answer. However, I expected the result in the subStrings() array as shown in the question, without involving any additional collection (List) or LINQ.
@WineToo I added a Regex.Split solution that outputs results to an array, not a list, but the point is that you will still have to get rid of empty items in the resulting array. Whether you do it with Linq or without, it is your choice.
Yes. Actually, that is the only remaining problem. Everything else works well (as expected). (Maybe a silly question, but) is there any way to filter/skip those empty strings with the same regex?
@WineToo You can hardly solve that with a sane regex. I have never seen such a regex, and the reason is that you need to get the strings between identical start/end delimiters ("...", and '...'), which makes any approach with lookarounds wrong, you just have to use capturing groups. Regex.Split always returns empty strings between a match and a non-match when the pattern contains a capturing group. Thus, no way to do that with just regex.
OK, I implemented your solution with Split and it works very well. I can live with filtering the empty strings if there's no better way. Thank you once more for the solution and the additional explanations, which I'm trying to understand now.
