
I'd like to get a String array from a String that is delimited with spaces (" "), where a phrase wrapped in quotation marks should stay together as one item. Is there a clever way to do this?

For example, if the string was:

cat dog giraffe "big elephant" snake

I'd like the resulting array to contain strings

cat

dog

giraffe

big elephant

snake

I know I could do a Split(str, " "), but that would also break "big elephant" into two items, which is not what I want. I've never used RegEx, but I have a hunch that the solution might have something to do with it.

  • Why not do a .Replace() and then a .Split()? Regex is overkill here. Commented Dec 2, 2016 at 12:00
  • Replacing what, @DanielShillcock? Commented Dec 2, 2016 at 12:05

2 Answers


Treating the input as space-delimited CSV can greatly simplify the task:

Imports Microsoft.VisualBasic.FileIO
...
Dim s As String = "cat dog giraffe ""big elephant"" snake"
' Read the string as one space-delimited record whose fields may be quoted.
Using parser As New TextFieldParser(New System.IO.StringReader(s))
    parser.TextFieldType = FieldType.Delimited
    parser.Delimiters = New String() {" "}
    parser.HasFieldsEnclosedInQuotes = True
    Do While Not parser.EndOfData
        Try
            Dim currentRecord As String() = parser.ReadFields()
            Console.WriteLine(String.Join("; ", currentRecord))
        Catch ex As MalformedLineException
            ' Raised when a line cannot be parsed, e.g. because of an unbalanced quote.
            Console.WriteLine("Malformed line: " & ex.Message)
        End Try
    Loop
End Using

It prints cat; dog; giraffe; big elephant; snake.

The code is adapted from Parse Delimited CSV in .NET.
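Since the question asks for a String array rather than console output, the same parser setup can be wrapped in a function. This is only a minimal sketch: the module name QuotedSplitSketch and the helper SplitQuoted are hypothetical names chosen for illustration, and it assumes the input is a single line.

```vbnet
Imports System
Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module QuotedSplitSketch
    ' Hypothetical helper: returns the fields of a single-line,
    ' space-delimited string, keeping quoted phrases together.
    Function SplitQuoted(input As String) As String()
        Using parser As New TextFieldParser(New StringReader(input))
            parser.TextFieldType = FieldType.Delimited
            parser.Delimiters = New String() {" "}
            parser.HasFieldsEnclosedInQuotes = True
            ' Empty input has no record to read.
            If parser.EndOfData Then Return New String() {}
            ' May throw MalformedLineException, e.g. on an unbalanced quote.
            Return parser.ReadFields()
        End Using
    End Function

    Sub Main()
        Dim parts = SplitQuoted("cat dog giraffe ""big elephant"" snake")
        Console.WriteLine(String.Join("; ", parts))
    End Sub
End Module
```

Because every single space is treated as a delimiter, consecutive spaces in the input would likely produce empty fields, so the caller may want to filter those out.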


1 Comment

Thanks. That'll do.

You can use a regex for this:

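Imports System.Text.RegularExpressions
...
Const data = "åäöÄ åäöÄ ""åäöÄ åäöÄ"" åäöÄ"

' Match either a bare word or a quoted run of space-separated words.
Dim matches = Regex.Matches(data, "\p{L}+|""\p{L}+(?: \p{L}+)*""")

For Each m As Match In matches
    ' Strip the surrounding quotes, if any.
    Console.WriteLine(m.Value.Trim(""""c))
Next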

The regex works as follows:

  • match either \p{L}+, which means one or more letters, as many as possible
  • or (denoted by the |) match "\p{L}+(?: \p{L}+)*", in detail:
    • " matches the opening quote
    • \p{L}+ matches one or more letters, as many as possible
    • (?: \p{L}+)* is a non-capturing group repeated zero or more times, as many times as possible
      This group consists of a space followed by one or more letters
    • finally, " matches the closing quote

Then we just have to Trim the resulting match to remove any leading or trailing quote.

Note: see here for more info about \p{L}
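Note that \p{L} matches letters only, so digits and punctuation inside a word are not matched by the pattern above. If the tokens may contain such characters, one possible variant is to match either a quoted run without inner quotes or any run of non-space characters. This is a sketch under that assumption, not the pattern from this answer; the module name RegexSplitSketch and the helper SplitQuoted are hypothetical.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module RegexSplitSketch
    ' Hypothetical helper: a token is either "..." (no inner quotes)
    ' or a run of non-whitespace characters.
    Function SplitQuoted(input As String) As String()
        Return Regex.Matches(input, """[^""]*""|\S+").
            Cast(Of Match)().
            Select(Function(m) m.Value.Trim(""""c)).
            ToArray()
    End Function

    Sub Main()
        ' Prints cat, dog, big.0-eléphant and snake on separate lines.
        For Each part In SplitQuoted("cat dog ""big.0-eléphant"" snake")
            Console.WriteLine(part)
        Next
    End Sub
End Module
```

This variant does not handle a quote embedded in the middle of a word (for example a"b c"), which would be split at the space.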

4 Comments

What about non-english text? Does it melt down when åäö is inputted?
It was melting down. That wasn't stated as a requirement, but I've edited the code to support it.
Depends where in alphabet öä and z are located. In our language, z is next to s (...pqrszšžt...) and even t, u etc are ignored :)
@Sehnsucht Interesting, thanks for pointing out. Do unicode letters include numerics and punctuation marks? (eg for "big.0-eléphant" case)
