I have a file that stores a list of strings, in a format with the following rules:
- Cannot begin or end with an unescaped comma.
- A comma is escaped by a preceding comma.
- Strings are separated by unescaped commas.
- Every other character is taken at face value.
I've been fiddling around with some VB.NET code to parse a file like this and split it into either a String() or a List(Of String), but it's gotten to be a little annoying. It's not that I can't figure this out; it's that I don't want to write crap code. If it's unnecessarily confusing, unnecessarily slow, or anything else like that, it's not good enough.
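For context, the programmatic version I mean looks roughly like the sketch below. The names (`CommaListParser`, `ParseList`) are made up for illustration, and it assumes doubled commas are paired off from left to right, which the rules above don't actually spell out.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text

Module CommaListParser
    ' Splits text on single commas; a doubled comma (",,") is read as one literal comma.
    ' Assumption: commas are paired off left to right as they are encountered.
    Function ParseList(text As String) As List(Of String)
        Dim result As New List(Of String)()
        Dim current As New StringBuilder()
        Dim i As Integer = 0

        While i < text.Length
            If text(i) <> ","c Then
                ' Ordinary character: taken at face value.
                current.Append(text(i))
                i += 1
            ElseIf i + 1 < text.Length AndAlso text(i + 1) = ","c Then
                ' Two commas in a row: one escaped (literal) comma.
                current.Append(","c)
                i += 2
            ElseIf i = 0 OrElse i = text.Length - 1 Then
                ' A lone comma at the very start or end breaks the first rule.
                Throw New FormatException("Unescaped comma at start or end of input.")
            Else
                ' A lone comma in the middle: separator between strings.
                result.Add(current.ToString())
                current.Clear()
                i += 1
            End If
        End While

        ' Whatever is left is the last string (empty input yields one empty string).
        result.Add(current.ToString())
        Return result
    End Function
End Module
```

With that, `ParseList("a,,b,c")` gives the two strings `a,b` and `c`. It works, but it's more bookkeeping than I'd like for something this simple.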
Now, I know this almost starts to sound like a Code Review question, but I'm really starting to think that a good regex might work better than doing this programmatically. Unfortunately, regexes are not easy to work with. Writing one that splits on a comma may be trivial, but getting it to also skip doubled commas is more of an issue, at least for somebody who's not used to regexes.
How do you do this (properly) in VB.NET? In particular, I'm having a bit of trouble putting together a wildcard that matches anything at all except a comma. I also haven't worked out whether the first rule (no unescaped comma at the beginning or end) has to be verified programmatically, or whether the regex itself can check it as part of the split operation.
EDIT
I just "woke up" and realized that this syntax is ambiguous: in a run of three or more commas with an odd count, you can't tell which commas are escaped and which one is the separator. I'm just going to accept the current answer and move on.
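To make the ambiguity concrete, here's a small sketch. `Encode` is a hypothetical helper I made up for illustration; it just writes a list out according to the rules above (double every comma inside a string, then join the strings with single commas).

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module AmbiguityDemo
    ' Hypothetical encoder: escape each comma by doubling it, then join with single commas.
    Function Encode(items As IEnumerable(Of String)) As String
        Return String.Join(",", items.Select(Function(s) s.Replace(",", ",,")))
    End Function

    Sub Main()
        ' Two different lists...
        Console.WriteLine(Encode({"a,", "b"}))   ' prints a,,,b
        Console.WriteLine(Encode({"a", ",b"}))   ' prints a,,,b
        ' ...produce the same text, so no parser can tell them apart.
    End Sub
End Module
```

Both lists are legal under the rules (neither string begins or ends with an unescaped comma), yet they serialize identically, so the format can't be decoded reliably without some extra convention.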