
I would like to split up a string using a space as my delimiter, but if there are multiple words enclosed in double or single quotes, then I would like them to be returned as one item.

For example, if the input string is:

CALL "C:\My File Name With Space" /P1 P1Value /P1 P2Value

The output array would be:

Array[0]=CALL
Array[1]=C:\My File Name With Space
Array[2]=/P1
Array[3]=P1Value
Array[4]=/P1
Array[5]=P2Value

How do you use regular expressions to do this? I realize that there are command line parsers. I took a cursory look at a popular one, but it did not handle the situation where you can have multiple parameters with the same name. In any event, rather than learning how to use a command line parsing library (I'll leave that for another day), I'm interested in getting more exposure to RegEx functions.

How would you use a RegEx function to parse this?

  • Is it not the case that you are given command line arguments as an array of strings in Main()? Commented Jun 11, 2013 at 18:55
  • No, I am parsing batch files in a folder. Commented Jun 11, 2013 at 18:57
  • I wouldn't use a regular expression to handle this. There are just too many special cases in command lines. You'd be better off using one of the recommendations from stackoverflow.com/questions/491595/…, or just writing your own (which would take a couple of hours, perhaps). Commented Jun 11, 2013 at 19:06
  • Actually, I think it was NDesk that didn't support multiple params with the same name (I could be wrong). I have a feeling RegEx can handle the 2 requirement criteria specified. That's all I'm looking for. Commented Jun 11, 2013 at 19:09
  • The problem is harder than it sounds. Parsing a Windows command line that includes quotes is pretty weird. See blogs.msdn.com/b/oldnewthing/archive/2010/09/17/10063629.aspx for some examples. Commented Jun 11, 2013 at 20:15

3 Answers


The link in Jim Mischel's comment points out that the Win32 API provides a function for this. I'd recommend using that for consistency. Here's a sample (from PInvoke).

// Requires: using System; using System.ComponentModel; using System.Runtime.InteropServices;
static string[] SplitArgs(string unsplitArgumentLine)
{
    int numberOfArgs;
    IntPtr ptrToSplitArgs;
    string[] splitArgs;

    ptrToSplitArgs = CommandLineToArgvW(unsplitArgumentLine, out numberOfArgs);
    if (ptrToSplitArgs == IntPtr.Zero)
        throw new ArgumentException("Unable to split argument.",
          new Win32Exception());
    try
    {
        splitArgs = new string[numberOfArgs];
        for (int i = 0; i < numberOfArgs; i++)
            splitArgs[i] = Marshal.PtrToStringUni(
                Marshal.ReadIntPtr(ptrToSplitArgs, i * IntPtr.Size));
        return splitArgs;
    }
    finally
    {
        LocalFree(ptrToSplitArgs);
    }
}

[DllImport("shell32.dll", SetLastError = true)]
static extern IntPtr CommandLineToArgvW(
    [MarshalAs(UnmanagedType.LPWStr)] string lpCmdLine,
    out int pNumArgs);

[DllImport("kernel32.dll")]
static extern IntPtr LocalFree(IntPtr hMem);

If you want a quick-and-dirty, inflexible, fragile regex solution you can do something like this:

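// Requires: using System.Linq; using System.Text.RegularExpressions;
// A token is one or more adjacent pieces: a double-quoted run, or a run of non-space, non-quote characters.
var rex = new Regex(@"("".*?""|[^ ""]+)+");
string test = "CALL \"C:\\My File Name With Space\" /P1 P1Value /P1 P2Value";
// Each match value still includes its surrounding quotes.
string[] array = rex.Matches(test).OfType<Match>().Select(m => m.Value).ToArray();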
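If the surrounding quotes should be dropped, as in the expected output in the question, one option is to capture the quoted body in a named group. This is only a minimal sketch: it assumes double quotes only and no escaped quotes, and the names `QuoteAwareSplitter` and `Split` are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class QuoteAwareSplitter
{
    // Either a double-quoted run (body captured in "q", quotes excluded)
    // or a run of characters that are neither whitespace nor quotes ("w").
    private static readonly Regex Token =
        new Regex(@"""(?<q>[^""]*)""|(?<w>[^\s""]+)");

    public static string[] Split(string line)
    {
        var result = new List<string>();
        foreach (Match m in Token.Matches(line))
        {
            // Exactly one of the two groups succeeds for each match.
            result.Add(m.Groups["q"].Success ? m.Groups["q"].Value
                                             : m.Groups["w"].Value);
        }
        return result.ToArray();
    }
}
```

Unlike the one-liner above, this version splits a"b c"d into separate pieces rather than treating it as one argument, so it is a trade-off rather than a strict improvement.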

4 Comments

Worked like a charm. I'm surprised to see code going outside of the framework. I feel a little dirty, not sure why, probably because I don't understand it.
sqlcmd.exe (msdn.microsoft.com/en-us/library/ms162773.aspx) and probably other exes allow for params switches in the form of a dash followed by a single letter to have an OPTIONAL space before writing the param value. For example "sqlcmd.exe -sMyServer" and "sqlcmd.exe -s MyServer" indicate the same passed value. However, this function passes 2 arguments for the first and 3 for the second.
@ChadD - CommandLineToArgvW is what the shell uses to figure out how to pass arguments. sqlcmd.exe then contains logic that interprets them. -s MyServer is passed as two args, but sqlcmd.exe recognizes them as one option together.
The CommandLineToArgvW solution doesn't work as it doesn't respect special cases like \\ and \"
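To illustrate the earlier point about -sMyServer versus -s MyServer: the splitter only produces tokens, and the program receiving them decides how a switch and its value belong together. The sketch below shows one way a program could accept both forms. It is not how sqlcmd.exe is actually implemented, and `OptionReader` and `ReadOption` are hypothetical names.

```csharp
using System;

static class OptionReader
{
    // Returns the value of a single-letter switch such as "-s", whether it is
    // attached ("-sMyServer") or passed as the next argument ("-s MyServer").
    // Returns null when the switch is absent or has no value.
    public static string ReadOption(string[] args, string flag)
    {
        for (int i = 0; i < args.Length; i++)
        {
            if (!args[i].StartsWith(flag, StringComparison.Ordinal))
                continue;

            // Attached form: everything after the flag is the value.
            if (args[i].Length > flag.Length)
                return args[i].Substring(flag.Length);

            // Separate form: the value is the following argument, if any.
            return i + 1 < args.Length ? args[i + 1] : null;
        }
        return null;
    }
}
```

Either way the tokenizer's output is correct; the one-or-two-argument difference is resolved at this later stage.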

I wouldn't do it with Regex, for various reasons shown above.

If I did need to, this would match your simple requirements:

(".*?")|([^ ]+)

However, this doesn't include:

  • Escaped quotes
  • Single quotes
  • Non-ASCII quotes (you don't think people will paste smart quotes from Word into your file?)
  • combinations of the above

And that's just off the top of my head.
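For comparison, here is a rough sketch of the non-regex route: a small hand-written scanner that covers the first two items in the list. It assumes a backslash directly before the active quote character produces a literal quote, which is an invented convention for this example and not the actual Windows rule. `HandRolledSplitter` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class HandRolledSplitter
{
    public static List<string> Split(string line)
    {
        var args = new List<string>();
        var current = new StringBuilder();
        char quote = '\0';     // '\0' means "not inside quotes"
        bool inToken = false;  // true once the current argument has started

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (quote != '\0')
            {
                // Assumed escape rule: backslash + active quote char = literal quote.
                if (c == '\\' && i + 1 < line.Length && line[i + 1] == quote)
                {
                    current.Append(quote);
                    i++;
                }
                else if (c == quote) quote = '\0';  // closing quote
                else current.Append(c);
            }
            else if (c == '"' || c == '\'')
            {
                quote = c;
                inToken = true;  // so that "" still yields an empty argument
            }
            else if (char.IsWhiteSpace(c))
            {
                if (inToken)
                {
                    args.Add(current.ToString());
                    current.Clear();
                    inToken = false;
                }
            }
            else
            {
                current.Append(c);
                inToken = true;
            }
        }
        if (inToken) args.Add(current.ToString());
        return args;
    }
}
```

A quoted path that ends in a backslash, such as "C:\dir\", would trip the assumed escape rule, which is one of the special cases that makes real command lines harder than they look.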

@chad Henderson you forgot to include the single quotes, and this also has the problem of capturing anything that comes before a set of quotes.

Here is the correction including the single quotes, but it also shows the problem with the extra capture before a quote. http://regexhero.net/tester/?id=81cebbb2-5548-4973-be19-b508f14c3348

3 Comments

Windows actually doesn't treat single quotes the same way it does double quotes. And you're not making sure the types of quotes match in your regex :). Just for fun, I updated mine to support args of the form a"b c"d
I'm curious about what the way windows treats single quotes has to do with this?
Windows treats 'a b' as two separate arguments, 'a and b'
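On the point about making sure the quote types match: if single quotes should group words, as the question asks (and unlike Windows itself, per the comments above), a backreference can require the closing quote to be the same character as the opening one. This is a sketch only, with no escape handling, and `MatchedQuoteSplitter` is a made-up name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class MatchedQuoteSplitter
{
    // (?<quote>["']) remembers which quote opened the run, and \k<quote>
    // requires the same character to close it. Otherwise, match a bare word.
    private static readonly Regex Token = new Regex(
        @"(?<quote>[""'])(?<body>.*?)\k<quote>|(?<word>[^\s""']+)");

    public static string[] Split(string line)
    {
        var result = new List<string>();
        foreach (Match m in Token.Matches(line))
        {
            result.Add(m.Groups["quote"].Success ? m.Groups["body"].Value
                                                 : m.Groups["word"].Value);
        }
        return result.ToArray();
    }
}
```

Because the closing quote must match the opening one, an apostrophe inside a double-quoted run, as in "it's here", stays part of that argument.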
