10

I need to parse command strings into the separate command and arguments that cross-spawn expects.

From the following strings:

cmd foo bar
cmd "foo bar" --baz boom
cmd "baz \"boo\" bam"
cmd "foo 'bar bud' jim" jam
FOO=bar cmd baz

To an object:

{command: 'cmd', args: ['foo', 'bar']}
{command: 'cmd', args: ['foo bar', '--baz', 'boom']}
{command: 'cmd', args: ['baz "boo" bam']}
{command: 'cmd', args: ['foo \'bar bud\' jim', 'jam']}
{command: 'cmd', args: ['baz'], env: {FOO: 'bar'}}

I'm thinking a regex would be possible, but I'd love to avoid writing something custom. Anyone know of anything existing that could do this?

Edit

The question and answers are still valuable, but for my specific use-case I no longer need to do this. I'll use spawn-command instead (more accurately, I'll use spawn-command-with-kill) which doesn't require the command and args to be separate. This will make life much easier for me. Thanks!

3 Answers

4

You could roll your own with regex, but I'd strongly recommend looking at either:

  • minimist by Substack, or
  • yargs which is a more comprehensive implementation of argument parsing for node

Both are battle-hardened and well supported; minimist gets about 30 million downloads a month while yargs gets nearly half that.

It's very likely you can find a way to use one or the other to get the CLI syntax you want, with the exception of env support, which IMO should be handled separately (I can't imagine why you'd want to be opinionated about environment variables being set as part of the command).


3 Comments

yargs won't handle the config file format (FOO=bar cmd baz), will it?
FOO=bar won't show up in process.argv at all in node, so both yargs and minimist would ignore it. I strongly suggest using process.env instead of trying to parse env vars from the command. There are many ways env vars can be set, so unless there's a very specific reason to do so, limiting support to variables set at the start of the command would be unexpected.
So, to be more clear, I'm working on p-s which allows you to specify command strings and I invoke that as a child process. With cross-spawn, you need to provide the command and args separately. However, if I use spawn-command then I don't have to do this (I can just send the whole command) which removes my need to do this in the first place!
2

You could use raw regular expressions, but what you're building is called a tokenizer. The reason you'd want a tokenizer is to handle context, such as quoted strings that contain spaces, which you don't want to split on.

There are existing generic libraries designed specifically for parsing and tokenization that can handle cases like strings, blocks, etc.

https://www.npmjs.com/package/js-parse

Additionally, most of these command line formats and config file formats already have parsers/tokenizers. You might want to leverage those and then normalize the results from each into your object structure.
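
To make the tokenizer idea concrete, here is a minimal hand-rolled sketch. It is not taken from js-parse or any other library; `tokenize` is a hypothetical name, and the quoting rules are an assumption based on the examples in the question: double quotes allow backslash escapes, single quotes are literal, and a backslash outside quotes escapes the next character. It does not attempt the full shell grammar (no variable expansion, globbing, or operators).

```javascript
// Hypothetical helper: split a command string into tokens, shell-style.
// Assumed rules: "..." allows \x escapes, '...' is literal, and a
// backslash outside quotes escapes the next character.
function tokenize(input) {
  const tokens = [];
  let current = '';
  let inToken = false; // true once a token has started (so "" yields an empty token)
  let quote = null;    // null when unquoted, otherwise '"' or "'"

  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (quote === "'") {
      // Inside single quotes everything is literal until the closing quote.
      if (ch === "'") quote = null;
      else current += ch;
    } else if (quote === '"') {
      // Inside double quotes a backslash escapes the following character.
      if (ch === '\\' && i + 1 < input.length) current += input[++i];
      else if (ch === '"') quote = null;
      else current += ch;
    } else if (ch === '"' || ch === "'") {
      quote = ch;
      inToken = true;
    } else if (ch === '\\' && i + 1 < input.length) {
      current += input[++i];
      inToken = true;
    } else if (/\s/.test(ch)) {
      // Unquoted whitespace ends the current token, if there is one.
      if (inToken) {
        tokens.push(current);
        current = '';
        inToken = false;
      }
    } else {
      current += ch;
      inToken = true;
    }
  }

  if (quote !== null) throw new Error('Unterminated ' + quote + ' quote');
  if (inToken) tokens.push(current);
  return tokens;
}
```

Tracking the quote state explicitly is what lets the loop treat a space inside quotes as part of the token instead of as a separator, which is the context a plain split on whitespace loses.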

2

A regular expression could match your command line...

^\s*(?:((?:(?:"(?:\\.|[^"])*")|(?:'[^']*')|(?:\\.)|\S)+)\s*)*$

... but you wouldn't be able to extract the individual words from it. Instead, you need to match one word at a time and accumulate the words into an argument list.

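function parse_cmdline(cmdline) {
    // Matches the next argument (group 1) and the rest of the line (group 2).
    // An argument is a run of double-quoted strings, single-quoted strings,
    // backslash escapes, and other non-whitespace characters.
    var re_next_arg = /^\s*((?:(?:"(?:\\.|[^"])*")|(?:'[^']*')|\\.|\S)+)\s*(.*)$/;
    var next_arg = ['', '', cmdline];
    var args = [];
    var quoted_part;
    while (next_arg = re_next_arg.exec(next_arg[2])) {
        var quoted_arg = next_arg[1];
        var unquoted_arg = "";
        // Strip the quoting from the argument one piece at a time.
        // Note: this assumes balanced quotes; an unterminated quote makes
        // exec() return null below and throws a TypeError.
        while (quoted_arg.length > 0) {
            if (/^"/.test(quoted_arg)) {
                // Double-quoted piece: drop the quotes and unescape \x to x.
                quoted_part = /^"((?:\\.|[^"])*)"(.*)$/.exec(quoted_arg);
                unquoted_arg += quoted_part[1].replace(/\\(.)/g, "$1");
                quoted_arg = quoted_part[2];
            } else if (/^'/.test(quoted_arg)) {
                // Single-quoted piece: contents are taken literally.
                quoted_part = /^'([^']*)'(.*)$/.exec(quoted_arg);
                unquoted_arg += quoted_part[1];
                quoted_arg = quoted_part[2];
            } else if (/^\\/.test(quoted_arg)) {
                // Backslash escape outside quotes: keep the escaped character.
                unquoted_arg += quoted_arg[1];
                quoted_arg = quoted_arg.substring(2);
            } else {
                // Ordinary character: copy it through.
                unquoted_arg += quoted_arg[0];
                quoted_arg = quoted_arg.substring(1);
            }
        }
        args.push(unquoted_arg);
    }
    return args;
}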
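
The function above returns a flat array of words, while the question asks for an object with `command`, `args`, and optionally `env`. A sketch of that last step follows, under stated assumptions: `toSpawnSpec` is a hypothetical helper name, and it treats leading words of the form `NAME=value` as environment assignments, as in the `FOO=bar cmd baz` example. It takes any array of already-unquoted words as input.

```javascript
// Hypothetical helper: turn a flat list of words into the
// {command, args, env} shape from the question.
// Assumption: leading NAME=value words are environment assignments.
function toSpawnSpec(words) {
  const env = {};
  let i = 0;
  for (; i < words.length; i++) {
    const match = /^([A-Za-z_][A-Za-z0-9_]*)=([\s\S]*)$/.exec(words[i]);
    if (!match) break; // first word that is not an assignment is the command
    env[match[1]] = match[2];
  }
  if (i >= words.length) throw new Error('No command found');

  const spec = { command: words[i], args: words.slice(i + 1) };
  // Only add env when at least one assignment was present.
  if (Object.keys(env).length > 0) spec.env = env;
  return spec;
}
```

For example, `toSpawnSpec(parse_cmdline('FOO=bar cmd baz'))` should give the last object in the question. One limitation of working on unquoted words: a quoted first word such as `"FOO=bar"` can no longer be told apart from a real assignment. When passing the result to a spawn function, the parsed `env` would typically be merged over `process.env` rather than used on its own, since the child process otherwise loses variables such as `PATH`.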

1 Comment

😮 wow, thanks! Honestly, I'm thinking that now I don't need this, but I'm amazed at your regex skills!
