0

I need to split a command string into segments using a regex. I am looking for a very basic parser so that I can build some custom functions. For example, I have this command:

rm --remove all --keep some --but-not *.php --or-like "?-imp-*.*"

Now I want to split this string into multiple segments each containing the argument name and value, e.g.

rm
--remove all
--keep some
--but-not *.php
--or-like "?-imp-*.*"

That way I can further split each segment on whitespace and have the argument name and value separated.

I am not good at regex. So far I've written this regex to extract only the argument and value part, but it does not match the words at the end of the string or those with special characters like * and ?.

Regex

(?<=\s)--([^--]*)(?=(\s--))

and then I grab the name of the command with

(^\w+)

Any thoughts on this?

8
  • Does it all really have to be done with a single regex? You could just split on spaces, loop through the result and combine anything that starts with -- with the next element (if the next element doesn't start with --). Commented Jan 26, 2013 at 11:24
  • What language are you using? Commented Jan 26, 2013 at 11:34
  • I don't think this is a job for (a single) regex. There should be existing library for this. Commented Jan 26, 2013 at 11:37
  • @beny23 Does the language matter for regex? I want to use it in JavaScript and in a shell script as well. Commented Jan 26, 2013 at 11:53
  • @Juhana Splitting into an array and joining the elements does not seem reliable to me. It might fail in some cases. Commented Jan 26, 2013 at 11:54
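
The split-and-combine idea from the first comment can be sketched roughly as below. The function name `groupArgs` is hypothetical, and the sketch assumes values never contain whitespace: a quoted value such as "a b" would be broken apart, which is one of the cases where this approach fails.

```javascript
// Hypothetical helper illustrating the split-and-combine idea.
// Assumption: no value contains whitespace (quoted values with spaces will break).
function groupArgs(command) {
    const tokens = command.trim().split(/\s+/);
    const segments = [];
    for (let i = 0; i < tokens.length; i++) {
        const token = tokens[i];
        const next = tokens[i + 1];
        // A switch followed by something that is not itself a switch takes it as its value.
        if (token.startsWith('--') && next !== undefined && !next.startsWith('--')) {
            segments.push(token + ' ' + next);
            i++; // skip the value that was just consumed
        } else {
            segments.push(token);
        }
    }
    return segments;
}

const grouped = groupArgs('rm --remove all --keep some --but-not *.php --or-like "?-imp-*.*"');
console.log(grouped);
```

A switch without a value (for example a trailing `--force`) is kept as a segment on its own.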

3 Answers

1

It is not a good idea to use regex for parsing, but regex should be used for tokenising.

Having said that, here is an imperfect regex that matches your scenario (but not all use cases)

Implemented in javascript...

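const str = 'rm --remove all --keep some --but-not *.php --or-like "?-imp-*.*"';
// Either the leading command word, or a --switch with an optional bare or double-quoted value
const regex = /(^\w+\b|--[\w-]+(\s([\w*.]+|".+?"))?)/g;
const res = str.match(regex);
// ['rm','--remove all','--keep some','--but-not *.php','--or-like "?-imp-*.*"']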

Each item will need further processing to split into keys and values.
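
As a rough sketch of that further processing, each segment can be split at its first run of whitespace. The helper name `splitSegment` and the returned `{ name, value }` shape are assumptions made for illustration, and the sketch expects non-empty segments like the ones matched above.

```javascript
// Hypothetical helper: split one matched segment into a name and an optional value.
function splitSegment(segment) {
    // Only the first run of whitespace separates the name from the value,
    // so a quoted value keeps any spaces it contains.
    const m = /^(\S+)(?:\s+([\s\S]*))?$/.exec(segment);
    const name = m[1].replace(/^--/, '');
    let value = m[2] === undefined ? null : m[2];
    // Strip one pair of surrounding double quotes, if present.
    if (value !== null && value.length >= 2 && value.startsWith('"') && value.endsWith('"')) {
        value = value.slice(1, -1);
    }
    return { name, value };
}

const segments = ['rm', '--remove all', '--keep some', '--but-not *.php', '--or-like "?-imp-*.*"'];
const parsed = segments.map(splitSegment);
console.log(parsed);
```

The command itself comes out as an entry whose value is `null`, so it can be told apart from switches that carry a value.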

2 Comments

I feel jealous of you guys :( Thanks for the help, I find this one much better than other solutions.
@Gufran: Since the answer doesn't state the assumption: command is assumed to only contain A-Z,a-z,0-9,_ (command is allowed to be something like 234__9). The name of the switch is assumed to only contain A-Z,a-z,0-9,_ and - (allowed to be this ---). The value can either be without quote or with quote. If without quote, it can only contain A-Z,a-z,0-9,_, and * and .. If with quote, it can contain any character besides new line, but the problem is that you won't be able to specify " as argument.
1

Example implementation in Javascript:

var match,
    str = 'rm --remove all --keep some --but-not *.php --or-like "?-imp\'\'-*.*"',
    args = [],
    reg = /\s--(\S+)\s+((["']).*?[^\\]\3|\S+)/g;

while ( match = reg.exec( str ) ) {
    args.push( [ match[1], match[2] ] );
}

console.log( args );

/* 
    [ [ "remove", "all" ], [ "keep", "some" ],
      [ "but-not", "*.php" ], [ "or-like", "\"?-imp''-*.*\"" ] ]
 */

Note: this is not intended to be fully watertight and it requires that the format of a command is validated first.

Known limitation: an argument's value may not start with a quote mark unless it also has a matching closing quote mark, e.g. values such as " and 'n will break the parse.
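
For comparison, here is a sketch of a variant that spells out the quoted alternatives so that a backslash-escaped quote is allowed inside a quoted value. The input string and the pattern are assumptions for illustration, not the regex used above, and the unclosed-quote limitation still applies.

```javascript
// Assumed sample input: contains a double-quoted value with escaped quotes
// and a single-quoted value with a space.
const str = 'rm --remove all --or-like "a \\"b\\" c" --name \'x y\'';

// Value alternatives: double-quoted, single-quoted, or a bare run of non-whitespace.
// Inside quotes, accept either a character that is neither the quote nor a backslash,
// or a backslash followed by any single character (an escape).
const reg = /\s--(\S+)\s+("(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'|\S+)/g;

const args = [];
let match;
while ((match = reg.exec(str)) !== null) {
    args.push([match[1], match[2]]); // [switch name, raw value including its quotes]
}

console.log(args);
```

The captured values keep their surrounding quotes and escape sequences, so unquoting and unescaping would be a separate step.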

8 Comments

[^\3] --> You cannot do this. It will match any character, except for the character with ASCII code 3.
Same problem as Billy Moon's solution regarding the quoted text, though.
@nhahtdh. Added known limitation. Further criticisms appreciated.
You can have a look at this (don't copy over, though, since the OP's requirement can be different): stackoverflow.com/questions/13799773/…
@nhahtdh. "'ab\'c'".match( /'(?:[^'\\\n\r\u2028\u2029]|\\(?:[^\n\rxu0-9]|0(?![0-9])|x[0-9a-fA-F]{2}|u[0-9a-fA-F]{4}|\n|\r\n?))*'/ ); // "'ab'" Not sure how that helps here, or how it is better than what I have used above for this use case.
0

I would use a library that implements getopt for JavaScript for that purpose (otherwise you're reinventing the wheel):

A quick google search brought up the following:

Note, I have not tried any of these.
