I'm in a situation where I need to parse arguments from a string in the same way that they would be parsed if provided on the command-line to a Java/Clojure application.

For example, I need to turn "foo \"bar baz\" 'fooy barish' foo" into ("foo" "bar baz" "fooy barish" "foo").

I'm curious if there is a way to use the parser that Java or Clojure uses to do this. I'm not opposed to using a regex, but I suck at regexes, and I'd fail hard if I tried to write one for this.

Any ideas?

2 Comments

  • I think your shell is in charge of splitting command line args, not Java. Commented Jul 14, 2010 at 22:16
  • Regardless, I'm still looking for a decent way to do this. Commented Jul 14, 2010 at 22:26

4 Answers

Updated with a new, even more convoluted version. This is officially ridiculous; the next iteration will use a proper parser (or c.c.monads and a little bit of Parsec-like logic on top of that). See the revision history on this answer for the original.

This convoluted bunch of functions seems to do the trick (not at my DRYest with this one, sorry!):

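(require '[clojure.string :as str])

;; Split the input into blocks: quotes, backslashes, and whitespace
;; characters each become their own block.
(defn initial-state [input]
  {:expecting nil
   :blocks (mapcat #(str/split % #"(?<=\s)|(?=\s)")
                   (str/split input #"(?<=(?:'|\"|\\))|(?=(?:'|\"|\\))"))
   :arg-blocks []})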

(defn arg-parser-step [s]
  (if-let [bs (seq (:blocks s))]
    (if-let [d (:expecting s)]
      (loop [bs bs]
        (cond (= (first bs) d)
              [nil (-> s
                       (assoc-in [:expecting] nil)
                       (update-in [:blocks] next))]
              (= (first bs) "\\")
              [nil (-> s
                       (update-in [:blocks] nnext)
                       (update-in [:arg-blocks]
                                  #(conj (pop %)
                                         (conj (peek %) (second bs)))))]
              :else
              [nil (-> s
                       (update-in [:blocks] next)
                       (update-in [:arg-blocks]
                                  #(conj (pop %) (conj (peek %) (first bs)))))]))
      (cond (#{"\"" "'"} (first bs))
            [nil (-> s
                     (assoc-in [:expecting] (first bs))
                     (update-in [:blocks] next)
                     (update-in [:arg-blocks] conj []))]
            (str/blank? (first bs))
            [nil (-> s (update-in [:blocks] next))]
            :else
            [nil (-> s
                     (update-in [:blocks] next)
                     (update-in [:arg-blocks] conj [(.trim (first bs))]))]))
    [(->> (:arg-blocks s)
          (map (partial apply str)))
     nil]))

(defn split-args [input]
  (loop [s (initial-state input)]
    (let [[result new-s] (arg-parser-step s)]
      (if result result (recur new-s)))))

Somewhat encouragingly, the following yields true:

(= (split-args "asdf 'asdf \" asdf' \"asdf ' asdf\" asdf")
   '("asdf" "asdf \" asdf" "asdf ' asdf" "asdf"))

So does this:

(= (split-args "asdf asdf '  asdf \" asdf ' \" foo bar ' baz \" \" foo bar \\\" baz \"")
   '("asdf" "asdf" "  asdf \" asdf " " foo bar ' baz " " foo bar \" baz "))

This should trim unquoted arguments but leave quoted ones untouched, and it should handle both double and single quotes, including backslash-escaped double quotes inside a double-quoted argument. Note that it currently treats escaped single quotes inside a single-quoted argument the same way, which is apparently at variance with the *nix shell way... argh. It's basically a computation in an ad-hoc state monad, just written in a particularly ugly way and in dire need of DRYing up. :-P
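For comparison, the same state-machine idea can be sketched as a single reduce over the characters of the input. This is only an illustrative sketch, not the code from this answer: tokenize-args and its state keys (:args, :cur, :in-quote, :esc?) are made-up names, and it assumes the same rules as above (a backslash escapes the next character both inside and outside quotes). Unlike the version above, it joins adjacent quoted and unquoted pieces into one argument, the way a shell would.

```clojure
(defn tokenize-args
  "Hypothetical helper: splits s into arguments, honoring single quotes,
  double quotes, and backslash escapes (same rules in both quote styles).
  An unterminated quote or trailing backslash is silently tolerated."
  [s]
  (let [step (fn [{:keys [args cur in-quote esc?] :as st} c]
               (cond
                 ;; previous char was a backslash: take c literally
                 esc? (assoc st :cur (str cur c) :esc? false)
                 ;; a backslash starts an escape (and an argument, if needed)
                 (= c \\) (assoc st :cur (str cur) :esc? true)
                 ;; inside quotes: only the matching quote is special
                 in-quote (if (= c in-quote)
                            (assoc st :in-quote nil)
                            (assoc st :cur (str cur c)))
                 ;; an opening quote starts or continues an argument
                 (#{\" \'} c) (assoc st :cur (str cur) :in-quote c)
                 ;; unquoted whitespace ends the current argument, if any
                 (Character/isWhitespace (char c))
                 (if cur
                   (assoc st :args (conj args cur) :cur nil)
                   st)
                 ;; any other character is appended to the current argument
                 :else (assoc st :cur (str cur c))))
        init {:args [] :cur nil :in-quote nil :esc? false}
        {:keys [args cur]} (reduce step init s)]
    ;; flush the last argument; "" counts, so an empty quoted string survives
    (if cur (conj args cur) args)))

(tokenize-args "foo \"bar baz\" 'fooy barish' foo")
;; => ["foo" "bar baz" "fooy barish" "foo"]
```

Keeping :cur as nil until an argument has started is what lets an empty quoted string produce an empty argument instead of being dropped.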

12 Comments

Jesus. I'm horrified that I have to put that thing in my code. This should be a lot easier than it actually is. :\ Thanks a lot! :D
You know, you might want to consider putting this into contrib or a small library or something. Seriously, this could be useful to more than just me.
Shouldn't this be true? (= (split-args "foo bar baz") '("foo" "bar" "baz")) false
Ah, right, will fix in a sec. (Might make it a bit DRYer too.)
Well, this is simple enough to fix -- wrap the str/split form with (mapcat #(str/split % #"(?<=\s)|(?=\s)") ...). I have however found another bug to do with escaping quotes... will post an updated version once I've got that fixed.

This bugged me, so I got it working in ANTLR. The grammar below should give you an idea of how to do it. It includes rudimentary support for backslash escape sequences.

Getting ANTLR working in Clojure is too much to write in this text box. I wrote a blog entry about it though.

grammar Cmd;

options {
    output=AST;
    ASTLabelType=CommonTree;
}

tokens {
    DQ = '"';
    SQ = '\'';
    BS = '\\';
}

@lexer::members {
    String strip(String s) {
        return s.substring(1, s.length() - 1);
    }
}

args: arg (sep! arg)* ;
arg : BAREARG
    | DQARG 
    | SQARG
    ;
sep :   WS+ ;

DQARG  : DQ (BS . | ~(BS | DQ))+ DQ
        {setText( strip(getText()) );};
SQARG  : SQ (BS . | ~(BS | SQ))+ SQ
        {setText( strip(getText()) );} ;
BAREARG: (BS . | ~(BS | WS | DQ | SQ))+ ;

WS  :   ( ' ' | '\t' | '\r' | '\n');


I ended up doing this:

(require '[clojure.string :as s])

(filter seq
        (flatten
         (map #(%1 %2)
              (cycle [#(s/split % #" ") identity])
              (s/split (read-line) #"(?<!\\)(?:'|\")"))))

3 Comments

I'm afraid this breaks with, say, 'asdf"asdf'.
Also, a backslash may itself be escaped... Just pointing things out in case you want to fix them; if I figure out an alternative solution, I'll post that as an answer.
Indeed. I knew it wasn't quite right, but I was taking whatever I could get at that point.

I know this is a very old thread, but I came across this same problem and used Java interop to call:

(CommandLineUtils/translateCommandline cmd-line)

from Plexus Common Utilities.
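
A slightly fuller sketch of that call, assuming the plexus-utils artifact (org.codehaus.plexus/plexus-utils) is already on the classpath. The wrapper name split-command-line is made up for illustration; translateCommandline returns a Java String array, so the wrapper converts it to a vector. As far as I know the method throws on unbalanced quotes, so callers may want to catch that.

```clojure
;; Assumption: plexus-utils is on the classpath.
(import 'org.codehaus.plexus.util.cli.CommandLineUtils)

(defn split-command-line
  "Hypothetical wrapper: returns the parsed arguments as a Clojure vector."
  [cmd-line]
  (vec (CommandLineUtils/translateCommandline cmd-line)))

(split-command-line "foo \"bar baz\" 'fooy barish' foo")
```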
