4

The split functions in both Clojure and Java take a regular expression as the delimiter. But I just want to split on a plain character. The character passed in could be "|", ",", " ", etc. How can I split a line by that character?

I need a function like (split string a-char). It will be called at very high frequency, so it needs good performance. Is there a good solution?

2 Answers

5

There are a few features in the java.util.regex.Pattern class that support treating a string as a literal pattern rather than as regex syntax, which is useful for cases such as this one. @cgrand already alluded to (Pattern/quote s) in a comment on another answer. Another such feature is the LITERAL flag (documented in the Pattern Javadoc), which can be passed when compiling a pattern so that the whole input is matched literally. Remember that #"foo" in Clojure is essentially syntax sugar for (Pattern/compile "foo"). Putting it all together, we have:

(import 'java.util.regex.Pattern)
(clojure.string/split "foo[]bar" (Pattern/compile "[]" Pattern/LITERAL))
;; ["foo" "bar"]
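
For the delimiters mentioned in the question, the same idea could be wrapped in a small helper. This is only a minimal sketch: `literal-pattern` and `pipe` are hypothetical names chosen for illustration, not part of clojure.string or java.util.regex.

```clojure
(require 'clojure.string)
(import 'java.util.regex.Pattern)

;; Hypothetical helper: compile a character or string so that every
;; character in it is matched literally rather than as regex syntax.
(defn literal-pattern
  [delim]
  (Pattern/compile (str delim) Pattern/LITERAL))

;; Compile once, then reuse the same Pattern for every split.
(def pipe (literal-pattern \|))

(clojure.string/split "a|b|c" pipe)
;; ["a" "b" "c"]
(clojure.string/split "a,b,c" (literal-pattern ","))
;; ["a" "b" "c"]
```

Calling `str` on the delimiter lets the helper accept either a character such as `\|` or a string such as "[]".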

4

Just make your character a regex by properly escaping special characters and use the default regex split (which is fastest by far).

This version builds a regex that quotes the whole character or string, so everything in it is matched literally:

(defn char-to-regex
  [c]
  (re-pattern (java.util.regex.Pattern/quote (str c))))

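This version builds a regex from a single character and escapes it only if it is one of the regex special characters:

(defn char-to-regex
  [c]
  ;; The set holds the characters that have special meaning in a regex.
  (if ((set "<([{\\^-=$!|]})?*+.>") c)
    (re-pattern (str "\\" c))
    ;; re-pattern needs a string, so convert the character first.
    (re-pattern (str c))))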

Make sure to bind the regex once, so you don't call char-to-regex over and over again when you need to do multiple splits:

(let [break (char-to-regex \|)]
  (clojure.string/split "This is | the string | to | split" break))
=> ["This is " " the string " " to " " split"]
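
When the delimiter is only known at call time and a let binding is inconvenient, one possible variation is to cache the compiled pattern per delimiter with clojure.core's memoize. The names `delimiter-pattern` and `split-on` below are hypothetical, and whether the cache lookup pays off for a given workload is an assumption worth measuring.

```clojure
(require 'clojure.string)
(import 'java.util.regex.Pattern)

;; Hypothetical helper: memoize keeps one compiled Pattern per delimiter,
;; so repeated calls with the same character reuse the same object.
(def delimiter-pattern
  (memoize
    (fn [c]
      (Pattern/compile (Pattern/quote (str c))))))

;; Hypothetical wrapper with the (split string a-char) shape from the question.
(defn split-on
  [s c]
  (clojure.string/split s (delimiter-pattern c)))

(split-on "a|b|c" \|)
;; ["a" "b" "c"]
```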

2 Comments

(Pattern/quote s) is more reliable than (str "\\Q" c "\\E")
Verified regarding performance: I wrote a relatively low-level character-scanning split in Clojure, and it was 10x slower than a regex split.
