My question is about the function clojure.string/split. You can pass a limit on the number of parts it returns, and it works like a charm:

user> (clojure.string/split "1{1,2{3,4},5}6" #"\{" 2)
;; => ["1" "1,2{3,4},5}6"]

However, the function traverses the string from left to right. Sometimes I want it to traverse the string from right to left (starting at the end):

user> (clojure.string/split "1{1,2{3,4},5}6" #"\}" 2)
;; => ["1{1,2{3,4" ",5}6"]
;; desired result: ["1{1,2{3,4},5" "6"]

How can I achieve this using a regex?


1 Answer


You could try using a negative lookahead in your particular case to ensure that there is no more } after the one you're splitting at:

user> (clojure.string/split "1{1,2{3,4},5}6" #"\}(?![^\}]*\})" 2)

(?![^\}]*\}) is a negative lookahead and will prevent a match if there is another } after the } matched. I'm using the negated class [^\}]* to make it faster than something like .*. I'm not entirely sure whether you need the escape, since I'm not familiar with Clojure. Usually you can safely use \}(?![^}]*\}), but escaping will work whether it is required or not.
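As a quick check of the pattern above, here is a minimal REPL sketch. The names `s` and `last-brace` are made up for illustration and are not part of the original answer; the limit of 2 is kept from the question, although the lookahead already restricts the match to the final }.

```clojure
(require '[clojure.string :as str])

;; Sample input taken from the question.
(def s "1{1,2{3,4},5}6")

;; Match a } only when no further } follows it, i.e. the last } in the string.
;; (last-brace is a hypothetical name used only in this sketch.)
(def last-brace #"\}(?![^\}]*\})")

(str/split s last-brace 2)
;; => ["1{1,2{3,4},5" "6"]
```

The first } is rejected because the lookahead finds ",5}" after it; the second } is accepted because only "6" follows.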


7 Comments

Wow, it works, but I do not understand how :-( You said you were using a 'negated class' to make it faster. Can you also post a slower, simpler variant (just to start digging)? P.S. One should use the escape character because Clojure uses Java regex, and without it Clojure throws a PatternSyntaxException.
@Mark The speed difference is small, but here it is: #"\}(?!.*?\})" A negative lookahead works like this: if what's inside it matches, the overall match fails. So if .*?\} matches, the match will fail. The lazy .*? matches any character, but it advances one character at a time, checking for } after each step. This is more suitable when the string is short. #"\}(?!.*\})" (without the ?) does the same thing, except that the greedy .* first runs to the end of the string and then backtracks, so it works well when the last } is near the end of the string and can be very slow if the two } are at the beginning of a very long string. [^\}]* works quickly in any situation.
About the escaping: I said that because most engines, including Java's, don't need escaping when you put a metacharacter inside a character class, so [\}] is the same as [}]. But there's so much variety out there that one can never be too sure :)
Thanks for explaining the 'machinery' :-) I like to understand what's going on in my code.
@Mark Should you need some more elaboration, let me know. I ran out of characters in that comment above lol.
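The three lookahead variants from the comments can be compared side by side. This is only a sketch: the names `s`, `negated-class`, `lazy-dot`, `greedy-dot` and `multi-line` are invented for illustration, and the multi-line input is an added assumption that is not in the question. It relies on the fact that `.` in Java regex does not match line terminators by default.

```clojure
(require '[clojure.string :as str])

(def s "1{1,2{3,4},5}6")

;; Hypothetical names for the three patterns discussed in the comments.
(def negated-class #"\}(?![^\}]*\})")
(def lazy-dot      #"\}(?!.*?\})")
(def greedy-dot    #"\}(?!.*\})")

;; On the single-line input from the question all three agree.
(every? #(= ["1{1,2{3,4},5" "6"] (str/split s % 2))
        [negated-class lazy-dot greedy-dot])
;; => true

;; Assumed multi-line input: . stops at the newline, so the dot variants
;; cannot see the later } and split at the first one instead.
(def multi-line "a}\nb}c")

(str/split multi-line negated-class 2)
;; => ["a}\nb" "c"]

(str/split multi-line lazy-dot 2)
;; => ["a" "\nb}c"]
```

If the input may contain newlines, the negated class is therefore the safer choice regardless of speed.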