5

I'm new to Clojure and am having trouble figuring out how to avoid stack overflows in certain situations. One such situation came up while trying to port a parsing project to Clojure with a parser combinator library I found called kern.

Kern defines a recursive implementation for a "many-till" parser: source

This works fine for small inputs:

(def input-text "The blue {cat} and the red {dog} became best friends with the white {wolf} END {not included}")

(def  between-brackets "parser that grabs all text between brackets"
  (between (sym* \{) (sym* \}) (<+> (many (none-of* "}")))))

(def parse-enclosed-words "parser that attempts to grab text between brackets, 
skips a character when it can't, and halts when it hits the string END"
  (many-till (<|> between-brackets (skip any-char)) (token* "END")))

(filter #(some? %) (value parse-enclosed-words input-text)) ;; => ("cat" "dog" "wolf")

Unfortunately, the parser encounters stack overflows as input strings grow:

(def file-input-text (slurp (io/resource "[input-text-20x-repeated.txt][2]") ))
(filter #(some? %) (value parse-enclosed-words file-input-text)) ;; => Unhandled java.lang.StackOverflowError

From what I can tell by reading online, this is probably due to the function using stack-consuming recursion. I played around with some attempts to re-write the function using the "recur" keyword, but since the recursive call is not in tail position this didn't seem to work.

How can many-till be modified to avoid stack overflow?

2 Answers 2

4

I don't think you can do it with the abstractions the library provides. This is a very natural definition of many-till, and would work fine in Parsec, the library that Kern is obviously inspired by. But Clojure doesn't have Haskell's lazy evaluation and automatic trampolining, so the nested lambdas that many-till builds inevitably consume unbounded stack. You would need an implementation more like its implementation of many, which builds a parser by hand out of a function. I would include its source below, but I don't own it and so I don't think I am authorized to give Stack Overflow a CC BY-SA 4.0 license to it, as posting it would do. Instead, here is a link to its source.

Sign up to request clarification or add additional context in comments.

2 Comments

Thank you, I was able to write a different implementation this way which doesn't overflow. And apologies for the issue of posting source, I edited the post to link to code that wasn't mine. Out of curiosity, in your experience with both languages, do you often run into situations where recursion feels kinda hacky/messy in Clojure compared to Haskell? It just seems like you have to structure recursive functions a lot more deliberately in Clojure, I'm not sure how restrictive that's going to feel down the line.
It doesn't come up that often. Most of the lazy stuff you want to do is with sequences, and Clojure does have lazy sequences. Lazy data of other sorts is idiomatic in Haskell but not usually that much better than a less fancy approach.
1

There is already a good answer discussing the specifics of recursion with this particular library.

On the more general question of parsing data using Clojure, you should check out the high-quality parser library Instaparse. It is written in Clojure, and is very powerful. You could either use it directly, or for comparison purposes.

See also the video from Clojure/West.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.