6

In this answer on CodeReview, both the question asker and the answerer seem to show disdain for the (++) operator. Is this due to its speed (which, if I recall correctly, can make an algorithm run in O(n^2), where n is the length of the list)? Is avoiding it premature optimization if not otherwise tested, given that Haskell is known for being difficult to reason about in terms of time complexity? Should others avoid the (++) operator in their programs?

  • Haskell lists are linked lists, and appending them is an O(n) operation in strict languages. In Haskell, however, it’s not quite as straightforward: laziness allows that work to be put off until the list is used, effectively distributing the cost over the consumption of the list (see the short illustration after these comments). As you mention, this laziness can make time and space complexity difficult to reason about in Haskell, but I think the same advice as for other languages applies here, too: write for clarity first, and optimize for speed if and only if you both need it to go faster and have profiled to find a bottleneck. Commented Jul 28, 2016 at 5:22
  • It's both: it's harder to reason about your code, it usually slows your code down quite a bit, and you can very often work around using it. But it all depends on the circumstances, and IMO your question is too unspecific to be a good fit for SO. Commented Jul 28, 2016 at 5:22
  • @Carsten: I'm a little confused by your remark that "it usually slows down your code". While this is certainly true for the canonical tail-recursive factorial example, is it really that common in everyday programs? I thought the benefit of something like Haskell over more popular languages like Ruby and Python was the optimizations afforded by the rich type system. If you really believe my questions are not specific enough, what about the example I provided, save only the last question (which was my intention, and I think ThreeFx figured it out)? Commented Jul 28, 2016 at 6:09
  • You should not worry about whether I think the question is a fit for SO; it's not about the question but about SO. Commented Jul 28, 2016 at 6:37
  • I'm not worrying; I just think it is a fit for SO. ThreeFx was able to help me significantly with it, and I hope it helps many others as well. Commented Jul 28, 2016 at 6:43
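As a minimal sketch of the laziness point in the first comment above (the list length is arbitrary): (++) does no work until its result is consumed, so forcing only the head is cheap, while forcing the whole list pays the full linear cost.

-- Laziness defers the cost of (++) to consumption time:
main :: IO ()
main = do
  let xs = [1 .. 1000000] :: [Int]
  print (head (xs ++ [0]))    -- prints 1; the rest of xs is never traversed
  print (length (xs ++ [0]))  -- forces the whole spine, paying the full O(n) cost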

1 Answer

11

It depends.

Consider the expression

foldl (++) [] list

This expression concatenates a list of lists into a single list, but it has the aforementioned quadratic complexity. The reason is that (++) traverses its entire left argument, copying each element onto the front of the right list (preserving order, of course). With a left fold, the ever-growing accumulated result is the left argument at every step, so it is re-traversed again and again.

Using a right fold, we get linear complexity:

foldr (++) [] list

Again, this is due to the (++) operator's implementation: it traverses only its left argument, and with a right fold each original sublist appears as a left argument exactly once, so nothing is copied twice.
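For reference, the Prelude defines (++) essentially as follows (modulo rewrite-rule pragmas); note that it recurses only into its left argument and never inspects the right one:

(++) :: [a] -> [a] -> [a]
[]     ++ ys = ys              -- left list exhausted: the right list is shared as-is
(x:xs) ++ ys = x : (xs ++ ys)  -- one new cons cell per element of the left list

With that in mind, the expression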

[1,2] ++ [3,4] ++ [5,6]

is equal to ((++) is declared infixr 5, so it associates to the right by default)

-- Example as created by foldr
[1,2] ++ ([3,4] ++ [5,6])
== [1,2] ++ [3,4,5,6]
== [1,2,3,4,5,6] -- all good, no element traversed more than once

which only traverses each list element once.

Switching the parentheses around so that the first two lists are appended first, which is exactly what foldl produces, is more expensive: some elements are now traversed multiple times.

-- Example as created by foldl
([1,2] ++ [3,4]) ++ [5,6]
== [1,2,3,4] ++ [5,6]
== [1,2,3,4,5,6] -- the sublist [1,2] was traversed twice due to the ordering of the appends

All in all, watch out for left-nested appends like these and you should be fine.
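To make the difference concrete, here is a minimal, self-contained sketch (the chunk counts are arbitrary); the Prelude's own concat behaves like the foldr version:

-- Two ways to concatenate a list of lists:
concatR, concatL :: [[a]] -> [a]
concatR = foldr (++) []  -- right-nested appends: each element is copied at most once
concatL = foldl (++) []  -- left-nested appends: the growing prefix is re-copied at every step

main :: IO ()
main = do
  let chunks = replicate 2000 [1 .. 50 :: Int]
  print (length (concatR chunks))  -- time linear in the total number of elements
  print (length (concatL chunks))  -- quadratic; noticeably slower as the input grows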


7 Comments

Can this be generalized such that foldl should be used with left-associative operators all the time? What do we do with foldl'? EDIT: If an operation is both left- and right-associative, does that mean that foldl <op> <vals> == foldr <op> <vals>? And under what circumstances is foldl <otherop> <vals> /= foldr <otherop> <vals>? And does Haskell really not notice this bottleneck? I thought the compiler (specifically GHC) was supposed to be among the best in the world; to not optimize this seems folly (although I may be out of my league...).
@JackBrandt No, at least not as a general rule. Consider (^) (or (-)): these operators produce different results when used in left and right folds, respectively. It depends on what you want it to do. foldl' is an entirely different matter: it is a strict version of the standard foldl and should almost always be preferred over foldl, except when you explicitly want to build up thunks.
Can you expand on why anyone would want to build up thunks? It seems rather counterintuitive to popular understanding. Also, the mathematical examples were really quite useful for understanding the difference between the folds.
@JackBrandt It's less about the thunks than about the advantage of using a lazy consuming function. Whereas foldl' will crash given undefined in its list, the lazy foldl may still return, provided your function is lazy enough (a minimal sketch follows below). You might want to have a look at this example.
The reason foldr is better has nothing to do with the associativity of (++). It's because of how (++) has to work when concatenating immutable linked lists. It must create new (:) constructors for each element in its left argument, but it doesn't do anything at all with the right argument. Therefore, if you stack up a left-biased tree of calls to (++) it repeatedly copies the same elements. If you stack up a right-biased tree, it only copies the elements it needs to once.
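As a minimal sketch of the foldl/foldl' point above (the combining function is deliberately contrived to ignore its accumulator):

import Data.List (foldl')

-- The lazy foldl builds thunks for the intermediate accumulators; because
-- the combining function ignores its accumulator argument, the thunk
-- containing undefined is never forced:
lazyOk :: Int
lazyOk = foldl (\_ x -> x) 0 [undefined, 1]   -- evaluates to 1

-- foldl' forces each intermediate accumulator to WHNF, so the
-- undefined element is evaluated and this crashes:
strictCrash :: Int
strictCrash = foldl' (\_ x -> x) 0 [undefined, 1]

-- And, per the (^)/(-) remark above, a non-associative operator
-- gives different results under the two folds:
--   foldl (-) 0 [1,2,3] == ((0 - 1) - 2) - 3 == -6
--   foldr (-) 0 [1,2,3] == 1 - (2 - (3 - 0)) ==  2

main :: IO ()
main = do
  print lazyOk       -- prints 1
  print strictCrash  -- throws Prelude.undefined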
