Let's look at how one might construct two of these.
zip
We'll start with zip. The purpose of zip is to "zip" two lists into one. The name comes from the analogy of zipping two sides of a zipper together. Here's an example of how it functions:
zip [1,2,3] ["a", "b", "c"]
= [(1,"a"), (2,"b"), (3,"c")]
The type signature of zip (which is typically the first thing you'd write) is
zip :: [a] -> [b] -> [(a, b)]
That is, it takes a list of elements of type a and a list of elements of type b and produces a list of pairs with one component of each type.
To construct this function, let's go for standard Haskell pattern matching. We have four cases:
- The first list is
[] and the second list is [].
- The first list is
[] and the second list is a cons (constructed using :).
- The first list is a cons and the second list is
[].
- The first list is a cons and the second list is also a cons.
Let's work out each of these.
zip [] [] = ?
If you zip together two empty lists, you have no elements to work with, so surely you get the empty list.
zip [] [] = []
In the next case, we have
zip [] (y : ys) = ?
We have an element, y, of type b, but no element of type a to pair it with. So we can only construct the empty list.
zip [] (y : ys) = []
The same happens in the other asymmetrical case:
zip (x : xs) [] = []
Now we get to the interesting case of two conses:
zip (x : xs) (y : ys) = ?
We have elements of the right types, so we can make a pair, (x, y), of type (a, b). That's the head of the result. What's the tail of the result? Well, that's the result of zipping the two tails together.
zip (x : xs) (y : ys) = (x, y) : zip xs ys
Putting all these together, we get
zip [] [] = []
zip [] (y : ys) = []
zip (x : xs) [] = []
zip (x : xs) (y : ys) = (x, y) : zip xs ys
But the implementation you gave only has three cases! How's that? Look at what the first two cases have in common: the first list is empty. You can see that whenever the first list is empty, the result is empty. So you can combine these cases:
zip [] _ = []
zip (x : xs) [] = []
zip (x : xs) (y : ys) = (x, y) : zip xs ys
Now look at what's now the second case. We already know that the first list is a cons (because otherwise we'd have taken the first case), and we don't need to know anything more about its composition, so we can replace it with a wildcard:
zip [] _ = []
zip _ [] = []
zip (x : xs) (y : ys) = (x, y) : zip xs ys
That's produces the zip implementation you copied. Now it turns out that there's a different way to combine the patterns that I think explains itself a bit more clearly. Reorder the four patterns like this:
zip (x : xs) (y : ys) = (x, y) : zip xs ys
zip [] [] = []
zip [] (y : ys) = []
zip (x : xs) [] = []
Now you can see that the first pattern produces a cons and all the rest produce empty lists. So you can collapse all three of the rest, producing the nicely compact
zip (x : xs) (y : ys) = (x, y) : zip xs ys
zip _ _ = []
This explains what happens when both lists are conses, and what happens when that's not the case.
length
The naive way to implement length is very direct:
length :: [a] -> Int
length [] = 0
length (_ : xs) = 1 + length xs
This will give you correct answers, but it's inefficient. When evaluating the recursive call, the implementation needs to keep track of the fact that once it's done, it needs to add 1 to the result. In practice, it likely pushes the 1+ onto some sort of stack, makes the recursive call, pops the stack, and performs the addition. If the list has length n, the stack will reach size n. That's not great for efficiency. The solution, which the code you copied obscures somewhat, is to write a more general function instead.
-- | A number plus the length of a list
--
-- > lengthPlus n xs = n + length xs
lengthPlus :: Int -> [a] -> Int
-- n plus the length of an empty list
-- is n.
lengthPlus n [] = n
lengthPlus n (_ : xs) = ?
Well,
lengthPlus n (x : xs)
= -- the defining property of `lengthPlus`
n + length (x : xs)
= -- the naive definition of length
n + (1 + length xs)
= -- the associative law of addition
(n + 1) + length xs
= -- the defining property of lengthPlus, applied recursively
lengthPlus (n + 1) xs
So we get
lengthPlus n [] = n
lengthPlus n (_ : xs) = lengthPlus (n + 1) xs
Now the implementation can increment the counter argument on each recursive call instead of delaying them till afterwards. Well ... pretty much.
Thanks to Haskell's call-by-need semantics, this isn't guaranteed to run in constant memory. Suppose we call
lengthPlus 0 ["a","b"]
This reduces to the second case:
lengthPlus (0 + 1) ["b"]
But we haven't actually demanded the value of the sum. So the implementation could defer that addition work, creating a chain of deferrals that's just as bad as the stack seen earlier! In practice, the compiler is clever enough that it will work out how to do this right when optimizations are enabled. But if you don't want to rely on that, you can give it a hint:
lengthPlus n [] = n
lengthPlus n (_ : xs) = n `seq` lengthPlus (n + 1) xs
This tells the compiler that the integer argument actually has to be evaluated. As long as the compiler isn't being intentionally obtuse, it will be sure to evaluate it first, clearing up any deferred additions.
zip [1,2,3] ['a','b','c']gives you[(1,'a'), (2,'b'), (3,'c')]). The third function is an indexing function. For example,['a', 'b', 'c'] !! 1gives you'b'.