By Kai Sassnowski

Published 06.11.2016Github Twitter

I’m writing a book!

If you learned something from my blog or enjoy my style of writing you should check out the book I’m writing. It’s called Developer’s Guide to Database Indexing and will be stock full of all the things you should know about database indexing as a developer.

Check it out

Understanding filterM

At the recent Haskell Meetup I went to, someone posed this interesting challenge:

Given a list of integers, produce a list of all possible sums that can be created using the integers from the list.

Turns out, in Haskell this is a one-liner:

map sum $ filterM (const [True, False]) [1..5]

Which returns

[15,10,11,6,12,7,8,3,13,8,9,4,10,5,6,1,14,9,10,5,11,6,7,2,12,7,8,3,9,4,5,0]

To me as a Haskell beginner, this is straight up witchcraft. How can this line possibly produce something this complex. So a couple of us decided to dive into the implementation of filterM to see how it all works. What follows are our findings (a bit cleaned up).

First impressions

The first thing throwing me for a loop, is that a function called filterM is returning more elements than in the initial list. That’s not what filtering means in my mind. I should always end up with less, or at least with the same amount of elements than before.

Secondly, I assume that the M in filterM stands for Monad. So that instantly makes it a bit more scary to me. While I (sort of) understand what Monads are – at least from a theoretical point of view – I don’t yet have intuition for them. It’s not immediately obvious to me, why Monads are needed in this particular case.

Understanding the type signature

Looking at the type signature for filterM gives us this:

filterM :: Monad m => (a -> m Bool) -> [a] -> m [a]

Note that the type signature ghci gives you actually differs from the one on Hoogle. I have chosen to use the type signature that ghci provides me.

As mentioned before, monads are still a bit scary to me. What I always end up doing, is pluck in the actual Monad I’m dealing with, and see what the type signature looks like. So let’s do that.

We’re passing in const [True, False] into filterM. So our monad in this case would be the list monad []. Let’s see what that gives us.

filterM :: (a -> [Bool]) -> [a] -> [[a]]

Ok that makes a bit more sense. One thing that strikes me as interesting, is that this is returning a nested list.

Let’s continue.

Playing human interpreter

Since I still don’t have a clue what this function is supposed to do, I decided to just evaluate the function myself by plucking in actual values. Let’s take a look at our filterM call again.

I’m leaving off the map call and only focusing on the filterM expression here.

filterM (const [True, False]) [1..5]

filterM can be implemented something like this (credits to this guy ):

filterM :: Monad m => (a -> m Bool) -> [a] -> m [a]
filterM p []     = return []
filterM p (x:xs) =
    let rest = filterM p xs in
        do b <- p x
           if p then liftM (x:) rest
           else                 rest

This is not actually how it is implemented, but this implementation makes it much easier to understand.

Since we’re not passing in an empty list, let’s just skip that part for now and instead focus on the second part of the pattern match.

Immediately we have a recursive call on the tail of the list. Let’s ignore that for now and move on.

do b <- p x
   if b then liftM (x:) rest
   else                 rest

Ok, so what’s going on here. Looks like we’re applying p to x, which is the head of our input list [1..5]. Substituting the variables for their actual values gives us

do b <- const [True, False] 1
-- ...

const is a function of type a -> b -> a. So in other words, it takes two parameters and simply returns the first one. The first argument in this case is [True, False]. So const [True, False] 1 returns [True, False]. In fact, const [True, False] anything will return [True, False]. This expression will always evaluate to [True, False].

Moving on, that leaves us with

do b <- [True, False]
    if b then liftM (1:) rest
    else                 rest

What does b <- [True, False] do? We’re in a do-expression, so this is just syntactic sugar around bind (>>=) calls. Let’s desugar and see what we end up with.

[True, False] >>= \b -> if b then liftM (1:) rest else rest

Ok, so depending on what b is, we either execute liftM (x:) rest or simply rest. rest is the result of our initial recursive call filterM p xs which has type m [a] since that is the return type of filterM.

Right so what does liftM (x:) rest do? liftM has this type signature

liftM :: Monad m => (a -> b) -> m a -> m b

That looks familiar… What’s the type signature of fmap again?

fmap :: Functor f => (a -> b) -> f a -> f b

Swap out Monad with Functor and m with f and you have the same function. So liftM is basically fmap, but for monads. In other words, it takes a function that works on “unwrapped” values, and makes it work on values that are in a monad.

So back to our example. We had this expression

if b then liftM (1:) rest else rest

If b is True we want to cons 1 onto rest. Since rest is of type [[Int]] and x is of type Int we need to lift the cons function (:) into the monad before we can apply it. Makes sense.

Depending on the value of b we either return a list with 1 added to the front, or one without the 1. There’s our filter!

That still leaves the question what b actually is. We had the following piece of code

[True, False] >>= \x -> if x then liftM (1:) rest else rest

It seems like we need to take a look at how (>>=) is implemented for the list monad in order for this snippet to make sense. Here it is.

(>>=) :: Monad m => m a -> (a -> m b) ->  m b
xs >>= f = concat $ fmap f xs

Aha! Bind for lists is implemented in terms of fmap. So what’s really going on is that we’re feeding both True and False into our little if-expression one after the other. But what does it mean? Let’s see what the expression above evaluates to:

[True, False] >>= \x -> if x then liftM (1:) rest else rest

[[1],[]]

What if we were to simulate the recursive call, and pass in 2 as our value. In this case rest would be equal to [[1], []].

[True, False] >>= \x -> if x then liftM (2:) [[1],[]] else [[1],[]]

[[2,1],[2],[1],[]]

I think I’m starting to see it. One more.

[True, False] >>= \x -> if x then liftM (3:) [[2,1],[2],[1],[]] else [[2,1],[2],[1],[]]

[[3,2,1],[3,2],[3,1],[3],[2,1],[2],[1],[]]

Note, that this not actually how the evaluation goes. We’re evaluating from left to right for simplicities sake. However, filterM is actually implemented in terms of foldr. So evaluate it from right to left, starting with the 5.

For every element in our input list, we return a list with that element and one without the element. Due to the recursive nature of this algorithm this gives us back all possible subsets of our initial list.

After some research, I found out, that this is called the power set of a list (or set, in set theory).

Wrapping it up

Going back to the original problem we can now see, that

filterM (const [True, False]) [1..5]

is returning all possible subsets that can be formed using the numbers 1 to 5 (including the empty set).

[[1,2,3,4,5],[1,2,3,4],[1,2,3,5],[1,2,3],[1,2,4,5],[1,2,4],[1,2,5],[1,2],[1,3,4,5],[1,3,4],[1,3,5],[1,3],[1,4,5],[1,4],[1,5],[1],[2,3,4,5],[2,3,4],[2,3,5],[2,3],[2,4,5],[2,4],[2,5],[2],[3,4,5],[3,4],[3,5],[3],[4,5],[4],[5],[]]

All that’s now left to do in order to answer the original problem is calculating the sum of each subset. Which leaves us with the initial solution.

map sum $ filterM (const [True, False]) [1..5]

[15,10,11,6,12,7,8,3,13,8,9,4,10,5,6,1,14,9,10,5,11,6,7,2,12,7,8,3,9,4,5,0]

Neat.