Shuffling in Haskell

Posted by Andre Tue, 23 Dec 2008 13:43:00 GMT

The other day I  implemented the Fisher-Yates Shuffle in Haskell. The algorithm is pretty simple, but writing it in Haskell was a challenge, because I had to work with the ST monad (more specifically STArrays), which I had never done, in order to write an efficient version (Haskell’s immutable arrays are simpler to use but wouldn’t work very well because of the copying needed on every update), and not only that, but also combine it with the State monad using a monad transformer, because the shuffling algorithm requires a random number generator, and the State monad is useful to keep it updated without the need for extra parameters in functions.

These are the modules we’ll use:

> import Control.Monad.ST
> import Control.Monad.State.Strict
> import Data.Array.ST
> import Data.Array.IArray
> import System.Random

The first function we need is one that swaps the elements of an array, given their indices. The code is straightforward, and the only thing worth mentioning is that since we’re using an STArray, the function has to be in the ST monad. Here’s the code:

> swap :: STArray s Int a -> Int -> Int -> ST s ()
> swap arr i j = do
>   x <- readArray arr i
>   y <- readArray arr j
>   writeArray arr i y
>   writeArray arr j x

The next step would be the main action of the shuffling algorithm, which is selecting a random array index and swapping it with the current upper bound on the array indices. This is where our need for the State monad transformer (StateT) appears. We need to keep our random number generator in the State monad, and also use the ST monad for the array operations. So we define a type synonym that represents that:

> type ShuffleT s a = StateT StdGen (ST s) a

This simply means we’re using the State monad to store an StdGen and that ST s is our inner monad. Computations of this type require a parameter of type s, which is required by the ST monad, and one of type a, which is the type of the return value of the computation.

Now we need a function that will give us a random array index, that is, an integer in some range delimited by the function’s parameters. Since we’re using the State monad, we’ll retrieve and update the random number generator using the get and put functions to access this implicit parameter.

> randomInt :: Int -> Int -> ShuffleT s Int
> randomInt lb ub = do
>   g <- get
>   let (r, g') = randomR (lb, ub) g
>   put g'
>   return r

With that, we can write the function that does the random swapping very simply:

> swapRand :: STArray s Int a -> Int -> Int -> ShuffleT s ()
> swapRand arr lb ub = do
>   r <- randomInt lb ub
>   lift $ swap arr r ub

Note that we can call randomInt directly, because the State monad is the outer monad in our stack, but we need to lift the result of swap, because it’s in ST, our inner monad (I guess calling it "upper" instead helps understanding the purpose of lift, if you consider a pile of stacked monads, with State at the bottom and ST at the top).

Finally, we can write the main function, which simply calls swapRand multiple times, each time decrementing the upper bound parameter until every element has been swapped. I’ll call it shuffle’.

> shuffle' :: STArray s Int a -> ShuffleT s (STArray s Int a)
> shuffle' arr = do
>   (lb, ub) <- lift $ getBounds arr
>   mapM_ (swapRand arr lb) [ub, ub-1 .. lb]
>   return arr

Note that the return type now is ShuffleT s (STArray s Int a), because we’re in ShuffleT and want to return an STArray. We use mapM_ with a decreasing range for the upper bounds, and apply each one to the partial application of swapRand to its first two arguments. Once again we have to lift the return value of a function that lives in the ST monad, this time getBounds.

The last step is getting the shuffled array out of the ST monad. Here I’m assuming that the array won’t be modified by the rest of the program, so we’re given an immutable array as an argument, convert it to a mutable array, shuffle it, extract it from StateT (ShuffleT in our case), freeze it, and finally extract it from ST. The array needs to be frozen in order to be extracted from ST, to avoid the mutability of the array to "escape" to the pure world. Credit for this wrapper goes to Ryan Ingram, who kindly explained this to me in Haskell-Cafe. The function also returns the modified random number generator for further use in the program. Here it is:

> shuffle :: Array Int a -> StdGen -> (Array Int a, StdGen)
> shuffle arr gen = runST $ do
>   stArr <- thaw arr
>   (stArr', gen') <- runStateT (shuffle' stArr) gen
>   arr' <- unsafeFreeze stArr'
>   return (arr', gen')

And that’s it! The only hard function, in my opinion, is shuffle, the wrapper around shuffle’. It’s not hard in the sense that the code is hard to follow, but you have to understand what needs to be done to get stuff out of the ST monad. Knowing that, it’s actually pretty simple.

2 comments |

Haskell daemons

Posted by Andre Thu, 11 Dec 2008 12:43:00 GMT

Out of boredom, I wrote a daemonize function in Haskell, following the steps in Advanced Programming in the UNIX Environment. Here it is:

module Daemon where

import System.Directory
import System.Exit
import System.Posix.Files
import System.Posix.IO
import System.Posix.Process
import System.Posix.Signals

daemonize :: IO () -> IO ()
daemonize f = do
  omask <- setFileCreationMask 0
  pid <- forkProcess child
  exitSuccess

  where

    child :: IO ()
    child = do
      sess <- createSession
      ohandler <- installHandler sigHUP Ignore Nothing
      forkProcess grandChild
      exitSuccess

    grandChild :: IO ()
    grandChild = do
      setCurrentDirectory "/"
      devNullFd <- openFd "/dev/null" ReadWrite Nothing defaultFileFlags
      mapM_ (closeAndDupTo devNullFd) [stdInput, stdOutput, stdError]
      f
      where
        closeAndDupTo dupFd fd = closeFd fd >> dupTo dupFd fd

3 comments |

Intermediate Haskell

Posted by Andre Wed, 26 Nov 2008 17:44:00 GMT

Some time ago I found this list of Haskell intermediate-level exercises. I found it interesting because this is something that is largely missing in Haskell tutorials or text books, since in general there’s a huge leap from the basics to the more advanced stuff. It is also a good way for getting used to working with monads without being scared by them (although having had some contact with their use helps with the exercises).

Anyway, the first time I saw the exercises I did some of them and stopped due to lack of time. This time I managed to finish them, and one thing I noticed was that I had to do some thinking and spend time redoing some of the exercises I had already done, so I decided to write these notes as a future reference for myself, and maybe for someone who happens to be stuck solving any of them. I hope I don’t alienate my usual audience with this post.

This post is literate Haskell, so you should be able to save this post and compile it as it is.

> module Intermediate where

We’re given a type class Fluffy, which requires its instances to implement a function called furry. This function takes as arguments a function from type a to type b and a Fluffy thing of type a, and returns a Fluffy thing of type b.

> class Fluffy f where
>   furry :: (a -> b) -> f a -> f b

The first exercise is to create an instance of Fluffy for lists. The best way to proceed in these exercises is to write down the types for the function for each instance. So for lists we have

furry :: (a -> b) -> [a] -> [b]

We take a function from a to b and a list of things of type a, and return a list of type b. In other words, we apply the function to each element of the list, collecting the results in another list. But hey, that’s just the definition of map from the Prelude! So we simply do

> instance Fluffy [] where
>   furry = map

Exercise two asks the same for Maybe. Writing the types, we have

furry :: (a -> b) -> Maybe a -> Maybe b

So we clearly have to apply the function to whatever’s inside the Maybe argument (if anything), and pack the result into Maybe again. We do that with pattern matching:

> instance Fluffy Maybe where
>   furry f (Just x) = Just (f x)
>   furry f Nothing  = Nothin

The third exercise asks the same for the type ((->) t). We write down the types as

furry :: (a -> b) -> (->) t a -> (->) t b

which can also be written

furry :: (a -> b) -> (t -> a) -> (t -> b)

Using that notation it’s easy to see we have two functions as arguments, one from type a to b and one from type t to a, and as a result we want a function from t to b. We can achieve that by calling the first function with the result of applying the second one as its argument. In code,

furry f g x = f (g x)

which means furry is the same as function composition:

> instance Fluffy ((->) t) where
>   furry = (.)

For the next exercises, a new type is defined.

> newtype EitherLeft b a  = EitherLeft (Either a b)
> newtype EitherRight a b = EitherRight (Either a b)

We are then asked to write instances of Fluffy for these two types. This can be somewhat confusing. Let’s write the types for EitherLeft:

furry :: (a -> b) -> EitherLeft t a -> EitherLeft t b

Recall that the Either type is defined as

data Either a b = Left a | Right b

Looking at the definition of EitherLeft, we see that type a ends up in the Left data constructor of Either. This is the argument that will be turned into something of type b by the application of the function we’re given in the first argument. We’re going to use two equations to define furry, matching for each of the data constructors of the Either type.

> instance Fluffy (EitherLeft t) where
>   furry f (EitherLeft (Left x))  = EitherLeft (Left (f x))
>   furry f (EitherLeft (Right x)) = EitherLeft (Right x)

The instance for EitherRight follows the same logic, so I’ll just show the code here.

> instance Fluffy (EitherRight t) where
>   furry f (EitherRight (Left x))  = EitherRight (Left x)
>   furry f (EitherRight (Right x)) = EitherRight (Right (f x))

For the next exercises, we’re given a type class, called Misty. It requires its instances to implement two functions, banana and unicorn, and provides a function called furry’.

Let’s start with unicorn because it’s simpler. From its type we can see that its purpose is to take a thing and turn it into a Misty thing. Now banana is a bit more complicated. It takes a function from things of type a to things of type m b (i.e. Misty things) and a Misty thing containing something of type a, and returns a Misty thing containing something of type b. From that we can see that it takes a Misty thing, unwraps it, turns whatever is inside into something of another type, and covers it in mist again.

Our job is to write furry’, which has a type similar to banana’s, except that the function it takes as an argument is from a to b, and not to m b.

What one should notice is that we can turn the (a -> b) function into an (a -> m b) function by use of unicorn. Say the (a -> b) function is called f. Then we can write an anonymous function of type (a -> m b) as

\x -> unicorn (f x)

This fits perfectly as an argument to banana, so our implementation of furry’ becomes

furry' f m = banana (\x -> unicorn (f x)) m

Turning it into point-free style we have

> class Misty m where
>   banana :: (a -> m b) -> m a -> m b
>   unicorn :: a -> m a
>
>   furry' :: (a -> b) -> m a -> m b
>   furry' f = banana (unicorn . f)

Now we’re asked to implement a Misty instance for lists. The unicorn function is really simple. Its type is

unicorn :: a -> [a]

To turn something into a list, we just create a singleton list with that thing as its element.

Let’s have a look at banana’s type:

banana :: (a -> [b]) -> [a] -> [b]

We have two arguments, a function and a list. We can see that we can apply the function to each element of the list, so this is somewhat like map, but because of the return type of this function, we end up with a list of lists of things of type b, which is not quite what we want (we want a list of things of type b). The solution is simple: we concatenate the list returned by map.

banana f xs = concat $ map f xs

We write it more simply in the code below, using Haskell’s concatMap function which does just that.

> instance Misty [] where
>   banana = concatMap
>   unicorn x = [x]

The Misty instance for Maybe is equaly simple. Checking the types, we have

unicorn :: a -> Maybe a
banana  :: (a -> Maybe b) -> Maybe a -> Maybe b

The unicorn function is trivial. To make a Maybe something out of something, we Just do it. The banana function is easy too. We extract the value from the Just case by pattern matching, and simply call the function passed as an argument on it. The Nothing case is even simpler, as we just have to return Nothing.

> instance Misty Maybe where
>   banana f (Just x) = f x
>   banana f Nothing  = Nothing
>   unicorn = Just

The next exercise asks for a Misty instance for ((->) t). Writing down the types we have

banana  :: (a -> (t -> b)) -> (t -> a) -> (t -> b)
unicorn :: a -> (t -> a)

Once again banana is the hard one. By inspecting the types, we can see that if we compose the two functions given as arguments, call them f and g, we have a function whose type is t -> (t -> b). This is almost what we want, since the return type of banana is t -> b. What we need then is to apply f.g to get the desired type.

The unicorn function is trivial again: we just need to wrap its argument in an anonymous function which will return it whatever it’s given.

The code for both of them follows.

> instance Misty ((->) t) where
>   banana f g = \x -> ((f.g) x) x
>   unicorn x = \_ -> x

Finally, we do the same for EitherLeft and EitherRight. I’ll only explain the Misty instance for EitherLeft, since EitherRight is entirely analogous. Let’s write down the types.

banana  :: (a -> EitherLeft t b) -> EitherLeft t a -> EitherLeft t b
unicorn :: a -> EitherLeft t a

This is really easy. To write banana, we’ll pattern-match on the second argument to extract the value carried on the Left case, and simply apply the function we’re given as an argument to it, since this gives us the correct type. In the Right case, we simply return the second argument. If you’re confused, re-read the definition of the EitherLeft type to clear things up.

The code for both EitherLeft and EitherRight follows.

> instance Misty (EitherLeft t) where
>   banana f (EitherLeft (Left x))  = f x
>   banana f (EitherLeft (Right x)) = EitherLeft (Right x)
>   unicorn x = EitherLeft (Left x)

> instance Misty (EitherRight t) where
>   banana f (EitherRight (Left x))  = EitherRight (Left x)
>   banana f (EitherRight (Right x)) = f x
>   unicorn x = EitherRight (Right x)

We’re finally over with all those instances! We’re now asked to write some utility functions. To do that, we’ll use the banana, unicorn and furry’ functions, so it’s a good thing we’re now intimate with how they work. Notice that from now on we’ll only care about the types of those functions, meaning the functions below should work with any of the Misty instances we wrote above (which is the whole point of type classes after all).

The first function is jellybean, which has type Misty m => m (m a) -> m a. So what it does is unpack a Misty thing that is itself a Misty thing. We remember that banana has type (a -> m b) -> m a -> m b. So somehow we extract something of type a out of the Misty thing in the second argument and turn it into a Misty thing of type b. If we look at the type of jellybean, though, we see that all we want is to return the extracted value itself, and we can just use Haskell’s id function for that.

> jellybean :: Misty m => m (m a) -> m a
> jellybean = banana id

The second function is called apple, with type Misty m => m a -> m (a -> b) -> m b. So it takes a Misty thing of type a, a Misty function from a to b and returns a Misty of type b. Once again we’ll take advantage of banana to extract the function from its Misty wrapper. How can we use this function though? Its type is (a -> b) but all we have is an m a which was given as an argument, and we must return an m b. It turns out we have already written that function! Check the type of furry’ again:

furry' :: (a -> b) -> m a -> m b

It’s just what we need! The code follows below.

> apple :: Misty m => m a -> m (a -> b) -> m b
> apple m = banana (\f -> furry' f m)

The next exercise asks for the implementation of the following function:

moppy :: Misty m => [a] -> (a -> m b) -> m [b]

We have a list of things of type a and a function from a to m b, and we have to return a Misty list of thing of type b. The idea here will be to apply the function to each element of the list, thus getting a list of things of type m b. Then somehow we have to turn a [m b] into a m [b]. To do that, we’ll fold over the list, extracting each element from its Misty wrapping, and inserting it into a list which will be our return value. The base case is simply an empty list with a Misty wrapping, which we can get by applying unicorn to the empty list. So for now we have something like this:

moppy xs f = foldr g (unicorn []) (map f xs)

How do we write g, though? Well, it takes two arguments, the first being a Misty thing (corresponding to each element of the list returned by map), and the second is the accumulator, which has type m [b]. To extract each value from their Misty wrappings, we’ll use the same strategy as before, using banana.

g m ms = banana (\x -> h x ms) m

So now we have access to each unwrapped element of the list. All that is left is to write h, which must insert it into the m [b] list. To do that, we must unwrap this list, insert the element and finally wrap it again, which we can do by using banana once again, and then unicorn for the wrapping.

h x ms = banana (\xs -> unicorn (x:xs)) ms

Putting it all together, and using an anonymous function instead of h, we have

> moppy :: Misty m => [a] -> (a -> m b) -> m [b]
> moppy xs f = foldr g (unicorn []) (map f xs)
>   where
>     g m ms = banana (\x -> banana (\xs -> unicorn (x:xs)) ms) m

Onto the next function, sausage. This time we have to turn a list of Misty things into a Misty list of things. But wait, haven’t we just done something like this in moppy? We did, so we can reuse it now. The idea is very simple: the first argument to moppy is simply the list of Misty things. The second argument in our case will be a function of type (m a -> m b). However, here we’re not interested in doing any transformation on the value wrapped by Misty, so it simplifies to (m a -> m a), which can be satisfied by the simplest of all functions, id. We use flip to write sausage in point-free style.

> sausage :: Misty m => [m a] -> m [a]
> sausage = flip moppy $ id

The next function we have to write is called banana2. It takes a function of type (a -> b -> c), something of type m a and something else of type m b, and returns something of type m c. Before trying to write anything, let’s have a look again at the type of furry’:

furry' :: (a -> b) -> m a -> m b

Quite similar, isn’t it? It’s actually the same thing, but useful when the function given as the first argument takes two arguments instead of one.

So how do we write banana2? We’ll do it in two steps. The idea of the first step is to use partial application on the two-argument function, and then use furry’ to handle the resulting one-argument function. Let’s do this part of the job and see what we get

g :: (a -> b -> c) -> m a -> m (b -> c)
g f m = furry' (\x -> f x) m

So, the return value of this is a Misty function… haven’t we seen this before? We did, this is exactly what apple takes as its second argument! So the second step is simply to use apple to finish the job. Writing it all at once,

> banana2 :: Misty m => (a -> b -> c) -> m a -> m b -> m c
> banana2 f m1 m2 = apple m2 (furry' (\x -> f x) m1)

The next two exercises ask for banana3 and banana4. This is basically the same exercise, but for three- and four-argument functions, so I’ll just show the code here.

> banana3 :: Misty m => (a -> b -> c -> d) -> m a -> m b -> m c -> m d
> banana3 f m1 m2 m3 = apple m3 (banana2 (\x -> f x) m1 m2)

> banana4 :: Misty m => (a -> b -> c -> d -> e) -> m a -> m b -> m c
          -> m d -> m e
> banana4 f m1 m2 m3 m4 = apple m4 (banana3 (\x -> f x) m1 m2 m3)

Now we’re given yet another type:

> newtype State s a = State {
>   state :: (s -> (s, a))
> }

This is a record with a single field, a function takes something of type s and returns a tuple of type (s, a). There’s a reason this type is called State. The state function receives an argument, the current state, and returns a tuple containing a new state and a value of type a. On a subsequent call of state, this new state should be given as its parameter.

The canonical example of this is the generation of random numbers. Here the state is a random number generator, and the returned value is a random number. A new RNG is also returned, because if we always used the same generator, we’d get the same "random" number every time.

The next exercise asks for us to implement an instance of Fluffy for State s. Once again we’ll write down the type for furry.

furry :: (a -> b) -> State s a -> State s b

So we’re given a function and a State s a. We have to extract the value of type a from the state, apply the function to it, and then pack it back into a state. Let’s start with the wrapping part this time:

furry f st = State $ \s -> g st s

We still have to write g though. By inspecting the types, we see that g must have type State s a -> s -> (s, a), because the whole thing must have type s -> (s, a). By using pattern matching on the State, writing g becomes easy:

g :: State s a -> s -> (s, a)
g (State k) s = let (t, x) = k s in (t, f x)

Putting it all together,

> instance Fluffy (State s) where
>   furry f (State k) = State $ \s -> let (t, x) = k s in (t, f x)

Finally, the last exercise: we have to write a Misty instance for State s. Writing down the types, we have

banana  :: (a -> State s b) -> State s a -> State s b
unicorn :: a -> State s a

The unicorn function is quite simple. All we have to do is wrap the given value in State according to the type definitions.

The banana function is similar to furry. Instead of a function from a to b, we have a function from a to State s b, so we have to use the state function to get a tuple in the return value of the anonymous function. The code follows.

> instance Misty (State s) where
>   banana f (State k) = State $ \s -> let (t, x) = k s in state (f x) t
>   unicorn x = State $ \s -> (s, x)

And that’s it… I guess I can call myself an intermediate-level Haskeller now  This was a huge post, so there are probably some mistakes in the explanations, and while I tried to keep them simple and avoid technical terms where possible (Misty things, etc.), there are certainly points where it’s better to simply look at the code to understand the meaning. When in doubt, trust the code!

2 comments |