A pure function can allocate and modify memory as long as a) it never
returns a reference to the memory or b) it never again modifies the
memory once it returns (i.e. it returns an immutable object).

That's a reasonable first approximation to the problem, yes. It gets a bit more complicated due to laziness (what if the mutation gets delayed until some later part of the output is examined?)

So, again, what is the Haskell philosophy towards using mutable data
structures in pure functions? Is it:

1. leave it to the compiler to find these kinds of opportunities
2. just use the immutable data structures - after all they are just as
good (at least asymptotically)
3. you don't want to use mutable data structures because of _____
4. it does happen, and some examples are ______

There are two ways in which Haskell encourages the use of mutable data structures in a pure way.

The first is in the inherent mutation caused by laziness. For example:

type Positive = Integer

-- trie of binary representation of positive number
-- [1] -> tOne
-- x ++ [0] -> lookup x . tEven
-- x ++ [1] -> lookup x . tOdd
data Trie a = Trie {
    tOne :: a,
tEven :: NatTrie a,
    tOdd :: NatTrie a
    }

lookupTrie :: Trie a -> Positive -> a
lookupTrie t 1 = tOne t
lookupTrie t n
    | even n = lookupTrie (tEven t) (n `div` 2)
    | otherwise = lookupTrie (tOdd t) (n `div` 2) -- div drops remainder

makeTrie :: (Positive -> a) -> Trie a
makeTrie f = Trie (f 1) e o where
    e = makeTrie $ \n -> f (2*n)
    o = makeTrie $ \n -> f (2*n + 1)

memoize :: (Positive -> a) -> (Positive -> a)
memoize = lookupTrie . makeTrie

collatz_rec :: (Positive -> Integer) -> Positive -> Integer
collatz_rec f 1 = 0
collatz_rec f n
    | even n = 1 + f (n `div` 2)
    | otherwise = 1 + f (3*n + 1)

collatz = memoize (collatz_rec collatz)

In this case, makeTrie creates a thunk, and it's only evaluated where requested by lookupTrie. You can call collatz at many different values and later calls will be much faster, as the mutation caused by lazy evaluation 'remembers' the values.

The second is by explicitly documenting that you are using a temporarily mutable structure, which is the ST monad:

instance Monad (ST s)
newSTRef :: a -> ST s (STRef s a)
readSTRef :: STRef s a -> ST s a
writeSTRef :: STRef s a -> a -> ST s ()
-- and similar interface for mutable STArrays

runST :: (forall s. ST s a) -> a -- note higher rank type

A computation in the ST monad is an impure computation that can modify memory, but only memory allocated within that same computation.

The higher rank type in runST makes it safe to do so--references from one ST computation cannot escape to any other ST computation. So even though internally some pure value might rely on an impure computation, it's safe to do so in a pure context.

Here's a sample implementation of ST:

-- DO NOT EXPORT THESE CONSTRUCTORS
newtype ST s a = ST (IO a)
newtype STRef s a = STRef { getRef :: IORef a }

runST :: (forall s. ST s a) -> a
runST (ST act) = unsafePerformIO act -- Actually safe!

newSTRef :: a -> ST s (STRef s a)
newSTRef a = ST $ liftM STRef (newIORef a)

readSTRef :: STRef s a -> ST s a
readSTRef (STRef r) = ST $ readIORef r

writeSTRef :: STRef s a -> a -> ST s ()
writeSTRef (STRef r) a = ST $ writeIORef r a

There's some usage examples at http://www.haskell.org/haskellwiki/Monad/ST

-- ryan