sequence causing stack overflow on pretty small lists

On #haskell we recently had a discussion about the following: import System.Random list <- replicateM 1000000 randomIO :: IO [Int] I would think that this gives us a list of a million random Ints. In fact, this is what happens in ghci. But with ghc we get: Stack space overflow: current size 8388608 bytes. Use `+RTS -Ksize -RTS' to increase it. This is because sequence is implemented as sequence (m:ms) = do x <- m xs <- sequence ms return (x:xs) and uses stack space when used on some [IO a].
From a theoretical side, this is an implementation detail. From the software engineering side this disastrous because the code is
* obviously correct by itself * the first thing people would come up with * not exaggerating: a million elements is not much * used a lot of places: mapM, replicateM are *everywhere* and yet it will kill our programs, crash our airplanes, and give no helpful information where the problem occurred. Effectively, sequence is a partial function. (Note: We are not trying to obtain a lazy list of random numbers, use any kind of streaming or the likes. We want the list in memory and use it.) We noticed that this problem did not happen if sequence were implemented with a difference list. What do you think about this? Should we "fix" functions like this, probably trading off a small performance hit, or accept that idiomatic Haskell code can crash at any time?

As an example that this actually makes problems in production code, I found this in the wildlife: https://github.com/ndmitchell/shake/blob/e0e0a43/Development/Shake/Database.... -- Do not use a forM here as you use too much stack space bad <- (\f -> foldM f [] (Map.toList status)) $ \seen (i,v) -> ... I could bet that there is a lot of code around on which we rely, which has the same problem but does not go that far in customisation.

The `mwc-random` package solves this specific problem by providing a function that creates a vector of random integers. The `vector` package solves this in general by letting you use mutation to store intermediate results in order to avoid a space leak. On 08/26/2013 01:46 AM, Niklas Hambüchen wrote:
On #haskell we recently had a discussion about the following:
import System.Random
list <- replicateM 1000000 randomIO :: IO [Int]
I would think that this gives us a list of a million random Ints. In fact, this is what happens in ghci. But with ghc we get:
Stack space overflow: current size 8388608 bytes. Use `+RTS -Ksize -RTS' to increase it.
This is because sequence is implemented as
sequence (m:ms) = do x <- m xs <- sequence ms return (x:xs)
and uses stack space when used on some [IO a].
From a theoretical side, this is an implementation detail. From the software engineering side this disastrous because the code is
* obviously correct by itself * the first thing people would come up with * not exaggerating: a million elements is not much * used a lot of places: mapM, replicateM are *everywhere*
and yet it will kill our programs, crash our airplanes, and give no helpful information where the problem occurred.
Effectively, sequence is a partial function.
(Note: We are not trying to obtain a lazy list of random numbers, use any kind of streaming or the likes. We want the list in memory and use it.)
We noticed that this problem did not happen if sequence were implemented with a difference list.
What do you think about this? Should we "fix" functions like this, probably trading off a small performance hit, or accept that idiomatic Haskell code can crash at any time?
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

On Mon, Aug 26, 2013 at 1:46 AM, Niklas Hambüchen
This is because sequence is implemented as
sequence (m:ms) = do x <- m xs <- sequence ms return (x:xs)
and uses stack space when used on some [IO a].
This problem is not due to sequence, which doesn't need to add any strictness here. It occurs because the functions in System.Random are excessively lazy. In particular, randomIO returns an unevaluated thunk.

On Mon, Aug 26, 2013 at 3:05 PM, Bryan O'Sullivan
On Mon, Aug 26, 2013 at 1:46 AM, Niklas Hambüchen
wrote: This is because sequence is implemented as
sequence (m:ms) = do x <- m xs <- sequence ms return (x:xs)
and uses stack space when used on some [IO a].
This problem is not due to sequence, which doesn't need to add any strictness here. It occurs because the functions in System.Random are excessively lazy. In particular, randomIO returns an unevaluated thunk.
It doesn't have to do with System.Random. import Control.Monad {-# NOINLINE a #-} a :: IO Int a = return 1 main = do list <- replicateM 1000000 a :: IO [Int] return () will produce a stack overflow, regardless of optimization level. sequence tends to be tail-recursive for monads like Reader and (lazy) State, but not for monads like Maybe or IO where (>>=) must pattern-match on its first argument. Regards, Reid Barton

Maybe an unlimited stack size should be the default? As far as I understand, the only negative effect would be that some programming mistakes would not result in a stack overflow. However, I doubt the usefulness of that: * It already depends a lot on the optimisation level * If you do the same thing in a slightly different way, and you allocate on the heap instead of on the stack you will not get it either

On Mon, Aug 26, 2013 at 4:46 AM, Niklas Hambüchen
On #haskell we recently had a discussion about the following:
import System.Random
list <- replicateM 1000000 randomIO :: IO [Int]
I would think that this gives us a list of a million random Ints. In fact, this is what happens in ghci. But with ghc we get:
Stack space overflow: current size 8388608 bytes. Use `+RTS -Ksize -RTS' to increase it.
You can use ContT to force the function to use heap instead of stack space, e.g. runContT (replicateM 1000000 (lift randomIO)) return

This is somewhat related: http://ghc.haskell.org/trac/ghc/ticket/4219
This also solves the concrete problem you gave in your original post
(in reverse order):
import Control.Monad
import System.Random
sequencel :: Monad m => [m a] -> m [a]
sequencel = foldM (\tail m -> (\x -> return $ x : tail) =<< m) []
main :: IO ()
main = print =<< sequencel (replicate 1000000 (randomIO :: IO Integer))
Following on Reid's point, maybe it's worth noting in the
documentation that replicateM, mapM, and sequence are not tail
recursive for Monads that define (>>=) as strict in the first
argument?
On Tue, Aug 27, 2013 at 6:07 AM, Niklas Hambüchen
On 27/08/13 20:37, Patrick Palka wrote:
You can use ContT to force the function to use heap instead of stack space, e.g. runContT (replicateM 1000000 (lift randomIO)) return
That is interesting, and works.
Unfortunately its pure existence will not fix sequence, mapM etc. in base.
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries
-- Love in Jesus Christ, John Alfred Nathanael Chee http://www.biblegateway.com/ http://web.cecs.pdx.edu/~chee/

IMHO it's perfectly reasonable to expect sequence/replicateM/mapM to be able to handle a list of ~1e6 elements in the Unescapable Monad (i.e. IO). All the alternate implementations in the world won't be as handy as Prelude.sequence, and no amount of documentation will prevent people from running into this headlong*. So unless there's a downside to upping the stack size limitation I'm unaware of, +1 to that suggestion from me. John [1] Most people are physically incapable of reading documents that explain why what they want to do won't work. Even if people did read the documentation, I suspect that the people most in need of the information would be the least likely to understand how it applies to their situation. On Tue, Aug 27, 2013 at 9:19 PM, John Alfred Nathanael Chee < cheecheeo@gmail.com> wrote:
This is somewhat related: http://ghc.haskell.org/trac/ghc/ticket/4219
This also solves the concrete problem you gave in your original post (in reverse order):
import Control.Monad import System.Random
sequencel :: Monad m => [m a] -> m [a] sequencel = foldM (\tail m -> (\x -> return $ x : tail) =<< m) []
main :: IO () main = print =<< sequencel (replicate 1000000 (randomIO :: IO Integer))
Following on Reid's point, maybe it's worth noting in the documentation that replicateM, mapM, and sequence are not tail recursive for Monads that define (>>=) as strict in the first argument?
On 27/08/13 20:37, Patrick Palka wrote:
You can use ContT to force the function to use heap instead of stack space, e.g. runContT (replicateM 1000000 (lift randomIO)) return
That is interesting, and works.
Unfortunately its pure existence will not fix sequence, mapM etc. in
On Tue, Aug 27, 2013 at 6:07 AM, Niklas Hambüchen
wrote: base. _______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries
-- Love in Jesus Christ, John Alfred Nathanael Chee http://www.biblegateway.com/ http://web.cecs.pdx.edu/~chee/
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

On Tue, 27 Aug 2013, John Lato wrote:
[1] Most people are physically incapable of reading documents that explain why what they want to do won't work. Even if people did read the documentation, I suspect that the people most in need of the information would be the least likely to understand how it applies to their situation.
Plus: I don't expect that programmers read the documentation of 'sequence' and 'mapM' again every time they use the function.
participants (8)
-
Bryan O'Sullivan
-
Gabriel Gonzalez
-
Henning Thielemann
-
John Alfred Nathanael Chee
-
John Lato
-
Niklas Hambüchen
-
Patrick Palka
-
Reid Barton