To seq or not to seq, that is the question

9 Mar 2013

Are these equivalent? If not, under what circumstances are they not
equivalent? When should you use each?

    evaluate a >> return b
    a `seq` return b
    return (a `seq` b)
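
As a rough way to see one observable difference, here is a minimal sketch in IO where 'a' is a bottom. The names 'form1', 'form2', 'form3', 'Outcome' and 'classify' are made up for illustration and are not part of any library.

```haskell
import Control.Exception (SomeException, evaluate, try)

a :: Int
a = error "a was forced"

b :: Int
b = 1

-- The three forms from the question, specialised to IO.
form1, form2, form3 :: IO Int
form1 = evaluate a >> return b
form2 = a `seq` return b
form3 = return (a `seq` b)

-- Hypothetical result type: when, if ever, did the exception escape?
data Outcome = ThrewWhenRun | ThrewWhenForced | Fine
  deriving (Eq, Show)

-- Run the action under 'try'; if it succeeds, force its result under 'try'.
classify :: IO Int -> IO Outcome
classify act = do
  r <- try act :: IO (Either SomeException Int)
  case r of
    Left _ -> pure ThrewWhenRun
    Right v -> do
      r' <- try (evaluate v) :: IO (Either SomeException Int)
      pure (either (const ThrewWhenForced) (const Fine) r')

main :: IO ()
main = mapM classify [form1, form2, form3] >>= print
```

This only separates the third form from the first two: it returns a thunk that blows up later, when the result is forced. The first two both fail when the action runs, so this test cannot tell them apart; what differs between them is the ordering guarantee relative to other IO actions.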

Furthermore, consider:

    - Does the answer change when a = b? In such a case, is 'return $! b' permissible?
    - What about when b = () (i.e. unit)?
    - What about when 'return b' is some arbitrary monadic value?
    - Does the underlying monad (e.g. if it is IO) make a difference?
    - What if you use pseq instead of seq?
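
To make the "does the monad matter" question concrete, here is a small sketch in Maybe, where there is no 'evaluate' to reach for. The names 'outer', 'inner', 'strictReturn' and 'whnfThrows' are hypothetical and exist only for this example.

```haskell
import Control.Exception (SomeException, evaluate, try)

a :: Int
a = error "a was forced"

b :: Int
b = 1

-- Forces 'a' before the Just constructor is even built.
outer :: Maybe Int
outer = a `seq` return b

-- Builds Just immediately; 'a' is forced only if the payload is forced.
inner :: Maybe Int
inner = return (a `seq` b)

-- By the definition of ($!), this unfolds to: b `seq` return b
strictReturn :: Maybe Int
strictReturn = return $! b

-- Hypothetical helper: does evaluating the Maybe value to WHNF throw?
whnfThrows :: Maybe Int -> IO Bool
whnfThrows m = do
  r <- try (evaluate m) :: IO (Either SomeException (Maybe Int))
  pure (either (const True) (const False) r)

main :: IO ()
main = do
  whnfThrows outer >>= print
  whnfThrows inner >>= print
  print strictReturn
```

Here 'outer' is itself a bottom while 'inner' is a perfectly good Just wrapping a bottom, so the two differ regardless of IO. This says nothing about ordering, which is where the choice of monad starts to matter.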

In http://hackage.haskell.org/trac/ghc/ticket/5129 we saw a bug in
'evaluate' deriving precisely from this confusion.  Unfortunately, the
insights from this conversation were never distilled into a widely
publicized set of guidelines... largely because we never really figured
out what was going on! The purpose of this thread is to figure out what is
really going on here, and develop a concrete set of guidelines which we
can disseminate widely.  Here is one strawman answer (which is too
complicated to use in practice):

    - Use 'evaluate' when you mean to say, "Evaluate this thunk to WHNF
      before doing any other IO actions, please."  Use it as much as
      possible in IO.

    - Use 'return (a `seq` b)' for strictness concerns that have no
      relation to the monad.  It avoids unnecessary strictness when the
      value ends up never being used and is good hygiene if the space
      leak only occurs when 'b' is evaluated but not 'a'.

    - Use 'return $! a' when you mean to say, "Eventually evaluate this
      thunk to WHNF, but if you have other thunks which you need to
      evaluate to WHNF, it's OK to do those first."  In particular,

        (return $! a) >> (return $! b) === a `seq` (return $! b)
                                       === a `seq` b `seq` return b
                                       === b `seq` a `seq` return b [1]

      This situation is similar for 'a `seq` return ()' and 'a `seq` m'.
      Avoid using this form in IO; empirically, you're far more likely
      to run into stupid interactions with the optimizer, and when later
      monadic values may be bottoms, the optimizer will be justified in
      its choice.  Prefer using this form when you don't care about
      ordering, or if you don't mind thunks not getting evaluated when
      bottoms show up. For non-IO monads, since everything is imprecise
      anyway, it doesn't matter.

    - Use 'pseq' only when 'par' is involved.
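
The ordering point in the strawman can be sketched as follows, with both 'a' and 'b' as bottoms carrying different messages. The names 'ordered', 'unordered' and 'whichError' are invented for this illustration.

```haskell
import Control.Exception (ErrorCall (..), evaluate, try)

a, b :: Int
a = error "a"
b = error "b"

-- 'evaluate' forces its argument when the action runs, so 'a' goes first.
ordered :: IO Int
ordered = evaluate a >> evaluate b

-- With ($!), imprecise-exception semantics permit either error to surface.
unordered :: IO Int
unordered = (return $! a) >> (return $! b)

-- Hypothetical helper: run an action and report which 'error' message escaped.
whichError :: IO Int -> IO String
whichError act = do
  r <- try act
  pure $ case r of
    Left (ErrorCall msg) -> msg
    Right _ -> "no error"

main :: IO ()
main = do
  whichError ordered >>= putStrLn    -- the exception from 'a'
  whichError unordered >>= putStrLn  -- from 'a' or 'b'; not specified
```

In practice the second line may well report "a" too, especially without optimization, but under the reasoning above the compiler would be within its rights to report "b". Only the 'evaluate' version states the intended order.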

Edward

Edward Z. Yang
