Daniel,
thanks for your explanations! For some time, I thaught laziness might be some kind of holy grail. But now I realize that it takes both a lot of time to understand properly and that it has more limitations than I expected. Another example for this: I wrote a function
recursiveContents :: FilePath -> IO [FilePath]
and later on I thaught it was possible to run some actions on every file from the list, in a lazy fashion. Then I had to find out that all files had to be found before I could process the first one. By now, I think I understand why, but it stops me from being too enthusiastic about laziness :-(
Regards
Tim
On Saturday 29 January 2011 15:16:41, Tim Baumgartner wrote:We'd all love a compiler clever enough to transform simple high-level
> Hi everybody,
>
> after reading a post by Don Stewart on
> http://cgi.cse.unsw.edu.au/~dons/blog/2008/05/16#fast, I was not really
> satisfied with the necessity to rewrite
>
> mean xs = sum xs / fromIntegral (length xs)
>
> to an explicitly recursive function.
specifications into efficient code. But that's very hard and while we're
waiting, we have to go lower-level sometimes.
Yes.
> I investigated a bit further and
> got really confused after the following GHCi session:
>
> Prelude> length [1..10^8] + length [1..10^8]
> 200000000
> Prelude> let xs = [1..10^8]
> Prelude> length xs + length xs
> <interactive>: out of memory
>
> Can someone explain this
creates two lists, each is generated lazily and garbage collected as it is
Prelude> length [1 .. 10^8] + length [1 .. 10^8]
consumed by length, so the entire calculation runs in constant space, just
not particularly fast.
Since you haven't turned off the monomorphism restriction in ghci
(otherwise the computation would run in constant space),
defines a list xs :: [Integer] (the definition yields the constraints Num a
Prelude> let xs = [1 .. 10^8]
and Enum a, per the monomorphism restriction, the type variable a defaults
to Integer), which will be stored for later reuse.
then starts calculating the first length. That means xs will be completely
Prelude> length xs + length xs
evaluated in the course of that (if there is sufficient memory). Since the
list is bound to a name in scope, it doesn't get garbage collected, so with
enough memory, you'll have a 100 million long list of Integers in your
memory (I always forget how many words per cons cell are required, but
that's something like 400 or 500 million words, 1.6 or 2 / 3.2 or 4 GB,
depending on whether you're on a 32-bit or a 64-bit system, iirc).
In your case, that exhausts your RAM, otherwise calculating the second
length would be faster since the list need not be created a second time.
With the monomorphism restriction turned off,
(0.04 secs, 9183720 bytes)
Prelude> let xs = [1 .. 10^8]
Prelude> length xs + length xs200000000
(8.02 secs, 8033302172 bytes)
Prelude> :t xs
xs :: (Enum a, Num a) => [a]
xs remains a polymorphic value, so doesn't get shared across computations
(since each computation could use it at a different type).
But.
Trying such stuff at the ghci prompt doesn't always tell you much about how
compiled code behaves.
For example:
module Length where
val :: Int
val = length [1 .. 10^8] + length [1 .. 10^8]
compiled with -O, gives the core (other stuff snipped)
Length.val :: GHC.Types.Int
[GblId,
Str=DmdType,
Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
ConLike=False, Cheap=False, Expandable=False,
Guidance=IF_ARGS [] 6 2}]
Length.val =
case GHC.List.$wlen @ GHC.Integer.Type.Integer Length.val1 0
of ww_ayU { __DEFAULT ->
GHC.Types.I# (GHC.Prim.+# ww_ayU ww_ayU)
}
meaning that the value "length [1 .. 10^8]" is reused, thus giving only one
list traversal (and the list can be garbage-collected as it is consumed).
No. Just a few rules of thumb.
> and give a simple, reasonable rule by which I
> can predict such things in the future?
If an expression gets a name (by a top level definition or a let/where
binding), it is reused (unless polymorphism prohibits that, that's one of
the reasons for the MR, without the MR, stuff expected to be shared will be
recomputed).
GHC does little common subexpression elimination, so if an expression
contains two identical subexpressions, they may not be shared. Sharing is
more likely if the subexpressions are simple and 'close together' (for very
fuzzy values of simple and close).
Unfortunately, one thing that GHC spots well is enumFromTo
(sugared: [a .. b]), meaning that long lists are often shared if they
shouldn't be for memory reasons.
Yes, it is.
> Isn't it a pitfall?