Re: [Haskell-beginners] Re: When, if ever, does Haskell "calculate once"?

6 May 2010


      On Thursday 06 May 2010 22:49:40, Maciej Piechotka wrote:
...
On Thu, 2010-05-06 at 15:37 -0400, Brent Yorgey wrote:
...
On Thu, May 06, 2010 at 11:35:15AM -0700, Travis Erdman wrote:
...
Two questions here, I think they might be related, perhaps even the
same, but I am not sure, so I will ask both:
...
Q1:  Re the function f below, I like the first implementation as
it's "cleaner", but is the second implementation necessary for
performance purposes?
...
-- g = some CPU intensive function
-- alternate 1
f a b = c + (g b)
    where
        c = dosomethingelse a (g b)
-- alternate 2
f a b = c + saveit
    where
        saveit = g b
        c = dosomethingelse a saveit
You need alternative 2.  In general, GHC (and, I imagine, other
Haskell compilers) do not do much common subexpression elimination,
because in some cases it can be a *pessimization*.  The classic
example is
length [1..1000000] + length [1..1000000]
vs
let a = [1..1000000] in length a + length a
Not a particularly fortunate example, with optimisations turned on, GHC 
shares the common subexpression (length a), so that's calculated only once:


share :: Int
share = let a = [1 .. 100000000] in length a + length a

Core:

Share.share :: GHC.Types.Int
GblId
[Str: DmdType]
Share.share =
  case GHC.List.$wlen @ GHC.Integer.Type.Integer Share.share_a 0
  of ww_amc { __DEFAULT ->
  GHC.Types.I# (GHC.Prim.+# ww_amc ww_amc)
  }

Consider 

let a = [1 .. 100000000] in length a + sum a

vs

length [1 .. 100000000] + sum [1 .. 100000000]
...
...
The first will execute in constant space, since each list will be
lazily generated as needed by the length function and then the garbage
collector will come along behind length and get rid of the nodes that
have already been processed.  However, in the second expression, the
garbage collector cannot get rid of the nodes that are already
processed by the first call to length, since the second call to length
still needs the list.  So the entire list [1..1000000] will end up
being in memory at once.
Hmm:
...
cFoldr :: (a -> b -> b) -> (a -> c -> c) -> (b, c) -> [a] -> (b, c)
cFoldr f g ~(b, c) []     = (b, c)
cFoldr f g ~(b, c) (x:xs) = let (b', c') = cFoldr f g (b, c) xs
                            in (f x b', g x c')
cFoldl' :: (b -> a -> b) -> (c -> a -> c) -> (b, c) -> [a] -> (b, c)
cFoldl' f g bc xs0 = lgo bc xs0
                     where lgo (b, c) [] = (b, c)
                           lgo (b, c) (x:xs) =
                               let b' = f b x
                                   c' = g c x
                                   bc' = b' `seq` c' `seq` (b', c')
                               in bc' `seq` lgo bc' xs
No.

cFoldl' f g (b0,c0) xs0 = lgo b0 c0 xs0
    where
      lgo b c [] = (b,c)
      lgo !b !c (x:xs) = lgo (f b x) (g c x) xs
...
...
lengthFold :: Int -> a -> Int
lengthFold !n _ = n + 1
size = 100000000
main = let a = [1..size]
           l = length a + length a
      in print $! l
200000000
   8,031,190,480 bytes allocated in the heap
64-bit system? I get

200000000                                                          
   4,015,621,900 bytes allocated in the heap                       
         187,840 bytes copied during GC                            
           1,472 bytes maximum residency (1 sample(s))             
          29,892 bytes maximum slop                                
               1 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:  7659 collections,     0 parallel,  0.10s,  0.09s elapsed
  Generation 1:     1 collections,     0 parallel,  0.00s,  0.00s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    3.38s  (  3.41s elapsed)
  GC    time    0.10s  (  0.09s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    3.48s  (  3.51s elapsed)

  %GC time       2.9%  (2.7% elapsed)

  Alloc rate    1,187,979,304 bytes per MUT second

  Productivity  97.1% of total user, 96.4% of total elapsed
...
741,264 bytes copied during GC
           1,920 bytes maximum residency (1 sample(s))
          28,216 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)
Generation 0: 15318 collections,     0 parallel,  0.14s,  0.18s
elapsed
  Generation 1:     1 collections,     0 parallel,  0.00s,  0.00s
elapsed
INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    7.15s  (  7.26s elapsed)
  GC    time    0.14s  (  0.18s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    7.29s  (  7.44s elapsed)
%GC time       1.9%  (2.5% elapsed)
Alloc rate    1,123,257,562 bytes per MUT second
Productivity  98.1% of total user, 96.1% of total elapsed
./a.out +RTS -s  7.29s user 0.05s system 98% cpu 7.450 total
...
main = print $! length [1..size] + length [1..size]
200000000
  16,062,318,576 bytes allocated in the heap
       1,476,904 bytes copied during GC
           2,000 bytes maximum residency (1 sample(s))
          27,992 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)
Generation 0: 30637 collections,     0 parallel,  0.29s,  0.37s
elapsed
  Generation 1:     1 collections,     0 parallel,  0.00s,  0.00s
elapsed
INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   15.57s  ( 15.90s elapsed)
  GC    time    0.29s  (  0.37s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   15.87s  ( 16.27s elapsed)
%GC time       1.8%  (2.3% elapsed)
Alloc rate    1,031,313,475 bytes per MUT second
Productivity  98.2% of total user, 95.7% of total elapsed
./a.out +RTS -s  15.87s user 0.11s system 98% cpu 16.272 total
And that is strange, because I get the same figures for that one as for the 
first (times differ by a few hundredths of a second).
Is that a difference between 32-bit code generator and 64-bit or between 
GHC versions (6.12.2 here, but 6.10.3 gives roughly the same results)?
...
...
main =
  print $! uncurry (+) (cFoldl' lengthFold lengthFold
                                (0, 0) [1..size])
And that gives the same figures as the other two (plus/minus 0.05s).
...
All are compiled with optimizations.
All compiled with -O2.

Re: [Haskell-beginners] Re: When, if ever, does Haskell "calculate once"?

Daniel Fischer