Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results

29 Sep 2004

      Malcolm Wallace wrote:
...
Graham Klyne  writes:
...
I can see that this requires the original file to be kept for 3-time 
scanning,  so enough memory for the entire file will be required.  Is that 
*the* problem to which you allude?  I can't see any other problem 
here.
Yes, it is the main problem.
<snip>
In any case, for the shootout, this is patently a different algorithm
to the one every other solution uses.  All the other languages do a
simple one-pass traversal of the file, in constant memory space.  Why
artificially handicap Haskell by insisting it do the job naively?
Just for the heck of it, I'd thought I'd try to write a naive 1-pass
version of the program.  It turned out to be 4.5 times slower than the
original...

-- compiled with:  ghc -O2 -ddump-simpl -fvia-c  -o wc_fold wc_fold.hs

import IO

main = do   file <- getContents
            putStrLn (show (foldl wc (0,0,0) file))

wc :: (Int,Int,Int) -> Char -> (Int, Int, Int)
wc (l,w,c) '\n' = (l+1,w  ,c+1)
wc (l,w,c) ' '  = (l  ,w+1,c+1)
wc (l,w,c)  x   = (l  ,w  ,c+1)

The algorithm isn't correct (it counts spaces instead of words), but
anyone have advice for improving its performance?  Is the problem
caused by the laziness of the Int's inside the tuple?  Here is the
report from ghc with the '-ddump-simpl' option.

Main.wc :: (GHC.Base.Int, GHC.Base.Int, GHC.Base.Int)
           -> GHC.Base.Char -> (GHC.Base.Int, GHC.Base.Int,
GHC.Base.Int)
[GlobalId]
Arity 2 __P Main.$wwc NoCafRefs Str: DmdType U(LLL)U(L)m
Main.wc = __inline_me (\ w :: (GHC.Base.Int,
                               GHC.Base.Int,
                               GHC.Base.Int)
                         w1 :: GHC.Base.Char ->
                         case w of w2 { (ww, ww1, ww2) ->
                         case w1 of w3 { GHC.Base.C# ww3 ->
                         case Main.$wwc ww ww1 ww2 ww3 of ww4 { (# ww5,
ww6, ww7 #) ->
                         (ww5, ww6, ww7)
                         }
                         }
                         })

Rec {
Main.$wlgo :: GHC.Base.Int
              -> GHC.Base.Int
                 -> GHC.Base.Int
                    -> [GHC.Base.Char]
                       -> (# GHC.Base.Int, GHC.Base.Int, GHC.Base.Int #)

And here is the memory allocation report generated from passing 
'+RTS -M900m -k250m -sstderr' to the executable...

875,150,472 bytes allocated in the heap
149,008,464 bytes copied during GC
250,845,060 bytes maximum residency (2 sample(s))

        478 collections in generation 0 ( 10.44s)
          2 collections in generation 1 (  0.00s)

        813 Mb total memory in use

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.93s  (  1.03s elapsed)
  GC    time   10.44s  ( 11.38s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   11.37s  ( 12.41s elapsed)

  %GC time      91.8%  (91.7% elapsed)

  Alloc rate    941,022,012 bytes per MUT second

  Productivity   8.2% of total user, 7.5% of total elapsed

Finally, here's the profiling report (-pT)...

        Wed Sep 29 15:21 2004 Time and Allocation Profiling Report (Final)

           wc_fold +RTS -M800m -k250m -pT -RTS

        total time  =        1.22 secs   (61 ticks @ 20 ms)
        total alloc = 173,502,056 bytes  (excludes profiling overheads)

COST CENTRE                    MODULE               %time %alloc

wc                             Main                  60.7   64.8
main                           Main                  39.3   35.2

                                                          individual inherited
COST CE MODULE                           no.    entries  %time %alloc %time %alloc

MAIN    MAIN                               1           0   0.0    0.0 100.0  100.0
 main   Main                             146           1  39.3   35.2 100.0  100.0
  wc    Main                             147     3048000  60.7   64.8 60.7   64.8
 CAF    GHC.Handle                       103           3   0.0    0.0 0.0    0.0
 CAF    System.Posix.Internals            81           4   0.0    0.0 0.0    0.0

Any hints appreciated,

Greg Buchholz