
Hi Aleksandar,
On Tue, May 31, 2011 at 6:10 PM, Aleksandar Dimitrov wrote:
> Say, we have an input file that contains one word per line. I want to find all unigrams (unique words) in that file and associate with them the number of times they occurred in the file. This would allow me, for example, to make a list of word frequencies in a given text.
Here's how I would do it:

{-# LANGUAGE BangPatterns #-}
module Ngram (countUnigrams) where

import qualified Data.ByteString as S
import qualified Data.HashMap.Strict as M
import System.IO

foldLines :: (a -> S.ByteString -> a) -> a -> Handle -> IO a
foldLines f z0 h = go z0
  where
    go !z = do
      eof <- hIsEOF h
      if eof
        then return z
        else do
          line <- S.hGetLine h
          go $ f z line
{-# INLINE foldLines #-}

-- Example use
countUnigrams :: IO (M.HashMap S.ByteString Int)
countUnigrams = foldLines (\ m s -> M.insertWith (+) s 1 m) M.empty stdin
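To actually run it you need a small driver. A minimal sketch (the Main module below and the choice to print only the map size are mine, just for illustration; it assumes the unordered-containers package is installed):

module Main (main) where

import qualified Data.HashMap.Strict as M
import Ngram (countUnigrams)

-- Count the words on stdin and report how many distinct words were seen.
main :: IO ()
main = do
  m <- countUnigrams
  putStrLn $ "distinct words: " ++ show (M.size m)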
> RANT
> I have tried and tried again to avoid writing programs in Haskell that leak space like BP likes to leak oil. However, I have yet to produce a single program that does anything at all and at the same time consumes less memory than there is actual data in the input file.
> It is very disconcerting to me that I seem to be unable, even after quite some practice, to identify space leaks in trivial programs like the one above. I know of no good resource to educate myself in that regard. I have read the GHC manual, RWH's chapter on profiling, and "Inside T5"'s recent series on the Haskell heap, but no dice. Even if I can clearly see the exact line where at least some of the leaking happens (as I can in this case), it seems impossible for me to prevent it.
> *thank you very much* for reading this far. This is probably a mostly useless email anyhow; I just had to get it off my chest. Maybe, just maybe, someone among you will have a crucial insight that will save Haskell for me :-) But currently I see no justification not to start my next project in Lua, Python or Java. Sure, Haskell code is pretty, and it's fun, but if I can't actually *run* it, why bother? (Yes, this isn't the first time I've run into this problem …)
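To make the thunk build-up a bit more concrete, here's a sketch of my own (the module name and the list-based framing are just for illustration; it assumes the unordered-containers package): the leak in code like this typically comes from accumulating unevaluated (+) applications inside the map, which is what a lazy insertWith does and the strict one avoids.

module LeakDemo where

import Data.List (foldl')
import qualified Data.ByteString as S
import qualified Data.HashMap.Lazy as HL
import qualified Data.HashMap.Strict as HS

-- Leaky: a lazy fold over a lazy map defers all the (+) applications,
-- so memory grows with the size of the input.
leakyCount :: [S.ByteString] -> HL.HashMap S.ByteString Int
leakyCount = foldl (\m s -> HL.insertWith (+) s 1 m) HL.empty

-- Strict: foldl' forces the map at every step and the strict insertWith
-- forces each count, so memory stays proportional to the number of
-- distinct words rather than the number of lines.
strictCount :: [S.ByteString] -> HS.HashMap S.ByteString Int
strictCount = foldl' (\m s -> HS.insertWith (+) s 1 m) HS.empty

The bang pattern in foldLines plays the same role as foldl' here: it forces the accumulator on every line instead of deferring all the work until the end.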
We definitely need more accessible material on how to reliably write fast Haskell code. There are those among us who can, but it shouldn't be necessary to learn it the way they did (i.e. by lots of tinkering, learning from the elders, etc.). I'd like to write a 60-page (or so) tutorial on the subject, but haven't found the time. In addition to RWH, perhaps the slides from the talk I gave on high-performance Haskell could be useful: http://blog.johantibell.com/2010/09/slides-from-my-high-performance-haskell....

Cheers,
Johan