Re: [Haskell-cafe] How do I get this done in constant mem?

10 Oct 2009

      On Fri, Oct 09, 2009 at 05:48:15PM -0600, Luke Palmer wrote:
...
To: mf-hcafe-15c311f0c@etc-network.de
Cc: 
From: Luke Palmer 
Date: Fri, 9 Oct 2009 17:48:15 -0600
Subject: Re: [Haskell-cafe] How do I get this done in constant mem?
On Fri, Oct 9, 2009 at 2:05 PM,   wrote:
...
Hi all,
I think there is something about my use of the IO monad that bites me,
but I am bored of staring at the code, so here you g.  The code goes
through a list of records and collects the maximum in each record
position.
-- test.hs
import Random
import System.Environment (getArgs)
import System.IO (putStr)
samples :: Int -> Int -> IO [[Double]]
samples i j = sequence . replicate i . sequence . replicate j $ randomRIO (0, 1000 ** 3)
Yes, you should not do this in IO.  That requires the entire
computation to finish before the result can be used.  This computation
should be pure and lazy.
Yeah.  I also got an excellent reason via private mail why sequence
has to be strict:

sequence [Maybe 3, Maybe 4, Nothing]  = Nothing
sequence [Maybe 3, Maybe 4]           = Just [3, 4]
...
...
maxima :: [[Double]] -> [Double]
maxima samples@(_:_) = foldr (\ x y -> map (uncurry max) $ zip x y) (head samples) (tail samples)
FWIW, This function has a beautiful alternate definition:
maxima :: [[Double]] -> [Double]
maxima = map maximum . transpose
Beautiful indeed!  But see below.

To be honest, I don't really roll dice, but I am reading from a file.
I just thought that randomRIO would be more concise, but now the
discussion has gone totally in that direction.  Sorry...  (-: reading
the random number code is more fun, though!

Anyhow, I fixed my example to do lazy file processing where before I
used readFile (which has to be strict, as I can see now).  First, I
generate a file with the samples, and then I read that file back (this
is the phase I'm interested in, since my real data is not really
random numbers).

import List
import Monad
import Random
import System.Environment
import System.IO

samples :: Int -> Int -> IO [[Int]]
samples i j = sequence . replicate i . sequence . replicate j $ randomRIO (0, 1000 * 1000 * 1000)

maxima :: [[Int]] -> [Int]
maxima samples@(_:_) = foldr (\ x y -> map (uncurry max) $ zip x y) (head samples) (tail samples)

lazyProcess :: ([[Int]] -> a) -> FilePath -> IO a
lazyProcess f fileName =
    do
      h <- openFile fileName ReadMode
      v <- fmap (f . map read . lines) $ hGetContents h
      v `seq` hClose h
      return v

mkSamples = do
  args <- getArgs
  x <- samples (read (head args)) 5
  putStr . (++ "\n") . join . intersperse "\n" . map show $ x

-- main = mkSamples
-- ghc --make -O9 test.hs -o test && ./test 10000 > test.data

main = lazyProcess length "test.data" >>= putStr . show

lazyProcess (What would be a better name?  foldSampleFile perhaps?) is
where the IO happens, but the computation is located in a pure
function.  And yet, only those lines are read that are relevant, and
GC on previous lines is allows if the pure function allows it.

This program has constant memory usage.  Unfortunately, if I replace
the length function with implementation of maxima, it explodes again.
I tried a few things, such as

maxima'3 :: [[Int]] -> [Int]
maxima'3 (h:t) = foldr (\ x y -> let v = map (uncurry max) $ zip x y in sum v `seq` v) h t

with no luck so far.  Tricky business, that!  But much more curiously,
if I replace maxima'3 in main with this

maxima'4 :: [[Int]] -> [Int]
maxima'4 = map maximum . transpose

(with explicit type signature in both definitions), I get a 'no parse'
error from Prelude.read.  maxima'3 with the same file gives me a
result.  How can there be a difference if the type signatures are
identical?!

Probably something about "don't use Prelude.read" :-)?  I have to play
with this some more...

matthias

Re: [Haskell-cafe] How do I get this done in constant mem?

mf-hcafe-15c311f0c＠etc-network.de