
On Fri, Oct 09, 2009 at 05:48:15PM -0600, Luke Palmer wrote:
To: mf-hcafe-15c311f0c@etc-network.de Cc: From: Luke Palmer
Date: Fri, 9 Oct 2009 17:48:15 -0600 Subject: Re: [Haskell-cafe] How do I get this done in constant mem? On Fri, Oct 9, 2009 at 2:05 PM,
wrote: Hi all,
I think there is something about my use of the IO monad that bites me, but I am bored of staring at the code, so here you g. The code goes through a list of records and collects the maximum in each record position.
-- test.hs import Random import System.Environment (getArgs) import System.IO (putStr)
samples :: Int -> Int -> IO [[Double]] samples i j = sequence . replicate i . sequence . replicate j $ randomRIO (0, 1000 ** 3)
Yes, you should not do this in IO. That requires the entire computation to finish before the result can be used. This computation should be pure and lazy.
Yeah. I also got an excellent reason via private mail why sequence has to be strict: sequence [Maybe 3, Maybe 4, Nothing] = Nothing sequence [Maybe 3, Maybe 4] = Just [3, 4]
maxima :: [[Double]] -> [Double] maxima samples@(_:_) = foldr (\ x y -> map (uncurry max) $ zip x y) (head samples) (tail samples)
FWIW, This function has a beautiful alternate definition:
maxima :: [[Double]] -> [Double] maxima = map maximum . transpose
Beautiful indeed! But see below. To be honest, I don't really roll dice, but I am reading from a file. I just thought that randomRIO would be more concise, but now the discussion has gone totally in that direction. Sorry... (-: reading the random number code is more fun, though! Anyhow, I fixed my example to do lazy file processing where before I used readFile (which has to be strict, as I can see now). First, I generate a file with the samples, and then I read that file back (this is the phase I'm interested in, since my real data is not really random numbers). import List import Monad import Random import System.Environment import System.IO samples :: Int -> Int -> IO [[Int]] samples i j = sequence . replicate i . sequence . replicate j $ randomRIO (0, 1000 * 1000 * 1000) maxima :: [[Int]] -> [Int] maxima samples@(_:_) = foldr (\ x y -> map (uncurry max) $ zip x y) (head samples) (tail samples) lazyProcess :: ([[Int]] -> a) -> FilePath -> IO a lazyProcess f fileName = do h <- openFile fileName ReadMode v <- fmap (f . map read . lines) $ hGetContents h v `seq` hClose h return v mkSamples = do args <- getArgs x <- samples (read (head args)) 5 putStr . (++ "\n") . join . intersperse "\n" . map show $ x -- main = mkSamples -- ghc --make -O9 test.hs -o test && ./test 10000 > test.data main = lazyProcess length "test.data" >>= putStr . show lazyProcess (What would be a better name? foldSampleFile perhaps?) is where the IO happens, but the computation is located in a pure function. And yet, only those lines are read that are relevant, and GC on previous lines is allows if the pure function allows it. This program has constant memory usage. Unfortunately, if I replace the length function with implementation of maxima, it explodes again. I tried a few things, such as maxima'3 :: [[Int]] -> [Int] maxima'3 (h:t) = foldr (\ x y -> let v = map (uncurry max) $ zip x y in sum v `seq` v) h t with no luck so far. Tricky business, that! But much more curiously, if I replace maxima'3 in main with this maxima'4 :: [[Int]] -> [Int] maxima'4 = map maximum . transpose (with explicit type signature in both definitions), I get a 'no parse' error from Prelude.read. maxima'3 with the same file gives me a result. How can there be a difference if the type signatures are identical?! Probably something about "don't use Prelude.read" :-)? I have to play with this some more... matthias