Daniel,

Thank you so much for helping me out with this issue!

Thanks to all the other answers from haskel-cafe members too!

As a newbie, I am not able to understand why zip and map would make a problem...

Is there any link I could read that could help me to understand why in this case
zip and map created a leak? What are some function compositions that should be
avoided when doing lazy I/O?

Regards,

Arnoldo


On Thu, Mar 11, 2010 at 11:46 PM, Daniel Fischer <daniel.is.fischer@web.de> wrote:
Am Donnerstag 11 März 2010 00:24:28 schrieb Daniel Fischer:
> Hmm, offhand, I don't see why that isn't strict enough.

Turns out, mapM_ was a red herring. The villain was (zip and map).
I must confess, I don't know why it sort-of worked without the mapM_,
though. "sort-of", because that also hung on to unnecessarily much memory,
the space leak was just smaller than with the mapM_.

A very small change that eliminates the space leak, is

readFasta :: Int -> [Char] -> [Window]
readFasta windowSize sequence =
    -- get the header
    let (header,rest) = span (/= '\n') sequence
        chr = parseChromosome header
       go i (w:ws) = Window w chr i : go (i+1) ws
       go _ [] = []
   in go 0 $ slideWindow windowSize $ filter (/= '\n') rest

You can improve performance by eliminating slideWindow and the intermediate
Window list (merging fastaExtractor and readFasta),

{-# LANGUAGE BangPatterns #-}

readFasta2 :: (String -> Bool) -> Int -> String
readFasta2 test windowSize sequence =
   let (header,rest) = span (/= '\n') sequence
       chr = parseChromosome header
       schr = show chr
       go !i st@(_:tl)
           | test w    = w ++ '\t' : schr ++ '\t' : show i ++ '\n' : go
(i+1) tl
           | otherwise = go (i+1) tl
             where
               w = take windowSize st
       go _ [] = []
   in go 0 (filter (/= '\n')) rest