
Daniel,
Thank you so much for helping me out with this issue!
Thanks also to all the other haskell-cafe members for their answers!
As a newbie, I don't understand why zip and map would cause a problem.
Is there any link I could read that would help me understand why zip and map
created a leak in this case? Which function compositions should be avoided
when doing lazy I/O?
Regards,
Arnoldo
On Thu, Mar 11, 2010 at 11:46 PM, Daniel Fischer wrote:
On Thursday, 11 March 2010 00:24:28, Daniel Fischer wrote:
Hmm, offhand, I don't see why that isn't strict enough.
Turns out, mapM_ was a red herring. The villain was the combination of zip and map. I must confess, I don't know why it sort-of worked without the mapM_, though. I say "sort-of" because that version also hung on to an unnecessarily large amount of memory; the space leak was just smaller than with the mapM_.
A very small change that eliminates the space leak is:
    readFasta :: Int -> [Char] -> [Window]
    readFasta windowSize sequence =
        -- get the header
        let (header,rest) = span (/= '\n') sequence
            chr = parseChromosome header
            go i (w:ws) = Window w chr i : go (i+1) ws
            go _ [] = []
        in go 0 $ slideWindow windowSize $ filter (/= '\n') rest
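For comparison, here is a minimal, self-contained sketch of the two ways of numbering the windows. Everything except the shape of `go` is an assumption: `Window`, `slideWindow` and the Int chromosome are hypothetical stand-ins, since the real definitions are not quoted in this message, and `indexZip` is only a guess at what the zip-and-map version looked like. The sketch shows that both produce the same list; it does not by itself reproduce or explain the leak.

```haskell
module Main where

import Data.List (tails)

-- Hypothetical stand-in for the Window type (the real one is not shown here).
data Window = Window
  { windowSeq   :: String
  , windowChr   :: Int
  , windowIndex :: Int
  } deriving (Eq, Show)

-- Hypothetical stand-in: every full-length window of size n.
slideWindow :: Int -> [a] -> [[a]]
slideWindow n xs = takeWhile ((== n) . length) (map (take n) (tails xs))

-- Assumed reconstruction of numbering the windows with zip and map.
indexZip :: Int -> [String] -> [Window]
indexZip chr ws = map (\(i, w) -> Window w chr i) (zip [0 ..] ws)

-- Numbering with explicit recursion, the shape used by go in readFasta.
indexGo :: Int -> [String] -> [Window]
indexGo chr = go 0
  where
    go i (w:ws) = Window w chr i : go (i + 1) ws
    go _ []     = []

main :: IO ()
main = do
  let ws = slideWindow 3 "ACGTAC"
  -- Both approaches yield the same windows and indices.
  print (indexZip 1 ws == indexGo 1 ws)
  mapM_ print (indexGo 1 ws)
```

Whether one of the two retains more memory depends on how the result is consumed and on what the compiler does with it, so heap profiling the real program is the way to check.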
You can improve performance by eliminating slideWindow and the intermediate Window list (merging fastaExtractor and readFasta):
    {-# LANGUAGE BangPatterns #-}

    readFasta2 :: (String -> Bool) -> Int -> String -> String
    readFasta2 test windowSize sequence =
        let (header,rest) = span (/= '\n') sequence
            chr = parseChromosome header
            schr = show chr
            -- the bang keeps the window index evaluated as we go
            go !i st@(_:tl)
                | test w    = w ++ '\t' : schr ++ '\t' : show i ++ '\n' : go (i+1) tl
                | otherwise = go (i+1) tl
                  where
                    w = take windowSize st
            go _ [] = []
        in go 0 (filter (/= '\n') rest)
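To try readFasta2 on its own, something like the following sketch should work. The `parseChromosome` here is a hypothetical stand-in (it assumes a header of the form ">chr7" and an Int result), and the test predicate and input string are made up for illustration; only the body of readFasta2 comes from the message.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

-- Hypothetical stand-in: the real parseChromosome is not shown in this message.
-- Assumes a header such as ">chr7" and falls back to 0 if no number follows.
parseChromosome :: String -> Int
parseChromosome header =
  case reads (drop 4 header) of
    [(n, _)] -> n
    _        -> 0

readFasta2 :: (String -> Bool) -> Int -> String -> String
readFasta2 test windowSize sequence =
  let (header, rest) = span (/= '\n') sequence
      schr = show (parseChromosome header)
      go !i st@(_:tl)
        | test w    = w ++ '\t' : schr ++ '\t' : show i ++ '\n' : go (i + 1) tl
        | otherwise = go (i + 1) tl
        where
          w = take windowSize st
      go _ [] = []
  in go 0 (filter (/= '\n') rest)

main :: IO ()
main =
  -- Made-up predicate: keep windows that start with 'G'.
  putStr (readFasta2 ((== "G") . take 1) 3 ">chr7\nACGT\nGA\n")
  -- prints "GTG\t7\t2" and "GA\t7\t4", one per line
```

Note that near the end of the input `take windowSize st` returns windows shorter than windowSize (the "GA" above), so the test function has to reject those itself if they are unwanted.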