Daniel,
Thank you so much for helping me out with this issue!
Thanks to all the other answers from haskel-cafe members too!
As a newbie, I am not able to understand why zip and map would make a problem...
Is there any link I could read that could help me to understand why in this case
zip and map created a leak? What are some function compositions that should be
avoided when doing lazy I/O?
Regards,
Arnoldo
Am Donnerstag 11 März 2010 00:24:28 schrieb Daniel Fischer:
> Hmm, offhand, I don't see why that isn't strict enough.Turns out, mapM_ was a red herring. The villain was (zip and map).
I must confess, I don't know why it sort-of worked without the mapM_,
though. "sort-of", because that also hung on to unnecessarily much memory,
the space leak was just smaller than with the mapM_.
A very small change that eliminates the space leak, is
let (header,rest) = span (/= '\n') sequence
readFasta :: Int -> [Char] -> [Window]
readFasta windowSize sequence =
-- get the header
chr = parseChromosome header
go i (w:ws) = Window w chr i : go (i+1) ws
go _ [] = []
in go 0 $ slideWindow windowSize $ filter (/= '\n') rest
You can improve performance by eliminating slideWindow and the intermediate
Window list (merging fastaExtractor and readFasta),
{-# LANGUAGE BangPatterns #-}
readFasta2 :: (String -> Bool) -> Int -> String
readFasta2 test windowSize sequence =
let (header,rest) = span (/= '\n') sequence
chr = parseChromosome header
schr = show chr
go !i st@(_:tl)
| test w = w ++ '\t' : schr ++ '\t' : show i ++ '\n' : go
(i+1) tl
| otherwise = go (i+1) tl
where
w = take windowSize st
go _ [] = []
in go 0 (filter (/= '\n')) rest