Hello Daniel:
Thanks!
I employed mapM'_ but I am still getting the space leak.
Any other hint?
Arnoldo
Am Mittwoch 10 März 2010 21:45:56 schrieb Arnoldo Muller:
> Hello,No work is been done until the end, when all is tried to be done
>
> I am learning haskell and I found a space leak that I find difficult to
> solve. I've been asking at #haskell but we could not solve
> the issue.
>
> I want to lazily read a set of 22 files of about 200MB each, filter them
> and then I want to output the result into a unique file.
> If I modify the main function to work only with one input file, the
> program runs without issues. I will call this version A.
> Version B uses a mapM_ to iterate over a list of filenames and uses
> appendFile to output the result of filtering each file.
> In this case the memory usage grows sharply and quickly (profiles show
> constant memory growth). In less than a minute, memory
> occupation will make my system hang with swapping.
simultaneously. Make sure genomeExecute ... input1 has actually finished
its work before genomeExecute ... input2 starts etc.
One way is to use a stricter version of sequence_,
sequence'_ :: Monad m => [m a] -> m ()
sequence'_ (x:xs) = do
a <- x
a `seq` sequence'_ xs
sequence'_ [] = return ()
(nicer with BangPatterns, but not portable), and
mapM'_ f = sequence'_ . map f
Another option is making genomeExecute itself stricter.
>
> This is version B:
>
> ------------------------------- Program B
> ------------------------------------------------------------------------
>-------------------------------------------- import Data.List
> import System.Environment
> import System.Directory
> import Control.Monad
>
>
> -- different types of chromosomes
> data Chromosome = C1
>
> | C2
> | C3
> | C4
> | C5
> | C6
> | C7
> | C8
> | C9
> | C10
> | C11
> | C12
> | C13
> | C14
> | C15
> | C16
> | C17
> | C18
> | C19
> | CX
> | CY
> | CMT
>
> deriving (Show)
> -- define a window
> type Sequence = [Char]
> -- Window data
> data Window = Window { sequen :: Sequence,
> chrom :: Chromosome,
> pos :: Int
> }
> -- print a window
> instance Show Window where
> show w = (sequen w) ++ "\t" ++ show (chrom w) ++ "\t" ++ show (pos
> w)
>
> -- Reading fasta files with haskell
>
> -- Initialize the
> main = do
> -- get the arguments (intput is
> [input, output, windowSize] <- getArgs
> -- get directory contents (only names)
> names <- getDirectoryContents input
> -- prepend directory
> let fullNames = filter isFastaFile $ map (\x -> input ++ "/" ++
> x) names
> let wSize = (read windowSize)::Int
> -- process the directories
> mapM (genomeExecute output wSize filterWindow) fullNames
>
>
> -- read the files one by one and write them to the output file
> genomeExecute :: String -> Int -> (Window -> Bool) -> String -> IO ()
> genomeExecute outputFile windowSize f inputFile = do
> fileData <- readFile inputFile
> appendFile outputFile $ fastaExtractor fileData windowSize f