
Hi Arnoldo This doesn't address the space leak, but your parseChromosome function looks very inefficient - isInfixOf is repeatedly checking the prefix "chromosome" for C1 to CY. If you have a lot of CY's in a file then it will do a lot of work parsing them. The cleanest way of handling this would be to use a parser combinator library with keywords for "chromosome" and "mitochondrion" - however that might add a performance penalty itself. Here is a version that should be fairly efficient although a little ugly due to how it has to match literal chars in prefix of the string: Add a import for Data.Char to the import list:
import Data.Char
Add Enum to the deriving clause of the Chromosome data type:
| C19 | CX | CY | CMT deriving (Show,Enum)
Replace parseChromosome with the one below. Note that the derived Enum functions for Chromosome are indexed from 0.. whereas when you read one from the file it is indexed from 1.. so you have to sub1 before using toEnum: sub1 :: Int -> Int sub1 x = x-1 parseChromosome :: [Char] -> Chromosome parseChromosome ('c':'h':'r':'o':'m':'o':'s':'o':'m':'e':' ':xs) = chro xs where chro ('X' :_) = CX chro ('Y' :_) = CY chro ( x : ',' :_) | isDigit x = toEnum (sub1 $ digitToInt x) chro ('1' : x : ',' :_ ) | isDigit x = toEnum (sub1 $ (10+) $ digitToInt x) chro ('1' : x :_ ) | isDigit x = toEnum (sub1 $ (10+) $ digitToInt x) chro _ = error "BAD header" parseChromosome ('m':'i':'t':'o':'c':'h':'o':'n':'d':'r':'i':'o':'n':_) = CMT parseChromosome _ = error "BAD header" Best wishes Stephen