
Dear Haskellers,

Another story from an (almost) happy Haskell user who finds himself overwhelmed by laziness/space leaks.

I'm trying to parse a large file (>600MB) with a single S-expression-like structure. With the help of ByteStrings I'm down to 4min processing time in constant space. However, when I try to wrap the parse results in a data structure, the heap blows up - even though I never actually inspect the structure being built! This bugs me, so I come here looking for answers. Parser follows:
module Main where
import qualified Data.ByteString.Lazy.Char8 as B
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Pos
import System.Environment
import System.Exit
import qualified Data.Map as M
import Lexer
type XdlParser a = GenParser Token XdlState a
-- Parser state
type XdlState = Counts
type Counts = M.Map Count Integer
data Count = ListCount | SymbolCount deriving (Eq, Ord, Show)
emptyXdlState :: XdlState
emptyXdlState = M.empty
incCount :: Count -> (Counts -> Counts)
incCount c = M.insertWith' (+) c 1
-- handling tokens
myToken :: (Token -> Maybe a) -> XdlParser a
myToken test = token showTok posTok testTok
  where
    showTok = show
    posTok  = const (initialPos "")
    testTok = test
-- Syntax of expressions
data Exp = Sym !B.ByteString
         | List ![Exp]
         deriving (Eq, Show)
expr = list <|> symbol
rparen = myToken $ \t -> case t of
    RParen -> Just ()
    _      -> Nothing
lparen = myToken $ \t -> case t of
    LParen -> Just ()
    _      -> Nothing
name = myToken $ \t -> case t of
    Name n -> Just n
    _      -> Nothing
list = do
    updateState $ incCount ListCount
    lparen
    xs <- many1 expr
    rparen
    return () -- return $! (List xs)
symbol = do
    updateState $ incCount SymbolCount
    name >> return () -- Sym `fmap` name
-- Top level parser
top :: XdlParser XdlState
top = do
    l <- many1 list
    eof
    getState
main = do
    args <- getArgs
    case args of
        [fname] -> do
            text <- B.readFile fname
            let result = runParser top emptyXdlState fname (tokenize text)
            putStrLn $ either show show result
        _ -> putStrLn "usage: parse filename" >> exitFailure
And the Lexer:
module Lexer (Token(..), tokenize) where
import qualified Data.ByteString.Lazy.Char8 as B
import Control.Monad
import Data.Char
import Data.List
import System.Environment
import System.Exit
data Token = LParen | RParen | Name B.ByteString deriving (Ord, Eq, Show)
type Input = B.ByteString
-- Processor returns Nothing if it can't process the Input
type Processor = Input -> Maybe ([Token], Input)
-- Tokenize ends the list when all processors return Nothing
tokenize :: Input -> [Token]
tokenize = concat . unfoldr processAll
  where
    processors = [doSpaces, doComment, doParens, doName]
    processAll :: Processor
    processAll bs =
        if B.null bs
            then Nothing
            else foldr mminus Nothing $ map ($ bs) processors
    mminus a@(Just _) _ = a
    mminus Nothing    b = b
doSpaces :: Processor
doSpaces bs = if B.null sp then Nothing else Just ([], nsp)
  where
    (sp, nsp) = B.span isSpace bs
doComment :: Processor
doComment bs =
    if B.pack "# " `B.isPrefixOf` bs
        then Just ([], B.dropWhile (/= '\n') bs)
        else Nothing
doParens :: Processor
doParens bs = case B.head bs of
    '(' -> Just ([LParen], B.tail bs)
    ')' -> Just ([RParen], B.tail bs)
    _   -> Nothing
doName :: Processor
doName bs = if B.null nsp then Nothing else Just ([Name nsp], sp)
  where
    (nsp, sp) = B.span (not . isRest) bs
    isRest c = isSpace c || c == ')' || c == '('
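As a quick sanity check of the lexer (a made-up input, not from the original post), a tiny S-expression tokenizes as expected:

    ghci> tokenize (B.pack "(a b)")
    [LParen,Name "a",Name "b",RParen]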
Regards,
-- 
Krzysztof Kościuszkiewicz
Skype: dr.vee, Gadu: 111851, Jabber: kokr@jabberpl.org
"Simplicity is the ultimate sophistication" -- Leonardo da Vinci

On Mon, Mar 3, 2008 at 2:23 AM, Krzysztof Kościuszkiewicz wrote:
Dear Haskellers,
Another story from an (almost) happy Haskell user who finds himself overwhelmed by laziness/space leaks.
I'm trying to parse a large file (>600MB) with a single S-expression-like structure. With the help of ByteStrings I'm down to 4min processing time in constant space. However, when I try to wrap the parse results in a data structure, the heap blows up - even though I never actually inspect the structure being built! This bugs me, so I come here looking for answers.
Well, I haven't read this through, but superficially, it looks like you're expecting the data structure to be constructed lazily. But...
-- Syntax of expressions
data Exp = Sym !B.ByteString
         | List ![Exp]
         deriving (Eq, Show)
It is declared as strict, so it's not going to be constructed lazily...

Luke
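To make the point concrete, here is a minimal standalone sketch of strict constructor fields (illustration only, not code from the thread): forcing a constructor with a strict field to WHNF also forces the field, so the tree cannot be left as an unevaluated skeleton.

    -- Toy demo of strict vs. lazy constructor fields.
    data LazyBox   = LazyBox   Int   -- lazy field: a thunk may sit inside
    data StrictBox = StrictBox !Int  -- strict field: forced on construction

    main :: IO ()
    main = do
        -- Forcing LazyBox to WHNF does not touch the field: this prints.
        case LazyBox undefined of
            LazyBox _ -> putStrLn "lazy field left unevaluated"
        -- Forcing StrictBox to WHNF forces the field: this throws.
        case StrictBox undefined of
            StrictBox _ -> putStrLn "never reached"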

Krzysztof Kościuszkiewicz wrote:
Another story from an (almost) happy Haskell user who finds himself overwhelmed by laziness/space leaks.
I'm trying to parse a large file (>600MB) with a single S-expression-like structure. With the help of ByteStrings I'm down to 4min processing time in constant space. However, when I try to wrap the parse results in a data structure, the heap blows up - even though I never actually inspect the structure being built! This bugs me, so I come here looking for answers.
Note that Parsec has to parse the whole file before it can decide whether to return a result (Right _) or an error (Left _). GHC would have to be quite smart to eliminate the creation of the expression tree entirely.

The polyparse library (http://www.cs.york.ac.uk/fp/polyparse/) offers some lazy parsers; maybe one of those fits your needs. Text.ParserCombinators.Poly.StateLazy is the obvious candidate.

HTH,

Bertram
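A toy contrast of the two behaviours (my own sketch, not polyparse's actual API): a parser that returns Either must consume all of its input before it can commit to Right, while a lazy parser can stream its result as input is consumed.

    import Data.Char (isDigit)

    -- Strict flavour: must inspect the whole input before returning Right.
    parseAllDigits :: String -> Either String String
    parseAllDigits s
        | all isDigit s = Right s       -- 'all' traverses everything first
        | otherwise     = Left "non-digit found"

    -- Lazy flavour: output becomes available as input is consumed.
    parseDigitsLazily :: String -> String
    parseDigitsLazily = takeWhile isDigit

    main :: IO ()
    main = do
        print (take 5 (parseDigitsLazily ('1' : cycle "23")))  -- "12323": streams
        -- print (parseAllDigits ('1' : cycle "23"))  -- would never return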

On Mon, Mar 03, 2008 at 05:20:09AM +0100, Bertram Felgenhauer wrote:
Another story from an (almost) happy Haskell user who finds himself overwhelmed by laziness/space leaks.
I'm trying to parse a large file (>600MB) with a single S-expression-like structure. With the help of ByteStrings I'm down to 4min processing time in constant space. However, when I try to wrap the parse results in a data structure, the heap blows up - even though I never actually inspect the structure being built! This bugs me, so I come here looking for answers.
The polyparse library (http://www.cs.york.ac.uk/fp/polyparse/) offers some lazy parsers; maybe one of those fits your needs. Text.ParserCombinators.Poly.StateLazy is the obvious candidate.
I have tried both Poly.StateLazy and Poly.State and they work quite well - at least the space leak is eliminated. Now evaluation of the parser state blows the stack... The code is at http://hpaste.org/6310

Thanks in advance,
-- 
Krzysztof Kościuszkiewicz
Skype: dr.vee, Gadu: 111851, Jabber: kokr@jabberpl.org
"Simplicity is the ultimate sophistication" -- Leonardo da Vinci

On Wed, Mar 12, 2008 at 12:12 PM, Krzysztof Kościuszkiewicz wrote:
I have tried both Poly.StateLazy and Poly.State and they work quite well - at least the space leak is eliminated. Now evaluation of the parser state blows the stack...
The code is at http://hpaste.org/6310
Thanks in advance,
The stack blows up when a bunch of unevaluated thunks build up, and you try to evaluate them. One way to determine where those thunks are getting built is to use GHC's "retainer" profiling. Retainer sets will show you the "call stack" that is holding on to memory. That can give you a clue where these thunks are being created.

To get finer-grained results, annotate your code with {-# SCC "..." #-} pragmas. Then you can filter the retainer profile by those annotations. That will help you determine where in a given function the thunks are being created.

If you need help with profiling basics, feel free to ask.

Justin
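As a concrete sketch (the placement is hypothetical, and the flags are the GHC 6.8-era profiling options), annotating incCount from the parser above and producing a retainer graph might look like:

    incCount :: Count -> (Counts -> Counts)
    incCount c = {-# SCC "incCount" #-} M.insertWith' (+) c 1

    $ ghc -O2 -prof -auto-all --make parse.hs
    $ ./parse input.xdl +RTS -hr -RTS
    $ hp2ps -c parse.hp   # renders the heap profile parse.hp to parse.ps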

On Wed, Mar 12, 2008 at 12:34:38PM -0700, Justin Bailey wrote:
The stack blows up when a bunch of unevaluated thunks build up, and you try to evaluate them. One way to determine where those thunks are getting built is to use GHC's "retainer" profiling. Retainer sets will show you the "call stack" that is holding on to memory. That can give you a clue where these thunks are being created. To get finer-grained results, annotate your code with {-# SCC "..." #-} pragmas. Then you can filter the retainer profile by those annotations. That will help you determine where in a given function the thunks are being created.
If you need help with profiling basics, feel free to ask.
I'm not entirely sure if I understand retainer profiling correctly... So please clarify if you spot any obvious blunders.

Retainers are thunks or objects on the stack that keep references to live objects. All retainers of an object are called the object's retainer set. Now when one makes a profiling run, say with ./jobname +RTS -p -hr, the graph references retainer sets from jobname.prof. My understanding is that it is the total size of all objects retained by the retainer sets that is being plotted, correct?

About decoding the sets from jobname.prof - for example in

    SET 2 = {<...>}
    SET 16 = {<...>, <...>}
    SET 18 = {<...>, <...>}

{...} means it's a set, and <cc0, cc1, ..., ccN> is the retainer cost centre (ccN) and hierarchy of parent cost centres up to the "top level" (cc0)? My understanding is that SET 18 above refers to objects that are retained by exactly the two specified cost centres, right?

Finally, what is the MAIN.SYSTEM retainer?

On Thu, Mar 13, 2008 at 4:50 PM, Krzysztof Kościuszkiewicz wrote:
Retainers are thunks or objects on the stack that keep references to live objects. All retainers of an object are called the object's retainer set. Now when one makes a profiling run, say with ./jobname +RTS -p -hr, the graph references retainer sets from jobname.prof. My understanding is that it is the total size of all objects retained by the retainer sets that is being plotted, correct?
Yes, all retainer sets are being profiled. However, you can FILTER the retainer sets profiled to those containing certain cost-centres. This is a key point because it allows you to "divide-and-conquer" when tracking down a retainer leak. That is, if you filter to a certain cost-centre and the retainer graph is flat, you know that cost-centre is not involved.

For example, if you have a cost-centre annotation like {-# SCC "leaky" #-} in your code, you can filter the retainer set like this:

    Leaky.exe +RTS -hr -hCleaky -RTS

Review the documentation for other options.
About decoding the sets from jobname.prof - for example in

    SET 2 = {<...>}
    SET 16 = {<...>, <...>}
    SET 18 = {<...>, <...>}

{...} means it's a set, and <cc0, cc1, ..., ccN> is the retainer cost centre (ccN) and hierarchy of parent cost centres up to the "top level" (cc0)? My understanding is that SET 18 above refers to objects that are retained by exactly the two specified cost centres, right?
The docs say "An object B retains object A if (i) B is a retainer object and (ii) object A can be reached by recursively following pointers starting from object B, but not meeting any other retainer objects on the way. Each live object is retained by one or more retainer objects, collectively called its retainer set ..."

That says to me that SET 18 above is the set of all objects which are retained by those two "call stacks", and only those call stacks. The individual <..> items aren't "call stacks", but I think they refer to where the retaining object (B in the paragraph) was itself retained, so they are like call stacks. My intuition is very fuzzy here.
Finally, what is the MAIN.SYSTEM retainer?
I think that is "everything else" - any object created in the runtime system that is not directly attributable to something being profiled. Maybe it is objects from libraries that were not compiled with profiling? I imagine objects created by the GHC primitives would fall in this category too.

Since someone else found your space leak, does the retainer profiling advice point to it? I'd like to know if it is actually accurate or not! I've only applied it in some very limited situations.

Justin

Krzysztof Kościuszkiewicz wrote:
I have tried both Poly.StateLazy and Poly.State and they work quite well - at least the space leak is eliminated. Now evaluation of the parser state blows the stack...
The code is at http://hpaste.org/6310
Apparently, stUpdate is too lazy. I'd define

    stUpdate' :: (s -> s) -> Parser s t ()
    stUpdate' f = stUpdate f >> stGet >>= (`seq` return ())

and try using stUpdate' instead of stUpdate in incCount.

HTH,

Bertram
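For comparison, the same idea expressed with Parsec's standard state primitives (a sketch: getState and setState are part of Text.ParserCombinators.Parsec, but this updateState' helper is hypothetical) forces the new state before storing it:

    import Text.ParserCombinators.Parsec

    -- Hypothetical strict variant of Parsec's updateState: force the new
    -- state to WHNF before storing it, so chains of (f (f (f s))) thunks
    -- never pile up between uses of the state.
    updateState' :: (st -> st) -> GenParser tok st ()
    updateState' f = do
        s <- getState
        setState $! f s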

On Thu, Mar 13, 2008 at 05:52:05PM +0100, Bertram Felgenhauer wrote:
... Now evaluation of the parser state blows the stack...
The code is at http://hpaste.org/6310
Apparently, stUpdate is too lazy. I'd define
stUpdate' :: (s -> s) -> Parser s t ()
stUpdate' f = stUpdate f >> stGet >>= (`seq` return ())
and try using stUpdate' instead of stUpdate in incCount.
Yes, that solves the stack issue. Thanks!

-- 
Krzysztof Kościuszkiewicz
Skype: dr.vee, Gadu: 111851, Jabber: kokr@jabberpl.org
"Simplicity is the ultimate sophistication" -- Leonardo da Vinci
participants (5)
- Bertram Felgenhauer
- Justin Bailey
- Krzysztof Kościuszkiewicz
- Krzysztof Kościuszkiewicz
- Luke Palmer