
[left-fold operator for enumerating the lines of a text file] ..
enumLines :: (a -> String -> Either a a) -> a -> FilePath -> IO a enumLines iter accum filename = do h <- openFile filename ReadMode flip fix accum $ \iterate accum -> do try_line <- try (hGetLine h) case try_line of Left e -> hClose h >> return accum Right line -> do case iter accum line of Left accum -> hClose h >> return accum Right accum -> iterate accum .. getHeaders :: S.Set String -> FilePath -> IO (S.Set String, M.Map String String) getHeaders hdrs file = enumLines findHdrs (hdrs,M.empty) file where
we can keep the Left/Right implicit, using either: getHeaders1 hdrs file = enumLines findHdrs (hdrs,M.empty) file where enumLines iter accum filename = do h <- openFile filename ReadMode flip fix accum $ \iterate accum -> join $ (fmap (either (const $ hClose h >> return accum) (either ((hClose h >>) . return) iterate . iter accum))) (try (hGetLine h)) or extract the reusable loop-with-exit-by-either functionality: loopME m stop continue acc = m >>= either (stop acc) (continue (loopME m stop continue) acc) getHeaders2 hdrs file = enumLines findHdrs (hdrs,M.empty) file where enumLines iter accum f = do h <- openFile f ReadMode loopME (try (hGetLine h)) (\acc left->hClose h >> return acc) (\loop acc right->either ((hClose h >>) . return) loop (iter acc right)) accum or sneak some lazy i/o back in, using a fold-with-exit-by-either, similar to loopME: withFile path m = bracket (openFile path ReadMode) hClose m withContentsOf path f = withFile path ((((return $!) . f ) =<<) . hGetContents) withLinesOf path f = withContentsOf path (f . lines) foldE f a [] = a foldE f a (x:xs) = either id (\a'->foldE f a' xs) (f a x) getHeaders3 hdrs file = withLinesOf file (foldE findHdrs (hdrs,M.empty))
To use this, you provide an "iteratee", a function which takes an accumulator and a line from the file, and returns a new accumulator embedded in an Either. Using the Left branch causes immediate termination of the enumeration. For example, to search for the first occurrence of each of a set of email headers:
findHdrs accum@(wanted,found) line = if null line then Left accum else case headerLine line of Nothing -> Right accum Just hdr -> case findDelete hdr wanted of Nothing -> Right accum Just wanted -> let accum = (wanted, M.insert hdr line found) in if S.null wanted then Left accum else Right accum
headerLine :: String -> Maybe String headerLine (':':xs) = Just [] headerLine (x:xs) = fmap (x:) (headerLine xs) headerLine [] = Nothing
findDelete :: Ord a => a -> S.Set a -> Maybe (S.Set a) findDelete e s = if S.member e s then Just (S.delete e s) else Nothing
It's a bit of a case-analysis nightmare
indeed, this part can be cleaned up considerably, using Monad Maybe: findHdrs accum@(wanted,found) line = if null line || S.null wanted then Left accum else maybe (Right accum) id $ do (field,value) <- headerLine line wanted' <- findDelete field wanted let found' = M.insert field value found return $! (Right $! ((,) $! wanted') $! found') headerLine :: String -> Maybe (String,String) headerLine xs = do (field,':':value) <- return (span (/=':') xs) let value' = dropWhile isSpace value return $! ((,) $! field) $! strictly value' findDelete :: Ord a => a -> S.Set a -> Maybe (S.Set a) findDelete e s = guard (S.member e s) >> return (S.delete e s) strictly l = length l `seq` l running the three variants over a moderately sized directory (>3k emails, one including a hugs-tarball;-), 1/2 are roughly equivalent, but Hugs claims that 3 allocates less and needs fewer garbage collections than 1/2, while GHC claims that it is the other way round.. claus