
"Matthew Brecknell"
Pete Kazmier:
I attempted to read Oleg's fold-stream implementation [1] as this sounds quite appealing to me, but I was completely overwhelmed, especially with all of the various type signatures used. It would be great if one of the regular Haskell bloggers (Tom Moertel are you reading this?) might write a blog entry or two interpreting his implementation for those of us starting out in Haskell perhaps by starting out with a non-polymorphic version so as to emphasize the approach.
The basic idea of the paper is the use of a left-fold operator as the primary interface for enumarating collections. The recursive version (less general than the non-recursive version) of a left-fold operator for enumerating the lines of a text file might look something like this:
import Control.Monad.Fix import Control.Exception import Data.List import qualified Data.Set as S import qualified Data.Map as M import System.IO
enumLines :: (a -> String -> Either a a) -> a -> FilePath -> IO a enumLines iter accum filename = do h <- openFile filename ReadMode flip fix accum $ \iterate accum -> do try_line <- try (hGetLine h) case try_line of Left e -> hClose h >> return accum Right line -> do case iter accum line of Left accum -> hClose h >> return accum Right accum -> iterate accum
I understand the intent of this code, but I am having a hard time understanding the implementation, specifically the combination of 'fix', 'flip', and 'interate'. I looked up 'fix' and I'm unsure how one can call 'flip' on a function that takes one argument.
To use this, you provide an "iteratee", a function which takes an accumulator and a line from the file, and returns a new accumulator embedded in an Either. Using the Left branch causes immediate termination of the enumeration. For example, to search for the first occurrence of each of a set of email headers:
getHeaders :: S.Set String -> FilePath -> IO (S.Set String, M.Map String String) getHeaders hdrs file = enumLines findHdrs (hdrs,M.empty) file where findHdrs accum@(wanted,found) line = if null line then Left accum else case headerLine line of Nothing -> Right accum Just hdr -> case findDelete hdr wanted of Nothing -> Right accum Just wanted -> let accum = (wanted, M.insert hdr line found) in if S.null wanted then Left accum else Right accum
headerLine :: String -> Maybe String headerLine (':':xs) = Just [] headerLine (x:xs) = fmap (x:) (headerLine xs) headerLine [] = Nothing
findDelete :: Ord a => a -> S.Set a -> Maybe (S.Set a) findDelete e s = if S.member e s then Just (S.delete e s) else Nothing
It's a bit of a case-analysis nightmare, but when comparing this to previous approaches, note that file traversal and processing are cleanly separated, file handle closure is guaranteed to be timely, file traversal stops as soon as all the required headers have been found, memory usage is minimised.
Very nice. I like the clean separation, but as you say, its one ugly bit of code compared to my original code, although much more elegant no doubt.
I hope that helps.
Very much so. Thank you for you help.