
How can anyone stay away from such a deliciously pointless waste of time as implementing a wc(1) derivate? :-) Here is my attempt:
import IO
type Count = Int data CountingState = ST !Bool !Count !Count !Count deriving (Show)
initCST = ST True 0 0 0
wc :: CountingState -> [Char] -> CountingState wc (ST _ l w c) ('\n':xs) = wc (ST True (l+1) w (c+1)) xs wc (ST _ l w c) (' ' :xs) = wc (ST True l w (c+1)) xs wc (ST _ l w c) ('\t':xs) = wc (ST True l w (c+1)) xs wc (ST True l w c) ( x :xs) = wc (ST False l (w+1) (c+1)) xs wc (ST False l w c) ( x :xs) = wc (ST False l w (c+1)) xs wc st [] = st
main :: IO () main = do ST _ l w c <- getContents >>= return . wc initCST putStrLn $ (l `shows`) . spaces . (w `shows`) . spaces . (c `shows`) $ [] where spaces = (' ':) . (' ':) . (' ':)
I compiled this with "ghc -O2 -funbox-strict-fields" and got the following performance results in a simple test. The wc(1) tool: $ time /usr/bin/wc
import IO import Control.Monad.State
type Count = Int data CountingState = ST !Bool !Count !Count !Count deriving (Show)
type WordCounter = State CountingState ()
initCST = ST True 0 0 0
wc :: Char -> WordCounter wc x = get >>= \(ST b l w c) -> case (b,x) of ( _ , '\n') -> put (ST True (l+1) w (c+1)) ( _ , '\t') -> put (ST True l w (c+1)) ( _ , ' ' ) -> put (ST True l w (c+1)) (True, _ ) -> put (ST False l (w+1) (c+1)) (False, _ ) -> put (ST False l w (c+1))
main :: IO () main = do xs <- getContents let ST _ l w c = snd (runState (mapM wc xs) initCST) putStrLn $ (l `shows`) . spaces . (w `shows`) . spaces . (c `shows`) $ [] where spaces = (' ':) . (' ':) . (' ':)
Curiously enough, this version fails to process the "words" file because it runs out of stack space! Naturally, it is very slow, too. So I wonder: How needs that program above to be changed in order to solve this space leak? Why does this happen in the first place? Peter