
Hello, I've spent some time over the last couple of days trying to write an enumeratee that prints a "." every n bytes (with obvious intended use as a progress tracker). Seems like it oughtn't be hard, but it has been a steep learning curve... I have come up with something that seems to do the job but I don't know that I'm completely happy with it (or even that I completely understand it, to be honest). If anyone more expert would be kind enough either to reassure me that I'm doing it right or - more likely - to offer improvements / suggestions on what obvious simplifications I have overlooked, I'd be grateful. Thanks David import qualified Data.ByteString.Char8 as B import qualified Data.ByteString.Lazy.Char8 as BL import qualified Data.Enumerator as E import qualified Data.Enumerator.Binary as EB import Control.Monad.IO.Class (liftIO, MonadIO) import System.IO (hFlush, stdout) dotEvery :: MonadIO m => Integer -> E.Enumeratee B.ByteString B.ByteString m b dotEvery count = E.checkDone $ E.continue . dotIn count where dotIn need k E.EOF = E.yield (E.Continue k) E.EOF dotIn need k (E.Chunks []) = E.continue (dotIn need k) dotIn need k (E.Chunks xs) = iter where lazy = BL.fromChunks xs len = toInteger $ BL.length lazy iter = if len < need then k (E.Chunks xs) E.>>== E.checkDone (E.continue . dotIn (count - len)) else let (x1, x2) = BL.splitAt (fromInteger need) lazy s1 = E.Chunks $ BL.toChunks x1 s2 = E.Chunks $ BL.toChunks x2 enumee = E.checkDoneEx s2 (\k' -> dotIn count k' s2) in E.Iteratee $ do newStep <- E.runIteratee $ k s1 liftIO $ putStr "." >> hFlush stdout E.runIteratee $ enumee newStep PS Implementations which involve "EB.take count" seem to me unsatisfactory; one surely oughtn't need to have a large buffer to solve this problem PPS I did find an implementation via mapAccumM which I consider pleasing enough from an aesthetic point of view - but which runs 30x slower dotAt :: MonadIO m => Integer -> Integer -> Word8 -> m (Integer, Word8) dotAt n s w | s >= n = do liftIO $ putStr "." >> hFlush stdout return (1, w) | otherwise = return (s+1, w) dotEvery' :: MonadIO m => Integer -> E.Enumeratee B.ByteString B.ByteString m b dotEvery' n = EB.mapAccumM (dotAt n) 1

An error slipped into the version below. The line:
E.checkDone (E.continue . dotIn (count - len))
should read
E.checkDone (E.continue . dotIn (need - len))
"David Hotham"
Hello,
I've spent some time over the last couple of days trying to write an enumeratee that prints a "." every n bytes (with obvious intended use as a progress tracker). Seems like it oughtn't be hard, but it has been a steep learning curve...
I have come up with something that seems to do the job but I don't know that I'm completely happy with it (or even that I completely understand it, to be honest).
If anyone more expert would be kind enough either to reassure me that I'm doing it right or - more likely - to offer improvements / suggestions on what obvious simplifications I have overlooked, I'd be grateful.
Thanks
David
import qualified Data.ByteString.Char8 as B import qualified Data.ByteString.Lazy.Char8 as BL import qualified Data.Enumerator as E import qualified Data.Enumerator.Binary as EB import Control.Monad.IO.Class (liftIO, MonadIO) import System.IO (hFlush, stdout)
dotEvery :: MonadIO m => Integer -> E.Enumeratee B.ByteString B.ByteString m b dotEvery count = E.checkDone $ E.continue . dotIn count where dotIn need k E.EOF = E.yield (E.Continue k) E.EOF dotIn need k (E.Chunks []) = E.continue (dotIn need k) dotIn need k (E.Chunks xs) = iter where lazy = BL.fromChunks xs len = toInteger $ BL.length lazy iter = if len < need then k (E.Chunks xs) E.>>== E.checkDone (E.continue . dotIn (count - len)) else let (x1, x2) = BL.splitAt (fromInteger need) lazy s1 = E.Chunks $ BL.toChunks x1 s2 = E.Chunks $ BL.toChunks x2 enumee = E.checkDoneEx s2 (\k' -> dotIn count k' s2) in E.Iteratee $ do newStep <- E.runIteratee $ k s1 liftIO $ putStr "." >> hFlush stdout E.runIteratee $ enumee newStep
PS Implementations which involve "EB.take count" seem to me unsatisfactory; one surely oughtn't need to have a large buffer to solve this problem PPS I did find an implementation via mapAccumM which I consider pleasing enough from an aesthetic point of view - but which runs 30x slower
dotAt :: MonadIO m => Integer -> Integer -> Word8 -> m (Integer, Word8) dotAt n s w | s >= n = do liftIO $ putStr "." >> hFlush stdout return (1, w) | otherwise = return (s+1, w)
dotEvery' :: MonadIO m => Integer -> E.Enumeratee B.ByteString B.ByteString m b dotEvery' n = EB.mapAccumM (dotAt n) 1

"David Hotham"
I've spent some time over the last couple of days trying to write an enumeratee that prints a "." every n bytes (with obvious intended use as a progress tracker). Seems like it oughtn't be hard, but it has been a steep learning curve...
I have come up with something that seems to do the job but I don't know that I'm completely happy with it (or even that I completely understand it, to be honest).
If anyone more expert would be kind enough either to reassure me that I'm doing it right or - more likely - to offer improvements / suggestions on what obvious simplifications I have overlooked, I'd be grateful.
I think that using lazy bytestrings does not have any advantage here, since the enumerator creates the strict bytestrings at some point, then your enumeratee converts them into lazy ones just for counting. Just use the straightforward approach: Take the chunks and count the bytes like here: {-# LANGUAGE ScopedTypeVariables #-} dotsAt :: forall b m. MonadPeelIO m => Int -> Enumeratee ByteString ByteString m b dotsAt n = loop 0 where loop :: Int -> Enumeratee ByteString ByteString m b loop i' step@(Continue k) = continue go where go :: Stream ByteString -> Iteratee ByteString m (Step ByteString m b) go EOF = return step go ch@(Chunks strs) = do let (numDots, i) = divMod (i' + sum (L.map BC.length strs)) n tryIO $ BC.putStr (BC.replicate numDots '.') k ch >>== loop i loop i' step = return step I think, this is about the most straightforward and also the fastest approach. Greets, Ertugrul -- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/

Thanks for the reply.
I did have a version along those lines at some point, but I felt it was
cheating rather to print the dots not at the correct point in the stream.
Perhaps I've over-complicated for the sake of the learning experience, but I
do like to have a version that passes on the correct number of bytes, then
prints the ".", and then continues.
David
"Ertugrul Soeylemez"
"David Hotham"
wrote: I've spent some time over the last couple of days trying to write an enumeratee that prints a "." every n bytes (with obvious intended use as a progress tracker). Seems like it oughtn't be hard, but it has been a steep learning curve...
I have come up with something that seems to do the job but I don't know that I'm completely happy with it (or even that I completely understand it, to be honest).
If anyone more expert would be kind enough either to reassure me that I'm doing it right or - more likely - to offer improvements / suggestions on what obvious simplifications I have overlooked, I'd be grateful.
I think that using lazy bytestrings does not have any advantage here, since the enumerator creates the strict bytestrings at some point, then your enumeratee converts them into lazy ones just for counting. Just use the straightforward approach: Take the chunks and count the bytes like here:
{-# LANGUAGE ScopedTypeVariables #-}
dotsAt :: forall b m. MonadPeelIO m => Int -> Enumeratee ByteString ByteString m b dotsAt n = loop 0
where loop :: Int -> Enumeratee ByteString ByteString m b loop i' step@(Continue k) = continue go
where go :: Stream ByteString -> Iteratee ByteString m (Step ByteString m b) go EOF = return step go ch@(Chunks strs) = do let (numDots, i) = divMod (i' + sum (L.map BC.length strs)) n tryIO $ BC.putStr (BC.replicate numDots '.') k ch >>== loop i loop i' step = return step
I think, this is about the most straightforward and also the fastest approach.
Greets, Ertugrul
-- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/

"David Hotham"
I did have a version along those lines at some point, but I felt it was cheating rather to print the dots not at the correct point in the stream.
Perhaps I've over-complicated for the sake of the learning experience, but I do like to have a version that passes on the correct number of bytes, then prints the ".", and then continues.
Well, then just do the printing after calling the continuation: dotsAt :: forall b m. MonadPeelIO m => Int -> Enumeratee ByteString ByteString m b dotsAt n = loop 0 where loop :: Int -> Enumeratee ByteString ByteString m b loop i' step@(Continue k) = continue go where go :: Stream ByteString -> Iteratee ByteString m (Step ByteString m b) go EOF = return step go ch@(Chunks strs) = do let (numDots, i) = divMod (i' + sum (L.map BC.length strs)) n printDots = tryIO $ BC.putStr (BC.replicate numDots '.') >> hFlush stdout k ch >>== (\step -> printDots >> loop i step) loop i' step = return step By the way, after trying out the code, I found that you should use hFlush after printing. Otherwise you may see the dots delayed. Greets, Ertugrul -- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/

The desired behaviour (certainly my desired behaviour, but I think also the
most useful behaviour generally) is that the enumeratee passes n bytes to
its iteratee, prints a dot, and repeats.
Given that, printing the dots all in one bunch after passing bytes to the
iteratee isn't any improvement over printing the dots all in one bunch
before passing them to the iteratee.
I think that mostly I want it the way that I want it because that's the bit
that I struggled most over and I'm now reluctant to give it up! However
this might actually make a useful difference to behaviour in the case of an
iteratee that did very expensive processing, or that itself performed IO.
In such cases, my behaviour could be expected to give a more accurate
indication of how far through processing we'd actually got.
David
PS Yes, I already tried out the code ;-). You'll see that both of my
versions did indeed hFlush stdout.
"Ertugrul Soeylemez"
"David Hotham"
wrote: I did have a version along those lines at some point, but I felt it was cheating rather to print the dots not at the correct point in the stream.
Perhaps I've over-complicated for the sake of the learning experience, but I do like to have a version that passes on the correct number of bytes, then prints the ".", and then continues.
Well, then just do the printing after calling the continuation:
dotsAt :: forall b m. MonadPeelIO m => Int -> Enumeratee ByteString ByteString m b dotsAt n = loop 0
where loop :: Int -> Enumeratee ByteString ByteString m b loop i' step@(Continue k) = continue go
where go :: Stream ByteString -> Iteratee ByteString m (Step ByteString m b) go EOF = return step go ch@(Chunks strs) = do let (numDots, i) = divMod (i' + sum (L.map BC.length strs)) n printDots = tryIO $ BC.putStr (BC.replicate numDots '.') >> hFlush stdout k ch >>== (\step -> printDots >> loop i step) loop i' step = return step
By the way, after trying out the code, I found that you should use hFlush after printing. Otherwise you may see the dots delayed.
Greets, Ertugrul
-- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/

"David Hotham"
The desired behaviour (certainly my desired behaviour, but I think also the most useful behaviour generally) is that the enumeratee passes n bytes to its iteratee, prints a dot, and repeats.
Given that, printing the dots all in one bunch after passing bytes to the iteratee isn't any improvement over printing the dots all in one bunch before passing them to the iteratee.
I think that mostly I want it the way that I want it because that's the bit that I struggled most over and I'm now reluctant to give it up! However this might actually make a useful difference to behaviour in the case of an iteratee that did very expensive processing, or that itself performed IO. In such cases, my behaviour could be expected to give a more accurate indication of how far through processing we'd actually got.
If you're talking about my code, you must have misunderstood something. It does not print the dots all in one bunch, but prints them as input is requested from the enumerator. The last version I posted prints the dots after the iteratee has consumed the input. That difference is noticable, when your iteratee does complex computations before it goes back to the Continue state. Greets, Ertugrul -- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/

As a side note, even though GHC seems to handle this properly, I would force the value of 'i' before passing data to the continuation. Otherwise a less smart compiler may eat memory. I believe, it can only eat memory proportional to 'n', but nevertheless real constant space is better: seq i $ k ch >>== (\step -> printDots >> loop i step) Greets, Ertugrul -- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/

My enumerator style may not be the best (I'm long-winded), but personally when the stream types are the same on input and output I often skip the Enumeratee stuff and just write an Enumerator wrapper. To address your complaint here:
PS Implementations which involve "EB.take count" seem to me unsatisfactory; one surely oughtn't need to have a large buffer to solve this problem
I'd write a helping combinator:
module Main where
import Control.Monad (when) import Control.Monad.Trans import Data.ByteString.Char8 (ByteString) import qualified Data.ByteString.Lazy.Char8 as L import Data.Enumerator import qualified Data.Enumerator.List as EL import System.IO
takeUpTo :: Monad m => Int -> Iteratee ByteString m (Stream ByteString, Int) takeUpTo n' = continue k where n = toEnum n'
k EOF = yield (EOF,0) EOF k (Chunks xs) = if taken == 0 then takeUpTo n' else yield (stream, taken) rest where s = L.fromChunks xs (a,b) = L.splitAt n s taken = fromEnum $ L.length a stream = Chunks $ L.toChunks a rest = Chunks $ L.toChunks b
The code to run a side effect every N bytes is then pretty short (and should be efficient):
sideEffectEveryNBytes :: Monad m => Int -- ^ run the side effect every N bytes -> m () -- ^ side effect -> Step ByteString m a -> Iteratee ByteString m a sideEffectEveryNBytes n act = flip checkContinue1 n $ \loop i k -> do (str, taken) <- takeUpTo i when (taken == i) $ lift act (lift $ runIteratee $ k str) >>= loop (nextI $ i - taken) where nextI 0 = n nextI i = i
Here's your particular example:
example :: IO [ByteString] example = run_ $ enumList 1 [ "the quick brown " , "fox " , "jumped " , "over " , "the lazy dog" ] $$ it where it = do xs <- sideEffectEveryNBytes 10 (putStr "." >> hFlush stdout) $$ EL.consume lift $ putStrLn "" return xs
Running it:
*Main> example .... ["the quick ","brown ","fox ","jumped ","ove","r ","the lazy"," dog"]
G
--
Gregory Collins

Very interesting - thanks!
"Gregory Collins"
My enumerator style may not be the best (I'm long-winded), but personally when the stream types are the same on input and output I often skip the Enumeratee stuff and just write an Enumerator wrapper. To address your complaint here:
PS Implementations which involve "EB.take count" seem to me unsatisfactory; one surely oughtn't need to have a large buffer to solve this problem
I'd write a helping combinator:
module Main where
import Control.Monad (when) import Control.Monad.Trans import Data.ByteString.Char8 (ByteString) import qualified Data.ByteString.Lazy.Char8 as L import Data.Enumerator import qualified Data.Enumerator.List as EL import System.IO
takeUpTo :: Monad m => Int -> Iteratee ByteString m (Stream ByteString, Int) takeUpTo n' = continue k where n = toEnum n'
k EOF = yield (EOF,0) EOF k (Chunks xs) = if taken == 0 then takeUpTo n' else yield (stream, taken) rest where s = L.fromChunks xs (a,b) = L.splitAt n s taken = fromEnum $ L.length a stream = Chunks $ L.toChunks a rest = Chunks $ L.toChunks b
The code to run a side effect every N bytes is then pretty short (and should be efficient):
sideEffectEveryNBytes :: Monad m => Int -- ^ run the side effect every N bytes -> m () -- ^ side effect -> Step ByteString m a -> Iteratee ByteString m a sideEffectEveryNBytes n act = flip checkContinue1 n $ \loop i k -> do (str, taken) <- takeUpTo i when (taken == i) $ lift act (lift $ runIteratee $ k str) >>= loop (nextI $ i - taken) where nextI 0 = n nextI i = i
Here's your particular example:
example :: IO [ByteString] example = run_ $ enumList 1 [ "the quick brown " , "fox " , "jumped " , "over " , "the lazy dog" ] $$ it where it = do xs <- sideEffectEveryNBytes 10 (putStr "." >> hFlush stdout) $$ EL.consume lift $ putStrLn "" return xs
Running it:
*Main> example .... ["the quick ","brown ","fox ","jumped ","ove","r ","the lazy"," dog"]
G -- Gregory Collins
participants (3)
-
David Hotham
-
Ertugrul Soeylemez
-
Gregory Collins