Re: [Haskell-cafe] compare iteratee with python's yield

Your question is insightful. The posted answers are excellent. I merely wish to illustrate one, already made, point on a concrete example. The complete code is in the file http://okmij.org/ftp/Haskell/Iteratee/IterDemo.hs The file implements a series of progressively more complex examples -- from wc to grep, -- stressing compositionality. The whole program is assembled from independent building blocks; as our task changes, we replace one building block with another. The building blocks, iteratees, may be stateful. We never have to worry about writing out the whole state of the program and properly initializing it. The overall state is implicitly composed from the state of each building block; each iteratee manages its own state without leaking it. The particular example to illustrate the state encapsulation is grep1_word
grep1_word word fname = let search = IterateeM.dropWhile (/= word) >> IterateeM.head runI' = runI . en_handle (\_ -> "not found:" ++ word) in print =<< run =<< enum_file fname (runI =<< en_lines ((runI' =<< en_words search) `en_pair` (count_i `en_pair` en_last)))
which implements "grep -w -n -m 1 <word> <filename>". It searches for the first occurrence of the given word in the file and returns the line of that occurrence and the line number. (The example also illustrates exception handling: The exception is raised by Iteratee.head when IterateeM.dropWhile dropped the whole stream while looking for the given word.) A few comments: en_lines is the enumeratee, transforming a stream of characters into a stream of lines; en_words is the enumeratee that transforms the stream of lines to the stream of words. The stateful iteratee count_i counts the elements in any stream; en_last, the analogue of List.last, returns the last seen element. The combinator en_pair pairs two iteratees to run in parallel, processing the same stream chunk-by-chunk. The processing stops as soon as one of the iteratees in the pair stopped. That behavior does the trick: "en_words search" stops when the word is found. The whole pair is stopped too. The other component of the pair, (count_i `en_pair` en_last), will receive EOF and tell us the line and the line count they have last seen. I guess the example can be written in Python-like style as follows (simplifying and assuming that the file is already parsed into lines): counter = 0 for line in file_lines: counter = counter + 1 if contains line word: print counter print line break Suppose we wish to change our example to print not only the found line, but a few lines before it (the option "-B" of grep). We have to modify our Python-like code: -- add a piece of state to keep the context lines, and initialize it -- add code to update the state as we read lines -- add the finalization code, to convert the state to the desired result The code will look like the following (### mark the changes): counter = 0 lines_seen = [] ### for line in file_lines: counter = counter + 1 accumulate lines_seen line ### if contains line word: print counter print (finalize lines_seen) ### break In contrast, to modify the iteratee code for the new task, we make the _single_ change: replace en_last with (en_lastN 5) (for 5 lines of context). It should be stressed that the Python code has to be changed in three *distinct, disconnected places*. New state has to be defined and initialized, before the loop. The state has to be updated, somewhere in the loop. The state has to be finalized, in the loop termination part. When changing code, it is easy to overlook the needed changes. For example, it is easier to accumulate the context lines in reverse order. When printing the lines, we should not forget to reverse them. When changes are disconnected, it is hard to reason and test for them. With so many other things going on, it is hard to write a proper test (for example, when testing the context accumulation we don't care about word matching; we should test the accumulation in isolation of whatever else we are doing in the iteration). The iteratee en_lastN deals only with the context accumulation aspect:
-- Return N last received elements -- The acc below is the list of N last elements in the reverse -- order en_lastN :: Monad m => Int -> Iteratee el m [el] en_lastN n = ie_cont $ step [] where step acc (Chunk []) = ie_contM (step acc) step acc (Chunk ls) = ie_contM (step $! Prelude.take n $ reverse ls ++ acc) step acc stream = ie_doneM (reverse acc) stream
The initialization of the state, the accumulation and the finalization are all in the same place. The en_lastN iteratee does not care of counting, searching, or whatever else is going on. In fact, the iteratee is pure and deals with arbitrary elements (not necessarily lines). We can test it using Quickcheck or HUnit. No IO is needed for the test. Thus in the iteratee style, the overall state of the program becomes distributed, across iteratees that encapsulate and manage only their own, small portion of the program state. The code IterDemo.hs shows other examples, including emulating generators (aka, unfold_stream, or the `while loop'), to build the stream of intermediate results of an iteratee. An iteratee in question does not even suspect that it is being used `incrementally', seduced to reveal the intermediate results. Thank you for the inspiring question.
participants (1)
-
oleg@okmij.org