Re: iteratee: Do I need to roll my own?

> I'm looking at iteratee as a way to replace my erroneous and really
> inefficient lazy-IO-based backend for an expect-like Monad DSL I've been
> working on for about 6 months now, on and off. The problem is I want
> something like:
>
>   expect "some String"
>   send "some response"
>
> to block, or perhaps time out, depending on the environment, looking for
> "some String" on an input Handle, and it appears that iteratee works with
> a fixed block size.
Actually, it doesn't. It works with whatever the enumerator gives it. In
the case of `enum_fd'[1] this is a fixed-size block, but in general it is a
``value'' of some ``collection''[2]. And it is up to the programmer to
decide what should count as a value.

[1] http://okmij.org/ftp/Haskell/Iteratee/IterateeM.hs
[2] http://okmij.org/ftp/papers/LL3-collections-enumerators.txt
> A fixed block size is OK if I can put unused bytes back into the
> enumerator somehow. (I may need to put a LOT back in some cases, but in
> the common case I will not need to put any back, as most expect-like
> scripts typically catch the last few bytes of data sent before the peer
> blocks waiting for a response...)
I don't quite get this ``last few bytes'' thing. Could you explain?

I was about to write that there is no problem with putting data back into
the Stream, and to point you at the head/peek functions... But then I
thought that the ``not consuming bytes from the stream'' approach may not
work well in cases where the number of bytes needed (by your function to
accept/reject some rule) exceeds the size of the underlying memory buffer
(4K in the current version of the `iteratee' library[3]).

[3] http://hackage.haskell.org/packages/archive/iteratee/0.3.4/doc/html/src/Data...

Do you think that abstracting to the level of _tokens_ - instead of bytes -
could help here? (Think of flex and bison.) You know, these
enumerators/iteratees things can be layered into _enumeratees_[1][4]...
It's just an idea.

[4] http://ianen.org/articles/understanding-iteratees/
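To make the token idea concrete, here is a minimal pure sketch (the Token,
tokenize and sawWord names are made up for illustration and are not part of
the iteratee library). The point is that once some layer has turned the
byte stream into a token stream, the matching code never sees raw buffer
boundaries; in the iteratee setting that layer would be an enumeratee.

-- Toy sketch of matching on tokens instead of bytes; none of these names
-- come from the iteratee package.
newtype Token = Word String
  deriving (Eq, Show)

-- Pure tokenizer; in the iteratee setting this step would be performed
-- incrementally by an enumeratee sitting between the byte enumerator and
-- the matching iteratee.
tokenize :: String -> [Token]
tokenize = map Word . words

-- An expect-style check now works on whole tokens, so the size of the
-- underlying 4K read buffer never shows up in the matching code.
sawWord :: String -> [Token] -> Bool
sawWord w = elem (Word w)

main :: IO ()
main = print (sawWord "abcd" (tokenize "abcd efg abcd efg"))  -- prints True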
> Otherwise, I'm going to want to roll my own iteratee-style library where
> I have to say "NotDone howMuchMoreIThinkINeed" so I don't over-consume
> the input stream.

What's the problem with over-consuming a stream? In your case?

BTW, this `NotDone' is just a ``control message'' to the chunk producer
(an enumerator):

    IE_cont k (Just (GimmeThatManyBytes n))
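As a rough sketch of what such a control message could look like (toy types
for illustration only; these are not the constructors used by the iteratee
package, and GimmeThatManyBytes is hypothetical):

-- Toy types illustrating the "control message to the enumerator" idea.
data Control = GimmeThatManyBytes Int        -- hypothetical request
  deriving Show

data Iter a
  = Done a String                            -- result plus unconsumed input
  | Cont (String -> Iter a) (Maybe Control)  -- continuation, optionally
                                             -- asking the producer for
                                             -- something specific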
> Does that even make any sense? I'm kind of brainstorming in this email
> unfortunately :-)

What's the problem with brainstorming? :)

Cheers.

--
vvv

First, thanks for the reply.
On Wed, Mar 31, 2010 at 11:15 AM, Valery V. Vorotyntsev wrote:

>> I'm looking at iteratee as a way to replace my erroneous and really
>> inefficient lazy-IO-based backend for an expect-like Monad DSL I've been
>> working on for about 6 months now, on and off. The problem is I want
>> something like:
>>
>>   expect "some String"
>>   send "some response"
>>
>> to block, or perhaps time out, depending on the environment, looking for
>> "some String" on an input Handle, and it appears that iteratee works
>> with a fixed block size.
>
> Actually, it doesn't. It works with whatever the enumerator gives it. In
> the case of `enum_fd'[1] this is a fixed-size block, but in general it is
> a ``value'' of some ``collection''[2]. And it is up to the programmer to
> decide what should count as a value.
>
> [1] http://okmij.org/ftp/Haskell/Iteratee/IterateeM.hs
> [2] http://okmij.org/ftp/papers/LL3-collections-enumerators.txt
>
>> A fixed block size is OK if I can put unused bytes back into the
>> enumerator somehow. (I may need to put a LOT back in some cases, but in
>> the common case I will not need to put any back, as most expect-like
>> scripts typically catch the last few bytes of data sent before the peer
>> blocks waiting for a response...)
>
> I don't quite get this ``last few bytes'' thing. Could you explain?

What I mean is: let's say the stream has
"abcd efg abcd efg"
and then I run some kind of iteratee computation looking for "abcd", and
the block size was fixed to cause a read of 1024 bytes, but it returns as
much as it can, providing it to the iteratee to deal with. The iteratee,
with which I want to implement expect-like behavior, would really only want
to read up to "abcd", consuming that from the input stream. Does the
iteratee get the whole stream that was read by the enumerator, or is it
supplied a single atomic unit at a time, such as a character, so that I can
halt the consumption of the streamed data?
What I don't want to have happen is my consuming bytes from the input
Handle, only to have them ignored, as the second instance of "abcd" could be
important.
I'm actually not sure that was very clear :-). I don't want to throw out
bytes by accident if that's even possible.
My discomfort with Iteratee is that most Haskell texts really want you to go
the way of lazy IO, which has led me to a good bit of trouble, and I've
never seen a very comprehensive tutorial of Iteratee available anywhere. I
am reading the Examples that come with the hackage package, though.

> I was about to write that there is no problem with putting data back into
> the Stream, and to point you at the head/peek functions... But then I
> thought that the ``not consuming bytes from the stream'' approach may not
> work well in cases where the number of bytes needed (by your function to
> accept/reject some rule) exceeds the size of the underlying memory buffer
> (4K in the current version of the `iteratee' library[3]).
>
> [3] http://hackage.haskell.org/packages/archive/iteratee/0.3.4/doc/html/src/Data...
>
> Do you think that abstracting to the level of _tokens_ - instead of bytes
> - could help here? (Think of flex and bison.) You know, these
> enumerators/iteratees things can be layered into _enumeratees_[1][4]...
> It's just an idea.
>
> [4] http://ianen.org/articles/understanding-iteratees/

Now that's an interesting idea, and sort of where my previous confusing
answer seemed to be heading. I wasn't sure if the iteratee was provided a
byte, a char, or a token. If I can tell the enumerator to only send tokens
to the iteratee (which I'd have to define), then perhaps I can ignore the
amount consumed per read and let the enumerator deal with that buffering
issue directly. Perhaps that's how iteratee really works anyway!

>> Otherwise, I'm going to want to roll my own iteratee-style library where
>> I have to say "NotDone howMuchMoreIThinkINeed" so I don't over-consume
>> the input stream.
>
> What's the problem with over-consuming a stream? In your case?

Well, my concern is that if it's read from the input stream and then not
used, the next time I access the stream I'm not certain what's happened to
the buffer. However, I suppose it's really a 2-level situation where the
enumerator pulls out some fixed chunk from a Handle or FD or what have you,
and then folds the iteratee over the buffer in some sized chunk.
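A rough sketch of that two-level picture, in the style of a left-fold
enumerator (toy code under those assumptions, not the iteratee package's
enum_fd; enumHandle and countUpToAMeg are made-up names): the enumerator
owns the Handle and the fixed-size reads, and the "iteratee" here is just a
step function that is folded over each chunk and says whether it wants more.

import System.IO (Handle, hIsEOF, stdin)
import qualified Data.ByteString as B

-- Left-fold enumerator: read up to 4K at a time and feed each chunk to the
-- step function until it reports that it is finished or the Handle is
-- exhausted.
enumHandle :: Handle -> (acc -> B.ByteString -> Either acc acc) -> acc -> IO acc
enumHandle h step = go
  where
    go acc = do
      eof <- hIsEOF h
      if eof
        then return acc
        else do
          chunk <- B.hGetSome h 4096
          case step acc chunk of
            Left done  -> return done  -- the step is finished; stop reading
            Right more -> go more      -- the step wants another chunk

-- Example step function: count bytes seen so far, stopping after a megabyte.
countUpToAMeg :: Int -> B.ByteString -> Either Int Int
countUpToAMeg n chunk
  | n' >= 1024 * 1024 = Left n'
  | otherwise         = Right n'
  where n' = n + B.length chunk

main :: IO ()
main = do
  n <- enumHandle stdin countUpToAMeg 0
  putStrLn ("Bytes consumed: " ++ show n)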
In C++ I've used ideas like this example that a professor I had in college
showed me from a newsgroup he helped to moderate.
#include <iostream>
#include <iterator>
#include <string>

int main () {
    std::cout << "Word count on stdin: "
              << std::distance(std::istream_iterator<std::string>(std::cin),
                               std::istream_iterator<std::string>())
              << std::endl;
}
If the code were changed to be:
#include <iostream>
#include <iterator>

int main () {
    std::cout << "Character count on stdin: "
              << std::distance(std::istreambuf_iterator<char>(std::cin),
                               std::istreambuf_iterator<char>())
              << std::endl;
}
We get different behavior out of the upper-level distance algorithm due to
the kind of iterator: distance does a form of folding over the iterators,
but it's actually doing the accumulation of the count at the "enumerator"
level rather than having the iterator evaluate it. Iteratee seems to be
like this, but with the control inverted. Note that in the C++ example,
changing the properties of each chunk being iterated over changes the
result of the folding from "word counter" to "character counter".
I guess I need to just get familiar with Iteratee to understand what knobs I
have available to turn.

> BTW, this `NotDone' is just a ``control message'' to the chunk producer
> (an enumerator):
>
>     IE_cont k (Just (GimmeThatManyBytes n))

Yes, I was thinking of something like that.

>> Does that even make any sense? I'm kind of brainstorming in this email
>> unfortunately :-)
>
> What's the problem with brainstorming? :)
>
> Cheers.
>
> --
> vvv

On Wed, Mar 31, 2010 at 7:42 PM, David Leimbach wrote:
> What I mean is: let's say the stream has "abcd efg abcd efg" and then I
> run some kind of iteratee computation looking for "abcd"
You could adapt the 'heads' function from the iteratee package to do this:

http://hackage.haskell.org/packages/archive/iteratee/0.3.4/doc/html/src/Data...
> and the block size was fixed to cause a read of 1024 bytes, but it
> returns as much as it can, providing it to the iteratee to deal with. The
> iteratee, with which I want to implement expect-like behavior, would
> really only want to read up to "abcd", consuming that from the input
> stream. Does the iteratee get the whole stream that was read by the
> enumerator, or is it supplied a single atomic unit at a time, such as a
> character, so that I can halt the consumption of the streamed data?
The iteratee will be applied to the whole stream that was read by the enumerator. You should ensure that the part of this input stream which is not needed for the result is saved in the 'Done' constructor so that other iteratees may consume it.
> What I don't want to have happen is my consuming bytes from the input
> Handle, only to have them ignored, as the second instance of "abcd" could
> be important.
Note that IterateeG has a Monad instance which allows you to sequentially
compose iteratees. If you write a 'match' iteratee (by adapting the 'heads'
function I mentioned earlier, which matches a given string against the
first part of a stream) you can compose these sequentially:

    foo = match "abcd" >> match "efg" >> foo

The first match will be applied to the stream that was read by the
enumerator. It will consume the "abcd" and save the rest of the stream (in
the 'Done' constructor). The second match will first be applied to the
saved stream from the first match. If this stream was not big enough, the
iteratee will ask for more (using the 'Cont' constructor). The enumerator
will then do a second read and apply the continuation (stored in the 'Cont'
constructor) to the new stream.

You may also consider using actual parser combinators built on top of
iteratee:

http://hackage.haskell.org/package/attoparsec-iteratee
http://hackage.haskell.org/package/iteratee-parsec

(I typed this in a hurry so some things may be off a bit)

regards,

Bas
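The mechanism described above can be modelled with a small self-contained
toy (these are not the types of the iteratee package, whose IterateeG, Done
and Cont are richer; match, feed and run are made-up names). It only shows
how leftovers stored in Done and continuations stored in Cont let
match "abcd" >> match "efg" work across chunk boundaries.

import Data.List (isPrefixOf, stripPrefix)

data Iter a
  = Done a String            -- finished: result plus unconsumed input
  | Cont (String -> Iter a)  -- needs another chunk from the enumerator

instance Functor Iter where
  fmap f (Done a rest) = Done (f a) rest
  fmap f (Cont k)      = Cont (fmap f . k)

instance Applicative Iter where
  pure a    = Done a ""
  mf <*> ma = mf >>= \f -> fmap f ma

instance Monad Iter where
  Done a rest >>= f = feed (f a) rest  -- leftover flows into the next iteratee
  Cont k      >>= f = Cont (\s -> k s >>= f)

-- Give an iteratee one more chunk of input.
feed :: Iter a -> String -> Iter a
feed (Done a rest) s = Done a (rest ++ s)
feed (Cont k)      s = k s

-- Skip input until the given pattern has been matched, leaving everything
-- after the pattern unconsumed (in Done) for the next iteratee.
match :: String -> Iter ()
match pat = Cont (go "")
  where
    go pending chunk = scan (pending ++ chunk)
    scan s
      | Just rest <- stripPrefix pat s = Done () rest
      | s `isPrefixOf` pat             = Cont (go s)     -- possible partial match
      | otherwise                      = scan (drop 1 s)

-- A toy enumerator: feed the chunks one by one until the iteratee is done.
run :: Iter a -> [String] -> Maybe (a, String)
run it [] = case it of
  Done a rest -> Just (a, rest)
  Cont _      -> Nothing            -- ran out of input before finishing
run it (c:cs) = case feed it c of
  Done a rest -> Just (a, rest)
  it'         -> run it' cs

main :: IO ()
main = print (run (match "abcd" >> match "efg") ["abcd ef", "g abcd efg"])
  -- the second match completes across the chunk boundary, leaving " abcd efg"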
participants (3):

- Bas van Dijk
- David Leimbach
- Valery V. Vorotyntsev