First, thanks for the reply,

On Wed, Mar 31, 2010 at 11:15 AM, Valery V. Vorotyntsev <valery.vv@gmail.com> wrote:
> I'm looking at iteratee as a way to replace my erroneous and really
> inefficient lazy-IO-based backend for an expect-like Monad DSL I've
> been working on, on and off, for about six months now.
>
> The problem is I want something like:
>
> expect "some String"
> send "some response"
>
> to block or perhaps timeout, depending on the environment, looking for
> "some String" on an input Handle, and it appears that iteratee works
> in a very fixed block size.

Actually, it doesn't. It works with whatever the enumerator gives it.
In the case of `enum_fd'[1] this is a fixed-size block, but in general
it is a ``value'' of some ``collection''[2].  And it is up to the
programmer to decide what should count as a value.

 [1] http://okmij.org/ftp/Haskell/Iteratee/IterateeM.hs
 [2] http://okmij.org/ftp/papers/LL3-collections-enumerators.txt


> While a fixed block size is ok, if I can put back unused bytes into
> the enumerator somehow (I may need to put a LOT back in some cases,
> but in the common case I will not need to put any back as most
> expect-like scripts typically catch the last few bytes of data sent
> before the peer is blocked waiting for a response...)

I don't quite get this ``last few bytes'' thing. Could you explain?

What I mean is: let's say the stream contains

"abcd efg abcd efg"

and then I run some kind of iteratee computation looking for "abcd"

and the block size were fixed so as to cause a read of, say, 1024 bytes, with the enumerator returning as much as it can and providing it to the iteratee to deal with.  The iteratee, with which I want to implement Expect-like behavior, would really only want to read up to "abcd", consuming that much from the input stream.  Does the iteratee get the whole chunk that was read by the enumerator, or is it supplied a single atomic unit at a time, such as a character, so that I can halt consumption of the streamed data?

What I don't want is to consume bytes from the input Handle only to have them discarded, as the second instance of "abcd" could be important.

I'm actually not sure that was very clear :-).  I don't want to throw out bytes by accident, if that's even possible.
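To make that concrete, here's a toy sketch of what I mean (my own made-up function, not the iteratee library's API): an expect-style match that consumes input only up to and including the pattern, and hands back the unused remainder instead of discarding it.

```haskell
-- Toy sketch, not the iteratee library's API: consume input only up
-- to and including the pattern, returning (consumed, leftover) so
-- the bytes after the match are not thrown away.
import Data.List (isPrefixOf)

expect :: String -> String -> Maybe (String, String)
expect pat = go []
  where
    go acc rest
      | pat `isPrefixOf` rest =
          Just (reverse acc ++ pat, drop (length pat) rest)
    go _   []     = Nothing
    go acc (c:cs) = go (c:acc) cs

main :: IO ()
main = print (expect "abcd" "abcd efg abcd efg")
-- prints: Just ("abcd"," efg abcd efg")
```

So the second "abcd" survives in the leftover, ready for the next expect.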

My discomfort with Iteratee is that most Haskell texts really push you toward lazy IO, which has led me into a good bit of trouble, and I've never seen a comprehensive tutorial on Iteratee anywhere.  I am reading the examples that come with the Hackage package, though.
 

I was about to write that there is no problem with putting data back
into the Stream, referring to the head/peek functions...  But then I
thought that the ``not consuming bytes from the stream'' approach may
not work well in cases when the number of bytes needed (by your
function to accept/reject some rule) exceeds the size of the
underlying memory buffer (4K in the current version of the `iteratee'
library[3]).

 [3] http://hackage.haskell.org/packages/archive/iteratee/0.3.4/doc/html/src/Data-Iteratee-IO-Fd.html

Do you think that abstracting to the level of _tokens_ - instead of
bytes - could help here? (Think of flex and bison.)  You know, these
enumerator/iteratee things can be layered into
_enumeratees_[1][4]... It's just an idea.


Now that's an interesting idea, and sort of where my previous confusing answer seemed to be heading.  I wasn't sure whether the iteratee was provided a byte, a char, or a token.  If I can tell the enumerator to send only tokens (which I'd have to define) to the iteratee, then perhaps I can ignore the amount consumed per read and let the enumerator deal with the buffering issue directly.  Perhaps that's how iteratee really works anyway!
 
 [4] http://ianen.org/articles/understanding-iteratees/
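As a rough sketch of that token idea (again with made-up names, not the library's API), the tokenizing layer would have to carry a partial token across chunk boundaries, so the inner consumer never sees a word split by an arbitrary read size:

```haskell
-- Toy sketch, not the library's API: an "enumeratee"-style layer that
-- turns raw characters into word tokens, carrying a partial token
-- across chunk boundaries so the consumer never sees a split word.
import Data.Char (isSpace)

type Token = String

-- Tokenize one chunk given the leftover partial token from the
-- previous chunk; returns complete tokens plus the new partial.
tokenize :: String -> String -> ([Token], String)
tokenize partial chunk = go (partial ++ chunk) []
  where
    go s acc =
      let rest        = dropWhile isSpace s
          (tok, more) = break isSpace rest
      in if null more
           then (reverse acc, tok)  -- tok may still be incomplete
           else go more (tok : acc)

main :: IO ()
main = do
  let (t1, p1) = tokenize "" "abcd ef"  -- chunk boundary splits "efg"
      (t2, p2) = tokenize p1 "g abcd"
  print (t1, p1)  -- prints: (["abcd"],"ef")
  print (t2, p2)  -- prints: (["efg"],"abcd")
```

The buffering concern from before doesn't disappear, but it gets pushed down into this one layer instead of leaking into every consumer.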

> Otherwise, I'm going to want to roll my own iteratee style library
> where I have to say "NotDone howMuchMoreIThinkINeed" so I don't over
> consume the input stream.

What's the problem with over-consuming a stream? In your case?

Well, my concern is that if bytes are read from the input stream and then not used, I'm not certain what has happened to the buffer the next time I access it.  However, I suppose it's really a two-level situation, where the enumerator pulls some fixed chunk out of a Handle or FD or what have you, and then folds the iteratee over that buffer in chunks of some size.

In C++ I've used ideas like the following example, which a professor I had in college showed me from a newsgroup he helped moderate.

#include <iostream>
#include <iterator>
#include <string>

int main() {
    std::cout << "Word count on stdin: "
              << std::distance(std::istream_iterator<std::string>(std::cin),
                               std::istream_iterator<std::string>())
              << std::endl;
}

If the code were changed to be:

#include <iostream>
#include <iterator>

int main() {
    std::cout << "Character count on stdin: "
              << std::distance(std::istreambuf_iterator<char>(std::cin),
                               std::istreambuf_iterator<char>())
              << std::endl;
}

We get different behavior out of the top-level std::distance algorithm purely from the kind of iterator used: std::distance performs a form of folding over the iterators, but the accumulation of the count actually happens at the iterator (enumerator) level rather than being evaluated by the algorithm itself.  Iteratee seems to be like this, but with the control inverted.

Note that in the C++ example, changing the properties of each unit being iterated over changes the result of the fold from a word counter to a character counter.

I guess I need to just get familiar with Iteratee to understand what knobs I have available to turn.

 

BTW, this `NotDone' is just a ``control message'' to the chunk
producer (an enumerator):

   IE_cont k (Just (GimmeThatManyBytes n))

Yes, I was thinking of something like that.
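Something like this toy protocol (all names invented here, not the library's) is what I had in mind: the continuation carries a hint of how many more bytes it wants, and leftovers survive either way.

```haskell
-- Toy sketch of the control-message idea; all names are invented,
-- not the iteratee library's.  A step either finishes with leftover
-- input, or asks the producer for (roughly) n more bytes.
data Step a
  = Done a String                     -- result plus unconsumed input
  | NeedMore Int (String -> Step a)   -- hint: about n more bytes wanted

-- An iteratee wanting exactly n bytes, updating its hint as it is fed.
takeN :: Int -> Step String
takeN n = NeedMore n (feed n [])
  where
    feed k acc chunk =
      let got  = take k chunk
          k'   = k - length got
          acc' = reverse got ++ acc
      in if k' == 0
           then Done (reverse acc') (drop k chunk)
           else NeedMore k' (feed k' acc')

-- A driver ("enumerator") that ignores the hint and feeds fixed
-- 4-byte chunks, just to show the protocol still works.
run :: String -> Step a -> (a, String)
run input (Done a rest) = (a, rest ++ input)
run input (NeedMore _ k)
  | null input = error "ran out of input"
  | otherwise  = let (chunk, more) = splitAt 4 input
                 in run more (k chunk)

main :: IO ()
main = print (run "abcdefghij" (takeN 6))
-- prints: ("abcdef","ghij")
```

The hint is advisory: a smarter driver could size its reads from it, but even a dumb fixed-block driver can't lose the "ghij" leftover.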


> Does that even make any sense?  I'm kind of brainstorming in this
> email unfortunately :-)

What's the problem with brainstorming? :)

Cheers.

--
vvv