Re: [Haskell-cafe] Re: Fwd: Semantics of iteratees, enumerators, enumeratees?

From: Heinrich Apfelmus
Jason Dagit wrote:
Heinrich Apfelmus wrote:
I'm curious, can you give an example where you want to be explicit about chunking? I have a hard time imagining an example where chunking is beneficial compared to getting each character in sequence. Chunking seems to be common in C for reasons of performance, but how does that apply to Haskell?
[...] I think it basically comes down to this: We replace lazy io with explicit chunking because lazy io is unsafe, but explicit chunking can be safe.
Ah, I mean to compare Iteratees with chunking to Iteratees with single character access, not to lazy IO.
In C, this would be a comparison between read and getchar. If I remember correctly, the former is faster for copying a file simply because copying one character at a time with getchar is too granular (you have to make an expensive system call every time). Of course, this reasoning only applies to C and not necessarily to Haskell.
Do you have an example where you want chunking instead of single character access?
I am unable to think of any examples where you want chunking for any reason other than efficiency. Yesterday I spent some time on an element-wise iteratee implementation, and unfortunately it's significantly slower than any of the chunked implementations. I'm certain there's room for optimization, but I don't know if it's possible to make up the whole difference. I'd need to make some examination of the core to figure out where it could be improved. It's also possible that certain other implementations, such as your ProgramT version or the simplified implementation John Millikin recently posted to this list, would be more amenable to compiler magic for this purpose. John
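To put the read-versus-getchar comparison in Haskell terms, here is a rough sketch of copying a file one Char at a time versus in 64 KiB blocks (the file names are placeholders, and this is only an illustration, not a benchmark):

import qualified Data.ByteString as B
import System.IO

-- Copy element-wise: one hGetChar/hPutChar pair per character.
copyByChar :: Handle -> Handle -> IO ()
copyByChar i o = do
  eof <- hIsEOF i
  if eof
    then return ()
    else do
      c <- hGetChar i
      hPutChar o c
      copyByChar i o

-- Copy in large chunks: one read per 64 KiB block.
copyByBlock :: Handle -> Handle -> IO ()
copyByBlock i o = do
  block <- B.hGet i 65536
  if B.null block
    then return ()
    else B.hPut o block >> copyByBlock i o

main :: IO ()
main =
  withBinaryFile "input.dat" ReadMode $ \i ->
    withBinaryFile "output.dat" WriteMode $ \o ->
      copyByBlock i o                    -- swap in copyByChar to compare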

On Wednesday 25 August 2010 13:53:47, John Lato wrote:
From: Heinrich Apfelmus
Do you have an example where you want chunking instead of single character access?
I am unable to think of any examples where you want chunking for any reason other than efficiency.
For many hashing or de/encryption algorithms, chunking is more natural than single-character access.
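As a toy illustration of why block-oriented algorithms prefer chunked input (the names below are made up for the sketch and are not any real hash, nor the iteratee API):

import Data.Bits (xor)
import Data.List (foldl')
import Data.Word (Word8)

-- A stand-in "compression function": its natural unit of work is a whole
-- block, not a single byte.
toyCompress :: Word8 -> [Word8] -> Word8
toyCompress = foldl' xor

-- When the input already arrives as blocks, hashing is a plain fold over
-- the chunk stream; element-wise input would first have to be regrouped.
hashBlocks :: [[Word8]] -> Word8
hashBlocks = foldl' toyCompress 0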

Daniel Fischer wrote:
John Lato wrote:
Heinrich Apfelmus wrote:
Do you have an example where you want chunking instead of single character access?
I am unable to think of any examples where you want chunking for any reason other than efficiency.
For many hashing or de/encryption algorithms, chunking is more natural than single-character access.
Even when the chunk lengths are unpredictable? After all, unlike with fread in C, you can't request the next chunk to have a certain length with Iteratees. Regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com

On Thursday 26 August 2010 09:33:30, Heinrich Apfelmus wrote:
Daniel Fischer wrote:
John Lato wrote:
Heinrich Apfelmus wrote:
Do you have an example where you want chunking instead of single character access?
I am unable to think of any examples where you want chunking for any reason other than efficiency.
For many hashing or de/encryption algorithms, chunking is more natural than single-character access.
Even when the chunk lengths are unpredictable? After all, unlike with fread in C, you can't request the next chunk to have a certain length with Iteratees.
Well, I just gave an example where one would want chunking for reasons other than performance. That iteratees don't provide the desired functionality is a different matter. For performance reasons, one would still be likely to want the I/O to happen in larger chunks than the processing, so it's kind of moot. Cheers, Daniel

Daniel Fischer wrote:
Heinrich Apfelmus wrote:
Daniel Fischer wrote:
For many hashing or de/encryption algorithms, chunking is more natural than single-character access.
Even when the chunk lengths are unpredictable? After all, unlike with fread in C, you can't request the next chunk to have a certain length with Iteratees.
Well, I just gave an example where one would want chunking for reasons other than performance. That iteratees don't provide the desired functionality is a different matter.
For performance reasons, one would still be likely to want the I/O to happen in larger chunks than the processing, so it's kind of moot.
Yes, I/O should happen in chunks, but I thought that the Enumerator implementations could buffer I/O while still presenting single characters to the Iteratees. Regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com
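Something along those lines can at least be sketched. The following toy version (the Iteratee type here is made up, not the one from the iteratee or enumerator packages) reads a Handle in 4 KiB blocks but hands the iteratee one Char at a time:

import qualified Data.ByteString.Char8 as B
import System.IO

-- A toy iteratee: either finished, or waiting for the next element
-- (Nothing signals end of stream).
data Iteratee a b = Done b | Cont (Maybe a -> Iteratee a b)

-- Chunked I/O on the outside, element-wise delivery on the inside.
enumHandleByChar :: Handle -> Iteratee Char b -> IO (Iteratee Char b)
enumHandleByChar _ it@(Done _) = return it
enumHandleByChar h it = do
  eof <- hIsEOF h
  if eof
    then return it
    else do
      block <- B.hGet h 4096                        -- read a large chunk...
      enumHandleByChar h (feed (B.unpack block) it)
  where
    feed _      i@(Done _) = i                      -- ...but feed it Char by Char
    feed []     i          = i
    feed (c:cs) (Cont k)   = feed cs (k (Just c))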

On 26/08/2010, Daniel Fischer wrote:
[...] Well, I just gave an example where one would want chunking for reasons other than performance. That iteratees don't provide the desired functionality is a different matter. [...]
In the case of hashing, wouldn't it be more reasonable to consider iterators over streams of fixed (or at least predictable) sized chunks (where a set of chunks can themselves be chunked), with the chunking behaviour being given by another iteratee over the original stream? It seems to me that one of the major points of iteratees is to provide an abstraction from the kind of chunking irrelevant to the parsing logic; otherwise I fail to see any difference (at least as far as chunking is concerned) from plain strict IO. I'm not particularly well read on anything here, so I could just be totally missing what is going on, in which case I apologise. //Tilo Wiklund
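The regrouping that such a "chunking iteratee" would perform can be sketched with plain lists (no iteratee types involved; reblock is a made-up name): it takes incoming chunks of unpredictable size and emits blocks of exactly n elements, carrying any leftover between chunks.

-- Regroup incoming chunks of unpredictable size into blocks of exactly n
-- elements, carrying the leftover between chunks.  A real version would sit
-- inside an enumeratee, but the bookkeeping is the same.
reblock :: Int -> [[a]] -> [[a]]
reblock n = go []
  where
    go leftover []           = [leftover | not (null leftover)]  -- trailing short block
    go leftover (chunk:rest) = emit (leftover ++ chunk) rest
    emit buf rest
      | length buf >= n = let (b, buf') = splitAt n buf in b : emit buf' rest
      | otherwise       = go buf rest

-- reblock 4 [[1,2],[3,4,5,6,7],[8]] == [[1,2,3,4],[5,6,7,8]]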

Tilo Wiklund wrote:
Daniel Fischer wrote:
[...] Well, I just gave an example where one would want chunking for reasons other than performance. That iteratees don't provide the desired functionality is a different matter. [...]
In the case of hashing, wouldn't it be more reasonable to consider iterators over streams of fixed (or at least predictable) sized chunks (where a set of chunks can themselves be chunked), with the chunking behaviour being given by another iteratee over the original stream?
It seems to me that one of the major points of iteratees is to provide an abstraction from the kind of chunking irrelevant to the parsing logic; otherwise I fail to see any difference (at least as far as chunking is concerned) from plain strict IO.
I thought so, too, but I was informed[1] that iteratees are just a small step up the abstraction ladder. The difference compared to an ordinary file Handle is that you can now reuse one and the same iteratee for reading from a String, for instance, without changing the source code of the iteratee. Furthermore, iteratees can be suspended, which facilitates resource management, like closing file handles after they've been read. [1]: http://www.reddit.com/r/haskell/comments/ar4wb/understanding_iteratees/c0j0f... Regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com
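The reuse point can be made concrete with a toy encoding (again, made-up types rather than the actual iteratee API): the same iteratee value is driven, unchanged, from a String, a list of Ints, or in principle an IO source.

-- Toy types for illustration only.
data Iteratee a b = Done b | Cont (Maybe a -> Iteratee a b)

-- An iteratee that counts the elements of whatever stream it is given.
count :: Iteratee a Int
count = go 0
  where
    go n = Cont $ \mx -> case mx of
      Nothing -> Done n
      Just _  -> go (n + 1)

-- An enumerator over any pure list; a String is just [Char], so the very
-- same 'count' works for Strings, [Int], and so on.
enumFromList :: [a] -> Iteratee a b -> Iteratee a b
enumFromList _      it@(Done _) = it
enumFromList []     it          = it
enumFromList (x:xs) (Cont k)    = enumFromList xs (k (Just x))

-- Finish an iteratee by signalling end of stream.
runIter :: Iteratee a b -> b
runIter (Done b) = b
runIter (Cont k) = runIter (k Nothing)

-- runIter (enumFromList "hello" count)      == 5
-- runIter (enumFromList [1,2,3::Int] count) == 3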

Sorry to be late coming into this conversation.....
Something that has bothered me (which I have mentioned to John Lato privately) is that it is very easy to write non-compositional code due to the chunking. For example, there is a standard function
map :: (a -> b) -> Enumeratee a b c
whose meaning I hope is clear: use the function to transform the type of a stream and pass it to an iteratee. However, last I checked, the versions provided in both the iteratee and enumerator packages fail to satisfy the equation
map f (it1 >> it2) == (map f it1) >> (map f it2)
because of chunking, essentially. You can check this with f == id and with it1 and it2 both being head:
let r = runIdentity . runIteratee
runIdentity $ run $ enumList 10 [1..100] $ r $ joinI $ map id $ r (head >> head)
--> Right (Just 2)
runIdentity $ run $ enumList 10 [1..100] $ r $ joinI $ (map id $ r head) >> (map id $ r head)
--> Right (Just 11)
It is possible to fix this behavior, but it complicates the "obvious" definitions a lot.
B
On Wed, Sep 1, 2010 at 5:10 AM, Heinrich Apfelmus wrote:
Tilo Wiklund wrote:
Daniel Fischer wrote:
[...] Well, I just gave an example where one would want chunking for reasons other than performance. That iteratees don't provide the desired functionality is a different matter. [...]
In the case of hashing, wouldn't it be more reasonable to consider iterators over streams of fixed (or at least predictable) sized chunks (where a set of chunks can themselves be chunked), with the chunking behaviour being given by another iteratee over the original stream?
It seems to me that one of the major points of iteratees is to provide an abstraction from the kind of chunking irrelevant to the parsing logic; otherwise I fail to see any difference (at least as far as chunking is concerned) from plain strict IO.
I thought so, too, but I was informed[1] that iteratees are just a small step up the abstraction ladder. The difference compared to an ordinary file Handle is that you can now reuse one and the same iteratee for reading from a String, for instance, without changing the source code of the iteratee.
Furthermore, iteratees can be suspended, which facilitates resource management, like closing file handles after they've been read.
[1]: http://www.reddit.com/r/haskell/comments/ar4wb/understanding_iteratees/c0j0f...
Regards, Heinrich Apfelmus
-- http://apfelmus.nfshost.com

On Mon, Sep 6, 2010 at 22:49, Ben wrote:
Sorry to be late coming into this conversation.....
Something that has bothered me (which I have mentioned to John Lato privately) is that it is very easy to write non-compositional code due to the chunking. For example, there is a standard function
map :: (a -> b) -> Enumeratee a b c
whose meaning I hope is clear: use the function to transform the type of a stream and pass it to an iteratee. However, last I checked, the versions provided in both the iteratee and enumerator packages fail to satisfy the equation
map f (it1 >> it2) == (map f it1) >> (map f it2)
because of chunking, essentially. You can check this with f == id and with it1 and it2 both being head:
let r = runIdentity . runIteratee
runIdentity $ run $ enumList 10 [1..100] $ r $ joinI $ map id $ r (head >> head) --> Right (Just 2)
runIdentity $ run $ enumList 10 [1..100] $ r $ joinI $ (map id $ r head) >> (map id $ r head) --> Right (Just 11)
It is possible to fix this behavior, but it complicates the "obvious" definitions a lot.
Chunking doesn't have anything to do with this, and an iteratee encoding without input chunking would exhibit the same problem. You're running into an (annoying? dangerous?) subtlety in enumeratees.

In the particular case of map/head, it's possible to construct an iteratee with the expected behavior by altering the definition of 'map'. However, if the composition is more complicated (like map/parse-json), this alteration becomes impossible.

Remember that an enumeratee's return value contains two levels of "extra" input. The outer layer is from the enumeratee (map), while the inner is from the iteratee (head). The iteratee is allowed to consume an arbitrary amount of input before yielding, and depending on its purpose it might yield "extra" input from a previous stream.

Perhaps the problem is that 'map' is the wrong name? It might make users expect that it composes "horizontally" rather than "vertically". Normally this incorrect interpretation would be caught by the type checker, but using (>>) allows the code to compile.

Anyway, the correct way to encode @(map f it1) >> (map f it2)@, using the above style, is:

(map id (r head) >>= returnI) >> (map id (r head) >>= returnI)

so the full expression becomes:

runIdentity $ run $ enumList 10 [1..100] $ r $ (map id (r head) >>= returnI) >> (map id (r head) >>= returnI)

which ought to return the correct value (untested; I have no Haskell compiler on this computer).
participants (6)
- Ben
- Daniel Fischer
- Heinrich Apfelmus
- John Lato
- John Millikin
- Tilo Wiklund