Re: [Haskell-cafe] Re: Fwd: Semantics of iteratees, enumerators, enumeratees?

From: Heinrich Apfelmus
Jason Dagit wrote:
Heinrich Apfelmus wrote:
I'm curious, can you give an example where you want to be explicit about chunking? I have a hard time imagining an example where chunking is beneficial compared to getting each character in sequence. Chunking seems to be common in C for reasons of performance, but how does that apply to Haskell?
[...] I think it basically comes down to this: We replace lazy io with explicit chunking because lazy io is unsafe, but explicit chunking can be safe.
Ah, I mean to compare Iteratees with chunking to Iteratees with single character access, not to lazy IO.
In C, this would be a comparison between read and getchar. If I remember correctly, the former is faster for copying a file simply because copying one character at a time with getchar is too granular (you have to make an expensive system call every time). Of course, this reasoning only applies to C and not necessarily to Haskell.
Do you have an example where you want chunking instead of single character access?
I am unable to think of any examples where you want chunking for any reason other than efficiency. Yesterday I spent some time on an element-wise iteratee implementation, and unfortunately it's significantly slower than any of the chunked implementations. I'm certain there's room for optimization, but I don't know if it's possible to make up the whole difference. I'd need to make some examination of the core to figure out where it could be improved. It's also possible that certain other implementations, such as your ProgramT version or the simplified implementation John Millikin recently posted to this list, would be more amenable to compiler magic for this purpose. John
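To put the read-versus-getchar comparison in Haskell terms, here is a rough sketch of copying a file one Char at a time versus in 64 KiB blocks (the file names are placeholders, and this is only an illustration, not a benchmark):

import qualified Data.ByteString as B
import System.IO

-- Copy element-wise: one hGetChar/hPutChar pair per character.
copyByChar :: Handle -> Handle -> IO ()
copyByChar i o = do
  eof <- hIsEOF i
  if eof
    then return ()
    else do
      c <- hGetChar i
      hPutChar o c
      copyByChar i o

-- Copy in large chunks: one read per 64 KiB block.
copyByBlock :: Handle -> Handle -> IO ()
copyByBlock i o = do
  block <- B.hGet i 65536
  if B.null block
    then return ()
    else B.hPut o block >> copyByBlock i o

main :: IO ()
main =
  withBinaryFile "input.dat" ReadMode $ \i ->
    withBinaryFile "output.dat" WriteMode $ \o ->
      copyByBlock i o                    -- swap in copyByChar to compare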

On Wednesday 25 August 2010 13:53:47, John Lato wrote:
From: Heinrich Apfelmus
Do you have an example where you want chunking instead of single character access?
I am unable to think of any examples where you want chunking for any reason other than efficiency.
For many hashing or de/encryption algorithms, chunking is more natural than single-character access.
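As a toy illustration of why block-oriented algorithms prefer chunked input (the names below are made up for the sketch and are not any real hash, nor the iteratee API):

import Data.Bits (xor)
import Data.List (foldl')
import Data.Word (Word8)

-- A stand-in "compression function": its natural unit of work is a whole
-- block, not a single byte.
toyCompress :: Word8 -> [Word8] -> Word8
toyCompress = foldl' xor

-- When the input already arrives as blocks, hashing is a plain fold over
-- the chunk stream; element-wise input would first have to be regrouped.
hashBlocks :: [[Word8]] -> Word8
hashBlocks = foldl' toyCompress 0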

Daniel Fischer wrote:
John Lato wrote:
Heinrich Apfelmus wrote:
Do you have an example where you want chunking instead of single character access?
I am unable to think of any examples where you want chunking for any reason other than efficiency.
For many hashing or de/encryption algorithms, chunking is more natural than single-character access.
Even when the chunk lengths are unpredictable? After all, unlike with fread in C, you can't request the next chunk to have a certain length with Iteratees. Regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com

On Thursday 26 August 2010 09:33:30, Heinrich Apfelmus wrote:
Daniel Fischer wrote:
John Lato wrote:
Heinrich Apfelmus wrote:
Do you have an example where you want chunking instead of single character access?
I am unable to think of any examples where you want chunking for any reason other than efficiency.
For many hashing or de/encryption algorithms, chunking is more natural than single-character access.
Even when the chunk lengths are unpredictable? After all, unlike with fread in C, you can't request the next chunk to have a certain length with Iteratees.
Well, I just gave an example where one would want chunking for reasons other than performance. That iteratees don't provide the desired functionality is a different matter. For performance reasons, one would still be likely to want the I/O to happen in larger chunks than the processing, so it's kind of moot. Cheers, Daniel

Daniel Fischer wrote:
Heinrich Apfelmus wrote:
Daniel Fischer wrote:
For many hashing or de/encryption algorithms, chunking is more natural than single-character access.
Even when the chunk lengths are unpredictable? After all, unlike with fread in C, you can't request the next chunk to have a certain length with Iteratees.
Well, I just gave an example where one would want chunking for reasons other than performance. That iteratees don't provide the desired functionality is a different matter.
For performance reasons, one would still be likely to want the I/O to happen in larger chunks than the processing, so it's kind of moot.
Yes, I/O should happen in chunks, but I thought that the Enumerator implementations could buffer I/O while still presenting single characters to the Iteratees. Regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com
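Something along those lines can at least be sketched. The following toy version (the Iteratee type here is made up, not the one from the iteratee or enumerator packages) reads a Handle in 4 KiB blocks but hands the iteratee one Char at a time:

import qualified Data.ByteString.Char8 as B
import System.IO

-- A toy iteratee: either finished, or waiting for the next element
-- (Nothing signals end of stream).
data Iteratee a b = Done b | Cont (Maybe a -> Iteratee a b)

-- Chunked I/O on the outside, element-wise delivery on the inside.
enumHandleByChar :: Handle -> Iteratee Char b -> IO (Iteratee Char b)
enumHandleByChar _ it@(Done _) = return it
enumHandleByChar h it = do
  eof <- hIsEOF h
  if eof
    then return it
    else do
      block <- B.hGet h 4096                        -- read a large chunk...
      enumHandleByChar h (feed (B.unpack block) it)
  where
    feed _      i@(Done _) = i                      -- ...but feed it Char by Char
    feed []     i          = i
    feed (c:cs) (Cont k)   = feed cs (k (Just c))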

On 26/08/2010, Daniel Fischer wrote:
[...] Well, I just gave an example where one would want chunking for reasons other than performance. That iteratees don't provide the desired functionality is a different matter. [...]
In the case of hashing, wouldn't it be more reasonable to consider iterators over streams of fixed (or at least predictable) sized chunks (where a set of chunks can themselves be chunked), with the chunking behaviour being given by another iteratee over the original stream? It seems to me that one of the major points of iteratees is to provide an abstraction from the kind of chunking irrelevant to the parsing logic; otherwise I fail to see any difference (at least as far as chunking is concerned) from plain strict IO. I'm not particularly well read on anything here, so I could just be totally missing what is going on, in which case I apologise. //Tilo Wiklund
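The regrouping that such a "chunking iteratee" would perform can be sketched with plain lists (no iteratee types involved; reblock is a made-up name): it takes incoming chunks of unpredictable size and emits blocks of exactly n elements, carrying any leftover between chunks.

-- Regroup incoming chunks of unpredictable size into blocks of exactly n
-- elements, carrying the leftover between chunks.  A real version would sit
-- inside an enumeratee, but the bookkeeping is the same.
reblock :: Int -> [[a]] -> [[a]]
reblock n = go []
  where
    go leftover []           = [leftover | not (null leftover)]  -- trailing short block
    go leftover (chunk:rest) = emit (leftover ++ chunk) rest
    emit buf rest
      | length buf >= n = let (b, buf') = splitAt n buf in b : emit buf' rest
      | otherwise       = go buf rest

-- reblock 4 [[1,2],[3,4,5,6,7],[8]] == [[1,2,3,4],[5,6,7,8]]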

Tilo Wiklund wrote:
Daniel Fischer wrote:
[...] Well, I just gave an example where one would want chunking for reasons other than performance. That iteratees don't provide the desired functionality is a different matter. [...]
In the case of hashing, wouldn't it be more reasonable to consider iterators over streams of fixed (or at least predictable) sized chunks (where a set of chunks can themselves be chunked), with the chunking behaviour being given by another iteratee over the original stream?
It seems to me that one of the major points of iteratees is to provide an abstraction from the kind of chunking irrelevant to the parsing logic; otherwise I fail to see any difference (at least as far as chunking is concerned) from plain strict IO.
I thought so, too, but I was informed[1] that iteratees are just a small step up the abstraction ladder. The difference compared to an ordinary file Handle is that you can now reuse one and the same iteratee for reading from a String, for instance, without changing the source code of the iteratee. Furthermore, iteratees can be suspended, which facilitates resource management, like closing file handles after they've been read. [1]: http://www.reddit.com/r/haskell/comments/ar4wb/understanding_iteratees/c0j0f... Regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com
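The reuse point can be made concrete with a toy encoding (again, made-up types rather than the actual iteratee API): the same iteratee value is driven, unchanged, from a String, a list of Ints, or in principle an IO source.

-- Toy types for illustration only.
data Iteratee a b = Done b | Cont (Maybe a -> Iteratee a b)

-- An iteratee that counts the elements of whatever stream it is given.
count :: Iteratee a Int
count = go 0
  where
    go n = Cont $ \mx -> case mx of
      Nothing -> Done n
      Just _  -> go (n + 1)

-- An enumerator over any pure list; a String is just [Char], so the very
-- same 'count' works for Strings, [Int], and so on.
enumFromList :: [a] -> Iteratee a b -> Iteratee a b
enumFromList _      it@(Done _) = it
enumFromList []     it          = it
enumFromList (x:xs) (Cont k)    = enumFromList xs (k (Just x))

-- Finish an iteratee by signalling end of stream.
runIter :: Iteratee a b -> b
runIter (Done b) = b
runIter (Cont k) = runIter (k Nothing)

-- runIter (enumFromList "hello" count)      == 5
-- runIter (enumFromList [1,2,3::Int] count) == 3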

Sorry to be late coming into this conversation.....
Something that has bothered me (which I have mentioned to John Lato privately) is that it is very easy to write non-compositional code due to the chunking. For example, there is a standard function
map :: (a -> b) -> Enumeratee a b c
whose meaning I hope is clear: use the function to transform the type of a stream and pass it to an iteratee. However, last I checked, the versions provided in both the iteratee and enumerator packages fail to satisfy the equation
map f (it1 >> it2) == (map f it1) >> (map f it2)
because of chunking, essentially. You can check this with f == id and with it1 and it2 both being head:
let r = runIdentity . runIteratee
runIdentity $ run $ enumList 10 [1..100] $ r $ joinI $ map id $ r (head >> head)
--> Right (Just 2)
runIdentity $ run $ enumList 10 [1..100] $ r $ joinI $ (map id $ r head) >> (map id $ r head)
--> Right (Just 11)
It is possible to fix this behavior, but it complicates the "obvious" definitions a lot.
B
On Wed, Sep 1, 2010 at 5:10 AM, Heinrich Apfelmus wrote:
Tilo Wiklund wrote:
Daniel Fischer wrote:
[...] Well, I just gave an example where one would want chunking for reasons other than performance. That iteratees don't provide the desired functionality is a different matter. [...]
In the case of hashing, wouldn't it be more reasonable to consider iterators over streams of fixed (or at least predictable) sized chunks (where a set of chunks can themselves be chunked), with the chunking behaviour being given by another iteratee over the original stream?
It seems to me that one of the major points of iteratees is to provide an abstraction from the kind of chunking irrelevant to the parsing logic; otherwise I fail to see any difference (at least as far as chunking is concerned) from plain strict IO.
I thought so, too, but I was informed[1] that iteratees are just a small step up the abstraction ladder. The difference compared to an ordinary file Handle is that you can now reuse one and the same iteratee for reading from a String, for instance, without changing the source code of the iteratee.
Furthermore, iteratees can be suspended, which facilitates resource management, like closing file handles after they've been read.
[1]: http://www.reddit.com/r/haskell/comments/ar4wb/understanding_iteratees/c0j0f...
Regards, Heinrich Apfelmus
-- http://apfelmus.nfshost.com

On Mon, Sep 6, 2010 at 22:49, Ben wrote:
Sorry to be late coming into this conversation.....
Something that has bothered me (which I have mentioned to John Lato privately) is that it is very easy to write non-compositional code due to the chunking. For example, there is a standard function
map :: (a -> b) -> Enumeratee a b c
whose meaning I hope is clear: use the function to transform the type of a stream and pass it to an iteratee. However, last I checked, the versions provided in both the iteratee and enumerator packages fail to satisfy the equation
map f (it1 >> it2) == (map f it1) >> (map f it2)
because of chunking, essentially. You can check this with f == id and with it1 and it2 both being head:
let r = runIdentity . runIteratee
runIdentity $ run $ enumList 10 [1..100] $ r $ joinI $ map id $ r (head >> head) --> Right (Just 2)
runIdentity $ run $ enumList 10 [1..100] $ r $ joinI $ (map id $ r head) >> (map id $ r head) --> Right (Just 11)
It is possible to fix this behavior, but it complicates the "obvious" definitions a lot.
Chunking doesn't have anything to do with this, and an iteratee encoding without input chunking would exhibit the same problem. You're running into an (annoying? dangerous?) subtlety in enumeratees.

In the particular case of map/head, it's possible to construct an iteratee with the expected behavior by altering the definition of 'map'. However, if the composition is more complicated (like map/parse-json), this alteration becomes impossible.

Remember that an enumeratee's return value contains two levels of "extra" input. The outer layer is from the enumeratee (map), while the inner is from the iteratee (head). The iteratee is allowed to consume an arbitrary amount of input before yielding, and depending on its purpose it might yield "extra" input from a previous stream.

Perhaps the problem is that 'map' is the wrong name? It might make users expect that it composes "horizontally" rather than "vertically". Normally this incorrect interpretation would be caught by the type checker, but using (>>) allows the code to compile.

Anyway, the correct way to encode @(map f it1) >> (map f it2)@, using the above style, is:

(map id (r head) >>= returnI) >> (map id (r head) >>= returnI)

so the full expression becomes:

runIdentity $ run $ enumList 10 [1..100] $ r $ (map id (r head) >>= returnI) >> (map id (r head) >>= returnI)

which ought to return the correct value (untested; I have no Haskell compiler on this computer).
participants (6)
- Ben
- Daniel Fischer
- Heinrich Apfelmus
- John Lato
- John Millikin
- Tilo Wiklund