
Jason Dagit wrote:
Heinrich Apfelmus wrote:
I'm curious, can you give an example where you want to be explicit about chunking? I have a hard time imagining an example where chunking is beneficial compared to getting each character in sequence. Chunking seems to be common in C for reasons of performance, but how does that apply to Haskell?
[...] I think it basically comes down to this: we replace lazy IO with explicit chunking because lazy IO is unsafe, but explicit chunking can be safe.
Ah, I meant to compare Iteratees with chunking to Iteratees with single-character access, not to lazy IO. In C, this would be a comparison between read and getchar. If I remember correctly, the former is faster for copying a file simply because copying one character at a time with getchar is too granular (you have to make an expensive system call every time). Of course, this reasoning only applies to C and not necessarily to Haskell. Do you have an example where you want chunking instead of single-character access?
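To make the chunking question concrete, here is a minimal iteratee sketch, assuming a simplified design in which the stream hands the consumer lists of elements. The names Stream, Iteratee, enumChunks, countChars and chunksOf are hypothetical and not taken from any particular iteratee library. Single-character access is just the special case of chunks of length one; what changes is how many continuation steps the consumer takes per character, which is where any per-element overhead would show up. The sketch does not itself measure performance.

```haskell
-- A minimal iteratee sketch with explicit chunking. All names here are
-- hypothetical and simplified; real iteratee libraries differ in detail.

-- What the enumerator hands to the consumer: a chunk of elements or end of input.
data Stream a = Chunk [a] | EOF

-- A consumer is either finished with a result or waiting for more input.
data Iteratee a r
  = Done r
  | Cont (Stream a -> Iteratee a r)

-- Feed the given chunks to an iteratee, then signal EOF.
-- Returns Nothing if the iteratee still wants input after EOF.
enumChunks :: [[a]] -> Iteratee a r -> Maybe r
enumChunks _      (Done r) = Just r
enumChunks (c:cs) (Cont k) = enumChunks cs (k (Chunk c))
enumChunks []     (Cont k) = case k EOF of
  Done r -> Just r
  Cont _ -> Nothing

-- Count characters; each continuation step consumes a whole chunk at once.
countChars :: Iteratee Char Int
countChars = go 0
  where
    go n = n `seq` Cont (step n)
    step n (Chunk xs) = go (n + length xs)
    step n EOF        = Done n

-- Split input into chunks of at most k elements (k must be positive).
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf k xs = let (h, t) = splitAt k xs in h : chunksOf k t

main :: IO ()
main = do
  let input = "hello, iteratee world"
  -- One chunk: a single continuation step handles all 21 characters.
  print (enumChunks (chunksOf 4096 input) countChars)  -- Just 21
  -- Chunks of one: single-character access, one step per character.
  print (enumChunks (chunksOf 1 input) countChars)     -- Just 21
```

Both runs compute the same result; the chunked run simply lets one step process many characters, for example with a bulk operation such as length over the chunk.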
Supposing we use lazy IO (Prelude.readFile):
1) read the file, compute (a), close the file, read the file again, compute (b), and finally close the file. You can do so in constant space.
2) read the file, use one pass to calculate both (a) and (b) at the same time, then close the file. You can do so in constant space.
3) read the file, use one pass to compute (a) followed by a pass to compute (b), then close the file. The space used will be O(filesize).
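The three options can be sketched with Prelude.readFile as follows. The thread does not say what (a) and (b) are, so this sketch assumes, purely for illustration, that (a) counts characters and (b) counts newlines; compA, compB, step, option1, option2, option3 and the scratch file name are hypothetical. Note that with lazy IO the handle is closed only once the string has been read to the end, not at an explicit close.

```haskell
import Control.Exception (evaluate)
import Data.List (foldl')

-- Hypothetical stand-ins for computations (a) and (b).
compA :: String -> Int
compA = length                      -- (a): number of characters

compB :: String -> Int
compB = length . filter (== '\n')   -- (b): number of newlines

-- Option 1: read the file twice, one computation per read.
-- Each pass streams through the file in constant space.
option1 :: FilePath -> IO (Int, Int)
option1 path = do
  a <- readFile path >>= evaluate . compA
  b <- readFile path >>= evaluate . compB
  return (a, b)

-- Strict step for a single pass that computes both results at once.
step :: (Int, Int) -> Char -> (Int, Int)
step (n, l) c =
  let n' = n + 1
      l' = if c == '\n' then l + 1 else l
  in n' `seq` l' `seq` (n', l')

-- Option 2: one pass computes (a) and (b) together, in constant space.
option2 :: FilePath -> IO (Int, Int)
option2 path = do
  s <- readFile path
  evaluate (foldl' step (0, 0) s)

-- Option 3: one pass for (a), then a second pass over the same string
-- for (b). While (a) is being evaluated, the thunk b still refers to s,
-- so the garbage collector must keep the whole contents alive: O(filesize).
option3 :: FilePath -> IO (Int, Int)
option3 path = do
  s <- readFile path
  let a = compA s
      b = compB s
  _ <- evaluate a
  _ <- evaluate b
  return (a, b)

main :: IO ()
main = do
  let path = "lazy-io-demo.txt"  -- hypothetical scratch file
  writeFile path (unlines (replicate 100000 "some line of text"))
  option1 path >>= print
  option2 path >>= print
  option3 path >>= print
```

All three return the same pair; they differ only in how many times the file is opened and how much of it must be resident at once.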
I consider option #3 to be letting the elements of the stream "leak out". The computation in (b) references them, and thus the garbage collector doesn't free them between (a) and (b); the optimizer cannot fuse (a) and (b) in all cases.
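For contrast, here is a sketch of option #2 in iteratee style. The consumer never receives the stream as a value it could hold on to, so two computations over the same file are naturally combined into one pass, and the enumerator, not the consumer, owns the file handle. It again assumes that (a) counts bytes and (b) counts newlines. Data.ByteString.hGetSome, Data.ByteString.Char8.count and System.IO.withFile are standard library functions; Stream, Iteratee, pairI, foldChunks, enumList, enumFile, countBytes and countLines are hypothetical names, and real iteratee libraries also track leftover input, which this simplified version ignores.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import System.IO (IOMode (ReadMode), withFile)

-- Simplified iteratee over chunks of type c (no leftover tracking).
data Stream c = Chunk c | EOF

data Iteratee c r
  = Done r
  | Cont (Stream c -> Iteratee c r)

-- Interpret the state after EOF; Nothing if the iteratee never finished.
finish :: Iteratee c r -> Maybe r
finish (Done r) = Just r
finish (Cont _) = Nothing

-- Run two iteratees over the same stream in a single pass.
pairI :: Iteratee c r -> Iteratee c s -> Iteratee c (r, s)
pairI (Done r) (Done s) = Done (r, s)
pairI i j = Cont (\str -> pairI (feed i str) (feed j str))
  where
    feed (Done x) _   = Done x   -- already finished: ignore further input
    feed (Cont k) str = k str

-- Fold over chunks with a strict Int accumulator.
foldChunks :: (Int -> BS.ByteString -> Int) -> Iteratee BS.ByteString Int
foldChunks f = go 0
  where
    go n = n `seq` Cont (step n)
    step n (Chunk bs) = go (f n bs)
    step n EOF        = Done n

countBytes :: Iteratee BS.ByteString Int   -- stand-in for (a)
countBytes = foldChunks (\n bs -> n + BS.length bs)

countLines :: Iteratee BS.ByteString Int   -- stand-in for (b)
countLines = foldChunks (\n bs -> n + BC.count '\n' bs)

-- Pure enumerator over in-memory chunks, useful for testing.
enumList :: [c] -> Iteratee c r -> Maybe r
enumList _      (Done r) = Just r
enumList (c:cs) (Cont k) = enumList cs (k (Chunk c))
enumList []     (Cont k) = finish (k EOF)

-- File enumerator: withFile closes the handle when the loop ends,
-- even if an exception is thrown, so the handle cannot leak.
enumFile :: FilePath -> Iteratee BS.ByteString r -> IO (Maybe r)
enumFile path it0 = withFile path ReadMode (\h -> loop h it0)
  where
    loop _ (Done r) = return (Just r)
    loop h (Cont k) = do
      chunk <- BS.hGetSome h 4096   -- empty only at end of file
      if BS.null chunk
        then return (finish (k EOF))
        else loop h (k (Chunk chunk))

main :: IO ()
main = do
  let path = "iteratee-demo.txt"  -- hypothetical scratch file
  BS.writeFile path (BC.pack (unlines (replicate 1000 "some line of text")))
  result <- enumFile path (pairI countBytes countLines)
  print result
```

Expressing option #3 here would require the consumer to accumulate the chunks itself, which makes the O(filesize) cost explicit rather than accidental.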
Indeed, Iteratees make it difficult to express option #3, hence discouraging this particular space leak. Compared to lazy IO, they also make sure that the file handle is closed properly and does not leak.

Regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com