Re: [Haskell-cafe] Iteratee-like

From: Permjacov Evgeniy
current links
https://github.com/permeakra/Rank2Iteratee https://github.com/permeakra/PassiveIteratee
The main difference from 'original' iteratees I read about is that both do not use 'chunks' and pass data one-by-one. So, what I wrote may be slower, but should be easier to maintain and more transparent for ghc optimising facilities. I wanted as clean and simple code as possible, but it is still very, very messy at some places and I want it cleaner. Any suggestions? I also want to check, how good ghc does its work with this messy modules. They may become interesting benchmarks.
Have you tried comparing it to either iteratee or enumerator (which had mostly comparable performance last time I checked, with a slight edge to iteratee)? Or to Oleg's library? Try writing test cases, a simple byte-counting application, or similar, so you can compare the performance with the other versions. Both enumerator and iteratee include demo programs that you could use as a starting point. I agree that iteratees which work on a per-element level are very clean and should be amenable to optimization by GHC. It also shows a very clear relationship with stream-fusion techniques. Unfortunately when I last tried it I couldn't get acceptable performance. I was using ghc-6.12.1 IIRC, so it could be different now. John

On 12/15/2010 05:48 PM, John Lato wrote:
From: Permjacov Evgeniy
mailto:permeakra@gmail.com> current links
https://github.com/permeakra/Rank2Iteratee https://github.com/permeakra/PassiveIteratee
The main difference from 'original' iteratees I read about is that both do not use 'chunks' and pass data one-by-one. So, what I wrote may be slower, but should be easier to maintain and more transparent for ghc optimising facilities. I wanted as clean and simple code as possible, but it is still very, very messy at some places and I want it cleaner. Any suggestions? I also want to check, how good ghc does its work with this messy modules. They may become interesting benchmarks.
Have you tried comparing it to either iteratee or enumerator (which had mostly comparable performance last time I checked, with a slight edge to iteratee)? Or to Oleg's library? Try writing test cases, a simple byte-counting application, or similar, so you can compare the performance with the other versions. Both enumerator and iteratee include demo programs that you could use as a starting point.
I wrote a simple counter (attached,works for both variants of package), that literally counts bytes of given value in input. I got three-time slower result then with lazy io ( 0_o) with rank2types variant and six-seven time slower result with no CPS-version. Looks like ghc is really good with list fusion... I'm still reading tutorial from iteratee package, so I have not compared with it yet. An equivalend lazy io programm attached. Can someone give an advice how to improve performance?
I agree that iteratees which work on a per-element level are very clean and should be amenable to optimization by GHC. It also shows a very clear relationship with stream-fusion techniques. Unfortunately when I last tried it I couldn't get acceptable performance. I was using ghc-6.12.1 IIRC, so it could be different now.
John

On 12/15/2010 05:48 PM, John Lato wrote:
From: Permjacov Evgeniy
mailto:permeakra@gmail.com> current links
https://github.com/permeakra/Rank2Iteratee https://github.com/permeakra/PassiveIteratee
The main difference from 'original' iteratees I read about is that both do not use 'chunks' and pass data one-by-one. So, what I wrote may be slower, but should be easier to maintain and more transparent for ghc optimising facilities. I wanted as clean and simple code as possible, but it is still very, very messy at some places and I want it cleaner. Any suggestions? I also want to check, how good ghc does its work with this messy modules. They may become interesting benchmarks.
Have you tried comparing it to either iteratee or enumerator (which had mostly comparable performance last time I checked, with a slight edge to iteratee)? Or to Oleg's library? Try writing test cases, a simple byte-counting application, or similar, so you can compare the performance with the other versions. Both enumerator and iteratee include demo programs that you could use as a starting point.
Ok, I tested with ByteString chunks and got roughly the same performance (less then 5 % difference) as with Data.Iteratee (as expected, as it is not a monad a bottlenec when using chunks). However, with Word8' streams I slows down to point six times slower then lazy IO. this is still may be acceptable if IO actions has to be performed while making nontrivial list fusions, but in general it is fail. Well, ghc has another complicated case for compiler optimisation tests. CPS-style with rank2 types provides boost to performance, but when using chunks it is insignificant, so haskell-98 version of iteratees may be used with no worries.
I agree that iteratees which work on a per-element level are very clean and should be amenable to optimization by GHC. It also shows a very clear relationship with stream-fusion techniques. Unfortunately when I last tried it I couldn't get acceptable performance. I was using ghc-6.12.1 IIRC, so it could be different now.
John

On Thu, Dec 16, 2010 at 12:23 AM, Permjacov Evgeniy
On 12/15/2010 05:48 PM, John Lato wrote:
From: Permjacov Evgeniy
current links
https://github.com/permeakra/Rank2Iteratee https://github.com/permeakra/PassiveIteratee
The main difference from 'original' iteratees I read about is that both do not use 'chunks' and pass data one-by-one. So, what I wrote may be slower, but should be easier to maintain and more transparent for ghc optimising facilities. I wanted as clean and simple code as possible, but it is still very, very messy at some places and I want it cleaner. Any suggestions? I also want to check, how good ghc does its work with this messy modules. They may become interesting benchmarks.
Have you tried comparing it to either iteratee or enumerator (which had mostly comparable performance last time I checked, with a slight edge to iteratee)? Or to Oleg's library? Try writing test cases, a simple byte-counting application, or similar, so you can compare the performance with the other versions. Both enumerator and iteratee include demo programs that you could use as a starting point.
Ok, I tested with ByteString chunks and got roughly the same performance (less then 5 % difference) as with Data.Iteratee (as expected, as it is not a monad a bottlenec when using chunks). However, with Word8' streams I slows down to point six times slower then lazy IO. this is still may be acceptable if IO actions has to be performed while making nontrivial list fusions, but in general it is fail.
Disappointing, but I'm not surprised. I think getting good performance is possible in principle, but currently there's something missing from the implementations. Whether work needs to be done on GHC's optimizer or the iteratee code, I can't say. Honestly I'm not too interested in pursuing this myself now, but if somebody else wants to it could be fruitful. John
participants (2)
-
John Lato
-
Permjacov Evgeniy