Bit stream programs in Haskell

Haskell gurus,

We have made a proposal to extend the Erlang `binary' data type from being a sequence of bytes (a byte stream) to being a sequence of bits (a bitstream), with the ability to do pattern matching at the bit level. This proposal has now been fully implemented at the level of the BEAM virtual machine and in the HiPE compiler. Most probably, it will be part of the next major Erlang/OTP open source release. (We write "most probably" because we do not really control what Ericsson decides to release as part of its open source system.)

We wanted to evaluate the performance of our implementation and the succinctness of our syntax, particularly against other `similar' languages. For this reason, we wrote five reasonably representative benchmarks, showing the things that one could employ bit stream pattern matching and bit-level comprehensions for.

The benchmarks (drop3, five11, huffman, uudecode, and uuencode) have all been written in Erlang, O'Caml and Haskell. For some of them, C and Java versions exist. They can be found at the following homepage:

http://bitbenches.infogami.com/

As you will see there, the Haskell numbers are significantly slower than those of Erlang and O'Caml. We are wondering why this is so.

For each language, we have spent considerable effort writing the benchmarks in what -- at least we feel -- is the most natural and efficient way one can write them. The only constraint we impose is that for functional languages, data structures without any explicit mutation have to be used in the part of the program for which measurements are taken.

Our experience in writing efficient (and beautiful) Haskell programs is close to (if not below) zero. Also, perhaps our minds might be suffering from a severe case of strictness and might be completely unable to `think lazily'. So, we request your help in noticing obvious NO-NOs and stupid mistakes that we might have made.
We even welcome completely different Haskell programs, provided they adhere to the constraint mentioned above -- no mutation.

Best regards,
Kostis Sagonas and Per Gustafsson

One thing I noticed is that you are measuring IO in the Haskell version of drop3. hGetContents is lazy. On Mar 22, 2006, at 4:43 PM, Per Gustafsson wrote:
Also, perhaps our minds might be suffering from a severe case of strictness and might be completely unable to `think lazily'. So, we request your help in noticing obvious NO-NOs and stupid mistakes that we might have made.
--------------------------------
David F. Place
mailto:d@vidplace.com
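David's point about lazy hGetContents can be sketched as follows. This is an illustrative snippet, not the benchmark code: the file name "input.dat" and its contents are made up, and the file is created by the snippet itself so it is self-contained. The idea is to force the lazily-read string before any timer starts, so file IO is not charged to the measured computation.

```haskell
import System.IO
import Data.List (foldl')

-- Force a lazily-read string and return its length, so that all file IO
-- happens before the caller starts any timer.
forceLength :: String -> Int
forceLength = foldl' (\acc _ -> acc + 1) 0

main :: IO ()
main = do
  -- "input.dat" is a hypothetical name; the file is written here so the
  -- example runs on its own.
  writeFile "input.dat" (replicate 1000 'x')
  h <- openFile "input.dat" ReadMode
  s <- hGetContents h      -- lazy: nothing has been read from the handle yet
  let n = forceLength s    -- forcing s performs the actual reads
  n `seq` hClose h
  print n                  -- only now would it be safe to time the pure work
```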

Per Gustafsson wrote:
Haskell gurus,
I am not a guru, but I'll clean up some of this.
Our experience in writing efficient (and beautiful) Haskell programs is close to (if not below) zero. Also, perhaps our minds might be suffering from a severe case of strictness and might be completely unable to `think lazily'. So, we request your help in noticing obvious NO-NOs and stupid mistakes that we might have made. We even welcome completely different Haskell programs, provided they adhere to the constraint mentioned above -- no mutation.
Best regards,
Kostis Sagonas and Per Gustafsson
I can't test this, but I have attached a new version of huffman.hs that may perform a bit better. I don't know if all the changes I made helped instead of hurt. I doubt it was sped up by much.

-- Chris Kuklewicz

--module Huffman where
import System.IO
import Data.Bits
import Data.Word
import Data.Array.IO
import Data.Array.Unboxed hiding ((!))
import Data.Array.Base(unsafeAt)
import System(getArgs)
import System.CPUTime(getCPUTime)
import Foreign.Marshal.Array (withArrayLen)
import Control.Exception(bracket)

data HuffTree = Leaf Word8 | Branch HuffTree HuffTree

type A = UArray Int Word8

(!) = unsafeAt

iter = 10

{-- the do_iter function repeats a function iter times
    it is not pretty, but it is hard to convince haskell
    to repeat a computation many times --}
do_iter 1 func input = let x = func input in return x
do_iter k func input =
  let x = func input
  in seq (last x) (do_iter (k-1) func input)

main = do
  [arg] <- getArgs
  handle <- openFile arg ReadMode
  let size = 2000000
  arrM <- newArray (0, pred size) 0 :: IO (IOUArray Int Word8)
  read_size <- hGetArray handle arrM size
  -- convert to immutable array
  arr <- unsafeFreeze arrM :: IO (UArray Int Word8)
  t0 <- getCPUTime
  res <- do_iter iter huff arr
  t1 <- getCPUTime
  putStr (show ((fromInteger (t1-t0) :: Float) / (1000000000000.0 :: Float)))
  bracket (openBinaryFile (arg++".haskell") WriteMode) hClose
          (\file -> withArrayLen res (flip (hPutBuf file)))

huff :: A -> [Word8]
huff arr =
  let (hufftree, newindex) = build_tree 4 arr
      limit = get_32bit_int newindex arr
  in huffdecode ((newindex+4)*8) arr hufftree (limit + ((newindex+4)*8))

huffdecode :: Int -> A -> HuffTree -> Int -> [Word8]
huffdecode index arr tree limit = helper index tree
  where
    helper index (Leaf charval)
      | index == limit = []
      | otherwise      = charval : helper index tree
    helper index (Branch left right)
      | index `seq` True =
          helper (index+1) (if get_bit arr index then right else left)

get_bit :: A -> Int -> Bool
{-# INLINE get_bit #-}
get_bit arr bitoffset =
  let byte = arr ! (shiftR bitoffset 3)
  in testBit (shiftL byte (bitoffset .&. 7)) 7

build_tree :: Int -> A -> (HuffTree, Int)
build_tree index arr =
  let size = get_16_bitint index arr
      build_tree_2 index limit
        | (limit-index) == 1 = Leaf (arr ! index)
        | otherwise =
            let left_size = get_16_bitint index arr
            in Branch (build_tree_2 (index+2) (index+2+left_size))
                      (build_tree_2 (index+4+left_size) limit)
  in (build_tree_2 (index+2) (index+2+size), index+2+size)

get_16_bitint :: Int -> A -> Int
{-# INLINE get_16_bitint #-}
get_16_bitint index arr =
  shiftL (fromIntegral (arr ! index)) 8 .|. fromIntegral (arr ! (index+1))

get_32bit_int :: Int -> A -> Int
{-# INLINE get_32bit_int #-}
get_32bit_int index arr =
  shiftL (fromIntegral (arr ! index)) 24 .|.
  shiftL (fromIntegral (arr ! (index+1))) 16 .|.
  shiftL (fromIntegral (arr ! (index+2))) 8 .|.
  fromIntegral (arr ! (index+3))
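As a worked example of the big-endian header layout those helpers assume (my own illustration, not part of Chris's file), here is a standalone copy of the 16-bit reader applied to a tiny sample buffer:

```haskell
import Data.Array.Unboxed
import Data.Word
import Data.Bits

type A = UArray Int Word8

-- Same logic as get_16_bitint above: two consecutive bytes, big-endian.
get_16_bitint :: Int -> A -> Int
get_16_bitint index arr =
  shiftL (fromIntegral (arr ! index)) 8 .|. fromIntegral (arr ! (index + 1))

-- Bytes 0x01 0x02 read at offset 0 give 0x0102 = 258.
sample :: A
sample = listArray (0, 1) [0x01, 0x02]
```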

per.gustafsson:
Haskell gurus,
We have made a proposal to extend the Erlang `binary' data type from being a sequence of bytes (a byte stream) to being a sequence of bits (a bitstream) with the ability to do pattern matching at the bit level.
Our experience in writing efficient (and beautiful) Haskell programs is close to (if not below) zero. Also, perhaps our minds might be suffering from a severe case of strictness and might be completely unable to `think lazily'. So, we request your help in noticing obvious NO-NOs and stupid mistakes that we might have made. We even welcome completely different Haskell programs, provided they adhere to the constraint mentioned above -- no mutation.
Ok, I rewrote the drop3 program to use packed, unboxed arrays as the OCaml version does, instead of lazy boxed lists. It now runs in 3.5s on my linux box, which is around what the OCaml version does.

$ ghc B.hs
$ ./a.out testdata.drop3
3.617
$ cmp testdata.drop3.haskell testdata.drop3.out

Comparing lazy list IO against packed array IO is a bit silly, so I suggest you use the same buffer types in your Haskell code as you do in the OCaml code. Otherwise the comparisons aren't very meaningful. The problem is not so much laziness, as you suggest, but that you're using a completely unsuitable data type: lists, instead of (packed) strings.

You can most likely just translate your other OCaml programs into Haskell as I have done here, which would be a good basis for a reasonable comparison. You may also find the Haskell performance resource useful: http://www.haskell.org/haskellwiki/Performance

Cheers, Don
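The representation difference Don describes can be sketched with a toy computation. This is not the drop3 logic (which is on the benchmark page); it only illustrates walking a buffer held as an unboxed byte array with a strict accumulator, instead of building and consuming a lazy list:

```haskell
import Data.Array.Unboxed
import Data.Word
import Data.Bits

-- Sum the number of set bits in a buffer held as an unboxed byte array.
-- The index loop with a strict accumulator avoids allocating a list cell
-- per byte, which is the cost the lazy-list version pays.
popCountArr :: UArray Int Word8 -> Int
popCountArr arr = go lo 0
  where
    (lo, hi) = bounds arr
    go i acc
      | i > hi    = acc
      | otherwise = acc `seq` go (i + 1) (acc + countBits (arr ! i))
    countBits w = length [b | b <- [0 .. 7], testBit w b]

-- 0 + 8 + 4 + 1 set bits.
example :: Int
example = popCountArr (listArray (0, 3) [0x00, 0xFF, 0x0F, 0x01])
```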

I have rewritten the Huffman benchmark (see
http://cse.unl.edu/~sjanssen/huffman.hs), resulting in a significant
performance increase. On my computer it completes one iteration in
about 0.9 seconds. In comparison, the O'Caml version takes around 3.1
seconds. These samples seem to indicate that this Haskell program
will compare favorably with Erlang.
I'd say the program is written in fairly idiomatic Haskell. Turning
the bytes in the buffer into a lazy stream of Bool's (see the
bitstream function) is both elegant and surprisingly efficient.
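A function of the shape Spencer describes might look like the following. The name `bitstream` matches his description, but this body is my guess at the idea, not his actual code (which is at the URL given in his message):

```haskell
import Data.Word
import Data.Bits

-- Turn a byte sequence into a lazy stream of bits, most significant bit
-- first.  Because the stream is lazy, a consumer such as a Huffman
-- decoder only forces as many bits as it actually walks.
bitstream :: [Word8] -> [Bool]
bitstream = concatMap (\w -> [testBit w i | i <- [7, 6 .. 0]])
```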
There is one infidelity in my implementation: the program writes the
output once per iteration, and this IO is included in the measured
time. This results in a small time penalty in comparison to the other
versions. I am still searching for a solution to this, but for now it
seems that printing the output is the cheapest way to force
evaluation.
Improvements and discussion are solicited.
Feel free to include this code in your website, if you'd like.
Spencer Janssen
On 3/22/06, Per Gustafsson wrote:
Haskell gurus,
We have made a proposal to extend the Erlang `binary' data type from being a sequence of bytes (a byte stream) to being a sequence of bits (a bitstream) with the ability to do pattern matching at the bit level.
This proposal has now been fully implemented at the level of the BEAM virtual machine and in the HiPE compiler. Most probably, it will be part of the next major Erlang/OTP open source release. (We write "most probably" because we do not really control what Ericsson decides to release as part of its open source system.)
We wanted to evaluate the performance of our implementation and the succinctness of our syntax, particularly against other `similar' languages. For this reason, we wrote five reasonably representative benchmarks, showing the things that one could employ bit stream pattern matching and bit-level comprehensions for.
The benchmarks (drop3, five11, huffman, uudecode, and uuencode) have all been written in Erlang, O'Caml and Haskell. For some of them, C and Java versions exist. They can be found at the following homepage:
http://bitbenches.infogami.com/
As you will see there, the Haskell numbers are significantly slower than those of Erlang and O'Caml. We are wondering why this is so.
For each language, we have spent considerable effort writing the benchmarks in what -- at least we feel -- is the most natural and efficient way one can write them.
The only constraint we impose is that for functional languages, data structures without any explicit mutation have to be used in the part of the program for which measurements are taken.
Our experience in writing efficient (and beautiful) Haskell programs is close to (if not below) zero. Also, perhaps our minds might be suffering from a severe case of strictness and might be completely unable to `think lazily'. So, we request your help in noticing obvious NO-NOs and stupid mistakes that we might have made. We even welcome completely different Haskell programs, provided they adhere to the constraint mentioned above -- no mutation.
Best regards,
Kostis Sagonas and Per Gustafsson
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

It seems I have spoken too soon!
There is one infidelity in my implementation: the program writes the output once per iteration, and this IO is included in the measured time.

Due to a few changes, this caveat no longer applies. As a result, Haskell performs just a bit better. The code is still at http://cse.unl.edu/~sjanssen/huffman.hs. The old main is left for reference, renamed to main'.
Run time (in seconds) for 10 iterations:

O'Caml:      35.9
Old Haskell: 25.6
New Haskell:  8.8

Spencer Janssen

Hi,

Thanks to everyone that has helped; the runtimes for the Haskell programs have decreased significantly. Presently they look like this compared to O'Caml:

Benchmark   haskell    ocaml
drop3         5.786    3.151
five11        8.657    7.692
huffman       7.134   18.593
uudecode      6.042    2.657
uuencode      7.775    2.824

The huffman benchmark should not really be that slow, but it seems that O'Caml programs that build large lists tend to be slow.

The versions that I have used in this measurement are available at: http://bitbenches.infogami.com/

Many thanks to the people that have responded, and if you have more comments on the programs, particularly uudecode and uuencode (which I have mostly written myself), feel free to suggest further improvements.

Note also that uudecode and uuencode for both O'Caml and Haskell break the rules slightly by creating an array for each row, ending up with a list of rows that is then turned into one array. Is there a library function that can do this? I wrote a function myself, but it is not very pretty.

Per
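On the closing question (turning a list of per-row arrays into one array): I am not aware of a single dedicated library function for this in Data.Array.Unboxed, but `elems` and `listArray` compose into a short one. A sketch, assuming rows indexed from zero:

```haskell
import Data.Array.Unboxed
import Data.Word

-- Flatten a list of per-row unboxed arrays into a single 0-based array
-- by concatenating their element lists.
concatArrays :: [UArray Int Word8] -> UArray Int Word8
concatArrays rows = listArray (0, total - 1) (concatMap elems rows)
  where total = sum (map (length . elems) rows)
```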
participants (5)
- Chris Kuklewicz
- David F. Place
- dons@cse.unsw.edu.au
- Per Gustafsson
- Spencer Janssen