
Bulat Ziganshin wrote:
Hello Donald,
Good work. Probably worth benchmarking against the other compression libraries
are you really want to totally discredit Haskell? :) they should be hundreds of times slower than any practical compression algorithm (and btw, zlib/bzlib isn't good performers anyway, my own algorithm is several times faster with the same compression ratio)
Haskell isn't a good tool to develop compression algorithms because it's the very well studied area where it has meaning to use all the sorts of optimizations. so it falls in the "low-level algorithms" category where using Haskell means at least 3x slower development and 3x worse performance - or faster development with 100x worse performance. Andrew's code should fall into later category
Indeed. I'm more interested in which algorithms produce the best compression ratios than how to implement them fast. Currently the code uses very general, flexible, polymorphic types all over the place, everything is normal lists, I'm using monadic parsing rather than bit-twiddling, etc etc etc. It goes alright with 100 KB files on my monster PC at home, but toss a 4 MB file over there and it takes a minute or two to compress. Hardly a match for a "practical" compression solution that's been optimised to within inches of its life in C or even assembly. ;-) I mentioned that my LZW algorithm isn't even as efficient as it could be - it uses 16 bits per symbol rather than being variable. Partly that's because it's easier to code. But partly that's so that I can write a 16-bit Huffman compressor and run it over the top. (LZW + Huffman being a very common combination.) And that's really what this is - a toolbox for throwing algorithms together to see what they do. OTOH, if the Gods behind GHC want to take a look and figure out whether there's any magic that can be added to the optimiser, I wouldn't complain... ;-) (Realistically though. My program takes a [Word8] and turns it into a [Bool] before running a parser over it. The GHC optimiser doesn't really stand a hope in hell of optimising that into a program that reads a machine word into a CPU register and starts playing with bit flips on it...) PS. Are those zlib libraries actually written in Haskell? Or are they a thin layer on top of a C library? PPS. Does GHC make use of MMX, SSE, et al?