
Hello Andrew,

Sunday, July 8, 2007, 7:07:59 PM, you wrote:
> Actually, LZW works surprisingly well for such a trivial little algorithm... When you compare it to the complexity of an arithmetic coder driven by a high-order adaptive PPM model (theoretically the best general-purpose algorithm that exists), it's amazing that anything so simple as LZW should work at all!
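(To make "trivial little algorithm" concrete: a complete LZW encoder fits in about a dozen lines of Haskell. This is only an illustrative sketch - a string-keyed Data.Map dictionary with codes starting at 256, and no attention paid to speed:)

import qualified Data.Map as M

-- Minimal LZW encoder: the dictionary starts with every single symbol
-- (here modelled as Chars 0..255); each step emits the code of the
-- longest known prefix of the remaining input and adds that prefix
-- extended by one more symbol as a fresh dictionary entry.
lzwEncode :: String -> [Int]
lzwEncode = go initDict 256
  where
    initDict = M.fromList [([c], fromEnum c) | c <- map toEnum [0 .. 255]]
    go _    _    []        = []
    go dict next (x:rest0) =
      let (w, rest) = longest dict [x] rest0
          code      = dict M.! w
      in case rest of
           []    -> [code]
           (c:_) -> code : go (M.insert (w ++ [c]) next dict) (next + 1) rest
    -- keep extending the current match while the extension is still known
    longest dict w (c:cs)
      | M.member (w ++ [c]) dict = longest dict (w ++ [c]) cs
    longest _    w rest          = (w, rest)

For example, lzwEncode "abababab" emits five codes ([97,98,256,258,98]) for eight input symbols.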
I don't think PPM is so complex - it's just the probability of a symbol in some context. It's just too slow in a naive implementation.
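(As an illustration of "probability of a symbol in some context", here is a toy order-k frequency model - only a sketch of the core idea, not real PPM, which also updates its counts adaptively and escapes to shorter contexts for unseen symbols. The linear scan inside probability is exactly the kind of naive implementation that is too slow:)

import qualified Data.Map as M

type Context = String                     -- the k symbols preceding the current one
type Model   = M.Map (Context, Char) Int  -- occurrence counts

-- Count how often each symbol follows each length-k context in the text.
buildModel :: Int -> String -> Model
buildModel k text =
  M.fromListWith (+)
    [ ((take k (drop (i - k) text), text !! i), 1)
    | i <- [k .. length text - 1] ]

-- P(symbol | context): the count for this (context, symbol) pair divided
-- by the total count seen for the context.
probability :: Model -> Context -> Char -> Double
probability m ctx c =
  let num   = fromIntegral (M.findWithDefault 0 (ctx, c) m)
      total = fromIntegral (sum [ n | ((ctx', _), n) <- M.toList m, ctx' == ctx ])
  in if total == 0 then 0 else num / total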
There isn't much point in combining Huffman with LZW, because many words will have only one or two occurrences.
> (The downside of course is that now we need a Huffman table in the output - and any rare symbols might end up with rather long codewords. But remember: Huffman *guarantees* 0% compression or higher on the payload itself. A Huffman-compressed payload can *never* be bigger, only smaller or same size. So long as you can encode the Huffman table efficiently, you should be fine...)
The devil is in the details. Just imagine the size of a Huffman table with 64k entries :)  Huffman encoding is inappropriate for LZW output simply because most words will have only a few occurrences, and the savings from encoding them optimally don't justify the price of their entries in the table.
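(A back-of-envelope sketch, with assumed rather than measured numbers: even a compact canonical table that stores only a code length per LZW code costs on the order of 40 KB, while a code used once or twice can save a few bits at best:)

-- All figures below are assumptions for illustration, not measurements.
tableCostBits :: Int
tableCostBits = 65536 * 5            -- ~327680 bits (~40 KB) just for the table

maxSavingPerRareCode :: Int
maxSavingPerRareCode = 2 * 8         -- generously: 8 bits saved on each of 2 uses

breakEvenCodes :: Int
breakEvenCodes = tableCostBits `div` maxSavingPerRareCode   -- = 20480

So you would need the savings of more than twenty thousand such rare codes just to pay for the table.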
> .ru = Russia?
Of course.
> (Realistically though. My program takes a [Word8] and turns it into a [Bool] before running a parser over it. The GHC optimiser doesn't really stand a hope in hell of optimising that into a program that reads a machine word into a CPU register and starts playing with bit flips on it...)
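(The conversion being described presumably looks something like this - a hypothetical sketch, not the actual code:)

import Data.Word (Word8)
import Data.Bits (testBit)

-- Expand every byte into its eight bits, most significant bit first,
-- and hand the resulting [Bool] to a bit-level parser.
toBits :: [Word8] -> [Bool]
toBits = concatMap byteToBits
  where
    byteToBits b = [ testBit b i | i <- [7,6..0] ]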
As time goes by, those terrible compilers become smarter and smarter :)
> Oh hey, I think GHC is already pretty smart. But no optimiser can ever hope to cover *every* possible case. And transforming [Bool] -> [Bool] into UArray Word8 -> UArray Word8 just seems a liiiiittle bit optimistic, methinks. ;-)
15 years ago I wrote a very smart asm program (btw, it was an ARJ unpacker) with hand-made function inlining, loop unrolling, register allocation, CPU recognition and so on. Now most of these tricks are standard for C compilers. Times change, and it's hard to imagine which optimizations will be available 10 years from now.
> The way I heard it is that it *does* native code generation, but it's "not as good" as the via-C version. (Can't imagine why... You would have thought directly implementing functional primitives would be much easier than trying to mangle them to fit C's multitude of limitations. Still, what do I know?)
GHC's native and via-C modes are a choice between the blind and the lame. In native mode, its code generator is comparable to 20-year-old C code generators - see above how much modern C compilers have changed in those years. In via-C mode, it generates unnatural C code which is hard for any C compiler to optimize.

JHC is a very different story - it generates natural C code, so for simple, "low-level" programs its speed is the same as C's. But it lacks GHC's huge base of high-level optimizations. As you may guess, in order to reach C-like speed, a compiler should implement both good high-level and good low-level optimizations. There are (low-priority) plans to provide an LLVM backend for GHC, which may solve this problem. But actually, speed of generated code isn't the number one priority for GHC users.

--
Best regards,
 Bulat                            mailto:Bulat.Ziganshin@gmail.com