
Bulat Ziganshin wrote:
Hello Andrew,
Sunday, July 8, 2007, 7:07:59 PM, you wrote:
I don't think PPM is so complex - it's just the probability of a symbol in some context. It's just too slow in a naive implementation.
Oh, sure, the *idea* is simple enough. Trying to actually *implement* it correctly is something else... ;-) (The same goes for arithmetic coding, really.)
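
To make "probability of a symbol in some context" concrete, here's a toy order-1 context model - a sketch I'm making up purely for illustration, not real PPM, which blends several context orders and uses escape codes for unseen symbols (which is exactly where the implementation pain comes from):

    import qualified Data.Map as M

    -- Toy order-1 model: count how often each symbol follows each
    -- one-character context, then read P(symbol | context) off the counts.
    type Model = M.Map (Char, Char) Int

    train :: String -> Model
    train xs = foldr bump M.empty (zip xs (drop 1 xs))
      where bump pair = M.insertWith (+) pair 1

    -- Probability of 'sym' given the preceding character 'ctx'.
    prob :: Model -> Char -> Char -> Double
    prob m ctx sym
      | total == 0 = 0          -- context never seen: no estimate
      | otherwise  = fromIntegral n / fromIntegral total
      where
        n     = M.findWithDefault 0 (ctx, sym) m
        total = sum [c | ((c0, _), c) <- M.toList m, c0 == ctx]

(Note that prob walks the whole map just to total up one context, which rather neatly demonstrates the "too slow in a naive implementation" part as well.)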
(The downside, of course, is that now we need a Huffman table in the output - and any rare symbols might end up with rather long codewords. But remember: Huffman *guarantees* 0% compression or better on the payload itself. A Huffman-compressed payload can *never* be bigger, only smaller or the same size, since the fixed-width code you started from is itself one of the prefix codes Huffman is optimising over. So long as you can encode the Huffman table efficiently, you should be fine...)
The devil is in the details. Just imagine the size of a Huffman table with 64k entries :) Huffman encoding is inappropriate for LZW output simply because most words will have only a few occurrences, and the savings from encoding them optimally don't justify the cost of their entries in the table.
...which is why you need to "encode the Huffman table efficiently", to quote myself. ;-) Using canonical Huffman, you only actually need to know how many bits were assigned to each symbol. This information is probably very amenable to RLE. (Which, incidentally, is why I started this whole "parser on top of a phaser" craziness in the first place.) So, yeah, there may be 64k symbols - but if only 1k of them are ever *used*... ;-)
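
For anybody who hasn't met canonical Huffman before, this is the kind of reconstruction I mean - a sketch under my own conventions (Char symbols, a length of 0 meaning "unused"), not code from any real compressor:

    import Data.Bits (shiftL)
    import Data.List (sortOn)

    -- Rebuild canonical Huffman codes from the per-symbol bit lengths alone.
    -- Each code is (value, width); a length of 0 means the symbol is unused.
    canonicalCodes :: [(Char, Int)] -> [(Char, (Int, Int))]
    canonicalCodes lens = go 0 0 used
      where
        used = sortOn (\(s, l) -> (l, s)) [p | p@(_, l) <- lens, l > 0]
        go _    _   []              = []
        go next len ((sym, l) : xs) =
          let code = next `shiftL` (l - len)   -- widen the code as lengths grow
          in  (sym, (code, l)) : go (code + 1) l xs

So the only thing you actually have to transmit is the list of lengths - and a 64k-entry table where 63k of the lengths are zero is exactly the sort of thing RLE eats for breakfast.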
.ru = Russia?
Of course.
My Russian is very rusty. ;-)
Oh hey, I think GHC is already pretty smart. But no optimiser can ever hope to cover *every* possible case. And transforming [Bool] -> [Bool] into UArray Word8 -> UArray Word8 just seems a liiiiittle bit optimistic, methinks. ;-)
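
Just to spell out the gap I mean (the packed version is purely my own illustration - Int indices, eight Bools per Word8 - not anything GHC will actually derive for you):

    import Data.Array.Unboxed (UArray, bounds, elems, listArray)
    import Data.Bits (complement)
    import Data.Word (Word8)

    -- What I actually write: one Bool per list element, lazy and boxed.
    invertBits :: [Bool] -> [Bool]
    invertBits = map not

    -- What I'd *like* to magically get: eight bits packed into each Word8,
    -- flipped a whole byte at a time.
    invertPacked :: UArray Int Word8 -> UArray Int Word8
    invertPacked a = listArray (bounds a) (map complement (elems a))

Getting from one signature to the other means changing the data representation, the strictness and the element packing all at once - rather more than fusion and strictness analysis will do on their own.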
15 years ago I wrote a very smart asm program (btw, it was an ARJ unpacker) with hand-made function inlining, loop unrolling, register allocation, CPU recognition and so on. Now most of these tricks are standard for C compilers. Times change, and it's hard to imagine which optimizations will be available 10 years from now.
Yes, but there are limits to what an optimiser can hope to accomplish. For example, you wouldn't implement a bubble sort and seriously expect the compiler to be able to "optimise" that into a merge sort, would you? ;-)
GHC's native and via-C modes are a choice between blind and lame. In native mode, its code generator is comparable to 20-year-old C code generators - see above for how much C compilers have changed in those years. In via-C mode it generates unnatural C code which is hard for any C compiler to optimize.
I'll take your word for it. ;-) (I have made cursory attempts to comprehend the inner workings of GHC - but this is apparently drastically beyond my powers of comprehension.)
JHC is a very different story.
Yes - last I heard, it's an experimental research project rather than a production-ready compiler...