
Bulat Ziganshin wrote:
Hello Andrew,
Sunday, July 8, 2007, 7:07:59 PM, you wrote:
I don't think PPM is so complex - it's just the probability of a symbol in some context. It's just too slow in a naive implementation.
Oh, sure, the *idea* is simple enough. Trying to actually *implement* it correctly is something else... ;-) (The same goes for arithmetic coding, really.)
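
To make "probability of a symbol in some context" concrete, here's a toy order-1 context model - a sketch I'm making up purely for illustration, not real PPM, which blends several context orders and uses escape codes for unseen symbols (which is exactly where the implementation pain comes from):

    import qualified Data.Map as M

    -- Toy order-1 model: count how often each symbol follows each
    -- one-character context, then read P(symbol | context) off the counts.
    type Model = M.Map (Char, Char) Int

    train :: String -> Model
    train xs = foldr bump M.empty (zip xs (drop 1 xs))
      where bump pair = M.insertWith (+) pair 1

    -- Probability of 'sym' given the preceding character 'ctx'.
    prob :: Model -> Char -> Char -> Double
    prob m ctx sym
      | total == 0 = 0          -- context never seen: no estimate
      | otherwise  = fromIntegral n / fromIntegral total
      where
        n     = M.findWithDefault 0 (ctx, sym) m
        total = sum [c | ((c0, _), c) <- M.toList m, c0 == ctx]

(Note that prob walks the whole map just to total up one context, which rather neatly demonstrates the "too slow in a naive implementation" part as well.)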
(The downside, of course, is that now we need a Huffman table in the output - and any rare symbols might end up with rather long codewords. But remember: Huffman *guarantees* 0% compression or better on the payload itself. A Huffman-compressed payload can *never* be bigger, only smaller or the same size, since the fixed-width code you started from is itself one of the prefix codes Huffman is optimising over. So long as you can encode the Huffman table efficiently, you should be fine...)
The devil is in the details. Just imagine the size of a Huffman table with 64k entries :) Huffman encoding is inappropriate for LZW output simply because most words will have only a few occurrences, and the savings from encoding them optimally don't justify the cost of their entries in the table.
...which is why you need to "encode the Huffman table efficiently", to quote myself. ;-) Using canonical Huffman, you only actually need to know how many bits were assigned to each symbol. This information is probably very amenable to RLE. (Which, incidentally, is why I started this whole "parser on top of a phaser" craziness in the first place.) So, yeah, there may be 64k symbols - but if only 1k of them are ever *used*... ;-)
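
For anybody who hasn't met canonical Huffman before, this is the kind of reconstruction I mean - a sketch under my own conventions (Char symbols, a length of 0 meaning "unused"), not code from any real compressor:

    import Data.Bits (shiftL)
    import Data.List (sortOn)

    -- Rebuild canonical Huffman codes from the per-symbol bit lengths alone.
    -- Each code is (value, width); a length of 0 means the symbol is unused.
    canonicalCodes :: [(Char, Int)] -> [(Char, (Int, Int))]
    canonicalCodes lens = go 0 0 used
      where
        used = sortOn (\(s, l) -> (l, s)) [p | p@(_, l) <- lens, l > 0]
        go _    _   []              = []
        go next len ((sym, l) : xs) =
          let code = next `shiftL` (l - len)   -- widen the code as lengths grow
          in  (sym, (code, l)) : go (code + 1) l xs

So the only thing you actually have to transmit is the list of lengths - and a 64k-entry table where 63k of the lengths are zero is exactly the sort of thing RLE eats for breakfast.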
.ru = Russia?
Of course.
My Russian is very rusty. ;-)
Oh hey, I think GHC is already pretty smart. But no optimiser can ever hope to cover *every* possible case. And transforming [Bool] -> [Bool] into UArray Word8 -> UArray Word8 just seems a liiiiittle bit optimistic, methinks. ;-)
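
Just to spell out the gap I mean (the packed version is purely my own illustration - Int indices, eight Bools per Word8 - not anything GHC will actually derive for you):

    import Data.Array.Unboxed (UArray, bounds, elems, listArray)
    import Data.Bits (complement)
    import Data.Word (Word8)

    -- What I actually write: one Bool per list element, lazy and boxed.
    invertBits :: [Bool] -> [Bool]
    invertBits = map not

    -- What I'd *like* to magically get: eight bits packed into each Word8,
    -- flipped a whole byte at a time.
    invertPacked :: UArray Int Word8 -> UArray Int Word8
    invertPacked a = listArray (bounds a) (map complement (elems a))

Getting from one signature to the other means changing the data representation, the strictness and the element packing all at once - rather more than fusion and strictness analysis will do on their own.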
15 years ago I wrote a very smart asm program (btw, it was an ARJ unpacker) with hand-made function inlining, loop unrolling, register allocation, CPU recognition and so on. Now most of these tricks are standard for C compilers. Times change, and it's hard to imagine which optimizations will be available 10 years from now.
Yes, but there are limits to what an optimiser can hope to accomplish. For example, you wouldn't implement a bubble sort and seriously expect the compiler to be able to "optimise" that into a merge sort, would you? ;-)
GHC's native and via-C modes are a choice between blind and lame. In native mode, its code generator is comparable to 20-year-old C code generators - see above for how much C compilers have changed in those years. In via-C mode it generates unnatural C code which is hard for any C compiler to optimize.
I'll take your word for it. ;-) (I have made cursory attempts to comprehend the inner workings of GHC - but this is apparently drastically beyond my powers of comprehension.)
JHC is a very different story.
Yes - last I heard, it's an experimental research project rather than a production-ready compiler...