
G'day all.
Quoting Bulat Ziganshin:
in this case, why do you propose to encode LZW words with Huffman codes? :)
I don't. :-)
oops. PPM builds a tree of contexts and uses the context to encode one char. LZW builds a dictionary of words and encodes each word in an empty context.
As you noted later, the "dictionary of words" in LZW is also tree-structured. "Words", as you call them, are built by extending words with single characters, in pretty much exactly the same way that PPM contexts are built by extending contexts with single characters.

The main difference is this: to encode a "word" in PPM, you encode the conditional probabilities it takes to get from the start of the word to the end. It looks like you're encoding single characters (and, programmatically, you are), but because of the way that arithmetic coding works, it's mathematically equivalent to encoding the probability of finding the word as a whole. You're just decomposing the probability of finding the word into the probabilities of finding the individual input symbols along the way. Does that make sense?
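If it helps, here's a toy sketch in Python. The conditional probabilities are invented for the example; the only real fact in it is that an ideal arithmetic coder spends -log2(p) bits on a symbol of probability p, so the per-character costs sum to exactly the cost of the word as a whole:

    import math

    # Hypothetical model: P(next char | context so far). The numbers
    # are made up purely for illustration.
    cond_prob = {
        ("", "t"): 0.20,    # P('t' | start)
        ("t", "h"): 0.50,   # P('h' | 't')
        ("th", "e"): 0.70,  # P('e' | 'th')
    }

    word = "the"
    probs = [cond_prob[(word[:i], word[i])] for i in range(len(word))]

    # Cost of coding each character with an ideal arithmetic coder...
    bits_per_char = sum(-math.log2(p) for p in probs)

    # ...equals the cost of coding the word's joint probability, by the
    # chain rule: P(word) is the product of the conditionals.
    p_word = math.prod(probs)
    assert abs(bits_per_char - (-math.log2(p_word))) < 1e-12
    print(bits_per_char)    # ~3.836 bits either way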
You are right. So LZW is just a dictionary-based transformation which replaces some words with special codes, while PPM replaces chars with their probabilities. I hope you will agree that PPM with flat codes would be totally useless.
Absolutely. But augmenting LZW with probabilities to allow for arithmetic coding wouldn't be so dumb, if it weren't for the fact that a) we already have more efficient compression algorithms, and b) it'd destroy the main benefit of LZW (which is that it's effective on slow CPUs with small memories; perfect for 80s-era micros and 8-bit embedded systems).
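For concreteness, here's a minimal LZW encoder sketch in Python (illustrative, not production code). Note that no probabilities appear anywhere: the dictionary grows by extending words with single characters, just as described above, and the output is a stream of flat integer codes. The tail of the sketch is a rough back-of-the-envelope look at what "augmenting LZW with probabilities" could buy, comparing the flat cost against the empirical entropy an arithmetic coder could approach:

    import math
    from collections import Counter

    def lzw_encode(data: bytes) -> list[int]:
        # Dictionary starts with all single bytes; new entries extend an
        # existing word by one character, much as PPM extends contexts.
        dictionary = {bytes([b]): b for b in range(256)}
        word = b""
        codes = []
        for b in data:
            candidate = word + bytes([b])
            if candidate in dictionary:
                word = candidate                 # keep extending the word
            else:
                codes.append(dictionary[word])   # emit the word's flat code
                dictionary[candidate] = len(dictionary)  # learn a new word
                word = bytes([b])
        if word:
            codes.append(dictionary[word])
        return codes

    print(lzw_encode(b"abababab"))  # [97, 98, 256, 258, 98]

    # Flat cost (fixed-width codes) vs. the empirical entropy of the
    # code stream, which an arithmetic coder could approach. The entropy
    # never exceeds the flat cost.
    codes = lzw_encode(b"abababab" * 50)
    width = max(codes).bit_length()
    counts = Counter(codes)
    entropy = -sum(c * math.log2(c / len(codes)) for c in counts.values())
    print(width * len(codes), entropy)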
Which particular algorithm were you talking about? Moffat's?
Yes, though if your local university library has a copy, "Managing Gigabytes" might be a better introduction to the topic.

Cheers,
Andrew Bromage