[Haskell-cafe] Re: ANN: bitspeak 0.0.1

23 Jun 2010


      ...
...
Sure, Huffman was actually my first tought. But I couldn't think
of a pratical display for the result of Huffman encoding that
could be easily followed by a human looking at the screen. Since
it's an optimal code, letters would not be grouped in alphabetical
order.
There is a compromise.
There is such a thing as an ORDERED Huffman code.
Consider a set of strings.
If they call begin with the same first letter,
assume that letter and consider the suffixes instead.
Otherwise, choose a letter L such that
as close as possible to half of the strings begin
with a letter preceding L in the alphabet
as close as possible to half of the strings begin
with the letter L or a later letter.
I believe that's what I've done. I use this file to give weight
to letters:

   http://bitbucket.org/mauricio/bitspeak/src/tip/src/Corpora.hs

After each letter is selected I keep only sufixes after that
letter, and then I use 'splitData' to find the point where
I'll separate both halves:

   http://bitbucket.org/mauricio/bitspeak/src/tip/src/Main.hs

Best,

Maurício

[Haskell-cafe] Re: ANN: bitspeak 0.0.1

Maurí­cio CA

Maurício CA