
On Mon, Mar 30, 2015 at 4:38 AM, Henning Thielemann wrote:
On Sun, 29 Mar 2015, wren romano wrote:
-------------------------------------------- -- logfloat 0.13.3 --------------------------------------------
This package provides a type for storing numbers in the log-domain, primarily useful for preventing underflow when multiplying many probabilities, as in HMMs and other probabilistic models. The package also provides modules for dealing with floating-point numbers correctly.
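(For illustration — a minimal sketch of the underflow problem the package addresses. This assumes the Data.Number.LogFloat module from logfloat; the printed log-domain value is approximate:)

```haskell
import qualified Data.Number.LogFloat as LF

-- Multiplying many small probabilities underflows ordinary Doubles,
-- but survives when the product is accumulated in the log domain.
main :: IO ()
main = do
    let ps = replicate 1000 1e-10 :: [Double]
    print (product ps)  -- 0.0: the true value, 1e-10000, underflows

    let lps = map LF.logFloat ps
    -- logFromLogFloat returns the log of the stored value,
    -- here roughly 1000 * log(1e-10) ≈ -23026
    print (LF.logFromLogFloat (product lps) :: Double)
```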
I am currently working on http://hub.darcs.net/thielema/hmm-hmatrix
It does not need log-numbers because it normalizes all temporary results. This way I can use fast hmatrix operations. Would normalization also be a solution for other probabilistic models?
For many models, normalization isn't computationally feasible. Even for HMMs, when the tag/state space is large, I shudder to think of the overhead. The best cost I can imagine for normalizeFactor is O(log S), which would increase the S-dependent factor of the forward/backward algorithm's complexity from O(S^2) to O(S^2 log S). And that's at best; a naive implementation would cost O(S), making the forward/backward algorithm cubic in the size of the tag/state space! For small models that may be acceptable, but for the sorts of models I work with it's an unacceptable slowdown.

Even with normalization, your code would benefit from using logfloat (or some of the tricks included therein). E.g., your implementation of Math.HiddenMarkovModel.Normalized.logLikelihood will introduce a lot of unnecessary error, due to iterated use of binary (+) on Floating values. (The NC.sumElements function used in normalizeFactor may suffer from the same implementation problem, but that's unclear.)

The 'product' function for LogFloats performs Kahan summation in order to reduce the loss of precision due to repeated addition. (And in future versions of the library, I'm working on alternative implementations which sacrifice the single-pass property of Kahan summation in order to *completely* eliminate error in summations.) For this particular trick, you could also use Edward Kmett's "compensated" library.

--
Live well,
~wren
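P.S. For anyone unfamiliar with the trick: here's a self-contained sketch of Kahan summation (the textbook scheme, not necessarily logfloat's exact implementation). The correction term recovers the low-order bits that each big-plus-small addition would otherwise discard:

```haskell
-- Kahan (compensated) summation: carry a running correction term c
-- that reintroduces the low-order bits lost by each addition.
kahanSum :: [Double] -> Double
kahanSum = go 0 0
  where
    go s _ []     = s
    go s c (x:xs) =
        let y  = x - c        -- apply the pending compensation
            t  = s + y        -- big + small: low bits of y are lost here
            c' = (t - s) - y  -- algebraically 0; numerically the lost bits
        in  t `seq` c' `seq` go t c' xs

main :: IO ()
main = do
    let xs = 1 : replicate 10000 1e-16 :: [Double]
    print (sum xs)       -- 1.0: each tiny addend rounds away entirely
    print (kahanSum xs)  -- ≈ 1.000000000001: the 1e-12 total is preserved
```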