
On Mon, Mar 30, 2015 at 4:38 AM, Henning Thielemann wrote:
On Sun, 29 Mar 2015, wren romano wrote:
-------------------------------------------- -- logfloat 0.13.3 --------------------------------------------
This package provides a type for storing numbers in the log-domain, primarily useful for preventing underflow when multiplying many probabilities, as in HMMs and other probabilistic models. The package also provides modules for dealing with floating-point numbers correctly.
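(For illustration — a minimal sketch of the underflow problem the package addresses. This assumes the Data.Number.LogFloat module from logfloat; the printed log-domain value is approximate:)

```haskell
import qualified Data.Number.LogFloat as LF

-- Multiplying many small probabilities underflows ordinary Doubles,
-- but survives when the product is accumulated in the log domain.
main :: IO ()
main = do
    let ps = replicate 1000 1e-10 :: [Double]
    print (product ps)  -- 0.0: the true value, 1e-10000, underflows

    let lps = map LF.logFloat ps
    -- logFromLogFloat returns the log of the stored value,
    -- here roughly 1000 * log(1e-10) ≈ -23026
    print (LF.logFromLogFloat (product lps) :: Double)
```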
I am currently working on http://hub.darcs.net/thielema/hmm-hmatrix
It does not need log-numbers because it normalizes all temporary results. This way I can use fast hmatrix operations. Would normalization also be a solution for other probabilistic models?
For many models, normalization isn't computationally feasible. Even for HMMs, when the tag/state space is large, I shudder to think of the overhead. The best cost I can imagine for normalizeFactor is O(log S), which would increase the S-dependent factor of the forward/backward algorithm's complexity from O(S^2) to O(S^2 log S). And that's at best; a naive implementation would cost O(S), making the forward/backward algorithm cubic in the size of the tag/state space! For small models that may be acceptable, but for the sorts of models I work with it's an unacceptable slowdown.

Even with normalization, your code would benefit from using logfloat (or some of the tricks included therein). E.g., your implementation of Math.HiddenMarkovModel.Normalized.logLikelihood will introduce a lot of unnecessary error, due to iterated use of binary (+) on Floating values. (The NC.sumElements function used in normalizeFactor may suffer from the same implementation problem, but that's unclear.)

The 'product' function for LogFloats performs Kahan summation in order to reduce the loss of precision due to repeated addition. (And in future versions of the library, I'm working on alternative implementations which sacrifice the single-pass property of Kahan summation in order to *completely* eliminate error in summations.) For this particular trick, you could also use Edward Kmett's "compensated" library.

--
Live well,
~wren
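P.S. For anyone unfamiliar with the trick: here's a self-contained sketch of Kahan summation (the textbook scheme, not necessarily logfloat's exact implementation). The correction term recovers the low-order bits that each big-plus-small addition would otherwise discard:

```haskell
-- Kahan (compensated) summation: carry a running correction term c
-- that reintroduces the low-order bits lost by each addition.
kahanSum :: [Double] -> Double
kahanSum = go 0 0
  where
    go s _ []     = s
    go s c (x:xs) =
        let y  = x - c        -- apply the pending compensation
            t  = s + y        -- big + small: low bits of y are lost here
            c' = (t - s) - y  -- algebraically 0; numerically the lost bits
        in  t `seq` c' `seq` go t c' xs

main :: IO ()
main = do
    let xs = 1 : replicate 10000 1e-16 :: [Double]
    print (sum xs)       -- 1.0: each tiny addend rounds away entirely
    print (kahanSum xs)  -- ≈ 1.000000000001: the 1e-12 total is preserved
```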