
On 11/5/10 7:54 AM, Alexey Khudyakov wrote:
We already know that there are noncommutative modules/vectorspaces of interest (e.g., modules over quaternions and modules over graph paths), why not support them from the beginning? It seems like you're going out of your way to exclude things that would be trivial to include. This is exactly why this is my standard complaint against the various proposals out there for new numeric hierarchies. People who are used to only using R^n think the proposals are just fine, but none of the proposals capture the structures I work with daily. Which means the new proposals are no better than the Prelude for me.
Could you tell what data structures do you use? It's difficult to think about them without concrete examples.
Data structures? That varies a lot depending on the task: Data.Map, Data.Set, Data.IntMap, Data.IntSet, Data.Trie, Data.ByteString... A lot of my concrete examples of semirings and modules come from natural language processing tasks. One example I already mentioned is the semiring of a collection of paths over a graph (so something like Data.Set (Data.Seq Arc)). Path collections show up, for example, when dealing with Markov chains and HMMs where the goal is to maximize or sum the weights over all paths. To make it clearer, a Markov chain is a probabilistic version of a finite state automaton, so you have some set of nodes, and the arcs for transitioning from one node to another have probabilistic weights on them. An HMM is an extension of a Markov chain into a probabilistic version of a Moore machine, so in addition to the probabilistic transitions from state to state, we also have for each state a probability distribution over emitted symbols. An interesting problem for HMMs is this: given some observed sequence of emitted symbols, reconstruct the most likely path of states which would cause the symbol sequence to be emitted. A nearly identical problem is: given some observed sequence of symbols, determine the total probability of all state sequences which could have generated it. In addition to the perspective of HMMs as probabilistic Moore machines, there are two other perspectives which are helpful. One is the graphical model perspective where we have something that looks a bit like this (in fixed-width font): Q0 -> Q1 -> Q2 -> ... -> Qn | | | v v v S1 S2 Sn Each Sk and Qk are random variables. The random variables Qk represent being in some particular state q at time k, and the choice of which state is drawn from a probability distribution based on the state Q(k-1). The random variables Sk represent emitting some particular symbol s at time k, and the choice of which symbol is drawn from a distribution based on the state Qk. The third perspective, which is the most helpful one for solving our two problems, is if we take this graphical model and unfold it into a trellis graph (ignoring the Sk variables for now). Each node in the trellis represents an assignment of particular values to each of the random variables. So if Q1 could take on values qA, qB, and qC then we'd have three nodes for each of Q1=qA, Q1=qB, and Q1=qC. The arcs in the trellis are weighted with the probability of transitioning from one node to the next; so an arc Q1=q1 -> Q2=q2 has weight Pr(Q2=q2 | Q1=q1). A path through the trellis represents a variable assignment, which is to say a sequence of states in the Markov chain; and the weight of the path is the probability of the Markov chain taking that path. There is a general algorithm for solving the two problems I mentioned, and ultimately they're the same algorithm except with different semirings. Note that a collection of paths between two points on a graph forms a semiring[1] where sum is the union of path collections and product is the extension of paths[2], so the answers we want can be gotten by semiring homomorphisms from the collection of paths to some other domain. To get the probability of all state sequences which could give rise to a given symbol sequence we can use the probability semiring[3] ---which we can simplify to the metric space [0..1] with (+) and (*), since the algorithm ensures that all events are disjoint. This version is called the "forward algorithm". To get the probability the most likely state sequence we can use the semiring [0..1] with max and (+), which is called the "Viterbi algorithm". In practice we tend to use the log version of these semirings in order to prevent underflow. There's also a variant of the Viterbi algorithm which stores backpointers to the most likely previous state, which makes it easier to recover the most likely state sequence instead of just the (log)probability of the sequence. The Viterbi algorithm with back pointers is also a semiring: Maybe(Prob, Maybe State) with Nothing as zero, Just(1,Nothing) as one, argmax as sum: mx<+>my = do { (px,_) <- mx ; (py,_) <- my ; if px > py then mx else my }, and product: mx<*>my = do { (px,x) <- mx ; (py,y) <- my ; Just (px*py, y `mplus` x)}. The collection-of-paths semiring is noncommutative since extending the end of a path is different than extending the beginning. The Viterbi with backpointers semiring is noncommutative since we'll get different backpointers depending on the order of arguments to the product. If we extend our probabilities to use quantum probability theory then all of these probabilistic semirings become noncommutative because of order effects on quantum probabilities. These HMM problems can also be thought of from the perspective of a grammar, where HMMs happen to be restricted to produce linear derivation "trees". Looking at other grammars like probabilistic CFGs allows us to extend the idea of generating sequences to the idea of generating trees. The forward-backward algorithm for sequences (of which the forward algorithm is half) generalizes to the inside-outside algorithm for trees. Ultimately, chart parsing algorithms like CKY are doing something very similar to the Viterbi algorithm in order to determine the most likely parse tree which would give rise to some observed sentence. So all these semirings have tree-based analogues as well. This use of semirings is just like the use of monoids in finger trees[4], except that we generalize the idea from a monoid on one tree into a semiring on a packed forest of trees. Naturally there are many such semirings which are noncommutative since adding a left-child to a tree is different than adding a right-child. [1] Or actually a 2-semiring, much as a groupoid (2-group) is a generalization of groups and a category (2-monoid) is a generalization of a monoid. [2] Note that the collection of paths semiring also has something like a module structure. The "scalars" are single arcs in the underlying graph. I only say that it's something like a module/vector space, because whether the family of arcs supports its own semiring/ring/field operations depends on the specific graph in question. If we assume certain closure properties on the family of arcs, similar to the transitive closure properties of composition in categories, then it is indeed a module. However, we do not have those closure properties for the specific example in question. [3] That is the event space E with operations <+> and <*> with the empty event as zero and the certainty event as one. Colloquially <+> is the union of events and <*> is the intersection. For x,y in E their probabilities are defined by Pr(x<+>y) = Pr(x) + Pr(y) - Pr(x,y) and Pr(x<*>y) = Pr(x,y) where Pr(x,y) = Pr(x)*Pr(y) iff x and y are independent. [4] http://apfelmus.nfshost.com/articles/monoid-fingertree.html -- Live well, ~wren