
This came up as I was doing homework for natural language processing.
I'm constructing a trigram model from training data, but I also need the
bigram and unigram counts. I could, for each triple of characters, add the
3 tuple to a trigram map and increment its count (I know I'm not actually
mutating anything), add the last two letters to a bigram map, and add the
last character to a unigram map.
But this looks very much like a tree... each node contains the character
(the key) and list of characters that can appear after it. (I think this is
a trie, but I don't know very much about tries.) The problem is that lookup
is more horribly inefficient than I can stand, if I use lists.
As for what constitutes convenient, I haven't thought about it that much; I
just noted that my naive attempt was full of boilerplate and decided that
since there was already python code provided I used that. I'm planning to
port the code *after* I have the assignment finished.
On Fri, Oct 8, 2010 at 11:18 PM, wren ng thornton
On 10/8/10 5:46 PM, Thomas DuBuisson wrote:
Alex,
The containers library can do this already - there are no constraints on the elements of a Map. For example:
type TripleNestedMap a = Map Int (Map Char (Map String a))
But this is rather silly as you can just do:
type MapOfTriples a = Map (Int ,Char, String) a
for most uses.
However, Map is a lot less efficient than IntMap when dealing with Ints. And the IntMap (IntMap ... a) type requires you to write (lookup m <=< ... <=< lookup n) instead of just one lookup. Unfortunately, when you're interested in performance issues, the standard tricks for implementing polyvariadic functions aren't very useful.
FWIW, the monadic combinators are usually sufficient to create your own functions legiblely (e.g., using (<=<) for lookup), but it's still a lot noiser than it could be--- especially if you want a trie instead of a product map, since you have to add fst and snd everywhere. I've been playing around with some ad-hoc tries like these a lot lately, both for HMM tagging and for hunting down performance issues in bytestring-trie.
-- Live well, ~wren
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Alex R