
Does there exist a library which allows me to have maps whose elements are maps whose elements ... with a convenient syntax. Alternatively, does there exist a library like Data.Tree where forests are sets rather than lists? -- Alex R

Alex, The containers library can do this already - there are no constraints on the elements of a Map. For example:
type TripleNestedMap a = Map Int (Map Char (Map String a))
But this is rather silly as you can just do:
type MapOfTriples a = Map (Int ,Char, String) a
for most uses.
Cheers,
Thomas
On Fri, Oct 8, 2010 at 2:23 PM, Alex Rozenshteyn
Does there exist a library which allows me to have maps whose elements are maps whose elements ... with a convenient syntax.
Alternatively, does there exist a library like Data.Tree where forests are sets rather than lists?
-- Alex R
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Thomas DuBuisson wrote:
On Fri, Oct 8, 2010 at 2:23 PM, Alex Rozenshteyn
wrote: Does there exist a library which allows me to have maps whose elements are maps whose elements ... with a convenient syntax. The containers library can do this already - there are no constraints on the elements of a Map. For example:
type TripleNestedMap a = Map Int (Map Char (Map String a))
But this is rather silly as you can just do:
type MapOfTriples a = Map (Int ,Char, String) a
for most uses.
True, but I think this:
Alternatively, does there exist a library like Data.Tree where forests are sets rather than lists?
suggests that the OP wants something different, rather like import Data.Map data MapTree a = Leaf | Node (Map a (MapTree a)) I don't know if somebody wrote a library around such a type. Alex, could you give an example for what would be 'a convenient syntax' for such a type? What operations do you think should be available to make using it convenient? Cheers Ben

On 10/8/10 5:46 PM, Thomas DuBuisson wrote:
Alex,
The containers library can do this already - there are no constraints on the elements of a Map. For example:
type TripleNestedMap a = Map Int (Map Char (Map String a))
But this is rather silly as you can just do:
type MapOfTriples a = Map (Int ,Char, String) a
for most uses.
However, Map is a lot less efficient than IntMap when dealing with Ints. And the IntMap (IntMap ... a) type requires you to write (lookup m <=< ... <=< lookup n) instead of just one lookup. Unfortunately, when you're interested in performance issues, the standard tricks for implementing polyvariadic functions aren't very useful. FWIW, the monadic combinators are usually sufficient to create your own functions legiblely (e.g., using (<=<) for lookup), but it's still a lot noiser than it could be--- especially if you want a trie instead of a product map, since you have to add fst and snd everywhere. I've been playing around with some ad-hoc tries like these a lot lately, both for HMM tagging and for hunting down performance issues in bytestring-trie. -- Live well, ~wren

This came up as I was doing homework for natural language processing.
I'm constructing a trigram model from training data, but I also need the
bigram and unigram counts. I could, for each triple of characters, add the
3 tuple to a trigram map and increment its count (I know I'm not actually
mutating anything), add the last two letters to a bigram map, and add the
last character to a unigram map.
But this looks very much like a tree... each node contains the character
(the key) and list of characters that can appear after it. (I think this is
a trie, but I don't know very much about tries.) The problem is that lookup
is more horribly inefficient than I can stand, if I use lists.
As for what constitutes convenient, I haven't thought about it that much; I
just noted that my naive attempt was full of boilerplate and decided that
since there was already python code provided I used that. I'm planning to
port the code *after* I have the assignment finished.
On Fri, Oct 8, 2010 at 11:18 PM, wren ng thornton
On 10/8/10 5:46 PM, Thomas DuBuisson wrote:
Alex,
The containers library can do this already - there are no constraints on the elements of a Map. For example:
type TripleNestedMap a = Map Int (Map Char (Map String a))
But this is rather silly as you can just do:
type MapOfTriples a = Map (Int ,Char, String) a
for most uses.
However, Map is a lot less efficient than IntMap when dealing with Ints. And the IntMap (IntMap ... a) type requires you to write (lookup m <=< ... <=< lookup n) instead of just one lookup. Unfortunately, when you're interested in performance issues, the standard tricks for implementing polyvariadic functions aren't very useful.
FWIW, the monadic combinators are usually sufficient to create your own functions legiblely (e.g., using (<=<) for lookup), but it's still a lot noiser than it could be--- especially if you want a trie instead of a product map, since you have to add fst and snd everywhere. I've been playing around with some ad-hoc tries like these a lot lately, both for HMM tagging and for hunting down performance issues in bytestring-trie.
-- Live well, ~wren
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Alex R

What you describe sounds like a perfect job for a trie, so that's what I think you should look into. - Jake

Hmm.
It seems that both the trie packages I found just provide a standard
map-like interface, without exposing their trie-ness. Makes sense, but
limits usefulness for me.
Thanks.
On Sat, Oct 9, 2010 at 3:08 PM, Jake McArthur
What you describe sounds like a perfect job for a trie, so that's what I think you should look into.
- Jake
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Alex R

On 10/9/10 2:02 AM, Alex Rozenshteyn wrote:
This came up as I was doing homework for natural language processing.
I'm constructing a trigram model from training data, but I also need the bigram and unigram counts. I could, for each triple of characters, add the 3 tuple to a trigram map and increment its count (I know I'm not actually mutating anything), add the last two letters to a bigram map, and add the last character to a unigram map.
But this looks very much like a tree... each node contains the character (the key) and list of characters that can appear after it. (I think this is a trie, but I don't know very much about tries.) The problem is that lookup is more horribly inefficient than I can stand, if I use lists.
Yep, that's a trie :) For what it's worth, in the HMM tagger I'm working on now, I'm using something to the effect of: type IntTrie a = IntMap (a, IntTrie a) Of course that's not a valid type synonym and I'm not really using infinite depth (though there's no reason not to allow it), but that's the idea. The Ints are gotten by interning the word/tag strings and I actually do a lot of newtype wrapping to ensure that my IDs for words, tags, etc and their maps are all kept distinct. In order to update your counts during training you can use things like: succZ z = alter (Just . succ . maybe 0 id) z succYZ y z = alter (Just . (succ *** succZ z) . maybe (0,empty) id) y succYZ x y z = alter (Just . (succ *** succYZ y z) . maybe (0,empty) id) x And to look things up you can use things like: lookupZ z = maybe 0 id . fst . lookup z lookupYZ y z = lookupZ z . snd <=< lookup y lookupXYZ x y z = lookupYZ y z . snd <=< lookup x Clearly there's a lot of boilerplate there, but it works and it's very efficient. Ironically, combining the unigram, bigram, and trigram tables actually decreases performance (slightly) on my benchmarks; but I'm keeping them combined in order to reduce memory pressure. You can also take a look at bytestring-trie[1] which is very similar and provides a nicer interface. Unfortunately, there's currently a big performance hit for using bytestring-trie, which I'm in the process of tracking down. Most of it is probably because it's Word8-based instead of Int-based, but I'd like to be sure. bytestring-trie does expose the trie-ness (more to come in the HEAD version). Of course, for HMM stuff, you don't really take advantage of the trie structure, so there's nothing special about using a trie instead of a Map NGram where, data NGram = Unigram {-# UNPACK #-} !Int | Bigram {-# UNPACK #-} !Int {-# UNPACK #-} !Int | Trigram {-# UNPACK #-} !Int {-# UNPACK #-} !Int {-# UNPACK #-} !Int deriving (Eq, Ord, Read, Show) Expanding that out to three different IntMaps or the nested IntMaps is just a performance improvement, which I'm sure won't be important for your homework (i.e., it's not asymptotically significant). [1] http://hackage.haskell.org/package/bytestring-trie -- Live well, ~wren

On 10/08/2010 04:23 PM, Alex Rozenshteyn wrote:
Does there exist a library which allows me to have maps whose elements are maps whose elements ... with a convenient syntax.
It sounds like you might be looking for a trie of some sort. Would something like the TrieMap package suit your needs? It's hard to guess based only on the question as posed. - Jake
participants (5)
-
Alex Rozenshteyn
-
Ben Franksen
-
Jake McArthur
-
Thomas DuBuisson
-
wren ng thornton