On Tue, Mar 24, 2009 at 3:15 PM, Gü?nther Schmidt <gue.schmidt@web.de> wrote:
Hi,
let say I got an unordered lazy list of key/value pairs like
[('a', 99), ('x', 42), ('a', 33) ... ]
and I need to sum up all the values with the same keys.
So far I wrote a naive implementation, using Data.Map, foldl and insertWith.
The result of this grouping operation, which is effectively another list
of key/value pairs, just sums this time, needs to be further processed.
The building of this map is of course a bottleneck, the successive
processing needs to wait until the entire list is eventually consumed
the Map is built and flattened again.
Is there another way of doing this, something more "streaming
architecture" like?
Yeah, make a trie. Here's a quick example.
import Data.Monoid
newtype IntTrie a = IntTrie [a]
singleton :: (Monoid a) => Int -> a -> IntTrie a
singleton ch x = IntTrie [ if fromIntegral ch == i then x else mempty | i <- [0..] ]
lookupTrie :: IntTrie a -> Int -> a
lookupTrie (IntTrie xs) n = xs !! n
instance (Monoid a) => Monoid (IntTrie a) where
mempty = IntTrie (repeat mempty)
mappend (IntTrie xs) (IntTrie ys) = IntTrie (infZipWith mappend xs ys)
infZipWith f ~(x:xs) ~(y:ys) = f x y : infZipWith f xs ys
test = mconcat [ singleton (n `mod` 42) [n] | n <- [0..] ] `lookupTrie` 10
This is an inefficient way to find the class of n such that n mod 42 = 10. Note that it works on an infinite list of inputs.
Here the "trie" was a simple list, but you could replace it with a more advanced data structure for better performace.
Luke