
Ross Paterson wrote:
On Thu, Sep 14, 2006 at 05:22:05PM +0200, Bertram Felgenhauer wrote:
[much subtle code] We can now build the splitStream function, using the following helper function:
splitSeq' :: Ord a => Map a () -> [(a,b)] -> ([(a,[b])], Map a [b])
This works for infinite lists if there is no balancing, but if insert does balancing, the top of the map will not be available until the last key is seen, so splitSeq' could only be used for finite chunks. Then you'll need a way to put the partial answers together.
Just to prove the point, here's the same code with balancing:
SNIP HERE (end marked with <<<) >>>
module SplitSeq (splitSeq) where import Prelude hiding (lookup, map) -- our map data Map k a = Node !Int k a (Map k a) (Map k a) | Leaf deriving Show size :: Map k a -> Int size Leaf = 0 size (Node s _ _ _ _) = s member :: Ord k => k -> Map k a -> Bool member _ Leaf = False member k (Node _ k' _ l r) = case compare k k' of LT -> member k l EQ -> True GT -> member k r -- insert key into blueprint and extract the corresponding value from -- the second argument, threading it backward through all operations insert :: Ord k => k -> Map k () -> Map k a -> (Map k (), a, Map k a) insert k Leaf ~(Node _ _ a _ _) = (Node 1 k () Leaf Leaf, a, Leaf) insert k (Node s k' _ l r) node = case compare k k' of LT -> let (m, a, l'') = insert k l l' (m', a', l', r') = balance k' m r node in (m', a, Node s k' a' l'' r') EQ -> error "inserting existing element" GT -> let (m, a, r'') = insert k r r' (m', a', l', r') = balance k' l m node in (m', a, Node s k' a' l' r'') -- balance and co are taken from Data.Map and adapted balance k l r node | size l + size r <= 1 = let Node _ _ a l' r' = node in (mkNode k () l r, a, l', r') | size r >= 5 * size l = rotateL k l r node | size l >= 5 * size r = rotateR k l r node | otherwise = let Node _ _ a l' r' = node in (mkNode k () l r, a, l', r') rotateL k l r@(Node _ _ _ l' r') node | size l' < 2*size r' = singleL k l r node | otherwise = doubleL k l r node rotateR k l@(Node _ _ _ l' r') r node | size r' < 2*size l' = singleR k l r node | otherwise = doubleR k l r node singleL k l (Node s k' _ m r) ~(Node _ _ a ~(Node _ _ a' l' m') r') = (mkNode k' () (mkNode k () l m) r, a', l', Node s k' a m' r') singleR k (Node s k' _ l m) r ~(Node _ _ a l' ~(Node _ _ a' m' r')) = (mkNode k' () l (mkNode k () m r), a', Node s k' a l' m', r') doubleL k l (Node s k' _ (Node s' k'' _ ml mr) r) ~(Node _ _ a ~(Node _ _ a' l' ml') ~(Node _ _ a'' mr' r')) = (mkNode k'' () (mkNode k () l ml) (mkNode k' () mr r), a', l', Node s k' a'' (Node s' k'' a ml' mr') r') doubleR k (Node s k' _ l (Node s' k'' _ ml mr)) r ~(Node _ _ a ~(Node _ _ a' l' ml') ~(Node _ _ a'' mr' r')) = (mkNode k'' () (mkNode k' () l ml) (mkNode k () mr r), a'', Node s k' a' l' (Node s' k'' a ml' mr'), r') -- make a new node with the correct size mkNode k x l r = Node (size l + size r + 1) k x l r -- update the element associated with the given key update :: Ord k => k -> (a -> a) -> Map k x -> Map k a -> Map k a update k f (Node s k' _ l r) ~(Node _ _ a' l' r') = case compare k k' of LT -> Node s k' a' (update k f l l') r' EQ -> Node s k' (f a') l' r' GT -> Node s k' a' l' (update k f r r') -- standard map function, no blueprints here map :: (a -> b) -> Map k a -> Map k b map _ Leaf = Leaf map f (Node s k a l r) = Node s k (f a) (map f l) (map f r) -- finally, define splitSeq splitSeq :: Ord a => [(a,b)] -> [(a,[b])] splitSeq = fst . splitSeq' Leaf splitSeq' :: Ord a => Map a () -> [(a,b)] -> ([(a,[b])], Map a [b]) splitSeq' bp [] = ([], map (const []) bp) splitSeq' bp ((a,b):xs) = case member a bp of True -> let (l, m) = splitSeq' bp xs in (l, update a (b:) bp m) False -> let (bp', a', m) = insert a bp m' (l, m') = splitSeq' bp' xs in ((a, b : a') : l, m) <<< The balancing code is adopted from Data.Map. I added threading back the result map (from splitSeq) to the balancing operations. This results in the promised O(n*log m) time. The cost associated with a lazy pattern (it creates a closure for each bound variable, IIRC) is quite high but constant, so each call to insert takes O(log m) time, including future forces of the lazy patterns. Likewise, member and update are also O(log m). Example test: Prelude SplitSeq> print $ take 11 $ snd $ splitSeq ([(n `mod` 10000, 'a') | n <- [1..100000]] ++ [(n, 'b') | n <- [0..]]) !! 9534 "aaaaaaaaaab" (1.07 secs, 137026252 bytes) The nonbalancing version would take ages. regards, Bertram