[Haskell-cafe] Re: Optimization problem

14 Sep 2006

      Bertram Felgenhauer wrote:
...
...
splitSeq' :: Ord a => Map a () -> [(a,b)] -> ([(a,[b])], Map a [b])
splitSeq' bp []         = ([], map (const []) bp)
splitSeq' bp ((a,b):xs) = case lookup a bp bp of
    Just _ -> let (l, m) = splitSeq' bp xs in (l, update a (b:) bp m)
    _      -> let (bp', m) = insert a bp m'
                  (l, m')  = splitSeq' bp' xs
                in
                  ((a, b : (fromJust $ lookup a bp' m')) : l, m)
splitSeq' takes a blueprint for a map with all keys seen so far, and
a list tail. It returns the result list for all new keys, and a map
(corresponding to the given blueprint) with the tails of the seen
elements.
The in the base case it just fills the blueprint with empty lists and
returns to the caller.
If a new element is seen, insert is used to create a new blueprint,
including the new key a, which is passed to the recursive call of
splitSeq'. The resulting map from that call is threaded back into
insert, which gives us a new map without the a key which matches
the structure of the original blueprint.
Very interesting! So the map with the seen tails matches the blueprint
and therefore throws away information (the future keys) from the future.
This is what effectively allows the key-tree structure to be rebalanced.
   If one would return the tails-map with all future keys, it would be
_|_ because the key-order in the tree depends on the future keys and
changes everytime when a new key is found.

I thought that there can only be a static solution, i.e. like the one
Ross Paterson presented where the structure (I mean the outermost
constructors) of the returned tree are determined before the future.
This obviously excludes rebalancing.

I found a static solution by trying to fit the key-tails pairs into an
infinite tails-map which is filled "first comes first":
  map ! 1 := (first key seen, tails)
  map ! 2 := (second key seen, tails)
By keeping another key-position-map around which records where each key
has been inserted, so that the future knows where to put the tails. The
point is that the structure of tails-map is known before the future
comes as its keys are just 1,2,3,... and so on.

It remains to construct such an infinite random access list, but this is
turns out to be even easier than finite random access lists (when using
non-uniform recursion from Okasaki's chapter 10) :)
...
data Imp a = Imp a (Imp (a,a)) deriving (Show)
instance Functor Imp where
   fmap h ~(Imp x xs) = Imp (h x) (fmap (\(x,y) -> (h x, h y)) xs)
update :: (a -> a) -> Position -> Imp a -> Imp a
update f 1 ~(Imp x xs) = Imp (f x) xs
update f n ~(Imp x xs) = Imp x $ update f2 (n `div` 2) xs
   where
   f2 ~(y,z) = if even n then (f y, z) else (y, f z)
Note that we can use irrefutable patterns without hesitation as there is
only one constructor.

Folding over an infinite thing is strange but here we are
...
fold :: (a -> b -> b) -> Imp a -> b
fold f ~(Imp x xs) = f x (fold f2 xs)
   where
   f2 (x,y) z = f x (f y z)
It's not so strange anymore when we realize that this can be used to
convert it into an infinite list
...
toList = fold (:)
The implementation of fromList is fun as well, so I won't tell it. As
fold and unfold can be defined generically for Mu f where f is a
functor, i wonder how to deal with it in the case of non-uniform recursion.

For splitStreams, the key-position-map is needed in both directions, so
we quickly define a bijective map
...
type BiMap a b	= (Map.Map a b, Map.Map b a)
switch :: BiMap a b -> BiMap b a
switch (x,y) = (y,x)
with the usual operations (empty, member, size etc.) omitted here.

Now comes splitStreams itself:
...
splitStreams :: Ord a => [(a,b)] -> [(a,[b])]
splitStreams xs =
   takeWhile (not . null . snd) $ toList $ splitStreams' empty xs
splitStreams' :: Ord a => BiMap a Position -> [(a,b)] -> Imp (a,[b])
splitStreams' bimap [] =
   fmap (\i -> (findWithDefault undefined i $ switch bimap,[])) $
       fromList [1..]
splitStreams' bimap ((a,b):xs) =
   update fun pos $ splitStreams' bimap' xs
   where
   fun ~(_,bs) = (a,b:bs)
   sz          = size bimap
   pos         = findWithDefault (sz+1) a bimap
   bimap'      =
      (if member a bimap then id else insert a (sz+1)) bimap
Note that update actually generates fresh constructors, so the structure
of our tails-Imp is not really static. But this is no problem as the
form of the constructors is completely known: there is only one.

Regards,
apfelmus

[Haskell-cafe] Re: Optimization problem

apfelmus＠quantentunnel.de