Re: Generic tries (long)

21 Jun 2008

      Hi Jamie,

thanks for doing this---I think that the functions in the API look
good.  Of course, one can always think of more :-)  One that I often
find useful when working with Data.Map is a function that removes a
key from a map and also returns the associated value---a kind of
destructive lookup.

I think that the library would be better if you exposed a
non-overloaded API for the map(s?) that you plan to implement.  This
is useful because (i) most of the type I anticipiate using a single
type of map, and concrete types lead to more accurate types in type
inference, (ii) it makes the API Haskell'98 and hence likely to work
on most Haskell implementations out there.   If you really think that
the overloading is useful, then you could provide separate modules
that provide the class+instances in terms of the non-overloaded API
and add a cabal flag to turn these on/off.

Also, as others mentioned, using Int# in an API is not great because
this is a GHC internal type.  If you found situatuins where GHC is not
unboxing something please send a mail to the GHC team---they are
usually very good at fixing these sorts of things.  Adrian mentioned
that you use CPP to control the use of Int# but this does not help if
the Int# leaks through the API:  (i) I could still have programs that
work in Hugs and fail to compile in GHC (and vice versa), (ii) if I
want to use your map library, I would have to compile all my programs
with the CPP extensions, which is not nice.

Hope this helps,
-Iavor

On 6/16/08, Jamie Brandon  wrote:
...
Hi, I'm writing a library for generic tries for the Summer of Code.
 The main point of this post is to get some feedback on the api but
 I'll briefly explain the idea.
The point of a trie is to exploit the recursive nature of ADTs to save
 on expensive key comparisons and reduce space consumption. Hinze'
 original formulation is very elegant but results in very deep
 structures and is fairly inefficient. The normal trie optimisations
 (concatenating singleton maps, mainly) cant be applied to the generic
 version.
I intend instead to encode ADTs as lists. This encoding can range from
 a simple walk of the ADT to creating a compressed, bit packing
 representation. The resulting tradeoff between encoding time and space
 usage should make the design fairly flexible. This will look something
 like:
class Serial k where
-- | Flattened form of key consists of a list of Nodes
 -- Node will be Int or UArray Int for compressed implementations
 type Node k
-- | Flatten to a list of Nodes
 serialise :: k -> [Node k]
-- | Reconstruct from a list of Nodes
 unserialise :: [Node k] -> k
The api below is mostly cribbed from Adrian Hey's initial design.
 Guarantees about ordering will probably vary between maps. All
 ascending have descending version too. Strict versions of functions
 will be written where appropriate, I've omitted them here for brevity.
 Key reconstruction is likely to be expensive so it may make more sense
 to seperate foldrKeys and friends into a seperate class.
Adrian has written instance of GMap for lists, UInts and Ord types so
 I can declare
 instance (Serial k) => GMap (ListGMap (Node k)) k where ... etc
I should have a spot on code.haskell.org soon, at which point I'll put
 up a Haddock page with the most up to date version of the api.
class GMap map where
type k
-- | The empty map.
 empty :: map a
-- | Create a map with a single association.
 singleton :: k -> a -> map a
-- | Create a map from a list of associations which /must/ be in
 ascending order of keys
 -- (with /no/ duplicate keys). If in doubt use one of the safer (but
 slower) 'fromAssocs' functions.
 fromAssocsAscending :: [(k,a)] -> map a
-- | Return 'True' if the map contains no associations.
 isEmpty :: map a -> Bool
-- | Return 'True' if the map contains exactly one association.
 isSingleton :: map a -> Bool
-- | Return the value associated with the supplied key (if any).
 lookup :: k -> map a -> Maybe a
-- | Insert a new association in the map if there is currently no
 value associated with the key.
 -- If there is a value associated with the key then replace it with
 the result of
 -- applying the supplied function to that value.
 insert :: (a -> a) -> k -> a -> map a -> map a
-- | Delete the association for the supplied key (if any).
 delete :: k -> map a -> map a
-- | This is a combined insert\/modify\/delete operation. The argument
 to the supplied function
 -- is ('Just' a) if there is a value (a) associated with the supplied
 key, otherwise 'Nothing'.
 -- If the return value is ('Just' a'), a' becomes the new value
 associated with the supplied key.
 -- If the return value is 'Nothing', the association for the supplied
 key (if any) is deleted.
 alter :: (Maybe a -> Maybe a) -> k -> map a -> map a
-- | Evaluate the union of two maps. If the maps contain common keys
 then combine the
 -- values associated with those keys using the supplied function. The
 value arguments
 -- to this function are supplied in the same order as the map arguments.
 union :: (a -> a -> a) -> map a -> map a -> map a
-- | Evaluate the intersection of two maps, combining common
 associations using the supplied function.
 intersection :: (a -> b -> c) -> map a -> map b -> map c
-- | Evaluate the difference between two maps. For any key occuring in
 the second map,
 -- the corresponding association (if any) is deleted from the first map.
 -- The associated values in the second map are irrelevant.
 difference :: map a -> map b -> map a
-- | Returns true if the keys in the first map are a subset of the
 keys in the second map.
 -- (This includes the case where the key sets are identical). Note
 that this function does
 -- not examine the associated values (which are irrelevant). See
 'isSubmapOf' if you
 -- do want associated values examined.
 isSubsetOf :: map a -> map b -> Bool
-- | Returns true if the keys in the first map are a subset of the
 keys in the second map
 -- and the corresponding function always returns true when applied to
 the values associated
 -- with matching keys.
 isSubmapOf :: (a -> b -> Bool) -> map a -> map b -> Bool
-- | Apply the supplied function to every associated value in the map.
 map :: (a -> b) -> map a -> map b
-- | Apply the supplied function to every association in the map, and
 use the result
 -- as the new associated value for the corresponding key.
 mapWithKey :: (k -> a -> b) -> map a -> map b
-- | Delete associations for which the supplied predicate returns
 'False' when applied to
 -- the associated value.
 filter :: (a -> Bool) -> map a -> map a
-- | Fold right over the list of elements in ascending order of keys.
 -- See 'foldrElemsAscending'' for a strict version of this function.
 foldrElemsAscending :: (a -> b -> b) -> map a -> b -> b
-- | Fold right over the list of keys in ascending order.
 -- See 'foldrKeysAscending'' for a strict version of this function.
 foldrKeysAscending :: (k -> b -> b) -> map a -> b -> b
-- | Fold right over the list of associations in ascending order of keys.
 -- See 'foldrAssocsAscending'' for a strict version of this function.
 foldrAssocsAscending :: (k -> a -> b -> b) -> map a -> b -> b
-- | Fold over elements in un-specified order using /unboxed/ Int
 accumulator (with GHC).
 -- Defaults to boxed Int for other Haskells. Typically used for
 counting functions.
 -- Implementations are free to traverse the map in any order.
 -- The folded function is always applied strictly.
 foldElemsUINT :: (a -> UINT -> UINT) -> map a -> UINT -> UINT
In addition there a few functions which are useful for groups of maps
 or nested maps.
-- | Add the number of associations in a map to the supplied /unboxed/
 Int (with GHC).
 -- Defaults to boxed Int for other Haskells.
 addSize :: map a -> UINT -> UINT
-- | Find the value associated with the supplied key (if any) and
 return the result
 -- of applying the supplied continuation function to that value.
 Useful for nested lookup.
 lookupCont :: (a -> Maybe b) -> k -> map a -> Maybe b
-- | Reject empty maps (return Nothing).
 nonEmpty :: map a -> Maybe (map a)
The following functions are useful internally as most of the maps are
 defined in terms of simpler maps. Also see this thread on the need for
 unionMaybe.
-- | Similar to 'insert', but the association is deleted if the
 supplied function returns 'Nothing'.
 -- (The supplied function is always applied strictly.)
 insertMaybe :: (a -> Maybe a) -> k -> a -> map a -> map a
-- | Find the value associated with the supplied key (if any) and
 apply the supplied function
 -- to that value. Delete the association if the result is 'Nothing'.
 Replace the old value with
 -- the new value if the result is ('Just' something).
 -- (The supplied function is always applied strictly.)
 deleteMaybe :: (a -> Maybe a) -> k -> map a -> map a
-- | Evaluate the union of two maps, but delete combined associations
 from the result map
 -- if the combining function returns 'Nothing'.
 -- (The combining function is always applied strictly.)
 unionMaybe :: (a -> a -> Maybe a) -> map a -> map a -> map a
-- | Evaluate the intersection of two maps, but delete combined
 associations from the result map
 -- if the combining function returns 'Nothing'.
 -- (The combining function is always applied strictly.)
 intersectionMaybe :: (a -> b -> Maybe c) -> map a -> map b -> map c
-- | Difference with a combining function. If the combining function returns
 -- @Just a@ then the corresponding association is not deleted from the
 result map
 -- (it is retained with @a@ as the associated value).
 differenceMaybe :: (a -> b -> Maybe a) -> map a -> map b -> map a
-- | Apply the supplied function to every associated value in the map.
 -- If the result is 'Nothing' then the delete the corresponding association.
 -- (The supplied function is always applied strictly.)
 mapMaybe :: (a -> Maybe b) -> map a -> map b
Thanks
Jamie
 _______________________________________________
 Libraries mailing list
 Libraries@haskell.org
 http://www.haskell.org/mailman/listinfo/libraries

Re: Generic tries (long)

Iavor Diatchki