Using associated data types to create unpacked data structures

Hi all,

Inspired by the generic maps example at http://www.haskell.org/haskellwiki/GHC/Indexed_types I tried to use associated data types to create a generic finite map that unpacks both the key and value into the leaf data constructor. This makes functions such as lookup faster, as the key can be accessed directly instead of via an indirection. It also makes the data structure more space efficient (4 words less per key/value pair for weight-balanced trees), which makes it possible to fit more data in main memory (and cache). Memory overhead matters when working with "Big Data" processing, where fitting as much data in memory as possible is important. Working with big data sets is something done daily at companies like Google, Microsoft, Yahoo, Twitter, Facebook, etc.

We can achieve the above goals using an associated data type like so:

    {-# LANGUAGE MultiParamTypeClasses, TypeFamilies #-}
    module Ex where

    class Unbox k v where
        data Map k v :: *
        empty  :: Map k v
        lookup :: k -> Map k v -> Maybe v
        insert :: k -> v -> Map k v -> Map k v

and an instance

    instance Unbox Int Double where
        data Map Int Double
            = TipIntDouble
            | BinIntDouble {-# UNPACK #-} !Size
                           {-# UNPACK #-} !Int
                           {-# UNPACK #-} !Double
                           !(Map Int Double) !(Map Int Double)
        -- implementation elided
        empty = undefined
        lookup k m = undefined
        insert k v m = undefined

    type Size = Int

However, if we try to apply this method to large programs we run into problems: we need to define instances for a large number of combinations of keys/values. We could generate a large number of instances, hoping that these will be enough for most users' needs, using Template Haskell or CPP. However, the potential number of instances is very large: about a hundred if you consider only Prelude types, and tens of thousands if you include tuples. We cannot add instances for types not defined in base without adding a dependency on every library whose data types we want to add instances for.

Since we cannot define all instances up front, we'll have to rely on the user to create instances for the combinations she needs. This is tedious work for the user; most of the time the instance is an exact copy of the above instance for Int/Double, modulo renaming of the type arguments and the constructor names.

Unfortunately our problems don't end here. If we assume for a second that the user writes the necessary boilerplate (perhaps using a Template Haskell function that generates it), there are still more problems ahead. It's quite likely that two different libraries want an instance for the same types, and each declares one locally. However, now the poor user can't use both libraries, as there are conflicting instances (or can she, using some extension?) and imports always bring in instances. This problem exists for type classes in general, but we only use ten or so type classes in most Haskell programs (e.g. Functor, Monad, Eq, Ord, and Show), so it doesn't seem to have been a big problem so far.

What to do? It seems that associated data types might not be the right tool for this problem. Could the feature be extended to work well for this use case? Can it be made to *scale* to large programs?
Here's an idea: allow default implementations of associated data types, just like for methods:

    class Unbox k v where
        data Map k v :: *
        empty  :: Map k v
        lookup :: k -> Map k v -> Maybe v
        insert :: k -> v -> Map k v -> Map k v

        data Map k v
            = Tip
            | Bin {-# UNPACK #-} !Size
                  {-# UNPACK #-} !k
                  {-# UNPACK #-} !v
                  !(Map k v) !(Map k v)
        -- implementation elided
        empty = undefined
        lookup k m = undefined
        insert k v m = undefined

and export the definition in the interface file. This would allow instantiation of the type class without boilerplate:

    instance Unbox Int Double  -- no "body"

The compiler would perhaps have to generate unique names for the constructors for this to work.

This is not enough, though. We still have two problems left:

* boilerplate instance declarations (but less so than before), and
* instance collisions.

Could we automatically create instances whenever the user mentions a type class? For example, if a program mentions

    f :: Map Int Double -> ...

we know we need an instance for Int/Double, and if we can't find one we derive one using the default definition. That is all the user needs to do when using the classic containers package. This would completely remove the boilerplate instance declarations. We could still use OverlappingInstances to allow the user to provide more specific instances (with a different implementation), if needed. This is akin to C++ template specialization. The compiler would need to help us with the instance collision problem by generating only one instance even if the same type parameters are used with the Map type in several different modules.

Summary: While associated data types in theory allow us to create more efficient data structures, the feature doesn't seem to scale to large programs, for this use case.

Cheers,
Johan
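To make the space figure at the start of the message above concrete ("4 words less per key/value pair"), here is a rough sketch of where those words go; the counts are estimates, assuming 64-bit heap words and keys and values that are fully evaluated and not shared:

    -- A generic node (as in Data.Map) keeps the key and value behind
    -- pointers, because the fields are polymorphic:
    data GenMap k v
        = GenTip
        | GenBin {-# UNPACK #-} !Size !k v !(GenMap k v) !(GenMap k v)

    -- At k = Int, v = Double this costs about 6 words for the GenBin
    -- closure (header, size, key pointer, value pointer, two subtree
    -- pointers) plus 2 words for the boxed Int and 2 words for the boxed
    -- Double: roughly 10 words per key/value pair.
    --
    -- The BinIntDouble constructor above stores the Int and Double
    -- directly: header, size, Int#, Double#, left, right = 6 words,
    -- hence the 4 words saved per pair.

    type Size = Int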

On Wed, Aug 11, 2010 at 9:03 AM, Johan Tibell
However, if we try to apply this method to large programs we run into problems: we need to define instances for a large number of combinations of keys/values. We could generate a large number of instances, hoping that these will be enough for most users' needs, using Template Haskell or CPP. However, the potential number of instances is very large: about a hundred if you consider only Prelude types, and tens of thousands if you include tuples. We cannot add instances for types not defined in base without adding a dependency on every library whose data types we want to add instances for.
This is more or less the type-level version of the "I have to write how many boilerplate instances?" problem that dogs typeclasses. I agree that it's quite painful, and that it effectively keeps type families from being nearly as practically useful as they could be. Like you, I'd love to see an effective solution.

On 11/08/2010 17:03, Johan Tibell wrote:
Inspired by the generic maps example at
http://www.haskell.org/haskellwiki/GHC/Indexed_types
I tried to use associated data types to create a generic finite map that unpacks both the key and value into the leaf data constructor.
What you're trying to do is have the compiler generate a whole module for you, including a datatype specialised to certain type parameters, and operations over that type. Just defining a few of the operations isn't enough: they need to be inlined everywhere; essentially you need to recompile Data.Map for each instance.

So I agree it would be nice if this happened automatically, behind the scenes, by virtue of just mentioning "Map Int Double" (though it would still have to be a typeclass of course, so that you can write polymorphic functions over Maps). Automatic specialisation of this kind can be done by JIT runtimes (e.g. the .NET CLR), because there the code generation and caching of instances can be put under control of the runtime. Here we would have to do it in the compiler, and the difficulty is that the compiler needs to support separate compilation.

Rather than try to solve this problem in one go, I would go for a low-tech approach for now: write a TH library to generate the code, and ask the user to declare the versions they need. To make a particular version, the user would say something like

    module MapIntDouble (module MapIntDouble) where
    import TibbeMagicMapGenerator
    make_me_a_map
    ...

There's no type class of course, so you can't write functions that work over all specialised Maps. But this at least lets you generate optimised maps for only a little boilerplate, and get the performance boost you were after.

Cheers,
Simon
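As a rough illustration, the module such a generator might produce could look like the following sketch (TibbeMagicMapGenerator and make_me_a_map are hypothetical names from the message above; the body below is simply the Int/Double representation from the first message written out by hand, with insertion and balancing elided):

    module MapIntDouble
        ( Map
        , empty
        , lookup
        , insert
        ) where

    import Prelude hiding (lookup)

    type Size = Int

    data Map
        = Tip
        | Bin {-# UNPACK #-} !Size
              {-# UNPACK #-} !Int
              {-# UNPACK #-} !Double
              !Map !Map

    empty :: Map
    empty = Tip

    lookup :: Int -> Map -> Maybe Double
    lookup _ Tip = Nothing
    lookup k (Bin _ kx x l r) = case compare k kx of
        LT -> lookup k l
        GT -> lookup k r
        EQ -> Just x

    insert :: Int -> Double -> Map -> Map
    insert = undefined  -- insertion and balancing elided in this sketch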

On Thu, Aug 12, 2010 at 11:28 AM, Simon Marlow
On 11/08/2010 17:03, Johan Tibell wrote:
Inspired by the generic maps example at
http://www.haskell.org/haskellwiki/GHC/Indexed_types
I tried to use associated data types to create a generic finite map that unpacks both the key and value into the leaf data constructor.
What you're trying to do is have the compiler generate a whole module for you, including a datatype specialised to certain type parameters, and operations over that type. Just defining a few of the operations isn't enough: they need to be inlined everywhere; essentially you need to recompile Data.Map for each instance.
There needs to be some amount of code generation, but much of the implementation can still be shared. I previously tried to define the type class as

    {-# LANGUAGE MultiParamTypeClasses, TypeFamilies #-}
    module Ex2 where

    import Prelude hiding (lookup)

    data MapView k v
        = TipView
        | BinView {-# UNPACK #-} !Size !k !v !(Map k v) !(Map k v)

    class Unbox k v where
        data Map k v :: *
        tip  :: Map k v
        bin  :: Size -> k -> v -> Map k v -> Map k v -> Map k v
        view :: Map k v -> MapView k v

    type Size = Int

    lookup :: (Ord k, Unbox k v) => k -> Map k v -> Maybe v
    lookup k m = case view m of
        TipView -> Nothing
        BinView _ kx x l r -> case compare k kx of
            LT -> lookup k l
            GT -> lookup k r
            EQ -> Just x
    {-# INLINE lookup #-}

Calling lookup from a different module at a known type gives exactly the Core you'd like to see (unpacked types, no MapView constructors). The compiler would still have to generate the associated data type instance and the method implementations.
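For Int/Double, the instance that still has to be generated (by hand, with TH, or by the compiler) might look roughly like this sketch, in a module with the same LANGUAGE pragmas as Ex2; the constructor names are arbitrary, and the INLINE pragmas are meant to let tip, bin and view disappear once lookup is specialised:

    instance Unbox Int Double where
        data Map Int Double
            = TipIntDouble
            | BinIntDouble {-# UNPACK #-} !Size
                           {-# UNPACK #-} !Int
                           {-# UNPACK #-} !Double
                           !(Map Int Double) !(Map Int Double)
        tip = TipIntDouble
        bin = BinIntDouble
        view TipIntDouble             = TipView
        view (BinIntDouble s k v l r) = BinView s k v l r
        {-# INLINE tip #-}
        {-# INLINE bin #-}
        {-# INLINE view #-}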
So I agree it would be nice if this happened automatically, behind the scenes, by virtue of just mentioning "Map Int Double" (though it would still have to be a typeclass of course, so that you can write polymorphic functions over Maps). Automatic specialisation of this kind can be done by JIT runtimes (e.g. the .NET CLR), because there the code generation and caching of instances can be put under control of the runtime. Here we would have to do it in the compiler, and the difficulty is that the compiler needs to support separate compilation.
C++ supports automatic instantiation and separate compilation. We'd have to include the needed information in the .hi files so we can generate the right instances at the usage site. Perhaps there are other problems (that are somehow solved by the C++ compiler) that I'm not considering.
Rather than try to solve this problem in one go, I would go for a low-tech approach for now: write a TH library to generate the code, and ask the user to declare the versions they need. To make a particular version, the user would say something like
    module MapIntDouble (module MapIntDouble) where
    import TibbeMagicMapGenerator
    make_me_a_map
    ...
there's no type class of course, so you can't write functions that work over all specialised Maps. But this at least lets you generate optimised maps for only a little boilerplate, and get the performance boost you were after.
This doesn't quite work though as two MapIntDouble defined in two different libraries are incompatible. This is essentially the same problem as with instance collisions. Cheers, Johan

On 12/08/2010 11:13, Johan Tibell wrote:
There needs to be some amount of code generation, but much of the implementation can still be shared. I previously tried to define the type class as

    {-# LANGUAGE MultiParamTypeClasses, TypeFamilies #-}
    module Ex2 where

    import Prelude hiding (lookup)

    data MapView k v
        = TipView
        | BinView {-# UNPACK #-} !Size !k !v !(Map k v) !(Map k v)

    class Unbox k v where
        data Map k v :: *
        tip  :: Map k v
        bin  :: Size -> k -> v -> Map k v -> Map k v -> Map k v
        view :: Map k v -> MapView k v

    type Size = Int

    lookup :: (Ord k, Unbox k v) => k -> Map k v -> Maybe v
    lookup k m = case view m of
        TipView -> Nothing
        BinView _ kx x l r -> case compare k kx of
            LT -> lookup k l
            GT -> lookup k r
            EQ -> Just x
    {-# INLINE lookup #-}

Calling lookup from a different module at a known type gives exactly the Core you'd like to see (unpacked types, no MapView constructors).
I'm not sure I want lookup (and other operations) to be inlined at every call site though.
Rather than try to solve this problem in one go, I would go for a low-tech approach for now: write a TH library to generate the code, and ask the user to declare the versions they need. To make a particular version, the user would say something like
    module MapIntDouble (module MapIntDouble) where
    import TibbeMagicMapGenerator
    make_me_a_map
    ...
there's no type class of course, so you can't write functions that work over all specialised Maps. But this at least lets you generate optimised maps for only a little boilerplate, and get the performance boost you were after.
This doesn't quite work though as two MapIntDouble defined in two different libraries are incompatible. This is essentially the same problem as with instance collisions.
But you get to choose the module name, so you can avoid collisions by using qualified names. Cheers, Simon

On Thu, Aug 12, 2010 at 12:25 PM, Simon Marlow
I'm not sure I want lookup (and other operations) to be inlined at every call site though.
That's a good point. If inlining isn't the right option in every case, we would have to duplicate the implementation.

I had a look at how C++ compilers deal with this problem. They do so by duplicating the implementation in each module that instantiates the template, at least according to this document: http://gcc.gnu.org/onlinedocs/gcc/Template-Instantiation.html

As I understand it the generated code is not exported from the translation unit so there are no collisions at link time. We could do the same if we could force the generated type class instance to not be exported from the module.
This doesn't quite work though as two MapIntDouble defined in two
different libraries are incompatible. This is essentially the same problem as with instance collisions.
But you get to choose the module name, so you can avoid collisions by using qualified names.
Sorry, I was being unclear. The problem is that library A can't pass a MapIntDouble to library B as the types defined by the two libraries aren't compatible. Cheers, Johan

On 12/08/2010 12:28, Johan Tibell wrote:
This doesn't quite work though as two MapIntDouble defined in two different libraries are incompatible. This is essentially the same problem as with instance collisions.
But you get to choose the module name, so you can avoid collisions by using qualified names.
Sorry, I was being unclear. The problem is that library A can't pass a MapIntDouble to library B as the types defined by the two libraries aren't compatible.
That's true, but is it really going to cause many problems in practice? Which libraries have functions that take a Map argument, other than the Map library itself? Cheers, Simon

On Thu, Aug 12, 2010 at 1:38 PM, Simon Marlow
On 12/08/2010 12:28, Johan Tibell wrote:
Sorry, I was being unclear. The problem is that library A can't pass a MapIntDouble to library B as the types defined by the two libraries aren't compatible.
That's true, but is it really going to cause many problems in practice? Which libraries have functions that take a Map argument, other than the Map library itself?
I've seen a few on Hackage, but I haven't run any comprehensive study. The same problem applies to tuples, lists, sets and other data structures as well. Anecdotally, you see more uses of maps and other data structures as programs get larger, but by its very nature Hackage is a place for libraries, not large programs.

Cheers,
Johan

On 12 August 2010 12:28, Johan Tibell
As I understand it the generated code is not exported from the translation unit so there are no collisions at link time. We could do the same if we could force the generated type class instance to not be exported from the module.
Minor point: I think the standard practice is to export the code, but mark it with an attribute that tells the linker to drop any duplicate copies of the code associated with the name. So if I instantiate vector<int> in A.cpp and B.cpp, then both A.o and B.o contain the code for vector<int>, but upon linking these get commoned up so the final executable only has one copy of the code (same mechanism as e.g. COMDAT folding). This produces a small space saving over the simple "instantiate at every call site" model.

It is likely that GHC could do slightly better, because we could see whether any *modules we were dependent on* had previously generated the necessary specialisation, and reuse that code directly if it had. We would still need the COMDAT stuff to improve situations where we depend on two modules that have independently generated the same specialisation, and to deal with cycles in the module dependency graph.

None of the mechanisms for making this stuff happen are available at the moment. It's an engineering problem that just needs time to be thrown at it.

Cheers,
Max

On Thu, Aug 12, 2010 at 1:47 PM, Max Bolingbroke wrote:
None of the mechanisms for making this stuff happen are available at the moment. It's an engineering problem that just needs time to be thrown at it.

If we could figure out which mechanisms are needed we would have a better idea of how much work it would be. If some of the trickier theoretical/design issues are resolved, perhaps a motivated set of individuals could take it on.

Cheers,
Johan

C++ template instantiations are exported as weak linker symbols. It's just that the linker elides all but one of the implementations.

On Thu, Aug 12, 2010 at 9:27 PM,
C++ template instantiations are exported as weak linker symbols. It's just that the linker elides all but one of the implementations.
Yes and dead code elimination should also be able to get rid of much of the code duplication even before it reaches the linker.

On 12 August 2010 20:31, Johan Tibell
Yes and dead code elimination should also be able to get rid of much of the code duplication even before it reaches the linker.
I don't think dead code elimination will help, because presumably you want to generate specialisations on demand. This means that specialisations will only be generated if they have at least one caller. Unless the caller is itself dead, it will be entirely up to the linker to common-up stuff and reduce the code bloat. Cheers, Max

On Fri, Aug 13, 2010 at 1:14 AM, Max Bolingbroke wrote:
On 12 August 2010 20:31, Johan Tibell wrote:
Yes and dead code elimination should also be able to get rid of much of the code duplication even before it reaches the linker.

I don't think dead code elimination will help, because presumably you want to generate specialisations on demand. This means that specialisations will only be generated if they have at least one caller. Unless the caller is itself dead, it will be entirely up to the linker to common-up stuff and reduce the code bloat.

I guess I was thinking that if you naively duplicated the whole Map module (instead of individual functions) once per caller, then much of the generated code would become dead, as the caller is unlikely to use all the functions in the module.

Cheers,
Johan

re ...
As I understand it the generated code is not exported from the translation unit so there are no collisions at link time. We could do the same if we could force the generated type class instance to not be exported from the module.
I have encountered several occasions when I wished for control over what instances get imported or exported from a module. I have wondered, and perhaps someone can explain: what are the issues with explicit control of instance export and import (apart from defining an appropriate syntax)?

John

On 13 August 2010 00:13, John Lask
I have wondered and perhaps someone can explain: what are the issues in explicit control of instance export and import? (apart from defining an appropriate syntax)
IMHO the main problem with this (and related feature requests like local instance definitions) is a sort of incoherency:

    """
    module A where

    data Foo = Yes | No deriving (Eq)
    """

    """
    module B ( mySet {- .. don't export Ord instance .. -} ) where

    import A
    import qualified Data.Set

    instance Ord Foo where
        Yes `compare` Yes = EQ
        No  `compare` No  = EQ
        Yes `compare` _   = LT
        _   `compare` _   = GT

    mySet = Data.Set.fromList [Yes, No]
    """

    """
    module Main where

    import A
    import B
    import qualified Data.Set as S

    instance Ord Foo where
        Yes `compare` Yes = EQ
        No  `compare` No  = EQ
        Yes `compare` _   = GT
        _   `compare` _   = LT

    main = do
        print $ S.valid mySet
        print $ S.fromList [Yes, No] == mySet
    """

This program will likely print "False" twice. It is also highly likely that trying to use other Set operations on mySet will result in an error, because the ordering being used on Foo in calls from Main is different from the one used to construct the set. This is a very subtle error, and could be tricky to debug.

Kiselyov and Shan have proposed a sort of local instance definition that appears not to give rise to these problems (because it's restricted to instances that mention a skolem variable, so they can't "leak"). See http://okmij.org/ftp/Haskell/types.html#Prepose for details. This may capture at least some of the use cases that people want instance export control for.

Cheers,
Max

Hi,

On 12.08.2010, at 12:25, Simon Marlow wrote:
On 12/08/2010 11:13, Johan Tibell wrote:
There needs to be some amount of code generation, but much of the implementation can still be shared. I previously tried to define the type class as

    {-# LANGUAGE MultiParamTypeClasses, TypeFamilies #-}
    module Ex2 where

    import Prelude hiding (lookup)

    data MapView k v
        = TipView
        | BinView {-# UNPACK #-} !Size !k !v !(Map k v) !(Map k v)

    class Unbox k v where
        data Map k v :: *
        tip  :: Map k v
        bin  :: Size -> k -> v -> Map k v -> Map k v -> Map k v
        view :: Map k v -> MapView k v

    type Size = Int

    lookup :: (Ord k, Unbox k v) => k -> Map k v -> Maybe v
    lookup k m = case view m of
        TipView -> Nothing
        BinView _ kx x l r -> case compare k kx of
            LT -> lookup k l
            GT -> lookup k r
            EQ -> Just x
    {-# INLINE lookup #-}

Calling lookup from a different module at a known type gives exactly the Core you'd like to see (unpacked types, no MapView constructors).
I'm not sure I want lookup (and other operations) to be inlined at every call site though.
Wouldn't it be better to enable SPECIALIZE pragmas from outside the defining module? Then a user of Ex2 could declare his need for an optimized version of lookup for his particular type. Of course, that would require the inclusion of lookup in the .hi file.

-- Jean
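A sketch of what such a use-site pragma might look like (the Client module and findAnswer are made up for illustration; as noted above, it relies on lookup's unfolding being shipped in Ex2's .hi file, and it assumes an Unbox Int Double instance like the one sketched earlier is in scope):

    {-# LANGUAGE TypeFamilies #-}
    module Client where

    import Prelude hiding (lookup)
    import Ex2 (Map, lookup)
    -- plus an import that brings an Unbox Int Double instance into scope

    {-# SPECIALISE lookup :: Int -> Map Int Double -> Maybe Double #-}

    findAnswer :: Map Int Double -> Maybe Double
    findAnswer = lookup 42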

On Thu, Aug 12, 2010 at 11:28 AM, Simon Marlow
Rather than try to solve this problem in one go, I would go for a low-tech approach for now: write a TH library to generate the code, and ask the user to declare the versions they need. To make a particular version, the user would say something like
    module MapIntDouble (module MapIntDouble) where
    import TibbeMagicMapGenerator
    make_me_a_map
    ...
there's no type class of course, so you can't write functions that work over all specialised Maps. But this at least lets you generate optimised maps for only a little boilerplate, and get the performance boost you were after.
To get a better idea of how many specialized maps the user would have to create under this scheme, I ran an analysis of the Chromium codebase [1]. The Chromium codebase is not small, but some companies have codebases that are several orders of magnitude larger, which makes the results below more of a lower bound than an upper bound on the number of specialized maps one might need in a program.

    $ git clone http://src.chromium.org/git/chromium.git
    Initialized empty Git repository in /tmp/chromium/.git/
    remote: Counting objects: 548595, done.
    remote: Compressing objects: 100% (167063/167063), done.
    remote: Total 548595 (delta 401993), reused 477011 (delta 343049)
    Receiving objects: 100% (548595/548595), 1.02 GiB | 24.44 MiB/s, done.
    Resolving deltas: 100% (401993/401993), done.
    $ cd chromium
    $ find . -name \*.h -o -name \*.cc -exec egrep -o "map<[^,]+, ?[^>]+>" {} \; | sort -u | wc -l
    220
    $ find . -name \*.h -o -name \*.cc -exec wc -l {} \; | awk '{tot=tot+$1} END {print tot}'
    81328

So in a code base of about 80 KLOC there are 220 unique key/value combinations. While the numbers might not translate exactly to Haskell, it still indicates that the number of modules a user would have to create (and put somewhere in the source tree) would be quite large.

1. http://src.chromium.org/git/chromium.git

Cheers,
Johan

You don't want to go overboard here.
1. You *want* a distinct blob of lookup code for each different key type, because you really do want a different lookup structure for each key type.
2. For the most part you *don't want* a different blob of lookup code for each value type, because almost all of them are represented uniformly by a pointer.
So it'd be silly to generate all possible combinations of types.

The exception to (2) is that you want different code for a handful of types that you want to unbox into the tree structure itself: Int#, Float#, etc. It would be good to design a convenient way to do that. It's nothing directly to do with associated types. We'd like to allow Maybe Int#, say, but we don't at the moment because that data structure would really be represented differently. Some kind of data type and code cloning (a la C++) is probably the right thing. This is what Max meant by "just engineering" but it would require careful thought and design.
Simon
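To put points (1) and (2) in code: one way to get a distinct representation (and a distinct compiled lookup) per key type, while keeping values uniformly boxed, is to index the data family on the key alone. This is only a sketch, closely following the GMap example on the wiki page linked at the top of the thread; Key, KMap and the constructor names are made up:

    {-# LANGUAGE TypeFamilies #-}
    module KeyOnly where

    import Prelude hiding (lookup)

    type Size = Int

    class Key k where
        data KMap k :: * -> *
        empty  :: KMap k v
        lookup :: k -> KMap k v -> Maybe v

    -- One node representation per key type; the value field stays a
    -- plain (boxed) pointer, so it is shared by all value types.
    instance Key Int where
        data KMap Int v
            = TipInt
            | BinInt {-# UNPACK #-} !Size {-# UNPACK #-} !Int v
                     !(KMap Int v) !(KMap Int v)
        empty = TipInt
        lookup _ TipInt = Nothing
        lookup k (BinInt _ kx x l r) = case compare k kx of
            LT -> lookup k l
            GT -> lookup k r
            EQ -> Just x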

On Thu, Aug 12, 2010 at 6:07 PM, Simon Peyton-Jones
1. You *want* a distinct blob of lookup code for each different key type, because you really do want a different lookup structure for each key type.
There are two reasons to want different blobs of code for the lookup function:

1. You want a different data structure (e.g. Patricia trees or weight-balanced trees) for different key types.

2. You want the key type unboxed (e.g. Int#) to reduce indirection/reboxing and that way speed up the function by some constant factor.

The second could perhaps be achieved using SPECIALIZE pragmas, if they could be given in modules that use the data type and not only in the module that defines the data type.
2. For the most part you *don't want* a different blob of lookup code for each value type, because almost all of them are represented uniformly by a pointer.
I don't know if I want a different blob of lookup code for each value type. It seems to me that having a different blob of lookup code even for the value type could avoid some reboxing of the value when the caller will immediately unbox the value again. What I definitely want is a different data representation for each key/value type, to avoid the four word overhead (two pointers and two constructors) for key/value types that can be unboxed into the data structure's data constructors.
So it'd be silly to generate all possible combinations of types.
My proposal is to do the generation lazily, only for the combinations of types that are actually used, just like C++ does. If we can get the same result without code duplication, that would be better.
The exception to (2) is that you want different code for a handful of types that you want to unbox into the tree structure itself: Int#, Float# etc. It would be good to design a convenient way to do that. It’s nothing directly to do with associated types. We’d like to allow Maybe Int#, say, but we don’t at the moment because that data structure would really be represented differently. Some kind of data type and code cloning (a la C++) is probably the right thing. This is what Max meant by “just engineering” but it would require careful thought and design.
The number of data types you might want to unbox may be larger than you think: all the Int types, all the Word types, Floats, Doubles, tuples, and user-defined records (which are technically tuples, I guess); anything the UNPACK pragma applies to, really. The unboxing into the data type constructors is really what I'm after; the specialization of e.g. the lookup function to avoid reboxing seems to follow naturally from that.

Cheers,
Johan
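As a small illustration of that last point (a sketch with made-up types), UNPACK already flattens a single-constructor record with strict fields into an enclosing constructor, which is exactly the per-key-type node representation one would want generated:

    type Size = Int

    data Point = Point {-# UNPACK #-} !Int {-# UNPACK #-} !Int

    data PointMap v
        = PTip
        | PBin {-# UNPACK #-} !Size
               {-# UNPACK #-} !Point   -- both Int fields end up inside PBin
               v
               !(PointMap v) !(PointMap v)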
participants (8):
- Bryan O'Sullivan
- Jean-Marie Gaillourdet
- Johan Tibell
- John Lask
- Max Bolingbroke
- scooter.phd@gmail.com
- Simon Marlow
- Simon Peyton-Jones