How to avoid repeating code

Hi, I created a Data.MultiMap module based on Data.Map and Data.Set like this: data MultiMap k v = MultiMap (Map k (Set v)) and a Data.IntMultiMap module based on Data.IntMap and data.IntSet like this: data IntMultiMap = IntMultiMap (IntMap IntSet) For example the functions to add a value I wrote are: For MultiMap: addValue :: k -> v -> MultiMap k v -> MultiMap k v addValue k v (MultiMap m) = MultiMap $ Map.insertWith (\new old -> Set.insert v old) k (Set.singleton v) m For IntMultiMap: addValue :: Int -> Int -> IntMultiMap -> IntMultiMap addValue k v (IntMultiMap m) = IntMultiMap $ IntMap.insertWith (\new old -> IntSet.insert v old) k (IntSet.singleton v) m Both modules look almost the same, with the same documentation, same behavior, same function names but with different type signatures. Is there a way to make this simpler? The same thing happens to the modules that are using MultiMap and IntMultiMap, I have to write two versions of each. Thanks! -- Federico Mastellone Computer Science Engineer - ITBA ".. there are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult." Tony Hoare, 1980 ACM Turing Award Lecture.

On Thursday 26 May 2011 02:23:06, Federico Mastellone wrote:
Hi,
I created a Data.MultiMap module based on Data.Map and Data.Set like this:
data MultiMap k v = MultiMap (Map k (Set v))
and a Data.IntMultiMap module based on Data.IntMap and data.IntSet like this:
data IntMultiMap = IntMultiMap (IntMap IntSet)
Both of these would better be newtypes instead of data, I think. Using data incurs some run-time overhead (the newtype doesn't exist at run- time, only during compile-time [type checking phase], so it's a strictly controlled type alias in practice, making it easier [or possible at all] to apply optimisations available for the underlying type) due to the extra indirections via the constructor and introduces the new value (MultiMap _|_), which complicates strictness analysis and optimisations in general.
For example the functions to add a value I wrote are:
For MultiMap: addValue :: k -> v -> MultiMap k v -> MultiMap k v addValue k v (MultiMap m) = MultiMap $ Map.insertWith (\new old -> Set.insert v old) k (Set.singleton v) m
For IntMultiMap: addValue :: Int -> Int -> IntMultiMap -> IntMultiMap addValue k v (IntMultiMap m) = IntMultiMap $ IntMap.insertWith (\new old -> IntSet.insert v old) k (IntSet.singleton v) m
Both modules look almost the same, with the same documentation, same behavior, same function names but with different type signatures.
Well, we have the same situation with Map/IntMap and Set/IntSet, so
Is there a way to make this simpler?
Not a really good one (at least, none I know).
The same thing happens to the modules that are using MultiMap and IntMultiMap, I have to write two versions of each.
You can reduce the code duplication at the use sites with a type class, {-# LANGUAGE TypeFamilies #-} class MultiMapClass m where type Key m type Value m empty :: m singleton :: Key m -> Value m -> m addValue :: Key m -> Value m -> m -> m ... instance (Ord k, Ord v) => MultiMapClass (MultiMap k v) where type Key (MultiMap k v) = k type Value (MultiMap k v) = v empty = MultiMap Map.empty ... instance MultiMapClass IntMultiMap where type Key IntMultiMap = Int type Value IntMultiMap = Int empty = IntMultiMap IntMap.empty ...
Thanks!

On Thu, May 26, 2011 at 8:47 AM, Daniel Fischer < daniel.is.fischer@googlemail.com> wrote:
On Thursday 26 May 2011 02:23:06, Federico Mastellone wrote:
Hi,
I created a Data.MultiMap module based on Data.Map and Data.Set like this:
data MultiMap k v = MultiMap (Map k (Set v))
and a Data.IntMultiMap module based on Data.IntMap and data.IntSet like this:
data IntMultiMap = IntMultiMap (IntMap IntSet)
Both of these would better be newtypes instead of data, I think. Using data incurs some run-time overhead (the newtype doesn't exist at run- time, only during compile-time [type checking phase], so it's a strictly controlled type alias in practice, making it easier [or possible at all] to apply optimisations available for the underlying type) due to the extra indirections via the constructor and introduces the new value (MultiMap _|_), which complicates strictness analysis and optimisations in general.
Thanks for the tip, I use newtype where I can but I thought it would be simpler here to explain my problem using data. Anyway, now I know better why I should use newtype instead.
For example the functions to add a value I wrote are:
For MultiMap: addValue :: k -> v -> MultiMap k v -> MultiMap k v addValue k v (MultiMap m) = MultiMap $ Map.insertWith (\new old -> Set.insert v old) k (Set.singleton v) m
For IntMultiMap: addValue :: Int -> Int -> IntMultiMap -> IntMultiMap addValue k v (IntMultiMap m) = IntMultiMap $ IntMap.insertWith (\new old -> IntSet.insert v old) k (IntSet.singleton v) m
Both modules look almost the same, with the same documentation, same behavior, same function names but with different type signatures.
Well, we have the same situation with Map/IntMap and Set/IntSet, so
Yes, both pairs, (Set, InSet), and (Map, IntMap), have the exact same problem as my MultiMap module. But I think that as your solution involves using non-standard extensions we could not solve this problem in GHC's libraries. Am I OK?
Is there a way to make this simpler?
Not a really good one (at least, none I know).
The same thing happens to the modules that are using MultiMap and IntMultiMap, I have to write two versions of each.
You can reduce the code duplication at the use sites with a type class,
{-# LANGUAGE TypeFamilies #-}
Is this a mostly experimental extension or I can use it safely? Is it used around the GHC packages?
class MultiMapClass m where type Key m type Value m empty :: m singleton :: Key m -> Value m -> m addValue :: Key m -> Value m -> m -> m ...
instance (Ord k, Ord v) => MultiMapClass (MultiMap k v) where type Key (MultiMap k v) = k type Value (MultiMap k v) = v empty = MultiMap Map.empty ...
instance MultiMapClass IntMultiMap where type Key IntMultiMap = Int type Value IntMultiMap = Int empty = IntMultiMap IntMap.empty ...
I'm going to read about this extension, try it and comment about it here.
Thanks!
Thank you very much for your complete response, really helpful! -- Federico Mastellone Computer Science Engineer - ITBA ".. there are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult." Tony Hoare, 1980 ACM Turing Award Lecture.

Now I have a new problem, it's getting really difficult to program
generically and to create highly parameterized libraries.
So far so good with type families, but when I want to return a generic Set
for the getValues function and provide a default implementation for
getValuesCount function I don't know how to do it, I don't even know if it
is possible.
newtype MultiMap k v = MultiMap (Map.Map k (Set.Set v))
newtype IntMultiMap = IntMultiMap (IntMap.IntMap IntSet.IntSet)
class MultiMapClass m where
type Key m
type Value m
empty :: m
addValue :: Key m -> Value m -> m -> m
getValues :: Key m -> m -> Set (Value m)
getValueCount :: Key m -> m -> Int
getValueCount k m = Set.size $ getValues k m
instance (Ord k, Ord v) => MultiMapClass (MultiMap k v) where
type Key (MultiMap k v) = k
type Value (MultiMap k v) = v
empty = MultiMap Map.empty
addValue k v m = ..........
getValues k (MultiMap map) = map Map.! k
instance MultiMapClass IntMultiMap where
type Key IntMultiMap = Int
type Value IntMultiMap = Int
empty = IntMultiMap IntMap.empty
addValue k v m = ..........
getValues k (IntMultiMap map) = map IntMap.! k
On Thu, May 26, 2011 at 1:47 PM, Federico Mastellone
On Thu, May 26, 2011 at 8:47 AM, Daniel Fischer < daniel.is.fischer@googlemail.com> wrote:
On Thursday 26 May 2011 02:23:06, Federico Mastellone wrote:
Hi,
I created a Data.MultiMap module based on Data.Map and Data.Set like this:
data MultiMap k v = MultiMap (Map k (Set v))
and a Data.IntMultiMap module based on Data.IntMap and data.IntSet like this:
data IntMultiMap = IntMultiMap (IntMap IntSet)
Both of these would better be newtypes instead of data, I think. Using data incurs some run-time overhead (the newtype doesn't exist at run- time, only during compile-time [type checking phase], so it's a strictly controlled type alias in practice, making it easier [or possible at all] to apply optimisations available for the underlying type) due to the extra indirections via the constructor and introduces the new value (MultiMap _|_), which complicates strictness analysis and optimisations in general.
Thanks for the tip, I use newtype where I can but I thought it would be simpler here to explain my problem using data. Anyway, now I know better why I should use newtype instead.
For example the functions to add a value I wrote are:
For MultiMap: addValue :: k -> v -> MultiMap k v -> MultiMap k v addValue k v (MultiMap m) = MultiMap $ Map.insertWith (\new old -> Set.insert v old) k (Set.singleton v) m
For IntMultiMap: addValue :: Int -> Int -> IntMultiMap -> IntMultiMap addValue k v (IntMultiMap m) = IntMultiMap $ IntMap.insertWith (\new old -> IntSet.insert v old) k (IntSet.singleton v) m
Both modules look almost the same, with the same documentation, same behavior, same function names but with different type signatures.
Well, we have the same situation with Map/IntMap and Set/IntSet, so
Yes, both pairs, (Set, InSet), and (Map, IntMap), have the exact same problem as my MultiMap module.
But I think that as your solution involves using non-standard extensions we could not solve this problem in GHC's libraries. Am I OK?
Is there a way to make this simpler?
Not a really good one (at least, none I know).
The same thing happens to the modules that are using MultiMap and IntMultiMap, I have to write two versions of each.
You can reduce the code duplication at the use sites with a type class,
{-# LANGUAGE TypeFamilies #-}
Is this a mostly experimental extension or I can use it safely? Is it used around the GHC packages?
class MultiMapClass m where type Key m type Value m empty :: m singleton :: Key m -> Value m -> m addValue :: Key m -> Value m -> m -> m ...
instance (Ord k, Ord v) => MultiMapClass (MultiMap k v) where type Key (MultiMap k v) = k type Value (MultiMap k v) = v empty = MultiMap Map.empty ...
instance MultiMapClass IntMultiMap where type Key IntMultiMap = Int type Value IntMultiMap = Int empty = IntMultiMap IntMap.empty ...
I'm going to read about this extension, try it and comment about it here.
Thanks!
Thank you very much for your complete response, really helpful!
-- Federico Mastellone Computer Science Engineer - ITBA
".. there are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult."
Tony Hoare, 1980 ACM Turing Award Lecture.
-- Federico Mastellone Computer Science Engineer - ITBA ".. there are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult." Tony Hoare, 1980 ACM Turing Award Lecture.

On Friday 27 May 2011 19:23:54, Federico Mastellone wrote:
Now I have a new problem, it's getting really difficult to program generically and to create highly parameterized libraries.
Yes.
So far so good with type families, but when I want to return a generic Set for the getValues function and provide a default implementation for getValuesCount function I don't know how to do it, I don't even know if it is possible.
You can't return a generic Set in getValues (unless you jump through a lot of hoops, IntMultiMap has IntSets, so you'd have to transform those to Sets, ... yuck) and default implementations wouldn't be possible [or at least rather horrible], but
newtype MultiMap k v = MultiMap (Map.Map k (Set.Set v))
newtype IntMultiMap = IntMultiMap (IntMap.IntMap IntSet.IntSet)
class (Ord (Elem c)) => Collection c where type Elem c emtpy :: c singleton :: Elem c -> c size :: c -> Int null :: c -> Bool member :: Elem c -> c -> Bool ...
class MultiMapClass m where
class (Collection (Coll m)) => MultiMapClass m where
type Key m type Value m
type Coll m
empty :: m addValue :: Key m -> Value m -> m -> m getValues :: Key m -> m -> Set (Value m)
getValues :: Key m -> m -> Coll m
getValueCount :: Key m -> m -> Int getValueCount k m = Set.size $ getValues k m
getValueCount k m = size (getValues k m) Should work or be possible to make working. But things get more complicated the more generic functionality you want to add. It would probably be possible to get a more elegant implementation if you designed the library to use a class-based interface from the ground up (take a look at the edison library [EdisonAPI and EdisonCore on hackage] for an idea of how to structure the classes - edison is old, it was created long before type families or functional dependencies were available, I don't know what new stuff was incorporated into it, I suspect not too much, so you could probably improve the design with the new powerful tools at your hands, but as a source of inspiration, it should still serve well). The problem is that, as a rule of thumb, class based genericity costs performance. And the point of the containers package was to provide performant data structures, the genericity was wilfully forsaken. So, perhaps writing a generic envelope for the containers types might not be the best option. It could be better to start from a clean sheet. [disclaimer: I have not thought about how to do that, nor looked at the API or implementation from that point of view, so my hunch may be quite wrong.] Cheers, Daniel

Thanks again Daniel!! I've been looking at the Edison package, it has a big class hierarchy without default implementations using functional dependencies and I found tidier doing something similar with associated types. Writing default implementations continues to be troublesome. Here is an example adding an elems function to the Collection class elems :: c -> [Elem c] I can't write on the MultiMapClass class a default implementation for getValueList like this getValueList :: Key m -> m -> [Value m] getValueList k m = elems $ getValues k m because elems returns [Elem (Coll m)] so I have to write getValueList :: Key m -> m -> [Elem (Coll m)] and it doesn't matter that this means the same for the implementations I have Elem (Set.Set a) ~ Value (MultiMap k v) and Elem IntSet.IntSet ~ Value IntMultiMap Now when using classes like this you need to need to think twice when coding! Typing becomes much more complicated. I don't know if I am too object oriented or is the lack of IDEs but reusing code, grouping together code with similarities and managing many modules is not easy with Haskell. Things that I found essential to write large programs. I'm starting to think that the easiest way of writing generic and reusable code with Haskell is writing a Haskell parser and code generator in Haskell. On Fri, May 27, 2011 at 3:00 PM, Daniel Fischer < daniel.is.fischer@googlemail.com> wrote:
On Friday 27 May 2011 19:23:54, Federico Mastellone wrote:
Now I have a new problem, it's getting really difficult to program generically and to create highly parameterized libraries.
Yes.
So far so good with type families, but when I want to return a generic Set for the getValues function and provide a default implementation for getValuesCount function I don't know how to do it, I don't even know if it is possible.
You can't return a generic Set in getValues (unless you jump through a lot of hoops, IntMultiMap has IntSets, so you'd have to transform those to Sets, ... yuck) and default implementations wouldn't be possible [or at least rather horrible], but
newtype MultiMap k v = MultiMap (Map.Map k (Set.Set v))
newtype IntMultiMap = IntMultiMap (IntMap.IntMap IntSet.IntSet)
class (Ord (Elem c)) => Collection c where type Elem c emtpy :: c singleton :: Elem c -> c size :: c -> Int null :: c -> Bool member :: Elem c -> c -> Bool ...
class MultiMapClass m where
class (Collection (Coll m)) => MultiMapClass m where
type Key m type Value m
type Coll m
empty :: m addValue :: Key m -> Value m -> m -> m getValues :: Key m -> m -> Set (Value m)
getValues :: Key m -> m -> Coll m
getValueCount :: Key m -> m -> Int getValueCount k m = Set.size $ getValues k m
getValueCount k m = size (getValues k m)
Should work or be possible to make working. But things get more complicated the more generic functionality you want to add.
It would probably be possible to get a more elegant implementation if you designed the library to use a class-based interface from the ground up (take a look at the edison library [EdisonAPI and EdisonCore on hackage] for an idea of how to structure the classes - edison is old, it was created long before type families or functional dependencies were available, I don't know what new stuff was incorporated into it, I suspect not too much, so you could probably improve the design with the new powerful tools at your hands, but as a source of inspiration, it should still serve well).
The problem is that, as a rule of thumb, class based genericity costs performance. And the point of the containers package was to provide performant data structures, the genericity was wilfully forsaken.
So, perhaps writing a generic envelope for the containers types might not be the best option. It could be better to start from a clean sheet. [disclaimer: I have not thought about how to do that, nor looked at the API or implementation from that point of view, so my hunch may be quite wrong.]
Cheers, Daniel
-- Federico Mastellone Computer Science Engineer - ITBA ".. there are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult." Tony Hoare, 1980 ACM Turing Award Lecture.

On Monday 30 May 2011 21:44:32, Federico Mastellone wrote:
I've been looking at the Edison package, it has a big class hierarchy without default implementations using functional dependencies and I found tidier doing something similar with associated types.
Doesn't surprise me.
Writing default implementations continues to be troublesome.
Here is an example adding an elems function to the Collection class elems :: c -> [Elem c] I can't write on the MultiMapClass class a default implementation for getValueList like this getValueList :: Key m -> m -> [Value m] getValueList k m = elems $ getValues k m because elems returns [Elem (Coll m)] so I have to write getValueList :: Key m -> m -> [Elem (Coll m)] and it doesn't matter that this means the same for the implementations I have Elem (Set.Set a) ~ Value (MultiMap k v) and Elem IntSet.IntSet ~ Value IntMultiMap
Well, I guess you'd want to have Value m ~ Elem (Coll m) in all cases, so you could remove the Value type completely and replace (Value m) with (Elem (Coll m)) in all places (although it would be nicer to be able to use the shorter Value m, I'm not sure whether you can write such an alias, type (constraints?) => Value m = Elem (Coll m), be it at the top level or in the class declaration, but you could add the equality constraint to the class context).
Now when using classes like this you need to need to think twice when coding! Typing becomes much more complicated.
I don't know if I am too object oriented or is the lack of IDEs but
That could be part of it. Trying to write your code in the OO way in Haskell tends to be painful, you have to look at things from a different angle to play to the language's strengths.
reusing code, grouping together code with similarities and managing many modules is not easy with Haskell.
Common opinion among Haskellers seems to be that it's easier in Haskell than in (most) other languages (massive selection bias, of course).
Things that I found essential to write large programs. I'm starting to think that the easiest way of writing generic and reusable code with Haskell is writing a Haskell parser and code generator in Haskell.
Considering how easy writing such is in Haskell, that could in fact be not too much of an exaggeration :)
participants (2)
-
Daniel Fischer
-
Federico Mastellone