
On Mon, Sep 24, 2012 at 2:53 PM, George Giorgidze
Hi Michael,
Here at the University of Tübingen, I am co-supervising (together with Jeroen Weijers) a student project implementing the OverloadedLists extension for GHC. Achim Krause is the student who is working on the project. We took into consideration earlier discussions on this topic [1,2] before embarking on the project.
Achim has worked on two approaches.
The first approach is very simple, both from the user's and the extension implementor's perspective (it follows the implementation of OverloadedStrings closely) and typechecks and desugars lists like
[] ; [x,y,z] ; ['a' .. 'z'] ;
as
fromList [] ; fromList [x,y,z] ; fromList ['a' .. 'z'] ;
where fromList is whatever is in scope with that name. That said, we do provide the FromList type class that can be used to overload fromList. In the following I give the definition of the class, as well as, example instances:
class FromList l where type Item l fromList :: [Item l] -> l
instance FromList [a] where type Item [a] = a fromList = id
instance (Ord a) => FromList (Set a) where type Item (Set a) = a fromList = Set.fromList
instance (Ord k) => FromList (Map k v) where type Item (Map k v) = (k,v) fromList = Map.fromList
instance FromList (IntMap v) where type Item (IntMap v) = (Int,v) fromList = IntMap.fromList
instance FromList Text where type Item Text = Char fromList = Text.pack
This approach has already been implemented by Achim as patch against GHC head.
This approach is very simple, but can be inefficient as it may result into unnecessary construction of lists at runtime. This can be a serious issue when constructing large structures from arithmetic sequences (e.g., from the [ .. ] notation) or when using non-literal expressions (e.g., variables) inside the square brackets.
Our second approach to OverloadedLists is to avoid the construction of lists altogether. By typechecking and desugaring lists like
[] ; [x,y,z] ; ['a' .. 'z'] ;
as
mempty ; singleton x `mappend` singleton y `mappend` singleton z ; genericEnumFromTo 'a' 'z' ;
We provide the Singleton and GenericEnum type classes for overloading singleton and genericEnum(..) functions. In the following, I give the definitions of the classes, as well as, example instances:
-- Singleton class
class Singleton l where type SingletonItem l singleton :: SingletonItem l -> l
-- Singleton instances
instance Singleton [a] where type SingletonItem [a] = a singleton a = [a]
instance (Ord a) => Singleton (Set a) where type SingletonItem (Set a) = a singleton = Set.singleton
instance (Ord k) => Singleton (Map k v) where type SingletonItem (Map k v) = (k,v) singleton (k,v) = Map.singleton k v
instance Singleton (IntMap v) where type SingletonItem (IntMap v) = (Int,v) singleton (k,v) = IntMap.singleton k v
instance Singleton Text where type SingletonItem Text = Char singleton = Text.singleton
-- GenericEnum class
class GenericEnum l where type EnumItem l genericEnumFrom :: EnumItem l -> l genericEnumFromThen :: EnumItem l -> EnumItem l -> l genericEnumFromTo :: EnumItem l -> EnumItem l -> l genericEnumFromThenTo :: EnumItem l -> EnumItem l -> EnumItem l -> l
-- GenericEnum instances
instance (Enum a) => GenericEnum [a] where type EnumItem [a] = a genericEnumFrom = enumFrom genericEnumFromThen = enumFromThen genericEnumFromTo = enumFromTo genericEnumFromThenTo = enumFromThenTo
instance (Ord a,Enum a) => GenericEnum (Set a) where type EnumItem (Set a) = a genericEnumFrom a = Set.fromList (enumFrom a) genericEnumFromThen a b = Set.fromList (enumFromThen a b) genericEnumFromTo a b = Set.fromList (enumFromTo a b) genericEnumFromThenTo a b c = Set.fromList (enumFromThenTo a b c)
instance (Ord k,Enum (k,v)) => GenericEnum (Map k v) where type EnumItem (Map k v) = (k,v) genericEnumFrom a = Map.fromList (enumFrom a) genericEnumFromThen a b = Map.fromList (enumFromThen a b) genericEnumFromTo a b = Map.fromList (enumFromTo a b) genericEnumFromThenTo a b c = Map.fromList (enumFromThenTo a b c)
instance (Enum (Int,v)) => GenericEnum (IntMap v) where type EnumItem (IntMap v) = (Int,v) genericEnumFrom a = IntMap.fromList (enumFrom a) genericEnumFromThen a b = IntMap.fromList (enumFromThen a b) genericEnumFromTo a b = IntMap.fromList (enumFromTo a b) genericEnumFromThenTo a b c = IntMap.fromList (enumFromThenTo a b c)
instance GenericEnum Text where type EnumItem Text = Char genericEnumFrom a = Text.pack (enumFrom a) genericEnumFromThen a b = Text.pack (enumFromThen a b) genericEnumFromTo a b = Text.pack (enumFromTo a b) genericEnumFromThenTo a b c = Text.pack (enumFromThenTo a b c)
Note that the GenericEnum instances can be implemented more efficiently, but for now I give simple definitions that go through lists.
Our second approach avoids the construction of intermediate lists at runtime and directly constructs the target data structure for which the list notation is used.
We will release GHC patches for both approaches, meanwhile the feedback from the community on the approaches that we took would be very much appreciated. Which one those would you prefer? or would you suggest a different one.
Note that we intend to make fromList in the first approach and singleton, genericEnum(..), mempty and mapped rebindable. This means that the definitions of the type classes that overload this functions can be easily changed. Having said that, altering the changes that Achim already made to the GHC source code (including typechecking and desugaring rules) will be more work and we hope that one of the approaches that we took will be acceptable for the community.
Cheers, George
[1] http://www.mail-archive.com/glasgow-haskell-users@haskell.org/msg20447.html [2] http://www.mail-archive.com/glasgow-haskell-users@haskell.org/msg20518.html
Hi George, It's very exciting to hear that this work has already been starting, thank you for letting me know. Your first approach is more inline with my initial proposal, though that doesn't mean I necessarily prefer it. I'm certainly interested having a more efficient implementation available, but I wonder if this second approach isn't a premature optimization. GHC rewrite rules provide a lot of power in this department: both ByteString and Text are able to avoid the intermediate String and build their representations from the buffers GHC creates. In my own work in conduit, I was able (with some help from Joachim Breitner[1]) to remove the intermediate list structure, and I'd be surprised if it's not possible to do the same thing with calls to `fromList`. That said, I haven't actually tried implementing any of this in real code, and it could be that there are serious performance advantages to the second approach. I'm quite happy with either implementation making it into GHC. Michael [1] http://www.haskell.org/pipermail/haskell-cafe/2012-April/100793.html