
I could not quickly find anyone else writing this boiler plate, so I have posted this useful wrapper on the Haskell wiki at http://haskell.org/haskellwiki/EnumSet_EnumMap This uses a cheap newtype to wrap IntSet and IntMap so that you can store any Enum in them. It saves either writing many toEnum/fromEnum functions or using the slower and more generic Data.Set and Data.Map. The original motivation was to go from Map Char to IntMap. And as a bonus, the type signature of the newtype is the same kind as Data.Set and Data.Map (which matters when declaring instances...)
newtype EnumSet e = EnumSet {unEnumSet :: S.IntSet} deriving (Monoid,Eq,Read,Show,Ord)
newtype EnumMap k a = EnumMap {unEnumMap :: M.IntMap a} deriving (Eq,Read,Show,Ord,Monoid,Foldable,Functor)
This has been quickly tested with GHC 6.6 and may contain typographical errors I have not caught yet. -- Chris

Chris Kuklewicz wrote:
I have posted this useful wrapper on the Haskell wiki at http://haskell.org/haskellwiki/EnumSet_EnumMap
In my opinion, this is important enough that IntMap and IntSet should be deprecated from the standard libraries in favor of EnumMap and EnumSet. Besides the obvious advantage of a huge increase in generality at no performance cost (I think), there is another reason. The difference in interface been the Int and non-Int versions of Set and Map forces you to make an early decision about which to use. That decision get pervasively hard-wired into your code. The result is that it is sometimes hard to write flexible polymorphic Haskell while still reserving the option of switching to Int as an optimization. Chris' idea fixes that problem. Regards, Yitz

Yitzchak Gale wrote:
Chris Kuklewicz wrote:
I have posted this useful wrapper on the Haskell wiki at http://haskell.org/haskellwiki/EnumSet_EnumMap
In my opinion, this is important enough that IntMap and IntSet should be deprecated from the standard libraries in favor of EnumMap and EnumSet.
I disagree.
Besides the obvious advantage of a huge increase in generality at no performance cost (I think), there is another reason.
There is a performance cost. Each use of Data.List.map in that code is a performance cost. And if toEnum/fromEnum is non-trivial there is a performance cost. For instance, I have abandoned (EnumMap Char) and made a specific CharMap module that uses GHC.Base.unsafeChr to avoid the bounds checks. If you want zero performance penalty you need to stop using Enum and create an UnsafeEnum type class that does casting without bounds checking, and knows how to cast things like "[Char]" to "[Int]" and back without touching each element.
The difference in interface been the Int and non-Int versions of Set and Map forces you to make an early decision about which to use. That decision get pervasively hard-wired into your code.
This is true. The solution for real application is to adopt a type class interface to collections, for which a few examples exist.
Chris' idea fixes that problem.
EnumMap and EnumSet increase type safety but not always efficiency.
Regards, Yitz

Chris Kuklewicz wrote:
There is a performance cost. Each use of Data.List.map in that code is a performance cost. And if toEnum/fromEnum is non-trivial there is a performance cost.
Are you sure? I am under the impression that at least for GHC, toEnum/fromEnum on Int is free. And every Data.List.map in your code is mapping one of toEnum, fromEnum, or a newtype constructor, so those should also be free, or perhaps could be made free with a pragma. Expert opinions on this? On types other than Int, you'll incur a very small cost. It should still be much better than generic Map and Set. If the difference between that and building something by hand for types other than Int is important, you can still do that, as now.
The difference in interface been the Int and non-Int versions of Set and Map forces you to make an early decision about which to use. That decision gets pervasively hard-wired into your code.
This is true. The solution for real application is to adopt a type class interface to collections, for which a few examples exist.
That is a far-reaching change. I hope it happens some day. In the meantime, your module are usable now, with very little change to existing code. It should certainly be added to the standard libraries. I personally would then stop using IntMap/IntSet for new code. If there is any penalty at all, it will certainly be insignificant to me. (I am not currently writing any regexp libraries.) If this is indeed free of performance cost for most or all compilers, I would then expect the old Int-specific implementations to disappear in a few years. Regards, Yitz

Yitzchak Gale wrote:
Chris Kuklewicz wrote:
There is a performance cost. Each use of Data.List.map in that code is a performance cost. And if toEnum/fromEnum is non-trivial there is a performance cost.
Are you sure? I am under the impression that at least for GHC, toEnum/fromEnum on Int is free. And every Data.List.map in your code is mapping one of toEnum, fromEnum, or a newtype constructor, so those should also be free, or perhaps could be made free with a pragma.
Expert opinions on this?
GHC's darcs repostory, at http://darcs.haskell.org/packages/base/GHC/Base.lhs :
-- | The 'Prelude.toEnum' method restricted to the type 'Data.Char.Char'. chr :: Int -> Char chr (I# i#) | int2Word# i# `leWord#` int2Word# 0x10FFFF# = C# (chr# i#) | otherwise = error "Prelude.chr: bad argument"
unsafeChr :: Int -> Char unsafeChr (I# i#) = C# (chr# i#)
-- | The 'Prelude.fromEnum' method restricted to the type 'Data.Char.Char'. ord :: Char -> Int ord (C# c#) = I# (ord# c#)
As you can see, the 'chr' function rightly checks bounds, and this is what is used in http://darcs.haskell.org/packages/base/GHC/Enum.lhs to make the Enum Char instance. I made a special CharMap module that uses unsafeChr instead. An interesting variation would be to define class UnsafeEnum and implement non-bounds checked unsafeToEnum and then base EnumMap/Set around this instead of the usual Enum. Or you could newtype an UnsafeChar with a custom Enum instance that uses unsafeChr.
On types other than Int, you'll incur a very small cost. It should still be much better than generic Map and Set. If the difference between that and building something by hand for types other than Int is important, you can still do that, as now.
Correct.
The difference in interface been the Int and non-Int versions of Set and Map forces you to make an early decision about which to use. That decision gets pervasively hard-wired into your code.
This is true. The solution for real application is to adopt a type class interface to collections, for which a few examples exist.
That is a far-reaching change. I hope it happens some day.
Perhaps.
In the meantime, your module are usable now, with very little change to existing code.
The (EnumMap Char) was not a real gain on (Map Char) until I made CharMap and added INLINE pragmas and avoided the List.map conversions to/from Char.
It should certainly be added to the standard libraries.
I think the priority should be to add a type class interface, just as we have IArray and MArray. Many of us have rolled our own String/List type class interface as well. But this needs to take advantage of MPTC and probably fundeps or associated types.
I personally would then stop using IntMap/IntSet for new code. If there is any penalty at all, it will certainly be insignificant to me. (I am not currently writing any regexp libraries.)
Then please enjoy EnumSet and EnumMap. You probably should add an INLINE pragma for each of the definitions. (I used a sed command to generate them).
If this is indeed free of performance cost for most or all compilers, I would then expect the old Int-specific implementations to disappear in a few years.
Well, for Int the toEnum/fromEnum is identity, so any compiler that can inline the calls should remove this performance penalyy.
Regards, Yitz

Chris Kuklewicz wrote:
There is a performance cost.
I wrote:
Are you sure?
GHC's darcs repostory, at http://darcs.haskell.org/packages/base/GHC/Base.lhs
I am talking about Int, not Char. I am comparing standard libs before and after, not what you can do by hand. I think the standard libs comparison works out like this: Int: No difference in performance, much better interface. Other Enum instances, such as Char: Currently not available at all, you have to use Map and Set. So big performance gain.
In the meantime, your module are usable now, with very little change to existing code.
The (EnumMap Char) was not a real gain on (Map Char) until I made CharMap and added INLINE pragmas and avoided the List.map conversions to/from Char.
How about with INLINE but also with the map conversions (the regular ones, not the unsafe ones). That is what would be added to the standard libs, and I would think it should be a gain over Man and Set. Do you have any numbers?
It should certainly be added to the standard libraries... I personally would then stop using IntMap/IntSet for new code...
Then please enjoy EnumSet and EnumMap.
Thanks - they really are nice. As a policy, I do try to stick to standard libraries for stability. Regards, Yitz

On Fri, Feb 23, 2007 at 03:24:27PM +0000, Chris Kuklewicz wrote:
I could not quickly find anyone else writing this boiler plate, so I have posted this useful wrapper on the Haskell wiki at http://haskell.org/haskellwiki/EnumSet_EnumMap
concurrent evolution :) http://repetae.net/repos/jhc/Util/SetLike.hs actually, my 'SetLike' and 'MapLike' typeclasses also defined in that file are quite useful too. John -- John Meacham - ⑆repetae.net⑆john⑈
participants (3)
-
Chris Kuklewicz
-
John Meacham
-
Yitzchak Gale