
Yitzchak Gale wrote:
Chris Kuklewicz wrote:
There is a performance cost. Each use of Data.List.map in that code is a performance cost. And if toEnum/fromEnum is non-trivial there is a performance cost.
Are you sure? I am under the impression that at least for GHC, toEnum/fromEnum on Int is free. And every Data.List.map in your code is mapping one of toEnum, fromEnum, or a newtype constructor, so those should also be free, or perhaps could be made free with a pragma.
Expert opinions on this?
GHC's darcs repostory, at http://darcs.haskell.org/packages/base/GHC/Base.lhs :
-- | The 'Prelude.toEnum' method restricted to the type 'Data.Char.Char'. chr :: Int -> Char chr (I# i#) | int2Word# i# `leWord#` int2Word# 0x10FFFF# = C# (chr# i#) | otherwise = error "Prelude.chr: bad argument"
unsafeChr :: Int -> Char unsafeChr (I# i#) = C# (chr# i#)
-- | The 'Prelude.fromEnum' method restricted to the type 'Data.Char.Char'. ord :: Char -> Int ord (C# c#) = I# (ord# c#)
As you can see, the 'chr' function rightly checks bounds, and this is what is used in http://darcs.haskell.org/packages/base/GHC/Enum.lhs to make the Enum Char instance. I made a special CharMap module that uses unsafeChr instead. An interesting variation would be to define class UnsafeEnum and implement non-bounds checked unsafeToEnum and then base EnumMap/Set around this instead of the usual Enum. Or you could newtype an UnsafeChar with a custom Enum instance that uses unsafeChr.
On types other than Int, you'll incur a very small cost. It should still be much better than generic Map and Set. If the difference between that and building something by hand for types other than Int is important, you can still do that, as now.
Correct.
The difference in interface been the Int and non-Int versions of Set and Map forces you to make an early decision about which to use. That decision gets pervasively hard-wired into your code.
This is true. The solution for real application is to adopt a type class interface to collections, for which a few examples exist.
That is a far-reaching change. I hope it happens some day.
Perhaps.
In the meantime, your module are usable now, with very little change to existing code.
The (EnumMap Char) was not a real gain on (Map Char) until I made CharMap and added INLINE pragmas and avoided the List.map conversions to/from Char.
It should certainly be added to the standard libraries.
I think the priority should be to add a type class interface, just as we have IArray and MArray. Many of us have rolled our own String/List type class interface as well. But this needs to take advantage of MPTC and probably fundeps or associated types.
I personally would then stop using IntMap/IntSet for new code. If there is any penalty at all, it will certainly be insignificant to me. (I am not currently writing any regexp libraries.)
Then please enjoy EnumSet and EnumMap. You probably should add an INLINE pragma for each of the definitions. (I used a sed command to generate them).
If this is indeed free of performance cost for most or all compilers, I would then expect the old Int-specific implementations to disappear in a few years.
Well, for Int the toEnum/fromEnum is identity, so any compiler that can inline the calls should remove this performance penalyy.
Regards, Yitz