More on Proposal #2717: Add nubWith, nubOrd

As those who follow this list know, the consensus after discussion of Proposal #2717 was to just add nubOrd to Data.Set. Final question: should we add nubInt to Data.IntSet as well? It seems to me like the right thing to do, given the existence of Data.IntSet in the first place. Once this last question is answered, I will start a new proposal round with a "final" set of proposed patches. Bart Massey bart <at> cs.pdx.edu

Data.Fixed contains divMod' :: (Real a, Integral b) => a -> a -> (b, a) as well as div' and mod'. There are two problems: 1) Implementation of this type signature requires conversion to Rational, which is extremely slow. It really isn't usable in any kind of significant loop at all. 2) This has nothing specific to do with fixed precision arithmetic despite being in the Data.Fixed module. I played with writing a patch but I think this requires discussion. I believe the original author was aware of the problems and didn't consider them urgent. Should there be a RULES to address the performance issues? Should this be based on RealFrac or specific to Float, Double, etc? Alternately OR additionally should there be a separate set of functions created based on RealFrac? Another possibility is to change the signature of the divMod' family itself to use RealFrac instead of Real. This might break at compile time, but Rational is still an instance of RealFrac, so the fix would easy, and the performance penalty of calling to/fromRational would be visible in the calling code instead of hidden away. If we use RULES, this may make programs that depend on it effectively unusable without the optimizer. Whatever we do will probably change the semantics a bit in terms of infinite/NaN, but I don't think that is likely to matter. I personally prefer changing the signature of the divMod' family, but I anticipate objections. I also think that we should make sure that the divMod' family gets inlined as my understanding is this will avoid dictionary lookups and aid the strictness analyzer, but I'm not 100% certain on this point. Does a module (Data.Num?) need to be created to keep these? Thanks, --Lane

lane:
Data.Fixed contains divMod' :: (Real a, Integral b) => a -> a -> (b, a) as well as div' and mod'.
There are two problems:
1) Implementation of this type signature requires conversion to Rational, which is extremely slow. It really isn't usable in any kind of significant loop at all.
2) This has nothing specific to do with fixed precision arithmetic despite being in the Data.Fixed module.
I played with writing a patch but I think this requires discussion. I believe the original author was aware of the problems and didn't consider them urgent.
Should there be a RULES to address the performance issues? Should this be based on RealFrac or specific to Float, Double, etc? Alternately OR additionally should there be a separate set of functions created based on RealFrac?
Almost certainly. Things like fromIntegral and truncate already have extensive type-based rules.
If we use RULES, this may make programs that depend on it effectively unusable without the optimizer. Whatever we do will probably change the semantics a bit in terms of infinite/NaN, but I don't think that is likely to matter.
Well, we make them the same as they currently are.
I personally prefer changing the signature of the divMod' family, but I anticipate objections.
I also think that we should make sure that the divMod' family gets inlined as my understanding is this will avoid dictionary lookups and aid the strictness analyzer, but I'm not 100% certain on this point.
Check the core! -- Donn

On 2008 Nov 14, at 22:58, Christopher Lane Hinson wrote:
Data.Fixed contains divMod' :: (Real a, Integral b) => a -> a -> (b, a) as well as div' and mod'.
There are two problems:
1) Implementation of this type signature requires conversion to Rational, which is extremely slow. It really isn't usable in any kind of significant loop at all.
I dunno, Data.Fixed seems to me to be Rational's cousin, and I can also see how divMod' would be needed to implement Fixed.
2) This has nothing specific to do with fixed precision arithmetic despite being in the Data.Fixed module.
Last I checked the Haddock noted that it has wider use than Data.Fixed. -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH

Hi Bart,
As those who follow this list know, the consensus after discussion of Proposal #2717 was to just add nubOrd to Data.Set. Final question: should we add nubInt to Data.IntSet as well? It seems to me like the right thing to do, given the existence of Data.IntSet in the first place.
Yes, seems very sensible! Thanks Neil

I've attached new patches against Darcs head to my proposal at http://hackage.haskell.org/trac/ghc/ticket/2717 The patches implement Set.nubOrd and IntSet.nubInt, and modify the documentation for List.nub. Please take a look at these patches and give any comments on this "final" version of code or documentation. If I don't hear otherwise shortly, I'll go ahead and request that these patches be incorporated for 2.6.11. Thanks. Bart Massey bart <at> cs.pdx.edu

Bart Massey wrote:
The patches implement Set.nubOrd and IntSet.nubInt, and modify the documentation for List.nub.
Could you please call it IntSet.nubOrd instead? The namespaces of Set and IntSet should be compatible wherever possible, so that you can just change the import statement (and perhaps a type signature) to switch between them. Thanks, Yitz

Hi
Could you please call it IntSet.nubOrd instead?
I considered raising this, but decided nubInt is better than nubOrd. These functions don't really belong in Set/IntSet - its just a case of dependencies etc. As such, I don't think the usual rules apply, because nubOrd doesn't work on Set's at all. From a usage point of view, nubInt/nubOrd is much clearer. Thanks Neil

I wrote:
Could you please call it IntSet.nubOrd instead?
Neil Mitchell wrote:
I considered raising this, but decided nubInt is better than nubOrd. These functions don't really belong in Set/IntSet - its just a case of dependencies etc. As such, I don't think the usual rules apply, because nubOrd doesn't work on Set's at all. From a usage point of view, nubInt/nubOrd is much clearer.
Usually the difference between using the Map or IntMap version of nub won't be that significant. If you are worried about the distinction being clear, you can always use qualified imports, so that you'll write Map.nubOrd vs. IntMap.nubOrd. It has become standard style to give two functions the same name if they live in different modules and they do the "same thing" except at different types. In general I think it is worthwhile to emphasize simplicity and consistency in namespace choices. In the case of Map and IntMap, I would like to see us strive for the two modules to export *exactly* the same set of symbols, except for the type names/constructors Map and IntMap. Thanks, Yitz

Yitzchak Gale
Could you please call it IntSet.nubOrd instead?
Neil Mitchell wrote:
I considered raising this, but decided nubInt is better than nubOrd. From a usage point of view, nubInt/nubOrd is much clearer.
In the case of Map and IntMap,I would like to see us strive for the two modules to export *exactly* the same set of symbols, except for the type names/constructors Map and IntMap.
There's only two possibilities that make sense to me: (1) do what I did and call the three functions in question nub, nubOrd, and nubInt, or (2) call them all nub. If we're going to be consistent with names across types, then there's no reason for nubOrd to be a special case either, I think. And calling a function nubOrd when it won't work with arbitrary Ord data just seems broken to me. Given the pain in the neck that is Haskell's management of names that are duplicated in different imported modules, and the propensity to use Data.List and Data.Set together, I prefer (1). But if folks prefer (2), or if they prefer Yitzchak's intermediate version, please let us know on-list ASAP. Bart Massey bart <at> cs.pdx.edu

There's only two possibilities that make sense to me: (1) do what I did and call the three functions in question nub, nubOrd, and nubInt, or (2) call them all nub. If we're going to be consistent with names across types, then there's no reason for nubOrd to be a special case either, I think. And calling a function nubOrd when it won't work with arbitrary Ord data just seems broken to me.
Given the pain in the neck that is Haskell's management of names that are duplicated in different imported modules, and the propensity to use Data.List and Data.Set together, I prefer (1). But if folks prefer (2), or if they prefer Yitzchak's intermediate version, please let us know on-list ASAP.
(+1) for option (1) Neil

Bart Massey wrote:
There's only two possibilities that make sense to me: (1) do what I did and call the three functions in question nub, nubOrd, and nubInt, or (2) call them all nub.
it would be possible to do both: for example, export it as both "nub" and "nubInt" from IntMap. But if that seems too silly, I vote for (1) -Isaac
participants (7)
-
Bart Massey
-
Brandon S. Allbery KF8NH
-
Christopher Lane Hinson
-
Don Stewart
-
Isaac Dupree
-
Neil Mitchell
-
Yitzchak Gale