abs minBound < (0 :: Int) && negate minBound == (minBound :: Int)

The docs at http://www.haskell.org/ghc/docs/latest/html/libraries/base/Prelude.html#v:gc... give a NB mentioning that (abs minBound == minBound) is possible for fixed-width types. This holds, for example, at Int. It is also the case that (negate minBound == minBound).

Two questions:

1) This behavior surprised me. Does it surprise enough people to include a warning in the Haddock for abs and negate? I.e. here: http://www.haskell.org/ghc/docs/latest/html/libraries/base/Prelude.html#t:Nu...

2) Is this a common behavior in other languages? My tinkering with gcc suggests it does not support the value -2^63, but instead bottoms out at (-2^63+1).

Thanks.
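For the record, here is the behaviour in question as seen in GHCi on a 64-bit machine (two's-complement Int):

Prelude> abs (minBound :: Int)
-9223372036854775808
Prelude> negate (minBound :: Int)
-9223372036854775808
Prelude> abs minBound < (0 :: Int)
True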

On 19/08/2013, at 3:38 AM, Nicolas Frisby wrote:
The docs at
http://www.haskell.org/ghc/docs/latest/html/libraries/base/Prelude.html#v:gc...
give a NB mentioning that (abs minBound == minBound) is possible for fixed-width types.
At least three ways to represent negative integers in binary have been used:

- twos-complement (asymmetric, -INT_MIN == INT_MIN, abs can have a negative result)
- ones-complement (symmetric, -INT_MIN == INT_MAX, abs is always non-negative)
- sign-and-magnitude (symmetric, -INT_MIN == INT_MAX, abs is always non-negative)

Having used a B6700 as an undergraduate, I still think sign-and-magnitude is the only really safe-and-simple scheme. However, twos-complement has conquered. The argument for twos-complement, which always puzzled me, is that the other systems have two ways to represent zero. I never found this to be a problem, not even for bitwise operations, on the B6700. I *did* find "abs x < 0" succeeding to be a pain in the posterior. (The B6700 had two different tests: 'are these two numbers equal' and 'are these two bit patterns equal'.)
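In two's complement, negation is complement-plus-one, which is exactly why minBound maps to itself: flipping the bits of minBound gives maxBound, and adding one wraps back to minBound. This is easily checked in GHCi:

Prelude> import Data.Bits
Prelude Data.Bits> let neg2c x = complement x + 1 :: Int
Prelude Data.Bits> neg2c 5 == (-5)
True
Prelude Data.Bits> neg2c minBound == minBound
True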
Two questions:
1) This behavior surprised me. Does it surprise enough people to include a warning in the Haddock for abs and negate?
We cannot expect everyone who uses Haskell to be familiar with the eccentricities of popular hardware. I think it's worth mentioning.
2) Is this a common behavior in other languages?
Yes.
My tinkering with gcc suggests it does not support the value -2^63, but instead bottoms out at (-2^63+1).
Your tinkering was insufficient.
f% cat 2c.c
#include <stdio.h>
int main(void) { long long x = -9223372036854775807LL - 1;   /* -2^63; the literal must be built this way */
                 printf("%lld\n", x); return 0; }
Thanks.

On Sun, Aug 18, 2013 at 8:04 PM, Richard A. O'Keefe wrote:
The argument for twos-complement, which always puzzled me, is that the other systems have two ways to represent zero. I never found this to be a problem, not even for bitwise operations, on the B6700. I *did* find "abs x < 0" succeeding to be a pain in the posterior. (The B6700 had two different tests: 'are these two numbers equal' and 'are these two bit patterns equal'.)
I think a better argument for twos complement is that you're just doing all of your computations modulo 2^n (where n is 32 or 64 or whatever), and addition and multiplication work as expected modulo anything. The idea of negative and positive in a system like this is artificial, though. The standard convention follows from the observation that 0-1 should be -1, and if you use the hardware that is for positive numbers, and assume that you can always carry another 1 for the highest order bit, then 0-1 is 1111...111. It's not that 2^n - 1 is actually -1, but that it acts like a -1 would act. It's only by convention that numbers which begin with a 1 are considered to be negative (that, and Intel has special hardware for detecting if a number has its leading 1 set).

However, we could adopt the convention that you need two leading ones to make a negative number, and everything would work out, except for the inequality testing built into the CPU. This would give us a lot more positive numbers, and then abs wouldn't have this problem :-)

Or, three other options:
1) make MIN_INT outside the domain of abs,
2) make the range of abs be some unsigned int type, or
3) use Integer (i.e., use a type which actually represents integers rather than a type which can only handle small integers).

Kyle
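Option 2 above, for what it's worth, is only a few lines (a sketch; absW is a made-up name, and the key point is that unsigned negation is itself modulo 2^n, so even minBound maps to its true magnitude):

import Data.Int  (Int64)
import Data.Word (Word64)

-- Absolute value with an unsigned range, so abs minBound is representable.
absW :: Int64 -> Word64
absW n
  | n < 0     = negate (fromIntegral n)  -- Word64 negation is modulo 2^64
  | otherwise = fromIntegral n

With this, absW minBound is 9223372036854775808, i.e. 2^63, with no wrap-around.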

On Mon, Aug 19, 2013 at 11:43 AM, Kyle Miller wrote:
Or, three other options: 1) make MIN_INT outside the domain of abs, 2) make the range of abs be some unsigned int type, or 3) use Integer (i.e., use a type which actually represents integers rather than a type which can only handle small integers).
I think I would just document that Int is intended to represent a machine word and therefore inherits the behavior of machine words: behavior at its extrema is subject to the CPU as a result, and if consistent behavior is required then Integer should be used. (Indeed, it should probably note that Int is a performance hack; but then you run into all the Data.List etc. functions that use it.)

-- brandon s allbery kf8nh
sine nomine associates
allbery.b@gmail.com ballbery@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad
http://sinenomine.net

On 20/08/2013, at 3:43 AM, Kyle Miller wrote:
On Sun, Aug 18, 2013 at 8:04 PM, Richard A. O'Keefe wrote:

The argument for twos-complement, which always puzzled me, is that the other systems have two ways to represent zero. I never found this to be a problem, not even for bitwise operations, on the B6700. I *did* find "abs x < 0" succeeding to be a pain in the posterior. (The B6700 had two different tests: 'are these two numbers equal' and 'are these two bit patterns equal'.)

I think a better argument for twos complement is that you're just doing all of your computations modulo 2^n (where n is 32 or 64 or whatever), and addition and multiplication work as expected modulo anything.
To me, that's not a better argument. It isn't even a _good_ argument. It amounts to saying "if you do things wrong, you can justify it by saying you're really doing something else right, and it's the programmer's fault for wanting the wrong thing."

One great thing about the B6700 was that you were guaranteed EITHER the mathematically correct answer in the numbers you were thinking in terms of OR an exception telling you the machine couldn't do what you wanted. When it comes to *applications* programming, the number of times I have *wanted* arithmetic modulo 2^n (in the last 40 years) can be counted on the fingers of one ear.

You may call it "multiplication work[ing] as expected" when the product of two positive numbers comes out negative; I call it a wrong answer.

Prelude> let tens = 1 : map (*10) tens :: [Int]
Prelude> take 19 tens
[1,10,100,1000,10000,100000,1000000,10000000,100000000,1000000000,10000000000,100000000000,1000000000000,10000000000000,100000000000000,1000000000000000,10000000000000000,100000000000000000,1000000000000000000]
Prelude> [x * x | x <- take 19 tens]
[1,100,10000,1000000,100000000,10000000000,1000000000000,100000000000000,10000000000000000,1000000000000000000,7766279631452241920,1864712049423024128,2003764205206896640,-2537764290115403776,4477988020393345024,5076944270305263616,-8814407033341083648,4003012203950112768,-5527149226598858752]

Yes, I know that Haskell has Integer. If I want to do more arithmetic than a bit of simple counting, I like to use it. The gibberish that Int multiplication spewed out above is why.

Roughly speaking, there are three ways to handle integer arithmetic: the Lisp way, the Ada way, and the Java way. Lisp just gets it right (think Haskell's "Integer" type). Java *defines* wrong answers to be "right". Ada recognises that sometimes you want modular arithmetic (so it offers you modular types) and sometimes you don't (so it offers you bounded but non-modular types, where overflow is trapped). Even the C standard is careful to allow signed arithmetic to trap overflows.

When we tell our students that there is an integer N > 0 such that N+1 < 0, first they are shocked and confused, then they forget about it, and then they are caught by it, and at last they remember it, but aren't sure what they can _do_ about it in Java.
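For contrast, the same computation at Integer is exact at every step; each square really is the power of ten it should be:

Prelude> let tens = 1 : map (*10) tens :: [Integer]
Prelude> and [x * x == 10 ^ (2 * k) | (x, k) <- zip (take 19 tens) [0 ..]]
True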

By "working as expected" I actually just meant that they distribute (as in
a(b+c)=ab+ac) and commute (ab=ba and a+b=b+a), and those are true in all
cases with two's compliment using simple adder and multiplier hardware*.
The interpretation of any of these numbers being positive or negative is
specious, but people still do it because it works reasonably well for small
integers. Also, there's the definite advantage that you can use the same
instructions for adding/multiplying signed and unsigned integers (for
instance, pointer arithmetic).
You mention that the B6700 trapped on overflows. While this is a nice
feature, this has nothing to do with the number format. The Intel hardware
could have come with overflow trapping, yet it didn't...
One example of a nice thing about doing computations modulo 2^n is that you
can do a bit twiddling trick called reciprocal multiplication (maybe
sometimes called magic number multiplication). One reference for this is
at [1]. Another reference is Hacker's Delight. But maybe you can save
this for your ear's fingers.
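For a flavour of the trick, here is a sketch of reciprocal multiplication for unsigned division by 10 on Word32. The constant 0xCCCCCCCD is ceil(2^35 / 10), the standard magic number compilers emit for this divisor; the multiply is done in 64 bits and then shifted:

import Data.Bits (shiftR)
import Data.Word (Word32, Word64)

-- Replace division by 10 with a multiply and a shift; correct for every
-- Word32 input precisely because the arithmetic is modulo 2^64.
div10 :: Word32 -> Word32
div10 x = fromIntegral ((fromIntegral x * 0xCCCCCCCD :: Word64) `shiftR` 35)

For example, div10 12345 gives 1234, matching 12345 `div` 10.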
I can't really say I understand why anyone would actually want to use Int
(unless they knew they wanted a modulo 2^n Int). I think it's a bit of a
shame it was given such a simple name instead of something like FastInt or
some such (and in any case it's probably safer to use Int32 or Int64 since
Int only guarantees representing signed numbers up to about 2^29 (!)).
I don't really know how well it optimizes, but Integer is actually (if
you're using GMP with your ghc):
data Integer
  = S# Int#             -- small integers
  | J# Int# ByteArray#  -- large integers
That Int# is the same as in "data Int = I# Int#", so you're not really
saving anything, other than skipping bounds checking, by using Int instead
of Integer.
Where are you getting that about C? Do you mean that it's careful to allow
implementations to decide to trap overflows? Because as far as I can tell,
the C standard just says signed overflows give undefined behavior.
Exercise for the reader: Make a new integer type called SafeInt... scratch
that. The name was too good not to be taken, and sure enough, there's [2].
Although I was going to propose defining "data SafeInt = Overflow | SI
Int64" and making an instance Num SafeInt which propagates Overflow (rather
than using exceptions like in [2]).
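A minimal sketch of that proposal (the bounds checks via Integer and the choice of instance methods are my guesses at the details):

import Data.Int (Int64)

data SafeInt = Overflow | SI Int64
  deriving (Eq, Show)

-- Check an exact Integer result against the Int64 range.
clamp :: Integer -> SafeInt
clamp r
  | r < toInteger (minBound :: Int64) || r > toInteger (maxBound :: Int64) = Overflow
  | otherwise = SI (fromInteger r)

-- Lift a binary operation through Integer; Overflow propagates.
lift2 :: (Integer -> Integer -> Integer) -> SafeInt -> SafeInt -> SafeInt
lift2 op (SI a) (SI b) = clamp (toInteger a `op` toInteger b)
lift2 _  _      _      = Overflow

instance Num SafeInt where
  (+) = lift2 (+)
  (-) = lift2 (-)
  (*) = lift2 (*)
  negate (SI a)   = clamp (negate (toInteger a))
  negate Overflow = Overflow
  abs (SI a)      = clamp (abs (toInteger a))  -- abs (SI minBound) is Overflow, not negative
  abs Overflow    = Overflow
  signum (SI a)   = SI (signum a)
  signum Overflow = Overflow
  fromInteger     = clamp

With this, abs (SI minBound) comes out as Overflow rather than a negative number, and Overflow propagates through any further arithmetic.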
[1] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html
[2] http://hackage.haskell.org/package/safeint
Kyle
* It's unclear to me how important simple adder and multiplier hardware is
these days. But, at least, two's complement doesn't require defining
case-by-case behavior, as I'd expect sign-and-magnitude operations to
need.

On 20/08/2013, at 6:44 PM, Kyle Miller wrote:
By "working as expected" I actually just meant that they distribute (as in a(b+c)=ab+ac) and commute (ab=ba and a+b=b+a),
That is a tiny fraction of "working as expected". The whole "modular arithmetic" argument would come close to having some virtue, *except* that division just plain does not fit. In particular, it's painfully easy to find x, y such that (x*y) div y is not congruent to x modulo 2^n. The existence of division makes the "everything's OK because it's modular" argument look very sick indeed.
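A quick GHCi illustration (assuming 64-bit Int): x*y overflows, and dividing back by y lands nowhere near x:

Prelude> let x = 10^10 :: Int; y = 10^9 :: Int
Prelude> (x * y) `div` y
-8446744074
Prelude> x
10000000000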
The interpretation of any of these numbers being positive or negative is specious, but people still do it because it works reasonably well for small integers.
No, they do it because their programming language specifications encourage them to do it, because their introductory courses teach them to do it, and their compilers, on seeing x < 0, do not scream at them "that is not well defined!", so the idea that it is supposed to work is constantly reinforced. And above all, "people still do it because" in most programming languages they have no practical alternative.
Also, there's the definite advantage that you can use the same instructions for adding/multiplying signed and unsigned integers (for instance, pointer arithmetic).
As the user of a programming language, what conceivable advantage is that to me? All the machines I have access to these days have two sets of multiply and divide instructions, and there have been machines with two sets of add and subtract instructions, and it is no big deal.

The B6700 had one set of instructions (which actually dealt with both integers and floating point). It didn't have _any_ instructions for unsigned integer arithmetic, and Fortran, COBOL, BASIC, Pascal, Algol, and PL/I programmers didn't miss them. (In particular, to a B6700 programmer familiar with his/her machine's instruction set, the idea that variable access might have anything to do with unsigned operations would have seemed, heck, did seem quite bizarre.)

For that matter, IBM mainframes have had, since the 1960s,

  A    signed Add
  AL   unsigned Add Logical
  S    signed Subtract
  SL   unsigned Subtract Logical
  M    signed Multiply           \
  ML   unsigned Multiply Logical  | these four
  D    signed Divide              | are common
  DL   unsigned Divide Logical   /

and it never stopped them being fast. I doubt that the presence of these instructions had any significant effect on the complexity of the machines. Even some 1980s single-chip machines did this.
You mention that the B6700 trapped on overflows. While this is a nice feature, this has nothing to do with the number format.
That's half true. The B6700 number format was such that there was nothing sensible they _could_ do on integer overflow. (Look it up or trust me; truncation would have been violently unnatural on those lovely machines.)
One example of a nice thing about doing computations modulo 2^n is that you can do a bit twiddling trick called reciprocal multiplication (maybe sometimes called magic number multiplication). One reference for this is at [1]. Another reference is Hacker's Delight. But maybe you can save this for your ear's fingers.
I have a copy of Hacker's Delight within arm's reach. This is not really something that people writing applications want. Usually, what they need is affordable multiplication and division that give right answers when right answers are to be had and don't drive the program insane with rubbish when right answers are not to be had. There are plenty of clever things computer architects can do, up to and including keeping a "last divisor" cache in the division unit to accelerate divisions that reuse a recent divisor.
I can't really say I understand why anyone would actually want to use Int (unless they knew they wanted a modulo 2^n Int).
Because they are calling an existing function that requires it. It's just like the question "the only integral type in standard C that *cannot* be used safely to hold an 8-bit character is char, so why would anyone want to use char*?" Answer: "because of all the library functions that demand it."
but Integer is actually (if you're using GMP with your ghc):
Yes, that's tolerably well known. You only pay the space overhead when you need it (like Lisp or Smalltalk). But you always pay the time overhead.
Where are you getting that about C? Do you mean that it's careful to allow implementations to decide to trap overflows? Because as far as I can tell, the C standard just says signed overflows give undefined behavior.
The thing is that the standardisers *could* have defined signed int arithmetic to wrap (just like the benighted Java designers did) but they *chose* to leave the effect undefined (just like the Pascal standard) *so that* implementations that trap on overflow (which did exist) would be allowed (again, just like the Pascal standard). I used to spend way too much time reading comp.std.c. Generally, when the C standard leaves something undefined, that's because there were already systems that did things too differently to capture in one tidy definition.
* It's unclear to me how important simple adder and multiplier hardware is these days. But, at least two's complement doesn't require defining case-by-case behavior as I'd expect a sign-and-magnitude operations to be defined.
No. Multiplication and division get easier, comparison at worst requires one extra exclusive-or, and addition and subtraction require some cunning but are doable. For what it's worth, the descendant of Burroughs sells a descendant of the B6700 that is actually emulated on twos-complement hardware, and still runs fast enough (given its typically I/O bound applications) to survive in today's market. (Why twos-complement hardware? Because it's mass produced cheap commodity hardware.)

On Tue, Aug 20, 2013 at 6:37 AM, Richard A. O'Keefe wrote:
On 20/08/2013, at 3:43 AM, Kyle Miller wrote:
On Sun, Aug 18, 2013 at 8:04 PM, Richard A. O'Keefe wrote:

The argument for twos-complement, which always puzzled me, is that the other systems have two ways to represent zero. I never found this to be a problem, not even for bitwise operations, on the B6700. I *did* find "abs x < 0" succeeding to be a pain in the posterior. (The B6700 had two different tests: 'are these two numbers equal' and 'are these two bit patterns equal'.)

I think a better argument for twos complement is that you're just doing all of your computations modulo 2^n (where n is 32 or 64 or whatever), and addition and multiplication work as expected modulo anything.
To me, that's not a better argument.
These kinds of argument usually come down to a question of framing -- whether pragmatic, philosophical or pedagogical. Let me start with the philosophical.
It isn't even a _good_ argument. It amounts to saying "if you do things wrong, you can justify it by saying you're really doing something else right, and it's the programmer's fault for wanting the wrong thing."
This argument works if 'doing something right' is an available option. What if it's not?
One great thing about the B6700 was that you were guaranteed EITHER the mathematically correct answer in the numbers you were thinking in terms of OR an exception telling you the machine couldn't do what you wanted. When it comes to *applications* programming, the number of times I have *wanted* arithmetic modulo 2^n (in the last 40 years) can be counted on the fingers of one ear.
You may call it "multiplication work[ing] as expected" when the product of two positive numbers comes out negative; I call it a wrong answer.
Yes, I know that Haskell has Integer. If I want to do more arithmetic than a bit of simple counting, I like to use it. The gibberish that Int multiplication spewed out above is why.
Roughly speaking, there are three ways to handle integer arithmetic: the Lisp way, the Ada way, and the Java way. Lisp just gets it right (think Haskell's "Integer" type). Java *defines* wrong answers to be "right". Ada recognises that sometimes you want modular arithmetic (so it offers you modular types) and sometimes you don't (so it offers you bounded but non-modular types, where overflow is trapped).
This issue is really a specific instance of the question: are computers finite or infinite?

If one says finite, then the infinite-taped Turing machine has nothing to do with computers. If one says infinite, then the abstraction we are talking of is unrelated to the boxes on our desks/palmtops. One has to recognise that in juggling between these two views -- dare I say a central project for a programmer?? -- we need to stuff an infinite conceptual object into a finite actual one. And in doing so some corners will be cut willy-nilly.

So to say Lisp is 'right' because arithmetic does not overflow at machine word size limits misses the fact that it overflows more unpredictably when the machine memory fills out. Let's look at good ol' factorial:

fact 0 = 1
fact n = n * fact (n-1)

Now I ran it as fact 1000000 with signature Int -> Int and with Integer -> Integer. In the first case I got 0 in about 3 seconds (unsurprising: 1000000! contains far more than 64 factors of 2, so modulo 2^64 it is exactly 0). In the second... I thought I'd see what happens, but after about 2 minutes of the CPU fans maxing out, firefox started giving me alarms about an 'unresponsive script'; I felt my machine had had enough punishment and gave in with C-c!

And if that sounds like an unreal argument, consider representing and storing Graham's number.

Of course I am arguing philosophically not pragmatically.
Philosophically: Graham's number is 'just' a finite number, though a rather obese one.
Pragmatically: 32 bits is unwise for a bank balance; 64 should be a bit more safe.

So coming to the pragmatic, and to lisp... I remember a story (maybe apocryphal) about a robot in the MIT(?) lab that did a lot of clever things and then tumbled down the stairs. When asked, the concerned researcher/academic shrugged it off: "It was garbage collecting". If the robot had been programmed in C, its real-time behavior would have been sturdier, though its integer overflow properties would have been flimsier.

More pragmatically, it's good to remember Whorf's finding that wrong signboards/terminology can actually cause buildings to catch fire. So yes, Haskell's Int should have been called FastInt or Int29 or somethin'. Many such unfortunate nomenclatures... Just yesterday, been struggling with teaching the fact that 'data' is not data but type, 'type' is not type but type-synonym, etc etc. Yeah, that's the third framing -- pedagogy...

And I better stop now with a... tl;dr: Abstractions leaking is a center-piece of our story. We can look the other way -- and be elegant theoreticians. Or we can be good engineers and try to plug the leaks -- only to have the leak explode ever more noisily and messily.

fact 0 = 1
fact n = n * fact (n-1)
Now I ran it as fact 1000000 with signature Int -> Int and with Integer -> Integer
In the first case I got 0 in about 3 seconds [...] And if that sounds like an unreal argument, consider representing and storing Graham's number.
So, since computers are finite anyway, we can just arbitrarily (well, almost) redefine large constants, and set all factorials above some threshold to zero? Perhaps we should also set pi=3, that would simplify lots of things :-)
Pragmatically: 32-bits is unwise for a bank-balance, 64 should be a bit more safe
Please start a bank using modulo arithmetic, I'm looking forward to overdrafting my account!
So yes, Haskell's Int, should have been called FastInt or Int29 or somethin'
On a more serious note, I accept that Int (and other limited precision numbers) is a fact of life, and sometimes useful for performance reasons. I would have liked, however, to have a compiler option or some other way to make my programs throw an exception on overflow - even if this turned out to be slower, I could at least use it when testing my programs, which would have caught a few bugs.

-k
-- If I haven't seen further, it is by standing in the footprints of giants

On Wed, Aug 21, 2013 at 11:47 AM, Ketil Malde wrote:
On a more serious note, I accept that Int (and other limited precision numbers) is a fact of life, and sometimes useful for performance reasons.
I would have liked, however, to have a compiler option or some other way to make my programs throw an exception on overflow - even if this turned out to be slower, I could at least use it when testing my programs, which would have caught a few bugs.
Yes, for software detection; some will want it and some will not. See the same discussion a few months ago: http://www.haskell.org/pipermail/haskell-cafe/2013-June/107153.html

The 'some other way' is hardware detection. And there is a glimmer of hope that Intel may begin to do due diligence: http://www.emulators.com/docs/LazyOverflowDetect_Final.pdf
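In the meantime, software detection is easy enough to sketch in library code (addChecked is a hypothetical helper, not an existing compiler flag or API):

-- Addition that fails loudly instead of wrapping: compute the exact
-- result at Integer and compare it with the wrapped Int result.
addChecked :: Int -> Int -> Int
addChecked x y
  | toInteger r == toInteger x + toInteger y = r
  | otherwise = error "Int overflow in addChecked"
  where r = x + y

So addChecked maxBound 1 raises an exception instead of quietly returning minBound.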

Richard A. O'Keefe wrote:
I think a better argument for twos complement is that you're just doing all of your computations modulo 2^n (where n is 32 or 64 or whatever), and addition and multiplication work as expected modulo anything.
To me, that's not a better argument. It isn't even a _good_ argument. It amounts to saying "if you do things wrong, you can justify it by saying you're really doing something else right, and it's the programmer's fault for wanting the wrong thing."
Not only that, but in Haskell you don't really know 'n': the standard only guarantees that Int covers at least the range -2^29 .. 2^29-1, i.e. n is at least 30. Which basically means that any code that relies on this behaviour without rigorously checking it is wrong.

-k
-- If I haven't seen further, it is by standing in the footprints of giants
participants (7)
- Brandon Allbery
- Evan Laforge
- Ketil Malde
- Kyle Miller
- Nicolas Frisby
- Richard A. O'Keefe
- Rustom Mody