Strange results when trying to create large Bool arrays.

My first ever Haskell program just creates an array in memory. I wanted to try creating really big arrays.

    import Data.Array.IO
    import System
    import Text.Printf

    main = do
        n <- getArgs >>= readIO . head :: IO Int
        a <- newArray (1,n) True :: IO (IOUArray Int Bool)
        printf "Created array 1 .. %8d \n" (n::Int) :: IO ()

It appears to work up to quite large numbers, but then gets strange. When I give it an array size of 1,000,000,000,000 it returns this

    Created array 1 .. -727379968

Presumably the Int has overflowed without warning when read from the argument. I guess there must be a switch to make it produce a nice error message rather than overflowing without warning.

If I replace all the 'Int's in the program with 'Integer's it works better, but sometimes it freezes when I give it a silly large number rather than returning an 'unable to allocate that much memory' error. And what really puzzles me is this behaviour in the 'Integer' version.

    1000                 created array
    1000000              created array
    1000000000           created array
    10000000000          created array  (approximately my bits of real memory)
    100000000000         created array
    1000000000000        freezes
    10000000000000       created array
    100000000000000      created array
    1000000000000000     freezes
    10000000000000000    created array

It seems to randomly claim to have successfully created huge sizes of array. So now I am not sure which of these arrays it is really creating. I would expect it to consistently freeze above my real memory size. Or preferably return a 'not enough memory' error message. I have 1 GB of real memory in my machine. GHC 6.6.1 for Windows, running on W2K.

Assuming these errors are in Haskell rather than my code (beginner's fallacy?) could I vote for having the memory, bounds, overflow etc. checking switched on by default? I like my errors to be made as visible as possible as soon as possible. Or did I do something wrong?

Richard.
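As a hedged sketch, not part of Richard's original program: one way to make the overflow visible is to read the requested size as an Integer first and refuse to convert it to Int unless it fits. The import of System.Environment instead of the old System module is an assumption for portability.

    import Data.Array.IO
    import System.Environment (getArgs)  -- assumed import; the original used "import System"
    import Text.Printf

    main :: IO ()
    main = do
        nBig <- (getArgs >>= readIO . head) :: IO Integer
        if nBig < 1 || nBig > toInteger (maxBound :: Int)
          then error ("requested size does not fit in an Int: " ++ show nBig)
          else do
            let n = fromInteger nBig :: Int
            _ <- newArray (1, n) True :: IO (IOUArray Int Bool)
            printf "Created array 1 .. %8d \n" n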

Richard Kelsall wrote:
    main = do
        n <- getArgs >>= readIO . head :: IO Int
        a <- newArray (1,n) True :: IO (IOUArray Int Bool)
        printf "Created array 1 .. %8d \n" (n::Int) :: IO ()
It appears to work up to quite large numbers, but then gets strange. When I give it an array size of 1,000,000,000,000 it returns this
Created array 1 .. -727379968
Presumably the Int has overflowed without warning when read from the argument.
Yes, that's right.
I guess there must be a switch to make it produce a nice error message rather than overflowing without warning.
Actually, there isn't. Int is a bit of an odd fish that way; it's a window onto the underlying machine's behaviour, not a tidy, well-behaved mathematical ring.

In a similar vein, I was initially perplexed when I found that an expression like this produces garbage instead of an error:

    read "11111111111111111111111111111111111" :: Int

I have not seen a lot of interest expressed in fixing this sort of misbehaviour, which jars a little with the usual emphasis on stringency and testing.
It seems to randomly claim to have successfully created huge sizes of array.
This may be outside of Haskell's control, as you're not actually touching the memory you allocate. I wouldn't be surprised if the underlying page allocation is succeeding by virtue of the OS being willing to overcommit resources that may not actually be used. (This would be normal behaviour on Linux, for example.) In such a case, the Haskell runtime may not receive an error until you try to actually touch the data. You can get GHC to fix an upper limit on the heap size it will try to use, by passing "+RTS -M768m -RTS" to your compiled program on the command line. That should cause your program to crash more reliably.

On Wed, Jul 11, 2007 at 10:55:28AM -0700, Bryan O'Sullivan wrote:
Richard Kelsall wrote:
It seems to randomly claim to have successfully created huge sizes of array.
This may be outside of Haskell's control, as you're not actually touching the memory you allocate. I wouldn't be surprised if the underlying page allocation is succeeding by virtue of the OS being willing to overcommit resources that may not actually be used. (This would be normal behaviour on Linux, for example.) In such a case, the Haskell runtime may not receive an error until you try to actually touch the data.
Not a sufficient explanation - you can only allocate addressable memory; even on Linux I can't allocate more than 3GB of storage.
You can get GHC to fix an upper limit on the heap size it will try to use, by passing "+RTS -M768m -RTS" to your compiled program on the command line. That should cause your program to crash more reliably.
That still won't work because GHC trims the array size to an Int internally, and trims the byte count to an Int again before passing it to the OS.

    stefan@stefans:~$ ghci
    Loading package base ... linking ... done.
    GHC Interactive, version 6.7.20070612, for Haskell 98.
    http://www.haskell.org/ghc/   Type :? for help.
    Prelude> :m + Data.Array.Unboxed
    Prelude Data.Array.Unboxed> array (0,maxBound :: Int) [(-2, 42::Int)] :: UArray Int Int
    array Segmentation fault
    stefan@stefans:~$

http://hackage.haskell.org/trac/ghc/ticket/229

(yes, that's a 3 digit bug number)

Stefan

Bryan O'Sullivan wrote:
Int is a bit of an odd fish that way; it's a window onto the underlying machine's behaviour, not a tidy, well-behaved mathematical ring.
I'm just being picky here: where the underlying machine's behaviour is 2's complement binary, it (Int, +, *) is actually a tidy, well-behaved mathematical ring, isomorphic to Z / 2^n Z. Furthermore, if there were overflow errors, the result would not be a ring, but an error monad over a ring or a CPO lifting a ring. (Not to say that they are not tidy, well-behaved, mathematical.)
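For what it's worth, the wrap-around is easy to observe directly; a tiny check, assuming a 2's-complement machine, which is what GHC targets in practice:

    main :: IO ()
    main = do
        print (maxBound + 1 == (minBound :: Int))  -- True: Int addition wraps around
        print ((maxBound :: Int) + maxBound)       -- -2, i.e. 2*maxBound modulo 2^wordsize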

Albert Y. C. Lai wrote:
I'm just being picky here: where the underlying machine's behaviour is 2's complement binary, it (Int, +, *) is actually a tidy, well-behaved mathematical ring, isomorphic to Z / 2^n Z.
Yes, naturally it wasn't until a few moments after I had sent the message that I noticed my error.

Bryan O'Sullivan wrote:
Richard Kelsall wrote:
I guess there must be a switch to make it produce a nice error message rather than overflowing without warning.
Actually, there isn't.
I for one sometimes wish there was... Of course, sometimes you purposely write code which you know is going to overflow and wrap round in a specific way. But frequently you *don't* want this behaviour - and I wish there were some pragma or something to make this be checked. AFAIK, most CPU types give you an efficient way of testing for such conditions...
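One cheap (if slow) way to get such a check today, without compiler support, is to compare against the exact Integer result. A minimal sketch, where checkedAdd is an illustrative name rather than any standard function:

    checkedAdd :: (Integral a, Bounded a) => a -> a -> Maybe a
    checkedAdd x y
        | z > toInteger (maxBound `asTypeOf` x) = Nothing   -- would overflow
        | z < toInteger (minBound `asTypeOf` x) = Nothing   -- would underflow
        | otherwise                             = Just (fromInteger z)
      where
        z = toInteger x + toInteger y

    -- e.g. checkedAdd (maxBound :: Int) 1 == Nothing
    --      checkedAdd (2 :: Int) 2        == Just 4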

On Wed, Jul 11, 2007 at 08:16:50PM +0100, Andrew Coppin wrote:
Bryan O'Sullivan wrote:
Richard Kelsall wrote:
I guess there must be a switch to make it produce a nice error message rather than overflowing without warning.
Actually, there isn't.
I for one sometimes wish there was...
Of course, sometimes you purposely write code which you know is going to overflow and wrap round in a specific way. But frequently you *don't* want this behaviour - and I wish there were some pragma or something to make this be checked. AFAIK, most CPU types give you an efficient way to testing for such conditions...
Indeed. I believe that Int should be removed from the Prelude. People who need the algebraic properties of rings modulo 2^(2^n) can use the sized integral types from Data.Int and Data.Word; people who want speed and can satisfy the proof obligations can use Int and Word from the same modules. Everyone else can use Integer, which should be made shorter than Int for obvious psychological reasons.

Stefan

Stefan O'Rear wrote:
On Wed, Jul 11, 2007 at 08:16:50PM +0100, Andrew Coppin wrote:
Of course, sometimes you purposely write code which you know is going to overflow and wrap round in a specific way. But frequently you *don't* want this behaviour - and I wish there were some pragma or something to make this be checked. AFAIK, most CPU types give you an efficient way to testing for such conditions...
Indeed. I beleive that Int should be removed from the Prelude. People who need the algebraic properties of rings modulo 2^(2^n) can use the sized integral types from Data.Int and Data.Word; people who want speed and can satisfy the proof obligations can use Int and Word from the same modules. Everyone else can use Integer, which should be made shorter than Int for obvious psychological reasons.
Do we really want to do that? I mean, make Haskell in general 2 orders of magnitude slower (and heaven knows how many orders of magnitude more RAM hungry) for any program using more than a handful of integers?

Personally, I'd prefer a way to just throw an exception when a numeric overflow happens. (Probably only for test purposes - so maybe a compiler flag?)

How about the floating-point types? What do they currently do?

Yes, I think we want Integer to be the type that is used unless you ask for something else.
It adheres to the principle of getting it right before optimizing.
On 7/11/07, Andrew Coppin wrote:

Stefan O'Rear wrote:

On Wed, Jul 11, 2007 at 08:16:50PM +0100, Andrew Coppin wrote:

Of course, sometimes you purposely write code which you know is going to overflow and wrap round in a specific way. But frequently you *don't* want this behaviour - and I wish there were some pragma or something to make this be checked. AFAIK, most CPU types give you an efficient way of testing for such conditions...

Indeed. I believe that Int should be removed from the Prelude. People who need the algebraic properties of rings modulo 2^(2^n) can use the sized integral types from Data.Int and Data.Word; people who want speed and can satisfy the proof obligations can use Int and Word from the same modules. Everyone else can use Integer, which should be made shorter than Int for obvious psychological reasons.
Do we really want to do that?
I mean, make Haskell in general 2 orders of magnitude slower (and heaven knows how many orders of magnitude more RAM hungry) for any program using more than a handful of integers?
Personally, I'd prefer a way to just throw an exception when a numeric overflow happens. (Probably only for test purposes - so maybe a compiler flag?)
How about the floating-point types? What do they currently do?

Lennart Augustsson wrote:
Yes, I think we want Integer to be the type that is used unless you ask for something else. It adheres to the principle of getting it right before optimizing.
What, as in the way that simple strings are lists, and you change it to something less flexible but faster if you actually need it? I suppose that makes some sense...

On 7/12/07, Andrew Coppin wrote:
What, as in the way that simple strings are lists, and you change it to something less flexible but faster if you actually need it?
I suppose that makes some sense...
Not much. There is a very significant difference between the two. The differences between Int and Integer operations are mostly constant factors. The differences between String and ByteString operations are algorithmic (consider length).

T.
--
Dr Thomas Conway  drtomc@gmail.com
Silence is the perfectest herald of joy: I were but little happy, if I could say how much.

Hello Thomas, Thursday, July 12, 2007, 3:14:57 AM, you wrote:
The differences between Int and Integer operations are mostly constant factors.
well, i will be unlucky if in my real-world program Integers would be used instead of Ints. defaulting provides a great way to solve this dilemma, so a good-for-anyone approach may be: default defaulting to Integer instead of Int, and use (Num a) instead of Int in all standard functions such as length. with jhc-like automatic specialization feature this should provide enough speed

--
Best regards,
Bulat  mailto:Bulat.Ziganshin@gmail.com
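For the defaulting half of this, Haskell 98 already has a per-module 'default' declaration; a small illustration (the standard default list is already (Integer, Double), so this only makes the choice explicit):

    default (Integer, Double)

    main :: IO ()
    main = print (2 ^ 100)   -- the ambiguous literal defaults to Integer, so no wrap-around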

Of course you should be able to specify what types you want. But it would be nice if the default was correct rather than fast.
On 7/12/07, Bulat Ziganshin wrote:
Hello Thomas,
Thursday, July 12, 2007, 3:14:57 AM, you wrote:
The differences between Int and Integer operations are mostly constant factors.
well, i will be unlucky if in my real-world program Integers would be used instead of Ints. defaulting provides a great way to solve this dilemma, so good-for-anyone approach may be: default defaulting to Integer instead of Int, and use (Num a) instead of Int in all standard functions such as length. with jhc-like automatic specialization feature this should provide enough speed
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

lennart:
Yes, I think we want Integer to be the type that is used unless you ask for something else. It adheres to the principle of getting it right before optimizing.
On a similar note, there's been a long discussion in #haskell about the problems of defaulting to floating point, and whether Rational, or something similar, should be used instead, given that Double and Float are broken for a lot of basic things (like Eq and Ord), much as we default to Integer already.

The issues raised regarding Rational were that you can unexpectedly build up large precision, and performance in general, of course. It was unknown whether there were suitable replacement types. Rational also can't be used with Floating functions, like sqrt, which would bring back Double defaulting.

But shouldn't this really work in Haskell, and if you want imprecision you must ask for Double explicitly:

    Prelude> 1.1 + 2.2 - 3.3
    4.440892098500626e-16

    Prelude> 1.1 + 2.2 - 3.3 :: Rational
    0%1

-- Don
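The Rational answer is exact simply because fractional literals desugar to fromRational applied to an exact ratio; a small self-contained illustration, with the desugaring written out using Data.Ratio:

    import Data.Ratio ((%))

    main :: IO ()
    main = do
        print (1.1 + 2.2 - 3.3 :: Double)                -- 4.440892098500626e-16
        print (1.1 + 2.2 - 3.3 :: Rational)              -- 0 % 1
        print (11 % 10 + 22 % 10 - 33 % 10 :: Rational)  -- the same sum, written as ratios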

dons@cse.unsw.edu.au (Donald Bruce Stewart) writes:
[...] there's been a long discussion in #haskell about the problems of defaulting to floating point, and should Rational, or something similar be used instead, given that Doubles and Float are broken for a lot of basic things (like Eq and Ord), much as we default to Integer already.
The issues raised regarding Rational were that you can unexpectedly build up large precision, and performance in general, of course. It was unknown whether there were suitable replacement types. Rational also can't be used with Floating functions, like sqrt, which would bring back Double defaulting.
But shouldn't this really work in Haskell, and if you want imprecision you must ask for Double explicitly:
    Prelude> 1.1 + 2.2 - 3.3
    4.440892098500626e-16

    Prelude> 1.1 + 2.2 - 3.3 :: Rational
    0%1
I'd say we do want something like that. I vaguely remember that Double wasn't called Real because Haskell could, in principle have an "exact Real" type. Now, a proper exact real type is doubtless very inefficient, but wouldn't it be possible to define something that had a fairly efficient head, and a lazy tail? So you'd have, say
data Real = R {big::(Ratio !Int !Int), small:: More_Precision}
for some exact real representation More_Precision such that R a b represents the number (a+b) (It might be better to use something shorter than Int for the Ratio so that it takes less space). For any rational arithmetic that fits in the big part, the small part would be zero (and therefore small!).

This would give exact answers for the sort of arithmetic you list above, but could still be an instance of Floating etc.

--
Jón Fairbairn  Jon.Fairbairn@cl.cam.ac.uk
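To make the intent a little more concrete, here is a very rough sketch of how addition could stay cheap when the lazy tails are zero. The names HybridReal and Tail, and the placeholder Tail type, are purely hypothetical; a real exact-real representation would replace them.

    -- 'Tail' stands in for some exact-real representation of the small part.
    data Tail = Zero | SomeExactReal              -- placeholder only

    data HybridReal = R { big :: Rational, small :: Tail }

    addTail :: Tail -> Tail -> Tail
    addTail Zero t = t
    addTail t Zero = t
    addTail _ _    = SomeExactReal                -- real exact-real addition omitted here

    addHybrid :: HybridReal -> HybridReal -> HybridReal
    addHybrid (R a s) (R b t) = R (a + b) (addTail s t)
        -- exact rational work in the head; the tail is only touched when it is nonzero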

On Thu, 12 Jul 2007, Jon Fairbairn wrote:
Now, a proper exact real type is doubtless very inefficient, but wouldn't it be possible to define something that had a fairly efficient head, and a lazy tail? So you'd have, say
data Real = R {big::(Ratio !Int !Int), small:: More_Precision}
for some exact real representation More_Precision such that R a b represents the number (a+b) (It might be better to use something shorter than Int for the Ratio so that it takes less space). For any rational arithmetic that fits in the big part, the small part would be zero (and therefore small!).
This would give exact answers for the sort of arithmetic you list above, but could still be an instance of Floating etc.
Interesting approach. Somehow similar to making Integer a sum of Int and BigInt. Indeed, I have used transcendent arithmetic on Doubles to speed up computations for real numbers. However, real numbers cannot be checked for equality. This can also be considered an advantage, because using (==) for floating point numbers is most often a bug.

I think that this hybrid type is nice and could be used by default. But it should not replace native floating point types, since they have guaranteed speed in favor of not guaranteed precision. And we need a correct implementation.

Henning Thielemann writes:
On Thu, 12 Jul 2007, Jon Fairbairn wrote:
Now, a proper exact real type is doubtless very inefficient, but wouldn't it be possible to define something that had a fairly efficient head, and a lazy tail? So you'd have, say
data Real = R {big::(Ratio !Int !Int), small:: More_Precision}
Interesting approach.
But flawed as I put it: the big part can't express big numbers! The big part needs to be either Rational (and the precision of that part limited during arithmetic) or BigFloat where
data BigFloat = BF {mantissa:: Int, exponent:: Integer}
(ie limited precision, but unbounded magnitude). If we were to use BigFloat, the base would need to be a power of ten to get the desired results for things like Don's example.
Somehow similar to making Integer a sum of Int and BigInt. Indeed, I have used transcendent arithmetic on Doubles to speedup computations for real numbers. However real numbers cannot be checked for equality. This can be also considered an advantage, because using (==) for floating point numbers is most oftenly a bug.
Agreed. That should be part of the change to the numeric hierarchy.
I think that this hybrid type is nice and could be used by default. But it should not replace native floating point types, since they have guaranteed speed in favor of not guaranteed precision. And we need a correct implementation.
Absolutely!

--
Jón Fairbairn  Jon.Fairbairn@cl.cam.ac.uk
http://www.chaos.org.uk/~jf/Stuff-I-dont-want.html (updated 2007-05-07)

On Thu, 12 Jul 2007, Jon Fairbairn wrote:
Henning Thielemann writes:

On Thu, 12 Jul 2007, Jon Fairbairn wrote:
Now, a proper exact real type is doubtless very inefficient, but wouldn't it be possible to define something that had a fairly efficient head, and a lazy tail? So you'd have, say
data Real = R {big::(Ratio !Int !Int), small:: More_Precision}
Interesting approach.
But flawed as I put it: the big part can't express big numbers! The big part needs to be either Rational (and the precision of that part limited during arithmetic) or BigFloat where
Data BigFloat = BF {mantissa:: Int, exponent:: Integer}
(ie limited precision, but unbounded magnitude). If we were to use BigFloat the base would need to be a power of ten to get the desired results for things like Don's example)
People will be confused that 'sin pi' won't lead to a result, since the correct result is zero and it will take forever to normalize that number. They will be confused that the result of 'sqrt 2 ^ 2' cannot be shown in usual decimal notation, since the formatting algorithm cannot decide between starting with 2.0000 and 1.9999. However, 'round (sqrt 2 ^ 2)' will work as expected. In short, the Real number type will lead to all the difficulties known from computable reals.

Henning Thielemann writes:
On Thu, 12 Jul 2007, Jon Fairbairn wrote:
(ie limited precision, but unbounded magnitude). If we were to use BigFloat the base would need to be a power of ten to get the desired results for things like Don's example)
People will be confused, that 'sin pi' won't lead to a result since the correct result is zero and it will need forever to normalize that number.
Surely the first few digits can be computed? I thought sin pi was a computable number, anyway. Note that in my representation I didn't specify what form the small part would take; I'm not sufficiently familiar with computing on proper reals to know the best choice for that, but as I understand it, once we reach the point of showing a number to a finite precision, we /can/ compute the necessary digits.
They will be confused, that the result of 'sqrt 2 ^ 2' cannot be shown in usual decimal notation, since the formatting algorithm cannot decide between starting with 2.0000 and 1.9999.
Again, if it's being shown to finite precision, it can look at the next digit after the last one to be shown and use that to decide what to start with. There's no reason why show should be defined to truncate rather than defined to round after the last digit, is there?
In short, the Real number type will leed to all difficulties known from computable reals.
All the real ones, anyway :-).

--
Jón Fairbairn  Jon.Fairbairn@cl.cam.ac.uk

On Friday 13 July 2007, Jon Fairbairn wrote:
Henning Thielemann writes:

On Thu, 12 Jul 2007, Jon Fairbairn wrote:
(ie limited precision, but unbounded magnitude). If we were to use BigFloat the base would need to be a power of ten to get the desired results for things like Don's example)
People will be confused, that 'sin pi' won't lead to a result since the correct result is zero and it will need forever to normalize that number.
Surely the first few digits can be computed?
That was my first thought, too. We can't define

    data Real = Real{ wholePart :: Integer, fractionPart :: [Int]}

because you can't yield e.g. sin pi as an infinite list of digits, but you can define

    data Real = Real{ exponent :: Int, mantissa :: Int -> [Int]}

where mantissa rounds the number when it's called. But unless these can be memoized fairly well, I would expect performance to be *quite* surprising to new users. . .
I thought sin pi was a computable numer, anyway. Note that in my representation I didn't specify what form the small part would take; I'm not sufficiently familiar with computing on proper reals to know the best choice for that, but as I undestand it, once we reach the point of showing a number to a finite precision, we /can/ compute the necessary digits.
They will be confused, that the result of 'sqrt 2 ^ 2' cannot be shown in usual decimal notation, since the formatting algorithm cannot decide between starting with 2.0000 and 1.9999.
Again, if it's being shown to finite precision, it can look at the next digit after the last one to be shown and use that to decide what to start with. There's no reason why show should be defined to truncate rather than defined to round after the last digit, is there?
In short, the Real number type will leed to all difficulties known from computable reals.
All the real ones, anyway :-).
Jonathan Cast http://sourceforge.net/projects/fid-core http://sourceforge.net/projects/fid-emacs

Jonathan Cast writes:

On Friday 13 July 2007, Jon Fairbairn wrote:

Henning Thielemann writes:

On Thu, 12 Jul 2007, Jon Fairbairn wrote:

Surely the first few digits can be computed?
That was my first thought, too.
We can't define
data Real = Real{ wholePart :: Integer, fractionPart :: [Int]}
because you can't yield e.g. sin pi as an infinite list of digits, but you can define
Well, no, but there are much better representations of reals than that.
data Real = Real{ exponent :: Int, mantissa :: Int -> [Int]}
That's somewhat better, but all that's required is that the infinite part should be lazy and give successively more information about the true value of the number.
where mantissa rounds the number when it's called. But unless these can be memoized fairly well, I would expect performance to be *quite* surprising to new users. . .
Hence my wish to use something with an efficient representation at the head. Here's another attempt:
data Real = R {easy:: Double, extra_exponent:: Integer, error_value:: ExactReal }
where extra_exponent is going to be zero for "ordinary size" numbers and error_value is some good exact real representation (as I said earlier in the thread, I'm not sufficiently familiar with the area to choose one. There's some discussion here http://www.dcs.ed.ac.uk/home/mhe/plume/node15.html), and the number represented is easy*base^extra_exponent+error_value.

The hope would be that for low precision arithmetic the "easy" part would be enough, though it's possible that cases where some inspection of the error_value would be necessary would turn out to be common.

--
Jón Fairbairn  Jon.Fairbairn@cl.cam.ac.uk

Donald Bruce Stewart wrote:
should Rational, or something similar be used instead, given that Doubles and Float are broken for a lot of basic things (like Eq and Ord), much as we default to Integer already.
The issues raised regarding Rational was that you can unexpectedly build up large precision, and performance in general, of course.
Well, non-broken Eq and Ord very much depend on large precision. In a sense, the instances of Eq and Ord for floating point numbers are wrong. What about rolling new classes for approximate equality and ordering?

    class ApproxEq a where
        (≈)  :: a -> a -> Bool   -- almost equal to

    class ApproxOrd a where
        (<<) :: a -> a -> Bool   -- really less than
        (>>) :: a -> a -> Bool   -- really greater than

together with a phantom epsilon

    data Eps10
    newtype Floating e = F Double

    instance ApproxEq (Floating Eps10) where
        x ≈ y = abs (x-y) < 1e-10

Regards,
apfelmus
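Here is a slightly more spelled-out, self-contained version of the same phantom-epsilon idea; Approx, HasEps, epsOf and (~=) are illustrative names, not apfelmus's actual definitions, and (~=) stands in for the ≈ above:

    class ApproxEq a where
        (~=) :: a -> a -> Bool

    data Eps10 = Eps10                 -- constructor unused; the type is only a tag for "1e-10"

    class HasEps e where
        epsOf :: Approx e -> Double    -- read the tolerance off the phantom tag

    instance HasEps Eps10 where
        epsOf _ = 1e-10

    newtype Approx e = Approx Double deriving Show

    instance HasEps e => ApproxEq (Approx e) where
        (~=) a@(Approx x) (Approx y) = abs (x - y) < epsOf a

    main :: IO ()
    main = print (Approx (1.1 + 2.2 - 3.3) ~= (Approx 0 :: Approx Eps10))  -- True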

apfelmus wrote:
In a sense, the instances of Eq and Ord for floating point numbers are wrong. What about rolling new classes for approximate equality and ordering?
class ApproxEq a where (≈) :: a -> a -> Bool -- almost equal to
The problems with this approach are generally worse than those with Eq, whose shortcomings are at least well defined and widely understood. You need to choose an epsilon of the right magnitude for the numbers you're working with, and the epsilon is likely to be domain-specific. The signedness of the numbers you're comparing is also a factor: should the behaviour be different if they have different signs? Also, since these aren't equivalence relations, ApproxEq has the weird property that a ≈ b and b ≈ c does not imply a ≈ c; ApproxOrd suffers from the same problem.

Bryan O'Sullivan wrote:
apfelmus wrote:
In a sense, the instances of Eq and Ord for floating point numbers are wrong. What about rolling new classes for approximate equality and ordering?
class ApproxEq a where (≈) :: a -> a -> Bool -- almost equal to
The problems with this approach are generally worse than those with Eq, whose shortcomings are at least well defined and widely understood.
What I wanted to do is to capture common patterns

    x - y >= epsilon
    abs (x - y) <= epsilon

for comparing floating point numbers in nice functions

    x >> y = x - y >= epsilon
    x ≈ y  = abs (x - y) <= epsilon

This way, one could simply use >> and ≈ with floating point numbers and be assured without much thinking that the resulting code is more or less robust. But I guess that there are too many variants of these patterns and that thinking is always required.
You need to choose an epsilon of the right magnitude for the numbers you're working with, and the epsilon is likely to be domain-specific.
In case the epsilon is problem specific but static, one can use phantom types.
Also, since these aren't equivalence relations, ApproxEq has the weird property that a ≈ b and b ≈ c does not imply a ≈ c; ApproxOrd suffers from the same problem.
Yes. But that's intended and the very nature of robustly comparing Doubles and Floats :(

Regards,
apfelmus

On 13 Jul 2007, at 2:58 am, apfelmus wrote:
What I wanted to do is to capture common patterns
    x - y >= epsilon
    abs (x - y) <= epsilon
for comparing floating point numbers in nice functions
    x >> y = x - y >= epsilon
    x ≈ y  = abs (x - y) <= epsilon
See Knuth, The Art of Computer Programming, Volume 2, Seminumerical Algorithms, for a suitable set of predicates and the axioms they satisfy. However, they depend on epsilon (or APL's []CT, comparison tolerance).

I once implemented the four predicates Knuth discussed in Quintus Prolog (library(fuzzy)). As far as I know, nobody ever used it. I had plans for making the code faster if anybody cared, but no-one did.

I suspect these functions would be no more useful in Haskell: anyone who knew enough to use them would know enough not to.

On 7/12/07, Stefan O'Rear wrote:
Indeed. I beleive that Int should be removed from the Prelude.
metoo. Actually, one of the really annoying things that I am finding in my code, which is a mixture of ByteString, Word16, Word32 and Word64, is that all the standard libraries use Int everywhere, so I end up having to get out the fromIntegral spray-gun, which uglifies the code and makes it much harder to read.

Is there a compelling reason (hysterical raisins is not a compelling reason) why Data.*.{length,size,take,drop,etc} use Int and not Num n => ... or similar? If the answer is efficiency, then cannot they use Int# or similar internally, and have an optimizable fromIntegral *inside* them?

T.
--
Dr Thomas Conway  drtomc@gmail.com
Silence is the perfectest herald of joy: I were but little happy, if I could say how much.

On Jul 11, 2007, at 19:11 , Thomas Conway wrote:
Is there a compelling reason (hysterical raisins is not a compelling reason) why Data.*.{length,size,take,drop,etc} use Int and not Num n => .... or similar?
Efficiency, but many of them have generic equivalents (e.g. genericLength, genericDrop, etc.), at least in Data.List.

--
brandon s. allbery  [solaris,freebsd,perl,pugs,haskell]  allbery@kf8nh.com
system administrator  [openafs,heimdal,too many hats]  allbery@ece.cmu.edu
electrical and computer engineering, carnegie mellon university  KF8NH
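A small illustration of those generic versions (genericLength and friends really are in Data.List), which avoid the fromIntegral noise Thomas describes:

    import Data.List (genericLength, genericTake)
    import Data.Word (Word64)

    main :: IO ()
    main = do
        let xs = [1 .. 10] :: [Int]
        print (genericLength xs :: Word64)    -- the length goes straight into a Word64
        print (genericLength xs :: Integer)   -- or into an Integer
        print (genericTake (3 :: Word64) xs)  -- [1,2,3]; the index type is the caller's choice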

On Wed, 2007-07-11 at 10:55 -0700, Bryan O'Sullivan wrote:
In a similar vein, I was initially perplexed when I found that an expression like this produces garbage instead of an error:
read "11111111111111111111111111111111111" :: Int
I have not seen a lot of interest expressed in fixing this sort of misbehaviour, which jars a little with the usual emphasis on stringency and testing.
I'd really like to have errors on overflow, at least as an option, even if it is costly in terms of performance. Is there a Trac ticket or something for this? -k

Ketil Malde wrote:
I'd really like to have errors on overflow, at least as an option, even if it is costly in terms of performance. Is there a Trac ticket or something for this?
Not that I know of. I filed a Trac ticket against ByteString's readInt function before I noticed that read has the same problem, and it was closed because read does the same thing. I've been reluctant to pop my head over the parapet since.

CPUs generally don't trap on integer overflow, so generating the additional tests and jumps necessary to handle this would be a bit involved, and would certainly cost a few percent in performance. There's also overflow in truncation and sign conversions to worry about, e.g. Word32 -> Word16, Word32 -> Int (on 32-bit systems), etc.

On the other hand, integer overflows have long been a popular attack vector for getting programs to misbehave in the exploit community. If you Google for "ia32 integer overflow" or "i386 integer overflow", the first several *pages* of results in each case consist entirely of security advisories. Haskell's FFI makes it as vulnerable as the libraries it interfaces to.

Here's a cute-looking paper entitled "Efficient and accurate detection of integer-based attacks":
http://www.cs.cmu.edu/~dbrumley/pubs/integer-ndss-07.pdf
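For the truncation and sign-conversion cases, the same Integer detour works as a stopgap; a hedged sketch, where checkedConvert is an illustrative name and not a library function:

    import Data.Word (Word16, Word32)

    checkedConvert :: (Integral a, Integral b, Bounded b) => a -> Maybe b
    checkedConvert x
        | i < toInteger (minBound `asTypeOf` r) = Nothing   -- too small for the target type
        | i > toInteger (maxBound `asTypeOf` r) = Nothing   -- too large for the target type
        | otherwise                             = Just r
      where
        i = toInteger x
        r = fromInteger i

    main :: IO ()
    main = do
        print (checkedConvert (70000 :: Word32) :: Maybe Word16)  -- Nothing: would truncate
        print (checkedConvert (40000 :: Word32) :: Maybe Word16)  -- Just 40000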

I'd really like to have errors on overflow, at least as an option, even if it is costly in terms of performance. Is there a Trac ticket or something for this?
as far as safety aspects are concerned, you might consider adding yourself to http://hackage.haskell.org/trac/ghc/ticket/1380 ? although i understand that you also want checking for overflow in otherwise unrestricted programs, which would be a separate issue/ticket.
Haskell's FFI makes it as vulnerable as the libraries it interfaces to.
if you have anything beyond unsafe foreign libs, or invalid indexing in mind, could you add it to that ticket, please? and perhaps to http://haskell.org/haskellwiki/Safely_running_untrusted_Haskell_code which concerns lessons learned about guarding lambdabot against exploits? lambdabot has been boldly going where no one has gone before for quite a while now, but that doesn't ensure that all holes have been plugged. claus
participants (17)

- Albert Y. C. Lai
- Andrew Coppin
- apfelmus
- Brandon S. Allbery KF8NH
- Bryan O'Sullivan
- Bulat Ziganshin
- Claus Reinke
- dons@cse.unsw.edu.au
- Henning Thielemann
- Jon Fairbairn
- Jonathan Cast
- Ketil Malde
- Lennart Augustsson
- ok
- Richard Kelsall
- Stefan O'Rear
- Thomas Conway