Different answers on different machines

Dear HCafe-ers, Yesterday I decided to take a look at the most recent Euler problem, number 249, and give it a shot. I have a couple of computers at home, a Dell laptop and a desktop. I compiled this message with ghc -O2 --make ex429.lhs and ran it on each machine. On the Dell I get: time ./ex429 [650,16900,547924,27396200,746640991,773879749,683631060] [650,16900,547924,27396200,746640991,773879749,683631060] 136342232 ./ex429 8.66s user 0.02s system 99% cpu 8.695 total When I run this exact same file on the desktop, I get: time ./ex429 [650,16900,547924,27396200,746640991,773879749,683631060] [650,16900,547924,27396200,746640991,773879749,683631060] 98792821 ./ex429 6.50s user 0.03s system 99% cpu 6.537 total Which happens to be the right answer. But WHY is the output from the Dell different? Machine info is at the bottom of this message.
{-# LANGUAGE BangPatterns #-} {-# LANGUAGE FlexibleContexts #-} {-# OPTIONS -O2 -optc-O #-} {- Sum of squares of unitary divisors Problem 429 A unitary divisor d of a number n is a divisor of n that has the property gcd(d, n/d) = 1. The unitary divisors of 4! = 24 are 1, 3, 8 and 24. The sum of their squares is 12 + 32 + 82 + 242 = 650.
Let S(n) represent the sum of the squares of the unitary divisors of n. Thus S(4!)=650.
Find S(100 000 000!) modulo 1 000 000 009.
-}
import Control.Monad.ST import Data.Array.ST import Data.Array.IArray as I import Data.Array.Unboxed as U import Data.Word import Data.Ord import Data.List
allfactors n = [ i | i <- [1..n] , n `mod` i == 0]
factorial n = product [1..n] ud0 n = let nf = factorial n a = allfactors nf b = filter (\x -> gcd x (nf `div` x) == 1) a in b
ud1 = sum . map (\x -> x*x) . ud0 ansSlow n = ud1 n `mod` (fromIntegral modulus)
largestExponentInFactorial n p = let a = [ n `div` (p^i) | i <- [1..] ] b = takeWhile (>0) a in sum b
modProduct :: [Int] -> Int modProduct = foldl' (\a b -> times a b modulus) 1
pA n = primesA (fromIntegral n)
primesN :: Int -> [Int] primesN n = map fromIntegral $ primeS (pA n)
times :: Int -> Int -> Int -> Int times x y n = let x1 = fromIntegral x :: Integer y1 = fromIntegral y :: Integer n1 = fromIntegral n :: Integer result = fromIntegral $! x1 * y1 `mod` n1 in result
fastPower :: Int -> Int -> Int -> Int fastPower x 0 modulus = 1 fastPower x 1 modulus = x `mod` modulus fastPower x n modulus | even n = fastPower (times x x modulus) (n `div` 2) modulus | otherwise = (times x (fastPower x (n-1) modulus)) modulus
foldFun :: Int -> Int -> Int foldFun n p = let a = largestExponentInFactorial n p b = fastPower p a modulus c = times b b modulus + 1 in c
ff :: Int -> [Int] -> Int ff n = foldl' (\a p -> times a (foldFun n p) modulus) 1
ans n = let ps = primeS $ primesA n -- ps = takeWhile (<= n) primes in ff n ps
modulus = 1000000009 :: Int main = do print $ map ans [4..10] print $ map ansSlow [4..10] print $ ans 100000000
{-
intended Usage:
pA = primesA (10^9) primes = primeS pA isPrime = isPrimE pA
-}
sieve :: STUArray s Int Bool -> Int -> Int -> ST s (STUArray s Int Bool) sieve !a !m !n | n == m = return a | otherwise = do e <- readArray a (fromIntegral n) if e then let loop !j | j <= m = writeArray a (fromIntegral j) False >> loop (j+n) | otherwise = sieve a m (n+1) in loop (n+n) else sieve a m (n+1)
primesA :: Int -> UArray Int Bool primesA sizeN = runSTUArray (do a <- newArray (0,sizeN) True :: ST s (STUArray s Int Bool) writeArray a 0 False writeArray a 1 False sieve a sizeN 2)
primeS :: (IArray a1 Bool, Ix a) => a1 a Bool -> [a] primeS primeArray = map fst $ filter (\x -> snd x) (assocs primeArray)
isPrimE :: (IArray a e, Ix i) => a i e -> i -> e isPrimE primeArray n = primeArray I.! n
AMD-64 Desktop uname -a Linux myth 3.2.0-4-amd64 #1 SMP Debian 3.2.41-2+deb7u2 x86_64 GNU/Linux ghc --version The Glorious Glasgow Haskell Compilation System, version 7.6.2 hwinfo --cpu 01: None 00.0: 10103 CPU [Created at cpu.304] Unique ID: rdCR.j8NaKXDZtZ6 Hardware Class: cpu Arch: X86-64 Vendor: "AuthenticAMD" Model: 21.1.2 "AMD FX(tm)-4100 Quad-Core Processor " Features: fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,mca,cmov,pat,pse36,clflush,mmx,fxsr,sse,sse2,ht,syscall,nx,mmxext,fxsr_opt,pdpe1gb,rdtscp,lm,constant_tsc,rep_good,nopl,nonstop_tsc,extd_apicid,aperfmperf,pni,pclmulqdq,monitor,ssse3,cx16,sse4_1,sse4_2, Clock: 1400 MHz BogoMips: 7248.25 Cache: 2048 kb Units/Processor: 4 Config Status: cfg=new, avail=yes, need=no, active=unknown 02: None 01.0: 10103 CPU [Created at cpu.304] Unique ID: wkFv.j8NaKXDZtZ6 Hardware Class: cpu Arch: X86-64 Vendor: "AuthenticAMD" Model: 21.1.2 "AMD FX(tm)-4100 Quad-Core Processor " Features: fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,mca,cmov,pat,pse36,clflush,mmx,fxsr,sse,sse2,ht,syscall,nx,mmxext,fxsr_opt,pdpe1gb,rdtscp,lm,constant_tsc,rep_good,nopl,nonstop_tsc,extd_apicid,aperfmperf,pni,pclmulqdq,monitor,ssse3,cx16,sse4_1,sse4_2, Clock: 3600 MHz BogoMips: 9201.22 Cache: 2048 kb Units/Processor: 4 Config Status: cfg=new, avail=yes, need=no, active=unknown 03: None 02.0: 10103 CPU [Created at cpu.304] Unique ID: +rIN.j8NaKXDZtZ6 Hardware Class: cpu Arch: X86-64 Vendor: "AuthenticAMD" Model: 21.1.2 "AMD FX(tm)-4100 Quad-Core Processor " Features: fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,mca,cmov,pat,pse36,clflush,mmx,fxsr,sse,sse2,ht,syscall,nx,mmxext,fxsr_opt,pdpe1gb,rdtscp,lm,constant_tsc,rep_good,nopl,nonstop_tsc,extd_apicid,aperfmperf,pni,pclmulqdq,monitor,ssse3,cx16,sse4_1,sse4_2, Clock: 1400 MHz BogoMips: 7253.01 Cache: 2048 kb Units/Processor: 4 Config Status: cfg=new, avail=yes, need=no, active=unknown 04: None 03.0: 10103 CPU [Created at cpu.304] Unique ID: 4zLr.j8NaKXDZtZ6 Hardware Class: cpu Arch: X86-64 Vendor: "AuthenticAMD" Model: 21.1.2 "AMD FX(tm)-4100 Quad-Core Processor " Features: fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,mca,cmov,pat,pse36,clflush,mmx,fxsr,sse,sse2,ht,syscall,nx,mmxext,fxsr_opt,pdpe1gb,rdtscp,lm,constant_tsc,rep_good,nopl,nonstop_tsc,extd_apicid,aperfmperf,pni,pclmulqdq,monitor,ssse3,cx16,sse4_1,sse4_2, Clock: 1400 MHz BogoMips: 6931.30 Cache: 2048 kb Units/Processor: 4 Config Status: cfg=new, avail=yes, need=no, active=unknown ------------------------------------------------------------------------ Dell Laptop uname -a Linux dell 3.2.0-4-amd64 #1 SMP Debian 3.2.41-2 x86_64 GNU/Linux ghc --version The Glorious Glasgow Haskell Compilation System, version 7.6.3 01: None 00.0: 10103 CPU [Created at cpu.304] Unique ID: rdCR.j8NaKXDZtZ6 Hardware Class: cpu Arch: Intel Vendor: "GenuineIntel" Model: 6.42.7 "Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz" Features: fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,mca,cmov,pat,pse36,clflush,dts,acpi,mmx,fxsr,sse,sse2,ss,ht,tm,pbe,syscall,nx,rdtscp,lm,constant_tsc,arch_perfmon,pebs,bts,nopl,xtopology,nonstop_tsc,aperfmperf,pni,pclmulqdq,dtes64,monitor,ds_cpl,vmx,e Clock: 800 MHz BogoMips: 3987.12 Cache: 6144 kb Units/Processor: 16 Config Status: cfg=new, avail=yes, need=no, active=unknown 02: None 01.0: 10103 CPU [Created at cpu.304] Unique ID: wkFv.j8NaKXDZtZ6 Hardware Class: cpu Arch: Intel Vendor: "GenuineIntel" Model: 6.42.7 "Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz" Features: fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,mca,cmov,pat,pse36,clflush,dts,acpi,mmx,fxsr,sse,sse2,ss,ht,tm,pbe,syscall,nx,rdtscp,lm,constant_tsc,arch_perfmon,pebs,bts,nopl,xtopology,nonstop_tsc,aperfmperf,pni,pclmulqdq,dtes64,monitor,ds_cpl,vmx,e Clock: 800 MHz BogoMips: 3986.91 Cache: 6144 kb Units/Processor: 16 Config Status: cfg=new, avail=yes, need=no, active=unknown 03: None 02.0: 10103 CPU [Created at cpu.304] Unique ID: +rIN.j8NaKXDZtZ6 Hardware Class: cpu Arch: Intel Vendor: "GenuineIntel" Model: 6.42.7 "Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz" Features: fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,mca,cmov,pat,pse36,clflush,dts,acpi,mmx,fxsr,sse,sse2,ss,ht,tm,pbe,syscall,nx,rdtscp,lm,constant_tsc,arch_perfmon,pebs,bts,nopl,xtopology,nonstop_tsc,aperfmperf,pni,pclmulqdq,dtes64,monitor,ds_cpl,vmx,e Clock: 800 MHz BogoMips: 3986.92 Cache: 6144 kb Units/Processor: 16 Config Status: cfg=new, avail=yes, need=no, active=unknown 04: None 03.0: 10103 CPU [Created at cpu.304] Unique ID: 4zLr.j8NaKXDZtZ6 Hardware Class: cpu Arch: Intel Vendor: "GenuineIntel" Model: 6.42.7 "Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz" Features: fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,mca,cmov,pat,pse36,clflush,dts,acpi,mmx,fxsr,sse,sse2,ss,ht,tm,pbe,syscall,nx,rdtscp,lm,constant_tsc,arch_perfmon,pebs,bts,nopl,xtopology,nonstop_tsc,aperfmperf,pni,pclmulqdq,dtes64,monitor,ds_cpl,vmx,e Clock: 800 MHz BogoMips: 3986.92 Cache: 6144 kb Units/Processor: 16 Config Status: cfg=new, avail=yes, need=no, active=unknown 05: None 04.0: 10103 CPU [Created at cpu.304] Unique ID: 94PJ.j8NaKXDZtZ6 Hardware Class: cpu Arch: Intel Vendor: "GenuineIntel" Model: 6.42.7 "Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz" Features: fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,mca,cmov,pat,pse36,clflush,dts,acpi,mmx,fxsr,sse,sse2,ss,ht,tm,pbe,syscall,nx,rdtscp,lm,constant_tsc,arch_perfmon,pebs,bts,nopl,xtopology,nonstop_tsc,aperfmperf,pni,pclmulqdq,dtes64,monitor,ds_cpl,vmx,e Clock: 800 MHz BogoMips: 3986.91 Cache: 6144 kb Units/Processor: 16 Config Status: cfg=new, avail=yes, need=no, active=unknown 06: None 05.0: 10103 CPU [Created at cpu.304] Unique ID: EBSn.j8NaKXDZtZ6 Hardware Class: cpu Arch: Intel Vendor: "GenuineIntel" Model: 6.42.7 "Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz" Features: fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,mca,cmov,pat,pse36,clflush,dts,acpi,mmx,fxsr,sse,sse2,ss,ht,tm,pbe,syscall,nx,rdtscp,lm,constant_tsc,arch_perfmon,pebs,bts,nopl,xtopology,nonstop_tsc,aperfmperf,pni,pclmulqdq,dtes64,monitor,ds_cpl,vmx,e Clock: 800 MHz BogoMips: 3986.92 Cache: 6144 kb Units/Processor: 16 Config Status: cfg=new, avail=yes, need=no, active=unknown 07: None 06.0: 10103 CPU [Created at cpu.304] Unique ID: JIVF.j8NaKXDZtZ6 Hardware Class: cpu Arch: Intel Vendor: "GenuineIntel" Model: 6.42.7 "Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz" Features: fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,mca,cmov,pat,pse36,clflush,dts,acpi,mmx,fxsr,sse,sse2,ss,ht,tm,pbe,syscall,nx,rdtscp,lm,constant_tsc,arch_perfmon,pebs,bts,nopl,xtopology,nonstop_tsc,aperfmperf,pni,pclmulqdq,dtes64,monitor,ds_cpl,vmx,e Clock: 800 MHz BogoMips: 3986.91 Cache: 6144 kb Units/Processor: 16 Config Status: cfg=new, avail=yes, need=no, active=unknown 08: None 07.0: 10103 CPU [Created at cpu.304] Unique ID: OPYj.j8NaKXDZtZ6 Hardware Class: cpu Arch: Intel Vendor: "GenuineIntel" Model: 6.42.7 "Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz" Features: fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,mca,cmov,pat,pse36,clflush,dts,acpi,mmx,fxsr,sse,sse2,ss,ht,tm,pbe,syscall,nx,rdtscp,lm,constant_tsc,arch_perfmon,pebs,bts,nopl,xtopology,nonstop_tsc,aperfmperf,pni,pclmulqdq,dtes64,monitor,ds_cpl,vmx,e Clock: 800 MHz BogoMips: 3986.91 Cache: 6144 kb Units/Processor: 16 Config Status: cfg=new, avail=yes, need=no, active=unknown -- Best wishes, Henry Laxen

On Sat, Jun 1, 2013 at 1:55 PM,
a Dell laptop and a desktop. I compiled this message with ghc -O2 --make ex429.lhs and ran it on each machine. On the Dell I get:
136342232 ./ex429 8.66s user 0.02s system 99% cpu 8.695 total
When I run this exact same file on the desktop, I get:
98792821 ./ex429 6.50s user 0.03s system 99% cpu 6.537 total
Which happens to be the right answer. But WHY is the output from the Dell different?
(hardware description elided) You're missing one piece of information: do you have the 32-bit or the 64-bit ghc installed on each machine? -- brandon s allbery kf8nh sine nomine associates allbery.b@gmail.com ballbery@sinenomine.net unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net

On Sat, Jun 1, 2013 at 7:55 PM,
Yesterday I decided to take a look at the most recent Euler problem, number 249, and give it a shot. I have a couple of computers at home, a Dell laptop and a desktop. I compiled this message with ghc -O2 --make ex429.lhs and ran it on each machine. On the Dell I get:
Odds are good it's integer overflow. Change Int to Integer or Int64 and
retry.
G
--
Gregory Collins

Gregory Collins
On Sat, Jun 1, 2013 at 7:55 PM,
wrote: Yesterday I decided to take a look at the most recent Euler problem, number 249, and give it a shot. I have a couple of computers at home, a Dell laptop and a desktop. I compiled this message with ghc -O2 --make ex429.lhs and ran it on each machine. On the Dell I get: Odds are good it's integer overflow. Change Int to Integer or Int64 and
retry.G-- Gregory Collins
Yes, that was it. The dell was a 32 bit system, and the desktop a 64. I changed everything from Int to Integer, and now both agree. Thanks for the pointer. Best wishes, Henry Laxen

On Jun 2, 2013, at 12:52 , Henry Laxen
Yes, that was it. The dell was a 32 bit system, and the desktop a 64. I changed everything from Int to Integer, and now both agree. Thanks for the pointer.
Isn't that just terrible? I hate the fact that Haskell was defined to neither trap the overflow or just treat everything as Integer [like Scheme]. A sacrifice of program safety in the name of efficiency. I disagree with this choice and posit that a clever implementation can minimize the cost of the overflow checking in most relevant cases. I wish this fatal flaw would be reconsidered for the next major revision. Tommy

On Sun, Jun 2, 2013 at 11:02 PM, Tommy Thorn
On Jun 2, 2013, at 12:52 , Henry Laxen
wrote: Yes, that was it. The dell was a 32 bit system, and the desktop a 64. I changed everything from Int to Integer, and now both agree. Thanks for the pointer.
Isn't that just terrible? I hate the fact that Haskell was defined to neither trap the overflow or just treat everything as Integer [like Scheme]. A sacrifice of program safety in the name of efficiency.
If you want to use Integer for everything, just do that. -- mithrandi, i Ainil en-Balandor, a faer Ambar

On Sunday, June 2, 2013 at 5:02 PM, Tommy Thorn wrote:
On Jun 2, 2013, at 12:52 , Henry Laxen
wrote: Yes, that was it. The dell was a 32 bit system, and the desktop a 64. I changed everything from Int to Integer, and now both agree. Thanks for the pointer.
Isn't that just terrible? I hate the fact that Haskell was defined to neither trap the overflow or just treat everything as Integer [like Scheme]. A sacrifice of program safety in the name of efficiency.
I disagree with this choice and posit that a clever implementation can minimize the cost of the overflow checking in most relevant cases.
I wish this fatal flaw would be reconsidered for the next major revision.
Tommy In addition to Haskell already having an arbitrary-width integer type called Integer, consider the case where you have some program that basically boils down to
f :: Int -> Int f x = {- some super-complicated mathematical expression -} f can only have bounds checks eliminated if the values of the inputs are known in advance. How often are you really going to know that? If you do something like main = do x <- read <$> getLine print $ f x then you have to put all the bounds checks in *anyway*.

On Jun 2, 2013, at 14:13 , Kata
In addition to Haskell already having an arbitrary-width integer type called Integer
But I wasn't asking for arbitrary-width. I was asking for explicit failures (overflow) rather than C-like silent corruption.
, consider the case where you have some program that basically boils down to
f :: Int -> Int f x = {- some super-complicated mathematical expression -}
f can only have bounds checks eliminated if the values of the inputs are known in advance. How often are you really going to know that? If you do something like
1. I said "minimize the cost of the overflow checking", I didn't say anything about bounds checking elimination. A conditional branch on the overflow from an add is nearly zero cost as it predicts perfectly and can be issued in parallel with all other instructions. Even better, some architectures (eg. SPARC) have overflow checking variants with zero overhead. And yes, for many instances it's trivial to see that overflow can't happen. 2. Even if that wasn't the case, I never want to sacrifice safety for a trivial perf overhead (for that stuff I use C). Tommy

There is a package that implements an Int that throws an exception on
overflow:
http://hackage.haskell.org/package/safeint
Since Int's existence is pretty much all about trading for performance, I
wouldn't recommend holding your breath on the above becoming the default.
If you want things to work like Scheme, that's exactly what Integer is (in
GHC, anyhow). And Integer is what you get by default(ing) unless you use
something else that is specifically defined to use Int, or specify it
yourself.
On Sun, Jun 2, 2013 at 5:02 PM, Tommy Thorn
On Jun 2, 2013, at 12:52 , Henry Laxen
wrote: Yes, that was it. The dell was a 32 bit system, and the desktop a 64. I changed everything from Int to Integer, and now both agree. Thanks for the pointer.
Isn't that just terrible? I hate the fact that Haskell was defined to neither trap the overflow or just treat everything as Integer [like Scheme]. A sacrifice of program safety in the name of efficiency.
I disagree with this choice and posit that a clever implementation can minimize the cost of the overflow checking in most relevant cases.
I wish this fatal flaw would be reconsidered for the next major revision.
Tommy
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Indeed, as Dan says, theres the safeint library and the Integer type.
If the Int type had either of these semantics by default, many many
performance sensitive libraries would suddenly have substantially less
compelling performance. Every single operation that was branchless before
would have a branch *every* operation..... this would be BAD.
I'm actually quite happy with (ab)using Int as just a sequence of bits that
sometimes i treat as a number, and sometimes i treat as a bitvector. In
fact thats actually most of my work these days. GHC generates VERY nice
code for Ints and Words, similar to what i'd expect to be Generated by a
decent C compiler when not explicitly using SIMD operations. This is a
good thing!
Additionally, theres work in progress to support "branchless" Bool
operations in GHC by having Bool be represented internally With 0,1 valued
Ints, http://hackage.haskell.org/trac/ghc/wiki/PrimBool
point being: its easy to have the safety with SafeInt, or Using Integer,
and fast inner loops can't have branches, and that actually matters in many
applications.
cheers
-Carter
On Sun, Jun 2, 2013 at 6:42 PM, Dan Doel
There is a package that implements an Int that throws an exception on overflow:
http://hackage.haskell.org/package/safeint
Since Int's existence is pretty much all about trading for performance, I wouldn't recommend holding your breath on the above becoming the default.
If you want things to work like Scheme, that's exactly what Integer is (in GHC, anyhow). And Integer is what you get by default(ing) unless you use something else that is specifically defined to use Int, or specify it yourself.
On Sun, Jun 2, 2013 at 5:02 PM, Tommy Thorn
wrote: On Jun 2, 2013, at 12:52 , Henry Laxen
wrote: Yes, that was it. The dell was a 32 bit system, and the desktop a 64. I changed everything from Int to Integer, and now both agree. Thanks for the pointer.
Isn't that just terrible? I hate the fact that Haskell was defined to neither trap the overflow or just treat everything as Integer [like Scheme]. A sacrifice of program safety in the name of efficiency.
I disagree with this choice and posit that a clever implementation can minimize the cost of the overflow checking in most relevant cases.
I wish this fatal flaw would be reconsidered for the next major revision.
Tommy
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Jun 2, 2013, at 23:58 , Carter Schonwald
Indeed, as Dan says, theres the safeint library and the Integer type.
If the Int type had either of these semantics by default, many many performance sensitive libraries would suddenly have substantially less compelling performance. Every single operation that was branchless before would have a branch *every* operation..... this would be BAD.
I'd like to see actual data, measurements of actual wall-time impact on real code on modern hardware, not assumptions. Tommy

Tommy, respectfully,
I have quite a few bits of code where a bad branch predictor in a tight
inner loops makes code 10x slower.
you are welcome to do your own experimentation so that you too can learn by
branches are bad in tight loops. (even if the branch predictor is doing
its job, there will be a measurable slow down, albeit less than 10x)
Please shift this conversation to the libraries list if you want to
actually make a concrete libraries change proposal. Otherwise I don't
understand your contentions. Int is "native register sized integer" not
"integer that i need to exception handle because I used int instead of
integer because of premature optimization in my web app or project euler
codes"
My opinions are based upon spending all of my time over the past year
working on writing robustly performant numerical codes. Some of them are
actually faster than the standard fortran ones.
My point being: as mentioned above, by others much more articulate than I,
unless you have performance related reasons, always use Integer instead of
Int. There is never a good reason to use Int instead of Integer unless it
will change the performance characteristics of your code. Asking for Int to
pretend to be Integer because you wanted to do premature optimization and
then it didn't behave like Integer is a no go.
I am happy to direct you towards further reading if you'd like to learn
about writing performance sensitive software:
the intel optimization
manualhttp://www.intel.com/content/dam/doc/manual/64-ia-32-architectures-optimizat...has
many good ideas (eg Structure of Arrays, which is essentially used by
the haskell Vector lib) that are actually realized by the more performant
haskell libraries.
Likewise, for an informative idea of the cost models for various
operations on the CPU, the agner fog
http://agner.org/optimize/#manuals manuals
are actually very educational.
merry hacking
-Carter
On Mon, Jun 3, 2013 at 3:07 AM, Tommy Thorn
On Jun 2, 2013, at 23:58 , Carter Schonwald
wrote: Indeed, as Dan says, theres the safeint library and the Integer type.
If the Int type had either of these semantics by default, many many performance sensitive libraries would suddenly have substantially less compelling performance. Every single operation that was branchless before would have a branch *every* operation..... this would be BAD.
I'd like to see actual data, measurements of actual wall-time impact on real code on modern hardware, not assumptions.
Tommy

On Jun 3, 2013, at 00:23 , Carter Schonwald
Int is "native register sized integer"
Actually it's not. Read the definition. Int is only guaranteed to be 29 bits. Here's *one* _actual_ data point (from a 2.8 GHz i7, 64-bit code): time ./fib fib(43) = 701408733 3.27 real 3.27 user 0.00 sys time ./fib-safe fib(43) = 701408733 3.45 real 3.45 user 0.00 sys (NB: I do not check the n-1 and n-2 as it's trivial to see from a data flow analysis that the proceeding conditional guarantees that those can't overflow. The empty asm() is necessary to get GCC to generate comparable code). Obviously, for some examples this will be much worse, for others, much better, but without this implemented in GHC it will be difficult to measure. Tommy

GHC is not the spec, I am talking about GHC Haskell, not Haskell the
standard that I don't use.
On 32bit machines, GHC Int is 32bits. On 64bit GHC on 64bit machines Int is
64 bits.
If you have another well engineered suitable for wide use Haskell compiler
in mind, I'd love to try it out, but with interesting software you will be
using none portable features per target platform. And thats OK. Its a
tradeoff thats sometimes worth making.
On Jun 3, 2013 4:19 AM, "Tommy Thorn"
On Jun 3, 2013, at 00:23 , Carter Schonwald
wrote: Int is "native register sized integer"
Actually it's not. Read the definition. Int is only guaranteed to be 29 bits.
Here's *one* _actual_ data point (from a 2.8 GHz i7, 64-bit code):
time ./fib fib(43) = 701408733 3.27 real 3.27 user 0.00 sys time ./fib-safe fib(43) = 701408733 3.45 real 3.45 user 0.00 sys
(NB: I do not check the n-1 and n-2 as it's trivial to see from a data flow analysis that the proceeding conditional guarantees that those can't overflow. The empty asm() is necessary to get GCC to generate comparable code).
Obviously, for some examples this will be much worse, for others, much better, but without this implemented in GHC it will be difficult to measure.
Tommy

On 3/06/2013, at 6:58 PM, Carter Schonwald wrote:
If the Int type had either of these semantics by default, many many performance sensitive libraries would suddenly have substantially less compelling performance. Every single operation that was branchless before would have a branch *every* operation..... this would be BAD.
Actually, the x86 can be configured to trap integer overflows, so on that not entirely unpopular platform, there need be NO extra branches. Alternatively, and more portably, there could be separate Int and UnsafeInt types, and the performance sensitive libraries could be rewritten to use UnsafeInt. For just one week, I had the joy of using a C compiler where signed integer overflow was trapped. It was *wonderful*.

On Tue, Jun 4, 2013 at 7:35 AM, Richard A. O'Keefe
If the Int type had either of these semantics by default, many many
On 3/06/2013, at 6:58 PM, Carter Schonwald wrote: performance sensitive libraries would suddenly have substantially less compelling performance. Every single operation that was branchless before would have a branch *every* operation..... this would be BAD.
Actually, the x86 can be configured to trap integer overflows, so on that not entirely unpopular platform, there need be NO extra branches.
Well yes and no. See http://software.intel.com/en-us/forums/topic/306156 Using instructions like cmovo "Conditional MOve on Overflow" we can test without a branch -- so in that sense yes. No, because the use of the word 'trap' is a bit misleading. If we understand 'trap' as synchronous interrupt, then intel provides the infrastructure to literally trap floating point errors but for integers such a 'trap' only works if the instruction stream contains instructions like INTO or CMOVO etc.
Alternatively, and more portably, there could be separate Int and UnsafeInt types, and the performance sensitive libraries could be rewritten to use UnsafeInt.
For just one week, I had the joy of using a C compiler where signed integer overflow was trapped. It was *wonderful*.
In Discipline of Programming (in 1976!) Dijkstra exactly described this problem, and squarely put the blame on poorly engineered machines. He introduced 3 concepts/terms: UM : unbounded machine SLM : sufficiently large machine HSLM : hopefully sufficiently large machine The UM -- like a Turing machine -- has no messy restrictions of finiteness like wordsize and is therefore pleasant to reason with and impossible to physically build. The SLM is like most of our actual machines -- actual finite state machines approximating our conceptually nice unbounded machines. The problem is when the approximation fails, the SLM behaves unpredictably. So we have the HSLM, which (I quote): The HSLM is two things merged into one. Besides acting as the largest SLM we can afford, it checks, when called to execute a program, as the computation proceeds, whether this SLM is large enough for the current computation. If so, it proceeds with the simulation of the UM's behaviour, otherwise it refuses to continue. There exist, regretfully enough,in which the continuous check that the simulation of the behaviour of the UM is not beyond their capacity is so time-consuming, that the check is suppressed for the sake of efficiency. It is very difficult to use such machines… and we ignore them in the sequel!! In short the problem is our machines: if catching errors involves checking and checking involves a cost, some program(ers) will sometimes seek to avoid that. Moving the check to the hardware -- ie synchronous trap on errors -- removes the cost and the temptation to avoid. Until we get such machines, these arguments will continue to be there!

On 4/06/2013, at 4:22 PM, Rustom Mody wrote:
On Tue, Jun 4, 2013 at 7:35 AM, Richard A. O'Keefe
wrote: On 3/06/2013, at 6:58 PM, Carter Schonwald wrote:
If the Int type had either of these semantics by default, many many performance sensitive libraries would suddenly have substantially less compelling performance. Every single operation that was branchless before would have a branch *every* operation..... this would be BAD.
Actually, the x86 can be configured to trap integer overflows, so on that not entirely unpopular platform, there need be NO extra branches.
Well yes and no. See http://software.intel.com/en-us/forums/topic/306156
I made a mistake, for which I apologise. There were two things I wanted the x86 to trap, several years ago, and I found that one of them *could* be trapped and the other could not. The one that couldn't was integer overflow. I do note that the page cited answers a *different* question which is "does the Intel COMPILER support integer overflow trapping." The question I answered wrongly was "does the Intel HARDWARE support integer overflow trapping (by raising an exception on integer overflow if a bit is set in a certain control register)." Having apologised for my error, I close with the observation that Jacob Navia, developer of lcc-win32 (he started with the LCC compiler but added serious x86-specific optimisation and other Windows goodness), claims that sticking JO after signed integer operations adds very little to run time because it is predicted very well by the branch prediction hardware, since it is almost never taken.
In Discipline of Programming (in 1976!) Dijkstra exactly described this problem, and squarely put the blame on poorly engineered machines. He introduced 3 concepts/terms: UM : unbounded machine SLM : sufficiently large machine HSLM : hopefully sufficiently large machine
Dijkstra was a Burroughs Research Fellow, and the B6700 was a textbook example of an HSLM. I couldn't believe how primitive other systems were after using that. The signed-integer-overflow-trapping C compiler I mentioned was a MIPS one (MIPS distinguishing between ADD and ADDU, &c).
participants (11)
-
Brandon Allbery
-
Carter Schonwald
-
Dan Doel
-
Gregory Collins
-
Henry Laxen
-
Kata
-
nadine.and.henry@pobox.com
-
Richard A. O'Keefe
-
Rustom Mody
-
Tommy Thorn
-
Tristan Seligmann