
On my 1.2 GHz Duron (SuSE linux), I get significantly different timings than those on the wiki. Sebastian Sylvan's, Kimberley Burchett's and Bertram Felgenhauer's all take roughly thrice as long as posted (that's rather consistent, but the factor is surprisingly large). Cale Gibbard's takes 18.44 secs for N = 9, which is rather bad, I think. Really surprising (to me) are the following (N = 10) Algo Wiki Here Clean imperative 2.1 s 4.54 s Fastest Pure 1.8 s 8.65 s Fastest Impure 1.4 s 1.77 s. So the ratio for the fast impure version is approximately 1.6GHz/1.2GHz, which is natural, but the ratio for clean impure is 2.2 and for the fast pure version, it's even 4.8! All are compiled with -O2 -optc-O3. Does that mean my system is really poor in producing binaries from pure Haskell? And why would that be so? Cheers, Daniel

On 10/01/06, Daniel Fischer
On my 1.2 GHz Duron (SuSE linux), I get significantly different timings than those on the wiki. Sebastian Sylvan's, Kimberley Burchett's and Bertram Felgenhauer's all take roughly thrice as long as posted (that's rather consistent, but the factor is surprisingly large). Cale Gibbard's takes 18.44 secs for N = 9, which is rather bad, I think.
Well, yeah, it's known not to be as efficient as the others -- I threw it together rather quickly from some parts I had laying around. I unfortunately hadn't tried the naive implementation of permutations since it's actually faster. (My version ought to still be better than the original though.)
Really surprising (to me) are the following (N = 10)
Algo Wiki Here Clean imperative 2.1 s 4.54 s Fastest Pure 1.8 s 8.65 s Fastest Impure 1.4 s 1.77 s.
So the ratio for the fast impure version is approximately 1.6GHz/1.2GHz, which is natural, but the ratio for clean impure is 2.2 and for the fast pure version, it's even 4.8! All are compiled with -O2 -optc-O3. Does that mean my system is really poor in producing binaries from pure Haskell? And why would that be so?
Hmm... that's really odd. Which version of ghc/gcc are you using and how are you measuring the times? Perhaps other processes on your system are interfering? Do you get these numbers consistently? - Cale

Am Dienstag, 10. Januar 2006 16:10 schrieben Sie:
On 10/01/06, Daniel Fischer
wrote: On my 1.2 GHz Duron (SuSE linux), I get significantly different timings than those on the wiki. Sebastian Sylvan's, Kimberley Burchett's and Bertram Felgenhauer's all take roughly thrice as long as posted (that's rather consistent, but the factor is surprisingly large). Cale Gibbard's takes 18.44 secs for N = 9, which is rather bad, I think.
Well, yeah, it's known not to be as efficient as the others -- I threw it together rather quickly from some parts I had laying around. I unfortunately hadn't tried the naive implementation of permutations since it's actually faster. (My version ought to still be better than the original though.)
Really surprising (to me) are the following (N = 10)
Algo Wiki Here Clean imperative 2.1 s 4.54 s Fastest Pure 1.8 s 8.65 s Fastest Impure 1.4 s 1.77 s.
So the ratio for the fast impure version is approximately 1.6GHz/1.2GHz, which is natural, but the ratio for clean impure is 2.2 and for the fast pure version, it's even 4.8! All are compiled with -O2 -optc-O3. Does that mean my system is really poor in producing binaries from pure Haskell? And why would that be so?
Hmm... that's really odd. Which version of ghc/gcc are you using and how are you measuring the times? Perhaps other processes on your system are interfering? Do you get these numbers consistently?
- Cale
I use ghc 6.4.1 and gcc 3.3. Times are measured either by time ./fannPure 10 ... or ./fannPure 10 +RTS -sstderr These are user/MUT times, at the moment, my machine is busy, so that elapsed time is about double that, otherwise these times are rather consistently reproduced (between 8.4 and 8.9 for pure, 1.7 and 1.9 for impure, clean imperative I've done only twice, second run took 4.82s MUT time). Any idea? Daniel

Hello Daniel, Tuesday, January 10, 2006, 7:40:24 PM, you wrote: DF> These are user/MUT times, at the moment, my machine is busy, so that elapsed DF> time is about double that, otherwise these times are rather consistently DF> reproduced (between 8.4 and 8.9 for pure, 1.7 and 1.9 for impure, clean DF> imperative I've done only twice, second run took 4.82s MUT time). on the busy machine even MUT times may be (and even should be) inaccurate because other threads will throw out data in CPU caches -- Best regards, Bulat mailto:bulatz@HotPOP.com

Am Dienstag, 10. Januar 2006 19:11 schrieben Sie:
Hello Daniel,
Tuesday, January 10, 2006, 7:40:24 PM, you wrote:
DF> These are user/MUT times, at the moment, my machine is busy, so that elapsed DF> time is about double that, otherwise these times are rather consistently DF> reproduced (between 8.4 and 8.9 for pure, 1.7 and 1.9 for impure, clean DF> imperative I've done only twice, second run took 4.82s MUT time).
on the busy machine even MUT times may be (and even should be) inaccurate because other threads will throw out data in CPU caches
So that explains (not surprisingly) that MUT times on the busy machine are slightly higher (8.9 resp. 1.9), while on an otherwise idle machine, I consistently got values in the range of 8.4 - 8.7 (I'm not absolutely sure, but I think actually the range was 8.55 - 8.7) resp. 1.7 - 1.78 secs. The problem remains, why is the speed-ratio so different for the different algorithms? Cheers, Daniel

daniel.is.fischer:
Am Dienstag, 10. Januar 2006 19:11 schrieben Sie:
Hello Daniel,
Tuesday, January 10, 2006, 7:40:24 PM, you wrote:
DF> These are user/MUT times, at the moment, my machine is busy, so that elapsed DF> time is about double that, otherwise these times are rather consistently DF> reproduced (between 8.4 and 8.9 for pure, 1.7 and 1.9 for impure, clean DF> imperative I've done only twice, second run took 4.82s MUT time).
on the busy machine even MUT times may be (and even should be) inaccurate because other threads will throw out data in CPU caches
So that explains (not surprisingly) that MUT times on the busy machine are slightly higher (8.9 resp. 1.9), while on an otherwise idle machine, I consistently got values in the range of 8.4 - 8.7 (I'm not absolutely sure, but I think actually the range was 8.55 - 8.7) resp. 1.7 - 1.78 secs.
The problem remains, why is the speed-ratio so different for the different algorithms?
Some things I can think of: * OpenBSD vs Linux. I've noticed on some benchmarks large (2x) differences between the same code running on the two OSs. Code that runs 2x faster on openbsd, compared to and older version of the code, will show no speedup on linux. * Cpu arch. I'm testing on a Penitium M. This 1.6Ghz Pentium M outperforms a 2.4Ghz P4 sitting on my desk. * Cache size. Etc. -- Don
participants (4)
-
Bulat Ziganshin
-
Cale Gibbard
-
Daniel Fischer
-
dons@cse.unsw.edu.au