Haskell performance in heavy numerical computations

Is anyone using Haskell for heavy numerical computations? Could you share your experience? My app will be mostly about running computations over huge amounts of stock data (time series) so I'm thinking I might be better off with OCaml. Thanks, Joel -- http://wagerlabs.com/

I have tried. Using just normal lists, I found it difficult to avoid huge
space leaks, but once things are working, the end result is easy to reason
about.
I'm considering giving it another try using Stream Processors or some form
of Arrows instead of lists. I think this strategy might allow me to do
several transformations of the same input list in pseudo-parallel, which
would allow the GC to collect old data as soon as it's no longer needed by the
computation that decides when to buy and sell. The Stream Processor library
I was looking at uses Continuation Passing Style, which I don't know too
much about, and I'm not sure Arrows will work out.
Generalizing the problem, how does one express: "given multiple
transformations of an infinite numeric list, perform some IO at the first
instance some predicate is met"?
I feel that Arrows might somehow trivially allow me to compute on huge lists
without leaking, but don't understand them well enough to apply them.
If one could write an arrow that applies the transformations to the input
list in parallel, then I'd think you could solve the above example something
like this:
(f &&& g) >>> dropWhile (\(x,y) -> x > 0 && y > 0) >>> performSomeIO
Maybe <+> is the arrow operator I'm looking for, not &&&, but to be honest,
the documentation for Arrows blows my mind. I think a few examples would go
a long way.
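In the meantime, here is a minimal sketch of the pattern I'm after, using plain lazy lists rather than arrows (f, g and the trigger predicate below are just placeholders):

    import Data.List (find)

    -- placeholder transformations of the input stream
    f, g :: Double -> Double
    f = (* 2)
    g = subtract 10

    -- walk both transformed streams in lock-step; because zip consumes
    -- the two lists at the same rate, the already-consumed prefix of the
    -- shared input can, in principle, be dropped by the GC as we go
    -- (provided nothing else retains the head of the list)
    firstHit :: ((Double, Double) -> Bool) -> [Double] -> Maybe (Double, Double)
    firstHit p xs = find p (zip (map f xs) (map g xs))

    main :: IO ()
    main =
      case firstHit (\(x, y) -> not (x > 0 && y > 0)) prices of
        Just hit -> print hit                      -- the "perform some IO" step
        Nothing  -> putStrLn "predicate never met"
      where
        prices = cycle [1.0, 2.0, 3.0, -1.0]       -- stand-in for the real price feed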
Thanks,
Greg
On 7/6/06, Joel Reymont wrote:
Is anyone using Haskell for heavy numerical computations? Could you share your experience?
My app will be mostly about running computations over huge amounts of stock data (time series) so I'm thinking I might be better off with OCaml.
Thanks, Joel

Maybe <+> is the arrow operator I'm looking for, not &&&, but to be honest, the documentation for Arrows blows my mind. I think a few examples would go a long way.
John Hughes' original paper on arrows is full of examples. Additionally, Hughes wrote a tutorial on programming with arrows, for the 2004 AFP summer school in Tartu, which is very accessible. Both papers are available from Hughes' publication page. Also Ross Paterson wrote a piece called Arrows and Computation, available on the arrows homepage, which contains many illuminating examples.

-Jeff

Joel Reymont wrote:
Is anyone using Haskell for heavy numerical computations? Could you share your experience?
My app will be mostly about running computations over huge amounts of stock data (time series) so I'm thinking I might be better off with OCaml.
If you are really serious about numerical throughput, you probably should be looking at vector instructions, CPU cache sizes, and parallelising your algorithms, not choosing between high-level languages. Hopefully it's enough to do that sort of thing for a few important primitives, and then you can write the rest of the program in whatever language you like.

* MonetDB

It sounds like MonetDB might be related to your problem. This database is designed for ad-hoc analytical queries which touch most of the rows (but hopefully only a few of the columns) of a database. The design of the system includes a lot of interesting analysis of the performance characteristics of modern hardware.

Publications: http://monetdb.cwi.nl/Research/Articles/index.html

The first sections of the Boncz thesis provide an overview which will hopefully tell you whether this stuff applies at all. The only paper with "X100" in the title is an overview of what's been happening more recently.

* Optimizing Combinator Libraries

If you are trying to build a nice, efficient combinator library in Haskell over a few fast primitives, the ByteString library could be a good example. In particular, its use of rewrite rules for loop fusion might help in your case as well. http://www.cse.unsw.edu.au/~dons/fps.html (I liked the slides.)

Perhaps ideas from this paper could help if you are doing more complicated things:

Dynamic Optimization for Functional Reactive Programming using Generalized Algebraic Data Types
http://www.cs.nott.ac.uk/~nhn/Publications/icfp2005.pdf

Brandon
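To give a feel for the rewrite-rules trick, here is a toy illustration only (not the actual Data.ByteString rule set, just the shape of the technique):

    module Fusion where

    -- a simple map over Doubles; stands in for a real array/stream primitive
    mapD :: (Double -> Double) -> [Double] -> [Double]
    mapD f = go
      where
        go []     = []
        go (x:xs) = f x : go xs
    {-# NOINLINE mapD #-}   -- keep GHC from inlining mapD before the rule can fire

    -- GHC rewrite rule: collapse two traversals into one, so
    -- mapD f (mapD g xs) never builds the intermediate list
    {-# RULES
    "mapD/mapD" forall f g xs. mapD f (mapD g xs) = mapD (f . g) xs
      #-}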

Hello Joel, Friday, July 7, 2006, 2:03:11 AM, you wrote:
Is anyone using Haskell for heavy numerical computations? Could you share your experience?
My app will be mostly about running computations over huge amounts of stock data (time series) so I'm thinking I might be better off with OCaml.
are you sure that numerical speed will be critical for your app? maybe, for example, the speed of thread switching and passing data, or the ability to easily spread tasks between several cpu cores, will be more important?

numerical speed is poor in ghc 6.4, according to my tests - it's 10-20 times worse than gcc's. afair, the mandelbrot benchmark of the Great Language Shootout shows this: despite all optimization attempts, the GHC entry is still 5-10 times slower than the gcc/ocaml/clean ones. i heard rumors that Roman Leshchinskiy is working on improving this for 6.6, though - try asking the GHC team about his work.

it's also possible that your computations can be run via existing libraries, or written by you in C, with the rest of the program written in Haskell. my own program uses an architecture like this: it consists of a time-critical part (a compression library written in C) and the remaining part written in Haskell

-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
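for illustration, a minimal sketch of that kind of C/Haskell split via the FFI - the C routine "dot" here is hypothetical, and you would compile and link it yourself:

    {-# LANGUAGE ForeignFunctionInterface #-}
    import Foreign.C.Types (CDouble, CInt)
    import Foreign.Marshal.Array (withArray)
    import Foreign.Ptr (Ptr)

    -- hypothetical time-critical kernel written in C:
    --   double dot(const double *xs, const double *ys, int n);
    foreign import ccall unsafe "dot"
      c_dot :: Ptr CDouble -> Ptr CDouble -> CInt -> IO CDouble

    -- marshal two Haskell lists to C arrays and call the C kernel
    dotProduct :: [Double] -> [Double] -> IO Double
    dotProduct xs ys =
      withArray (map realToFrac xs) $ \px ->
        withArray (map realToFrac ys) $ \py ->
          fmap realToFrac (c_dot px py n)
      where
        n = fromIntegral (min (length xs) (length ys))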

Bulat Ziganshin wrote:
numerical speed is poor in ghc 6.4, according to my tests. it's 10-20 times worse than of gcc. afair, the mandelbrot benchmark of Great Language Shootout proves this - despite all optimization attempts, GHC entry is still 5-10 times slower than gcc/ocaml/clean ones
We average 1.3x slower than C in the shootout. Thanks to Don Stewart for the following stats...

The numerical floating-point-intensive benchmarks:

  mandelbrot  2.7x C  (Clean 1.7x, OCaml 2.4x, C++ 2.6x)
  n-body      2.1x C  (Clean 0.9x, OCaml 1.3x, C++ 1.4x)

http://shootout.alioth.debian.org/gp4/benchmark.php?test=mandelbrot&lang=all
http://shootout.alioth.debian.org/gp4/benchmark.php?test=nbody&lang=all

So we're still only 2-3 times slower than C on these benchmarks, and we know how to improve them. In other cases, for example some integer/floating math, we beat GCC:

partial-sums: http://shootout.alioth.debian.org/gp4/benchmark.php?test=partialsums&lang=all

The full list, Haskell vs. C:

   0.1x C  http://shootout.alioth.debian.org/gp4/benchmark.php?test=chameneos&lang=all
   0.1x C  http://shootout.alioth.debian.org/gp4/benchmark.php?test=message&lang=all
   0.1x C  http://shootout.alioth.debian.org/gp4/benchmark.php?test=regexdna&lang=all
   0.5x C  http://shootout.alioth.debian.org/gp4/benchmark.php?test=binarytrees&lang=all
   0.7x C  http://shootout.alioth.debian.org/gp4/benchmark.php?test=nsieve&lang=all
   0.9x C  http://shootout.alioth.debian.org/gp4/benchmark.php?test=nsievebits&lang=all
   0.9x C  http://shootout.alioth.debian.org/gp4/benchmark.php?test=partialsums&lang=all
   1.3x C  http://shootout.alioth.debian.org/gp4/benchmark.php?test=recursive&lang=all
   1.6x C  http://shootout.alioth.debian.org/gp4/benchmark.php?test=pidigits&lang=all
   1.8x C  http://shootout.alioth.debian.org/gp4/benchmark.php?test=fasta&lang=all
   1.8x C  http://shootout.alioth.debian.org/gp4/benchmark.php?test=nbody&lang=all
   2.3x C  http://shootout.alioth.debian.org/gp4/benchmark.php?test=spectralnorm&lang=all
 * 2.5x C  http://shootout.alioth.debian.org/gp4/benchmark.php?test=sumcol&lang=all
   2.7x C  http://shootout.alioth.debian.org/gp4/benchmark.php?test=mandelbrot&lang=all
 * 3.2x C  http://shootout.alioth.debian.org/gp4/benchmark.php?test=fannkuch&lang=all
 *11.0x C  http://shootout.alioth.debian.org/gp4/benchmark.php?test=revcomp&lang=all
 *22.0x C  http://shootout.alioth.debian.org/gp4/benchmark.php?test=knucleotide&lang=all

* Known, or expected, to improve under Data.ByteString

Cheers, Simon

Hello Simon, Monday, July 10, 2006, 1:12:13 PM, you wrote:
numerical speed is poor in ghc 6.4, according to my tests. it's 10-20 times worse than of gcc. afair, the mandelbrot benchmark of Great Language Shootout proves this - despite all optimization attempts, GHC entry is still 5-10 times slower than gcc/ocaml/clean ones
We average 1.3x slower than C in the shootout. Thanks to Don Stewart for the following stats...
as i once wrote, most of the shootout benchmarks depend on the libs. for example, the multi-threading benchmark is much faster on GHC than on GCC because the former has user-level thread support pre-included, while for the latter external libs would have to be used (the testing policy prohibits the use of additional libs)
The numerical floating-point-intensive benchmarks:
mandelbrot 2.7x C (Clean 1.7x, OCaml 2.4x, C++ 2.6x) n-body 2.1x C (Clean 0.9x, OCaml 1.3x, C++ 1.4x)
those are the benchmarks that i had in mind
http://shootout.alioth.debian.org/gp4/benchmark.php?test=mandelbrot&lang=all
the same benchmark with N=600 and a Sempron processor: http://shootout.alioth.debian.org/debian/benchmark.php?test=mandelbrot&lang=all

here GHC is 10x slower than GCC and more than 5x slower than Clean and Ocaml. i think that this is the real computing speed difference, while with N=2000 the C/Ocaml/Clean programs were really limited by memory speed. at least that was the situation for the trivial "a[i]=b[i]+c[i]" benchmark discussed half a year ago: when the arrays were small enough to fit in cpu cache, the speed difference was 20 times; on large arrays the C program was limited by memory speed and the difference was only 3 times
* 2.5x C * 3.2x C * 11.0x C * 22.0x C
* Known, or expected, to improve under Data.ByteString
one more sign that this is more a test of libraries than of compilers. for example, the revcomp test performs only about 1000 operations in the main language but calls regex functions with 100kb strings. it's not strange that in this test TCL is now the fastest entry

about numerical apps - afair, Manuel Chakravarty is working on a new version of his parallel arrays engine

-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

bulat.ziganshin:
the same benchmark with N=600 and Sempron processor:
http://shootout.alioth.debian.org/debian/benchmark.php?test=mandelbrot&lang=all
here GHC is 10x slower than GCC and more than 5 times compared to Clean and Ocaml. i think that this is the real computing speed difference while with N=2000 C/Ocaml/Clean programs was really limited by memory speed.
Ah! In this case, on the debian box, the benchmark has been compiled _without_ -fexcess-precision; that's what's causing the big slowdown. We had to turn it on for the gp4, but it seems the flag wasn't propagated to the debian/sempron builds for some reason. Looks like the ghc/mandelbrot benchmark just needs to be rerun with -fexcess-precision in this case. -- Don

On Jul 10, 2006, at 4:23 AM, Donald Bruce Stewart wrote:
Ah! In this case, on the debian, the benchmark has been compiled _without_ -fexcess-precision, that's what's causing the big slow down. We had to turn it on, on the gp4, but it seems the flag wasn't propagated to the debian/sempron builds for some reason.
Looks like the ghc/mandelbrot benchmark just needs to be rerun with -fexcess-precision in this case.
Ah! Thanks for noticing this -- I did not catch it. I've rerun, and the new value is more like 0.428 secs for 600. Thanks, -Brent
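For reference, the flag is just added to the GHC compile line, e.g. (the file name here is only an example):

    ghc -O2 -fexcess-precision -o mandelbrot mandelbrot.hs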

Alberto Ruiz has developed a linear algebra library which could be seen as an alternative to Matlab/Octave, using the GSL, ATLAS, LAPACK, etc. IIRC:

http://dis.um.es/~alberto/GSLHaskell/

I've optimized it in some places, and added an interface which guarantees operand conformability through the type system at compile time (a rough sketch of the idea is below):

http://ofb.net/~frederik/stla/

It is almost twice as fast as Octave on an example program, and probably comparable to Matlab. However, it is currently very difficult to debug some of the type errors which are emitted by GHC when using it, and there are some outstanding heap-related bugs. Also, we don't have as wide a variety of built-in functions as Matlab and Octave do.

Frederik

On Thu, Jul 06, 2006 at 11:03:11PM +0100, Joel Reymont wrote:
Is anyone using Haskell for heavy numerical computations? Could you share your experience?
My app will be mostly about running computations over huge amounts of stock data (time series) so I'm thinking I might be better off with OCaml.
Thanks, Joel
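To give a flavour of the compile-time conformability idea mentioned above, here is a rough sketch using dimension-indexed types (this uses modern GHC type-level naturals and is not the actual GSLHaskell/stla interface):

    {-# LANGUAGE DataKinds, KindSignatures #-}
    import Data.List (transpose)
    import GHC.TypeLits (Nat)

    -- row and column counts live in the type, so an ill-sized
    -- multiplication is rejected by the type checker
    newtype Matrix (r :: Nat) (c :: Nat) = Matrix [[Double]]

    -- (r x k) times (k x c) gives (r x c); mismatched inner
    -- dimensions simply fail to unify at compile time
    mmul :: Matrix r k -> Matrix k c -> Matrix r c
    mmul (Matrix a) (Matrix b) =
      Matrix [ [ sum (zipWith (*) row col) | col <- transpose b ] | row <- a ]

With this, a Matrix 2 3 can be multiplied by a Matrix 3 1 but not by a Matrix 2 1; the mistake shows up as a type error rather than a runtime dimension check.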
participants (9)

- Brandon Moore
- Brent Fulgham
- Bulat Ziganshin
- dons@cse.unsw.edu.au
- Frederik Eaton
- Greg Fitzgerald
- Jeff Polakow
- Joel Reymont
- Simon Marlow