
Hello Cafe,

Being a complete beginner in the field of numerical analysis, but needing it anyway to solve "real problems", I recently wrote a few functions to solve systems of polynomial equations using the "projected polyhedron" method by Maekawa and Patrikalakis. This requires solving systems of linear equations precisely, so the simple Gauss method was not enough, and I also had to write an algorithm for SVD decomposition.

Upon discovering the Algol / Fortran specifications of these :-( , heavily published in important journals, I thought it would be nice to provide the world with fast, reliable implementations of these numerical methods (i.e. not simply bindings to LAPACK). Moreover, writing numerical things in Haskell is much more pleasant than I thought at first. Here are a few random thoughts on this:

- The Haskell 98 standard does not require enough about IEEE-754 compliance, so C bindings are still needed to determine, for instance, the machine epsilon, or to manipulate ULPs. Moreover, taking advantage of hardware rounding is not easy even if the hardware is IEEE-compliant: calling a C function from Haskell cancels the speed advantage of hardware rounding, for instance. Maybe the new LLVM backend will make this possible?

- The current Array library is definitely not suited to production code. It makes debugging tricky, requires heavy use of Debug.Trace to see what actually happens, and does not seem as fast as one could expect. It seems that each algebra library on Hackage redefines part of it; a unified version would be nice, and a discussion within the Haskell community seems to be needed...

- A numerical analysis library should really take advantage of the parallelism in GHC, especially with the arrival of hardware such as Fermi (though I do not know how much Haskell is compilable to Fermi code). The love for loops and side effects in the numerical community is hard to understand, but that's more of a cultural problem.
Finally, as stated by William Stein, the creator of SAGE: of course it would take thousands of man-years to rewrite these codes in Python, but if a language like Haskell and its compiler do 90% of the work, how many man-years are left? If anyone here has the time, the team and the will to start such a project, I'd love to contribute! Cheers, PE
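[Editor's aside on the epsilon point: while Haskell 98 indeed guarantees little about IEEE-754, a usable machine epsilon for Double can be estimated portably in pure Haskell, without C bindings. A minimal sketch; the name `epsilonD` is ours, and it assumes the usual IEEE-754 binary64 representation:]

```haskell
-- Estimate the machine epsilon for Double: the smallest power of two
-- eps such that 1 + eps/2 still rounds back to exactly 1. For IEEE-754
-- binary64 this halving search converges to 2^-52.
epsilonD :: Double
epsilonD = go 1.0
  where
    go e
      | 1.0 + e / 2 == 1.0 = e
      | otherwise          = go (e / 2)

main :: IO ()
main = print epsilonD
```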

pierreetienne.meunier:
Hello Cafe,
Being a complete beginner in the field of numerical analysis, but needing it anyway to solve "real problems", I recently wrote a few functions to solve systems of polynomial equations using the "projected polyhedron" method by Maekawa and Patrikalakis. This requires solving systems of linear equations precisely, so the simple Gauss method was not enough, and I also had to write an algorithm for SVD decomposition.
- The current Array library is definitely not suited to production code. It makes debugging tricky, requires heavy use of Debug.Trace to see what actually happens, and does not seem as fast as one could expect.
[snip]
- A numerical analysis library should really take advantage of the parallelism in GHC, especially with the arrival of hardware such as Fermi (though I do not know how much Haskell is compilable to Fermi code). The love for loops and side effects in the numerical community is hard to understand, but that's more of a cultural problem.
Perhaps you can look at the new array packages of the last few years:
* vector
An efficient implementation of Int-indexed arrays (both mutable and immutable), with a powerful loop fusion optimization framework.
http://hackage.haskell.org/package/vector
* Repa
High performance, regular, shape polymorphic parallel arrays.
http://hackage.haskell.org/package/repa
-- Don

Perhaps you can look at the new array packages of the last few years:
* vector
An efficient implementation of Int-indexed arrays (both mutable and immutable), with a powerful loop fusion optimization framework.
http://hackage.haskell.org/package/vector
* Repa
High performance, regular, shape polymorphic parallel arrays.
Indeed... looks cool! I suppose I'll have to rewrite a few things. Do you know why they aren't (yet?) integrated into the hierarchical libraries?

pierreetienne.meunier:
Perhaps you can look at the new array packages of the last few years:
* vector
An efficient implementation of Int-indexed arrays (both mutable and immutable), with a powerful loop fusion optimization framework.
http://hackage.haskell.org/package/vector
* Repa
High performance, regular, shape polymorphic parallel arrays.
Indeed... looks cool! I suppose I'll have to rewrite a few things. Do you know why they aren't (yet?) integrated into the hierarchical libraries?
Into the libraries distributed with the Haskell Platform, you mean? Because no one has proposed this! http://trac.haskell.org/haskell-platform/wiki/AddingPackages

I've also just noticed a gap in the vector library: multidimensional arrays seem to require indirections like in Caml, whereas in C or in Data.Ix there is a way to avoid this. This is especially important for avoiding cache misses with many dimensions, as well as for providing a clean interface. For instance, if a 10x10 matrix is initialized improperly, like

  Data.Vector.replicate 10 $ Data.Vector.replicate 10 0

the result is a total mess. Surely every programmer knows that a computer has memory and that this memory has to be allocated, but from what I understand of Haskell, I would expect the interface and the RTS to do it for me. And an integer multiplication followed by an addition is way cheaper than accessing uncached memory. Or do pipelines, hyperthreading and all that give you the same result anyway?

On 15/05/2010, at 20:02, Don Stewart wrote:
pierreetienne.meunier:
Perhaps you can look at the new array packages of the last few years:
* vector
An efficient implementation of Int-indexed arrays (both mutable and immutable), with a powerful loop fusion optimization framework.
http://hackage.haskell.org/package/vector
* Repa
High performance, regular, shape polymorphic parallel arrays.
Indeed... looks cool! I suppose I'll have to rewrite a few things. Do you know why they aren't (yet?) integrated into the hierarchical libraries?
Into the libraries distributed with the Haskell Platform, you mean? Because no one has proposed this!
http://trac.haskell.org/haskell-platform/wiki/AddingPackages

On 16/05/2010, at 10:17, Pierre-Etienne Meunier wrote:
I've also just noticed a gap in the vector library: multidimensional arrays seem to require indirections like in Caml, whereas in C or in Data.Ix there is a way to avoid this. This is especially important for avoiding cache misses with many dimensions, as well as for providing a clean interface. For instance, if a 10x10 matrix is initialized improperly, like
Data.Vector.replicate 10 $ Data.Vector.replicate 10 0
The result is a total mess. Surely every programmer knows that a computer has memory and that this memory has to be allocated, but from what I understand of Haskell, I would expect the interface and the RTS to do it for me. And an integer multiplication followed by an addition is way cheaper than accessing uncached memory. Or do pipelines, hyperthreading and all that give you the same result anyway?
You are quite right that vector only supports nested arrays but not multidimensional ones. This is by design, however - the library's only goal is to provide efficient one-dimensional, Int-indexed arrays. I'm thinking about how to implement multidimensional arrays on top of vector but it's not that easy. While repa is a step in that direction, I also need to support mutable arrays and interoperability with C, which complicates things immensely.

That said, if all you need is a matrix, it's quite easy to implement the necessary index calculations yourself. Also, since you are working with numerics, I highly recommend that you use either Data.Vector.Unboxed or Data.Vector.Storable instead of Data.Vector, as boxing tends to be prohibitively expensive in this domain.

Roman
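[Editor's note: Roman's suggestion of doing the index calculations by hand amounts to very little code. A minimal sketch follows; the names `Matrix`, `at` and `fromLists` are ours, and a plain list stands in for the flat storage to keep the sketch dependency-free, where real code would use a Data.Vector.Unboxed.Vector Double:]

```haskell
-- A row-major matrix stored flat: element (i,j) lives at offset
-- i * nCols + j, so a lookup costs one multiply and one add instead
-- of a pointer chase through nested arrays.
data Matrix = Matrix
  { nRows :: Int
  , nCols :: Int
  , body  :: [Double]  -- stand-in for an unboxed vector
  }

at :: Matrix -> Int -> Int -> Double
at (Matrix _ nc xs) i j = xs !! (i * nc + j)

-- Build from a list of rows (assumes all rows have equal length).
fromLists :: [[Double]] -> Matrix
fromLists xss = Matrix (length xss) (length (head xss)) (concat xss)

main :: IO ()
main = print (at (fromLists [[1,2],[3,4]]) 1 0)
```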

You are quite right that vector only supports nested arrays but not multidimensional ones. This is by design, however - the library's only goal is to provide efficient one-dimensional, Int-indexed arrays. I'm thinking about how to implement multidimensional arrays on top of vector but it's not that easy. While repa is a step in that direction, I also need to support mutable arrays and interoperability with C which complicates things immensely.
I understand. What complicates it even more (at least as I imagine it) is that C uses the same syntax for multidimensional and nested arrays, and I do not believe that GHC's FFI allows for array types such as "int x[19][3]". Data.Ix allows indexing arrays with (Int,Int), for instance, but with costly index conversions, such as constructing lists in memory. Also, sometimes the user of these libraries may find a better bijection between N and N^2: better in the sense that, for his particular problem, a slightly more complicated arithmetic sequence of operations for computing the index would reduce cache misses further. Would it be possible, however, to generate a default bijection using preprocessing (Template Haskell?)?
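[Editor's note: one concrete example of a non-standard bijection of the kind mentioned above is the Z-order (Morton) curve, which interleaves the bits of the two indices so that nearby (i, j) pairs tend to land near each other in memory; it is sometimes used to improve the cache behaviour of blocked algorithms. A hedged sketch for indices below 2^16; the name `morton` is ours:]

```haskell
import Data.Bits ((.|.), (.&.), shiftL, shiftR)
import Data.List (foldl')

-- Z-order (Morton) index: interleave the bits of i and j.
-- Bit k of i goes to output position 2k, bit k of j to 2k+1.
morton :: Int -> Int -> Int
morton i j = foldl' step 0 [0 .. 15]
  where
    step acc k =
      acc .|. (((i `shiftR` k) .&. 1) `shiftL` (2 * k))
          .|. (((j `shiftR` k) .&. 1) `shiftL` (2 * k + 1))

main :: IO ()
main = print (map (uncurry morton) [(0,0), (1,0), (0,1), (1,1), (2,3)])
```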
That said, if all you need is a matrix it's quite easy to implement the necessary index calculations yourself. Also, since you are working with numerics I highly recommend that you use either Data.Vector.Unboxed or Data.Vector.Storable instead of Data.Vector as boxing tends to be prohibitively expensive in this domain.
I'm actually thinking about rewriting parts of my code now! It seems indeed that, given how the algorithms are specified for instance in "Numerical Recipes" or the papers in its bibliography, unboxing is the only option, since there is no such thing as "boxing" in Cobol and Fortran ;-) I was also wondering how to do linear algebra: an infinite number of types would be needed to express all the constraints on matrix multiplication, since we need types such as "array of size m * n". Is there a way to generate these automatically, with for instance Template Haskell (again! But I know nothing of Template Haskell either, sorry!)? Cheers, PE

Pierre-Etienne Meunier wrote:
I was also wondering how to do linear algebra: an infinite number of types would be needed to express all the constraints on matrix multiplication, since we need types such as "array of size m * n". Is there a way to generate these automatically
This is already done for you in the package hmatrix-static on Hackage.

Erik
--
Erik de Castro Lopo
http://www.mega-nerd.com/

On 17/05/2010, at 02:52, Pierre-Etienne Meunier wrote:
You are quite right that vector only supports nested arrays but not multidimensional ones. This is by design, however - the library's only goal is to provide efficient one-dimensional, Int-indexed arrays. I'm thinking about how to implement multidimensional arrays on top of vector but it's not that easy. While repa is a step in that direction, I also need to support mutable arrays and interoperability with C which complicates things immensely.
I understand. What complicates it even more (at least as I imagine it) is that C uses the same syntax for multidimensional and nested arrays, and I do not believe that GHC's FFI allows for array types such as "int x[19][3]".
Actually, it does, since an argument of that type is equivalent to int (*x)[3]. FWIW, I always say "nested array" when I mean that the individual subarrays can have different lengths, as opposed to multidimensional ones where they are all the same. So the former are similar to int *x[].
I was also wondering how to do linear algebra: an infinite number of types would be needed to express all the constraints on matrix multiplication, since we need types such as "array of size m * n". Is there a way to generate these automatically, with for instance Template Haskell (again! But I know nothing of Template Haskell either, sorry!)?
Encoding the bounds in the type system is possible but rather messy. In general, simply giving the array indices of type (Int,Int) and doing dynamic bounds checks when necessary seems to work best in Haskell. Roman
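[Editor's note: for completeness, here is roughly what the "possible but rather messy" type-level encoding looks like with plain phantom types, no Template Haskell required. All names are ours, and the shapes still have to be checked at construction time; only the composition of operations is checked statically:]

```haskell
{-# LANGUAGE EmptyDataDecls #-}
import Data.List (transpose)

-- Type-level size tags: one empty type per dimension we care about.
data N2
data N3

-- A matrix whose dimensions are phantom type parameters: the lists
-- carry the data, the types carry the shape.
newtype Mat m n = Mat [[Double]]

-- Multiplication only type-checks when the inner dimensions agree:
-- Mat m k * Mat k n = Mat m n. E.g. multiplying a Mat N2 N3 by
-- another Mat N2 N3 is rejected at compile time.
mul :: Mat m k -> Mat k n -> Mat m n
mul (Mat a) (Mat b) =
  Mat [ [ sum (zipWith (*) row col) | col <- transpose b ] | row <- a ]

main :: IO ()
main =
  let a = Mat [[1,2,3],[4,5,6]]   :: Mat N2 N3
      b = Mat [[1,0],[0,1],[1,1]] :: Mat N3 N2
      Mat c = mul a b             -- result has type Mat N2 N2
  in print c
```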

On May 16, 2010, at 4:51 AM, Roman Leshchinskiy wrote:
You are quite right that vector only supports nested arrays but not multidimensional ones. This is by design, however - the library's only goal is to provide efficient one-dimensional, Int-indexed arrays. I'm thinking about how to implement multidimensional arrays on top of vector but it's not that easy. While repa is a step in that direction, I also need to support mutable arrays and interoperability with C which complicates things immensely.
Indeed; I eventually decided to hand-roll my own package for dealing with N-dimensional arrays using StorableArray under the hood, since I needed to be able to pass them to and from Fortran routines. It has some nice features such as arbitrary type-safe cuts and slicing (i.e., you can take index 1 of the first dimension and every 3rd index between 2 and 9 of the third dimension, and it will return a new array with a type that has one fewer dimension), folding, converting to/from lists, mutable and immutable versions, and direct pointer access --- which is just the amount of functionality I needed for my purposes. The problem (and part of the reason I haven't gotten around to uploading it to Hackage) is that the package is not geared towards performing efficient computations in Haskell, because I have been writing the numeric routines themselves in Fortran; its only real purpose was to let me examine the arrays in Haskell code.

As an aside, while there are advantages to writing numerical analysis routines in Haskell, it might be a better strategy to instead link in something like LAPACK and provide nice wrappers to it in Haskell, since this way you can harness the work of the experts who have spent a lot of time perfecting their code rather than re-inventing the wheel. One downside of this, though, is that the LAPACK routines only achieve parallelism by making extensive use of Level 3 BLAS routines whenever possible and assuming that they are heavily optimized and parallelized (which they are), so there *might* be cases where a pure Haskell implementation could out-perform them by exploiting even more opportunities for parallelism within the algorithm than those provided by the BLAS calls. Another downside is that using LAPACK requires the arrays to be in pinned memory --- though only for the duration of the LAPACK call.
Finally, I have been experiencing problems linking Fortran and Haskell code on OS X for reasons that I don't understand: I have no problem on my Linux machine, I have made sure that all the code is compiled for the 32-bit architecture on my OS X machine, and linking C and Fortran programs on the OS X machine does not cause any problems. Cheers, Greg

On 17/05/2010, at 05:17, Gregory Crosswhite wrote:
As an aside, while there are advantages to writing numerical analysis routines in Haskell, it might be a better strategy to instead link in something like LAPACK and provide nice wrappers to it in Haskell, since this way you can harness the work of the experts who have spent a lot of time perfecting their code rather than re-inventing the wheel.
I don't think this is an either/or question. A good array library ought to provide BLAS, LAPACK, FFTW etc. bindings *and* allow writing high-performance code in pure Haskell. I haven't implemented any of these bindings for vector only because I'm still deciding what to do with multidimensional arrays. Roman
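[Editor's note: in support of the pure-Haskell side of that argument, GHC's spark primitives are already exposed from base via GHC.Conc, so nothing beyond the compiler is needed to experiment. A toy sketch; the function name is ours, and an actual speedup requires compiling with -threaded and running with +RTS -N:]

```haskell
import GHC.Conc (par, pseq)

-- Sum a list by sparking evaluation of the first half while the
-- current thread evaluates the second half; `pseq` orders evaluation
-- so the spark has a chance to run before the final addition.
parSum :: [Double] -> Double
parSum xs = l `par` (r `pseq` (l + r))
  where
    (as, bs) = splitAt (length xs `div` 2) xs
    l = sum as
    r = sum bs

main :: IO ()
main = print (parSum [1 .. 1000])
```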

Oh, I agree that it would be really nice to have a way to write high-performance code in pure Haskell --- it would be nice if I didn't have to drop to Fortran anymore just because it makes it easier to write high-performance numeric code! My only point was that it might not be worthwhile to spend too much time writing routines in Haskell from scratch to implement things like singular value decompositions, when routines to do this already exist in LAPACK and are probably superior to anything we could roll ourselves --- though on the other hand, the act of implementing an SVD algorithm might be useful for giving guidance on the kind of high-level operations that are needed in the library.

Cheers, Greg

On May 17, 2010, at 8:01 PM, Roman Leshchinskiy wrote:
On 17/05/2010, at 05:17, Gregory Crosswhite wrote:
As an aside, while there are advantages to writing numerical analysis routines in Haskell, it might be a better strategy to instead link in something like LAPACK and provide nice wrappers to it in Haskell, since this way you can harness the work of the experts who have spent a lot of time perfecting their code rather than re-inventing the wheel.
I don't think this is an either/or question. A good array library ought to provide BLAS, LAPACK, FFTW etc. bindings *and* allow writing high-performance code in pure Haskell. I haven't implemented any of these bindings for vector only because I'm still deciding what to do with multidimensional arrays.
Roman

Pierre-Etienne Meunier
Indeed... looks cool! I suppose I'll have to rewrite a few things. Do you know why they aren't (yet?) integrated into the hierarchical libraries?
What do you mean by this? If you're asking why they're not the default, it's because they're still relatively new (well, repa is _really_ new), are only now starting to be widely used, and probably use a lot of GHC-isms.

--
Ivan Lazar Miljenovic
Ivan.Miljenovic@gmail.com
IvanMiljenovic.wordpress.com

SAGE is the kind of thing that I dreamed of having available online a few years ago.

Recoding everything in Haskell is perhaps not worth the pain, but it would be nice to do something similar to SAGE in an advanced environment such as Google Wave, with all the collaborative facilities for free. Perhaps this would attract some developers. The utility of the whole thing might encourage the integration of existing Haskell math software and the translation/integration of foreign code.

Cheers
2010/5/16 Pierre-Etienne Meunier
Hello Cafe,
Being a complete beginner in the field of numerical analysis, but needing it anyway to solve "real problems", I recently wrote a few functions to solve systems of polynomial equations using the "projected polyhedron" method by Maekawa and Patrikalakis. This requires solving systems of linear equations precisely, so the simple Gauss method was not enough, and I also had to write an algorithm for SVD decomposition.
Upon discovering the Algol / Fortran specifications of these :-( , heavily published in important journals, I thought it would be nice to provide the world with fast, reliable implementations of these numerical methods (i.e. not simply bindings to LAPACK). Moreover, writing numerical things in Haskell is much more pleasant than I thought at first. Here are a few random thoughts on this:
- The Haskell 98 standard does not require enough about IEEE-754 compliance, so C bindings are still needed to determine, for instance, the machine epsilon, or to manipulate ULPs. Moreover, taking advantage of hardware rounding is not easy even if the hardware is IEEE-compliant: calling a C function from Haskell cancels the speed advantage of hardware rounding, for instance. Maybe the new LLVM backend will make this possible?
- The current Array library is definitely not suited to production code. It makes debugging tricky, requires heavy use of Debug.Trace to see what actually happens, and does not seem as fast as one could expect. It seems that each algebra library on Hackage redefines part of it; a unified version would be nice, and a discussion within the Haskell community seems to be needed...
- A numerical analysis library should really take advantage of the parallelism in GHC, especially with the arrival of hardware such as Fermi (though I do not know how much Haskell is compilable to Fermi code). The love for loops and side effects in the numerical community is hard to understand, but that's more of a cultural problem.
Finally, as stated by William Stein, the creator of SAGE: of course it would take thousands of man-years to rewrite these codes in Python, but if a language like Haskell and its compiler do 90% of the work, how many man-years are left?
If anyone here has the time, the team and the will to start such a project, I'd love to contribute!
Cheers, PE
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
participants (7)

- Alberto G. Corona
- Don Stewart
- Erik de Castro Lopo
- Gregory Crosswhite
- Ivan Lazar Miljenovic
- Pierre-Etienne Meunier
- Roman Leshchinskiy