Haskell, C and Matrix Multiplication

Dear Haskellers,

I thought I'd take some time to share something of my weekend with you all. Not because of anything new, but because it is a feel-good example of Haskell being a win. A good read for Monday morning, perhaps.

What feels like decades ago, I used to work in computer graphics. Mostly my work was on software for the amateur/semi-professional film industry, developing various filters and effects for the likes of Adobe After Effects and so on.

I have not touched computer graphics from a programming perspective for quite a long time now, and I thought it was about time I flexed those old muscles. So, I have taken it upon myself to write an SVG renderer for Blender.

To cut what is going to be a very long and humdrum story short, I needed to write some matrix arithmetic code. Having not done this in a long time, I thought I'd see if I could still remember what matrix multiplication is by writing a function to multiply two 4x4 matrices. I thought I'd write one in C and then one in Haskell, and see how long they took and how easy they were to write.

Here is the one in C:

void multiplyM4 (float m1[4][4], float m2[4][4], float m3[4][4])
{
  m1[0][0] = m2[0][0] * m3[0][0] + m2[0][1] * m3[1][0] + m2[0][2] * m3[2][0] + m2[0][3] * m3[3][0];
  m1[0][1] = m2[0][0] * m3[0][1] + m2[0][1] * m3[1][1] + m2[0][2] * m3[2][1] + m2[0][3] * m3[3][1];
  m1[0][2] = m2[0][0] * m3[0][2] + m2[0][1] * m3[1][2] + m2[0][2] * m3[2][2] + m2[0][3] * m3[3][2];
  m1[0][3] = m2[0][0] * m3[0][3] + m2[0][1] * m3[1][3] + m2[0][2] * m3[2][3] + m2[0][3] * m3[3][3];
  m1[1][0] = m2[1][0] * m3[0][0] + m2[1][1] * m3[1][0] + m2[1][2] * m3[2][0] + m2[1][3] * m3[3][0];
  m1[1][1] = m2[1][0] * m3[0][1] + m2[1][1] * m3[1][1] + m2[1][2] * m3[2][1] + m2[1][3] * m3[3][1];
  m1[1][2] = m2[1][0] * m3[0][2] + m2[1][1] * m3[1][2] + m2[1][2] * m3[2][2] + m2[1][3] * m3[3][2];
  m1[1][3] = m2[1][0] * m3[0][3] + m2[1][1] * m3[1][3] + m2[1][2] * m3[2][3] + m2[1][3] * m3[3][3];
  m1[2][0] = m2[2][0] * m3[0][0] + m2[2][1] * m3[1][0] + m2[2][2] * m3[2][0] + m2[2][3] * m3[3][0];
  m1[2][1] = m2[2][0] * m3[0][1] + m2[2][1] * m3[1][1] + m2[2][2] * m3[2][1] + m2[2][3] * m3[3][1];
  m1[2][2] = m2[2][0] * m3[0][2] + m2[2][1] * m3[1][2] + m2[2][2] * m3[2][2] + m2[2][3] * m3[3][2];
  m1[2][3] = m2[2][0] * m3[0][3] + m2[2][1] * m3[1][3] + m2[2][2] * m3[2][3] + m2[2][3] * m3[3][3];
  m1[3][0] = m2[3][0] * m3[0][0] + m2[3][1] * m3[1][0] + m2[3][2] * m3[2][0] + m2[3][3] * m3[3][0];
  m1[3][1] = m2[3][0] * m3[0][1] + m2[3][1] * m3[1][1] + m2[3][2] * m3[2][1] + m2[3][3] * m3[3][1];
  m1[3][2] = m2[3][0] * m3[0][2] + m2[3][1] * m3[1][2] + m2[3][2] * m3[2][2] + m2[3][3] * m3[3][2];
  m1[3][3] = m2[3][0] * m3[0][3] + m2[3][1] * m3[1][3] + m2[3][2] * m3[2][3] + m2[3][3] * m3[3][3];
}

Urgh! This took me over 20 minutes to write out, and a further 34 minutes of debugging. That's 54 minutes to write a simple function! This included 23 invocations of the compiler and 18 invocations of the test program. I was disgusted with myself! I should have automated the writing of it with Haskell!

The reason for this was mostly that I kept pressing the wrong buttons when writing those many, many array indices. I became blind to the numbers in the end, and it became one of those nightmare debugging sessions where you check your printMatrix function and other stupid things. I'd like to say that I was a bit tired, but that's just an excuse.

Some points about the C version:
1. It's only 21 lines of code.
2. It took me 54 minutes to write and debug. Therefore I can write eight of these in a standard working day. Hmm.
3. It's written in C, so everybody and their dog understands it completely. Honest.
4. It's written in C, so it can just be pasted into nearly any other language (because they're all just augmented C). Therefore no one needs to understand it.
5. It does not consume much space (in terms of memory).
6. It's pretty fast (being written in C): performing 100,000,000 matrix multiples takes on average 6.117s (with gcc -O2).
7. Passing a NULL pointer anywhere will cause it to explode.
8. If the pointers are not to arrays of 16 floats, it will trash memory (seeing as it destructively updates one of its arguments).
9. It will only operate on 4x4 float matrices.
10. How am I storing matrices again...? Column major?

So, after drinking some coffee and having a smoke, I started the Haskell version:

[foldl (+) 0 $ zipWith (*) x y | x <- m1, y <- transpose m2]

Ah. So... anyway, some points about the Haskell version:
1. It's only one line of code - that's 20 fewer than the C version.
2. It took me just over five minutes to think about and to code. That's a saving of 49 minutes (1180% or something). It compiled straight away and worked first time. I can write too many of these in a day.
3. It's laughably clear and concise. Anyone can easily tell what I'm doing.
4. It does *not* operate in constant space. I think. I can't tell. I don't know how to tell. Maybe it does.
5. It's very fast (being written in Haskell): performing 100,000,000 matrix multiples takes on average 1.435s (with ghc -O2).
6. Passing in empty lists will not crash it.
7. It will operate on square matrices of any size (3x3, 4x4...).
8. It will operate on a matrix of any type that is of class Num.

The Haskell version seems to win on all counts.

Interestingly enough (to some at least), other matrix operations also become trivial when expressed in Haskell:

The determinant is a bit fiddly, but quite trivial. However, it is woefully slow at O(n!). I need to make use of LU decomposition (in which the determinant is the sum of the diagonal values in U).

The adjoint is even more of a fiddle, but actually a one-liner after reusing bits of the determinant function (including the determinant function).

Matrix inversion is easy in one line, but dependent on the determinant and the adjoint, therefore also woefully slow.

Just another example of Haskell being awesome :)

Now over to you guys! What experiences have you had which make you glad you use Haskell? Anything make you smile recently?

- Blake
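Blake's remark above that the determinant is "quite trivial" but O(n!) can be made concrete. Below is a minimal sketch (my own illustration, not code from the thread; the names det and minorM are mine) of a naive Laplace-expansion determinant over the same list-of-lists representation:

```haskell
-- Naive determinant by Laplace (cofactor) expansion along the first row.
-- Works on the same [[a]] representation as the mult one-liner; O(n!).
det :: Num a => [[a]] -> a
det [[x]] = x
det m = sum [ (-1) ^ i * x * det (minorM i m) | (i, x) <- zip [0 ..] (head m) ]

-- Delete row 0 and column j to form the minor used in the expansion.
minorM :: Int -> [[a]] -> [[a]]
minorM j m = [ take j row ++ drop (j + 1) row | row <- tail m ]

main :: IO ()
main = print (det [[2, 0], [0, 3]])  -- prints 6
```

The factorial blow-up is visible in the recursion: each n x n determinant spawns n determinants of size (n-1) x (n-1).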

Blake Rain writes:
Here is the one in C:
void multiplyM4 (float m1[4][4], float m2[4][4], float m3[4][4]) { m1[0][0] = m2[0][0] * m3[0][0] + m2[0][1] * m3[1][0] + m2[0][2] * m3[2][0] + m2[0][3] * m3[3][0]; m1[0][1] = m2[0][0] * m3[0][1] + m2[0][1] * m3[1][1] + m2[0][2] * m3[2][1] + m2[0][3] * m3[3][1]; (etc.)
Some points about the C version: [...] 6. It's pretty fast (being written in C):
Performing 100,000,000 matrix multiples takes on average 6.117s (with gcc -O2).
Haskell version:
[foldl (+) 0 $ zipWith (*) x y | x <- m1, y <- transpose m2] [...] 5. It's very fast (being written in Haskell):
Performing 100,000,000 matrix multiples takes on average 1.435s (with ghc -O2).
This is surprising: a C program doing an explicit list of arithmetic operations is beaten by a factor of 4 or so by a polymorphic, lazy, list-consuming Haskell program? Are you sure? What am I missing?
But, regardless of performance, I think this is a nice exposition of some of Haskell's strengths.
-k
--
If I haven't seen further, it is by standing in the footprints of giants

Ketil Malde wrote:
Blake Rain writes:
Here is the one in C:
void multiplyM4 (float m1[4][4], float m2[4][4], float m3[4][4]) { m1[0][0] = m2[0][0] * m3[0][0] + m2[0][1] * m3[1][0] + m2[0][2] * m3[2][0] + m2[0][3] * m3[3][0]; m1[0][1] = m2[0][0] * m3[0][1] + m2[0][1] * m3[1][1] + m2[0][2] * m3[2][1] + m2[0][3] * m3[3][1]; (etc.)
Some points about the C version: [...] 6. It's pretty fast (being written in C):
Performing 100,000,000 matrix multiples takes on average 6.117s (with gcc -O2).
Haskell version:
[foldl (+) 0 $ zipWith (*) x y | x <- m1, y <- transpose m2] [...] 5. It's very fast (being written in Haskell):
Performing 100,000,000 matrix multiples takes on average 1.435s (with ghc -O2).
This is surprising: a C program doing an explicit list of arithmetic operations is beaten by a factor of 4 or so by a polymorphic, lazy, list-consuming Haskell program? Are you sure? What am I missing?
Constant folding? The numbers certainly depend on the kind of test program used.

Sorry, but last time I've checked, C did have loops, is that correct? And even if you don't want loops, there is a preprocessor.
On 17.01.2011 10:45, Blake Rain wrote:
[snip]
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Oh yeah, it does. You're right. I haven't used it in so long that I forgot about them! No need to be sorry. Well, +2 for C, eh? :) And I'd just make a mess if I used the preprocessor.
On Mon, 2011-01-17 at 11:23 +0300, Miguel Mitrofanov wrote:
Sorry, but last time I've checked, C did have loops, is that correct? And even if you don't want loops, there is a preprocessor.
On 17.01.2011 10:45, Blake Rain wrote:
Dear Haskellers,

On Mon, 17 Jan 2011 07:45:04 +0000, Blake Rain wrote:
So, after drinking some coffee and having a smoke, I started the Haskell version:
[foldl (+) 0 $ zipWith (*) x y | x <- m1, y <- transpose m2]
I don't think this is correct; its type is
(Num a) => [[a]] -> [[a]] -> [a]
rather than the expected
(Num a) => [[a]] -> [[a]] -> [[a]]
How about:
mult m1 m2 = [[foldl (+) 0 $ zipWith (*) x y | y <- transpose m2] | x <- m1]
Regarding performance: did you make sure you're forcing the evaluation of the result matrix?
Regards,
Pedro
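Pedro's suggested definition does have the expected type, and it can be sanity-checked quickly. A minimal sketch (the identity helper and the test values are mine, not from the thread):

```haskell
import Data.List (transpose)

-- Pedro's corrected multiplication: row i of the result pairs row x of m1
-- with each column y of m2 (columns obtained via transpose).
mult :: Num a => [[a]] -> [[a]] -> [[a]]
mult m1 m2 = [ [ foldl (+) 0 $ zipWith (*) x y | y <- transpose m2 ] | x <- m1 ]

-- n x n identity matrix, for checking left/right identities.
identity :: Num a => Int -> [[a]]
identity n = [ [ if i == j then 1 else 0 | j <- [1 .. n] ] | i <- [1 .. n] ]

main :: IO ()
main = do
  let a = [[1, 2], [3, 4]] :: [[Int]]
  print (mult a (identity 2) == a)   -- prints True (right identity)
  print (mult (identity 2) a == a)   -- prints True (left identity)
  print (mult a [[5, 6], [7, 8]])    -- prints [[19,22],[43,50]]
```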

On Mon, 2011-01-17 at 10:13 +0000, Pedro Vasconcelos wrote:
On Mon, 17 Jan 2011 07:45:04 +0000, Blake Rain wrote:
So, after drinking some coffee and having a smoke, I started the Haskell version:
[foldl (+) 0 $ zipWith (*) x y | x <- m1, y <- transpose m2]
I don't think this is correct; its type is
(Num a) => [[a]] -> [[a]] -> [a]
rather than the expected
(Num a) => [[a]] -> [[a]] -> [[a]]
How about:
mult m1 m2 = [[foldl (+) 0 $ zipWith (*) x y | y <- transpose m2] | x <- m1]
So sorry, I meant:
mult :: (Num a) => [[a]] -> [[a]] -> [a]
mult m1 m2 = [foldl (+) 0 $ zipWith (*) x y | x <- m1, y <- transpose m2]
Lack of sleep.
Regarding performance: did you make sure you're forcing the evaluation of the result matrix?
I thought I was. It's printing the results as expected. Could be I'm just imagining I am and suffering a delusion or misunderstanding.
Regards,
Pedro

On Mon, 17 Jan 2011 10:38:30 +0000, Blake Rain wrote:
So sorry, I meant:
mult :: (Num a) => [[a]] -> [[a]] -> [a]
mult m1 m2 = [foldl (+) 0 $ zipWith (*) x y | x <- m1, y <- transpose m2]
Shouldn't the result of multiplying an (n,k)-matrix by a (k,m)-matrix be an (n,m)-matrix? The way you've written it, the result will be a flat list of length n*m.
Regarding performance: did you make sure you're forcing the evaluation of the result matrix?
I thought I was. It's printing the results as expected. Could be I'm just imagining I am and suffering a delusion or misunderstanding.
Another issue to look out for is let-floating: if you perform the same matrix multiplication repeatedly, the compiler might very well lift the matrix calculation outside the loop.
Pedro
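Both of Pedro's caveats (forcing the result, and let-floating of a loop-invariant computation) can be sketched in a small benchmark harness. This is my own illustration, not code from the thread; the helper name forceM is mine:

```haskell
import Data.List (foldl', transpose)

mult :: Num a => [[a]] -> [[a]] -> [[a]]
mult m1 m2 = [ [ foldl (+) 0 $ zipWith (*) x y | y <- transpose m2 ] | x <- m1 ]

-- Reduce the whole result matrix to one number; demanding this value forces
-- every element, so the multiplications cannot remain unevaluated thunks.
forceM :: Num a => [[a]] -> a
forceM = sum . map sum

main :: IO ()
main = do
  let a = [[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]] :: [[Double]]
  -- One operand depends on the loop index k, so the compiler cannot
  -- let-float a single mult call out of the loop as a constant.
  print (foldl' (\acc k -> acc + forceM (mult a (map (map (+ k)) a))) 0 [1 .. 1000])
```

A benchmark that multiplies the same two constant matrices 100,000,000 times risks timing one multiplication plus a loop, which could explain a suspiciously fast figure.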

Blake Rain wrote:
Determinant is a bit fiddly, but quite trivial. However it is woefully slow at O(n!). I need to make use of LU decomposition (in which the determinant is the sum of the diagonal values in U).
The determinant is the product of the diagonal values, not the sum.
Adjoint is even more of a fiddle, but actually a one-liner after reusing bits of the determinant function (including the determinant function).
Isn't Adjoint just 'transpose' followed by complex conjugate?
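To make the diagonal-product point concrete: the determinant can be read off from the U of an LU factorisation as the product of the pivots. Below is a rough sketch of mine (not code from the thread; the name detLU is mine) via Gaussian elimination on the list representation; it does no pivoting, so it simply gives up on a zero pivot:

```haskell
-- Determinant via Gaussian elimination: eliminate the first column,
-- recurse on the remaining (n-1) x (n-1) block, and multiply the pivots.
detLU :: (Fractional a, Eq a) => [[a]] -> a
detLU [] = 1
detLU m@((p : _) : _)
  | p == 0    = error "zero pivot: a real version would do partial pivoting"
  | otherwise =
      p * detLU [ zipWith (\u r -> r - (head row / p) * u)
                    (tail (head m)) (tail row)
                | row <- tail m ]

main :: IO ()
main = print (detLU ([[2, 1], [4, 5]] :: [[Double]]))  -- prints 6.0
```

This runs in O(n^3) rather than the O(n!) of cofactor expansion, which is the speed-up Blake was after.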

On Mon, Jan 17, 2011 at 12:50 PM, Henning Thielemann <lemming@henning-thielemann.de> wrote:
Blake Rain wrote:
Adjoint is even more of a fiddle, but actually a one-liner after
reusing bits of the determinant function (including the determinant function).
Isn't Adjoint just 'transpose' followed by complex conjugate?
I think he refers to the matrix of cofactors.
Francesco
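The matrix of cofactors Francesco mentions leads to the classical adjoint (adjugate): the transpose of the cofactor matrix. A sketch in the thread's list-of-lists style (my own illustration; the function names are mine, and the naive determinant keeps the O(n!) cost Blake mentioned):

```haskell
import Data.List (transpose)

-- Naive determinant by cofactor expansion along the first row; O(n!).
det :: Num a => [[a]] -> a
det [[x]] = x
det m = sum [ (-1) ^ j * x * det (minorM 0 j m) | (j, x) <- zip [0 ..] (head m) ]

-- The minor: the matrix with row i and column j removed.
minorM :: Int -> Int -> [[a]] -> [[a]]
minorM i j m = [ take j r ++ drop (j + 1) r | (k, r) <- zip [0 ..] m, k /= i ]

-- Cofactor matrix: signed determinants of all minors.
cofactors :: Num a => [[a]] -> [[a]]
cofactors m =
  [ [ (-1) ^ (i + j) * det (minorM i j m) | j <- [0 .. n - 1] ] | i <- [0 .. n - 1] ]
  where n = length m

-- Classical adjoint (adjugate) = transpose of the cofactor matrix.
adjugate :: Num a => [[a]] -> [[a]]
adjugate = transpose . cofactors

main :: IO ()
main = print (adjugate [[1, 2], [3, 4]])  -- prints [[4,-2],[-3,1]]
```

(For complex matrices, "adjoint" usually means the conjugate transpose, as Henning says; Blake's inversion formula uses the adjugate above.)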
participants (6)
- Blake Rain
- Francesco Guerrieri
- Henning Thielemann
- Ketil Malde
- Miguel Mitrofanov
- Pedro Vasconcelos