
The good news is that my code compiles without error, and much faster, under ghc 6.8.1. The bad news is that there appear to be subtle bugs that did not occur when I compiled things under 6.6.1. One complication is that my code is somewhat complex and links into a C library as well.
The new behavior is that under certain conditions a certain matrix inner product produces undefined floats that should not be there. If the code is executed inside any function it fails, but if the same code is re-executed at the ghci prompt it works. Here is the gist of the code that I'm running:
main = do
    ... lots of computations and let clauses
    -- get a submatrix
    viewMatbotk wstart nsua su 1 suw
    -- get another submatrix
    viewMatbotk 0 nsua arrstart npaths sua
    -- complex non conjugated inner product (multiply the two submatrices)
    mulCFtF mprint
If this is executed either in ghci as main or from a DOS prompt, I get a matrix filled with bad values, including a few that look like -1.#IND+1.87514i
If I recompile everything in ghc-6.6.1 it works like a charm. I made sure that I had deleted all the .o and .hi files. There is a DLL that contains a C library I link to via running dlltool.exe. If I print out all the inputs to the function viewMatbotk, then call it with them interactively in ghc 6.8.1 and call mulCFtF interactively, it works correctly. Both viewMatbotk and mulCFtF are C routines pulled in from the external library.
I am at a complete loss how to debug this, or how to pin down what exactly has changed between 6.6.1 and 6.8.1 that breaks this code so badly. This type of error stinks of some kind of memory issue, e.g. corrupted pointers. Any suggestions would be appreciated. Unfortunately the code base is rather involved and potentially proprietary, so I can't publish all of the details.
-- View this message in context: http://www.nabble.com/ghc-6.8.1-bug--tf4810375.html#a13763341
Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com.
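
One way to cross-check the foreign result is to compute the same product in plain Haskell on a small input and compare. The sketch below assumes that "non conjugated inner product" means A^T * B without complex conjugation; the Matrix type and the name mulTransposed are hypothetical and are not part of the poster's library.

```haskell
import Data.Complex (Complex((:+)))
import Data.List (transpose)

-- | Hypothetical stand-in for a dense matrix: a list of rows.
type Matrix = [[Complex Double]]

-- | Non-conjugated product A^T * B (an assumption about what mulCFtF computes):
-- entry (i, j) is the plain sum of products of column i of A with column j of B,
-- with no complex conjugation applied.
mulTransposed :: Matrix -> Matrix -> Matrix
mulTransposed a b =
  [ [ sum (zipWith (*) colA colB) | colB <- transpose b ]
  | colA <- transpose a ]

main :: IO ()
main = do
  let a = [[1 :+ 1, 2 :+ 0], [0 :+ 1, 1 :+ 0]]
      b = [[1 :+ 0, 0 :+ 1], [2 :+ 0, 1 :+ 1]]
  -- Print the reference result row by row for comparison with the C output.
  mapM_ print (mulTransposed a b)
```

A list-of-rows version like this is far too slow for real work, but it never crosses the FFI, so a disagreement with the C routine on the same small input would point at the call boundary rather than at the arithmetic.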

Hello,
I have had exactly the same problem with my bindings to GSL, BLAS and LAPACK. The foreign functions (!) randomly (but very frequently) produced NaN with ghc-6.8.1 -O. As usual, I first thought that I had a subtle bug related to the foreign pointers, but after a lot of refactoring, experiments, and tracing everything, I'm reasonably sure that memory is used safely. What I have found is that the same errors can be reproduced on ghc-6.6.1 with -O -fasm. So I tried -O -fvia-C on ghc-6.8.1 (where it is no longer the default) and apparently everything works well. So it seems that the FFI now requires an additional, explicit -fvia-C. In any case I don't know why -fasm produces those strange NaNs in precompiled foreign functions...
Alberto
On Thursday 15 November 2007 09:05, SevenThunders wrote:
The good news is that my code compiles without error and much faster under ghc 6.8.1. The bad news is that there appear to be subtle bugs that did not occur when I compiled things under 6.6.1. One issue is that my code is somewhat complex and links into a C library as well.
The new behavior is that under certain conditions a certain matrix inner product produces undefined floats, that should not be there. If the code is executed inside any function it fails but if the same code is reexecuted at the ghci prompt it works. Here is the gist of the code that I'm running
main = do
    ... lots of computations and let clauses
    -- get a submatrix
    viewMatbotk wstart nsua su 1 suw
    -- get another submatrix
    viewMatbotk 0 nsua arrstart npaths sua
    -- complex non conjugated inner product (multiply the two submatrices)
    mulCFtF mprint
If this is executed either in ghci as main or from a Dos prompt I get a matrix filled with bad values including a few that look like -1.#IND+1.87514i
If I recompile everything in ghc-6.6.1 it works like charm. I make sure that I have deleted all the .o and .hi files. There is a dll that contains a C library I link to via running dlltool.exe. If I print out all the function inputs to the function viewMatbotk and then call them interactively in ghc 6.8.1 and call mulCFtF interactively it works correctly. both viewMatbotk and mulCFtF are C routines pulled in from the external library.
I am at a complete loss how to debug this or how to pin down what exactly has changed between 6.6.1 and 6.8.1 that breaks this code so badly. This type of error stinks of some kind of memory issue, e.g. corrupted pointers. Any suggestions would be appreciated. Unfortunately the code base is rather involved and potentially proprietary so I can't publish all of the details.

I'm also seeing unusual behavior from GSL under ghc-6.8.1. I get a singular matrix error where there was none before, but if I prefix the function's rhs with "m `seq`", where m is the matrix in question, the error goes away. I'll try removing the seq and compiling with -fvia-C tomorrow to see if I can confirm that that makes the problem go away too. Certain inputs cause it to fail repeatably, while others do not fail; I'm not seeing random behavior like Alberto is. Strange indeed. -- Joel
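
For readers unfamiliar with the trick, here is a minimal sketch of what prefixing a right-hand side with "m `seq`" looks like. Everything in it is hypothetical: the Matrix type and determinant2 are plain-Haskell placeholders, not GSL bindings. Note that seq only forces its argument to weak head normal form, so it changes evaluation order (and the generated code) rather than fully evaluating the matrix.

```haskell
import Control.Exception (evaluate)

-- | Hypothetical matrix type standing in for a binding's matrix value.
newtype Matrix = Matrix [[Double]]

-- | Placeholder for a wrapper around a foreign routine; here it is plain Haskell.
determinant2 :: Matrix -> Double
determinant2 (Matrix [[a, b], [c, d]]) = a * d - b * c
determinant2 _ = error "determinant2: expected a 2x2 matrix"

-- | Same function, but the argument is forced to weak head normal form
-- before the body runs, mirroring the "m `seq`" prefix described above.
determinant2' :: Matrix -> Double
determinant2' m = m `seq` determinant2 m

main :: IO ()
main = do
  let m = Matrix [[1, 2], [3, 4]]
  -- evaluate forces the result inside IO, so the work happens at this point.
  d <- evaluate (determinant2' m)
  print d
```

If adding or removing such a seq makes a numerical error appear or vanish, that suggests the result depends on when things are evaluated, which is worth mentioning in a bug report.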

Ian Lynagh wrote:
Can any of you give us a testcase for this, please?
Thanks
Ian
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
I started to work on this, but so far it's been hard to shrink the size of the test case. It seems to need a little workout before it starts to exhibit the behavior. Also, there are always time limitations. I'll keep hacking away at it.

Alberto Ruiz-2 wrote:
Hello,
I have had exactly the same problem with my bindings to GSL, BLAS and LAPACK. The foreign functions (!) randomly (but very frequently) produced NaN with ghc-6.8.1 -O. As usual, I first thought that I had a subtle bug related to the foreign pointers, but after a lot of refactoring, experiments, and tracing everything, I'm reasonably sure that memory is safely used. What I have found is that the same errors can be reproduced on ghc-6.6.1 with -O -fasm. So I tried -O -fvia-C on ghc-6.8.1 (which now it is not the default) and apparently everything works well. So it seems that now the ffi requires and additional and explicit -fvia-C. In any case I don't know why -fasm produces those strange NaN in precompiled foreign functions...
Alberto
Arrgh, the fix of using -fvia-C doesn't seem to be working for me. You got my hopes up for a moment :). I also am calling BLAS via C bindings. I am going to try to distill my case down to the bare minimum if possible and then provide an example. It may take a while.

On Thursday 15 November 2007 19:58, SevenThunders wrote:
Arrgh, the fix of using -fvia-C doesn't seem to be working for me. You got my hopes up for a moment :). I also am calling BLAS via C bindings. I am going to try to distill my case down to the bare minimum if possible and then provide an example. It may take a while.
Hmm, I'm sorry... all seems to work well for me if I set -O -fvia-C both for building the library and for the final program compilation. But I will also try to find a minimal test case. In the meantime I have sent Ian information that exposes the problem with my library, although I know that such a large amount of code will not be very helpful. Have you tested ghc-6.6.1 with -O -fasm?

Alberto Ruiz-2 wrote:
Hmm, I'm sorry... all seems to work well for me if I set -O -fvia-C for building the library and for final program compilation. But I will also try to find a minimum test case. In the meantime I have sent to Ian information to expose the problem with my library, although I know that such large amount of code will not be very helpful.
Have you tested ghc-6.6.1 with -O -fasm?
Good idea. I just tried that. However, it worked just fine even with -fasm in ghc 6.6.1. One thought that I had as well was to be sure to recompile my C code that interfaces to Haskell using the HsFFI.h header from 6.8.1 instead of 6.6.1 just in case something might have changed. Unfortunately in my case it made no difference.

Alberto, SevenThunders, Joel,
Glark. This is not good. Thank you for being so polite about it. And thanks for working on a reproducible test case -- without that we are 100% stuck.
We did fix one just-possibly-related bug in 6.8 recently, which concerned the use of {-# UNPACK #-} on strict Double-sized fields in fixed, top-level data structures. I think it was only wrong on a 64-bit machine. http://www.haskell.org/pipermail/glasgow-haskell-users/2007-November/013454.... What is the word size on your machine?
But that may well be a complete red herring. We'll stand by.
Simon
| -----Original Message-----
| From: haskell-cafe-bounces@haskell.org [mailto:haskell-cafe-bounces@haskell.org] On Behalf Of Alberto Ruiz
| Sent: 15 November 2007 08:44
| To: haskell-cafe@haskell.org
| Subject: Re: [Haskell-cafe] ghc 6.8.1 bug?
|
| Hello,
|
| I have had exactly the same problem with my bindings to GSL, BLAS and LAPACK.
| The foreign functions (!) randomly (but very frequently) produced NaN with
| ghc-6.8.1 -O. As usual, I first thought that I had a subtle bug related to
| the foreign pointers, but after a lot of refactoring, experiments, and
| tracing everything, I'm reasonably sure that memory is safely used. What I
| have found is that the same errors can be reproduced on ghc-6.6.1
| with -O -fasm. So I tried -O -fvia-C on ghc-6.8.1 (which now it is not the
| default) and apparently everything works well. So it seems that now the ffi
| requires and additional and explicit -fvia-C. In any case I don't know
| why -fasm produces those strange NaN in precompiled foreign functions...
|
| Alberto
|
| On Thursday 15 November 2007 09:05, SevenThunders wrote:
| > The good news is that my code compiles without error and much faster under
| > ghc 6.8.1.
| > The bad news is that there appear to be subtle bugs that did not occur when
| > I compiled things under
| > 6.6.1. One issue is that my code is somewhat complex and links into a C
| > library as well.

Just out of curiosity, what LAPACK and BLAS implementation is causing problems? I have no idea if there is anything related, but I have been having similar-sounding problems with Python when using the latest ATLAS library on 64-bit Core 2 machines. I am beginning to suspect that there may be something wrong in ATLAS, but I don't have any definite evidence yet because the bug is also rather elusive here.
Michael.
On 16 Nov 2007, at 2:13 AM, Simon Peyton-Jones wrote:
Alberto, SevenThunders, Joel,
Glark. This is not good. Thank you for being so polite about it. And thanks for working on a reproducible test case -- without that we are 100% stuck.
We did fix one just-possibly-related bug in 6.8 recently, which concerned the use of {-# UNPACK #-} on strict Double-sized fields in fixed, top-level data structures. I think it was only wrong on a 64-bit machine. http://www.haskell.org/pipermail/glasgow-haskell-users/2007-November/013454.html What is the word size on your machine?
But that may well be a complete red herring. We'll stand by.
Simon
---------------------- Mailing address: Michael McNeil Forbes UW Dept. of Physics Box 351560 Seattle, WA, 98195-1560 For couriers: Physics/Astronomy Building, Room C121 3910 15th Ave NE Seattle, WA, 98195-1560 If you would like to visit me personally: Room B482 (Fourth floor) (206) 543-9754

Simon, I have only tested on 32-bit machines; I will try to test on 64-bit as well.
Michael, I have also observed strange ATLAS behavior. For example, I can make atlas3-sse2 segfault on big matrices (1000x1000) in Ubuntu 6.06 and 7.04, so I typically use atlas3-base. In fact, I found a similar problem: http://article.gmane.org/gmane.linux.debian.devel.bugs.general/323065 (However, atlas3-sse2 seems to work well in Ubuntu 7.10...)
But I have removed ATLAS and the problem persists even with the basic refblas3 and lapack available in my Ubuntu 6.06: lapack3-dev 3.0.2000531a-6ubuntu2. And I can even produce errors on GSL functions.
In my particular case all errors disappear if I set -O0 or -O -fvia-C. I will try to find a minimal test case exposing the problem with -fasm.
Alberto
On Friday 16 November 2007 11:30, Michael McNeil Forbes wrote:
Just out of curiosity, what LAPACK and BLAS implementation is causing problems? I have no idea if there is anything related, but I have been having similar sounding problems with python when using the latest ATLAS library on 64 bit core 2 machines. I am beginning to suspect that there may be something wrong in ATLAS, but I don't have any definite evidence yet because the bug is also rather elusive here.
Michael.

Simon Peyton-Jones wrote:
Alberto, SevenThunders, Joel,
Glark. This is not good. Thank you for being so polite about it. And thanks for working on a reproducible test case -- without that we are 100% stuck.
We did fix one just-possibly-related bug in 6.8 recently, which concerned the use of {-# UNPACK #-} on strict Double-sized fields in fixed, top-level data structures. I think it was only wrong on a 64-bit machine. http://www.haskell.org/pipermail/glasgow-haskell-users/2007-November/013454.... What is the word size on your machine?
But that may well be a complete red herring. We'll stand by.
Simon
Well, I am running Windows XP-64 and I have an Athlon X2. Does a 64-bit ghc exist for Windows? I just installed ghc 6.8.1 using the binary installer. I know the C code I link to is 32-bit.
I actually did have some problems upgrading ATLAS some time ago, but that was a build failure. I think I've built ATLAS 3.7.11 and had trouble installing 3.7.24. I haven't bothered to upgrade to 3.8, but I suppose I should get around to doing so. Since my current version of ATLAS passes the tests and had been working flawlessly until the upgrade to ghc 6.8.1, I'm not inclined to suspect it right now.
As for narrowing down a test case, it's still a work in progress. So far it appears that I need to do a lot of computations before I see the problem. One oddity is that, in the code I have right now, I have to apply the round function in my main routine to an arbitrary double that's not even used in the final calculation (but is printed) in order to see the spurious NaNs. I'm not sure if that means anything, since the behavior is reminiscent of buggy C code. However, my C code has already been through some valgrind checks and some other tests, and I'm quite confident there are no memory faults there. If I save off my matrices right before doing the multiply, the bug goes away as well.
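
When the bad values only show up after a lot of computation, one option is to scan intermediate results for NaN or infinite entries at a few labelled points. The sketch below is only an illustration: isBad, badEntries and checkpoint are hypothetical helpers, and the matrices are plain lists of rows rather than the poster's C-backed type.

```haskell
import Data.Complex (Complex((:+)))

-- | True when either component is NaN or infinite.
isBad :: Complex Double -> Bool
isBad (re :+ im) = any (\x -> isNaN x || isInfinite x) [re, im]

-- | Zero-based (row, column) indices of every bad entry in a matrix given as rows.
badEntries :: [[Complex Double]] -> [(Int, Int)]
badEntries rows =
  [ (i, j)
  | (i, row) <- zip [0 ..] rows
  , (j, z) <- zip [0 ..] row
  , isBad z ]

-- | Print a labelled report; meant to be called after each stage of a computation.
checkpoint :: String -> [[Complex Double]] -> IO ()
checkpoint label rows =
  case badEntries rows of
    []  -> putStrLn (label ++ ": ok")
    bad -> putStrLn (label ++ ": bad entries at " ++ show bad)

main :: IO ()
main = do
  let nan = 0 / 0 :: Double
  checkpoint "inputs" [[1 :+ 0, 2 :+ 1]]
  checkpoint "product" [[nan :+ 1.87514, 3 :+ 0]]
```

As the message above notes, saving the matrices before the multiply already made the bug disappear, so a check like this may change the behavior it is trying to observe; a clean report is weak evidence at best.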

SevenThunders wrote:
The new behavior is that under certain conditions a certain matrix inner product produces undefined floats, that should not be there.
I now have a simple example that I have posted as ticket number 1944 for ghc
6.8.1. The behavior is that if I link to an external cblas .dll file and do
a simple matrix multiply I get NaNs in the answer. However this only seems
to happen after I call the round function. The behavior does not occur for
ghc 6.6.1.
I will show the source files that cause this below.
Test2.hs:
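module Main where
-- foreign C routine declared in test2.h
foreign import ccall unsafe "test2.h iprod" iprod :: IO ()
main = do
    -- the NaNs only seem to appear after round has been called
    let base = round 0.03
    print $ "rounded base = " ++ (show base)
    iprod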
The C source file ctest2.c:
#include
participants (6)
- Alberto Ruiz
- Ian Lynagh
- Joel Koerwer
- Michael McNeil Forbes
- SevenThunders
- Simon Peyton-Jones