
Hi Simon,
My suspicion for the root cause of the problem is that Concurrent.Chan is incorrect. In the course of debugging this problem we found 2 bugs in Chan, and while I never tracked down any other bugs in Chan, I no longer trust it. After I rewrote parts of the program, including avoiding Chan, the bugs disappeared. I don't think I'll be using Chan again until after someone has proven it correct.
Considering Chan is <150 lines of code and has been around for many years, that's amazing! Did you report the bugs? Is it anything to do with asynchronous exceptions?
Nothing to do with async exceptions. I found:
http://hackage.haskell.org/trac/ghc/ticket/4154
http://hackage.haskell.org/trac/ghc/ticket/3527
Of course, there's also the async exceptions bug still around:
http://hackage.haskell.org/trac/ghc/ticket/3160
However, even after having a program with no async exceptions (I never used them), and eliminating unGetChan and isEmptyChan, I still got bugs. I have no proof they came from the Chan module, and no minimal test case was ever able to recreate them, but the same program with my own Chan implementation worked. My Chan had different properties (it queues items randomly) and a subset of the Chan functions, so it still doesn't prove any issue with Chan, but I am now sceptical.
You should have more luck with Control.Concurrent.STM.TChan, incidentally. It's much easier to get right, and when we benchmarked it, performance was about the same (all those withMVar/modifyMVars in Chan are quite expensive), plus you get to compose it easily: reading from either of 2 TChans is trivial.
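To make the composition point concrete, here is a minimal sketch of reading from either of two TChans, assuming the stm package is available. The helper name readEither and the channel names are made up for illustration and are not part of the library; the sketch relies only on readTChan retrying on an empty channel and orElse falling through to its second argument when the first retries.

```haskell
import Control.Concurrent.STM
  ( STM, TChan, atomically, newTChanIO, orElse, readTChan, writeTChan )

-- Hypothetical helper (not part of stm): take the next item from
-- whichever channel has one. readTChan retries when its channel is
-- empty, so orElse moves on to the second channel. If both channels
-- are empty, the whole transaction blocks until one is written to.
readEither :: TChan a -> TChan b -> STM (Either a b)
readEither left right =
  (Left <$> readTChan left) `orElse` (Right <$> readTChan right)

main :: IO ()
main = do
  numbers <- newTChanIO :: IO (TChan Int)
  names   <- newTChanIO :: IO (TChan String)
  -- Only the second channel has data, so the read falls through to it.
  atomically (writeTChan names "hello")
  item <- atomically (readEither numbers names)
  print item
```

Note that orElse is left-biased: when both channels hold items, this sketch always takes from the first one, so a fair merge would need something extra.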
The performance of the Haskell code is irrelevant - the program spends all its time invoking system calls. Looking at the implementation I am indeed much more trusting of TChan, and I'll be using that in future if there is ever a need.
Thanks, Neil