Hi Madan,

Yes, GHC, like Java, is an "unsafe language" currently ;-). It does not protect its abstractions against buggy programs. Namely, neither Haskell nor Java protects semicolon/(>>), i.e. sequential composition.

Really, we should probably have some additional monad to distinguish data-race-free (DRF) IO programs from other IO programs that may have data races.

Or we could just make sequential consistency the law of the land for IO, as you propose. One thing I'd really like to work on -- if someone were interested in collaborating -- is testing the overhead of making sequential composition the norm in this way.

I had a nice conversation with your SNAPL co-author Satish about this recently. It seems like we could survey a broad swath of Haskell code to see if fine-grained use of "writeIORef" and "readIORef" is at all common. My hunch is that concurrent apps (web servers, etc.) that share IORefs all use atomicModifyIORef anyway.

Thus, like you, I think we could fence read/writeIORef with acceptable perf for most real apps. You can still have "reallyUnsafeReadIORef" for specific purposes like implementing concurrent data structures.
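To make the idea concrete, here is a minimal sketch of what fenced accessors might look like using only what Data.IORef in base already offers. The names fencedWriteIORef and fencedReadIORef are hypothetical, not part of any library. The read goes through atomicModifyIORef', which is an assumption about how one might get a barrier on reads today. It is heavier than a real fenced load would need to be, and it forces the stored value to weak head normal form.

```haskell
import Data.IORef

-- Hypothetical name: a write with the reordering barrier that
-- atomicWriteIORef from base documents.
fencedWriteIORef :: IORef a -> a -> IO ()
fencedWriteIORef = atomicWriteIORef

-- Hypothetical name: a read routed through atomicModifyIORef', which
-- leaves the contents unchanged and returns them. Note that this forces
-- the stored value to WHNF and costs an atomic read-modify-write.
fencedReadIORef :: IORef a -> IO a
fencedReadIORef ref = atomicModifyIORef' ref (\x -> (x, x))

main :: IO ()
main = do
  ref <- newIORef (0 :: Int)
  fencedWriteIORef ref 42
  v <- fencedReadIORef ref
  print v  -- prints 42
```

Measuring the overhead question would then amount to swapping these in for the plain readIORef/writeIORef and benchmarking.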

Best,

-Ryan

On Mon, Mar 14, 2016 at 10:06 AM, Simon Peyton Jones <simonpj@microsoft.com> wrote:

Madan

I’m glad you enjoyed the awkward squad paper.

I urge you to write to the Haskell Café mailing list and/or ghc-devs. Lots of smart people there. Ryan Newton is working on this kind of stuff; I’ve cc’d him.

But my rough answer would be: IORefs are really only meant for single-threaded work. Use STM for concurrent communication.
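As a minimal sketch of that advice, the following hands a value from one thread to another through a TVar rather than an IORef. It assumes the stm package (Control.Concurrent.STM), which ships with GHC; the name handoff is made up for illustration.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM

-- Hypothetical helper: a producer thread publishes a list, and the
-- calling thread blocks until it is available.
handoff :: IO [Int]
handoff = do
  box <- newTVarIO Nothing                 -- empty until the producer publishes
  _ <- forkIO $ atomically $ writeTVar box (Just [1, 2, 3])
  atomically $ do
    m <- readTVar box
    maybe retry return m                   -- retry blocks until box changes

main :: IO ()
main = handoff >>= print  -- prints [1,2,3]
```

The consumer never polls: retry suspends the transaction until the TVar is written.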

That’s not to say that we are done! The Haskell community doesn’t have many people like you, who care about the detail of the memory model. So please do help us :-). For example, perhaps we could guarantee a simple sequential memory model without much additional cost?

Simon

From: Madan Musuvathi
Sent: 11 March 2016 19:35
To: Simon Peyton Jones <simonpj@microsoft.com>
Subject: Semantics of IORefs in GHC

Dear Simon,

I really enjoyed reading your awkward squad paper. Thank you for writing such an accessible paper.

My current understanding is that the implementation of IORefs in GHC breaks the simple semantics you develop in this paper. In particular, by not inserting sufficient fences around reads and writes of IORefs, a Haskell program is exposed to the weak-memory-consistency effects of the underlying hardware and possibly the backend C compiler. As a result, the monadic bind operator no longer has the simple semantics of sequential composition. Is my understanding correct?

This is very troublesome as this weaker semantics can lead to unforeseen consequences even in pure functional parts of a program. For example, when a reference to an object is passed through an IORef to another thread, the latter thread is not guaranteed to see the updates of the first thread. So, it is quite possible for some (pure functional) code to be processing objects with broken invariants or partially-constructed objects. In the extreme, this could lead to type-unsafety unless the GHC compiler is taking careful precautions to avoid this. (Many of these problems are unlikely to show up on x86 machines, but will be common on ARM.)
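The pattern in question can be sketched as the usual message-passing litmus test. This is only an illustration with made-up names (messagePassing, payload, ready). Under sequential composition, step (1) happens before (2) and (3) before (4), so the reader must see 42. Under a weaker model, a reader could in principle see the flag and still read the stale 0. Whether that is observable in practice depends on the GHC version, the backend, and the hardware, so the sketch claims nothing about actual output on a given machine.

```haskell
import Control.Concurrent (forkIO, yield)
import Control.Monad (unless)
import Data.IORef

-- Hypothetical litmus test: publish a payload, then raise a flag,
-- using only plain (unfenced) IORef reads and writes.
messagePassing :: IO Int
messagePassing = do
  payload <- newIORef (0 :: Int)
  ready   <- newIORef False
  _ <- forkIO $ do
    writeIORef payload 42       -- (1) write the data
    writeIORef ready True       -- (2) announce that it is there
  let waitForFlag = do
        ok <- readIORef ready
        -- yield so the writer can run even on a single capability
        unless ok (yield >> waitForFlag)
  waitForFlag                   -- (3) observe the flag
  readIORef payload             -- (4) read the data

main :: IO ()
main = messagePassing >>= print
```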

I am sure the GHC community is addressing these problems one way or the other. But, my question is WHY? Why can’t GHC tighten the semantics of IORefs so that the bind operation simply means sequential composition? Given that Haskell has a clean separation between pure functional parts and “awkward” parts of the program, the overheads of these fenced IORefs should be acceptable.

My coauthors and I wrote a recent SNAPL article about this problem for other (“less-beautiful” :-) ) imperative languages like C# and Java. I really believe we should support sequential composition in our programming languages.

madan