What does unpacking an MVar really mean?

I admit I don't know exactly how MVars are implemented, but given that they can be aliased and have indefinite extent, I would think that they look something vaguely like a cdatatype ** var, basically a pointer to an MVar (which is itself a pointer, modulo some other things such as a thread queue.) And, I would think that "unpacking" such an structure would basically be eliminating one layer of indirection, so it would then look vague like a cdatatype * var. But again, given aliasing and indefinite extent, this would seem to be a difficult thing to do. Actually this isn't too difficult if an MVar only exists in a single unpacked structure: other references to the MVar can simply be pointers into the structure. But the case where an MVar is unpacked into two different structures suggests that, at least in some cases, an "unpacked" MVar is still a cdatatype ** var; So, is my understanding more or less correct? Does anybody have a good, succinct explanation of how MVars are implemented, and how they are unpacked? One final question, assuming that unpacking an MVar really does eliminate a layer of indirection, and that other references to that MVar are simply pointers into a larger structure, what happens to that larger structure when there are no more references to it (but still some references to the MVar?) Given the complications that must arise out of a doubly "unpacked" MVar, I'm going to guess that the larger structure does get garbage collected in this case, and that the MVar becomes dislodged from this structure. Would that MVar then be placed directly inside another unpacked reference, if one is available? Best, Leon

I'm not sure I totally understand your question about 'unpacking' an MVar, but I'm going to assume you mean data structures that use the {-# UNPACK #-} pragma, like in Control.Concurrent.Future [1] and Control.Concurrent.NamedLock [2]. Here is how MVar is defined in GHC [3]: data MVar a = MVar (MVar# RealWorld a) A "MVar# s a" is an unboxed pointer to a structure understood by the GHC runtime. So yes, you can imagine a MVar as a pointer-to-pointer. The structure it points at likely has another pointer to the embedded boxed "a", so it may even be pointer-to-pointer-to-pointer. The MVar data structure exists to allow laziness; for example let x = unsafePerformIO (newMVar ()) in () is likely to not allocate an MVar#, just a thunk that would create an MVar if it was evaluated. Unboxed objects (represented by convetion with # in GHC), on the other hand, are strict, so if you have an MVar# RealWorld (), you know it points to a valid MVar#.
From [2] we have data NLItem = NLItem {-# UNPACK #-} !Int {-# UNPACK #-} !(MVar ())
All the {-# UNPACK #-} pragma does is embed the contents of a strict
single-constructor data declaration *directly* into the structure
containing it; it's like you declared NLItem as such:
data NLItem = NLItem Word# (MVar# RealWorld ())
except that if you call functions that want an Int or MVar thunk, the
compiler will automatically re-box them in a new I#/MVar constructor.
Many copies of pointers to the same MVar# may exist; they are all
'identical' MVars; equality is defined as such:
instance Eq (MVar a) where
(MVar mvar1#) == (MVar mvar2#) = sameMVar# mvar1# mvar2#
where sameMVar# is a primitive that is probably just raw pointer equality.
Because of this, boxed MVars can be garbage collected without necessarily
garbage-collecting the MVar# it holds, if a live reference to that MVar#
still exists elsewhere.
-- ryan
[1]
http://hackage.haskell.org/packages/archive/future/2.0.0/doc/html/src/Contro...
[2]
http://hackage.haskell.org/packages/archive/named-lock/0.1/doc/html/src/Cont...
[3]
http://www.haskell.org/ghc/docs/7.4.1/html/libraries/base/src/GHC-MVar.html#...
On Mon, Jul 30, 2012 at 9:25 PM, Leon Smith
I admit I don't know exactly how MVars are implemented, but given that they can be aliased and have indefinite extent, I would think that they look something vaguely like a cdatatype ** var, basically a pointer to an MVar (which is itself a pointer, modulo some other things such as a thread queue.)
And, I would think that "unpacking" such an structure would basically be eliminating one layer of indirection, so it would then look vague like a cdatatype * var. But again, given aliasing and indefinite extent, this would seem to be a difficult thing to do.
Actually this isn't too difficult if an MVar only exists in a single unpacked structure: other references to the MVar can simply be pointers into the structure. But the case where an MVar is unpacked into two different structures suggests that, at least in some cases, an "unpacked" MVar is still a cdatatype ** var;
So, is my understanding more or less correct? Does anybody have a good, succinct explanation of how MVars are implemented, and how they are unpacked?
One final question, assuming that unpacking an MVar really does eliminate a layer of indirection, and that other references to that MVar are simply pointers into a larger structure, what happens to that larger structure when there are no more references to it (but still some references to the MVar?) Given the complications that must arise out of a doubly "unpacked" MVar, I'm going to guess that the larger structure does get garbage collected in this case, and that the MVar becomes dislodged from this structure. Would that MVar then be placed directly inside another unpacked reference, if one is available?
Best, Leon
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Let me clarify a bit.
I am familiar with the source of Control.Concurrent.MVar, and I do see {-#
UNPACK #-}'ed MVars around, for example in GHC's IO manager. What I
should have asked is, what does an MVar# look like? This cannot be
inferred from Haskell source; though I suppose I could have tried to read
the Runtime source.
Now, one would hope that and (MVar# RealWorld footype) would
approximately correspond to a "footype * mvar;" variable in C. The
problem is this cannot _always_ be the case, because you can alias the
(MVar# RealWorld footype) by placing a single MVar into two unpacked
columns in two different data structures. So you would need to be able
to still sometimes represent an MVar# by a footype ** mvar at runtime,
even though one would hope that it would be represented by a "footype *
mvar" in one particular data structure.
On Tue, Jul 31, 2012 at 1:04 AM, Ryan Ingram
Because of this, boxed MVars can be garbage collected without necessarily garbage-collecting the MVar# it holds, if a live reference to that MVar# still exists elsewhere.
I was asking the dual question: if the MVar# exists in some data structure, can that data structure still be garbage collected when there is a reference to the MVar#, but not the data structure it is contained within. Best, Leon

Leon Smith wrote:
I am familiar with the source of Control.Concurrent.MVar, and I do see {-# UNPACK #-}'ed MVars around, for example in GHC's IO manager. What I should have asked is, what does an MVar# look like? This cannot be inferred from Haskell source; though I suppose I could have tried to read the Runtime source.
So let's have a brief look at the source. MVar# is an RTS specific heap object which contains three pointers: (from ghc/includes/rts/storage/Closures.h) typedef struct { StgHeader header; struct StgMVarTSOQueue_ *head; struct StgMVarTSOQueue_ *tail; StgClosure* value; } StgMVar; The 'value' pointer refers to the actual value held by the mutable variable, if any. The 'head' and 'tail' pointers are used for managing a linked list of threads blocked on the mutable variable. An MVar (if evaluated) contains just a pointer the MVar# object. To access the value of an MVar, one starts with a pointer to the MVar heap object. Then, 1. Make sure that the MVar is evaluated, using standard lazy evaluation (follow indirections, enter thunks, ...). In the best case that's a check of a tag bit in the pointer. 2. Read the pointer to the MVar# in the MVar. 3. access the 'value' field of the StgMVar record, which results in another pointer to a heap object representing the actual data held by the MVar. (In reality the code has to check whether the MVar is full or not, and block if necessary. This is quite involved; see stg_takeMVarzh in ghc/rts/PrimOps.cmm) That's two dereferences and some bookkeeping work. In loops, the compiler will often unpack the MVar, so that you can expect the first two steps to be performed just once. Unpacking an MVar into a bigger record means that the pointer to the MVar# will be stored in the record directly, rather than a pointer to an MVar object that holds a pointer to the MVar#. Note that MVar# itself cannot be unpacked -- the StgMVar record will always be a separate heap object.
I was asking the dual question: if the MVar# exists in some data structure, can that data structure still be garbage collected when there is a reference to the MVar#, but not the data structure it is contained within.
Yes, because the data structure only contains a pointer to an MVar# (StgMVar record) that will live on. Best regards, Bertram

On Tue, Jul 31, 2012 at 7:37 AM, Bertram Felgenhauer < bertram.felgenhauer@googlemail.com> wrote:
Note that MVar# itself cannot be unpacked -- the StgMVar record will always be a separate heap object.
One could imagine a couple of techniques to unpack the MVar# itself, and was curious if GHC might employ one of them. So, really, unpacking the MVar does not eliminate a layer of indirection, it just eliminates the need to check a pointer tag (and possibly execute a thunk or follow some redirects if you don't have a pointer to an MVar#). I think this is what I was ultimately after. Best, Leon
participants (3)
-
Bertram Felgenhauer
-
Leon Smith
-
Ryan Ingram