
I've tried google and google scholar, wikipedia, and planetMath. Can't find a description. Can someone point me to a freely available reference? Thanks, John Velman

A "box" is a cell representing some value in a program. It generally
contains a pointer to code (a thunk), or to a proper value. When
evaluation of that box is forced for the first time, the code
executes, and when it is done, it updates the pointer with a pointer
to the result value. There are a number of slight variations on this
scheme, but essentially the idea is that there is a layer of
indirection which keeps track of whether the value has been evaluated
yet or not. This sort of thing forms a major role in the
implementation of a lazy language like Haskell.
Ordinary Arrays in Haskell, say, Array i e for some index type i, and
element type e, are arrays of boxed values. That is, each element of
the array is a pointer to either code, or a value, and each may be
evaluated separately and lazily.
By contrast, unboxed arrays, like those of type UArray i e, are arrays
of unboxed values. The layer of indirection which normally allows for
separate lazy evaluation of the elements is missing. Consequently,
such arrays consume much less space in overhead, however, there is a
penalty to be paid in ease of use. The array's box now has code which
determines the value of every element of the array as soon as it is
forced. Thus, even a single lookup in the array will cause the entire
array to be computed. If you were going to compute it anyway, you
don't lose much. What you do lose however, is the ability to define
the array elements recursively in terms of one another.
The following definition works for boxed array types, but not unboxed ones:
a = array (0,20000) $ (0,0):(1,1):[(i, a ! (i-1) + a ! (i-2)) | i <- [2..20000]]
Further, you restrict yourself to only a small collection of possible
element types which have a compact unboxed representation. A list of
these in the case of UArray is available in the documentation for
Data.Array.Unboxed.
One paper which might be of interest is the one located at
http://citeseer.ist.psu.edu/peytonjones92implementing.html
which describes the Spineless Tagless G-machine, a low-level mechanism
used to implement Haskell code by GHC.
Hope this helps,
- Cale
On 09/12/05, John Velman
I've tried google and google scholar, wikipedia, and planetMath. Can't find a description. Can someone point me to a freely available reference?
Thanks,
John Velman
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Fri, Dec 09, 2005 at 02:29:33PM -0500, Cale Gibbard wrote:
A "box" is a cell representing some value in a program. It generally contains a pointer to code (a thunk), or to a proper value. When evaluation of that box is forced for the first time, the code executes, and when it is done, it updates the pointer with a pointer to the result value. There are a number of slight variations on this scheme, but essentially the idea is that there is a layer of indirection which keeps track of whether the value has been evaluated yet or not. This sort of thing forms a major role in the implementation of a lazy language like Haskell.
Ordinary Arrays in Haskell, say, Array i e for some index type i, and element type e, are arrays of boxed values. That is, each element of the array is a pointer to either code, or a value, and each may be evaluated separately and lazily.
I just want to add that many strict languages also have a distinction between boxed and unboxed arrays (or offer only boxed arrays), because there are other reasons to box values besides implementing laziness, like for simplifying the memory model (which can simplify GC implementation), allowing to intermix values of different types (different kinds of polymorphism) or allow variable sized values (think strings, arrays and arbitrary size integers). Some concrete examples: Arrays in many scripting languages, like Python, are often boxed. There are some languages, like R, where there are many types of unboxed arrays, for efficiency reasons. The situation in OCaml is a bit more complicated - standard arrays are boxed in general, the tagged 31bit int hack makes them unboxed for ints, and arrays of doubles are unboxed as an exception - IIUC in some cases this requires a runtime check, for example in non-inlined functions which are polymorphic on the type of array elements. Best regards Tomasz -- I am searching for a programmer who is good at least in some of [Haskell, ML, C++, Linux, FreeBSD, math] for work in Warsaw, Poland

Tomasz Zielonka: ...
there are other reasons to box values besides implementing laziness, like for simplifying the memory model (which can simplify GC implementation), allowing to intermix values of different types (different kinds of polymorphism) or allow variable sized values (think strings, arrays and arbitrary size integers).
Some concrete examples: Arrays in many scripting languages, like Python, are often boxed. There are some languages, like R, where there are many types of unboxed arrays, for efficiency reasons.
There are nice unboxed arrays in Clean as well. The gain of efficiency is enormous. == Hmmm.... I thought that arrays in Python have no reason to be boxed. Lists, yes, since they are untyped, so they are arrays of pointers, but arrays are homogeneous. I don't know why boxing may simplify the *memory model*. The most dramatic effect of implicit boxing is the possibility - available in Smalltalk - to change the *identity* of an object. You write (x become y), and all references to the object assigned to x, point now to something completely different. A terrible weapon. Jerzy Karczmarczuk

Hmmm.... I thought that arrays in Python have no reason to be boxed. Lists, yes, since they are untyped, so they are arrays of pointers, but arrays are homogeneous.
Python has a different meaning for "list" than the rest of the world. When python says "list", read "boxed array" (heterogeneous) and when it says "array" read "unboxed array" (homogenous).
I don't know why boxing may simplify the *memory model*.
If you can have unboxed types floating around then you need to ether put them on the stack or your GC needs to be cool with them... relocating GC I imagine having especial trouble with unboxed types. Having a common handle on all types would certainly simplify the GC. I don't know if this is what he meant by "memory model".
The most dramatic effect of implicit boxing is the possibility - available in Smalltalk - to change the *identity* of an object.
Wow, just when you thought you were safe from aliasing...
participants (5)
-
Cale Gibbard
-
Evan Laforge
-
jerzy.karczmarczuk@info.unicaen.fr
-
John Velman
-
Tomasz Zielonka