What is a Boxed Array?

John Velman

9 Dec 2005 9 Dec '05

6:52 p.m.

I've tried google and google scholar, wikipedia, and planetMath. Can't find a description. Can someone point me to a freely available reference? Thanks, John Velman

Show replies by date

Cale Gibbard

9 Dec 9 Dec

7:29 p.m.

A "box" is a cell representing some value in a program. It generally contains a pointer to code (a thunk), or to a proper value. When evaluation of that box is forced for the first time, the code executes, and when it is done, it updates the pointer with a pointer to the result value. There are a number of slight variations on this scheme, but essentially the idea is that there is a layer of indirection which keeps track of whether the value has been evaluated yet or not. This sort of thing forms a major role in the implementation of a lazy language like Haskell. Ordinary Arrays in Haskell, say, Array i e for some index type i, and element type e, are arrays of boxed values. That is, each element of the array is a pointer to either code, or a value, and each may be evaluated separately and lazily. By contrast, unboxed arrays, like those of type UArray i e, are arrays of unboxed values. The layer of indirection which normally allows for separate lazy evaluation of the elements is missing. Consequently, such arrays consume much less space in overhead, however, there is a penalty to be paid in ease of use. The array's box now has code which determines the value of every element of the array as soon as it is forced. Thus, even a single lookup in the array will cause the entire array to be computed. If you were going to compute it anyway, you don't lose much. What you do lose however, is the ability to define the array elements recursively in terms of one another. The following definition works for boxed array types, but not unboxed ones: a = array (0,20000) $ (0,0):(1,1):[(i, a ! (i-1) + a ! (i-2)) | i <- [2..20000]] Further, you restrict yourself to only a small collection of possible element types which have a compact unboxed representation. A list of these in the case of UArray is available in the documentation for Data.Array.Unboxed. One paper which might be of interest is the one located at http://citeseer.ist.psu.edu/peytonjones92implementing.html which describes the Spineless Tagless G-machine, a low-level mechanism used to implement Haskell code by GHC. Hope this helps, - Cale On 09/12/05, John Velman wrote:

...

I've tried google and google scholar, wikipedia, and planetMath. Can't find a description. Can someone point me to a freely available reference?

Thanks,

John Velman

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Tomasz Zielonka

10:04 p.m.

On Fri, Dec 09, 2005 at 02:29:33PM -0500, Cale Gibbard wrote:

...

A "box" is a cell representing some value in a program. It generally contains a pointer to code (a thunk), or to a proper value. When evaluation of that box is forced for the first time, the code executes, and when it is done, it updates the pointer with a pointer to the result value. There are a number of slight variations on this scheme, but essentially the idea is that there is a layer of indirection which keeps track of whether the value has been evaluated yet or not. This sort of thing forms a major role in the implementation of a lazy language like Haskell.

Ordinary Arrays in Haskell, say, Array i e for some index type i, and element type e, are arrays of boxed values. That is, each element of the array is a pointer to either code, or a value, and each may be evaluated separately and lazily.

I just want to add that many strict languages also have a distinction between boxed and unboxed arrays (or offer only boxed arrays), because there are other reasons to box values besides implementing laziness, like for simplifying the memory model (which can simplify GC implementation), allowing to intermix values of different types (different kinds of polymorphism) or allow variable sized values (think strings, arrays and arbitrary size integers). Some concrete examples: Arrays in many scripting languages, like Python, are often boxed. There are some languages, like R, where there are many types of unboxed arrays, for efficiency reasons. The situation in OCaml is a bit more complicated - standard arrays are boxed in general, the tagged 31bit int hack makes them unboxed for ints, and arrays of doubles are unboxed as an exception - IIUC in some cases this requires a runtime check, for example in non-inlined functions which are polymorphic on the type of array elements. Best regards Tomasz -- I am searching for a programmer who is good at least in some of [Haskell, ML, C++, Linux, FreeBSD, math] for work in Warsaw, Poland

jerzy.karczmarczuk＠info.unicaen.fr

10:16 p.m.

Tomasz Zielonka: ...

...

there are other reasons to box values besides implementing laziness, like for simplifying the memory model (which can simplify GC implementation), allowing to intermix values of different types (different kinds of polymorphism) or allow variable sized values (think strings, arrays and arbitrary size integers).

...

Some concrete examples: Arrays in many scripting languages, like Python, are often boxed. There are some languages, like R, where there are many types of unboxed arrays, for efficiency reasons.

There are nice unboxed arrays in Clean as well. The gain of efficiency is enormous. == Hmmm.... I thought that arrays in Python have no reason to be boxed. Lists, yes, since they are untyped, so they are arrays of pointers, but arrays are homogeneous. I don't know why boxing may simplify the *memory model*. The most dramatic effect of implicit boxing is the possibility - available in Smalltalk - to change the *identity* of an object. You write (x become y), and all references to the object assigned to x, point now to something completely different. A terrible weapon. Jerzy Karczmarczuk

Evan Laforge

11:04 p.m.

...

Hmmm.... I thought that arrays in Python have no reason to be boxed. Lists, yes, since they are untyped, so they are arrays of pointers, but arrays are homogeneous.

Python has a different meaning for "list" than the rest of the world. When python says "list", read "boxed array" (heterogeneous) and when it says "array" read "unboxed array" (homogenous).

...

I don't know why boxing may simplify the *memory model*.

If you can have unboxed types floating around then you need to ether put them on the stack or your GC needs to be cool with them... relocating GC I imagine having especial trouble with unboxed types. Having a common handle on all types would certainly simplify the GC. I don't know if this is what he meant by "memory model".

...

The most dramatic effect of implicit boxing is the possibility - available in Smalltalk - to change the *identity* of an object.

Wow, just when you thought you were safe from aliasing...

John Velman

10:59 p.m.

Thanks, this is very helpful. John Velman On Fri, Dec 09, 2005 at 02:29:33PM -0500, Cale Gibbard wrote:

...

A "box" is a cell representing some value in a program. It generally ...

7158

Age (days ago)

7158

Last active (days ago)

List overview

Download

5 comments

5 participants

participants (5)

Cale Gibbard
Evan Laforge
jerzy.karczmarczuk＠info.unicaen.fr
John Velman
Tomasz Zielonka