Re: GHC and the Lazy Functional State Threads Paper

3 May 2001

      Thu, 03 May 2001 00:02:01 +0200, Thomas Pasch  pisze:
...
...
If the immutable array type used was particularly UArray, it would
be more efficient to use the corresponding STUArray instead of
...
freezeArr3:: 
  (Ix i, MArray.MArray (MArray.STUArray s) e (ST s), IArray.IArray a e) 
  => (MArray.STUArray s i e) -> ST s (a i e)
freezeArr3 = MArray.freeze
Is this the way to do it?
Well, yes, although when the immutable array happens to *not* be
UArray (but e.g. Array), it can be less efficient than with STArray,
because elements must be reboxed. And it's less general (the element
type must be "unboxable").

This is why I would use STUArray when it is known that the immutable
array is UArray and nothing else, and STArray in other cases.
...
I wonder a bit if this triggers the optimization if 'a' is of
type STUArray.
UArray. It does when it's statically determined that 'a' is of type
UArray. It doesn't when a function using freeze is so large that
it's not compiled inline and it's not specialized, so the compiler
uses a single generic version in a code compiled out of line. This
optimization is implented as rewrite rule:

freeze:: (Ix i, MArray a e m, IArray b e) => a i e -> m (b i e)
  -- Generically implemented in terms of overloaded element access
  -- and array construction.
freezeSTUArray:: Ix i => STUArray s i e -> ST s (UArray i e)
  -- Uses memcpy.
{-# RULES
"freeze/STUArray" freeze = freezeSTUArray
  -- The compiler can replace freeze with freezeSTUArray where types
  -- match (if optimization is turned on).
  #-}

There is no automatic way of choosing the "best" mutable array type
"corresponding" to the given immutable type. Formally there is no
correspondence here (except that there are some rules for freezing
and thawing for specific combinations of types).

Efficiency of bulk array operations improved since released ghc-5.00:
I eliminated many unnecessary bounds checking, at the cost of turning
IArray and MArray methods into functions (so custom instances of
these classes no longer work - they must define different methods).
...
(By the way I still have problems to see where unboxed values are
a good thing and were you should better avoid them.)
If there could be a universal answer, probably the compiler could do
this automatically all the times :-)

Operations like arithmetic on boxed values are implemented as unboxing,
performing the operation and reboxing. So unboxed values are generally
faster than unboxed values - except that when we perform no operations
at all and just copy values from one place to another (like in freezing
a mutable array), it's cheaper to pass boxed values unmodified than
to unbox or box them.

The compiler is able to optimize away boxing in many cases, e.g.
usually in passing values as arguments to strict functions and
returning results. But it is not able to automatically optimize it
when values are stored in data structures (unless the data structure
itself is optimized away! this sometimes happend to lists), and it's
not able to unbox values of unknown types (so it helps to inline or
specialize critical functions - this sometimes happens automatically).

In any case you can watch what is happening instead of guessing.
There are options which tell ghc to dump intermediate forms of the
program, in not so pretty format (because it's internal: the textual
presentation is defined only for the purpose of dumping). I learned
what ghc is doing from watching the dumps much more than from reading
its source. Some interesting dumps:
    ghc -c -no-recomp -O File.hs -ddump-stg
                                 -ddump-realC
                                 -fasm -ddump-asm
...
In addition I wonder why there are so many array types. Are both
STArray and IOArray really necessary and what's the difference 
between them?
Operations on STArray can be safely wrapped in a pure computation
using runST.

Operations on IOArray can be easily mixed with other IO operations.
Embedding them in a pure computation is still possible using
unsafePerformIO, but it's unsafe and slightly less efficient.
You can convert between ST and IO computations using stToIO and
unsafeIOtoST.
...
What about DiffArray?
It has immutable interface but computes '//' without copying the
whole array as long as it's used in a single-threaded way. OTOH it has
larger constant overheads than Array (because of more magic inside,
more indirections).
...
I'm still surprised by operations like 'unsafeFreezeArray' and
'freezeArray'. Shouldn't the garbage collector juge if there are
still other references to a distinct object?
I don't know how easy it would be, and I guess that in case the
programmer knows that unsafeFreeze can be safely used it's faster than
asking the garbage collection. Unsafe freezing is used everywhere
in implementation of immutable arrays, so it's worth eliminating
unnecessary overheads.

-- 
 __("<  Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.ids.net.pl/
 \__/
  ^^                      SYGNATURA ZASTÊPCZA
QRCZAK

Re: GHC and the Lazy Functional State Threads Paper

Marcin 'Qrczak' Kowalczyk