newtypes and optimization

12 Dec 2007

      Hi,

I have a statically-sized-list library that I use for linear algebra stuff.
It's got a vector type something like this:
...
data V a b = V a b
so that a 3D vector is
...
type Vec3 a = V a (V a (V a ()))
and I have type classes for operations on these things, like so:
...
class VZipWith a b c u v w | {-lots of fundeps-} where
  vzipWith :: (a -> b -> c) -> u -> v -> w
instance VZipWith a b c (V a ()) (V b ()) (V c ()) where
  vzipWith f (V x ()) (V y ()) = V (f x y) ()
instance
  VZipWith a b c (V a u) (V b v) (V c w)
  => VZipWith a b c (V a (V a u)) (V b (V b v)) (V c (V c w))
    where
      vzipWith f (V x u) (V y v) = V (f x y) (vzipWith f u v)
so that vector addition is
...
vadd = vzipWith (+)
I put strictness annotations and INLINE pragmas all over the place, and GHC
does wonders with it. Using Storable instances something like the following,
...
instance Storable a => Storable (V a ()) where
  sizeOf _ = sizeOf (undefined::a)
  peek p = peek (castPtr p) >>= \a -> return (V a ())
  --etc
instance (Storable a, Storable v) => Storable (V a v) where
  sizeOf _ = sizeOf (undefined::a) + sizeOf (undefined::v)
  peek p =
    a <- peek (castPtr p)
    v <- peek (castPtr (p`plusPtr`sizeOf(undefined::a)))
    return (V a v)
GHC can turn a loop like this,
...
forM_ [0..n] $ \i ->
  do a <- peekElemOff aptr i
     b <- peekElemOff bptr i
     pokeElemOff cptr i (vadd a b)
into something as fast as C, using no heap. You look at the core and its
nothing but readDoubleOffAddr#, +## and the like. I went so far as to generalize
this to matrices with things like vector-matrix and matrix-matrix
multiplication, determinants and all that and, when used in loops like above,
it's consistently as fast or even faster than C.

However when I do this:
...
newtype Quaternion = Q (Vec4 Double)
Everything is ruined. Functions like peek and vadd are no longer inlined,
intermediate linked lists are created all over the place. The Quaternion
Storable instance looks like this
...
instance Storable s => Storable (Quaternion s) where
  sizeOf _ = 4*sizeOf (undefined::s)
  peek p = peek (castPtr p :: Ptr (Vec4 s)) >>= \v -> return (Q v)
with strictness annotations and INLINEs for everything. I also tried automatic
newtype deriving, with no luck. Why does a newtype defeat so much of the
optimization?

Thanks,
Scott

newtypes and optimization

Scott Dillard