New subject: Mutable arrays

7 Feb 2008

      Ok, I've done some testing on different array accessing strategies,
and at this point I believe that, as Stefan previously suggested,
using an Oleg-inspired fold for access offers the best performance.
My first non-unsafe idea was to try:
...
import Data.Array.MArray
import Data.Array.IO
import Data.Int (Int32)
import Data.List (foldl')
-- |Create an array.  Change the type signature to experiment with different array types.
makeAry :: IO (IOUArray Integer Float)
makeAry = newListArray (1,1000000) [1..]
...
getElemsLazy :: (MArray a e m, Ix i) => a i e -> m [e]
getElemsLazy myAry = getBounds myAry >>=
  mapM (readArray myAry) . range
this works on a 1 million element Int32 array (IOArray or IOUArray):
...
myElems <-getElemsLazy myAry
let myMax = foldl' max 0 myElems
But with a 10-million element array, I get a stack overflow.

I next tried the foldA functions Stefan posted, but had some problems
with them.  I think that the array elements need to be instances of
Bounded and Fractional, and I don't know of a suitable type.  Anyway,
his code is a little too compact for me to understand without further
study, so I moved on.

Next I created my own fold operators.  These are fairly directly
copied from an email Oleg posted to the haskell mailing list:
...
leftFoldA' myArray self iteratee seed = do
    elem <- readArray myArray $ fst seed
    case iteratee seed elem of
        Left seed -> return seed
        Right seed -> self iteratee seed
leftFoldA :: (MArray a e m, Ix a1) => a a1 e -> ((a1, b) -> e -> Either t (a1, b)) -> a1, b) -> m t
leftFoldA array iteratee seed =
    let g = (leftFoldA' array) g in
    g iteratee seed
leftFoldModA' myArray self iteratee seed = do
    elem <- readArray myArray $ fst seed
    result <- iteratee seed elem
    case result of
        Left seed -> return seed
        Right seed -> self iteratee seed
leftFoldMA :: (MArray a e m, Ix a1) => a a1 e -> ((a1, b) -> e -> m (Either t (a1, b))) -> a1, b) -> m t
leftFoldMA array iteratee seed =
    let g = (leftFoldModA' array) g in
    g iteratee seed
the two functions to use are leftFoldA and leftFoldMA.  The
difference, show in the type signature, is that the iteratee in
leftFoldA is pure while the iteratee in leftFoldMA operates in the
monad.  So if you're going to modify the array, for example, you need
to use the leftFoldMA version.  These folds let me define:
...
findMax myAry = do
    (lowB, highB) <- getBounds myAry
    let maxReader (ix, acc) c =
        let
            ix' = ix + 1
            acc' = max c acc
        in acc' `seq` (if ix' > highB then Left else Right) (ix', acc')
    (_, myMax) <- leftFoldA myAry maxReader (lowB, 0)
    return myMax
normalizeArray myAry = do
    myMax <- findMax myAry
    (lowB, highB) <- getBounds myAry
    let normMod = 1/myMax
    let normalizer (ix, _) c = do
        let ix' = ix + 1
        curVal <- readArray myAry ix
        writeArray myAry ix (curVal * normMod)
        return $ (if ix' > highB then Left else Right) (ix', ())
    leftFoldMA myAry normalizer (lowB, ())
-- |show all elements from lowB to highB
-- this only works with IO arrays, since the readArray and print happen in the same monad.
-- you could possibly make it work with ST arrays if you use a monad transformer, but you'd have
-- to change leftFoldMA as well.
showElems myAry lowB highB = do
    let showFunc (ix, _) c = do
        let ix' = ix + 1
        readArray myAry ix >>= print
        return $ (if ix' > highB then Left else Right) (ix', ())
    leftFoldMA myAry showFunc (lowB, ())
There are probably better ways to define the folds, but I hope these
are pretty easy to follow.  If I was doing a lot of different
operations, I'd probably define functions like
...
makeIteratee highB func = (\(ix, acc) c ->
    let
        ix' = ix + 1
        acc' = func c acc
    in acc' `seq` (if ix' > highB then Left else Right) (ix', acc'))
so that findMax becomes:
...
findMax2 myAry = do
    (lowB, highB) <- getBounds myAry
    (_, myMax) <- leftFoldA myAry (makeIteratee highB max) (lowB, 0)
    return myMax
instead of having to define a bunch of iteratees in let clauses.

Now, running the program
...
main :: IO ()
main = do
    myAry <- makeAry
    (lowB, highB) <- getBounds myAry
    normalizeArray myAry
    showElems myAry 1 10
    showElems myAry 999990 1000000
    return ()
takes about 5 seconds, with no optimizations.  It takes less than a
second with -O2.  Much better.  It does take longer with boxed arrays,
but that's to be expected.  No unsafe functions, either.  The only
sketchy bit is using seq in the iteratees (e.g. maxReader in findMax)
to force the accumulator.  I say sketchy not because it's unsafe or
unreliable, but it would be nice if it was unnecessary.

Notes: I used GHC 6.8.1 on WinXP, 3.2GHZ P4, 1.5GB RAM.  All of my
original tests were done without any optimizations, which is why I
didn't include timing information for the earlier stuff.  The
getElemsLazy function is fast, but the stack overflow is annoying.
Using the leftFold functions is both faster and avoids the space leak.
 I was also using the Data.List.Stream and Control.Monad.Stream
versions of foldl' and mapM, although I don't think those made much
difference.

Anyway, this was a good exercise for me.  I learned a lot, especially
about the power of left-fold enumerators.
Cheers,
John

Re: [Haskell-cafe] Mutable arrays

John Lato

Chaddaï Fouché

Chaddaï Fouché

tags

participants (2)