Re: [Haskell-cafe] Mutable arrays

Ok, I've done some testing on different array accessing strategies, and at this point I believe that, as Stefan previously suggested, using an Oleg-inspired fold for access offers the best performance. My first non-unsafe idea was to try:
import Data.Array.MArray import Data.Array.IO import Data.Int (Int32) import Data.List (foldl')
-- |Create an array. Change the type signature to experiment with different array types. makeAry :: IO (IOUArray Integer Float) makeAry = newListArray (1,1000000) [1..]
getElemsLazy :: (MArray a e m, Ix i) => a i e -> m [e] getElemsLazy myAry = getBounds myAry >>= mapM (readArray myAry) . range
this works on a 1 million element Int32 array (IOArray or IOUArray):
myElems <-getElemsLazy myAry let myMax = foldl' max 0 myElems
But with a 10-million element array, I get a stack overflow. I next tried the foldA functions Stefan posted, but had some problems with them. I think that the array elements need to be instances of Bounded and Fractional, and I don't know of a suitable type. Anyway, his code is a little too compact for me to understand without further study, so I moved on. Next I created my own fold operators. These are fairly directly copied from an email Oleg posted to the haskell mailing list:
leftFoldA' myArray self iteratee seed = do elem <- readArray myArray $ fst seed case iteratee seed elem of Left seed -> return seed Right seed -> self iteratee seed
leftFoldA :: (MArray a e m, Ix a1) => a a1 e -> ((a1, b) -> e -> Either t (a1, b)) -> a1, b) -> m t leftFoldA array iteratee seed = let g = (leftFoldA' array) g in g iteratee seed
leftFoldModA' myArray self iteratee seed = do elem <- readArray myArray $ fst seed result <- iteratee seed elem case result of Left seed -> return seed Right seed -> self iteratee seed
leftFoldMA :: (MArray a e m, Ix a1) => a a1 e -> ((a1, b) -> e -> m (Either t (a1, b))) -> a1, b) -> m t leftFoldMA array iteratee seed = let g = (leftFoldModA' array) g in g iteratee seed
the two functions to use are leftFoldA and leftFoldMA. The difference, show in the type signature, is that the iteratee in leftFoldA is pure while the iteratee in leftFoldMA operates in the monad. So if you're going to modify the array, for example, you need to use the leftFoldMA version. These folds let me define:
findMax myAry = do (lowB, highB) <- getBounds myAry let maxReader (ix, acc) c = let ix' = ix + 1 acc' = max c acc in acc' `seq` (if ix' > highB then Left else Right) (ix', acc') (_, myMax) <- leftFoldA myAry maxReader (lowB, 0) return myMax
normalizeArray myAry = do myMax <- findMax myAry (lowB, highB) <- getBounds myAry let normMod = 1/myMax let normalizer (ix, _) c = do let ix' = ix + 1 curVal <- readArray myAry ix writeArray myAry ix (curVal * normMod) return $ (if ix' > highB then Left else Right) (ix', ()) leftFoldMA myAry normalizer (lowB, ())
-- |show all elements from lowB to highB -- this only works with IO arrays, since the readArray and print happen in the same monad. -- you could possibly make it work with ST arrays if you use a monad transformer, but you'd have -- to change leftFoldMA as well. showElems myAry lowB highB = do let showFunc (ix, _) c = do let ix' = ix + 1 readArray myAry ix >>= print return $ (if ix' > highB then Left else Right) (ix', ()) leftFoldMA myAry showFunc (lowB, ())
There are probably better ways to define the folds, but I hope these are pretty easy to follow. If I was doing a lot of different operations, I'd probably define functions like
makeIteratee highB func = (\(ix, acc) c -> let ix' = ix + 1 acc' = func c acc in acc' `seq` (if ix' > highB then Left else Right) (ix', acc'))
so that findMax becomes:
findMax2 myAry = do (lowB, highB) <- getBounds myAry (_, myMax) <- leftFoldA myAry (makeIteratee highB max) (lowB, 0) return myMax
instead of having to define a bunch of iteratees in let clauses. Now, running the program
main :: IO () main = do myAry <- makeAry (lowB, highB) <- getBounds myAry normalizeArray myAry showElems myAry 1 10 showElems myAry 999990 1000000 return ()
takes about 5 seconds, with no optimizations. It takes less than a second with -O2. Much better. It does take longer with boxed arrays, but that's to be expected. No unsafe functions, either. The only sketchy bit is using seq in the iteratees (e.g. maxReader in findMax) to force the accumulator. I say sketchy not because it's unsafe or unreliable, but it would be nice if it was unnecessary. Notes: I used GHC 6.8.1 on WinXP, 3.2GHZ P4, 1.5GB RAM. All of my original tests were done without any optimizations, which is why I didn't include timing information for the earlier stuff. The getElemsLazy function is fast, but the stack overflow is annoying. Using the leftFold functions is both faster and avoids the space leak. I was also using the Data.List.Stream and Control.Monad.Stream versions of foldl' and mapM, although I don't think those made much difference. Anyway, this was a good exercise for me. I learned a lot, especially about the power of left-fold enumerators. Cheers, John

Après avoir un peu manipulé la solution de John pour qu'elle fasse la même chose que la mienne, je peux affirmer qu'elle est légèrement moins rapide (c'est infime et normal vu que ses leftFold passent plus d'informations), mais que les deux solutions ont au moins cet avantage d'être rapides (2s sur 10M de Double) et en mémoire parfaitement constante : ---- import Data.Array.MArray import Control.Monad import Data.Array.IO import System maxArr a = liftM (foldl1' max) (getElems a) foldlA :: (MArray a e m, Ix i) => (r -> e -> r) -> r -> a i e -> m r foldlA f a arr = getBounds arr >>= foldM (\a->a `seq` liftM $ f a) a . map (readArray arr) . range foldl1A :: (MArray a e m, Ix i) => (e -> e -> e) -> a i e -> m e foldl1A f arr = flip (foldlA f) arr =<< readArray arr . fst =<< getBounds arr foldMA :: (MArray a e m, Ix i) => (r -> e -> m r) -> r -> a i e -> m r foldMA f a arr = getBounds arr >>= foldM (\a->a `seq` (>>= f a)) a . map (readArray arr) . range modifyArray :: (MArray a e m, Ix i) => (e -> e) -> a i e -> m () modifyArray f arr = mapM_ (modifyElement f arr) . range =<< getBounds arr modifyElement :: (MArray a e m, Ix i) => (e -> e) -> a i e -> i -> m () modifyElement f arr i = writeArray arr i . f =<< readArray arr i main = do [n] <- getArgs a <- (newListArray (0, 10 ^ read n) [1..] :: IO (IOUArray Int Double)) maxE <- foldl1A max a modifyArray (* (1/maxE)) a print =<< readArray a (10 ^ read n) ---- En tout cas il serait agréable d'avoir quelques unes de ces fonctions dans les librairies standards, vu qu'elles sont généralement utiles et très efficace. Je n'utilise pas les langages fonctionnels pour avoir à écrire des boucles explicites sur mes structures de données !! ;-) (et comme on l'a vu, pour un débutant, l'écriture d'une généralisation efficace de ces boucles n'est pas si triviale qu'il y paraît). -- Jedaï

Sorry for the french, I was a little bit confused...
On 08/02/08, Chaddaï Fouché
participants (2)
-
Chaddaï Fouché
-
John Lato