Mutable arrays

Jeff φ

2 Feb 2008

10:32 a.m.

Hello,

I'm trying to write code that will take a mutable 2D array and normalize it by dividing all elements by the largest element.

I managed to write code to do this, but it seems overly complex. I could write something much simpler in Clean or C++. Most likely, my code is complex because I don't have any experience with mutable arrays in Haskell. I couldn't find any tutorials on the Internet. I'd be grateful for suggestions on simplifying the following code. Thanks.

    {-# OPTIONS_GHC -fglasgow-exts -fbreak-on-exception #-}

    -- normalize_ary  This takes a mutable array.  Determines the largest
    -- element in the array (max_elem) and then divides every element by
    -- max_elem.
    normalize_ary :: (Num t1, Num t, Ix t, Ix t1, MArray a e t2,
                      Ord e, Fractional e, Enum t, Enum t1) =>
                     a (t, t1) e -> t2 ()
    normalize_ary ary =
        do
            -- The following two commented out lines of code show my first
            -- attempt at determining a value for max_elem.  However, this
            -- produces a stack overflow.

            -- elem_ary <- getElems ary
            -- let max_elem = foldl1 max elem_ary

            max_elem <- calc_max_2d_elem ary
            max_elem `seq` map_in_place_2d_arr (\x -> x / max_elem) ary

    map_in_place_2d_arr :: (MArray a e t, Enum t2, Enum t1, Ix t1, Ix t2) =>
                           (e -> e) -> a (t1, t2) e -> t ()
    map_in_place_2d_arr fn arr = ret
        where
            ret = do ((i1,j1),(i2,j2)) <- getBounds arr
                     ( mapM_ (\i -> do v <- readArray arr i
                                       writeArray arr i (fn v) )
                             [(i,j) | i <- [i1..i2], j <- [j1..j2]])

    calc_max_2d_elem :: (Ord t, MArray a t t1, Ix t2, Ix t3, Num t3, Num t2) =>
                        a (t3, t2) t -> t1 t
    calc_max_2d_elem arr =
        do m <- readArray arr (0,0)
           (_,(i_max, j_max)) <- getBounds arr
           let calc_max_loop arr m (i,j)
                   | j == j_max = return m
                   | otherwise  = do e <- readArray arr (i,j)
                                     let m2 = max e m
                                     m2 `seq` calc_max_loop arr m2 nxt_idx
                   where nxt_idx | i == i_max - 1 = (0,j+1)
                                 | otherwise      = (i+1,j)
           calc_max_loop arr m (0,0)

Luke Palmer

2 Feb

11:26 a.m.

I prerequest your forgiveness if I sound patronizing; I'm just writing everything that comes to mind.

2008/2/2 Jeff φ:

...

    {-# OPTIONS_GHC -fglasgow-exts -fbreak-on-exception #-}

    -- normalize_ary  This takes a mutable array.  Determines the largest
    -- element in the array (max_elem) and then divides every element by
    -- max_elem.
    normalize_ary :: (Num t1, Num t, Ix t, Ix t1, MArray a e t2,
                      Ord e, Fractional e, Enum t, Enum t1) =>
                     a (t, t1) e -> t2 ()

Yagh! Look at that type signature. That looks like it came from ghci. That type should raise a few alarms, such as the Num t, Num t1. Why should the indices be numbers? That indicates that your implementation is not as general as it should be, so maybe try another method. (Really it's calc_max_2d_elem which is losing that generality). I usually write my type signatures first, and then let that guide my implementation. But you will find differing valid opinions on this list on that issue. Anyway, without further ado, into the guts we go.

...

    normalize_ary ary =
        do
            -- The following two commented out lines of code show my first
            -- attempt at determining a value for max_elem.  However, this
            -- produces a stack overflow.

            -- elem_ary <- getElems ary
            -- let max_elem = foldl1 max elem_ary

Hmm, how big is the array? If it's pretty big, that's understandable. Frankly, it's because foldl sucks: I have never seen a reason to use it. You should be using the strict variant foldl' here. (I don't think there is a foldl1'). And that will get rid of your big function calc_max_2d_elem.
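A minimal sketch of the strict fold suggested here, for illustration only. The names maximumStrict and demo are invented for this example, and returning Nothing for an empty list is a choice made for the sketch. As the later replies in this thread point out, this addresses thunk build-up inside the fold, not the cost of producing the whole element list with getElems.

```haskell
import Data.List (foldl')

-- | Largest element of a list, using a strict left fold so that the
-- accumulator is forced at every step instead of growing a chain of thunks.
-- Returns Nothing for an empty list (a choice made for this sketch).
maximumStrict :: Ord a => [a] -> Maybe a
maximumStrict []       = Nothing
maximumStrict (x : xs) = Just (foldl' max x xs)

-- | Hypothetical demo: the largest of a million Doubles.
demo :: Maybe Double
demo = maximumStrict (map fromIntegral [1 .. 1000000 :: Int])
```

The fold seeds the accumulator with the first element, which is the same idea as the maximum' definitions that appear later in the thread.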

...

            max_elem <- calc_max_2d_elem ary
            max_elem `seq` map_in_place_2d_arr (\x -> x / max_elem) ary

I don't think that max_elem `seq` is doing anything useful here (but I could be missing something subtle). Oh and a really low level thing which may or may not make a difference: floating point division is expensive. You'd be better off precalculating 1 / max_elem and then multiplying by that instead.

...

    map_in_place_2d_arr :: (MArray a e t, Enum t2, Enum t1, Ix t1, Ix t2) =>
                           (e -> e) -> a (t1, t2) e -> t ()

Another conspicuous type signature. Enum t2, Enum t1 is the red flag here. It's because you're using [i1..i2] instead of range (i1,i2) from Data.Ix.
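To make the point about range concrete, here is a small sketch. The names intIndices and charIndices are made up for the example. It shows that range enumerates every index between two bounds using only the Ix class, so a pair of bounds can be passed directly without a list comprehension over [i1..i2].

```haskell
import Data.Ix (range)

-- | All indices of a 2 x 3 block, in row-major order.
-- The bounds are a pair of tuples; no Enum instance for the tuple is needed.
intIndices :: [(Int, Int)]
intIndices = range ((0, 0), (1, 2))

-- | The same function works for index types that are not numbers.
charIndices :: [(Char, Bool)]
charIndices = range (('a', False), ('b', True))
```

Because getBounds returns exactly such a pair of bounds, its result can be handed to range as is.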

...

    map_in_place_2d_arr fn arr = ret
        where
            ret = do ((i1,j1),(i2,j2)) <- getBounds arr
                     ( mapM_ (\i -> do v <- readArray arr i
                                       writeArray arr i (fn v) )
                             [(i,j) | i <- [i1..i2], j <- [j1..j2]])

This looks pretty good modulo the [i1..i2] I mentioned above. For this kind of stuff I prefer to use forM_, as it is a more imperative-looking construct for imperative-looking code (then you can lose the parentheses around (\i -> ...))...
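A sketch of how the in-place map might look with forM_ and range, under the assumption that only Ix is required of the index type. The names mapInPlace and demo are invented for this example, and the demo also precomputes the reciprocal as suggested earlier in this message.

```haskell
import Control.Monad (forM_)
import Data.Array.IO (IOUArray)
import Data.Array.MArray (MArray, getBounds, getElems, newListArray, readArray, writeArray)
import Data.Ix (Ix, range)

-- | Apply a function to every element of a mutable array, in place.
-- Only Ix is required of the index type: no Num and no Enum constraints.
mapInPlace :: (MArray a e m, Ix i) => (e -> e) -> a i e -> m ()
mapInPlace f arr = do
    bounds <- getBounds arr
    forM_ (range bounds) $ \i -> do
        v <- readArray arr i
        writeArray arr i (f v)

-- | Hypothetical demo on a 2 x 2 unboxed array whose largest element is 4.
demo :: IO [Double]
demo = do
    arr <- newListArray ((0, 0), (1, 1)) [1, 2, 3, 4] :: IO (IOUArray (Int, Int) Double)
    let scale = 1 / 4          -- reciprocal computed once, then multiplied
    mapInPlace (* scale) arr
    getElems arr
```

The resulting signature has no trace of the 2D assumption, so the same function works for arrays of any dimension.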

...

    calc_max_2d_elem :: (Ord t, MArray a t t1, Ix t2, Ix t3, Num t3, Num t2) =>
                        a (t3, t2) t -> t1 t
    calc_max_2d_elem arr =
        do m <- readArray arr (0,0)
           (_,(i_max, j_max)) <- getBounds arr
           let calc_max_loop arr m (i,j)
                   | j == j_max = return m
                   | otherwise  = do e <- readArray arr (i,j)
                                     let m2 = max e m
                                     m2 `seq` calc_max_loop arr m2 nxt_idx
                   where nxt_idx | i == i_max - 1 = (0,j+1)
                                 | otherwise      = (i+1,j)
           calc_max_loop arr m (0,0)

Hopefully we have done away with this thing given the foldl' thing. There are a lot of implicit assumptions hiding in this code, such as indices being zero-based integers. Writing your type signature first would have caught those assumptions, since you wouldn't have had (Num t3, Num t2) ;-).

Luke

Jeff φ

5 Feb

4:36 p.m.

I want to say thanks to everyone who responded to my mutable array post. I'm going to work through and experiment with all the comments people posted. It might take me a while.

Luke Palmer wrote:

...

Hmm, how big is the array? If it's pretty big, that's understandable. Frankly, it's because foldl sucks: I have never seen a reason to use it. You should be using the strict variant foldl' here. (I don't think there is a foldl1'). And that will get rid of your big function calc_max_2d_elem.

I should have mentioned that I'm working with a 2D array that is 1024 x 1024. Eventually, this code will have to work with arrays that are much larger. (For fun I write image processing and fractal "art" programs.) I replaced the foldl1 with foldl1'. Unfortunately, I still get a stack overflow.

Chaddaï Fouché wrote:

...

Sorry, but none of those proposals change the heart of the problem: the list of elements is totally produced before it can be consumed, due to the strict monadic (IO or ST) nature of getElems. Thus you get an extraordinary waste of memory as well as resources...

This is interesting. I've been programming in Concurrent Clean for a while. Instead of monads, Clean supports unique types for mutable arrays and IO. In Clean, I can write code that iterates through a mutable array by converting it to a lazy list. This is convenient because I can use all the nice list processing functions that are available.

Changing the subject slightly, I once wrote code in Concurrent Clean that filtered a file that was larger than the available memory on my PC. I did this by creating a function that returned the contents of the original file as a lazy list. Then, I created functions to process the list and write the processed list to a results file. The code was not imperative at all. The function that wrote the results file forced the evaluation of the lazy list. As the lazy list was consumed, the contents of the original file were read. Is this possible with Monads in Haskell? Based on your comments, I suspect that in Haskell, one would have to explicitly code a loop that reads a portion of the original file, processes it, and writes a portion of the results file, over and over.

By the way, if anyone wants to see it, I can post some Clean code that demonstrates the file processing I described. Clean's syntax is very similar to Haskell's.

Thanks,
Jeff

Kalman Noel

5:14 p.m.

Jeff φ wrote:

...

Changing the subject slightly, I once wrote code in Concurrent Clean that filtered a file that was larger than the available memory on my PC. I did this by creating a function that returned the contents of the original file as a lazy list.

Doing this is idiomatic in Haskell, although its usage is commonly discouraged in more complex UI settings because you cannot ever close the file handle until the end of the program. The relevant functions are to be found in the Prelude (or in Data.ByteString.Lazy, for that matter).
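As a rough illustration of the lazy style described here, the following sketch filters a file using only Prelude functions. The function name upperCaseFile, the demo, and the file names are all hypothetical. readFile returns its contents lazily, so the input is read piece by piece as writeFile consumes the transformed string; memory use should stay bounded as long as nothing else keeps a reference to the contents.

```haskell
import Data.Char (toUpper)

-- | Copy a file to another path, upper-casing it on the way.
-- The input is read lazily, on demand, while the output is being written.
-- The source and destination must be different files.
upperCaseFile :: FilePath -> FilePath -> IO ()
upperCaseFile src dst = do
    contents <- readFile src
    writeFile dst (map toUpper contents)

-- | Hypothetical demo: writes a small input file in the current directory,
-- filters it, and returns the contents of the result file.
demo :: IO String
demo = do
    writeFile "lazy-demo-input.txt" "hello\nworld\n"
    upperCaseFile "lazy-demo-input.txt" "lazy-demo-output.txt"
    readFile "lazy-demo-output.txt"
```

The trade-off discussed in this thread still applies: when the file is actually read depends on when the string is consumed.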

Luke Palmer

5:17 p.m.

On Feb 5, 2008 4:36 PM, Jeff φ wrote:

...

I want to say thanks to everyone who responded to my mutable array post. I'm going to work through and experiment with all the comments people posted. It might take me a while.

Luke Palmer wrote:

...
Hmm, how big is the array? If it's pretty big, that's understandable. Frankly, it's because foldl sucks: I have never seen a reason to use it. You should be using the strict variant foldl' here. (I don't think there is a foldl1'). And that will get rid of your big function calc_max_2d_elem.

I should have mentioned that I'm working with a 2D array that is 1024 x 1024. Eventually, this code will have to work with arrays that are much larger. (For fun I write image processing and fractal "art" programs.) I replaced the foldl1 with foldl1'. Unfortunately, I still get a stack overflow.

Right, that was my mistake. The reason is right here:

...

Chaddaï Fouché wrote:

...
Sorry, but none of those proposals change the heart of the problem: the list of elements is totally produced before it can be consumed, due to the strict monadic (IO or ST) nature of getElems. Thus you get an extraordinary waste of memory as well as resources...

This is interesting. I've been programming in Concurrent Clean for a while. Instead of monads, Clean supports unique types for mutable arrays and IO. In Clean, I can write code that iterates through a mutable array by converting it to a lazy list. This is convenient because I can use all the nice list processing functions that are available.

Changing the subject slightly, I once wrote code in Concurrent Clean that filtered a file that was larger than the available memory on my PC. I did this by creating a function that returned the contents of the original file as a lazy list. Then, I created functions to process the list and write the processed list to a results file. The code was not imperative at all. The function that wrote the results file forced the evaluation of the lazy list. As the lazy list was consumed, the contents of the original file were read. Is this possible with Monads in Haskell?

Yes, using hGetContents, which is considered bad practice by many people here. The problem is that hGetContents breaks referential transparency, and I suspect that whatever Clean does to lazily read files does so too (though I can't be sure, I haven't looked in any detail at uniqueness types). That is, the contents of the returned list depend on when you read it, which is not allowed in a referentially transparent language.

The same applies to your problem. getElems cannot return a lazy list of elements*, because what if the array were changed between the point that you did the getElems and the point you required the element? So it seems that actually specifying the order of evaluation using an imperative-style loop is the only pure way to do this.

* Well, it could, but it would require some cleverness like copy-on-write logic under the hood.

Luke

Bulat Ziganshin

6:06 p.m.

New subject: Re[2]: Mutable arrays

Hello Jeff, Tuesday, February 5, 2008, 7:36:27 PM, you wrote:

...

Changing the subject slightly, I once wrote code in Concurrent Clean that filtered a file that was larger than the available memory on my PC. Is this possible with Monads in Haskell?

Google for "simple unix tools".

--
Best regards,
Bulat mailto:Bulat.Ziganshin@gmail.com

Rodrigo Queiro

2 Feb

12:57 p.m.

This is my attempt at some nicer code:

    maximum' (x:xs) = foldl' max x xs
    maximum' _ = undefined

    modifyArray :: (MArray a e m, Ix i) => (e -> e) -> a i e -> m ()
    modifyArray fn arr = do
        bounds <- getBounds arr
        forM_ (range bounds) (modifyElement fn arr)

    modifyElement :: (MArray a e m, Ix i) => (e -> e) -> a i e -> i -> m ()
    modifyElement fn arr i = do
        x <- readArray arr i
        writeArray arr i (fn x)

    normalizeArray :: (MArray a e m, Ix i, Fractional e, Ord e) => a i e -> m ()
    normalizeArray arr = do
        arr_elems <- getElems arr
        let max_elem = maximum' arr_elems
        modifyArray (/max_elem) arr

Rodrigo Queiro

1:11 p.m.

Sorry, I was lazy. New maximum':

    maximum' = foldl1' max

Chaddaï Fouché

5:11 p.m.

2008/2/2, Rodrigo Queiro :

...

Sorry, I was lazy. New maximum':

    maximum' = foldl1' max

Sorry, but none of those proposals change the heart of the problem: the list of elements is totally produced before it can be consumed, due to the strict monadic (IO or ST) nature of getElems. Thus you get an extraordinary waste of memory as well as resources...

To address this I propose this function:

    foldl1MArray' :: (MArray a e m, Ix i) => (e -> e -> e) -> a i e -> m e
    foldl1MArray' f a = do
        (l,u) <- getBounds a
        firstElem <- readArray a l
        foldM (\a mb -> a `seq` mb >>= return . f a) firstElem (map (readArray a) (range (l,u)))

With this, we can rewrite the original program using the excellent modifyArray from Rodrigo:

    normalizeArray :: (MArray a e m, Ix i, Fractional e, Ord e) => a i e -> m ()
    normalizeArray arr = do
        max_elem <- foldl1MArray' max arr
        modifyArray (* (1/max_elem)) arr

--
Jedaï
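For readers who want to try this, here is a self-contained sketch that combines the fold with an in-place map and runs it in ST. The imports, the demo, and the name modifyAll are additions made for this example; modifyAll stands in for Rodrigo's modifyArray and is renamed only to avoid any clash with library functions of that name. runSTUArray is used here as one convenient way to get an immutable result back; the thread itself does not prescribe it.

```haskell
import Control.Monad (foldM, forM_)
import Data.Array.MArray (MArray, getBounds, newListArray, readArray, writeArray)
import Data.Array.ST (runSTUArray)
import Data.Array.Unboxed (UArray, elems)
import Data.Ix (Ix, range)

-- | The fold from the message above: each read is run and combined with the
-- accumulator before the next one starts. The first element is read twice,
-- which is harmless for max but would matter for a function such as (+).
foldl1MArray' :: (MArray a e m, Ix i) => (e -> e -> e) -> a i e -> m e
foldl1MArray' f arr = do
    (l, u) <- getBounds arr
    firstElem <- readArray arr l
    foldM (\acc mb -> acc `seq` mb >>= return . f acc)
          firstElem
          (map (readArray arr) (range (l, u)))

-- | Apply a function to every element in place.
modifyAll :: (MArray a e m, Ix i) => (e -> e) -> a i e -> m ()
modifyAll f arr = do
    bounds <- getBounds arr
    forM_ (range bounds) $ \i -> readArray arr i >>= writeArray arr i . f

normalizeArray :: (MArray a e m, Ix i, Fractional e, Ord e) => a i e -> m ()
normalizeArray arr = do
    maxElem <- foldl1MArray' max arr
    modifyAll (* (1 / maxElem)) arr

-- | Hypothetical demo: normalize a 2 x 2 array inside ST.
demo :: [Double]
demo = elems normalized
  where
    normalized :: UArray (Int, Int) Double
    normalized = runSTUArray (do
        arr <- newListArray ((0, 0), (1, 1)) [2, 8, 4, 1]
        normalizeArray arr
        return arr)
```

Since normalizeArray is polymorphic in the array type and monad, the same definition can also be run on an IOUArray in IO.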

Jeff φ

7 Feb

4:34 p.m.

On Feb 2, 2008 12:11 PM, Chaddaï Fouché wrote:

...

To address this I propose this function:

    foldl1MArray' :: (MArray a e m, Ix i) => (e -> e -> e) -> a i e -> m e
    foldl1MArray' f a = do
        (l,u) <- getBounds a
        firstElem <- readArray a l
        foldM (\a mb -> a `seq` mb >>= return . f a) firstElem (map (readArray a) (range (l,u)))

I played with your foldl1MArray' last night. I noticed it can be reduced to . . .

    foldl1MArray' :: (MArray a e m, Ix i) => (e -> e -> e) -> a i e -> m e
    foldl1MArray' f a = do
        (l,u) <- getBounds a
        foldl1' (liftM2 f) (map (readArray a) (range (l,u)))

Unfortunately, my new version consumes a lot of stack and heap space. Why is this so inefficient? Is there a simple change that will make it efficient?

Chaddaï Fouché

6:21 p.m.

2008/2/7, Jeff φ :

...

I played with your foldl1MArray' last night. I noticed it can be reduced to . . .

    foldl1MArray' :: (MArray a e m, Ix i) => (e -> e -> e) -> a i e -> m e
    foldl1MArray' f a = do
        (l,u) <- getBounds a
        foldl1' (liftM2 f) (map (readArray a) (range (l,u)))

Unfortunately, my new version consumes a lot of stack and heap space. Why is this so inefficient? Is there a simple change that will make it efficient?

This code doesn't compute the result incrementally. It can't, because foldl1' is not aware of the monad: it only constructs a huge action in m, which is then run. The advantage of foldM is that it can run incrementally. Which is not to say my code was the best possible (far from it); the following would already have been better:

...

foldlA f a arr = getBounds arr >>= foldM (\a->a `seq` liftM $ f a) a . map (readArray arr) . range

...

foldl1A f arr = getBounds arr >>= readArray arr . fst >>= flip (foldlA f) arr
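The two point-free definitions above can be hard to read, so here is a pointful sketch of the same idea with imports and a small demo added. It is an adaptation, not a transcription: it folds over the indices rather than over a list of read actions, and it uses ($!) to force the accumulator. The type signatures and the demo are assumptions made for this example.

```haskell
import Control.Monad (foldM)
import Data.Array.IO (IOUArray)
import Data.Array.MArray (MArray, getBounds, newListArray, readArray)
import Data.Ix (Ix, range)

-- | Left fold over a mutable array with an explicit starting value.
-- foldM runs each readArray before moving on, and ($!) forces the new
-- accumulator, so neither pending actions nor thunks pile up.
foldlA :: (MArray a e m, Ix i) => (b -> e -> b) -> b -> a i e -> m b
foldlA f z arr = do
    bounds <- getBounds arr
    foldM step z (range bounds)
  where
    step acc i = do
        x <- readArray arr i
        return $! f acc x

-- | Variant seeded with the first element, which the fold then reads again.
foldl1A :: (MArray a e m, Ix i) => (e -> e -> e) -> a i e -> m e
foldl1A f arr = do
    (l, _) <- getBounds arr
    first <- readArray arr l
    foldlA f first arr

-- | Hypothetical demo: sum and maximum of a four-element array.
demo :: IO (Double, Double)
demo = do
    arr <- newListArray (0, 3) [3, 9, 2, 7] :: IO (IOUArray Int Double)
    total <- foldlA (+) 0 arr
    largest <- foldl1A max arr
    return (total, largest)
```

The contrast with the foldl1' (liftM2 f) version is that here the monad sequences one read at a time, instead of a pure fold first assembling one large composite action.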

-- Jedaï

Stefan O'Rear

2 Feb

5:28 p.m.

On Sat, Feb 02, 2008 at 12:57:47PM +0000, Rodrigo Queiro wrote:

...

This is my attempt at some nicer code:

    maximum' (x:xs) = foldl' max x xs
    maximum' _ = undefined

    modifyArray :: (MArray a e m, Ix i) => (e -> e) -> a i e -> m ()
    modifyArray fn arr = do
        bounds <- getBounds arr
        forM_ (range bounds) (modifyElement fn arr)

    modifyElement :: (MArray a e m, Ix i) => (e -> e) -> a i e -> i -> m ()
    modifyElement fn arr i = do
        x <- readArray arr i
        writeArray arr i (fn x)

    normalizeArray :: (MArray a e m, Ix i, Fractional e, Ord e) => a i e -> m ()
    normalizeArray arr = do
        arr_elems <- getElems arr
        let max_elem = maximum' arr_elems
        modifyArray (/max_elem) arr

Note that by using getElems, you are throwing away most of the advantages of arrays, since it is strict (it has to be, since it's effectively an IO function and lazy IO is unsound wrt Haskell's normal semantics) and converts the whole thing into a list.

If I just had this one bit of code to do, I'd use an explicit loop:

    normalizeArray arr = do b <- getBounds arr ; m <- findMax (range b)
                            forM_ (range b) (edit m)
      where
        findMax (i:is) = findMax' is =<< readArray arr i
        findMax' (i:is) !v = findMax' is . max v =<< readArray arr i
        findMax' [] !v = return v
        edit mx i = writeArray arr i . (/mx) =<< readArray arr i

With a little more, I'd probably set the scene with a few array-modifying combinators, inspired by Oleg's left-fold idea:

    -- yes, I'm passing four arguments to foldr.  this is not a mistake.
    foldA fn ac arr = getBounds arr >>= \b ->
        foldr (\ i ct acc -> ct =<< fn i acc =<< readArray arr i)
              return (range b) ac

    foldAp fn = foldA (\i a b -> return (fn a b))
    maxA = foldAp max minBound
    mapA fn ar = foldA (\i _ v -> writeArray ar i (fn v)) () ar

    normalize arr = maxA arr >>= \ m -> mapA (/m) arr

Stefan
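A self-contained sketch of the explicit-loop approach, with the pragma, imports, a type signature, and a demo added for this example. The empty-array case and its error message are assumptions of the sketch, not something the message specifies.

```haskell
{-# LANGUAGE BangPatterns #-}
import Control.Monad (forM_)
import Data.Array.IO (IOUArray)
import Data.Array.MArray (MArray, getBounds, getElems, newListArray, readArray, writeArray)
import Data.Ix (Ix, range)

-- | Find the maximum with a strict loop over the indices, then divide every
-- element by it in a second pass. No intermediate list of elements is built.
normalizeArray :: (MArray a e m, Ix i, Fractional e, Ord e) => a i e -> m ()
normalizeArray arr = do
    b <- getBounds arr
    m <- findMax (range b)
    forM_ (range b) (edit m)
  where
    -- Seed the running maximum with the first element, then walk the rest.
    findMax (i : is) = findMax' is =<< readArray arr i
    findMax []       = error "normalizeArray: empty array"
    -- The bang pattern keeps the running maximum evaluated at each step.
    findMax' (i : is) !v = findMax' is . max v =<< readArray arr i
    findMax' []       !v = return v
    edit mx i = writeArray arr i . (/ mx) =<< readArray arr i

-- | Hypothetical demo on a small one-dimensional array.
demo :: IO [Double]
demo = do
    arr <- newListArray (0, 3) [2, 8, 4, 1] :: IO (IOUArray Int Double)
    normalizeArray arr
    getElems arr
```

The list produced by range is consumed lazily by the loop, so only the indices being visited need to exist at any time.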

Henning Thielemann

8:34 p.m.

On Sat, 2 Feb 2008, Jeff φ wrote:

...

Hello,

I'm trying to write code that will take a mutable 2D array and normalize it by dividing all elements by the largest element.

Are you sure you need the arrays to be mutable? Maybe it's fast enough to do the copying; it's significantly easier anyway. If all operations run over the whole array, like the normalization, then it's not much a matter of speed, but only a matter of memory. That is, you need double the amount of memory, because the data can be processed and copied back and forth. You can even reduce this further, if you can come up with an optimizer fusion framework. With immutable arrays you can easily implement

    let xm = maximum (Array.elems arr)
    in  fmap (/xm) arr

OK, better use the strict "maximum'" proposed by others in this thread.
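A small sketch of the immutable version, assuming a boxed Data.Array array and a non-empty input. The names normalize and demo are invented for this example, and foldl1' stands in for the strict maximum' mentioned above.

```haskell
import Data.Array (Array, Ix, elems, listArray)
import Data.List (foldl1')

-- | Normalize an immutable array by its largest element. fmap builds a new
-- array, so the input is left untouched. Assumes a non-empty array.
normalize :: (Ix i, Fractional e, Ord e) => Array i e -> Array i e
normalize arr = fmap (/ xm) arr
  where
    xm = foldl1' max (elems arr)

-- | Hypothetical demo on a 2 x 2 array.
demo :: [Double]
demo = elems (normalize (listArray ((0 :: Int, 0 :: Int), (1, 1)) [2, 8, 4, 1]))
```

Unlike getElems on a mutable array, elems on an immutable array is a pure function, so the fold can consume the element list as it is produced.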
