Lazy lists simulated by unboxed mutable arrays

newer
I/O without monads, using an event...

Henning Thielemann

28 May 2008 28 May '08

5:51 a.m.

I wonder whether the following idea has been investigated or implemented somewhere: We could simulate a list with strict elements, i.e. data StrictList a = Elem !a (StrictList a) | End by an unboxed array with a cursor to the next element to be evaluated and a function that generates the next element. Whenever an element with an index beyond the cursor is requested, sufficiently many new elements are written to the array and the cursor is advanced. This would still allow the nice tricks for recursive Fibonacci sequence definition. This will obviously save memory, but can we also expect that it is noticeably faster than (StrictList a) ?

Show replies by date

Bulat Ziganshin

28 May 28 May

6:06 a.m.

Hello Henning, Wednesday, May 28, 2008, 9:51:28 AM, you wrote:

...

We could simulate a list with strict elements, i.e. data StrictList a = Elem !a (StrictList a) | End by an unboxed array with a cursor to the next element to be evaluated and a function that generates the next element. Whenever an element with an index beyond the cursor is requested, sufficiently many new elements are written to the array and the cursor is advanced. This would still allow the nice tricks for recursive Fibonacci sequence definition. This will obviously save memory, but can we also expect that it is noticeably faster than (StrictList a) ?

looks like lazy.bytestring generalized to any a -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Henning Thielemann

8:22 a.m.

On Wed, 28 May 2008, Bulat Ziganshin wrote:

...

Hello Henning,

Wednesday, May 28, 2008, 9:51:28 AM, you wrote:

...
We could simulate a list with strict elements, i.e. data StrictList a = Elem !a (StrictList a) | End by an unboxed array with a cursor to the next element to be evaluated and a function that generates the next element. Whenever an element with an index beyond the cursor is requested, sufficiently many new elements are written to the array and the cursor is advanced. This would still allow the nice tricks for recursive Fibonacci sequence definition. This will obviously save memory, but can we also expect that it is noticeably faster than (StrictList a) ?

looks like lazy.bytestring generalized to any a

As far as I know, ByteString.Lazy is chunky, that is laziness occurs only every 1000th byte or so. My suggestion aims at laziness at element level but still more efficiency than a list.

Bulat Ziganshin

8:31 a.m.

New subject: Re[2]: Lazy lists simulated by unboxed mutable arrays

Hello Henning, Wednesday, May 28, 2008, 12:22:54 PM, you wrote:

...

...
...
Whenever an element with an index beyond the cursor is requested, sufficiently many new elements are written to the array and the cursor is advanced.

...

As far as I know, ByteString.Lazy is chunky, that is laziness occurs only every 1000th byte or so. My suggestion aims at laziness at element level but still more efficiency than a list.

well, i don't understand difference between your idea and lazybs implementation -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Ketil Malde

8:44 a.m.

Bulat Ziganshin writes:

...

well, i don't understand difference between your idea and lazybs implementation

HT said earlier that:

...

...
This would still allow the nice tricks for recursive Fibonacci sequence definition.

Which I guess refers to something like: fibs = 1 : 1 : zipWith (+) fibs (tail fibs) I don't think you can do that with LBS, since you'd need to calculate a whole chunk at a time, and for any chunk size > 1, each chunk depends on itself. -k -- If I haven't seen further, it is by standing in the footprints of giants

Henning Thielemann

9:24 a.m.

On Wed, 28 May 2008, Ketil Malde wrote:

...

Bulat Ziganshin writes:

...
well, i don't understand difference between your idea and lazybs implementation

HT said earlier that:

...
...
This would still allow the nice tricks for recursive Fibonacci sequence definition.

Which I guess refers to something like:

fibs = 1 : 1 : zipWith (+) fibs (tail fibs)

I don't think you can do that with LBS, since you'd need to calculate a whole chunk at a time, and for any chunk size > 1, each chunk depends on itself.

Right, that's what I meant.

Henning Thielemann

30 May 30 May

6:59 p.m.

On Wed, 28 May 2008, Ketil Malde wrote:

...

Bulat Ziganshin writes:

...
well, i don't understand difference between your idea and lazybs implementation

HT said earlier that:

...
...
This would still allow the nice tricks for recursive Fibonacci sequence definition.

Which I guess refers to something like:

fibs = 1 : 1 : zipWith (+) fibs (tail fibs)

I don't think you can do that with LBS, since you'd need to calculate a whole chunk at a time, and for any chunk size > 1, each chunk depends on itself.

I have now implemented a small prototype: http://code.haskell.org/storablevector/Data/StorableVector/Cursor.hs Actually you can run the Fibonacci example but it allocates three arrays: let f2 = zipNWith 15 (+) f0 f1; f1 = cons 1 f2; f0 = cons (0::Int) f1 in f0 I'm afraid the compiler cannot optimize the IORefs to unboxed values, even in registers, because in principle they can be modified from everywhere in the program. Is there a better way than using IORefs hidden by unsafePerformIO? {-# OPTIONS_GHC -cpp -fglasgow-exts #-} {- | Simulate a list with strict elements by a more efficient array structure. -} module Data.StorableVector.Cursor where import Control.Exception (assert, ) import Control.Monad.State (StateT(StateT), runStateT, ) import Data.IORef (IORef, newIORef, readIORef, writeIORef, ) import Foreign.Storable (Storable(peekElemOff, pokeElemOff)) import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrArray, withForeignPtr, ) import Control.Monad (when) import Data.Maybe (isNothing) import System.IO.Unsafe (unsafePerformIO) import Data.StorableVector.Utility (viewListL, mapSnd, ) import Prelude hiding (length, foldr, zipWith, ) -- | Cf. StreamFusion Data.Stream data Generator a = forall s. -- Seq s => Generator { next :: {-# UNPACK #-} !(StateT s Maybe a), -- compute next value state :: {-# UNPACK #-} !(IORef (Maybe s)) -- current state } {- | This simulates a @ data StrictList a = Elem !a (StrictList a) | End @ by an array and some unsafe hacks. -} data Buffer a = Buffer { memory :: {-# UNPACK #-} !(ForeignPtr a), size :: {-# UNPACK #-} !Int, -- size of allocated memory gen :: {-# UNPACK #-} !(Generator a), cursor :: {-# UNPACK #-} !(IORef Int) } {- | Vector is a part of a buffer. -} data Vector a = Vector { buffer :: {-# UNPACK #-} !(Buffer a), start :: {-# UNPACK #-} !Int, -- invariant: start <= cursor maxLen :: {-# UNPACK #-} !Int -- invariant: start+maxLen <= size } -- * construction {-# INLINE create #-} create :: (Storable a) => Int -> Generator a -> Buffer a create l g = unsafePerformIO (createIO l g) -- | Wrapper of mallocForeignPtrArray. createIO :: (Storable a) => Int -> Generator a -> IO (Buffer a) createIO l g = do fp <- mallocForeignPtrArray l cur <- newIORef 0 return $! Buffer fp l g cur {- | @ unfoldrNTerm 20 (\n -> Just (n, succ n)) 'a' @ -} unfoldrNTerm :: (Storable b) => Int -> (a -> Maybe (b, a)) -> a -> Vector b unfoldrNTerm i f x0 = unsafePerformIO (unfoldrNTermIO i f x0) unfoldrNTermIO :: (Storable b) => Int -> (a -> Maybe (b, a)) -> a -> IO (Vector b) unfoldrNTermIO i f x0 = do ref <- newIORef (Just x0) buf <- createIO i (Generator (StateT f) ref) return (Vector buf 0 i) {-# INLINE pack #-} pack :: (Storable a) => Int -> [a] -> Vector a pack n = unfoldrNTerm n viewListL {-# INLINE cons #-} {- | This is expensive and should not be used to construct lists iteratively! -} cons :: Storable a => a -> Vector a -> Vector a cons x xs = unfoldrNTerm (succ (maxLen xs)) (\(mx0,xs0) -> fmap (mapSnd ((,) Nothing)) $ maybe (viewL xs0) (\x0 -> Just (x0, xs0)) mx0) $ (Just x, xs) {-# INLINE zipWith #-} zipWith :: (Storable a, Storable b, Storable c) => (a -> b -> c) -> Vector a -> Vector b -> Vector c zipWith f ps0 qs0 = zipNWith (min (maxLen ps0) (maxLen qs0)) f ps0 qs0 {-# INLINE zipNWith #-} zipNWith :: (Storable a, Storable b, Storable c) => Int -> (a -> b -> c) -> Vector a -> Vector b -> Vector c zipNWith n f ps0 qs0 = unfoldrNTerm n (\(ps,qs) -> do (ph,pt) <- viewL ps (qh,qt) <- viewL qs return (f ph qh, (pt,qt))) (ps0,qs0) -- * inspection -- | evaluate next value in a buffer advanceIO :: Storable a => Buffer a -> IO () advanceIO (Buffer p sz (Generator n s) cr) = do c <- readIORef cr assert (c < sz) $ do writeIORef cr (succ c) ms <- readIORef s case ms of Nothing -> return () Just s0 -> case runStateT n s0 of Nothing -> writeIORef s Nothing Just (a,s1) -> writeIORef s (Just s1) >> withForeignPtr p (\q -> pokeElemOff q c a) {-# INLINE switchL #-} switchL :: Storable a => b -> (a -> Vector a -> b) -> Vector a -> b switchL n j v = unsafePerformIO (switchLIO n j v) switchLIO :: Storable a => b -> (a -> Vector a -> b) -> Vector a -> IO b switchLIO n j v@(Vector buf st ml) = nullIO v >>= \ isNull -> if isNull then return n else do c <- readIORef (cursor buf) assert (st <= c) $ when (st == c) (advanceIO buf) x <- withForeignPtr (memory buf) (\p -> peekElemOff p st) let tl = assert (ml>0) $ Vector buf (succ st) (pred ml) return (j x tl) {-# INLINE viewL #-} viewL :: Storable a => Vector a -> Maybe (a, Vector a) viewL = switchL Nothing (curry Just) {-# INLINE foldr #-} foldr :: (Storable a) => (a -> b -> b) -> b -> Vector a -> b foldr k z = let recurse = switchL z (\h t -> k h (recurse t)) in recurse {-# INLINE unpack #-} unpack :: (Storable a) => Vector a -> [a] unpack = foldr (:) [] instance (Show a, Storable a) => Show (Vector a) where showsPrec p x = showsPrec p (unpack x) {-# INLINE null #-} null :: Vector a -> Bool null = unsafePerformIO . nullIO nullIO :: Vector a -> IO Bool nullIO (Vector (Buffer _ sz (Generator _ s) _) st _) = do b <- readIORef s return (st >= sz || isNothing b)

Ketil Malde

28 May 28 May

8:40 a.m.

Henning Thielemann writes:

...

...
...
We could simulate a list with strict elements, i.e. data StrictList a = Elem !a (StrictList a) | End by an unboxed array with a cursor to the next element to be evaluated and a function that generates the next element. [...]

...

...
looks like lazy.bytestring generalized to any a

...

As far as I know, ByteString.Lazy is chunky, that is laziness occurs only every 1000th byte or so. My suggestion aims at laziness at element level but still more efficiency than a list.

You've lost me at least. Presumably, you don't want single-element UArrays, so you want a chunky data structure, but fine-grained laziness. As far as I can tell, this implies that the UArray is split (at the position indicated by the cursor?) into an evaluated part, and an unevaluated part. Now, if you evaluate one more element, you need to calculate its value and create a new UArray (and cursor) to replace the old one(s). Unless you do the whole thing in the ST or IO monads, I can't see how you can implement this efficiently. Where did I misunderstand you? Perhaps you can sketch some code? -k -- If I haven't seen further, it is by standing in the footprints of giants

Ketil Malde

11:16 a.m.

Ketil Malde writes:

...

You've lost me at least.

...but perhaps I can find my way back on my own? Today, you can choose between Array, with lazy elements, or UArray, with strict elements. Lazy arrays have the elements defined in advance, strict ones have them calculated in advance - with the tremendous advantage of being able to eliminate the pointer for each element. Otherwise a pointer is needed to point to a potentially unevaluated thunk. Perhaps there is a middle ground here, if you add the restriction that the elements are generated in order? This way, you only need one pointer to an unevaluated thunk (which will yield all remaining elements as needed), and an unboxed array can contain the calculated values. This would be very nice for e.g. sequence alignment, where the alignment matrix is self-referencing, but the pointers represent a very real cost to an already expensive (resource-intesive) solution. -k -- If I haven't seen further, it is by standing in the footprints of giants

Henning Thielemann

6:35 p.m.

On Wed, 28 May 2008, Ketil Malde wrote:

...

Ketil Malde writes:

...
You've lost me at least.

...but perhaps I can find my way back on my own?

Today, you can choose between Array, with lazy elements, or UArray, with strict elements.

... and ByteStrings, where in principle the elements could be initialized in any order, but actually the ByteString functions prefer a left-to-right order. They are clearly intended as list replacement, so my proposed "cursor arrays" would do as well.

...

Lazy arrays have the elements defined in advance, strict ones have them calculated in advance - with the tremendous advantage of being able to eliminate the pointer for each element. Otherwise a pointer is needed to point to a potentially unevaluated thunk.

Perhaps there is a middle ground here, if you add the restriction that the elements are generated in order?

Exactly. Thus I compared my suggestion with element-strict lists.

...

This way, you only need one pointer to an unevaluated thunk (which will yield all remaining elements as needed), and an unboxed array can contain the calculated values.

That's it!

...

This would be very nice for e.g. sequence alignment, where the alignment matrix is self-referencing, but the pointers represent a very real cost to an already expensive (resource-intesive) solution.

I'm thinking about signal processing, where data is processed in time order in many cases. Thank you for clarification!

Duncan Coutts

8:50 a.m.

On Wed, 2008-05-28 at 10:22 +0200, Henning Thielemann wrote:

...

On Wed, 28 May 2008, Bulat Ziganshin wrote:

...
Hello Henning,

Wednesday, May 28, 2008, 9:51:28 AM, you wrote:

...
We could simulate a list with strict elements, i.e. data StrictList a = Elem !a (StrictList a) | End by an unboxed array with a cursor to the next element to be evaluated and a function that generates the next element. Whenever an element with an index beyond the cursor is requested, sufficiently many new elements are written to the array and the cursor is advanced. This would still allow the nice tricks for recursive Fibonacci sequence definition. This will obviously save memory, but can we also expect that it is noticeably faster than (StrictList a) ?

looks like lazy.bytestring generalized to any a

As far as I know, ByteString.Lazy is chunky, that is laziness occurs only every 1000th byte or so. My suggestion aims at laziness at element level but still more efficiency than a list.

How about a chunky lazy array? The problem with lazy bytestring is that each chunk is strict but if you made each chunk a H98 lazy array that might work nicely. Duncan

Henning Thielemann

9:30 a.m.

On Wed, 28 May 2008, Duncan Coutts wrote:

...

On Wed, 2008-05-28 at 10:22 +0200, Henning Thielemann wrote:

...
On Wed, 28 May 2008, Bulat Ziganshin wrote:

...
Hello Henning,

Wednesday, May 28, 2008, 9:51:28 AM, you wrote:

...
We could simulate a list with strict elements, i.e. data StrictList a = Elem !a (StrictList a) | End by an unboxed array with a cursor to the next element to be evaluated and a function that generates the next element. Whenever an element with an index beyond the cursor is requested, sufficiently many new elements are written to the array and the cursor is advanced. This would still allow the nice tricks for recursive Fibonacci sequence definition. This will obviously save memory, but can we also expect that it is noticeably faster than (StrictList a) ?

looks like lazy.bytestring generalized to any a

As far as I know, ByteString.Lazy is chunky, that is laziness occurs only every 1000th byte or so. My suggestion aims at laziness at element level but still more efficiency than a list.

How about a chunky lazy array? The problem with lazy bytestring is that each chunk is strict but if you made each chunk a H98 lazy array that might work nicely.

This would actually yield the intended result, but I'm afraid there are still too much pointers and allocations around. The nice thing about my suggestion is, that no allocation would be needed to construct new elements in the list, just running the function to evaluate the next elements and advance the cursor. It's a hack of course, like ByteString internally is, it's entirely for efficiency reasons. I interpret your answers as "what you suggest is not obviuous and has probably not been proposed before" so I'll take some time for a demonstration implementation.

Andrew Coppin

6:37 p.m.

Bulat Ziganshin wrote:

...

Hello Henning,

looks like lazy.bytestring generalized to any a

That sounds like a darn useful thing to have... [OTOH, currently unboxed arrays are available only for a select few types, so good luck implementing this in a clean way!]

Bulat Ziganshin

8:29 p.m.

New subject: Re[2]: Lazy lists simulated by unboxed mutable arrays

Hello Andrew, Wednesday, May 28, 2008, 10:37:47 PM, you wrote:

...

...
looks like lazy.bytestring generalized to any a That sounds like a darn useful thing to have...

well, support on only Word8 as base type isn't some fundamental limit, just creators of bytestring package was not very interested in support of other base types

...

[OTOH, currently unboxed arrays are available only for a select few types, so good luck implementing this in a clean way!]

a few years ago, storable array was slower than special primitives created for UArray implementation. now they should have equal performance, so it's possible to make UArray implementation that will work with any Storable type. Simon, can you please confirm or refute this? -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Don Stewart

8:41 p.m.

bulat.ziganshin:

...

Hello Andrew,

Wednesday, May 28, 2008, 10:37:47 PM, you wrote:

...
...
looks like lazy.bytestring generalized to any a That sounds like a darn useful thing to have...

well, support on only Word8 as base type isn't some fundamental limit, just creators of bytestring package was not very interested in support of other base types

...
[OTOH, currently unboxed arrays are available only for a select few types, so good luck implementing this in a clean way!]

a few years ago, storable array was slower than special primitives created for UArray implementation. now they should have equal performance, so it's possible to make UArray implementation that will work with any Storable type. Simon, can you please confirm or refute this?

Sure. I've been working on a general vector library based on a similar type to STUArray, but pure, with a bytestring-like interface. http://code.haskell.org/~dons/uvector Generalised unlifted vectors. Performance looks rather good (typically loops unfold into register variables). This is a restricted form of the general data parallel arrays library. -- Don

Ketil Malde

9:32 p.m.

Don Stewart writes:

...

http://code.haskell.org/~dons/uvector

http://code.haskell.org/~dons/code/uvector (I presume? The other URL gives a 404) -k -- If I haven't seen further, it is by standing in the footprints of giants

Henning Thielemann

29 May 29 May

6 a.m.

New subject: Re[2]: Lazy lists simulated by unboxed mutable arrays

On Thu, 29 May 2008, Bulat Ziganshin wrote:

...

Hello Andrew,

Wednesday, May 28, 2008, 10:37:47 PM, you wrote:

...
...
looks like lazy.bytestring generalized to any a That sounds like a darn useful thing to have...

well, support on only Word8 as base type isn't some fundamental limit, just creators of bytestring package was not very interested in support of other base types

Just in order to remind you, ByteString for any Storable types does already exist: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/storablevector Current repository is here: http://code.haskell.org/storablevector/

Don Stewart

28 May 28 May

5:50 p.m.

lemming:

...

I wonder whether the following idea has been investigated or implemented somewhere: We could simulate a list with strict elements, i.e. data StrictList a = Elem !a (StrictList a) | End

I've used the above structure itself, as a useful alternative to fully lazy lists.

...

by an unboxed array with a cursor to the next element to be evaluated and a function that generates the next element. Whenever an element with an index beyond the cursor is requested, sufficiently many new elements are written to the array and the cursor is advanced. This would still allow the nice tricks for recursive Fibonacci sequence definition. This will obviously save memory, but can we also expect that it is noticeably faster than (StrictList a) ?

That sounds a lot like the semi-eager structure, Lazy ByteStrings, which do cache sized chunks of evaluation before suspending. With the cursor in place, it would behave more like the buffer abstraction to lazy bytestrings in Data.Binary? -- Don

Henning Thielemann

6:50 p.m.

On Wed, 28 May 2008, Don Stewart wrote:

...

...
by an unboxed array with a cursor to the next element to be evaluated and a function that generates the next element. Whenever an element with an index beyond the cursor is requested, sufficiently many new elements are written to the array and the cursor is advanced. This would still allow the nice tricks for recursive Fibonacci sequence definition. This will obviously save memory, but can we also expect that it is noticeably faster than (StrictList a) ?

That sounds a lot like the semi-eager structure, Lazy ByteStrings, which do cache sized chunks of evaluation before suspending. With the cursor in place, it would behave more like the buffer abstraction to lazy bytestrings in Data.Binary?

Can you code fibs = 0 : 1 : zipWith (+) fibs (tail fibs) with hte 'buffer'? I'm afraid you cannot simultaneously read and write from 'buffer', cannot 'drop' and so on, right? What I have in mind is some combination of a 'Data.Stream' for generating the data (playing the role of the unevaluated thunk), a memory chunk for buffering calculated data and a wrapper which provides a view on a sub-array. Ideally 'fibs' would be translated to something like int *p = malloc ...; int *p0 = p; *p = 0; p++; int *p1 = p; *p = 1; p++; int *p2 = p; int i=n; while (i>0) { *p2 = *p0 + *p1; p0++; p1++; p2++; i--; } I'm not sure, whether the compiler can eliminate the last bit of laziness that would remain in a 'cursor array'.

6243

Age (days ago)

6245

Last active (days ago)

List overview

Download

18 comments

6 participants

participants (6)

Andrew Coppin
Bulat Ziganshin
Don Stewart
Duncan Coutts
Henning Thielemann
Ketil Malde