Efficient mutable arrays in STM

I have an application in mind where concurrent access to large arrays (up to millions of elements) of mostly small elements (Int or Double) is common. Typical access patterns would be chunk-wise, i.e. reads or writes from index n up to index m. Imagine stuff like images, scientific data, etc.

All this suggests that Control.Concurrent.STM.TArray, in its current implementation, is not appropriate. Quoting the docs:

"It is currently implemented as Array ix (TVar e), but it may be replaced by a more efficient implementation in the future (the interface will remain the same, however)."

An array of TVars is certainly *much* too inefficient for what I have in mind w.r.t. both memory and cpu time. In fact I had already decided to use Data.Vector.Unboxed from the vector package. I see that Data.Vector.Unboxed.Mutable provides

    slice :: Unbox a => Int -> Int -> MVector s a -> MVector s a
    -- Yield a part of the mutable vector without copying it.

which is almost what I need... Can I use this together with unsafeIOToSTM internally inside a library to provide shared transactional access to an IOArray? The docs warn that using unsafeIOToSTM is (obviously) "highly dangerous", but for what I have in mind the listed problems are not an issue:

* running the IO code multiple times is ok
* aborting is ok, too
* inconsistent views are ok, too

The main question is: does the STM transaction actually "see" that I changed part of the underlying array, so that the transaction gets re-tried? Or do I have to implement this manually, and if yes: how?

Has anyone done something like that before? (If you tried this and found it doesn't work, please tell me; it would save me a lot of work repeating the effort.)

Is someone working on providing a more efficient version of TArray? Would it help if I said I'd be a happy user of a better TArray? ;-)

If what I sketched above is infeasible (I would not be surprised if it was; I haven't yet spent much effort trying it out), what other options do I have? Is there an internal API for the STM stuff, i.e. a C header file or something which would make it possible to add efficient TArrays? Or should I use a high-level approach, something like a Data.Sequence.Seq of medium-sized chunks (TVar (IOVector e))?

Any comments are highly appreciated!

Cheers
Ben

As far as I know, the function 'unsafeIOToSTM' is not transactional in nature: IO actions will be performed immediately and are not rolled back, and are then re-performed on retry.
On Oct 25, 2011 12:49 PM, "Ben Franksen" wrote:
[...]

On Tue, Oct 25, 2011 at 10:47 AM, Ben Franksen wrote:
The main question is: does the STM transaction actually "see" that I changed
part of the underlying array, so that the transaction gets re-tried? Or do I
have to implement this manually, and if yes: how?
Create an extra TVar Int for every `chunk` in the array (e.g. every 256 elements, tuned to your update patterns). Read-write it (increment it; be sure to force evaluation) just before every write to an element or slice within that chunk. The IO mutable array is then adjusted unsafely, but there is enough transactional context to restart transactions that see an inconsistent state. You will have extra read/write and write/write conflicts relative to a pure STM solution, but only within each chunk.

A cleaner alternative is to create a `chunked` TArray, i.e. one that works with fixed-width immutable chunks in a spine. This should have good performance characteristics, and would be a lot safer for general-purpose use.

Regards,
Dave
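If I understand David's scheme right, a minimal sketch could look like the following. All the names here (ChunkedArray, readElem, writeElem and so on) are my own invention, and I use IOUArray from the boot `array` package rather than Data.Vector so the snippet is self-contained; treat it as an unvetted sketch, not a safe implementation:

```haskell
import Control.Concurrent.STM
import Control.Monad (forM)
import Data.Array.IO (IOUArray, newArray, readArray, writeArray)
import GHC.Conc (unsafeIOToSTM)

chunkSize :: Int
chunkSize = 256

data ChunkedArray = ChunkedArray
  { payload  :: IOUArray Int Int  -- raw unboxed storage, mutated unsafely
  , versions :: [TVar Int]        -- one version counter per chunk
  }                               -- (a real version would use an array of TVars)

newChunked :: Int -> IO ChunkedArray
newChunked n = do
  arr <- newArray (0, n - 1) 0
  vs  <- forM [0 .. (n - 1) `div` chunkSize] (\_ -> newTVarIO 0)
  return (ChunkedArray arr vs)

versionOf :: ChunkedArray -> Int -> TVar Int
versionOf ca i = versions ca !! (i `div` chunkSize)

readElem :: ChunkedArray -> Int -> STM Int
readElem ca i = do
  _ <- readTVar (versionOf ca i)           -- record the chunk in the log
  unsafeIOToSTM (readArray (payload ca) i)

writeElem :: ChunkedArray -> Int -> Int -> STM ()
writeElem ca i x = do
  v <- readTVar (versionOf ca i)
  writeTVar (versionOf ca i) $! v + 1      -- bump the version, forced
  unsafeIOToSTM (writeArray (payload ca) i x)

main :: IO ()
main = do
  ca <- newChunked 1024
  atomically (writeElem ca 5 42)
  x  <- atomically (readElem ca 5)
  print x
```

Every transaction that touches an element first records the chunk's version TVar in its log, so a concurrent bump of that TVar forces a restart; the unsafeIOToSTM side effects themselves are, as discussed above, neither tracked nor rolled back.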

David Barbour wrote:
On Tue, Oct 25, 2011 at 10:47 AM, Ben Franksen wrote:
The main question is: does the STM transaction actually "see" that I changed part of the underlying array, so that the transaction gets re-tried? Or do I have to implement this manually, and if yes: how?
Create an extra TVar Int for every `chunk` in the array (e.g. every 256 elements, tuned to your update patterns). Read-write it (increment it; be sure to force evaluation) just before every write to an element or slice within that chunk. The IO mutable array is then adjusted unsafely, but there is enough transactional context to restart transactions that see an inconsistent state. You will have extra read/write and write/write conflicts relative to a pure STM solution, but only within each chunk.
Your idea is quite similar to what I had in mind, and it encourages me that you think it should be possible to do it like that. My idea was to use variable-size chunks, based on which slices are currently in use and how they overlap, i.e. calculate the maximal non-overlapping index intervals. Such a solution would automatically adapt to the usage pattern, but is harder to implement efficiently.
A cleaner alternative is to create a `chunked` TArray, i.e. one that works with fixed-width immutable chunks in a spine. This should have good performance characteristics, and would be a lot safer for general-purpose use.
This is also an interesting idea, probably much easier to implement, too. Thanks for the feedback!

Cheers
Ben

David Barbour wrote:
Create an extra TVar Int for every `chunk` in the array (e.g. every 256 elements, tuned to your update patterns). Read-write it (increment it; be sure to force evaluation) just before every write to an element or slice within that chunk.
Incrementing and forcing evaluation should not be necessary; a TVar () should be enough. I would be very much surprised if the internal STM machinery compared the actual _values_ of what is inside a TVar; I guess it just notes that a read or a write happened. Anyway, this is just a guess. I wonder if these details are documented somewhere...

Cheers
Ben

Benjamin Franksen wrote:
David Barbour wrote:
Create an extra TVar Int for every `chunk` in the array (e.g. every 256 elements, tuned to your update patterns). Read-write it (increment it; be sure to force evaluation) just before every write to an element or slice within that chunk.
Incrementing and forcing evaluation should not be necessary, a TVar () should be enough.
I was wrong, though not completely.
I would be very much surprised if the internal STM machinery compares the actual _values_ of what is inside a TVar, I guess it just notes that a read or a write happened.
According to the original STM paper, the implementation does an equality test, albeit only for pointer equality. So one should make sure that whatever is written to the TVar is a new object. An incremented integer would probably be ok (no need to evaluate it, since the closure is newly allocated, thus a new object); a little more on the safe side is a new TVar, i.e. use TVar (TVar ()).

Cheers
Ben

On Sun, Oct 30, 2011 at 6:20 PM, Ben Franksen wrote:
According to the original STM paper the implementation does an equality test, albeit only for pointer equality.
It strikes me as bad form to depend on characteristics of `the implementation`.

An incremented integer would probably be ok (no need to evaluate it, since the closure is newly allocated, thus a new object)

Evaluation would be necessary to avoid a subtle space-leak with laziness semantics: the size of the closure is potentially linear in the number of allocations.

A little more on the safe side is a new TVar
That works too. Regards, Dave
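The difference between the lazy and the forced increment can be made concrete; this toy example is mine, not from the thread. bumpLazy leaves a chain of (+1) thunks in the TVar (the leak David means), while bumpStrict stores an already evaluated Int each time:

```haskell
import Control.Concurrent.STM
import Control.Monad (replicateM_)

-- Leaky: the TVar ends up holding the thunk ((..(0 + 1)..) + 1),
-- one suspended addition per bump.
bumpLazy :: TVar Int -> STM ()
bumpLazy v = readTVar v >>= \x -> writeTVar v (x + 1)

-- Safe: ($!) forces (x + 1) before it is written, so the TVar
-- always holds a plain evaluated Int.
bumpStrict :: TVar Int -> STM ()
bumpStrict v = readTVar v >>= \x -> writeTVar v $! x + 1

main :: IO ()
main = do
  v <- newTVarIO 0
  replicateM_ 1000 (atomically (bumpStrict v))
  n <- readTVarIO v
  print n
```

Both versions compute the same number; only the heap behaviour differs, which is exactly why the leak is easy to miss.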

Ben Franksen writes:
An array of TVars is certainly *much* too inefficient for what I have in mind w.r.t. both memory and cpu time.
You must be a lot more confident than I if you say this without benchmarking first. :-) IME, there are (at least) two possible problems here: 1) transactions scale (quadratically, I think) with the number of TVars touched, so if any transaction touches a large part of the array, it's going to cost you, and 2) every element of your array will have a pointer to it, making GC potentially expensive. Perhaps you can get around the latter by tuning GC, e.g. +RTS -A100M might help.
Or should I use a high-level approach, something like a Data.Sequence.Seq of medium sized chunks (TVar (IOVector e))?
I'm not sure exactly what you mean here, but if you're going to touch contiguous segments of the array, why not TArray (Vector e) or similar?

-k
--
If I haven't seen further, it is by standing in the footprints of giants
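For what it's worth, here is a rough sketch of that idea: immutable fixed-size chunks, one TVar per chunk, so a write copies a single chunk and a transaction only records the TVars of the chunks it overlaps. All names are made up, and I use Data.Array from the boot `array` package instead of Data.Vector so the snippet stands alone:

```haskell
import Control.Concurrent.STM
import Control.Monad (forM)
import Data.Array (Array, (!), (//), listArray)

chunkSize :: Int
chunkSize = 256

-- A transactional array as a list of TVars, each holding an
-- immutable chunk that is copied on write.
newtype Chunked e = Chunked [TVar (Array Int e)]

newChunkedC :: Int -> e -> IO (Chunked e)
newChunkedC n e = do
  let nChunks = (n + chunkSize - 1) `div` chunkSize
  vs <- forM [1 .. nChunks] $ \_ ->
          newTVarIO (listArray (0, chunkSize - 1) (replicate chunkSize e))
  return (Chunked vs)

readC :: Chunked e -> Int -> STM e
readC (Chunked vs) i = do
  chunk <- readTVar (vs !! (i `div` chunkSize))
  return (chunk ! (i `mod` chunkSize))

writeC :: Chunked e -> Int -> e -> STM ()
writeC (Chunked vs) i x = do
  let v = vs !! (i `div` chunkSize)
  chunk <- readTVar v
  writeTVar v (chunk // [(i `mod` chunkSize, x)])  -- copies one chunk only

main :: IO ()
main = do
  c <- newChunkedC 1000 (0 :: Int)
  atomically (writeC c 700 9)
  x <- atomically (readC c 700)
  print x
```

Unlike the unsafeIOToSTM variant, this stays entirely inside STM, at the price of copying a chunk per write; a boxed Array also loses the unboxed-storage benefit, which Data.Vector would restore.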

On Tue, Oct 25, 2011 at 1:24 PM, Ketil Malde wrote:
You must be a lot more confident than I if you say this without benchmarking first. :-) IME, there are (at least) two possible problems here, 1) transactions scale (quadratically, I think) with the number of TVars touched, so if any transaction touch a large part of the array, it's going to cost you, [...]
That would remain true no matter what, but the current quadratic behaviour is, I believe, easily enough fixed by switching to a better data structure than a list.

Ketil Malde wrote:
Ben Franksen writes:
An array of TVars is certainly *much* too inefficient for what I have in mind w.r.t. both memory and cpu time.
You must be a lot more confident than I if you say this without benchmarking first. :-)
Ok, not science, but an informed guess based on what I read about how STM is implemented in ghc. Cache locality is one of the main reasons why unboxed arrays perform so much better in practice than boxed ones, and TVars are most certainly boxed...
IME, there are (at least) two possible problems here, 1) transactions scale (quadratically, I think) with the number of TVars touched,
Ouch! What would be the reason for that? I thought it would be linear... I mean what happens is that the transaction log gets built when the transaction runs, which should take a constant time per TVar, and then on commit we have to traverse the log, which is again linear in the number of TVars...
so if any transaction touch a large part of the array, it's going to cost you, and 2) every element of your array will have a pointer to it, making GC potentially expensive. Perhaps you can get around the latter by tuning GC, e.g. +RTS -A100M might help.
Or should I use a high-level approach, something like a Data.Sequence.Seq of medium sized chunks (TVar (IOVector e))?
I'm not sure exactly what you mean here, but if you're going to touch contigous segments of the array, why not TArray (Vector e) or similar?
Yes, that was what David suggested, too. Have to think about it.

Cheers
Ben

On Tue, Oct 25, 2011 at 1:46 PM, Ben Franksen wrote:
IME, there are (at least) two possible problems here, 1) transactions scale (quadratically, I think) with the number of TVars touched,
Ouch! What would be the reason for that? I thought it would be linear... I mean what happens is that the transaction log gets built when the transaction runs, which should take a constant time per TVar, and then on commit we have to traverse the log, which is again linear in the number of TVars...
Your first assumption is incorrect. Every time you access a TVar it needs to check in the log if that TVar was already accessed in this transaction, so that it can get/update the current value. Right now the log is just a list, so accessing a TVar takes O(number of TVars already accessed). Consider this code:

    f :: TVar Int -> STM ()
    f v = do
      x <- readTVar v
      writeTVar v $! (x+1)

    g :: Int -> TVar Int -> STM ()
    g n v = mapM_ (\_ -> f v) [1..n]

In order for f to get the current value of the TVar, it has to check if the variable is in the log already; otherwise later calls to f will just see the original value in memory and not repeatedly increment it.

As to your second assumption, it's been a while since I read the STM source, so I can't comment there.

-- ryan

The main question is: does the STM transaction actually "see" that I changed part of the underlying array, so that the transaction gets re-tried? Or do I have to implement this manually, and if yes: how?
The transaction does not detect anything inside the unsafeIOToSTM. But implementing this manually is simple: use "retry" whenever you need to retry the local transaction. If the affected transactions are more than just the local one, create a special TVar to be inspected by all of them, for example doesTheArrayHasChanged :: TVar Bool.
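A minimal sketch of that flag-based scheme (the helper names are mine, not from the thread): a reader blocks with `retry` until a writer flips the TVar Bool, at which point the STM runtime re-runs it.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar
import Control.Concurrent.STM

-- Block until the change flag is set, then consume the notification.
waitForChange :: TVar Bool -> STM ()
waitForChange flag = do
  changed <- readTVar flag
  if changed
    then writeTVar flag False  -- reset so the next change is seen too
    else retry                 -- block; rerun when the TVar is written

main :: IO ()
main = do
  flag <- newTVarIO False
  done <- newEmptyMVar
  _ <- forkIO $ do
         atomically (waitForChange flag)  -- blocks until the flag is set
         putMVar done ()
  threadDelay 10000                       -- let the observer start waiting
  atomically (writeTVar flag True)        -- the "array has changed" signal
  takeMVar done
  putStrLn "observer saw the change"
```

A transaction that mutates the underlying array via unsafeIOToSTM would set the flag in the same atomic block, so observers are woken exactly when a change commits.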
participants (8)

- Alberto G. Corona
- Antoine Latter
- Ben Franksen
- Benjamin Franksen
- Bryan O'Sullivan
- David Barbour
- Ketil Malde
- Ryan Ingram