Keeping an indexed collection of values?

I've been in a situation a lot lately where I need to keep a collection of values, and keep track of them by a persistent index. IO and ST refs can do this, but for various reasons I can't/don't want to use them. My first hacked up attempt is as follows: data IndexedCollection a = IndexedCollection { nextKey :: Int, availableKeys :: [Int], items :: (IntMap Int a) } deriving (Show) emptyIndexedCollection :: IndexedCollection a emptyIndexedCollection = IndexedCollection 0 [] empty addItem :: a -> IndexedCollection a -> (Int, IndexedCollection a) addItem a (IndexedCollection nextKey' [] t) = (nextKey', IndexedCollection (nextKey' + 1) [] (insert nextKey' a t)) addItem a (IndexedCollection nextKey' (k:ks) t) = (k, IndexedCollection nextKey' ks (insert k a t)) removeItem :: Int -> IndexedCollection a -> IndexedCollection a removeItem k (IndexedCollection nextKey' ks t) = IndexedCollection nextKey' (k:ks) (delete k t) lookupItem :: Int -> IndexedCollection a -> Maybe a lookupItem k (IndexedCollection _ _ t) = lookup k t Basically: when you add an item, use the first index in availableKeys (if there is one), otherwise use "nextKey" for the index, and then increment it for the next item. When and Item is removed, add it's index to availableKeys This keeps the code for keeping track of/generating indexes hidden away from the rest of the program which is nice, and it remains fairly efficient. Items will be added and removed on very regular basis. Often enough that I'm slightly concerned about overflow of nextKey for very long run times, and hence the availableKeys list. Does anyone know of a better/already existent data structure for handling this problem? Or perhaps a better way of keeping a "key pool" than my availableKeys solution? - Job

On Aug 18, 2009, at 9:19 PM, Job Vranish wrote:
data IndexedCollection a = IndexedCollection { nextKey :: Int, availableKeys :: [Int], items :: (IntMap Int a) } deriving (Show)
emptyIndexedCollection :: IndexedCollection a emptyIndexedCollection = IndexedCollection 0 [] empty [...] Does anyone know of a better/already existent data structure for handling this problem? Or perhaps a better way of keeping a "key pool" than my availableKeys solution?
just a slight simplification: you could drop the nextKey field and initialise availableKeys with [0..] in emptyIndexCollection. The add function would consume the head of this list, the remove function add the deleted key as new head. -- Underestimating the novelty of the future is a time-honored tradition. (D.G.)

2009/8/19 Job Vranish
My first hacked up attempt is as follows:
data IndexedCollection a = IndexedCollection { nextKey :: Int, availableKeys :: [Int], items :: (IntMap Int a) } deriving (Show)
emptyIndexedCollection :: IndexedCollection a emptyIndexedCollection = IndexedCollection 0 [] empty
addItem :: a -> IndexedCollection a -> (Int, IndexedCollection a) addItem a (IndexedCollection nextKey' [] t) = (nextKey', IndexedCollection (nextKey' + 1) [] (insert nextKey' a t)) addItem a (IndexedCollection nextKey' (k:ks) t) = (k, IndexedCollection nextKey' ks (insert k a t))
removeItem :: Int -> IndexedCollection a -> IndexedCollection a removeItem k (IndexedCollection nextKey' ks t) = IndexedCollection nextKey' (k:ks) (delete k t)
lookupItem :: Int -> IndexedCollection a -> Maybe a lookupItem k (IndexedCollection _ _ t) = lookup k t
It might be the case for IntMap (I haven't checked) that it is better to do a modify operation than to do a deletion followed by an insertion on the same key. One possible improvement is to delay deletions by putting them in a pending queue. A pending deletion followed by an addItem could be coalesced into a modify operation on the key to be deleted. You could even push lookupItems through pending deletions, assuming that they aren't on the same key (if they are on the same key then the lookup would fail). One question is how big should the pending deletion queue be allowed to become? A long queue might not be a good idea. One problem with delaying deletions is that it could introduce a space leak (same as unintended lazy evaluation). Maybe a queue of max length one is sufficient? I'm not sure it is worth the trouble, but it might be fun to try. Cheers, Bernie.

Job Vranish wrote:
I've been in a situation a lot lately where I need to keep a collection of values, and keep track of them by a persistent index.
data IndexedCollection a = IndexedCollection { nextKey :: Int, availableKeys :: [Int], items :: IntMap a } deriving (Show)
emptyIndexedCollection :: IndexedCollection a emptyIndexedCollection = IndexedCollection 0 [] empty
addItem :: a -> IndexedCollection a -> (Int, IndexedCollection a) addItem a (IndexedCollection nextKey' [] t) = (nextKey', IndexedCollection (nextKey' + 1) [] (insert nextKey' a t)) addItem a (IndexedCollection nextKey' (k:ks) t) = (k, IndexedCollection nextKey' ks (insert k a t))
removeItem :: Int -> IndexedCollection a -> IndexedCollection a removeItem k (IndexedCollection nextKey' ks t) = IndexedCollection nextKey' (k:ks) (delete k t)
lookupItem :: Int -> IndexedCollection a -> Maybe a lookupItem k (IndexedCollection _ _ t) = lookup k t
[...]
Does anyone know of a better/already existent data structure for handling this problem?
Or perhaps a better way of keeping a "key pool" than my availableKeys solution?
I'd put it in a new module and use standard names, i.e. empty add -- instead of insert delete lookup Oh, and the name IndexedCollection is kinda long. ;) You may want to make the nextKey field strict, so that forcing the whole collection forces the available keys as well data IndexedCollection = IndexedCollection { nextKey :: !Int, ... Otherwise, a chain of (+ 1) may linger unintentionally. If you follow Sebastian's great suggestion, you'd need to do something like this: empty = IndexedCollection (nats 0) IntMap.empty where nats !n = n : nats (n+1) Is it important that the keys are integers? If not, then I suggest making it abstract, i.e. like this module Store (Key, Store, empty, add, delete, lookup) where import qualified Data.IntMap as Map newtype Key = Key { int :: Int } instance Show Key where show = show . int data Store a = Store [Key] (Map.IntMap a) empty :: Store a empty = Store (nats 0) Map.empty where nats !n = Key n : nats (n+1) add :: a -> Store a -> (Key, Store a) add a (Store (k:ks) m) = (k, Store ks $ Map.insert (int k) a m) delete :: Key -> Store a -> Store a delete k (Store ks m) = Store (k:ks) $ Map.delete (int k) m lookup :: Key -> Store a -> Maybe a lookup k (Store _ m) = Map.lookup (int k) m This way, the user doesn't know and care how Key is implemented. Last but not least, there is the issue that trying to use an already deleted key might yield a wrong result instead of an error. That shouldn't happen if used correctly, but might give a headache when debugging. Regards, apfelmus -- http://apfelmus.nfshost.com

Heinrich Apfelmus wrote:
Job Vranish wrote:
I've been in a situation a lot lately where I need to keep a collection of values, and keep track of them by a persistent index.
module Store (Key, Store, empty, add, delete, lookup) where
newtype Key = Key { int :: Int }
empty :: Store a add :: a -> Store a -> (Key, Store a) delete :: Key -> Store a -> Store a lookup :: Key -> Store a -> Maybe a
This way, the user doesn't know and care how Key is implemented.
Last but not least, there is the issue that trying to use an already deleted key might yield a wrong result instead of an error. That shouldn't happen if used correctly, but might give a headache when debugging.
There is even a very simple way to prevent at least some cases of misuse, when one key is accidentally used on stores of different type. A phantom parameter will do the trick: newtype Key a = Key { int :: Int } add :: a -> Store a -> (Key a , Store a) delete :: Key a -> Store a -> Store a lookup :: Key a -> Store a -> Maybe a Regards, apfelmus -- http://apfelmus.nfshost.com

Thanks for all the input! :) My current code (unfinished) is here: http://github.com/jvranish/IndexedCollection/tree/master but I think I'll shorten the names as you suggest. (and add some strictness to availableKeys) I also added an extra phantom type parameter to the collection (and key) so that I can prevent keys from being used on different collections even if they hold elements of the same type. There is still problem that trying to use a deleted key might return a bad result rather than an error. I'm not sure how to fix that one. I could keep another buffer, perhaps of the last 100 or so deleted keys, so that a key doesn't get recycled until 100 other keys have been freed. This would increase the chances of detecting this type of error. I could also possibly integrate it with Bernie's suggestion, which would probably significantly improve performance in my case. But the added complexity might not be worth it. Hmm although... I could potentially do something evil and detect the use of a deleted key via stableNames. I'd rewrap my keys on recycle so that there stablenames change. Then I can check on lookup if the key used has the same stableName as the key in the collection, if they don't match either raise an error or return Nothing. Not sure if I feel that evil though :D Thanks again for the input :) - Job On Fri, Aug 21, 2009 at 7:26 AM, Heinrich Apfelmus < apfelmus@quantentunnel.de> wrote:
Heinrich Apfelmus wrote:
Job Vranish wrote:
I've been in a situation a lot lately where I need to keep a collection of values, and keep track of them by a persistent index.
module Store (Key, Store, empty, add, delete, lookup) where
newtype Key = Key { int :: Int }
empty :: Store a add :: a -> Store a -> (Key, Store a) delete :: Key -> Store a -> Store a lookup :: Key -> Store a -> Maybe a
This way, the user doesn't know and care how Key is implemented.
Last but not least, there is the issue that trying to use an already deleted key might yield a wrong result instead of an error. That shouldn't happen if used correctly, but might give a headache when debugging.
There is even a very simple way to prevent at least some cases of misuse, when one key is accidentally used on stores of different type. A phantom parameter will do the trick:
newtype Key a = Key { int :: Int }
add :: a -> Store a -> (Key a , Store a) delete :: Key a -> Store a -> Store a lookup :: Key a -> Store a -> Maybe a
Regards, apfelmus
-- http://apfelmus.nfshost.com
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Aug 21, 2009, at 5:11 PM, Job Vranish wrote:
I also added an extra phantom type parameter to the collection (and key) so that I can prevent keys from being used on different collections even if they hold elements of the same type.
I have the impression that this requires explicit type annotations with your current solution which seems a bit tiresome. If not instantiated to specific different types, the additional phantom types of different collections can just be unified which does not lead to a type error. As you seem to implement a monadic interface, you might be able to steal the idea of using higher-rank polymorphism (that is used in the ST monad implementation) to ensure that the phantom types of different collections cannot be unified. But that would probably mean to implement your own monad that carries this phantom type too.. Cheers, Sebastian -- Underestimating the novelty of the future is a time-honored tradition. (D.G.)

It only requires type annotations on your uses of empty (as that is the only way to construct a collection). The phantom type sticks to everything after that. If you don't care to add a signature then things still work just fine, you just won't be prevented from using indexes from the wrong collection if the the collection type is the same. I think this is nice, because if you are working with just one collection, or collections of only different types you probably don't want to care about the phantom type. But if you do care, it adds extra protection. For example: data P1 data P2 to = runState inCol = evalState a = empty :: IndexedCollection Int P1 b = empty :: IndexedCollection Int P2 (i1, a') = add 5 `to` a (i2, b') = add 16 `to` b test = lookup i2 `inCol` a' -- type error, but type checks if no signatures on a or b - Job On Fri, Aug 21, 2009 at 12:24 PM, Sebastian Fischer < sebf@informatik.uni-kiel.de> wrote:
On Aug 21, 2009, at 5:11 PM, Job Vranish wrote:
I also added an extra phantom type parameter to the collection (and key)
so that I can prevent keys from being used on different collections even if they hold elements of the same type.
I have the impression that this requires explicit type annotations with your current solution which seems a bit tiresome. If not instantiated to specific different types, the additional phantom types of different collections can just be unified which does not lead to a type error.
As you seem to implement a monadic interface, you might be able to steal the idea of using higher-rank polymorphism (that is used in the ST monad implementation) to ensure that the phantom types of different collections cannot be unified. But that would probably mean to implement your own monad that carries this phantom type too..
Cheers, Sebastian
-- Underestimating the novelty of the future is a time-honored tradition. (D.G.)
participants (4)
-
Bernie Pope
-
Heinrich Apfelmus
-
Job Vranish
-
Sebastian Fischer