I need extendable array to store and count unique vectors. I have a file containing vectors presented as strings like:
10, 6, 80, 25, 6, 7
1, 2, 15, 17, 33, 22
21, 34, 56, 78, 91, 2
...
(BTW, what is the best library function to use to convert string of digits into a list or array?)

There are two arrays:
1) Array of unique vectors.
The first vector red from file starts this array. Next vectors are added to this array if they are sufficiently dissimilar from the already existing in the array. Similarity is computed as euclidean distance between two vectors.
2) Array of counts. Length of this array is equal to the length of array with unique vectors. Thus every vector has a corresponding count. If new vector is similar to some vector already existing in the first array then corresponding count is incremented.

For example, array of unique vectors:
[[10, 6, 80, 25, 6, 7],
[1, 2, 15, 17, 33, 22],
[21, 34, 56, 78, 91, 2]]

may have a corresponding counts array:
[5,1,3]

where:
5 tells that my file has 5 vectors similar to vector [10, 6, 80, 25, 6, 7],
Only 1 vector [1, 2, 15, 17, 33, 22],
and 3 vectors similar to [21, 34, 56, 78, 91, 2]

As long as I don't know a priory how many unique vectors my file has, as well as how many similar to these unique others exists - I need to start both arrays with size 0.
Next when I find a similar vector I need to increment count at corresponding index in 'counts' array. I will also need to access vectors at different index in unique array later.
That's why I need a collection both indexed and able to extend also.

I think that Data.Sequence will do for the task, don't you think?

On Fri, Jul 11, 2008 at 7:23 PM, Chaddaï Fouché <chaddai.fouche@gmail.com> wrote:

2008/7/11 Dmitri O.Kondratiev <dokondr@gmail.com>:

> How does Data.Sequence
> http://www.haskell.org/ghc/docs/latest/html/libraries/containers/Data-Sequence.html
> compares with ArrayRef for appending and accessing arrays efficiently ?

It doesn't since Data.Sequence doesn't use arrays, it uses a custom
data structure that allows to do plenty of operations with pretty good
asymptotic complexity. Be aware though that the constants aren't that
good and that depending on your usage, lists, array or another
specialized data structure could be much more efficient.

By comparisons, extending an array is very likely to be much more
expensive from time to time than adding some elements to your
Data.Sequence. On the other hand the random access would be much
faster in an array and even the amortized cost of extending the array
may not be much worse than the amortized cost on the Data.Sequence in
some cases.

What exactly are you trying to do ?

--
Jedaï

--
Dmitri O. Kondratiev
dokondr@gmail.com
http://www.geocities.com/dkondr