vector to uvector and back again

hi,
i've been using the vector [1] library for implementing some signal processing algorithms, but now i'd like to use the statistics [2] package on my data, which is based on the uvector [3] library. is there a (straightforward) way of converting between vectors and uvectors, preferably O(1)?
thanks, <sk>
[1] http://hackage.haskell.org/package/vector
[2] http://hackage.haskell.org/package/statistics
[3] http://hackage.haskell.org/package/uvector
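uvector has long been unmaintained, so the sketch below does not use it. It shows the closest analogue inside vector itself, assuming a recent release of the vector package (whose API differs from the 0.4/0.5 releases discussed in this thread): Data.Vector.Generic.convert copies between any two vector types that can hold the element type. Each conversion is O(n). An O(1) conversion would presumably require both types to share the same underlying buffer.

```haskell
import qualified Data.Vector as V
import qualified Data.Vector.Generic as G
import qualified Data.Vector.Storable as S
import qualified Data.Vector.Unboxed as U

-- G.convert copies element by element, so each of these is O(n), not O(1).
unboxedToStorable :: U.Vector Double -> S.Vector Double
unboxedToStorable = G.convert

storableToUnboxed :: S.Vector Double -> U.Vector Double
storableToUnboxed = G.convert

-- Boxed to unboxed works the same way, provided the element type is Unbox.
boxedToUnboxed :: V.Vector Double -> U.Vector Double
boxedToUnboxed = G.convert
```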

I'm thinking of switching the statistics library over to using vector.
uvector is pretty bit-rotted in comparison to vector at this point, and it's
really seeing no development, while vector is The Shiny Future. Roman, would
you call the vector library good enough to use in production at the moment?
On Wed, Feb 10, 2010 at 9:59 AM, stefan kersten wrote:
hi,
i've been using the vector [1] library for implementing some signal processing algorithms, but now i'd like to use the statistics [2] package on my data, which is based on the uvector [3] library. is there a (straightforward) way of converting between vectors and uvectors, preferably O(1)?
thanks, <sk>
[1] http://hackage.haskell.org/package/vector
[2] http://hackage.haskell.org/package/statistics
[3] http://hackage.haskell.org/package/uvector
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

On Wed, Feb 10, 2010 at 10:03 AM, Bryan O'Sullivan wrote:
I'm thinking of switching the statistics library over to using vector. uvector is pretty bit-rotted in comparison to vector at this point, and it's really seeing no development, while vector is The Shiny Future. Roman, would you call the vector library good enough to use in production at the moment?
I like the vector API much better than the uvector one:
* no "U" suffixes on functions (go namespaces!) and
* no cryptic names (what's an UAE?).
Cheers,
Johan
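A minimal sketch of the namespacing point, assuming a recent vector release: with qualified imports, the same names (sum, length, fromList) serve every vector flavour, and code written against the Data.Vector.Generic class works for all of them. `mean` is a hypothetical helper, not part of vector or statistics.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import qualified Data.Vector as V
import qualified Data.Vector.Generic as G
import qualified Data.Vector.Unboxed as U

-- Hypothetical helper: arithmetic mean of any vector type holding Doubles.
-- (An empty vector yields NaN, since 0 / 0 is NaN for Double.)
mean :: G.Vector v Double => v Double -> Double
mean xs = G.sum xs / fromIntegral (G.length xs)

-- No "U" suffixes: the module qualifier selects the representation.
unboxedTotal :: Double
unboxedTotal = U.sum (U.fromList [1, 2, 3])

boxedTotal :: Double
boxedTotal = V.sum (V.fromList [1, 2, 3])
```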

On 11/02/2010, at 05:03, Bryan O'Sullivan wrote:
I'm thinking of switching the statistics library over to using vector. uvector is pretty bit-rotted in comparison to vector at this point, and it's really seeing no development, while vector is The Shiny Future. Roman, would you call the vector library good enough to use in production at the moment?
Yes, with the caveat that I haven't really used it in production code (I have tested and benchmarked it, though). BTW, I'll release version 0.5 as soon as I get a code.haskell.org account and move the repo there.
Roman

rl:
Yes, with the caveat that I haven't really used it in production code (I have tested and benchmarked it, though). BTW, I'll release version 0.5 as soon as I get a code.haskell.org account and move the repo there.
That's the main problem. I think we could move to vector as a whole, if the suite of testing/performance/documentation stuff from uvector was ported. Maybe this is a good job for the hackathon.
-- Don

On 12/02/2010, at 12:40, Don Stewart wrote:
That's the main problem. I think we could move to vector as a whole, if the suite of testing/performance/documentation stuff from uvector was ported.
Hmm, I'm not sure what you mean here. Mostly thanks to Max Bolingbroke's efforts, vector has a fairly extensive testsuite. I benchmark it a lot (with NoSlow) and haven't found any significant performance problems in a while. As to documentation, there are comments for most of the functions :-) Roman

On 10.02.10 19:03, Bryan O'Sullivan wrote:
I'm thinking of switching the statistics library over to using vector.
that would be even better of course! an O(0) solution, at least for me ;) let me know if i can be of any help (e.g. in testing). i suppose uvector-algorithms would also need to be ported to vector, then.
uvector is pretty bit-rotted in comparison to vector at this point, and it's really seeing no development, while vector is The Shiny Future. Roman, would you call the vector library good enough to use in production at the moment?
i've been using the library for wavelet transforms, matching pursuits and the like, and while my implementations are not heavily optimized, they perform reasonably well (no benchmarking done yet, though). the key arguments for using vector instead of uvector were the cleaner interface and Data.Vector.Storable for interfacing with foreign libraries (such as fftw, through the fft package). <sk>
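Data.Vector.Storable keeps its elements in a pinned, C-compatible buffer, which is what makes it usable with foreign libraries. The sketch below, assuming a recent vector release, hands that buffer to a routine via unsafeWith. `sumBuffer` is a hypothetical pure-Haskell stand-in for a foreign function (for example, an FFT entry point); no actual C code is called.

```haskell
import qualified Data.Vector.Storable as S
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekElemOff)

-- Hypothetical stand-in for a C routine: sums n doubles starting at p.
sumBuffer :: Int -> Ptr Double -> IO Double
sumBuffer n p = go 0 0
  where
    go i acc
      | i >= n = return acc
      | otherwise = do
          x <- peekElemOff p i
          go (i + 1) (acc + x)

-- unsafeWith exposes the vector's buffer for the duration of the callback;
-- the pointer must not be used after the callback returns.
sumStorable :: S.Vector Double -> IO Double
sumStorable v = S.unsafeWith v (sumBuffer (S.length v))
```

Because slicing is O(1) and the pointer then refers to the slice's first element, `sumStorable (S.slice 1 2 v)` reads only those two elements.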

stefan kersten wrote:
i've been using the library for wavelet transforms, matching pursuits and the like,
Nice, I have also worked on these topics, even with Haskell. However, at that time I used plain lists.
and while my implementations are not heavily optimized, they perform reasonably well (no benchmarking done yet, though). the key arguments for using vector instead of uvector were the cleaner interface and Data.Vector.Storable for interfacing with foreign libraries (such as fftw, through the fft package).
Btw. Data.StorableVector can also be used for this interfacing, and I would be very interested in an interface to FFTW. Actually, I have already used FFTW on StorableVector:
http://code.haskell.org/~thielema/morbus-meniere/src/StorableVectorCArray.hs
There is also Data.StorableVector.Lazy, which is nice for processing stream data.

On 11.02.10 18:55, Henning Thielemann wrote:
i've been using the library for wavelet transforms, matching pursuits and the like,
Nice, I have also worked on these topics, even with Haskell. However, at that time I used plain lists.
interesting! was performance acceptable for practical work? at the moment i'm not too concerned about performance -- a reasonable baseline might be to match matlab. in the long run i hope i'll be able to scale my stuff to larger amounts of data, however ...
and while my implementations are not heavily optimized, they perform reasonably well (no benchmarking done yet, though). the key arguments for using vector instead of uvector were the cleaner interface and Data.Vector.Storable for interfacing with foreign libraries (such as fftw, through the fft package).
Btw. Data.StorableVector can also be used for this interfacing, and I would be very interested in an interface to FFTW. Actually, I have already used FFTW on StorableVector
i'm simply using the fft package and adapted some of its internals to work on Data.Vector.Storable; nothing fancy though, and only for RC and CR transforms. let me know if you're interested in the code ...
There is also Data.StorableVector.Lazy which is nice for processing stream data.
yes, i know about storablevector, but i already had some code using uvector, so in the end vector was the easier upgrade. to me the relative merits of storablevector vs. vector are still unclear; the lazy interface could be implemented on top of vector as well, i suppose? <sk>
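A minimal sketch of that idea, assuming a recent vector release: a lazy vector can be modelled as a lazy list of strict Data.Vector.Storable chunks, roughly the layout Data.StorableVector.Lazy uses. The names below (`LazyVector`, `fromListChunked`, `mapL`, `toListL`) are hypothetical and belong to neither library.

```haskell
import qualified Data.Vector.Storable as S
import Foreign.Storable (Storable)

-- Hypothetical lazy vector: a lazy list of strict, fixed-size chunks.
newtype LazyVector a = LazyVector { chunks :: [S.Vector a] }

-- Split a (possibly infinite) list into chunks of at most n elements.
fromListChunked :: Storable a => Int -> [a] -> LazyVector a
fromListChunked n = LazyVector . go
  where
    size = max 1 n            -- guard against a non-positive chunk size
    go [] = []
    go xs = let (h, t) = splitAt size xs in S.fromList h : go t

-- Map chunk by chunk; only the chunks that are demanded get computed.
mapL :: (Storable a, Storable b) => (a -> b) -> LazyVector a -> LazyVector b
mapL f = LazyVector . map (S.map f) . chunks

toListL :: Storable a => LazyVector a -> [a]
toListL = concatMap S.toList . chunks
```

Since each chunk is an ordinary storable vector, it can still be passed to foreign code one chunk at a time.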

On Thursday 11 February 2010 12:43:10 pm stefan kersten wrote:
On 10.02.10 19:03, Bryan O'Sullivan wrote:
I'm thinking of switching the statistics library over to using vector.
that would be even better of course! an O(0) solution, at least for me ;) let me know if i can be of any help (e.g. in testing). i suppose uvector-algorithms would also need to be ported to vector, then.
I could do this. I've been occupied with things other than uvector-algorithms for a while, but I've been meaning to get back into it (perhaps finally get timsort in there).
How widespread is the consensus on vector over uvector? dons seems to have added to uvector as recently as mid December, so I'm not really sure how bit rotted it is. But vector seems to have a lot more going on in it, including boxed arrays, which I suppose is a gap in using uvector.
I also notice that vector seems to have discarded the idea of
Vec (A * B) = Vec A * Vec B
with associated types. Was this determined to not be worth it? uvector-algorithms actually used the fact for a cute trick (Schwartzian transform can be done for such arrays by computing a new array containing 'f e' for each 'e' in the original array, pairing up the two arrays, and performing an algorithm that only looks at the 'f e' half, and then pulling the 'e' half out of the pair; doing it this way requires no copying of the original array).
Anyhow, if vector is the clear way forward, I don't mind porting uvector-algorithms. But I don't relish maintaining two slightly different parallel branches.
-- Dan
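The trick described above can be sketched with a recent vector release; this is an illustration, not uvector-algorithms' code. Data.Vector.Unboxed.Mutable.zip pairs a key array with the element array in O(1) without copying, so sorting the zipped vector by key permutes the original elements in place. `insertionSortBy`, `shiftDown` and `schwartzianSort` are hypothetical names, and insertion sort stands in for whatever algorithm a real library would use.

```haskell
import Control.Monad.ST (ST)
import Data.Ord (comparing)
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as M

-- Stable insertion sort on a mutable unboxed vector.
insertionSortBy :: U.Unbox e => (e -> e -> Ordering) -> M.MVector s e -> ST s ()
insertionSortBy cmp mv =
  mapM_ (\i -> M.read mv i >>= shiftDown cmp mv i) [1 .. M.length mv - 1]

-- Move x left from position j until its predecessor is not greater.
shiftDown :: U.Unbox e => (e -> e -> Ordering) -> M.MVector s e -> Int -> e -> ST s ()
shiftDown cmp mv j x
  | j == 0 = M.write mv 0 x
  | otherwise = do
      y <- M.read mv (j - 1)
      if cmp y x == GT
        then M.write mv j y >> shiftDown cmp mv (j - 1) x
        else M.write mv j x

-- Schwartzian transform: compute each key once, zip the keys with the
-- elements (O(1), no copy), and sort the pairs by key only. The zipped
-- vector shares storage with mv, so mv ends up sorted in place.
schwartzianSort :: (U.Unbox a, U.Unbox k, Ord k) => (a -> k) -> M.MVector s a -> ST s ()
schwartzianSort f mv = do
  let n = M.length mv
  keys <- M.new n
  mapM_ (\i -> M.read mv i >>= M.write keys i . f) [0 .. n - 1]
  insertionSortBy (comparing fst) (M.zip keys mv)
```

From pure code, `U.modify (schwartzianSort abs) v` runs it on a copy of `v`.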

On 12/02/2010, at 12:54, Dan Doel wrote:
I also notice that vector seems to have discarded the idea of
Vec (A * B) = Vec A * Vec B
Oh no, it hasn't. In contrast to uvector/DPH, which use a custom strict tuple type for rather outdated reasons, vector uses normal tuples. For instance, Data.Vector.Unboxed.Vector (a,b,c) is internally represented as a triple of unboxed vectors of a, b and c.
In general, vector supports 4 kinds of arrays at the moment:
- Data.Vector.Primitive: wrappers around ByteArray#, can store primitive types
- Data.Vector.Unboxed: uses type families, can store everything D.V.Primitive can plus tuples, and can be extended for user-defined types
- Data.Vector.Storable: wrappers around ForeignPtr, can store Storable things
- Data.Vector: boxed arrays
Roman
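A small illustration of the four array kinds, assuming a recent vector release; the element types are arbitrary examples. The unboxed pair vector is stored as two separate unboxed arrays, which is why zip and unzip on Data.Vector.Unboxed are O(1).

```haskell
import qualified Data.Vector as V
import qualified Data.Vector.Primitive as P
import qualified Data.Vector.Storable as S
import qualified Data.Vector.Unboxed as U

-- Boxed: any element type, including ones with no flat representation.
boxed :: V.Vector String
boxed = V.fromList ["fft", "wavelet"]

-- Primitive: ByteArray#-backed, element types with a Prim instance.
prim :: P.Vector Double
prim = P.fromList [1, 2, 3]

-- Storable: ForeignPtr-backed, element types with a Storable instance.
storable :: S.Vector Double
storable = S.fromList [1, 2, 3]

-- Unboxed: everything Primitive handles, plus tuples of unboxable types.
-- Internally this is one unboxed Int array and one unboxed Double array.
pairs :: U.Vector (Int, Double)
pairs = U.zip (U.fromList [1, 2, 3]) (U.fromList [0.5, 1.5, 2.5])
```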

On Thursday 11 February 2010 9:57:40 pm Roman Leshchinskiy wrote:
Oh no, it hasn't. In contrast to uvector/DPH, which use a custom strict tuple type for rather outdated reasons, vector uses normal tuples. For instance, Data.Vector.Unboxed.Vector (a,b,c) is internally represented as a triple of unboxed vectors of a, b and c. In general, vector supports 4 kinds of arrays at the moment:
Ah, all right. I was looking at the (0.4.2) documentation on hackage, which doesn't mention Data.Vector.Unboxed. Never mind about that bit, then. -- Dan

On Thursday 11 February 2010 8:54:15 pm Dan Doel wrote:
On Thursday 11 February 2010 12:43:10 pm stefan kersten wrote:
On 10.02.10 19:03, Bryan O'Sullivan wrote:
I'm thinking of switching the statistics library over to using vector.
that would be even better of course! an O(0) solution, at least for me ;) let me know if i can be of any help (e.g. in testing). i suppose uvector-algorithms would also need to be ported to vector, then.
I could do this.
To this end, I've done a preliminary port of the library, such that all the modules compile. I've just used safe operations so far, so it's probably a significant decrease in performance over the 0.2 uvector-algorithms (unless perhaps you turn off the bounds checking flag), but it's a start. It can be gotten with:
darcs get http://code.haskell.org/~dolio/vector-algorithms
I only encountered a couple snags during the porting so far:
* swap isn't exported from D.V.Generic.Mutable, so I'm using my own.
* I use a copy with an offset into the from and to arrays, and with a length (this is necessary for merge sort). However, I only saw a whole array copy (and only with identical sizes) in vector (so I wrote my own again).
* Some kind of thawing of immutable vectors into mutable vectors, or other way to copy the former into the latter would be useful. Right now I'm using unstream . stream, but I'm not sure that's the best way to do it.
Other than that, things went pretty smoothly. I haven't ported the test suite or benchmarks yet, so I don't recommend that anyone actually uses this for anything important yet.
Cheers,
-- Dan

On 12/02/2010, at 23:28, Dan Doel wrote:
To this end, I've done a preliminary port of the library, such that all the modules compile. I've just used safe operations so far, so it's probably a significant decrease in performance over the 0.2 uvector-algorithms (unless perhaps you turn off the bounds checking flag), but it's a start. It can be gotten with:
That's great, thanks! FWIW, vector has two kinds of bounds checks: "real" ones which catch invalid indices supplied by the user (on by default) and internal ones which catch bugs in the library (off by default since the library is, of course, bug-free ;-). I guess you'd eventually want to use the latter but not the former; that's exactly what unsafe operations provide.
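The user-facing side of this can be sketched as follows, assuming a recent vector release: `(!)` checks the index and throws on failure, `(!?)` returns Nothing instead, and `unsafeIndex` skips the user-level check, so an out-of-range index becomes the caller's bug. `lookupChecked` and `sumUnchecked` are hypothetical names.

```haskell
import qualified Data.Vector.Unboxed as U

xs :: U.Vector Int
xs = U.fromList [10, 20, 30]

-- Checked access that reports failure as Nothing instead of throwing.
lookupChecked :: Int -> Maybe Int
lookupChecked i = xs U.!? i

-- Unchecked access: only sound because every index is known to satisfy
-- 0 <= i < U.length xs.
sumUnchecked :: Int
sumUnchecked = sum [U.unsafeIndex xs i | i <- [0 .. U.length xs - 1]]
```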
I only encountered a couple snags during the porting so far:
* swap isn't exported from D.V.Generic.Mutable, so I'm using my own.
Ah, I'll export it. Also, I gladly accept patches :-)
* I use a copy with an offset into the from and to arrays, and with a length (this is necessary for merge sort). However, I only saw a whole array copy (and only with identical sizes) in vector (so I wrote my own again).
That's actually a conscious decision. Since vectors support O(1) slicing, you can simply copy a slice of the source vector into a slice of the target vector.
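A minimal sketch of this approach, assuming a recent vector release: an offset copy is just copy applied to two O(1) slices. `copyRange` is a hypothetical helper. copy requires the two slices not to overlap; move handles overlapping ranges.

```haskell
import Control.Monad.ST (ST)
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as M

-- Copy n elements from src (starting at sOff) into dst (starting at dOff).
-- slice takes a start index and a length, and does not copy.
copyRange :: U.Unbox a => M.MVector s a -> Int -> M.MVector s a -> Int -> Int -> ST s ()
copyRange dst dOff src sOff n = M.copy (M.slice dOff n dst) (M.slice sOff n src)

-- Copy elements 1..2 of [1..5] into positions 3..4 of a zeroed array.
buildExample :: ST s (M.MVector s Int)
buildExample = do
  dst <- M.replicate 5 0
  src <- U.thaw (U.fromList [1 .. 5])
  copyRange dst 3 src 1 2
  return dst

example :: U.Vector Int
example = U.create buildExample
```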
* Some kind of thawing of immutable vectors into mutable vectors, or other way to copy the former into the latter would be useful. Right now I'm using unstream . stream, but I'm not sure that's the best way to do it.
At the moment, it is (although it ought to be wrapped in a nicer interface). Something like memcpy doesn't work for Data.Vector.Unboxed because the ByteArrays aren't pinned. I don't really want to provide thawing until someone convinces me that it is actually useful.
BTW, vector also supports array recycling so you could implement true in-place sorting for fused pipelines. Something like
map (+1) . sort . update xs
wouldn't allocate any temporary arrays in that case.
Roman

On Friday 12 February 2010 8:12:51 am Roman Leshchinskiy wrote:
That's actually a conscious decision. Since vectors support O(1) slicing, you can simply copy a slice of the source vector into a slice of the target vector.
Ah! I hadn't thought of that. That makes sense.
At the moment, it is (although it ought to be wrapped in a nicer interface). Something like memcpy doesn't work for Data.Vector.Unboxed because the ByteArrays aren't pinned. I don't really want to provide thawing until someone convinces me that it is actually useful.
Well, my use case is (of course) that I have lots of algorithms on mutable arrays, but they work just as well on immutable arrays by creating an intermediary. So I provided a combinator 'apply' that did something like:
apply algo iv = new (safeThaw iv >>= \mv -> algo mv >> return mv)
In uvector, the safeThaw part was copying iv into mv with a provided function. For the port, I used unstream . stream, which works fine assuming stream produces a correct size hint, I guess. That's the extent of what I have use for at the moment, though.
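The `apply` combinator can be sketched against a recent vector release, which now provides the thaw Roman was reluctant to add at the time (`safeThaw` above was Dan's own helper). Thawing copies, so the input vector is never mutated, and the result behaves like vector's own `modify`. `apply` and `reverseInPlace` are hypothetical names.

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Monad.ST (ST, runST)
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as M

-- Run an in-place algorithm on a copy of an immutable vector.
apply :: U.Unbox a => (forall s. M.MVector s a -> ST s ()) -> U.Vector a -> U.Vector a
apply algo iv = runST $ do
  mv <- U.thaw iv        -- O(n) copy; iv itself is never mutated
  algo mv
  U.unsafeFreeze mv      -- no copy; safe because mv is not used afterwards

-- Example algorithm: reverse a mutable vector in place with swap.
reverseInPlace :: U.Unbox a => M.MVector s a -> ST s ()
reverseInPlace mv = mapM_ (\i -> M.swap mv i (n - 1 - i)) [0 .. n `div` 2 - 1]
  where
    n = M.length mv
```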
BTW, vector also supports array recycling so you could implement true in-place sorting for fused pipelines. Something like
map (+1) . sort . update xs
wouldn't allocate any temporary arrays in that case.
I'll look into it. -- Dan

Dan, do you think you might be releasing your port of uvector-algorithms to vector any time soon? I've ported mwc-random to use vector, and I'd like to move statistics (which needs uvector-algorithms) and criterion (ditto) too.

On Friday 26 February 2010 12:13:56 am Bryan O'Sullivan wrote:
Dan, do you think you might be releasing your port of uvector-algorithms to vector any time soon? I've ported mwc-random to use vector, and I'd like to move statistics (which needs uvector-algorithms) and criterion (ditto) too.
I don't want to hold anything up, so I've released the port to vector. It's available on hackage as vector-algorithms 0.3:
http://hackage.haskell.org/package/vector-algorithms
It's mostly a straight port, so not much new to learn.
- There's no ".Array" in the module names anymore.
- The Schwartzian transform combinators are gone from Data.Vector.Algorithms.Combinators, because I haven't decided on the best way to handle those yet (the existing implementation won't work on all vectors in the MVector class). Hope that isn't a problem.
- There's also a new module D.V.A.Search, which so far implements a couple variations on binary search. It was something I was starting before the switch to vector, so it isn't complete yet.
There are some moderate performance regressions on some of the algorithms, but nothing major. Also, the optimizer in 6.12 seems to get very confused when working with IO as the PrimMonad in question, resulting in significantly worse performance. So, I'd recommend sticking with ST, or at least making sure the algorithms are called in ST, with stToIO. HEAD is better on both these fronts, so things should get better in the future.
Let me know if there are any issues.*
-- Dan
* P.S. I just noticed I left the .cabal recommending -O2 and -fvia-c -optc-O3. That's obviously not current since the .cabal is set to compile with -Odph and with the NCG. I'll amend that in a later version. :)
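The advice to keep algorithms in ST and reach them from IO through stToIO looks roughly like this, assuming a recent vector release; `scaleInPlace` and `scaleIO` are hypothetical names. An IOVector is an MVector whose state token is RealWorld, which is exactly what stToIO expects. The optimizer problem mentioned above concerned GHC 6.12.

```haskell
import Control.Monad.ST (ST, stToIO)
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as M

-- Hypothetical algorithm written against ST: multiply every element by k.
scaleInPlace :: Double -> M.MVector s Double -> ST s ()
scaleInPlace k mv =
  mapM_ (\i -> M.read mv i >>= M.write mv i . (* k)) [0 .. M.length mv - 1]

-- Called from IO: an IOVector is an MVector RealWorld, so stToIO applies.
scaleIO :: Double -> M.IOVector Double -> IO ()
scaleIO k mv = stToIO (scaleInPlace k mv)
```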

Great, thanks!
On Thu, Feb 25, 2010 at 10:29 PM, Dan Doel wrote:
I don't want to hold anything up, so I've released the port to vector. It's available on hackage as vector-algorithms 0.3:
http://hackage.haskell.org/package/vector-algorithms

dan.doel:
To this end, I've done a preliminary port of the library, such that all the modules compile. I've just used safe operations so far, so it's probably a significant decrease in performance over the 0.2 uvector-algorithms (unless perhaps you turn off the bounds checking flag), but it's a start. It can be gotten with:
I've ported uvector's tests to vector, and mostly got identical code, except for:
* tail
* zip*
* empty/null
I think if you find any slowdown over uvector it has to be a bug in vector. Let Roman know.
-- Don

bos:
I'm thinking of switching the statistics library over to using vector. uvector is pretty bit-rotted in comparison to vector at this point, and it's really seeing no development, while vector is The Shiny Future. Roman, would you call the vector library good enough to use in production at the moment?
uvector's not seeing much development, but at least in the last round of benchmarks it was still consistently faster -- since it's been micro-optimized. Also, we have uvector-algorithms, so you can sort etc. a uvector. I'm not sure it's the long-term solution, but it's a simpler, faster lib at the moment, with more surrounding support, users and documentation.
-- Don

On 12/02/2010, at 12:39, Don Stewart wrote:
uvector's not seeing much development, but at least in the last round of benchmarks it was still consistently faster -- since it's been micro-optimized.
FWIW, the development version of vector is usually faster than both uvector and dph-prim-seq, at least for the development version of NoSlow.
Roman

rl:
FWIW, the development version of vector is usually faster than both uvector and dph-prim-seq, at least for the development version of NoSlow.
Ah ha -- that's useful. Public benchmarks soon? In time for the Zurich Hackathon?? (March 20)

On 12/02/2010, at 13:49, Don Stewart wrote:
Ah ha -- that's useful. Public benchmarks soon? In time for the Zurich Hackathon?? (March 20)
I've been trying to find the time to put the benchmarks on my blog since the beginning of January but, alas, unsuccessfully so far. In any case, vector and NoSlow currently live in
http://www.cse.unsw.edu.au/~rl/code/darcs/vector
http://www.cse.unsw.edu.au/~rl/code/darcs/NoSlow
If Roman declares the vector to be faster -- my main concern here for flat uarrays -- and makes the repo available so we can work on it, I'd be willing to merge uvector's tests and docs and extra array operations in.
It is generally faster than dph-prim-seq. Benchmarking against uvector is a bit difficult because it's missing operations necessary for implementing most of the algorithms in NoSlow (in particular, bulk updates). For the ones that uvector supports, vector tends to be faster. BTW, this is for unsafe operations which don't use bounds checking. Bounds checking can make things a little slower but often doesn't cost anything as long as only collective operations are used. Sometimes it makes things faster, which means that the simplifier still gets confused in some situations. There are also some significant differences between 6.12 and the HEAD (the HEAD is much more predictable). In general, I find it hard to believe that the performance differences I'm seeing really matter all that much in real-world programs.
Roman

rl:
FWIW, the development version of vector is usually faster than both uvector and dph-prim-seq, at least for the development version of NoSlow.
If Roman declares the vector to be faster -- my main concern here for flat uarrays -- and makes the repo available so we can work on it, I'd be willing to merge uvector's tests and docs and extra array operations in.
-- Don
participants (7)
- Bryan O'Sullivan
- Dan Doel
- Don Stewart
- Henning Thielemann
- Johan Tibell
- Roman Leshchinskiy
- stefan kersten