uvector and the stream interface

currently i'm working on stuff that looks something like this: 1 read soundfile from disk in blocks of N samples (IOCArray, hsndfile package) 2 convert to CArray with unsafeFreeze (simple O(1) cast, carray package) 3 perform FFT (CArray, fftw package) 4 convert to UArr (uvector package) 5 do some stuff with vectors i'd like to minimize copying, and since the FFT returns a new array anyway, the only copying conversion is the one between CArray and UArr in step 4. the approach i've been following is defining a `stream' conversion for CArray, hoping that subsequent carray accesses will be fused with uvector operations without the need for allocating a vector in between. does that make sense? could this be a general strategy for avoiding copying at the boundary between the various array libraries? thanks, <sk>

sk:
currently i'm working on stuff that looks something like this:
1 read soundfile from disk in blocks of N samples (IOCArray, hsndfile package) 2 convert to CArray with unsafeFreeze (simple O(1) cast, carray package) 3 perform FFT (CArray, fftw package) 4 convert to UArr (uvector package) 5 do some stuff with vectors
i'd like to minimize copying, and since the FFT returns a new array anyway, the only copying conversion is the one between CArray and UArr in step 4. the approach i've been following is defining a `stream' conversion for CArray, hoping that subsequent carray accesses will be fused with uvector operations without the need for allocating a vector in between. does that make sense? could this be a general strategy for avoiding copying at the boundary between the various array libraries?
If you can write a function from CArray a -> Stream a, then yep, it'll fuse, and that's a good way to avoid copying. Another way is to dive into the internals and coerce the CArray type to a MUArr. It would be helpful to see the programs people are writing with uvector, so I can polish up the API some more :) -- Don

Don Stewart wrote:
sk:
currently i'm working on stuff that looks something like this:
1 read soundfile from disk in blocks of N samples (IOCArray, hsndfile package) 2 convert to CArray with unsafeFreeze (simple O(1) cast, carray package) 3 perform FFT (CArray, fftw package) 4 convert to UArr (uvector package) 5 do some stuff with vectors
[snip]
It would be helpful to see the programs people are writing with uvector, so I can polish up the API some more :)
It would also be helpful to have someone explain why we have: Ptr a ByteString IOUArray IOCArray Data.Storable.StorableArray UArr Of course, I know the answers to some of those questions, ByteString is obviously less polymorphic than all the others there, and Ptr a doesn't contain size information. But it seems we have a rapidly bifurcating profusion of 'typed interfaces to chunks of memory' with no obvious consistency to their naming scheme and I think it's starting to get confusing... Jules

On 14.07.2008, at 18:42, Jules Bean wrote:
It would be helpful to see the programs people are writing with uvector, so I can polish up the API some more :)
It would also be helpful to have someone explain why we have:
Ptr a ByteString IOUArray IOCArray Data.Storable.StorableArray UArr
Of course, I know the answers to some of those questions, ByteString is obviously less polymorphic than all the others there, and Ptr a doesn't contain size information. But it seems we have a rapidly bifurcating profusion of 'typed interfaces to chunks of memory' with no obvious consistency to their naming scheme and I think it's starting to get confusing...
maybe it would be useful to look at (1) what's expected in terms of the underlying array implementation and (2) in terms of array access. (1) some things that come to mind: * ghc heap or system heap (can the garbage collector move memory during calls to C?) * access to a Ptr for interfacing to external libraries (possible with UArr?) * alignment (most SIMD instruction sets require 16 byte aligned data) * mutability * strict vs (partially) lazy (2) personally i much prefer the list-like interface provided by the stream-fusion powered libraries (ndp, uvector, vector). can't the stream-fusion framework and correspondingly the vector interface be separated from the memory representation, provided a particular concrete representation comes up with a stream/unstream pair? then it would be easy to swap out the underlying representation according to the required characteristics. <sk>

sk:
On 14.07.2008, at 18:42, Jules Bean wrote:
It would be helpful to see the programs people are writing with uvector, so I can polish up the API some more :)
It would also be helpful to have someone explain why we have:
Ptr a ByteString IOUArray IOCArray Data.Storable.StorableArray UArr
Of course, I know the answers to some of those questions, ByteString is obviously less polymorphic than all the others there, and Ptr a doesn't contain size information. But it seems we have a rapidly bifurcating profusion of 'typed interfaces to chunks of memory' with no obvious consistency to their naming scheme and I think it's starting to get confusing...
maybe it would be useful to look at (1) what's expected in terms of the underlying array implementation and (2) in terms of array access.
(1) some things that come to mind:
* ghc heap or system heap (can the garbage collector move memory during calls to C?) * access to a Ptr for interfacing to external libraries (possible with UArr?) * alignment (most SIMD instruction sets require 16 byte aligned data) * mutability * strict vs (partially) lazy
(2) personally i much prefer the list-like interface provided by the stream-fusion powered libraries (ndp, uvector, vector). can't the stream-fusion framework and correspondingly the vector interface be separated from the memory representation, provided a particular concrete representation comes up with a stream/unstream pair? then it would be easy to swap out the underlying representation according to the required characteristics.
Yes, we have long been discussing a generic Stream library to which the various sequence structures can be translated to and from. Already it is useful to say, stream bytestrings into uvectors and out to lists. If people are using uvector or stream-fusion (the list version) and are interested in interop please let Roman, Duncan and I know, so we can think more about how best to make it all play together. -- Don

Don Stewart wrote:
Yes, we have long been discussing a generic Stream library to which the various sequence structures can be translated to and from. Already it is useful to say, stream bytestrings into uvectors and out to lists.
Isn't there already such a thing? http://www.haskell.org/haskellwiki/Library/Streams - Lyle

lists:
Don Stewart wrote:
Yes, we have long been discussing a generic Stream library to which the various sequence structures can be translated to and from. Already it is useful to say, stream bytestrings into uvectors and out to lists.
Isn't there already such a thing?
Not those kind of streams (Wouter!! :) These kind, http://www.cse.unsw.edu.au/~dons/papers/CLS07.html -- Don

On 14.07.2008, at 20:48, Don Stewart wrote:
Yes, we have long been discussing a generic Stream library to which the various sequence structures can be translated to and from. Already it is useful to say, stream bytestrings into uvectors and out to lists.
could the Stream interface be made public in uvector for writing custom adapters?
If people are using uvector or stream-fusion (the list version) and are interested in interop please let Roman, Duncan and I know, so we can think more about how best to make it all play together.
i'd be interested, just have to gather a little more infrastructure before i start working on real algorithms. <sk>

stefan kersten wrote:
(2) personally i much prefer the list-like interface provided by the stream-fusion powered libraries (ndp, uvector, vector). can't the stream-fusion framework and correspondingly the vector interface be separated from the memory representation, provided a particular concrete representation comes up with a stream/unstream pair? then it would be easy to swap out the underlying representation according to the required characteristics.
The vector library does this. The problem is that just stream/unstream isn't sufficient to get good performance. In fact, stream fusion alone is suboptimal for many array algorithms since it can't handle array operations which require random access. In particular, it isn't enough for DPH. We need a more powerful framework which, hopefully, will also be useful for non-DPH arrays. Roman

jules:
Don Stewart wrote:
sk:
currently i'm working on stuff that looks something like this:
1 read soundfile from disk in blocks of N samples (IOCArray, hsndfile package) 2 convert to CArray with unsafeFreeze (simple O(1) cast, carray package) 3 perform FFT (CArray, fftw package) 4 convert to UArr (uvector package) 5 do some stuff with vectors
[snip]
It would be helpful to see the programs people are writing with uvector, so I can polish up the API some more :)
It would also be helpful to have someone explain why we have:
Ptr a ByteString IOUArray IOCArray Data.Storable.StorableArray UArr
Of course, I know the answers to some of those questions, ByteString is obviously less polymorphic than all the others there, and Ptr a doesn't contain size information. But it seems we have a rapidly bifurcating profusion of 'typed interfaces to chunks of memory' with no obvious consistency to their naming scheme and I think it's starting to get confusing...
Jules
An abstraction stack: Impure Pure ------------------------------- Further from the machine: UArr hmatrix/blas/fftw IOUArr CArray ByteString StorableArray STUArr MUArr ForeginPtr Ptr Addr# Closer to machine The key now is to ensure we have enough introduction and elimination forms, and perhaps a few mergers can also take place. -- Don

An abstraction stack:
Impure Pure
How about strict vs. lazy? I ask because I assumed there were lazy variants of uvector or storablevector, using the bytestring list of chunks approach, but apparently not? Making a lazy version seems pretty easy, but rewriting all the basic functions looks not so easy. Granted, I can probably do most of what I want with just foldr... Also, what's the distinction between storablevector and uvector? Also, it's even more off subject, but haddock crashes on uvector. You can see the errors on hackage.

On Thu, 17 Jul 2008, Evan Laforge wrote:
An abstraction stack:
Impure Pure
How about strict vs. lazy? I ask because I assumed there were lazy variants of uvector or storablevector, using the bytestring list of chunks approach, but apparently not?
For storablevector there exists code, but not complete and not tested: http://code.haskell.org/storablevector/Data/StorableVector/Lazy.hs
Also, what's the distinction between storablevector and uvector?
I hoped that I answered this with my previous mail.

Evan Laforge wrote:
An abstraction stack:
Impure Pure
How about strict vs. lazy? I ask because I assumed there were lazy variants of uvector or storablevector, using the bytestring list of chunks approach, but apparently not?
wait.... "list of chunks" makes something that behaves not like a random-access array, and yet is still rather strict in the elements. How about strict vs. lazy as in UArray vs. Array: whether the elements are evaluated lazily (although UArray has the complication that the elements must also be Storable: e.g. what if you want a strict random-access array of functions -- so there is not such an easy solution for a strict but boxed array). so do we have at least three different variables here? :-) -Isaac

Jules Bean wrote:
It would also be helpful to have someone explain why we have:
Ptr a ByteString IOUArray IOCArray Data.Storable.StorableArray UArr
Of course, I know the answers to some of those questions, ByteString is obviously less polymorphic than all the others there, and Ptr a doesn't contain size information. But it seems we have a rapidly bifurcating profusion of 'typed interfaces to chunks of memory' with no obvious consistency to their naming scheme and I think it's starting to get confusing...
I think the main reason for this is the lack of a generic infrastructure for efficient arrays which means that everyone is rolling their own. My hope is that the Data Parallel Haskell project will eventually provide such an infrastructure (but I'm biased, of course). We need it anyway and we are perhaps in the best position to do it at the moment. Unfortunately, we don't have too much time to work on it so development is slow. Still, I think (obviously) that the vector library is a step in the right direction. Roman
participants (9)
-
Bulat Ziganshin
-
Don Stewart
-
Evan Laforge
-
Henning Thielemann
-
Isaac Dupree
-
Jules Bean
-
Lyle Kopnicky
-
Roman Leshchinskiy
-
stefan kersten