Re: [Haskell-cafe] Re: Stream processors

21 Oct 2004

      Peter Simons wrote:
...
Ben Rudiak-Gould writes:
...
...
...
    start  :: IO ctx
    feed   :: ctx -> Buffer -> IO ()
    commit :: ctx -> IO a
...
...
'feed' cannot have this signature because it needs to
update the context.
...
Sure it can -- it's just like writeIORef :: IORef a -> a -> IO ().
I guess it's moot to argue that point. I don't want a stream
processor to have a global state, so using an internally
encapsulated IORef is not an option for me.
I am looking for a more _general_ API, not one that forces
implementation details on the stream processor. That's what
my StreamProc data type does already. :-)
I'm not arguing about generality; I simply don't understand how your
interface is supposed to be used. E.g.:

    do ctx <- start
       ctx1 <- feed ctx array1
       ctx2 <- feed ctx array2
       val1 <- commit ctx1
       val2 <- commit ctx2
       return (val1,val2)

Should this return (MD5 of array1, MD5 of array2), or
(MD5 of array1+array2, MD5 of array1+array2), or cause a runtime error?
Any of these three might be reasonable, but for your interface to be
well-defined you need to stipulate which one is correct. Once you've
decided which one is correct, there's no reason not to change the
interface so that no one can misinterpret it. My two interfaces are
only less general than yours in that they don't have multiple
interpretations -- which is a good thing.
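
To make the ambiguity concrete, here is a minimal sketch that runs the
same caller code against two implementations of the same signatures.
Everything in it is hypothetical: the names (startRef, feedVal, example,
and so on) are mine, Buffer is just a list of Int, and a running sum
stands in for MD5.

```haskell
import Data.IORef (IORef, newIORef, modifyIORef', readIORef)

-- Hypothetical stand-in for a real digest: a running sum of the input.
type Buffer = [Int]

-- Interpretation A: the context is a mutable cell; feed returns the same cell.
startRef :: IO (IORef Int)
startRef = newIORef 0

feedRef :: IORef Int -> Buffer -> IO (IORef Int)
feedRef ref buf = modifyIORef' ref (+ sum buf) >> return ref

commitRef :: IORef Int -> IO Int
commitRef = readIORef

-- Interpretation B: the context is an immutable value; feed builds a new one.
startVal :: IO Int
startVal = return 0

feedVal :: Int -> Buffer -> IO Int
feedVal acc buf = return (acc + sum buf)

commitVal :: Int -> IO Int
commitVal = return

-- The caller code from the example, written once against either implementation.
example :: IO ctx -> (ctx -> Buffer -> IO ctx) -> (ctx -> IO Int) -> IO (Int, Int)
example start feed commit = do
  ctx  <- start
  ctx1 <- feed ctx [1, 2, 3]
  ctx2 <- feed ctx [10, 20]
  val1 <- commit ctx1
  val2 <- commit ctx2
  return (val1, val2)
```

With the IORef-backed functions, example yields (36,36), because ctx1
and ctx2 are the same cell. With the value-based functions it yields
(6,30). The types are identical in both cases, so the signature alone
cannot tell a caller which behaviour to expect.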
...
...
...
...
    start  :: ctx
    feed   :: ctx -> Buffer -> IO ctx
    commit :: ctx -> a
...
In this interface contexts are supposed to be immutable
Haskell values, so there's no meaning in creating new
ones or finalizing old ones.
I don't want to restrict the API to immutable contexts. A
context could be anything, _including_ an IORef or an MVar.
But the API shouldn't enforce that.
It doesn't. Even (length :: [a] -> Int) is likely to cause destructive
updating of thunks when it's called, but that's not a reason to change
the interface to [a] -> IO Int. The important thing is whether, from
the caller's perspective, the function is pure. If it's pure, it
shouldn't be in the IO monad, even if that forces some implementations
to use unsafePerformIO under the hood.
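
As a small illustration of a function that is pure from the caller's
perspective while mutating state internally, consider the sketch below.
The checksum function is a made-up example, not part of any interface
discussed in this thread; in practice runST would usually be the safer
way to get the same effect.

```haskell
import Data.IORef (newIORef, modifyIORef', readIORef)
import Data.Word (Word8)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical example: a byte checksum computed with a mutable accumulator.
-- The IORef is created, used, and discarded inside the call, so no caller
-- can observe the mutation and the pure type is honest.
checksum :: [Word8] -> Int
checksum bytes = unsafePerformIO $ do
  acc <- newIORef 0
  mapM_ (\b -> modifyIORef' acc (+ fromIntegral b)) bytes
  readIORef acc
```

The result depends only on the argument, which is what justifies
keeping the function out of the IO monad.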

I think you're hoping to have it both ways, capturing destructive-
update semantics and value semantics in a single interface. That's not
going to work, unfortunately. You must decide whether to enforce
single-threading or not.
...
...
...
I would implement feedSTUArray and friends as wrappers
around the Ptr interface, not as primitive computations of
the stream processor.
...
I think it's impossible to do this safely, but it would be
great if I were wrong.
    wrap :: (Storable a, MArray arr a IO) => Ptr a -> Int
         -> IO (arr Int a)
    wrap ptr n = peekArray n ptr >>= newListArray (0, n - 1)
Isn't this going in the wrong direction? I think what we want is
something like

  withArrayPtr :: (MArray arr Word8 IO, Ix i) =>
                     arr i Word8 -> (Ptr Word8 -> IO a) -> IO a

You're right, though, this can be written safely:

  withArrayPtr arr act = getElems arr >>= flip withArray act

It's terribly slow, though. Ideally one wants a pointer into the
original array together with a guarantee that it won't be moved by
the garbage collector during the execution of your IO action. I
think current versions of GHC will never move the array if your IO
action performs no heap allocation, but I can easily imagine that
changing in other/future implementations.
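
For the special case where the caller can choose the array type, the
StorableArray type from Data.Array.Storable already provides a pointer
into the original storage via withStorableArray, with no copying. The
sketch below only illustrates that one case; incrementBytes and demo
are hypothetical names, and this does not solve the problem for
arbitrary MArray instances.

```haskell
import Data.Array.Storable (StorableArray, newListArray, getElems, withStorableArray)
import Data.Word (Word8)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekElemOff, pokeElemOff)

-- Hypothetical action: add 1 to each of the first n bytes behind the pointer.
incrementBytes :: Int -> Ptr Word8 -> IO ()
incrementBytes n ptr = mapM_ bump [0 .. n - 1]
  where
    bump i = peekElemOff ptr i >>= pokeElemOff ptr i . (+ 1)

-- Build a four-element array, modify it through the raw pointer, read it back.
demo :: IO [Word8]
demo = do
  arr <- newListArray (0, 3) [1, 2, 3, 4] :: IO (StorableArray Int Word8)
  withStorableArray arr (incrementBytes 4)
  getElems arr
```

Here demo returns [2,3,4,5]: writes through the pointer show up in the
array. The getElems/withArray version above hands the action a
temporary copy, so such writes would be lost there.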

I suppose you could also have

  withArrayPtrM :: (MArray arr Word8 m, Ix i) =>
                      arr i Word8 -> (Ptr Word8 -> m b) -> m b

  withArrayPtrI :: (IArray arr Word8, Ix i) =>
                      arr i Word8 -> (Ptr Word8 -> IO b) -> IO b

though I'm not sure how much sense those types (or names) make.
The first one would force the use of unsafeIOToST if you wanted
to use it with ST arrays, but probably that's unavoidable.
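
A rough sketch of what the ST case might look like, assuming the
copy-based approach and an IO action rather than an ST action as the
callback (a simplification of the withArrayPtrM type above). The names
withArrayPtrST and sumViaPtr are hypothetical. In current versions of
base, unsafeIOToST is exported from Control.Monad.ST.Unsafe.

```haskell
import Control.Monad.ST (ST, runST)
import Control.Monad.ST.Unsafe (unsafeIOToST)
import Data.Array.MArray (getElems, newListArray)
import Data.Array.ST (STUArray)
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray, withArray)
import Foreign.Ptr (Ptr)

-- Copy-based variant specialised to ST: the elements are copied into a
-- temporary buffer, so writes through the pointer do not reach the array.
withArrayPtrST :: STUArray s Int Word8 -> (Ptr Word8 -> IO b) -> ST s b
withArrayPtrST arr act = do
  xs <- getElems arr
  unsafeIOToST (withArray xs act)

-- Hypothetical use: sum three bytes by reading them back through the pointer.
sumViaPtr :: Int
sumViaPtr = runST $ do
  arr <- newListArray (0, 2) [10, 20, 30]
  withArrayPtrST arr (fmap (sum . map fromIntegral) . peekArray 3)
```

The caller is responsible for making sure the IO action has no
externally visible side effects; otherwise running it under runST
breaks the purity guarantee.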

-- Ben