Re: Potential Network SIG

26 Aug 2009

On 25/08/2009 21:23, Johan Tibell wrote:
...
On Tue, Aug 25, 2009 at 2:03 PM, Simon Marlow wrote:
...
On 22/08/2009 05:49, Thomas DuBuisson wrote:
...
3) Use Bytestrings (and have corresponding .Lazy modules) for efficiency.
As in network-bytestring, any new API should be performance conscious
enough to avoid String.
Ideologically speaking, this is not a choice you should make in the network
library.  The network library should deal with setting up sockets, and
delegate the actual I/O to the I/O library.
Right now, that means making Handles from Sockets (which is something the
current network library provides).  And then you use the bytestring library
to write bytestrings to the Handle.  In the future we'll have a way to write
text to a Handle too.
Now, I wouldn't be surprised if this doesn't cover all the use cases. Maybe
people want to use the low-level send/recv.  But I expect that for most
applications, going via Handle will be the right thing, and we should look
at how to accommodate the other use cases.
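
To make the division of labour concrete, here is a minimal sketch, assuming the network and bytestring packages. socketToHandle and hPut are the real library calls; withSocketHandle and sendGreeting are names made up for this illustration. The network library only gets involved in producing the Handle, and the application code writes bytestrings to it without knowing a socket is underneath.

```haskell
import Control.Exception (bracket)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Network.Socket (Socket, socketToHandle)
import System.IO

-- | Turn a connected socket into a Handle, run an action on it, and
-- close the Handle afterwards.  Once socketToHandle has been called,
-- the socket should only be used through the Handle.
-- (Hypothetical helper, not part of the network package.)
withSocketHandle :: Socket -> (Handle -> IO a) -> IO a
withSocketHandle sock = bracket (socketToHandle sock ReadWriteMode) hClose

-- | Hypothetical application code.  It only ever sees a Handle, so the
-- same function works for files, pipes, and sockets alike.
sendGreeting :: Handle -> IO ()
sendGreeting h = do
  B.hPut h (BC.pack "hello\n")
  hFlush h
```

Given a connected socket sock, `withSocketHandle sock sendGreeting` would send the greeting and close the connection.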
In my mind an improved I/O library would look something like this:
...
-- At the very bottom is a type class 'RawIO' which represents a
-- variety of stream-like types.
class RawIO a where
     readInto :: Ptr Word8 ->  Int ->  IO ()
     write :: ByteString ->  IO ()
     read :: Int ->  IO ByteString
     read n = ByteString.createAndTrim n (\p ->  readInto p n)
This is quite similar to the class of the same name in the GHC I/O library:

-- | A low-level I/O provider where the data is bytes in memory.
class RawIO a where
   read                :: a -> Ptr Word8 -> Int -> IO Int
   readNonBlocking     :: a -> Ptr Word8 -> Int -> IO (Maybe Int)
   write               :: a -> Ptr Word8 -> Int -> IO ()
   writeNonBlocking    :: a -> Ptr Word8 -> Int -> IO Int

I think the Bytestring API should be a layer on top of this.
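
As a rough sketch of what such a layer could look like: the class below is a local, cut-down copy of the one quoted above (methods renamed rawRead/rawWrite to avoid clashing with the Prelude), not the real GHC.IO.Device class, whose exact signatures vary between versions of base. readBS, writeBS, MemDev, and demo are names invented for this example. MemDev is an in-memory device that exists only so the layer can be exercised.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.ByteString.Internal (createAndTrim)
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)
import Data.Word (Word8)
import Foreign.Marshal.Utils (copyBytes)
import Foreign.Ptr (Ptr, castPtr)

-- | Local stand-in for the pointer-based class (assumption: simplified).
class RawIO a where
  rawRead  :: a -> Ptr Word8 -> Int -> IO Int   -- returns bytes read
  rawWrite :: a -> Ptr Word8 -> Int -> IO ()

-- | ByteString layer: read at most n bytes into a fresh ByteString.
readBS :: RawIO a => a -> Int -> IO B.ByteString
readBS dev n = createAndTrim n (\p -> rawRead dev p n)

-- | ByteString layer: hand the ByteString's own memory to the device,
-- so no intermediate copy is made on this side.
writeBS :: RawIO a => a -> B.ByteString -> IO ()
writeBS dev bs =
  unsafeUseAsCStringLen bs $ \(p, len) -> rawWrite dev (castPtr p) len

-- | Hypothetical in-memory device: writes are appended to a pending
-- queue, reads consume from the front of it.
newtype MemDev = MemDev (IORef B.ByteString)

newMemDev :: B.ByteString -> IO MemDev
newMemDev = fmap MemDev . newIORef

instance RawIO MemDev where
  rawRead (MemDev ref) dst n = do
    pending <- readIORef ref
    let (chunk, rest) = B.splitAt n pending
    writeIORef ref rest
    unsafeUseAsCStringLen chunk $ \(src, len) -> do
      copyBytes dst (castPtr src) len
      return len
  rawWrite (MemDev ref) src n = do
    bs <- B.packCStringLen (castPtr src, n)
    modifyIORef' ref (`B.append` bs)

-- | Small demonstration: read a prefix, write more, read the rest.
demo :: IO (B.ByteString, B.ByteString)
demo = do
  dev <- newMemDev (BC.pack "hello world")
  a <- readBS dev 5
  writeBS dev (BC.pack "!")
  b <- readBS dev 100
  return (a, b)
```

The point of the sketch is that readBS and writeBS never mention a concrete device, so any RawIO instance gets the ByteString API for free.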
...
This definition is very minimal and most likely needs to be expanded
with operations such as 'close' and perhaps also 'seek'.
close/seek etc. are methods of the IODevice class in GHC's IO library.

See http://darcs.haskell.org/packages/base/GHC/IO/Device.hs

We have implementations of these classes for file descriptors, and it is 
my intention to have other implementations too: memory-mapped files, 
Windows HANDLEs, Bytestring (for testing), and Chan Word8 (for testing 
again: you write to the Handle, and the decoded bytes come out of the Chan).
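
The Chan-backed test device might look roughly like the following. This is only a sketch: RawSink is a local write-only stand-in for the RawIO class, and ChanDev and roundTrip are names invented here. Every byte written to the device is pushed onto the Chan, where a test can pick it up.

```haskell
import Control.Concurrent.Chan (Chan, newChan, readChan, writeList2Chan)
import Control.Monad (replicateM)
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Ptr (Ptr)

-- | Local stand-in for the write half of RawIO (assumption: simplified).
class RawSink a where
  rawWrite :: a -> Ptr Word8 -> Int -> IO ()

-- | Hypothetical test device: a Chan that receives every byte written.
newtype ChanDev = ChanDev (Chan Word8)

instance RawSink ChanDev where
  -- Copy the n bytes out of the buffer and enqueue them in order.
  rawWrite (ChanDev ch) p n = peekArray n p >>= writeList2Chan ch

-- | Write some bytes through the device and read them back off the Chan.
roundTrip :: [Word8] -> IO [Word8]
roundTrip bytes = do
  ch <- newChan
  withArrayLen bytes $ \n p -> rawWrite (ChanDev ch) p n
  replicateM (length bytes) (readChan ch)
```

In a real test the writer would be a Handle built on top of the device, and the test would compare the bytes coming out of the Chan against the expected encoding.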

These APIs aren't currently "public", in the sense that they are 
exported only by modules in the GHC.* hierarchy.  I hope they'll help as a 
concrete start to the discussion of where the I/O library should be 
going, though.
...
We can now layer buffering on top.
...
-- Buffers for reading and writing are kept in a data type 'BufferedIO'.
-- This data type need not be exposed.
data BufferedIO = forall a. RawIO a =>  BufferedIO Buffer Buffer a
instance RawIO BufferedIO where
     readInto = readFromBufferInto  -- Calls RawIO.readInto if needed
     write = writeToBuffer  -- Calls RawIO.write if needed
-- Allocates buffers and returns a BufferedIO
buffered :: RawIO a =>  a ->  a
buffered = ...
This is where things get a bit hairy.  The upper layers often want to 
know about the buffer, for instance when it needs to be flushed, or for 
performance reasons - e.g. encoding/decoding needs to have direct access 
to both buffers.  So in GHC's I/O library buffering is a new class 
BufferedIO, consumed by the higher layer, and you can make a BufferedIO 
instance trivially given a RawIO instance.  Not everything has a RawIO 
instance though: memory-mapped files just appear as buffers.
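
A much-simplified sketch of that arrangement follows. The names are loosely modelled on GHC.IO.BufferedIO, but everything here (Buf, the two-method BufferedIO class, allocBuf, readBuf, ListDev, bufferedBytes) is a local definition for illustration and should not be read as the real API. The higher layer sees the buffer itself, and a device with a RawIO instance gets its BufferedIO instance by delegating to generic helpers.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Marshal.Array (peekArray, pokeArray)
import Foreign.Ptr (Ptr, plusPtr)

-- | Local stand-in for the read half of RawIO (assumption: simplified).
class RawIO dev where
  rawRead :: dev -> Ptr Word8 -> Int -> IO Int

-- | Simplified buffer: raw memory, its capacity, and the fill level.
data Buf = Buf
  { bufRaw  :: ForeignPtr Word8
  , bufSize :: Int
  , bufR    :: Int   -- number of valid bytes currently in the buffer
  }

-- | The interface the higher layer consumes: it works with buffers
-- directly instead of going through read calls.
class BufferedIO dev where
  newBuffer      :: dev -> Int -> IO Buf
  fillReadBuffer :: dev -> Buf -> IO (Int, Buf)

-- | Generic helper: allocate an empty buffer of the given size.
allocBuf :: Int -> IO Buf
allocBuf size = do
  fp <- mallocForeignPtrBytes size
  return (Buf fp size 0)

-- | Generic helper: fill the free space of a buffer from any raw device.
readBuf :: RawIO dev => dev -> Buf -> IO (Int, Buf)
readBuf dev buf = withForeignPtr (bufRaw buf) $ \p -> do
  n <- rawRead dev (p `plusPtr` bufR buf) (bufSize buf - bufR buf)
  return (n, buf { bufR = bufR buf + n })

-- | Hypothetical raw device backed by a list of bytes.
newtype ListDev = ListDev (IORef [Word8])

instance RawIO ListDev where
  rawRead (ListDev ref) p n = do
    xs <- readIORef ref
    let (chunk, rest) = splitAt n xs
    writeIORef ref rest
    pokeArray p chunk
    return (length chunk)

-- | The "trivial" instance: delegate to the generic helpers.
instance BufferedIO ListDev where
  newBuffer _    = allocBuf
  fillReadBuffer = readBuf

-- | Inspect the valid bytes of a buffer (used here for checking only).
bufferedBytes :: Buf -> IO [Word8]
bufferedBytes buf = withForeignPtr (bufRaw buf) (peekArray (bufR buf))
```

A device without a RawIO instance, such as a memory-mapped file, would instead implement BufferedIO directly by returning a Buf that points into the mapping.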

Incidentally, there have been various designs around this theme in the 
past, e.g. http://www.haskell.org/haskellwiki/Library/Streams (with 
various problems IMO, but there are some good ideas there).

Cheers,
	Simon