
On 25/08/2009 21:23, Johan Tibell wrote:
On Tue, Aug 25, 2009 at 2:03 PM, Simon Marlow wrote:
On 22/08/2009 05:49, Thomas DuBuisson wrote:
3) Use Bytestrings (and have corresponding .Lazy modules) for efficiency. As in network-bytestring, any new API should be performance-conscious enough to avoid String.
Ideologically speaking, this is not a choice you should make in the network library. The network library should deal with setting up sockets, and delegate the actual I/O to the I/O library.
Right now, that means making Handles from Sockets (which is something the current network library provides). And then you use the bytestring library to write bytestrings to the Handle. In the future we'll have a way to write text to a Handle too.
Now, I wouldn't be surprised if this doesn't cover all the use cases. Maybe people want to use the low-level send/recv. But I expect that for most applications, going via Handle will be the right thing, and we should look at how to accommodate the other use cases.
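A minimal sketch of the "go via Handle" approach described above, assuming only base and bytestring. The names sendLine, recvLine and roundTrip are hypothetical. In a network program the Handle would typically come from Network.Socket.socketToHandle in the network package; here a file stands in for the socket so the example needs no extra packages.

```haskell
import qualified Data.ByteString.Char8 as B
import System.IO (Handle, IOMode (..), hFlush, withBinaryFile)

-- | Write one newline-terminated line of bytes to a Handle and flush it.
-- The Handle is a parameter: it could wrap a socket, a file or a pipe.
sendLine :: Handle -> B.ByteString -> IO ()
sendLine h payload = do
  B.hPut h payload
  B.hPut h (B.pack "\n")
  hFlush h

-- | Read one line back as raw bytes; no String is involved.
recvLine :: Handle -> IO B.ByteString
recvLine = B.hGetLine

-- | Usage example (hypothetical): a file plays the role of the socket.
roundTrip :: FilePath -> IO B.ByteString
roundTrip path = do
  withBinaryFile path WriteMode $ \h -> sendLine h (B.pack "HELLO")
  withBinaryFile path ReadMode recvLine
```

The point of the design is that sendLine and recvLine never mention sockets: the network library only has to produce a Handle.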
In my mind an improved I/O library would look something like this:
    -- At the very bottom is a type class 'RawIO' which represents a
    -- variety of stream-like types.
    class RawIO a where
        readInto :: Ptr Word8 -> Int -> IO ()
        write :: ByteString -> IO ()

        read :: Int -> IO ByteString
        read n = ByteString.createAndTrim n (\p -> readInto p n)
This is quite similar to the class of the same name in the GHC I/O library:

    -- | A low-level I/O provider where the data is bytes in memory.
    class RawIO a where
        read :: a -> Ptr Word8 -> Int -> IO Int
        readNonBlocking :: a -> Ptr Word8 -> Int -> IO (Maybe Int)
        write :: a -> Ptr Word8 -> Int -> IO ()
        writeNonBlocking :: a -> Ptr Word8 -> Int -> IO Int

I think the Bytestring API should be a layer on top of this.
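One way a ByteString layer on top of a pointer-based class could look, as a rough sketch. To avoid depending on GHC-internal modules, the block declares its own local RawIO class containing just the two blocking methods quoted above. readBS, writeBS and the in-memory MemDevice are hypothetical names introduced for illustration.

```haskell
import Prelude hiding (read)
import qualified Data.ByteString as B
import qualified Data.ByteString.Internal as BI
import qualified Data.ByteString.Unsafe as BU
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)
import Data.Word (Word8)
import Foreign.Marshal.Utils (copyBytes)
import Foreign.Ptr (Ptr, castPtr)

-- Local copy of the two blocking methods quoted above (an assumption
-- made so that the sketch compiles on its own).
class RawIO a where
  read  :: a -> Ptr Word8 -> Int -> IO Int
  write :: a -> Ptr Word8 -> Int -> IO ()

-- | Read up to n bytes: allocate n bytes, let the device fill them,
-- then trim to the number of bytes actually read.
readBS :: RawIO a => a -> Int -> IO B.ByteString
readBS dev n = BI.createAndTrim n (\p -> read dev p n)

-- | Write a ByteString by handing its underlying memory to the device.
writeBS :: RawIO a => a -> B.ByteString -> IO ()
writeBS dev bs =
  BU.unsafeUseAsCStringLen bs $ \(p, len) -> write dev (castPtr p) len

-- Hypothetical in-memory device for testing: reads consume stored bytes
-- from the front, writes append at the back.
newtype MemDevice = MemDevice (IORef B.ByteString)

instance RawIO MemDevice where
  read (MemDevice ref) dst n = do
    stored <- readIORef ref
    let (chunk, rest) = B.splitAt n stored
    BU.unsafeUseAsCStringLen chunk $ \(src, len) ->
      copyBytes dst (castPtr src) len
    writeIORef ref rest
    return (B.length chunk)
  write (MemDevice ref) src n = do
    bytes <- B.packCStringLen (castPtr src, n)
    modifyIORef' ref (`B.append` bytes)
```

Because readBS and writeBS only use the class methods, the same two functions would work unchanged for any other instance, such as a file descriptor.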
This definition is very minimal and most likely needs to be expanded with operations such as 'close' and perhaps also 'seek'.
close/seek etc. are methods of the IODevice class in GHC's IO library. See http://darcs.haskell.org/packages/base/GHC/IO/Device.hs

We have implementations of these classes for file descriptors, and it is my intention to have other implementations too: memory-mapped files, Windows HANDLEs, Bytestring (for testing), and Chan Word8 (for testing again: you write to the Handle, and the decoded bytes come out of the Chan).

These APIs aren't currently "public", in the sense that they are exported by modules in the GHC.* hierarchy. I hope they'll help as a concrete start to the discussion of where the I/O library should be going, though.
We can now layer buffering on top.
    -- Buffers for reading and writing are kept in a data type 'BufferedIO'.
    -- This data type need not be exposed.
    data BufferedIO = forall a. RawIO a => BufferedIO Buffer Buffer a

    instance RawIO BufferedIO where
        readInto = readFromBufferInto  -- Calls RawIO.readInto if needed
        write = writeToBuffer          -- Calls RawIO.write if needed

    -- Allocates buffers and returns a BufferedIO
    buffered :: RawIO a => a -> a
    buffered = ...
This is where things get a bit hairy. The upper layers often want to know about the buffer, for instance when it needs to be flushed, or for performance reasons - e.g. encoding/decoding needs to have direct access to both buffers. So in GHC's I/O library buffering is a new class BufferedIO, consumed by the higher layer, and you can make a BufferedIO instance trivially given a RawIO instance. Not everything has a RawIO instance though: memory-mapped files just appear as buffers.

Incidentally, there have been various designs around this theme in the past, e.g. http://www.haskell.org/haskellwiki/Library/Streams (with various problems IMO, but there are some good ideas there).

Cheers,
Simon
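To illustrate why the upper layers need to know about the buffer, here is a small sketch of a write buffer wrapped around an unbuffered device. It is not GHC's BufferedIO class. RawSink, Buffered, bufWrite, bufFlush and Recorder are all hypothetical names, and pending output is kept as a ByteString rather than a fixed memory buffer to keep the example short.

```haskell
import qualified Data.ByteString as B
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)

-- Hypothetical unbuffered sink: each call stands for one device write.
class RawSink a where
  rawWrite :: a -> B.ByteString -> IO ()

-- | A device together with output that has not reached it yet.
data Buffered a = Buffered
  { bufDevice   :: a
  , bufCapacity :: Int
  , bufPending  :: IORef B.ByteString
  }

-- | Wrap a device with an empty buffer of the given capacity.
buffered :: Int -> a -> IO (Buffered a)
buffered cap dev = Buffered dev cap <$> newIORef B.empty

-- | Queue bytes; touch the device only once the capacity is reached.
bufWrite :: RawSink a => Buffered a -> B.ByteString -> IO ()
bufWrite b bs = do
  pending <- readIORef (bufPending b)
  let pending' = pending `B.append` bs
  if B.length pending' >= bufCapacity b
    then do
      rawWrite (bufDevice b) pending'
      writeIORef (bufPending b) B.empty
    else writeIORef (bufPending b) pending'

-- | Push out whatever is pending. Bytes can sit in the buffer
-- indefinitely unless some upper layer calls this.
bufFlush :: RawSink a => Buffered a -> IO ()
bufFlush b = do
  pending <- readIORef (bufPending b)
  if B.null pending
    then return ()
    else do
      rawWrite (bufDevice b) pending
      writeIORef (bufPending b) B.empty

-- Test sink that records every chunk handed to the device.
newtype Recorder = Recorder (IORef [B.ByteString])

instance RawSink Recorder where
  rawWrite (Recorder ref) bs = modifyIORef' ref (++ [bs])
```

If Buffered were hidden behind the same interface as the raw device, callers would have no way to ask for bufFlush, which is one reason for exposing buffering as its own interface.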