
Hello Simon,

Wednesday, April 19, 2006, 4:45:19 PM, you wrote:
Believe me I've looked in detail at your streams library. Performance-wise it is great but the design needs to be reworked IMO.
The main problem is that it doesn't have enough type structure. There are many combinations of stream transformers that don't make sense, and should therefore be ruled out by the type system. There are operations that don't work on some streams. There should at the least be a type distinction between directly accessible memory streams, byte streams, and text streams. Additionally I would add separate classes for seekable and buffered streams. I believe these changes would improve performance by reducing the size of dictionaries.
You wrote this in February, but the discussion was never finished because of my laziness. Now I have tried to split the Stream interface into several parts. So:

1) What do you think: should Stream be the base class for all other stream classes, or should each stream class be independent? I.e.

    class (Stream m h) => InByteStream m h where
        vGetByte :: h -> m Word8

or

    class InByteStream m h where
        vGetByte :: h -> m Word8

?

2) Separating the Stream classes makes some automatic definitions impossible. For example, the released version contains a vGetBuf implementation that is defined via vGetChar and works fine for streams that provide only vGetChar as their base function. I tried to implement this via instances (TextStream => BlockStream in this case), but this immediately leads to the "incoherent instances" problem and so can't really be used. Now that I have implemented (in my internal version) your suggestion about splitting the TextStream, BlockStream and ByteStream classes, I just repeat such definitions in each stream that needs them. Not very good, but it seems to be the required sacrifice.

Well, I can explain it better. In the released version there are these definitions:

    class Stream ... where
        ...
        vGetBuf h buf n = {- repeat vGetChar operation -}

    instance Stream StringBuffer where
        vGetChar = ....
        -- vGetBuf defined automatically

    instance Stream StringReader where
        vGetChar = ....
        -- vGetBuf defined automatically

Now I have to use the following:

    instance TextStream StringBuffer where
        vGetChar = ....
    instance BlockStream StringBuffer where
        vGetBuf h buf n = {- repeat vGetChar operation -}

    instance TextStream StringReader where
        vGetChar = ....
    instance BlockStream StringReader where
        vGetBuf h buf n = {- repeat vGetChar operation -}

As you can see, the same vGetBuf implementation is repeated for each stream that has vGetChar as its base operation.
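One possible way to reduce the repetition described in point 2, without any blanket instance, is to write the vGetChar-based loop once as an ordinary function and let each BlockStream instance refer to it in a single line. The sketch below is only an illustration under several assumptions that are not taken from the released library: the class signatures are simplified, the monad is fixed to IO, `getBufViaChars` and `popChar` are hypothetical helpers, both stream types are modelled as an `IORef String`, and the stream is assumed to hold at least `n` more characters.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances, FlexibleContexts #-}

import Data.Char (ord)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Marshal.Array (peekArray)
import Foreign.Ptr (Ptr)
import Foreign.Storable (pokeElemOff)

-- Simplified stand-ins for the split classes (signatures are assumptions).
class Monad m => TextStream m h where
  vGetChar :: h -> m Char

class Monad m => BlockStream m h where
  vGetBuf :: h -> Ptr Word8 -> Int -> m Int

-- Hypothetical helper: the "repeat vGetChar" loop, written exactly once.
-- Characters are truncated to 8 bits; the stream must hold at least n of them.
getBufViaChars :: TextStream IO h => h -> Ptr Word8 -> Int -> IO Int
getBufViaChars h buf n = do
  mapM_ step [0 .. n - 1]
  return n
  where
    step i = do
      c <- vGetChar h
      pokeElemOff buf i (fromIntegral (ord c))

-- Toy representations of the two stream types (assumed, for illustration).
newtype StringReader = StringReader (IORef String)
newtype StringBuffer = StringBuffer (IORef String)

-- Hypothetical helper: take the next character or fail at end of stream.
popChar :: IORef String -> IO Char
popChar ref = do
  s <- readIORef ref
  case s of
    (c:cs) -> writeIORef ref cs >> return c
    []     -> ioError (userError "vGetChar: end of stream")

instance TextStream IO StringReader where
  vGetChar (StringReader ref) = popChar ref

instance TextStream IO StringBuffer where
  vGetChar (StringBuffer ref) = popChar ref

-- Each instance is still written out, but its body is one line.
instance BlockStream IO StringReader where
  vGetBuf = getBufViaChars

instance BlockStream IO StringBuffer where
  vGetBuf = getBufViaChars

-- Reads three characters of "hello" into a temporary buffer.
demo :: IO [Word8]
demo = do
  r <- StringReader <$> newIORef "hello"
  allocaBytes 3 $ \buf -> do
    n <- vGetBuf r buf 3
    peekArray n buf
```

The instance declarations themselves do not go away, so this is a compromise rather than a replacement for the old default method: the shared logic lives in one place, and each stream opts in explicitly.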
A single generic instance like the following makes the compiler not very happy:

    instance TextStream m h => BlockStream m h where
        vGetBuf h buf n = {- repeat vGetChar operation -}

3) The problems have grown substantially now that I have tried to separate input and output streams (the same will apply to splitting seekable streams off into a separate class). The problem is that I need either to provide 2 or 3 separate implementations of buffering for read-only, write-only and read-write streams, or to have some universal definition that works even when the base stream doesn't provide some of the operations. The latter seems to be impossible; maybe I don't understand Haskell's class system well enough? Let's see:

    data BufferedStream h = Buf h ....

    vClose (Buf h ...) = vPutBuf ...  -- flush buffer's contents

How can I implement this if `h` may not support the vPutBuf operation? And in particular, how do I still allow read/write streams to work?

    instance InBlockStream m h => SomeClass m (BufferedStream h) where ...
    instance OutBlockStream m h => SomeClass m (BufferedStream h) where ...

GHC will be unhappy if some `h` supports both the InBlockStream and OutBlockStream interfaces. (If you don't understand the problem, I will write more. I'm not even sure that I can formulate this problem clearly!)

4) What do you mean by "There are many combinations of stream transformers that don't make sense"? Splitting the Stream class into BlockStream/TextStream/ByteStream, or something else?

5) Why do you think we need a separate class for buffered streams? The classes define the INTERFACE of a stream, i.e. the supported operations, not its internal implementation.
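For the buffering problem in point 3, one approach that might avoid the clash is sketched below. The two instances conflict only when they target the same class with the same head, so here the wrapper gets one instance per class (input and output separately), and the flush needed by vClose is captured as a plain IO action when the wrapper is built, at the point where the OutBlockStream constraint is known. Everything here is an assumption made for illustration: buffers are Strings rather than raw memory, the monad is fixed to IO, and the record fields, `bufferInput`, `bufferOutput` and `MemStream` are hypothetical names.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances, FlexibleContexts #-}

import Data.IORef (IORef, newIORef, readIORef, writeIORef, modifyIORef)

-- Simplified block classes; String stands in for a raw memory buffer.
class Monad m => InBlockStream m h where
  vGetBuf :: h -> Int -> m String

class Monad m => OutBlockStream m h where
  vPutBuf :: h -> String -> m ()

-- The wrapper stores its flush action, so vClose needs no constraint on h.
data BufferedStream h = Buf
  { bufBase  :: h
  , bufOut   :: IORef String   -- output not yet written to the base stream
  , bufFlush :: IO ()          -- a no-op for read-only wrappers
  }

-- Wrap a stream that is only read from: nothing will ever need flushing.
bufferInput :: h -> IO (BufferedStream h)
bufferInput h = do
  out <- newIORef ""
  return (Buf h out (return ()))

-- Wrap a writable (or read/write) stream: flushing uses the base vPutBuf.
bufferOutput :: OutBlockStream IO h => h -> IO (BufferedStream h)
bufferOutput h = do
  out <- newIORef ""
  let flush = do
        pending <- readIORef out
        writeIORef out ""
        vPutBuf h pending
  return (Buf h out flush)

-- Works for every BufferedStream, whatever the base stream supports.
-- (A real version would also close the base stream.)
vClose :: BufferedStream h -> IO ()
vClose = bufFlush

-- One instance per class: these cannot overlap with each other.
instance InBlockStream IO h => InBlockStream IO (BufferedStream h) where
  vGetBuf b n = vGetBuf (bufBase b) n   -- pass-through; no read-ahead here

instance OutBlockStream IO h => OutBlockStream IO (BufferedStream h) where
  vPutBuf b s = modifyIORef (bufOut b) (++ s)

-- Toy base stream supporting both directions.
newtype MemStream = MemStream (IORef String)

instance InBlockStream IO MemStream where
  vGetBuf (MemStream ref) n = do
    s <- readIORef ref
    let (block, rest) = splitAt n s
    writeIORef ref rest
    return block

instance OutBlockStream IO MemStream where
  vPutBuf (MemStream ref) s = modifyIORef ref (++ s)

-- Shows that written data reaches the base stream only on vClose.
demo :: IO (String, String)
demo = do
  ref <- newIORef ""
  b <- bufferOutput (MemStream ref)
  vPutBuf b "abc"
  before <- readIORef ref
  vClose b
  after <- readIORef ref
  return (before, after)
```

A known weakness of this sketch: a wrapper built with `bufferInput` over a read/write base still has an OutBlockStream instance, but its flush is a no-op, so writes to it would be lost. A real design would need to rule that out, for example with distinct wrapper types for the read-only case.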
There are problems with memory management: as far as I can tell, the buffers are never freed if you just release a stream. You should be using ForeignPtrs instead of explicitly malloc'd buffers.
I can use a ForeignPtr just to "hold" the buffer and a Ptr inside speed-critical code (and I propose doing the same in your Binary module), i.e. something like this:

    data Buffer = Buf !ForeignPtr !FastMutPtr !FastMutPtr  -- buffer, current ptr, end of buffer
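A rough sketch of how such a Buffer could look with the standard FFI modules follows. It is an illustration under assumptions, not the library's code: FastMutPtr is replaced by `IORef (Ptr Word8)` as a stand-in, and `newBuffer`, `putByte` and `bufferContents` are hypothetical names. The ForeignPtr owns the memory, so it is released by the garbage collector once the Buffer becomes unreachable, while the inner operations work on raw pointers and call `touchForeignPtr` so the memory stays alive until the raw access is done.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Word (Word8)
import Foreign.ForeignPtr
  (ForeignPtr, mallocForeignPtrBytes, touchForeignPtr, withForeignPtr)
import Foreign.ForeignPtr.Unsafe (unsafeForeignPtrToPtr)
import Foreign.Marshal.Array (peekArray)
import Foreign.Ptr (Ptr, minusPtr, plusPtr)
import Foreign.Storable (poke)

-- buffer owner, current position, end of buffer
data Buffer = Buf
  !(ForeignPtr Word8)      -- keeps the memory alive; freed by the GC
  !(IORef (Ptr Word8))     -- current ptr (stand-in for FastMutPtr)
  !(IORef (Ptr Word8))     -- end of buffer (stand-in for FastMutPtr)

-- Allocate a buffer of the given size; no explicit free is ever needed.
newBuffer :: Int -> IO Buffer
newBuffer size = do
  fp <- mallocForeignPtrBytes size
  let start = unsafeForeignPtrToPtr fp
  cur <- newIORef start
  end <- newIORef (start `plusPtr` size)
  return (Buf fp cur end)

-- Speed-critical path: raw pointer arithmetic only.
-- Returns False when the buffer is already full.
putByte :: Buffer -> Word8 -> IO Bool
putByte (Buf fp curRef endRef) w = do
  cur <- readIORef curRef
  end <- readIORef endRef
  if cur >= end
    then return False
    else do
      poke cur w
      writeIORef curRef (cur `plusPtr` 1)
      touchForeignPtr fp   -- the memory must outlive the raw write above
      return True

-- Read back everything written so far.
bufferContents :: Buffer -> IO [Word8]
bufferContents (Buf fp curRef _) = do
  cur <- readIORef curRef
  withForeignPtr fp $ \start -> peekArray (cur `minusPtr` start) start

-- Writes three bytes into a two-byte buffer; the third write is refused.
demo :: IO ([Bool], [Word8])
demo = do
  buf <- newBuffer 2
  oks <- mapM (putByte buf) [10, 20, 30]
  bytes <- bufferContents buf
  return (oks, bytes)
```

Whether an IORef is fast enough for the inner loop is a separate question; the point of the sketch is only the ownership split between the ForeignPtr and the raw pointers.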
Text encoding/decoding is inefficient. Not a design problem, of course, but having good text coding support is one of the main reasons for replacing the IO library.
I will write a separate letter about this.
I have a sketched design if you'd like to see it sometime, but I have to extract it from the partially-written code.
I can just read this code to save your time. Do you mean the "new i/o" library? BTW, I still don't understand the purpose of the following code:

    -- -----------------------------------------------------------------------------
    -- Connecting streams

    -- | An input stream created by 'streamOutputToInput'
    data StreamInputStream = forall s . OutputStream s => StreamInputStream s

    -- | Takes an output stream, and returns an input stream that will yield
    -- all the data that is written to the output stream.
    streamOutputToInput :: (OutputStream s) => s -> IO StreamInputStream
    streamOutputToInput = error "unimplemented: streamOutputToInput"

    -- | Takes an input stream and an output stream, and pipes all the
    -- data from the former into the latter.
    streamConnect :: (InputStream i, OutputStream o) => i -> o -> IO ()
    streamConnect = error "unimplemented: streamInputToOutput"

Can you say how it should work for the library user?

-- 
Best regards,
 Bulat                          mailto:Bulat.Ziganshin@gmail.com