
Hello Simon, Wednesday, February 08, 2006, 2:58:30 PM, you wrote: SM> I would prefer to see more type structure, rather than putting SM> everything in the Stream class. You have classes ByteStream, SM> BlockStream etc, but these are just renamings of the Stream class. There SM> are many compositions that are illegal, but we don't find out until SM> runtime; it would make a lot more sense to me to expose this structure SM> in the type system. i initially used normal splitted classes (vGetBuf was in BlockStream) and so on, but come accross problems with the type classes system and decided to simplify the design. now i feel himself more confident with the classes, feel that i know source of my previous problems and therefore slowly migrate back to the splitted classes design. the library as published is just on the half of this way. but i know some limitations. that is the one problem: data BinHandle = forall h . (Stream IO h) => BinH h with such definition, i cannot use for BinHandles any operations that is aside of Stream interface. BinHandles, like in the NewBinary library, can be constructed from files or memory buffers, and memory buffers should support "saveToFile" operation. this operation require the "MemoryStream" interface, what implement by memory buffers, but not by files. in the old implementation, all operations was in the Stream interface, so i can implement "saveToFile" and this operation generated run-time error when used not with memory buffers. now it's inpossible to use it, i need to add second field of type (Maybe MemoryStream) the same problem will appear for all "forall h . Stream h" datatypes - if they need some operations from additional interfaces, then additional fields should be introduced, quantified by these interfaces moreover, splitting the Streams interface will require from the library users to give more classes in defining context for their functions, like the: process :: (Stream IO h, Seekable IO h, Buffered h) => h -> IO () that is not so good, especially if adding new interfaces means slowdown of calls to this function SM> My view is that the most basic level of stream is a byte stream, SM> supporting only two operations: read an array of bytes and write an SM> array of bytes, i.e. vGetBuf/vPutBuf. This makes implementing a stream, SM> or transformer, much easier and shorter. it is exact what is implemented, except for: there are two types of low-level streams. for memory-resident streams it is inefficient to work through getbuf/putbuf operations. MemoryStream interface impelements instead vReceiveBuf operation which just returns address and size of next data block in memory. accordingly, the buffering implementation slightly changes - it is the reason why i implemented FileBuffering.hs (working through GetBuf/PutBuf) and MemoryBuffering.hs (working through ReceiveBuf/SendBuf) moreover, there are third type of streams - based on the getchar/putchar or getbyte/putbyte operations. example of former is StringBuffer, later - UArray Int Word8 (not implemented, but possible in future) SM> Also I'd like to see separate SM> input/output streams for even more type safety, and I believe SM> simplicity, it will be great! but it is very uneasy and even seems impossible: 1) this will prevent dividing streams into the MemoryStream/BlockStream/ByteStream, what i like you consider as more important. it is impossible to say what InputStream BlockStream implements only vGetBuf, while OutputStream BlockStream implements only vPutBuf operation 2) such division will require to implement 2 or 3 (+ReadWrite) times more Stream types than now. Say, instead of FD we will get InputFD and OutputFD, instead of CharEncoding transformer - two transformers and so on. most of the functionality in Input and Ouput variants will be repeated (because this functionality don't depend on input/output mode) and in addition to the current large lists of passed calls like the: vIsEOF (WithEncoding h _) = vIsEOF h vMkIOError (WithEncoding h _) = vMkIOError h vReady (WithEncoding h _) = vReady h vIsReadable (WithEncoding h _) = vIsReadable h we will get the same lists in 2 or 3 repetitions!!! 3) i don't think that we can completely throw away the r/w streams, they can be required for example for database-style access. and if we need to implement this type of streams, our win in separating implementations of Input and Output streams will become a loss moreover, difference between input and output streams are well-known and errors in this area can be easily spotted by the users. so i think that such division would be great, but it requires too much work and will essentially compilcate the library. when the Haskell class system will be essenially improved, it will have sense. differences between MemoryStream, ByteStream and so on is not so obvious (because it's specific to this library), so dividing them should help users to spot errors earlier SM> but this is less important than separating byte streams from SM> text streams. I believe certain other operations would benefit from SM> being moved into separate classes: eg. vSeek into a Seekable class, SM> vSetBuffering into a Buffered class, and so on. at this moment the following classes exists: Stream BlockStream (implements vGetBuf/vPutBuf, used for FD) MemoryStream (implements vReceiveBuf/vSendBuf, used for MMFile and MemBuf) ByteStream (implements vGetByte/vPutByte, vGetChar/vPutChar and other text i/o operations) Stream class implements all other operations (just now it implements everything, but i will move the methods). i agree about moving seek/tell operations into the separate class, but not sure about buffering - its absence don't bother anyone and using the separate class will just create problems. moreover, vSetBuffering is not available for buffered MemoryStream. i prefer to distinguish buffered streams by implementation of ByteStream interface - this interface tells that byte- and text-oriented i/o is available, irrespective of concrete implementation why ByteStream implements both the byte and text i/o? i think that in most cases people are still using latin-1 text i/o - i.e. each char is just 8 bits without any encoding. because that type of text i/o don't need any complex implememtation, each time when byte i/o is implemented, text i/o springs automatically. on the other side, utf-8 encoding is rare and therefore separate transformer is used to implement it - it transforms each char i/o call into several byte i/o calls. i definitely against implementing text i/o only through the encoding transformer because it will slowdown the i/o while in 90% cases encoding will not be used. moreover, there are a 3 Stream types, developed by John Goerzen, where Char is used as minimal unit of stream data. this implementation allows us to use these streams to carry full unicode char in each Char or use each Char as the 8-bit-only container. if you, knowing all these, still recommend to change something, i'm all ears :) SM> This will improve SM> performance too - your Stream class has dictionaries with 20+ elements. here you are king - i don't know whether it's better to have one class with 20 methods or 2 classes with 5 methods each in the function context? SM> I see that buffering works on vPutChar/vGetChar, and yet you seem to be SM> buffering bytes - which is it? Are you supposed to buffer before or SM> after doing character encoding? It seems before, because otherwise SM> buffering will strip out all but the low 8 bits of each character. SM> Using a more explicit type structure would help a lot here. buffer contains bytes, which are read/written by the getbyte/putbyte operations as well as the all text i/o. this is latin1-only solution, of course. if one need utf-8 encoding, he need to apply CharEncoding transformer SM> Incedentally, I'm suprised that you can use list-based character SM> encoding/decoding and still get good performance, I expected to need to SM> do encoding directly between buffers. i reported only the speed of the buffering transformers. this don't include speed of char encoding that should be very low at this time. i want to try monadic operations here, and expect to get reasonable performance, 20-30 mb/s: -- | Read UTF-8 encoded char using `action` to get each byte utf8Decode :: (Monad m) => m Word8 -> m Char utf8Decode action = do c <- action if c < 0x80 then return (chr c) .... -- | Write UTF-8 encoded char using `action` to put each byte utf8Encode :: (Monad m) => Char -> (Word8 -> m ()) -> m () utf8Encode c action | ord c < 0x80 = action (ord c) .... vGetChar h = utf8Decode (vGetByte h) vPutChar h c = utf8Encode c (vPutByte h) what you will say about this plan? implementing this is much easier than using iconv (although you are already implemented iconv usage), and moreover it don't need presence of iconv. moreover, these monadic converters is anyway required for the "instance Binary Char" implementation SM> Still, as I said, I think the general approach is excellent, and is SM> definitely heading in the right direction. SM> Oh, and some of the code is GPL'd, which is a problem for incorporation SM> in standard libraries. This is just something to bear in mind if the SM> aim is for this to be a candidate for a/the standard IO library. i will ask John Goerzen about this -- Best regards, Bulat mailto:bulatz@HotPOP.com