
Bulat Ziganshin
MQK> It should be possible to use iconv for recoding. Iconv works on
MQK> blocks and it should not be applied to one character at a time.
Recoding doesn't need any startup.
Calling iconv (or another similar routine) does need startup. And you really don't want to reimplement all encoders/decoders by hand in Haskell.

Processing a stateful encoding takes time to pick up the state and to convert the materialized state into a form used during recoding.

Dispatching to the encoding function (usually not known statically) takes time.

When we generically convert an encoder which fails on invalid data into an encoder which replaces invalid data with U+FFFD or question marks, setting up exception handlers takes time.

These are all small costs, but they can be avoided.

Converting newlines takes time, and it's very similar to character recoding. It should be done transparently; network protocols often use CR-LF newlines, and it's painful to remember to output a '\r' before every newline by hand. It should be done on top of character recoding; consider UTF-16, where newline conversion works in terms of characters rather than bytes.

Some conversions can be implemented with tight loops which keep data in machine registers. The tightness matters when there are many iterations; loop startup is amortized by buffering.

Buffering can provide arbitrarily far lookahead, arbitrarily long putback, and checking for end of stream while logically not moving the current position. But this works only if buffering is the last stage which changes stream contents.
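To make the point about newline conversion sitting on top of character recoding concrete, here is a minimal sketch. It assumes pure list-based functions rather than a real stream layer; the names toCRLF, encodeUtf16BE and encodeLines are made up for illustration, and the encoder is hand-written only to keep the example self-contained (in practice it would be iconv or similar, applied to a whole block).

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word8)

-- Newline conversion on characters: every '\n' becomes "\r\n".
toCRLF :: String -> String
toCRLF = concatMap expand
  where
    expand '\n' = "\r\n"
    expand c    = [c]

-- Hypothetical block encoder: UTF-16BE without a byte order mark.
-- It does not reject lone surrogate code points; a real encoder would.
encodeUtf16BE :: String -> [Word8]
encodeUtf16BE = concatMap (encodeChar . ord)
  where
    encodeChar n
      | n < 0x10000 = unit n
      | otherwise   =
          let m = n - 0x10000
          in unit (0xD800 + (m `shiftR` 10)) ++ unit (0xDC00 + (m .&. 0x3FF))
    -- One 16-bit code unit, high byte first.
    unit u = [fromIntegral (u `shiftR` 8), fromIntegral (u .&. 0xFF)]

-- Newline conversion is applied to characters, before encoding to bytes.
encodeLines :: String -> [Word8]
encodeLines = encodeUtf16BE . toCRLF
```

Doing the conversion the other way round, by inserting a 0x0D byte before every 0x0A byte of the encoded output, would break UTF-16: it would split 16-bit code units and would also hit 0x0A bytes that belong to unrelated characters.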
MQK> Byte streams and character streams should be distinguished in types,
MQK> preferably by class-constrained parametric polymorphism. In particular
so that vGetBuf, vGetChar, and getWord32 can't be used on the same stream?
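A rough sketch of the kind of type-level distinction I have in mind. Everything here is hypothetical: the classes ByteInput and CharInput, the types Bytes and Latin1, and the pure "return the rest of the stream" style are assumptions made to keep the example short, not the API of any existing library.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- Streams that yield bytes.
class ByteInput s where
  readByte :: s -> Maybe (Word8, s)

-- Streams that yield characters.
class CharInput s where
  readChar :: s -> Maybe (Char, s)

-- A byte stream backed by an in-memory list.
newtype Bytes = Bytes [Word8]

instance ByteInput Bytes where
  readByte (Bytes [])       = Nothing
  readByte (Bytes (b : bs)) = Just (b, Bytes bs)

-- A character stream put on top of any byte stream.
-- ISO-8859-1 is used because it maps each byte to one character.
newtype Latin1 s = Latin1 s

instance ByteInput s => CharInput (Latin1 s) where
  readChar (Latin1 s) =
    case readByte s of
      Nothing      -> Nothing
      Just (b, s') -> Just (chr (fromIntegral b), Latin1 s')

-- Works for any character stream, whatever it is layered on.
readAll :: CharInput s => s -> String
readAll s = maybe [] (\(c, s') -> c : readAll s') (readChar s)
```

With these definitions, readAll (Latin1 (Bytes [72, 105])) typechecks, while readAll (Bytes [72, 105]) is rejected because Bytes has no CharInput instance.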
You can get bytes from a given byte stream, and get bytes from a character stream put on top of that byte stream. But if the protocol mixes bytes with characters and is specified in terms of bytes, it's probably better to work in terms of bytes, and convert byte strings to character strings after determining where they end.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/