
Hello Strings in haskell seem to be one major source of problems. I try to outline some of the problems I have faced and possible solutions. Size Handling large amounts of text as haskell strings is currently not possible as the representation (list of chars) is very inefficient. Serialization Most of the time when serializing a string, we want to handle it as an array of Word8. With lists one has to account for cycles and infinite length which make the encoding much more verbose and slow. The problem is not that some strings may not be trivially serializable, but rather that it is hard to find the easy cases. Typeclass instances It is currently hard to define typeclass instances for strings as String ( = [Char]) overlaps with [a]. Implementations provide solutions for this, but there should not be a need for workarounds in the first place. Show/Read The current Show/Read implementation makes it impossible to use with large strings. A read implementation needs to traverse the file looking for the terminating '"' and handling escape codes. A better solution would be to have an efficient (size prefixed) representation, maybe in a separate Serializable typeclass. But this would need the ablity to derive Serializable instances for algebraic datatypes automatically to avoid lots of useless code. Possible solutions The optimal solution should be compact and fast. A list of chunks is one solution - it would make it possible to e.g. mmap files (modulo encoding) and support fast concatenation. In addition one would need to support infinite and cyclic structures. Adding an alternative which corresponds to the current string abstraction would be sufficient. type CharT = Word8 data Str = S [(Ptr CharT, Int)] | I [CharT] A major problem is breaking old code. The main problem is that many functions are defined only on lists which forces strings to be lists. Making the functions polymorphic would solve a lot of problems and not only for Strings. There should be no performance penalty (at least in theory) when the compiler knows which instance is used at compile time. - Einar Karttunen