
On Mon, Jan 22, 2007 at 05:18:19PM -0800, Alexy Khrabrov wrote:
Greetings -- I'm looking at several FP languages for data mining, and was annoyed to learn that Erlang represents each character as 8 BYTES in a string which is just a list of characters. Now I'm reading a Haskell book which states the same.
The book is lying - the size of strings is unspecified and implementation dependent. In GHC, a String takes 12 or 20 bytes per character, depending on construction details.
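For what it's worth, here is one way the 12 and 20 figures could be accounted for. This is a sketch under assumptions not stated above: a 32-bit GHC (4-byte words), a list cell of three words, a boxed Char of two words, and the Char box either shared or freshly allocated. All names in it are made up for illustration.

```haskell
-- Assumption: 32-bit GHC, where one machine word is 4 bytes.
wordBytes :: Int
wordBytes = 4

-- A (:) cell: info pointer, pointer to the Char, pointer to the tail.
consCellBytes :: Int
consCellBytes = 3 * wordBytes

-- A boxed Char: info pointer plus the character itself.
boxedCharBytes :: Int
boxedCharBytes = 2 * wordBytes

-- Cost per character, depending on whether the Char box is shared
-- with other uses or allocated fresh for this list element.
bytesPerChar :: Bool -> Int
bytesPerChar sharedBox
  | sharedBox = consCellBytes
  | otherwise = consCellBytes + boxedCharBytes

main :: IO ()
main = mapM_ (print . bytesPerChar) [True, False]
```

Under those assumptions the two cases come out to 12 and 20 bytes; on a 64-bit machine the same reasoning would double both numbers.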
Is there a more efficient Haskell string-handling method?
Yes! Data.ByteString.* implements packed strings of bytes. They are less lazy and don't support Unicode, but they are small (8 bits per character) and fast: I have 100 MB/s disks, and my throwaway ByteString-based filters are IO-bound.
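A throwaway filter of that kind might look roughly like the following. It is only a minimal sketch, not a real program of mine: the name filterLines and the search word "error" are made up for the example, and it uses the strict Data.ByteString.Char8 interface, which reads all of stdin into memory before filtering.

```haskell
import qualified Data.ByteString.Char8 as B

-- Keep only the lines that contain the given needle.
-- (filterLines is a hypothetical helper, not part of the library.)
filterLines :: B.ByteString -> B.ByteString -> B.ByteString
filterLines needle = B.unlines . filter (needle `B.isInfixOf`) . B.lines

-- Read stdin, keep the lines mentioning "error", write them to stdout.
main :: IO ()
main = B.interact (filterLines (B.pack "error"))
```

Compiled with ghc -O2, it can be run as ./filter < input.log > output.log. For inputs too large to hold in memory, Data.ByteString.Lazy.Char8 offers lines, unlines and interact as well, though the substring test would need to be written differently there.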
Which functional language is the most suitable for text processing?
If you expected any answer other than Haskell, you asked on the wrong list. :)

Stefan