
On Jan 22, 2007, at 7:18 PM, Alexy Khrabrov wrote:
> Greetings -- I'm looking at several FP languages for data mining, and was annoyed to learn that Erlang represents each character as 8 BYTES in a string which is just a list of characters. Now I'm reading a Haskell book which states the same.
The standard string type in Haskell is indeed a linked list of characters, with about 12 bytes of overhead per character.
> Is there a more efficient Haskell string-handling method?
Yes! There is a library called Data.ByteString [1]. It is included with the latest versions of GHC and Hugs, and is also available as a standalone package. Data.ByteString represents strings as packed arrays of bytes, so each character costs about 1 byte. This library exhibits fantastic performance, rivaling C's speed while maintaining the elegance of Haskell.

Cheers,
Spencer Janssen

[1] http://www.cse.unsw.edu.au/~dons/fps.html
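
P.S. Here is a minimal sketch of what using the library might look like. It assumes the Data.ByteString.Char8 module as shipped in the current bytestring package (module names in the older standalone release may differ). The helper names countWords and countLines are only illustrative and are not part of the library.

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical helper: count whitespace-separated words
-- in a packed byte string.
countWords :: B.ByteString -> Int
countWords = length . B.words

-- Hypothetical helper: count newline-separated lines.
countLines :: B.ByteString -> Int
countLines = length . B.lines

main :: IO ()
main = do
  -- B.pack copies a String into a packed array, one byte per character.
  -- Characters above code point 255 are truncated, so this suits
  -- ASCII or Latin-1 data only.
  let text = B.pack "alpha beta\ngamma alpha\ndelta\n"
  print (B.length text)    -- size in bytes
  print (countWords text)
  print (countLines text)
```

For real data you would probably read the input directly with B.readFile or B.getContents rather than packing a String, since going through String pays the linked-list cost first.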