
Ketil Malde
Johan Tibell writes: It's not clear to me that using UTF-16 internally does make Data.Text noticeably slower.
I haven't benchmarked it, but I'm fairly sure that if you try to fit a 3 Gbyte file (the human genome, say¹) into a computer with 4 Gbytes of RAM, UTF-16 will be slower than UTF-8: for plain ASCII data UTF-16 needs two bytes per character, so the text no longer fits in memory. Many applications will get away with streaming over the data, retaining only a small part, but some won't.
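To put rough numbers on the size argument, here is a minimal sketch that compares the encoded size of the same ASCII-only text in UTF-8 and UTF-16. It only measures the encoded byte counts via `Data.Text.Encoding`; it is not a measurement of Data.Text's actual heap usage, and the helper names `utf8Bytes` and `utf16Bytes` and the sample string are made up for illustration.

```haskell
module Main where

import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Bytes needed to store a Text value encoded as UTF-8.
-- (Hypothetical helper, not part of the text package.)
utf8Bytes :: T.Text -> Int
utf8Bytes = B.length . TE.encodeUtf8

-- Bytes needed to store the same value encoded as UTF-16
-- (little endian, no byte order mark).
utf16Bytes :: T.Text -> Int
utf16Bytes = B.length . TE.encodeUtf16LE

main :: IO ()
main = do
  -- Made-up stand-in for genome data: 4000 ASCII characters.
  let dna = T.replicate 1000 (T.pack "ACGT")
  putStrLn ("UTF-8 bytes:  " ++ show (utf8Bytes dna))   -- 4000
  putStrLn ("UTF-16 bytes: " ++ show (utf16Bytes dna))  -- 8000
```

For ASCII-only input the UTF-16 encoding is exactly twice the UTF-8 size, so by the same ratio a 3 Gbyte ASCII file would need about 6 Gbytes as UTF-16, more than the 4 Gbytes of RAM in the example.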
Seeing as how the genome just uses 4 base "letters", wouldn't it be better to not treat it as text but use something else? Or do you just mean storage-wise, so that it can also be read in a text editor, etc. (in case someone is trying to do their mad genetic manipulation by hand)?

-- 
Ivan Lazar Miljenovic
Ivan.Miljenovic@gmail.com
IvanMiljenovic.wordpress.com