
Excerpts from Alexander Dunlap's message of Sun Mar 08 00:23:01 -0600 2009:
> For a while now, we have had Data.ByteString[.Lazy][.Char8] for our fast strings. Now we also have Data.Text, which does the same for Unicode. These seem to be the standard for dealing with lists of bytes and characters.
> Now we also have the storablevector, uvector, and vector packages. These seem to be also useful for unpacked data, *including* Char and Word8 values.
> What is the difference between bytestring and these new "fast array" libraries? Are the latter just generalizations of the former?
> Thanks for any insight anyone can give on this.
> Alex

Data.Text provides a packed Unicode text type, along with functions for encoding it to and decoding it from bytestrings in several encodings. So I think bytestring + text now solves the general problem with the slow String type: we get various international encodings, and fast, efficient packed strings. (It's also worth mentioning utf8-string, which gives you UTF-8 over bytestrings. text gives you more encodings and is probably still quite efficient, however.)

But this is pretty much a separate effort from that of packages like uvector, vector, etc. To clarify, I think uvector and vector are likely to be merged in the future: vector is based on the idea of 'recycling arrays' so that array operations stay very efficient, while uvector only has the tested stream fusion technique behind it. Actually, I think the inevitable plan is to merge the technology behind both vector and uvector into the Data Parallel Haskell project. Array recycling and stream fusion go into creating extremely efficient sequential code, while the vectorisation pass turns that into efficient multicore code at the same time.

In any case, I suppose that hypothetically, if someone wanted to use a package like uvector to create an efficient string type, they could; but if they want that, why not just use bytestring? It's already optimized, battle-tested, and in extremely wide use.

I think some library proliferation is good. In this case, the libraries mentioned here really serve different purposes, and that's great, because they all lead to nice, fast code with low conceptual overhead when put together (hopefully...).

But I'm not even going to begin examining or comparing the different array interfaces, because that's been done many times here, so you'd best check the archives if you want the in-depth take on the matter.

Austin
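To make the bytestring + text split concrete, here is a minimal sketch, assuming the `Data.Text.Encoding` module from the text package as it exists today (`encodeUtf8`, `decodeUtf8`, `encodeUtf16LE`). It converts a Text value to bytes under two encodings and decodes UTF-8 back again. The helper names `roundTripUtf8`, `utf8Size` and `utf16Size` are hypothetical, chosen only for illustration.

```haskell
module Main (main) where

import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical helper: encode Text to UTF-8 bytes, then decode those bytes back.
roundTripUtf8 :: T.Text -> T.Text
roundTripUtf8 = TE.decodeUtf8 . TE.encodeUtf8

-- Hypothetical helpers: size in bytes of a Text value under two encodings.
utf8Size, utf16Size :: T.Text -> Int
utf8Size  = BS.length . TE.encodeUtf8
utf16Size = BS.length . TE.encodeUtf16LE

main :: IO ()
main = do
  let s = T.pack "na\239ve caf\233"   -- "naïve café", written with escapes
  print (T.length s)                  -- 10 characters
  print (utf8Size s)                  -- 12 bytes: ï and é take two bytes each
  print (utf16Size s)                 -- 20 bytes: one UTF-16 unit per character here
  print (roundTripUtf8 s == s)        -- True
```

The Text value counts characters while the ByteString counts bytes, which is exactly the division of labour described above: text handles Unicode, bytestring holds the encoded octets.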
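On the question of whether the array libraries are "generalizations" of bytestring, a rough sketch may help. It assumes the `vector` package's `Data.Vector.Unboxed` module, used here in place of uvector (which is not shown). An unboxed vector of `Word8` holds the same packed bytes as a strict ByteString; the difference is that vector offers general array combinators, while bytestring adds string-oriented operations on top. The conversion functions `toVector`/`fromVector` and the pipeline `sumOfDoubled` are hypothetical names for illustration, and the list-based copying is chosen for clarity, not speed.

```haskell
module Main (main) where

import qualified Data.ByteString as BS
import qualified Data.Vector.Unboxed as U
import Data.Word (Word8)

-- Hypothetical helpers: copy bytes between a strict ByteString and an
-- unboxed vector via an intermediate list (simple, not efficient).
toVector :: BS.ByteString -> U.Vector Word8
toVector = U.fromList . BS.unpack

fromVector :: U.Vector Word8 -> BS.ByteString
fromVector = BS.pack . U.toList

-- A pipeline of vector combinators. When compiled with optimisation, the
-- vector library's stream fusion aims to avoid building the intermediate
-- vector produced by U.map.
sumOfDoubled :: Int -> Int
sumOfDoubled n = U.sum (U.map (* 2) (U.enumFromTo 1 n))

main :: IO ()
main = do
  let bytes = BS.pack [104, 105, 33]       -- the ASCII bytes for "hi!"
      vec   = toVector bytes
  print (U.length vec == BS.length bytes)  -- True: same packed Word8 data
  print (fromVector vec == bytes)          -- True
  print (sumOfDoubled 100)                 -- 10100
```

Both containers store the data unboxed and contiguously; whether fusion actually eliminates the intermediate array depends on the compiler's optimisation settings, so treat the comment on `sumOfDoubled` as the intended behaviour rather than a guarantee.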