Re: [Haskell-cafe] Re: String vs ByteString

18 Aug 2010


On Aug 17, 2010, at 11:51 PM, Ketil Malde wrote:
...
Yitzchak Gale <gale@sefer.org> writes:
...
I don't think the genome is typical text.
I think the typical *large* collection of text is text-encoded data, and
not, for lack of a better word, literature.  Genomics data is just an
example.
I have a collection of 100,000 patents I'm working with.
5.5GB of XML, most of it (US-)English text.
After stripping out the XML markup, it's 4GB of text.
It's a random sample from some 14 million patents I could
have access to, but 100,000 was more than enough.

Richard O'Keefe