[ANNOUNCE] text 0.10.0.0 - fast Unicode text handling

[Blog copy of the announcement here: http://www.serpentine.com/blog/2010/10/22/text-0-10-0-0-is-here/]

I just pushed it to bitbucket (http://bitbucket.org/bos/text) and github (http://github.com/bos/text), and you can install it from the text site on Hackage (http://hackage.haskell.org/package/text) in the usual way:

  cabal update
  cabal install text

What's in this release?

- New functions for reading integers and floating point numbers (http://hackage.haskell.org/packages/archive/text/0.10.0.0/doc/html/Data-Text...), an oft-requested feature. They're fast, too: they range from parity with their bytestring counterparts, to up to 4 times faster. You can expect to parse 3 to 4 million Int values per second out of a text file, or up to 2 million Double values per second. They're also easy to use, give error messages, and come in strict and lazy variants.

- UTF-8 decoding and encoding are now very fast (http://www.serpentine.com/blog/2010/10/15/unicode-text-performance-improveme...). They're up to 9x faster than they were, and close to the performance of pure C UTF-8 decoding and encoding.

- The Eq and Ord instances are also now very fast, up to 5x faster than before (http://www.serpentine.com/blog/2010/10/19/a-brief-tale-of-faster-equality/). They're now faster than the bytestring instances.

- Several other common functions received drive-by performance improvements too.

- Better protection against rare crashes on really huge volumes of data.
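For readers who have not yet tried the number-reading functions mentioned in the announcement, here is a minimal sketch of how they might be used. It assumes the strict Data.Text.Read module with its decimal, signed and double readers, each of which returns either an error message or the parsed value plus the unconsumed input. The helper names parseInt, parseDouble and sumInts are made up for this illustration and are not part of the library.

```haskell
import qualified Data.Text as T
import qualified Data.Text.Read as R

-- Hypothetical helper: parse an entire Text value as a signed Int.
-- The readers return the leftover input, so we reject trailing junk here.
parseInt :: T.Text -> Either String Int
parseInt t = case R.signed R.decimal t of
  Right (n, rest)
    | T.null rest -> Right n
    | otherwise   -> Left ("trailing input: " ++ T.unpack rest)
  Left err -> Left err

-- Hypothetical helper: the same idea for Double values.
parseDouble :: T.Text -> Either String Double
parseDouble t = case R.double t of
  Right (d, rest)
    | T.null rest -> Right d
    | otherwise   -> Left ("trailing input: " ++ T.unpack rest)
  Left err -> Left err

-- Hypothetical helper: sum whitespace-separated integers,
-- stopping at the first value that fails to parse.
sumInts :: T.Text -> Either String Int
sumInts = fmap sum . mapM parseInt . T.words
```

Loaded into GHCi, parseInt (T.pack "-42") should yield Right (-42), while an input such as "12abc" is rejected by the trailing-input check rather than by the reader itself, which would happily return 12 and the remainder.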

On 22 October 2010 16:38, Bryan O'Sullivan wrote:
> [Blog copy of the announcement here.]
>
> I just pushed it to bitbucket and github, and you can install it from the text site on Hackage in the usual way:
>
>   cabal update
>   cabal install text
>
> What's in this release?
>
> - New functions for reading integers and floating point numbers, an oft-requested feature. They're fast, too: they range from parity with their bytestring counterparts, to up to 4 times faster. You can expect to parse 3 to 4 million Int values per second out of a text file, or up to 2 million Double values per second. They're also easy to use, give error messages, and come in strict and lazy variants.
> - UTF-8 decoding and encoding are now very fast. They're up to 9x faster than they were, and close to the performance of pure C UTF-8 decoding and encoding.
> - The Eq and Ord instances are also now very fast, up to 5x faster than before. They're now faster than the bytestring instances.
> - Several other common functions received drive-by performance improvements too.
> - Better protection against rare crashes on really huge volumes of data.

Is there a "best practices" guide on how to deal with Text values?

For example, I assume that it's better to try and use Text throughout rather than continually packing String values (in my case, I'm looking at using Text for I/O in graphviz; should I then start using Text rather than String for all the parameters?).

-- 
Ivan Lazar Miljenovic
Ivan.Miljenovic@gmail.com
IvanMiljenovic.wordpress.com

On Fri, Oct 22, 2010 at 10:30 AM, Ivan Lazar Miljenovic wrote:
> For example, I assume that it's better to try and use Text throughout rather than continually packing String values (in my case, I'm looking at using Text for I/O in graphviz; should I then start using Text rather than String for all the parameters?).

Yes. Just like with ByteString, frequent packs or unpacks kill performance. The only things I pack are compile-time constants (and, at least for ByteStrings, that's cheap).

Johan
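
As a rough illustration of the advice above, the sketch below keeps every parameter as Text and lets only string literals be converted, via the OverloadedStrings extension. The renderAttr, renderNode and printNode functions and their output format are hypothetical, loosely modelled on a Graphviz-style printer, and are not taken from the graphviz package. Output goes through Data.Text.IO, so no String round trip is needed.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- Hypothetical renderer for one key="value" attribute.
-- The literals are the only values converted from String, via OverloadedStrings.
renderAttr :: Text -> Text -> Text
renderAttr key value = T.concat [key, "=\"", value, "\""]

-- Hypothetical renderer for a node statement; all parameters are already Text,
-- so nothing is packed or unpacked along the way.
renderNode :: Text -> [(Text, Text)] -> Text
renderNode name attrs =
  T.concat [name, " [", T.intercalate ", " (map (uncurry renderAttr) attrs), "];"]

-- Write the rendered node with Text-based I/O rather than via String.
printNode :: Text -> [(Text, Text)] -> IO ()
printNode name attrs = TIO.putStrLn (renderNode name attrs)
```

The point of the design is that conversion happens only at the edges: literals in the source and, if unavoidable, values arriving from a String-based API. Everything in between stays Text.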