
On Wednesday 01 September 2010 06:08:07, Bryan O'Sullivan wrote:
New in this release:
- Substantial performance improvements. - Bug fixes, and better quality assurance via automated QuickCheck and HPC tests and coverage. - An even nicer API than before.
Good job.
The text library provides an efficient packed, immutable Unicode text type (both strict and lazy), with a powerful loop fusion optimization framework.
The 'Text' type represents Unicode character strings, in a time and space-efficient manner. This package provides text processing capabilities that are optimized for performance critical use, both in terms of large data quantities and high speed.
Hmm, I have to throw a few grains of salt into that. File: big.txt from Peter Norvig's spelling checker (6.2MiB) 1. read the file and output it to stdout (redirected to /dev/null) ByteString.Lazy: <= 0.01s text-0.7.2.1 (Lazy): 0.68s text-0.8.0.0 (Lazy): 0.26s String: 0.35s ByteString doesn't do any de/encoding or validity checking, so it's clear that can be way faster than I/O which does. The interesting part is the comparison between text and vanilla String I/O, the difference is smaller than I expected for text-0.8.0.0. It came as a surprise that text-0.7.2.1 actually had much worse performance than vanilla Strings. Okay, now we know how long the I/O takes, let's measure some work. 2. read the file, replace some string, output to stdout (redirected to /dev/null) a) "Professor" -> "Teacher" stringsearch (Lazy): 0.02s, 2MB total text-0.7.2.1 (Lazy): 0.92s, 28MB total text-0.7.2.1 (Strict): 0.90s, 65MB total text-0.8.0.0 (Lazy): 0.44s, 12MB total text-0.8.0.0 (Strict): 0.36S, 55MB total String (naive): 0.51s, 1MB total String (KMP): 0.46s, 1MB total b) "deteriorate" -> "worsen" stringsearch (Lazy): 0.02s, 2MB total text-0.7.2.1 (Lazy): 1.01s, 39MB total text-0.7.2.1 (Strict): 0.91s, 65MB total text-0.8.0.0 (Lazy): 0.45s, 17MB total text-0.8.0.0 (Strict): 0.36s, 55MB total String (naive): 0.52s, 1MB total String (KMP): 0.50s, 1MB total c) "hen" -> "here" stringsearch (Lazy): 0.04s, 2MB total text-0.7.2.1 (Lazy): 1.10s, 2MB total text-0.7.2.1 (Strict): 0.95s, 65MB total text-0.8.0.0 (Lazy): 0.66s, 2MB total text-0.8.0.0 (Strict): 0.41s, 55MB total String (naive): 0.53s, 1MB total String (KMP): 0.51s, 1MB total d) "kre" -> "here" stringsearch (Lazy): 0.03s, 2MB total text-0.7.2.1 (Lazy): 1.23s, 39MB total text-0.7.2.1 (Strict): 0.96s, 65MB total text-0.8.0.0 (Lazy): 0.65s, 17MB total text-0.8.0.0 (Strict): 0.40s, 55MB total String (naive): 0.51s, 1MB total String (KMP): 0.47s, 1MB total The performance of text-0.8.0.0 has improved significantly over that of text-0.7.2.1 (for the tested features), the improvement of the replacing algorithm is however not as impressive as that of I/O. Replacing isn't much faster than for Strings, however, for (some?) very short patterns, even slower. That's not so good. What's *really* bad is the space behaviour. My tests suggest that the space usage (for Text.Lazy) depends on the index of the pattern to replace, the later it appears (if at all), the more memory is used by text. Apparently it doesn't deliver the beginning before a match has been found (or the whole string has been scanned). That space leak ought to be fixed. The space behaviour of the strict variant doesn't show that problem, it seems to only depend on the file, but it uses disproportionally much memory.