
On Friday 13 August 2010 17:57:36, Bryan O'Sullivan wrote:
> 3. Some commonly used functions, such as substring searching, are *way* faster than their ByteString counterparts.

That's an unfortunate example. Using the stringsearch package, substring searching in ByteStrings was considerably faster than in Data.Text in my tests.

Replacing substrings blew Data.Text to pieces even, with a factor of 10-65 between ByteString and Text (and much smaller memory footprint).

stringsearch (Data.ByteString.Lazy.Search):

$ ./bmLazy +RTS -s -RTS ../../bigfile Gutenberg Hutzenzwerg > /dev/null
./bmLazy ../../bigfile Gutenberg Hutzenzwerg +RTS -s
      92,045,816 bytes allocated in the heap
          31,908 bytes copied during GC
         103,368 bytes maximum residency (1 sample(s))
          39,992 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:   158 collections,     0 parallel,  0.01s,  0.00s elapsed
  Generation 1:     1 collections,     0 parallel,  0.00s,  0.00s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.07s  (  0.17s elapsed)
  GC    time    0.01s  (  0.00s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    0.08s  (  0.17s elapsed)

  %GC time      10.5%  (2.1% elapsed)

  Alloc rate    1,353,535,321 bytes per MUT second

  Productivity  89.5% of total user, 40.1% of total elapsed

Data.Text.Lazy:

$ ./textLazy +RTS -s -RTS ../../bigfile Gutenberg Hutzenzwerg > /dev/null
./textLazy ../../bigfile Gutenberg Hutzenzwerg +RTS -s
   4,916,133,652 bytes allocated in the heap
       6,721,496 bytes copied during GC
      12,961,776 bytes maximum residency (58 sample(s))
      12,788,968 bytes maximum slop
              39 MB total memory in use (1 MB lost due to fragmentation)

  Generation 0:  8774 collections,     0 parallel,  0.70s,  0.73s elapsed
  Generation 1:    58 collections,     0 parallel,  0.03s,  0.03s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    9.87s  ( 10.23s elapsed)
  GC    time    0.73s  (  0.75s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   10.60s  ( 10.99s elapsed)

  %GC time       6.9%  (6.9% elapsed)

  Alloc rate    497,956,181 bytes per MUT second

bigfile is a ~75M file.

The point of the more adequate API for text manipulation stands, of course.

Cheers,
Daniel
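
For anyone wanting to try a comparison of this kind: the sources of bmLazy and textLazy are not included in this message, so the following is only a minimal sketch of what such a replace benchmark might look like, not the programs actually measured. The single-executable layout, the "bytestring"/"text" mode argument and the helper names replaceBytes and replaceText are assumptions made for illustration. It relies on `replace` from Data.ByteString.Lazy.Search (stringsearch package) and `replace` from Data.Text.Lazy (text package).

```haskell
module Main (main) where

import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Search as BLS -- stringsearch package
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.IO as TLIO
import System.Environment (getArgs)
import System.Exit (die)

-- Hypothetical helper: replace every occurrence of pat by sub in a file,
-- working on lazy ByteStrings. The pattern is a strict ByteString.
replaceBytes :: FilePath -> String -> String -> IO ()
replaceBytes file pat sub = do
  contents <- BL.readFile file
  BL.putStr (BLS.replace (BC.pack pat) (BC.pack sub) contents)

-- Hypothetical helper: the same operation on lazy Text.
-- Note that this decodes the file using the current locale encoding.
replaceText :: FilePath -> String -> String -> IO ()
replaceText file pat sub = do
  contents <- TLIO.readFile file
  TLIO.putStr (TL.replace (TL.pack pat) (TL.pack sub) contents)

main :: IO ()
main = do
  args <- getArgs
  case args of
    ["bytestring", file, pat, sub] -> replaceBytes file pat sub
    ["text", file, pat, sub]       -> replaceText file pat sub
    _ -> die "usage: bench (bytestring|text) FILE PATTERN REPLACEMENT"
```

Compiled with -O2 and -rtsopts, it could be run in the same way as the commands above, e.g. `./bench bytestring ../../bigfile Gutenberg Hutzenzwerg +RTS -s > /dev/null`. Both replace functions reject an empty pattern, so the pattern argument must be non-empty. The numbers will of course depend on the GHC and library versions used.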