
On Wednesday 08 September 2010 13:46:13, Daniel Fischer wrote:
My timings are quite different, but that's probably because 6.12.3's inliner doesn't give the full fusion benefit, so it'll improve automatically with the next GHC release.
Or maybe not so much. I just built the latest source bundle from the HEAD branch and compared it with 6.12.3.

6.12.3:

  ./nbench lazyText bigfile krkx rabi +RTS -s
  1,796,245,884 bytes allocated in the heap
      1,125,596 bytes copied during GC
    110,398,048 bytes maximum residency (8 sample(s))
     38,897,164 bytes maximum slop
            191 MB total memory in use (4 MB lost due to fragmentation)

  Generation 0:  3043 collections,  0 parallel,  3.06s,  3.17s elapsed
  Generation 1:     8 collections,  0 parallel,  0.00s,  0.01s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    6.03s  (  6.48s elapsed)
  GC    time    3.07s  (  3.18s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    9.10s  (  9.66s elapsed)

  %GC time      33.7%  (33.0% elapsed)

  Alloc rate    297,965,335 bytes per MUT second

  Productivity  66.3% of total user, 62.4% of total elapsed

6.13.20100831:

  ./hdbench lazyText bigfile krkx rabi +RTS -s
  543,409,296 bytes allocated in the heap
      699,364 bytes copied during GC
  110,956,008 bytes maximum residency (8 sample(s))
   38,893,040 bytes maximum slop
          191 MB total memory in use (4 MB lost due to fragmentation)

  Generation 0:   652 collections,  0 parallel,  0.44s,  0.43s elapsed
  Generation 1:     8 collections,  0 parallel,  0.00s,  0.01s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    5.42s  (  5.77s elapsed)
  GC    time    0.44s  (  0.44s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    5.86s  (  6.21s elapsed)

  %GC time       7.5%  ( 7.1% elapsed)

  Alloc rate    100,327,729 bytes per MUT second

  Productivity  92.5% of total user, 87.2% of total elapsed

Sure, that's a significant improvement, but it's mostly GC time; with -A64M, 6.12.3 comes much closer.
However, for ByteStrings, performance got worse.

6.12.3:

  ./nbench lazyBS bigfile krkx rabi +RTS -s
  90,127,112 bytes allocated in the heap
      31,116 bytes copied during GC
     103,396 bytes maximum residency (1 sample(s))
      39,964 bytes maximum slop
           2 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:   158 collections,  0 parallel,  0.00s,  0.00s elapsed
  Generation 1:     1 collections,  0 parallel,  0.00s,  0.00s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.10s  (  0.20s elapsed)
  GC    time    0.00s  (  0.00s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    0.11s  (  0.20s elapsed)

  %GC time       3.6%  ( 1.8% elapsed)

  Alloc rate    834,456,211 bytes per MUT second

  Productivity  92.9% of total user, 50.9% of total elapsed

6.13.20100831:

  ./hdbench lazyBS bigfile krkx rabi +RTS -s
  478,710,672 bytes allocated in the heap
      164,904 bytes copied during GC
      86,992 bytes maximum residency (1 sample(s))
      44,080 bytes maximum slop
           2 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:   864 collections,  0 parallel,  0.00s,  0.01s elapsed
  Generation 1:     1 collections,  0 parallel,  0.00s,  0.00s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.17s  (  0.28s elapsed)
  GC    time    0.00s  (  0.01s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    0.18s  (  0.29s elapsed)

  %GC time       2.3%  ( 4.1% elapsed)

  Alloc rate    2,783,039,776 bytes per MUT second

  Productivity  95.5% of total user, 57.3% of total elapsed

Not only did it get slower, it also allocates more than five times as much as before.
Given that the space involved is just 121KB maximum residency while processing a 124MB file, I'm not concerned about it.
I wouldn't be, either.
But it needs more space here, so I am concerned.
And the time required isn't a bad place to start from, I think.
By the way, as this implies, I can't reproduce your space behaviour at all.
That's surprising. Have you made sure to replace a pattern which does not occur in the text? Can you reproduce the behaviour with a) Data.List.intersperse instead of the lazier version now used, b) ghc-6.12.* instead of HEAD?
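For readers following along, the point of a) above is a strictness difference: the classic Data.List-style intersperse pattern-matches one constructor deeper than necessary, so it cannot emit an element before it knows whether another one follows. A minimal list-based sketch of the two variants (the function names here are illustrative, not the library's):

```haskell
-- Sketch of the strictness difference between the two intersperse styles.

-- Data.List-style: the [x] pattern forces the tail of the input to WHNF
-- before the first element can be emitted.
strictIntersperse :: a -> [a] -> [a]
strictIntersperse _   []     = []
strictIntersperse _   [x]    = [x]
strictIntersperse sep (x:xs) = x : sep : strictIntersperse sep xs

-- Lazier variant: the first element is emitted immediately; whether a
-- separator follows is decided only when the rest of the list is demanded.
lazyIntersperse :: a -> [a] -> [a]
lazyIntersperse _   []     = []
lazyIntersperse sep (x:xs) = x : go xs
  where
    go []     = []
    go (y:ys) = sep : y : go ys
```

The difference shows up with a partially defined input: `head (lazyIntersperse 0 (1 : undefined))` is 1, while the same call with `strictIntersperse` diverges, because matching `[x]` inspects the tail.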
Anyway, I would've thought that with
split pat src
    | null pat        = emptyError "split"
    | isSingleton pat = splitBy (== head pat) src
    | otherwise       = go 0 (indices pat src) src
  where
    go  _ []     cs = [cs]
    go !i (x:xs) cs = let h :*: t = splitAtWord (x-i) cs
                      in  h : go (x+l) xs (dropWords l t)
    l = foldlChunks (\a (T.Text _ _ b) -> a + fromIntegral b) 0 pat
you can't start returning chunks before it's known whether the list of indices is empty, so split should have O(index of the first pattern occurrence) space behaviour.
If HEAD manages to make the chunks available before they are complete (before it's known how long they will be), it's even awesomer than I'd have dared to hope. Okay, so I'll have to try HEAD.
Doesn't do much here. Still leaking, still > 20 times slower than ByteString, even for the now far worse ByteString times. What's going on?