
On Wednesday 08 September 2010 13:46:13, Daniel Fischer wrote:
My timings are quite different, but that's probably because 6.12.3's inliner doesn't give the full fusion benefit, so it'll improve automatically with the next GHC release.
Or maybe not so much. I just built the latest source bundle from the HEAD branch and compared it with 6.12.3.

6.12.3:

  ./nbench lazyText bigfile krkx rabi +RTS -s
  1,796,245,884 bytes allocated in the heap
      1,125,596 bytes copied during GC
    110,398,048 bytes maximum residency (8 sample(s))
     38,897,164 bytes maximum slop
            191 MB total memory in use (4 MB lost due to fragmentation)

  Generation 0:  3043 collections,  0 parallel,  3.06s,  3.17s elapsed
  Generation 1:     8 collections,  0 parallel,  0.00s,  0.01s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    6.03s  (  6.48s elapsed)
  GC    time    3.07s  (  3.18s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    9.10s  (  9.66s elapsed)

  %GC time      33.7%  (33.0% elapsed)

  Alloc rate    297,965,335 bytes per MUT second

  Productivity  66.3% of total user, 62.4% of total elapsed

6.13.20100831:

  ./hdbench lazyText bigfile krkx rabi +RTS -s
  543,409,296 bytes allocated in the heap
      699,364 bytes copied during GC
  110,956,008 bytes maximum residency (8 sample(s))
   38,893,040 bytes maximum slop
          191 MB total memory in use (4 MB lost due to fragmentation)

  Generation 0:   652 collections,  0 parallel,  0.44s,  0.43s elapsed
  Generation 1:     8 collections,  0 parallel,  0.00s,  0.01s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    5.42s  (  5.77s elapsed)
  GC    time    0.44s  (  0.44s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    5.86s  (  6.21s elapsed)

  %GC time       7.5%  ( 7.1% elapsed)

  Alloc rate    100,327,729 bytes per MUT second

  Productivity  92.5% of total user, 87.2% of total elapsed

Sure, that's a significant improvement, but it's mostly GC time; with -A64M, 6.12.3 comes much closer.
However, for ByteStrings, performance got worse.

6.12.3:

  ./nbench lazyBS bigfile krkx rabi +RTS -s
  90,127,112 bytes allocated in the heap
      31,116 bytes copied during GC
     103,396 bytes maximum residency (1 sample(s))
      39,964 bytes maximum slop
           2 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:   158 collections,  0 parallel,  0.00s,  0.00s elapsed
  Generation 1:     1 collections,  0 parallel,  0.00s,  0.00s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.10s  (  0.20s elapsed)
  GC    time    0.00s  (  0.00s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    0.11s  (  0.20s elapsed)

  %GC time       3.6%  ( 1.8% elapsed)

  Alloc rate    834,456,211 bytes per MUT second

  Productivity  92.9% of total user, 50.9% of total elapsed

6.13.20100831:

  ./hdbench lazyBS bigfile krkx rabi +RTS -s
  478,710,672 bytes allocated in the heap
      164,904 bytes copied during GC
      86,992 bytes maximum residency (1 sample(s))
      44,080 bytes maximum slop
           2 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:   864 collections,  0 parallel,  0.00s,  0.01s elapsed
  Generation 1:     1 collections,  0 parallel,  0.00s,  0.00s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.17s  (  0.28s elapsed)
  GC    time    0.00s  (  0.01s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    0.18s  (  0.29s elapsed)

  %GC time       2.3%  ( 4.1% elapsed)

  Alloc rate    2,783,039,776 bytes per MUT second

  Productivity  95.5% of total user, 57.3% of total elapsed

Not only did it get slower, it also allocates more than five times as much as before.
Given that the space involved is just 121KB maximum residency while processing a 124MB file, I'm not concerned about it.
I wouldn't be, either.
But it needs more space here, so I am concerned.
And the time required isn't a bad place to start from, I think.
By the way, as this implies, I can't reproduce your space behaviour at all.
That's surprising. Have you made sure to replace a pattern which does not occur in the text? Can you reproduce the behaviour with a) Data.List.intersperse instead of the lazier version now used, b) ghc-6.12.* instead of HEAD?
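For readers following along, the point of a) above is a strictness difference: the classic Data.List-style intersperse pattern-matches one constructor deeper than necessary, so it cannot emit an element before it knows whether another one follows. A minimal list-based sketch of the two variants (the function names here are illustrative, not the library's):

```haskell
-- Sketch of the strictness difference between the two intersperse styles.

-- Data.List-style: the [x] pattern forces the tail of the input to WHNF
-- before the first element can be emitted.
strictIntersperse :: a -> [a] -> [a]
strictIntersperse _   []     = []
strictIntersperse _   [x]    = [x]
strictIntersperse sep (x:xs) = x : sep : strictIntersperse sep xs

-- Lazier variant: the first element is emitted immediately; whether a
-- separator follows is decided only when the rest of the list is demanded.
lazyIntersperse :: a -> [a] -> [a]
lazyIntersperse _   []     = []
lazyIntersperse sep (x:xs) = x : go xs
  where
    go []     = []
    go (y:ys) = sep : y : go ys
```

The difference shows up with a partially defined input: `head (lazyIntersperse 0 (1 : undefined))` is 1, while the same call with `strictIntersperse` diverges, because matching `[x]` inspects the tail.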
Anyway, I would've thought that with
split pat src
    | null pat        = emptyError "split"
    | isSingleton pat = splitBy (== head pat) src
    | otherwise       = go 0 (indices pat src) src
  where
    go  _ []     cs = [cs]
    go !i (x:xs) cs = let h :*: t = splitAtWord (x-i) cs
                      in  h : go (x+l) xs (dropWords l t)
    l = foldlChunks (\a (T.Text _ _ b) -> a + fromIntegral b) 0 pat
you can't start returning chunks before it's known whether the list of indices is empty, so split should have O(index of the first pattern occurrence) space behaviour.
If HEAD manages to make the chunks available before they are complete (before it's known how long they will be), it's even awesomer than I'd have dared to hope. Okay, so I'll have to try HEAD.
Doesn't do much here. Still leaking, still > 20 times slower than ByteString, even for the now far worse ByteString times. What's going on?