
On Thu, 2010-05-06 at 15:37 -0400, Brent Yorgey wrote:
Two questions here, I think they might be related, perhaps even the same, but I am not sure, so I will ask both:
Q1: Re the function f below, I like the first implementation as it's "cleaner", but is the second implementation necessary for
On Thu, May 06, 2010 at 11:35:15AM -0700, Travis Erdman wrote: performance purposes?
-- g = some CPU intensive function
-- alternate 1 f a b = c + (g b) where c = dosomethingelse a (g b)
-- alternate 2 f a b = c + saveit where saveit = g b c = dosomethingelse a saveit
You need alternative 2. In general, GHC (and, I imagine, other Haskell compilers) do not do much common subexpression elimination, because in some cases it can be a *pessimization*. The classic example is
length [1..1000000] + length [1..1000000]
vs
let a = [1..1000000] in length a + length a
The first will execute in constant space, since each list will be lazily generated as needed by the length function and then the garbage collector will come along behind length and get rid of the nodes that have already been processed. However, in the second expression, the garbage collector cannot get rid of the nodes that are already processed by the first call to length, since the second call to length still needs the list. So the entire list [1..1000000] will end up being in memory at once.
Hmm:
cFoldr :: (a -> b -> b) -> (a -> c -> c) -> (b, c) -> [a] -> (b, c) cFoldr f g ~(b, c) [] = (b, c) cFoldr f g ~(b, c) (x:xs) = let (b', c') = cFoldr f g (b, c) xs in (f x b', g x c')
cFoldl' :: (b -> a -> b) -> (c -> a -> c) -> (b, c) -> [a] -> (b, c) cFoldl' f g bc xs0 = lgo bc xs0 where lgo (b, c) [] = (b, c) lgo (b, c) (x:xs) = let b' = f b x c' = g c x bc' = b' `seq` c' `seq` (b', c') in bc' `seq` lgo bc' xs
lengthFold :: Int -> a -> Int lengthFold !n _ = n + 1
size = 100000000
main = let a = [1..size] l = length a + length a in print $! l
200000000 8,031,190,480 bytes allocated in the heap 741,264 bytes copied during GC 1,920 bytes maximum residency (1 sample(s)) 28,216 bytes maximum slop 1 MB total memory in use (0 MB lost due to fragmentation) Generation 0: 15318 collections, 0 parallel, 0.14s, 0.18s elapsed Generation 1: 1 collections, 0 parallel, 0.00s, 0.00s elapsed INIT time 0.00s ( 0.00s elapsed) MUT time 7.15s ( 7.26s elapsed) GC time 0.14s ( 0.18s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 7.29s ( 7.44s elapsed) %GC time 1.9% (2.5% elapsed) Alloc rate 1,123,257,562 bytes per MUT second Productivity 98.1% of total user, 96.1% of total elapsed ./a.out +RTS -s 7.29s user 0.05s system 98% cpu 7.450 total
main = print $! length [1..size] + length [1..size]
200000000 16,062,318,576 bytes allocated in the heap 1,476,904 bytes copied during GC 2,000 bytes maximum residency (1 sample(s)) 27,992 bytes maximum slop 1 MB total memory in use (0 MB lost due to fragmentation) Generation 0: 30637 collections, 0 parallel, 0.29s, 0.37s elapsed Generation 1: 1 collections, 0 parallel, 0.00s, 0.00s elapsed INIT time 0.00s ( 0.00s elapsed) MUT time 15.57s ( 15.90s elapsed) GC time 0.29s ( 0.37s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 15.87s ( 16.27s elapsed) %GC time 1.8% (2.3% elapsed) Alloc rate 1,031,313,475 bytes per MUT second Productivity 98.2% of total user, 95.7% of total elapsed ./a.out +RTS -s 15.87s user 0.11s system 98% cpu 16.272 total
main = print $! uncurry (+) (cFoldl' lengthFold lengthFold (0, 0) [1..size])
200000000 13,652,979,960 bytes allocated in the heap 2,089,600 bytes copied during GC 2,056 bytes maximum residency (1 sample(s)) 27,984 bytes maximum slop 1 MB total memory in use (0 MB lost due to fragmentation) Generation 0: 26041 collections, 0 parallel, 0.30s, 0.39s elapsed Generation 1: 1 collections, 0 parallel, 0.00s, 0.00s elapsed INIT time 0.00s ( 0.00s elapsed) MUT time 21.92s ( 23.44s elapsed) GC time 0.30s ( 0.39s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 22.22s ( 23.84s elapsed) %GC time 1.3% (1.7% elapsed) Alloc rate 622,864,614 bytes per MUT second Productivity 98.7% of total user, 92.0% of total elapsed ./a.out +RTS -s 22.22s user 0.16s system 93% cpu 23.843 total All are compiled with optimizations. Regards