
On Thu, Dec 17, 2020 at 12:15:19AM -0500, Viktor Dukhovni wrote:
I don't know whether indirection via "foldr = go" with `go` recursive and `foldr` "INLINABLE" and ditto for `traverse` would create further opportunities for optimisation when the instances are in a separate module. Someone else might clear that up, or you could benchmark and see. IIRC, recursive functions don't directly inline without such tweaks, but the tweaks might not help much.
Naive experiments seem to suggest that inlining via a `go` indirection that captures `f` is somewhat helpful, but the real win is using a right fold, vs. your original flatten implementation. Here's a comparison with a somewhat branchy deep structure. Flatten via Foldable: --------------------- 2,281,024 bytes allocated in the heap 3,312 bytes copied during GC 44,408 bytes maximum residency (1 sample(s)) 25,224 bytes maximum slop 17 MiB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 0 colls, 0 par 0.000s 0.000s 0.0000s 0.0000s Gen 1 1 colls, 0 par 0.000s 0.000s 0.0005s 0.0005s INIT time 0.000s ( 0.000s elapsed) MUT time 0.002s ( 0.002s elapsed) GC time 0.000s ( 0.000s elapsed) EXIT time 0.000s ( 0.000s elapsed) Total time 0.003s ( 0.003s elapsed) %GC time 0.0% (0.0% elapsed) Alloc rate 1,234,320,346 bytes per MUT second Productivity 68.0% of total user, 68.0% of total elapsed The posted flatten (1000x more allocations! 200x runtime): -------------------- 2,651,027,056 bytes allocated in the heap 11,386,968 bytes copied during GC 164,424 bytes maximum residency (2 sample(s)) 29,320 bytes maximum slop 27 MiB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 104 colls, 0 par 0.015s 0.015s 0.0001s 0.0004s Gen 1 2 colls, 0 par 0.001s 0.001s 0.0007s 0.0010s INIT time 0.001s ( 0.001s elapsed) MUT time 0.715s ( 0.715s elapsed) GC time 0.016s ( 0.016s elapsed) EXIT time 0.000s ( 0.000s elapsed) Total time 0.732s ( 0.732s elapsed) %GC time 0.0% (0.0% elapsed) Alloc rate 3,705,952,327 bytes per MUT second Productivity 97.7% of total user, 97.7% of total elapsed -- Viktor. {-# LANGUAGE DeriveFunctor #-} module Nest ( Nest(..) ) where data Nest a = Nil | Cons a (Nest a) | Nest (Nest a) (Nest a) deriving (Show, Functor) instance Foldable Nest where {-# INLINE foldr #-} foldr f = go where go z Nil = z go z (Cons x n) = f x (go z n) go z (Nest x y) = go (go z y) x instance Traversable Nest where {-# INLINE traverse #-} traverse f = go where go Nil = pure Nil go (Cons x n) = Cons <$> f x <*> go n go (Nest x y) = Nest <$> go x <*> go y