Hello,

I was cleaning up the core loop of my template renderer, refactoring it from a loop with many parameters

loop :: RenderConfig -> BlockMap -> InputBucket m -> Builder -> [Pieces] -> ExceptT StrapError m Builder

to a loop running in a state monad transformer

loop :: [Pieces] -> RenderT m Builder

newtype RenderT m a = RenderT
  { runRenderT :: ExceptT StrapError (StateT (RenderState m) m) a
  } deriving ( Functor, Applicative, Monad, MonadIO )

data RenderState m = RenderState
  { position :: SourcePos
  , renderConfig :: RenderConfig
  , blocks :: BlockMap
  , bucket :: InputBucket m
  }
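For readers who want to experiment with the shape of this stack without the rest of the library, here is a minimal sketch. It is not the actual Strapped code: `Piece`, the `String` output, the `Int` position, and the helpers `render` and `renderPure` are hypothetical stand-ins, and the state is cut down to a single field. It only uses the `transformers` package.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Except (ExceptT, runExceptT, throwE)
import Control.Monad.Trans.State.Strict (StateT, evalStateT, gets, modify')
import Data.Functor.Identity (Identity, runIdentity)

-- Hypothetical stand-ins for the real Strapped types.
newtype StrapError = StrapError String deriving (Eq, Show)
data Piece = Static String | Fail String

-- Cut-down state: only a position counter (an assumption for this sketch).
newtype RenderState = RenderState { position :: Int }

newtype RenderT m a = RenderT
  { runRenderT :: ExceptT StrapError (StateT RenderState m) a
  } deriving (Functor, Applicative, Monad)

-- Unwrap the stack: ExceptT first, then StateT, discarding the final state.
render :: Monad m => RenderState -> RenderT m a -> m (Either StrapError a)
render st action = evalStateT (runExceptT (runRenderT action)) st

-- Pure variant, convenient for trying things out in GHCi.
renderPure :: RenderState -> RenderT Identity a -> Either StrapError a
renderPure st = runIdentity . render st

-- The loop takes only the pieces; everything else lives in the state.
loop :: Monad m => [Piece] -> RenderT m String
loop = go id
  where
    go acc [] = pure (acc "")
    go acc (Static s : ps) = do
      -- Advance the position by the length of the emitted text.
      RenderT (lift (modify' (\st -> st { position = position st + length s })))
      go (acc . (s ++)) ps
    go _ (Fail msg : _) = do
      pos <- RenderT (lift (gets position))
      RenderT (throwE (StrapError (msg ++ " at " ++ show pos)))
```

With these definitions, `renderPure (RenderState 0) (loop [Static "ab", Static "c"])` runs the whole stack in `Identity` and returns an `Either StrapError String`.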

However, there is a big slowdown (about 6-10x) when using StateT. I think it might have something to do with laziness, but I am not sure where to begin tracking it down. Swapping the lazy State for the strict State helps a little (it is then only a 5x slowdown).
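One thing that can be checked when laziness is suspected is whether thunks pile up inside the state record itself, since the strict StateT only forces the state/result pair and not the fields. The following is a rough sketch of that idea, not a diagnosis of the code above: the two-field `RenderState`, `step`, and `runSteps` are hypothetical, and it assumes the strict `State` from `transformers`.

```haskell
import Control.Monad.Trans.State.Strict (State, execState, modify')

-- Hypothetical cut-down state. The bangs make each field strict, so a field
-- is evaluated whenever a RenderState value is forced to WHNF.
data RenderState = RenderState
  { position :: !Int
  , emitted  :: !Int
  }

-- modify' forces the new state to WHNF before storing it; together with the
-- strict fields this keeps the counters fully evaluated at every step.
step :: Int -> State RenderState ()
step n = modify' (\st -> st { position = position st + 1
                            , emitted  = emitted st + n })

-- Run one step per input element and return the final state.
runSteps :: [Int] -> RenderState
runSteps ns = execState (mapM_ step ns) (RenderState 0 0)
```

Whether this matters for a given program depends on how the state is updated; comparing heap profiles with and without the bangs is one way to find out.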

You can find some of the processing code here:

https://github.com/hansonkd/StrappedTemplates/blob/321a88168d54943fc217553c873f188797c0d4f5/src/Text/Strapped/Render.hs#L189

My old loop is commented out there.

It's messy right now since I am just trying a number of different approaches. I did some more work factoring out the lifts and trying different variations with foldlM and so on, but that didn't have much effect on performance.

After profiling, I see that the report for the StateT version shows a lot more CAFs and garbage collection.

Here is the profiling report from my original version without StateT:

http://lpaste.net/108995

And here is the slow version with StateT:

http://lpaste.net/108997

Here is the "makeBucket" function that is referenced (it is the same in both the StateT and non-StateT versions):

https://github.com/hansonkd/StrappedTemplates/blob/321a88168d54943fc217553c873f188797c0d4f5/examples/big_example.hs#L24

Looking at Stack Overflow and the official docs, I have gotten an idea of what is going on. Comparing the heap profiles of the two versions tells me that a lot more memory is being allocated to lists with StateT. These heap profiles were generated by running my render function against a template with nested loops and a list of elements.

http://imgur.com/a/2jOIf

I am hoping someone can give me a hint about what to look at next. I've played around with strictness and refactoring loops to no avail, and now I am kind of stuck. Any help would be appreciated.

Kyle Hanson