Performance of StateT and best practices for debugging

Hello, I was looking at cleaning up my refactoring a core loop of template rendering to go from a loop with many parameters loop :: RenderConfig -> BlockMap -> InputBucket m -> Builder -> [Pieces] -> ExceptT StrapError m Builder to a looped state monad transformer loop :: [Pieces] -> RenderT m Builder newtype RenderT m a = RenderT { runRenderT :: ExceptT StrapError (StateT (RenderState m) m) a } deriving ( Functor, Applicative, Monad, MonadIO ) data RenderState m = RenderState { position :: SourcePos , renderConfig :: RenderConfig , blocks :: BlockMap , bucket :: InputBucket m } however, there is a big slow down (about 6-10x) using a StateT. I think it might have something to do with laziness but I am not exactly sure of where to begin in tracking it down. Swapping out the Lazy State to a Strict State helps a little (only a 5x slow down) You can find some of the processing code here: https://github.com/hansonkd/StrappedTemplates/blob/321a88168d54943fc217553c8... With my old loop commented out. Its messy right now since I am just trying a number of different approaches. I did some more work factoring out the lifts, trying different iterations of foldlM and stuff but that didn't have that much of an effect on performance. After profiling I see in the StateT, the report has a lot more CAFs and garbage collecting. Here is the profiling report from my original version w/o StateT http://lpaste.net/108995 Slow version with StateT http://lpaste.net/108997 Here is the "makeBucket" function that is referenced (it is the same in both state and nonstate): https://github.com/hansonkd/StrappedTemplates/blob/321a88168d54943fc217553c8... Looking at stacked overflow and the official docs I have gotten an idea of what is going on. The heaps generated between them tells me that a lot more memory is being allocated to lists. These heaps were generated running my render function against a template with nested loops and a list of elements. http://imgur.com/a/2jOIf I am hoping that maybe someone could give me a hint at what to look at next. I've played around with Strictness and refactoring loops to no avail and now am kind of stuck. Any help would be appreciated. -- Kyle Hanson

I haven't looked very closely, but I'm suspicious of this code from
"instance Block Piece"
ListLike l -> forM l (\obj -> ...)
>>= (return . mconcat)
The "forM" means that "l" will be traversed once and create an output list,
which will then be mconcat'd together. The list has to be created because
of the monadic structure imposed by forM, but if the result of the mconcat
isn't demanded right away it will be retained as a thunk that references
the newly-created list.
I'd suggest that you replace it with something like
ListLike l -> foldM (\(!acc) obj -> ... >>= return . mappend acc) mempty l
Here I've justed added a bang pattern to the accumulator. If whatever is
being returned has some lazy fields, you may want to change that to use
deepseq instead of a bang pattern.
Also, "foo >>= return . bar" is often regarded as a bit of a code smell, it
can be replaced with "bar <$> foo" or "bar `liftM` foo", or sometimes
something even simpler depending on circumstances (but IMHO sometimes it's
more clear to just leave it alone).
The heap profile does look like a space leak. The line
StrappedTemplates-0.1.1.0:Text.Strapped.Render.sat_sc1z
is a thunk (you can tell because it's in '<>' brackets), so whatever is
referencing that is not strict enough. Sometimes another heap profile
report, e.g. "-hc" or maybe "-hy" will give more useful information that
lets you identify what exactly "sat_sc1z" is. You could also try compiling
with -ddump-stg, which will dump the intermediate STG output which usually
shows those names. But then you'll probably also need to re-run the
profile, since the names change between compilations. Also IIRC some of
values aren't named until the cmm phase, but that's harder to map back to
Haskell so if you can identify the code from stg it's simpler.
If you haven't seen
http://blog.ezyang.com/2011/06/pinpointing-space-leaks-in-big-programs/,
I'd highly recommend it if you need to track down a space leak.
John L.
On Thu, Aug 7, 2014 at 10:57 AM, Kyle Hanson
Hello,
I was looking at cleaning up my refactoring a core loop of template rendering to go from a loop with many parameters
loop :: RenderConfig -> BlockMap -> InputBucket m -> Builder -> [Pieces] -> ExceptT StrapError m Builder
to a looped state monad transformer
loop :: [Pieces] -> RenderT m Builder
newtype RenderT m a = RenderT { runRenderT :: ExceptT StrapError (StateT (RenderState m) m) a } deriving ( Functor, Applicative, Monad, MonadIO )
data RenderState m = RenderState { position :: SourcePos , renderConfig :: RenderConfig , blocks :: BlockMap , bucket :: InputBucket m }
however, there is a big slow down (about 6-10x) using a StateT. I think it might have something to do with laziness but I am not exactly sure of where to begin in tracking it down. Swapping out the Lazy State to a Strict State helps a little (only a 5x slow down)
You can find some of the processing code here:
https://github.com/hansonkd/StrappedTemplates/blob/321a88168d54943fc217553c8...
With my old loop commented out.
Its messy right now since I am just trying a number of different approaches. I did some more work factoring out the lifts, trying different iterations of foldlM and stuff but that didn't have that much of an effect on performance.
After profiling I see in the StateT, the report has a lot more CAFs and garbage collecting.
Here is the profiling report from my original version w/o StateT http://lpaste.net/108995
Slow version with StateT http://lpaste.net/108997
Here is the "makeBucket" function that is referenced (it is the same in both state and nonstate):
https://github.com/hansonkd/StrappedTemplates/blob/321a88168d54943fc217553c8...
Looking at stacked overflow and the official docs I have gotten an idea of what is going on. The heaps generated between them tells me that a lot more memory is being allocated to lists. These heaps were generated running my render function against a template with nested loops and a list of elements.
I am hoping that maybe someone could give me a hint at what to look at next. I've played around with Strictness and refactoring loops to no avail and now am kind of stuck. Any help would be appreciated.
-- Kyle Hanson
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On 2014-08-07 19:57, Kyle Hanson wrote:
Hello,
Here is the "makeBucket" function that is referenced (it is the same in both state and nonstate):
https://github.com/hansonkd/StrappedTemplates/blob/321a88168d54943fc217553c8...
Just a shot in the dark, but I notice that you're using "modify" and not "modify'" which was added in a recent version of transformers. Strict.StateT is not always "strict enough" and you may need to use modify'. At any rate, it's worth a shot, I think. Regards,

On Thu, Aug 7, 2014 at 9:31 PM, Bardur Arantsson
On 2014-08-07 19:57, Kyle Hanson wrote:
Hello,
Here is the "makeBucket" function that is referenced (it is the same in both state and nonstate):
https://github.com/hansonkd/StrappedTemplates/blob/321a88168d54943fc217553c8...
Just a shot in the dark, but I notice that you're using "modify" and not "modify'" which was added in a recent version of transformers.
Strict.StateT is not always "strict enough" and you may need to use modify'.
At any rate, it's worth a shot, I think.
Good point. I think that even modify' will not be strict enough without adding strictness to RenderState as well. John L.
participants (3)
-
Bardur Arantsson
-
John Lato
-
Kyle Hanson