
Hi I'm learning Haskell through writing a compiler. I'm seeing huge memory use in a function which converts the dataflow graph to the form required by Data.Graph. It needs to return a map from dataflow nodes to Vertexs, a map in the other direction, and the list of edges (as Vertex pairs).
total time = 18.78 secs (939 ticks @ 20 ms) total alloc = 6,616,297,296 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
j Dataflow 87.6 99.9 graphvizCompile Dataflow 6.4 0.0 f Dataflow 2.9 0.0 con2tag_DFReal# Representation 2.1 0.0
I assume the allocation is being garbage-collected pretty quickly, because a) 6,616,297,296 bytes is stupid (!) and b) Process Explorer informs me that the peak private bytes of the program is not more than a couple of MB. The offending code looks like this:
-- (next Vertex available for use, edges so far, map so far, map so far) type DependencyEdgesAcc = (Vertex, [Edge], Map.Map DFNode Vertex, Map.Map Vertex DFNode)
-- Process a list of DFNodes in the same context. dependencyEdges :: DependencyEdgesAcc -> [DFNode] -> DependencyEdgesAcc dependencyEdges acc [] = {-# SCC "a" #-} acc dependencyEdges acc@(_, _, mnv1, _) (n:ns) = -- Check whether this node has been done before. {-# SCC "j" #-} case {-# SCC "i" #-} Map.lookup n mnv1 of Just _ -> {-# SCC "b" #-} dependencyEdges acc ns Nothing -> -- Find the list of dependencies. let ndeps = {-# SCC "c" #-} nodeDependencies n in -- Process them first. let (v2, es2, mnv2, mvn2) = {-# SCC "d" #-} dependencyEdges acc ndeps in -- Now we can claim v2 as the label for this node n. -- Look up vdep for each dependency ndep and add an edge from the vdep to v2. let {(v4, es4, mnv4, mvn4) = {-# SCC "e" #-} List.foldl' ( \(v3, es3, mnv3, mvn3) ndep -> case {-# SCC "f" #-} Map.lookup ndep mnv3 of Just vdep -> (v3, (vdep, v2):es3, mnv3, mvn3) Nothing -> error $ "this node should have been added already: " ++ show ndep ) (v2, es2, mnv2, mvn2) ndeps } in assert (v2 == v4) assert (mvn2 == mvn4) assert (mnv2 == mnv4) -- Finally, add this node to the accumulator and then recurse. dependencyEdges (v2+1, es4, {-# SCC "g" #-} Map.insert n v2 mnv2, {-# SCC "h" #-} Map.insert v2 n mvn2) ns
with the following profile:
dependencyGraph Dataflow 4395 1 0.0 0.0 93.4 99.9 getRootDFNodes Dataflow 4397 5 0.0 0.0 0.0 0.0 dependencyEdges Dataflow 4396 282 0.0 0.0 93.4 99.9 a Dataflow 4407 104 0.0 0.0 0.0 0.0 j Dataflow 4399 178 87.6 99.9 93.4 99.9 b Dataflow 4417 75 0.0 0.0 0.0 0.0 g Dataflow 4405 103 0.1 0.0 0.2 0.0 con2tag_DFBool# Representation 4419 1972 0.0 0.0 0.0 0.0 con2tag_DFReal# Representation 4414 186890 0.1 0.0 0.1 0.0 con2tag_DFNode# Representation 4413 1204 0.0 0.0 0.0 0.0 h Dataflow 4404 103 0.0 0.0 0.0 0.0 e Dataflow 4403 103 0.0 0.0 4.8 0.0 f Dataflow 4412 174 2.9 0.0 4.8 0.0 con2tag_DFBool# Representation 4420 245794 0.0 0.0 0.0 0.0 con2tag_DFReal# Representation 4416 25772482 1.9 0.0 1.9 0.0 con2tag_DFNode# Representation 4415 2068 0.0 0.0 0.0 0.0 d Dataflow 4402 103 0.0 0.0 0.0 0.0 c Dataflow 4401 103 0.0 0.0 0.0 0.0 nodeDependencies Dataflow 4406 103 0.0 0.0 0.0 0.0 i Dataflow 4400 178 0.6 0.0 0.7 0.0 con2tag_DFBool# Representation 4418 48230 0.0 0.0 0.0 0.0 con2tag_DFReal# Representation 4410 5045646 0.1 0.0 0.1 0.0 con2tag_DFNode# Representation 4409 1928 0.0 0.0 0.0 0.0
Is this a space leak, or is my algorithm just silly? (It could well be the latter...) I've tried adding random bangs, to no effect. So, can anyone help me? :) Cheers Ben

Hi
I'm learning Haskell through writing a compiler. I'm seeing huge memory use in a function which converts the dataflow graph to the form required by Data.Graph. [...] I assume the allocation is being garbage-collected pretty quickly, because a) 6,616,297,296 bytes is stupid (!) and b) Process Explorer informs me that the peak private bytes of the program is not more than a couple of MB. Thats not a space leak. A space leak is when you find that all 6 GB of
Ben Challenor wrote: that allocation is being retained. You are correct that the GC is recovering this space as fast as it gets allocated. The profiler records how much memory is allocated from within various bits of your program to help you understand its performance and where optimisable hotspots are. I haven't looked at your code, but I imagine that you are traversing lazy data structures, which means that bits of the structure are consumed and deallocated as fast as they are created. The profiler also has options to help you understand what is being retained where. I'd advise against trying to make your program stricter because you might suddenly find yourself building an entire 6GB structure in memory before traversing it, which would not be a Good Thing. Paul.

On Dec 31, 2007, at 9:53 AM, Paul Johnson wrote:
I'd advise against trying to make your program stricter because you might suddenly find yourself building an entire 6GB structure in memory before traversing it, which would not be a Good Thing.
I disagree. It might be the case that the _contents_ of the data structure are lazy, in which case I would say the relevant constructor parameters should be made strict. As long as the structural parts are still lazy it should be okay. And there's no harm in experimenting. :) - Jake

Jake McArthur
I disagree. It might be the case that the _contents_ of the data structure are lazy, in which case I would say the relevant constructor parameters should be made strict. As long as the structural parts are still lazy it should be okay.
And there's no harm in experimenting. :)
Or, for that matter, guessing what -ddump-simpl looks like. Which I never, ever, do. Seriously. *lightens a cracker and runs away* -- (c) this sig last receiving data processing entity. Inspect headers for past copyright information. All rights reserved. Unauthorised copying, hiring, renting, public performance and/or broadcasting of this signature prohibited.

On Dec 31, 2007, at 11:21 AM, Achim Schneider wrote:
Jake McArthur
wrote: I disagree. It might be the case that the _contents_ of the data structure are lazy, in which case I would say the relevant constructor parameters should be made strict. As long as the structural parts are still lazy it should be okay.
And there's no harm in experimenting. :)
Or, for that matter, guessing what -ddump-simpl looks like.
Which I never, ever, do. Seriously.
Touché.
participants (4)
-
Achim Schneider
-
Ben Challenor
-
Jake McArthur
-
Paul Johnson