
Hi, I'm learning Haskell by writing a compiler. I'm seeing huge memory use in a function that converts the dataflow graph into the form required by Data.Graph. It needs to return a map from dataflow nodes to Vertex values, a map in the other direction, and the list of edges (as Vertex pairs).
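For reference, here's a minimal sketch of the target form I'm aiming for, using Data.Graph.buildG (mkGraph and its parameters are just illustrative names, not my real code):

```haskell
import Data.Graph (Graph, buildG, edges)

-- Once every node has been assigned a Vertex in 0..n-1, the graph
-- itself is just the bounds plus the edge list.
mkGraph :: Int -> [(Int, Int)] -> Graph
mkGraph n = buildG (0, n - 1)
```

The hard part is the numbering and the two maps, which is what dependencyEdges below does.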
total time  =        18.78 secs   (939 ticks @ 20 ms)
total alloc = 6,616,297,296 bytes  (excludes profiling overheads)

COST CENTRE       MODULE          %time %alloc
j                 Dataflow         87.6   99.9
graphvizCompile   Dataflow          6.4    0.0
f                 Dataflow          2.9    0.0
con2tag_DFReal#   Representation    2.1    0.0
I assume the allocation is being garbage-collected pretty quickly, because a) 6,616,297,296 bytes is stupid (!) and b) Process Explorer informs me that the peak private bytes of the program are no more than a couple of MB. The offending code looks like this:
-- (next Vertex available for use, edges so far, map so far, map so far)
type DependencyEdgesAcc = (Vertex, [Edge], Map.Map DFNode Vertex, Map.Map Vertex DFNode)

-- Process a list of DFNodes in the same context.
dependencyEdges :: DependencyEdgesAcc -> [DFNode] -> DependencyEdgesAcc
dependencyEdges acc [] = {-# SCC "a" #-} acc
dependencyEdges acc@(_, _, mnv1, _) (n:ns) =
  -- Check whether this node has been done before.
  {-# SCC "j" #-} case {-# SCC "i" #-} Map.lookup n mnv1 of
    Just _ -> {-# SCC "b" #-} dependencyEdges acc ns
    Nothing ->
      -- Find the list of dependencies.
      let ndeps = {-# SCC "c" #-} nodeDependencies n in
      -- Process them first.
      let (v2, es2, mnv2, mvn2) = {-# SCC "d" #-} dependencyEdges acc ndeps in
      -- Now we can claim v2 as the label for this node n.
      -- Look up vdep for each dependency ndep and add an edge from the vdep to v2.
      let { (v4, es4, mnv4, mvn4) = {-# SCC "e" #-} List.foldl' (
              \(v3, es3, mnv3, mvn3) ndep ->
                case {-# SCC "f" #-} Map.lookup ndep mnv3 of
                  Just vdep -> (v3, (vdep, v2):es3, mnv3, mvn3)
                  Nothing -> error $ "this node should have been added already: " ++ show ndep
            ) (v2, es2, mnv2, mvn2) ndeps }
      in
        assert (v2 == v4) $
        assert (mvn2 == mvn4) $
        assert (mnv2 == mnv4) $
        -- Finally, add this node to the accumulator and then recurse.
        dependencyEdges (v2+1, es4, {-# SCC "g" #-} Map.insert n v2 mnv2,
                                    {-# SCC "h" #-} Map.insert v2 n mvn2) ns
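One thing I've been wondering about, given all those con2tag ticks: the maps are keyed on whole DFNodes, so every Map.lookup does a deep structural comparison along the search path. A sketch of keying the maps on a cheap unique Int instead (Node and nodeId below are invented names, not my real types):

```haskell
import qualified Data.Map as Map

-- Stand-in for a deep dataflow node; nodeId is an invented unique key
-- that would have to be assigned when the node is created.
data Node = Node { nodeId :: Int, label :: String }
  deriving (Show)

-- Keying on nodeId means Map.lookup compares Ints,
-- not whole node structures.
insertNode :: Node -> Map.Map Int Node -> Map.Map Int Node
insertNode n = Map.insert (nodeId n) n

lookupNode :: Node -> Map.Map Int Node -> Maybe Node
lookupNode n = Map.lookup (nodeId n)
```

I haven't tried this yet, so I don't know whether it's the real culprit.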
with the following profile:
dependencyGraph          Dataflow        4395    1         0.0  0.0   93.4 99.9
 getRootDFNodes          Dataflow        4397    5         0.0  0.0    0.0  0.0
 dependencyEdges         Dataflow        4396    282       0.0  0.0   93.4 99.9
  a                      Dataflow        4407    104       0.0  0.0    0.0  0.0
  j                      Dataflow        4399    178      87.6 99.9   93.4 99.9
   b                     Dataflow        4417    75        0.0  0.0    0.0  0.0
   g                     Dataflow        4405    103       0.1  0.0    0.2  0.0
    con2tag_DFBool#      Representation  4419    1972      0.0  0.0    0.0  0.0
    con2tag_DFReal#      Representation  4414    186890    0.1  0.0    0.1  0.0
    con2tag_DFNode#      Representation  4413    1204      0.0  0.0    0.0  0.0
   h                     Dataflow        4404    103       0.0  0.0    0.0  0.0
   e                     Dataflow        4403    103       0.0  0.0    4.8  0.0
    f                    Dataflow        4412    174       2.9  0.0    4.8  0.0
     con2tag_DFBool#     Representation  4420    245794    0.0  0.0    0.0  0.0
     con2tag_DFReal#     Representation  4416    25772482  1.9  0.0    1.9  0.0
     con2tag_DFNode#     Representation  4415    2068      0.0  0.0    0.0  0.0
   d                     Dataflow        4402    103       0.0  0.0    0.0  0.0
   c                     Dataflow        4401    103       0.0  0.0    0.0  0.0
    nodeDependencies     Dataflow        4406    103       0.0  0.0    0.0  0.0
   i                     Dataflow        4400    178       0.6  0.0    0.7  0.0
    con2tag_DFBool#      Representation  4418    48230     0.0  0.0    0.0  0.0
    con2tag_DFReal#      Representation  4410    5045646   0.1  0.0    0.1  0.0
    con2tag_DFNode#      Representation  4409    1928      0.0  0.0    0.0  0.0
Is this a space leak, or is my algorithm just silly? (It could well be the latter...) I've tried adding bangs in random places, to no effect. So, can anyone help me? :)

Cheers,
Ben