
deliverable:
If you just want to optimize it and not compare exactly equal idiomatic code, you should stop using functional data structures and use a structure that fits your problem (the ST monad has been designed for that in Haskell), because compilers do not detect single-threaded usage and rewrite all your code to something mutable and thereby avoid O(log n) costs.
In practice it is probably easier to write the whole thing against the parallel Boost Graph Library (a C++ library), since that library provides the abstractions that you would want. If you go this path, it will probably end up being 10-100 times faster.
Surely I can rewrite it in C++ or just C and gain speedup, but not maintainability and ease of prototyping algorithms -- the main reasons I switched to FP, with added reliability, conciseness, reasonable syntax bonuses. It turned out Clojure is quite capable of doing it well, but has Null Pointer Exceptions and "debug data shape by running" annoying to an ML programmer; trying Haskell for speedup is worth it! It proves less predictable than expected -- e.g. with OCaml if it typechecks and runs, it usually runs fast. The point of this exercise is to identify typical performance bottlenecks in the domain and see how they can be avoided. The jury's still out on this one.
Following the algorithm for performance tuning in the previous email should be enough. Have you obtained heap profiles yet? -- Don