
On Wed, Mar 9, 2011 at 11:18 AM, Simon Peyton-Jones wrote:

> Could you contribute a little vignette or story about using Haskell in
> a *parallel/concurrent* application that I could use to illustrate my
> talk?
The Cryptographic Protocol Shapes Analyzer (CPSA) attempts to enumerate all essentially different executions possible for a cryptographic protocol. At each step in the analysis, it infers what else must have happened, based on an incomplete description of an execution. The steps form a derivation tree. If CPSA terminates on its input, each leaf of the derivation tree is a complete description of some executions, and the leaves collectively enumerate all possible executions.

For performance reasons, it is crucial that CPSA computes not a tree but a directed acyclic graph (DAG). To do so, it must identify pairs of nodes in the derivation that are isomorphic. For large problems, the isomorphism check dominates all other computation during the analysis.

So how does CPSA use parallelism? CPSA expands the derivation DAG in breadth-first order. For each node at the current level, one thread performs the isomorphism check against all nodes higher up in the DAG. All threads then join, and the intra-level isomorphism checks are performed in the main thread. It's that simple. At the code level, all that was done was to replace one map call with a parallelized version of map that I saw in one of your papers:

    import Control.Parallel (par, pseq)

    #if defined HAVE_PAR
    parMap :: (a -> b) -> [a] -> [b]
    parMap _ [] = []
    -- Spark the evaluation of y while the rest of the list is computed.
    parMap f (x:xs) = par y (pseq ys (y : ys))
      where
        y = f x
        ys = parMap f xs
    #endif

Performance varies with the shape of the DAG: a wide, bushy DAG makes better use of more processors. I have found one can often get a speedup of three on a quad-processor machine. Of course, performance also varies with the version of the GHC runtime in use. A speedup of three may not sound like much, but on some problems CPSA takes twenty-four hours to run on a single core.

Parallelizing CPSA was not as simple as I made it sound. I had to restructure the sequential description of the algorithm to expose the parallelism to the runtime, and the algorithm became less obvious. Of course, this cost is well worth it.

John
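P.S. For concreteness, here is a minimal, self-contained sketch of the breadth-first step described above. The Node type, the step function, and the use of notElem as the isomorphism test are stand-ins of my own invention, not CPSA's actual definitions:

    import Control.Parallel (par, pseq)

    -- Stand-in: pretend each node has been reduced to a canonical form,
    -- so two nodes are isomorphic exactly when their forms are equal.
    type Node = [Int]

    parMap :: (a -> b) -> [a] -> [b]
    parMap _ [] = []
    parMap f (x:xs) = par y (pseq ys (y : ys))
      where
        y = f x
        ys = parMap f xs

    -- One breadth-first step: in parallel, flag each current-level node
    -- that is fresh, i.e. matches no node seen higher up in the DAG;
    -- then discard intra-level duplicates sequentially in the main thread.
    step :: [Node] -> [Node] -> [Node]
    step seen level = dedup [n | (n, fresh) <- flagged, fresh]
      where
        flagged = parMap (\n -> (n, n `notElem` seen)) level
        dedup []     = []
        dedup (n:ns) = n : dedup (filter (/= n) ns)

Built with ghc -threaded and run with +RTS -N4, the sparks created by par in the flagging pass are what spread across the four cores; the dedup pass stays in the main thread, just as in the description above.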