
Hi,

lately I've been working on a parallel parser whose core is something like this:

    pMap parse blocks

where pMap is a map-like function that sparks independent computations over the elements of the "blocks" list. The actual implementation uses Control.Parallel.Strategies:

    map parse blocks `using` parListChunk n rnf

In a recent post [1], however, Edward Kmett noted that such code is oblivious to the cache, because the "blocks" list is traversed independently by the map and by the sparks. It may therefore happen that the first spark only starts to be evaluated when the map is already at the end of the list. Moreover, all the sparks are created almost instantaneously, even though only a few of them can be evaluated concurrently (I'm assuming here that the list is longer than the number of available native threads, which seems reasonable).

To solve this issue I imagined an alternative "par" primitive that sparks a new computation only when a native thread is free to evaluate it, and blocks otherwise. It turns out something like this can be implemented as a library; have a look at the BoundedParMap module here:

    http://github.com/akborder/HsMakefileParser/

The code uses unsafe operations, so I'm not sure it can be completely trusted. Given, however, that the "boundedParMap" function deterministically combines the results of multiple applications of a pure function, it should remain referentially transparent. (A rough sketch of the underlying idea is appended at the end of this message.)

To measure performance I made two versions of my parser (Makefile.hs), one keeping Control.Parallel.Strategies as before and the other using BoundedParMap. I built them (ghc -O2 -threaded; ghc 6.11.20090907) and ran them on a quad-core system (+RTS -N4) over an input list generating 48 sparks. Running each implementation 10 times I got the following run times:

    Control.Parallel.Strategies: 1.328 +- 0.1
    BoundedParMap:               1.095 +- 0.12

BoundedParMap seems to be faster, which is notable given that the total number of sparks (48) is quite small. Do you think the difference is actually due to better cache behaviour (both in the list traversal and because the runtime system has to deal with fewer sparks at any given time), or to something else I'm missing?

--
AB

[1] http://www.haskell.org/pipermail/haskell-cafe/2009-September/066158.html
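
Appendix: for readers who don't want to open the repository, here is a rough sketch of the bounded-map idea written with plain Control.Concurrent primitives. This is not the BoundedParMap code from the repository (which keeps a pure, spark-based interface by using unsafe operations); it only illustrates "do not start a new evaluation until a worker slot is free", in IO. The name boundedParMapIO, the QSem-based throttling and the use of deepseq's force are choices made for this example only, and exception handling is omitted for brevity:

    import Control.Concurrent      (forkIO)
    import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
    import Control.Concurrent.QSem (newQSem, signalQSem, waitQSem)
    import Control.DeepSeq         (NFData, force)
    import Control.Exception       (evaluate)

    -- Apply 'f' to every element of the list, but never start a new
    -- evaluation before one of the 'n' "slots" guarded by the semaphore
    -- becomes free.  Results are collected in the original order.
    boundedParMapIO :: NFData b => Int -> (a -> b) -> [a] -> IO [b]
    boundedParMapIO n f xs = do
      sem  <- newQSem n
      vars <- mapM (spawn sem) xs   -- this traversal blocks while all slots are busy
      mapM takeMVar vars
      where
        spawn sem x = do
          v <- newEmptyMVar
          waitQSem sem                      -- wait for a free slot before forking
          _ <- forkIO $ do
            r <- evaluate (force (f x))     -- evaluate to normal form (as rnf would)
            putMVar v r
            signalQSem sem                  -- give the slot back
          return v

Called with the number of capabilities as the bound, e.g. boundedParMapIO numCapabilities parse blocks (numCapabilities comes from GHC.Conc), it keeps roughly one evaluation per core in flight, and the input list is consumed only once, by the spawning loop itself.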