I actually don't know that it would be the bottleneck if I ported it to Haskell at this point, as lots of other things will change in speed and I may lose for other reasons: the memory layout is less efficient, the cache management overhead is higher, etc.
I do know that if I were to reintroduce a global lock on the BDD cache in the current C++ implementation (the one I'm looking at porting to Haskell), it definitely would be a throughput bottleneck, as that was precisely what I had to fix to make it fast in the first place. =) I originally had a single lock between me and the BDD cache and couldn't scale up nicely to multiple cores.
There, in order to get decent scaling, I had to carefully split my cache out Herlihy-style into a bunch of separate locks that I hash down to, and the resulting speedup was quite dramatic.
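To make the striping concrete, here is a rough Haskell sketch of the idea, not my actual C++ code; the StripedCache type and lookupOrInsert function are made-up names, and a real hash-cons table would key on node structure and hold weak references rather than plain values:

    import Control.Concurrent.MVar (MVar, newMVar, modifyMVar)
    import Data.Hashable (Hashable, hash)
    import qualified Data.HashMap.Strict as HM
    import qualified Data.Vector as V

    -- A cache split into a fixed number of stripes, each guarded by its own lock.
    newtype StripedCache k v = StripedCache (V.Vector (MVar (HM.HashMap k v)))

    newStripedCache :: Int -> IO (StripedCache k v)
    newStripedCache n = StripedCache <$> V.replicateM n (newMVar HM.empty)

    -- Hash the key down to one stripe; threads only contend when they land on
    -- the same stripe, instead of all fighting over a single global lock.
    lookupOrInsert :: (Hashable k, Eq k) => StripedCache k v -> k -> IO v -> IO v
    lookupOrInsert (StripedCache stripes) k make = do
      let stripe = stripes V.! (hash k `mod` V.length stripes)
      modifyMVar stripe $ \m -> case HM.lookup k m of
        Just v  -> return (m, v)
        Nothing -> do
          v <- make
          return (HM.insert k v m, v)

With, say, 64 or 256 stripes, the odds of two cores fighting over the same lock drop off sharply, which is where the dramatic speedup came from.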
But replicating that logic here does a lot less good, as I'll just wind up waiting on a different lock, the one for creating the weak reference, whenever I need to instantiate a new variable split in the BDD.
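For comparison, the weak-reference side in Haskell would look something like the sketch below; the BDD type, makeNode, and the finalizer body are hypothetical stand-ins, but the point is that the mkWeakPtr call has to happen on every hash-cons miss, regardless of how finely I stripe my own table:

    import System.Mem.Weak (Weak, mkWeakPtr)

    -- Hypothetical BDD node representation.
    data BDD = Leaf !Bool | Node !Int BDD BDD

    -- On a hash-cons miss we allocate the node and immediately register a weak
    -- pointer to it (the finalizer here is just a placeholder; a real one would
    -- evict the corresponding table entry).  It's this mkWeakPtr call that gets
    -- made for every brand-new variable split.
    makeNode :: Int -> BDD -> BDD -> IO (BDD, Weak BDD)
    makeNode v lo hi = do
      let node = Node v lo hi
      w <- mkWeakPtr node (Just (return ()))   -- placeholder finalizer
      return (node, w)

In practice something like makeNode would be the `make` action passed to lookupOrInsert above, so every miss still funnels through the weak-pointer machinery even though the table's own locks are striped.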
Now, that isn't entirely apples-to-apples. In theory, if I have a high enough hit rate on the hash-cons table for my BDDs, then I'm producing a lot fewer new weak references from scratch than I have hits against the BDD cache.
So if I get a 65% hit rate, only 35% of my operations would actually benefit from fully parallel-friendly weak reference creation; put the other way around, only 35% of them ever hit that serializing lock. But even a 35% regression from the scaling I have now back toward the scaling I had with the original global lock would be pretty bad.
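Back of the envelope, if you treat that 35% as a fully serialized fraction in the Amdahl's-law sense (which is pessimistic, since the lock isn't held for the whole operation), then on N cores

    speedup <= 1 / (0.35 + 0.65/N)  ->  1/0.35, i.e. under 3x as N grows

which is why even the "only 35%" case worries me.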
-Edward