Can you isolate which of the hash table instances is slow? The basic hash table uses linear probing and may perform badly if the hash function clusters keys into neighboring slots. Some versions of "hashable" set "hash = id" for Int, which may not help matters.
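To illustrate why linear probing plus an identity hash can go wrong, here is a minimal sketch. It is a hypothetical toy model, not the hashtables implementation: a table with a power-of-two number of slots, where the slot is the hash's low bits. The names `totalProbes` and `mixedHash` and the mixing constant are assumptions chosen for illustration. It also assumes a 64-bit `Int` and the `containers` package. The keys are a deliberately adversarial pattern (all multiples of the table size), so with `hash = id` every key lands on slot 0 and each insert walks the whole cluster built so far.

```haskell
import Data.Bits (shiftR, xor, (.&.))
import qualified Data.IntSet as IS

-- Hypothetical toy model, NOT the hashtables code: an open-addressing table
-- with m slots (m a power of two) that resolves collisions by linear probing.
-- Returns the total number of slots examined while inserting every key.
-- Assumes fewer keys than slots; otherwise the probe loop never terminates.
totalProbes :: (Int -> Int) -> Int -> [Int] -> Int
totalProbes hash m = go IS.empty 0
  where
    mask = m - 1
    go _ acc [] = acc
    go occupied acc (k : ks) =
      -- Walk forward from the home slot until a free slot turns up.
      let probe s p
            | s `IS.member` occupied = probe ((s + 1) .&. mask) (p + 1)
            | otherwise = (s, p)
          (slot, probes) = probe (hash k .&. mask) 1
       in go (IS.insert slot occupied) (acc + probes) ks

identityHash :: Int -> Int
identityHash = id

-- Illustrative mixing function (an assumption, not hashable's): multiply by
-- an odd constant, then fold higher bits into the low bits that pick the slot.
mixedHash :: Int -> Int
mixedHash k = let x = k * 2654435761 in x `xor` (x `shiftR` 16)

tableSize, keyCount :: Int
tableSize = 1024
keyCount = 500

-- Adversarial keys: every key is a multiple of the table size.
strided :: [Int]
strided = [i * tableSize | i <- [0 .. keyCount - 1]]

identityTotal, mixedTotal :: Int
identityTotal = totalProbes identityHash tableSize strided
mixedTotal = totalProbes mixedHash tableSize strided

main :: IO ()
main = do
  putStrLn ("identity hash: " ++ show identityTotal ++ " probes")
  putStrLn ("mixed hash:    " ++ show mixedTotal ++ " probes")
```

With the identity hash, the i-th insert examines i + 1 slots, so the total is 500 * 501 / 2 = 125250, i.e. quadratic in the number of keys. Any hash that spreads these keys across the table breaks up that single cluster and brings the count down. Real workloads are rarely this extreme, but keys sharing low bits produce the same effect in milder form.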
As you suggest, the benchmarks I have say that it performs well, but they are incomplete. I can't optimize what I don't test: if you can isolate a minimal example where it performs much worse than it should, I would appreciate it.
As far as Windows goes, I don't use it and don't have access to a Windows machine, so I would need help to debug and fix this.
G