
On Sun, Apr 22, 2012 at 4:58 PM, Radosław Szymczyszyn
Thank you all for thoughts and suggestions. I've been tracking them as they appeared, but being busy with university assignments, couldn't try them out yet.
In the meantime, however, Karol Samborski investigated the issue further and found the cause of poor performance -- somehow using HashTable.newHint doesn't play well with HashTable.update. Simply changing newHint to hint gives me execution time of about 12s (on my rather slowish AMD Fusion 1.6GHz laptop) with a dataset of ~1.35mln words.
While it's good that you have found a bug that explain the very poor performance you got, I must reiterate Brent Yorgey's advice to use Data.HashMap from unordered-containers or some other performance oriented implementation : Data.HashTable is a legacy datastructure that was mainly introduced to have a hashtable in Haskell core libraries, it ran into some performance problem of GHC (only reduced in very recent release I believe) and was never written to be very fast in the first place... Avoiding it is generally a very good idea ! Using another structure (HashMap or some trie implementation) would probably get you much better performance without hassle. I must admit that having Data.HashTable in the base library without warning as to its unsuitability is very misleading to the beginner coming from Perl or Python. -- Jedaï