
Maybe you could also time it with datasets of increasing size (up to 1M entries) and see whether the execution time grows like O(n^2); if so, I'd say it's a hashing problem... A rough timing sketch follows.
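Something along these lines, maybe (a minimal sketch only: I'm assuming the old Data.HashTable from base, since you mention newHint, and the keys and sizes here are made up):

import qualified Data.HashTable as H   -- the old mutable hash table from base
import Control.Monad (forM_)
import System.CPUTime (getCPUTime)
import Text.Printf (printf)

-- Insert n distinct keys and report the elapsed CPU time. If the time
-- roughly quadruples whenever n doubles, growth is O(n^2).
timeInserts :: Int -> IO ()
timeInserts n = do
  t0 <- getCPUTime
  ht <- H.newHint (==) H.hashString n
  forM_ [1 .. n] $ \i -> H.insert ht ("word" ++ show i) i
  t1 <- getCPUTime
  printf "n = %7d: %6.2fs\n" n (fromIntegral (t1 - t0) / 1e12 :: Double)

main :: IO ()
main = mapM_ timeInserts [125000, 250000, 500000, 1000000]

Timing with getCPUTime is coarse, but it's the growth rate between the steps that matters, not the absolute numbers.

On Fri, Apr 20, 2012 at 06:03:19PM +0200, Radosław Szymczyszyn wrote: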
Thanks for your suggestions. Alas, they don't solve the problem.
As I was at work without the original data file, I repeated the test suggested by Karol Samborski with a file of 1 400 000 repetitions of "żyźniejszymi". It took about 3.5s, so I thought my problem had been solved. However, compiling with -O2 only makes a difference of ~2-3s, and I don't believe the laptop I used at home is *that much* slower than my Mac at work for running without optimization alone to explain such a great difference.
Now, I've just rerun the test with the original data file (still at work, so the comparison with 3.5s is fair). I started it at 17:26 and it's still running -- so the problem lies in the data set being hashed. I don't know why, but it seems:
- either to make a difference whether one specific word or many different words are hashed,
- or to matter whether just one slot or many slots of the HashTable are updated (though as I'm using newHint, the space should be preallocated).
A sketch of an experiment separating the two cases follows.
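For clarity, here is the kind of experiment I have in mind (just a sketch, not my actual program: it assumes a plain lookup/insert/update loop over the old Data.HashTable, and both word lists are made up):

import qualified Data.HashTable as H
import Control.Monad (forM_, void)
import System.CPUTime (getCPUTime)
import Text.Printf (printf)

-- A lookup/insert/update loop: "same word" hammers a single key over
-- and over, "many words" spreads n distinct keys across the table.
count :: Int -> [String] -> IO ()
count hint ws = do
  ht <- H.newHint (==) H.hashString hint
  forM_ ws $ \w -> do
    mc <- H.lookup ht w
    case mc of
      Nothing -> H.insert ht w (1 :: Int)
      Just c  -> void (H.update ht w (c + 1))

timed :: String -> Int -> [String] -> IO ()
timed label n ws = do
  t0 <- getCPUTime
  count n ws
  t1 <- getCPUTime
  printf "%s: %6.2fs\n" label (fromIntegral (t1 - t0) / 1e12 :: Double)

main :: IO ()
main = do
  let n = 1400000
  timed "same word " n (replicate n "żyźniejszymi")
  timed "many words" n [ "w" ++ show i | i <- [1 .. n] ]

If the "many words" run is the slow one while "same word" stays fast, that would point at the hash function clustering distinct words into few buckets.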
Either way, I would be grateful if you, Karol, or somebody else could rerun the test with the original data. It's available at: http://ernie.icslab.agh.edu.pl/~lavrin/formy.utf8.gz
Thanks for your time!
Regards,
Radek Szymczyszyn
--
Lorenzo Bolla
http://lbolla.info