
Maybe you could also time it with datasets of increasing size (up to 1M entries) and see whether the execution time grows like O(n^2); if so, I'd say it's a hashing problem... A rough timing sketch follows.
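Something along these lines, maybe (a minimal sketch only: I'm assuming the old Data.HashTable from base, since you mention newHint, and the keys and sizes here are made up):

import qualified Data.HashTable as H   -- the old mutable hash table from base
import Control.Monad (forM_)
import System.CPUTime (getCPUTime)
import Text.Printf (printf)

-- Insert n distinct keys and report the elapsed CPU time. If the time
-- roughly quadruples whenever n doubles, growth is O(n^2).
timeInserts :: Int -> IO ()
timeInserts n = do
  t0 <- getCPUTime
  ht <- H.newHint (==) H.hashString n
  forM_ [1 .. n] $ \i -> H.insert ht ("word" ++ show i) i
  t1 <- getCPUTime
  printf "n = %7d: %6.2fs\n" n (fromIntegral (t1 - t0) / 1e12 :: Double)

main :: IO ()
main = mapM_ timeInserts [125000, 250000, 500000, 1000000]

Timing with getCPUTime is coarse, but it's the growth rate between the steps that matters, not the absolute numbers.

On Fri, Apr 20, 2012 at 06:03:19PM +0200, Radosław Szymczyszyn wrote: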
Thanks for your suggestions. Alas, they don't solve the problem.
As I was at work without the original data file, I repeated the test suggested by Karol Samborski with a file of 1 400 000 repetitions of "żyźniejszymi". It took about 3.5s, so I thought my problem had been solved. However, compiling with -O2 only makes a difference of ~2-3s, and I don't believe the laptop I used at home is *that much* slower than my Mac at work for running without optimization alone to explain such a great difference.
Now, I've just rerun the test with the original data file (still at work, so the comparison with 3.5s is fair). I started it at 17:26 and it's still running -- so the problem lies in the data set being hashed. I don't know why, but it seems:
- either to make a difference whether one specific word or many different words are hashed,
- or to matter whether just one slot or many slots of the HashTable are updated (though as I'm using newHint, the space should be preallocated).
A sketch of an experiment separating the two cases follows.
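For clarity, here is the kind of experiment I have in mind (just a sketch, not my actual program: it assumes a plain lookup/insert/update loop over the old Data.HashTable, and both word lists are made up):

import qualified Data.HashTable as H
import Control.Monad (forM_, void)
import System.CPUTime (getCPUTime)
import Text.Printf (printf)

-- A lookup/insert/update loop: "same word" hammers a single key over
-- and over, "many words" spreads n distinct keys across the table.
count :: Int -> [String] -> IO ()
count hint ws = do
  ht <- H.newHint (==) H.hashString hint
  forM_ ws $ \w -> do
    mc <- H.lookup ht w
    case mc of
      Nothing -> H.insert ht w (1 :: Int)
      Just c  -> void (H.update ht w (c + 1))

timed :: String -> Int -> [String] -> IO ()
timed label n ws = do
  t0 <- getCPUTime
  count n ws
  t1 <- getCPUTime
  printf "%s: %6.2fs\n" label (fromIntegral (t1 - t0) / 1e12 :: Double)

main :: IO ()
main = do
  let n = 1400000
  timed "same word " n (replicate n "żyźniejszymi")
  timed "many words" n [ "w" ++ show i | i <- [1 .. n] ]

If the "many words" run is the slow one while "same word" stays fast, that would point at the hash function clustering distinct words into few buckets.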
Either way, I would be grateful if you, Karol, or somebody else could rerun the test with the original data. It's available at: http://ernie.icslab.agh.edu.pl/~lavrin/formy.utf8.gz
Thanks for your time!
Regards,
Radek Szymczyszyn
--
Lorenzo Bolla
http://lbolla.info