
I seem to recall that Data.HashTable is quite slow and not recommended for use. Try using Data.HashMap.Lazy/Strict from the unordered-containers package instead. -Brent On Fri, Apr 20, 2012 at 11:33:51AM +0200, Radosław Szymczyszyn wrote:
Hello fellow Haskellers,
It's my first post to this list, so at first: Greetings to everyone!
I've wanted to pick up Haskell for some time and I've run in to a perfect (at least it seemed so) oocasion at my Natural Language Processing course. At first I tried using Python, but it turned out that some extra performance wouldn't hurt and I hoped Haskell would give me some boost without forcing me to sacrifice high-level expressiveness at the same time.
The first lesson was to use ByteStrings instead of ordinary Strings for IO. Then I needed to build a dictionary of words found in the source text. That's the problematic part - I've tried using a few structures, namely Data.Map and Data.HashTable from base package and Data.HashMap from unordered-containers. In all cases I couldn't reach acceptable performance. As of that, I'd like to ask about some advice and comments about the following examples.
First example [1] just builds a HashTable of ~1 500 000 ByteStrings. The program takes about 11 seconds to execute. Second example [2] does roughly the same, but taking the HashTable data from a file (just a dictionary of valid Polish forms). Reading data itself is not a problem, as I've measured the execution time of a program which just reads the file (and prints its last line to actually force the read, hence "handle" Haskell's laziness) and it takes no more than a second. Coupled together reading and building, however, takes unbearably long. I didn't wait till the program stopped and just terminated it, but it had been running for ~10 minutes already.
The most disappointing fact is that a roughly equivalent (and rather effortless to write in contrast to the Haskell example - I'm just learning though) Python test which just reads everything into a dict takes about 10 seconds to execute (reading and updating the dict of course). Writing the spellchecker in Python seemed pretty straightforward for me (but its certainly not the only language I know; I've had some Java (who didn't nowadays), system level C and am daily working in Erlang). I understand of course, that the most convenient tool is generally the one you know the best, but my experience writing the Haskell equivalent reached both extremes - effortless expression of the logic, but frustration because of the unexplainable (for me at least) performance behaviour.
The question is why is it so slow? (or AM I RILY DAT DUMB? ;)
The link to examples: [1] https://gist.github.com/2411699 TestHashTableArtificial [2] https://gist.github.com/2411699 TestReadIntoHashTable
I'm looking forward to somebody shedding some light on this. Thanks in advance!
Best regards, Radek Szymczyszyn
_______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners