
Hi folks, I'm competing in a contest at work, and we're allowed to use whatever language we want. I decided this was my chance to prove to people that Haskell was up to the challenge. Unfortunately, I ran into performance problems. Since the contest ends this Friday, I've decided to switch to C++ (gasp!). But if any of you have advice on how to speed up this code, it could help me advocate Haskell in the future. It's supposed to match movie titles from an imported database to a reference database. The version I've sent doesn't do anything very smart - it's just doing literal title matches. The first argument to the program is the filename of the table to be imported, and the second is the filename of the reference table. The first line of each table is a pipe-separated list of field names; the rest of the lines are records, each a pipe-separated list of values. The import files each have 3,000 records, and the reference table has 137,986 records. Building the hash tables out of the files is quick - it just takes a few seconds. But doing the matching of title_id in one table to title_id in the other, in a nested loop between both tables, takes way too long. It's matching two import titles (against each of the reference titles) per second. It needs to do at least 20 per second to qualify for the contest, and it's not doing anything fancy yet. I tried various "improvements" to speed it up. One was to specifically use ByteString, eliminating the AbsString class. Didn't make a difference. Another was to use arrays instead of lists to store each record, and precompute the indices of each of the fields within those records. I also iterated over a list of keys instead of the list of Maps, and only converted each record to a Map one at a time, hoping they would be disposed of sooner. Instead of speeding up the program, this slowed it down by a factor of 20! I've profiled it, and I can't make much out of that. It seemed to be spending 25% of its time doing scoring, and I though the problem must be due to laziness, but I'm not sure. So if anyone has any ideas how to speed this up by a factor of at least 10 times, it would be really appreciated! Even the Ruby solutions are doing that, which is embarrassing. Thanks, Lyle