
In my first posting, I mentioned that I was going to try to translate some of our code to Haskell and see how it worked. Well, I don't have a stunning demonstration of the power of pure functional programming, but I do have an interesting problem. I chose to port a program that we use in our build system that takes a table of geographic data and groups the rows in the table according to [REDACTED]. The description of what the existing program does takes up only a few paragraphs, but the source code is eight pages of dense C++ that has obviously been optimized up the wazoo (and beyond the point where a mortal like myself can understand what's going on). On one of our servers, it can process 200,000 rows in about three minutes. My Haskell translation is three and a half pages of gorgeous lucid almost-entirely-functional code ... that in its first draft, took about three seconds to process 2,000 rows, eight minutes to process 20,000 rows, and overflowed a 1-MB stack when processing 200,000 rows. Oops. After banging together a version that took input from stdin instead of the database (it was easier to do that then to learn enough about Cabal to get HSQL recompiled with profiling), I profiled the code and observed that the most-called-upon function in the program (200 million entries for those 20,000 rows) was structured like this: type AList = [(Key, [MetaThingies])] myFunction :: AList -> Thingie -> AList myFunction [] x = [(key x, [meta x])] myFunction ((k, v):tail) x | matchKeys k (key x) = case tryJoining v x of Nothing -> (k, v) : (myFunction tail x) Just v' -> (k', v') : tail where v' = bestKey k (key x) | otherwise = (k, v) : (myFunction tail x) I'm wondering if the slowness of the program can be attributed to that case statement in the middle--perhaps unwrapping the Maybe type returned by tryJoining is preventing the function from being properly tail-recursive, or something. (I tried making the list construction strict by replacing "(k, v) : (myFunction tail x)" et al. with "(:) (k, v) $! (myFunction tail x)", and it actually slowed the program down, so either I'm not understanding how to improve things with strictness or laziness isn't the problem here.) Any advice from the gurus?