tail-recursing through an associative list

12 Oct 2006

      In my first posting, I mentioned that I was going to try to translate
some of our code to Haskell and see how it worked.  Well, I don't have a
stunning demonstration of the power of pure functional programming, but
I do have an interesting problem.

I chose to port a program that we use in our build system that takes a
table of geographic data and groups the rows in the table according to
[REDACTED].  The description of what the existing program does takes up
only a few paragraphs, but the source code is eight pages of dense C++
that has obviously been optimized up the wazoo (and beyond the point
where a mortal like myself can understand what's going on).  On one of
our servers, it can process 200,000 rows in about three minutes.

My Haskell translation is three and a half pages of gorgeous lucid
almost-entirely-functional code ... that in its first draft, took about
three seconds to process 2,000 rows, eight minutes to process 20,000
rows, and overflowed a 1-MB stack when processing 200,000 rows.  Oops.

After banging together a version that took input from stdin instead of
the database (it was easier to do that then to learn enough about Cabal
to get HSQL recompiled with profiling), I profiled the code and observed
that the most-called-upon function in the program (200 million entries
for those 20,000 rows) was structured like this:

type AList = [(Key, [MetaThingies])]

myFunction :: AList -> Thingie -> AList
myFunction [] x = [(key x, [meta x])]
myFunction ((k, v):tail) x | matchKeys k (key x) =
                               case tryJoining v x of
                               Nothing -> (k, v) : (myFunction tail x)
                               Just v' -> (k', v') : tail
                                   where v' = bestKey k (key x)
                           | otherwise = (k, v) : (myFunction tail x)

I'm wondering if the slowness of the program can be attributed to that
case statement in the middle--perhaps unwrapping the Maybe type returned
by tryJoining is preventing the function from being properly
tail-recursive, or something.

(I tried making the list construction strict by replacing "(k, v) :
(myFunction tail x)" et al. with "(:) (k, v) $! (myFunction tail x)",
and it actually slowed the program down, so either I'm not understanding
how to improve things with strictness or laziness isn't the problem here.)

Any advice from the gurus?

Seth Gordon

Malcolm Wallace

apfelmus＠quantentunnel.de

Seth Gordon

Jason Dagit

tags

participants (4)