
Malcolm Wallace wrote:
Seth Gordon
wrote: almost-entirely-functional code ... that in its first draft, took about three seconds to process 2,000 rows, eight minutes to process 20,000 rows, and overflowed a 1-MB stack when processing 200,000 rows. Oops.
Which just goes to show that your algorithm is non-linear in the size of the input. I doubt that your posted snippet is the cause of the problem, since it is certainly linear in the AList it is given.
The linear time in the length of AList may well be the problem :) You seem to use AList as a key-value mapping (I know the word "associative" only as mathematical property, please correct me). The Key acts as key for the grouping you mentioned and the [MetaThingie] is the actual group of MetaThingie, is that so? That means that with each call to myFunction, the AList roughly grows by one element. For logarithmic access times, you should use a binary search tree like Data.Map or similar. The problem in your case could be that matchKeys is only approximate and your keys cannot be ordered in suitable fasion. Then you need a clever algorithm which somehow exploits extra structure of the keys (perhaps they are intervals, then you can use interval trees etc.). The C++ code is likely to do some magic along these lines. In such case, stunning functional power may come from Finger Trees: A Simple General-purpose Data Structure Ralf Hinze and Ross Paterson. in Journal of Functional Programming16:2 (2006), pages 197-217 http://www.soi.city.ac.uk/~ross/papers/FingerTree.pdf
I profiled the code and observed that the most-called-upon function in the program (200 million entries for those 20,000 rows)
By optimisation, you can only make this function a constant factor faster. You need to work out how to call it less often in the first place.
Almost true. I think that recursive calls are counted as proper call, so that each top level call to myFunction will result a count of calls to myFunction linear in the length of AList. Thus "in first place" alias top level is not enough.
type AList = [(Key, [MetaThingies])]
myFunction :: AList -> Thingie -> AList myFunction [] x = [(key x, [meta x])] myFunction ((k, v):tail) x | matchKeys k (key x) = case tryJoining v x of Nothing -> (k, v) : (myFunction tail x) Just v' -> (k', v') : tail where v' = bestKey k (key x) should be (?) where k' = bestKey k (key x) | otherwise = (k, v) : (myFunction tail x)
Regards, apfelmus