
One of Alan Perlis's "Epigrams on Programming" is "A language that doesn't affect the way you think about programming, is not worth knowing". I recently had an experience that demonstrated this principle.

I had to write some code that took a polygon (encoded in WKT, a standard format for geographic information exchange), computed its intersection with several rectangles, and then computed the centroid of each of these intersections. No problem, I thought. People who know more than me about this geometry stuff have written an open-source library, OGR, which has functions for parsing WKT, computing intersections, computing centroids, and computing all sorts of other things. And there are Python bindings for OGR, so I don't have to wrestle with C++ to solve my problem.

So I wrote a Python program that used OGR to crunch the numbers, ran it through the data set we needed to process, and...it took over sixteen hours to run. Since we want to have this program process this data set with every build of our software, adding sixteen hours to our build time was not a pleasant thought.

Then I thought about some of the things I'd read about optimization of functional programming languages, and said, "Hey! The algorithm for clipping the polygon to the rectangle is producing a sequence of vertices, and the algorithm for computing the centroid is consuming a sequence of vertices. I can do some deforestation here!"

It took me a week to figure out the right algorithm for combining these two procedures and write some almost-working code that implemented it. It took a co-worker of mine another few days to find the bugs that had eluded me. But after the program worked for my test cases, I ran it again on the full data set...and it took ten minutes. In other words, I sped up the code by two orders of magnitude. And I didn't even have to drop into C++ to do the number-crunching; I used OGR to parse the WKT strings and to look up the coordinates of each vertex of the polygon, but the algorithm itself was implemented in Python. w00t!
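
To make the deforestation concrete, here is a minimal Haskell sketch (not my actual code, which was Python) of the consumer side: the centroid recurrence written as a single fold, so a clipper can feed vertices straight into the running sums instead of building an intermediate list. It assumes the standard shoelace-based centroid formula and a non-empty vertex list:

    import Data.List (foldl')

    type Pt = (Double, Double)

    -- Centroid of a simple polygon as one strict left fold.
    -- Accumulator: previous vertex, 2*signed area, and the two
    -- centroid numerators.
    centroid :: [Pt] -> Pt
    centroid []      = error "centroid: empty polygon"
    centroid (v0:vs) = finish (foldl' step (v0, 0, 0, 0) (vs ++ [v0]))
      where
        step ((x1, y1), a, sx, sy) (x2, y2) =
          let c = x1*y2 - x2*y1   -- cross term for this edge
          in ((x2, y2), a + c, sx + (x1 + x2)*c, sy + (y1 + y2)*c)
        finish (_, a, sx, sy) = (sx / (3*a), sy / (3*a))
          -- a is twice the signed area, so dividing by 6A is dividing by 3a

(Sanity check: centroid [(0,0), (1,0), (1,1), (0,1)] comes out to (0.5, 0.5).)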

On Tue, 14 Nov 2006, Seth Gordon wrote:
> It took me a week to figure out the right algorithm for combining these two procedures and write some almost-working code that implemented it. It took a co-worker of mine another few days to find the bugs that had eluded me.

Were these bugs of the kind that would have been caught by a static type checker, say Haskell's?

Henning Thielemann wrote:
> On Tue, 14 Nov 2006, Seth Gordon wrote:
>> It took me a week to figure out the right algorithm for combining these two procedures and write some almost-working code that implemented it. It took a co-worker of mine another few days to find the bugs that had eluded me.
> Were these bugs of the kind that would have been caught by a static type checker, say Haskell's?
The most significant bug that my co-worker caught was in the math, not the implementation--when translating the centroid formula from big-sigma notation to a recurrence, I accidentally wrote (x_i x_{i+1}) when I should have written (x_i + x_{i+1}). My stupidity surpasses the stupidity-catching power of the type checker.

There were less significant bugs that probably would have been caught by static type checking, but in this case I think the unit tests caught them just as quickly.

The "producer" part of the clip-and-centroid algorithm generates 0, 1, or 2 vertices for every vertex of the original polygon, which are then fed into the "consumer" part to compute the centroid. It did cross my mind as I was writing the Python code that this sort of thing could be handled much more concisely with the List monad, but doing the whole thing in Haskell would have required (a) writing Haskell glue code to either interface with the OGR library or parse WKT strings, and (b) convincing my co-workers that Haskell was *so* incredibly useful that it was worth adding another language as a build-time dependency. (I'm laying the groundwork for (b) in my Copious Free Time....)
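
For reference, the big-sigma form I was translating (the standard centroid formula for a simple polygon, with indices mod n so that vertex n is vertex 0):

    A   = \frac{1}{2}  \sum_{i=0}^{n-1} (x_i y_{i+1} - x_{i+1} y_i)
    C_x = \frac{1}{6A} \sum_{i=0}^{n-1} (x_i + x_{i+1}) (x_i y_{i+1} - x_{i+1} y_i)
    C_y = \frac{1}{6A} \sum_{i=0}^{n-1} (y_i + y_{i+1}) (x_i y_{i+1} - x_{i+1} y_i)

The bug was dropping the "+" from the (x_i + x_{i+1}) factor in C_x.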
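
The List-monad shape I had in mind looks roughly like this (a toy sketch, not the real clipper: it clips each edge against the single half-plane x >= 0, but it shows how each edge's 0, 1, or 2 output vertices concatenate for free, since >>= on lists is just concatMap):

    type Pt = (Double, Double)

    -- Toy producer step: clip one directed edge (p, q) against the
    -- half-plane x >= 0.  Each edge yields 0, 1, or 2 vertices.
    emit :: Pt -> Pt -> [Pt]
    emit (x1, y1) q@(x2, y2)
      | x1 >= 0 && x2 >= 0 = [q]          -- both inside: keep the endpoint
      | x1 >= 0            = [cross]      -- leaving: just the exit point
      | x2 >= 0            = [cross, q]   -- entering: entry point, then q
      | otherwise          = []           -- both outside: nothing
      where cross = (0, y1 + (y2 - y1) * (0 - x1) / (x2 - x1))

    -- The List monad strings the per-edge results together.
    clipped :: [Pt] -> [Pt]
    clipped poly = zip poly (drop 1 poly ++ take 1 poly) >>= uncurry emit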

As Lily Tomlin would say, neVERmind.

Simon P-J asked me, in email, whether the deforestation was the thing that actually made the program faster or whether it was just the thing that made me think about how to solve the problem. I realized that my fast program had *another* difference from the earlier, slower program: it was based on an algorithm that was specifically designed to clip polygons to rectangles, whereas OGR just had a function to compute the intersection between two arbitrary polygons.

So I threw together a version that accumulated all the vertices of the clipped polygon in a list and then iterated through the list to compute the centroid--i.e., preserving everything from my fast program except the deforestation--and it ran at almost exactly the same speed as the deforested program. (Actually, it ran 2% faster, but that could be just random variation from things like the load on the database server.)

"The tragedy of science: a beautiful theory slain by an ugly fact."

"T. H. Huxley said it first."
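
To be concrete about "specifically designed to clip polygons to rectangles": one standard way is Sutherland-Hodgman clipping, one pass per rectangle edge, each pass just a coordinate comparison per vertex. A rough Haskell sketch (again, not the real code, which was Python against OGR):

    type Pt = (Double, Double)

    -- One Sutherland-Hodgman pass: clip a polygon against a single
    -- half-plane, given an inside test and an edge-crossing function.
    clipHalfPlane :: (Pt -> Bool) -> (Pt -> Pt -> Pt) -> [Pt] -> [Pt]
    clipHalfPlane inside cross poly =
        concatMap emit (zip poly (drop 1 poly ++ take 1 poly))
      where
        emit (p, q)
          | inside p && inside q = [q]             -- both in: keep endpoint
          | inside p             = [cross p q]     -- leaving: exit point
          | inside q             = [cross p q, q]  -- entering: entry point + q
          | otherwise            = []              -- both out: nothing

    -- Clipping to an axis-aligned rectangle is four such passes.
    clipRect :: (Double, Double, Double, Double) -> [Pt] -> [Pt]
    clipRect (xlo, ylo, xhi, yhi) =
        clipHalfPlane (\(x, _) -> x >= xlo) (atX xlo)
      . clipHalfPlane (\(x, _) -> x <= xhi) (atX xhi)
      . clipHalfPlane (\(_, y) -> y >= ylo) (atY ylo)
      . clipHalfPlane (\(_, y) -> y <= yhi) (atY yhi)
      where
        atX x (xa, ya) (xb, yb) = (x, ya + (yb - ya) * (x - xa) / (xb - xa))
        atY y (xa, ya) (xb, yb) = (xa + (xb - xa) * (y - ya) / (yb - ya), y)

The version without deforestation materializes clipRect's output and folds over it afterwards; the deforested version feeds emit's results straight into the centroid sums. Same answers either way, and, it turns out, nearly the same speed.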

Don't knock it! Using a functional language helped you to think about the problem in a new way, and throw together a prototype that worked in a short enough time that you actually did it.

A merit of fp is, I think, that you can explore the algorithm design space much more quickly -- and algorithms are the dominant factor in the performance equation.

Simon
participants (3): Henning Thielemann, Seth Gordon, Simon Peyton-Jones