
Hello, in order to tune my brain to "functional thinking" I've decided to start writing some "real" programs. After having spent a couple of days figuring out how to translate an imperative algorithm into stateless haskell I beleive I've now managed a simple implementation of the DBSCAN clustering algorithm. http://privatepaste.com/e6bb4fb665 This code has several drawbacks (and probably it doesn't work correctly after all) but I would like to tune its performance. Obviously the code calls several functions multiply with the same arguments. Haskell, being stateless should thus always yield the same resuts, correct? In this case performing the same calculations multiply seems pointless. My question: How would you go about improving this code? In any imperative language I would simply cache distances/neighborhoods in a matrix. Obviously this is not what the way to go in haskell? Thank you very much

I'm sorry the Thread title is not very helpful... I hope this is better
Hello,
in order to tune my brain to "functional thinking" I've decided to start writing some "real" programs. After having spent a couple of days figuring out how to translate an imperative algorithm into stateless haskell I beleive I've now managed a simple implementation of the DBSCAN clustering algorithm.
http://privatepaste.com/e6bb4fb665
This code has several drawbacks (and probably it doesn't work correctly after all) but I would like to tune its performance. Obviously the code calls several functions multiply with the same arguments. Haskell, being stateless should thus always yield the same resuts, correct? In this case performing the same calculations multiply seems pointless.
My question: How would you go about improving this code? In any imperative language I would simply cache distances/neighborhoods in a matrix. Obviously this is not what the way to go in haskell?
Thank you very much
_______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners

Robert Heumüller
in order to tune my brain to "functional thinking" I've decided to start writing some "real" programs. After having spent a couple of days figuring out how to translate an imperative algorithm into stateless haskell I beleive I've now managed a simple implementation of the DBSCAN clustering algorithm.
When asking people to review your code, it would be very helpful to write type signatures for your top-level functions. This alone makes code greatly more readable and also tells a lot about the quality right away. I advice you to get used to writing type signatures for yourself as well. Greets, Ertugrul -- Not to be or to be and (not to be or to be and (not to be or to be and (not to be or to be and ... that is the list monad.

That's a great start, thanks ;)
Am Tue, 19 Jun 2012 17:01:26 +0200
schrieb Ertugrul Söylemez
Robert Heumüller
wrote: in order to tune my brain to "functional thinking" I've decided to start writing some "real" programs. After having spent a couple of days figuring out how to translate an imperative algorithm into stateless haskell I beleive I've now managed a simple implementation of the DBSCAN clustering algorithm.
When asking people to review your code, it would be very helpful to write type signatures for your top-level functions. This alone makes code greatly more readable and also tells a lot about the quality right away. I advice you to get used to writing type signatures for yourself as well.
Greets, Ertugrul

Hi again,
I've added the type signatures. What's next?
http://privatepaste.com/26073e4b0e
Thanks
Am Tue, 19 Jun 2012 17:01:26 +0200
schrieb Ertugrul Söylemez
Robert Heumüller
wrote: in order to tune my brain to "functional thinking" I've decided to start writing some "real" programs. After having spent a couple of days figuring out how to translate an imperative algorithm into stateless haskell I beleive I've now managed a simple implementation of the DBSCAN clustering algorithm.
When asking people to review your code, it would be very helpful to write type signatures for your top-level functions. This alone makes code greatly more readable and also tells a lot about the quality right away. I advice you to get used to writing type signatures for yourself as well.
Greets, Ertugrul

Robert Heumüller
I've added the type signatures. What's next?
Great, now that makes it much easier to reason about performance. A quick look at the type signatures tells me that your next step is to use the right tool for the job. In Haskell this basically means to use the right data and control structures. First of all know when not to use lists. A good intuition is to ask yourself whether the lists will actually live in memory. If no, then lists are fine. If yes (like in your case), examine your usage. If you use them like stacks (working on the head only) that's probably fine (it's not in your case). However, if you fold and unfold a lot or index individual elements you probably want a different data structure like a vector or a map. This is a somewhat involved topic and needs a lot of training, but making the right decision here will give you the main performance boost. Note: Whatever you do, stay in pure land. Always use immutable data structures. If your code is too slow, you may be using the right data structure the wrong way. Going mutable is almost always the wrong way. Again: It takes some training, especially in Haskell where (as a beginner) you may not see directly what is actually happening. Also find spots where sharing can be used. Unshared example: f x' + f x' Shared version: let x = f x' in x + x Haskell compilers don't do that automatically for you, because sharing trades memory for run-time performance, and the outcome is hard to predict for a compiler. I suggest experimenting. Greets, Ertugrul -- Not to be or to be and (not to be or to be and (not to be or to be and (not to be or to be and ... that is the list monad.

Hi again, hoping to further increase performance i've decided to calculate the distances between all points ahead of time and store the distance values in a map of maps. Because of the symmetry of the matrix this should work in O(0.5*n^2). Could you please take a look at the code and tell me if this would be the way to go? I fear that i might actually loose performance due to the lookuptime of the treemap compared to recalculating the distance between two points? Thank you very much :) import qualified Data.Map as Map data Point a = Pt a a deriving (Show, Eq, Ord) manhattan (Pt x y) (Pt x' y') = abs (x' - x) + abs (y' - y) dist' :: Num a => Point a -> [Point a] -> [(Point a, a)] dist' p [] = [] dist' p l = zip l (map (manhattan p) l) dist :: (Num a, Ord a) => Point a -> [Point a] -> Map.Map (Point a) a dist p l = Map.fromList (dist' p l) distmatrix :: (Num a, Ord a) => [Point a] -> Map.Map (Point a) (Map.Map (Point a) a) distmatrix (p:[]) = Map.empty distmatrix (p:ps) = Map.insert p (dist p ps) (distmatrix ps) getdist' :: (Num a, Ord a) => Point a -> Point a -> Map.Map (Point a) (Map.Map (Point a) a) -> a getdist' p1 p2 dm = (dm Map.! p1) Map.! p2 getdist :: (Num a, Ord a) => Point a -> Point a -> Map.Map (Point a) (Map.Map (Point a) a) -> a getdist p1 p2 dm | Map.member p1 dm = getdist p1 p2 dm | otherwise = getdist' p2 p1 dm

On Sun, Jul 01, 2012 at 12:38:14PM +0200, Robert Heumüller wrote:
Hi again,
hoping to further increase performance i've decided to calculate the distances between all points ahead of time and store the distance values in a map of maps. Because of the symmetry of the matrix this should work in O(0.5*n^2). Could you please take a look at the code and tell me if this would be the way to go? I fear that i might actually loose performance due to the lookuptime of the treemap compared to recalculating the distance between two points?
Indeed, since calculating the distance between two points is so simple (just a few arithmetic operations) I really doubt you will get much speedup by caching the distances, especially if you have a lot of points. But, as always, it's really impossible to predict for sure---you'll have to do some profiling. -Brent
participants (3)
-
Brent Yorgey
-
Ertugrul Söylemez
-
Robert Heumüller