
Dear Group, I've spend the last few days figuring out the solution to Euler Problem 201 in haskell. I first tried a relatively elegant approach based on Data.Map but the performance was horrible. I never actually arrived at the answer. I then rewrote the same algorithm using STUArrays and it was lightning. I have posted both versions of the code at: http://www.maztravel.com/haskell/euler_problem_201.html and would appreciate any insights that you master haskellers can provide on why the speed difference is so huge. Thanks in advance. Henry Laxen

nadine.and.henry@pobox.com wrote:
Dear Group,
I've spend the last few days figuring out the solution to Euler Problem 201 in haskell. I first tried a relatively elegant approach based on Data.Map but the performance was horrible. I never actually arrived at the answer. I then rewrote the same algorithm using STUArrays and it was lightning. I have posted both versions of the code at: http://www.maztravel.com/haskell/euler_problem_201.html and would appreciate any insights that you master haskellers can provide on why the speed difference is so huge. Thanks in advance. Henry Laxen
First, you may want to change the map type to type SumMap = Map (Int,Int) Int since you're working with pairs (length, sum), not lists. I mean, you're doing the same with STUArray (Int,Int) Int . Did you try to estimate the running time of both data structures? Calculating the number of big-O operations on the back of an envelope is a very good guideline. So, Data.Map.insert takes O(log (size of map)) operations and so on. A rule of thumb is that a computer can perform 10 million "operations" per second (maybe 100, that was five years ago :)). Granted, this rule works best for C programs whereas Haskell is quite sensitive to constant factors, in particular concerning memory and cache effects. So, the rule is pretty accurate for an STUArray but you may have to multiply with 10 to get the right order of magnitude for Data.Map. As you have noted, the choice of data structure (Map, STUArray, something else) is important (Map only touches existing sums, but STUArray has O(1) access and uses a tight representation in memory). But in the following, I want to discuss something what you did implicitly, namely how to *calculate* the general algorithm in a mechanical fashion. This follows the lines of Richard Bird's work, of which the book "Algebra of Programming" http://web.comlab.ox.ac.uk/oucl/research/pdt/ap/pubs.html#Bird-deMoor96:Alge... is one of the cornerstones. The systematic derivation of dynamic programming algorithms has been rediscovered in a more direct but less general fashion in http://bibiserv.techfak.uni-bielefeld.de/adp/ Euler problem 201 asks to calculate the possible sums you can form with 50 elements from the set of square numbers from 1^2 to 100^2. Hence, given a function subsets [] = [[]] subsets (x:xs) = map (x:) (subsets xs) ++ subsets xs that returns all subsets of a set, we can implement a solution as follows: squares = map (^2) [1..100] euler201 = map sum . filter ((==50) . length) . subsets $ squares While hopelessly inefficient, this solution is obviously correct! In fact, we did barely more than write down the task. Ok ok, the solution is *not correct* because map sum may generate *duplicates*. In other words, subsets generates a lot of sets that have the same sum. But that's the key point for creating a better algorithm: we could be a lot faster if merging subsets with the same sum and generating these subsets could be interleaved. To that end, we first have to move the length filter to after the summation: map sum . filter ((==50) . length) = map snd . filter ((==50) . fst) . map (length &&& sum) The function (&&&) is very useful and defined as (length &&& sum) xs = (length xs, sum xs) You can import (a generalization of) of it from Control.Arrow. In other words, our solution now reads euler201 = map snd . filter ((==50) . fst) . subsums $ squares where subsums = map (length &&& sum) . subsets and our task is to find a definition of subsums that fuses summation and subset generation. But this is a straightforward calculation! Let's assume that we have an implementation of Sets that we can use for merging duplicates. In other words, we assume operations singleton :: a -> Set a union :: Set a -> Set a -> Set a map :: (a -> b) -> Set a -> Set b so that subsets becomes subsets [] = singleton [] subsets (x:xs) = map (x:) (subsets xs) `union` subsets xs Now, let's calculate: subsums [] = { definition } map (length &&& sum) (subsets []) = { subsets } map (length &&& sum) (singleton []) = { map } singleton ((length &&& sum) []) = { length &&& sum } singleton (0,0) subsums (x:xs) = { definition } map (length &&& sum) (subsets (x:xs)) = { subsets } map (length &&& sum) (map (x:) (subsets xs) `union` subsets xs) = { map preserves unions } map (length &&& sum) (map (x:) subsets xs) `union` map (length &&& sum) (subsets xs) = { map fusion } map (length &&& sum . (x:)) (subsets xs) `union` map (length &&& sum) (subsets xs) = { move (length &&& sum) to the front, see footnote } map ((\(n,s) -> (n+1,s+x)) . (length &&& sum)) (subsets xs) `union` map (length &&& sum) (subsets xs) = { reverse map fusion } map (\(n,s) -> (n+1,s+x)) (map (length &&& sum) (subsets xs)) `union` map (length &&& sum) (subsets xs) = { reverse definition of subsums } map (\(n,s) -> (n+1,s+x)) (subsums xs) `union` subsums xs In other words, we have now calculated the more efficient program euler201 = map snd . filter ((==50) . fst) . subsums $ squares where subsums [] = singleton (0,0) subsums (x:xs) = map (\(n,s) -> (n+1,s+x)) (subsums xs) `union` subsums xs Of course, we still need an efficient implementation for sets of (length, sum) pairs. Henry has already explored the two possibilities Set (Int,Int) and STUArray (Int,Int) a bit, but there are others, like IntMap Int [Int] or sorted lists. (Strictly speaking, Henry has explored something different but similar, what is it?). Regards, apfelmus Footnote: We still have to prove the identity (length &&& sum) . (x:) = (\(n,s) -> (n+1,s+x)) . (length &&& sum) I mean, you can figure this out in your head, but a formal calculation best proceeds with the two identities length . (x:) = (1+) . length -- definition of length sum . (x:) = (x+) . sum -- definition of sum and the observation (f &&& g) . h = (f . h &&& g . h) = (hf . f &&& hg . g) -- assuming hf . f = f . h and hg . g = g . h = (hg *** hf) . (f &&& g) where (***) is yet another handy function from Control.Arrow with the definition (f *** g) (x,y) = (f x, g y)

apfelmus wrote:
In other words, we have now calculated the more efficient program
euler201 = map snd . filter ((==50) . fst) . subsums $ squares where subsums [] = singleton (0,0) subsums (x:xs) = map (\(n,s) -> (n+1,s+x)) (subsums xs) `union` subsums xs
I forgot something very important, namely that the common subexpression subsums xs has to be shared euler201 = map snd . filter ((==50) . fst) . subsums $ squares where subsums [] = singleton (0,0) subsums (x:xs) = let s = subsums xs in map (\(n,s) -> (n+1,s+x)) s `union`s Otherwise, this exercise would be pointless and the runtime exponential ... :O Regards, apfelmus
participants (2)
-
apfelmus
-
nadine.and.henry@pobox.com