
Hi,

I'm going through the first chapters of the Real World Haskell book, so I'm still a complete newbie, but today I was hoping I could compute the following function in Haskell, for large numbers (n > 10^8):

f(n) = max(n, f(n/2) + f(n/3) + f(n/4))

I've seen examples of memoization in Haskell to solve Fibonacci numbers, which involved computing (lazily) all the Fibonacci numbers up to the required n. But in this case, for a given n, we only need to compute very few intermediate results.

How could one go about solving this efficiently with Haskell?

Thanks in advance,
Ángel de Vicente
--
http://www.iac.es/galeria/angelv/
High Performance Computing Support PostDoc
Instituto de Astrofísica de Canarias

On Thursday 08 July 2010 23:30:05, Angel de Vicente wrote:
Hi,
I'm going through the first chapters of the Real World Haskell book, so I'm still a complete newbie, but today I was hoping I could compute the following function in Haskell, for large numbers (n > 10^8):
f(n) = max(n, f(n/2) + f(n/3) + f(n/4))
You need some base case or you'll have infinite recursion.
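For reference, the recurrence written out naively looks like this (a sketch, assuming the base case f 0 = 0 and integer division; f0 is just a placeholder name, and with no memoisation the same subproblems get recomputed over and over):

f0 :: Integer -> Integer
f0 0 = 0
f0 n = max n (f0 (n `quot` 2) + f0 (n `quot` 3) + f0 (n `quot` 4))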
I've seen examples of memoization in Haskell to solve fibonacci numbers, which involved computing (lazily) all the fibonacci numbers up to the required n. But in this case, for a given n, we only need to compute very few intermediate results.
How could one go about solving this efficiently with Haskell?
If f has the appropriate type and the base case is f 0 = 0,

module Memo where

import Data.Array

f :: (Integral a, Ord a, Ix a) => a -> a
f n = memo ! n
  where
    memo = array (0,n) $ (0,0) :
             [ (i, max i (memo!(i `quot` 2) + memo!(i `quot` 3) + memo!(i `quot` 4)))
             | i <- [1 .. n] ]

is wasteful regarding space, but it calculates only the needed values and is very simple.

(To verify:

module Memo where

import Data.Array
import Debug.Trace

f :: (Integral a, Ord a, Ix a) => a -> a
f n = memo ! n
  where
    memo = array (0,n) $ (0,0) :
             [ (i, max (trace ("calc " ++ show i) i)
                       (memo!(i `quot` 2) + memo!(i `quot` 3) + memo!(i `quot` 4)))
             | i <- [1 .. n] ]
)

You can also use a library (e.g. http://hackage.haskell.org/package/data-memocombinators) to do the memoisation for you.

Another fairly simple method to memoise is using a Map and State:

import qualified Data.Map as Map
import Control.Monad.State

f :: (Integral a) => a -> a
f n = evalState (memof n) (Map.singleton 0 0)
  where
    memof k = do
      mb <- gets (Map.lookup k)
      case mb of
        Just r  -> return r
        Nothing -> do
          vls <- mapM memof [k `quot` 2, k `quot` 3, k `quot` 4]
          let vl = max k (sum vls)
          modify (Map.insert k vl)
          return vl
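As a quick sanity check (a hypothetical GHCi session with the array-based Memo module above loaded; the values follow directly from the recurrence with f 0 = 0):

*Memo> f 0
0
*Memo> f 100
120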
Thanks in advance, Ángel de Vicente

On Friday 09 July 2010 00:10:24, Daniel Fischer wrote:
You can also use a library (e.g. http://hackage.haskell.org/package/data-memocombinators) to do the memoisation for you.
Well, actually, I think http://hackage.haskell.org/package/MemoTrie would be the better choice for the moment; data-memocombinators doesn't seem to offer the functionality we need out of the box.

On Thu, Jul 8, 2010 at 4:23 PM, Daniel Fischer wrote:
On Friday 09 July 2010 00:10:24, Daniel Fischer wrote:
You can also use a library (e.g. http://hackage.haskell.org/package/data-memocombinators) to do the memoisation for you.
Well, actually, I think http://hackage.haskell.org/package/MemoTrie would be the better choice for the moment; data-memocombinators doesn't seem to offer the functionality we need out of the box.
I'm interested to hear what functionality MemoTrie has that data-memocombinators does not. I wrote the latter in hopes that it would be strictly more powerful*.

Luke

* Actually MemoTrie wasn't around when I wrote that, but I meant the combinatory technique should be strictly more powerful than a typeclass technique. And data-memocombinators has many primitives, so I'm still curious.

On Friday 09 July 2010 01:03:48, Luke Palmer wrote:
On Thu, Jul 8, 2010 at 4:23 PM, Daniel Fischer wrote:
On Friday 09 July 2010 00:10:24, Daniel Fischer wrote:
You can also use a library (e.g. http://hackage.haskell.org/package/data-memocombinators) to do the memoisation for you.
Well, actually, I think http://hackage.haskell.org/package/MemoTrie would be the better choice for the moment; data-memocombinators doesn't seem to offer the functionality we need out of the box.
I'm interested to hear what functionality MemoTrie has that data-memocombinators does not. I wrote the latter in hopes that it would be strictly more powerful*.
It's probably my night-blindness, but on a short look at the docs I didn't see an immediate way to memoise a simple function, like

memo :: (ConstraintOn a) => (a -> b) -> a -> b

which Data.MemoTrie provides (together with memo2 and memo3, which data-memocombinators provides too).

Taking a closer look at the docs in daylight, I see data-mc provides that out of the box too, the stuff is just differently named (bool, char, integral, ...) - which I didn't expect. So you could take it as an indication that I'm visually impaired, or as an indication that the docs aren't as obvious as they could be.

Cheers,
Daniel
Luke
* Actually MemoTrie wasn't around when I wrote that, but I meant the combinatory technique should be strictly more powerful than a typeclass technique. And data-memocombinators has many primitives, so I'm still curious.
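For concreteness, here is roughly what the memoised function could look like with data-memocombinators, using the integral primitive mentioned above (a sketch only, not from the thread; fMemo is a hypothetical name, and the exact module and types should be checked against the package docs):

import qualified Data.MemoCombinators as Memo

fMemo :: Integer -> Integer
fMemo = Memo.integral go
  where
    -- Base case f 0 = 0; every recursive call goes back through fMemo,
    -- so intermediate results are looked up in the memo structure.
    go 0 = 0
    go n = max n (fMemo (n `quot` 2) + fMemo (n `quot` 3) + fMemo (n `quot` 4))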

Daniel Fischer wrote:
If f has the appropriate type and the base case is f 0 = 0,
module Memo where
import Data.Array
f :: (Integral a, Ord a, Ix a) => a -> a
f n = memo ! n
  where
    memo = array (0,n) $ (0,0) :
             [ (i, max i (memo!(i `quot` 2) + memo!(i `quot` 3) + memo!(i `quot` 4)))
             | i <- [1 .. n] ]
is wasteful regarding space, but it calculates only the needed values and is very simple.
Can someone explain to a beginner like me why this calculates only the needed values? The list comprehension draws from [1..n], so I don't understand why all those values wouldn't be computed.

Thanks,
Mike

On 7/8/10 9:17 PM, Michael Mossey wrote:
Daniel Fischer wrote:
If f has the appropriate type and the base case is f 0 = 0,
module Memo where
import Data.Array
f :: (Integral a, Ord a, Ix a) => a -> a
f n = memo ! n
  where
    memo = array (0,n) $ (0,0) :
             [ (i, max i (memo!(i `quot` 2) + memo!(i `quot` 3) + memo!(i `quot` 4)))
             | i <- [1 .. n] ]
is wasteful regarding space, but it calculates only the needed values and is very simple.
Can someone explain to a beginner like me why this calculates only the needed values? The list comprehension draws from 1..n so I don't understand why all those values wouldn't be computed.
The second component of each pair in the list will remain unevaluated until demanded --- it's the beauty of being a lazy language. :-) Put another way, although it might look like the list contains values (and technically it does, due to referential transparency), at a lower level what it actually contains are pairs whose second element is represented not by a number but rather by a thunk, a suspended computation that is run to obtain its value.

Cheers,
Greg
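A tiny, self-contained illustration of that point (not from the thread; hypothetical names, using Debug.Trace to make evaluation visible):

import Data.Array
import Debug.Trace (trace)

squares :: Array Int Int
squares = listArray (0, 9) [ trace ("forcing " ++ show i) (i * i) | i <- [0 .. 9] ]

-- Printing (squares ! 3) outputs "forcing 3" followed by 9; none of the
-- other entries are evaluated, they stay behind as unevaluated thunks.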

Thanks, okay the next question is: how does the memoization work? Each call to memo seems to construct a new array; if the same f(n) is encountered several times in the recursive branching, it would be computed several times. Am I wrong?

Thanks,
Mike

Gregory Crosswhite wrote:
On 7/8/10 9:17 PM, Michael Mossey wrote:
Daniel Fischer wrote:
If f has the appropriate type and the base case is f 0 = 0,
module Memo where
import Data.Array
f :: (Integral a, Ord a, Ix a) => a -> a
f n = memo ! n
  where
    memo = array (0,n) $ (0,0) :
             [ (i, max i (memo!(i `quot` 2) + memo!(i `quot` 3) + memo!(i `quot` 4)))
             | i <- [1 .. n] ]
is wasteful regarding space, but it calculates only the needed values and is very simple.
Can someone explain to a beginner like me why this calculates only the needed values? The list comprehension draws from 1..n so I don't understand why all those values wouldn't be computed.
The second component of each pair in the list will remain unevaluated until demanded --- it's the beauty of being a lazy language. :-) Put another way, although it might look like the list contains values (and technically it does, due to referential transparency), at a lower level what it actually contains are pairs whose second element is represented not by a number but rather by a thunk, a suspended computation that is run to obtain its value.
Cheers, Greg

You're correct in pointing out that f uses memoization inside of itself to cache the intermediate values that it computes, but those values don't get shared between invocations of f; thus, if you call f with the same value of n several times then the memo table might get reconstructed redundantly. (However, there are other strategies for memoization that are persistent across calls.)

Cheers,
Greg

On 7/8/10 9:59 PM, Michael Mossey wrote:
Thanks, okay the next question is: how does the memoization work? Each call to memo seems to construct a new array; if the same f(n) is encountered several times in the recursive branching, it would be computed several times. Am I wrong?

Thanks,
Mike
Gregory Crosswhite wrote:
On 7/8/10 9:17 PM, Michael Mossey wrote:
Daniel Fischer wrote:
If f has the appropriate type and the base case is f 0 = 0,
module Memo where
import Data.Array
f :: (Integral a, Ord a, Ix a) => a -> a
f n = memo ! n
  where
    memo = array (0,n) $ (0,0) :
             [ (i, max i (memo!(i `quot` 2) + memo!(i `quot` 3) + memo!(i `quot` 4)))
             | i <- [1 .. n] ]
is wasteful regarding space, but it calculates only the needed values and is very simple.
Can someone explain to a beginner like me why this calculates only the needed values? The list comprehension draws from 1..n so I don't understand why all those values wouldn't be computed.
The second component of each pair in the list will remain unevaluated until demanded --- it's the beauty of being a lazy language. :-) Put another way, although it might look like the list contains values (and technically it does, due to referential transparency), at a lower level what it actually contains are pairs whose second element is represented not by a number but rather by a thunk, a suspended computation that is run to obtain its value.
Cheers, Greg
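As a concrete illustration of sharing work across several queries (a sketch, not from the thread; fMany is a hypothetical name), Daniel's Map/State version can be run once for a whole list of arguments so the same Map is reused between them:

import qualified Data.Map as Map
import Control.Monad.State

fMany :: Integral a => [a] -> [a]
fMany ns = evalState (mapM memof ns) (Map.singleton 0 0)
  where
    -- Same memoised step as in the State-based version above; because all
    -- queries run in one State computation, the table persists between them.
    memof k = do
      mb <- gets (Map.lookup k)
      case mb of
        Just r  -> return r
        Nothing -> do
          vls <- mapM memof [k `quot` 2, k `quot` 3, k `quot` 4]
          let vl = max k (sum vls)
          modify (Map.insert k vl)
          return vl

For example, fMany [100, 200, 100] computes the table entries for 100 only once.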

Gregory Crosswhite wrote:
You're correct in pointing out that f uses memoization inside of itself to cache the intermediate values that it computes, but those values don't get shared between invocations of f; thus, if you call f with the same value of n several times then the memo table might get reconstructed redundantly. (However, there are other strategies for memoization that are persistent across calls.)
It should be

f = \n -> memo ! n
  where
    memo = ..

so that memo is shared across multiple calls like f 1, f 2, etc.

Regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com

That actually doesn't work as long as memo is an array, since then it has fixed size; you have to also make memo an infinitely large (but lazy) data structure so that it can hold results for arbitrary n. One option for doing this of course is to make memo an infinite list, but a more space- and time-efficient option is to use a trie like in MemoTrie.

Cheers,
Greg

On 7/9/10 12:50 AM, Heinrich Apfelmus wrote:
Gregory Crosswhite wrote:
You're correct in pointing out that f uses memoization inside of itself to cache the intermediate values that it computes, but those values don't get shared between invocations of f; thus, if you call f with the same value of n several times then the memo table might get reconstructed redundantly. (However, there are other strategies for memoization that are persistent across calls.)
It should be
f = \n -> memo ! n where memo = ..
so that memo is shared across multiple calls like f 1 , f 2 etc.
Regards, Heinrich Apfelmus
-- http://apfelmus.nfshost.com
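For concreteness, a minimal sketch (not from the thread; memoList and fShared are hypothetical names) of the infinite-list option mentioned above: the table is a top-level value, so it is shared across calls, and being lazy and unbounded it works for any non-negative argument, at the cost of O(n) indexing.

import Data.List (genericIndex)

memoList :: [Integer]
memoList = map g [0 ..]
  where
    g 0 = 0
    g n = max n ( memoList `genericIndex` (n `quot` 2)
                + memoList `genericIndex` (n `quot` 3)
                + memoList `genericIndex` (n `quot` 4) )

fShared :: Integer -> Integer
fShared n = memoList `genericIndex` n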

Gregory Crosswhite wrote:
Heinrich Apfelmus wrote:
Gregory Crosswhite wrote:
You're correct in pointing out that f uses memoization inside of itself to cache the intermediate values that it computes, but those values don't get shared between invocations of f; thus, if you call f with the same value of n several times then the memo table might get reconstructed redundantly. (However, there are other strategies for memoization that are persistent across calls.)
It should be
f = \n -> memo ! n where memo = ..
so that memo is shared across multiple calls like f 1 , f 2 etc.
That actually doesn't work as long as memo is an array, since then it has fixed size; you have to also make memo an infinitely large data (but lazy) structure so that it can hold results for arbitrary n. One option for doing this of course is to make memo be an infinite list, but a more space and time efficient option is to use a trie like in MemoTrie.
Oops, silly me! I erroneously thought that the code was using f instead of (memo !) in the definition of the array, like this:

f :: (Integral a, Ord a, Ix a) => a -> a
f n = memo ! n
  where
    memo = array (0,n) $ (0,0) :
             [ (i, max i (f (i `quot` 2) + f (i `quot` 3) + f (i `quot` 4)))
             | i <- [1 .. n] ]

But since memo depends on n, it cannot be lifted outside the lambda abstraction.

Regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com

On Thu, Jul 8, 2010 at 5:30 PM, Angel de Vicente wrote:
Hi,
I'm going through the first chapters of the Real World Haskell book, so I'm still a complete newbie, but today I was hoping I could compute the following function in Haskell, for large numbers (n > 10^8):
f(n) = max(n, f(n/2) + f(n/3) + f(n/4))
I've seen examples of memoization in Haskell to solve fibonacci numbers, which involved computing (lazily) all the fibonacci numbers up to the required n. But in this case, for a given n, we only need to compute very few intermediate results.
How could one go about solving this efficiently with Haskell?
We can do this very efficiently by making a structure that we can index in sub-linear time.
But first,
{-# LANGUAGE BangPatterns #-}
import Data.Function (fix)
Let's define f, but make it use 'open recursion' rather than call itself directly.
f :: (Int -> Int) -> Int -> Int
f mf 0 = 0
f mf n = max n $ mf (div n 2) + mf (div n 3) + mf (div n 4)
You can get an unmemoized f by using `fix f`. This will let you test that f does what you mean for small values of n by calling, for example: `fix f 123` = 144. We could memoize this by defining:
f_list :: [Int]
f_list = map (f faster_f) [0..]
faster_f :: Int -> Int
faster_f n = f_list !! n
That performs passably well, and replaces what was going to take O(n^3) time with something that memoizes the intermediate results. But it still takes linear time just to index to find the memoized answer for `mf`. This means that results like:

*Main Data.List> faster_f 123801
248604

are tolerable, but the result doesn't scale much better than that. We can do better! First let's define an infinite tree:
data Tree a = Tree (Tree a) a (Tree a)

instance Functor Tree where
  fmap f (Tree l m r) = Tree (fmap f l) (f m) (fmap f r)
And then we'll define a way to index into it, so we can find a node with index n in O(log n) time instead:
index :: Tree a -> Int -> a
index (Tree _ m _) 0 = m
index (Tree l _ r) n = case (n - 1) `divMod` 2 of
  (q, 0) -> index l q
  (q, 1) -> index r q
... and we may find a tree full of natural numbers to be convenient so we don't have to fiddle around with those indices:
nats :: Tree Int
nats = go 0 1
  where
    go !n !s = Tree (go l s') n (go r s')
      where
        l  = n + s
        r  = l + s
        s' = s * 2
Since we can index, you can just convert a tree into a list:
toList :: Tree a -> [a]
toList as = map (index as) [0..]
You can check the work so far by verifying that `toList nats` gives you [0..]. Now,
f_tree :: Tree Int
f_tree = fmap (f fastest_f) nats
fastest_f :: Int -> Int
fastest_f = index f_tree
works just like the list above, but instead of taking linear time to find each node, it can chase it down in logarithmic time. The result is considerably faster:

*Main> fastest_f 12380192300
67652175206
*Main> fastest_f 12793129379123
120695231674999

In fact it is so much faster that you can go through and replace Int with Integer above and get ridiculously large answers almost instantaneously:

*Main> fastest_f' 1230891823091823018203123
93721573993600178112200489
*Main> fastest_f' 12308918230918230182031231231293810923
11097012733777002208302545289166620866358

-Edward Kmett
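The Integer variant used in the last two calls isn't shown in the thread; a sketch of what it presumably looks like (the same code with Int replaced by Integer, and primed names to avoid clashes; it reuses the Tree type, the Functor instance, and the BangPatterns pragma from above):

fI :: (Integer -> Integer) -> Integer -> Integer
fI mf 0 = 0
fI mf n = max n $ mf (div n 2) + mf (div n 3) + mf (div n 4)

index' :: Tree a -> Integer -> a
index' (Tree _ m _) 0 = m
index' (Tree l _ r) n = case (n - 1) `divMod` 2 of
  (q, 0) -> index' l q
  (q, 1) -> index' r q

nats' :: Tree Integer
nats' = go 0 1
  where
    go !n !s = Tree (go l s') n (go r s')
      where
        l  = n + s
        r  = l + s
        s' = s * 2

f_tree' :: Tree Integer
f_tree' = fmap (fI fastest_f') nats'

fastest_f' :: Integer -> Integer
fastest_f' = index' f_tree'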

begin Edward Kmett quotation:
The result is considerably faster:
*Main> fastest_f 12380192300 67652175206
*Main> fastest_f 12793129379123 120695231674999
I just thought I'd point out that running with these particular values on a machine with a 32 bit Int will cause your machine to go deep into swap... Any constant greater than maxBound gets wrapped back to the negative side, causing havoc to ensue. I changed the open version of "f" to look like this to exclude negative values:

f :: (Int -> Int) -> Int -> Int
f mf 0 = 0
f mf n | n < 0     = error $ "Invalid n value: " ++ show n
f mf n | otherwise = max n $ mf (div n 2) + mf (div n 3) + mf (div n 4)

-md

Very true. I was executing the large Int-based examples on a 64 bit machine. You can of course flip over to Integer on either 32 or 64 bit machines and alleviate the problem with undetected overflow. Of course that doesn't help with negative initial inputs ;)

I do agree it is still probably a good idea to either filter the negative case like you do here, or, since it is well defined, extend the scope of the memo table to the full Int range by explicitly memoizing negative values as well.

-Edward Kmett
On Fri, Jul 9, 2010 at 11:51 AM, Mike Dillon wrote:
begin Edward Kmett quotation:
The result is considerably faster:
*Main> fastest_f 12380192300 67652175206
*Main> fastest_f 12793129379123 120695231674999
I just thought I'd point out that running with these particular values on a machine with a 32 bit Int will cause your machine to go deep into swap... Any constant greater than maxBound gets wrapped back to the negative side, causing havoc to ensue. I changed the open version of "f" to look like this to exclude negative values:
f :: (Int -> Int) -> Int -> Int
f mf 0 = 0
f mf n | n < 0     = error $ "Invalid n value: " ++ show n
f mf n | otherwise = max n $ mf (div n 2) + mf (div n 3) + mf (div n 4)
-md

Hi,

thanks for all the replies. I'm off now to try all the suggestions...

Cheers,
Ángel de Vicente
--
http://www.iac.es/galeria/angelv/
High Performance Computing Support PostDoc
Instituto de Astrofísica de Canarias
participants (8)
- Angel de Vicente
- Daniel Fischer
- Edward Kmett
- Gregory Crosswhite
- Heinrich Apfelmus
- Luke Palmer
- Michael Mossey
- Mike Dillon