Re: [Haskell-cafe] [newbie question] Memoization automatic in Haskell?

On Jan 12, 2008 10:54 PM, Henning Thielemann
On Sat, 12 Jan 2008, Hugh Perkins wrote:
I guess that Haskell's referential transparency means the answers to isPerfectSquare will be cached, i.e. automatically memoized? (not sure if that is the correct term?)
Interesting... but I don't understand... I thought that referential transparency meant that once the answer to a function has been calculated, it will always be the same, and that the interpreter can, and will, cache this answer? So, if I call f 20 once, for some arbitrary f, it will have to go away and calculate f 20, but if I call it multiple times, it will just return the value it already calculated?

On Jan 12, 2008, at 18:16 , Hugh Perkins wrote:
On Jan 12, 2008 10:54 PM, Henning Thielemann
wrote: On Sat, 12 Jan 2008, Hugh Perkins wrote:
Interesting... but I don't understand... I thought that referential transparency meant that once the answer to a function has been calculated, it will always be the same, and that the interpreter can, and will, cache this answer?
It *can* cache the answer, if it so chooses... but that often turns out to be a pessimization, as it caches values that are only used once or twice.
--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu
electrical and computer engineering, carnegie mellon university KF8NH

On 12 Jan 2008, at 3:16 PM, Hugh Perkins wrote:
On Jan 12, 2008 10:54 PM, Henning Thielemann
wrote: On Sat, 12 Jan 2008, Hugh Perkins wrote:
Interesting... but I don't understand... I thought that referential transparency meant that once the answer to a function has been calculated, it will always be the same,
Right.
and that the interpreter can,
Right.
and will,
Caching is not infrequently a terrible implementation strategy, for space reasons (and sometimes for time reasons as well). Deciding whether to use it is a delicate engineering tradeoff, and, while the compiler will do its best, the starting point is decided by the programmer. Lists get cached by default; functions over integers do not. The compiler may change that decision, but only after nailing down in triplicate that doing so is an improvement for every program subject to the optimization, which means it probably won't.
cache this answer?
jcc

On 12 Jan 2008, at 23:16, Hugh Perkins wrote:
On Jan 12, 2008 10:54 PM, Henning Thielemann
wrote: On Sat, 12 Jan 2008, Hugh Perkins wrote:
Interesting... but I don't understand... I thought that referential transparency meant that once the answer to a function has been calculated, it will always be the same, and that the interpreter can, and will, cache this answer?
So, if I call f 20 once, for some arbitrary f, it will have to go away and calculate f 20, but if I call it multiple times, it will just return the value it already calculated?
No, memoisation has its costs too... Suppose you wanted to compute map f [1..100000000000000]? Each time f was called, your program would look up a table of on average 50000000000000 results for f. That doesn't sound very efficient if f is a simple function. Now suppose you're running a program for several hours -- imagine how large your table would become, and how slow your lookup would be.

What you can do, however, is introduce constants. Constants are evaluated once and only once, so using them, you can tell the compiler exactly what should be memoised.

Bob
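The "constants" Bob describes are top-level bindings (CAFs), which GHC evaluates at most once and then shares. A minimal sketch of the idea, using a naive isPerfectSquare of my own invention (not the one from the original question):

```haskell
-- A top-level constant (CAF): the list is built lazily, but each
-- element, once forced, is shared by every subsequent use.
squares :: [Integer]
squares = map (^ 2) [0 ..]

-- Naive membership test: scan the sorted list of squares up to n.
-- The first square that is >= n equals n exactly when n is a square.
isPerfectSquare :: Integer -> Bool
isPerfectSquare n = n == head (dropWhile (< n) squares)
```

Repeated calls still re-scan the list, but the squares themselves are computed only once; that is the limited, explicit kind of memoization being described here.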

(While writing this message GMail told me I was too late to answer the
question. Oh well, as I already typed it, let's send =)
On Jan 12, 2008 9:16 PM, Hugh Perkins
Interesting... but I don't understand... I thought that referential transparency meant that once the answer to a function has been calculated, it will always be the same, and that the interpreter can, and will, cache this answer?
It *can*, but all Haskell implementations I know of do not. The reason is very simple: it would need a very large amount of memory, and sometimes searching to see whether the answer was already calculated could be worse than recalculating it (think of (+1), or perhaps null). Polymorphic functions would also complicate the matter, as multiple different caches would be needed.
So, if I call f( 20 ) once, for some, arbitrary, f, it will have to go away and calculate f(20), but if I call it multiple times, it will just return the value it already calculated?
If you do something like

  let x = f 20 in x + x

it *probably* will be calculated only once (although it could be calculated twice). But in the (bad) implementation of Fibonacci below,

  fib 0 = 0
  fib 1 = 1
  fib n = fib (n-1) + fib (n-2)

when calculating (fib 40), (fib 38) will be calculated twice, (fib 37) will be calculated thrice, etc.

-- Felipe.

On Sun, 13 Jan 2008, Hugh Perkins wrote:
On Jan 12, 2008 10:54 PM, Henning Thielemann
wrote: On Sat, 12 Jan 2008, Hugh Perkins wrote:
Interesting... but I don't understand... I thought that referential transparency meant that once the answer to a function has been calculated, it will always be the same, and that the interpreter can, and will, cache this answer?
Caching is not the default, but you can easily code this by yourself: Define an array and initialize it with all function values. Because of lazy evaluation the function values are computed only when they are requested and then they persist in the array. One should add this most simple case to the Wiki.
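The array recipe above can be spelled out for the usual Fibonacci example (a sketch; the bound maxN and the names are mine):

```haskell
import Data.Array (Array, listArray, (!))

-- Upper bound for the memo table; this sketch only supports
-- arguments up to maxN + 1 (the table has no fallback).
maxN :: Int
maxN = 1000

-- A lazy array: each cell starts as an unevaluated thunk, is computed
-- the first time it is requested, and then persists in the array.
fibTable :: Array Int Integer
fibTable = listArray (0, maxN) [ fib n | n <- [0 .. maxN] ]

-- The recursion reads earlier values back out of the table,
-- so each fib n is computed at most once.
fib :: Int -> Integer
fib 0 = 0
fib 1 = 1
fib n = fibTable ! (n - 1) + fibTable ! (n - 2)
```

With the naive doubly-recursive definition this would take exponential time; via the table it is linear in n.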

On 1/12/08, Henning Thielemann
Caching is not the default, but you can easily code this by yourself: Define an array and initialize it with all function values. Because of lazy evaluation the function values are computed only when they are requested and then they persist in the array.
But how can I implement memoization for a more complicated function? For example, perhaps I want to memoize f :: String -> Int -> Double -> String -> Bool In Python, it's pretty easy to memoize this. How can I do it in Haskell? I suspect the only way would involve changing the function signature to use IO or ST. It would be nice if I could just tell the compiler "I command you to memoize this function", and have it produce the required code automatically.

On 12 Jan 2008, at 3:30 PM, David Benbennick wrote:
On 1/12/08, Henning Thielemann
wrote: Caching is not the default, but you can easily code this by yourself: Define an array and initialize it with all function values. Because of lazy evaluation the function values are computed only when they are requested and then they persist in the array.
But how can I implement memoization for a more complicated function? For example, perhaps I want to memoize
f :: String -> Int -> Double -> String -> Bool
In Python, it's pretty easy to memoize this. How can I do it in Haskell? I suspect the only way would involve changing the function signature to use IO or ST.
It would be nice if I could just tell the compiler "I command you to memoize this function", and have it produce the required code automatically.
You can cache anything using mutable hash tables, as you know, and googling will find you functions in Haskell that do this for you. jcc
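The mutable-table route can be sketched as follows (my own memoIO helper, using Data.Map rather than a hash table for brevity; the names are assumptions). It works for any key type with an Ord instance, including the String -> Int -> ... example once the arguments are bundled into a tuple:

```haskell
import Data.IORef (newIORef, readIORef, modifyIORef')
import qualified Data.Map as Map

-- Wrap a pure function with a mutable Map-based cache. The wrapped
-- function lives in IO because it reads and updates the cache.
memoIO :: Ord a => (a -> b) -> IO (a -> IO b)
memoIO f = do
  ref <- newIORef Map.empty
  return $ \x -> do
    cache <- readIORef ref
    case Map.lookup x cache of
      Just y  -> return y            -- cache hit: reuse the old result
      Nothing -> do                  -- cache miss: compute and store
        let y = f x
        modifyIORef' ref (Map.insert x y)
        return y
```

Since the cached function is pure, some people wrap this in unsafePerformIO to recover a pure signature, but as noted elsewhere in the thread that is easy to get wrong.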

On Sat, 12 Jan 2008, David Benbennick wrote:
On 1/12/08, Henning Thielemann
wrote: Caching is not the default, but you can easily code this by yourself: Define an array and initialize it with all function values. Because of lazy evaluation the function values are computed only when they are requested and then they persist in the array.
But how can I implement memoization for a more complicated function? For example, perhaps I want to memoize
f :: String -> Int -> Double -> String -> Bool
There was a long thread about a sophisticated technique called "blue prints", which allows you to use binary search trees as memoizing structure. http://www.haskell.org/pipermail/haskell-cafe/2006-September/018204.html

Henning Thielemann wrote:
David Benbennick wrote:
But how can I implement memoization for a more complicated function? For example, perhaps I want to memoize
f :: String -> Int -> Double -> String -> Bool
There was a long thread about a sophisticated technique called "blue prints", which allows you to use binary search trees as memoizing structure. http://www.haskell.org/pipermail/haskell-cafe/2006-September/018204.html
That's not what blueprints were for. You want generalized tries here, as Luke pointed out:

  Ralf Hinze. Generalizing generalized tries.
  http://www.informatik.uni-bonn.de/~ralf/publications/GGTries.ps.gz

Regards, apfelmus

On Jan 12, 2008 11:30 PM, David Benbennick
On 1/12/08, Henning Thielemann
wrote: Caching is not the default, but you can easily code this by yourself: Define an array and initialize it with all function values. Because of lazy evaluation the function values are computed only when they are requested and then they persist in the array.
But how can I implement memoization for a more complicated function? For example, perhaps I want to memoize
f :: String -> Int -> Double -> String -> Bool
In Python, it's pretty easy to memoize this. How can I do it in Haskell? I suspect the only way would involve changing the function signature to use IO or ST.
No, that is one way to do it, and probably the easiest to think about. Because it's conceptually pure, I wouldn't be opposed to wrapping it in unsafePerformIO (but that can be, well, unsafe if you do it wrong :-) But there is a way to do it if you demand to be a purist, but only if you can code a data structure representing all values of a type. Doing this for a particular type is one of my favorite ways to spend a half hour when I'm bored :-) For an obvious case, but to illustrate the point, I'll do Bool.

  data BoolCache a = BC a a

  instance Functor BoolCache where
    fmap f (BC t g) = BC (f t) (f g)

  bools :: BoolCache Bool
  bools = BC True False

  lookupBool :: BoolCache a -> Bool -> a
  lookupBool (BC t _) True  = t
  lookupBool (BC _ f) False = f

  memoBool :: (Bool -> a) -> (Bool -> a)
  memoBool f = lookupBool (fmap f bools)

The pattern is the same for any type. You can do it for types with infinitely many members, like Integer, but it's trickier (it's the same pattern, just a trickier data structure). The Integer case is scattered here and there online. I haven't found any other cases online, but I've implemented a few.
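For the record, the "trickier data structure" for the naturals can be sketched as the folklore infinite binary tree, indexed by the bits of the argument (all names here are my own; this only covers non-negative Ints, not full Integer):

```haskell
-- An infinite tree holding one value per non-negative Int.
data NatTree a = NatNode a (NatTree a) (NatTree a)

instance Functor NatTree where
  fmap f (NatNode x l r) = NatNode (f x) (fmap f l) (fmap f r)

-- Every non-negative Int appears exactly once: the root holds n,
-- the children cover n+s and n+2s, and the step s doubles per level.
nats :: NatTree Int
nats = go 0 1
  where
    go n s = NatNode n (go (n + s) s') (go (n + 2 * s) s')
      where s' = 2 * s

-- Look a value up by descending along the bits of the index;
-- O(log n) instead of a linear list scan.
index :: NatTree a -> Int -> a
index (NatNode x _ _) 0 = x
index (NatNode _ l r) n =
  case (n - 1) `divMod` 2 of
    (q, 0) -> index l q
    (q, _) -> index r q

-- Same pattern as memoBool: map f over the universe, then look up.
memoNat :: (Int -> a) -> (Int -> a)
memoNat f = index (fmap f nats)

-- Example: Fibonacci, where the recursive calls go through the memo.
fibM :: Int -> Integer
fibM = memoNat fib
  where
    fib 0 = 0
    fib 1 = 1
    fib n = fibM (n - 1) + fibM (n - 2)
```

Because fibM is defined point-free, the mapped tree is shared across all calls, so fib n is evaluated at most once per n.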
It would be nice if I could just tell the compiler "I command you to memoize this function", and have it produce the required code automatically.
Tru dat! But it's not clear what the best way for the compiler writer to do that is. For example, if I know the access patterns of the function, I can design the aforementioned data structure to favor them. Also, not every type admits memoization, for example function types. But I can certainly envisage a library providing:

  class Memo a where
    memo :: (a -> b) -> (a -> b)

for a bunch of different types. Hmm, one probably already exists, actually...

Luke
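A minimal sketch of such a class, with only the Bool instance filled in (the instance body is my own illustration; the caching relies on lazy sharing of the two possible results):

```haskell
class Memo a where
  memo :: (a -> b) -> (a -> b)

-- Both results are thunks bound outside the lambda, so each is
-- computed at most once and shared by every call to (memo f).
instance Memo Bool where
  memo f = \b -> if b then ft else ff
    where
      ft = f True
      ff = f False
```

Instances for sums, products, and recursive types can be built the same way, which is exactly the generalized-tries construction cited elsewhere in the thread.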

Luke Palmer wrote:
David Benbennick wrote:
It would be nice if I could just tell the compiler "I command you to memoize this function", and have it produce the required code automatically.
Tru dat!
But it's not clear what the best way for the compiler writer to do that is. For example, if I know the access patterns of the function, I can design the aforementioned data structure to favor those. Also, not every type admits memoization, for example functions.
Indeed. There are plenty of choices of data structures for memo "tables", and hash tables are not the best of them. Such choices are better left to the programmer.

Regards, apfelmus

Henning Thielemann writes:
Caching is not the default, but you can easily code this by yourself: Define an array and initialize it with all function values. Because of lazy evaluation the function values are computed only when they are requested and then they persist in the array. One should add this most simple case to the Wiki.
A posteriori thought, when I reread that... This is a =part= of the story. Whatever you do with the initial calls, if there is no automatic memoization, further calls will be executed normally. The user has to replace his/her calls with the elements of the memo-array. Suppose (sorry for the awful Fibonacci again...) we define

  fibs = [fib n | n <- [0 ..]]

  fib 0 = 0
  fib 1 = 1
  fib n = fib (n-1) + fib (n-2)

If you try to get fibs!!1000 with this definition, you will die first anyway. The solution is obviously to replace the recursive definition of fib by

  fib n = fibs!!(n-1) + fibs!!(n-2)

This works well. I had a similar problem in physics: perturbation theory often produces quite intricate, knotty recurrences, and memoization offers a solution for the worse-than-exponential complexity. But I had to use trees, 2-dimensional lists of lists, etc. in order to sort the order of the creation of the results appropriately.

So, I agree wholeheartedly with the statement that memoization is not a blind automaton; it should be used consciously, and adapted to concrete needs.

Jerzy Karczmarczuk

On Sun, Jan 13, 2008 at 12:25:53AM +0100,
Henning Thielemann
Caching is not the default, but you can easily code this by yourself: Define an array and initialize it with all function values. Because of lazy evaluation the function values are computed only when they are requested and then they persist in the array.
It works only if the argument of the function is an integer or an enumerated type. If the argument is a string and you do not know the string values in advance, you cannot use the array.
participants (11)
- apfelmus
- Brandon S. Allbery KF8NH
- David Benbennick
- Felipe Lessa
- Henning Thielemann
- Hugh Perkins
- jerzy.karczmarczuk@info.unicaen.fr
- Jonathan Cast
- Luke Palmer
- Stephane Bortzmeyer
- Thomas Davie