memoization

staafmeister

5 Sep 2009 5 Sep '09

9:52 a.m.

Hi, I participating in de google code jam this year and I want to try to use haskell. The following simple http://code.google.com/codejam/contest/dashboard?c=90101#s=p2 problem would have the beautiful haskell solution. import Data.MemoTrie import Data.Char import Data.Word import Text.Printf newtype ModP = ModP Integer deriving Eq p=10000 instance Show ModP where show (ModP x) = printf "%04d" x instance Num ModP where ModP x + ModP y = ModP ((x + y) `mod` p) fromInteger x = ModP (x `mod` p) ModP x * ModP y = ModP ((x * y) `mod` p) abs = undefined signum = undefined solve _ [] = 1::ModP solve [] _ = 0::ModP solve (hs:ts) t@(ht:tt) | hs==ht = solve ts tt + solve ts t | otherwise = solve ts t go (run, line) = "Case #"++show run++": "++show (solve line "welcome to code jam") main = interact $ unlines . map go . zip [1..] . tail . lines Which is unfortunately exponential. Now in earlier thread I argued for a compiler directive in the lines of {-# Memoize function -#}, but this is not possible (it seems to be trivial to implement though). Now I used memotrie which runs hopelessly out of memory. I looked at some other haskell solutions, which were all ugly and more clumsy compared to simple and concise C code. So it seems to me that haskell is very nice and beautiful until your are solving real algorithmic problems when you want to go back to some imperative language. How would experienced haskellers solve this problem? Thanks -- View this message in context: http://www.nabble.com/memoization-tp25306687p25306687.html Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com.

Show replies by date

Daniel Fischer

5 Sep 5 Sep

1:51 p.m.

Am Samstag 05 September 2009 11:52:50 schrieb staafmeister:

...

Hi,

I participating in de google code jam this year and I want to try to use haskell. The following simple http://code.google.com/codejam/contest/dashboard?c=90101#s=p2 problem would have the beautiful haskell solution.

import Data.MemoTrie import Data.Char import Data.Word import Text.Printf

newtype ModP = ModP Integer deriving Eq

p=10000

instance Show ModP where show (ModP x) = printf "%04d" x

instance Num ModP where ModP x + ModP y = ModP ((x + y) `mod` p) fromInteger x = ModP (x `mod` p) ModP x * ModP y = ModP ((x * y) `mod` p) abs = undefined signum = undefined

solve _ [] = 1::ModP solve [] _ = 0::ModP solve (hs:ts) t@(ht:tt) | hs==ht = solve ts tt + solve ts t

| otherwise = solve ts t

go (run, line) = "Case #"++show run++": "++show (solve line "welcome to code jam")

main = interact $ unlines . map go . zip [1..] . tail . lines

Which is unfortunately exponential.

Now in earlier thread I argued for a compiler directive in the lines of {-# Memoize function -#}, but this is not possible (it seems to be trivial to implement though).

Not really. Though a heck of a lot easier than automatic memoisation.

...

Now I used memotrie which runs hopelessly out of memory. I looked at some other haskell solutions, which were all ugly and more clumsy compared to simple and concise C code. So it seems to me that haskell is very nice and beautiful until your are solving real algorithmic problems when you want to go back to some imperative language.

How would experienced haskellers solve this problem?

Thanks

completely unoptimised: ---------------------------------------------------------------------- module Main (main) where import Text.Printf import Data.List out :: Integer -> String out n = printf "%04d" (n `mod` 10000) update :: [(String,Integer)] -> Char -> [(String,Integer)] update ((p@((h:_),n)):tl) c = case update tl c of ((x,m):more) | c == h -> p:(x,m+n):more other -> p:other update xs _ = xs solve pattern = snd . last . foldl' update (zip (tails pattern) (1:repeat 0)) solveLine :: String -> (Integer,String) -> String solveLine pattern (i,str) = "Case# " ++ show i ++ ": " ++ out (solve pattern str) main :: IO () main = interact $ unlines . map (solveLine "welcome to code jam") . zip [1 .. ] . tail . lines ---------------------------------------------------------------------- ./codeJam +RTS -sstderr -RTS < C-large-practice.in <snip> Case# 98: 4048 Case# 99: 8125 Case# 100: 0807 15,022,840 bytes allocated in the heap 789,028 bytes copied during GC 130,212 bytes maximum residency (1 sample(s)) 31,972 bytes maximum slop 1 MB total memory in use (0 MB lost due to fragmentation) Generation 0: 28 collections, 0 parallel, 0.00s, 0.00s elapsed Generation 1: 1 collections, 0 parallel, 0.00s, 0.00s elapsed INIT time 0.00s ( 0.00s elapsed) MUT time 0.04s ( 0.03s elapsed) GC time 0.00s ( 0.01s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 0.04s ( 0.04s elapsed) %GC time 0.0% (13.8% elapsed) Alloc rate 417,277,929 bytes per MUT second Productivity 100.0% of total user, 98.6% of total elapsed

Reid Barton

5:16 p.m.

On Sat, Sep 05, 2009 at 02:52:50AM -0700, staafmeister wrote:

...

How would experienced haskellers solve this problem?

You could just memoize using an array, like in C. import Data.Array occurrences :: Num a => String -> String -> a occurrences key buf = grid ! (0, 0) -- grid ! (i, j) = occurrences (drop i key) (drop j buf) where grid = listArray ((0, 0), (nk, nb)) [ if i == nk then 1 else if j == nb then 0 else (if key !! i == buf !! j then grid ! (i+1, j+1) else 0) + grid ! (i, j+1) | i <- [0..nk], j <- [0..nb] ] nk = length key nb = length buf Regards, Reid

staafmeister

10 Sep 10 Sep

11:54 a.m.

Thanks to reactions! What do you think about such a function? This function is still a bit dangerous (I think). I don't know how to make sure the compiler does not lift cache to something global. But on the other hand this use of unsafePerformIO is legit because it doesn't alter the referential transparency of the function. The same as in DiffArray. Greetings Gerben memo f = let cache = unsafePerformIO $ newIORef M.empty cachedFunc x = unsafePerformIO (do m <- readIORef cache case M.lookup x m of Just y -> return y Nothing -> do let res = f x writeIORef cache $ M.insert x res m return res) in cachedFunc memo2 f = curry $ memo $ uncurry f -- View this message in context: http://www.nabble.com/memoization-tp25306687p25381881.html Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com.

Bulat Ziganshin

12:07 p.m.

New subject: Re[2]: memoization

Hello staafmeister, Thursday, September 10, 2009, 3:54:34 PM, you wrote:

...

What do you think about such a function? This function is

a bit of refactoring -- "global variable" in haskell way cache = unsafePerformIO $ newIORef M.empty memo f x = unsafePerformIO$ do m <- readIORef cache case M.lookup x m of Just y -> return y Nothing -> do let res = f x writeIORef cache $ M.insert x res m return res memo2 = curry . memo . uncurry -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

staafmeister

12:23 p.m.

New subject: Re[2]: memoization

Hi Bulat, Bulat Ziganshin-2 wrote:

...

Hello staafmeister,

Thursday, September 10, 2009, 3:54:34 PM, you wrote:

...
What do you think about such a function? This function is

a bit of refactoring

-- "global variable" in haskell way cache = unsafePerformIO $ newIORef M.empty

memo f x = unsafePerformIO$ do m <- readIORef cache case M.lookup x m of Just y -> return y Nothing -> do let res = f x writeIORef cache $ M.insert x res m return res

memo2 = curry . memo . uncurry

This doesn't work and is exactly what I'm afraid the compiler is going to do. Cache needs to be associated with the function f. Otherwise one would get conflicts Greetings -- View this message in context: http://www.nabble.com/memoization-tp25306687p25382341.html Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com.

Bulat Ziganshin

12:50 p.m.

New subject: Re[2]: Re[2]: memoization

Hello staafmeister, Thursday, September 10, 2009, 4:23:26 PM, you wrote:

...

This doesn't work and is exactly what I'm afraid the compiler is going to do. Cache needs to be associated with the function f.

...

Otherwise one would get conflicts

well, technique i used is well known, we would have something like C global variable. initiating it inside function is a technique i never seen, i *expect* that it would be the same since syntax scoping doesn't change semantics, but it would be better to ask people that know haskell better if you want to disable sharing of cache you need to make function (or some string representing it) an explicit parameter. i see that you try to do it via declaring f at the outer function level and x in the inner function, but this shouldn't work. the following: outer f = inner where inner x = f x*f x and outer f x = f x*f x are exactly the same. in general, consider Haskell as pure math notation with all its features -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

mf-hcafe-15c311f0c＠etc-network.de

3:42 p.m.

New subject: Re[2]: memoization

On Thu, Sep 10, 2009 at 05:23:26AM -0700, staafmeister wrote:

...

To: haskell-cafe@haskell.org From: staafmeister Date: Thu, 10 Sep 2009 05:23:26 -0700 (PDT) Subject: Re: Re[Haskell-cafe] [2]: memoization

Hi Bulat,

Bulat Ziganshin-2 wrote:

...
Hello staafmeister,

Thursday, September 10, 2009, 3:54:34 PM, you wrote:

...
What do you think about such a function? This function is

a bit of refactoring

-- "global variable" in haskell way cache = unsafePerformIO $ newIORef M.empty

memo f x = unsafePerformIO$ do m <- readIORef cache case M.lookup x m of Just y -> return y Nothing -> do let res = f x writeIORef cache $ M.insert x res m return res

memo2 = curry . memo . uncurry

This doesn't work and is exactly what I'm afraid the compiler is going to do. Cache needs to be associated with the function f.

Otherwise one would get conflicts

then make the cache object store functions together with values. cache = unsafePerformIO $ newIORef M.empty memo f x = unsafePerformIO$ do m <- readIORef cache case M.lookup (mkKey f, x) m of Just y -> return y Nothing -> do let res = f x writeIORef cache $ M.insert (mkKey f, x) res m return res memo2 = curry . memo . uncurry This leaves mkKey. Since functions are neither Ord nor Show, you'd have to hack something together yourself. Perhaps an explicit argument to memo? memo :: (Ord a) => String -> (a -> b) -> a -> IO b memo fname f x = unsafePerformIO$ do m <- readIORef cache case M.lookup (fname, x) m of Just y -> return y Nothing -> do let res = f x writeIORef cache $ M.insert (fname, x) res m return res there is probably a better and more elegant solution, but this should at least work. right? matthias

Peter Verswyvelen

12:34 p.m.

New subject: Re[2]: memoization

You might want to watch out for multithreading issues, although in this case, I don't think it will cause sever problems, besides a couple of redundant cache updates. On Thu, Sep 10, 2009 at 2:07 PM, Bulat Ziganshin wrote:

...

Hello staafmeister,

Thursday, September 10, 2009, 3:54:34 PM, you wrote:

...
What do you think about such a function? This function is

a bit of refactoring

-- "global variable" in haskell way cache = unsafePerformIO $ newIORef M.empty

memo f x = unsafePerformIO$ do m <- readIORef cache case M.lookup x m of Just y -> return y Nothing -> do let res = f x writeIORef cache $ M.insert x res m return res

memo2 = curry . memo . uncurry

-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Luke Palmer

11 Sep 11 Sep

12:46 a.m.

New subject: Re[2]: memoization

On Thu, Sep 10, 2009 at 6:34 AM, Peter Verswyvelen wrote:

...

You might want to watch out for multithreading issues, although in this case, I don't think it will cause sever problems, besides a couple of redundant cache updates.

On Thu, Sep 10, 2009 at 2:07 PM, Bulat Ziganshin wrote:

...
Hello staafmeister,

Thursday, September 10, 2009, 3:54:34 PM, you wrote:

...
What do you think about such a function? This function is

a bit of refactoring

-- "global variable" in haskell way cache = unsafePerformIO $ newIORef M.empty

Watch out! This is not necessarily the same. The cache in the original message was one per function, not one globally, because the let occurred inside a lambda binding. However, because the body of cache didn't depend on f, we can use lambda calculus rules to lift the let outside the lambda. So your transformation is completely valid... And yet, the original code works, and Bulat's equivalent code does not (in fact you can make a segfault using it). I wouldn't dare say the original code is "correct" though, since a valid transformation can break it. Compilers do valid transformations. O unsafePerformIO, how I love thee... Luke

...

...
memo f x = unsafePerformIO$ do m <- readIORef cache case M.lookup x m of Just y -> return y Nothing -> do let res = f x writeIORef cache $ M.insert x res m return res

memo2 = curry . memo . uncurry

-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Ryan Ingram

1:33 a.m.

New subject: Re[2]: memoization

On Thu, Sep 10, 2009 at 5:46 PM, Luke Palmer wrote:

...

However, because the body of cache didn't depend on f, we can use lambda calculus rules to lift the let outside the lambda. So your transformation is completely valid... And yet, the original code works, and Bulat's equivalent code does not (in fact you can make a segfault using it).

I wouldn't dare say the original code is "correct" though, since a valid transformation can break it. Compilers do valid transformations.

O unsafePerformIO, how I love thee...

Right, which is why you should write it like this: memoIO :: Ord a => (a -> b) -> IO (a -> IO b) memoIO f = do cache <- newIORef M.empty return $ \x -> do m <- readIORef cache case M.lookup x m of Just y -> return y Nothing -> do let res = f x writeIORef cache $ M.insert x res m return res memo :: Ord a => (a -> b) -> (a -> b) memo f = unsafePerformIO $ do fmemo <- memoIO f return (unsafePerformIO . fmemo) I don't think there is any valid transformation that breaks this, since the compiler can't lift anything through unsafePerformIO. Am I mistaken? -- ryan

Rodney Price

16 Sep 16 Sep

8:48 p.m.

New subject: weak pointers and memoization (was Re: memoization)

How does garbage collection work in an example like the one below? You memoize a function with some sort of lookup table, which stores function arguments as keys and function results as values. As long as the function remains in scope, the keys in the lookup table remain in memory, which means that the keys themselves always remain reachable and they cannot be garbage collected. Right? So what do you do in the case where you know that, after some period of time, some entries in the lookup table will never be accessed? That is, there are no references to the keys for some entries remaining, except for the references in the lookup table itself. You'd like to allow the memory occupied by the keys to be garbage collected. Otherwise, if the function stays around for a long time, the size of the lookup table always grows. How do you avoid the space leak? I notice that there is a function in Data.IORef, mkWeakIORef :: IORef a -> IO () -> IO (Weak (IORef a)) which looks promising. In the code below, however, there's only one IORef, so either the entire table gets garbage collected or none of it does. I've been reading the paper "Stretching the storage manager: weak pointers and stable names in Haskell," which seems to answer my question. When I attempt to run the memoization code in the paper on the simple fib example, I find that -- apparently due to lazy evaluation -- no new entries are entered into the lookup table, and therefore no lookups are ever successful! So apparently there is some interaction between lazy evaluation and garbage collection that I don't understand. My head hurts. Is it necessary to make the table lookup operation strict? Or is it something entirely different that I am missing? -Rod On Thu, 10 Sep 2009 18:33:47 -0700 Ryan Ingram wrote:

...

memoIO :: Ord a => (a -> b) -> IO (a -> IO b) memoIO f = do cache <- newIORef M.empty return $ \x -> do m <- readIORef cache case M.lookup x m of Just y -> return y Nothing -> do let res = f x writeIORef cache $ M.insert x res m return res

memo :: Ord a => (a -> b) -> (a -> b) memo f = unsafePerformIO $ do fmemo <- memoIO f return (unsafePerformIO . fmemo)

I don't think there is any valid transformation that breaks this, since the compiler can't lift anything through unsafePerformIO. Am I mistaken?

-- ryan

Job Vranish

17 Sep 17 Sep

5:32 p.m.

New subject: weak pointers and memoization (was Re: memoization)

What are you trying to use this for? It seems to me that for memo tables you almost never have references to they keys outside the lookup table since the keys are usually computed right at the last minute, and then discarded (otherwise it might be easier to just cache stuff outside the function). For example with a naive fibs, the values you are passing in are computed, and probably don't exist before you do the recursive call, and then are discarded shortly afterward. It seems like putting a cap on the cache size, and then just overwriting old entries would be better. Am I missing something? - Job On Wed, Sep 16, 2009 at 4:48 PM, Rodney Price wrote:

...

How does garbage collection work in an example like the one below? You memoize a function with some sort of lookup table, which stores function arguments as keys and function results as values. As long as the function remains in scope, the keys in the lookup table remain in memory, which means that the keys themselves always remain reachable and they cannot be garbage collected. Right?

So what do you do in the case where you know that, after some period of time, some entries in the lookup table will never be accessed? That is, there are no references to the keys for some entries remaining, except for the references in the lookup table itself. You'd like to allow the memory occupied by the keys to be garbage collected. Otherwise, if the function stays around for a long time, the size of the lookup table always grows. How do you avoid the space leak?

I notice that there is a function in Data.IORef,

mkWeakIORef :: IORef a -> IO () -> IO (Weak (IORef a))

which looks promising. In the code below, however, there's only one IORef, so either the entire table gets garbage collected or none of it does.

I've been reading the paper "Stretching the storage manager: weak pointers and stable names in Haskell," which seems to answer my question. When I attempt to run the memoization code in the paper on the simple fib example, I find that -- apparently due to lazy evaluation -- no new entries are entered into the lookup table, and therefore no lookups are ever successful!

So apparently there is some interaction between lazy evaluation and garbage collection that I don't understand. My head hurts. Is it necessary to make the table lookup operation strict? Or is it something entirely different that I am missing?

-Rod

On Thu, 10 Sep 2009 18:33:47 -0700 Ryan Ingram wrote:

...
memoIO :: Ord a => (a -> b) -> IO (a -> IO b) memoIO f = do cache <- newIORef M.empty return $ \x -> do m <- readIORef cache case M.lookup x m of Just y -> return y Nothing -> do let res = f x writeIORef cache $ M.insert x res m return res

memo :: Ord a => (a -> b) -> (a -> b) memo f = unsafePerformIO $ do fmemo <- memoIO f return (unsafePerformIO . fmemo)

I don't think there is any valid transformation that breaks this, since the compiler can't lift anything through unsafePerformIO. Am I mistaken?

-- ryan

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Rodney Price

11:39 p.m.

New subject: weak pointers and memoization (was Re: memoization)

In my case, the results of each computation are used to generate a node in a graph structure (dag). The key, oddly, is a hash of a two-tuple that gets stored in the data structure after the computation of the node finishes. If I don't memoize the function to build a node, the cost of generating the tree is exponential; if I do, it's somewhere between linear and quadratic. Another process prunes parts of this graph structure as time goes on. The entire data structure is intended to be persistent, lasting for days at a time in a server-like application. If the parts pruned aren't garbage collected, the space leak will eventually be catastrophic. Either the memo table or the graph structure itself will outgrow available memory. -Rod On Thu, 17 Sep 2009 13:32:13 -0400 Job Vranish wrote:

...

What are you trying to use this for? It seems to me that for memo tables you almost never have references to they keys outside the lookup table since the keys are usually computed right at the last minute, and then discarded (otherwise it might be easier to just cache stuff outside the function).

For example with a naive fibs, the values you are passing in are computed, and probably don't exist before you do the recursive call, and then are discarded shortly afterward.

It seems like putting a cap on the cache size, and then just overwriting old entries would be better. Am I missing something?

- Job

On Wed, Sep 16, 2009 at 4:48 PM, Rodney Price wrote:

...
How does garbage collection work in an example like the one below? You memoize a function with some sort of lookup table, which stores function arguments as keys and function results as values. As long as the function remains in scope, the keys in the lookup table remain in memory, which means that the keys themselves always remain reachable and they cannot be garbage collected. Right?

So what do you do in the case where you know that, after some period of time, some entries in the lookup table will never be accessed? That is, there are no references to the keys for some entries remaining, except for the references in the lookup table itself. You'd like to allow the memory occupied by the keys to be garbage collected. Otherwise, if the function stays around for a long time, the size of the lookup table always grows. How do you avoid the space leak?

I notice that there is a function in Data.IORef,

mkWeakIORef :: IORef a -> IO () -> IO (Weak (IORef a))

which looks promising. In the code below, however, there's only one IORef, so either the entire table gets garbage collected or none of it does.

I've been reading the paper "Stretching the storage manager: weak pointers and stable names in Haskell," which seems to answer my question. When I attempt to run the memoization code in the paper on the simple fib example, I find that -- apparently due to lazy evaluation -- no new entries are entered into the lookup table, and therefore no lookups are ever successful!

So apparently there is some interaction between lazy evaluation and garbage collection that I don't understand. My head hurts. Is it necessary to make the table lookup operation strict? Or is it something entirely different that I am missing?

-Rod

On Thu, 10 Sep 2009 18:33:47 -0700 Ryan Ingram wrote:

...
memoIO :: Ord a => (a -> b) -> IO (a -> IO b) memoIO f = do cache <- newIORef M.empty return $ \x -> do m <- readIORef cache case M.lookup x m of Just y -> return y Nothing -> do let res = f x writeIORef cache $ M.insert x res m return res

memo :: Ord a => (a -> b) -> (a -> b) memo f = unsafePerformIO $ do fmemo <- memoIO f return (unsafePerformIO . fmemo)

I don't think there is any valid transformation that breaks this, since the compiler can't lift anything through unsafePerformIO. Am I mistaken?

-- ryan

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Peter Verswyvelen

18 Sep 18 Sep

11:56 a.m.

New subject: weak pointers and memoization (was Re: memoization)

I would also like to see a solution for problems like these. Haskell provides a lot of nice memoizing / caching data structures - like a trie - but the ones I know indeed keep growing, so no garbage collection takes place? It would be nice to have a data structure that performs caching but does not grow unlimited. I had a similar problem with stable names; it is not possible to check if a stable name is still "alive". On Fri, Sep 18, 2009 at 1:39 AM, Rodney Price wrote:

...

In my case, the results of each computation are used to generate a node in a graph structure (dag). The key, oddly, is a hash of a two-tuple that gets stored in the data structure after the computation of the node finishes. If I don't memoize the function to build a node, the cost of generating the tree is exponential; if I do, it's somewhere between linear and quadratic.

Another process prunes parts of this graph structure as time goes on. The entire data structure is intended to be persistent, lasting for days at a time in a server-like application. If the parts pruned aren't garbage collected, the space leak will eventually be catastrophic. Either the memo table or the graph structure itself will outgrow available memory.

-Rod

On Thu, 17 Sep 2009 13:32:13 -0400 Job Vranish wrote:

...
What are you trying to use this for? It seems to me that for memo tables you almost never have references to they keys outside the lookup table since the keys are usually computed right at the last minute, and then discarded (otherwise it might be easier to just cache stuff outside the function).

For example with a naive fibs, the values you are passing in are computed, and probably don't exist before you do the recursive call, and then are discarded shortly afterward.

It seems like putting a cap on the cache size, and then just overwriting old entries would be better. Am I missing something?

- Job

On Wed, Sep 16, 2009 at 4:48 PM, Rodney Price wrote:

...
How does garbage collection work in an example like the one below? You memoize a function with some sort of lookup table, which stores function arguments as keys and function results as values. As long as the function remains in scope, the keys in the lookup table remain in memory, which means that the keys themselves always remain reachable and they cannot be garbage collected. Right?

So what do you do in the case where you know that, after some period of time, some entries in the lookup table will never be accessed? That is, there are no references to the keys for some entries remaining, except for the references in the lookup table itself. You'd like to allow the memory occupied by the keys to be garbage collected. Otherwise, if the function stays around for a long time, the size of the lookup table always grows. How do you avoid the space leak?

I notice that there is a function in Data.IORef,

mkWeakIORef :: IORef a -> IO () -> IO (Weak (IORef a))

which looks promising. In the code below, however, there's only one IORef, so either the entire table gets garbage collected or none of it does.

I've been reading the paper "Stretching the storage manager: weak pointers and stable names in Haskell," which seems to answer my question. When I attempt to run the memoization code in the paper on the simple fib example, I find that -- apparently due to lazy evaluation -- no new entries are entered into the lookup table, and therefore no lookups are ever successful!

So apparently there is some interaction between lazy evaluation and garbage collection that I don't understand. My head hurts. Is it necessary to make the table lookup operation strict? Or is it something entirely different that I am missing?

-Rod

On Thu, 10 Sep 2009 18:33:47 -0700 Ryan Ingram wrote:

...
memoIO :: Ord a => (a -> b) -> IO (a -> IO b) memoIO f = do cache <- newIORef M.empty return $ \x -> do m <- readIORef cache case M.lookup x m of Just y -> return y Nothing -> do let res = f x writeIORef cache $ M.insert x res m return res

memo :: Ord a => (a -> b) -> (a -> b) memo f = unsafePerformIO $ do fmemo <- memoIO f return (unsafePerformIO . fmemo)

I don't think there is any valid transformation that breaks this, since the compiler can't lift anything through unsafePerformIO. Am I mistaken?

-- ryan

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Job Vranish

2:19 p.m.

New subject: weak pointers and memoization (was Re: memoization)

Yeah it seems like the general solution to the problem would be some sort of map-like datastructure that you add items via a key/value pair, and if the key gets GC'd, that entry gets removed from the structure. I've been wanting something like this as well, but didn't know about weak references so I didn't know if it was possible, but I think I could make something like this now. I'll give it a shot and let you guys know how it goes. Rodney could you post your memo code that uses the weak references? - Job On Fri, Sep 18, 2009 at 7:56 AM, Peter Verswyvelen wrote:

...

I would also like to see a solution for problems like these.

Haskell provides a lot of nice memoizing / caching data structures - like a trie - but the ones I know indeed keep growing, so no garbage collection takes place?

It would be nice to have a data structure that performs caching but does not grow unlimited.

I had a similar problem with stable names; it is not possible to check if a stable name is still "alive".

On Fri, Sep 18, 2009 at 1:39 AM, Rodney Price wrote:

...
In my case, the results of each computation are used to generate a node in a graph structure (dag). The key, oddly, is a hash of a two-tuple that gets stored in the data structure after the computation of the node finishes. If I don't memoize the function to build a node, the cost of generating the tree is exponential; if I do, it's somewhere between linear and quadratic.

Another process prunes parts of this graph structure as time goes on. The entire data structure is intended to be persistent, lasting for days at a time in a server-like application. If the parts pruned aren't garbage collected, the space leak will eventually be catastrophic. Either the memo table or the graph structure itself will outgrow available memory.

-Rod

On Thu, 17 Sep 2009 13:32:13 -0400 Job Vranish wrote:

...
What are you trying to use this for? It seems to me that for memo tables you almost never have references to they keys outside the lookup table since the keys are usually computed right at the last minute, and then discarded (otherwise it might be easier to just cache stuff outside the function).

For example with a naive fibs, the values you are passing in are computed, and probably don't exist before you do the recursive call, and then are discarded shortly afterward.

It seems like putting a cap on the cache size, and then just overwriting old entries would be better. Am I missing something?

- Job

On Wed, Sep 16, 2009 at 4:48 PM, Rodney Price wrote:

...
How does garbage collection work in an example like the one below? You memoize a function with some sort of lookup table, which stores function arguments as keys and function results as values. As long as the function remains in scope, the keys in the lookup table remain in memory, which means that the keys themselves always remain reachable and they cannot be garbage collected. Right?

So what do you do in the case where you know that, after some period of time, some entries in the lookup table will never be accessed? That is, there are no references to the keys for some entries remaining, except for the references in the lookup table itself. You'd like to allow the memory occupied by the keys to be garbage collected. Otherwise, if the function stays around for a long time, the size of the lookup table always grows. How do you avoid the space leak?

I notice that there is a function in Data.IORef,

mkWeakIORef :: IORef a -> IO () -> IO (Weak (IORef a))

which looks promising. In the code below, however, there's only one IORef, so either the entire table gets garbage collected or none of it does.

I've been reading the paper "Stretching the storage manager: weak pointers and stable names in Haskell," which seems to answer my question. When I attempt to run the memoization code in the paper on the simple fib example, I find that -- apparently due to lazy evaluation -- no new entries are entered into the lookup table, and therefore no lookups are ever successful!

So apparently there is some interaction between lazy evaluation and garbage collection that I don't understand. My head hurts. Is it necessary to make the table lookup operation strict? Or is it something entirely different that I am missing?

-Rod

On Thu, 10 Sep 2009 18:33:47 -0700 Ryan Ingram wrote:

...
memoIO :: Ord a => (a -> b) -> IO (a -> IO b) memoIO f = do cache <- newIORef M.empty return $ \x -> do m <- readIORef cache case M.lookup x m of Just y -> return y Nothing -> do let res = f x writeIORef cache $ M.insert x res m return res

memo :: Ord a => (a -> b) -> (a -> b) memo f = unsafePerformIO $ do fmemo <- memoIO f return (unsafePerformIO . fmemo)

I don't think there is any valid transformation that breaks this, since the compiler can't lift anything through unsafePerformIO. Am I mistaken?

-- ryan

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Job Vranish

3:24 p.m.

New subject: weak pointers and memoization (was Re: memoization)

Hey it works :D Here is a proof of concept: http://gist.github.com/189104 Maybe later today I'll try to make a version that can be safely used outside IO. - Job On Fri, Sep 18, 2009 at 10:19 AM, Job Vranish wrote:

...

Yeah it seems like the general solution to the problem would be some sort of map-like datastructure that you add items via a key/value pair, and if the key gets GC'd, that entry gets removed from the structure.

I've been wanting something like this as well, but didn't know about weak references so I didn't know if it was possible, but I think I could make something like this now. I'll give it a shot and let you guys know how it goes.

Rodney could you post your memo code that uses the weak references?

- Job

On Fri, Sep 18, 2009 at 7:56 AM, Peter Verswyvelen wrote:

...
I would also like to see a solution for problems like these.

Haskell provides a lot of nice memoizing / caching data structures - like a trie - but the ones I know indeed keep growing, so no garbage collection takes place?

It would be nice to have a data structure that performs caching but does not grow unlimited.

I had a similar problem with stable names; it is not possible to check if a stable name is still "alive".

On Fri, Sep 18, 2009 at 1:39 AM, Rodney Price wrote:

...
In my case, the results of each computation are used to generate a node in a graph structure (dag). The key, oddly, is a hash of a two-tuple that gets stored in the data structure after the computation of the node finishes. If I don't memoize the function to build a node, the cost of generating the tree is exponential; if I do, it's somewhere between linear and quadratic.

Another process prunes parts of this graph structure as time goes on. The entire data structure is intended to be persistent, lasting for days at a time in a server-like application. If the parts pruned aren't garbage collected, the space leak will eventually be catastrophic. Either the memo table or the graph structure itself will outgrow available memory.

-Rod

On Thu, 17 Sep 2009 13:32:13 -0400 Job Vranish wrote:

...
What are you trying to use this for? It seems to me that for memo tables you almost never have references to they keys outside the lookup table since the keys are usually computed right at the last minute, and then discarded (otherwise it might be easier to just cache stuff outside the function).

For example with a naive fibs, the values you are passing in are computed, and probably don't exist before you do the recursive call, and then are discarded shortly afterward.

It seems like putting a cap on the cache size, and then just overwriting old entries would be better. Am I missing something?

- Job

On Wed, Sep 16, 2009 at 4:48 PM, Rodney Price wrote:

...
How does garbage collection work in an example like the one below? You memoize a function with some sort of lookup table, which stores function arguments as keys and function results as values. As long as the function remains in scope, the keys in the lookup table remain in memory, which means that the keys themselves always remain reachable and they cannot be garbage collected. Right?

So what do you do in the case where you know that, after some period of time, some entries in the lookup table will never be accessed? That is, there are no references to the keys for some entries remaining, except for the references in the lookup table itself. You'd like to allow the memory occupied by the keys to be garbage collected. Otherwise, if the function stays around for a long time, the size of the lookup table always grows. How do you avoid the space leak?

I notice that there is a function in Data.IORef,

mkWeakIORef :: IORef a -> IO () -> IO (Weak (IORef a))

which looks promising. In the code below, however, there's only one IORef, so either the entire table gets garbage collected or none of it does.

I've been reading the paper "Stretching the storage manager: weak pointers and stable names in Haskell," which seems to answer my question. When I attempt to run the memoization code in the paper on the simple fib example, I find that -- apparently due to lazy evaluation -- no new entries are entered into the lookup table, and therefore no lookups are ever successful!

So apparently there is some interaction between lazy evaluation and garbage collection that I don't understand. My head hurts. Is it necessary to make the table lookup operation strict? Or is it something entirely different that I am missing?

-Rod

On Thu, 10 Sep 2009 18:33:47 -0700 Ryan Ingram wrote:

...
memoIO :: Ord a => (a -> b) -> IO (a -> IO b) memoIO f = do cache <- newIORef M.empty return $ \x -> do m <- readIORef cache case M.lookup x m of Just y -> return y Nothing -> do let res = f x writeIORef cache $ M.insert x res m return res

memo :: Ord a => (a -> b) -> (a -> b) memo f = unsafePerformIO $ do fmemo <- memoIO f return (unsafePerformIO . fmemo)

I don't think there is any valid transformation that breaks this, since the compiler can't lift anything through unsafePerformIO. Am I mistaken?

-- ryan

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

staafmeister

11 Sep 11 Sep

12:57 p.m.

Hi, Investigating memoization inspired by replies from this thread. I encountered something strange in the behavior of ghci. Small chance it's a bug, it probably is a feature, but I certainly don't understand it :) The interpreter session went as follows GHCi, version 6.10.4: http://www.haskell.org/ghc/ :? for help Loading package ghc-prim ... linking ... done. Loading package integer ... linking ... done. Loading package base ... linking ... done. Prelude> :load test_bug.hs [1 of 1] Compiling Main ( test_bug.hs, interpreted ) Ok, modules loaded: Main. *Main> let s1 = memo2 solve2 Loading package syb ... linking ... done. Loading package array-0.2.0.0 ... linking ... done. Loading package containers-0.2.0.1 ... linking ... done. Loading package filepath-1.1.0.2 ... linking ... done. Loading package old-locale-1.0.0.1 ... linking ... done. Loading package old-time-1.0.0.2 ... linking ... done. Loading package unix-2.3.2.0 ... linking ... done. Loading package directory-1.0.0.3 ... linking ... done. Loading package process-1.0.1.1 ... linking ... done. Loading package random-1.0.0.1 ... linking ... done. Loading package haskell98 ... linking ... done. *Main> :type s1 s1 :: [()] -> [()] -> ModP *Main> let s2 a b = memo2 solve2 a b *Main> :type s2 s2 :: (Eq t) => [t] -> [t] -> ModP Here memo2 is a function that works like a combinator to obtain a memoized recursive function. However the type of the function depends on how I define it. In point-free style it gets the wrong type, however if I define (s2) with explicit arguments the type is correct? Do you know what happens here? I would expect the types to be the same. Another question is: I use now makeStableName for equality but using this function memoization does not work and it still takes a long (exponential?) time to go through the codejam testcases. The memoization using data.map works flawless. Greetings, Gerben ps. The content of test_bug.hs is import Data.IORef import System.IO.Unsafe import Control.Exception import qualified Data.Map as M import Text.Printf import qualified Data.HashTable as H import System.Mem.StableName import Data.Ratio import Array memo f = unsafePerformIO $ do cache <- H.new (==) (H.hashInt . hashStableName) let cacheFunc = \x -> unsafePerformIO $ do stable <- makeStableName x lup <- H.lookup cache stable case lup of Just y -> return y Nothing -> do let res = f cacheFunc x H.insert cache stable res return res return cacheFunc memo2 f = curry $ memo (\g (x,y) -> f (curry g) x y) newtype ModP = ModP Integer deriving Eq p=10007 instance Show ModP where show (ModP x) = printf "%d" x instance Num ModP where ModP x + ModP y = ModP ((x + y) `mod` p) fromInteger x = ModP (x `mod` p) ModP x * ModP y = ModP ((x * y) `mod` p) abs = undefined signum = undefined solve2 f _ [] = 1::ModP solve2 f [] _ = 0::ModP solve2 f (hs:ts) t@(ht:tt) | hs==ht = f ts tt + f ts t | otherwise = f ts t go (run, line) = "Case #"++show run++": "++show ((memo2 solve2) line "welcome to code jam") main = interact $ unlines . map go . zip [1..] . tail . lines -- View this message in context: http://www.nabble.com/memoization-tp25306687p25400506.html Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com.

Bulat Ziganshin

1:06 p.m.

New subject: Re[2]: memoization

Hello staafmeister, Friday, September 11, 2009, 4:57:01 PM, you wrote:

...

Here memo2 is a function that works like a combinator to obtain a memoized recursive function. However the type of the function depends on how I define it. In point-free style it gets the wrong type, however if I define (s2) with explicit arguments the type is correct? Do you know what happens here? I would expect the types to be the same.

looks like you need to pass -fno-monomorphism-restriction to ghci. it's a "bug" of haskell definition, expected to be removed in the next language version -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

5772

Age (days ago)

5785

Last active (days ago)

List overview

Download

18 comments

10 participants

participants (10)

Bulat Ziganshin
Daniel Fischer
Job Vranish
Luke Palmer
mf-hcafe-15c311f0c＠etc-network.de
Peter Verswyvelen
Reid Barton
Rodney Price
Ryan Ingram
staafmeister

memoization

staafmeister

Daniel Fischer

Reid Barton

staafmeister

Bulat Ziganshin

staafmeister

Bulat Ziganshin

mf-hcafe-15c311f0c＠etc-network.de

Peter Verswyvelen

Luke Palmer

Ryan Ingram

Rodney Price

Job Vranish

Rodney Price

Peter Verswyvelen

Job Vranish

Job Vranish

staafmeister

Bulat Ziganshin

tags

participants (10)