
Hi, I participating in de google code jam this year and I want to try to use haskell. The following simple http://code.google.com/codejam/contest/dashboard?c=90101#s=p2 problem would have the beautiful haskell solution. import Data.MemoTrie import Data.Char import Data.Word import Text.Printf newtype ModP = ModP Integer deriving Eq p=10000 instance Show ModP where show (ModP x) = printf "%04d" x instance Num ModP where ModP x + ModP y = ModP ((x + y) `mod` p) fromInteger x = ModP (x `mod` p) ModP x * ModP y = ModP ((x * y) `mod` p) abs = undefined signum = undefined solve _ [] = 1::ModP solve [] _ = 0::ModP solve (hs:ts) t@(ht:tt) | hs==ht = solve ts tt + solve ts t | otherwise = solve ts t go (run, line) = "Case #"++show run++": "++show (solve line "welcome to code jam") main = interact $ unlines . map go . zip [1..] . tail . lines Which is unfortunately exponential. Now in earlier thread I argued for a compiler directive in the lines of {-# Memoize function -#}, but this is not possible (it seems to be trivial to implement though). Now I used memotrie which runs hopelessly out of memory. I looked at some other haskell solutions, which were all ugly and more clumsy compared to simple and concise C code. So it seems to me that haskell is very nice and beautiful until your are solving real algorithmic problems when you want to go back to some imperative language. How would experienced haskellers solve this problem? Thanks -- View this message in context: http://www.nabble.com/memoization-tp25306687p25306687.html Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com.

Am Samstag 05 September 2009 11:52:50 schrieb staafmeister:
Hi,
I participating in de google code jam this year and I want to try to use haskell. The following simple http://code.google.com/codejam/contest/dashboard?c=90101#s=p2 problem would have the beautiful haskell solution.
import Data.MemoTrie import Data.Char import Data.Word import Text.Printf
newtype ModP = ModP Integer deriving Eq
p=10000
instance Show ModP where show (ModP x) = printf "%04d" x
instance Num ModP where ModP x + ModP y = ModP ((x + y) `mod` p) fromInteger x = ModP (x `mod` p) ModP x * ModP y = ModP ((x * y) `mod` p) abs = undefined signum = undefined
solve _ [] = 1::ModP solve [] _ = 0::ModP solve (hs:ts) t@(ht:tt) | hs==ht = solve ts tt + solve ts t
| otherwise = solve ts t
go (run, line) = "Case #"++show run++": "++show (solve line "welcome to code jam")
main = interact $ unlines . map go . zip [1..] . tail . lines
Which is unfortunately exponential.
Now in earlier thread I argued for a compiler directive in the lines of {-# Memoize function -#}, but this is not possible (it seems to be trivial to implement though).
Not really. Though a heck of a lot easier than automatic memoisation.
Now I used memotrie which runs hopelessly out of memory. I looked at some other haskell solutions, which were all ugly and more clumsy compared to simple and concise C code. So it seems to me that haskell is very nice and beautiful until your are solving real algorithmic problems when you want to go back to some imperative language.
How would experienced haskellers solve this problem?
Thanks
completely unoptimised: ---------------------------------------------------------------------- module Main (main) where import Text.Printf import Data.List out :: Integer -> String out n = printf "%04d" (n `mod` 10000) update :: [(String,Integer)] -> Char -> [(String,Integer)] update ((p@((h:_),n)):tl) c = case update tl c of ((x,m):more) | c == h -> p:(x,m+n):more other -> p:other update xs _ = xs solve pattern = snd . last . foldl' update (zip (tails pattern) (1:repeat 0)) solveLine :: String -> (Integer,String) -> String solveLine pattern (i,str) = "Case# " ++ show i ++ ": " ++ out (solve pattern str) main :: IO () main = interact $ unlines . map (solveLine "welcome to code jam") . zip [1 .. ] . tail . lines ---------------------------------------------------------------------- ./codeJam +RTS -sstderr -RTS < C-large-practice.in <snip> Case# 98: 4048 Case# 99: 8125 Case# 100: 0807 15,022,840 bytes allocated in the heap 789,028 bytes copied during GC 130,212 bytes maximum residency (1 sample(s)) 31,972 bytes maximum slop 1 MB total memory in use (0 MB lost due to fragmentation) Generation 0: 28 collections, 0 parallel, 0.00s, 0.00s elapsed Generation 1: 1 collections, 0 parallel, 0.00s, 0.00s elapsed INIT time 0.00s ( 0.00s elapsed) MUT time 0.04s ( 0.03s elapsed) GC time 0.00s ( 0.01s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 0.04s ( 0.04s elapsed) %GC time 0.0% (13.8% elapsed) Alloc rate 417,277,929 bytes per MUT second Productivity 100.0% of total user, 98.6% of total elapsed

On Sat, Sep 05, 2009 at 02:52:50AM -0700, staafmeister wrote:
How would experienced haskellers solve this problem?
You could just memoize using an array, like in C. import Data.Array occurrences :: Num a => String -> String -> a occurrences key buf = grid ! (0, 0) -- grid ! (i, j) = occurrences (drop i key) (drop j buf) where grid = listArray ((0, 0), (nk, nb)) [ if i == nk then 1 else if j == nb then 0 else (if key !! i == buf !! j then grid ! (i+1, j+1) else 0) + grid ! (i, j+1) | i <- [0..nk], j <- [0..nb] ] nk = length key nb = length buf Regards, Reid

Thanks to reactions! What do you think about such a function? This function is still a bit dangerous (I think). I don't know how to make sure the compiler does not lift cache to something global. But on the other hand this use of unsafePerformIO is legit because it doesn't alter the referential transparency of the function. The same as in DiffArray. Greetings Gerben memo f = let cache = unsafePerformIO $ newIORef M.empty cachedFunc x = unsafePerformIO (do m <- readIORef cache case M.lookup x m of Just y -> return y Nothing -> do let res = f x writeIORef cache $ M.insert x res m return res) in cachedFunc memo2 f = curry $ memo $ uncurry f -- View this message in context: http://www.nabble.com/memoization-tp25306687p25381881.html Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com.

Hello staafmeister, Thursday, September 10, 2009, 3:54:34 PM, you wrote:
What do you think about such a function? This function is
a bit of refactoring -- "global variable" in haskell way cache = unsafePerformIO $ newIORef M.empty memo f x = unsafePerformIO$ do m <- readIORef cache case M.lookup x m of Just y -> return y Nothing -> do let res = f x writeIORef cache $ M.insert x res m return res memo2 = curry . memo . uncurry -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Hi Bulat, Bulat Ziganshin-2 wrote:
Hello staafmeister,
Thursday, September 10, 2009, 3:54:34 PM, you wrote:
What do you think about such a function? This function is
a bit of refactoring
-- "global variable" in haskell way cache = unsafePerformIO $ newIORef M.empty
memo f x = unsafePerformIO$ do m <- readIORef cache case M.lookup x m of Just y -> return y Nothing -> do let res = f x writeIORef cache $ M.insert x res m return res
memo2 = curry . memo . uncurry
This doesn't work and is exactly what I'm afraid the compiler is going to do. Cache needs to be associated with the function f. Otherwise one would get conflicts Greetings -- View this message in context: http://www.nabble.com/memoization-tp25306687p25382341.html Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com.

Hello staafmeister, Thursday, September 10, 2009, 4:23:26 PM, you wrote:
This doesn't work and is exactly what I'm afraid the compiler is going to do. Cache needs to be associated with the function f.
Otherwise one would get conflicts
well, technique i used is well known, we would have something like C global variable. initiating it inside function is a technique i never seen, i *expect* that it would be the same since syntax scoping doesn't change semantics, but it would be better to ask people that know haskell better if you want to disable sharing of cache you need to make function (or some string representing it) an explicit parameter. i see that you try to do it via declaring f at the outer function level and x in the inner function, but this shouldn't work. the following: outer f = inner where inner x = f x*f x and outer f x = f x*f x are exactly the same. in general, consider Haskell as pure math notation with all its features -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On Thu, Sep 10, 2009 at 05:23:26AM -0700, staafmeister wrote:
To: haskell-cafe@haskell.org From: staafmeister
Date: Thu, 10 Sep 2009 05:23:26 -0700 (PDT) Subject: Re: Re[Haskell-cafe] [2]: memoization Hi Bulat,
Bulat Ziganshin-2 wrote:
Hello staafmeister,
Thursday, September 10, 2009, 3:54:34 PM, you wrote:
What do you think about such a function? This function is
a bit of refactoring
-- "global variable" in haskell way cache = unsafePerformIO $ newIORef M.empty
memo f x = unsafePerformIO$ do m <- readIORef cache case M.lookup x m of Just y -> return y Nothing -> do let res = f x writeIORef cache $ M.insert x res m return res
memo2 = curry . memo . uncurry
This doesn't work and is exactly what I'm afraid the compiler is going to do. Cache needs to be associated with the function f.
Otherwise one would get conflicts
then make the cache object store functions together with values. cache = unsafePerformIO $ newIORef M.empty memo f x = unsafePerformIO$ do m <- readIORef cache case M.lookup (mkKey f, x) m of Just y -> return y Nothing -> do let res = f x writeIORef cache $ M.insert (mkKey f, x) res m return res memo2 = curry . memo . uncurry This leaves mkKey. Since functions are neither Ord nor Show, you'd have to hack something together yourself. Perhaps an explicit argument to memo? memo :: (Ord a) => String -> (a -> b) -> a -> IO b memo fname f x = unsafePerformIO$ do m <- readIORef cache case M.lookup (fname, x) m of Just y -> return y Nothing -> do let res = f x writeIORef cache $ M.insert (fname, x) res m return res there is probably a better and more elegant solution, but this should at least work. right? matthias

You might want to watch out for multithreading issues, although in
this case, I don't think it will cause sever problems, besides a
couple of redundant cache updates.
On Thu, Sep 10, 2009 at 2:07 PM, Bulat Ziganshin
Hello staafmeister,
Thursday, September 10, 2009, 3:54:34 PM, you wrote:
What do you think about such a function? This function is
a bit of refactoring
-- "global variable" in haskell way cache = unsafePerformIO $ newIORef M.empty
memo f x = unsafePerformIO$ do m <- readIORef cache case M.lookup x m of Just y -> return y Nothing -> do let res = f x writeIORef cache $ M.insert x res m return res
memo2 = curry . memo . uncurry
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Thu, Sep 10, 2009 at 6:34 AM, Peter Verswyvelen
You might want to watch out for multithreading issues, although in this case, I don't think it will cause sever problems, besides a couple of redundant cache updates.
On Thu, Sep 10, 2009 at 2:07 PM, Bulat Ziganshin
wrote: Hello staafmeister,
Thursday, September 10, 2009, 3:54:34 PM, you wrote:
What do you think about such a function? This function is
a bit of refactoring
-- "global variable" in haskell way cache = unsafePerformIO $ newIORef M.empty
Watch out! This is not necessarily the same. The cache in the original message was one per function, not one globally, because the let occurred inside a lambda binding. However, because the body of cache didn't depend on f, we can use lambda calculus rules to lift the let outside the lambda. So your transformation is completely valid... And yet, the original code works, and Bulat's equivalent code does not (in fact you can make a segfault using it). I wouldn't dare say the original code is "correct" though, since a valid transformation can break it. Compilers do valid transformations. O unsafePerformIO, how I love thee... Luke
memo f x = unsafePerformIO$ do m <- readIORef cache case M.lookup x m of Just y -> return y Nothing -> do let res = f x writeIORef cache $ M.insert x res m return res
memo2 = curry . memo . uncurry
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Thu, Sep 10, 2009 at 5:46 PM, Luke Palmer
However, because the body of cache didn't depend on f, we can use lambda calculus rules to lift the let outside the lambda. So your transformation is completely valid... And yet, the original code works, and Bulat's equivalent code does not (in fact you can make a segfault using it).
I wouldn't dare say the original code is "correct" though, since a valid transformation can break it. Compilers do valid transformations.
O unsafePerformIO, how I love thee...
Right, which is why you should write it like this: memoIO :: Ord a => (a -> b) -> IO (a -> IO b) memoIO f = do cache <- newIORef M.empty return $ \x -> do m <- readIORef cache case M.lookup x m of Just y -> return y Nothing -> do let res = f x writeIORef cache $ M.insert x res m return res memo :: Ord a => (a -> b) -> (a -> b) memo f = unsafePerformIO $ do fmemo <- memoIO f return (unsafePerformIO . fmemo) I don't think there is any valid transformation that breaks this, since the compiler can't lift anything through unsafePerformIO. Am I mistaken? -- ryan

How does garbage collection work in an example like the one below? You
memoize a function with some sort of lookup table, which stores function
arguments as keys and function results as values. As long as the
function remains in scope, the keys in the lookup table remain in
memory, which means that the keys themselves always remain reachable
and they cannot be garbage collected. Right?
So what do you do in the case where you know that, after some period of
time, some entries in the lookup table will never be accessed? That is,
there are no references to the keys for some entries remaining, except
for the references in the lookup table itself. You'd like to allow the
memory occupied by the keys to be garbage collected. Otherwise, if the
function stays around for a long time, the size of the lookup table
always grows. How do you avoid the space leak?
I notice that there is a function in Data.IORef,
mkWeakIORef :: IORef a -> IO () -> IO (Weak (IORef a))
which looks promising. In the code below, however, there's only one
IORef, so either the entire table gets garbage collected or none of it
does.
I've been reading the paper "Stretching the storage manager: weak
pointers and stable names in Haskell," which seems to answer my
question. When I attempt to run the memoization code in the paper on
the simple fib example, I find that -- apparently due to lazy
evaluation -- no new entries are entered into the lookup table, and
therefore no lookups are ever successful!
So apparently there is some interaction between lazy evaluation and
garbage collection that I don't understand. My head hurts. Is it
necessary to make the table lookup operation strict? Or is it
something entirely different that I am missing?
-Rod
On Thu, 10 Sep 2009 18:33:47 -0700
Ryan Ingram
memoIO :: Ord a => (a -> b) -> IO (a -> IO b) memoIO f = do cache <- newIORef M.empty return $ \x -> do m <- readIORef cache case M.lookup x m of Just y -> return y Nothing -> do let res = f x writeIORef cache $ M.insert x res m return res
memo :: Ord a => (a -> b) -> (a -> b) memo f = unsafePerformIO $ do fmemo <- memoIO f return (unsafePerformIO . fmemo)
I don't think there is any valid transformation that breaks this, since the compiler can't lift anything through unsafePerformIO. Am I mistaken?
-- ryan

What are you trying to use this for? It seems to me that for memo tables you
almost never have references to they keys outside the lookup table since the
keys are usually computed right at the last minute, and then discarded
(otherwise it might be easier to just cache stuff outside the function).
For example with a naive fibs, the values you are passing in are computed,
and probably don't exist before you do the recursive call, and then are
discarded shortly afterward.
It seems like putting a cap on the cache size, and then just overwriting old
entries would be better.
Am I missing something?
- Job
On Wed, Sep 16, 2009 at 4:48 PM, Rodney Price
How does garbage collection work in an example like the one below? You memoize a function with some sort of lookup table, which stores function arguments as keys and function results as values. As long as the function remains in scope, the keys in the lookup table remain in memory, which means that the keys themselves always remain reachable and they cannot be garbage collected. Right?
So what do you do in the case where you know that, after some period of time, some entries in the lookup table will never be accessed? That is, there are no references to the keys for some entries remaining, except for the references in the lookup table itself. You'd like to allow the memory occupied by the keys to be garbage collected. Otherwise, if the function stays around for a long time, the size of the lookup table always grows. How do you avoid the space leak?
I notice that there is a function in Data.IORef,
mkWeakIORef :: IORef a -> IO () -> IO (Weak (IORef a))
which looks promising. In the code below, however, there's only one IORef, so either the entire table gets garbage collected or none of it does.
I've been reading the paper "Stretching the storage manager: weak pointers and stable names in Haskell," which seems to answer my question. When I attempt to run the memoization code in the paper on the simple fib example, I find that -- apparently due to lazy evaluation -- no new entries are entered into the lookup table, and therefore no lookups are ever successful!
So apparently there is some interaction between lazy evaluation and garbage collection that I don't understand. My head hurts. Is it necessary to make the table lookup operation strict? Or is it something entirely different that I am missing?
-Rod
On Thu, 10 Sep 2009 18:33:47 -0700 Ryan Ingram
wrote: memoIO :: Ord a => (a -> b) -> IO (a -> IO b) memoIO f = do cache <- newIORef M.empty return $ \x -> do m <- readIORef cache case M.lookup x m of Just y -> return y Nothing -> do let res = f x writeIORef cache $ M.insert x res m return res
memo :: Ord a => (a -> b) -> (a -> b) memo f = unsafePerformIO $ do fmemo <- memoIO f return (unsafePerformIO . fmemo)
I don't think there is any valid transformation that breaks this, since the compiler can't lift anything through unsafePerformIO. Am I mistaken?
-- ryan
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

In my case, the results of each computation are used to generate a node
in a graph structure (dag). The key, oddly, is a hash of a two-tuple
that gets stored in the data structure after the computation of the
node finishes. If I don't memoize the function to build a node, the
cost of generating the tree is exponential; if I do, it's somewhere
between linear and quadratic.
Another process prunes parts of this graph structure as time goes on.
The entire data structure is intended to be persistent, lasting for
days at a time in a server-like application. If the parts pruned
aren't garbage collected, the space leak will eventually be
catastrophic. Either the memo table or the graph structure itself will
outgrow available memory.
-Rod
On Thu, 17 Sep 2009 13:32:13 -0400
Job Vranish
What are you trying to use this for? It seems to me that for memo tables you almost never have references to they keys outside the lookup table since the keys are usually computed right at the last minute, and then discarded (otherwise it might be easier to just cache stuff outside the function).
For example with a naive fibs, the values you are passing in are computed, and probably don't exist before you do the recursive call, and then are discarded shortly afterward.
It seems like putting a cap on the cache size, and then just overwriting old entries would be better. Am I missing something?
- Job
On Wed, Sep 16, 2009 at 4:48 PM, Rodney Price
wrote: How does garbage collection work in an example like the one below? You memoize a function with some sort of lookup table, which stores function arguments as keys and function results as values. As long as the function remains in scope, the keys in the lookup table remain in memory, which means that the keys themselves always remain reachable and they cannot be garbage collected. Right?
So what do you do in the case where you know that, after some period of time, some entries in the lookup table will never be accessed? That is, there are no references to the keys for some entries remaining, except for the references in the lookup table itself. You'd like to allow the memory occupied by the keys to be garbage collected. Otherwise, if the function stays around for a long time, the size of the lookup table always grows. How do you avoid the space leak?
I notice that there is a function in Data.IORef,
mkWeakIORef :: IORef a -> IO () -> IO (Weak (IORef a))
which looks promising. In the code below, however, there's only one IORef, so either the entire table gets garbage collected or none of it does.
I've been reading the paper "Stretching the storage manager: weak pointers and stable names in Haskell," which seems to answer my question. When I attempt to run the memoization code in the paper on the simple fib example, I find that -- apparently due to lazy evaluation -- no new entries are entered into the lookup table, and therefore no lookups are ever successful!
So apparently there is some interaction between lazy evaluation and garbage collection that I don't understand. My head hurts. Is it necessary to make the table lookup operation strict? Or is it something entirely different that I am missing?
-Rod
On Thu, 10 Sep 2009 18:33:47 -0700 Ryan Ingram
wrote: memoIO :: Ord a => (a -> b) -> IO (a -> IO b) memoIO f = do cache <- newIORef M.empty return $ \x -> do m <- readIORef cache case M.lookup x m of Just y -> return y Nothing -> do let res = f x writeIORef cache $ M.insert x res m return res
memo :: Ord a => (a -> b) -> (a -> b) memo f = unsafePerformIO $ do fmemo <- memoIO f return (unsafePerformIO . fmemo)
I don't think there is any valid transformation that breaks this, since the compiler can't lift anything through unsafePerformIO. Am I mistaken?
-- ryan
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

I would also like to see a solution for problems like these.
Haskell provides a lot of nice memoizing / caching data structures -
like a trie - but the ones I know indeed keep growing, so no garbage
collection takes place?
It would be nice to have a data structure that performs caching but
does not grow unlimited.
I had a similar problem with stable names; it is not possible to check
if a stable name is still "alive".
On Fri, Sep 18, 2009 at 1:39 AM, Rodney Price
In my case, the results of each computation are used to generate a node in a graph structure (dag). The key, oddly, is a hash of a two-tuple that gets stored in the data structure after the computation of the node finishes. If I don't memoize the function to build a node, the cost of generating the tree is exponential; if I do, it's somewhere between linear and quadratic.
Another process prunes parts of this graph structure as time goes on. The entire data structure is intended to be persistent, lasting for days at a time in a server-like application. If the parts pruned aren't garbage collected, the space leak will eventually be catastrophic. Either the memo table or the graph structure itself will outgrow available memory.
-Rod
On Thu, 17 Sep 2009 13:32:13 -0400 Job Vranish
wrote: What are you trying to use this for? It seems to me that for memo tables you almost never have references to they keys outside the lookup table since the keys are usually computed right at the last minute, and then discarded (otherwise it might be easier to just cache stuff outside the function).
For example with a naive fibs, the values you are passing in are computed, and probably don't exist before you do the recursive call, and then are discarded shortly afterward.
It seems like putting a cap on the cache size, and then just overwriting old entries would be better. Am I missing something?
- Job
On Wed, Sep 16, 2009 at 4:48 PM, Rodney Price
wrote: How does garbage collection work in an example like the one below? You memoize a function with some sort of lookup table, which stores function arguments as keys and function results as values. As long as the function remains in scope, the keys in the lookup table remain in memory, which means that the keys themselves always remain reachable and they cannot be garbage collected. Right?
So what do you do in the case where you know that, after some period of time, some entries in the lookup table will never be accessed? That is, there are no references to the keys for some entries remaining, except for the references in the lookup table itself. You'd like to allow the memory occupied by the keys to be garbage collected. Otherwise, if the function stays around for a long time, the size of the lookup table always grows. How do you avoid the space leak?
I notice that there is a function in Data.IORef,
mkWeakIORef :: IORef a -> IO () -> IO (Weak (IORef a))
which looks promising. In the code below, however, there's only one IORef, so either the entire table gets garbage collected or none of it does.
I've been reading the paper "Stretching the storage manager: weak pointers and stable names in Haskell," which seems to answer my question. When I attempt to run the memoization code in the paper on the simple fib example, I find that -- apparently due to lazy evaluation -- no new entries are entered into the lookup table, and therefore no lookups are ever successful!
So apparently there is some interaction between lazy evaluation and garbage collection that I don't understand. My head hurts. Is it necessary to make the table lookup operation strict? Or is it something entirely different that I am missing?
-Rod
On Thu, 10 Sep 2009 18:33:47 -0700 Ryan Ingram
wrote: memoIO :: Ord a => (a -> b) -> IO (a -> IO b) memoIO f = do cache <- newIORef M.empty return $ \x -> do m <- readIORef cache case M.lookup x m of Just y -> return y Nothing -> do let res = f x writeIORef cache $ M.insert x res m return res
memo :: Ord a => (a -> b) -> (a -> b) memo f = unsafePerformIO $ do fmemo <- memoIO f return (unsafePerformIO . fmemo)
I don't think there is any valid transformation that breaks this, since the compiler can't lift anything through unsafePerformIO. Am I mistaken?
-- ryan
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Yeah it seems like the general solution to the problem would be some sort of
map-like datastructure that you add items via a key/value pair, and if the
key gets GC'd, that entry gets removed from the structure.
I've been wanting something like this as well, but didn't know about weak
references so I didn't know if it was possible, but I think I could make
something like this now. I'll give it a shot and let you guys know how it
goes.
Rodney could you post your memo code that uses the weak references?
- Job
On Fri, Sep 18, 2009 at 7:56 AM, Peter Verswyvelen
I would also like to see a solution for problems like these.
Haskell provides a lot of nice memoizing / caching data structures - like a trie - but the ones I know indeed keep growing, so no garbage collection takes place?
It would be nice to have a data structure that performs caching but does not grow unlimited.
I had a similar problem with stable names; it is not possible to check if a stable name is still "alive".
On Fri, Sep 18, 2009 at 1:39 AM, Rodney Price
wrote: In my case, the results of each computation are used to generate a node in a graph structure (dag). The key, oddly, is a hash of a two-tuple that gets stored in the data structure after the computation of the node finishes. If I don't memoize the function to build a node, the cost of generating the tree is exponential; if I do, it's somewhere between linear and quadratic.
Another process prunes parts of this graph structure as time goes on. The entire data structure is intended to be persistent, lasting for days at a time in a server-like application. If the parts pruned aren't garbage collected, the space leak will eventually be catastrophic. Either the memo table or the graph structure itself will outgrow available memory.
-Rod
On Thu, 17 Sep 2009 13:32:13 -0400 Job Vranish
wrote: What are you trying to use this for? It seems to me that for memo tables you almost never have references to they keys outside the lookup table since the keys are usually computed right at the last minute, and then discarded (otherwise it might be easier to just cache stuff outside the function).
For example with a naive fibs, the values you are passing in are computed, and probably don't exist before you do the recursive call, and then are discarded shortly afterward.
It seems like putting a cap on the cache size, and then just overwriting old entries would be better. Am I missing something?
- Job
On Wed, Sep 16, 2009 at 4:48 PM, Rodney Price
wrote: How does garbage collection work in an example like the one below? You memoize a function with some sort of lookup table, which stores function arguments as keys and function results as values. As long as the function remains in scope, the keys in the lookup table remain in memory, which means that the keys themselves always remain reachable and they cannot be garbage collected. Right?
So what do you do in the case where you know that, after some period of time, some entries in the lookup table will never be accessed? That is, there are no references to the keys for some entries remaining, except for the references in the lookup table itself. You'd like to allow the memory occupied by the keys to be garbage collected. Otherwise, if the function stays around for a long time, the size of the lookup table always grows. How do you avoid the space leak?
I notice that there is a function in Data.IORef,
mkWeakIORef :: IORef a -> IO () -> IO (Weak (IORef a))
which looks promising. In the code below, however, there's only one IORef, so either the entire table gets garbage collected or none of it does.
I've been reading the paper "Stretching the storage manager: weak pointers and stable names in Haskell," which seems to answer my question. When I attempt to run the memoization code in the paper on the simple fib example, I find that -- apparently due to lazy evaluation -- no new entries are entered into the lookup table, and therefore no lookups are ever successful!
So apparently there is some interaction between lazy evaluation and garbage collection that I don't understand. My head hurts. Is it necessary to make the table lookup operation strict? Or is it something entirely different that I am missing?
-Rod
On Thu, 10 Sep 2009 18:33:47 -0700 Ryan Ingram
wrote: memoIO :: Ord a => (a -> b) -> IO (a -> IO b) memoIO f = do cache <- newIORef M.empty return $ \x -> do m <- readIORef cache case M.lookup x m of Just y -> return y Nothing -> do let res = f x writeIORef cache $ M.insert x res m return res
memo :: Ord a => (a -> b) -> (a -> b) memo f = unsafePerformIO $ do fmemo <- memoIO f return (unsafePerformIO . fmemo)
I don't think there is any valid transformation that breaks this, since the compiler can't lift anything through unsafePerformIO. Am I mistaken?
-- ryan
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Hey it works :D
Here is a proof of concept:
http://gist.github.com/189104
Maybe later today I'll try to make a version that can be safely used outside
IO.
- Job
On Fri, Sep 18, 2009 at 10:19 AM, Job Vranish
Yeah it seems like the general solution to the problem would be some sort of map-like datastructure that you add items via a key/value pair, and if the key gets GC'd, that entry gets removed from the structure.
I've been wanting something like this as well, but didn't know about weak references so I didn't know if it was possible, but I think I could make something like this now. I'll give it a shot and let you guys know how it goes.
Rodney could you post your memo code that uses the weak references?
- Job
On Fri, Sep 18, 2009 at 7:56 AM, Peter Verswyvelen
wrote: I would also like to see a solution for problems like these.
Haskell provides a lot of nice memoizing / caching data structures - like a trie - but the ones I know indeed keep growing, so no garbage collection takes place?
It would be nice to have a data structure that performs caching but does not grow unlimited.
I had a similar problem with stable names; it is not possible to check if a stable name is still "alive".
On Fri, Sep 18, 2009 at 1:39 AM, Rodney Price
wrote: In my case, the results of each computation are used to generate a node in a graph structure (dag). The key, oddly, is a hash of a two-tuple that gets stored in the data structure after the computation of the node finishes. If I don't memoize the function to build a node, the cost of generating the tree is exponential; if I do, it's somewhere between linear and quadratic.
Another process prunes parts of this graph structure as time goes on. The entire data structure is intended to be persistent, lasting for days at a time in a server-like application. If the parts pruned aren't garbage collected, the space leak will eventually be catastrophic. Either the memo table or the graph structure itself will outgrow available memory.
-Rod
On Thu, 17 Sep 2009 13:32:13 -0400 Job Vranish
wrote: What are you trying to use this for? It seems to me that for memo tables you almost never have references to they keys outside the lookup table since the keys are usually computed right at the last minute, and then discarded (otherwise it might be easier to just cache stuff outside the function).
For example with a naive fibs, the values you are passing in are computed, and probably don't exist before you do the recursive call, and then are discarded shortly afterward.
It seems like putting a cap on the cache size, and then just overwriting old entries would be better. Am I missing something?
- Job
On Wed, Sep 16, 2009 at 4:48 PM, Rodney Price
wrote: How does garbage collection work in an example like the one below? You memoize a function with some sort of lookup table, which stores function arguments as keys and function results as values. As long as the function remains in scope, the keys in the lookup table remain in memory, which means that the keys themselves always remain reachable and they cannot be garbage collected. Right?
So what do you do in the case where you know that, after some period of time, some entries in the lookup table will never be accessed? That is, there are no references to the keys for some entries remaining, except for the references in the lookup table itself. You'd like to allow the memory occupied by the keys to be garbage collected. Otherwise, if the function stays around for a long time, the size of the lookup table always grows. How do you avoid the space leak?
I notice that there is a function in Data.IORef,
mkWeakIORef :: IORef a -> IO () -> IO (Weak (IORef a))
which looks promising. In the code below, however, there's only one IORef, so either the entire table gets garbage collected or none of it does.
I've been reading the paper "Stretching the storage manager: weak pointers and stable names in Haskell," which seems to answer my question. When I attempt to run the memoization code in the paper on the simple fib example, I find that -- apparently due to lazy evaluation -- no new entries are entered into the lookup table, and therefore no lookups are ever successful!
So apparently there is some interaction between lazy evaluation and garbage collection that I don't understand. My head hurts. Is it necessary to make the table lookup operation strict? Or is it something entirely different that I am missing?
-Rod
On Thu, 10 Sep 2009 18:33:47 -0700 Ryan Ingram
wrote: memoIO :: Ord a => (a -> b) -> IO (a -> IO b) memoIO f = do cache <- newIORef M.empty return $ \x -> do m <- readIORef cache case M.lookup x m of Just y -> return y Nothing -> do let res = f x writeIORef cache $ M.insert x res m return res
memo :: Ord a => (a -> b) -> (a -> b) memo f = unsafePerformIO $ do fmemo <- memoIO f return (unsafePerformIO . fmemo)
I don't think there is any valid transformation that breaks this, since the compiler can't lift anything through unsafePerformIO. Am I mistaken?
-- ryan
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Hi, Investigating memoization inspired by replies from this thread. I encountered something strange in the behavior of ghci. Small chance it's a bug, it probably is a feature, but I certainly don't understand it :) The interpreter session went as follows GHCi, version 6.10.4: http://www.haskell.org/ghc/ :? for help Loading package ghc-prim ... linking ... done. Loading package integer ... linking ... done. Loading package base ... linking ... done. Prelude> :load test_bug.hs [1 of 1] Compiling Main ( test_bug.hs, interpreted ) Ok, modules loaded: Main. *Main> let s1 = memo2 solve2 Loading package syb ... linking ... done. Loading package array-0.2.0.0 ... linking ... done. Loading package containers-0.2.0.1 ... linking ... done. Loading package filepath-1.1.0.2 ... linking ... done. Loading package old-locale-1.0.0.1 ... linking ... done. Loading package old-time-1.0.0.2 ... linking ... done. Loading package unix-2.3.2.0 ... linking ... done. Loading package directory-1.0.0.3 ... linking ... done. Loading package process-1.0.1.1 ... linking ... done. Loading package random-1.0.0.1 ... linking ... done. Loading package haskell98 ... linking ... done. *Main> :type s1 s1 :: [()] -> [()] -> ModP *Main> let s2 a b = memo2 solve2 a b *Main> :type s2 s2 :: (Eq t) => [t] -> [t] -> ModP Here memo2 is a function that works like a combinator to obtain a memoized recursive function. However the type of the function depends on how I define it. In point-free style it gets the wrong type, however if I define (s2) with explicit arguments the type is correct? Do you know what happens here? I would expect the types to be the same. Another question is: I use now makeStableName for equality but using this function memoization does not work and it still takes a long (exponential?) time to go through the codejam testcases. The memoization using data.map works flawless. Greetings, Gerben ps. The content of test_bug.hs is import Data.IORef import System.IO.Unsafe import Control.Exception import qualified Data.Map as M import Text.Printf import qualified Data.HashTable as H import System.Mem.StableName import Data.Ratio import Array memo f = unsafePerformIO $ do cache <- H.new (==) (H.hashInt . hashStableName) let cacheFunc = \x -> unsafePerformIO $ do stable <- makeStableName x lup <- H.lookup cache stable case lup of Just y -> return y Nothing -> do let res = f cacheFunc x H.insert cache stable res return res return cacheFunc memo2 f = curry $ memo (\g (x,y) -> f (curry g) x y) newtype ModP = ModP Integer deriving Eq p=10007 instance Show ModP where show (ModP x) = printf "%d" x instance Num ModP where ModP x + ModP y = ModP ((x + y) `mod` p) fromInteger x = ModP (x `mod` p) ModP x * ModP y = ModP ((x * y) `mod` p) abs = undefined signum = undefined solve2 f _ [] = 1::ModP solve2 f [] _ = 0::ModP solve2 f (hs:ts) t@(ht:tt) | hs==ht = f ts tt + f ts t | otherwise = f ts t go (run, line) = "Case #"++show run++": "++show ((memo2 solve2) line "welcome to code jam") main = interact $ unlines . map go . zip [1..] . tail . lines -- View this message in context: http://www.nabble.com/memoization-tp25306687p25400506.html Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com.

Hello staafmeister, Friday, September 11, 2009, 4:57:01 PM, you wrote:
Here memo2 is a function that works like a combinator to obtain a memoized recursive function. However the type of the function depends on how I define it. In point-free style it gets the wrong type, however if I define (s2) with explicit arguments the type is correct? Do you know what happens here? I would expect the types to be the same.
looks like you need to pass -fno-monomorphism-restriction to ghci. it's a "bug" of haskell definition, expected to be removed in the next language version -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
participants (10)
-
Bulat Ziganshin
-
Daniel Fischer
-
Job Vranish
-
Luke Palmer
-
mf-hcafe-15c311f0c@etc-network.de
-
Peter Verswyvelen
-
Reid Barton
-
Rodney Price
-
Ryan Ingram
-
staafmeister