Am I misunderstanding something about `StableName` and `hashStableName`?

Hi, I've been playing around with `System.Mem.StableName` and `hashStableName` and stumbled across a behavior that seems to be rather weird. First of all the documentation of `hashStableName` says the following: The Int returned is not necessarily unique; several StableNames may map to the same Int (in practice however, the chances of this are small, so the result of hashStableName makes a good hash key). OK, sounds fine, let's put this to the test. So I wrote a little program. module Main where import Control.Monad import System.Mem.StableName import System.Environment main :: IO () main = do args <- getArgs res <- forM [0..10000] $ \i -> do x <- makeStableName i let h' = hashStableName x unless (null args) $ print h' return h' putStrLn "---------------------------" print $ minimum res print $ maximum res Running this program without an argument only prints aus the minimum and maximum hash values that where produced. So surprises here: $ ./Test --------------------------- 1 10000 Here is how things get a bit weird when I make the progam print out the hash values while they are being produced I get a completely different result. Not only that but I get a whole lot of hash collisions, since the hashes produced only seem to be within the range of 1 and 260. $ ./Test print [...] 128 127 126 125 124 123 122 121 120 119 118 117 116 115 114 --------------------------- 1 260 Am I missing something fundamental here, because this behavior seems to be rather confusing. Kind regards, raichoo

raichoo via Haskell-Cafe wrote:
First of all the documentation of `hashStableName` says the following:
The Int returned is not necessarily unique; several StableNames may map to the same Int (in practice however, the chances of this are small, so the result of hashStableName makes a good hash key).
OK, sounds fine, let's put this to the test. So I wrote a little program.
module Main where
import Control.Monad
import System.Mem.StableName import System.Environment
main :: IO () main = do args <- getArgs res <- forM [0..10000] $ \i -> do x <- makeStableName i let h' = hashStableName x unless (null args) $ print h' return h'
putStrLn "---------------------------" print $ minimum res print $ maximum res
There is nothing in this program that keeps the stable names alive. It appears, from your experiments, that once a stable name is garbage collected, its ID, which also serves as its hash value, may be reused for another stable pointer. Consider this variant of your main function: main = do res <- forM [0..10000] $ fmap hashStableName . makeStableName performGC res <- forM [0..10000] $ fmap hashStableName . makeStableName performGC res <- forM [0..10000] $ fmap hashStableName . makeStableName print (minimum res, maximum res) This produces `(1,10001)` as output in my tests. I'm not sure how exactly `print` affects garbage collections. Overall, I believe the documentation of `hashStableName` is mostly correct, but it would make sense to stress that the statement is only valid for stable names that are currently alive at a particular point in time, not globally for the whole run of a program. Cheers, Bertram

On November 21, 2016 12:32:03 AM GMT+11:00, Bertram Felgenhauer via Haskell-Cafe
raichoo via Haskell-Cafe wrote:
First of all the documentation of `hashStableName` says the following:
The Int returned is not necessarily unique; several StableNames may map to the same Int (in practice however, the chances of this are small, so the result of hashStableName makes a good hash key).
OK, sounds fine, let's put this to the test. So I wrote a little program.
module Main where
import Control.Monad
import System.Mem.StableName import System.Environment
main :: IO () main = do args <- getArgs res <- forM [0..10000] $ \i -> do x <- makeStableName i let h' = hashStableName x unless (null args) $ print h' return h'
putStrLn "---------------------------" print $ minimum res print $ maximum res
There is nothing in this program that keeps the stable names alive. It appears, from your experiments, that once a stable name is garbage collected, its ID, which also serves as its hash value, may be reused for another stable pointer. Consider this variant of your main function:
main = do res <- forM [0..10000] $ fmap hashStableName . makeStableName performGC res <- forM [0..10000] $ fmap hashStableName . makeStableName performGC res <- forM [0..10000] $ fmap hashStableName . makeStableName print (minimum res, maximum res)
This produces `(1,10001)` as output in my tests.
I'm not sure how exactly `print` affects garbage collections.
Overall, I believe the documentation of `hashStableName` is mostly correct, but it would make sense to stress that the statement is only valid for stable names that are currently alive at a particular point in time, not globally for the whole run of a program.
Cheers,
Bertram
_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.
I suspect that when print h' isn't called, then `let h' = hashStableName x` stays as a thunk in the list, until you later compute the minimum and maximum. The thunk refers to the stable name, and keeps it from being GCed. So they *are* all live at the same time. When you print h' in the loop body, that forces the hash calculation and discharges the thunk. So the stable name becomes garbage immediately, and once collected the internal id is available for reuse. Perhaps the 1-260 range observed tells you roughly how many iterations of that loop you can get through in the smallest GC generation, on your system.
participants (3)
-
Ben Mellor
-
Bertram Felgenhauer
-
raichoo