
Hello! A short question, given: data Seed = ... data Value = ... someGenerator :: Seed -> [Value] createTwo :: Seed -> ([Value], [Value]) createTwo s = (as, bs) where as = someGenerator s bs = drop 1000000000 (someGenerator s) Is it guaranteed that 'someGenerator s' is created twice and not shared between 'as' and 'bs'? Is this by language design? Are there any GHC options that change the behaviour? Thank you, Michael

It's not guaranteed. Unfortunately there aren't really good ways to avoid sharing; the general advice is to convert values into functions, and apply them at the use site where sharing is OK. Unrelatedly, in your sample code, dropping 1000000000 entries is not a good way to build a splittable RNG. Check out http://publications.lib.chalmers.se/records/fulltext/183348/local_183348.pdf and also its related work for some bettera pproaches. Edward Excerpts from Michael Roth's message of 2016-11-07 20:56:54 +0100:
Hello! A short question, given:
data Seed = ... data Value = ...
someGenerator :: Seed -> [Value]
createTwo :: Seed -> ([Value], [Value]) createTwo s = (as, bs) where as = someGenerator s bs = drop 1000000000 (someGenerator s)
Is it guaranteed that 'someGenerator s' is created twice and not shared between 'as' and 'bs'? Is this by language design? Are there any GHC options that change the behaviour?
Thank you,
Michael

I'm not going to contradict Edward's answer in practice, because he knows far more about the internals of GHC than I do. However, I will contradict it in theory. There is a straightforward operational semantics for Haskell programs as compiled by GHC via Core to STG (without any optimisations applied). I described this semantics in the following talk at Haskell eXchange 2016: https://skillsmatter.com/skillscasts/8726-haskell-programs-how-do-they-run#v... and I wrote it up as an article here: http://h2.jaguarpaw.co.uk/posts/haskell-programs-how-do-they-run/ Under that semantics, yes, 'someGenerator s' is bound twice and thus is not shared. Now, GHC may try to apply an "optimisation" and (accidentally?) introduce sharing. That would be a compiler bug, in my opinion. In fact, it would be a terrible blow to Haskell's future as a practical language, because we really need fine-grained control over sharing. So I really hope that Edward is wrong about this. Tom On Mon, Nov 07, 2016 at 12:56:02PM -0800, Edward Z. Yang wrote:
It's not guaranteed. Unfortunately there aren't really good ways to avoid sharing; the general advice is to convert values into functions, and apply them at the use site where sharing is OK.
Unrelatedly, in your sample code, dropping 1000000000 entries is not a good way to build a splittable RNG. Check out http://publications.lib.chalmers.se/records/fulltext/183348/local_183348.pdf and also its related work for some bettera pproaches.
Edward
Excerpts from Michael Roth's message of 2016-11-07 20:56:54 +0100:
Hello! A short question, given:
data Seed = ... data Value = ...
someGenerator :: Seed -> [Value]
createTwo :: Seed -> ([Value], [Value]) createTwo s = (as, bs) where as = someGenerator s bs = drop 1000000000 (someGenerator s)
Is it guaranteed that 'someGenerator s' is created twice and not shared between 'as' and 'bs'? Is this by language design? Are there any GHC options that change the behaviour?

GHC is allowed to do common sub-expression elimination which would cause sharing. For example: module Opt where {-# NOINLINE bar #-} bar :: Int -> Int bar x = x + 3 foo :: Int -> Int foo s = bar s * bar s Here's the core for 'foo': foo = \ (s_ayA :: Int) -> case bar s_ayA of _ [Occ=Dead] { GHC.Types.I# x_aJa -> GHC.Types.I# (GHC.Prim.*# x_aJa x_aJa) } bar is invoked only once. In Michael's original example, CSE doesn't fire, probably because the tuple constructor is lazy. I am not entirely certain what the interaction here is. Edward Excerpts from Tom Ellis's message of 2016-11-07 21:10:41 +0000:
I'm not going to contradict Edward's answer in practice, because he knows far more about the internals of GHC than I do. However, I will contradict it in theory.
There is a straightforward operational semantics for Haskell programs as compiled by GHC via Core to STG (without any optimisations applied). I described this semantics in the following talk at Haskell eXchange 2016:
https://skillsmatter.com/skillscasts/8726-haskell-programs-how-do-they-run#v...
and I wrote it up as an article here:
http://h2.jaguarpaw.co.uk/posts/haskell-programs-how-do-they-run/
Under that semantics, yes, 'someGenerator s' is bound twice and thus is not shared.
Now, GHC may try to apply an "optimisation" and (accidentally?) introduce sharing. That would be a compiler bug, in my opinion. In fact, it would be a terrible blow to Haskell's future as a practical language, because we really need fine-grained control over sharing. So I really hope that Edward is wrong about this.
Tom
On Mon, Nov 07, 2016 at 12:56:02PM -0800, Edward Z. Yang wrote:
It's not guaranteed. Unfortunately there aren't really good ways to avoid sharing; the general advice is to convert values into functions, and apply them at the use site where sharing is OK.
Unrelatedly, in your sample code, dropping 1000000000 entries is not a good way to build a splittable RNG. Check out http://publications.lib.chalmers.se/records/fulltext/183348/local_183348.pdf and also its related work for some bettera pproaches.
Edward
Excerpts from Michael Roth's message of 2016-11-07 20:56:54 +0100:
Hello! A short question, given:
data Seed = ... data Value = ...
someGenerator :: Seed -> [Value]
createTwo :: Seed -> ([Value], [Value]) createTwo s = (as, bs) where as = someGenerator s bs = drop 1000000000 (someGenerator s)
Is it guaranteed that 'someGenerator s' is created twice and not shared between 'as' and 'bs'? Is this by language design? Are there any GHC options that change the behaviour?

I think we should forbid it then! Or at least only allow it when what is shared is known to be small. On Mon, Nov 07, 2016 at 01:32:50PM -0800, Edward Z. Yang wrote:
GHC is allowed to do common sub-expression elimination which would cause sharing. For example:
module Opt where
{-# NOINLINE bar #-} bar :: Int -> Int bar x = x + 3
foo :: Int -> Int foo s = bar s * bar s
Here's the core for 'foo':
foo = \ (s_ayA :: Int) -> case bar s_ayA of _ [Occ=Dead] { GHC.Types.I# x_aJa -> GHC.Types.I# (GHC.Prim.*# x_aJa x_aJa) }
bar is invoked only once.
In Michael's original example, CSE doesn't fire, probably because the tuple constructor is lazy. I am not entirely certain what the interaction here is.
Edward
Excerpts from Tom Ellis's message of 2016-11-07 21:10:41 +0000:
I'm not going to contradict Edward's answer in practice, because he knows far more about the internals of GHC than I do. However, I will contradict it in theory.
There is a straightforward operational semantics for Haskell programs as compiled by GHC via Core to STG (without any optimisations applied). I described this semantics in the following talk at Haskell eXchange 2016:
https://skillsmatter.com/skillscasts/8726-haskell-programs-how-do-they-run#v...
and I wrote it up as an article here:
http://h2.jaguarpaw.co.uk/posts/haskell-programs-how-do-they-run/
Under that semantics, yes, 'someGenerator s' is bound twice and thus is not shared.
Now, GHC may try to apply an "optimisation" and (accidentally?) introduce sharing. That would be a compiler bug, in my opinion. In fact, it would be a terrible blow to Haskell's future as a practical language, because we really need fine-grained control over sharing. So I really hope that Edward is wrong about this.
Tom
On Mon, Nov 07, 2016 at 12:56:02PM -0800, Edward Z. Yang wrote:
It's not guaranteed. Unfortunately there aren't really good ways to avoid sharing; the general advice is to convert values into functions, and apply them at the use site where sharing is OK.
Unrelatedly, in your sample code, dropping 1000000000 entries is not a good way to build a splittable RNG. Check out http://publications.lib.chalmers.se/records/fulltext/183348/local_183348.pdf and also its related work for some bettera pproaches.
Edward
Excerpts from Michael Roth's message of 2016-11-07 20:56:54 +0100:
Hello! A short question, given:
data Seed = ... data Value = ...
someGenerator :: Seed -> [Value]
createTwo :: Seed -> ([Value], [Value]) createTwo s = (as, bs) where as = someGenerator s bs = drop 1000000000 (someGenerator s)
Is it guaranteed that 'someGenerator s' is created twice and not shared between 'as' and 'bs'? Is this by language design? Are there any GHC options that change the behaviour?
Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

Yes, dropping 1000000000 entries is bad, 'someGenerator' was thought as an deputy only. In reality I have some sort of generator which generates a large system state varying over time. It was my fault I didn't state this clearly. Thank you for the interesting reading about RNGs anyway. Michael Am 07.11.2016 um 21:56 schrieb Edward Z. Yang:
It's not guaranteed. Unfortunately there aren't really good ways to avoid sharing; the general advice is to convert values into functions, and apply them at the use site where sharing is OK.
Unrelatedly, in your sample code, dropping 1000000000 entries is not a good way to build a splittable RNG. Check out http://publications.lib.chalmers.se/records/fulltext/183348/local_183348.pdf and also its related work for some bettera pproaches.
Edward
Excerpts from Michael Roth's message of 2016-11-07 20:56:54 +0100:
Hello! A short question, given:
data Seed = ... data Value = ...
someGenerator :: Seed -> [Value]
createTwo :: Seed -> ([Value], [Value]) createTwo s = (as, bs) where as = someGenerator s bs = drop 1000000000 (someGenerator s)
Is it guaranteed that 'someGenerator s' is created twice and not shared between 'as' and 'bs'? Is this by language design? Are there any GHC options that change the behaviour?
Thank you,
Michael

On Mon, Nov 7, 2016 at 2:56 PM, Michael Roth wrote:
as = someGenerator s bs = drop 1000000000 (someGenerator s)
Is it guaranteed that 'someGenerator s' is created twice and not shared between 'as' and 'bs'? Is this by language design? Are there any GHC options that change the behaviour?
It's not specified in any standard, but ghc will only share something that is directly bound. There is no sensible way to determine sharing automatically that will do what everyone expects, so ghc doesn't try: the programmer is expected to use bindings to specify sharing. -- brandon s allbery kf8nh sine nomine associates allbery.b@gmail.com ballbery@sinenomine.net unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net

Hi, Am Montag, den 07.11.2016, 16:00 -0500 schrieb Brandon Allbery:
It's not specified in any standard, but ghc will only share something that is directly bound. There is no sensible way to determine sharing automatically that will do what everyone expects, so ghc doesn't try: the programmer is expected to use bindings to specify sharing.
that is not true. GHC applies „common subexpression elimination“, and this optimization pass will likely sum up the two occurrences of "someGenerator s". In places online it says that only code like
let x = e in .... let y = e in ....
would be CSEd. But reading the code (CSE.hs) I conclude that also
let x = e in .... foo e ....
would get CSEd, which includes the code posted by Michael Roth. Furthermore, the FloatOut pass may turn
… foo e … into let x = e in … foo x … under certain circumstance, so even a limited form of CSE can apply here.
In the end, only looking at the Core of a concrete program can help to answer these questions. Greetings, Joachim -- Joachim “nomeata” Breitner mail@joachim-breitner.de • https://www.joachim-breitner.de/ XMPP: nomeata@joachim-breitner.de • OpenPGP-Key: 0xF0FBF51F Debian Developer: nomeata@debian.org
participants (5)
-
Brandon Allbery
-
Edward Z. Yang
-
Joachim Breitner
-
Michael Roth
-
Tom Ellis