How unique is Unique

Hello! Lacking a proper blog, I've written some notes about Data.Unique here: http://community.haskell.org/~emax/darcs/MoreUnique/ This describes a real problem that makes Data.Unique unsuitable for implementing observable sharing. The document also proposes a solution that uses time stamps to generate symbols that are "more unique". Does anyone have any comments on the proposed solution? Are there any alternatives available? Thanks! / Emil
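To make the idea concrete, here is a rough sketch of a time-stamp-plus-counter scheme of the kind proposed, assuming the time package; all names and details are illustrative and the linked document's actual design may differ:

    -- A rough sketch only: pair a wall-clock time stamp with a per-run
    -- counter, so that symbols created after a reload (which may reset the
    -- counter) still differ from older ones.  Illustrative names, not the
    -- linked document's code.
    module MoreUniqueSketch (Sym, newSym) where

    import Data.IORef (IORef, newIORef, atomicModifyIORef)
    import Data.Time.Clock (UTCTime, getCurrentTime)
    import System.IO.Unsafe (unsafePerformIO)

    data Sym = Sym UTCTime Integer
      deriving (Eq, Ord, Show)

    -- A global counter hidden behind unsafePerformIO.
    counter :: IORef Integer
    counter = unsafePerformIO (newIORef 0)
    {-# NOINLINE counter #-}

    newSym :: IO Sym
    newSym = do
      t <- getCurrentTime
      i <- atomicModifyIORef counter (\i -> (i + 1, i))
      return (Sym t i)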

On 27 May 2011, at 08:35, Emil Axelsson wrote:
Hello!
Lacking a proper blog, I've written some notes about Data.Unique here:
http://community.haskell.org/~emax/darcs/MoreUnique/
This describes a real problem that makes Data.Unique unsuitable for implementing observable sharing.
Looking at the source for Data.Unique shows that it's implemented as an Integer, starting at 0 when the IO monad is started, so the values are only unique within a single I/O run, and are definitely not comparable across runs...
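For reference, a simplified sketch of the kind of implementation described here (not the library's verbatim source):

    -- Simplified sketch of a Data.Unique-style counter: a global MVar
    -- hidden behind unsafePerformIO, so values are only unique per run.
    module UniqueSketch (Unique, newUnique) where

    import Control.Concurrent.MVar (MVar, newMVar, takeMVar, putMVar)
    import System.IO.Unsafe (unsafePerformIO)

    newtype Unique = Unique Integer
      deriving (Eq, Ord)

    -- uniqSource is a CAF; as discussed later in the thread, GHCi's :r
    -- resets CAFs and hence resets this counter.
    uniqSource :: MVar Integer
    uniqSource = unsafePerformIO (newMVar 0)
    {-# NOINLINE uniqSource #-}

    newUnique :: IO Unique
    newUnique = do
      n <- takeMVar uniqSource
      putMVar uniqSource (n + 1)
      return (Unique n)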
The document also proposes a solution that uses time stamps to generate symbols that are "more unique".
Does anyone have any comments on the proposed solution? Are there any alternatives available?
I haven't had time to look at that part yet ....
Thanks!
/ Emil
-- Andrew Butterfield, Foundations and Methods Research Group Director, School of Computer Science and Statistics, Trinity College, University of Dublin http://www.cs.tcd.ie/Andrew.Butterfield/

2011-05-27 10:44, David Virebayre skrev:
2011/5/27 Emil Axelsson:
Does anyone have any comments on the proposed solution? Are there any alternatives available?
It might be unsuitable where an administrator can change the system's time while the program is running.
Agreed! However, it should be extremely hard to cause a clash in this way. So unless there are even safer solutions, I think this is a risk I'm willing to take. / Emil

There is also the UUID, which guarantees uniqueness even in an environment with more than one machine. http://en.wikipedia.org/wiki/Universally_unique_identifier Cheers
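In Haskell this is available through the uuid package; a minimal sketch, assuming that package is installed:

    -- Minimal sketch using the uuid package: V4 UUIDs are random, so they
    -- are unique with overwhelming probability across runs, reloads and
    -- machines.
    import Data.UUID (UUID)
    import Data.UUID.V4 (nextRandom)

    newSymbol :: IO UUID
    newSymbol = nextRandom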

On 27/05/2011 08:35, Emil Axelsson wrote:
Hello!
Lacking a proper blog, I've written some notes about Data.Unique here:
http://community.haskell.org/~emax/darcs/MoreUnique/
This describes a real problem that makes Data.Unique unsuitable for implementing observable sharing.
The document also proposes a solution that uses time stamps to generate symbols that are "more unique".
Does anyone have any comments on the proposed solution? Are there any alternatives available?
This has nothing to do with Unique; you are simply using unsafePerformIO in an unsafe way. It is called unsafePerformIO for a reason :-) Cheers, Simon

2011-05-27 13:12, Simon Marlow skrev:
On 27/05/2011 08:35, Emil Axelsson wrote:
Hello!
Lacking a proper blog, I've written some notes about Data.Unique here:
http://community.haskell.org/~emax/darcs/MoreUnique/
This describes a real problem that makes Data.Unique unsuitable for implementing observable sharing.
The document also proposes a solution that uses time stamps to generate symbols that are "more unique".
Does anyone have any comments on the proposed solution? Are there any alternatives available?
This has nothing to do with Unique; you are simply using unsafePerformIO in an unsafe way. It is called unsafePerformIO for a reason :-)
Right, I wasn't suggesting this is a bug in Data.Unique. It's just that I need something different. I've added a note about this on the post. Note that I am planning to wrap all this in a safe interface. So using unsafePerformIO should be fine in this case, as long as Unique symbols survive GHCi reloads. / Emil

On 27/05/2011 13:40, Emil Axelsson wrote:
2011-05-27 13:12, Simon Marlow skrev:
On 27/05/2011 08:35, Emil Axelsson wrote:
Hello!
Lacking a proper blog, I've written some notes about Data.Unique here:
http://community.haskell.org/~emax/darcs/MoreUnique/
This describes a real problem that makes Data.Unique unsuitable for implementing observable sharing.
The document also proposes a solution that uses time stamps to generate symbols that are "more unique".
Does anyone have any comments on the proposed solution? Are there any alternatives available?
This has nothing to do with Unique; you are simply using unsafePerformIO in an unsafe way. It is called unsafePerformIO for a reason :-)
Right, I wasn't suggesting this is a bug in Data.Unique. It's just that I need something different. I've added a note about this on the post.
Note that I am planning to wrap all this in a safe interface. So using unsafePerformIO should be fine in this case, as long as Unique symbols survive GHCi reloads.
Oh, I missed the bit about observable sharing, sorry. Anyway, I think the explanation for what you're seeing is that :r resets the Unique counter, because the counter is a CAF in Data.Unique and :r resets all CAFs. This wouldn't happen in a compiled program, or even using runhaskell; it's only :r that triggers resetting of CAFs. Still, to use a phrase coined by Lennart Augustsson, "the ice is thin here". Cheers, Simon

Emil Axelsson wrote:
Hello!
Lacking a proper blog, I've written some notes about Data.Unique here:
http://community.haskell.org/~emax/darcs/MoreUnique/
This describes a real problem that makes Data.Unique unsuitable for implementing observable sharing.
The document also proposes a solution that uses time stamps to generate symbols that are "more unique".
Does anyone have any comments on the proposed solution? Are there any alternatives available?
I don't know how Data.Unique is implemented. For observable sharing, I usually implement my own variant of Data.Unique with a global variable/counter. Since every module of my DSL depends on the same global variable, only two things should happen:

* Reloading a module does not reload the global counter. Everything is fine.
* Reloading a module does reload the global counter. This forces *all* modules to be reloaded, which removes all expressions that have used the old variable from scope.

However, note that the problem is not so much with Data.Unique; the real problem is that you can't say "I want this to be unique", you have to say "I want this to be unique within this or that *context*". Imagine that I give you two expressions with some Uniques inside, some of them equal, some of them not. How do you know that two equal Uniques denote the same subexpressions? For that, you have to know that I created them in the same context, in the same *environment*.

Using Data.Unique assumes that the environment is one program run. It is clear that reloading modules in GHCi leads to different environments. You can make the environment explicit as a type parameter and use the usual rank-2 polymorphism trick:

    interpret :: (forall env. Expr env a) -> a

The additional type parameter will quickly become unwieldy, though. (A sketch of this trick is given after this message.)

Concerning observable sharing, I very much like the approach from

    Andy Gill. Type safe observable sharing.
    http://www.ittc.ku.edu/csdl/fpg/sites/default/files/Gill-09-TypeSafeReificat...

which uses StablePointers. Unfortunately, it forces Typeable constraints on polymorphic combinators.

Best regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com
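To make the rank-2 trick concrete, here is a minimal sketch; apart from the type of interpret quoted in the message, all names are illustrative:

    {-# LANGUAGE RankNTypes #-}

    -- The phantom `env` parameter ties an expression to the context in
    -- which its uniques were created, much like the `s` parameter of runST.
    module EnvSketch where

    newtype Expr env a = Expr a   -- `env` is a phantom type parameter

    lit :: a -> Expr env a
    lit = Expr

    add :: Num a => Expr env a -> Expr env a -> Expr env a
    add (Expr x) (Expr y) = Expr (x + y)

    runExpr :: Expr env a -> a
    runExpr (Expr x) = x

    -- Quantifying over `env` guarantees the expression does not depend on
    -- the uniques of any particular environment, so it is safe to run.
    interpret :: (forall env. Expr env a) -> a
    interpret e = runExpr e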

2011-05-28 11:35, Heinrich Apfelmus skrev:
Emil Axelsson wrote:
Hello!
Lacking a proper blog, I've written some notes about Data.Unique here:
http://community.haskell.org/~emax/darcs/MoreUnique/
This describes a real problem that makes Data.Unique unsuitable for implementing observable sharing.
The document also proposes a solution that uses time stamps to generate symbols that are "more unique".
Does anyone have any comments on the proposed solution? Are there any alternatives available?
I don't know how Data.Unique is implemented. For observable sharing, I usually implement my own variant of Data.Unique with a global variable/counter.
This is how Data.Unique is implemented too. Would your implementation pass the test I posted at the link above?
Since every module of my DSL depends on the same global variable, only two things should happen:
* Reloading a module does not reload the global counter. Everything is fine.
* Reloading a module does reload the global counter. This forces *all* modules to be reloaded, which removes all expressions that have used the old variable from scope.
But Simon Marlow just said that CAFs are reset upon reload. So the first situation should never occur(?). And as my example shows, resetting the counter does not force all dependent modules to get reloaded.
However, note that the problem is not so much with Data.Unique; the real problem is that you can't say "I want this to be unique", you have to say "I want this to be unique within this or that *context*". Imagine that I give you two expressions with some Uniques inside, some of them equal, some of them not. How do you know that two equal Uniques denote the same subexpressions? For that, you have to know that I created them in the same context, in the same *environment*.
Using Data.Unique assumes that the environment is one program run. It is clear that reloading modules in GHCi leads to different environments.
OK, so I guess you could say that my solution makes the context explicit by associating it with a time stamp. Of course, this is just an approximation, because it's not impossible for two different contexts to get the same time stamp, especially in a parallel setting. However, it seems that this somewhat ugly trick might actually be what makes this approach to observable sharing work in practice.
Concerning observable sharing, I very much like the approach from
Andy Gill. Type safe observable sharing. http://www.ittc.ku.edu/csdl/fpg/sites/default/files/Gill-09-TypeSafeReificat...
which uses StablePointers. Unfortunately, it forces Typeable constraints on polymorphic combinators.
As far as I've heard, the semantics of stable pointers is not always well-defined either (but I could be wrong). But I should look into whether this technique could be used with the syntactic library. It would be nice to be able to gather several techniques under a common interface. / Emil

Emil Axelsson wrote:
Heinrich Apfelmus wrote:
Since every module of my DSL depends on the same global variable, only two things should happen:
* Reloading a module does not reload the global counter. Everything is fine.
* Reloading a module does reload the global counter. This forces *all* modules to be reloaded, which removes all expressions that have used the old variable from scope.
But Simon Marlow just said that CAFs are reset upon reload. So the first situation should never occur(?). And as my example shows, resetting the counter does not force all dependent modules to get reloaded.
Something is really weird here. In your example, reloading M2 should not reset the counter, but it does. However, it's not true that all CAFs are reset. After all, this would mean that a would have been reset as well! (Of course, if the two requirements above were met, then it would work.)
However, note that the problem is not so much with Data.Unique; the real problem is that you can't say "I want this to be unique", you have to say "I want this to be unique within this or that *context*". Imagine that I give you two expressions with some Uniques inside, some of them equal, some of them not. How do you know that two equal Uniques denote the same subexpressions? For that, you have to know that I created them in the same context, in the same *environment*.
Using Data.Unique assumes that the environment is one program run. It is clear that reloading modules in GHCi leads to different environments.
OK, so I guess you could say that my solution makes the context explicit by associating it with a time stamp. Of course, this is just an approximation, because it's not impossible for two different contexts to get the same time stamp, especially in a parallel setting. However, it seems that this somewhat ugly trick might actually be what makes this approach to observable sharing work in practice.
Well, what I actually wanted to say is that your values are "free-floating": they contain a lot of "dangling pointers". There is a fundamental problem here which I think you're trying to paste over with judicious use of side effects.

Observable sharing is just a funny way of augmenting your DSL with let bindings, the funny thing being that you can use the host-language let instead of a separate Let constructor. I recommend thinking in terms of an explicit Let constructor; this will clear up a lot of confusion. For instance, your example can be written as

    a = Let (1 := "a") `In` Var 1
    b = Let (2 := "b") `In` Var 2

(Now, the purpose of Data.Unique is just to supply fresh variable names.) What happens if b is defined as

    b = Let (1 := "b") `In` Var 1

instead? The expressions a and b are both Var 1, but the variables refer to different environments, (1 := "a") and (1 := "b"). If you want to make a large expression from a and b, you have to unify the two environments. For instance, you can use scoping rules (i.e. associate each variable with its height in the expression tree) or you can rename variables. Or you can use unsafePerformIO, which provides one single global environment, eliminating the need to unify environments in the first place. Unfortunately, global environments are unsafe.
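A compilable rendering of the Let/Var example above; the constructor names come from the message, while the concrete field types (Integer variable names, String-valued bindings) are a choice made for illustration:

    -- An expression type with an explicit Let construct, as sketched in
    -- the message above.
    module LetSketch where

    data Binding = Integer := String                 deriving Show
    newtype Env  = Let Binding                       deriving Show
    data Expr    = Env `In` Expr | Var Integer       deriving Show

    a, b :: Expr
    a = Let (1 := "a") `In` Var 1
    b = Let (2 := "b") `In` Var 2

    -- The problematic variant: the same variable name, bound in a
    -- different environment.
    b' :: Expr
    b' = Let (1 := "b") `In` Var 1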
Concerning observable sharing, I very much like the approach from
Andy Gill. Type safe observable sharing. http://www.ittc.ku.edu/csdl/fpg/sites/default/files/Gill-09-TypeSafeReificat...
which uses StablePointers . Unfortunately, it forces Typeable contraints on polymorphic combinators.
As far as I've heard, the semantics of stable pointers is not always well-defined either (but I could be wrong). But I should look into whether this technique could be used with the syntactic library. It would be nice to be able to gather several techniques under a common interface.
Err, I meant Stable *Names*, not Stable Pointers, sorry. The worst thing that can happen is that you lose some sharing, but you can never accidentally introduce sharing. Best regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com
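A small sketch of the stable-name approach, assuming System.Mem.StableName from base (see the Gill paper above for the full technique):

    -- Equal stable names imply the two arguments are the same heap object;
    -- unequal names prove nothing (evaluation or GC may have intervened),
    -- so at worst sharing is lost, never invented.
    import System.Mem.StableName (makeStableName)

    sameNode :: a -> a -> IO Bool
    sameNode x y = do
      sx <- makeStableName x
      sy <- makeStableName y
      return (sx == sy)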
participants (6)
- Andrew Butterfield
- David Virebayre
- Emil Axelsson
- Gilberto Garcia
- Heinrich Apfelmus
- Simon Marlow