Lambda and closures in PHP -- could someone please comment?

Hi,
The PHP community is discussing the adding of closures and lambdas to the language, see the proposal at http://wiki.php.net/rfc/closures
If someone with knowledge of both languages could take a quick look it would be great.
Thanks a lot
Karoly Negyesi
Ps. I am not a member of the PHP internals team, I am just a PHP developer but I am very very interested in getting these in my favourite language.

On Tue, 2008-06-17 at 18:45 +0200, Karoly Negyesi wrote:
Hi,
The PHP community is discussing the adding of closures and lambdas to the language, see the proposal at http://wiki.php.net/rfc/closures
If someone with knowledge of both languages could take a quick look it would be great.
I program in Perl for a living, so hopefully you'll understand when I say
(a) I would *never* want to use an implementation of closures like that.
(b) Closures as proposed are *far* better than not having closures.
jcc

On Wed, 2008-06-18 at 06:36 +0200, Karoly Negyesi wrote:
(a) I would *never* want to use an implementation of closures like that. (b) Closures as proposed are *far* better than not having closures.
Could you elaborate on a) ?
I dislike the habit of implicit declaration --- strongly --- and the consequent need for the lexical keyword (although at this point PHP's stuck with it). I can see myself forgetting to use lexical far more often than accidentally leaving off a `my' in Perl I should have used (I hardly ever shadow variable names anyway, so if I forget `my' it is usually a use strict 'vars' error).
I dislike curly braces. Syntax that extends as far to the right as possible tends to end up with fewer delimiters and a cleaner appearance. It's basically a way to replace a bunch of closing braces with a single ) or (in Haskell, implicit) ;
    lintPat p0 $ \ ty0 ->
    lintPat p1 $ \ ty1 ->
    lint e
vs
    lintPat($p0, sub { my ($ty0) = @_; lintPat($p1, sub { my ($ty1) = @_; lint($e) })})
Four closing delimiters seems excessive.
Nit: `function' is verbose. ML uses fun or fn (I forget which and am too lazy to google). Perl's regular keyword is sub, so they use that.
There are worse fates than duck typing (C++ comes to mind :), and given a language with neither lexical closures/anonymous functions nor HM typing, I'd complain about the lack of lambdas first. But, still, no HM means no type classes. That ultimately becomes limiting. (I still haven't seen a decent implementation of monads in a dynamically typed language). But PHP is probably pretty much stuck with it.
Not to criticize, mind you --- the proposal looks excellent for what it does. But I like what Haskell does worlds better.
jcc

Not to criticize, mind you --- the proposal looks excellent for what it does. But I like what Haskell does worlds better.
Obviously you like Haskell better given this mailing list :) I am not here to compare PHP and Haskell, I was just asking advice from people who know closures and lambdas very well and it seems that I got it in the words of "the proposal looks excellent for what it does". Thanks a lot. Karoly Negyesi

On 18 Jun 2008, at 4:36 pm, Karoly Negyesi wrote:
(a) I would *never* want to use an implementation of closures like that. Could you elaborate on a) ?
It wasn't me who wrote it, but consider
- non-local variables are *not* captured unless you explicitly hoist them into the lambda expression using the 'lexical' keyword.
- references to non-local variables that are not so hoisted are not syntax errors, they just quietly do something else.
- ordinary functions do not act as if defined by lambda expressions; 'lexical' is required in lambdas and forbidden in functions.
- ordinary functions do not act as if defined by lambda expressions; the latter can outlive their lexical scope, the former can't.
- what you get is a reference to a variable (as you do in Scheme) but loop variables really are variables, not names for values, so lambdas created in different iterations of the same loop point to the same loop variable, and do not remember the value it had when they were created. The proposal explains how to work around this.
All of this boils down to something in which you *can* with care do the things you expect to do with closures, but the language gently leads you to the edge of the Pit and the compiler just smiles quietly as you fall over.
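A rough analogy in Python, offered only as a sketch and not as the RFC's semantics: Python captures outer variables for reading automatically, but rebinding one from a nested function needs an explicit `nonlocal` declaration. Leaving it out is not an error; the assignment quietly creates a new local instead. All names below are made up for illustration.

```python
def make_counters():
    count = 0

    def bump_shadow():
        # No declaration: this assignment creates a brand-new local
        # `count`; the outer variable is silently left untouched.
        count = 1
        return count

    def bump_shared():
        # Explicit declaration: rebinding now targets the outer variable.
        nonlocal count
        count += 1
        return count

    def read():
        return count

    return bump_shadow, bump_shared, read


bump_shadow, bump_shared, read = make_counters()
shadow_result = bump_shadow()       # returns its own local value
after_shadow = read()               # outer count is unchanged
shared_results = [bump_shared(), bump_shared()]
after_shared = read()
```

The hazard is the same shape as the second bullet: forgetting the declaration produces a program that runs but does something else.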

Richard A. O'Keefe wrote:
- what you get is a reference to a variable (as you do in Scheme) but loop variables really are variables, not names for values, so lambdas created in different iterations of the same loop point to the same loop variable, and do not remember the value it had when they were created. The proposal explains how to work around this.
This one trips everyone up in Javascript.
I think I'm of the opinion that variable capture from lambda formation should always be by value. However you can certainly make an argument that that's inconsistent in a language which generally has mutation/reference semantics.
Whichever choice you make, though, document it loudly. I predict it will be a source of confusion.
Jules

On 19 Jun 2008, at 5:53 pm, Jules Bean wrote:
Richard A. O'Keefe wrote:
- what you get is a reference to a variable (as you do in Scheme) but loop variables really are variables, not names for values, so lambdas created in different iterations of the same loop point to the same loop variable, and do not remember the value it had when they were created. The proposal explains how to work around this.
This one trips everyone up in Javascript.
What's going on here is a nasty interaction with the semantics of loops. In Smalltalk and Scheme (to name two languages with closures and mutable variables), each iteration of a loop in principle creates a new variable binding. Scheme example:
    (do ((i 0 (+ i 1))
         (l '() (cons (lambda (x) (* x i)) l)))
        ((>= i 10) l))
is equivalent to
    let f i l = if i >= 10 then l
                else f (i + 1) ((\x -> x * i) : l)
    in f 0 []
except for i and l being potentially mutable in Scheme but not Haskell. The Smalltalk equivalent would be
    (0 to: 9) collect: [:i | [:x | x*i]]
in which (a) each iteration creates a *new* i, and (b) method and block parameters are *not* mutable, because they never are in Smalltalk.
Importing only values into closures would not work for Smalltalk. Consider the usual implementation of Smalltalk's equivalent of 'fold':
    Collection>>
    inject: initial into: function
        |r|
        r := initial.
        self do: [:each | r := function value: r value: each].
        ^r
The mutability of r here really isn't a problem. Nor is the mutability of variables _as such_ really the problem in the PHP proposal. The problem is that it's the *same* variable every time. If PHP loops introduced new bindings on every iteration, this particular problem would not exist.
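The same contrast can be sketched in Python, whose `for` loop also reuses one variable across iterations. This is an illustration under that assumption, not PHP; `make_multiplier` is a hypothetical helper whose only job is to introduce a fresh binding per call, which is what Scheme and Smalltalk loops do implicitly.

```python
# One variable `i` for the whole loop: every closure refers to it,
# so after the loop they all see its final value.
shared = []
for i in range(3):
    shared.append(lambda x: x * i)
shared_results = [f(10) for f in shared]


def make_multiplier(i):
    # Each call creates a new binding for `i`, private to this closure.
    return lambda x: x * i


fresh = [make_multiplier(n) for n in range(3)]
fresh_results = [f(10) for f in fresh]
```

Here `shared_results` is `[20, 20, 20]` while `fresh_results` is `[0, 10, 20]`; the closures are identical, only the binding structure of the loop differs.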

Richard A. O'Keefe wrote:
The mutability of r here really isn't a problem. Nor is the mutability of variables _as such_ really the problem in the PHP proposal. The problem is that it's the *same* variable every time. If PHP loops introduced new bindings on every iteration, this particular problem would not exist.
Well, arguably it's not only the loop variable that can be susceptible to this problem. There could be other variables in the loop body which change each time through (e.g. while loops). Consider this pseudo-code (sorry, my PHP is a bit rusty, this syntax is C really)
    char c;
    while (!eof(fp)) {
        c = getChar(fp);
        bind_event(... some lambda expression referencing c ...);
    }
It's pretty surprising to the programmer if all that family of lambda expressions reference the *variable* c (and hence, in practice, its final value) rather than the *value* c.
Well, maybe that doesn't surprise everyone. It surprised me the first time I used closures in Javascript and judging by a few google searches I wasn't alone in that.
Jules
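A minimal Python sketch of the while-loop case, with a list of characters standing in for the file and plain lists standing in for `bind_event` (both are assumptions made for the example). The default-argument idiom `c=c` is one common Python way to capture the current value rather than the variable.

```python
pending = list("abc")      # stand-in for the characters read from fp
by_variable = []           # handlers that refer to the variable c
by_value = []              # handlers that copy c's value when created

while pending:
    c = pending.pop(0)
    by_variable.append(lambda: c)
    # The default argument is evaluated once, when the lambda is created,
    # so each handler keeps the value c had at that moment.
    by_value.append(lambda c=c: c)

seen_by_variable = [h() for h in by_variable]
seen_by_value = [h() for h in by_value]
```

After the loop, every handler in `by_variable` reports the last character, while those in `by_value` report the character that was current when each was registered.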

On Thu, 2008-06-19 at 07:25 +0100, Jules Bean wrote:
Richard A. O'Keefe wrote:
The mutability of r here really isn't a problem. Nor is the mutability of variables _as such_ really the problem in the PHP proposal. The problem is that it's the *same* variable every time. If PHP loops introduced new bindings on every iteration, this particular problem would not exist.
Well, arguably it's not only the loop variable that can be susceptible to this problem. There could be other variables in the loop body which change each time through (e.g. while loops). Consider this pseudo-code (sorry, my PHP is a bit rusty, this syntax is C really)
char c;
while (!eof(fp)) { c = getChar(fp); bind_event(... some lambda expression referencing c ...); }
It's pretty surprising to the programmer if all that family of lambda expressions reference the *variable* c (and hence, in practice, its final value) rather than the *value* c.
Well, maybe that doesn't surprise everyone. It surprised me the first time I used closures in Javascript and judging by a few google searches I wasn't alone in that.
Lambda abstractions should close over bindings. Full stop.
The first "surprising" behaviour is the correct one. The latter would be broken.
In my opinion, the reason this behaviour is "surprising" isn't mutability, but -implicit- mutability. Let's make bindings immutable, but add ML-style references to your example.
    char ref c = ref(undefined);
    while(!eof(fp)) {
        c := getChar(fp);
        bind_event( ... print !c; ... );
    }
compare this to
    while(!eof(fp)) {
        char c = getChar(fp);
        bind_event( ... print c; ...);
    }
or
    while(!eof(fp)) {
        char ref c = ref(getChar(fp));
        bind_event( ... print !c; ...);
    }
Each of these examples makes it clearer what is going on.
Admittedly, if we write a 'foreachChar' HOF, the difference between the first implementation and the last will not be apparent from the type. That's just the nature of the beast; there is simply more than one implementation.
At any rate, as Richard O'Keefe stated, it's not the lambda's behaviour that needs to be documented, it's the loop's (or HOFs in general), for the iteration variables. The ones you introduce are your own concern; say what you mean.
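The first and third variants can be approximated in Python with an explicit reference cell. This is a sketch under assumptions: `Ref` is a hypothetical stand-in for an ML-style `ref`, and `handler_for` exists only because Python's own variables are rebindable, so a function call is needed to get a fresh immutable-looking binding per iteration.

```python
class Ref:
    """A hypothetical one-slot mutable cell, standing in for an ML ref."""

    def __init__(self, value):
        self.value = value


def handler_for(cell):
    # `cell` is a fresh binding per call; the closure reads through it.
    return lambda: cell.value


def one_cell_outside_loop(chars):
    cell = Ref(None)                 # char ref c = ref(undefined);
    handlers = []
    for ch in chars:
        cell.value = ch              # c := getChar(fp);
        handlers.append(handler_for(cell))
    return [h() for h in handlers]


def new_cell_each_iteration(chars):
    handlers = []
    for ch in chars:
        handlers.append(handler_for(Ref(ch)))   # char ref c = ref(getChar(fp));
    return [h() for h in handlers]


outside = one_cell_outside_loop("abc")
inside = new_cell_each_iteration("abc")
```

With the cell allocated once, all handlers observe the last write; with a cell per iteration, each handler keeps its own. The closure mechanism is the same in both functions, which is the point being made above.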

Derek Elkins wrote:
Lambda abstractions should close over bindings. Full stop.
Interesting. I agree with your analysis. I don't think I agree with your conclusion.
The first "surprising" behaviour is the correct one. The latter would be broken.
In my opinion, the reason this behaviour is "surprising" isn't mutability, but -implicit- mutability. Let's make bindings immutable, but add ML-style references to your example.
char ref c = ref(undefined); while(!eof(fp)) { c := getChar(fp); bind_event( ... print !c; ... ); }
compare this to
while(!eof(fp)) { char c = getChar(fp); bind_event( ... print c; ...); }
or
while(!eof(fp)) { char ref c = ref(getChar(fp)); bind_event( ... print !c; ...); }
Each of these examples makes it clearer what is going on.
Agreed.
I think where I differ from you is how to map the semantics of a C-like language to explicit references.
I would argue that the glyph "c" in a C-like language denotes the value of c, not the reference to it. C-like languages have, for the most part, value semantics, and call-by-value.
The exception of course is what C-like languages call "lvalues", but lvalues are only really on the left of the = sign and a few other special positions. I think that's the exception and not the rule. I think the rule is that "c" denotes the value of c, and that's why I expect a closure to capture the value, not the reference.
In C, of course, if you want to capture the reference you do it explicitly with "&c".
Jules

This is increasingly less relevant to Haskell, except of course to demonstrate what a nice language Haskell is.
On 20 Jun 2008, at 11:34 pm, Jules Bean wrote:
I think where I differ from you is how to map the semantics of a C-like language to explicit references.
I would argue that the glyph "c" in a C-like language denotes the value of c, not the reference to it. C-like languages have, for the most part, value semantics, and call-by-value.
The exception of course is what C-like languages called "lvalues", but lvalues are only really on the left of the = sign and a few other special positions. I think that's the exception and not the rule.
No, this is back to front. C basically follows the Algol 68 idea that the lvalue is the normative thing, and that there is an IMPLICIT COERCION from a variable to its value in certain contexts. C is full of implicit coercions: perhaps the most famous is the one that says that in almost all contexts an array is quietly coerced to a pointer to its first element. The key observation is that an implicit coercion from a variable to its contents is possible, whereas an implicit coercion from a value to "the" variable that holds it is not. Only the "a variable really stands for its address" view is coherent.
In C, of course, if you want to capture the reference you do it explicitly with "&c".
If we can use evidence from a relative to probe such questions, the fact that you *don't* need an explicit "&" in C++ (when you find a 'reference' to a variable, you use "c", not "&c") strongly suggests that the "variable stands for location" view is the more useful one. Thankfully, Haskell saves us these perplexities. (And replaces them with other perplexities...)

On Tue, Jun 17, 2008 at 4:45 PM, Karoly Negyesi wrote:
Hi,
The PHP community is discussing the adding of closures and lambdas to the language, see the proposal at http://wiki.php.net/rfc/closures
If someone with knowledge of both languages could take a quick look it would be great.
Thanks a lot
Karoly Negyesi
Ps. I am not a member of the PHP internals team, I am just a PHP developer but I am very very interested in getting these in my favourite language.
Whew. Well I suspect you weren't expecting that kind of reaction. Or maybe you were... I used to be a Perl developer, and it didn't take long before I got a level 12 resistance to flame...
Anyway, the proposal looks mostly okay. I don't know that much PHP, but I find the "lexical" keyword to be a nuisance. What are the semantics if the lexical keyword is omitted? (i.e. does the variable become function-local, global, what?) If it is consistent with the rest of the language, it'll do.
There is a much more important point with closures: their implementation cannot be half-assed! I'm not claiming that the patch is--I have not reviewed it--but there is nothing worse than coming up with a design that relies on a language feature you only later find out has been nerfed in some way. Story of my life in C#. And nerfed closures are especially bad, because it's so hard to predict the code path.
What I mean by this is the following must all be supported:
* A closure must only keep alive the variables it references, not the whole pad on which they are allocated (Python messed up here)
* A closure must be able to call itself recursively (via a higher-order function typically) (Squeak messed up here IIRC)
* Multiple references to the same body of code with different bindings must be able to exist at the same time (duh, that's kinda what makes it a closure)
* Closures must be nestable.
Looking over the "Zend internal perspective" section, it looks like that implementation will mostly work. There are a couple of red flags, though:
* I would recommend only saving $this in the op_array structure if the closure actually references $this -- if that is possible to deduce at the time. Otherwise you might run into unexpectedly poor memory performance in certain cases. (This kind of thing can make an *asymptotic* difference in memory performance; i.e. bringing the memory usage of an algorithm from O(1) to O(n), for example)
* I'm worried that nested closures do not work properly with this implementation sketch. Here's a test case:
    $f = function ($y) {
        return function ($z) {
            return $y + $z;
        };
    };
    $f(1)(2) # should give 3
And congratulations, PHP, for adopting a most essential and powerful feature!
Luke
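For comparison, here is how the nested test case and the "multiple bindings at once" requirement look in Python. It is only a sketch of the expected behaviour in another language, not a statement about what the PHP patch does; the names `add1` and `add10` are invented for the example.

```python
def f(y):
    # The inner function closes over this call's own `y`.
    def inner(z):
        return y + z
    return inner


nested_result = f(1)(2)     # the test case from the message

# Two closures over the same body of code, each with its own binding of y,
# alive at the same time.
add1 = f(1)
add10 = f(10)
side_by_side = (add1(5), add10(5))
```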

* A closure must only keep alive the variables it references, not the whole pad on which they are allocated (Python messed up here)
Getting off subject, but I didn't know this about python. I'm not
saying you're incorrect, but my experimentation shows:
% cat t.py
class A(object):
    def __init__(self, name): self.name = name
    def __del__(self): print self.name, 'gone'

def f():
    x = A('x')
    y = A('y')
    def g():
        print x.name, 'alive'
    return g
% python
Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import t
>>> g = t.f()
y gone
>>> g()
x alive
>>> g.func_closure
#-> (
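The same experiment can be sketched for Python 3, where the closure cells are exposed as `__closure__` rather than `func_closure`. This version uses weak references instead of `__del__` to observe which objects survive; `Tracked` and `make_closure` are names invented for the sketch, and the immediate reclamation of `y` relies on CPython's reference counting (the `gc.collect()` call is there as a safeguard).

```python
import gc
import weakref


class Tracked:
    def __init__(self, name):
        self.name = name


def make_closure():
    x = Tracked("x")
    y = Tracked("y")
    # Weak references let us check liveness without keeping objects alive.
    refs = {"x": weakref.ref(x), "y": weakref.ref(y)}

    def g():
        return x.name + " alive"    # only x is referenced

    return g, refs


g, refs = make_closure()
gc.collect()

captured_names = [cell.cell_contents.name for cell in g.__closure__]
x_alive = refs["x"]() is not None
y_alive = refs["y"]() is not None
```

Only `x` appears in the closure's cells, and `y` is gone once `make_closure` returns, which agrees with the session above.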

CC to Christian Seiler
http://www.haskell.org/pipermail/haskell-cafe/2008-June/044379.html
http://www.haskell.org/pipermail/haskell-cafe/2008-June/thread.html
On Wednesday 18 June 2008, Luke Palmer wrote:
* A closure must be able to call itself recursively (via a higher-order function typically)
I see two ways a closure might get a hold of itself, so that it can call itself:
    $f = function () {
        lexical $whatever;
        lexical $f; //get yourself by lexical scope
        return $f();
    };
    $f();

    $g = function ($g) { //get yourself by parameter
        lexical $whatever;
        return $g($g);
    };
    $g($g);
Getting the first version to work is somewhat tricky in a non-lazy language, but it would be nice to have. The second should definitely work. I guess I'll download the patch and try.
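Both routes can be sketched in Python, with a terminating factorial substituted for the unconditional self-calls so the example finishes. The by-scope version works in Python because the name is looked up when the closure runs, not when it is created; whether the PHP patch behaves the same way is exactly what remains to be tested. The function names are made up.

```python
def make_factorial():
    def fact(n):
        # `fact` is found through the enclosing scope at call time.
        return 1 if n <= 1 else n * fact(n - 1)
    return fact


def fact_by_parameter(self_fn, n):
    # The function receives itself as an argument and passes itself on.
    return 1 if n <= 1 else n * self_fn(self_fn, n - 1)


by_scope = make_factorial()(5)
by_parameter = fact_by_parameter(fact_by_parameter, 5)
```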
* I would recommend only saving $this in the op_array structure if the closure actually references $this -- if that is possible to deduce at the time. Otherwise you might run into unexpected poor memory performances in certain cases.
Agreed. A closure created inside an object should be able to outlive the object by not holding a reference to it. Since many PHP programmers put pretty much all of their functions into classes for style reasons, which would mean most closures are created in the context of an object, implicitly referencing $this might prevent a lot of objects from being garbage-collected.
Also, doesn't that turn all lambdas defined inside an object into closures, which are heavier?
Between always referencing $this in a lambda and requiring "lexical $this", the latter seems like the smaller evil to me.
Gesundheit
Wag
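The memory concern can be illustrated in Python, where `self` plays the role of $this and is captured only if the closure mentions it. This is a sketch with invented names (`Widget`, `probe`), and the prompt reclamation it observes relies on CPython's reference counting.

```python
import gc
import weakref


class Widget:
    def __init__(self, label):
        self.label = label

    def handler_holding_self(self):
        return lambda: self.label      # captures self, pinning the object

    def handler_holding_value(self):
        label = self.label
        return lambda: label           # captures only the string


def probe(make_handler):
    w = Widget("ok")
    ref = weakref.ref(w)
    handler = make_handler(w)
    del w                              # drop our own reference
    gc.collect()
    # Report what the handler returns and whether the Widget survived.
    return handler(), ref() is not None


pinned = probe(Widget.handler_holding_self)
released = probe(Widget.handler_holding_value)
```

Both handlers still work after the creating code lets go of the object, but only the first keeps the whole `Widget` alive; an implementation that always saved the object would behave like the first case for every closure.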
participants (8)
- Alexander Wagner
- Derek Elkins
- Evan Laforge
- Jonathan Cast
- Jules Bean
- Karoly Negyesi
- Luke Palmer
- Richard A. O'Keefe