
On 07/11/2010 17:47, Jan Christiansen wrote:
On 02.11.2010, at 10:20, Simon Marlow wrote:
It's not really a question of priority, rather that we don't know of a good way to fix it!
I would not have guessed that there exists a Haskell related problem that cannot immediately be fixed by the ghc headquarters ; )
If I understand correctly, the problem is to keep the selectors recognizable while still performing optimizations that might "destroy" the selector structure. In Bertram's example the resulting expression looked as follows.
l : case q of (_, ys) -> case ys of [] -> [] _:zs -> splitBy' p zs
Is it correct that the selector is not recognizable in this case because the right hand side fo the outermost case expression is not a simple variable but a case expression? It is probably quite naive to assume that the problem can be solved by looking for a structure like
case q of (_,ys) -> e
where ys is a free variable in e. There are probably cases where further optimizations prevent this. Or do I miss the real problem in identifying selectors?
The problem is that a "selector" has to be an expression of the form case x of C y1 .. yn -> yi for some i. The RTS contains pre-compiled selectors for values of i up to 16. The garbage collector recognises selectors where x is already evaluated, and replaces them with the value. So you can't do this for an arbitrary expression without generalising the mechanism. Perhaps you could annotate an arbitrary thunk to say something like I select field N from free variable x and then the GC could null out all fields except for N, as long as x was not required elsewhere. But this has other problems - apart from being a lot more complicated in terms of what information we attach to thunks and what the GC has to do, it doesn't immediately eliminate the constructor, only the unreferenced fields, so you could still get certain kinds of leak. There's another approach in Jan Sparud's paper here: http://portal.acm.org/citation.cfm?id=165196 although it's not clear that this interacts very well with inlining either, and it has a suspicious-looking side-effecting operation. It also looks like it creates a circular reference between the thunk and the selectors, which might hinder optimisations, and would probably also make things slower (by adding extra free variables to the thunk). I haven't given it a great deal of thought though, maybe it could be made to work in GHC. Cheers, Simon