
Hi,

There was an 'I want to write a compiler' thread recently. I wonder whether there are similar pointers for the compilation of strict functional languages (perhaps with lazy-evaluation annotations). Although GRIN is designed for lazy languages, would it be OK to use it for a strict one (and leverage its thunk representation for the lazy part of the language)? The whole-program optimization aspect is a plus.

On a related note, I have another question. Say we have some data structure, for instance a list; some functions on this data structure (probably defined with some well-known functions, such as map or fold); and a program using them. Is there any research trying to rewrite the program, and the data structure, to optimize them?

A contrived example is the length of a list: instead of traversing a list to know its length, the list can have an additional field which is incremented at each cons.

Thanks,
Thu
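[For concreteness, the contrived example from the question could be sketched like this in Haskell. The type and all names below are illustrative, not from any library: each cons cell caches the length of the list it heads, so asking for the length is constant time.]

```haskell
-- A list whose cons cells cache the length of the whole list.
-- Illustrative sketch only; LList, lcons and llength are made-up names.
data LList a = LNil | LCons Int a (LList a)

-- Smart constructor: fills in the cached length at construction time.
lcons :: a -> LList a -> LList a
lcons x xs = LCons (llength xs + 1) x xs

-- O(1): just read the cached field.
llength :: LList a -> Int
llength LNil          = 0
llength (LCons n _ _) = n
```

[The price is one Int of bookkeeping per cell, and the structure no longer supports infinite lists; both points come up in the replies downthread.]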

minh thu wrote:
On a related note, I have another question. Say we have some data structure, for instance a list, some functions on this data structure (probably defined with some well known functions, such as map or fold), and a program using them. Is there any research trying to rewrite the program, and the data structure, to optimize them?
Yes. The most advanced approach that I know of is Dons' stream-fusion[1]. I guess the technique of transforming a program so that Y-combinators are at their outermost possible position (and fused, in the process) could be generalised.
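[For readers unfamiliar with [1], the core of stream fusion can be sketched in a few lines. This is a toy version, not the real library: lists are converted to a step-function Stream, operations are defined on streams, and eliminating stream/unstream pairs lets adjacent operations fuse without building intermediate lists.]

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- A toy sketch of the stream-fusion representation from [1].
data Step s a = Done | Skip s | Yield a s
data Stream a = forall s. Stream (s -> Step s a) s

stream :: [a] -> Stream a
stream xs = Stream next xs
  where next []     = Done
        next (y:ys) = Yield y ys

unstream :: Stream a -> [a]
unstream (Stream next s0) = go s0
  where go s = case next s of
          Done       -> []
          Skip s'    -> go s'
          Yield a s' -> a : go s'

-- map on streams: no list cell is ever allocated here.
maps :: (a -> b) -> Stream a -> Stream b
maps f (Stream next s0) = Stream next' s0
  where next' s = case next s of
          Done       -> Done
          Skip s'    -> Skip s'
          Yield a s' -> Yield (f a) s'
```

[With this representation, unstream (maps f (maps g (stream xs))) allocates no list between the two maps; the real library uses GHC rewrite rules to remove the intermediate stream/unstream pairs automatically.]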
A contrived example is the length of a list: instead of traversing a list to know its length, the list can have an additional field which is incremented at each cons.
Well, that's not a list anymore, at least not without some additional ingenuity to deal with infinite ones. Statically-lengthed lists can be done with some type trickery, see e.g. [2], if that helps.

[1] http://www.cse.unsw.edu.au/~dons/papers/CLS07.html
[2] http://article.gmane.org/gmane.comp.lang.haskell.general/13561

-- 
(c) this sig last receiving data processing entity. Inspect headers for copyright history. All rights reserved. Copying, hiring, renting, performance and/or quoting of this signature prohibited.
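[As an aside for readers following [2]: with today's GHC extensions the "type trickery" can be written directly. This is a sketch only; [2] itself predates GADTs and DataKinds and encodes the numerals differently.]

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

-- Type-level Peano naturals.
data Nat = Z | S Nat

-- A list indexed by its length.
data Vec (n :: Nat) a where
  VNil  :: Vec 'Z a
  VCons :: a -> Vec n a -> Vec ('S n) a

-- Total head: the empty case is ruled out by the type.
vhead :: Vec ('S n) a -> a
vhead (VCons x _) = x
```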

2009/4/10 Achim Schneider
minh thu wrote:
On a related note, I have another question. Say we have some data structure, for instance a list, some functions on this data structure (probably defined with some well known functions, such as map or fold), and a program using them. Is there any research trying to rewrite the program, and the data structure, to optimize them?
Yes. The most advanced approach that I know of is Dons' stream-fusion[1]. I guess the technique of transforming a program so that Y-combinators are at their outermost possible position (and fused, in the process) could be generalised.
A contrived example is the length of a list: instead of traversing a list to know its length, the list can have an additional field which is incremented at each cons.
Well, that's not a list anymore, at least not without some additional ingenuity to deal with infinite ones. Statically-lengthed lists can be done with some type trickery, see e.g. [2], if that helps.
Oh, that's not what I meant by my question. My question is more about the fact that the data structure can be augmented to include some information which is computed while constructing the structure, not afterward (to save an additional traversal).

Say you have an AST, with just the information available after a parsing pass. Then you compute 'derived' values at each node (maybe type, strictness, ...; I'm not sure which would be practical). To do that, you can have a function that, applied to any node, gives the desired additional information. But for some functions, it can be seen clearly that such information could have been constructed at the same time as the data structure.

So it is related to fusion techniques, but with the additional possibility of adding fields to the original data structure.

Hope this makes sense,
Thu
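[One way to make this concrete, as a sketch: a toy expression AST where the annotation is just a node count, standing in for type or strictness information. All names are made up. The smart constructors compute the derived value while the tree is built, so no separate traversal is needed.]

```haskell
-- A toy AST carrying a derived value (here: node count) in each node.
data Expr
  = Lit Int Int        -- cached node count, literal value
  | Add Int Expr Expr  -- cached node count, operands

-- O(1): read the annotation filled in at construction time.
size :: Expr -> Int
size (Lit n _)   = n
size (Add n _ _) = n

-- Smart constructors compute the annotation as the tree is built.
lit :: Int -> Expr
lit v = Lit 1 v

add :: Expr -> Expr -> Expr
add l r = Add (1 + size l + size r) l r
```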

minh thu wrote:
But for some functions, it can be seen clearly that such information could have been constructed at the same time that the data structure.
So it is related to fusion techniques, but with the additional possibility of adding fields to the original data structure.
Well, the point of fusion is _not_ to construct lists. Consider

    naiveDropEnd xs = take (length xs - 1) xs

which, due to length being a catamorphism, traverses xs twice. It can, in fact, be reduced to the more sensible

    dropEnd (x:[]) = []
    dropEnd (x:xs) = x : dropEnd xs

but that requires getting rid of the fold, by replacing Integers with Peanos:

    length' = map (\_ -> ())
    pred    = tail
    succ    = (():)
    zarroo  = []

Now we can write

    notSoNaiveDropEnd xs = take' (pred $ length' xs) xs

which can be fused into a single y-combinator.

Moral of the story: folds are the bane of laziness. Having some magic in place that can choose a lazily constructed (and fused) Peano instance for Num in the right places would be awesomely cool.
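[Filled out into a runnable sketch: take' is not defined in the message above, so the definition below is one plausible reading of it, walking the Peano numeral and the list in lockstep. Like the quoted dropEnd, this is partial on the empty list.]

```haskell
import Prelude hiding (pred)

-- A lazily constructed unary numeral.
type Peano = [()]

length' :: [a] -> Peano
length' = map (\_ -> ())

pred :: Peano -> Peano
pred = tail

-- take' consumes the numeral and the list in lockstep, so it never
-- needs the full length of the input up front.
take' :: Peano -> [a] -> [a]
take' []    _      = []
take' _     []     = []
take' (_:n) (x:xs) = x : take' n xs

notSoNaiveDropEnd :: [a] -> [a]
notSoNaiveDropEnd xs = take' (pred (length' xs)) xs
```

[Because the numeral is consumed lazily, this version even produces output incrementally on infinite lists, which the Integer-based naiveDropEnd cannot do.]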

minh thu wrote:
Say you have an AST, with just the information after a parsing pass. Then you compute 'derived' values at each node (maybe, I'm not sure if it would be practical, type, strictness, ...). To do that, you can have a function that, applied to any node, gives the desired additional information.
But for some functions, it can be seen clearly that such information could have been constructed at the same time that the data structure.
So it is related to fusion techniques, but with the additional possibility of adding fields to the original data structure.
The problem with doing this automatically is that you don't get that data for free. Storing the length of finite lists as you construct them means that getting the length is constant time, but it introduces a space overhead of one Integer per (:). If you call length often enough it may be worth it, but generally the book-keeping will be too expensive.

We could use modified strategies where we have two different (:)s, one that stores the length and one that doesn't. Then we could apply some heuristic like exponential backoff to minimize the space overheads subject to some logarithmic bound on the asymptotic cost of calculating the length of the list until the first label. Or we could apply dynamic memoization techniques and switch a non-length-storing (:) into a length-storing (:) whenever we evaluate a call to length, so it'll be quicker next time. Or,...

There are a lot of strategies here, but I doubt it's feasible to automatically choose a good strategy for a given program (since "good" may depend on the vagaries of runtime data). Automatically doing the transformations and suggesting them to the user is doable though.

You may be interested in looking into attribute grammars[1] and finger trees[2] for more ideas along this route.

[1] http://www.haskell.org/haskellwiki/The_Monad.Reader/Issue4/Why_Attribute_Gra...
[2] http://apfelmus.nfshost.com/monoid-fingertree.html

-- 
Live well,
~wren
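[The "two different (:)s" idea can be sketched like so. The constructor names and the labeling policy are made up; a real implementation would pick which cells to label by some heuristic such as the exponential backoff mentioned above.]

```haskell
-- A list with two cons constructors: one plain, one caching the
-- length of the list it heads. (Sketch; names are made up.)
data SList a
  = SNil
  | Cons a (SList a)          -- no bookkeeping
  | SizedCons Int a (SList a) -- caches total length from here down

-- Walk until the nearest cached length, then stop: the cost is
-- bounded by the distance to the first SizedCons.
slength :: SList a -> Int
slength SNil              = 0
slength (Cons _ xs)       = 1 + slength xs
slength (SizedCons n _ _) = n

-- One trivial labeling policy: cache the length only at the head.
fromList :: [a] -> SList a
fromList []     = SNil
fromList (y:ys) = SizedCons (1 + length ys) y (foldr Cons SNil ys)
```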

On a related note, I have another question. Say we have some data structure, for instance a list, some functions on this data structure (probably defined with some well known functions, such as map or fold), and a program using them. Is there any research trying to rewrite the program, and the data structure, to optimize them?
Do you mean "computer science"?-) Much of this field could be rephrased as the search for better representations, to enable better implementations of algorithms (for some measure of "better"): do we represent programs as data structures to be interpreted, or as code to be executed? Do we use sets, lists, arrays, ..? How do we represent sets/lists/arrays/..? Do we recompute properties of data, or do we store them (and is one better always, or sometimes, and when? Or do we need to optimize for the average or the most common case? How do we define "better", anyway?)? Which are the options, for a particular application domain, or a general class of data structure/algorithm, and how do they compare? And so on..

But if you limit your question sufficiently, you might have some luck looking for papers that reference any of these:

Automatic data structure selection: an example and overview, James R. Low, Communications of the ACM, 1978
http://portal.acm.org/citation.cfm?id=359488.359498

Techniques for the automatic selection of data structures, Low, Rovner, 3rd POPL, 1976
http://portal.acm.org/citation.cfm?id=811540

Rovner, P. Automatic representation selection for associative data structures. Ph.D. Th., Harvard U., Cambridge, Mass.; Tech. Rep. TR10, U. of Rochester, Rochester, N.Y., Sept. 1976.

Claus
participants (4)
-
Achim Schneider
-
Claus Reinke
-
minh thu
-
wren ng thornton