The Worker/Wrapper Transformation

...is a paper about automatic specialisation of functions by unboxing arguments, one could say. I'm only on page 6, but have already survived the first formalisms, which is bound to mean that the rest of the paper is likewise accessible, as hinted at on LtU.

http://www.cs.nott.ac.uk/~gmh/wrapper.pdf

The transformation itself is mindbogglingly easy, which makes this a good start: you only have to understand the formalisms, not so much what the formalisms are representing. To quote spj: It usually turns out to be more interesting and challenging than it seemed at first.

I'm tempted to write that this is a paper for everyone trying to figure out what the heck Jonathan is talking about.
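For a taste of the shape of the transformation, here is a rough sketch along the lines of the usual reverse example, reconstructed from memory rather than copied from the paper (wrap, unwrap and revWorker are my names):

  -- original, quadratic definition
  rev :: [a] -> [a]
  rev []     = []
  rev (x:xs) = rev xs ++ [x]

  -- pick a worker type [a] -> [a] -> [a] (an accumulating parameter)
  -- and a pair of conversions between the two representations
  unwrap :: ([a] -> [a]) -> ([a] -> [a] -> [a])
  unwrap f xs ys = f xs ++ ys

  wrap :: ([a] -> [a] -> [a]) -> ([a] -> [a])
  wrap g xs = g xs []

  -- the worker is (unwrap applied to the body), simplified by calculation
  revWorker :: [a] -> [a] -> [a]
  revWorker []     ys = ys
  revWorker (x:xs) ys = revWorker xs (x:ys)

  -- the wrapper recovers the original function, now in linear time
  rev' :: [a] -> [a]
  rev' = wrap revWorker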

Achim Schneider
[...]
I'm trying to grok that

  [] = id
  ++ = .

in the context of Hughes lists. I guess it would stop slipping away if I knew what : corresponds to.

On Jan 3, 2008 6:08 AM, Achim Schneider wrote:
[...]
I'm trying to grok that
[] = id
++ = .
in the context of Hughes lists.
I guess it would stop slipping away if I knew what : corresponds to.
Well, (:) has type a -> [a] -> [a], so a function corresponding to (:) for Hughes lists should have type

  foo :: a -> H a -> H a

that is,

  foo :: a -> ([a] -> [a]) -> [a] -> [a]

so it can be written

  foo x h = (x:) . h

which lambdabot informs me can also be written as (.) . (:). But in the end I'm not sure how helpful that is for understanding Hughes lists! I think the key sentence from the paper is this: "by representing a list xs as the function (xs ++) that appends this list to another list that has still to be supplied." If you understand that sentence, then you can understand why [] is id and (++) is (.).

-Brent
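A minimal sketch of that representation, with illustrative names (H, fromList, toList are not from the paper):

  type H a = [a] -> [a]

  empty :: H a
  empty = id                  -- [] becomes id

  append :: H a -> H a -> H a
  append = (.)                -- (++) becomes (.)

  cons :: a -> H a -> H a
  cons x h = (x:) . h         -- the foo above

  fromList :: [a] -> H a
  fromList xs = (xs ++)       -- "the function (xs ++)"

  toList :: H a -> [a]
  toList h = h []             -- supply the list "still to be supplied"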

"Brent Yorgey"
Well, (:) has type a -> [a] -> [a], so a function corresponding to (:) for Hughes lists should have type
foo :: a -> H a -> H a
[...] I think the key sentence from the paper is this: "by representing a list xs as the function (xs ++) that appends this list to another list that has still to be supplied." If you understand that sentence, then you can understand why [] is id and (++) is (.).
Yes, I did. The key was not thinking that : has type

  (:) :: a -> a -> [a]

, or, put differently, beat the lisp out of me, thanks. The problem is merely that Haskell and lisp are too similar in a much too different way.

Achim Schneider
"Brent Yorgey"
wrote: Well, (:) has type a -> [a] -> [a], so a function corresponding to (:) for Hughes lists should have type
foo :: a -> H a -> H a
[...] I think the key sentence from the paper is this: "by representing a list xs as the function (xs ++) that appends this list to another list that has still to be supplied." If you understand that sentence, then you can understand why [] is id and (++) is (.).
Yes, I did.
The key was not thinking that : has type
(:) :: a -> a -> [a]
, or, put differently, beat the lisp out of me, thanks.
What the hell am I talking about?

  (define (cons x y) (lambda (m) (m x y)))
  (define (car z) (z (lambda (p q) p)))
  (define (cdr z) (z (lambda (p q) q)))

: is, in a sense, \.

Achim Schneider
(define (cons x y) (lambda (m) (m x y)))
(define (car z) (z (lambda (p q) p)))
(define (cdr z) (z (lambda (p q) q)))
, which, just for completeness, can of course also be done in Haskell:

  cons :: a -> b -> (a -> b -> c) -> c
  cons x y m = m x y

  car :: ((a -> b -> a) -> c) -> c
  car z = z $ \p q -> p

  cdr :: ((a -> b -> b) -> c) -> c
  cdr z = z $ \p q -> q

  Prelude> car $ cdr $ cdr $ cons 1 $ cons 2 $ cons 3 ()
  3

Achim Schneider wrote:
Achim Schneider wrote:
[...]
I'm trying to grok that
[] = id
++ = .
in the context of Hughes lists.
They are also known as "difference lists", and are also used at type String in the Prelude as "ShowS", to help avoid quadratic behavior when making complicated Strings. The [a] -> [a] is not an ordinary function -- it's expected not to examine its argument, just to use it exactly once (is there a formal way to say that?)

~Isaac
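A tiny sketch of the ShowS idea with standard Prelude functions (the greet example is made up):

  greet :: ShowS                                         -- type ShowS = String -> String
  greet = showString "hello, " . showString "world"      -- composition is O(1)

  demo :: String
  demo = greet "!"                                       -- "hello, world!": supply the tail last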

On Thu, 3 Jan 2008, Isaac Dupree wrote:
Achim Schneider wrote:
Achim Schneider wrote:
[...]
I'm trying to grok that
[] = id
++ = .
in the context of Hughes lists.
They are also known as "difference lists", and are also used at type String in the Prelude as "ShowS", to help avoid quadratic behavior when making complicated Strings.
Sometimes I believe that I understand the reason, but then again I do not. I see that left-associative (++), as in ((a0 ++ a1) ++ a2) ++ a3, would cause quadratic time. But (++) is right-associative and 'concat' is a 'foldr'. They should not scan the leading lists more than once. Also http://en.wikipedia.org/wiki/Difference_list doesn't answer this question. Where exactly is the problem?

Henning Thielemann
Sometimes I believe that I understand the reason, but then again I do not. I see that left-associative (++), as in ((a0 ++ a1) ++ a2) ++ a3, would cause quadratic time. But (++) is right-associative and 'concat' is a 'foldr'. They should not scan the leading lists more than once. Also http://en.wikipedia.org/wiki/Difference_list doesn't answer this question. Where exactly is the problem?
| The shows functions return a function that prepends the output String
| to an existing String. This allows constant-time concatenation of
| results using function composition.

I figure it's (constant vs. linear) vs. (linear vs. quadratic), for more involved examples.

On Thu, 3 Jan 2008, Achim Schneider wrote:
Henning Thielemann
wrote: Sometimes I believe that I understand the reason, but then again I do not. I see that left-associative (++), as in ((a0 ++ a1) ++ a2) ++ a3, would cause quadratic time. But (++) is right-associative and 'concat' is a 'foldr'. They should not scan the leading lists more than once. Also http://en.wikipedia.org/wiki/Difference_list doesn't answer this question. Where exactly is the problem?
| The shows functions return a function that prepends the output String
| to an existing String. This allows constant-time concatenation of
| results using function composition.
How is "constant-time concatenation" meant? If I process all list elements, it will need linear time. If I do not touch any element, I will need no time due to lazy evaluation. As far as I know, lazy evaluation is implemented by returning a union of a routine generating the actual value and the value, if it was already computed. Thus, calling (++) returns a function internally.
I figure it's (constant vs. linear) vs. (linear vs. quadratic), for more involved examples.
I can't see it. If I consider (x++y) but I do not evaluate any element of (x++y) or only the first element, then this will need constant time. If I evaluate the first n elements I need n computation time units. How is (.) on difference lists faster than (++) here?
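For reference, list append essentially as the Haskell report defines it; reaching WHNF of x ++ y only has to look at the first constructor of x, which is where the constant-time reading above comes from:

  (+++) :: [a] -> [a] -> [a]        -- renamed here to avoid clashing with Prelude.(++)
  []     +++ ys = ys
  (x:xs) +++ ys = x : (xs +++ ys)   -- the tail stays a thunk; forcing the k-th
                                    -- element walks k cells of x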

Henning Thielemann
I figure it's (constant vs. linear) vs. (linear vs. quadratic), for more involved examples.
I can't see it. If I consider (x++y) but I do not evaluate any element of (x++y) or only the first element, then this will need constant time. If I evaluate the first n elements I need n computation time units. How is (.) on difference lists faster than (++) here?
The cost is in the repeated traversals: if you do ((x ++ y) ++ z), the first run over x can be avoided. It basically gets rewritten to x ++ (y ++ z) by another level of abstraction.
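A small sketch of that re-association (leftNested, rightNested and viaDiff are illustrative names, not library functions):

  leftNested, rightNested, viaDiff :: [[a]] -> [a]

  leftNested  = foldl (++) []    -- ((x ++ y) ++ z) ...: early chunks are copied
                                 -- once per later append, O(n^2) overall

  rightNested = foldr (++) []    -- x ++ (y ++ (z ++ ...)): each chunk copied once

  viaDiff chunks = foldl (\f xs -> f . (xs ++)) id chunks []
                                 -- composing is O(1); applying the result to []
                                 -- re-associates all the (++) to the right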

Henning Thielemann wrote:
I can't see it. If I consider (x++y) but I do not evaluate any element of (x++y) or only the first element, then this will need constant time. If I evaluate the first n elements I need n computation time units. How is (.) on difference lists faster than (++) here?
That's a very good question. Basically, the problem is: how to specify the time complexity of an operation under lazy evaluation?

You argue that (++) is constant time in the sense that evaluating (x ++ y) to WHNF is O(1) when x and y are already in WHNF. Same for (.). This is indeed correct but apparently fails to explain why (.) is any better than (++). Help!

Of course, this very paradox shows that just looking at WHNF is not enough. The next best description is to pretend that our language is strict and to consider full normal form:

  x in NF, y in NF  -->  (x ++ y) evaluates to NF in O(length x) time

Even when x and y are not in normal form, we know that evaluating the expression (x ++ y) to NF takes

  O(x ++ y) ~ O(length x) + O(x) + O(y)

time. Here, O(e) is the time needed to bring the expression e into NF first. This approach now explains that (++) takes quadratic time when used left-associatively:

  O((x ++ y) ++ z) ~ O(length x + length y) + O(length x) + O(x) + O(y) + O(z)

instead of the expected

  O((x ++ y) ++ z) ~ O(x) + O(y) + O(z)

or something (only up to constant factors and stuff, but you get the idea).

Note that considering NFs is still only an approximation, since

  O(head (qsort xs)) ~ O(n) + O(xs)                  where n = length xs

instead of the expected

  O(head (qsort xs)) ~ O(qsort xs) ~ O(n log n) + O(xs)   where n = length xs

thanks to lazy evaluation. Also note that despite considering full normal forms, we can express some laziness with this by giving timings for an expression in different contexts, like

  O(take n ys)
  O(head ys)

instead of only O(ys). Same for parameters, with something like O(const x) ~ O(1) instead of the anticipated O(const x) ~ O(x). (For lazy data structures, there are better ways to take laziness into account.)
With difference lists I write
shows L . (shows T . shows R)
(shows LL . (shows LT . shows LR)) . (shows T . shows R)
((shows LLL . (shows LLT . shows LLR)) . (shows LT . shows LR)) . (shows T . shows R)
I still need to resolve three (.) until I get to the first character of the result string, but for the subsequent characters I do not need to resolve those dots. In the end, resolution of all (.) may need some time, but then concatenation is performed entirely right-associatively. It seems that this is the trick ...
So far so good, but the problem now is that analyzing (.) with full normal forms is doomed, since this would mean evaluating things under the lambda, which may take less time than doing call-by-need reductions. Still, we somehow have

  O(x . y) ~ O(x) + O(y)

which is better than O(x ++ y), but I'm not quite sure how to make this exact yet. In the end, the above O(e)s are just random doodles conjured out of the hat; I don't know a formalism for easy reasoning about time in a lazy language. Anyone any pointers?

Note that the problem is already present for difference lists in strict languages.

Regards,
apfelmus

apfelmus
O((x ++ y) ++ z) ~ O(length x + length y) + O(length x) + O(x) + O(y) + O(z)
I would say that it's ~ O(length x) + O(length $ x ++ y) + O(2 * list mangling)

apfelmus wrote:
I don't know a formalism for easy reasoning about time in a lazy language. Anyone any pointers? Note that the problem is already present for difference lists in strict languages.
http://homepages.inf.ed.ac.uk/wadler/topics/strictness-analysis.html

especially "strictness analysis aids time analysis". Much CPO math is involved, but I view it as: a function gives you output; if you know how much of the output you use, you can deduce how much work the function goes through.

Albert Y. C. Lai wrote:
apfelmus wrote:
I don't know a formalism for easy reasoning about time in a lazy language. Anyone any pointers? Note that the problem is already present for difference lists in strict languages.
http://homepages.inf.ed.ac.uk/wadler/topics/strictness-analysis.html
especially "strictness analysis aids time analysis".
Ah, of course, thanks. Together with

  D. Sands. Complexity Analysis for a Lazy Higher-Order Language.
  http://citeseer.ist.psu.edu/291589.html

for the higher-order case, a satisfactory analysis can be put together.

The formalism is basically as follows: for a function f, let f^T denote the time needed to execute it to weak normal form, given that its arguments are already in weak normal form. Weak normal form = full normal form for algebraic values and lambda-abstraction for functions = what you'd expect in a strict language. Plain values = nullary functions. For instance

  (++)^T [] ys     = 1 + ys^T = 1
  (++)^T (x:xs) ys = 1 + (x:(xs ++ ys))^T
                   = 1 + (++)^T xs ys + xs^T + ys^T   -- (:) is free
                   = 1 + (++)^T xs ys

  ==> (++)^T xs ys = O(length xs)

Substituting a function application by the function body is counted as 1 time step, that's where the "1 +" comes from. For difference lists, we have

  (.)^T f g = O(1)

since it immediately returns the lambda-abstraction \x -> f (g x).

Now, we missed something important about difference lists, namely the function

  toList f = f []

that turns a difference list into an ordinary list, and this function is O(n). In contrast, the counterpart for ordinary lists, i.e. the identity function, is only O(1). Why is it O(n)? Well, (.) itself may be O(1), but it constructs a function that needs lots of time to run. In particular

  (f . g)^T [] = ((\x -> f (g x)) [])^T
               = 1 + (f (g []))^T
               = 1 + f^T (g []) + (g [])^T
               = 1 + f^T (g []) + g^T []

So, to analyze higher-order functions, we simply have to keep track of the "size" of the returned functions (more precisely, Sands uses "cost-closures"). The above reduces to

  (f . g)^T [] = 1 + f^T [] + g^T []

Since our difference lists don't care what they are prepended to,

  f^T xs = f^T []

Cheating a bit with the notation, we can write

  toList^T (f . g) = 1 + toList^T f + toList^T g

This means that a difference list built out of n elements by m applications of (.) will take O(n + m) time. This is the same as O(m), because m >= n: our lists are concatenations of singletons. That's not O(n) as anticipated, but it's alright: a concatenation of m empty lists is empty but clearly takes O(m) time, so the number of concatenations matters.

Since difference lists offer such a good concatenation, why not replace ordinary lists entirely? Well, the problem is that we have another function that should run fast, namely head. For ordinary lists,

  head^T xs = O(1)

but for difference lists, we have

  (head . toList)^T f = O(m)

which is >= O(n) in the worst case, lazy evaluation notwithstanding.

How to analyze lazy evaluation? Wadler's approach is to add an extra argument to every expression which says how much of the expression is to be evaluated. This extra information can be encoded via projections. But I think it's sufficient here to let (head expr)^T symbolize the time to reduce expr to weak head normal form. For example,

  (head . toList)^T (f . g) = 1 + (head . toList)^T f

assuming that f is nonempty. But due to the "1 +", any left-nested composition like

  head . toList $ (((a . b) . c) . d) . e

still needs O(m) time. So, difference lists are no "eierlegende wollmilchsau" either.

Regards,
apfelmus
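The operations being costed here, as a compact sketch (the DList name and its helpers are illustrative, not a reference to any particular library):

  newtype DList a = DList ([a] -> [a])

  singleton :: a -> DList a
  singleton x = DList (x:)

  append :: DList a -> DList a -> DList a
  append (DList f) (DList g) = DList (f . g)   -- O(1): only builds a closure

  toList :: DList a -> [a]
  toList (DList f) = f []                      -- O(m): unwinds every (.) built so far

  -- head (toList d) still has to pass through each composition, even lazily,
  -- which is the O(m) cost of (head . toList) discussed above.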

Henning Thielemann
On Wed, 9 Jan 2008, apfelmus wrote:
So, difference lists are no "eierlegende wollmilchsau" either.
LEO's forum suggests 'swiss army knife' as translation. :-)
But you really need one with 5 differently-sized blades plus three specialized carving blades, a USB stick, a microscope, 13 kinds of Torx, Allen, etc. drivers each, a TV set (analogue/digital) with a fold-out touchscreen, at least tri-band GSM and WiFi connectivity, a hydraulic car jack and a chain saw to award it the term egg-laying woolmilkpig.

Achim Schneider wrote:
Henning Thielemann wrote:
apfelmus wrote:
So, difference lists are no "eierlegende wollmilchsau" either.
LEO's forum suggests 'swiss army knife' as translation. :-)
But you really need one with 5 differently-sized blades plus three specialized carving blades, a USB stick, a microscope, 13 kinds of Torx, Allen, etc. drivers each, a TV set (analogue/digital) with a fold-out touchscreen, at least tri-band GSM and WiFi connectivity, a hydraulic car jack and a chain saw to award it the term egg-laying woolmilkpig.
But even such knives still can't lay eggs :( Regards, apfelmus

G'day all.
Quoting Achim Schneider
But you really need one with 5 differently-sized blades plus three specialized carving blades, a USB stick, a microscope, 13 kinds of Torx, Allen, etc. drivers each, a TV set (analogue/digital) with a fold-out touchscreen, at least tri-band GSM and WiFi connectivity, a hydraulic car jack and a chain saw to award it the term egg-laying woolmilkpig.
So a better translation into British engineering language might be "Heath Robinson"? Cheers, Andrew Bromage

ajb@spamcop.net wrote:
G'day all.
Quoting Achim Schneider
But you really need one with 5 differently-sized blades plus three specialized carving blades, a USB stick, a microscope, 13 kinds of Torx, Allen, etc. drivers each, a TV set (analogue/digital) with a fold-out touchscreen, at least tri-band GSM and WiFi connectivity, a hydraulic car jack and a chain saw to award it the term egg-laying woolmilkpig.
So a better translation into British engineering language might be "Heath Robinson"?
Not really. To give you the perfect CS example, take a look at Emacs: eight megabytes and continuous swapping, a whole OS with any app and library you could ever dream of, but not one decent editor. See, on the one hand that beast is every farmer's dream, but then you can't butcher the pig 'cos you want to have its milk, wool and eggs. http://catb.org/jargon/html/C/creeping-featuritis.html is the associated plague. Outlook reminds me of it, too: I spent half a paid hour configuring it, changing everything from the default message format to quote behaviour and the like.

Henning Thielemann wrote:
Sometimes I believe that I understand the reason, but then again I do not. I see that left-associative (++), as in ((a0 ++ a1) ++ a2) ++ a3, would cause quadratic time. But (++) is right-associative and 'concat' is a 'foldr'. They should not scan the leading lists more than once
The point is that the right-associativity of (++) doesn't prevent terms of the form

  ((a0 ++ a1) ++ a2) ++ a3

from arising, due to explicit parentheses or (more importantly) recursive processing of data structures. Consider showing a tree-like structure of the form

  ((a0 `Cons` a1) `Cons` a2) `Cons` a3

A naive Show instance will basically replace every Cons by (++) and produce a string concatenation of the offending left-associative form:

  ((show a0 ++ show a1) ++ show a2) ++ show a3

Tillmann
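A sketch of the two styles being contrasted (the Snoc type and the function names are made up for illustration):

  data Snoc a = Nil | Snoc (Snoc a) a          -- left-nested, like the Cons chain above

  -- naive: every constructor turns into a left-associated (++), quadratic overall
  showNaive :: Show a => Snoc a -> String
  showNaive Nil         = ""
  showNaive (Snoc xs x) = showNaive xs ++ show x

  -- ShowS style: every constructor turns into (.); the single final application
  -- to "" associates all the real (++) work to the right, linear overall
  showsSnoc :: Show a => Snoc a -> ShowS
  showsSnoc Nil         = id
  showsSnoc (Snoc xs x) = showsSnoc xs . shows x

  showFast :: Show a => Snoc a -> String
  showFast t = showsSnoc t ""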

On Thursday, 3 January 2008, 14:48, Henning Thielemann wrote:
On Thu, 3 Jan 2008, Isaac Dupree wrote:
Achim Schneider wrote:
Achim Schneider wrote:
[...]
I'm trying to grok that
[] = id
++ = .
in the context of Hughes lists.
They are also known as "difference lists", and are also used at type String in the Prelude as "ShowS", to help avoid quadratic behavior when making complicated Strings.
Sometimes I believe that I understand the reason, but then again I do not. I see that left-associative (++), as in ((a0 ++ a1) ++ a2) ++ a3, would cause quadratic time. But (++) is right-associative and 'concat' is a 'foldr'. They should not scan the leading lists more than once. Also http://en.wikipedia.org/wiki/Difference_list doesn't answer this question. Where exactly is the problem?
Show a binary tree inorder? L-T-R gives

  (show L) ++ (show T ++ (show R))

gives

  ((show LL) ++ (show LT ++ (show LR))) ++ (show T ++ (show R))

gives

  (((show LLL) ++ (show LLT ++ (show LLR))) ++ (show LT ++ (show LR))) ++ (show T ++ (show R))

etc. If the tree is large, you end up with a pretty large left association for the left subtree. True, there's a lot of right association, too, but bad enough, I think.

Cheers,
Daniel

On Thu, 3 Jan 2008, Daniel Fischer wrote:
Am Donnerstag, 3. Januar 2008 14:48 schrieb Henning Thielemann:
Sometimes I believe that I understand the reason, but then again I do not. I see that left-associative (++), as in ((a0 ++ a1) ++ a2) ++ a3, would cause quadratic time. But (++) is right-associative and 'concat' is a 'foldr'. They should not scan the leading lists more than once. Also http://en.wikipedia.org/wiki/Difference_list doesn't answer this question. Where exactly is the problem?
Show a binary tree inorder? L-T-R gives

  (show L) ++ (show T ++ (show R))

gives

  ((show LL) ++ (show LT ++ (show LR))) ++ (show T ++ (show R))

gives

  (((show LLL) ++ (show LLT ++ (show LLR))) ++ (show LT ++ (show LR))) ++ (show T ++ (show R))
etc. If the tree is large, you end up with a pretty large left association for the left subtree. True, there's a lot of right association, too, but bad enough, I think.
With difference lists I write

  shows L . (shows T . shows R)
  (shows LL . (shows LT . shows LR)) . (shows T . shows R)
  ((shows LLL . (shows LLT . shows LLR)) . (shows LT . shows LR)) . (shows T . shows R)

I still need to resolve three (.) until I get to the first character of the result string, but for the subsequent characters I do not need to resolve those dots. In the end, resolution of all (.) may need some time, but then concatenation is performed entirely right-associatively. It seems that this is the trick ...

On 3 Jan 2008, at 4:49 AM, Isaac Dupree wrote:
Achim Schneider wrote:
Achim Schneider wrote:
[...] I'm trying to grok that

  [] = id
  ++ = .

in the context of Hughes lists.
They are also known as "difference lists", and are also used at type String in the Prelude as "ShowS", to help avoid quadratic behavior when making complicated Strings. The [a] -> [a] is not an ordinary function -- it's expected not to examine its argument, just to use it exactly once (is there a formal way to say that?)
  f xn = f [] ++ xn

is the first thing off the top of my head. OTOH, examining your argument (while, strictly speaking, unsafe) is pretty darn cool:

  f [] = "foo"
  f (c:s) | isAlphaNum c = "foo " ++ c:s
          | otherwise    = "foo" ++ c:s

Token prepend.

jcc

Achim Schneider wrote:
...is a paper about automatic specialisation of functions by unboxing arguments, one could say. I'm only on page 6, but have already survived the first formalisms, which is bound to mean that the rest of the paper is likewise accessible, as hinted at on LtU.
http://www.cs.nott.ac.uk/~gmh/wrapper.pdf
The transformation itself is mindbogglingly easy, which makes this a good start: You only have to understand the formalisms, not so much what the formalisms are representing. To quote spj: It usually turns out to be more interesting and challenging than it seemed at first.
I'm tempted to write that this is a paper for everyone trying to figure out what the heck Jonathan is talking about.
I like it! Of course the technique itself doesn't provide guidance on what type you want to transform a function to.

On page 6, stronger vs. weaker seemed backwards to me... isn't (wrap ◦ unwrap = id_A) a stronger condition than (wrap ◦ unwrap ◦ body = body), because it tells you more, and is true in fewer cases? (Which is also why you want to assume (wrap ◦ unwrap = id_A) when you can, because it's the most useful one for subsequent program transformation.)

And then the inevitable minor copy-editing :-)

p. 22: "intentional properties" --> "intensional" (right?)
p. 27: typo 'unwarp' for 'unwrap'

BTW, GHC currently does allow e.g. (I# (1# +# 2#)), not just (case (1# +# 2#) of n# -> I# n#) -- the strictness implications seem pretty straightforward (it's trivial to transform away).

p. 29: "in both these system" -> "in both these systems"

~Isaac

Isaac Dupree wrote:
Achim Schneider wrote: [...]
On page 6, stronger vs. weaker seemed backwards to me... isn't (wrap ◦ unwrap = id_A) a stronger condition than (wrap ◦ unwrap ◦ body = body), because it tells you more, and is true in fewer cases? (Which is also why you want to assume (wrap ◦ unwrap = id_A) when you can, because it's the most useful one for subsequent program transformation.)
Ah, thanks for saying that; I have been asking myself if it was just me, having things backward. In my book, stronger => weaker, too.

Cheers,
Ben

Achim Schneider wrote:
...is a paper about automatic specialisation of functions by unboxing arguments, one could say. I'm only on page 6, but have already survived the first formalisms, which is bound to mean that the rest of the paper is likewise accessible, as hinted at on LtU.
http://www.cs.nott.ac.uk/~gmh/wrapper.pdf
The transformation itself is mindbogglingly easy, which makes this a good start: You only have to understand the formalisms, not so much what the formalisms are representing. To quote spj: It usually turns out to be more interesting and challenging than it seemed at first.
I'm tempted to write that this is a paper for everyone trying to figure out what the heck Jonathan is talking about.
Hey Achim, many thanks for bringing this wonderful paper to my attention. Dijkstra's soul breathes out of every equation...

Cheers,
Ben
participants (11)
- Achim Schneider
- ajb@spamcop.net
- Albert Y. C. Lai
- apfelmus
- Ben Franksen
- Brent Yorgey
- Daniel Fischer
- Henning Thielemann
- Isaac Dupree
- Jonathan Cast
- Tillmann Rendel