
Hia folks, More performance bugs or misunderstandings. Yes, binary serialisation again. Consider this example from an instance for the Binary serialisation class. We get lots and lots of code that looks like this: data Foo a = Foo Int | Bar ... instance Binary Foo where put (Foo a) = putTag 0 >> put a put (Bar ...) = putTag 1 ... Let's expand this one step so we can see what we want to happen: put (Foo a) = word8 (fromIntegral 0 :: Word8) >> word64le a so we want to combine the bounds checks in word8 and word64le so that we do just one check here rather than two. Lets expand it another step: put (Foo a) = write 1 (pokeWord8 (fromIntegral 0 :: Word8)) `thenPut` write 8 (pokeWord64le (fromIntegral a :: Word64)) Now we have a rule: "write/write" forall n m a b. write n a `thenPut` write m b = write (n+m) (\p -> a p >> b (p `plusPtr` n)) so obviously we'd like to apply this rule here to common-up the bounds check that write n performs. Normally this rule works fine, but in this specific example it does not. Can you spot why? Here's a simple case that works: foo :: Word8 -> Word8 -> Put () foo a b = word8 a >> word8 b and the one that doesn't bar :: Word8 -> Put () bar a = word8 0 >> put a What's the difference? Why should one work and the other not? Of course I gave it away in the email subject: let floating. In the second example, the 'bar' function, word8 0 does not depend on the arguments of the function so it gets floated out: lvl_s198 = write 1 (pokeWord8 0) bar n = thenPut lvl_s198 (write 8 (pokeWord64le n)) and now the rule does not match this because 'write' isn't there directly in the expression. This is a bit disappointing of course, so how do we fix it. There are two possibilities as far as I can see. Either don't let float it, or have the rule matcher look through the indirection. The rule matcher is already looks through lets, but that's a slightly different issue. That's for a situation like: bar n = thenPut (let lvl_s198 = pokeWord8 0 in write 1 lvl_s198) (write 8 (pokeWord64le n)) where we've got an actual let expression whose body would match the rule's pattern. It would not always be a good idea to "look up" let floated vars when doing rule matching since it could lead to duplication. Though in this example the let-floated expression is used only once, so that is not a problem. So one possibility might be to look up variables for the purpose of rule matching if they are only used once. Another possibility might be to not let-float it in the first place. Usually let-floating things like this is a good idea, so how can we spot that we might want to keep it in applicative form? The fact that it's mentioned in the pattern of a rule is the strongest hint. As I understand it, being mentioned in the head of a rule is already taken into account when doing let-floating. However in this case the head of the let-floated expression is not the head of the rule, but a sub expression: write n a `thenPut` write m b = write (n+m) (\p -> a p >> b (p `plusPtr` n)) 'thenPut' is the head of this rule's pattern. But we'd rather not float 'write' out either. So perhaps a similar heuristic should be applied to all the free variables mentioned in the pattern. Another approach might be for the library author to declare that some function is cheap, or is otherwise important to not float out, but should be kept in applicative form to aid rule matching. We have similar problems with list comprehensions and enumFromTo: [ ... | i <- [0..n], j <- [0..m] ] because in the second generator [0..m] does not depend on the first generator, it gets floated out and shared. Of course with fusion we can make the [0..m] extremely cheap, but not if it has been floated out. If it gets floated out and shared in memory between each iteration of the first generator, then we really do have to traverse the list data structure each iteration, rather than just incrementing an unboxed number. So this is another example where we'd like to say that a function, enumFromTo in this case, is extremely cheap and should not be floated out, even if it looses sharing. Actually, my example is simpler because it does not involve any loss of sharing. Duncan

| This is a bit disappointing of course, so how do we fix it. There are | two possibilities as far as I can see. Either don't let float it, or | have the rule matcher look through the indirection. This is a tricky one. One possibility would be to postpone full laziness until later in the optimisation pipeline. But then some useful sharing isn't as easily accessible. For example, if you look in DynFlags, where the main pass structure is defined, you'll see: -- Don't inline anything till full laziness has bitten -- In particular, inlining wrappers inhibits floating -- e.g. ...(case f x of ...)... -- ==> ...(case (case x of I# x# -> fw x#) of ...)... -- ==> ...(case x of I# x# -> case fw x# of ...)... -- and now the redex (f x) isn't floatable any more -- Similarly, don't apply any rules until after full -- laziness. Notably, list fusion can prevent floating. Making the rule matcher look through lets could be possible. Maybe even on a rule-by-rule basis. But consider f x = let xs = map expensive [1..x] in (sum xs, prod xs) If you are prepared to duplicate xs, you can fuse with both 'sum' and 'product'. But duplicating 'xs' duplicates an arbitrarily large computation. Are you sure you want that? The beauty of the "only match when the subexpression is literally there" idea is that you *know* it's the unique occurrence. Matching on a 'let' that's outside a lambda is potentially even more extreme; the example above was only 2-way sharing. I think you might be on solider ground if you declare that some functions are "cheap enough to duplicate"; in this case your 'write' function. But even then you need to take care: let x = cheap_fun a b in \y. ....(foo x).... Now a rule (foo (cheap_fun p q)) might perhaps match. But what if 'a' or 'b' was expensive?! Then we'd prefer that the defn of x looked like let a = <a-code>; b = <b-code>; x = cheap_fun a b in \y. ... But now it's less easy to match rules like (foo (cheap_fun (g x) (h y))) Nevertheless, my nose tells me that promising that a function is cheap may be the most robust way forward. Simon an/listinfo/glasgow-haskell-users

On Mon, 2007-06-04 at 12:31 +0100, Simon Peyton-Jones wrote:
| This is a bit disappointing of course, so how do we fix it. There are | two possibilities as far as I can see. Either don't let float it, or | have the rule matcher look through the indirection.
This is a tricky one. One possibility would be to postpone full laziness until later in the optimisation pipeline. But then some useful sharing isn't as easily accessible. For example, if you look in DynFlags, where the main pass structure is defined, you'll see:
-- Don't inline anything till full laziness has bitten -- In particular, inlining wrappers inhibits floating -- e.g. ...(case f x of ...)... -- ==> ...(case (case x of I# x# -> fw x#) of ...)... -- ==> ...(case x of I# x# -> case fw x# of ...)... -- and now the redex (f x) isn't floatable any more -- Similarly, don't apply any rules until after full -- laziness. Notably, list fusion can prevent floating.
Making the rule matcher look through lets could be possible. Maybe even on a rule-by-rule basis. But consider f x = let xs = map expensive [1..x] in (sum xs, prod xs) If you are prepared to duplicate xs, you can fuse with both 'sum' and 'product'. But duplicating 'xs' duplicates an arbitrarily large computation. Are you sure you want that? The beauty of the "only match when the subexpression is literally there" idea is that you *know* it's the unique occurrence.
No, I don't want to duplicate. But in my example the let var was only used once, so there was no sharing problem. So in this
Matching on a 'let' that's outside a lambda is potentially even more extreme; the example above was only 2-way sharing.
So instead of having the rule matcher do some kind of on the fly let inlining, it should not have been floated out in the first place. So if the let was indeed originally outside the lambda then inlining it adds more allocation, but if we choose not to let-float it then we're not making the program worse, we're just missing an opportunity to reduce allocation. But if we can identify a good reason not to let float it (like it appearing in a rule) then we can avoid thinking about duplicating let bound things.
I think you might be on solider ground if you declare that some functions are "cheap enough to duplicate"; in this case your 'write' function. But even then you need to take care: let x = cheap_fun a b in \y. ....(foo x).... Now a rule (foo (cheap_fun p q)) might perhaps match. But what if 'a' or 'b' was expensive?! Then we'd prefer that the defn of x looked like let a = <a-code>; b = <b-code>; x = cheap_fun a b in \y. ... But now it's less easy to match rules like (foo (cheap_fun (g x) (h y)))
Nevertheless, my nose tells me that promising that a function is cheap may be the most robust way forward.
Yes, but I'm not sure this case is really one where we're dealing with duplicating a cheap thing. We're just not let-floating something that could be let-floated. In my original example, once we get to a later phase of compilation, the let bound thing does get inlined again. Once the bind and write functions get inlined, lots of case expressions appear and then it becomes obvious that it'd be beneficial to inline, and ghc does so. And as I say, it was only used in one place. So the frustrating thing is that it is let-floated *only* for the one phase where I need it not to be floated so I can rule match on it :-). So how about the idea of taking the rule pattern into account when deciding to let-float? In this case it should be clear that the benefits of let floating are pretty marginal, so I'd guess it just needs a little extra nudge to decide that this case isn't beneficial to float out. That is, my guess is that we'd not loose many beneficial let-floatings by taking the presence of a rule into account. But that's only a guess. Duncan

| No, I don't want to duplicate. But in my example the let var was only | used once, so there was no sharing problem. Not so in general -- floating outside a lambda that is called many times can dramatically increase sharing. You're right that all you want is to *forgo* an optimisation; but I want to avoid people complaining about lost optimisations! | In my original example, once we get to a later phase of compilation, the | let bound thing does get inlined again. Once the bind and write | functions get inlined, lots of case expressions appear and then it | becomes obvious that it'd be beneficial to inline, and ghc does so. And Hmm. Indeed, looking at it: lvl_s198 = write 1 (pokeWord8 0) I believe that there are no redexes there, correct? So we are gaining no sharing of work. GHC is simply avoiding an allocation, by floating to the top level, rather than actually avoiding work. Is that always so in the cases you are bothered about? That is, the annoying floating is saving allocation but not work? And GHC can see that? If so, perhaps we can make the first run of FloatOut not do allocation-saving. There's a second run later I think. You could have a go at this if you like. Look at SetLevels.lhs line 400. I think you might try something like replacing 'True' with: floatConsts env || not (exprIsCheap expr) The 'floatConsts' flag is False for the first run of FloatOut but True for the second. | So how about the idea of taking the rule pattern into account when | deciding to let-float That could be the next thing to try. I used to look at LHSs etc, but decided that it was not robust enough. Simon
participants (2)
-
Duncan Coutts
-
Simon Peyton-Jones