Unexpected lack of optimisation

Hi Using GHC 6.9.20071226: The following code: ------------------------------------------------------- test s | begin2 'n' 'a' s = "test" | begin2 'n' 'b' s = "test2" begin2 :: Char -> Char -> String -> Bool begin2 x1 x2 (y:ys) | x1 == y = begin1 x2 ys begin2 _ _ _ = False begin1 :: Char -> String -> Bool begin1 x1 (y:ys) | x1 == y = True ------------------------------------------------------- You might expect the head of the list s to be tested for equality with 'n' only once. Something like: test s = case s of s1:ss -> case s1 of 'n' -> .... choose 'a' or 'b' .... _ -> fail Unfortunately, GHC can't common up these two tests. It inserts a State# RealWorld in the middle, giving a result of: test s = case s of s1:ss -> case s1 of 'n' -> case ss of s2:ss -> case s2 of 'a' -> .... _ -> retry state _ -> retry state retry dummy = case s of s1:ss -> case s1 of 'n' -> .... If GHC was to inline the "retry" (which is a local let-bound lambda) it should have no problem merging these two cases. I'm not entirely sure why the State# gets inserted, but was wondering if it is necessary? The complete -ddump-simpl is at the end of this message. Thanks Neil -------------------------------------------------------------- Text.HTML.TagSoup.Development.Sample.test :: GHC.Base.String -> [GHC.Base.Char] [GlobalId] [Arity 1] Text.HTML.TagSoup.Development.Sample.test = \ (s_a6g :: GHC.Base.String) -> let { $j_s7l :: GHC.Prim.State# GHC.Prim.RealWorld -> [GHC.Base.Char] [Arity 1] $j_s7l = \ (w_s7m :: GHC.Prim.State# GHC.Prim.RealWorld) -> let { $j1_s7d :: GHC.Prim.State# GHC.Prim.RealWorld -> [GHC.Base.Char] [Arity 1] $j1_s7d = \ (w1_s7e :: GHC.Prim.State# GHC.Prim.RealWorld) -> GHC.Err.patError @ [GHC.Base.Char] "Text/HTML/TagSoup/Development/Sample.hs:(48,0)-(49,34)|function test" } in case s_a6g of wild_Xl { [] -> $j1_s7d GHC.Prim.realWorld#; : y_a6o ys_a6q -> case GHC.Base.$f4 of tpl_Xr { GHC.Base.:DEq tpl1_B2 tpl2_B3 -> case tpl1_B2 (GHC.Base.C# 'n') y_a6o of wild1_Xp { GHC.Base.False -> $j1_s7d GHC.Prim.realWorld#; GHC.Base.True -> let { fail_d6S :: GHC.Base.Bool [] fail_d6S = GHC.Err.patError @ GHC.Base.Bool "Text/HTML/TagSoup/Development/Sample.hs:57:0-32|function begin1" } in case ys_a6q of wild2_XB { [] -> case fail_d6S of wild3_Xj { GHC.Base.False -> $j1_s7d GHC.Prim.realWorld#; GHC.Base.True -> GHC.Base.unpackCString# "test2" }; : y1_a6y ys1_a6A -> case GHC.Base.$f4 of tpl3_XH { GHC.Base.:DEq tpl4_XL tpl5_XN -> case tpl4_XL (GHC.Base.C# 'b') y1_a6y of wild3_Xo { GHC.Base.False -> case fail_d6S of wild4_Xj { GHC.Base.False -> $j1_s7d GHC.Prim.realWorld#; GHC.Base.True -> GHC.Base.unpackCString# "test2" }; GHC.Base.True -> GHC.Base.unpackCString# "test2" } } } } } } } in case s_a6g of wild_B1 { [] -> $j_s7l GHC.Prim.realWorld#; : y_a6o ys_a6q -> case GHC.Base.$f4 of tpl_Xp { GHC.Base.:DEq tpl1_B2 tpl2_B3 -> case tpl1_B2 (GHC.Base.C# 'n') y_a6o of wild1_XT { GHC.Base.False -> $j_s7l GHC.Prim.realWorld#; GHC.Base.True -> let { fail_d6S :: GHC.Base.Bool [] fail_d6S = GHC.Err.patError @ GHC.Base.Bool "Text/HTML/TagSoup/Development/Sample.hs:57:0-32|function begin1" } in case ys_a6q of wild2_Xz { [] -> case fail_d6S of wild3_XD { GHC.Base.False -> $j_s7l GHC.Prim.realWorld#; GHC.Base.True -> GHC.Base.unpackCString# "test" }; : y1_a6y ys1_a6A -> case GHC.Base.$f4 of tpl3_XF { GHC.Base.:DEq tpl4_XJ tpl5_XL -> case tpl4_XJ (GHC.Base.C# 'a') y1_a6y of wild3_Xo { GHC.Base.False -> case fail_d6S of wild4_XN { GHC.Base.False -> $j_s7l GHC.Prim.realWorld#; GHC.Base.True -> GHC.Base.unpackCString# "test" }; GHC.Base.True -> GHC.Base.unpackCString# "test" } } } } } }

Did you try comparing the results if you pass the -fno-state-hack flag? Cheers, Tim -- Tim Chevalier * http://cs.pdx.edu/~tjc * Often in error, never in doubt "I am never gratuitously rude. My rudeness is carefully calibrated to the stupidity and obtuseness of the people I am dealing with." -- Adam Carr

Neil A nice example, but I think it's difficult to give systematic solution. * The 'retry' function is a "join point", where two different conditional branches join up. * As you say, if 'retry' was inlined, all would be fine. But what if 'retry' was big? Then we'd get lots of code duplication, in exchange for fewer tests. * Presumably it's not inlined because it's over the inline size threshold. (You did use -O?) * The 'state' argument is just there to make sure that 'retry' is a *function* not a *thunk*, to avoid the overheads of unnecessary thunk update. So it's not obvious to me how to improve this example, at least in general. But I could easily be missing something. Simon | -----Original Message----- | From: glasgow-haskell-users-bounces@haskell.org [mailto:glasgow-haskell-users-bounces@haskell.org] On | Behalf Of Neil Mitchell | Sent: 28 April 2008 22:13 | To: GHC Users Mailing List | Subject: Unexpected lack of optimisation | | Hi | | Using GHC 6.9.20071226: | | The following code: | | ------------------------------------------------------- | | test s | begin2 'n' 'a' s = "test" | | begin2 'n' 'b' s = "test2" | | | begin2 :: Char -> Char -> String -> Bool | begin2 x1 x2 (y:ys) | x1 == y = begin1 x2 ys | begin2 _ _ _ = False | | begin1 :: Char -> String -> Bool | begin1 x1 (y:ys) | x1 == y = True | | ------------------------------------------------------- | | You might expect the head of the list s to be tested for equality with | 'n' only once. Something like: | | test s = case s of | s1:ss -> case s1 of | 'n' -> .... choose 'a' or 'b' .... | _ -> fail | | Unfortunately, GHC can't common up these two tests. It inserts a | State# RealWorld in the middle, giving a result of: | | test s = case s of | s1:ss -> case s1 of | 'n' -> case ss of | s2:ss -> case s2 of | 'a' -> .... | _ -> | retry state | _ -> retry state | | retry dummy = case s of | s1:ss -> case s1 of | 'n' -> .... | | If GHC was to inline the "retry" (which is a local let-bound lambda) | it should have no problem merging these two cases. I'm not entirely | sure why the State# gets inserted, but was wondering if it is | necessary? | | The complete -ddump-simpl is at the end of this message. | | Thanks | | Neil | | -------------------------------------------------------------- | | Text.HTML.TagSoup.Development.Sample.test :: GHC.Base.String -> [GHC.Base.Char] | [GlobalId] | [Arity 1] | Text.HTML.TagSoup.Development.Sample.test = | \ (s_a6g :: GHC.Base.String) -> | let { | $j_s7l :: GHC.Prim.State# GHC.Prim.RealWorld -> [GHC.Base.Char] | [Arity 1] | $j_s7l = | \ (w_s7m :: GHC.Prim.State# GHC.Prim.RealWorld) -> | let { | $j1_s7d :: GHC.Prim.State# GHC.Prim.RealWorld -> [GHC.Base.Char] | [Arity 1] | $j1_s7d = | \ (w1_s7e :: GHC.Prim.State# GHC.Prim.RealWorld) -> | GHC.Err.patError | @ [GHC.Base.Char] | | "Text/HTML/TagSoup/Development/Sample.hs:(48,0)-(49,34)|function test" | } in | case s_a6g of wild_Xl { | [] -> $j1_s7d GHC.Prim.realWorld#; | : y_a6o ys_a6q -> | case GHC.Base.$f4 of tpl_Xr { GHC.Base.:DEq tpl1_B2 tpl2_B3 -> | case tpl1_B2 (GHC.Base.C# 'n') y_a6o of wild1_Xp { | GHC.Base.False -> $j1_s7d GHC.Prim.realWorld#; | GHC.Base.True -> | let { | fail_d6S :: GHC.Base.Bool | [] | fail_d6S = | GHC.Err.patError | @ GHC.Base.Bool | | "Text/HTML/TagSoup/Development/Sample.hs:57:0-32|function begin1" } in | case ys_a6q of wild2_XB { | [] -> | case fail_d6S of wild3_Xj { | GHC.Base.False -> $j1_s7d GHC.Prim.realWorld#; | GHC.Base.True -> GHC.Base.unpackCString# "test2" | }; | : y1_a6y ys1_a6A -> | case GHC.Base.$f4 of tpl3_XH { GHC.Base.:DEq | tpl4_XL tpl5_XN -> | case tpl4_XL (GHC.Base.C# 'b') y1_a6y of wild3_Xo { | GHC.Base.False -> | case fail_d6S of wild4_Xj { | GHC.Base.False -> $j1_s7d GHC.Prim.realWorld#; | GHC.Base.True -> GHC.Base.unpackCString# "test2" | }; | GHC.Base.True -> GHC.Base.unpackCString# "test2" | } | } | } | } | } | } } in | case s_a6g of wild_B1 { | [] -> $j_s7l GHC.Prim.realWorld#; | : y_a6o ys_a6q -> | case GHC.Base.$f4 of tpl_Xp { GHC.Base.:DEq tpl1_B2 tpl2_B3 -> | case tpl1_B2 (GHC.Base.C# 'n') y_a6o of wild1_XT { | GHC.Base.False -> $j_s7l GHC.Prim.realWorld#; | GHC.Base.True -> | let { | fail_d6S :: GHC.Base.Bool | [] | fail_d6S = | GHC.Err.patError | @ GHC.Base.Bool | | "Text/HTML/TagSoup/Development/Sample.hs:57:0-32|function begin1" } in | case ys_a6q of wild2_Xz { | [] -> | case fail_d6S of wild3_XD { | GHC.Base.False -> $j_s7l GHC.Prim.realWorld#; | GHC.Base.True -> GHC.Base.unpackCString# "test" | }; | : y1_a6y ys1_a6A -> | case GHC.Base.$f4 of tpl3_XF { GHC.Base.:DEq tpl4_XJ tpl5_XL -> | case tpl4_XJ (GHC.Base.C# 'a') y1_a6y of wild3_Xo { | GHC.Base.False -> | case fail_d6S of wild4_XN { | GHC.Base.False -> $j_s7l GHC.Prim.realWorld#; | GHC.Base.True -> GHC.Base.unpackCString# "test" | }; | GHC.Base.True -> GHC.Base.unpackCString# "test" | } | } | } | } | } | } | _______________________________________________ | Glasgow-haskell-users mailing list | Glasgow-haskell-users@haskell.org | http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Hi
* As you say, if 'retry' was inlined, all would be fine. But what if 'retry' was big? Then we'd get lots of code duplication, in exchange for fewer tests.
* Presumably it's not inlined because it's over the inline size threshold. (You did use -O?)
I used -O2. I also added {-# INLINE #-} pragmas to both begin1 and begin2. In this case, I know that after inlining the let expression things will merge and become smaller afterwards (although can't see any reasonable way the compiler could know this). I realise that there is a limit on the size of inlining, and I can reduce the perceived cost of inlining begin1/begin2 with pragmas. However, once they have been inlined into a local let expression, it doesn't seem like there is much I can do to persuade the compiler to inline the let binding. I'm thinking of a situation something like: {-# INLINE foo #-} foo = large bar x = if x then res else res where res = foo By putting an INLINE on foo I am able to persuade it to be inlined into the binding of bar, but I can't then persuade it to be inlined at the let expression. (I tried to come up with a small example representing this situation, but the original begin1/begin2 example is as short as I can get)
* The 'state' argument is just there to make sure that 'retry' is a *function* not a *thunk*, to avoid the overheads of unnecessary thunk update.
OK. That seems perfectly reasonable. Does the introduction of the state argument either remove the inline annotation, or substantially increase the cost of inlining the let binding? Otherwise, I don't think its relevant to my particular case. Thanks Neil

| {-# INLINE foo #-} | foo = large | | bar x = if x then res else res | where res = foo | | By putting an INLINE on foo I am able to persuade it to be inlined | into the binding of bar, but I can't then persuade it to be inlined at | the let expression. I'm not certain what you mean here. I think you mean that in the above code you end up with bar x = let res = large in if x then res else res whereas what you wanted was bar x = if x then large else large That is indeed tricky in general, as I'm sure you can see: let x = <large> in let y = e2[x,x] in let z = e3[y] in ... Is it better to inline x into e2, or y into e3, or z into e4? Hard to tell! In your example, you want 'res' to inline "first". You can get that by explicit control: {-# NOINLINE [0] foo #-} That says "don't inline foo before phase 0", which in turn gives time for 'res' to get inlined first. I'm not certain whether that'll help in your actual example. Simon

Hi
| {-# INLINE foo #-} | foo = large | | bar x = if x then res else res | where res = foo | | By putting an INLINE on foo I am able to persuade it to be inlined | into the binding of bar, but I can't then persuade it to be inlined at | the let expression.
I'm not certain what you mean here. I think you mean that in the above code you end up with bar x = let res = large in if x then res else res whereas what you wanted was bar x = if x then large else large
Yes, that is exactly it.
In your example, you want 'res' to inline "first". You can get that by explicit control:
{-# NOINLINE [0] foo #-}
That says "don't inline foo before phase 0", which in turn gives time for 'res' to get inlined first. I'm not certain whether that'll help in your actual example.
It worked, with: {-# INLINE [1] begin1 #-} {-# INLINE begin2 #-} I don't think this approach will compose particularly well, and in the real case I was trying (not this reduced example) I don't think it will work because there is some recursion and RULES involved. I'll have another go with the full example in future. I did however notice an issue while doing this work - in GHC 6.8.2 and HEAD from Christmas. {-# INLINE begin2 #-} Temp.hs:7:1: lexical error at character '}' In particular if you use the RULES example from the manual, then it doesn't work. Thanks Neil

| It worked, with: | | {-# INLINE [1] begin1 #-} | {-# INLINE begin2 #-} | | I don't think this approach will compose particularly well, and in the | real case I was trying (not this reduced example) I don't think it | will work because there is some recursion and RULES involved. I'll | have another go with the full example in future. I agree it's not great. But I just don't have a better idea at the moment. In the general case I showed, it seems hard to know what to do. Perhaps there are better rules for special cases. If you have good ideas, I'm all ears! | I did however notice an issue while doing this work - in GHC 6.8.2 and | HEAD from Christmas. | | {-# INLINE begin2 | #-} | Temp.hs:7:1: lexical error at character '}' Layout applies inside pragmas, so you may need a space before the #-}. The error message is unhelpful. I'll fix the manual S

| It worked, with: | | {-# INLINE [1] begin1 #-} | {-# INLINE begin2 #-} | | I don't think this approach will compose particularly well, and in
Simon Peyton Jones wrote: the
| real case I was trying (not this reduced example) I don't think it | will work because there is some recursion and RULES involved. I'll | have another go with the full example in future.
I agree it's not great. But I just don't have a better idea at the moment. In the general case I showed, it seems hard to know what to do. Perhaps there are better rules for special cases. If you have good ideas, I'm all ears!
I've thought for a while that the phase system is going to cause trouble when people start trying to build self-optimising libraries on top of other self-optimising libraries like ByteString. Instead of specifying absolute phase numbers, could a partial ordering be specified somehow, and have GHC pick the actual phases? One way of doing this would be replace [1] with [Symbol] where Symbol would be some normal Haskell entity (e.g. an empty datatype) and could be exported, and add a pragma to specify orderings. I can see two problems with this: - The ordering would be (deliberately) underspecified, so optimisations might become unstable if it changed suddenly. - GHC might not have enough phases to actually assign all symbols to phases. I don't know if it would be easy to make the number of phases dynamic. Cheers, Ganesh ============================================================================== Please access the attached hyperlink for an important electronic communications disclaimer: http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html ==============================================================================

G'day all. This is just a suggestion, but perhaps the problem isn't that retry isn't inlined, it's that it isn't specialised. IIRC, GHC does now specialise functions under what I assume would be these conditions: - There is a top-level case expression (possibly under some lets) which tests a function argument. - That function argument has a known constructor at a call site. The situation in Neil's code is almost identical, except that the top-level case expression is on a value passed by environment, not by function argument. Cheers, Andrew Bromage
participants (5)
-
ajb@spamcop.net
-
Neil Mitchell
-
Simon Peyton-Jones
-
Sittampalam, Ganesh
-
Tim Chevalier