[GHC] #11284: Lambda-lifting fails in simple Text example

#11284: Lambda-lifting fails in simple Text example -------------------------------------+------------------------------------- Reporter: bgamari | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- Consider the example, {{{#!hs import Data.Char (isSpace) import Data.List (foldl') import GHC.Exts (build) import qualified Data.Text as T longestWord :: T.Text -> Int longestWord t = foldl' max 0 $ map T.length $ fusedWords t fusedWords :: T.Text -> [T.Text] fusedWords t0 = build $ \cons nil -> let go !t | T.null t = nil | otherwise = let (w, rest) = T.span (not . isSpace) t in cons w (go $ T.dropWhile isSpace rest) in go t0 }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11284 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11284: Lambda-lifting fails in simple Text example -------------------------------------+------------------------------------- Reporter: bgamari | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * failure: None/Unknown => Runtime performance bug Old description:
Consider the example, {{{#!hs import Data.Char (isSpace) import Data.List (foldl') import GHC.Exts (build) import qualified Data.Text as T
longestWord :: T.Text -> Int longestWord t = foldl' max 0 $ map T.length $ fusedWords t
fusedWords :: T.Text -> [T.Text] fusedWords t0 = build $ \cons nil -> let go !t | T.null t = nil | otherwise = let (w, rest) = T.span (not . isSpace) t in cons w (go $ T.dropWhile isSpace rest) in go t0 }}}
New description: Consider the example (which uses `Text`; I'm working on finding a more minimal example), {{{#!hs import Data.Char (isSpace) import Data.List (foldl') import GHC.Exts (build) import qualified Data.Text as T longestWord :: T.Text -> Int longestWord t = foldl' max 0 $ map T.length $ fusedWords t fusedWords :: T.Text -> [T.Text] fusedWords t0 = build $ \cons nil -> let go !t | T.null t = nil | otherwise = let (w, rest) = T.span (not . isSpace) t in cons w (go $ T.dropWhile isSpace rest) in go t0 }}} `longestWord` here produces the simplified Core`, {{{#!hs Rec { Ticket.$wgo1 :: [T.Text] -> GHC.Prim.Int# -> GHC.Prim.Int# Ticket.$wgo1 = \ (w_s4GJ :: [T.Text]) (ww_s4GN :: GHC.Prim.Int#) -> case w_s4GJ of _ { [] -> ww_s4GN; : y_a4vC ys_a4vD -> case y_a4vC of _ { Data.Text.Internal.Text dt_a4jP dt1_a4jQ dt2_a4jR -> let { a_a4jO :: GHC.Prim.Int# a_a4jO = GHC.Prim.+# dt1_a4jQ dt2_a4jR } in letrec { -- Why must you allocate? For the love of all that is good, why? $wloop_length_s4GI :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int# $wloop_length_s4GI = \ (ww1_s4Gz :: GHC.Prim.Int#) (ww2_s4GD :: GHC.Prim.Int#) -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.>=# ww2_s4GD a_a4jO) -- bounds check of _ { False -> { ... -- in this body there are few cases analyses with -- recursive calls of the form $wloop_length_s4GI (GHC.Prim.+# ww1_s4Gz 1) (GHC.Prim.+# ww2_s4GD 1) ... True -> ww1_s4Gz }; } in case $wloop_length_s4GI 0 dt1_a4jQ of ww1_s4GH { __DEFAULT -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.<=# ww_s4GN ww1_s4GH) of _ { False -> Ticket.$wgo1 ys_a4vD ww_s4GN; True -> Ticket.$wgo1 ys_a4vD ww1_s4GH } } } } end Rec } longestWord :: T.Text -> Int longestWord = \ (w_s4GT :: T.Text) -> case w_s4GT of _ { Data.Text.Internal.Text ww1_s4GW ww2_s4GX ww3_s4GY -> case Ticket.$wgo1 (Ticket.$wgo ww1_s4GW ww2_s4GX ww3_s4GY) 0 of ww4_s4H2 { __DEFAULT -> GHC.Types.I# ww4_s4H2 } } }}} Notice `$wloop_length_s4GI`: It should be a nice tight loop traversing `dt_a4jP :: ByteArray#` until it finds whitespace. However, GHC fails to lambda-lift this closure, thereby turning it into an allocating operation! Oh no! -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11284#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11284: Lambda-lifting fails in simple Text example -------------------------------------+------------------------------------- Reporter: bgamari | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by bgamari: Old description:
Consider the example (which uses `Text`; I'm working on finding a more minimal example), {{{#!hs import Data.Char (isSpace) import Data.List (foldl') import GHC.Exts (build) import qualified Data.Text as T
longestWord :: T.Text -> Int longestWord t = foldl' max 0 $ map T.length $ fusedWords t
fusedWords :: T.Text -> [T.Text] fusedWords t0 = build $ \cons nil -> let go !t | T.null t = nil | otherwise = let (w, rest) = T.span (not . isSpace) t in cons w (go $ T.dropWhile isSpace rest) in go t0 }}}
`longestWord` here produces the simplified Core`,
{{{#!hs Rec { Ticket.$wgo1 :: [T.Text] -> GHC.Prim.Int# -> GHC.Prim.Int# Ticket.$wgo1 = \ (w_s4GJ :: [T.Text]) (ww_s4GN :: GHC.Prim.Int#) -> case w_s4GJ of _ { [] -> ww_s4GN; : y_a4vC ys_a4vD -> case y_a4vC of _ { Data.Text.Internal.Text dt_a4jP dt1_a4jQ dt2_a4jR -> let { a_a4jO :: GHC.Prim.Int# a_a4jO = GHC.Prim.+# dt1_a4jQ dt2_a4jR } in letrec { -- Why must you allocate? For the love of all that is good, why? $wloop_length_s4GI :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int# $wloop_length_s4GI = \ (ww1_s4Gz :: GHC.Prim.Int#) (ww2_s4GD :: GHC.Prim.Int#) -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.>=# ww2_s4GD a_a4jO) -- bounds check of _ { False -> { ... -- in this body there are few cases analyses with -- recursive calls of the form $wloop_length_s4GI (GHC.Prim.+# ww1_s4Gz 1) (GHC.Prim.+# ww2_s4GD 1) ... True -> ww1_s4Gz }; } in case $wloop_length_s4GI 0 dt1_a4jQ of ww1_s4GH { __DEFAULT -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.<=# ww_s4GN ww1_s4GH) of _ { False -> Ticket.$wgo1 ys_a4vD ww_s4GN; True -> Ticket.$wgo1 ys_a4vD ww1_s4GH } } } } end Rec }
longestWord :: T.Text -> Int longestWord = \ (w_s4GT :: T.Text) -> case w_s4GT of _ { Data.Text.Internal.Text ww1_s4GW ww2_s4GX ww3_s4GY -> case Ticket.$wgo1 (Ticket.$wgo ww1_s4GW ww2_s4GX ww3_s4GY) 0 of ww4_s4H2 { __DEFAULT -> GHC.Types.I# ww4_s4H2 } } }}}
Notice `$wloop_length_s4GI`: It should be a nice tight loop traversing `dt_a4jP :: ByteArray#` until it finds whitespace. However, GHC fails to lambda-lift this closure, thereby turning it into an allocating operation! Oh no!
New description: Consider the example (which uses `Text`; I'm working on finding a more minimal example), {{{#!hs import Data.Char (isSpace) import Data.List (foldl') import GHC.Exts (build) import qualified Data.Text as T longestWord :: T.Text -> Int longestWord t = foldl' max 0 $ map T.length $ fusedWords t fusedWords :: T.Text -> [T.Text] fusedWords t0 = build $ \cons nil -> let go !t | T.null t = nil | otherwise = let (w, rest) = T.span (not . isSpace) t in cons w (go $ T.dropWhile isSpace rest) in go t0 }}} `longestWord` here produces the simplified Core`, {{{#!hs Rec { Ticket.$wgo1 :: [T.Text] -> GHC.Prim.Int# -> GHC.Prim.Int# Ticket.$wgo1 = \ (w_s4GJ :: [T.Text]) (ww_s4GN :: GHC.Prim.Int#) -> case w_s4GJ of _ { [] -> ww_s4GN; : y_a4vC ys_a4vD -> case y_a4vC of _ { Data.Text.Internal.Text dt_a4jP dt1_a4jQ dt2_a4jR -> let { a_a4jO :: GHC.Prim.Int# a_a4jO = GHC.Prim.+# dt1_a4jQ dt2_a4jR } in letrec { -- Why must you allocate? For the love of all that is good, why? $wloop_length_s4GI :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int# $wloop_length_s4GI = \ (ww1_s4Gz :: GHC.Prim.Int#) (ww2_s4GD :: GHC.Prim.Int#) -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.>=# ww2_s4GD a_a4jO) -- bounds check of _ { False -> { ... -- in this body there are few cases analyses with -- recursive calls of the form $wloop_length_s4GI (GHC.Prim.+# ww1_s4Gz 1) (GHC.Prim.+# ww2_s4GD 1) ... True -> ww1_s4Gz }; } in case $wloop_length_s4GI 0 dt1_a4jQ of ww1_s4GH { __DEFAULT -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.<=# ww_s4GN ww1_s4GH) of _ { False -> Ticket.$wgo1 ys_a4vD ww_s4GN; True -> Ticket.$wgo1 ys_a4vD ww1_s4GH } } } } end Rec } longestWord :: T.Text -> Int longestWord = \ (w_s4GT :: T.Text) -> case w_s4GT of _ { Data.Text.Internal.Text ww1_s4GW ww2_s4GX ww3_s4GY -> case Ticket.$wgo1 (Ticket.$wgo ww1_s4GW ww2_s4GX ww3_s4GY) 0 of ww4_s4H2 { __DEFAULT -> GHC.Types.I# ww4_s4H2 } } }}} Notice `$wloop_length_s4GI`: It should be a nice tight loop counting UTF-8 characters in `dt_a4jP :: ByteArray#` until it finds the end of the `Text`. However, GHC fails to lambda-lift this closure, thereby turning it into an allocating operation! Oh no! -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11284#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11284: Lambda-lifting fails in simple Text example -------------------------------------+------------------------------------- Reporter: bgamari | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by bgamari: Old description:
Consider the example (which uses `Text`; I'm working on finding a more minimal example), {{{#!hs import Data.Char (isSpace) import Data.List (foldl') import GHC.Exts (build) import qualified Data.Text as T
longestWord :: T.Text -> Int longestWord t = foldl' max 0 $ map T.length $ fusedWords t
fusedWords :: T.Text -> [T.Text] fusedWords t0 = build $ \cons nil -> let go !t | T.null t = nil | otherwise = let (w, rest) = T.span (not . isSpace) t in cons w (go $ T.dropWhile isSpace rest) in go t0 }}}
`longestWord` here produces the simplified Core`,
{{{#!hs Rec { Ticket.$wgo1 :: [T.Text] -> GHC.Prim.Int# -> GHC.Prim.Int# Ticket.$wgo1 = \ (w_s4GJ :: [T.Text]) (ww_s4GN :: GHC.Prim.Int#) -> case w_s4GJ of _ { [] -> ww_s4GN; : y_a4vC ys_a4vD -> case y_a4vC of _ { Data.Text.Internal.Text dt_a4jP dt1_a4jQ dt2_a4jR -> let { a_a4jO :: GHC.Prim.Int# a_a4jO = GHC.Prim.+# dt1_a4jQ dt2_a4jR } in letrec { -- Why must you allocate? For the love of all that is good, why? $wloop_length_s4GI :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int# $wloop_length_s4GI = \ (ww1_s4Gz :: GHC.Prim.Int#) (ww2_s4GD :: GHC.Prim.Int#) -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.>=# ww2_s4GD a_a4jO) -- bounds check of _ { False -> { ... -- in this body there are few cases analyses with -- recursive calls of the form $wloop_length_s4GI (GHC.Prim.+# ww1_s4Gz 1) (GHC.Prim.+# ww2_s4GD 1) ... True -> ww1_s4Gz }; } in case $wloop_length_s4GI 0 dt1_a4jQ of ww1_s4GH { __DEFAULT -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.<=# ww_s4GN ww1_s4GH) of _ { False -> Ticket.$wgo1 ys_a4vD ww_s4GN; True -> Ticket.$wgo1 ys_a4vD ww1_s4GH } } } } end Rec }
longestWord :: T.Text -> Int longestWord = \ (w_s4GT :: T.Text) -> case w_s4GT of _ { Data.Text.Internal.Text ww1_s4GW ww2_s4GX ww3_s4GY -> case Ticket.$wgo1 (Ticket.$wgo ww1_s4GW ww2_s4GX ww3_s4GY) 0 of ww4_s4H2 { __DEFAULT -> GHC.Types.I# ww4_s4H2 } } }}}
Notice `$wloop_length_s4GI`: It should be a nice tight loop counting UTF-8 characters in `dt_a4jP :: ByteArray#` until it finds the end of the `Text`. However, GHC fails to lambda-lift this closure, thereby turning it into an allocating operation! Oh no!
New description: Consider the example (which uses `Text`; I'm working on finding a more minimal example), {{{#!hs import Data.Char (isSpace) import Data.List (foldl') import GHC.Exts (build) import qualified Data.Text as T longestWord :: T.Text -> Int longestWord t = foldl' max 0 $ map T.length $ fusedWords t fusedWords :: T.Text -> [T.Text] fusedWords t0 = build $ \cons nil -> let go !t | T.null t = nil | otherwise = let (w, rest) = T.span (not . isSpace) t in cons w (go $ T.dropWhile isSpace rest) in go t0 }}} `longestWord` here produces the simplified Core`, {{{#!hs Rec { Ticket.$wgo1 :: [T.Text] -> GHC.Prim.Int# -> GHC.Prim.Int# Ticket.$wgo1 = \ (w_s4GJ :: [T.Text]) (ww_s4GN :: GHC.Prim.Int#) -> case w_s4GJ of _ { [] -> ww_s4GN; : y_a4vC ys_a4vD -> case y_a4vC of _ { Data.Text.Internal.Text dt_a4jP dt1_a4jQ dt2_a4jR -> let { a_a4jO :: GHC.Prim.Int# a_a4jO = GHC.Prim.+# dt1_a4jQ dt2_a4jR } in letrec { -- Why must you allocate? For the love of all that is good, why? -- This loop is just `T.length`, the first argument being the -- length accumulator and the second being an index into the -- ByteArray# $wloop_length_s4GI :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int# $wloop_length_s4GI = \ (ww1_s4Gz :: GHC.Prim.Int#) (ww2_s4GD :: GHC.Prim.Int#) -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.>=# ww2_s4GD a_a4jO) -- bounds check of _ { False -> { ... -- in this body there are few cases analyses with -- recursive calls of the form $wloop_length_s4GI (GHC.Prim.+# ww1_s4Gz 1) (GHC.Prim.+# ww2_s4GD 1) ... True -> ww1_s4Gz }; } in case $wloop_length_s4GI 0 dt1_a4jQ of ww1_s4GH { __DEFAULT -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.<=# ww_s4GN ww1_s4GH) of _ { False -> Ticket.$wgo1 ys_a4vD ww_s4GN; True -> Ticket.$wgo1 ys_a4vD ww1_s4GH } } } } end Rec } longestWord :: T.Text -> Int longestWord = \ (w_s4GT :: T.Text) -> case w_s4GT of _ { Data.Text.Internal.Text ww1_s4GW ww2_s4GX ww3_s4GY -> case Ticket.$wgo1 (Ticket.$wgo ww1_s4GW ww2_s4GX ww3_s4GY) 0 of ww4_s4H2 { __DEFAULT -> GHC.Types.I# ww4_s4H2 } } }}} Notice `$wloop_length_s4GI`: It should be a nice tight loop counting UTF-8 characters in `dt_a4jP :: ByteArray#` until it finds the end of the `Text`. However, GHC fails to lambda-lift this closure, thereby turning it into an allocating operation! Oh no! -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11284#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11284: Lambda-lifting fails in simple Text example -------------------------------------+------------------------------------- Reporter: bgamari | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by bgamari: Old description:
Consider the example (which uses `Text`; I'm working on finding a more minimal example), {{{#!hs import Data.Char (isSpace) import Data.List (foldl') import GHC.Exts (build) import qualified Data.Text as T
longestWord :: T.Text -> Int longestWord t = foldl' max 0 $ map T.length $ fusedWords t
fusedWords :: T.Text -> [T.Text] fusedWords t0 = build $ \cons nil -> let go !t | T.null t = nil | otherwise = let (w, rest) = T.span (not . isSpace) t in cons w (go $ T.dropWhile isSpace rest) in go t0 }}}
`longestWord` here produces the simplified Core`,
{{{#!hs Rec { Ticket.$wgo1 :: [T.Text] -> GHC.Prim.Int# -> GHC.Prim.Int# Ticket.$wgo1 = \ (w_s4GJ :: [T.Text]) (ww_s4GN :: GHC.Prim.Int#) -> case w_s4GJ of _ { [] -> ww_s4GN; : y_a4vC ys_a4vD -> case y_a4vC of _ { Data.Text.Internal.Text dt_a4jP dt1_a4jQ dt2_a4jR -> let { a_a4jO :: GHC.Prim.Int# a_a4jO = GHC.Prim.+# dt1_a4jQ dt2_a4jR } in letrec { -- Why must you allocate? For the love of all that is good, why? -- This loop is just `T.length`, the first argument being the -- length accumulator and the second being an index into the -- ByteArray# $wloop_length_s4GI :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int# $wloop_length_s4GI = \ (ww1_s4Gz :: GHC.Prim.Int#) (ww2_s4GD :: GHC.Prim.Int#) -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.>=# ww2_s4GD a_a4jO) -- bounds check of _ { False -> { ... -- in this body there are few cases analyses with -- recursive calls of the form $wloop_length_s4GI (GHC.Prim.+# ww1_s4Gz 1) (GHC.Prim.+# ww2_s4GD 1) ... True -> ww1_s4Gz }; } in case $wloop_length_s4GI 0 dt1_a4jQ of ww1_s4GH { __DEFAULT -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.<=# ww_s4GN ww1_s4GH) of _ { False -> Ticket.$wgo1 ys_a4vD ww_s4GN; True -> Ticket.$wgo1 ys_a4vD ww1_s4GH } } } } end Rec }
longestWord :: T.Text -> Int longestWord = \ (w_s4GT :: T.Text) -> case w_s4GT of _ { Data.Text.Internal.Text ww1_s4GW ww2_s4GX ww3_s4GY -> case Ticket.$wgo1 (Ticket.$wgo ww1_s4GW ww2_s4GX ww3_s4GY) 0 of ww4_s4H2 { __DEFAULT -> GHC.Types.I# ww4_s4H2 } } }}}
Notice `$wloop_length_s4GI`: It should be a nice tight loop counting UTF-8 characters in `dt_a4jP :: ByteArray#` until it finds the end of the `Text`. However, GHC fails to lambda-lift this closure, thereby turning it into an allocating operation! Oh no!
New description: Consider the example (which uses `Text`; I'm working on finding a more minimal example), {{{#!hs import Data.Char (isSpace) import Data.List (foldl') import GHC.Exts (build) import qualified Data.Text as T longestWord :: T.Text -> Int longestWord t = foldl' max 0 $ map T.length $ fusedWords t fusedWords :: T.Text -> [T.Text] fusedWords t0 = build $ \cons nil -> let go !t | T.null t = nil | otherwise = let (w, rest) = T.span (not . isSpace) t in cons w (go $ T.dropWhile isSpace rest) in go t0 -- For reference data Text = Text {-# UNPACK #-} !A.Array -- payload (Word16 elements) {-# UNPACK #-} !Int -- offset (units of Word16, not Char) {-# UNPACK #-} !Int -- length (units of Word16, not Char) }}} `longestWord` here produces the simplified Core`, {{{#!hs Rec { Ticket.$wgo1 :: [T.Text] -> GHC.Prim.Int# -> GHC.Prim.Int# Ticket.$wgo1 = \ (w_s4GJ :: [T.Text]) (ww_s4GN :: GHC.Prim.Int#) -> case w_s4GJ of _ { [] -> ww_s4GN; : y_a4vC ys_a4vD -> case y_a4vC of _ { Data.Text.Internal.Text dt_a4jP dt1_a4jQ dt2_a4jR -> let { a_a4jO :: GHC.Prim.Int# a_a4jO = GHC.Prim.+# dt1_a4jQ dt2_a4jR } in letrec { -- Why must you allocate? For the love of all that is good, why? -- This loop is just `T.length`, the first argument being the -- length accumulator and the second being an index into the -- ByteArray# $wloop_length_s4GI :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int# $wloop_length_s4GI = \ (ww1_s4Gz :: GHC.Prim.Int#) (ww2_s4GD :: GHC.Prim.Int#) -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.>=# ww2_s4GD a_a4jO) -- bounds check of _ { False -> { ... -- in this body there are few cases analyses with -- recursive calls of the form $wloop_length_s4GI (GHC.Prim.+# ww1_s4Gz 1) (GHC.Prim.+# ww2_s4GD 1) ... True -> ww1_s4Gz }; } in case $wloop_length_s4GI 0 dt1_a4jQ of ww1_s4GH { __DEFAULT -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.<=# ww_s4GN ww1_s4GH) of _ { False -> Ticket.$wgo1 ys_a4vD ww_s4GN; True -> Ticket.$wgo1 ys_a4vD ww1_s4GH } } } } end Rec } longestWord :: T.Text -> Int longestWord = \ (w_s4GT :: T.Text) -> case w_s4GT of _ { Data.Text.Internal.Text ww1_s4GW ww2_s4GX ww3_s4GY -> case Ticket.$wgo1 (Ticket.$wgo ww1_s4GW ww2_s4GX ww3_s4GY) 0 of ww4_s4H2 { __DEFAULT -> GHC.Types.I# ww4_s4H2 } } }}} Notice `$wloop_length_s4GI`: It should be a nice tight loop counting UTF-8 characters in `dt_a4jP :: ByteArray#` until it finds the end of the `Text`. However, GHC fails to lambda-lift this closure, thereby turning it into an allocating operation! Oh no! -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11284#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11284: Lambda-lifting fails in simple Text example -------------------------------------+------------------------------------- Reporter: bgamari | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by bgamari: Old description:
Consider the example (which uses `Text`; I'm working on finding a more minimal example), {{{#!hs import Data.Char (isSpace) import Data.List (foldl') import GHC.Exts (build) import qualified Data.Text as T
longestWord :: T.Text -> Int longestWord t = foldl' max 0 $ map T.length $ fusedWords t
fusedWords :: T.Text -> [T.Text] fusedWords t0 = build $ \cons nil -> let go !t | T.null t = nil | otherwise = let (w, rest) = T.span (not . isSpace) t in cons w (go $ T.dropWhile isSpace rest) in go t0
-- For reference data Text = Text {-# UNPACK #-} !A.Array -- payload (Word16 elements) {-# UNPACK #-} !Int -- offset (units of Word16, not Char) {-# UNPACK #-} !Int -- length (units of Word16, not Char) }}}
`longestWord` here produces the simplified Core`,
{{{#!hs Rec { Ticket.$wgo1 :: [T.Text] -> GHC.Prim.Int# -> GHC.Prim.Int# Ticket.$wgo1 = \ (w_s4GJ :: [T.Text]) (ww_s4GN :: GHC.Prim.Int#) -> case w_s4GJ of _ { [] -> ww_s4GN; : y_a4vC ys_a4vD -> case y_a4vC of _ { Data.Text.Internal.Text dt_a4jP dt1_a4jQ dt2_a4jR -> let { a_a4jO :: GHC.Prim.Int# a_a4jO = GHC.Prim.+# dt1_a4jQ dt2_a4jR } in letrec { -- Why must you allocate? For the love of all that is good, why? -- This loop is just `T.length`, the first argument being the -- length accumulator and the second being an index into the -- ByteArray# $wloop_length_s4GI :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int# $wloop_length_s4GI = \ (ww1_s4Gz :: GHC.Prim.Int#) (ww2_s4GD :: GHC.Prim.Int#) -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.>=# ww2_s4GD a_a4jO) -- bounds check of _ { False -> { ... -- in this body there are few cases analyses with -- recursive calls of the form $wloop_length_s4GI (GHC.Prim.+# ww1_s4Gz 1) (GHC.Prim.+# ww2_s4GD 1) ... True -> ww1_s4Gz }; } in case $wloop_length_s4GI 0 dt1_a4jQ of ww1_s4GH { __DEFAULT -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.<=# ww_s4GN ww1_s4GH) of _ { False -> Ticket.$wgo1 ys_a4vD ww_s4GN; True -> Ticket.$wgo1 ys_a4vD ww1_s4GH } } } } end Rec }
longestWord :: T.Text -> Int longestWord = \ (w_s4GT :: T.Text) -> case w_s4GT of _ { Data.Text.Internal.Text ww1_s4GW ww2_s4GX ww3_s4GY -> case Ticket.$wgo1 (Ticket.$wgo ww1_s4GW ww2_s4GX ww3_s4GY) 0 of ww4_s4H2 { __DEFAULT -> GHC.Types.I# ww4_s4H2 } } }}}
Notice `$wloop_length_s4GI`: It should be a nice tight loop counting UTF-8 characters in `dt_a4jP :: ByteArray#` until it finds the end of the `Text`. However, GHC fails to lambda-lift this closure, thereby turning it into an allocating operation! Oh no!
New description: Consider the example (which uses `Text`; I'm working on finding a more minimal example), {{{#!hs import Data.Char (isSpace) import Data.List (foldl') import GHC.Exts (build) import qualified Data.Text as T longestWord :: T.Text -> Int longestWord t = foldl' max 0 $ map T.length $ fusedWords t fusedWords :: T.Text -> [T.Text] fusedWords t0 = build $ \cons nil -> let go !t | T.null t = nil | otherwise = let (w, rest) = T.span (not . isSpace) t in cons w (go $ T.dropWhile isSpace rest) in go t0 -- For reference data Text = Text {-# UNPACK #-} !A.Array -- payload (Word16 elements) {-# UNPACK #-} !Int -- offset (units of Word16, not Char) {-# UNPACK #-} !Int -- length (units of Word16, not Char) }}} `longestWord` here produces the simplified Core`, {{{#!hs Rec { Ticket.$wgo1 :: [T.Text] -> GHC.Prim.Int# -> GHC.Prim.Int# Ticket.$wgo1 = \ (w_s4GJ :: [T.Text]) (ww_s4GN :: GHC.Prim.Int#) -> case w_s4GJ of _ { [] -> ww_s4GN; : y_a4vC ys_a4vD -> case y_a4vC of _ { Data.Text.Internal.Text dt_a4jP dt1_a4jQ dt2_a4jR -> let { a_a4jO :: GHC.Prim.Int# a_a4jO = GHC.Prim.+# dt1_a4jQ dt2_a4jR } in letrec { -- Why must you allocate? For the love of all that is good, why? -- This loop is just `T.length`, the first argument being the -- length accumulator and the second being an index into the -- ByteArray# $wloop_length_s4GI :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int# $wloop_length_s4GI = \ (ww1_s4Gz :: GHC.Prim.Int#) (ww2_s4GD :: GHC.Prim.Int#) -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.>=# ww2_s4GD a_a4jO) -- bounds check of _ { False -> { ... -- in this body there are few cases analyses with -- recursive calls of the form $wloop_length_s4GI (GHC.Prim.+# ww1_s4Gz 1) (GHC.Prim.+# ww2_s4GD 1) ... True -> ww1_s4Gz }; } in case $wloop_length_s4GI 0 dt1_a4jQ of ww1_s4GH { __DEFAULT -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.<=# ww_s4GN ww1_s4GH) of _ { False -> Ticket.$wgo1 ys_a4vD ww_s4GN; True -> Ticket.$wgo1 ys_a4vD ww1_s4GH } } } } end Rec } longestWord :: T.Text -> Int longestWord = \ (w_s4GT :: T.Text) -> case w_s4GT of _ { Data.Text.Internal.Text ww1_s4GW ww2_s4GX ww3_s4GY -> case Ticket.$wgo1 (Ticket.$wgo ww1_s4GW ww2_s4GX ww3_s4GY) 0 of ww4_s4H2 { __DEFAULT -> GHC.Types.I# ww4_s4H2 } } }}} Notice `$wloop_length_s4GI`: It should be a nice tight loop counting Unicode characters in the array `dt_a4jP` until it finds the end of the `Text`. However, GHC fails to lambda-lift this closure, thereby turning it into an allocating operation! Oh no! -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11284#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11284: Lambda-lifting fails in simple Text example -------------------------------------+------------------------------------- Reporter: bgamari | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): For the full Core see https://gist.github.com/bgamari/3187fe64531b19fcf37f. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11284#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11284: Lambda-lifting fails in simple Text example -------------------------------------+------------------------------------- Reporter: bgamari | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by bgamari: Old description:
Consider the example (which uses `Text`; I'm working on finding a more minimal example), {{{#!hs import Data.Char (isSpace) import Data.List (foldl') import GHC.Exts (build) import qualified Data.Text as T
longestWord :: T.Text -> Int longestWord t = foldl' max 0 $ map T.length $ fusedWords t
fusedWords :: T.Text -> [T.Text] fusedWords t0 = build $ \cons nil -> let go !t | T.null t = nil | otherwise = let (w, rest) = T.span (not . isSpace) t in cons w (go $ T.dropWhile isSpace rest) in go t0
-- For reference data Text = Text {-# UNPACK #-} !A.Array -- payload (Word16 elements) {-# UNPACK #-} !Int -- offset (units of Word16, not Char) {-# UNPACK #-} !Int -- length (units of Word16, not Char) }}}
`longestWord` here produces the simplified Core`,
{{{#!hs Rec { Ticket.$wgo1 :: [T.Text] -> GHC.Prim.Int# -> GHC.Prim.Int# Ticket.$wgo1 = \ (w_s4GJ :: [T.Text]) (ww_s4GN :: GHC.Prim.Int#) -> case w_s4GJ of _ { [] -> ww_s4GN; : y_a4vC ys_a4vD -> case y_a4vC of _ { Data.Text.Internal.Text dt_a4jP dt1_a4jQ dt2_a4jR -> let { a_a4jO :: GHC.Prim.Int# a_a4jO = GHC.Prim.+# dt1_a4jQ dt2_a4jR } in letrec { -- Why must you allocate? For the love of all that is good, why? -- This loop is just `T.length`, the first argument being the -- length accumulator and the second being an index into the -- ByteArray# $wloop_length_s4GI :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int# $wloop_length_s4GI = \ (ww1_s4Gz :: GHC.Prim.Int#) (ww2_s4GD :: GHC.Prim.Int#) -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.>=# ww2_s4GD a_a4jO) -- bounds check of _ { False -> { ... -- in this body there are few cases analyses with -- recursive calls of the form $wloop_length_s4GI (GHC.Prim.+# ww1_s4Gz 1) (GHC.Prim.+# ww2_s4GD 1) ... True -> ww1_s4Gz }; } in case $wloop_length_s4GI 0 dt1_a4jQ of ww1_s4GH { __DEFAULT -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.<=# ww_s4GN ww1_s4GH) of _ { False -> Ticket.$wgo1 ys_a4vD ww_s4GN; True -> Ticket.$wgo1 ys_a4vD ww1_s4GH } } } } end Rec }
longestWord :: T.Text -> Int longestWord = \ (w_s4GT :: T.Text) -> case w_s4GT of _ { Data.Text.Internal.Text ww1_s4GW ww2_s4GX ww3_s4GY -> case Ticket.$wgo1 (Ticket.$wgo ww1_s4GW ww2_s4GX ww3_s4GY) 0 of ww4_s4H2 { __DEFAULT -> GHC.Types.I# ww4_s4H2 } } }}}
Notice `$wloop_length_s4GI`: It should be a nice tight loop counting Unicode characters in the array `dt_a4jP` until it finds the end of the `Text`. However, GHC fails to lambda-lift this closure, thereby turning it into an allocating operation! Oh no!
New description: Consider the example (which uses `Text`; I'm working on finding a more minimal example), {{{#!hs import Data.Char (isSpace) import Data.List (foldl') import GHC.Exts (build) import qualified Data.Text as T longestWord :: T.Text -> Int longestWord t = foldl' max 0 $ map T.length $ fusedWords t fusedWords :: T.Text -> [T.Text] fusedWords t0 = build $ \cons nil -> let go !t | T.null t = nil | otherwise = let (w, rest) = T.span (not . isSpace) t in cons w (go $ T.dropWhile isSpace rest) in go t0 -- For reference data Text = Text {-# UNPACK #-} !A.Array -- payload (Word16 elements) {-# UNPACK #-} !Int -- offset (units of Word16, not Char) {-# UNPACK #-} !Int -- length (units of Word16, not Char) }}} `longestWord` here produces the simplified Core`, {{{#!hs Ticket.$wgo :: GHC.Prim.ByteArray# -> GHC.Prim.Int# -> GHC.Prim.Int# -> [T.Text] Ticket.$wgo = ... -- > $wgo1 xs n = foldl' max n xs Ticket.$wgo1 :: [T.Text] -> GHC.Prim.Int# -> GHC.Prim.Int# Ticket.$wgo1 = \ (w_s4GJ :: [T.Text]) (ww_s4GN :: GHC.Prim.Int#) -> case w_s4GJ of _ { [] -> ww_s4GN; : y_a4vC ys_a4vD -> case y_a4vC of _ { Data.Text.Internal.Text dt_a4jP dt1_a4jQ dt2_a4jR -> let { a_a4jO :: GHC.Prim.Int# a_a4jO = GHC.Prim.+# dt1_a4jQ dt2_a4jR } in letrec { -- Why must you allocate? For the love of all that is good, why? -- This loop is just `T.length`, the first argument being the -- length accumulator and the second being an index into the -- ByteArray# $wloop_length_s4GI :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int# $wloop_length_s4GI = \ (ww1_s4Gz :: GHC.Prim.Int#) (ww2_s4GD :: GHC.Prim.Int#) -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.>=# ww2_s4GD a_a4jO) -- bounds check of _ { False -> { ... -- in this body there are few cases analyses with -- recursive calls of the form $wloop_length_s4GI (GHC.Prim.+# ww1_s4Gz 1) (GHC.Prim.+# ww2_s4GD 1) ... True -> ww1_s4Gz }; } in case $wloop_length_s4GI 0 dt1_a4jQ of ww1_s4GH { __DEFAULT -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.<=# ww_s4GN ww1_s4GH) of _ { False -> Ticket.$wgo1 ys_a4vD ww_s4GN; True -> Ticket.$wgo1 ys_a4vD ww1_s4GH } } } } longestWord :: T.Text -> Int longestWord = \ (w_s4GT :: T.Text) -> case w_s4GT of _ { Data.Text.Internal.Text ww1_s4GW ww2_s4GX ww3_s4GY -> case Ticket.$wgo1 (Ticket.$wgo ww1_s4GW ww2_s4GX ww3_s4GY) 0 of ww4_s4H2 { __DEFAULT -> GHC.Types.I# ww4_s4H2 } } }}} Notice `$wloop_length_s4GI`: It should be a nice tight loop counting Unicode characters in the array `dt_a4jP` until it finds the end of the `Text`. However, GHC fails to lambda-lift this closure, thereby turning it into an allocating operation! Oh no! -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11284#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11284: Lambda-lifting fails in simple Text example -------------------------------------+------------------------------------- Reporter: bgamari | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by bgamari: Old description:
Consider the example (which uses `Text`; I'm working on finding a more minimal example), {{{#!hs import Data.Char (isSpace) import Data.List (foldl') import GHC.Exts (build) import qualified Data.Text as T
longestWord :: T.Text -> Int longestWord t = foldl' max 0 $ map T.length $ fusedWords t
fusedWords :: T.Text -> [T.Text] fusedWords t0 = build $ \cons nil -> let go !t | T.null t = nil | otherwise = let (w, rest) = T.span (not . isSpace) t in cons w (go $ T.dropWhile isSpace rest) in go t0
-- For reference data Text = Text {-# UNPACK #-} !A.Array -- payload (Word16 elements) {-# UNPACK #-} !Int -- offset (units of Word16, not Char) {-# UNPACK #-} !Int -- length (units of Word16, not Char) }}}
`longestWord` here produces the simplified Core`,
{{{#!hs Ticket.$wgo :: GHC.Prim.ByteArray# -> GHC.Prim.Int# -> GHC.Prim.Int# -> [T.Text] Ticket.$wgo = ...
-- > $wgo1 xs n = foldl' max n xs Ticket.$wgo1 :: [T.Text] -> GHC.Prim.Int# -> GHC.Prim.Int# Ticket.$wgo1 = \ (w_s4GJ :: [T.Text]) (ww_s4GN :: GHC.Prim.Int#) -> case w_s4GJ of _ { [] -> ww_s4GN; : y_a4vC ys_a4vD -> case y_a4vC of _ { Data.Text.Internal.Text dt_a4jP dt1_a4jQ dt2_a4jR -> let { a_a4jO :: GHC.Prim.Int# a_a4jO = GHC.Prim.+# dt1_a4jQ dt2_a4jR } in letrec { -- Why must you allocate? For the love of all that is good, why? -- This loop is just `T.length`, the first argument being the -- length accumulator and the second being an index into the -- ByteArray# $wloop_length_s4GI :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int# $wloop_length_s4GI = \ (ww1_s4Gz :: GHC.Prim.Int#) (ww2_s4GD :: GHC.Prim.Int#) -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.>=# ww2_s4GD a_a4jO) -- bounds check of _ { False -> { ... -- in this body there are few cases analyses with -- recursive calls of the form $wloop_length_s4GI (GHC.Prim.+# ww1_s4Gz 1) (GHC.Prim.+# ww2_s4GD 1) ... True -> ww1_s4Gz }; } in case $wloop_length_s4GI 0 dt1_a4jQ of ww1_s4GH { __DEFAULT -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.<=# ww_s4GN ww1_s4GH) of _ { False -> Ticket.$wgo1 ys_a4vD ww_s4GN; True -> Ticket.$wgo1 ys_a4vD ww1_s4GH } } } }
longestWord :: T.Text -> Int longestWord = \ (w_s4GT :: T.Text) -> case w_s4GT of _ { Data.Text.Internal.Text ww1_s4GW ww2_s4GX ww3_s4GY -> case Ticket.$wgo1 (Ticket.$wgo ww1_s4GW ww2_s4GX ww3_s4GY) 0 of ww4_s4H2 { __DEFAULT -> GHC.Types.I# ww4_s4H2 } } }}}
Notice `$wloop_length_s4GI`: It should be a nice tight loop counting Unicode characters in the array `dt_a4jP` until it finds the end of the `Text`. However, GHC fails to lambda-lift this closure, thereby turning it into an allocating operation! Oh no!
New description: Consider the example (which uses `Text`; I'm working on finding a more minimal example), {{{#!hs import Data.Char (isSpace) import Data.List (foldl') import GHC.Exts (build) import qualified Data.Text as T longestWord :: T.Text -> Int longestWord t = foldl' max 0 $ map T.length $ fusedWords t fusedWords :: T.Text -> [T.Text] fusedWords t0 = build $ \cons nil -> let go !t | T.null t = nil | otherwise = let (w, rest) = T.span (not . isSpace) t in cons w (go $ T.dropWhile isSpace rest) in go t0 -- For reference data Text = Text {-# UNPACK #-} !A.Array -- payload (Word16 elements) {-# UNPACK #-} !Int -- offset (units of Word16, not Char) {-# UNPACK #-} !Int -- length (units of Word16, not Char) }}} `longestWord` here produces the simplified Core`, {{{#!hs Ticket.$wgo :: GHC.Prim.ByteArray# -> GHC.Prim.Int# -> GHC.Prim.Int# -> [T.Text] Ticket.$wgo = ... -- > $wgo1 xs n = foldl' max n xs Ticket.$wgo1 :: [T.Text] -> GHC.Prim.Int# -> GHC.Prim.Int# Ticket.$wgo1 = \ (w_s4GJ :: [T.Text]) (ww_s4GN :: GHC.Prim.Int#) -> case w_s4GJ of _ { [] -> ww_s4GN; : y_a4vC ys_a4vD -> case y_a4vC of _ { Data.Text.Internal.Text dt_a4jP dt1_a4jQ dt2_a4jR -> let { a_a4jO :: GHC.Prim.Int# a_a4jO = GHC.Prim.+# dt1_a4jQ dt2_a4jR } in letrec { -- Why must you allocate? For the love of all that is good, why? -- -- This loop is just `T.length`, the first argument being the -- length accumulator and the second being an index into the -- ByteArray# $wloop_length_s4GI :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int# $wloop_length_s4GI = \ (ww1_s4Gz :: GHC.Prim.Int#) (ww2_s4GD :: GHC.Prim.Int#) -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.>=# ww2_s4GD a_a4jO) -- bounds check of _ { False -> { ... -- in this body there are few cases analyses with -- recursive calls of the form $wloop_length_s4GI (GHC.Prim.+# ww1_s4Gz 1) (GHC.Prim.+# ww2_s4GD 1) ... True -> ww1_s4Gz }; } in case $wloop_length_s4GI 0 dt1_a4jQ of ww1_s4GH { __DEFAULT -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.<=# ww_s4GN ww1_s4GH) of _ { False -> Ticket.$wgo1 ys_a4vD ww_s4GN; True -> Ticket.$wgo1 ys_a4vD ww1_s4GH } } } } longestWord :: T.Text -> Int longestWord = \ (w_s4GT :: T.Text) -> case w_s4GT of _ { Data.Text.Internal.Text ww1_s4GW ww2_s4GX ww3_s4GY -> case Ticket.$wgo1 (Ticket.$wgo ww1_s4GW ww2_s4GX ww3_s4GY) 0 of ww4_s4H2 { __DEFAULT -> GHC.Types.I# ww4_s4H2 } } }}} Notice `$wloop_length_s4GI`: It should be a nice tight loop counting Unicode characters in the array `dt_a4jP` until it finds the end of the `Text`. However, GHC fails to lambda-lift this closure, thereby turning it into an allocating operation! Oh no! -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11284#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11284: Lambda-lifting fails in simple Text example -------------------------------------+------------------------------------- Reporter: bgamari | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by bgamari: Old description:
Consider the example (which uses `Text`; I'm working on finding a more minimal example), {{{#!hs import Data.Char (isSpace) import Data.List (foldl') import GHC.Exts (build) import qualified Data.Text as T
longestWord :: T.Text -> Int longestWord t = foldl' max 0 $ map T.length $ fusedWords t
fusedWords :: T.Text -> [T.Text] fusedWords t0 = build $ \cons nil -> let go !t | T.null t = nil | otherwise = let (w, rest) = T.span (not . isSpace) t in cons w (go $ T.dropWhile isSpace rest) in go t0
-- For reference data Text = Text {-# UNPACK #-} !A.Array -- payload (Word16 elements) {-# UNPACK #-} !Int -- offset (units of Word16, not Char) {-# UNPACK #-} !Int -- length (units of Word16, not Char) }}}
`longestWord` here produces the simplified Core`,
{{{#!hs Ticket.$wgo :: GHC.Prim.ByteArray# -> GHC.Prim.Int# -> GHC.Prim.Int# -> [T.Text] Ticket.$wgo = ...
-- > $wgo1 xs n = foldl' max n xs Ticket.$wgo1 :: [T.Text] -> GHC.Prim.Int# -> GHC.Prim.Int# Ticket.$wgo1 = \ (w_s4GJ :: [T.Text]) (ww_s4GN :: GHC.Prim.Int#) -> case w_s4GJ of _ { [] -> ww_s4GN; : y_a4vC ys_a4vD -> case y_a4vC of _ { Data.Text.Internal.Text dt_a4jP dt1_a4jQ dt2_a4jR -> let { a_a4jO :: GHC.Prim.Int# a_a4jO = GHC.Prim.+# dt1_a4jQ dt2_a4jR } in letrec { -- Why must you allocate? For the love of all that is good, why? -- -- This loop is just `T.length`, the first argument being the -- length accumulator and the second being an index into the -- ByteArray# $wloop_length_s4GI :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int# $wloop_length_s4GI = \ (ww1_s4Gz :: GHC.Prim.Int#) (ww2_s4GD :: GHC.Prim.Int#) -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.>=# ww2_s4GD a_a4jO) -- bounds check of _ { False -> { ... -- in this body there are few cases analyses with -- recursive calls of the form $wloop_length_s4GI (GHC.Prim.+# ww1_s4Gz 1) (GHC.Prim.+# ww2_s4GD 1) ... True -> ww1_s4Gz }; } in case $wloop_length_s4GI 0 dt1_a4jQ of ww1_s4GH { __DEFAULT -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.<=# ww_s4GN ww1_s4GH) of _ { False -> Ticket.$wgo1 ys_a4vD ww_s4GN; True -> Ticket.$wgo1 ys_a4vD ww1_s4GH } } } }
longestWord :: T.Text -> Int longestWord = \ (w_s4GT :: T.Text) -> case w_s4GT of _ { Data.Text.Internal.Text ww1_s4GW ww2_s4GX ww3_s4GY -> case Ticket.$wgo1 (Ticket.$wgo ww1_s4GW ww2_s4GX ww3_s4GY) 0 of ww4_s4H2 { __DEFAULT -> GHC.Types.I# ww4_s4H2 } } }}}
Notice `$wloop_length_s4GI`: It should be a nice tight loop counting Unicode characters in the array `dt_a4jP` until it finds the end of the `Text`. However, GHC fails to lambda-lift this closure, thereby turning it into an allocating operation! Oh no!
New description: Consider the example (which uses `Text`; I'm working on finding a more minimal example), {{{#!hs import Data.Char (isSpace) import Data.List (foldl') import GHC.Exts (build) import qualified Data.Text as T longestWord :: T.Text -> Int longestWord t = foldl' max 0 $ map T.length $ fusedWords t fusedWords :: T.Text -> [T.Text] fusedWords t0 = build $ \cons nil -> let go !t | T.null t = nil | otherwise = let (w, rest) = T.span (not . isSpace) t in cons w (go $ T.dropWhile isSpace rest) in go t0 -- For reference data Text = Text {-# UNPACK #-} !A.Array -- payload (Word16 elements) {-# UNPACK #-} !Int -- offset (units of Word16, not Char) {-# UNPACK #-} !Int -- length (units of Word16, not Char) }}} `longestWord` here produces the simplified Core`, {{{#!hs Ticket.$wgo :: GHC.Prim.ByteArray# -> GHC.Prim.Int# -> GHC.Prim.Int# -> [T.Text] Ticket.$wgo = ... -- > $wgo1 xs n = foldl' (\a b -> max a $ T.length b) n xs Ticket.$wgo1 :: [T.Text] -> GHC.Prim.Int# -> GHC.Prim.Int# Ticket.$wgo1 = \ (w_s4GJ :: [T.Text]) (ww_s4GN :: GHC.Prim.Int#) -> case w_s4GJ of _ { [] -> ww_s4GN; : y_a4vC ys_a4vD -> case y_a4vC of _ { Data.Text.Internal.Text dt_a4jP dt1_a4jQ dt2_a4jR -> let { a_a4jO :: GHC.Prim.Int# a_a4jO = GHC.Prim.+# dt1_a4jQ dt2_a4jR } in letrec { -- Why must you allocate? For the love of all that is good, why? -- -- This loop is just `T.length`, the first argument being the -- length accumulator and the second being an index into the -- ByteArray# $wloop_length_s4GI :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int# $wloop_length_s4GI = \ (ww1_s4Gz :: GHC.Prim.Int#) (ww2_s4GD :: GHC.Prim.Int#) -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.>=# ww2_s4GD a_a4jO) -- bounds check of _ { False -> { ... -- in this body there are few cases analyses with -- recursive calls of the form $wloop_length_s4GI (GHC.Prim.+# ww1_s4Gz 1) (GHC.Prim.+# ww2_s4GD 1) ... True -> ww1_s4Gz }; } in case $wloop_length_s4GI 0 dt1_a4jQ of ww1_s4GH { __DEFAULT -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.<=# ww_s4GN ww1_s4GH) of _ { False -> Ticket.$wgo1 ys_a4vD ww_s4GN; True -> Ticket.$wgo1 ys_a4vD ww1_s4GH } } } } longestWord :: T.Text -> Int longestWord = \ (w_s4GT :: T.Text) -> case w_s4GT of _ { Data.Text.Internal.Text ww1_s4GW ww2_s4GX ww3_s4GY -> case Ticket.$wgo1 (Ticket.$wgo ww1_s4GW ww2_s4GX ww3_s4GY) 0 of ww4_s4H2 { __DEFAULT -> GHC.Types.I# ww4_s4H2 } } }}} Notice `$wloop_length_s4GI`: It should be a nice tight loop counting Unicode characters in the array `dt_a4jP` until it finds the end of the `Text`. However, GHC fails to lambda-lift this closure, thereby turning it into an allocating operation! Oh no! -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11284#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11284: Lambda-lifting fails in simple Text example -------------------------------------+------------------------------------- Reporter: bgamari | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by bgamari: Old description:
Consider the example (which uses `Text`; I'm working on finding a more minimal example), {{{#!hs import Data.Char (isSpace) import Data.List (foldl') import GHC.Exts (build) import qualified Data.Text as T
longestWord :: T.Text -> Int longestWord t = foldl' max 0 $ map T.length $ fusedWords t
fusedWords :: T.Text -> [T.Text] fusedWords t0 = build $ \cons nil -> let go !t | T.null t = nil | otherwise = let (w, rest) = T.span (not . isSpace) t in cons w (go $ T.dropWhile isSpace rest) in go t0
-- For reference data Text = Text {-# UNPACK #-} !A.Array -- payload (Word16 elements) {-# UNPACK #-} !Int -- offset (units of Word16, not Char) {-# UNPACK #-} !Int -- length (units of Word16, not Char) }}}
`longestWord` here produces the simplified Core`,
{{{#!hs Ticket.$wgo :: GHC.Prim.ByteArray# -> GHC.Prim.Int# -> GHC.Prim.Int# -> [T.Text] Ticket.$wgo = ...
-- > $wgo1 xs n = foldl' (\a b -> max a $ T.length b) n xs Ticket.$wgo1 :: [T.Text] -> GHC.Prim.Int# -> GHC.Prim.Int# Ticket.$wgo1 = \ (w_s4GJ :: [T.Text]) (ww_s4GN :: GHC.Prim.Int#) -> case w_s4GJ of _ { [] -> ww_s4GN; : y_a4vC ys_a4vD -> case y_a4vC of _ { Data.Text.Internal.Text dt_a4jP dt1_a4jQ dt2_a4jR -> let { a_a4jO :: GHC.Prim.Int# a_a4jO = GHC.Prim.+# dt1_a4jQ dt2_a4jR } in letrec { -- Why must you allocate? For the love of all that is good, why? -- -- This loop is just `T.length`, the first argument being the -- length accumulator and the second being an index into the -- ByteArray# $wloop_length_s4GI :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int# $wloop_length_s4GI = \ (ww1_s4Gz :: GHC.Prim.Int#) (ww2_s4GD :: GHC.Prim.Int#) -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.>=# ww2_s4GD a_a4jO) -- bounds check of _ { False -> { ... -- in this body there are few cases analyses with -- recursive calls of the form $wloop_length_s4GI (GHC.Prim.+# ww1_s4Gz 1) (GHC.Prim.+# ww2_s4GD 1) ... True -> ww1_s4Gz }; } in case $wloop_length_s4GI 0 dt1_a4jQ of ww1_s4GH { __DEFAULT -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.<=# ww_s4GN ww1_s4GH) of _ { False -> Ticket.$wgo1 ys_a4vD ww_s4GN; True -> Ticket.$wgo1 ys_a4vD ww1_s4GH } } } }
longestWord :: T.Text -> Int longestWord = \ (w_s4GT :: T.Text) -> case w_s4GT of _ { Data.Text.Internal.Text ww1_s4GW ww2_s4GX ww3_s4GY -> case Ticket.$wgo1 (Ticket.$wgo ww1_s4GW ww2_s4GX ww3_s4GY) 0 of ww4_s4H2 { __DEFAULT -> GHC.Types.I# ww4_s4H2 } } }}}
Notice `$wloop_length_s4GI`: It should be a nice tight loop counting Unicode characters in the array `dt_a4jP` until it finds the end of the `Text`. However, GHC fails to lambda-lift this closure, thereby turning it into an allocating operation! Oh no!
New description: Consider the example (which uses `Text`; I'm working on finding a more minimal example), {{{#!hs import Data.Char (isSpace) import Data.List (foldl') import GHC.Exts (build) import qualified Data.Text as T longestWord :: T.Text -> Int longestWord t = foldl' max 0 $ map T.length $ fusedWords t fusedWords :: T.Text -> [T.Text] fusedWords t0 = build $ \cons nil -> let go !t | T.null t = nil | otherwise = let (w, rest) = T.span (not . isSpace) t in cons w (go $ T.dropWhile isSpace rest) in go t0 -- For reference data Text = Text {-# UNPACK #-} !A.Array -- payload (Word16 elements) {-# UNPACK #-} !Int -- offset (units of Word16, not Char) {-# UNPACK #-} !Int -- length (units of Word16, not Char) }}} `longestWord` here produces the simplified Core`, {{{#!hs Ticket.$wgo :: GHC.Prim.ByteArray# -> GHC.Prim.Int# -> GHC.Prim.Int# -> [T.Text] Ticket.$wgo = ... -- > $wgo1 xs n = foldl' (\a b -> max a $ T.length b) n xs Ticket.$wgo1 :: [T.Text] -> GHC.Prim.Int# -> GHC.Prim.Int# Ticket.$wgo1 = \ (w_s4GJ :: [T.Text]) (ww_s4GN :: GHC.Prim.Int#) -> case w_s4GJ of _ { [] -> ww_s4GN; : y_a4vC ys_a4vD -> case y_a4vC of _ { Data.Text.Internal.Text dt_a4jP dt1_a4jQ dt2_a4jR -> let { a_a4jO :: GHC.Prim.Int# a_a4jO = GHC.Prim.+# dt1_a4jQ dt2_a4jR } in letrec { -- Why must you allocate? For the love of all that is good, why? -- -- This loop is just `T.length`, the first argument being the -- length accumulator and the second being an index into the -- ByteArray# $wloop_length_s4GI :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int# $wloop_length_s4GI = \ (ww1_s4Gz :: GHC.Prim.Int#) (ww2_s4GD :: GHC.Prim.Int#) -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.>=# ww2_s4GD a_a4jO) -- bounds check of _ { False -> { ... -- in this body there are few cases analyses with -- recursive calls of the form $wloop_length_s4GI (GHC.Prim.+# ww1_s4Gz 1) (GHC.Prim.+# ww2_s4GD 1) ... True -> ww1_s4Gz }; } in case $wloop_length_s4GI 0 dt1_a4jQ of ww1_s4GH { __DEFAULT -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.<=# ww_s4GN ww1_s4GH) of _ { False -> Ticket.$wgo1 ys_a4vD ww_s4GN; True -> Ticket.$wgo1 ys_a4vD ww1_s4GH } } } } longestWord :: T.Text -> Int longestWord = \ (w_s4GT :: T.Text) -> case w_s4GT of _ { Data.Text.Internal.Text ww1_s4GW ww2_s4GX ww3_s4GY -> case Ticket.$wgo1 (Ticket.$wgo ww1_s4GW ww2_s4GX ww3_s4GY) 0 of ww4_s4H2 { __DEFAULT -> GHC.Types.I# ww4_s4H2 } } }}} Notice `$wloop_length_s4GI`: It should be a nice tight loop counting Unicode characters in the array `dt_a4jP` until it arrives at its end. However, GHC fails to lambda-lift this closure, thereby turning it into an allocating operation! Oh no! -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11284#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11284: Lambda-lifting fails in simple Text example -------------------------------------+------------------------------------- Reporter: bgamari | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): For what it's worth, this binding is caught by the `AnnRec` pattern of `SetLevels.lvlBinds` by the `not (profitableFloat ...)` guard. `dest_lvl = <1,4>` and `bind_lvl = <1,5>`. `-ffloat-all-lams` and `-ffloat-lam- args=10` here doesn't seem to make any difference on the produced Core; while the binding gets floated out as one would expect, later simplifier phases later push it back in. Even after float-out, however, we still have an inner binding, {{{#!hs poly_$wloop_length_s7fV :: GHC.Prim.ByteArray# -> GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int# poly_$wloop_length_s7fV = \ (dt_a6OV :: GHC.Prim.ByteArray#) (a_a6OU :: GHC.Prim.Int#) (ww_s7bR :: GHC.Prim.Int#) (ww_s7bV :: GHC.Prim.Int#) -> (letrec { $wloop_length_s7fT :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int# $wloop_length_s7fT = \ (ww_X7c8 :: GHC.Prim.Int#) (ww_X7cd :: GHC.Prim.Int#) -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.>=# ww_X7cd a_a6OU) of wild1_a6P4 { False -> ... True -> ww_X7c8 }; } in $wloop_length_s7fT) ww_s7bR ww_s7bV }}} Which later gets inlined back into `$wgo`. Looking at this Core, I suppose it's plausible that there really are too many free variables in this function to beneficially lambda-lift. It would be nice to have a second opinion here. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11284#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11284: Lambda-lifting fails in simple Text example -------------------------------------+------------------------------------- Reporter: bgamari | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by bgamari: Old description:
Consider the example (which uses `Text`; I'm working on finding a more minimal example), {{{#!hs import Data.Char (isSpace) import Data.List (foldl') import GHC.Exts (build) import qualified Data.Text as T
longestWord :: T.Text -> Int longestWord t = foldl' max 0 $ map T.length $ fusedWords t
fusedWords :: T.Text -> [T.Text] fusedWords t0 = build $ \cons nil -> let go !t | T.null t = nil | otherwise = let (w, rest) = T.span (not . isSpace) t in cons w (go $ T.dropWhile isSpace rest) in go t0
-- For reference data Text = Text {-# UNPACK #-} !A.Array -- payload (Word16 elements) {-# UNPACK #-} !Int -- offset (units of Word16, not Char) {-# UNPACK #-} !Int -- length (units of Word16, not Char) }}}
`longestWord` here produces the simplified Core`,
{{{#!hs Ticket.$wgo :: GHC.Prim.ByteArray# -> GHC.Prim.Int# -> GHC.Prim.Int# -> [T.Text] Ticket.$wgo = ...
-- > $wgo1 xs n = foldl' (\a b -> max a $ T.length b) n xs Ticket.$wgo1 :: [T.Text] -> GHC.Prim.Int# -> GHC.Prim.Int# Ticket.$wgo1 = \ (w_s4GJ :: [T.Text]) (ww_s4GN :: GHC.Prim.Int#) -> case w_s4GJ of _ { [] -> ww_s4GN; : y_a4vC ys_a4vD -> case y_a4vC of _ { Data.Text.Internal.Text dt_a4jP dt1_a4jQ dt2_a4jR -> let { a_a4jO :: GHC.Prim.Int# a_a4jO = GHC.Prim.+# dt1_a4jQ dt2_a4jR } in letrec { -- Why must you allocate? For the love of all that is good, why? -- -- This loop is just `T.length`, the first argument being the -- length accumulator and the second being an index into the -- ByteArray# $wloop_length_s4GI :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int# $wloop_length_s4GI = \ (ww1_s4Gz :: GHC.Prim.Int#) (ww2_s4GD :: GHC.Prim.Int#) -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.>=# ww2_s4GD a_a4jO) -- bounds check of _ { False -> { ... -- in this body there are few cases analyses with -- recursive calls of the form $wloop_length_s4GI (GHC.Prim.+# ww1_s4Gz 1) (GHC.Prim.+# ww2_s4GD 1) ... True -> ww1_s4Gz }; } in case $wloop_length_s4GI 0 dt1_a4jQ of ww1_s4GH { __DEFAULT -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.<=# ww_s4GN ww1_s4GH) of _ { False -> Ticket.$wgo1 ys_a4vD ww_s4GN; True -> Ticket.$wgo1 ys_a4vD ww1_s4GH } } } }
longestWord :: T.Text -> Int longestWord = \ (w_s4GT :: T.Text) -> case w_s4GT of _ { Data.Text.Internal.Text ww1_s4GW ww2_s4GX ww3_s4GY -> case Ticket.$wgo1 (Ticket.$wgo ww1_s4GW ww2_s4GX ww3_s4GY) 0 of ww4_s4H2 { __DEFAULT -> GHC.Types.I# ww4_s4H2 } } }}}
Notice `$wloop_length_s4GI`: It should be a nice tight loop counting Unicode characters in the array `dt_a4jP` until it arrives at its end. However, GHC fails to lambda-lift this closure, thereby turning it into an allocating operation! Oh no!
New description: Consider the example (which uses `Text`; I'm working on finding a more minimal example), {{{#!hs import Data.Char (isSpace) import Data.List (foldl') import GHC.Exts (build) import qualified Data.Text as T longestWord :: T.Text -> Int longestWord t = foldl' max 0 $ map T.length $ fusedWords t fusedWords :: T.Text -> [T.Text] fusedWords t0 = build $ \cons nil -> let go !t | T.null t = nil | otherwise = let (w, rest) = T.span (not . isSpace) t in cons w (go $ T.dropWhile isSpace rest) in go t0 -- For reference data Text = Text {-# UNPACK #-} !A.Array -- payload (Word16 elements) {-# UNPACK #-} !Int -- offset (units of Word16, not Char) {-# UNPACK #-} !Int -- length (units of Word16, not Char) }}} `longestWord` here produces the simplified Core, {{{#!hs Ticket.$wgo :: GHC.Prim.ByteArray# -> GHC.Prim.Int# -> GHC.Prim.Int# -> [T.Text] Ticket.$wgo = ... -- > $wgo1 xs n = foldl' (\a b -> max a $ T.length b) n xs Ticket.$wgo1 :: [T.Text] -> GHC.Prim.Int# -> GHC.Prim.Int# Ticket.$wgo1 = \ (w_s4GJ :: [T.Text]) (ww_s4GN :: GHC.Prim.Int#) -> case w_s4GJ of _ { [] -> ww_s4GN; : y_a4vC ys_a4vD -> case y_a4vC of _ { Data.Text.Internal.Text dt_a4jP dt1_a4jQ dt2_a4jR -> let { a_a4jO :: GHC.Prim.Int# a_a4jO = GHC.Prim.+# dt1_a4jQ dt2_a4jR } in letrec { -- For the love of all that is good, why must you allocate? -- -- This loop is essentially `T.length`, the first argument being -- the length accumulator and the second being an index into the -- ByteArray# $wloop_length_s4GI :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int# $wloop_length_s4GI = \ (ww1_s4Gz :: GHC.Prim.Int#) (ww2_s4GD :: GHC.Prim.Int#) -> -- Have we reached the end of the Text? case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.>=# ww2_s4GD a_a4jO) of _ { False -> { ... -- in this body there are few cases analyses which -- classify the code-points we encounter. The branches -- are recursive calls of the form $wloop_length_s4GI (GHC.Prim.+# ww1_s4Gz 1) (GHC.Prim.+# ww2_s4GD 1) ... True -> ww1_s4Gz }; } in case $wloop_length_s4GI 0 dt1_a4jQ of ww1_s4GH { __DEFAULT -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.<=# ww_s4GN ww1_s4GH) of _ { False -> Ticket.$wgo1 ys_a4vD ww_s4GN; True -> Ticket.$wgo1 ys_a4vD ww1_s4GH } } } } longestWord :: T.Text -> Int longestWord = \ (w_s4GT :: T.Text) -> case w_s4GT of _ { Data.Text.Internal.Text ww1_s4GW ww2_s4GX ww3_s4GY -> case Ticket.$wgo1 (Ticket.$wgo ww1_s4GW ww2_s4GX ww3_s4GY) 0 of ww4_s4H2 { __DEFAULT -> GHC.Types.I# ww4_s4H2 } } }}} Notice `$wloop_length_s4GI`: It should be a nice tight loop counting Unicode characters in the array `dt_a4jP` until it arrives at its end. However, GHC fails to lambda-lift this closure, thereby turning it into an allocating operation! Oh no! -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11284#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11284: Lambda-lifting fails in simple Text example -------------------------------------+------------------------------------- Reporter: bgamari | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): This looks very similar to #5945. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11284#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11284: Lambda-lifting fails in simple Text example -------------------------------------+------------------------------------- Reporter: bgamari | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #5945, #11318 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * related: => #5945, #11318 Comment: It turns out that even just `Data.Text.length` along allocates. See #11318. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11284#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11284: Lambda-lifting fails in simple Text example -------------------------------------+------------------------------------- Reporter: bgamari | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #5945, #11318 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): See #9476 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11284#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11284: Lambda-lifting fails in simple Text example -------------------------------------+------------------------------------- Reporter: bgamari | Owner: Type: bug | Status: closed Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: duplicate | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #5945, #11318 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * status: new => closed * resolution: => duplicate Comment: Closing as a duplicate given that this is a consequence of #9476. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11284#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11284: Lambda-lifting fails in simple Text example -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: closed Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: duplicate | Keywords: LateLamLift Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #5945, #11318 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * keywords: => LateLamLift -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11284#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11284: Lambda-lifting fails in simple Text example -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: closed Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: duplicate | Keywords: LateLamLift Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #5945, #11318 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by sgraf: Old description:
Consider the example (which uses `Text`; I'm working on finding a more minimal example), {{{#!hs import Data.Char (isSpace) import Data.List (foldl') import GHC.Exts (build) import qualified Data.Text as T
longestWord :: T.Text -> Int longestWord t = foldl' max 0 $ map T.length $ fusedWords t
fusedWords :: T.Text -> [T.Text] fusedWords t0 = build $ \cons nil -> let go !t | T.null t = nil | otherwise = let (w, rest) = T.span (not . isSpace) t in cons w (go $ T.dropWhile isSpace rest) in go t0
-- For reference data Text = Text {-# UNPACK #-} !A.Array -- payload (Word16 elements) {-# UNPACK #-} !Int -- offset (units of Word16, not Char) {-# UNPACK #-} !Int -- length (units of Word16, not Char) }}}
`longestWord` here produces the simplified Core,
{{{#!hs Ticket.$wgo :: GHC.Prim.ByteArray# -> GHC.Prim.Int# -> GHC.Prim.Int# -> [T.Text] Ticket.$wgo = ...
-- > $wgo1 xs n = foldl' (\a b -> max a $ T.length b) n xs Ticket.$wgo1 :: [T.Text] -> GHC.Prim.Int# -> GHC.Prim.Int# Ticket.$wgo1 = \ (w_s4GJ :: [T.Text]) (ww_s4GN :: GHC.Prim.Int#) -> case w_s4GJ of _ { [] -> ww_s4GN; : y_a4vC ys_a4vD -> case y_a4vC of _ { Data.Text.Internal.Text dt_a4jP dt1_a4jQ dt2_a4jR -> let { a_a4jO :: GHC.Prim.Int# a_a4jO = GHC.Prim.+# dt1_a4jQ dt2_a4jR } in letrec { -- For the love of all that is good, why must you allocate? -- -- This loop is essentially `T.length`, the first argument being -- the length accumulator and the second being an index into the -- ByteArray# $wloop_length_s4GI :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int# $wloop_length_s4GI = \ (ww1_s4Gz :: GHC.Prim.Int#) (ww2_s4GD :: GHC.Prim.Int#) -> -- Have we reached the end of the Text? case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.>=# ww2_s4GD a_a4jO) of _ { False -> { ... -- in this body there are few cases analyses which -- classify the code-points we encounter. The branches -- are recursive calls of the form $wloop_length_s4GI (GHC.Prim.+# ww1_s4Gz 1) (GHC.Prim.+# ww2_s4GD 1) ... True -> ww1_s4Gz }; } in case $wloop_length_s4GI 0 dt1_a4jQ of ww1_s4GH { __DEFAULT -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.<=# ww_s4GN ww1_s4GH) of _ { False -> Ticket.$wgo1 ys_a4vD ww_s4GN; True -> Ticket.$wgo1 ys_a4vD ww1_s4GH } } } }
longestWord :: T.Text -> Int longestWord = \ (w_s4GT :: T.Text) -> case w_s4GT of _ { Data.Text.Internal.Text ww1_s4GW ww2_s4GX ww3_s4GY -> case Ticket.$wgo1 (Ticket.$wgo ww1_s4GW ww2_s4GX ww3_s4GY) 0 of ww4_s4H2 { __DEFAULT -> GHC.Types.I# ww4_s4H2 } } }}}
Notice `$wloop_length_s4GI`: It should be a nice tight loop counting Unicode characters in the array `dt_a4jP` until it arrives at its end. However, GHC fails to lambda-lift this closure, thereby turning it into an allocating operation! Oh no!
New description: Consider the example (which uses `Text`; I'm working on finding a more minimal example), {{{#!hs {-# LANGUAGE BangPatterns #-} module T11284 where import Data.Char (isSpace) import Data.List (foldl') import GHC.Exts (build) import qualified Data.Text as T import qualified Data.Text.Array as A longestWord :: T.Text -> Int longestWord t = foldl' max 0 $ map T.length $ fusedWords t fusedWords :: T.Text -> [T.Text] fusedWords t0 = build $ \cons nil -> let go !t | T.null t = nil | otherwise = let (w, rest) = T.span (not . isSpace) t in cons w (go $ T.dropWhile isSpace rest) in go t0 -- For reference data Text = Text {-# UNPACK #-} !A.Array -- payload (Word16 elements) {-# UNPACK #-} !Int -- offset (units of Word16, not Char) {-# UNPACK #-} !Int -- length (units of Word16, not Char) }}} `longestWord` here produces the simplified Core, {{{#!hs Ticket.$wgo :: GHC.Prim.ByteArray# -> GHC.Prim.Int# -> GHC.Prim.Int# -> [T.Text] Ticket.$wgo = ... -- > $wgo1 xs n = foldl' (\a b -> max a $ T.length b) n xs Ticket.$wgo1 :: [T.Text] -> GHC.Prim.Int# -> GHC.Prim.Int# Ticket.$wgo1 = \ (w_s4GJ :: [T.Text]) (ww_s4GN :: GHC.Prim.Int#) -> case w_s4GJ of _ { [] -> ww_s4GN; : y_a4vC ys_a4vD -> case y_a4vC of _ { Data.Text.Internal.Text dt_a4jP dt1_a4jQ dt2_a4jR -> let { a_a4jO :: GHC.Prim.Int# a_a4jO = GHC.Prim.+# dt1_a4jQ dt2_a4jR } in letrec { -- For the love of all that is good, why must you allocate? -- -- This loop is essentially `T.length`, the first argument being -- the length accumulator and the second being an index into the -- ByteArray# $wloop_length_s4GI :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int# $wloop_length_s4GI = \ (ww1_s4Gz :: GHC.Prim.Int#) (ww2_s4GD :: GHC.Prim.Int#) -> -- Have we reached the end of the Text? case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.>=# ww2_s4GD a_a4jO) of _ { False -> { ... -- in this body there are few cases analyses which -- classify the code-points we encounter. The branches -- are recursive calls of the form $wloop_length_s4GI (GHC.Prim.+# ww1_s4Gz 1) (GHC.Prim.+# ww2_s4GD 1) ... True -> ww1_s4Gz }; } in case $wloop_length_s4GI 0 dt1_a4jQ of ww1_s4GH { __DEFAULT -> case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.<=# ww_s4GN ww1_s4GH) of _ { False -> Ticket.$wgo1 ys_a4vD ww_s4GN; True -> Ticket.$wgo1 ys_a4vD ww1_s4GH } } } } longestWord :: T.Text -> Int longestWord = \ (w_s4GT :: T.Text) -> case w_s4GT of _ { Data.Text.Internal.Text ww1_s4GW ww2_s4GX ww3_s4GY -> case Ticket.$wgo1 (Ticket.$wgo ww1_s4GW ww2_s4GX ww3_s4GY) 0 of ww4_s4H2 { __DEFAULT -> GHC.Types.I# ww4_s4H2 } } }}} Notice `$wloop_length_s4GI`: It should be a nice tight loop counting Unicode characters in the array `dt_a4jP` until it arrives at its end. However, GHC fails to lambda-lift this closure, thereby turning it into an allocating operation! Oh no! -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11284#comment:18 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11284: Lambda-lifting fails in simple Text example -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: closed Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: duplicate | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #5945, #11318 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by sgraf): * keywords: LateLamLift => Comment: There's nothing to lift here anymore, all occuring functions turned into join points and the only actual let bindings in STG output are thunks, which can't be lifted. Here's an example run on `/usr/share/dict/words` (with an appropriate `main`): {{{ $ ./Main +RTS -s < /usr/share/dict/words 23 33,832,464 bytes allocated in the heap 24,088 bytes copied during GC 2,011,952 bytes maximum residency (2 sample(s)) 163,024 bytes maximum slop 1 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 28 colls, 0 par 0.000s 0.000s 0.0000s 0.0000s Gen 1 2 colls, 0 par 0.000s 0.000s 0.0001s 0.0002s INIT time 0.000s ( 0.000s elapsed) MUT time 0.012s ( 0.012s elapsed) GC time 0.001s ( 0.000s elapsed) EXIT time 0.000s ( 0.000s elapsed) Total time 0.013s ( 0.013s elapsed) %GC time 0.0% (0.0% elapsed) Alloc rate 2,822,899,212 bytes per MUT second Productivity 95.0% of total user, 95.5% of total elapsed }}} Strange enough, the `Data.Text.Lazy` variant of this is much slower and allocates much more, when I actually thought it would cope better with `getContents`. Whatever, this doesn't seem relevant to LateLamLift anymore. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11284#comment:19 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11284: Lambda-lifting fails in simple Text example -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: closed Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: duplicate | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #5945, #11318 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): I suppose we didn't have join points 3 yrs ago. Suppose `$wloop` wasn't a join point; would LLF lift it? I expect so, and I suppose we could find out by switching off join points. (We don't have a way to do that right now, alas.) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11284#comment:20 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11284: Lambda-lifting fails in simple Text example -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: closed Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: duplicate | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #5945, #11318 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by sgraf): * cc: sgraf (added) Comment: Sorry, I didn't pick your reply up in the ticket noise. I ''think'' we would pick it up. This is the STG binding for `$wloop_length`: https://gist.github.com/sgraf812/fcfda9e55004d19881314e31fdea4423 As this turned into a join point anyway, no closure mentions `$wloop`, so there will be no positive closure growth (e.g. penalties in allocations). `$wloop` also has only 3 free variables, which adds with the 2 arguments to a fortunate 5, so there would be enough hardware registers for the lift. I don't see any reason we wouldn't have lifted it. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11284#comment:21 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC