
Hmm. With HEAD, and without profiling, the program allocates the same 104M, both with and without the INLINE. The same deforestation happens in both cases.
It's quite possible that profiling interferes with deforestation.
Simon
c:/code/HEAD/inplace/bin/ghc-stage1 Michael.hs -O -o Michael-no-inline
./Michael-no-inline.exe +RTS -s
999999
104,045,052 bytes allocated in the heap
171,752 bytes copied during GC
41,756 bytes maximum residency (2 sample(s))
36,800 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
| -----Original Message-----
| From: ghc-devs [mailto:ghc-devs-bounces@haskell.org] On Behalf Of Michael
| Walker
| Sent: 22 July 2015 14:46
| To: ghc-devs@haskell.org
| Subject: Question about inliner behaviour with a small function
|
| Hello,
|
| I managed to shrink a bizarre memory issue down to this (probably
| minimal)
| example:
|
| module Main where
|
| f :: [Int] -> Int
| f xs = length is where
| is = [ i | (_, i) <- pairs ys ] :: [(Int,Int)]
|
| ys = zip [0..] xs :: [(Int, Int)]
|
| {-# INLINE pairs #-}
| pairs xs = zip xs $ tail xs
|
| main :: IO ()
| main = print $ f xs where
| xs = replicate 1000000 0
|
| With the INLINE pragma, this allocates 264,049,584 bytes (compiled with
| `ghc -O2
| -prof -fprof-auto inline.hs), without the pragma it allocates 336,049,512
| bytes.
|
| Dropping the `main` definition (and renaming the module to "Foo") and
| examining
| the core, the key difference seems to be in how the list comprehension is
| compiled.
|
| With the INLINE pragma:
|
| Rec {
| Foo.f_go [Occ=LoopBreaker]
| :: [(Int, Int)] -> [(Int, Int)] -> [(Int, Int)]
| [GblId, Arity=2, Caf=NoCafRefs, Str=DmdType ]
| Foo.f_go =
| \ (ds_a13o :: [((Int, Int), (Int, Int))]) ->
| case ds_a13o of _ [Occ=Dead] {
| [] -> GHC.Types.[] @ (Int, Int);
| : y_a13t ys_a13u ->
| case y_a13t of _ [Occ=Dead] { (ds1_d12V, i_an2) ->
| GHC.Types.: @ (Int, Int) i_an2 (Foo.f_go ys_a13u)
| }
| }
| end Rec }
|
| Full core is available at
| https://gist.github.com/barrucadu/a59df62cd16074559e35
|
| In the no-INLINE case, the list comprehension is compiled much like I
| would
| expect, it's walking down a zipped list of pairs and producing the
| result. The
| INLINE case is rather different, it looks like the `zip` has been inlined
| and
| deforestation has happened.
|
| This explains the difference in memory usage, a whole intermediary list
| has been
| skipped!
|
| I assume that GHC's analysis is determining `pairs` is too expensive to
| inline
| early enough to allow the further optimisation without the pragma, but
| the
| pragma forces it to happen earlier by marking it as really cheap. But
| `pairs` is
| a really small definition, syntactically. Why does the analysis consider
| it
| expensive? It is simply because it uses its argument multiple times in
| its body?
|
| And if it's not some sort of cost analysis, what's really going on?
|
| Thank for your time.
|
| --
| Michael Walker (http://www.barrucadu.co.uk)
| _______________________________________________
| ghc-devs mailing list
| ghc-devs@haskell.org
| http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs