Inline makes program slow?

Hi cafe,

INLINE can sometimes cause problems, at least in the following case:

import qualified Data.Vector as V
import Data.List

f :: String -> (String -> Int -> Char) -> [Int] -> String
f str g idx = map (g str) idx

h :: String -> Int -> Char
{-# INLINE h #-}
h s i = (V.fromList $ sort s) V.! i

slow :: String -> [Int] -> String
slow str = f str h

fast :: String -> [Int] -> String
fast str = map ((V.fromList $ sort str) V.!)

main = do
  let testString = replicate 100000 'a'
      iterations = replicate 1000 100
  putStrLn $ fast testString iterations
  putStrLn $ slow testString iterations

Without inlining (i.e. if the INLINE pragma is removed), "slow" is much faster. I suspect this is because GHC can build a "persistent structure" for the partially applied function, whereas after inlining each call of "g" builds a new vector. How can I tell GHC not to inline some specific functions? Or are there other ways to resolve this issue?

On 29 March 2014 12:43, Kai Zhang wrote:
Hi cafe,
INLINE can sometimes cause problems, at least in the following case:
import qualified Data.Vector as V
import Data.List

f :: String -> (String -> Int -> Char) -> [Int] -> String
f str g idx = map (g str) idx

h :: String -> Int -> Char
{-# INLINE h #-}
h s i = (V.fromList $ sort s) V.! i

slow :: String -> [Int] -> String
slow str = f str h

fast :: String -> [Int] -> String
fast str = map ((V.fromList $ sort str) V.!)

main = do
  let testString = replicate 100000 'a'
      iterations = replicate 1000 100
  putStrLn $ fast testString iterations
  putStrLn $ slow testString iterations
Without inlining (i.e. if the INLINE pragma is removed), "slow" is much faster. I suspect this is because GHC can build a "persistent structure" for the partially applied function, whereas after inlining each call of "g" builds a new vector. How can I tell GHC not to inline some specific functions? Or are there other ways to resolve this issue?
There are the NOINLINE and INLINEABLE pragmas.

That said, I'm going through some of my own code where I waved the INLINE hammer around rather heavily, only to find that whilst it doesn't make much of a difference on my main x86_64 machine, it made the code much slower on the x86 laptop I tested it on. So I'm also interested in better ways of determining where and when INLINE is helpful (rather than blindly removing some INLINEs and re-profiling to see what difference it makes, since in many cases the code *does* require the INLINE for performance).
--
Ivan Lazar Miljenovic
Ivan.Miljenovic@gmail.com
http://IvanMiljenovic.wordpress.com
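Applied to the example in the original post, the NOINLINE route would look like this (a sketch only; as the rest of the thread discusses, whether this alone restores the fast behaviour depends on GHC floating the vector construction out of the inner lambda):

import qualified Data.Vector as V
import Data.List (sort)

h :: String -> Int -> Char
{-# NOINLINE h #-}          -- tell GHC not to substitute h's body at call sites
h s i = V.fromList (sort s) V.! i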

On Fri, Mar 28, 2014 at 9:43 PM, Kai Zhang wrote:
Without inlining (i.e. if the INLINE pragma is removed), "slow" is much faster. I suspect this is because GHC can build a "persistent structure" for the partially applied function, whereas after inlining each call of "g" builds a new vector. How can I tell GHC not to inline some specific functions? Or are there other ways to resolve this issue?
For what it's worth, I don't think this is an inlining issue, per se; rather, it's an issue with the fact that eta-conversion does not preserve performance characteristics. That is, when we inline h and perform as much beta-reduction as we can, we're left with the lambda expression:

\i -> (V.fromList $ sort str) V.! i

Which is not necessarily the same thing, performance-wise, as:

((V.fromList $ sort str) V.!)

The problem is that, implicitly, the whole body of the lambda abstraction (might) depend on the value of i and therefore cannot be performed until we know what i is. If we wanted to make it explicit that sorting the string is independent of the value of i, we could write:

let s = V.fromList $ sort str in \i -> s V.! i

By using let-binding to lift most of the computation out of the body of the lambda abstraction, we ensure that the sorting will only be done once, rather than (possibly) being done every time this function is called.

The reason I say "might" and "possibly" is because, in theory, the compiler could choose to perform this transformation for you. And sometimes it does (as apparently it does in your fast code). The problem is that, in practice, performing this transformation everywhere can slow things down horribly by taking too much memory, because you're trying to hold onto too many things. Thus, the compiler must rely on heuristics to decide when it should float computations out from under lambdas and when it shouldn't.

-- Live well, ~wren
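For concreteness, here is a sketch of wren's let-binding suggestion applied to the original example (the name sharedSlow is made up for illustration):

import qualified Data.Vector as V
import Data.List (sort)

sharedSlow :: String -> [Int] -> String
sharedSlow str =
  let v = V.fromList (sort str)   -- the expensive part, done once per string
  in  map (\i -> v V.! i)         -- the lambda under map only does the cheap lookup

Here the sorted vector is bound outside the lambda, so every index lookup reuses the same v instead of rebuilding it.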

Compiler optimization levels are also important. The attached program compiles and runs ok using:

ghc -O fibmustopt.hs
./fibmustopt

But if the '-O' option is omitted, all of the available memory is used and it fails.
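The attachment itself is not preserved in the archive; based on the definition Simon quotes below, a plausible minimal fibmustopt.hs would be something like this (the main function is only a guess, added to make the sketch runnable):

module Main where

-- The fibonacci definition is the one quoted in Simon's reply below.
fibonacci :: Integral a => [a]
fibonacci = 0 : 1 : zipWith (+) fibonacci (tail fibonacci)

-- Guessed driver: force a reasonably deep element of the list.
main :: IO ()
main = print (fibonacci !! 100000 :: Integer)

Dominick reports that this runs fine when compiled with -O but exhausts memory without it; Simon's reply explains why.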
On Sun, Mar 30, 2014 at 2:56 AM, wren romano wrote:
On Fri, Mar 28, 2014 at 9:43 PM, Kai Zhang wrote:

Without inlining (i.e. if the INLINE pragma is removed), "slow" is much faster. I suspect this is because GHC can build a "persistent structure" for the partially applied function, whereas after inlining each call of "g" builds a new vector. How can I tell GHC not to inline some specific functions? Or are there other ways to resolve this issue?
For what it's worth, I don't think this is an inlining issue, per se; rather, it's an issue with the fact that eta-conversion does not preserve performance characteristics. That is, when we inline h and perform as much beta-reduction as we can, we're left with the lambda expression:
\i -> (V.fromList $ sort str) V.! i
Which is not necessarily the same thing, performance-wise, as:
((V.fromList $ sort str) V.!)
The problem is that, implicitly, the whole body of the lambda abstraction (might) depend on the value of i and therefore cannot be performed until we know what i is. If we wanted to make it explicit that sorting the string is independent of the value of i, we could write:
let s = V.fromList $ sort str in \i -> s V.! i
By using let-binding to lift most of the computation out of the body of the lambda abstraction, we ensure that the sorting will only be done once, rather than (possibly) being done every time this function is called.
The reason I say "might" and "possibly" is because, in theory, the compiler could choose to perform this transformation for you. And sometimes it does (as apparently it does in your fast code). The problem is that, in practice, performing this transformation everywhere can slow things down horribly by taking too much memory because you're trying to hold onto too many things. Thus, the compiler must rely on heuristics to decide when it should float computations out from under lambdas and when it shouldn't.
-- Live well, ~wren

Dominick's program is very delicate:
fibonacci :: (Integral a) => [a]
fibonacci = 0 : 1 : zipWith (+) fibonacci (tail fibonacci)
Remember, the "(Integral a) =>" thing is very like "Integral a ->"; it's an extra value argument. Would you expect this program to be fast?
foo :: Int -> [Int]
foo n = 0 : 1 : zipWith (+) (foo n) (tail (foo n))
Perhaps, but it depends on common sub-expression analysis, which is switched on with -O.
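To make the "extra value argument" point concrete, here is a rough model (hypothetical names, not real GHC Core) that passes the dictionary explicitly:

-- NumDict stands in for the Integral/Num dictionary.
data NumDict a = NumDict
  { dAdd  :: a -> a -> a
  , dZero :: a
  , dOne  :: a
  }

-- The dictionary-passing view of fibonacci: each recursive occurrence is a
-- fresh application "fibWith d", so the list is not shared between them
-- unless the compiler commons up the repeated "fibWith d" (CSE, with -O).
fibWith :: NumDict a -> [a]
fibWith d = dZero d : dOne d : zipWith (dAdd d) (fibWith d) (tail (fibWith d))

For instance, fibWith (NumDict (+) 0 1) builds the list given an explicit Integer-style "dictionary".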
Anyway it is not related to INLINE or eta-expansion/contraction.
Simon
| -----Original Message-----
| From: Haskell-Cafe [mailto:haskell-cafe-bounces@haskell.org] On Behalf
| Of Dominick Samperi
| Sent: 30 March 2014 17:58
| To: wren romano
| Cc: haskell
| Subject: Re: [Haskell-cafe] Inline makes program slow?
|
| Compiler optimization levels are also important. The attached program
| compiles and runs ok using:
|
| ghc -O fibmustopt.hs
| ./fibmustopt
|
| But if the '-O' option is omitted all of the available memory is used
| and it fails.
|
|
| On Sun, Mar 30, 2014 at 2:56 AM, wren romano

On Thu, Apr 03, 2014 at 04:21:38PM +0000, Simon Peyton Jones wrote:
Dominick's program is very delicate:
fibonacci :: (Integral a) => [a]
fibonacci = 0 : 1 : zipWith (+) fibonacci (tail fibonacci)
Remember, the "(Integral a) =>" thing is very like "Integral a ->"; it's an extra value argument. Would you expect this program to be fast?
foo :: Int -> [Int]
foo n = 0 : 1 : zipWith (+) (foo n) (tail (foo n))
Indeed. The curious may be interested to know you can fix this by hand by sharing a monomorphic bind:

fibonacci = let f = fibonacci in 0 : 1 : zipWith (+) f (tail f)

unless the dreaded monomorphism restriction is off, in which case you have to find some other way of making 'f' monomorphic, for example with ScopedTypeVariables:

fibonacci = let f = fibonacci :: [a] in 0 : 1 : zipWith (+) f (tail f)

(I guess examples like this are exactly the reason the monomorphism restriction exists.)

Tom
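Another way to make the sharing robust regardless of the optimisation level (a sketch, not from the thread; the names fibs and fibonacci' are illustrative) is to tie the knot once at a monomorphic type and convert at the boundary:

-- A single monomorphic CAF: the list is built once and shared for the
-- lifetime of the program (with the usual CAF retention trade-off).
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

-- The overloaded interface only converts; the expensive additions are
-- always performed on the shared Integer list.
fibonacci' :: Integral a => [a]
fibonacci' = map fromInteger fibs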
participants (6)

- Dominick Samperi
- Ivan Lazar Miljenovic
- Kai Zhang
- Simon Peyton Jones
- Tom Ellis
- wren romano