I'm not sure which exact optimizations are responsible, but based on --ddump-simple,

* "inside" is not allocating any lists at all. It's just a couple loops over unboxed ints
* "outside" is actually allocating a (single) list data structure and has an inner loop and an outer loop, both of which traverse the list

GHC seems to be too aggressive about sharing "range" in "outside". Adding a unit argument to "range" makes both functions go fast.

On Mon, Aug 19, 2019 at 8:14 PM Emilio Francesquini <francesquini@gmail.com> wrote:
Hello Cafe,

While investigating a performance problem I stumbled upon what I eventually reduced to the example below:

module Main where

import Data.Time.Clock

outside :: Int -> Int
outside n =
  sum [i + j | i <- range, j <- range]
  where
    range = [0..n-1]

inside :: Int -> Int
inside n =
  sum [i + j | i <- [0..n-1], j <- [0..n-1]]

main :: IO ()
main = do
  t0 <- getCurrentTime
  print $ inside 10000
  t1 <- getCurrentTime
  print $ outside 10000
  t2 <-getCurrentTime

  print (diffUTCTime t1 t0)
  print (diffUTCTime t2 t1)


Compiling with -O2, up to GHC 8.2.2, both `inside` and `outside` functions would take the same amount of time to execute. Somewhere between GHC 8.2.2 and 8.6.4 something changed (possibly some new optimization) making `inside` run ~4x faster on my machine. With LLVM the difference is even bigger.

It is not that `outside` got slower, but that `inside` got much faster. I'm curious to what optimizations might be happening to the `inside` function that would not fire on the outside function.

Any hints?

Best regards,

Emilio




_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.