I think what is happening is that

 

 

In this case GHC’s paranoia is not justified.  It’d be better to duplicate the production of [0..n-1] so that it can fuse with its consumers.

 

One way to address this problem might be “cheapBuild”.   There’s a ticket about this: https://gitlab.haskell.org/ghc/ghc/issues/7206.   I see that a year ago the TL;DR was “Conclusion: let's do it. Ensuring that a Note gives a clear explanation, and points to this ticket.”   But no one yet has.

 

Maybe someone might see if the cheapBuild idea really does solve this particular case.

 

Simon

 

 

From: Haskell-Cafe <haskell-cafe-bounces@haskell.org> On Behalf Of William Yager
Sent: 19 August 2019 15:35
To: Emilio Francesquini <francesquini@gmail.com>
Cc: Haskell Cafe <haskell-cafe@haskell.org>
Subject: Re: [Haskell-cafe] Optimizations for list comprehension

 

I'm not sure which exact optimizations are responsible, but based on --ddump-simple,

 

* "inside" is not allocating any lists at all. It's just a couple loops over unboxed ints

* "outside" is actually allocating a (single) list data structure and has an inner loop and an outer loop, both of which traverse the list

 

GHC seems to be too aggressive about sharing "range" in "outside". Adding a unit argument to "range" makes both functions go fast.

 

On Mon, Aug 19, 2019 at 8:14 PM Emilio Francesquini <francesquini@gmail.com> wrote:

Hello Cafe,

 

While investigating a performance problem I stumbled upon what I eventually reduced to the example below:

 

module Main where

import Data.Time.Clock

outside :: Int -> Int
outside n =
  sum [i + j | i <- range, j <- range]
  where
    range = [0..n-1]

inside :: Int -> Int
inside n =
  sum [i + j | i <- [0..n-1], j <- [0..n-1]]

main :: IO ()

main = do
  t0 <- getCurrentTime
  print $ inside 10000
  t1 <- getCurrentTime
  print $ outside 10000
  t2 <-getCurrentTime

  print (diffUTCTime t1 t0)
  print (diffUTCTime t2 t1)

 

Compiling with -O2, up to GHC 8.2.2, both `inside` and `outside` functions would take the same amount of time to execute. Somewhere between GHC 8.2.2 and 8.6.4 something changed (possibly some new optimization) making `inside` run ~4x faster on my machine. With LLVM the difference is even bigger.

 

It is not that `outside` got slower, but that `inside` got much faster. I'm curious to what optimizations might be happening to the `inside` function that would not fire on the outside function.

 

Any hints?

 

Best regards,

 

Emilio

 

 

 

 

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.