
joshua:
Hello, I'm quite new to Haskell, but experienced in other languages (C, Python, Ruby, SQL, etc.). I'm interested in Haskell because I've heard that it admits lots of optimizations based on laziness, and I want to learn more about that.
I dug in with Project Euler problem #1, and wrote:
    main = print (sum [x | x <- [3..999], x `mod` 3 == 0 || x `mod` 5 == 0])
So far so good, but I want to have some way of observing what optimizations GHC has performed. Most notably, in this example I want to know if the list was ever actually constructed in memory. The "sum" function only needs the elements one at a time, in order, so if Haskell and GHC are everything I've heard about them, I would fully expect the list construction to be optimized out. :)
Unfortunately I was not able to find any way of examining GHC's output to determine whether it had performed this optimization. The C it produced with '-fvia-C -C' was totally unreadable -- it looked like something from the IOCCC. :( I couldn't find any way to match up the code it generated with the code I had written. My attempts to objdump the binaries were similarly unproductive.
Is there any kind of intermediate form that a person can examine to see how their code is being optimized? Anything like EXPLAIN PLAN in SQL? It would make it much easier to understand the kinds of optimizations Haskell can perform.
I'm not looking for profiling so much -- obviously this program is trivial and takes no time. I just want to better understand what kinds of optimizations are possible given Haskell's language model.

don:
So the optimization you're looking for here is fusion of some kind. GHC ships with build/foldr fusion, and there are libraries for an alternative system, stream fusion.

GHC uses an intermediate representation called 'Core' -- essentially a mini-Haskell -- that is optimized repeatedly via type-preserving transformations. You can inspect it with a number of tools, including "ghc-core" (available on Hackage).

Now, your example uses a list comprehension, which is translated into a call to enumFromTo and a call to filter. It also uses sum, which is a non-fusing left fold under build/foldr fusion, but fuses under stream fusion. I'll desugar your code explicitly, and translate the calls from build/foldr to stream-fusible functions:

    $ cabal install uvector

    import Data.Array.Vector

    main = print . sumU . filterU (\x -> x `mod` 3 == 0 || x `mod` 5 == 0)
         $ enumFromToU 3 (999 :: Int)

Running this through ghc-core, we see:

    146 PreInlineUnconditionally
    320 PostInlineUnconditionally
     84 UnfoldingDone
     18 RuleFired
          7 +#
          1 -#
          1 <=#
          2 ==#->case
          1 >#
          3 SC:$wfold0
          1 fromIntegral/Int->Int
          2 streamU/unstreamU
     12 LetFloatFromLet
      1 EtaReduction
    210 BetaReduction
      8 CaseOfCase
     94 KnownBranch
      4 CaseMerge
      5 CaseElim
      6 CaseIdentity
      1 FillInCaseDefault
     18 SimplifierDone

showing which transformations happened. Notably, there are 2 occurrences of the "streamU/unstreamU" transformation, which removes intermediate structures. The final code looks like:

    $s$wfold :: Int# -> Int#
    $s$wfold =
      \ (sc_s19l :: Int#) ->
        case modInt# (-9223372036854775807) 3 of wild21_a14L {
          __DEFAULT ->
            case modInt# (-9223372036854775807) 5 of wild211_X159 {
              __DEFAULT -> $wfold sc_s19l (-9223372036854775806);
              0 -> $wfold (+# sc_s19l (-9223372036854775807))
                          (-9223372036854775806)
            };
          0 -> $wfold (+# sc_s19l (-9223372036854775807))
                      (-9223372036854775806)
        }

    $wfold :: Int# -> Int# -> Int#
    $wfold =
      \ (ww_s189 :: Int#) (ww1_s18d :: Int#) ->
        case ># ww1_s18d 999 of wild_a15N {
          False ->
            case ww1_s18d of wild2_a14K {
              __DEFAULT ->
                case modInt# wild2_a14K 3 of wild21_a14L {
                  __DEFAULT ->
                    case modInt# wild2_a14K 5 of wild211_X159 {
                      __DEFAULT -> $wfold ww_s189 (+# wild2_a14K 1);
                      0 -> $wfold (+# ww_s189 wild2_a14K) (+# wild2_a14K 1)
                    };
                  0 -> $wfold (+# ww_s189 wild2_a14K) (+# wild2_a14K 1)
                };
              (-9223372036854775808) ->
                case modInt# (-9223372036854775808) 3 of wild21_a14N {
                  __DEFAULT ->
                    case lvl_r19G of wild211_a14x {
                      False -> $s$wfold ww_s189;
                      True -> $s$wfold (+# ww_s189 (-9223372036854775808))
                    };
                  0 -> $s$wfold (+# ww_s189 (-9223372036854775808))
                }
            };
          True -> ww_s189
        }

This might take a while to understand, but the key thing is that the top-level types are Int#: the program never allocates a [Int] at all, and instead works on unboxed, machine-level int values. Optimization successful.

There are papers and manuals describing "GHC Core" -- the intermediate form -- on haskell.org. For fusion, google for "deforestation" or "stream fusion".

-- Don
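
Incidentally, ghc-core is just a pager-friendly wrapper around GHC's own dump flags, so you can get the same information from GHC directly (assuming the program lives in a file called euler1.hs -- the filename here is only for illustration):

    $ ghc -O2 -ddump-simpl -ddump-simpl-stats euler1.hs

-ddump-simpl prints the Core as it stands after the simplifier is done, and -ddump-simpl-stats prints the transformation counts shown above.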
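For comparison, here is the comprehension spelled out by hand with ordinary list functions. This is only a sketch of the desugaring -- GHC really goes through build/foldr combinators -- and it substitutes Data.List's strict foldl' for Prelude's sum, so the accumulator is forced as it goes:

    import Data.List (foldl')

    main :: IO ()
    main = print (foldl' (+) 0
                    (filter (\x -> x `mod` 3 == 0 || x `mod` 5 == 0)
                            [3 .. 999 :: Int]))

Under build/foldr fusion the filter and the enumeration can fuse with each other, but the left fold at the end is the part that doesn't participate -- which is why the stream-fusion version above is the interesting one.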
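The stream-fusion idea itself fits in a few lines. Here is a minimal sketch, following the stream fusion papers; the names are illustrative and uvector's real internals differ in detail:

    {-# LANGUAGE BangPatterns, ExistentialQuantification #-}

    -- One step of a stream: stop, skip, or produce an element.
    data Step s a = Done | Skip s | Yield a s

    -- A stream is a step function plus a hidden seed state.
    data Stream a = forall s. Stream (s -> Step s a) s

    enumFromToS :: Int -> Int -> Stream Int
    enumFromToS lo hi = Stream step lo
      where
        step i | i > hi    = Done
               | otherwise = Yield i (i + 1)

    filterS :: (a -> Bool) -> Stream a -> Stream a
    filterS p (Stream next s0) = Stream step s0
      where
        step s = case next s of
          Done                   -> Done
          Skip s'                -> Skip s'
          Yield x s' | p x       -> Yield x s'
                     | otherwise -> Skip s'

    -- Consumers are tail-recursive loops over the step function.
    sumS :: Stream Int -> Int
    sumS (Stream next s0) = go 0 s0
      where
        go !acc s = case next s of
          Done       -> acc
          Skip s'    -> go acc s'
          Yield x s' -> go (acc + x) s'

    main :: IO ()
    main = print (sumS (filterS (\x -> x `mod` 3 == 0 || x `mod` 5 == 0)
                                (enumFromToS 3 999)))

Because every stage is just a state machine driven from one seed, a pipeline like this never builds a list. In the real library, each list-like function is wrapped as unstreamU . f . streamU, and the streamU/unstreamU rule fired above cancels the adjacent conversions, so whole pipelines collapse into a single loop like $wfold.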
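Finally, stripped of the Int# plumbing and GHC's extra specialization branch (the (-9223372036854775808) case), the Core above is morally nothing more than this accumulator loop -- my transliteration, not GHC output:

    -- What the fused $wfold computes, written as source Haskell.
    loop :: Int -> Int -> Int
    loop acc n
      | n > 999                          = acc
      | n `mod` 3 == 0 || n `mod` 5 == 0 = loop (acc + n) (n + 1)
      | otherwise                        = loop acc (n + 1)

    main :: IO ()
    main = print (loop 0 3)

which is exactly the loop you would have written by hand in C.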