
Adrian Hey wrote:
Using explicit seqs rather than strict data types is actually faster, for reasons that are a bit of a mystery to me. I'm not sure what cost Bertram is talking about, but AFAIK ghc uses the same info pointer mechanism for all heap records, including unevaluated thunks (although the info pointers will point to different things of course). But the cost of pattern matching on *evaluated* AVL nodes should be independent of strictness annotation AFAICS.
I must admit that for the time being the cost is of a theoretical nature. But let me explain the idea. Consider this code:
module Nat (isOne) where
data Nat = Succ !Nat | Zero
isOne :: Nat -> Bool isOne n = case n of Zero -> False Succ n' -> case n' of Zero -> True Succ _ -> False
The code of isOne starts by forcing n (looking at n's tag and entering the closure if it's unevaluated in ghc's case) and then a pattern match (looking at the the tag again). Now for the second pattern match, we can skip the first step, because we know that n' is fully evaluated, thanks to the strictness annotation in the Succ constructor. However, ghc currently does generate the same (cmm) code for isOne regardless of the strictness annotation, so performance wise you only get to pay the price of the annotation (I expect that some thunks are unnecessarily reevaluated when the constructor is used) without this benefit. Did I miss any reason why this idea can't work? I really expected ghc to do that optimisation - obviously that was wishful thinking on my part. Bertram