On 2017-12-07 12:49 AM, Jonas Scholl wrote:

> Looking at the produced core of both versions reveals that in the profiled build a closure of type Regex is floated to top level. The non-profiled build doesn’t do this, thus it recompiles the regex for every iteration. This is most likely the source of the slowdown of the non-profiled build.

Thanks, Jonas. This does indeed seem to be the problem. I changed the code to use a precompiled regex (makeRegex and match instead of =~), but in the non-profiled build the run time doesn’t improve unless I float the compiled regex to the top level myself:

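-- Assumption: the regex-posix backend; Text.Regex.TDFA exports the same names.
import Text.Regex.Posix (Regex, makeRegex, match)

parseFilename :: String -> Either String (String, String)
parseFilename fn = case (pattern `match` fn :: [[String]]) of
    -- One match: [whole match, full|inc|new, optional "-signatures", timestamp]
    [[_, full, _, time]] -> Right (full, time)
    _ -> Left fn

-- Top-level definition (a CAF), so the regex is compiled at most once.
pattern :: Regex
pattern = makeRegex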
    "^\\./duplicity-(full|inc|new)(-signatures)?\\.\
    \([0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]T[0-9][0-9][0-9][0-9][0-9][0-9]Z)\\."

With that change, the non-profiled build runs 2-3x faster than the profiled one.

The question remains, however: why doesn’t GHC’s optimizer spot this fairly obvious loop invariant in the non-profiled build when it does manage to spot it in the profiled one? In other words, when I make pattern a local definition of parseFilename, why isn’t it treated as a CAF that’s evaluated only once (‘floated to the top level’)? Enabling profiling shouldn’t change the meaning of a program, and it is surprising that it changes the sharing behaviour this much.
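One way to check whether a local definition is actually being shared is to make the expensive step announce itself. The sketch below uses only base. `Matcher`, `compile`, `checkLocal` and `checkTop` are hypothetical stand-ins for Regex, makeRegex and the two variants of parseFilename, not part of any regex package. The prefix test is only there to keep the example self-contained.

```haskell
import Data.List (isPrefixOf)
import Debug.Trace (trace)

-- Hypothetical stand-in for a compiled regex.
data Matcher = Matcher (String -> Bool)

-- Hypothetical stand-in for makeRegex: writes a line to stderr each time
-- the "compilation" is actually evaluated.
compile :: String -> Matcher
compile prefix =
    trace ("compiling " ++ show prefix) (Matcher (prefix `isPrefixOf`))

matches :: Matcher -> String -> Bool
matches (Matcher f) = f

-- Local definition: shared between calls only if GHC floats it out.
checkLocal :: String -> Bool
checkLocal fn = matcher `matches` fn
  where
    matcher = compile "./duplicity-"

-- Top-level definition: a CAF, evaluated at most once.
topMatcher :: Matcher
topMatcher = compile "./duplicity-"

checkTop :: String -> Bool
checkTop fn = topMatcher `matches` fn

main :: IO ()
main = do
    let names = ["./duplicity-full.x", "./other", "./duplicity-inc.y"]
    print (map checkLocal names)
    print (map checkTop names)
```

Counting the "compiling" lines on stderr shows how often each variant did the work. Built with -O0 I would expect one line per call for checkLocal and a single line for checkTop. With -O the local one may or may not be floated out, which is exactly the behaviour in question.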

I remember back in the day having to be careful with regexes in Python to make sure they were always precompiled outside of loops and functions, but one of the nice things about Haskell is that one can usually let the compiler take care of this. (Nowadays Python gets around this by caching compiled regexes, but I prefer Haskell’s statically-optimized approach.)