Code runs 7x FASTER with profiling

I was writing a simple utility and I decided to use regexps to parse
filenames. (I know, now I have two problems :-) )
I was surprised at how slow it ran, so I did a profiling build. The
profiled code runs reasonably quickly, and is 7x faster, which makes it
a bit hard to figure out where the slowdown is happening in the
non-profiled code. I’m wondering if I’m doing something wrong, or if
there’s a bug in |regex-tdfa| or in ghc.
I’ve pared my code down to just the following:
    import Text.Regex.TDFA ((=~))

    main :: IO ()
    main = do
        entries <- map parseFilename . lines <$> getContents
        let check (Right (_, t)) = last t == 'Z'
            check _ = False
        print $ all check entries

    parseFilename :: String -> Either String (String, String)
    parseFilename fn = case (fn =~ pattern :: [[String]]) of
        [[_, full, _, time]] -> Right (full, time)
        _ -> Left fn
      where
        pattern = "^\\./duplicity-(full|inc|new)(-signatures)?\\.\
            \([0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]T[0-9][0-9][0-9][0-9][0-9][0-9]Z)\\."
The relevant part of my |.cabal| file looks like this:
    executable DuplicityAnalyzer
      main-is:          DuplicityAnalyzer.hs
      build-depends:    base >=4.6 && <4.11, regex-tdfa >=1.0 && <1.3
      default-language: Haskell2010
      ghc-options:      -Wall -rtsopts
To run the profiling, I do:
    cabal clean
    cabal configure --enable-profiling
    cabal build
    dist/build/DuplicityAnalyzer/DuplicityAnalyzer

Looking at the produced core of both versions reveals that in the profiled build a closure of type Regex is floated to top level. The non-profiled build doesn't do this, thus it recompiles the regex for every iteration. This is most likely the source of the slowdown of the non-profiled build.

On 12/07/2017 07:09 AM, Neil Mayhew wrote:
> The |MUT| time in the non-profiling build is 7x bigger, and the |%GC| time goes from 8% to 21%. I’ve put the actual output in a gist https://gist.github.com/neilmayhew/247a30738c0e294902e7c2830ca2c6f5. I’ve also put my test input file there, in case anyone wants to try this themselves.
>
> I’ve done my testing with NixOS (ghc 8.0.2) and Debian with the Haskell Platform (ghc 8.2.1) and the results are basically the same. I even tried using Docker containers with Debian Jessie and Debian Stretch, just to eliminate any OS influence, and the results are still the same. I’ve tried it on an i5-2500K, i5-3317U and Xeon E5-1620.
>
> I also wrote a dummy implementation of |=~| that ignores the regex pattern and does a hard-coded manual parse, and that produces times just slightly better than the profiled ones. So I don’t think there’s a problem in my outer code that uses |=~|.
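The thread never shows that hard-coded manual parse. Purely as an illustration, a hand-rolled equivalent of the regex (every name here is hypothetical, not from Neil's program) might look like:

```haskell
import Data.Char (isDigit)
import Data.List (isPrefixOf, stripPrefix)
import Data.Maybe (fromMaybe)

-- Hypothetical stand-in for the "dummy =~": parse
-- "./duplicity-<kind>[-signatures].<YYYYMMDDThhmmssZ>." by hand,
-- with no regex engine involved.
parseManual :: String -> Either String (String, String)
parseManual fn = maybe (Left fn) Right $ do
    rest0 <- stripPrefix "./duplicity-" fn
    (kind, rest1) <- takeKind rest0
    let rest2 = fromMaybe rest1 (stripPrefix "-signatures" rest1)
    rest3 <- stripPrefix "." rest2
    let (time, rest4) = splitAt 16 rest3  -- e.g. "20171207T123456Z"
    if isTime time && "." `isPrefixOf` rest4
        then Just (kind, time)
        else Nothing
  where
    takeKind s = case filter (`isPrefixOf` s) ["full", "inc", "new"] of
        (k:_) -> Just (k, drop (length k) s)
        []    -> Nothing
    isTime t =
        length t == 16
            && all isDigit (take 8 t)
            && t !! 8 == 'T'
            && all isDigit (take 6 (drop 9 t))
            && last t == 'Z'

main :: IO ()
main = do
    print (parseManual "./duplicity-full.20171207T123456Z.manifest")
    print (parseManual "./duplicity-new-signatures.20171207T123456Z.sigtar.gz")
    print (parseManual "foo")
```

Since nothing here is rebuilt per call, a parser in this style gives a baseline free of the regex-compilation cost, which is presumably why Neil's dummy version ran at profiled-build speed.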
_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

On 2017-12-07 12:49 AM, Jonas Scholl wrote:

> Looking at the produced core of both versions reveals that in the profiled build a closure of type Regex is floated to top level. The non-profiled build doesn’t do this, thus it recompiles the regex for every iteration. This is most likely the source of the slowdown of the non-profiled build.

Thanks, Jonas. This does indeed seem to be the problem. I changed the code to use a compiled regex (with |makeRegex| and |match| instead of |=~|) but in the non-profiling case the run-time doesn’t improve unless I float the compiled regex myself:

    parseFilename :: String -> Either String (String, String)
    parseFilename fn = case (pattern `match` fn :: [[String]]) of
        [[_, full, _, time]] -> Right (full, time)
        _ -> Left fn

    pattern :: Regex
    pattern = makeRegex "^\\./duplicity-(full|inc|new)(-signatures)?\\.\
        \([0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]T[0-9][0-9][0-9][0-9][0-9][0-9]Z)\\."

Then it runs 2-3x faster than the profiled code.

The question remains, however: why doesn’t the ghc optimizer spot this fairly obvious loop-invariant in the non-profiled build when it does manage to spot it in the profiled one? In other words, when I make |pattern| a local definition of |parseFilename|, why isn’t it treated as a CAF that’s evaluated only once (‘floated to the top level’)? Enabling profiling shouldn’t change the meaning of a program.

I remember back in the day having to be careful with regexes in Python to make sure they were always precompiled outside of loops and functions, but one of the nice things about Haskell is that one can usually let the compiler take care of this. (Nowadays Python gets around this by caching compiled regexes, but I prefer Haskell’s statically-optimized approach.)
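An alternative that doesn't depend on the optimizer at all is to combine the manual float with explicit passing: bind the compiled Regex at top level (a CAF) and thread it through as an argument. This is only a sketch using regex-tdfa's makeRegex and match as in the message above; the binding names are mine, not from the thread:

```haskell
import Text.Regex.TDFA (Regex, makeRegex, match)

-- Compiled exactly once as a top-level CAF, then passed explicitly,
-- so the sharing no longer depends on what GHC decides to float.
compiledPattern :: Regex
compiledPattern = makeRegex
    "^\\./duplicity-(full|inc|new)(-signatures)?\\.\
    \([0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]T[0-9][0-9][0-9][0-9][0-9][0-9]Z)\\."

parseFilename :: Regex -> String -> Either String (String, String)
parseFilename re fn = case (re `match` fn :: [[String]]) of
    [[_, full, _, time]] -> Right (full, time)
    _                    -> Left fn

main :: IO ()
main = do
    entries <- map (parseFilename compiledPattern) . lines <$> getContents
    let check (Right (_, t)) = last t == 'Z'
        check _              = False
    print $ all check entries
```

Making the dependency an explicit parameter documents that every call shares one compiled regex, at the cost of a slightly noisier signature.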

Bugs, usually. I think there was one such in 8.2.1 and if so it may not have been completely fixed (trying to read around the edges of some dev discussion, since I'm kinda far from expert when it comes to ghc internals).

On Thu, Dec 7, 2017 at 4:25 PM, Neil Mayhew <neil_mayhew@users.sourceforge.net> wrote:
--
brandon s allbery kf8nh / sine nomine associates
allbery.b@gmail.com / ballbery@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad
http://sinenomine.net

On 2017-12-07 03:05 PM, Brandon Allbery wrote:

> Bugs, usually. I think there was one such in 8.2.1 and if so it may not have been completely fixed

So if I were being a good citizen I'd try this out with ghc master built from source and report a bug if it's not fixed? Or do you think it would be sufficient to use a binary installer of ghc 8.2.2?

I'd do both: that gets you information on what it looks like in the latest release, and also verifies that something hasn't changed in 8.3/future 8.4 (which is moving forward). Certainly the behavior here sounds wrong and should be reported.

On Thu, Dec 7, 2017 at 5:12 PM, Neil Mayhew <neil_mayhew@users.sourceforge.net> wrote:

On 2017-12-07 03:33 PM, Brandon Allbery wrote:

> Certainly the behavior here sounds wrong and should be reported.

https://ghc.haskell.org/trac/ghc/ticket/14564 - there’s already been a response (from Simon P-J) and it is being treated as a bug.

I get the slowdown with 8.0.2, 8.2.1 and 8.2.2 but 7.10.3 is OK. Since 8.2.2 is very recent (20 Nov) I didn’t go to the trouble of building ghc from source.

2017-12-07 22:25 GMT+01:00 Neil Mayhew:

> [...] The question remains, however: why doesn’t the ghc optimizer spot this fairly obvious loop-invariant in the non-profiled build when it does manage to spot it in the profiled one? In other words, when I make pattern a local definition of parseFilename, why isn’t it treated as a CAF that’s evaluated only once (‘floated to the top level’)?
Because making something a CAF is not always an optimization: if your evaluated CAF uses e.g. hundreds of MB, it might be preferable to *not* have it as a CAF, but to recompute it instead. If I haven't missed something recently, CAFs can't be garbage collected, but I would be happy to be corrected here. :-)
> Enabling profiling shouldn’t change the meaning of a program.
It doesn't change the meaning, that would really be a disaster, it just changes the performance characteristics. This is still not nice, but not much different from using different levels of optimization: doing or not doing e.g. strictness analysis might change the space complexity etc.
> I remember back in the day having to be careful with regexes in Python to make sure they were always precompiled outside of loops and functions, but one of the nice things about Haskell is that one can usually let the compiler take care of this. (Nowadays Python gets around this by caching compiled regexes, but I prefer Haskell’s statically-optimized approach.)
I think totally relying on the compiler for performance is a common misconception, because things are often more tricky than initially thought. In our example: Time vs. space tradeoff, and there is no universally "right" solution. Cheers, S.
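That tradeoff can be made visible with a small dependency-free example (not from the thread; it uses Debug.Trace to show when evaluation happens):

```haskell
import Debug.Trace (trace)

-- A top-level binding is a CAF: the trace fires at most once, and the
-- result (small here, but imagine hundreds of MB) is retained for the
-- life of the program -- the space cost Sven describes.
shared :: Integer
shared = trace "building shared" (sum [1 .. 100000])

-- A local binding in a function body builds a fresh thunk per call, so
-- (at least in an unoptimized build) the work repeats -- the time cost.
perCall :: Integer -> Integer
perCall x = x + local
  where
    local = trace "building local" (sum [1 .. 100000])

main :: IO ()
main = do
    print (shared + shared)  -- "building shared" is traced only once
    print (perCall 1)
    print (perCall 2)        -- whether "building local" repeats depends
                             -- on whether GHC floats the invariant out
```

With optimization GHC may float `local` to the top level itself, which is exactly the behavior the thread found to differ between profiled and non-profiled builds.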

On 2017-12-08 01:52 AM, Sven Panne wrote:

> … making something a CAF is not always an optimization: If your evaluated CAF uses e.g. hundreds of MB it might be preferable to *not* have it as a CAF, but to recompute it instead.

Good point. However, I think it would be very hard for the compiler to detect which CAFs use a lot of memory and aren’t referred to very often.

> If I haven’t missed something recently, CAFs can’t be garbage collected, but I would be happy to be corrected here. :-)

There’s a helpful discussion on the Haskell Wiki at https://wiki.haskell.org/Constant_applicative_form. In particular, it says:

    A CAF … may only be accessible from within the code of one or more functions. In order for the garbage collector to be able to reclaim such structures, we associate with each function a list of the CAFs to which it refers. When garbage collecting a reference to the function we collect the CAFs on its list.

In my case, since the function itself is top-level, I assume the CAF can’t be garbage-collected.

> > Enabling profiling shouldn’t change the meaning of a program.
>
> It doesn’t change the meaning, that would really be a disaster, it just changes the performance characteristics. This is still not nice, but not much different from using different levels of optimization: doing or not doing e.g. strictness analysis might change the space complexity etc.

Yes, I see you’re right. However, I guess I was using ‘meaning’ loosely to mean ‘is a CAF’ and I assumed a CAF would always be floated. The above-mentioned wiki page says, “A CAF can always be lifted to the top level of the program.” However, I realize “can” is not the same as “should”.

> I think totally relying on the compiler for performance is a common misconception, because things are often more tricky than initially thought. In our example: time vs. space tradeoff, and there is no universally “right” solution.

Good point. Having great optimization isn’t a justification for complete mental laziness on the part of the programmer! However, I did find the behaviour in this case surprising and unintuitive, given ghc’s usual ability to optimize things, and having it change on me when I enabled profiling left me wondering where to go next. The code I presented here is considerably simplified from the original program, and represents a lot of work already expended trying to get to the root of the problem.

On Fri, Dec 8, 2017 at 9:18 AM, Neil Mayhew <neil_mayhew@users.sourceforge.net> wrote:

> Good point. Having great optimization isn’t a justification for complete mental laziness on the part of the programmer! However, I did find the behaviour in this case surprising and unintuitive, given ghc’s usual ability to optimize things, and having it change on me when I enabled profiling left me wondering where to go next. […]

This is actually why I (and likely SPJ) am inclined to consider it a bug; while it might be arguable as Sven does, the counterintuitive effect when you turn on profiling suggests that it's not intended or expected. (Although maybe it should be; the intuition is "profiling disables optimization" but what it really does is a bit more subtle than that overly simplistic notion.)

Has anybody filed a bug about this yet? I think we really should.

I recently observed a similar behaviour in code of a client. It looked like:

    main = do
        l <- getSomeList
        let m = buildExpensiveHashMap l
        for_ l $ \element -> do
            print (... lookup l m ...)

And it inlined the `buildExpensiveHashMap`, making the whole thing 100x slower. A bang on `m` fixed it to keep it outside the loop, but I found it weird that GHC considered that expression cheap enough to inline.

Niklas
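A runnable sketch of the bang fix Niklas describes, with hypothetical stand-ins for the client code (a plain association list instead of a hash map, and `buildTable` in place of `buildExpensiveHashMap`):

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.Foldable (for_)

-- Hypothetical stand-in for an expensive-to-build lookup structure.
buildTable :: [Int] -> [(Int, Int)]
buildTable l = [(x, x * x) | x <- l]

main :: IO ()
main = do
    let l = [1 .. 5] :: [Int]
        -- The bang forces the table before the loop starts; in the case
        -- Niklas describes, it also stopped GHC from inlining the
        -- (cheap-looking) binding into every iteration and losing sharing.
        !m = buildTable l
    for_ l $ \element ->
        print (lookup element m)
```

The bang only forces the table to weak head normal form, but that is enough to pin the binding outside the loop.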

On 2017-12-22 06:57 PM, Niklas Hambüchen wrote:
> Has anybody filed a bug about this yet? I think we really should.

On 2017-12-08 06:58 AM, Neil Mayhew wrote:

> https://ghc.haskell.org/trac/ghc/ticket/14564 - there’s already been a response (from Simon P-J) and it is being treated as a bug.

Maybe you should add your example to the ticket, Niklas.
participants (5)

- Brandon Allbery
- Jonas Scholl
- Neil Mayhew
- Niklas Hambüchen
- Sven Panne