
Hi While benchmarking a word count program I found that it wasn't running as fast as it could. I traced this back to the original definition of words, which isn't as good as it could be: This is the version in Base: words :: String -> [String] words s = case dropWhile isSpace s of "" -> [] s' -> w : words s'' where (w, s'') = break isSpace s' We can instantiate s' with s:ss, since we already pattern match: words :: String -> [String] words s = case dropWhile isSpace s of "" -> [] s:ss -> w : words s'' where (w, s'') = break isSpace (s:ss) Now we know that s is not a space, since if it was the dropWhile would have removed it. This means that we don't need to test it, and can put it straight on to w: words :: String -> [String] words s = case dropWhile isSpace s of "" -> [] s:ss -> (s:w) : words s'' where (w, s'') = break isSpace ss Now we have eliminated an extra isSpace test per character, and as it happens, isSpace is very slow. Is my reasoning correct? If so, can we make this optimisation? Thanks Neil

Hi
An optimised version of words:
words :: String -> [String] words s = case dropWhile isSpace s of "" -> [] s:ss -> (s:w) : words s'' where (w, s'') = break isSpace ss
Thinking harder, this is an improvement, but not as good as we can get. If s'' is non-empty, then the first character must be a space, which we can drop without testing: words :: String -> [String] words s = case dropWhile isSpace s of "" -> [] s:ss -> (s:w) : words (drop1 s'') where (w, s'') = break isSpace ss drop1 [] = [] drop1 (x:xs) = xs (of course, drop1 = drop 1, but if we're going for maximum efficience :-) ) I think this now ensures that we only test each character once for being a space. Thanks Neil

On Mon, 2007-07-09 at 16:10 +0100, Neil Mitchell wrote:
Hi
While benchmarking a word count program I found that it wasn't running as fast as it could. I traced this back to the original definition of words, which isn't as good as it could be:
[...]
Is my reasoning correct? If so, can we make this optimisation?
To really convince yourself and everyone else you could compare it against the spec, for both total and partial values. For our list library re-implementation we used SmallCheck and a modified version of SmallCheck for checking partial values. It turned up lots of bugs in our code and found several instances where the current base implementations are not the same as the spec (for various good reasons). http://www.cse.unsw.edu.au/~dons/code/streams/list/tests/Strictness/BaseVsSp... Duncan

Hi Actually, you can go slightly better: words s = case dropWhile isSpace s of [] -> [] (s:ss) -> (s:w) : drop1 sss where (w, sss) = break isSpace ss drop1 [] = [] drop1 (x:xs) = words xs Although a good optimising compiler may make this last leap all on its own.
To really convince yourself and everyone else you could compare it against the spec, for both total and partial values.
I'm pretty convinced that I applied reasoning rules at each stage. If you have a proof, testing is irrelevant :-) I've already modified the Yhc base library with this optimisation, and done some basic testing, and nothing broke. Thanks Neil

On Tue, 2007-07-10 at 00:06 +0100, Neil Mitchell wrote:
To really convince yourself and everyone else you could compare it against the spec, for both total and partial values.
I'm pretty convinced that I applied reasoning rules at each stage. If you have a proof, testing is irrelevant :-)
Ok, if you're sure. :-)
I've already modified the Yhc base library with this optimisation, and done some basic testing, and nothing broke.
I'm more hesitant because I've already rewritten words once and got it wrong. What's more it passed all the casual testing and quickcheck tests I used. That's because it was correct for total values, it only went wrong for partial values. It wasn't until I developed the SmallCheck on partial value tests that I discovered the bug. Duncan

ndmitchell:
Hi
Actually, you can go slightly better:
words s = case dropWhile isSpace s of [] -> [] (s:ss) -> (s:w) : drop1 sss where (w, sss) = break isSpace ss
drop1 [] = [] drop1 (x:xs) = words xs
Although a good optimising compiler may make this last leap all on its own.
To really convince yourself and everyone else you could compare it against the spec, for both total and partial values.
I'm pretty convinced that I applied reasoning rules at each stage. If you have a proof, testing is irrelevant :-)
I've already modified the Yhc base library with this optimisation, and done some basic testing, and nothing broke.
I'd second Duncan here -- strictness properties are *really* hard, so changes to existing (H98) functions must come with both QuickCheck and strictness checking properties, before I'd be comfortable accepting them. The missing support for partial values in QuickCheck is quite a hole that needs fixing. -- Don
participants (3)
-
dons@cse.unsw.edu.au
-
Duncan Coutts
-
Neil Mitchell