Re: Proposal: Make a very slight change to the semantics of Data.List.isSuffixOf

13 Oct 2014

      I've switched my allegiance from my own version to Milan's, because it's a
tad faster, and also less verbose. One thing to watch out for: although I
don't particularly like this, the current version in Data.List makes  []
`isSuffixOf` undefined = True.  Unless there's a general consensus that
this doesn't matter, we need to preserve it.

I don't think Milan's version is too terribly verbose. I tested something
very similar to your #1 that was proposed on IRC by Reid Barton, and the
numbers just didn't look too wonderful. I don't think Milan's version is
too terribly verbose, and I think it's clearer than your #2. As for the
depth of caring about speed, I generally disagree with you: lists are
actually very good for some purposes, and, moreover, even when they're
*not*, people use them anyway and then other people wait for that code to
run.

On Mon, Oct 13, 2014 at 6:10 PM, Kim-Ee Yeoh  wrote:
...
On Tue, Oct 14, 2014 at 1:17 AM, David Feuer 
wrote:
...
I've done the benchmarks and the results are clear: the implementation I
gave is faster than the one you gave and the one in Data.List in all cases.
Yours is usually faster than the one in Data.List, but slower for very
short lists.
The 2x factor you're seeing over Andreas's diminishes once we put slightly
more effort into an apples-to-apples comparison.
1. Instead of drop (length xs) ys, let's define the equivalent
dropl :: [b] -> [a] -> [a]
dropl (_:xs) (_:ys) = dropl xs ys
dropl [] ys = ys
dropl xs [] = []
i.e. dropl xs ys ~ drop (length xs) ys.
Now with Andreas's original version:
xs `isSuffixOfA` ys =  xs == dropl (dropl xs ys) ys
that 2x gap narrows down to 10% for most cases.
2. There's that fast path which you optimize for, where the needle is
patently too long to be in the haystack. To narrow the semantic gap, we can
write
dropm :: [b] -> [a] -> Maybe [a]
dropm (_:xs) (_:ys) = dropm xs ys
dropm [] ys = Just ys
dropm _ []  = Nothing
xs `isSuffixOfA` ys    =  maybe False id $ do
   zs <- dropm xs ys
   ws <- dropm zs ys    -- ws = needle-sized slice of the end of haystack
   return $ xs == ws
Now, the long-needle-short-haystack case also becomes merely 10% off.
I'm -1 on both your proposal and the no.2 code because it's too much
verbosity for uncertain gain. The benchmarks look good, but is the function
even the bottleneck? For folks that care deeply about speed, lists is
almost always the wrong datatype anyway.
I'm weakly +1 on no.1 because it beats the current double reverse
definition!
-- Kim-Ee