I've done the benchmarks, and the results are clear: the implementation I gave is faster than both yours and the one in Data.List in every case. Yours is usually faster than the Data.List one, but slower for very short lists.

Test code:

import Criterion.Main
import Data.Bits ((.&.))
import qualified Data.List as L

-- L.isSuffixOf is the library version; isSuffixOf is the implementation from
-- my patch and isSuffixOfA is Andreas's drop-based version (quoted below),
-- both assumed to be defined in this module.

needles =
  [ ("shortNeedle", ".exe")
   ,("longNeedle", "And who by fire? Who by water? Who in the sunshine? Who in the nighttime?")]

haystacks =
  [ ("shortHaystack", "C:\\Program Files\\Windows\\Dunno.exe")
   ,("longHaystack", take 10000 $ map toEnum $ iterate (\x -> (x*131) .&. 0xFFFF) 7)]

impls = [("lib", L.isSuffixOf), ("Feuer", isSuffixOf), ("Abel", isSuffixOfA)]

tests = [("simple", (\impl (needle,haystack) -> fromEnum $ needle `impl` haystack)),
         ("use", (\impl (needle,haystack) ->
                     if needle `impl` haystack
                     then sum (map fromEnum haystack)
                     else product (map fromEnum haystack)))]

main = defaultMain
  [
  bgroup (unwords [fst test, fst needle, "in", fst haystack])
    [bench (fst impl) $ whnf (snd test $ snd impl) (snd needle, snd haystack)
       | impl <- impls] | test <- tests, needle <- needles, haystack <- haystacks]


Results:

benchmarking simple shortNeedle in shortHaystack/lib
time                 244.6 ns   (243.9 ns .. 245.6 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 245.2 ns   (244.8 ns .. 246.5 ns)
std dev              2.030 ns   (950.4 ps .. 4.378 ns)

benchmarking simple shortNeedle in shortHaystack/Feuer
time                 234.5 ns   (234.1 ns .. 235.1 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 234.6 ns   (234.2 ns .. 235.3 ns)
std dev              1.628 ns   (678.1 ps .. 2.758 ns)

benchmarking simple shortNeedle in shortHaystack/Abel
time                 268.1 ns   (267.8 ns .. 268.4 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 268.0 ns   (267.8 ns .. 268.4 ns)
std dev              947.8 ps   (538.1 ps .. 1.668 ns)


benchmarking simple shortNeedle in longHaystack/lib
time                 100.5 μs   (100.2 μs .. 100.9 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 100.8 μs   (100.5 μs .. 101.4 μs)
std dev              1.511 μs   (1.035 μs .. 2.095 μs)

benchmarking simple shortNeedle in longHaystack/Feuer
time                 55.66 μs   (55.61 μs .. 55.73 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 55.68 μs   (55.63 μs .. 55.79 μs)
std dev              222.0 ns   (128.7 ns .. 415.0 ns)

benchmarking simple shortNeedle in longHaystack/Abel
time                 94.60 μs   (94.39 μs .. 94.94 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 94.51 μs   (94.44 μs .. 94.68 μs)
std dev              332.8 ns   (183.6 ns .. 617.7 ns)


benchmarking simple longNeedle in shortHaystack/lib
time                 480.6 ns   (480.2 ns .. 481.1 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 480.6 ns   (480.1 ns .. 481.6 ns)
std dev              2.217 ns   (1.025 ns .. 4.032 ns)

benchmarking simple longNeedle in shortHaystack/Feuer
time                 187.0 ns   (186.8 ns .. 187.2 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 187.0 ns   (186.8 ns .. 187.4 ns)
std dev              922.0 ps   (427.3 ps .. 1.598 ns)

benchmarking simple longNeedle in shortHaystack/Abel
time                 340.9 ns   (339.4 ns .. 343.3 ns)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 345.0 ns   (341.6 ns .. 351.0 ns)
std dev              14.40 ns   (9.622 ns .. 20.77 ns)
variance introduced by outliers: 60% (severely inflated)


benchmarking simple longNeedle in longHaystack/lib
time                 100.3 μs   (100.2 μs .. 100.5 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 100.4 μs   (100.3 μs .. 100.9 μs)
std dev              811.3 ns   (333.5 ns .. 1.575 μs)

benchmarking simple longNeedle in longHaystack/Feuer
time                 56.29 μs   (56.19 μs .. 56.41 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 56.23 μs   (56.19 μs .. 56.32 μs)
std dev              197.1 ns   (116.3 ns .. 342.1 ns)

benchmarking simple longNeedle in longHaystack/Abel
time                 94.46 μs   (94.32 μs .. 94.66 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 94.49 μs   (94.39 μs .. 94.70 μs)
std dev              459.8 ns   (277.2 ns .. 763.7 ns)


benchmarking use shortNeedle in shortHaystack/lib
time                 831.8 ns   (828.4 ns .. 836.1 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 831.2 ns   (829.3 ns .. 834.6 ns)
std dev              8.468 ns   (5.220 ns .. 13.26 ns)

benchmarking use shortNeedle in shortHaystack/Feuer
time                 819.4 ns   (816.7 ns .. 822.7 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 821.2 ns   (818.0 ns .. 828.2 ns)
std dev              15.89 ns   (7.988 ns .. 25.90 ns)
variance introduced by outliers: 23% (moderately inflated)

benchmarking use shortNeedle in shortHaystack/Abel
time                 853.5 ns   (851.8 ns .. 855.4 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 852.6 ns   (851.5 ns .. 855.6 ns)
std dev              5.462 ns   (2.422 ns .. 10.51 ns)


benchmarking use shortNeedle in longHaystack/lib
time                 261.8 μs   (259.2 μs .. 264.7 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 260.2 μs   (259.5 μs .. 261.4 μs)
std dev              2.854 μs   (1.438 μs .. 4.475 μs)

benchmarking use shortNeedle in longHaystack/Feuer
time                 225.0 μs   (221.9 μs .. 227.1 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 221.0 μs   (220.2 μs .. 222.4 μs)
std dev              3.487 μs   (2.598 μs .. 4.385 μs)

benchmarking use shortNeedle in longHaystack/Abel
time                 244.5 μs   (232.5 μs .. 254.3 μs)
                     0.992 R²   (0.990 R² .. 0.997 R²)
mean                 232.1 μs   (229.4 μs .. 237.7 μs)
std dev              11.45 μs   (6.248 μs .. 16.89 μs)
variance introduced by outliers: 47% (moderately inflated)


benchmarking use longNeedle in shortHaystack/lib
time                 1.088 μs   (1.085 μs .. 1.091 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.087 μs   (1.086 μs .. 1.090 μs)
std dev              6.936 ns   (4.270 ns .. 9.691 ns)

benchmarking use longNeedle in shortHaystack/Feuer
time                 787.4 ns   (785.8 ns .. 789.9 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 786.7 ns   (785.9 ns .. 788.3 ns)
std dev              3.761 ns   (1.408 ns .. 6.358 ns)

benchmarking use longNeedle in shortHaystack/Abel
time                 930.7 ns   (927.7 ns .. 934.0 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 928.3 ns   (926.7 ns .. 931.3 ns)
std dev              7.241 ns   (3.785 ns .. 13.45 ns)


benchmarking use longNeedle in longHaystack/lib
time                 262.3 μs   (257.8 μs .. 266.1 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 258.6 μs   (257.7 μs .. 260.0 μs)
std dev              3.979 μs   (2.350 μs .. 5.888 μs)

benchmarking use longNeedle in longHaystack/Feuer
time                 224.9 μs   (222.1 μs .. 226.9 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 221.2 μs   (220.4 μs .. 222.3 μs)
std dev              3.168 μs   (2.143 μs .. 4.010 μs)

benchmarking use longNeedle in longHaystack/Abel
time                 233.1 μs   (231.4 μs .. 234.5 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 231.2 μs   (230.7 μs .. 232.0 μs)
std dev              2.099 μs   (1.419 μs .. 2.627 μs)

On Sun, Oct 12, 2014 at 12:02 PM, Andreas Abel <abela@chalmers.se> wrote:
If you talk about "additional memory" you assume that after xs `isSuffixOf` ys, ys is no longer needed.  Is this the typical use case?
At least not in the Agda codebase, according to my quick grep...

./Packaging/Database.hs:
  dbEntries = filter (".conf" `isSuffixOf`) filePaths

./TypeChecking/Monad/Open.hs:
  unless (ctx `isSuffixOf` ctx') $ fail $ "thing out of context (" ++ show ctx ++ " is not a sub context of " ++ show ctx' ++ ")"

./Interaction/Options.hs:
  isLiterate file = ".lagda" `isSuffixOf` file

./Syntax/Parser.hs:
  if "lagda" `isSuffixOf` filePath file then

As I said, optimizations should be backed by benchmarks, especially when trading a perspicuous implementation for a more complicated one...


On 12.10.2014 17:40, David Feuer wrote:
The haystack and the shared copy are only a needle's length apart. The
first stage traverses H and N until one of them runs out, holding a copy
of the beginning of each. This requires at most O(min {|H|, |N|})
additional memory (the original needle and the needle-length prefix of
the haystack). Assuming we didn't run out of haystack before we ran out
of needle (which seems generally likely), we proceed to the next step,
traversing H and (drop |N| H) together. During this phase we hold O(|N|)
additional memory: the needle itself and the needle-length portion of
the longer of the two haystack remnants we hold. Note that in the second
phase we *don't* hold on to the beginning of the haystack, because we
will never need it again! Then in the final phase, when the shorter
haystack remnant has run out, we still have the same O(|N|) memory,
which is now the needle itself and the needle-length suffix of the
haystack, and (==) traverses them both together. Thus the total
additional memory residency for this algorithm is O(min {|H|, |N|}).

Using the length approach requires holding on to the beginning of the
haystack while length traverses it. To put it another way, you force the
entire spine of the haystack to be evaluated by length, and only then
start again from the beginning. If the haystack is produced lazily, this
potentially requires O(|H|) extra memory. Since you also check the
length of the needle, your total additional memory comes to
O(max {|H|, |N|}). I apologize if my horrid variable names obscured what
goes on in the algorithm I described.
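
For concreteness, here is a minimal sketch of that scheme. This is only a
sketch written for this message, not the code from the patch, and the helper
names dropNeedle and endChunk are invented here:

isSuffixOfSketch :: Eq a => [a] -> [a] -> Bool
needle `isSuffixOfSketch` haystack =
    case dropNeedle needle haystack of
      Nothing   -> False                            -- haystack ran out first
      Just rest -> endChunk haystack rest == needle
  where
    -- first phase: drop one haystack element per needle element
    dropNeedle []     ys     = Just ys
    dropNeedle _      []     = Nothing
    dropNeedle (_:xs) (_:ys) = dropNeedle xs ys
    -- second phase: walk the haystack and its needle-length-shorter remnant
    -- together; when the remnant runs out, what is left of the haystack is
    -- exactly its needle-length suffix, and nothing before it is retained
    endChunk ys     []     = ys
    endChunk (_:ys) (_:zs) = endChunk ys zs
    endChunk []     _      = []                     -- unreachable: rest is shorter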

On Sun, Oct 12, 2014 at 10:14 AM, Andreas Abel <abela@chalmers.se> wrote:

    Well, you also traverse the haystack twice; in your getEndChunk
    function you simultaneously traverse the haystack and a (shared)
    copy of it.  Why is this so much better?

    I guess without data (timings, heap-profiles) we do not get further
    here.

    On 11.10.2014 14:47, David Feuer wrote:

        It can be, but your way traverses the entire haystack *twice*. The
        memory needs are equivalent to those of the reversing version, which
        I consider unacceptable.


    I do not understand this comment.  reverse needs O(n) memory, length
    O(1).

    Cheers,
    Andreas

        On Sat, Oct 11, 2014 at 5:57 AM, Andreas Abel <abela@chalmers.se> wrote:

             David, the essence of your idea seems to be to mutually drop the
             correct number of elements from needle and hay and then compare
             for equality.  Here is a concise implementation of your idea in
             terms of drop:

             isSuffixOf        :: forall a . (Eq a) => [a] -> [a] -> Bool
             [] `isSuffixOf` _  =  True
             xs `isSuffixOf` ys =  xs == drop (length (drop (length xs) ys)) ys

             This can be further simplified to

             isSuffixOf        :: forall a . (Eq a) => [a] -> [a] -> Bool
             [] `isSuffixOf` _  =  True
             xs `isSuffixOf` ys =  xs == drop (length ys - length xs) ys

             which is a very easy-to-understand program, I think, without the
             need to reverse lists.
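
             One remark on the simplified version: if the needle is longer
             than the haystack, length ys - length xs is negative, drop then
             leaves ys unchanged, and the (==) comparison fails anyway, so no
             extra guard is needed.  A quick sanity check (illustrative
             inputs, assuming the definition above is in scope):

                  ".lagda" `isSuffixOf` "Foo.lagda"       -- True
                  ".lagda" `isSuffixOf` "Foo.agda"        -- False
                  "a long needle" `isSuffixOf` "hay"      -- False (negative drop count)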




    --
    Andreas Abel  <><      Du bist der geliebte Mensch.

    Department of Computer Science and Engineering
    Chalmers and Gothenburg University, Sweden

    andreas.abel@gu.se
    http://www2.tcs.ifi.lmu.de/~abel/




_______________________________________________
Libraries mailing list
Libraries@haskell.org
http://www.haskell.org/mailman/listinfo/libraries



--
Andreas Abel  <><      Du bist der geliebte Mensch.

Department of Computer Science and Engineering
Chalmers and Gothenburg University, Sweden

andreas.abel@gu.se
http://www2.tcs.ifi.lmu.de/~abel/