Re: Proposal: Make a very slight change to the semantics of Data.List.isSuffixOf

12 Oct 2014


      The haystack and the shared copy are only a needle's-length apart. The
first stage traverses H and N until one of them runs out, holding a copy of
the beginning  of each. This requires at most O(min{|H|, |N|}) additional
memory (the original needle and the needle-length prefix of the haystack).
Assuming we didn't run out of haystack before we ran out of needle (which
assumption seems generally likely), we proceed to the next step, traversing
H and (drop |N| H) together. During this phase we hold O(|N|) additional
memory: the needle itself and the needle-length portion of the longer of
the two haystack remnants we hold. Note that in the second phase, we
*don't* hold on to the beginning of the haystack, because we will never
need it again! Then in the final phase, when the shorter haystack remnant
has run out, we still have the same O(|N|) memory, which is now the needle
itself and the needle-length suffix of the haystack, and (==) traverses
them both together. Thus the total additional memory residency for this
algorithm is O(min {|H|, |N|}). Using the length approach requires that you
hold the beginning of the haystack for traversal while running length. To
put it another way, you force the entire spine of the haystack to run
length, and then start from the beginning. If the haystack is produced
lazily, this potentially requires O(|H|) extra memory. Since you also check
the length of the needle, your total additional memory comes to O(max {|H|,
|N|}). I apologize if my horrid variable names obscured what goes on in the
algorithm I described.

On Sun, Oct 12, 2014 at 10:14 AM, Andreas Abel  wrote:
...
Well, you also traverse the haystack twice, in your getEndChunk function
you simultaneously traverse the haystack and a (shared) copy of it.  Why is
this so much better?
I guess without data (timings, heap-profiles) we do not get further here.
On 11.10.2014 14:47, David Feuer wrote:
...
It can be, but your way traverses the entire haystack *twice*. The
memory needs are equivalent to the reversing version, which I consider
unacceptable.
I do not understand this comment.  reverse needs O(n) memory, length O(1).
Cheers,
Andreas
On Sat, Oct 11, 2014 at 5:57 AM, Andreas Abel 
...
mailto:abela@chalmers.se> wrote:
David, the essence of your idea seems mutually drop the correct
    number of elements from needle and hay and then compare for
    equality.  Here is a concise implementation of your idea in terms of
    drop:
isSuffixOf        :: forall a . (Eq a) => [a] -> [a] -> Bool
    [] `isSuffixOf` _  =  True
    xs `isSuffixOf` ys =  xs == drop (length (drop (length xs) ys)) ys
This can be further simplified to
isSuffixOf        :: forall a . (Eq a) => [a] -> [a] -> Bool
    [] `isSuffixOf` _  =  True
    xs `isSuffixOf` ys =  xs == drop (length ys - length xs) ys
which is a very easy to understand program, I think, without need to
    reverse lists.
_______________________________________________
Libraries mailing list
Libraries@haskell.org
http://www.haskell.org/mailman/listinfo/libraries
--
Andreas Abel  <><      Du bist der geliebte Mensch.
Department of Computer Science and Engineering
Chalmers and Gothenburg University, Sweden
andreas.abel@gu.se
http://www2.tcs.ifi.lmu.de/~abel/