The haystack and the shared copy are only a needle's-length apart.

The first stage traverses H and N together until one of them runs out, holding a copy of the beginning of each. This requires at most O(min{|H|, |N|}) additional memory (the original needle and the needle-length prefix of the haystack).

Assuming we didn't run out of haystack before we ran out of needle (generally the likely case), we proceed to the next stage, traversing H and (drop |N| H) together. During this phase we hold O(|N|) additional memory: the needle itself and the needle-length portion of the longer of the two haystack remnants we hold. Note that in this second phase we *don't* hold on to the beginning of the haystack, because we will never need it again!

In the final phase, when the shorter haystack remnant has run out, we still hold the same O(|N|) memory, now the needle itself and the needle-length suffix of the haystack, and (==) traverses them both together. Thus the total additional memory residency for this algorithm is O(min{|H|, |N|}).

Using the length approach, by contrast, requires that you hold the beginning of the haystack for later traversal while length runs. To put it another way, you force the entire spine of the haystack just to run length, and then start over from the beginning. If the haystack is produced lazily, this potentially requires O(|H|) extra memory. Since you also take the length of the needle, your total additional memory comes to O(max{|H|, |N|}).

I apologize if my horrid variable names obscured what goes on in the algorithm I described.
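To make the phases concrete, here is a sketch in Haskell. The names (`endsWith`, `phase1`, `phase2`) are invented for this sketch, not taken from any library, and the phase structure follows the description above:

```haskell
-- Does haystack end with needle?  A sketch of the three-phase algorithm;
-- no length is ever computed.
endsWith :: Eq a => [a] -> [a] -> Bool
endsWith needle haystack =
  case phase1 needle haystack of
    Nothing   -> False                       -- haystack ran out first: |H| < |N|
    Just rest -> needle == phase2 rest haystack
  where
    -- Phase 1: traverse N and H together.  If the needle runs out first,
    -- hand back the remaining haystack, i.e. (drop |N| H).
    phase1 :: [a] -> [b] -> Maybe [b]
    phase1 []     hs     = Just hs
    phase1 _      []     = Nothing
    phase1 (_:ns) (_:hs) = phase1 ns hs

    -- Phase 2: traverse (drop |N| H) and H together.  When the shorter
    -- remnant runs out, what is left of H is its needle-length suffix,
    -- which the caller compares against the needle with (==).
    phase2 :: [b] -> [c] -> [c]
    phase2 []     hs     = hs
    phase2 _      []     = []                -- unreachable: rest is a suffix of hs
    phase2 (_:rs) (_:hs) = phase2 rs hs
```

Because phase2 consumes the haystack from the front in step with the shorter remnant, nothing else references the haystack's prefix any more, so the garbage collector can reclaim it as the traversal proceeds; that is exactly why the second phase needs only O(|N|) residency.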