One-Off detection. Question about Space/Time complexity

Problem spec from CareerCup https://careercup.com/question?id=5092486548553728. Given two strings, return boolean True/False if they are only one edit apart. Edit can be insert/delete/update of only one character in the string. Eg. -True xyz,xz xyz,xyk xy,xyz -False xyz,xyz xyz,xzy x,xyz module Strings.OneLetter where` import Preludeimport qualified Data.Text as T` oneLetter :: T.Text -> T.Text -> Bool oneLetter s1 s2 | s1 == s2 = False | max_l > (min_l + 1) = False | max_l == min_l = diff_size == 1 | otherwise = diff_size == 0 where length_s1 = T.length s1 length_s2 = T.length s2 max_l = max length_s1 length_s2 min_l = min length_s1 length_s2 diff_size = length $ filter (\(a,b) -> a /= b) zipped zipped = T.zip s1 s2` So, I used Text instead of String, hoping I could take advantage of fusion. I have the following questions and my initial attempt to answer them. What is the time complexity of oneLetter 0(m+n) where m is the length of s1 and n is the length of s2 what is the space complexity of oneLetter? I'm thinking due to laziness it's O(1), only two Chars are in memory at any one time, or two Ints. But I'm hazy on why. If this is wrong, please articulate why. If I'm right, and my reasoning is wrong or incomplete, please say why. I don't think I can improve the time complexity. Am I right? Can the space complexity be improved? What if I changed from Text to String? I don't think the time complexity changes, but how does this change the space complexity?

I think the space must be O(m+n) since you are using strict Text values.
But even if you weren't, you're using s1 and s2 in more than one place so
they will be shared, leading to them both being fully in memory at once.
The minimum possible space complexity could well be O(1) if using String or
lazy Text.
The time complexity looks like O(min(m,n)), because that's the cost of
`zip`/`filter`, not counting the cost of loading the two Texts into memory
in the first place (which will be O(m+n)). If you used String, it'd be
O(m+n) since that's the cost of the two calls to `length`. I think the
minimum possible time complexity will be O(min(m,n)), and that could
include the cost of loading the strings too.
On 22 Mar 2017 18:32, "Michael Litchard"
Problem spec from CareerCup https://careercup.com/question?id=5092486548553728.
Given two strings, return boolean True/False if they are only one edit apart. Edit can be insert/delete/update of only one character in the string. Eg.
-True
xyz,xz
xyz,xyk
xy,xyz
-False
xyz,xyz
xyz,xzy
x,xyz
module Strings.OneLetter where` import Preludeimport qualified Data.Text as T`
oneLetter :: T.Text -> T.Text -> Bool oneLetter s1 s2 | s1 == s2 = False | max_l > (min_l + 1) = False | max_l == min_l = diff_size == 1 | otherwise = diff_size == 0 where length_s1 = T.length s1 length_s2 = T.length s2 max_l = max length_s1 length_s2 min_l = min length_s1 length_s2 diff_size = length $ filter (\(a,b) -> a /= b) zipped zipped = T.zip s1 s2`
So, I used Text instead of String, hoping I could take advantage of fusion. I have the following questions and my initial attempt to answer them.
What is the time complexity of oneLetter 0(m+n) where m is the length of s1 and n is the length of s2
what is the space complexity of oneLetter? I'm thinking due to laziness it's O(1), only two Chars are in memory at any one time, or two Ints. But I'm hazy on why. If this is wrong, please articulate why. If I'm right, and my reasoning is wrong or incomplete, please say why.
I don't think I can improve the time complexity. Am I right? Can the space complexity be improved?
What if I changed from Text to String? I don't think the time complexity changes, but how does this change the space complexity?
_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post

It can be done in one sweep down both strings together - look for the first difference, and then test hypotheses regarding the 5 possible edits oneOff :: String -> String -> Bool oneOff [] [] = False oneOff [] (_:rest) = null rest oneOff (_:rest) [] = null rest oneOff s1@(c1:s1') s2@(c2:s2') | c1 == c2 = oneOff s1' s2' -- keep looking | otherwise = s1' == s2 -- s2 is s1 with c1 deleted or s1 is s2 with c1 inserted || s2' == s1 -- s1 is s2 with c2 deleted or s2 is s1 with c2 inserted || s1' == s2' -- c1 is s1 was replaced by c2 in s2, or v.v. I guess the above should be O(1) in space - it's just a list crawl with tests O(n) in the length of the shortest String Cheers, Andrew
On 22 Mar 2017, at 18:48, David Turner
wrote: I think the space must be O(m+n) since you are using strict Text values. But even if you weren't, you're using s1 and s2 in more than one place so they will be shared, leading to them both being fully in memory at once.
The minimum possible space complexity could well be O(1) if using String or lazy Text.
The time complexity looks like O(min(m,n)), because that's the cost of `zip`/`filter`, not counting the cost of loading the two Texts into memory in the first place (which will be O(m+n)). If you used String, it'd be O(m+n) since that's the cost of the two calls to `length`. I think the minimum possible time complexity will be O(min(m,n)), and that could include the cost of loading the strings too.
On 22 Mar 2017 18:32, "Michael Litchard"
mailto:litchard.michael@gmail.com> wrote: Problem spec from CareerCup https://careercup.com/question?id=5092486548553728. Given two strings, return boolean True/False if they are only one edit apart. Edit can be insert/delete/update of only one character in the string. Eg.
-True
xyz,xz
xyz,xyk
xy,xyz
-False
xyz,xyz
xyz,xzy
x,xyz
module Strings.OneLetter where`
import Prelude import qualified Data.Text as T`
oneLetter :: T.Text -> T.Text -> Bool oneLetter s1 s2 | s1 == s2 = False | max_l > (min_l + 1) = False | max_l == min_l = diff_size == 1 | otherwise = diff_size == 0 where length_s1 = T.length s1 length_s2 = T.length s2 max_l = max length_s1 length_s2 min_l = min length_s1 length_s2 diff_size = length $ filter (\(a,b) -> a /= b) zipped zipped = T.zip s1 s2` So, I used Text instead of String, hoping I could take advantage of fusion. I have the following questions and my initial attempt to answer them.
What is the time complexity of oneLetter 0(m+n) where m is the length of s1 and n is the length of s2
what is the space complexity of oneLetter? I'm thinking due to laziness it's O(1), only two Chars are in memory at any one time, or two Ints. But I'm hazy on why. If this is wrong, please articulate why. If I'm right, and my reasoning is wrong or incomplete, please say why.
I don't think I can improve the time complexity. Am I right? Can the space complexity be improved?
What if I changed from Text to String? I don't think the time complexity changes, but how does this change the space complexity?
_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.
Andrew Butterfield School of Computer Science & Statistics Trinity College Dublin 2, Ireland

I also think it can be done in one sweep, but I think this code is O(max(m,n)-min(m,n)): s1' and s2' each appear twice so will be shared. On 23 March 2017 at 08:52, Andrew Butterfield < Andrew.Butterfield@scss.tcd.ie> wrote:
It can be done in one sweep down both strings together - look for the first difference, and then test hypotheses regarding the 5 possible edits
oneOff :: String -> String -> Bool
oneOff [] [] = False oneOff [] (_:rest) = null rest oneOff (_:rest) [] = null rest oneOff s1@(c1:s1') s2@(c2:s2') | c1 == c2 = oneOff s1' s2' -- keep looking | otherwise = s1' == s2 -- s2 is s1 with c1 deleted or s1 is s2 with c1 inserted || s2' == s1 -- s1 is s2 with c2 deleted or s2 is s1 with c2 inserted || s1' == s2' -- c1 is s1 was replaced by c2 in s2, or v.v.
I guess the above should be O(1) in space - it's just a list crawl with tests O(n) in the length of the shortest String
Cheers, Andrew
On 22 Mar 2017, at 18:48, David Turner
wrote: I think the space must be O(m+n) since you are using strict Text values. But even if you weren't, you're using s1 and s2 in more than one place so they will be shared, leading to them both being fully in memory at once.
The minimum possible space complexity could well be O(1) if using String or lazy Text.
The time complexity looks like O(min(m,n)), because that's the cost of `zip`/`filter`, not counting the cost of loading the two Texts into memory in the first place (which will be O(m+n)). If you used String, it'd be O(m+n) since that's the cost of the two calls to `length`. I think the minimum possible time complexity will be O(min(m,n)), and that could include the cost of loading the strings too.
On 22 Mar 2017 18:32, "Michael Litchard"
wrote: Problem spec from CareerCup https://careercup.com/question?id=5092486548553728.
Given two strings, return boolean True/False if they are only one edit apart. Edit can be insert/delete/update of only one character in the string. Eg.
-True
xyz,xz
xyz,xyk
xy,xyz
-False
xyz,xyz
xyz,xzy
x,xyz
module Strings.OneLetter where` import Preludeimport qualified Data.Text as T`
oneLetter :: T.Text -> T.Text -> Bool oneLetter s1 s2 | s1 == s2 = False | max_l > (min_l + 1) = False | max_l == min_l = diff_size == 1 | otherwise = diff_size == 0 where length_s1 = T.length s1 length_s2 = T.length s2 max_l = max length_s1 length_s2 min_l = min length_s1 length_s2 diff_size = length $ filter (\(a,b) -> a /= b) zipped zipped = T.zip s1 s2`
So, I used Text instead of String, hoping I could take advantage of fusion. I have the following questions and my initial attempt to answer them.
What is the time complexity of oneLetter 0(m+n) where m is the length of s1 and n is the length of s2
what is the space complexity of oneLetter? I'm thinking due to laziness it's O(1), only two Chars are in memory at any one time, or two Ints. But I'm hazy on why. If this is wrong, please articulate why. If I'm right, and my reasoning is wrong or incomplete, please say why.
I don't think I can improve the time complexity. Am I right? Can the space complexity be improved?
What if I changed from Text to String? I don't think the time complexity changes, but how does this change the space complexity?
_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post
_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.
Andrew Butterfield School of Computer Science & Statistics Trinity College Dublin 2, Ireland
_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

Yes - they both might be scanned twice, but O(2n) = O(n) = O(3n), even. The scanniing by the three uses of == only walk over *all* of the longer list if its length is at most that of the shorter Simply put, if s1 is much longer than s2, then those == scans will end when they reach the end of the shorter s2 Or am I missing something?
On 23 Mar 2017, at 15:19, David Turner
wrote: I also think it can be done in one sweep, but I think this code is O(max(m,n)-min(m,n)): s1' and s2' each appear twice so will be shared.
if m == n your timing reduces to O(0) !?
On 23 March 2017 at 08:52, Andrew Butterfield
mailto:Andrew.Butterfield@scss.tcd.ie> wrote: It can be done in one sweep down both strings together - look for the first difference, and then test hypotheses regarding the 5 possible edits oneOff :: String -> String -> Bool
oneOff [] [] = False oneOff [] (_:rest) = null rest oneOff (_:rest) [] = null rest oneOff s1@(c1:s1') s2@(c2:s2') | c1 == c2 = oneOff s1' s2' -- keep looking | otherwise = s1' == s2 -- s2 is s1 with c1 deleted or s1 is s2 with c1 inserted || s2' == s1 -- s1 is s2 with c2 deleted or s2 is s1 with c2 inserted || s1' == s2' -- c1 is s1 was replaced by c2 in s2, or v.v.
I guess the above should be O(1) in space - it's just a list crawl with tests O(n) in the length of the shortest String
Cheers, Andrew
On 22 Mar 2017, at 18:48, David Turner
mailto:dct25-561bs@mythic-beasts.com> wrote: I think the space must be O(m+n) since you are using strict Text values. But even if you weren't, you're using s1 and s2 in more than one place so they will be shared, leading to them both being fully in memory at once.
The minimum possible space complexity could well be O(1) if using String or lazy Text.
The time complexity looks like O(min(m,n)), because that's the cost of `zip`/`filter`, not counting the cost of loading the two Texts into memory in the first place (which will be O(m+n)). If you used String, it'd be O(m+n) since that's the cost of the two calls to `length`. I think the minimum possible time complexity will be O(min(m,n)), and that could include the cost of loading the strings too.
On 22 Mar 2017 18:32, "Michael Litchard"
mailto:litchard.michael@gmail.com> wrote: Problem spec from CareerCup https://careercup.com/question?id=5092486548553728. Given two strings, return boolean True/False if they are only one edit apart. Edit can be insert/delete/update of only one character in the string. Eg.
-True
xyz,xz
xyz,xyk
xy,xyz
-False
xyz,xyz
xyz,xzy
x,xyz
module Strings.OneLetter where`
import Prelude import qualified Data.Text as T`
oneLetter :: T.Text -> T.Text -> Bool oneLetter s1 s2 | s1 == s2 = False | max_l > (min_l + 1) = False | max_l == min_l = diff_size == 1 | otherwise = diff_size == 0 where length_s1 = T.length s1 length_s2 = T.length s2 max_l = max length_s1 length_s2 min_l = min length_s1 length_s2 diff_size = length $ filter (\(a,b) -> a /= b) zipped zipped = T.zip s1 s2` So, I used Text instead of String, hoping I could take advantage of fusion. I have the following questions and my initial attempt to answer them.
What is the time complexity of oneLetter 0(m+n) where m is the length of s1 and n is the length of s2
what is the space complexity of oneLetter? I'm thinking due to laziness it's O(1), only two Chars are in memory at any one time, or two Ints. But I'm hazy on why. If this is wrong, please articulate why. If I'm right, and my reasoning is wrong or incomplete, please say why.
I don't think I can improve the time complexity. Am I right? Can the space complexity be improved?
What if I changed from Text to String? I don't think the time complexity changes, but how does this change the space complexity?
_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.
Andrew Butterfield School of Computer Science & Statistics Trinity College Dublin 2, Ireland
_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.
Andrew Butterfield School of Computer Science & Statistics Trinity College Dublin 2, Ireland

Sorry, I missed the vital word *space*. It looks time-efficient, but
doesn't look O(1) in space.
On 23 Mar 2017 16:12, "Andrew Butterfield"
It can be done in one sweep down both strings together - look for the first difference, and then test hypotheses regarding the 5 possible edits
oneOff :: String -> String -> Bool
oneOff [] [] = False oneOff [] (_:rest) = null rest oneOff (_:rest) [] = null rest oneOff s1@(c1:s1') s2@(c2:s2') | c1 == c2 = oneOff s1' s2' -- keep looking | otherwise = s1' == s2 -- s2 is s1 with c1 deleted or s1 is s2 with c1 inserted || s2' == s1 -- s1 is s2 with c2 deleted or s2 is s1 with c2 inserted || s1' == s2' -- c1 is s1 was replaced by c2 in s2, or v.v.
I guess the above should be O(1) in space - it's just a list crawl with tests O(n) in the length of the shortest String
Cheers, Andrew
On 22 Mar 2017, at 18:48, David Turner
wrote: I think the space must be O(m+n) since you are using strict Text values. But even if you weren't, you're using s1 and s2 in more than one place so they will be shared, leading to them both being fully in memory at once.
The minimum possible space complexity could well be O(1) if using String or lazy Text.
The time complexity looks like O(min(m,n)), because that's the cost of `zip`/`filter`, not counting the cost of loading the two Texts into memory in the first place (which will be O(m+n)). If you used String, it'd be O(m+n) since that's the cost of the two calls to `length`. I think the minimum possible time complexity will be O(min(m,n)), and that could include the cost of loading the strings too.
On 22 Mar 2017 18:32, "Michael Litchard"
wrote: Problem spec from CareerCup https://careercup.com/question?id=5092486548553728.
Given two strings, return boolean True/False if they are only one edit apart. Edit can be insert/delete/update of only one character in the string. Eg.
-True
xyz,xz
xyz,xyk
xy,xyz
-False
xyz,xyz
xyz,xzy
x,xyz
module Strings.OneLetter where` import Preludeimport qualified Data.Text as T`
oneLetter :: T.Text -> T.Text -> Bool oneLetter s1 s2 | s1 == s2 = False | max_l > (min_l + 1) = False | max_l == min_l = diff_size == 1 | otherwise = diff_size == 0 where length_s1 = T.length s1 length_s2 = T.length s2 max_l = max length_s1 length_s2 min_l = min length_s1 length_s2 diff_size = length $ filter (\(a,b) -> a /= b) zipped zipped = T.zip s1 s2`
So, I used Text instead of String, hoping I could take advantage of fusion. I have the following questions and my initial attempt to answer them.
What is the time complexity of oneLetter 0(m+n) where m is the length of s1 and n is the length of s2
what is the space complexity of oneLetter? I'm thinking due to laziness it's O(1), only two Chars are in memory at any one time, or two Ints. But I'm hazy on why. If this is wrong, please articulate why. If I'm right, and my reasoning is wrong or incomplete, please say why.
I don't think I can improve the time complexity. Am I right? Can the space complexity be improved?
What if I changed from Text to String? I don't think the time complexity changes, but how does this change the space complexity?
_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post
_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.
Andrew Butterfield School of Computer Science & Statistics Trinity College Dublin 2, Ireland
_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.
Andrew Butterfield School of Computer Science & Statistics Trinity College Dublin 2, Ireland _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

On 23/03/2017, at 9:52 PM, Andrew Butterfield
wrote: It can be done in one sweep down both strings together - look for the first difference, and then test hypotheses regarding the 5 possible edits
oneOff :: String -> String -> Bool
oneOff [] [] = False oneOff [] (_:rest) = null rest oneOff (_:rest) [] = null rest oneOff s1@(c1:s1') s2@(c2:s2') | c1 == c2 = oneOff s1' s2' -- keep looking | otherwise = s1' == s2 -- s2 is s1 with c1 deleted or s1 is s2 with c1 inserted || s2' == s1 -- s1 is s2 with c2 deleted or s2 is s1 with c2 inserted || s1' == s2' -- c1 is s1 was replaced by c2 in s2, or v.v.
I guess the above should be O(1) in space
The question is what "O(1) in space" means here. If you discount the storage required for the strings, fine. If however you think in terms of lazy lists of characters, the fact that this code will look at the tails of the lists up to 3 times means that those tails will be stored, so we are talking about O(n) space. I believe it is possible to walk down s1 s1' s2 s2' in a single pass, maintaining "streaming". Whether it is _worth_ doing that is another matter.
participants (4)
-
Andrew Butterfield
-
David Turner
-
Michael Litchard
-
Richard A. O'Keefe