One-Off detection. Question about Space/Time complexity

newer
Why does "enumFromThenTo x x y"...

Michael Litchard

22 Mar 2017 22 Mar '17

6:31 p.m.

Problem spec from CareerCup https://careercup.com/question?id=5092486548553728. Given two strings, return boolean True/False if they are only one edit apart. Edit can be insert/delete/update of only one character in the string. Eg. -True xyz,xz xyz,xyk xy,xyz -False xyz,xyz xyz,xzy x,xyz module Strings.OneLetter where` import Preludeimport qualified Data.Text as T` oneLetter :: T.Text -> T.Text -> Bool oneLetter s1 s2 | s1 == s2 = False | max_l > (min_l + 1) = False | max_l == min_l = diff_size == 1 | otherwise = diff_size == 0 where length_s1 = T.length s1 length_s2 = T.length s2 max_l = max length_s1 length_s2 min_l = min length_s1 length_s2 diff_size = length $ filter (\(a,b) -> a /= b) zipped zipped = T.zip s1 s2` So, I used Text instead of String, hoping I could take advantage of fusion. I have the following questions and my initial attempt to answer them. What is the time complexity of oneLetter 0(m+n) where m is the length of s1 and n is the length of s2 what is the space complexity of oneLetter? I'm thinking due to laziness it's O(1), only two Chars are in memory at any one time, or two Ints. But I'm hazy on why. If this is wrong, please articulate why. If I'm right, and my reasoning is wrong or incomplete, please say why. I don't think I can improve the time complexity. Am I right? Can the space complexity be improved? What if I changed from Text to String? I don't think the time complexity changes, but how does this change the space complexity?

Attachments:

attachment.html (text/html — 4.5 KB)

Show replies by date

David Turner

22 Mar 22 Mar

6:48 p.m.

I think the space must be O(m+n) since you are using strict Text values. But even if you weren't, you're using s1 and s2 in more than one place so they will be shared, leading to them both being fully in memory at once. The minimum possible space complexity could well be O(1) if using String or lazy Text. The time complexity looks like O(min(m,n)), because that's the cost of `zip`/`filter`, not counting the cost of loading the two Texts into memory in the first place (which will be O(m+n)). If you used String, it'd be O(m+n) since that's the cost of the two calls to `length`. I think the minimum possible time complexity will be O(min(m,n)), and that could include the cost of loading the strings too. On 22 Mar 2017 18:32, "Michael Litchard" wrote:

...

Problem spec from CareerCup https://careercup.com/question?id=5092486548553728.

Given two strings, return boolean True/False if they are only one edit apart. Edit can be insert/delete/update of only one character in the string. Eg.

-True

xyz,xz

xyz,xyk

xy,xyz

-False

xyz,xyz

xyz,xzy

x,xyz

module Strings.OneLetter where` import Preludeimport qualified Data.Text as T`

oneLetter :: T.Text -> T.Text -> Bool oneLetter s1 s2 | s1 == s2 = False | max_l > (min_l + 1) = False | max_l == min_l = diff_size == 1 | otherwise = diff_size == 0 where length_s1 = T.length s1 length_s2 = T.length s2 max_l = max length_s1 length_s2 min_l = min length_s1 length_s2 diff_size = length $ filter (\(a,b) -> a /= b) zipped zipped = T.zip s1 s2`

So, I used Text instead of String, hoping I could take advantage of fusion. I have the following questions and my initial attempt to answer them.

What is the time complexity of oneLetter 0(m+n) where m is the length of s1 and n is the length of s2

what is the space complexity of oneLetter? I'm thinking due to laziness it's O(1), only two Chars are in memory at any one time, or two Ints. But I'm hazy on why. If this is wrong, please articulate why. If I'm right, and my reasoning is wrong or incomplete, please say why.

I don't think I can improve the time complexity. Am I right? Can the space complexity be improved?

What if I changed from Text to String? I don't think the time complexity changes, but how does this change the space complexity?

_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post

Andrew Butterfield

23 Mar 23 Mar

8:52 a.m.

New subject: One-Off detection. Question about Space/Time complexity

It can be done in one sweep down both strings together - look for the first difference, and then test hypotheses regarding the 5 possible edits oneOff :: String -> String -> Bool oneOff [] [] = False oneOff [] (_:rest) = null rest oneOff (_:rest) [] = null rest oneOff s1@(c1:s1') s2@(c2:s2') | c1 == c2 = oneOff s1' s2' -- keep looking | otherwise = s1' == s2 -- s2 is s1 with c1 deleted or s1 is s2 with c1 inserted || s2' == s1 -- s1 is s2 with c2 deleted or s2 is s1 with c2 inserted || s1' == s2' -- c1 is s1 was replaced by c2 in s2, or v.v. I guess the above should be O(1) in space - it's just a list crawl with tests O(n) in the length of the shortest String Cheers, Andrew

...

On 22 Mar 2017, at 18:48, David Turner wrote:

I think the space must be O(m+n) since you are using strict Text values. But even if you weren't, you're using s1 and s2 in more than one place so they will be shared, leading to them both being fully in memory at once.

The minimum possible space complexity could well be O(1) if using String or lazy Text.

...

The time complexity looks like O(min(m,n)), because that's the cost of `zip`/`filter`, not counting the cost of loading the two Texts into memory in the first place (which will be O(m+n)). If you used String, it'd be O(m+n) since that's the cost of the two calls to `length`. I think the minimum possible time complexity will be O(min(m,n)), and that could include the cost of loading the strings too.

On 22 Mar 2017 18:32, "Michael Litchard" mailto:litchard.michael@gmail.com> wrote: Problem spec from CareerCup https://careercup.com/question?id=5092486548553728.

Given two strings, return boolean True/False if they are only one edit apart. Edit can be insert/delete/update of only one character in the string. Eg.

-True

xyz,xz

xyz,xyk

xy,xyz

-False

xyz,xyz

xyz,xzy

x,xyz

module Strings.OneLetter where`

import Prelude import qualified Data.Text as T`

oneLetter :: T.Text -> T.Text -> Bool oneLetter s1 s2 | s1 == s2 = False | max_l > (min_l + 1) = False | max_l == min_l = diff_size == 1 | otherwise = diff_size == 0 where length_s1 = T.length s1 length_s2 = T.length s2 max_l = max length_s1 length_s2 min_l = min length_s1 length_s2 diff_size = length $ filter (\(a,b) -> a /= b) zipped zipped = T.zip s1 s2` So, I used Text instead of String, hoping I could take advantage of fusion. I have the following questions and my initial attempt to answer them.

What is the time complexity of oneLetter 0(m+n) where m is the length of s1 and n is the length of s2

what is the space complexity of oneLetter? I'm thinking due to laziness it's O(1), only two Chars are in memory at any one time, or two Ints. But I'm hazy on why. If this is wrong, please articulate why. If I'm right, and my reasoning is wrong or incomplete, please say why.

I don't think I can improve the time complexity. Am I right? Can the space complexity be improved?

What if I changed from Text to String? I don't think the time complexity changes, but how does this change the space complexity?

_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

Andrew Butterfield School of Computer Science & Statistics Trinity College Dublin 2, Ireland

David Turner

3:19 p.m.

I also think it can be done in one sweep, but I think this code is O(max(m,n)-min(m,n)): s1' and s2' each appear twice so will be shared. On 23 March 2017 at 08:52, Andrew Butterfield < Andrew.Butterfield@scss.tcd.ie> wrote:

...

It can be done in one sweep down both strings together - look for the first difference, and then test hypotheses regarding the 5 possible edits

oneOff :: String -> String -> Bool

oneOff [] [] = False oneOff [] (_:rest) = null rest oneOff (_:rest) [] = null rest oneOff s1@(c1:s1') s2@(c2:s2') | c1 == c2 = oneOff s1' s2' -- keep looking | otherwise = s1' == s2 -- s2 is s1 with c1 deleted or s1 is s2 with c1 inserted || s2' == s1 -- s1 is s2 with c2 deleted or s2 is s1 with c2 inserted || s1' == s2' -- c1 is s1 was replaced by c2 in s2, or v.v.

I guess the above should be O(1) in space - it's just a list crawl with tests O(n) in the length of the shortest String

Cheers, Andrew

On 22 Mar 2017, at 18:48, David Turner wrote:

I think the space must be O(m+n) since you are using strict Text values. But even if you weren't, you're using s1 and s2 in more than one place so they will be shared, leading to them both being fully in memory at once.

The minimum possible space complexity could well be O(1) if using String or lazy Text.

The time complexity looks like O(min(m,n)), because that's the cost of `zip`/`filter`, not counting the cost of loading the two Texts into memory in the first place (which will be O(m+n)). If you used String, it'd be O(m+n) since that's the cost of the two calls to `length`. I think the minimum possible time complexity will be O(min(m,n)), and that could include the cost of loading the strings too.

On 22 Mar 2017 18:32, "Michael Litchard" wrote:

...
Problem spec from CareerCup https://careercup.com/question?id=5092486548553728.

Given two strings, return boolean True/False if they are only one edit apart. Edit can be insert/delete/update of only one character in the string. Eg.

-True

xyz,xz

xyz,xyk

xy,xyz

-False

xyz,xyz

xyz,xzy

x,xyz

module Strings.OneLetter where` import Preludeimport qualified Data.Text as T`

oneLetter :: T.Text -> T.Text -> Bool oneLetter s1 s2 | s1 == s2 = False | max_l > (min_l + 1) = False | max_l == min_l = diff_size == 1 | otherwise = diff_size == 0 where length_s1 = T.length s1 length_s2 = T.length s2 max_l = max length_s1 length_s2 min_l = min length_s1 length_s2 diff_size = length $ filter (\(a,b) -> a /= b) zipped zipped = T.zip s1 s2`

So, I used Text instead of String, hoping I could take advantage of fusion. I have the following questions and my initial attempt to answer them.

What is the time complexity of oneLetter 0(m+n) where m is the length of s1 and n is the length of s2

what is the space complexity of oneLetter? I'm thinking due to laziness it's O(1), only two Chars are in memory at any one time, or two Ints. But I'm hazy on why. If this is wrong, please articulate why. If I'm right, and my reasoning is wrong or incomplete, please say why.

I don't think I can improve the time complexity. Am I right? Can the space complexity be improved?

What if I changed from Text to String? I don't think the time complexity changes, but how does this change the space complexity?

_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post

_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

Andrew Butterfield School of Computer Science & Statistics Trinity College Dublin 2, Ireland

_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

Andrew Butterfield

4:10 p.m.

New subject: One-Off detection. Question about Space/Time complexity

Yes - they both might be scanned twice, but O(2n) = O(n) = O(3n), even. The scanniing by the three uses of == only walk over *all* of the longer list if its length is at most that of the shorter Simply put, if s1 is much longer than s2, then those == scans will end when they reach the end of the shorter s2 Or am I missing something?

...

On 23 Mar 2017, at 15:19, David Turner wrote:

I also think it can be done in one sweep, but I think this code is O(max(m,n)-min(m,n)): s1' and s2' each appear twice so will be shared.

if m == n your timing reduces to O(0) !?

...

On 23 March 2017 at 08:52, Andrew Butterfield mailto:Andrew.Butterfield@scss.tcd.ie> wrote: It can be done in one sweep down both strings together - look for the first difference, and then test hypotheses regarding the 5 possible edits

oneOff :: String -> String -> Bool

oneOff [] [] = False oneOff [] (_:rest) = null rest oneOff (_:rest) [] = null rest oneOff s1@(c1:s1') s2@(c2:s2') | c1 == c2 = oneOff s1' s2' -- keep looking | otherwise = s1' == s2 -- s2 is s1 with c1 deleted or s1 is s2 with c1 inserted || s2' == s1 -- s1 is s2 with c2 deleted or s2 is s1 with c2 inserted || s1' == s2' -- c1 is s1 was replaced by c2 in s2, or v.v.

I guess the above should be O(1) in space - it's just a list crawl with tests O(n) in the length of the shortest String

Cheers, Andrew

...
On 22 Mar 2017, at 18:48, David Turner mailto:dct25-561bs@mythic-beasts.com> wrote:

I think the space must be O(m+n) since you are using strict Text values. But even if you weren't, you're using s1 and s2 in more than one place so they will be shared, leading to them both being fully in memory at once.

The minimum possible space complexity could well be O(1) if using String or lazy Text.

...
The time complexity looks like O(min(m,n)), because that's the cost of `zip`/`filter`, not counting the cost of loading the two Texts into memory in the first place (which will be O(m+n)). If you used String, it'd be O(m+n) since that's the cost of the two calls to `length`. I think the minimum possible time complexity will be O(min(m,n)), and that could include the cost of loading the strings too.

On 22 Mar 2017 18:32, "Michael Litchard" mailto:litchard.michael@gmail.com> wrote: Problem spec from CareerCup https://careercup.com/question?id=5092486548553728.

Given two strings, return boolean True/False if they are only one edit apart. Edit can be insert/delete/update of only one character in the string. Eg.

-True

xyz,xz

xyz,xyk

xy,xyz

-False

xyz,xyz

xyz,xzy

x,xyz

module Strings.OneLetter where`

import Prelude import qualified Data.Text as T`

oneLetter :: T.Text -> T.Text -> Bool oneLetter s1 s2 | s1 == s2 = False | max_l > (min_l + 1) = False | max_l == min_l = diff_size == 1 | otherwise = diff_size == 0 where length_s1 = T.length s1 length_s2 = T.length s2 max_l = max length_s1 length_s2 min_l = min length_s1 length_s2 diff_size = length $ filter (\(a,b) -> a /= b) zipped zipped = T.zip s1 s2` So, I used Text instead of String, hoping I could take advantage of fusion. I have the following questions and my initial attempt to answer them.

What is the time complexity of oneLetter 0(m+n) where m is the length of s1 and n is the length of s2

what is the space complexity of oneLetter? I'm thinking due to laziness it's O(1), only two Chars are in memory at any one time, or two Ints. But I'm hazy on why. If this is wrong, please articulate why. If I'm right, and my reasoning is wrong or incomplete, please say why.

I don't think I can improve the time complexity. Am I right? Can the space complexity be improved?

What if I changed from Text to String? I don't think the time complexity changes, but how does this change the space complexity?

_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

Andrew Butterfield School of Computer Science & Statistics Trinity College Dublin 2, Ireland

_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

Andrew Butterfield School of Computer Science & Statistics Trinity College Dublin 2, Ireland

David Turner

6:15 p.m.

Sorry, I missed the vital word *space*. It looks time-efficient, but doesn't look O(1) in space. On 23 Mar 2017 16:12, "Andrew Butterfield" wrote: Yes - they both might be scanned twice, but O(2n) = O(n) = O(3n), even. The scanniing by the three uses of == only walk over *all* of the longer list if its length is at most that of the shorter Simply put, if s1 is much longer than s2, then those == scans will end when they reach the end of the shorter s2 Or am I missing something? On 23 Mar 2017, at 15:19, David Turner wrote: I also think it can be done in one sweep, but I think this code is O(max(m,n)-min(m,n)): s1' and s2' each appear twice so will be shared. if m == n your timing reduces to O(0) !? On 23 March 2017 at 08:52, Andrew Butterfield wrote:

...

It can be done in one sweep down both strings together - look for the first difference, and then test hypotheses regarding the 5 possible edits

oneOff :: String -> String -> Bool

oneOff [] [] = False oneOff [] (_:rest) = null rest oneOff (_:rest) [] = null rest oneOff s1@(c1:s1') s2@(c2:s2') | c1 == c2 = oneOff s1' s2' -- keep looking | otherwise = s1' == s2 -- s2 is s1 with c1 deleted or s1 is s2 with c1 inserted || s2' == s1 -- s1 is s2 with c2 deleted or s2 is s1 with c2 inserted || s1' == s2' -- c1 is s1 was replaced by c2 in s2, or v.v.

I guess the above should be O(1) in space - it's just a list crawl with tests O(n) in the length of the shortest String

Cheers, Andrew

On 22 Mar 2017, at 18:48, David Turner wrote:

I think the space must be O(m+n) since you are using strict Text values. But even if you weren't, you're using s1 and s2 in more than one place so they will be shared, leading to them both being fully in memory at once.

The minimum possible space complexity could well be O(1) if using String or lazy Text.

The time complexity looks like O(min(m,n)), because that's the cost of `zip`/`filter`, not counting the cost of loading the two Texts into memory in the first place (which will be O(m+n)). If you used String, it'd be O(m+n) since that's the cost of the two calls to `length`. I think the minimum possible time complexity will be O(min(m,n)), and that could include the cost of loading the strings too.

On 22 Mar 2017 18:32, "Michael Litchard" wrote:

...
Problem spec from CareerCup https://careercup.com/question?id=5092486548553728.

Given two strings, return boolean True/False if they are only one edit apart. Edit can be insert/delete/update of only one character in the string. Eg.

-True

xyz,xz

xyz,xyk

xy,xyz

-False

xyz,xyz

xyz,xzy

x,xyz

module Strings.OneLetter where` import Preludeimport qualified Data.Text as T`

oneLetter :: T.Text -> T.Text -> Bool oneLetter s1 s2 | s1 == s2 = False | max_l > (min_l + 1) = False | max_l == min_l = diff_size == 1 | otherwise = diff_size == 0 where length_s1 = T.length s1 length_s2 = T.length s2 max_l = max length_s1 length_s2 min_l = min length_s1 length_s2 diff_size = length $ filter (\(a,b) -> a /= b) zipped zipped = T.zip s1 s2`

So, I used Text instead of String, hoping I could take advantage of fusion. I have the following questions and my initial attempt to answer them.

What is the time complexity of oneLetter 0(m+n) where m is the length of s1 and n is the length of s2

what is the space complexity of oneLetter? I'm thinking due to laziness it's O(1), only two Chars are in memory at any one time, or two Ints. But I'm hazy on why. If this is wrong, please articulate why. If I'm right, and my reasoning is wrong or incomplete, please say why.

I don't think I can improve the time complexity. Am I right? Can the space complexity be improved?

What if I changed from Text to String? I don't think the time complexity changes, but how does this change the space complexity?

_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post

_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

Andrew Butterfield School of Computer Science & Statistics Trinity College Dublin 2, Ireland

_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

Andrew Butterfield School of Computer Science & Statistics Trinity College Dublin 2, Ireland _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

Richard A. O'Keefe

11:21 p.m.

...

On 23/03/2017, at 9:52 PM, Andrew Butterfield wrote:

It can be done in one sweep down both strings together - look for the first difference, and then test hypotheses regarding the 5 possible edits

oneOff :: String -> String -> Bool

oneOff [] [] = False oneOff [] (_:rest) = null rest oneOff (_:rest) [] = null rest oneOff s1@(c1:s1') s2@(c2:s2') | c1 == c2 = oneOff s1' s2' -- keep looking | otherwise = s1' == s2 -- s2 is s1 with c1 deleted or s1 is s2 with c1 inserted || s2' == s1 -- s1 is s2 with c2 deleted or s2 is s1 with c2 inserted || s1' == s2' -- c1 is s1 was replaced by c2 in s2, or v.v.

I guess the above should be O(1) in space

The question is what "O(1) in space" means here. If you discount the storage required for the strings, fine. If however you think in terms of lazy lists of characters, the fact that this code will look at the tails of the lists up to 3 times means that those tails will be stored, so we are talking about O(n) space. I believe it is possible to walk down s1 s1' s2 s2' in a single pass, maintaining "streaming". Whether it is _worth_ doing that is another matter.

3025

Age (days ago)

3026

Last active (days ago)

List overview

Download

6 comments

4 participants

participants (4)

Andrew Butterfield
David Turner
Michael Litchard
Richard A. O'Keefe