Function to detect duplicates

older
ANNOUNCE: wxHaskell for ghc-6.12.1

Rafael Gustavo da Cunha Pereira Pinto

23 Feb 2010 23 Feb '10

11:30 a.m.

Hi folks, While solving a puzzle, I was posed the problem of finding if there was no duplicates on a list. First I used: noneRepeated=null.(filter (>1)).(map length).group.sort But this seemed very unneficient, so I thought that I could detect the duplicates while sorting, and devised this: import Control.Monad import Data.Maybe noneRepeated=isNothing . (foldl merge (Just [])) . (map sort) . pairs pairs []=[] pairs [x]=[[x]] pairs (x:y:xs)=[x,y]:pairs xs sort []=Just [] sort [x]=Just [x] sort [x,y] | x>y=Just [y,x] | y>x=Just [x,y] | x==y=Nothing merge::(Eq a, Ord a) => Maybe [a]->Maybe [a]->Maybe[a] merge _ Nothing = Nothing merge Nothing _ = Nothing merge (Just []) (Just xs)=Just xs merge (Just xs) (Just [])=Just xs merge (Just (x:xs)) (Just (y:ys)) | x==y = Nothing | x>y = (Just y) +? (merge (Just (x:xs)) (Just ys)) | x

Attachments:

attachment.html (text/html — 1.5 KB)

Show replies by date

Jonas Almström Duregård

23 Feb 23 Feb

11:57 a.m.

Hi Rafael, I assume you will perform this operation on some very large lists, or performance would not be an issue. Have you tested if your optimized version is better than your initial one? You should compare your implementation against something like this: import qualified Data.Set as Set noneRepeated :: (Ord a) => [a] -> Bool noneRepeated = accum Set.empty where accum _ [] = True accum s (x:xs) | Set.member x s = False | otherwise = accum (Set.insert x s) xs Also there is some discussion about the nub function that relates to this topic, e.g. http://buffered.io/2008/07/28/a-better-nub/. /Jonas On 23 February 2010 12:30, Rafael Gustavo da Cunha Pereira Pinto wrote:

...

Hi folks,

While solving a puzzle, I was posed the problem of finding if there was no duplicates on a list.

First I used:

noneRepeated=null.(filter (>1)).(map length).group.sort

But this seemed very unneficient, so I thought that I could detect the duplicates while sorting, and devised this:

import Control.Monad import Data.Maybe

noneRepeated=isNothing . (foldl merge (Just [])) . (map sort) . pairs

pairs []=[] pairs [x]=[[x]] pairs (x:y:xs)=[x,y]:pairs xs

sort []=Just [] sort [x]=Just [x] sort [x,y] | x>y=Just [y,x] | y>x=Just [x,y] | x==y=Nothing

merge::(Eq a, Ord a) => Maybe [a]->Maybe [a]->Maybe[a] merge _ Nothing = Nothing merge Nothing _ = Nothing merge (Just []) (Just xs)=Just xs merge (Just xs) (Just [])=Just xs merge (Just (x:xs)) (Just (y:ys)) | x==y = Nothing | x>y = (Just y) +? (merge (Just (x:xs)) (Just ys)) | x
(+?) = liftM2 (:)

My version of the merge sort returns Nothing whenever it finds two equal entries, aborting all subsequent comparisons.

I have a few questions for the friendly people at this cafe:

1) Is there any improvement I can make? 2) Can it be parallelized (par, pseq)?

Best regards,

Rafael

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Gwern Branwen

24 Feb 24 Feb

8:25 p.m.

2010/2/23 Jonas Almström Duregård :

...

Hi Rafael,

I assume you will perform this operation on some very large lists, or performance would not be an issue. Have you tested if your optimized version is better than your initial one?

You should compare your implementation against something like this:

import qualified Data.Set as Set noneRepeated :: (Ord a) => [a] -> Bool noneRepeated = accum Set.empty where accum _ [] = True accum s (x:xs) | Set.member x s = False | otherwise = accum (Set.insert x s) xs

Also there is some discussion about the nub function that relates to this topic, e.g. http://buffered.io/2008/07/28/a-better-nub/.

/Jonas

Or better yet, http://www.haskell.org/pipermail/libraries/2008-October/010778.html Much more thorough and practical w/r/t to actually getting faster nubs in the libraries. -- gwern

Daniel Fischer

9:13 p.m.

Am Mittwoch 24 Februar 2010 21:25:04 schrieb Gwern Branwen:

...

2010/2/23 Jonas Almström Duregård :

...
Hi Rafael,

I assume you will perform this operation on some very large lists, or performance would not be an issue. Have you tested if your optimized version is better than your initial one?

You should compare your implementation against something like this:

import qualified Data.Set as Set noneRepeated :: (Ord a) => [a] -> Bool noneRepeated = accum Set.empty where accum _ [] = True accum s (x:xs) | Set.member x s = False | otherwise = accum (Set.insert x s) xs

Also there is some discussion about the nub function that relates to this topic, e.g. http://buffered.io/2008/07/28/a-better-nub/.

/Jonas

Or better yet, http://www.haskell.org/pipermail/libraries/2008-October/010778.html Much more thorough and practical w/r/t to actually getting faster nubs in the libraries.

Umm, using the nubOrd' code to nub a 1 million long list of pseudo random numbers takes (here) about 2.5 times the time and twice space as the Set- based ordNub. It does slightly better for 100,000 elements, but still takes more than twice the time (and 1.6 x the space). In my book, that's a compelling reason to go with the set-based implementation - unless we're talking about code to include directly in Data.List, but then I'd still _use_ the set-based one.

Ertugrul Soeylemez

23 Feb 23 Feb

12:03 p.m.

Rafael Gustavo da Cunha Pereira Pinto wrote:

...

While solving a puzzle, I was posed the problem of finding if there was no duplicates on a list.

First I used:

noneRepeated=null.(filter (>1)).(map length).group.sort

But this seemed very unneficient, so I thought that I could detect the duplicates while sorting, and devised this:

import Control.Monad import Data.Maybe

noneRepeated=isNothing . (foldl merge (Just [])) . (map sort) . pairs

import Data.List noneRepeated xs = xs == nub xs Greets Ertugrul -- nightmare = unsafePerformIO (getWrongWife >>= sex) http://blog.ertes.de/

Daniel Fischer

12:28 p.m.

Am Dienstag 23 Februar 2010 13:03:45 schrieb Ertugrul Soeylemez:

...

Rafael Gustavo da Cunha Pereira Pinto wrote:

...
While solving a puzzle, I was posed the problem of finding if there was no duplicates on a list.

First I used:

noneRepeated=null.(filter (>1)).(map length).group.sort

But this seemed very unneficient, so I thought that I could detect the duplicates while sorting, and devised this:

import Control.Monad import Data.Maybe

noneRepeated=isNothing . (foldl merge (Just [])) . (map sort) . pairs

import Data.List

noneRepeated xs = xs == nub xs

Talk about inefficiency :) import Data.Set (Set) import qualified Data.Set as Set noneRepeated = go 0 Set.empty where go ct st (x:xs) | Set.size st < ct = False | otherwise = go (ct+1) (Set.insert x st) xs go ct st [] = ct == Set.size st

...

Greets Ertugrul

Jonas Almström Duregård

12:31 p.m.

Ertugrul: while your solution is minimalistic, Rafael deemed his ~n*log n implementation too inefficient. Thus your ~n^3 implementation is hardly an improvement... /Jonas On 23 February 2010 13:03, Ertugrul Soeylemez wrote:

...

Rafael Gustavo da Cunha Pereira Pinto wrote:

...
While solving a puzzle, I was posed the problem of finding if there was no duplicates on a list.

First I used:

noneRepeated=null.(filter (>1)).(map length).group.sort

But this seemed very unneficient, so I thought that I could detect the duplicates while sorting, and devised this:

import Control.Monad import Data.Maybe

noneRepeated=isNothing . (foldl merge (Just [])) . (map sort) . pairs

import Data.List

noneRepeated xs = xs == nub xs

Greets Ertugrul

-- nightmare = unsafePerformIO (getWrongWife >>= sex) http://blog.ertes.de/

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Ertugrul Soeylemez

12:59 p.m.

Jonas Almström Duregård wrote:

...

Ertugrul: while your solution is minimalistic, Rafael deemed his ~n*log n implementation too inefficient. Thus your ~n^3 implementation is hardly an improvement...

My variant has an advantage, though. It is completely lazy, so it will take a shortcut, as soon as a duplicate is found. Depending on his application, this may be useful or not. I think the nub-based solution is the best one in general, but it's the base library implementation of nub, which is unfortunate. In fact, with a better nub implementation, this becomes an O(n * log n) time algorithm, too, but with the additional laziness advantage. The article you linked to contains such an implementation, I think. Greets Ertugrul -- nightmare = unsafePerformIO (getWrongWife >>= sex) http://blog.ertes.de/

Daniel Fischer

1:09 p.m.

Am Dienstag 23 Februar 2010 13:59:49 schrieb Ertugrul Soeylemez:

...

Jonas Almström Duregård wrote:

...
Ertugrul: while your solution is minimalistic, Rafael deemed his ~n*log n implementation too inefficient. Thus your ~n^3 implementation is hardly an improvement...

Not quite as bad, nub is O(n^2).

...

My variant has an advantage, though. It is completely lazy, so it will take a shortcut, as soon as a duplicate is found. Depending on his application, this may be useful or not.

I think the nub-based solution is the best one in general, but it's the base library implementation of nub, which is unfortunate. In fact, with a better nub implementation, this becomes an O(n * log n) time

How can you nub in O(n*log n)? Remember, you only have Eq for nub.

...

algorithm, too, but with the additional laziness advantage. The article you linked to contains such an implementation, I think.

Greets Ertugrul

Jonas Almström Duregård

1:54 p.m.

...

...
noneRepeated xs = xs == nub xs

...

Not quite as bad, nub is O(n^2)

You are correct of course. Still, it will probably be a bit less inefficient if the length of the lists are compared (as opposed to the elements): noneRepeated xs = length xs == length (nub xs) On 23 February 2010 14:09, Daniel Fischer wrote:

...

Am Dienstag 23 Februar 2010 13:59:49 schrieb Ertugrul Soeylemez:

...
Jonas Almström Duregård wrote:

...
Ertugrul: while your solution is minimalistic, Rafael deemed his ~n*log n implementation too inefficient. Thus your ~n^3 implementation is hardly an improvement...

Not quite as bad, nub is O(n^2).

...
My variant has an advantage, though. It is completely lazy, so it will take a shortcut, as soon as a duplicate is found. Depending on his application, this may be useful or not.

I think the nub-based solution is the best one in general, but it's the base library implementation of nub, which is unfortunate. In fact, with a better nub implementation, this becomes an O(n * log n) time

How can you nub in O(n*log n)? Remember, you only have Eq for nub.

...
algorithm, too, but with the additional laziness advantage. The article you linked to contains such an implementation, I think.

Greets Ertugrul

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Daniel Fischer

2:08 p.m.

Am Dienstag 23 Februar 2010 14:54:36 schrieb Jonas Almström Duregård:

...

You are correct of course. Still, it will probably be a bit less inefficient if the length of the lists are compared (as opposed to the elements):

noneRepeated xs = length xs == length (nub xs)

Only if no repeated elements appear early. For xs = 1 : [1 .. 10^7], xs == nub xs will return False without noticeable delay, length xs == length (nub xs) will take VERY long.

Ertugrul Soeylemez

24 Feb 24 Feb

1:25 p.m.

Jonas Almström Duregård wrote:

...

...
...
noneRepeated xs = xs == nub xs

...
Not quite as bad, nub is O(n^2)

You are correct of course. Still, it will probably be a bit less inefficient if the length of the lists are compared (as opposed to the elements):

noneRepeated xs = length xs == length (nub xs)

[...]

...
How can you nub in O(n*log n)? Remember, you only have Eq for nub.

Again note that the big advantage of my method is laziness. The comparison will end on the first duplicate found. Using the following nub implementation the overall time complexity should be O(n * log n), but may be space-intensive, because it uses O(n) space. Also note that it has a different context (the type needs to be Ord instead of Eq): import qualified Data.Set as S import Data.List myNub :: Ord a => [a] -> [a] myNub = concat . snd . mapAccumL nubMap S.empty where nubMap s x | S.member x s = (s, []) | otherwise = (S.insert x s, [x]) Greets Ertugrul -- nightmare = unsafePerformIO (getWrongWife >>= sex) http://blog.ertes.de/

Daniel Fischer

3:59 p.m.

Am Mittwoch 24 Februar 2010 14:25:20 schrieb Ertugrul Soeylemez:

...

Jonas Almström Duregård wrote:

...
...
...
noneRepeated xs = xs == nub xs

Not quite as bad, nub is O(n^2)

You are correct of course. Still, it will probably be a bit less inefficient if the length of the lists are compared (as opposed to the elements):

noneRepeated xs = length xs == length (nub xs)

[...]

...
How can you nub in O(n*log n)? Remember, you only have Eq for nub.

Again note that the big advantage of my method is laziness. The comparison will end on the first duplicate found.

Yes, and the suggestions Jonas and I posted had the same property :)

...

Using the following nub implementation the overall time complexity should be O(n * log n), but may be space-intensive, because it uses O(n) space.

Data.List.nub also uses O(n) space (but has a smaller constant factor).

...

Also note that it has a different context (the type needs to be Ord instead of Eq):

Yeah, that's the catch, it has a more restricted type. If you have only Eq, I don't think you can do better than O(n^2). That's why I was irritated by

...

...
I think the nub-based solution is the best one in general, but it's the base library implementation of nub, which is unfortunate. In fact, with a better nub implementation, this becomes an O(n * log n) time

, for the type of nub, the library implementation is rather good (perhaps it can still be improved, but not much, I think).

...

import qualified Data.Set as S import Data.List

myNub :: Ord a => [a] -> [a] myNub = concat . snd . mapAccumL nubMap S.empty where nubMap s x

| S.member x s = (s, []) | otherwise = (S.insert x s, [x])

I prefer {-# LANGUAGE BangPatterns #-} {-# OPTIONS_GHC -O2 #-} module OrdNub (ordNub, ordNubRare) where import qualified Data.Set as Set ordNub :: Ord a => [a] -> [a] ordNub = go Set.empty where go !st (x:xs) | x `Set.member` st = go st xs | otherwise = x : go (Set.insert x st) xs go _ [] = [] , it's faster. If you know that duplicates are rare, ordNubRare :: Ord a => [a] -> [a] ordNubRare = go 0 Set.empty where go sz st (x:xs) | sz1 == sz = go sz st xs | otherwise = x : go sz1 st1 xs where st1 = Set.insert x st !sz1 = Set.size st1 go _ _ [] = [] is even faster because it omits the lookups (but it sucks when there are many duplicates, of course).

...

Greets Ertugrul

Cheers, Daniel

Ertugrul Soeylemez

25 Feb 25 Feb

5:32 p.m.

Daniel Fischer wrote:

...

Am Mittwoch 24 Februar 2010 14:25:20 schrieb Ertugrul Soeylemez:

...
Again note that the big advantage of my method is laziness. The comparison will end on the first duplicate found.

Yes, and the suggestions Jonas and I posted had the same property :)

Indeed. My suggestion was a response to the sort-based method proposed first. =) Greets Ertugrul -- nightmare = unsafePerformIO (getWrongWife >>= sex) http://blog.ertes.de/

Rafael Gustavo da Cunha Pereira Pinto

11:57 p.m.

Just to clarify the issue, I will propose the puzzle: There is a single 10 digit number that: 1) uses all ten digits [0..9], with no repetitions 2) the number formed by the first digit (right to left, most significant) is divisible by one 3) the number formed by the first 2 digits (again right to left) is divisible by two 4) the number formed by the first 3 digits is divisible by three and so on, until: 11) the number formed by the first 10 digits (all!) is by 10 Actually this can be solved by a little logic, but I wanted to give a try on brute force search using haskell. I am not looking very large lists, but I was expecting a handful of small lists. My algorithm follow these steps: 1) start with an list of empty list ([[]]), call it ds 2) I cons each member of [0..9] to ds 3) filter using: a) noneRepeated b) (listToNum d) `mod` l == 0, where l is the length of each sublist d (not computed, it is an accumulator that is incremented each time I cons) 4) repeat steps 2-3 until l==10 So, I represent each possible number as a reversed list of its digits... It ran REALLY fast (sub-second). So, bragging about Haskell with a Smalltalk-lover friend, by showing him how clean was the code and how easy was to profile, I figured out that I spent 99% on noneRepeated. After changing to the merge sort version, I have 30% on noneRepeated, 30% on listToNum and 30% on putStrLn. Pretty good! Besides, I could brag a little more about Hakell to that specific friend!! ;-) Best regards to you all!! Rafael PS: Here is the original search code, with the bad noneRepeated and still using length import Data.List digits=[0..9] noneRepeated::[Integer]->Bool noneRepeated=null.(filter (>1)).(map length).group.sort listToNum::[Integer]->Integer listToNum = (foldl (\a x->10*a+x) 0).reverse check::[Integer]->Bool check ds= and [noneRepeated ds, (listToNum ds) `mod` l==0] where l=fromIntegral $ length ds nextlevel::[[Integer]]->[[Integer]] nextlevel dss=filter (check) [d:ds | ds<-dss,d<-digits] main=do dss<-runlevel 10 0 [[]] print $ map (listToNum) dss runlevel 0 b dds=return dds runlevel a b dds=do let dds'=nextlevel dds putStrLn $ "Level "++(show (b+1))++": "++(show $ length dds')++" matches" print $ map (listToNum) dds' runlevel (a-1) (b+1) dds'

Daniel Fischer

26 Feb 26 Feb

1:30 a.m.

Am Freitag 26 Februar 2010 00:57:48 schrieb Rafael Gustavo da Cunha Pereira Pinto:

...

Just to clarify the issue, I will propose the puzzle:

There is a single 10 digit number that:

1) uses all ten digits [0..9], with no repetitions 2) the number formed by the first digit (right to left, most significant) is divisible by one 3) the number formed by the first 2 digits (again right to left) is divisible by two 4) the number formed by the first 3 digits is divisible by three and so on, until: 11) the number formed by the first 10 digits (all!) is by 10

Actually this can be solved by a little logic, but I wanted to give a try on brute force search using haskell.

Okay, so I won't talk about choosing a better algorithm :)

...

I am not looking very large lists, but I was expecting a handful of small lists.

And these are so short that actually noneRepeated xs = xs == nub xs is *faster* than sorting and grouping.

...

My algorithm follow these steps:

1) start with an list of empty list ([[]]), call it ds 2) I cons each member of [0..9] to ds 3) filter using: a) noneRepeated b) (listToNum d) `mod` l == 0, where l is the length of each

Reverse the tests, \l d -> (listToNum d) `mod` l == 0 is cheap in comparison to noneRepeated, even with noneRepeated xs = xs == nub xs.

...

sublist d (not computed, it is an accumulator that is incremented each time I cons) 4) repeat steps 2-3 until l==10

So, I represent each possible number as a reversed list of its digits... It ran REALLY fast (sub-second).

So, bragging about Haskell with a Smalltalk-lover friend, by showing him how clean was the code and how easy was to profile, I figured out that I spent 99% on noneRepeated.

That doesn't run long enough to get a reliable profile, even if you reduce the tick-time to 1ms.

...

After changing to the merge sort version, I have 30% on noneRepeated, 30% on listToNum and 30% on putStrLn. Pretty good!

Besides, I could brag a little more about Hakell to that specific friend!! ;-)

Best regards to you all!!

Rafael

PS: Here is the original search code, with the bad noneRepeated and still using length

import Data.List

digits=[0..9]

noneRepeated::[Integer]->Bool noneRepeated=null.(filter (>1)).(map length).group.sort

listToNum::[Integer]->Integer listToNum = (foldl (\a x->10*a+x) 0).reverse

Doesn't really matter, but try to acquire the habit of using foldl' rather than foldl (unless you need foldl for its additional laziness). You'll run into fewer laziness leaks that way.

...

check::[Integer]->Bool check ds= and [noneRepeated ds, (listToNum ds) `mod` l==0] where l=fromIntegral $ length ds

Use (&&) if you have only two tests.

...

nextlevel::[[Integer]]->[[Integer]] nextlevel dss=filter (check) [d:ds | ds<-dss,d<-digits]

Why not move the checks into the generation, nextlevel dss = filter ((== 0) . (`mod` l) . listToNum) [d:ds | ds <- dss, d <- digits, d `notElem` ds] where l = 1 + length (head dss) or nextlevel dss = let l = 1 + length (head dss) in [d:ds | ds <- dss, let n = 10*listToNum ds , d <- digits, d `notElem` ds, (n+d) `mod` l == 0] ? At least the d `notElem` ds seems very natural here (and it's more efficient, too).

...

main=do dss<-runlevel 10 0 [[]] print $ map (listToNum) dss

runlevel 0 b dds=return dds runlevel a b dds=do let dds'=nextlevel dds putStrLn $ "Level "++(show (b+1))++": "++(show $ length dds')++" matches" print $ map (listToNum) dds' runlevel (a-1) (b+1) dds'

Ketil Malde

3:50 p.m.

| Am Freitag 26 Februar 2010 00:57:48 schrieb Rafael Gustavo da Cunha Pereira | Pinto: | |> There is a single 10 digit number that: |> |> 1) uses all ten digits [0..9], with no repetitions |> 2) the number formed by the first digit (right to left, most |> significant) is divisible by one |> 3) the number formed by the first 2 digits (again right to left) is |> divisible by two |> 4) the number formed by the first 3 digits is divisible by three |> and so on, until: |> 11) the number formed by the first 10 digits (all!) is by 10 Since Ishaaq Chandy just posted about how to generalize nested list comprehensions, I thought this was an interesting way to approach this. First a couple of simple helper functions:

...

val = foldl (\x y -> x*10+y) 0 divides d n = n `mod` d == 0

So you could solve it using a set of list comprehensions:

...

solutions = [[x1,x2,x3,x4,x5,x6,x7,x8,x9,x10] | x1 <- [0..9] , x2 <- [0..9], divides 2 $ val [x1,x2] , x3 <- [0..9], divides 3 $ val [x1,x2,x3] , x4 <- [0..9], divides 4 $ val [x1,x2,x3,x4] , x5 <- [0..9], divides 5 $ val [x1,x2,x3,x4,x5] , x6 <- [0..9], divides 6 $ val [x1,x2,x3,x4,x5,x6] , x7 <- [0..9], divides 7 $ val [x1,x2,x3,x4,x5,x6,x7] , x8 <- [0..9], divides 8 $ val [x1,x2,x3,x4,x5,x6,x7,x8] , x9 <- [0..9], divides 9 $ val [x1,x2,x3,x4,x5,x6,x7,x8,x9] , x10 <- [0] , length (nub [x1,x2,x3,x4,x5,x6,x7,x8,x9,x10]) == 10 ]

This is a nicely declarative way to do it, and a pretty clear way to formulate the original problem statement. But it's a bit tedious with all the repetitions, so you would rather recurse to make it more general. Since list comprehensions are just a different way to work in the list monad (where | becomes 'guard'), I managed to come up with this:

...

solve :: [Int] -> [[Int]] solve prefix = do let l = length prefix if l == 10 then return prefix else do x <- [0..9] let r = prefix++[x] guard (divides (l+1) (val r) && nub r == r) solve r

-k (PS: I'm happy to hear any comments regarding style or other issues) -- If I haven't seen further, it is by standing in the footprints of giants

Daniel Fischer

8:14 p.m.

Am Freitag 26 Februar 2010 16:50:42 schrieb Ketil Malde:

...

| Am Freitag 26 Februar 2010 00:57:48 schrieb Rafael Gustavo da Cunha | Pereira | | Pinto: |> There is a single 10 digit number that: |> |> 1) uses all ten digits [0..9], with no repetitions |> 2) the number formed by the first digit (right to left, most |> significant) is divisible by one |> 3) the number formed by the first 2 digits (again right to left) is |> divisible by two |> 4) the number formed by the first 3 digits is divisible by three |> and so on, until: |> 11) the number formed by the first 10 digits (all!) is by 10

Since Ishaaq Chandy just posted about how to generalize nested list comprehensions, I thought this was an interesting way to approach this.

Yes. But it approaches the border, for 20 digits it would become annoying to type.

...

First a couple of simple helper functions:

...
val = foldl (\x y -> x*10+y) 0 divides d n = n `mod` d == 0

So you could solve it using a set of list comprehensions:

...
solutions = [[x1,x2,x3,x4,x5,x6,x7,x8,x9,x10]

| x1 <- [0..9]

First digit can't be 0, so make it [1 .. 9]. Since you use the fact that the last digit must be the 0, pull all others from [1 .. 9].

...

...
, x2 <- [0..9], divides 2 $ val [x1,x2]

, x1 /= x2

...
, x3 <- [0..9], divides 3 $ val [x1,x2,x3]

, x3 `notElem` [x1,x2] -- etc.

...
, x4 <- [0..9], divides 4 $ val [x1,x2,x3,x4] , x5 <- [0..9], divides 5 $ val [x1,x2,x3,x4,x5] , x6 <- [0..9], divides 6 $ val [x1,x2,x3,x4,x5,x6] , x7 <- [0..9], divides 7 $ val [x1,x2,x3,x4,x5,x6,x7] , x8 <- [0..9], divides 8 $ val [x1,x2,x3,x4,x5,x6,x7,x8] , x9 <- [0..9], divides 9 $ val [x1,x2,x3,x4,x5,x6,x7,x8,x9] , x10 <- [0] , length (nub [x1,x2,x3,x4,x5,x6,x7,x8,x9,x10]) == 10 ]

Doesn't look as nice, but early pruning saves a lot of work (in this case, for very small values of "a lot").

...

This is a nicely declarative way to do it, and a pretty clear way to formulate the original problem statement.

A very direct translation :)

...

But it's a bit tedious with all the repetitions, so you would rather recurse to make it more general. Since list comprehensions are just a different way to work in

the list monad (where | becomes 'guard'), I managed to come up with this:

...
solve :: [Int] -> [[Int]]

Not on a 32-bit system. Word would suffice there, but you don't know that in advance, so it'd be Int64 or Integer

...

...
solve prefix = do let l = length prefix if l == 10 then return prefix else do x <- [0..9]

You can guard (x `notElem` prefix) here, or use x `notElem` prefix below, but don't use nub r == r when you know that only the new element may be duplicated.

...

...
let r = prefix++[x] guard (divides (l+1) (val r) && nub r == r) solve r

-k

(PS: I'm happy to hear any comments regarding style or other issues)

I would make the length of the prefix a parameter of solve. It's admittedly less elegant, but all those calls to length hurt me :) Regarding style, I think I prefer solve prefix = case length prefix of 10 -> return prefix l -> do x <- [0 .. 9] ... over the if-then-else.

Ketil Malde

8:34 p.m.

Daniel Fischer skrev:

...

Am Freitag 26 Februar 2010 16:50:42 schrieb Ketil Malde:

...
...
solutions = [[x1,x2,x3,x4,x5,x6,x7,x8,x9,x10] | x1 <- [0..9]

...

First digit can't be 0, so make it [1 .. 9]. Since you use the fact that the last digit must be the 0, pull all others from [1 .. 9].

Originally, I pulled from alternating odds (x1 <- [1,3..9] etc) and evens, since this is fairly easy to deduce... I reverted this since the point was to use brute force.

...

...
...
solve :: [Int] -> [[Int]]

Not on a 32-bit system. Word would suffice there, but you don't know that in advance, so it'd be Int64 or Integer

Hm? The Ints are just individual digits here.

...

I would make the length of the prefix a parameter of solve.

I thought about generating a list with solutions for increasing lenghts, so that e.g. 'solve [] !! 10' would solve this particular problem.

...

solve prefix = case length prefix of 10 -> return prefix l -> do x <- [0 .. 9] ...

over the if-then-else.

Yes, much nicer. Thanks for the feedback! -k -- If I haven't seen further, it is by standing in the footprints of giants

Daniel Fischer

9:03 p.m.

Am Freitag 26 Februar 2010 21:34:28 schrieb Ketil Malde:

...

Daniel Fischer skrev:

...
Am Freitag 26 Februar 2010 16:50:42 schrieb Ketil Malde:

...
...
solutions = [[x1,x2,x3,x4,x5,x6,x7,x8,x9,x10]

| x1 <- [0..9]

First digit can't be 0, so make it [1 .. 9]. Since you use the fact that the last digit must be the 0, pull all others from [1 .. 9].

Originally, I pulled from alternating odds (x1 <- [1,3..9] etc) and evens, since this is fairly easy to deduce... I reverted this since the point was to use brute force.

Yes, but did you forget x10 or did you think that one was too obvious?

...

...
...
...
solve :: [Int] -> [[Int]]

Not on a 32-bit system. Word would suffice there, but you don't know that in advance, so it'd be Int64 or Integer

Hm? The Ints are just individual digits here.

Yup. I didn't realise that you don't call val for the 10-digit number(s). If you also did x10 <- [0 .. 9] and checked val [x1, x2, ..., x10] `mod` 10 == 0, it would overflow, that's what I was thinking of.

...

...
I would make the length of the prefix a parameter of solve.

I thought about generating a list with solutions for increasing lenghts, so that e.g. 'solve [] !! 10' would solve this particular problem.

That's nice, but I think it'd be ugly with a DFS, much nicer with a BFS, like Rafael did.

...

...
solve prefix = case length prefix of 10 -> return prefix l -> do x <- [0 .. 9] ...

over the if-then-else.

Yes, much nicer. Thanks for the feedback!

-k

Ketil Malde

23 Feb 23 Feb

12:42 p.m.

Rafael Gustavo da Cunha Pereira Pinto writes:

...

First I used:

noneRepeated=null.(filter (>1)).(map length).group.sort

...

But this seemed very unneficient, so I thought that I could detect the duplicates while sorting, and devised this: [...] 1) Is there any improvement I can make?

Well - it's a bit long, don't you think? Long enough that from a cursory glance, I'd say it's in the "no obvious errors" category. How about (inspired by quicksort, as you no doubt can tell): noneRepeated [] = True noneRepeated (x:xs) = noneRepeated lt && singleton eq && noneRepeated gt where lt = filter (x) xs eq = x:filter (==x) xs singleton [_] = True singleton _ = False

...

2) Can it be parallelized (par, pseq)?

You could force each of the sublists in parallel, but you might lose some laziness properties, so I'd carefully benchmark it. -k -- If I haven't seen further, it is by standing in the footprints of giants

Casey Hawthorne

24 Feb 24 Feb

10:32 p.m.

On Tue, 23 Feb 2010 08:30:18 -0300, you wrote:

...

Hi folks,

While solving a puzzle, I was posed the problem of finding if there was no duplicates on a list.

Must it be a list data structure(DS) or list ADT? Mergesort can be parallelized.

...

Best regards,

Rafael

-- Regards, Casey

Casey Hawthorne

10:43 p.m.

On Tue, 23 Feb 2010 08:30:18 -0300, you wrote:

...

Hi folks,

While solving a puzzle, I was posed the problem of finding if there was no duplicates on a list.

Must it be a list data structure(DS) or list ADT? Mergesort can be parallelized.

...

Best regards,

Rafael

If space is at a premium you might want to look at a Bloom Filter. http://en.wikipedia.org/wiki/Bloom_filter The Bloom filter, conceived by Burton Howard Bloom in 1970,[1] is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positives are possible, but false negatives are not. Elements can be added to the set, but not removed (though this can be addressed with a counting filter). The more elements that are added to the set, the larger the probability of false positives. The book "Real World Haskell" has an implementation. -- Regards, Casey

5612

Age (days ago)

5615

Last active (days ago)

List overview

Download

22 comments

7 participants

participants (7)

Casey Hawthorne
Daniel Fischer
Ertugrul Soeylemez
Gwern Branwen
Jonas Almström Duregård
Ketil Malde
Rafael Gustavo da Cunha Pereira Pinto