
Never underestimate the power of the Int{Set,Map}:

{-# LANGUAGE BangPatterns #-}

import Data.Set (Set)
import Data.IntSet (IntSet)
import qualified Data.Set as S
import qualified Data.IntSet as IS
import Control.Parallel.Strategies (rnf)
import Data.Monoid (Monoid(..))
import Data.List (foldl')

findsumsIS :: [Int] -> Int -> IntSet
findsumsIS xs wanted = snd . foldl' f mempty $ xs
  where
    f (!candidates, !successes) next =
      let x = wanted - next
      in case x `IS.member` candidates of
           True  -> (candidates, IS.insert next successes)
           False -> (IS.insert next candidates, successes)
-- (I had to add bangs in f since it was blowing the stack.)

findsums :: [Int] -> Int -> Set (Int,Int)
findsums xs wanted = snd . foldl' f (S.empty, S.empty) $ xs
  where
    f (!candidates, !successes) next =
      if S.member (wanted - next) candidates
        then (candidates, S.insert (next, wanted - next) successes)
        else (S.insert next candidates, successes)

{- Note that the list has 10 million elements
   (time is roughly 0.4s with 1 million elements using IntSet). -}

{-
main = do
  let s = findsums (take 10000000 (cycle [1..999])) 500
  print (rnf s `seq` ())

[m@monire ~]$ time ./FindSums
()

real    0m8.793s
user    0m8.762s
sys     0m0.022s
-}

{-
main = do
  let s = findsumsIS (take 10000000 (cycle [1..999])) 500
  print (rnf s `seq` ())

[m@monire ~]$ time ./FindSumsIS
()

real    0m4.488s
user    0m4.459s
sys     0m0.023s
-}

Matt
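For anyone following along, here is a quick sanity check in GHCi (the small input list is my own and not part of Matt's benchmark). Note that the IntSet version records only one member of each matching pair, while the Set version keeps both:

ghci> IS.toList (findsumsIS [1,2,3,4,5,496,499] 500)
[496,499]
ghci> S.toList (findsums [1,2,3,4,5,496,499] 500)
[(496,4),(499,1)]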
On Sunday 19 July 2009 09:26:14 Heinrich Apfelmus wrote:
Thomas Hartman wrote:
The code below is, I think, O(n log n); it takes a few seconds on a list of a million-plus elements.
I wonder if it's possible to get this down to O(n) by using a hash table implementation, or some other, better data structure.
-- findsums locates pairs of integers in a list that add up to a wanted sum.
findsums :: [Int] -> Int -> S.Set (Int,Int)
findsums xs wanted = snd . foldl' f (S.empty, S.empty) $ xs
  where
    f (candidates, successes) next =
      if S.member (wanted - next) candidates
        then (candidates, S.insert (next, wanted - next) successes)
        else (S.insert next candidates, successes)
Remember that hash tables are really O(key length) rather than O(1), so I don't think you can beat the O(log n) for really large lists this way, since the key length grows as well (unless most elements are equal anyway).
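For comparison, here is a minimal sketch of the hash-based variant Thomas asks about, assuming Data.HashSet from the unordered-containers package (not part of the original thread). It has exactly the same accumulate-and-check shape as findsums, just with hashing in place of ordered comparison:

{-# LANGUAGE BangPatterns #-}

import Data.List (foldl')
import qualified Data.HashSet as HS   -- from unordered-containers

-- Same fold as findsums above, but over hash sets: membership tests are
-- expected O(1) in the number of elements, though hashing the key itself
-- still costs O(key length), as noted above.
findsumsHS :: [Int] -> Int -> HS.HashSet (Int, Int)
findsumsHS xs wanted = snd . foldl' f (HS.empty, HS.empty) $ xs
  where
    f (!candidates, !successes) next =
      if HS.member (wanted - next) candidates
        then (candidates, HS.insert (next, wanted - next) successes)
        else (HS.insert next candidates, successes)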