can there be (hash-table using) O(n) version of this (I think currently) n log n algo?

Thomas Hartman

17 Jul 2009 17 Jul '09

10:24 p.m.

The code below is, I think, n log n, a few seconds on a million + element list. I wonder if it's possible to get this down to O(N) by using a hashtable implemementation, or other better data structure. Further, is there a hashtable implementation for haskell that doesn't live in IO? Maybe in ST or something? import Data.HashTable import qualified Data.Set as S import Data.List (foldl') testdata = [1,4,8,9,20,11,20,14,2,15] ++ [1..(10^6)] wantedsum = 29 -- set data structure -- findsums locates pairs of integers in a list that add up to a wanted sum. findsums :: [Int] -> Int -> S.Set (Int,Int) findsums xs wanted = snd . foldl' f (S.empty,S.empty) $ xs where f (candidates,successes) next = if S.member (wanted-next) candidates then (candidates, S.insert (next,wanted-next) successes) else (S.insert next candidates,successes) -- hashtable data structure -- result: t -- fromList [(15,14),(16,13),(17,12),(18,11),(19,10),(20,9),(21,8),(22,7),(23,6),(24,5),(25,4),(26,3),(27,2),(28,1)] -- probably O(n log n) complexity since using tree based Data.Set (a few seconds on million+ element list) t = findsums testdata wantedsum

Show replies by date

Edward Kmett

18 Jul 18 Jul

1:11 a.m.

New subject: can there be (hash-table using) O(n) version of this (I think currently) n log n algo?

Haskell hash tables are a notorious performance pig, mostly due to the fact that when we deal with big arrays, if the mutable array changes at all the garbage collector will have to retraverse the entire thing during the next collection. Guess the most common scenario for imperative hash tables that are even lightly tweaked from time to time... ;) As for other non-IO hash tables, I've seen a couple of unboxed hash tables using STUArrays (which can side step this issue for unboxable data), IIRC one may have even been used for a language shootout problem. I even wrote (a rather poorly performing) Witold Litwin-style sorted linear hash table for STM a couple of years back (it should still be on hackage under 'thash'). Data.HashTable could be easily reimplemented in ST s, but it would still suffer the same GC problems as the current hash table, which no one likes. -Ed On Fri, Jul 17, 2009 at 6:24 PM, Thomas Hartman wrote:

...

The code below is, I think, n log n, a few seconds on a million + element list.

I wonder if it's possible to get this down to O(N) by using a hashtable implemementation, or other better data structure.

Further, is there a hashtable implementation for haskell that doesn't live in IO? Maybe in ST or something?

import Data.HashTable import qualified Data.Set as S import Data.List (foldl')

testdata = [1,4,8,9,20,11,20,14,2,15] ++ [1..(10^6)] wantedsum = 29

-- set data structure -- findsums locates pairs of integers in a list that add up to a wanted sum. findsums :: [Int] -> Int -> S.Set (Int,Int) findsums xs wanted = snd . foldl' f (S.empty,S.empty) $ xs where f (candidates,successes) next = if S.member (wanted-next) candidates then (candidates, S.insert (next,wanted-next) successes) else (S.insert next candidates,successes)

-- hashtable data structure

-- result: t -- fromList [(15,14),(16,13),(17,12),(18,11),(19,10),(20,9),(21,8),(22,7),(23,6),(24,5),(25,4),(26,3),(27,2),(28,1)] -- probably O(n log n) complexity since using tree based Data.Set (a few seconds on million+ element list) t = findsums testdata wantedsum _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Matthias Görgens

9:50 a.m.

New subject: can there be (hash-table using) O(n) version of this (I think currently) n log n algo?

However you can use the wider idea of hashing: A nesting of two finite maps. One fast, but approximative map. And one slow, but exact map. The quintessential example is an array indexed with some hash function for the first map. And linked lists of (key,value) pairs as the latter. In Haskell you might want to use IntMap and a the mentioned list of pairs (combined with the lookup functions from Data.List). Of course you need to supply a function to hash your keys to Int for the IntMap.

Bulat Ziganshin

10:16 a.m.

New subject: can there be (hash-table using) O(n) version of this (I think currently) n log n algo?

Hello Thomas, Saturday, July 18, 2009, 2:24:21 AM, you wrote:

...

Further, is there a hashtable implementation for haskell that doesn't live in IO? Maybe in ST or something?

import Prelude hiding (lookup) import qualified Data.HashTable import Data.Array import qualified Data.List as List data HT a b = HT (a->Int) (Array Int [(a,b)]) -- size is the size of array (we implement a closed hash) -- hash is the hash function (a->Int) -- list is assoclist of items to put in hash create size hash list = HT hashfunc (accumArray (flip (:)) [] (0, arrsize-1) (map (\(a,b) -> (hashfunc a,(a,b))) list) ) where arrsize = head$ filter (>size)$ iterate (\x->3*x+1) 1 hashfunc a = hash a `mod` arrsize lookup a (HT hash arr) = List.lookup a (arr!hash a) main = do let assoclist = [("one", 1), ("two", 2), ("three", 3)] hash = create 10 (fromEnum . Data.HashTable.hashString) assoclist print (lookup "one" hash) print (lookup "zero" hash) -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Thomas Hartman

3:23 p.m.

New subject: can there be (hash-table using) O(n) version of this (I think currently) n log n algo?

Thanks Bulat. FWIW, i take it that http://www.haskell.org/haskellwiki/Shootout/Knucleotide is what Edward was referring to, with the shootouts. It seems that a lot of progress has been made but not much has been migrated back to hackage. Going back to my original question, I am now looking for a dead simple motivating example for showing the example of using a (good) hashtable over Data.Map, with a tangible demo of O(n) over O(n log n) running times. I mean, something where running an input of (10^4) size versus (10^6) size shows a noticeably laggier run when using Set versus hashtable. I don't think maybe my original example quite qualifies because I think in practice the computation is dominated by space complexity. However, I haven't yet ported it over to a hashtable version, so not sure. (And the shootout example doesn't satisfy my sense of "dead simple.") 2009/7/18 Bulat Ziganshin :

...

Hello Thomas,

Saturday, July 18, 2009, 2:24:21 AM, you wrote:

...
Further, is there a hashtable implementation for haskell that doesn't live in IO? Maybe in ST or something?

import Prelude hiding (lookup) import qualified Data.HashTable import Data.Array import qualified Data.List as List

data HT a b = HT (a->Int) (Array Int [(a,b)])

-- size is the size of array (we implement a closed hash) -- hash is the hash function (a->Int) -- list is assoclist of items to put in hash create size hash list = HT hashfunc (accumArray (flip (:)) [] (0, arrsize-1) (map (\(a,b) -> (hashfunc a,(a,b))) list) )

where arrsize = head$ filter (>size)$ iterate (\x->3*x+1) 1 hashfunc a = hash a `mod` arrsize

lookup a (HT hash arr) = List.lookup a (arr!hash a)

main = do let assoclist = [("one", 1), ("two", 2), ("three", 3)] hash = create 10 (fromEnum . Data.HashTable.hashString) assoclist print (lookup "one" hash) print (lookup "zero" hash)

-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Bulat Ziganshin

3:57 p.m.

New subject: Re[2]: can there be (hash-table using) O(n) version of this (I think currently) n log n algo?

Hello Thomas, Saturday, July 18, 2009, 7:23:10 PM, you wrote:

...

Going back to my original question, I am now looking for a dead simple motivating example for showing the example of using a (good) hashtable over Data.Map

spell checking? -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Alberto G. Corona

8 Dec 8 Dec

9:28 p.m.

New subject: Re[2]: can there be (hash-table using) O(n) version of this (I think currently) n log n algo?

Any application where multiple updates are done in multiple threads . gain by using a hashTable 2009/7/18 Bulat Ziganshin

...

Hello Thomas,

Saturday, July 18, 2009, 7:23:10 PM, you wrote:

...
Going back to my original question, I am now looking for a dead simple motivating example for showing the example of using a (good) hashtable over Data.Map

spell checking?

-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Don Stewart

18 Jul 18 Jul

4:54 p.m.

New subject: can there be (hash-table using) O(n) version of this (I think currently) n log n algo?

tphyahoo:

...

The code below is, I think, n log n, a few seconds on a million + element list.

Have you tried the judy arrays library on Hackage? (It provides a hashtable, which I've used occasionally) -- Don

Heinrich Apfelmus

19 Jul 19 Jul

8:26 a.m.

New subject: can there be (hash-table using) O(n) version of this (I think currently) n log n algo?

Thomas Hartman wrote:

...

The code below is, I think, n log n, a few seconds on a million + element list.

I wonder if it's possible to get this down to O(N) by using a hashtable implemementation, or other better data structure.

-- findsums locates pairs of integers in a list that add up to a wanted sum.

findsums :: [Int] -> Int -> S.Set (Int,Int) findsums xs wanted = snd . foldl' f (S.empty,S.empty) $ xs where f (candidates,successes) next = if S.member (wanted-next) candidates then (candidates, S.insert (next,wanted-next) successes) else (S.insert next candidates,successes)

Remember that hash tables are actually O(key length) instead of O(1), so I don't think you can break the log n for really large lists this way since the key length increases as well (unless most elements are equal anyway). In any case, I have this conjecture On which I will lecture: A Set of some things Is sorting for kings. ;-) findsums goal = uncurry sweep . (id &&& reverse) . sort where sweep [] _ = [] sweep _ [] = [] sweep (x:xs) (y:ys) = if x > y then [] else case compare (x+y) goal of LT -> sweep xs (y:ys) EQ -> (x,y) : sweep xs ys GT -> sweep (x:xs) ys This algorithm needs a proof of correctness, though. And it's longer that the Data.Set version, too. Regards, apfelmus -- http://apfelmus.nfshost.com

Jon Harrop

8 Dec 8 Dec

9:41 p.m.

New subject: can there be (hash-table using) O(n) version of this (I think currently) n log n algo?

On Sunday 19 July 2009 09:26:14 Heinrich Apfelmus wrote:

...

Thomas Hartman wrote:

...
The code below is, I think, n log n, a few seconds on a million + element list.

I wonder if it's possible to get this down to O(N) by using a hashtable implemementation, or other better data structure.

-- findsums locates pairs of integers in a list that add up to a wanted sum.

findsums :: [Int] -> Int -> S.Set (Int,Int) findsums xs wanted = snd . foldl' f (S.empty,S.empty) $ xs where f (candidates,successes) next = if S.member (wanted-next) candidates then (candidates, S.insert (next,wanted-next) successes) else (S.insert next candidates,successes)

Remember that hash tables are actually O(key length) instead of O(1), so I don't think you can break the log n for really large lists this way since the key length increases as well (unless most elements are equal anyway).

Use a trie of hash tables with ~word-sized pieces of key. -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e

Matt Morrow

9 Dec 9 Dec

11:06 p.m.

New subject: can there be (hash-table using) O(n) version of this (I think currently) n log n algo?

Never underestimate teh power of the Int{Set,Map}: {-# LANGUAGE BangPatterns #-} import Data.Set(Set) import Data.IntSet(IntSet) import qualified Data.Set as S import qualified Data.IntSet as IS import Control.Parallel.Strategies(rnf) import Data.Monoid(Monoid(..)) import Data.List findsumsIS :: [Int] -> Int -> IntSet findsumsIS xs wanted = snd . foldl' f mempty $ xs where f (!candidates,!successes) next = let x = wanted - next in case x `IS.member` candidates of True -> (candidates, IS.insert next successes) False -> (IS.insert next candidates,successes) -- (i had to add bangs in f since it was blowing the stack) findsums :: [Int] -> Int -> Set (Int,Int) findsums xs wanted = snd . foldl' f (S.empty,S.empty) $ xs where f (!candidates,!successes) next = if S.member (wanted-next) candidates then (candidates, S.insert (next,wanted-next) successes) else (S.insert next candidates,successes) {- Note that the list has 10 million elements, (time is roughly 0.4s with 1 million with IntSet). -} {- main = do let s = findsums (take 10000000 (cycle [1..999])) 500 print (rnf s `seq` ()) [m@monire ~]$ time ./FindSums () real 0m8.793s user 0m8.762s sys 0m0.022s -} {- main = do let s = findsumsIS (take 10000000 (cycle [1..999])) 500 print (rnf s `seq` ()) [m@monire ~]$ time ./FindSumsIS () real 0m4.488s user 0m4.459s sys 0m0.023s -} Matt

...

On Sunday 19 July 2009 09:26:14 Heinrich Apfelmus wrote:

...
Thomas Hartman wrote:

...
The code below is, I think, n log n, a few seconds on a million + element list.

I wonder if it's possible to get this down to O(N) by using a hashtable implemementation, or other better data structure.

-- findsums locates pairs of integers in a list that add up to a wanted sum.

findsums :: [Int] -> Int -> S.Set (Int,Int) findsums xs wanted = snd . foldl' f (S.empty,S.empty) $ xs where f (candidates,successes) next = if S.member (wanted-next) candidates then (candidates, S.insert (next,wanted-next) successes) else (S.insert next candidates,successes)

Remember that hash tables are actually O(key length) instead of O(1), so I don't think you can break the log n for really large lists this way since the key length increases as well (unless most elements are equal anyway).

5796

Age (days ago)

5941

Last active (days ago)

List overview

Download

10 comments

9 participants

participants (9)

Alberto G. Corona
Bulat Ziganshin
Don Stewart
Edward Kmett
Heinrich Apfelmus
Jon Harrop
Matt Morrow
Matthias Görgens
Thomas Hartman