Dear Haskell Cafe,

Given a set of sets, and a particular target set, I want to find the sets that are nearest (in terms of Hamming distance) to the target set.

I am using the following code:

import Data.List

import qualified Data.Set as Set

nearest_k :: Ord a => Int -> [(Set.Set a, v)] -> Set.Set a -> [(Set.Set a, v)]

nearest_k k bs b = take k bs' where

bs' = sortOn (hamming b) bs

hamming :: Ord a => Set.Set a -> (Set.Set a, v) -> Int

hamming x (y, _) = hamming_distance x y

hamming_distance :: Ord a => Set.Set a -> Set.Set a -> Int

hamming_distance xs ys = Set.size (Set.difference xs ys) + Set.size (Set.difference ys xs)

subsets :: [a] -> [[a]]

subsets [] = [[]]

subsets (x:xs) = subsets xs ++ map (x:) (subsets xs)

int_lists :: [[Int]]

int_lists = subsets [1..20]

values :: [(Set.Set Int, Int)]

values = map f (zip [1..] int_lists) where

f (i, x) = (Set.fromList x, i)

test = nearest_k 8 values (Set.fromList [1,2,3])

----

This works ok for the test above (with sets of ints), but is rather slow in my actual application (in which the sets are large sets of ground atoms of first-order logic). Is there some major optimization I should be doing here?

thanks,

Richard