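Given a collection of sets and a particular target set, I want to find the k sets that are nearest to the target, where distance is the Hamming distance (here, the size of the symmetric difference of the two sets). My current implementation: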
import Data.List
import qualified Data.Set as Set

-- Return the k entries whose set is closest to the target set b.
nearest_k :: Ord a => Int -> [(Set.Set a, v)] -> Set.Set a -> [(Set.Set a, v)]
nearest_k k bs b = take k bs' where
    bs' = sortOn (hamming b) bs

hamming :: Ord a => Set.Set a -> (Set.Set a, v) -> Int
hamming x (y, _) = hamming_distance x y

-- Size of the symmetric difference of xs and ys.
hamming_distance :: Ord a => Set.Set a -> Set.Set a -> Int
hamming_distance xs ys = Set.size (Set.difference xs ys) + Set.size (Set.difference ys xs)

-- All subsets of a list (2^n of them), used only to build test data.
subsets :: [a] -> [[a]]
subsets [] = [[]]
subsets (x:xs) = subsets xs ++ map (x:) (subsets xs)

int_lists :: [[Int]]
int_lists = subsets [1..20]

values :: [(Set.Set Int, Int)]
values = map f (zip [1..] int_lists) where
    f (i, x) = (Set.fromList x, i)

test = nearest_k 8 values (Set.fromList [1,2,3])
This works OK for the test above (with sets of ints), but it is rather slow in my actual application, in which the sets are large sets of ground atoms of first-order logic. Is there some major optimization I should be doing here?
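For concreteness, here are two directions that might be worth trying. This is only a sketch and has not been benchmarked. The first computes the same distance with a single `Set.intersection` instead of two `Set.difference` calls, using |A Δ B| = |A| + |B| - 2|A ∩ B|. The second assumes (hypothetically) that every ground atom can be numbered with a small non-negative `Int`, so that each set becomes an `Integer` bitmask and the distance is a population count of an XOR. The names `hammingViaIntersection`, `nearestK'`, `toMask`, and `hammingMask` are made up for illustration.

```haskell
import Data.Bits (popCount, setBit, xor)
import Data.List (foldl', sortOn)
import qualified Data.Set as Set

-- Symmetric-difference size from one intersection:
-- |A Δ B| = |A| + |B| - 2 * |A ∩ B|.
hammingViaIntersection :: Ord a => Set.Set a -> Set.Set a -> Int
hammingViaIntersection xs ys =
  Set.size xs + Set.size ys - 2 * Set.size (Set.intersection xs ys)

-- Same interface as nearest_k, using the distance above.
-- sortOn evaluates the distance once per entry and is stable.
nearestK' :: Ord a => Int -> [(Set.Set a, v)] -> Set.Set a -> [(Set.Set a, v)]
nearestK' k bs b = take k (sortOn (hammingViaIntersection b . fst) bs)

-- ASSUMPTION: atoms have been numbered 0, 1, 2, ... beforehand.
-- Encode a set of atom indices as a bitmask.
toMask :: [Int] -> Integer
toMask = foldl' setBit 0

-- Hamming distance between two bitmask-encoded sets.
hammingMask :: Integer -> Integer -> Int
hammingMask a b = popCount (a `xor` b)
```

With the bitmask encoding, the target and every stored set would be converted once up front, so each query would only pay for the XOR and population count per entry. Whether that is practical depends on how many distinct ground atoms there are.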