Newbie seeking advice regarding data structure for a tricky algorithm

Hi, I'm trying to implement a fast kd-tree in Haskell. http://en.wikipedia.org/wiki/Kd-tree It's a binary tree representing space partitions. One of the fastest ways to build kd-trees is outlined in this paper: http://www.cgg.cvut.cz/~havran/ARTICLES/ingo06rtKdtree.pdf (I'm trying to implement Algorithm 5 on page 5 and the steps outlined in section 4.3.1.) I'll try to outline a simplified analogous algorithm to demonstrate my problem. It's all about classifying which children should go in a left or right branch using heuristics. Say I want to put the words 'foo', 'bar' and 'baz' into a binary tree. The heuristic requires I split the words into letters and sort them: 'aabbfoorz'. The heuristic then may decide, based on the sorted letters, that 'bar' and 'foo' should go in the left child and 'baz' goes in the right. Typically we'd then simply recurse and for example, the left child's words would be re-sorted into 'abfoor' and the heuristic is reapplied. If we assume that sorting is relatively expensive, we can avoid the re-sort for the children by unmerging the parent's sorted list of letters. Two sublists of a sorted list should already be sorted. If we know which word each letter belongs to it would be more efficient to tag the letters with 'left' or 'right' as the words are classified. Then we can iterate down the sorted letter list and produce new sorted sublists rather simply. So it's not actually that complicated, and I can imagine exactly how it could be done in C but I really don't know how to approach this in Haskell. The problem I'm having is how to keep a map between the words and its letters (which in the real problem is a map between a list of vectors and 6 tuples containing Doubles and enumerations) while keeping in mind you can't just map any letter to a word, but specific letters. e.g., the 'b's in the example must remember specifically whether they belong to 'bar' or 'baz'. Thanks for any insights, Toby.

Hi Toby,
On 4/24/07, Toby Hutton
Hi,
I'm trying to implement a fast kd-tree in Haskell. http://en.wikipedia.org/wiki/Kd-tree It's a binary tree representing space partitions.
Trees are pretty easy to implement in haskell, due to their inherent recursive nature. For example, you can do something like this: data Tree a = Node a [Tree a] Then the elements of the list are just the children subtrees of the current node. This is the approach taken by http://haskell.org/ghc/docs/latest/html/libraries/base/Data-Tree.html for example. I personally think this is easier than trying to define an actual binary tree. With this approach, the algorithms mentioned in the wikipedia article above should be relatively straight-forward to implement. As for the paper cited, and the 'analogous' algorithm, you lost me. I don't see what the algorithm you gives has to do with the problem, so I'm quite confused. My suggestion would be to try to translate one of the algorithms from wikipedia, and use that. If you can't work out how to do this, feel free to check back with me and/or the list. Good luck! [snip] Andrew

Hi, Toby Hutton wrote:
Say I want to put the words 'foo', 'bar' and 'baz' into a binary tree. The heuristic requires I split the words into letters and sort them: 'aabbfoorz'. The heuristic then may decide, based on the sorted letters, that 'bar' and 'foo' should go in the left child and 'baz' goes in the right. Typically we'd then simply recurse and for example, the left child's words would be re-sorted into 'abfoor' and the heuristic is reapplied.
If we assume that sorting is relatively expensive, we can avoid the re-sort for the children by unmerging the parent's sorted list of letters. Two sublists of a sorted list should already be sorted. If we know which word each letter belongs to it would be more efficient to tag the letters with 'left' or 'right' as the words are classified. Then we can iterate down the sorted letter list and produce new sorted sublists rather simply.
So it's not actually that complicated, and I can imagine exactly how it could be done in C but I really don't know how to approach this in Haskell.
What about just storing with each character a "reference" to the word it orignally comes from?
import Data.List (sortBy)
data Tagged = Tag String Char
tag :: Tagged -> String tag (Tag x _) = x
compareTagged :: Tagged -> Tagged -> Ordering (Tag _ x) `compareTagged` (Tag _ y) = x `compare` y
tagWord :: String -> [Tagged] tagWord word = map (Tag word) word
unmerge :: [a] -> (a -> Bool) -> ([a], [a]) unmerge xs p = foldr f ([], []) xs where f x (ls, rs) | p x = (x:ls, rs) | otherwise = (ls, x:rs)
data Tree = Empty | Leaf String | Node Tree Tree
tree :: ([Tagged] -> String -> Bool) -> [String] -> Tree tree heuristic words = tree' words sorted where
-- we only need to sort once ... sorted = sortBy compareTagged . concat . map tagWord $ words
tree' [] _ = Empty tree' [word] _ = Leaf word
tree' words sorted = let predicate = heuristic sorted (leftWords, rightWords) = unmerge words predicate -- ... because we reuse the ordered list by unmerging it (leftSorted, rightSorted) = unmerge sorted (predicate . tag) in Node (tree' leftWords leftSorted) (tree' rightWords rightSorted)
Tillmann
participants (3)
-
Andrew Wagner
-
Tillmann Rendel
-
Toby Hutton