
I'm running a project where i want to generate a map of a rather large data collection. The map is if the form: Data.Map.Map String a I'd like to be able to do inexact lookups on the map. Firstly, ignore the difference between upper lower case, that's easy. But secondly, have a function: search :: (Num b) => Data.Map.Map String a -> String -> b -> [a] This function is like the normal lookup, but it takes an extra argument b. B is the allowed character error rate. That is the Levenstein distance divided on the length. It should return all the strings that has an error rate lower that b and in a ordered manner, best first. Is it possible to design such a datastructure efficiently? Does there already exist something like this? If not, where would i begin to create this? Any advices or knowledge appreciated. -- Morten

Am 13.03.2012 um 09:15 schrieb Morten Olsen Lysgaard:
I'm running a project where i want to generate a map of a rather large data collection. The map is if the form: Data.Map.Map String a I'd like to be able to do inexact lookups on the map. Firstly, ignore the difference between upper lower case, that's easy. But secondly, have a function:
search :: (Num b) => Data.Map.Map String a -> String -> b -> [a]
This function is like the normal lookup, but it takes an extra argument b. B is the allowed character error rate. That is the Levenstein distance divided on the length. It should return all the strings that has an error rate lower that b and in a ordered manner, best first.
Is it possible to design such a datastructure efficiently? Does there already exist something like this? If not, where would i begin to create this?
You could take a trie and extend the lookup functions so that it also returns inexact matches up to a given error rate. That should be quite easy, because the definition of Levenshtein distance is recursive as well as the trie's lookup function.

On Tue, Mar 13, 2012 at 6:14 AM, Holger Siegel
Am 13.03.2012 um 09:15 schrieb Morten Olsen Lysgaard:
I'd like to be able to do inexact lookups on the map. Firstly, ignore the difference between upper lower case, that's easy. But secondly, have a function:
search :: (Num b) => Data.Map.Map String a -> String -> b -> [a]
This function is like the normal lookup, but it takes an extra argument b. B is the allowed character error rate. That is the Levenstein distance divided on the length. It should return all the strings that has an error rate lower that b and in a ordered manner, best first.
You could take a trie and extend the lookup functions so that it also returns inexact matches up to a given error rate. That should be quite easy, because the definition of Levenshtein distance is recursive as well as the trie's lookup function.
There are also Burkhard-Keller trees, which are a pretty cool data structure: http://hackage.haskell.org/package/bktrees-0.3.1 --S.
participants (3)
-
Holger Siegel
-
Morten Olsen Lysgaard
-
Sterling Clover