Re: [Haskell-cafe] google-like "do you mean?" feature

16 Apr 2009

2009/4/16 Michael Mossey:
> ...
> I was thinking that it might be useful to have a Google-like "do you mean
> this?" feature. If the field name is //customer=, then the parser might
> recognize a huge list of variants like //ustomer=, //customor=, etc... that
> is, recognize them well enough to continue parsing and give a decent error
> message in context.
> Any ideas how to go about this?

To measure how similar two strings are, you can use a metric like
Levenshtein distance, Damerau-Levenshtein distance, or Jaro-Winkler
distance:

http://en.wikipedia.org/wiki/Levenshtein_distance
http://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance
http://en.wikipedia.org/wiki/Jaro-Winkler_distance

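The first two count the number of single-character mistakes a user
would have to make to get from the correct string to the one you read
from the file: Levenshtein counts insertions, deletions and
substitutions, and Damerau-Levenshtein also counts swaps of two
adjacent characters. There's an 'edit-distance' package on Hackage
that implements the first two: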

http://hackage.haskell.org/cgi-bin/hackage-scripts/package/edit-distance
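
To make the idea concrete, here is a minimal sketch of plain
Levenshtein distance using only base. The name `levenshtein` is made
up for this example and is not the API of the edit-distance package,
which is probably what you'd want to use in practice.

```haskell
import Data.List (foldl')

-- | Levenshtein distance: the minimum number of single-character
-- insertions, deletions and substitutions needed to turn the first
-- string into the second. (Hypothetical helper, written for illustration.)
levenshtein :: String -> String -> Int
levenshtein s t = last (foldl' nextRow [0 .. length t] s)
  where
    -- 'prev' holds the distances from the prefix of s consumed so far
    -- to every prefix of t; build the row after also consuming c.
    nextRow [] _ = []
    nextRow prev@(p : ps) c = scanl step (p + 1) (zip3 t prev ps)
      where
        step left (tc, diag, up) =
          minimum
            [ left + 1                           -- insert tc
            , up + 1                             -- delete c
            , diag + (if tc == c then 0 else 1)  -- keep or substitute
            ]
```

With this definition, both `levenshtein "customer" "ustomer"` and
`levenshtein "customer" "customor"` come out as 1: one deleted
character in the first case, one substituted character in the second.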

When you find an unrecognised field name in the file, you could
calculate the edit distance to each correct field name, and if there's
one within a certain threshold, assume that's what the user meant (if
there's more than one close match, maybe it's better to report an
error than risk choosing the wrong one).
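
A rough sketch of that lookup might look like the following. The
`Lookup` type, the `resolveField` function and the threshold parameter
are all invented for this example; the distance metric is passed in as
an argument, so it could be a Levenshtein function from the
edit-distance package or a hand-written one.

```haskell
import Data.List (sortOn)

-- | Outcome of looking up a field name read from the file.
-- (Hypothetical type, for illustration only.)
data Lookup
  = Exact String        -- the name is a known field
  | DidYouMean String   -- exactly one known field is within the threshold
  | Ambiguous [String]  -- several known fields are within the threshold
  | Unknown             -- nothing is close enough to guess
  deriving (Eq, Show)

-- | Brute force: measure the distance from the name to every known
-- field and keep the ones within the threshold.
resolveField
  :: (String -> String -> Int)  -- distance metric, e.g. Levenshtein
  -> Int                        -- largest distance still treated as a typo
  -> [String]                   -- known field names
  -> String                     -- name read from the file
  -> Lookup
resolveField dist threshold known name
  | name `elem` known = Exact name
  | otherwise =
      case [k | (d, k) <- scored, d <= threshold] of
        []  -> Unknown
        [k] -> DidYouMean k
        ks  -> Ambiguous ks   -- closest candidates first
  where
    scored = sortOn fst [(dist k name, k) | k <- known]
```

The parser can then carry on with the suggested field for `DidYouMean`
while still warning the user, and report an error for `Ambiguous` and
`Unknown`. A threshold of 1 or 2 is probably a sensible starting point
for short field names, but that's a guess that would need tuning.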

I imagine this brute-force approach would be fast enough, but if not
you could look at the techniques used by spell checkers to suggest
corrections. Maybe even use a spell checking library, if such a thing
exists (either pure Haskell or a binding to a library like aspell,
although I couldn't see either from a quick search in Hackage).

Andy

Andy Smith