On 11/28/07, Grzegorz Chrupala <grzegorz.chrupala@computing.dcu.ie> wrote:
You may have better luck checking out methods used in parsing natural
language. In order to use statistical parsing techniques such as
Probabilistic Context-Free Grammars ([1], [2]), the standard approach is to
extract rule probabilities from an annotated corpus, that is, a collection of
strings with associated parse trees. Maybe you could use the 2/3 of your
addresses that you know are correctly parsed as your training material.
A PCFG parser can output all (or the n-best) parses ordered by
probability, so that would seem to fit your requirements.
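
To make the idea concrete, here is a minimal sketch of estimating rule
probabilities by relative frequency from a handful of parsed addresses, then
ranking candidate parses by probability. The tree encoding, the labels
(ADDR, NUM, STREET, NAME, TYPE) and the toy data are all made up for
illustration; a real setup would read your own correctly parsed addresses and
would use a proper parser rather than scoring hand-built candidate trees.

```python
from collections import Counter

# Hypothetical toy treebank. A tree is a nested tuple (label, child, ...);
# leaves are plain strings (the address tokens).
TREEBANK = [
    ("ADDR", ("NUM", "12"), ("STREET", ("NAME", "Main"), ("TYPE", "St"))),
    ("ADDR", ("NUM", "7"), ("STREET", ("NAME", "Oak"), ("TYPE", "Ave"))),
    ("ADDR", ("STREET", ("NAME", "Main"), ("TYPE", "St")), ("NUM", "12")),
]

def rules(tree):
    """Yield one (lhs, rhs) pair for every internal node of the tree."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield label, rhs
    for child in children:
        if not isinstance(child, str):
            yield from rules(child)

def estimate_pcfg(treebank):
    """Relative-frequency estimate: P(lhs -> rhs) = count(rule) / count(lhs)."""
    rule_counts = Counter(r for tree in treebank for r in rules(tree))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}

def tree_probability(tree, probs):
    """Product of the probabilities of all rules used in the tree.
    Rules never seen in training get probability 0 (no smoothing here)."""
    p = 1.0
    for rule in rules(tree):
        p *= probs.get(rule, 0.0)
    return p

probs = estimate_pcfg(TREEBANK)

# Two hand-built candidate parses, ranked from most to least probable.
candidates = [TREEBANK[2], TREEBANK[0]]
ranked = sorted(candidates, key=lambda t: tree_probability(t, probs), reverse=True)
for tree in ranked:
    print(round(tree_probability(tree, probs), 4), tree)
```

With more training data you would also want some smoothing, so that an unseen
street name does not drive the probability of a whole parse to zero.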
[1] http://en.wikipedia.org/wiki/Stochastic_context-free_grammar
[2] http://www.cs.colorado.edu/~martin/slp2.html#Chapter14
--
Best,
Grzegorz